The Art of Data Simulation

Most problems in the scientific world are about understanding different phenomena. We want to learn the characteristics and patterns of the systems we study so that we can anticipate and predict their behavior. As humans, we learn by observing these processes, either as they happen naturally or in controlled experiments. Observation might not be an option, however, if we are studying a rare or dangerous event.

With the advent of computer simulations, scientists gained the tools to capture the behavior not only of natural systems but also of theoretical systems, extreme scenarios and alternative realities. Nowadays, simulations are part of many trusted research studies. How challenging a simulation is depends on the complexity of the process it mimics. One of the simplest cases is reproducing observations from distributions with known parameters.

A simple example is simulating an experiment in which we throw a ball onto a roulette wheel and record whether it lands in the area with the stars or in the area with the lines.

We can easily mimic this behavior by sampling from a Bernoulli distribution (a binomial distribution with a single trial) with probabilities ¼ and ¾. To do this, we generate a random number from the Uniform distribution on the interval (0,1), then choose Stars if the number is less than ¼ and Lines otherwise. This approach, known as inverse transform sampling, can be used whenever we are able to compute the inverse of the cumulative distribution function.
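As a minimal sketch of the sampling rule described above, the following Python snippet draws a uniform number for each spin and compares it with ¼. The function names `spin_roulette` and `simulate_spins` are made up for illustration and are not part of the template.

```python
import random


def spin_roulette(rng):
    """Simulate one spin: 'Stars' with probability 1/4, otherwise 'Lines'."""
    u = rng.random()  # uniform draw in [0, 1)
    return "Stars" if u < 0.25 else "Lines"


def simulate_spins(n_spins, seed=0):
    """Repeat the experiment n_spins times and count each outcome."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = {"Stars": 0, "Lines": 0}
    for _ in range(n_spins):
        counts[spin_roulette(rng)] += 1
    return counts


n_spins = 10_000
counts = simulate_spins(n_spins)
share_stars = counts["Stars"] / n_spins
print(counts, share_stars)
```

With enough spins, the share of Stars should settle close to ¼, which is a quick sanity check that the simulation matches the intended probabilities.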

Feel like creating some fake data? Let’s walk through the Distribution Generator template, which lets you simulate observations from the most common univariate and multivariate distributions:

Step 1 Data Wrangling

First, we need to create a data table with the following column names and format:

  • SimulationID: An ID for each alternative scenario. Each scenario keeps the same variables but uses different parameters, so we can study different behaviors.
  • Column: The variable names. These need to be the same across SimulationIDs.
  • Type: Whether the distribution is Univariate or Multivariate.
  • Mode: The type of distribution we want to generate.

The rest of the columns hold the parameters for the different types of variables. Look at the example data table to see the required format. The values under Categories, MultivNormPar and Alpha must follow that formatting structure.
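To make the table layout more concrete, here is a rough Python sketch of a few input rows and a check for the rules listed above. Only the four column names come from the description. The variable names, the Mode values and the exact spelling of the Type values are assumptions. The parameter columns are left out on purpose, since their format is defined by the example data table.

```python
# Hypothetical input rows; variable names and Mode values are illustrative only.
rows = [
    {"SimulationID": 1, "Column": "Height", "Type": "Univariate", "Mode": "Normal"},
    {"SimulationID": 1, "Column": "Weight", "Type": "Univariate", "Mode": "Uniform"},
    {"SimulationID": 2, "Column": "Height", "Type": "Univariate", "Mode": "Normal"},
    {"SimulationID": 2, "Column": "Weight", "Type": "Univariate", "Mode": "Uniform"},
]

REQUIRED_COLUMNS = {"SimulationID", "Column", "Type", "Mode"}
ALLOWED_TYPES = ("Univariate", "Multivariate")  # assumed spelling


def check_input_table(rows):
    """Return a list of problems found in the input rows (empty if none)."""
    problems = []
    names_by_simulation = {}
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        if row["Type"] not in ALLOWED_TYPES:
            problems.append(f"row {i}: unknown Type {row['Type']!r}")
        names_by_simulation.setdefault(row["SimulationID"], set()).add(row["Column"])
    # Every SimulationID must define the same set of variable names.
    name_sets = list(names_by_simulation.values())
    if any(names != name_sets[0] for names in name_sets[1:]):
        problems.append("variable names differ between SimulationIDs")
    return problems


print(check_input_table(rows))
```

Running a check like this before loading the table is optional, but it catches the most likely mistake early: a variable that exists under one SimulationID and not under another.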

Step 2 Model Inputs

Once you have created the input data table, replace the template’s Model Inputs with your own. Then select the number of samples to generate and run the simulation by clicking the Simulation Generator button.

Once it runs, you’ll get two output data tables: one for univariate distributions and one for multivariate distributions. Check your outputs in the Results tab.
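For readers curious about what such a run could look like in code, here is a small Python sketch that takes a specification, draws the requested number of samples and returns one table for univariate results and one for multivariate results. This is not the template's implementation. The `params` field, the parameter keys and the two supported modes (Normal and Dirichlet) are assumptions chosen to keep the example short.

```python
import random

# Hypothetical specification; names, modes and parameter keys are assumptions.
spec = [
    {"SimulationID": 1, "Column": "Height", "Type": "Univariate",
     "Mode": "Normal", "params": {"mean": 170.0, "sd": 10.0}},
    {"SimulationID": 1, "Column": "Shares", "Type": "Multivariate",
     "Mode": "Dirichlet", "params": {"alpha": [2.0, 3.0, 5.0]}},
]


def run_simulation(spec, n_samples, seed=0):
    """Draw n_samples per specification row; return (univariate, multivariate)."""
    rng = random.Random(seed)
    univariate, multivariate = [], []
    for row in spec:
        key = {"SimulationID": row["SimulationID"], "Column": row["Column"]}
        params = row["params"]
        for _ in range(n_samples):
            if row["Mode"] == "Normal":
                value = rng.gauss(params["mean"], params["sd"])
                univariate.append({**key, "Value": value})
            elif row["Mode"] == "Dirichlet":
                # Independent Gamma(alpha_i, 1) draws, normalized to sum to 1,
                # follow a Dirichlet(alpha) distribution.
                gammas = [rng.gammavariate(a, 1.0) for a in params["alpha"]]
                total = sum(gammas)
                multivariate.append({**key, "Value": [g / total for g in gammas]})
            else:
                raise ValueError(f"Unsupported Mode: {row['Mode']}")
    return univariate, multivariate


uni_table, multi_table = run_simulation(spec, n_samples=5)
print(len(uni_table), len(multi_table))
```

Keeping the two outputs separate mirrors the idea of two result tables: univariate rows hold a single number, while multivariate rows hold a vector.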

Finally, you can take a closer look at your results by visualizing the histograms. You can select the Simulation ID and the univariate variable to visualize.
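If you want to reproduce a histogram like this outside the template, the sketch below bins a list of simulated values and prints a plain-text bar for each bin. The sample data is generated here only for illustration, and the helper name `histogram` is hypothetical.

```python
import random


def histogram(values, n_bins=10):
    """Split values into n_bins equal-width bins; return (edges, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it to the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Illustrative data: 1,000 draws from a Normal distribution (assumed parameters).
rng = random.Random(0)
values = [rng.gauss(0.0, 1.0) for _ in range(1000)]

edges, counts = histogram(values, n_bins=10)
for left, count in zip(edges, counts):
    print(f"{left:7.2f} | {'#' * (count // 10)}")
```

For a single-peaked distribution such as the Normal, the bars should be longest near the center and shrink toward both ends.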

Step 3 Visualize the Results

While we’ve done all the analysis, we still haven’t fully visualized it! Now is the time to explore the results and visualize our findings. Some examples are given in the last tab, but feel free to create your own visualizations.

I hope this small introduction to simulation awakens your curiosity and leads you to study more complex problems. Here is the link to the template, have fun using it ……..
