What is Teaching with Data Simulations?

Initial Publication Date: May 4, 2007

Teaching with data simulations means giving students opportunities to simulate data in order to answer a particular research question or solve a statistical problem. There are several ways to use simulations. These include:

  • Physical simulations of a process (e.g., taking repeated samples of Reese's Pieces candies to simulate a sampling distribution for the proportion of orange candies, or tossing coins to model births in order to estimate the effect of a "One-Son" policy on average family size).
  • Simulating a game or situation to estimate the chances of certain outcomes (e.g., playing Let's Make a Deal to estimate the chances of winning using two different strategies, to determine which is the better strategy).
  • Using a probability model to simulate data to estimate the chance of a particular outcome (e.g., the chance of getting "three of a kind" when dealing five cards, or the chance of getting five heads on six tosses of a fair coin).
  • Simulating data while varying parameters to illustrate a concept or deepen students' understanding of a process (e.g., simulating confidence intervals from different populations while varying sample size, level of confidence, or standard deviation).
  • Using simulation to generate data under a certain theory to test whether a particular outcome is surprising (e.g., a student correctly identifies 8 out of 10 samples of cola in a blind taste test; simulation can show whether this result could be due to guessing alone, by generating data based on what would be expected if the student were guessing and comparing the observed result to the simulated sampling distribution).
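
As one concrete instance of using a probability model to estimate a chance, the coin-tossing example in the list above can be sketched in a few lines of Python. This is only an illustrative sketch: it reads "five heads on six tosses" as exactly five heads, and the function name `estimate_five_heads`, the number of trials, and the seed are assumptions made here, not part of the original material.

```python
import random

def estimate_five_heads(num_trials=100_000, seed=0):
    """Estimate P(exactly 5 heads in 6 tosses of a fair coin) by simulation."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate
    hits = 0
    for _ in range(num_trials):
        # Toss a fair coin six times and count the heads.
        heads = sum(rng.random() < 0.5 for _ in range(6))
        if heads == 5:
            hits += 1
    return hits / num_trials

estimate = estimate_five_heads()
exact = 6 / 64  # C(6, 5) / 2**6, the value the simulation should approach
print(f"simulated: {estimate:.4f}   exact: {exact:.4f}")
```

Students can compare the simulated estimate with the exact value of 6/64 and see how the two get closer as the number of trials grows.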

Illustrating Difficult and Abstract Concepts

While simulating data serves many functions in the practice of statistics, it is especially helpful in the classroom for concepts that students have traditionally found difficult to learn. Probability is one such area. Simulations allow students to visualize probability distributions, which in turn makes the processes associated with probability more concrete. Hypothesis testing and inference are likewise important and difficult areas of statistics. As noted above, simulations can give students a deeper and more concrete understanding of both, either by providing a visual distribution of simulated data against which to compare a sample result, or by illustrating how many simulated confidence intervals do or do not capture the true population mean. Sampling distributions, and the processes associated with sampling from a population, can also be difficult for students; simulations can improve their understanding of sampling and of the Central Limit Theorem.
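
The confidence-interval illustration mentioned above might look something like the following sketch. It assumes a normal population with a known standard deviation, so that a simple z-interval applies; the population values (mean 50, standard deviation 10), the 95% level, and the function name `ci_coverage` are all hypothetical choices for illustration.

```python
import random
import statistics

def ci_coverage(sample_size, num_intervals=2000, mu=50.0, sigma=10.0,
                z=1.96, seed=0):
    """Return (share of intervals capturing mu, interval width).

    Each interval is sample mean +/- z * sigma / sqrt(n), i.e. a z-interval
    that treats the population standard deviation as known (an assumption).
    """
    rng = random.Random(seed)
    margin = z * sigma / sample_size ** 0.5
    captured = 0
    for _ in range(num_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mean = statistics.fmean(sample)
        if mean - margin <= mu <= mean + margin:
            captured += 1
    return captured / num_intervals, 2 * margin

for n in (25, 100):
    coverage, width = ci_coverage(n)
    print(f"n={n:3d}  coverage={coverage:.3f}  width={width:.2f}")
```

Varying `sample_size` or `z` lets students see that the share of intervals capturing the true mean stays near the stated confidence level, while the interval width changes.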

Generating Data by Specifying a Probability Model

A fundamental question in statistics is: do the observed results differ significantly from what is expected by chance? Simulation is an important tool that students can use to understand what is expected under chance conditions. Students can gather information about chance conditions and then use statistical reasoning to compare the chance outcomes to the observed outcomes. By creating a model of what is expected under chance, students can gain important insight into the variability of a given sample statistic and use this model to understand how the statistic behaves (Burrill, 2002).
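
A chance model of this kind can be sketched for the cola taste test described earlier. The sketch assumes that a guessing student has a 0.5 chance of identifying each sample correctly (for instance, when choosing between two colas); that probability, the number of repetitions, and the function name `simulate_guessing` are assumptions made here for illustration.

```python
import random

def simulate_guessing(num_samples=10, num_repetitions=10_000,
                      p_correct=0.5, seed=0):
    """Simulate the number of correct identifications when purely guessing."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_correct for _ in range(num_samples))
        for _ in range(num_repetitions)
    ]

def share_at_least(counts, observed):
    """Share of simulated results at least as large as the observed one."""
    return sum(c >= observed for c in counts) / len(counts)

counts = simulate_guessing()
# How often does pure guessing produce 8 or more correct out of 10?
print(f"share of chance results >= 8: {share_at_least(counts, 8):.4f}")
```

Under this model the exact chance of 8 or more correct is 56/1024, roughly 0.055, so the simulated share should land close to that value.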


Learning about Inference and Hypothesis Testing

Inference and hypothesis testing are aspects of statistics that students frequently find quite difficult to grasp. Simulations give statistics educators an important way to help students understand them. The connection between statistics and probability is made explicit by simulating the conditions of the null hypothesis (Erickson, 2006). Simulating what is expected under the null hypothesis provides an opportunity for students to develop a deeper understanding of p-values, effect sizes, and Type I and Type II errors.
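
To make Type I and Type II errors concrete, one could simulate a simple decision rule for a ten-sample taste test. The sketch below is illustrative only: the rule "reject the null hypothesis of guessing when 9 or more samples are identified correctly", the assumed true ability of 0.8 under the alternative, and the function name `rejection_rate` are all hypothetical choices, not taken from the original.

```python
import random

def rejection_rate(true_p, cutoff=9, num_samples=10,
                   num_repetitions=20_000, seed=0):
    """Share of simulated taste tests in which the null hypothesis is rejected.

    true_p is the chance of identifying any single sample correctly.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(num_repetitions):
        correct = sum(rng.random() < true_p for _ in range(num_samples))
        if correct >= cutoff:
            rejections += 1
    return rejections / num_repetitions

# Null hypothesis true (pure guessing): any rejection is a Type I error.
type_i_rate = rejection_rate(true_p=0.5)
# Null hypothesis false (assumed true ability of 0.8): failing to reject is a Type II error.
type_ii_rate = 1 - rejection_rate(true_p=0.8, seed=1)
print(f"Type I error rate:  {type_i_rate:.4f}")
print(f"Type II error rate: {type_ii_rate:.4f}")
```

Changing `cutoff` shows students the trade-off between the two error rates: a stricter rule lowers the Type I rate but raises the Type II rate.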


Sampling from a Population

Simulation can give students valuable insight into the processes associated with sampling from a population. It can help students understand what a sample is, why we take samples, how to sample, what constitutes a good sample, and how sample size affects parameter estimation.
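
The effect of sample size on estimation can be sketched with the orange-candy example from the list of uses above. The true proportion of orange candies used here (0.45) is a made-up value for illustration, as are the sample sizes, the number of samples, and the function name `sample_proportions`.

```python
import random
import statistics

def sample_proportions(sample_size, true_proportion=0.45,
                       num_samples=5000, seed=0):
    """Simulate the proportion of orange candies in many random samples."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_proportion for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]

for n in (10, 25, 100):
    props = sample_proportions(n)
    center = statistics.fmean(props)   # center of the simulated sampling distribution
    spread = statistics.pstdev(props)  # its standard deviation
    print(f"n={n:3d}  mean of proportions={center:.3f}  spread={spread:.3f}")
```

The simulated sample proportions stay centered near the assumed population proportion, while their spread shrinks as the sample size grows, which is the pattern students are meant to discover.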