Introduction to the Practice of Statistics

Chapter 5

The following example uses the heads program. If you don't have the heads program, you can download it from within Stata by typing findit heads (see How can I use the findit command to search for programs and get additional help? for more information about using findit).

The heads program can be used to produce results simulating those shown in Figure 5.1. The example in the book examines the number of bad switches, so getting a "head" with the heads program corresponds to getting a bad switch. The example draws 10 switches (coins) at a time and does this for 1000 trials. The data for this can be generated with the heads command below.

heads , save

Then click on quit, and the graph in Figure 5.1 can be produced with the histogram command below.

histogram heads, discrete xlabel(0(1)6)

The graph we got is shown below; it looks much like (but not exactly like) Figure 5.1. If you increase the number of trials beyond 1000, the graph will resemble Figure 5.1 more and more closely.
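If the heads program is not available, a similar simulation can be sketched with Stata's built-in rbinomial() function. This is not the heads program itself: the 0.1 probability of a bad switch, the seed, and the variable name are assumptions made for illustration only.

```stata
* Sketch only: built-in rbinomial() stands in for the heads program.
* The 0.1 probability of a bad switch is an assumption.
clear
* Arbitrary seed, used only so the run can be repeated
set seed 20240101
* 1000 trials, one observation per trial
set obs 1000
* Number of bad switches ("heads") among 10 drawn in each trial
generate heads = rbinomial(10, 0.1)
histogram heads, discrete xlabel(0(1)6)
```

Changing the number after set obs is one way to see how the shape of the histogram settles down as the number of trials grows.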

The heads program can also be used to produce results like those in Figure 5.4. The only difference from Figure 5.1 above is that 100 switches are drawn at a time (the equivalent of tossing 100 coins at a time). This is illustrated below.

heads , save

tab heads

# heads out |
     of 100 |
     tossed |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |          1        0.10        0.10
          3 |          5        0.50        0.60
          4 |         18        1.80        2.40
          5 |         42        4.20        6.60
          6 |         60        6.00       12.60
          7 |         88        8.80       21.40
          8 |        123       12.30       33.70
          9 |        131       13.10       46.80
         10 |        120       12.00       58.80
         11 |        119       11.90       70.70
         12 |         86        8.60       79.30
         13 |         74        7.40       86.70
         14 |         50        5.00       91.70
         15 |         39        3.90       95.60
         16 |         23        2.30       97.90
         17 |          5        0.50       98.40
         18 |         11        1.10       99.50
         19 |          1        0.10       99.60
         20 |          1        0.10       99.70
         21 |          3        0.30      100.00
------------+-----------------------------------
      Total |      1,000      100.00

Below we show a graph of this.

histogram heads, discrete xlabel(2(1)21)
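As a rough check on the simulated frequencies, the expected counts under a binomial model can be computed with the built-in binomialp() function. The sketch below assumes 100 switches per trial, a 0.1 probability of a bad switch, and 1000 trials; the range of k values shown is arbitrary.

```stata
* Sketch only: expected counts in 1000 trials if heads ~ Binomial(100, 0.1).
* The 0.1 probability is an assumption.
forvalues k = 8/11 {
    * binomialp(n, k, p) is the probability of exactly k successes in n trials
    display "k = `k'   expected count = " %6.1f 1000*binomialp(100, `k', 0.1)
}
```

The simulated frequencies will differ from these expected counts from run to run, since each run draws a new random sample.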

Figure 5.5 illustrates the area under the curve using the normal approximation to the binomial. The Rice Virtual Lab in Statistics has an excellent demonstration of this (http://www.ruf.rice.edu/~lane/stat_sim/normal_approx/index.html). If you choose an N of 100 and a P of .1, and show the probability from 0 to 9, you get the results shown at the bottom of page 386 corresponding to Figure 5.10. The exact probability is .45, versus .43 from the normal approximation. You can vary N and see that as N decreases, the discrepancy between these two results increases, and as N increases, the discrepancy decreases. In other words, the accuracy of the normal approximation improves as N gets larger.
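The same comparison can be sketched directly in Stata with the built-in binomial() and normal() functions. The continuity correction (using 9.5 rather than 9) is an assumption about how the approximate value is obtained; the demonstration's own method is not documented here.

```stata
* Sketch only: exact vs. approximate P(X <= 9) for X ~ Binomial(100, 0.1).
* Exact cumulative binomial probability
display binomial(100, 9, 0.1)
* Normal approximation: mean n*p, sd sqrt(n*p*(1-p)),
* with an assumed continuity correction of 0.5
display normal((9.5 - 100*0.1) / sqrt(100*0.1*0.9))
```

Replacing 100 with other values of N (and adjusting the cutoff accordingly) is one way to watch the gap between the two numbers change.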

The following example uses the clt program. If you don't have the clt program, you can download it from within Stata by typing findit clt (see How can I use the findit command to search for programs and get additional help? for more information about using findit).

Figure 5.5 illustrates how the distribution of sample means becomes more and more normal as the sample size increases. The first figure (5.5a) appears like an exponential distribution with a sample size of 1, and the following figures have sample sizes of 2, 10 and 25. We can use the clt program (central limit theorem) to illustrate this. The examples below draw 1000 sample means from an exponential distribution with sample sizes of 1, 2, 10 and 25.

clt

Sample size of 1

Sample size of 2

Sample size of 10

Sample size of 25

In addition to the examples above, you can try any sample size you like using the N per sample pulldown. Likewise, you can try other distributions, including a log distribution, a normal bimodal distribution, or a uniform distribution.
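For readers who want to see what such a simulation involves, the sketch below draws 1000 sample means by hand using only built-in functions. It is not the clt program: the exponential draws are generated as -ln(runiform()), which gives a mean of 1, and the sample size of 10, the seed, and the variable names are all assumptions chosen for illustration.

```stata
* Sketch only: 1000 sample means from an exponential distribution, n = 10.
clear
* Arbitrary seed, used only so the run can be repeated
set seed 20240101
* One observation per sample mean
set obs 1000
* Assumed sample size; try 1, 2, 10 or 25
local n = 10
generate double total = 0
forvalues i = 1/`n' {
    * -ln(U) with U uniform on (0,1) is an exponential draw with mean 1
    quietly replace total = total - ln(runiform())
}
generate double xbar = total/`n'
* Histogram of the sample means with a normal density overlaid
histogram xbar, normal
```

Run this from a do-file so that the local macro n stays defined throughout. Setting n to 1 shows the skewed parent distribution itself, and larger values show the histogram moving toward the normal overlay.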

Below we show one more example, in which we used a log normal distribution with a sample size of 100, drew 5000 sample means, and added a normal overlay so we can compare the results to a normal distribution.

You can also experiment with producing figures like Figure 5.9 using the Rice Virtual Lab in Statistics demonstration of the Central Limit Theorem at http://onlinestatbook.com/stat_sim/sampling_dist/index.html

Below we started with a parent population that was skewed, and chose to see the distribution of the mean with N=10 and N=20, and a normal overlay. You can see that as the N went from 10 to 20, the distribution became more normal in shape. This demonstration allows you to choose other parent populations, and even allows you to make a custom population by clicking the mouse on the parent distribution to alter the shape of the distribution.