Stata Library
Survey Sampling Examples using Stata 9

Introduction

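Survey data generally have one or more of these three characteristics: sampling weights (also called probability weights; pweights in Stata), cluster sampling, and stratification.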

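Stata takes these characteristics into account through its survey (svy) commands. Before issuing any survey commands, it is necessary to use svyset to set one or more of the following items: the sampling weight (pweight), the primary sampling unit (PSU), the strata, and the finite population correction (fpc).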

Analyzing data from a survey sampling design without taking these characteristics into account can result in inaccurate point estimates and/or inaccurate estimates of standard errors.

In this unit we will be using data from the book Sampling of Populations by Levy and Lemeshow (1999) with permission of the authors.

Some Definitions

The Population

California requires that all students in public schools be tested each year. The State Department of Education then puts together the annual Academic Performance Index (API), which rates how a school is doing overall in terms of its test scores. The file apipop.dta contains API ratings and demographic information on 6,194 schools in 757 school districts. To be included in the file, a school must have at least 100 students.

Of course, in the normal course of events you wouldn't actually have access to data from the whole population. We were lucky in this instance that California collects and releases these data.

Let's try several computations on the population data.

use http://www.ats.ucla.edu/stat/stata/library/apipop, clear

tabulate stype

      stype |      Freq.     Percent        Cum.
------------+-----------------------------------
          E |       4421       71.38       71.38
          H |        755       12.19       83.56
          M |       1018       16.44      100.00
------------+-----------------------------------
      Total |       6194      100.00

summarize api00

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
   api00 |    6194    664.7126   128.2441        346        969

quietly summarize enroll

display %10.0fc r(sum)
3,811,472

regress api00 meals ell avg_ed

  Source |       SS       df       MS                  Number of obs =    6016
---------+------------------------------               F(  3,  6012) = 5837.12
   Model |  73775065.7     3  24591688.6               Prob > F      =  0.0000
Residual |  25328472.8  6012  4212.98616               R-squared     =  0.7444
---------+------------------------------               Adj R-squared =  0.7443
   Total |  99103538.5  6015  16476.0662               Root MSE      =  64.908

------------------------------------------------------------------------------
   api00 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   meals |  -1.672069   .0568866    -29.393   0.000      -1.783587   -1.560551
     ell |  -.6775632   .0616073    -10.998   0.000      -.7983355   -.5567908
  avg_ed |   72.30502    2.09055     34.587   0.000       68.20679    76.40325
   _cons |    558.443   7.969069     70.076   0.000       542.8207    574.0652
------------------------------------------------------------------------------

Simple Random Sampling Example

Let's take a simple random sample of 200 schools from the population file. This can be accomplished with the commands:
generate i = uniform()
sort i
keep in 1/200
In this example, the sampling frame contains the 6,194 schools, so fpc = 6194 and the sampling weights (pw) = 6194/200 = 30.97.
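
The weight and finite population correction described above still have to be attached to the sample and declared to Stata. The following is a minimal sketch of one way to do that, run on a stand-in frame of 6,194 rows rather than on the real school data. The names school_id and u and the seed value are assumptions made for illustration; pw and fpc follow the names that svyset reports later on this page.

```stata
* Stand-in frame: 6,194 rows, one per school (no real school data).
clear
set seed 12345
set obs 6194
generate school_id = _n

* Draw the simple random sample of 200 rows.
* uniform() is the older function name; later releases call it runiform().
generate double u = uniform()
sort u
keep in 1/200

* Sampling weight N/n and population size for the fpc.
generate pw  = 6194/200
generate fpc = 6194

* Each observation is its own sampling unit, so the PSU is _n.
svyset _n [pweight=pw], fpc(fpc)
```

Setting the seed makes the draw reproducible. Typing svyset with no arguments afterwards displays the current settings.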

Of course, in the real world you probably wouldn't take a sample of 200 schools from a computer file of 6,194; you would just analyze the entire dataset. But suppose you had to go out to each school to collect the data that you needed. Then it would take much less time and cost much less money to go to 200 schools than to over 6,000 schools.

The file apisrs.dta has a simple random sample of 200 cases.
use http://www.ats.ucla.edu/stat/stata/library/apisrs, clear

tabulate stype

      stype |      Freq.     Percent        Cum.
------------+-----------------------------------
          E |        145       72.50       72.50
          H |         25       12.50       85.00
          M |         30       15.00      100.00
------------+-----------------------------------
      Total |        200      100.00

tabulate dnum

   district |
     number |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          1        0.50        0.50
         40 |          1        0.50        1.00
         41 |          1        0.50        1.50
         43 |          1        0.50        2.00
         46 |          3        1.50        3.50
         48 |          1        0.50        4.00
         55 |          1        0.50        4.50
         56 |          2        1.00        5.50
         57 |          1        0.50        6.00
         60 |          1        0.50        6.50
         67 |          1        0.50        7.00
         80 |          1        0.50        7.50
         90 |          2        1.00        8.50
         98 |          1        0.50        9.00
        103 |          1        0.50        9.50
        105 |          1        0.50       10.00
        108 |          2        1.00       11.00
        124 |          1        0.50       11.50
        131 |          1        0.50       12.00
        135 |          2        1.00       13.00
        148 |          2        1.00       14.00
        154 |          1        0.50       14.50
        159 |          1        0.50       15.00
        162 |          1        0.50       15.50
        166 |          3        1.50       17.00
        175 |          1        0.50       17.50
        176 |          1        0.50       18.00
        184 |          1        0.50       18.50
        190 |          1        0.50       19.00
        209 |          1        0.50       19.50
        217 |          1        0.50       20.00
        222 |          1        0.50       20.50
        229 |          1        0.50       21.00
        231 |          1        0.50       21.50
        238 |          1        0.50       22.00
        248 |          2        1.00       23.00
        253 |          3        1.50       24.50
        255 |          1        0.50       25.00
        259 |          1        0.50       25.50
        266 |          1        0.50       26.00
        272 |          1        0.50       26.50
        274 |          1        0.50       27.00
        278 |          2        1.00       28.00
        293 |          1        0.50       28.50
        301 |          1        0.50       29.00
        304 |          1        0.50       29.50
        335 |          1        0.50       30.00
        351 |          1        0.50       30.50
        352 |          1        0.50       31.00
        353 |          1        0.50       31.50
        358 |          1        0.50       32.00
        360 |          1        0.50       32.50
        379 |          1        0.50       33.00
        390 |          1        0.50       33.50
        393 |          1        0.50       34.00
        395 |          2        1.00       35.00
        401 |         18        9.00       44.00
        416 |          1        0.50       44.50
        418 |          2        1.00       45.50
        436 |          1        0.50       46.00
        444 |          1        0.50       46.50
        445 |          1        0.50       47.00
        451 |          1        0.50       47.50
        457 |          2        1.00       48.50
        459 |          1        0.50       49.00
        460 |          1        0.50       49.50
        470 |          1        0.50       50.00
        473 |          1        0.50       50.50
        479 |          1        0.50       51.00
        491 |          1        0.50       51.50
        495 |          1        0.50       52.00
        498 |          1        0.50       52.50
        503 |          2        1.00       53.50
        507 |          5        2.50       56.00
        509 |          1        0.50       56.50
        513 |          2        1.00       57.50
        529 |          2        1.00       58.50
        532 |          1        0.50       59.00
        533 |          1        0.50       59.50
        536 |          1        0.50       60.00
        537 |          2        1.00       61.00
        539 |          3        1.50       62.50
        541 |          1        0.50       63.00
        542 |          1        0.50       63.50
        547 |          1        0.50       64.00
        556 |          2        1.00       65.00
        564 |          1        0.50       65.50
        570 |          1        0.50       66.00
        579 |          1        0.50       66.50
        590 |          1        0.50       67.00
        600 |          1        0.50       67.50
        602 |          1        0.50       68.00
        605 |          1        0.50       68.50
        614 |          2        1.00       69.50
        620 |          3        1.50       71.00
        623 |          1        0.50       71.50
        627 |          3        1.50       73.00
        629 |          1        0.50       73.50
        630 |          2        1.00       74.50
        632 |          5        2.50       77.00
        633 |          1        0.50       77.50
        635 |          1        0.50       78.00
        636 |          2        1.00       79.00
        637 |          1        0.50       79.50
        640 |          1        0.50       80.00
        642 |          1        0.50       80.50
        643 |          1        0.50       81.00
        644 |          1        0.50       81.50
        645 |          1        0.50       82.00
        648 |          1        0.50       82.50
        651 |          1        0.50       83.00
        653 |          1        0.50       83.50
        658 |          1        0.50       84.00
        665 |          1        0.50       84.50
        688 |          1        0.50       85.00
        689 |          1        0.50       85.50
        702 |          1        0.50       86.00
        711 |          1        0.50       86.50
        716 |          1        0.50       87.00
        720 |          1        0.50       87.50
        731 |          1        0.50       88.00
        739 |          1        0.50       88.50
        744 |          3        1.50       90.00
        745 |          1        0.50       90.50
        750 |          1        0.50       91.00
        751 |          1        0.50       91.50
        754 |          1        0.50       92.00
        756 |          1        0.50       92.50
        761 |          1        0.50       93.00
        779 |          2        1.00       94.00
        780 |          1        0.50       94.50
        782 |          1        0.50       95.00
        788 |          1        0.50       95.50
        796 |          4        2.00       97.50
        797 |          1        0.50       98.00
        803 |          1        0.50       98.50
        815 |          1        0.50       99.00
        830 |          1        0.50       99.50
        834 |          1        0.50      100.00
------------+-----------------------------------
      Total |        200      100.00

svyset

      pweight: pw
          VCE: linearized
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: fpc

svy: mean api00

(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =     200
Number of PSUs   =     200          Population size  =    6194
                                    Design df        =     199

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       api00 |    660.165   9.186887      642.0489    678.2811
--------------------------------------------------------------
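
The linearized standard error above reflects the finite population correction. For a simple random sample with equal weights it should agree with the textbook formula sqrt((1 - n/N) * s^2 / n). The sketch below checks this on made-up data, not on apisrs.dta; the variable y, the seed, and the scalars se_svy and se_hand are assumptions introduced for illustration.

```stata
* Made-up sample of 200 values standing in for a score such as api00.
clear
set seed 2024
set obs 200
generate y   = 650 + 130*invnormal(uniform())
generate pw  = 6194/200
generate fpc = 6194
svyset _n [pweight=pw], fpc(fpc)

* Survey estimate of the mean; keep its standard error.
svy: mean y
matrix V = e(V)
scalar se_svy = sqrt(V[1,1])

* Hand formula: sqrt((1 - n/N) * s^2 / n) with n = 200 and N = 6194.
quietly summarize y
scalar se_hand = sqrt((1 - 200/6194) * r(Var) / 200)

display "svy standard error:  " se_svy
display "hand standard error: " se_hand
```

The two displayed values should match up to rounding. Dropping the fpc() option from svyset removes the (1 - n/N) factor and gives a slightly larger standard error.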

svy: total enroll

(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =     200
Number of PSUs   =     200          Population size  =    6194
                                    Design df        =     199

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      enroll |    3924828   220705.4       3489607     4360049
--------------------------------------------------------------

svy: regress api00 meals ell avg_ed

(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs      =       200
Number of PSUs     =       200                  Population size    = 6193.9999
                                                Design df          =       199
                                                F(   3,    197)    =    217.11
                                                Prob > F           =    0.0000
                                                R-squared          =    0.7640

------------------------------------------------------------------------------
             |             Linearized
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |  -1.367668   .3544273    -3.86   0.000    -2.066583   -.6687524
         ell |  -1.266818   .3895673    -3.25   0.001    -2.035028   -.4986079
      avg_ed |   75.49145   14.28649     5.28   0.000     47.31912    103.6638
       _cons |   544.7082   56.15402     9.70   0.000     433.9749    655.4414
------------------------------------------------------------------------------

Stratified Random Sampling Example

This time, instead of taking a simple random sample of the whole population, we will take separate simple random samples of elementary schools, high schools and middle schools. This is known as stratified random sampling. We will sample 100 elementary schools, 50 high schools and 50 middle schools.

In this example, there are three sampling frames: 4,421 elementary schools, 755 high schools, and 1,018 middle schools.
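
With three frames, each stratum gets its own weight and its own fpc: 4421/100 = 44.21 for elementary schools, 755/50 = 15.1 for high schools, and 1018/50 = 20.36 for middle schools. The sketch below builds these on a stand-in dataset of 200 rows that contains only the school type, so it is an illustration of the arithmetic rather than a reconstruction of apistrat.dta.

```stata
* Stand-in sample: 100 E, 50 H and 50 M rows (no real school data).
clear
set obs 200
generate str1 stype = cond(_n <= 100, "E", cond(_n <= 150, "H", "M"))

* fpc is the stratum population size; pw is that size over the stratum sample size.
generate fpc = cond(stype == "E", 4421, cond(stype == "H", 755, 1018))
bysort stype: generate pw = fpc / _N

* Declare strata, weights and fpc; observations are the sampling units.
svyset _n [pweight=pw], strata(stype) fpc(fpc)

* One weight per stratum.
tabulate stype, summarize(pw)
```

The weights sum to 4421 + 755 + 1018 = 6194, which is the population size that the survey commands report for this design.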

The file apistrat.dta contains the data for the stratified random sample.
use http://www.ats.ucla.edu/stat/stata/library/apistrat, clear

tabulate stype

      stype |      Freq.     Percent        Cum.
------------+-----------------------------------
          E |        100       50.00       50.00
          H |         50       25.00       75.00
          M |         50       25.00      100.00
------------+-----------------------------------
      Total |        200      100.00

tabulate dnum

   district |
     number |      Freq.     Percent        Cum.
------------+-----------------------------------
         19 |          1        0.50        0.50
         20 |          1        0.50        1.00
         25 |          1        0.50        1.50
         27 |          1        0.50        2.00
         40 |          1        0.50        2.50
         41 |          1        0.50        3.00
         64 |          1        0.50        3.50
         69 |          1        0.50        4.00
        105 |          1        0.50        4.50
        108 |          1        0.50        5.00
        114 |          1        0.50        5.50
        135 |          1        0.50        6.00
        140 |          1        0.50        6.50
        148 |          2        1.00        7.50
        153 |          5        2.50       10.00
        155 |          1        0.50       10.50
        158 |          2        1.00       11.50
        160 |          1        0.50       12.00
        162 |          1        0.50       12.50
        176 |          1        0.50       13.00
        182 |          1        0.50       13.50
        185 |          2        1.00       14.50
        196 |          1        0.50       15.00
        202 |          1        0.50       15.50
        208 |          1        0.50       16.00
        214 |          1        0.50       16.50
        215 |          2        1.00       17.50
        216 |          1        0.50       18.00
        223 |          1        0.50       18.50
        225 |          1        0.50       19.00
        226 |          1        0.50       19.50
        233 |          1        0.50       20.00
        238 |          2        1.00       21.00
        247 |          1        0.50       21.50
        253 |          4        2.00       23.50
        259 |          4        2.00       25.50
        266 |          2        1.00       26.50
        270 |          2        1.00       27.50
        273 |          1        0.50       28.00
        275 |          1        0.50       28.50
        279 |          1        0.50       29.00
        284 |          1        0.50       29.50
        294 |          1        0.50       30.00
        308 |          1        0.50       30.50
        316 |          1        0.50       31.00
        324 |          1        0.50       31.50
        333 |          1        0.50       32.00
        339 |          1        0.50       32.50
        348 |          1        0.50       33.00
        349 |          1        0.50       33.50
        351 |          1        0.50       34.00
        358 |          1        0.50       34.50
        364 |          1        0.50       35.00
        376 |          1        0.50       35.50
        382 |          2        1.00       36.50
        390 |          1        0.50       37.00
        394 |          1        0.50       37.50
        395 |          3        1.50       39.00
        401 |         16        8.00       47.00
        419 |          1        0.50       47.50
        423 |          1        0.50       48.00
        432 |          1        0.50       48.50
        439 |          1        0.50       49.00
        448 |          1        0.50       49.50
        450 |          1        0.50       50.00
        457 |          1        0.50       50.50
        459 |          1        0.50       51.00
        460 |          1        0.50       51.50
        465 |          1        0.50       52.00
        473 |          3        1.50       53.50
        475 |          1        0.50       54.00
        478 |          1        0.50       54.50
        484 |          1        0.50       55.00
        492 |          1        0.50       55.50
        495 |          1        0.50       56.00
        497 |          1        0.50       56.50
        498 |          1        0.50       57.00
        499 |          1        0.50       57.50
        501 |          1        0.50       58.00
        507 |          4        2.00       60.00
        509 |          1        0.50       60.50
        512 |          1        0.50       61.00
        513 |          2        1.00       62.00
        514 |          1        0.50       62.50
        515 |          1        0.50       63.00
        531 |          2        1.00       64.00
        532 |          1        0.50       64.50
        537 |          1        0.50       65.00
        541 |          3        1.50       66.50
        550 |          1        0.50       67.00
        554 |          1        0.50       67.50
        569 |          1        0.50       68.00
        575 |          2        1.00       69.00
        590 |          2        1.00       70.00
        596 |          1        0.50       70.50
        602 |          2        1.00       71.50
        605 |          1        0.50       72.00
        620 |          2        1.00       73.00
        621 |          3        1.50       74.50
        627 |          1        0.50       75.00
        630 |          2        1.00       76.00
        632 |          4        2.00       78.00
        635 |          2        1.00       79.00
        636 |          2        1.00       80.00
        639 |          2        1.00       81.00
        650 |          1        0.50       81.50
        653 |          2        1.00       82.50
        655 |          1        0.50       83.00
        656 |          1        0.50       83.50
        662 |          1        0.50       84.00
        685 |          1        0.50       84.50
        689 |          5        2.50       87.00
        702 |          1        0.50       87.50
        706 |          1        0.50       88.00
        722 |          1        0.50       88.50
        725 |          2        1.00       89.50
        735 |          1        0.50       90.00
        738 |          1        0.50       90.50
        751 |          1        0.50       91.00
        756 |          1        0.50       91.50
        760 |          1        0.50       92.00
        766 |          1        0.50       92.50
        767 |          2        1.00       93.50
        774 |          1        0.50       94.00
        780 |          2        1.00       95.00
        781 |          1        0.50       95.50
        784 |          1        0.50       96.00
        787 |          1        0.50       96.50
        796 |          1        0.50       97.00
        797 |          1        0.50       97.50
        802 |          1        0.50       98.00
        806 |          1        0.50       98.50
        813 |          1        0.50       99.00
        819 |          1        0.50       99.50
        825 |          1        0.50      100.00
------------+-----------------------------------
      Total |        200      100.00

svyset

      pweight: pw
          VCE: linearized
     Strata 1: stype
         SU 1: <observations>
        FPC 1: fpc

svy: mean api00

(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       3          Number of obs    =     200
Number of PSUs   =     200          Population size  =    6194
                                    Design df        =     197

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       api00 |   662.2874   9.408941      643.7322    680.8425
--------------------------------------------------------------

svy: total enroll

(running total on estimation sample)

Survey: Total estimation

Number of strata =       3          Number of obs    =     200
Number of PSUs   =     200          Population size  =    6194
                                    Design df        =     197

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      enroll |    3687178   114641.7       3461095     3913260
--------------------------------------------------------------

svy: regress api00 meals ell avg_ed

(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         3                  Number of obs      =       200
Number of PSUs     =       200                  Population size    =      6194
                                                Design df          =       197
                                                F(   3,    195)    =    190.97
                                                Prob > F           =    0.0000
                                                R-squared          =    0.7125

------------------------------------------------------------------------------
             |             Linearized
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |  -1.818234   .4076227    -4.46   0.000    -2.622098    -1.01437
         ell |  -.0191524   .3890413    -0.05   0.961    -.7863727    .7480679
      avg_ed |   77.47879   16.93665     4.57   0.000     44.07838    110.8792
       _cons |   534.4453   65.57342     8.15   0.000     405.1294    663.7613
------------------------------------------------------------------------------

One-Stage Cluster Sampling

Another approach to sampling from the population is cluster sampling. In this example we will use school districts as the cluster or primary sampling units. We will take a random sample of 15 school districts and look at all of the schools in each one.

In this example, the sampling frame contains the 757 school districts.
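
Drawing a one-stage cluster sample means randomizing over districts rather than schools and then keeping every school in the chosen districts. The following is a rough sketch on a stand-in frame in which schools are assigned to districts in a purely artificial way. The variables u and rank, the seed, and the assignment rule are assumptions; the weight 757/15 assumes each district is equally likely to be selected.

```stata
* Stand-in frame: 6,194 rows spread over 757 hypothetical districts.
clear
set seed 4321
set obs 6194
generate dnum = mod(_n - 1, 757) + 1

* One random draw per district, shared by all of its schools.
sort dnum
by dnum: generate double u = uniform() if _n == 1
by dnum: replace u = u[1]

* Rank districts by their draw and keep all schools in the first 15.
egen rank = group(u)
keep if rank <= 15

* Every school in a sampled district carries the district's weight.
generate pw  = 757/15
generate fpc = 757
svyset dnum [pweight=pw], fpc(fpc)
```

Here the district number is the primary sampling unit, and fpc is the number of districts in the frame rather than the number of schools.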

The file apiclus1.dta contains the data for the one-stage cluster sampling design.
use http://www.ats.ucla.edu/stat/stata/library/apiclus1, clear

tabulate stype

      stype |      Freq.     Percent        Cum.
------------+-----------------------------------
          E |        144       78.69       78.69
          H |         14        7.65       86.34
          M |         25       13.66      100.00
------------+-----------------------------------
      Total |        183      100.00

tabulate dnum

   district |
     number |      Freq.     Percent        Cum.
------------+-----------------------------------
         61 |         13        7.10        7.10
        135 |         34       18.58       25.68
        178 |          4        2.19       27.87
        197 |         13        7.10       34.97
        255 |         16        8.74       43.72
        406 |          2        1.09       44.81
        413 |          1        0.55       45.36
        437 |          4        2.19       47.54
        448 |         12        6.56       54.10
        510 |         21       11.48       65.57
        568 |          9        4.92       70.49
        637 |         11        6.01       76.50
        716 |         37       20.22       96.72
        778 |          2        1.09       97.81
        815 |          4        2.19      100.00
------------+-----------------------------------
      Total |        183      100.00

svyset dnum [pw=pw], fpc(fpc)

      pweight: pw
          VCE: linearized
     Strata 1: <one>
         SU 1: dnum
        FPC 1: fpc

/* list fpc pw dnum -- to see the values for these items */

svy: mean api00

(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =     183
Number of PSUs   =      15          Population size  =  9235.4
                                    Design df        =      14

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       api00 |   644.1694   23.54224      593.6763    694.6625
--------------------------------------------------------------
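
The population size of 9235.4 reported above is the sum of the sampling weights, not the known count of 6,194 schools. Assuming every school carries the weight 757/15 (757 districts in the frame, 15 sampled), the figure can be reproduced by hand. The scalar name pw_cluster is introduced here only for illustration.

```stata
* Assumed weight: districts in the frame over districts sampled.
scalar pw_cluster = 757/15
display %9.4f pw_cluster

* 183 sampled schools times the common weight gives the reported size.
display %9.1f 183 * pw_cluster
```

The sampled districts average 183/15 = 12.2 schools each, compared with 6194/757, or about 8.2, in the population, which is why the weighted count overshoots the true number of schools.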

svy: total enroll

(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =     183
Number of PSUs   =      15          Population size  =  9235.4
                                    Design df        =      14

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      enroll |    5076846    1389984       2095626     8058066
--------------------------------------------------------------

svy: regress api00 meals ell avg_ed

(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs      =       157
Number of PSUs     =        15                  Population size    = 9235.4001
                                                Design df          =        14
                                                F(   3,     12)    =     54.36
                                                Prob > F           =    0.0000
                                                R-squared          =    0.6978

------------------------------------------------------------------------------
             |             Linearized
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |  -2.948702   .3266161    -9.03   0.000    -3.649224    -2.24818
         ell |  -.2227005   .3938377    -0.57   0.581    -1.067398    .6219974
      avg_ed |   16.42832   15.32151     1.07   0.302    -16.43304    49.28968
       _cons |   755.4386   55.61202    13.58   0.000     636.1626    874.7145
------------------------------------------------------------------------------
