Stata Textbook Examples
Introduction to the Practice of Statistics by Moore and McCabe
Chapter 11: Multiple Regression

NOTE: This page has been delinked.  It is no longer being maintained, and information on this page may be out of date.

The first examples use the file CSDATA.
use http://www.ats.ucla.edu/stat/stata/examples/mm/webdata/csdata, clear
Figure 11.1, page 719 can be obtained using the summarize command.
summarize gpa satm satv hsm hss hse

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     gpa |     224    2.635223   .7793949        .12          4  
    satm |     224    595.2857   86.40144        300        800  
    satv |     224    504.5491   92.61046        285        760  
     hsm |     224    8.321429   1.638737          2         10  
     hss |     224    8.089286   1.699663          3         10  
     hse |     224     8.09375   1.507874          3         10  
Figure 11.2, page 720 can be produced with the tab1 command below, which gives a separate one-way frequency table for each variable listed. Figure 11.2 may be mislabeled in the book, which repeats HSM as the label for each table.
tab1 hsm hss hse

-> tabulation of hsm  

        hsm |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |          1        0.45        0.45
          3 |          1        0.45        0.89
          4 |          4        1.79        2.68
          5 |          6        2.68        5.36
          6 |         23       10.27       15.62
          7 |         28       12.50       28.12
          8 |         36       16.07       44.20
          9 |         59       26.34       70.54
         10 |         66       29.46      100.00
------------+-----------------------------------
      Total |        224      100.00

-> tabulation of hss  

        hss |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |          1        0.45        0.45
          4 |          7        3.12        3.57
          5 |          9        4.02        7.59
          6 |         24       10.71       18.30
          7 |         42       18.75       37.05
          8 |         31       13.84       50.89
          9 |         50       22.32       73.21
         10 |         60       26.79      100.00
------------+-----------------------------------
      Total |        224      100.00

-> tabulation of hse  

        hse |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |          1        0.45        0.45
          4 |          4        1.79        2.23
          5 |          5        2.23        4.46
          6 |         23       10.27       14.73
          7 |         43       19.20       33.93
          8 |         49       21.88       55.80
          9 |         52       23.21       79.02
         10 |         47       20.98      100.00
------------+-----------------------------------
      Total |        224      100.00
Figure 11.3, page 721 shows the correlations among the variables, and Stata can do this with the pwcorr command. We use pwcorr rather than correlate because its sig option prints the significance level (p-value) beneath each correlation.
pwcorr gpa satm satv hsm hss hse, sig

          |      gpa     satm     satv      hsm      hss      hse
----------+------------------------------------------------------
      gpa |   1.0000 
          |
          |
     satm |   0.2517   1.0000 
          |   0.0001
          |
     satv |   0.1145   0.4639   1.0000 
          |   0.0873   0.0000
          |
      hsm |   0.4365   0.4535   0.2211   1.0000 
          |   0.0000   0.0000   0.0009
          |
      hss |   0.3294   0.2405   0.2617   0.5757   1.0000 
          |   0.0000   0.0003   0.0001   0.0000
          |
      hse |   0.2890   0.1083   0.2437   0.4469   0.5794   1.0000 
          |   0.0000   0.1060   0.0002   0.0000   0.0000
          |
Figure 11.4, page 722 shows a regression predicting gpa from hsm, hss, and hse. We can get these results in Stata using the regress command. Note that the dependent variable (gpa) comes first, followed by the predictors (hsm, hss, and hse). In Stata, the intercept is labeled _cons (for constant) and appears at the end of the list of variables, in contrast to SAS, where the constant appears first. For an example that explains the output of the regress command, see Annotated Output.
regress gpa hsm hss hse

  Source |       SS       df       MS                  Number of obs =     224
---------+------------------------------               F(  3,   220) =   18.86
   Model |  27.7123302     3  9.23744341               Prob > F      =  0.0000
Residual |  107.750459   220  .489774812               R-squared     =  0.2046
---------+------------------------------               Adj R-squared =  0.1937
   Total |  135.462789   223  .607456452               Root MSE      =  .69984

------------------------------------------------------------------------------
     gpa |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     hsm |   .1685666   .0354921      4.749   0.000       .0986185    .2385147
     hss |   .0343156   .0375589      0.914   0.362      -.0397057    .1083368
     hse |   .0451018   .0386959      1.166   0.245      -.0311602    .1213638
   _cons |   .5898766   .2942432      2.005   0.046       .0099804    1.169773
------------------------------------------------------------------------------
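As a quick sanity check on the header of this output, the summary statistics can be recomputed from the sums of squares with the display command. This is only a sketch: the numbers are copied by hand from the table above, and the formulas are the standard ones for least-squares regression rather than anything specific to this page.

```stata
* R-squared = Model SS / Total SS
display 27.7123302 / 135.462789
* F(3, 220) = Model MS / Residual MS
display 9.23744341 / .489774812
* Root MSE = square root of the Residual MS
display sqrt(.489774812)
* Adjusted R-squared = 1 - Residual MS / Total MS
display 1 - .489774812 / .607456452
```

Each result should agree with the corresponding entry in the output above, up to rounding.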
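We can use the predict and qnorm commands to get a quantile normal plot like figure 11.5, page 725. The predict command with the resid option stores the residuals from the most recent regress command in a new variable, here named gpares, and qnorm plots the quantiles of that variable against the quantiles of a normal distribution.

Note: The y-scale does not exactly match the book; however, the shape is the same.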

predict gpares, resid
qnorm gpares
This regress command, which drops hss from the model, gets the output shown in figure 11.6, page 725.
regress gpa hsm hse

  Source |       SS       df       MS                  Number of obs =     224
---------+------------------------------               F(  2,   221) =   27.89
   Model |  27.3034901     2  13.6517451               Prob > F      =  0.0000
Residual |  108.159299   221  .489408591               R-squared     =  0.2016
---------+------------------------------               Adj R-squared =  0.1943
   Total |  135.462789   223  .607456452               Root MSE      =  .69958

------------------------------------------------------------------------------
     gpa |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     hsm |   .1826544   .0319558      5.716   0.000       .1196773    .2456315
     hse |   .0606701   .0347291      1.747   0.082      -.0077725    .1291128
   _cons |   .6242285    .291722      2.140   0.033       .0493155    1.199142
------------------------------------------------------------------------------
This regress command gets the output shown in figure 11.7, page 727.
regress gpa satm satv

  Source |       SS       df       MS                  Number of obs =     224
---------+------------------------------               F(  2,   221) =    7.48
   Model |   8.5838391     2  4.29191955               Prob > F      =  0.0007
Residual |   126.87895   221  .574112895               R-squared     =  0.0634
---------+------------------------------               Adj R-squared =  0.0549
   Total |  135.462789   223  .607456452               Root MSE      =   .7577

------------------------------------------------------------------------------
     gpa |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    satm |   .0022828   .0006629      3.444   0.001       .0009764    .0035893
    satv |  -.0000246   .0006185     -0.040   0.968      -.0012434    .0011943
   _cons |   1.288677   .3760368      3.427   0.001       .5476004    2.029754
------------------------------------------------------------------------------
This regress command shows the analysis from figure 11.8, page 728.
regress gpa satm satv hsm hss hse

  Source |       SS       df       MS                  Number of obs =     224
---------+------------------------------               F(  5,   218) =   11.69
   Model |  28.6436439     5  5.72872878               Prob > F      =  0.0000
Residual |  106.819145   218  .489996078               R-squared     =  0.2115
---------+------------------------------               Adj R-squared =  0.1934
   Total |  135.462789   223  .607456452               Root MSE      =     .70

------------------------------------------------------------------------------
     gpa |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    satm |   .0009436   .0006857      1.376   0.170      -.0004078     .002295
    satv |  -.0004078   .0005919     -0.689   0.492      -.0015744    .0007587
     hsm |   .1459611    .039261      3.718   0.000       .0685814    .2233407
     hss |   .0359053   .0377984      0.950   0.343      -.0385918    .1104024
     hse |   .0552926   .0395687      1.397   0.164      -.0226936    .1332787
   _cons |   .3267187   .3999964      0.817   0.415      -.4616364    1.115074
------------------------------------------------------------------------------
This test command, run after the regression above, tests the joint contribution of satm and satv (the null hypothesis that both coefficients are zero), as in the lower part of figure 11.8.
test satm satv

 ( 1)  satm = 0.0
 ( 2)  satv = 0.0

       F(  2,   218) =    0.95
            Prob > F =    0.3882
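The F statistic reported by test can also be reproduced by hand by comparing residual sums of squares. The model without satm and satv is the one from figure 11.4 (Residual SS 107.750459), and the full model has Residual SS 106.819145 on 218 degrees of freedom. The following is a minimal sketch with the numbers copied from the outputs above; Ftail() is Stata's upper-tail F probability function.

```stata
* F = ((RSS_reduced - RSS_full) / q) / (RSS_full / df_full), with q = 2 dropped predictors
display ((107.750459 - 106.819145) / 2) / (106.819145 / 218)
* upper-tail probability of F(2, 218) at that value
display Ftail(2, 218, ((107.750459 - 106.819145) / 2) / (106.819145 / 218))
```

Both values should match the test output above, up to rounding.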
This test command tests the joint contribution of hsm, hss, and hse (the null hypothesis that all three coefficients are zero), as at the bottom of figure 11.8.
test hsm hss hse

 ( 1)  hsm = 0.0
 ( 2)  hss = 0.0
 ( 3)  hse = 0.0

       F(  3,   218) =   13.65
            Prob > F =    0.0000
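This F statistic can be checked the same way, assuming the reduced model is the one with only satm and satv from figure 11.7 (Residual SS 126.87895). Again this is only a sketch using numbers copied by hand from the outputs above.

```stata
* q = 3 dropped predictors; full model has Residual SS 106.819145 on 218 df
display ((126.87895 - 106.819145) / 3) / (106.819145 / 218)
* upper-tail probability of F(3, 218) at that value
display Ftail(3, 218, ((126.87895 - 106.819145) / 3) / (106.819145 / 218))
```

The first value should round to the F of 13.65 shown above, and the second is small enough to print as 0.0000 at four decimal places.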
We have skipped example 11.2 for now.
