### Stata Textbook Examples: An Introduction to Categorical Data Analysis by Alan Agresti, Chapter 2: Two-Way Contingency Tables

Table 2.1, page 17.
use  http://www.ats.ucla.edu/stat/stata/examples/icda/afterlife, clear

list

     +--------------------------+
     |  gender   aftlife   freq |
     |--------------------------|
  1. | females       yes    435 |
  2. | females        no    147 |
  3. |    male       yes    375 |
  4. |    male        no    134 |
     +--------------------------+
Note that gender and aftlife are both numeric variables with value labels attached, so list shows the labels rather than the stored codes. To see the underlying numeric values, add the nolab option:
list, nolab

     +-------------------------+
     | gender   aftlife   freq |
     |-------------------------|
  1. |      1         1    435 |
  2. |      1         0    147 |
  3. |      0         1    375 |
  4. |      0         0    134 |
     +-------------------------+
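If the data file cannot be reached, the same four-row dataset can be typed in directly from the listings above. This is only a sketch and not part of the original example; the value-label names genderlbl and yesno are made up for illustration.

```stata
* Recreate the afterlife data by hand (counts copied from the listings above)
clear
input gender aftlife freq
1 1 435
1 0 147
0 1 375
0 0 134
end

* The label names below are hypothetical; any valid names would do
label define genderlbl 0 "male" 1 "females"
label define yesno 0 "no" 1 "yes"
label values gender genderlbl
label values aftlife yesno

* First listing shows the labels, second shows the stored numeric codes
list
list, nolabel
```

The resulting dataset has the same variable names and codes as the downloaded file, so the commands that follow should run unchanged.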
Table 2.2, page 18.
tab gender aftlife [fweight=freq]

           |  belief in afterlife
    gender |        no        yes |     Total
-----------+----------------------+----------
      male |       134        375 |       509
   females |       147        435 |       582
-----------+----------------------+----------
     Total |       281        810 |     1,091
Calculation in Section 2.1.2, page 18.
tab gender aftlife [fweight=freq], cell row

+-----------------+
| Key             |
|-----------------|
|    frequency    |
| row percentage  |
| cell percentage |
+-----------------+
           |  belief in afterlife
    gender |        no        yes |     Total
-----------+----------------------+----------
      male |       134        375 |       509
           |     26.33      73.67 |    100.00
           |     12.28      34.37 |     46.65
-----------+----------------------+----------
   females |       147        435 |       582
           |     25.26      74.74 |    100.00
           |     13.47      39.87 |     53.35
-----------+----------------------+----------
     Total |       281        810 |     1,091
           |     25.76      74.24 |    100.00
           |     25.76      74.24 |    100.00
Calculation on difference of proportions in Section 2.2.1, page 20. The Stata command cs is one of the epitab commands (tables for epidemiologists); type help epitab for more information. cs is designed for cohort studies (cc is its case-control counterpart), and its "Risk difference" line is the difference of proportions we want here.
cs aftlife gender  [fweight=freq]

| gender                 |
|   Exposed   Unexposed  |     Total
-----------------+------------------------+----------
Cases |       435         375  |       810
Noncases |       147         134  |       281
-----------------+------------------------+----------
Total |       582         509  |      1091
|                        |
Risk |  .7474227    .7367387  |  .7424381
|                        |
|      Point estimate    |  [95% Conf. Interval]
|------------------------+----------------------
Risk difference |          .010684       | -.0413721    .0627401
Risk ratio |         1.014502       |  .9457309    1.088273
Attr. frac. ex. |         .0142944       | -.0573833    .0811133
Attr. frac. pop |         .0076766       |
+-----------------------------------------------
chi2(1) =     0.16  Pr>chi2 = 0.6872
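The risk difference and its interval can also be checked by hand from the cell counts. The sketch below assumes the usual large-sample (Wald) standard error for a difference of two independent proportions; the scalar names are illustrative. It finishes with csi, the immediate form of cs, whose arguments are exposed cases, unexposed cases, exposed noncases, and unexposed noncases.

```stata
* Proportions answering "yes" among females (435 of 582) and males (375 of 509)
scalar p_fem = 435/582
scalar p_mal = 375/509

* Wald standard error of the difference of two independent proportions
scalar se_diff = sqrt(p_fem*(1 - p_fem)/582 + p_mal*(1 - p_mal)/509)

display "difference = " (p_fem - p_mal)
display "95% CI lower = " (p_fem - p_mal - invnormal(0.975)*se_diff)
display "95% CI upper = " (p_fem - p_mal + invnormal(0.975)*se_diff)

* Same table through the immediate command, with no dataset needed
csi 435 375 147 134
```

The hand-computed values should agree closely with the "Risk difference" row of the cs output above.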
Section 2.2.2, Table 2.3 and calculations on pages 20 and 21, including relative risk.
use http://www.ats.ucla.edu/stat/stata/examples/icda/aspirin, clear

tab group mi [fweight=count]

|          mi
group |        no        yes |     Total
-----------+----------------------+----------
aspirin |    10,933        104 |    11,037
placebo |    10,845        189 |    11,034
-----------+----------------------+----------
Total |    21,778        293 |    22,071

cs mi group [fweight=count]

| group                  |
|   Exposed   Unexposed  |     Total
-----------------+------------------------+----------
Cases |       189         104  |       293
Noncases |     10845       10933  |     21778
-----------------+------------------------+----------
Total |     11034       11037  |     22071
|                        |
Risk |  .0171289    .0094229  |  .0132753
|                        |
|      Point estimate    |  [95% Conf. Interval]
|------------------------+----------------------
Risk difference |          .007706       |  .0046878    .0107243
Risk ratio |         1.817802       |  1.433031    2.305884
Attr. frac. ex. |          .449885       |  .3021783    .5663269
Attr. frac. pop |         .2901989       |
+-----------------------------------------------
chi2(1) =    25.01  Pr>chi2 = 0.0000
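As a rough check on the "Risk ratio" row, the relative risk is the ratio of the two sample proportions, and a large-sample interval is usually built on the log scale. The sketch below assumes that standard log-scale formula; the scalar names are illustrative, and cs may differ from it in the last digits.

```stata
* Proportion with MI in the placebo group (189 of 11034)
* divided by the proportion in the aspirin group (104 of 11037)
scalar rr_hat = (189/11034) / (104/11037)

* Large-sample standard error of log(relative risk)
scalar se_logrr = sqrt(1/189 - 1/11034 + 1/104 - 1/11037)

display "relative risk = " rr_hat
display "95% CI lower  = " exp(ln(rr_hat) - invnormal(0.975)*se_logrr)
display "95% CI upper  = " exp(ln(rr_hat) + invnormal(0.975)*se_logrr)

* Immediate form: exposed cases, unexposed cases, exposed noncases, unexposed noncases
csi 189 104 10845 10933
```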
Section 2.3.2 and Section 2.3.3, pages 23-25. Odds Ratio for Aspirin Study.
logit mi group [fweight=count], or

Logistic regression                               Number of obs   =      22071
LR chi2(1)      =      25.37
Prob > chi2     =     0.0000
Log likelihood = -1544.6617                       Pseudo R2       =     0.0081
------------------------------------------------------------------------------
mi | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
group |   1.832054   .2250524     4.93   0.000     1.440042     2.33078
------------------------------------------------------------------------------

logit mi group [fweight=count]

Iteration 0:   log likelihood = -1557.3477
Iteration 1:   log likelihood = -1544.9244
Iteration 2:   log likelihood = -1544.6619
Iteration 3:   log likelihood = -1544.6617
Logit estimates                                   Number of obs   =      22071
LR chi2(1)      =      25.37
Prob > chi2     =     0.0000
Log likelihood = -1544.6617                       Pseudo R2       =     0.0081
------------------------------------------------------------------------------
mi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
group |   .6054377   .1228416     4.93   0.000     .3646726    .8462028
_cons |   -4.65515   .0985233   -47.25   0.000    -4.848252   -4.462048
------------------------------------------------------------------------------
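The odds ratio and the standard error of its log reported by logit can be reproduced from the four cell counts of the aspirin table. A minimal sketch, with illustrative scalar names:

```stata
* Cross-product ratio: (placebo MI * aspirin no MI) / (aspirin MI * placebo no MI)
scalar or_hat = (189*10933) / (104*10845)

* Large-sample standard error of the log odds ratio
scalar se_logor = sqrt(1/189 + 1/10933 + 1/104 + 1/10845)

display "odds ratio     = " or_hat
display "log odds ratio = " ln(or_hat) "   SE = " se_logor
display "95% CI lower   = " exp(ln(or_hat) - invnormal(0.975)*se_logor)
display "95% CI upper   = " exp(ln(or_hat) + invnormal(0.975)*se_logor)
```

The log odds ratio and its standard error correspond to the group coefficient in the second logit table, and exponentiating the interval endpoints gives the interval in the first.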
Table 2.4 and calculations on page 26.
use http://www.ats.ucla.edu/stat/stata/examples/icda/table2_4, clear

tab smoke mi [fw=count], col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+
|          mi
smoke |   control         MI |     Total
-----------+----------------------+----------
no |       346         90 |       436
|     66.67      34.35 |     55.83
-----------+----------------------+----------
yes |       173        172 |       345
|     33.33      65.65 |     44.17
-----------+----------------------+----------
Total |       519        262 |       781
|    100.00     100.00 |    100.00

logit mi smoke [fw=count], or

Iteration 0:   log likelihood = -498.26482
Iteration 1:   log likelihood =  -461.5178
Iteration 2:   log likelihood =  -461.1358
Iteration 3:   log likelihood = -461.13566
Logit estimates                                   Number of obs   =        781
LR chi2(1)      =      74.26
Prob > chi2     =     0.0000
Log likelihood = -461.13566                       Pseudo R2       =     0.0745
------------------------------------------------------------------------------
mi | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoke |   3.822222   .6115027     8.38   0.000     2.793415    5.229936
------------------------------------------------------------------------------
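For a case-control table such as this one, the same odds ratio can be obtained without fitting a model. The sketch below uses the cross-product ratio and then cci, the immediate form of the epitab command cc, whose arguments are exposed cases, unexposed cases, exposed controls, and unexposed controls. Here "exposed" is taken to mean smokers, which is an assumption about how to read the table rather than something the original page spells out.

```stata
* Cross-product ratio: (MI smokers * control nonsmokers) / (control smokers * MI nonsmokers)
display "odds ratio = " (172*346) / (173*90)

* Immediate case-control table; the interval it reports may differ from logit's,
* since cci uses a different interval method by default
cci 172 90 173 346
```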
Table 2.5 and calculations on page 31.
use http://www.ats.ucla.edu/stat/stata/examples/icda/party, clear

tab male party [fw=count], expected

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
+--------------------+
|              party
male |         1          2          3 |     Total
-----------+---------------------------------+----------
0 |       279         73        225 |       577
|     261.4       70.7      244.9 |     577.0
-----------+---------------------------------+----------
1 |       165         47        191 |       403
|     182.6       49.3      171.1 |     403.0
-----------+---------------------------------+----------
Total |       444        120        416 |       980
|     444.0      120.0      416.0 |     980.0

tab male party [fw=count], chi2 lrchi2

|              party
male |         1          2          3 |     Total
-----------+---------------------------------+----------
0 |       279         73        225 |       577
1 |       165         47        191 |       403
-----------+---------------------------------+----------
Total |       444        120        416 |       980
Pearson chi2(2) =   7.0095   Pr = 0.030
likelihood-ratio chi2(2) =   7.0026   Pr = 0.030
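Each expected frequency is the row total times the column total divided by the grand total. The sketch below checks one cell by hand and then reproduces the whole table with tabi, the immediate form of tabulate, using the counts shown above.

```stata
* Expected count for male = 0, party = 1: row total 577, column total 444, n = 980
display "expected = " 577*444/980

* Same table entered directly; rows are separated by a backslash
tabi 279 73 225 \ 165 47 191, chi2 lrchi2 expected
```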
Table 2.6 and calculation on page 32. Stata's tabulate command does not produce adjusted residuals. Nicholas J. Cox has written a module for tabulation and chi-square tasks that includes tabchi; with its a option, tabchi reports the adjusted residual beneath each observed and expected frequency. You can locate and install it by typing findit tabchi (see the FAQ "How can I use the findit command to search for programs and get additional help?" for more information about using findit).
tabchi male party [fw=count], a

observed frequency
expected frequency
-------------------------------------
|           party
male |       1        2        3
----------+--------------------------
0 |     279       73      225
| 261.416   70.653  244.931
|   2.293    0.465   -2.618
|
1 |     165       47      191
| 182.584   49.347  171.069
|  -2.293   -0.465    2.618
-------------------------------------
Pearson chi2(2) =   7.0095   Pr = 0.030
likelihood-ratio chi2(2) =   7.0026   Pr = 0.030
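If tabchi is not installed, a single adjusted residual can be checked by hand. The sketch below assumes the standard definition: observed minus expected, divided by the square root of expected times (1 - row proportion) times (1 - column proportion). The scalar names are illustrative.

```stata
* Cell male = 0, party = 1: observed 279, row total 577, column total 444, n = 980
scalar mu11 = 577*444/980
scalar adjres11 = (279 - mu11) / sqrt(mu11 * (1 - 577/980) * (1 - 444/980))

display "expected = " mu11 "   adjusted residual = " adjres11
```

The result should match the first cell of the tabchi output above to the digits shown.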

gen p1=party-1
cc p1 male [fw=count] if party ~=2

Proportion
|   Exposed   Unexposed  |     Total     Exposed
-----------------+------------------------+----------------------
Cases |       191         225  |       416      0.4591
Controls |       165         279  |       444      0.3716
-----------------+------------------------+----------------------
Total |       356         504  |       860      0.4140
|                        |
|      Point estimate    |  [95% Conf. Interval]
|------------------------+----------------------
Odds ratio |         1.435394       |  1.082895    1.902763  (exact)
Attr. frac. ex. |         .3033271       |  .0765493    .4744484  (exact)
Attr. frac. pop |          .139268       |
+-----------------------------------------------
chi2(1) =     6.78  Pr>chi2 = 0.0092
Section 2.4.6, page 33. Partitioning Chi-squared.
tab male party [fw=count] if party~=3, lrchi2

|         party
male |         1          2 |     Total
-----------+----------------------+----------
0 |       279         73 |       352
1 |       165         47 |       212
-----------+----------------------+----------
Total |       444        120 |       564
likelihood-ratio chi2(1) =   0.1612   Pr = 0.688

gen np = 0
replace np = 1 if party ==3
preserve
collapse (sum) count, by(male np)
list

+-------------------+
| male   np   count |
|-------------------|
1. |    0    0     352 |
2. |    0    1     225 |
3. |    1    0     212 |
4. |    1    1     191 |
+-------------------+

tab male np [fw=count] ,lrchi2
|          np
male |         0          1 |     Total
-----------+----------------------+----------
0 |       352        225 |       577
1 |       212        191 |       403
-----------+----------------------+----------
Total |       564        416 |       980
likelihood-ratio chi2(1) =   6.8414   Pr = 0.009

restore
tab male party [fw=count] ,lrchi2
|              party
male |         1          2          3 |     Total
-----------+---------------------------------+----------
0 |       279         73        225 |       577
1 |       165         47        191 |       403
-----------+---------------------------------+----------
Total |       444        120        416 |       980
likelihood-ratio chi2(2) =   7.0026   Pr = 0.030
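The point of the partition is that the two component likelihood-ratio statistics add up to the statistic for the full table: 0.1612 + 6.8414 = 7.0026. A minimal sketch that confirms this with tabi, using the counts from the three tables above (scalar names are illustrative):

```stata
* Component 1: party 1 versus party 2
quietly tabi 279 73 \ 165 47, lrchi2
scalar g2_first = r(chi2_lr)

* Component 2: parties 1 and 2 combined versus party 3
quietly tabi 352 225 \ 212 191, lrchi2
scalar g2_second = r(chi2_lr)

* Full 2 x 3 table
quietly tabi 279 73 225 \ 165 47 191, lrchi2
scalar g2_full = r(chi2_lr)

display "components: " g2_first " + " g2_second " = " (g2_first + g2_second)
display "full table: " g2_full
```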
Section 2.5.2, Table 2.7 on page 35 and calculation on page 36.
use http://www.ats.ucla.edu/stat/stata/examples/icda/alcohol, clear

tab alcohol mal [fw=count], r

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+
|          mal
alcohol |        no        yes |     Total
-----------+----------------------+----------
0 |    17,066         48 |    17,114
|     99.72       0.28 |    100.00
-----------+----------------------+----------
<1 |    14,464         38 |    14,502
|     99.74       0.26 |    100.00
-----------+----------------------+----------
1-2 |       788          5 |       793
|     99.37       0.63 |    100.00
-----------+----------------------+----------
3-5 |       126          1 |       127
|     99.21       0.79 |    100.00
-----------+----------------------+----------
>=6 |        37          1 |        38
|     97.37       2.63 |    100.00
-----------+----------------------+----------
Total |    32,481         93 |    32,574
|     99.71       0.29 |    100.00

tabchi alcohol mal [fw=count], a

observed frequency
expected frequency
--------------------------------
|         mal
alcohol |        no        yes
----------+---------------------
0 |     17066         48
| 17065.139     48.861
|     0.179     -0.179
|
<1 |     14464         38
| 14460.596     41.404
|     0.711     -0.711
|
1-2 |       788          5
|   790.736      2.264
|    -1.843      1.843
|
3-5 |       126          1
|   126.637      0.363
|    -1.062      1.062
|
>=6 |        37          1
|    37.892      0.108
|    -2.712      2.712
--------------------------------
3 cells with expected frequency < 5
2 cells with expected frequency < 1
Pearson chi2(4) =  12.0821   Pr = 0.017
likelihood-ratio chi2(4) =   6.2020   Pr = 0.185
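The M-squared statistic can be computed manually as follows. Agresti defines M-squared as (n - 1)r^2, where r is the correlation between the row scores and the column scores; the commands below multiply by n instead, which with n = 32,574 changes the result only in the fourth decimal place.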
recode alcohol 0 = 0 1 = .5 2 = 1.5 3 = 4 4 = 7, gen(ascore)
corr ascore mal [fw=count]
(obs=32574)

|   ascore      mal
-------------+------------------
ascore |   1.0000
mal |   0.0142   1.0000

di r(N)*r(rho)^2
6.5701339
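For reference, here is a sketch of the same calculation using the (n - 1) multiplier from Agresti's definition, together with the P-value from a chi-squared distribution with 1 degree of freedom. It re-enters the counts from the table above with the scores 0, 0.5, 1.5, 4, 7 already attached, and wraps everything in preserve and restore so the dataset in memory is left untouched. The scalar name m2 is illustrative.

```stata
preserve
clear
* Score for each alcohol level, malformation indicator (0 = no, 1 = yes), cell count
input ascore mal count
0   0 17066
0   1    48
0.5 0 14464
0.5 1    38
1.5 0   788
1.5 1     5
4   0   126
4   1     1
7   0    37
7   1     1
end

quietly corr ascore mal [fw=count]
scalar m2 = (r(N) - 1) * r(rho)^2
display "M-squared = " m2 "   P-value = " chi2tail(1, m2)
restore
```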
Calculation on page 37 in Section 2.5.4.
corr alcohol mal [fw=count]
(obs=32574)

|  alcohol      mal
-------------+------------------
alcohol |   1.0000
mal |   0.0075   1.0000

di r(N)*r(rho)^2
1.8278158

expand count
(32564 observations created)

egen mrank = rank(alcohol)
corr mrank mal
(obs=32574)

|    mrank      mal
-------------+------------------
mrank |   1.0000
mal |   0.0033   1.0000

di r(N)*r(rho)^2
.35143832
Section 2.6.2, page 41. Fisher's Tea Taster.
use http://www.ats.ucla.edu/stat/stata/examples/icda/fisher_tea, clear

tab pour guess [fw=count] , exact all

|         guess
pour |      milk        tea |     Total
-----------+----------------------+----------
milk |         3          1 |         4
tea |         1          3 |         4
-----------+----------------------+----------
Total |         4          4 |         8
Pearson chi2(1) =   2.0000   Pr = 0.157
likelihood-ratio chi2(1) =   2.0930   Pr = 0.148
Cramer's V =   0.5000
gamma =   0.8000  ASE = 0.294
Kendall's tau-b =   0.5000  ASE = 0.306
Fisher's exact =                 0.486
1-sided Fisher's exact =                 0.243
Table 2.9 on page 41. The hyper() function used with egen below is not built in; it comes from a module called _GHYPER by Nick Cox, which you need to install first (see the FAQ "How can I use the findit command to search for programs and get additional help?" for more information about using findit).
clear
set obs 5
obs was 0, now 5

gen n = _n -1
egen prob = hyper(n 4 4 8)
list

+---------------+
| n        prob |
|---------------|
1. | 0   .01428571 |
2. | 1   .22857143 |
3. | 2   .51428571 |
4. | 3   .22857143 |
5. | 4   .01428571 |
+---------------+
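If _GHYPER is not available, the same probabilities can be built from binomial coefficients with the built-in comb() function: with both margins fixed at 4 and 8 cups in total, the probability of n correct "milk first" guesses is C(4,n) C(4,4-n) / C(8,4). This sketch is an alternative to the egen call above, not the method the page uses; it recreates the variables n and prob under the same names so the later commands still apply.

```stata
clear
set obs 5
gen n = _n - 1

* Hypergeometric probability of n successes with margins 4 and 4 out of 8
gen prob = comb(4, n) * comb(4, 4 - n) / comb(8, 4)
list
```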

gen a = sum(prob)
gen pvalue = 1
replace pvalue = 1 - a[_n-1] if _n>=2
list

+-------------------------------------+
| n        prob          a     pvalue |
|-------------------------------------|
1. | 0   .01428571   .0142857          1 |
2. | 1   .22857143   .2428571   .9857143 |
3. | 2   .51428571   .7571428   .7571428 |
4. | 3   .22857143   .9857143   .2428572 |
5. | 4   .01428571          1   .0142857 |
+-------------------------------------+

drop a
gen chi2 = ( (n-2)^2 + (4-n -2)^2 + (4-n-2)^2 + (n-2)^2 ) /2
list

+---------------------------------+
| n        prob     pvalue   chi2 |
|---------------------------------|
1. | 0   .01428571          1      8 |
2. | 1   .22857143   .9857143      2 |
3. | 2   .51428571   .7571428      0 |
4. | 3   .22857143   .2428572      2 |
5. | 4   .01428571   .0142857      8 |
+---------------------------------+
Figure 2.2 on page 42.
gen y2=0
graph twoway rbar prob y2 chi2, xlabel(0 1 to 8) ytitle(probability)
Section 2.6.4, page 44, using the tea-tasting data. The user-written command exactcc can be located and installed by typing findit exactcc at the command line (see the FAQ "How can I use the findit command to search for programs and get additional help?" for more information about using findit).
exactcc pour guess [fw=count] , exact

| guess                  |             Proportion
|   Exposed   Unexposed  |     Total     Exposed
-----------------+------------------------+----------------------
Cases |         3           1  |         4      0.7500
Controls |         1           3  |         4      0.2500
-----------------+------------------------+----------------------
Total |         4           4  |         8      0.5000
|                        |
|      Point estimate    |  [95% Conf. Interval]
|------------------------+----------------------
|                        | Cornfield's limits
Odds ratio |                9       |  .1938699           .  Adjusted
|                        | Exact limits
|                        |  .2117353      626.24
|                        | Cornfield's limits
Attr. frac. ex. |         .8888889       | -4.158098           .  Adjusted
|                        | Exact limits
|                        | -3.722879    .9984032
Attr. frac. pop |         .6666667       |
+-----------------------------------------------
chi2(1) =     2.00  Pr>chi2 = 0.1573
Yates' adjusted chi2(1) =     0.50  Pr>chi2 = 0.4795
1-sided Fisher's exact P = 0.2429
2-sided Fisher's exact P = 0.4857
2 times 1-sided Fisher's exact P = 0.4857
