Countfit runs user-specified count models (Poisson, zero-inflated Poisson, negative binomial, and zero-inflated negative binomial) with user-specified variables and compares the model residuals. If you do not indicate which models you wish to compare (nbreg, zinb, prm, or zip), countfit will default to running and comparing all four. One of the quirks of countfit is that it displays results for a given predictor next to the predictor's label rather than its name, so an unlabeled predictor leads to empty space in the output and possible confusion. This can be prevented by labeling ALL predictors before running countfit.
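As a minimal sketch of that labeling step, the loop below gives every predictor that has no label a label equal to its own name. It assumes the dataset is already in memory and that the predictors are the ones used in this example (mathnce, langnce, female, biling); the local macro name lbl is only an illustrative choice.

```stata
* Sketch: make sure every predictor has a label before calling countfit.
* Assumes these four variables exist in the dataset in memory.
foreach v of varlist mathnce langnce female biling {
    * Fetch the current variable label (empty string if none is set)
    local lbl : variable label `v'
    if `"`lbl'"' == "" {
        * Fall back to the variable name so the output row is not blank
        label variable `v' "`v'"
    }
}
```

Variables that already carry a label are left untouched, so the loop can be run safely more than once.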
In this example, we will be looking at academic information on 316 students. The response variable is days absent during the school year (daysabs). We have narrowed our model choices down to two negative binomial models, one with zero-inflation and one without. We believe that daysabs can best be predicted with the math standardized test score (mathnce), the language standardized test score (langnce) and gender (female). We suspect certain zeroes can be predicted with bilingual status (biling) and language score (langnce). We will run the countfit command indicating these predictors and the two models we wish to compare, then discuss the output.
use http://www.ats.ucla.edu/stat/stata/notes/lahigh, clear
generate female = (gender == 1)
label variable female `"female"'
countfit daysabs mathnce langnce female, inf(biling langnce) nbreg zinb
----------------------------------------------------------
                        Variable |      NBRM         ZINB
---------------------------------+------------------------
daysabs                          |
                   ctbs math nce |     0.998        0.998
                                 |     -0.33        -0.34
                   ctbs lang nce |     0.986        0.987
                                 |     -2.57        -2.38
                          female |     1.539        1.502
                                 |      3.09         2.94
                        Constant |     9.825        9.795
                                 |     10.89        10.94
---------------------------------+------------------------
lnalpha                          |
                        Constant |     1.288        1.191
                                 |      2.65         1.62
---------------------------------+------------------------
inflate                          |
                bilingual status |                 10.132
                                 |                   1.53
                   ctbs lang nce |                  1.094
                                 |                   1.92
                        Constant |                  0.000
                                 |                  -2.16
---------------------------------+------------------------
Statistics                       |
                           alpha |     1.288
                               N |   316.000      316.000
                              ll |  -880.873     -879.131
                             bic |  1790.525     1804.307
                             aic |  1771.746     1774.262
----------------------------------------------------------
                                               legend: b/t
From the last block, we can see that the two models are extremely
close. The parameter estimates are nearly identical. We can continue to look at
the rest of the output.
Next, we see a table with one line per model showing the maximum and mean differences in observed versus predicted counts.
Comparison of Mean Observed and Predicted Count
               Maximum       At       Mean
Model        Difference     Value    |Diff|
---------------------------------------------
NBRM            0.011         1       0.004
ZINB            0.018         1       0.005
This confirms what we observed in the graph: both models performed worst when
predicting a count of 1. Between these two, we see that the negative binomial
did better at this prediction and, overall, had a lower mean absolute difference
between predicted and observed values. At this point, the negative binomial
model is looking more appropriate than the zero-inflated negative binomial
model. Next, we have one table for each of the models containing count-by-count
information.
In these two tables, we are able to see, for counts 0-9, the actual proportion of our data records with the given count and the predicted proportion from each model. The absolute difference is included, as is the given count's contribution to a Pearson Chi-Square statistic comparing the actual distribution of the data and the distribution proposed by the model. For a given row, the Pearson statistic can be calculated as N(|Diff|²)/Predicted, where N is the number of observations in the dataset. Looking at the sum of the Pearson column gives us a sense of how close the predicted proportions were to the actual proportions. Using this method to compare, the negative binomial appears better than the zero-inflated negative binomial.
NBRM: Predicted and actual probabilities
Count   Actual    Predicted    |Diff|    Pearson
------------------------------------------------
0       0.196      0.201       0.004      0.031
1       0.146      0.135       0.011      0.278
2       0.098      0.104       0.006      0.093
3       0.082      0.083       0.001      0.004
4       0.070      0.068       0.001      0.008
5       0.063      0.057       0.006      0.230
6       0.047      0.048       0.000      0.001
7       0.038      0.040       0.002      0.046
8       0.028      0.034       0.006      0.318
9       0.035      0.029       0.005      0.319
------------------------------------------------
Sum     0.804      0.799       0.044      1.327
ZINB: Predicted and actual probabilities
Count   Actual    Predicted    |Diff|    Pearson
------------------------------------------------
0       0.196      0.207       0.011      0.176
1       0.146      0.127       0.018      0.850
2       0.098      0.101       0.003      0.022
3       0.082      0.082       0.000      0.000
4       0.070      0.068       0.001      0.007
5       0.063      0.057       0.006      0.194
6       0.047      0.048       0.001      0.006
7       0.038      0.041       0.003      0.077
8       0.028      0.035       0.007      0.393
9       0.035      0.030       0.005      0.240
------------------------------------------------
Sum     0.804      0.798       0.055      1.965
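As a rough check of the Pearson formula, the row for a count of 1 can be recomputed by hand from the displayed values. This is only a sketch: the inputs are the three-decimal figures printed in the tables, so the results land near, but not exactly on, the reported 0.278 (NBRM) and 0.850 (ZINB), presumably because countfit works with unrounded proportions.

```stata
* Pearson contribution = N * |Diff|^2 / Predicted, with N = 316

* NBRM, count = 1: |Diff| = 0.011, Predicted = 0.135 (table reports 0.278)
display 316 * (0.011^2) / 0.135

* ZINB, count = 1: |Diff| = 0.018, Predicted = 0.127 (table reports 0.850)
display 316 * (0.018^2) / 0.127
```

The first line evaluates to roughly 0.28 and the second to roughly 0.81, close to the tabled values given the rounding of the inputs.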
Tests and Fit Statistics
-------------------------------------------------------------------------
NBRM           BIC=   -28.290  AIC=     5.607  Prefer  Over  Evidence
-------------------------------------------------------------------------
  vs ZINB      BIC=   -14.507  dif=   -13.783  NBRM    ZINB  Very strong
               AIC=     5.615  dif=    -0.008  NBRM    ZINB
               Vuong=   0.858  prob=    0.195  ZINB    NBRM  p=0.195
In this table, the tested models are compared to each other head-to-head using
the tests appropriate to each comparison. Each line can be boiled down to
the last three columns. They suggest which model is preferred by the given
comparison and the strength of the evidence supporting this preference.
When we compare our two models using the BIC and AIC,
the negative binomial model is preferred over the zero-inflated negative
binomial model. The Vuong test prefers the zero-inflated negative binomial model
over the negative binomial model, but not at a statistically significant level
(p = 0.195). Thus, these model fit statistics support what we have seen in the
model residuals.
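The dif values in the Tests and Fit Statistics table can be reproduced by simple subtraction. The sketch below assumes that dif is the NBRM value minus the ZINB value, and that the AIC printed in this table is the aic from the first table divided by N = 316; the printed numbers are consistent with both assumptions, but neither is stated in the output itself.

```stata
* BIC difference: NBRM minus ZINB (table reports dif = -13.783)
display -28.290 - (-14.507)

* AIC difference: NBRM minus ZINB (table reports dif = -0.008)
display 5.607 - 5.615

* Assumed link to the first table: aic / N
* (compare with AIC = 5.607 for NBRM and 5.615 for ZINB)
display 1771.746 / 316
display 1774.262 / 316
```

A negative dif therefore means the NBRM value is the smaller of the two, which matches the table's preference for NBRM on both criteria.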
Adding the noisily option prints the full output from each model that countfit fits (all four when no models are specified), followed by the summary output provided without the noisily option.
countfit daysabs mathnce langnce female, inf(biling langnce) nbreg zinb noisily