R FAQ
How can I do post-hoc pairwise comparisons in R?

Post-hoc pairwise comparisons are commonly performed after significant effects have been found when there are three or more levels of a factor. After an ANOVA, you may know that the means of your response variable differ significantly across your factor, but you do not know which pairs of the factor levels are significantly different from each other.  At this point, you can conduct pairwise comparisons.

We will demonstrate the how to conduct pairwise comparisons in R and the different options for adjusting the p-values of these comparisons given the number of tests conducted. We will be using the hsb2 dataset and looking at the variable write by ses. We will first look at the means and standard deviations by ses.

hsb2<-read.table("http://www.ats.ucla.edu/stat/r/notes/hsb2.csv", sep=",", header=T)
attach(hsb2)

tapply(write, ses, mean) 

    high      low   middle 
55.91379 50.61702 51.92632

tapply(write, ses, sd)

    high      low   middle 
9.442874 9.490391 9.106044

One-Way ANOVA

In R, we can run the ANOVA with the aov command.

a1 <- aov(write ~ ses)
summary(a1)

             Df  Sum Sq Mean Sq F value   Pr(>F)   
ses           2   858.7   429.4  4.9696 0.007842 **
Residuals   197 17020.2    86.4                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

From this output, we can see that ses is significant in the 2-degrees of freedom test, but we do not know which pairs of ses levels are significantly different from each other.  However, this will require three tests (high vs. low, high vs. middle, low vs. middle), so we wish to adjust what we consider to be statistically significant to account for this multiplicity of tests.  For an one-way ANOVA (ANOVA with a single factor) We can first see the unadjusted p-values using the pairwise.t.test command and indicating no adjustment of p-values:

pairwise.t.test(write, ses, p.adj = "none")

        Pairwise comparisons using t tests with pooled SD 

data:  write and ses 

       high   low   
low    0.0041 -     
middle 0.0108 0.4306

P value adjustment method: none

With this same command, we can adjust the p-values according to a variety of methods. Below we show Bonferroni and Holm adjustments to the p-values and others are detailed in the command help. 

pairwise.t.test(write, ses, p.adj = "bonf")

        Pairwise comparisons using t tests with pooled SD 

data:  write and ses 

       high  low  
low    0.012 -    
middle 0.032 1.000

P value adjustment method: bonferroni

pairwise.t.test(write, ses, p.adj = "holm")

        Pairwise comparisons using t tests with pooled SD 

data:  write and ses 

       high  low  
low    0.012 -    
middle 0.022 0.431

P value adjustment method: holm

We can see that the adjustments all lead to increased p-values, but consistently the high-low and high-middle pairs appear to be significantly different at alpha = .05.  The pairwise.t.test command does not offer Tukey post-hoc tests, but there are other R commands that allow for Tukey comparisons. Below, we show code for using the TukeyHSD (Tukey Honest Significant Differences).

TukeyHSD(a1)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = write ~ ses)

$ses
                 diff       lwr        upr     p adj
low-high    -5.296772 -9.604818 -0.9887256 0.0114079
middle-high -3.987477 -7.645265 -0.3296892 0.0289035
middle-low   1.309295 -2.605257  5.2238465 0.7096950

We can see that these results are significant with what we saw using other adjustments for the p-values.

Two (or more) Factor ANOVA

You may be fitting an ANOVA with multiple factors.  Below we look at write on ses and female.

a2 <- aov(write ~ ses + female)
summary(a2)

             Df  Sum Sq Mean Sq F value    Pr(>F)    
ses           2   858.7   429.4  5.3869  0.005279 ** 
female        1  1398.1  1398.1 17.5410 4.247e-05 ***
Residuals   196 15622.1    79.7                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We can look at pair-wise comparisons of the ses levels after adjusting for female. The TukeyHSD command still works well, though now we must specify which factor is of interest. 

TukeyHSD(a2, "ses")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = write ~ ses + female)

$ses
                 diff       lwr        upr     p adj
low-high    -5.296772 -9.434764 -1.1587797 0.0079527
middle-high -3.987477 -7.500879 -0.4740753 0.0216707
middle-low   1.309295 -2.450736  5.0693251 0.6896199

How to cite this page

Report an error on this page or leave a comment

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.