Stata FAQ
How can I explain a continuous by continuous interaction? (Stata 10 and earlier)

First off, let's start with what a significant continuous by continuous interaction means. It means that the slope of the response variable on one continuous predictor changes as the value of a second continuous variable changes.

Multiple regression models often contain interaction terms. This FAQ page covers the situation in which there is a moderator variable which influences the regression of the dependent variable on an independent/predictor variable. In other words, we are dealing with a regression model that has a significant two-way interaction of continuous variables.

There are several approaches that one might use to explain an interaction of two continuous variables. The approach that we will demonstrate is to compute simple slopes, i.e., the slopes of the dependent variable on the independent variable when the moderator variable is held constant at chosen high and low values, say one standard deviation above the mean and one standard deviation below the mean. You could also compute the simple slope when the moderator variable is held at its mean.

There are two ways that we could accomplish this. One way is to center the moderator variable at the different values, i.e., 1 sd above and 1 sd below the mean. An alternative method makes use of the lincom command and does not require recentering.

We will consider a regression model which includes a continuous by continuous interaction of a predictor variable with a moderator variable. In the formula, Y is the response variable, X the predictor (independent) variable with Z being the moderator variable. The term XZ is the interaction of the predictor with the moderator.

Y = b0 + b1X + b2Z + b3XZ
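
To see why holding the moderator constant yields a simple slope, it may help to regroup the terms of the model above. The following is only a rearrangement of that formula; the symbol z_0 (a chosen fixed value of Z) is introduced here for illustration.

```latex
\begin{aligned}
Y &= b_0 + b_1 X + b_2 Z + b_3 XZ \\
  &= (b_0 + b_2 Z) + (b_1 + b_3 Z)\,X \\[4pt]
\text{at } Z = z_0:\quad
\text{intercept} &= b_0 + b_2 z_0, \qquad
\text{simple slope} = b_1 + b_3 z_0
\end{aligned}
```

So the slope of Y on X is not a single number but a linear function of the moderator, and b3 is the amount by which that slope changes for each one-unit increase in Z.
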
We will illustrate the simple slopes process using the hsb2 dataset, which has a statistically significant continuous by continuous interaction. In order to keep the notation consistent we will temporarily change the names of the variables: y for the response variable, x for the independent variable and z for the moderator. As shown in the code below, read is the response variable, math is the predictor and socst is the moderator variable. After renaming, we will create the interaction manually. We could have used xi3 but chose to do it manually to keep the notation as concise and consistent as possible.

Then, after creating the interaction term, we will run the regression model.

use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear

rename read y
rename math x
rename socst z

generate xz=x*z

regress y x z xz


      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   78.61
       Model |  11424.7622     3  3808.25406           Prob > F      =  0.0000
    Residual |  9494.65783   196  48.4421318           R-squared     =  0.5461
-------------+------------------------------           Adj R-squared =  0.5392
       Total |    20919.42   199  105.122714           Root MSE      =    6.96

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.1105123   .2916338    -0.38   0.705    -.6856552    .4646307
           z |  -.2200442   .2717539    -0.81   0.419    -.7559812    .3158928
          xz |   .0112807   .0052294     2.16   0.032     .0009677    .0215938
       _cons |   37.84271   14.54521     2.60   0.010     9.157506    66.52792
------------------------------------------------------------------------------
Please note that the interaction, xz, is statistically significant with a p-value of 0.032.

We need to create two global macro variables to hold the range of x that will be used for graphing. We will also create global macro variables that contain the mean and standard deviation of the moderator variable, z.

sum x
global max = r(max)
global min = r(min)
sum z
global m  = r(mean)
global sd = r(sd)
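
Before turning to the two methods, the point estimates of the simple slopes can be computed by hand from the stored coefficients. This is a minimal sketch that assumes the data are loaded, the variables renamed, xz generated, and the globals m and sd defined as above. It gives point estimates only, with no standard errors or tests.

```stata
/* sketch: simple slopes by hand from the stored coefficients */
quietly regress y x z xz

display "slope of y on x at z = mean + 1 SD: " (_b[x] + ($m + $sd)*_b[xz])
display "slope of y on x at z = mean:        " (_b[x] + $m*_b[xz])
display "slope of y on x at z = mean - 1 SD: " (_b[x] + ($m - $sd)*_b[xz])
```

The first and third values should agree, up to rounding, with the slopes that the recentering and lincom methods report.
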
Next, we will demonstrate how to compute the slope for y on x while holding the value of the moderator variable, z, constant at either a high value (mean + 1 SD) or a low value (mean - 1 SD) using the method of recentering. To do this we will generate new recentered variables by subtracting either the mean + 1 SD or mean - 1 SD from z. We will also have to compute new interaction terms using the recentered variables.
/* recentering method */

generate zh  =  z - ($m + $sd)
generate xzh = x*zh

generate zl  =  z - ($m - $sd)
generate xzl = x*zl
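
A quick sanity check on the recentered variables can catch sign mistakes. The sketch below assumes zh and zl were generated as above. Because zh subtracts mean + 1 SD from z, its mean should be about minus one SD of z, and the mean of zl should be about plus one SD.

```stata
/* sketch: check that the recentered moderators are shifted as intended */
quietly summarize zh
display "mean of zh = " r(mean) "   expected about " (-1*$sd)

quietly summarize zl
display "mean of zl = " r(mean) "   expected about " ($sd)
```

Small discrepancies in the later decimal places are expected, since generate stores new variables as float by default.
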
Now we are ready to run the two regression models with the recentered variables.
/* regression with recentered variables */

regress y x zh xzh

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   78.61
       Model |  11424.7622     3  3808.25405           Prob > F      =  0.0000
    Residual |  9494.65785   196  48.4421319           R-squared     =  0.5461
-------------+------------------------------           Adj R-squared =  0.5392
       Total |    20919.42   199  105.122714           Root MSE      =    6.96

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .6017615   .0774773     7.77   0.000     .4489654    .7545576
          zh |  -.2200441   .2717539    -0.81   0.419    -.7559812    .3158929
         xzh |   .0112807   .0052294     2.16   0.032     .0009677    .0215938
       _cons |   23.94895   4.502279     5.32   0.000     15.06982    32.82808
------------------------------------------------------------------------------

display "equation for high z: y = 23.94895 + .6017615*x"

equation for high z: y = 23.94895 + .6017615*x

regress y x zl xzl

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   78.61
       Model |  11424.7622     3  3808.25406           Prob > F      =  0.0000
    Residual |  9494.65783   196  48.4421318           R-squared     =  0.5461
-------------+------------------------------           Adj R-squared =  0.5392
       Total |    20919.42   199  105.122714           Root MSE      =    6.96

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .3595465   .0917421     3.92   0.000      .178618    .5404749
          zl |  -.2200442   .2717539    -0.81   0.419    -.7559812    .3158928
         xzl |   .0112807   .0052294     2.16   0.032     .0009677    .0215938
       _cons |   28.67365   4.387166     6.54   0.000     20.02154    37.32576
------------------------------------------------------------------------------

display "equation for low z: y = 28.67365 + .3595465*x"

equation for low z: y = 28.67365 + .3595465*x
From these two regression models we can see that the slopes for the high and low moderator values are .6017615 and .3595465 respectively. The constants are 23.94895 and 28.67365. We can use these constants and slopes to plot the regression lines while holding the moderator variable constant at its high and low values.
twoway (function y = 23.94895 + .6017615*x, range($min $max)) ///
       (function y = 28.67365 + .3595465*x, range($min $max)) ///
       (scatter y x, msym(oh) jitter(3)),                     ///
       legend(order(1 "z at m+1sd" 2 "z at m-1sd")) /// 
       ytitle(Y) xtitle(X) name(conconb, replace)
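
Typing the constants and slopes into the graph command by hand invites copying errors. One possible alternative, sketched here under the assumption that zh, xzh, zl and xzl exist and that the commands are run together from a do-file, is to save the coefficients in local macros right after each regression. The local names b0h, b1h, b0l and b1l and the graph name conconb2 are made up for this illustration.

```stata
/* sketch: plot the simple regression lines from stored coefficients */
quietly regress y x zh xzh
local b0h = _b[_cons]
local b1h = _b[x]

quietly regress y x zl xzl
local b0l = _b[_cons]
local b1l = _b[x]

twoway (function y = `b0h' + `b1h'*x, range($min $max)) ///
       (function y = `b0l' + `b1l'*x, range($min $max)) ///
       (scatter y x, msym(oh) jitter(3)),                ///
       legend(order(1 "z at m+1sd" 2 "z at m-1sd"))      ///
       ytitle(Y) xtitle(X) name(conconb2, replace)
```

Local macros disappear when the do-file finishes, so the regressions and the graph command need to be run in the same do-file.
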
       
Using the recentering method we can compute the slope for any value of the moderator variable of interest. There is an alternative to recentering using the lincom command that does not require creating new variables. It will just use the global macro values for the mean and standard deviation created earlier. The x, xz, etc., terms in the lincom commands below refer to the coefficients from the regression model and not the variables themselves.
/* lincom method */

/* define slopes */
global Hz "x + ($m+$sd)*xz" /* define slope for high z */
global Lz "x + ($m-$sd)*xz" /* define slope for low  z */

/* rerun original regression model */

regress y x z xz

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   78.61
       Model |  11424.7622     3  3808.25406           Prob > F      =  0.0000
    Residual |  9494.65783   196  48.4421318           R-squared     =  0.5461
-------------+------------------------------           Adj R-squared =  0.5392
       Total |    20919.42   199  105.122714           Root MSE      =    6.96

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.1105123   .2916338    -0.38   0.705    -.6856552    .4646307
           z |  -.2200442   .2717539    -0.81   0.419    -.7559812    .3158928
          xz |   .0112807   .0052294     2.16   0.032     .0009677    .0215938
       _cons |   37.84271   14.54521     2.60   0.010     9.157506    66.52792
------------------------------------------------------------------------------

/* slope with high moderator */

lincom $Hz

 ( 1)  x + 63.14079 xz = 0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .6017615   .0774773     7.77   0.000     .4489654    .7545576
------------------------------------------------------------------------------

/* constant with high moderator */

lincom _cons + ($m+$sd)*z

 ( 1)  63.14079 z + _cons = 0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   23.94895   4.502279     5.32   0.000     15.06982    32.82808
------------------------------------------------------------------------------

display "equation for high z: y = 23.94895 + .6017615*x"

equation for high z: y = 23.94895 + .6017615*x

/* slope with low moderator */

lincom $Lz

 ( 1)  x + 41.66921 xz = 0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .3595465   .0917421     3.92   0.000      .178618    .5404749
------------------------------------------------------------------------------

/* constant with low moderator */

lincom _cons + ($m-$sd)*z

 ( 1)  41.66921 z + _cons = 0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   28.67365   4.387166     6.54   0.000     20.02154    37.32576
------------------------------------------------------------------------------

display "equation for low z: y = 28.67365 + .3595465*x"

equation for low z: y = 28.67365 + .3595465*x
With some copying and pasting from above, we can put together a table of the two simple slopes. This will make it easier to visually compare the values of the slopes with one another.
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
(slope 1 Hz) |   .6017615   .0774773     7.77   0.000     .4489654    .7545576
(slope 2 Lz) |   .3595465   .0917421     3.92   0.000      .178618    .5404749
Using the lincom method makes it very easy to compute the slope for any value of the moderator variable.
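
For example, a loop can step through several moderator values in one pass. This is a sketch only: the values 40, 50 and 60 are arbitrary illustrations, assumed to lie within the observed range of z, and the code assumes the original model can be refit on the renamed variables.

```stata
/* sketch: simple slopes at several illustrative values of z */
quietly regress y x z xz

foreach v of numlist 40 50 60 {
    display as text _newline "simple slope of y on x at z = `v'"
    lincom x + `v'*xz
}
```

Each pass reports the estimated slope together with its standard error, t test and confidence interval, so no recentered variables have to be created.
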

Some of you may be wondering whether, since we can compute these slopes using lincom, we could also use lincom to test whether the difference in slopes is statistically significant. The answer is yes, but it is completely unnecessary. First, we'll show you how to do it and then explain why you don't need to do it.

/* difference in slopes */

lincom ($Hz) - ($Lz)

 ( 1)  21.47159 xz = 0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .242215    .112283     2.16   0.032      .020777     .463653
------------------------------------------------------------------------------
Note how lincom has simplified the expression by combining like terms and canceling wherever possible, so that the test involves only one term.

Please note: In the twoway function plot shown earlier, you must use the names y and x. Do not use variable names from your data.

Let's look at the simplification a little more closely to see what is going on.

slope1 - slope2 = (x + ($m+$sd)*xz)-(x + ($m-$sd)*xz) 
                =  x + ($m+$sd)*xz -x -($m-$sd)*xz    
                = ($m+$sd)*xz -($m-$sd)*xz          
                = (($m+$sd)-($m-$sd))*xz             
                = ($m+$sd-$m+$sd)*xz 
                = 2*$sd*xz
So what it comes down to is that the difference in simple slopes is just two times the standard deviation of the moderator times the coefficient of the interaction. Let's try it.
/* alternatively */

lincom 2*$sd*xz

 ( 1)  21.47159 xz = 0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .242215    .112283     2.16   0.032      .020777     .463653
------------------------------------------------------------------------------
Thus, the difference in the two simple slopes is about .24, and that difference is statistically significant at p = 0.032. The reason that this test is unnecessary is that the lincom expression involves only the interaction term, which we already know from the overall regression model is statistically significant at p = 0.032. Multiplying a coefficient by a nonzero constant rescales the estimate and its standard error by the same factor, so the p-value does not change.
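
The same numbers can be reproduced by hand from the stored coefficient and standard error of xz. The sketch below assumes that regress y x z xz is the most recent estimation command and that the global sd is still defined.

```stata
/* sketch: difference in simple slopes from the xz coefficient alone */
quietly regress y x z xz

display "difference in slopes: " (2*$sd*_b[xz])
display "standard error:       " (2*$sd*_se[xz])
display "t statistic:          " (_b[xz]/_se[xz])
```

The difference and its standard error should match the lincom results up to rounding, and the t statistic is simply the one reported for xz in the regression table.
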

Below are all the commands that we used collected into a do-file to make it easier to use.

use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear

rename read y
rename math x
rename socst z
generate xz=x*z
summarize z
global m=r(mean)
global sd=r(sd)
/* get range of x */
summarize x
global max = r(max)
global min = r(min)

/* recentering method */

regress y x z xz

generate zh  =  z - ($m + $sd)
generate xzh = x*zh

generate zl  =  z - ($m - $sd)
generate xzl = x*zl

regress y x zh xzh
display "equation for high z: y = 23.94895 + .6017615*x"

regress y x zl xzl
display "equation for low z: y = 28.67365 + .3595465*x"

twoway (function y = 23.94895 + .6017615*x, range($min $max)) ///
       (function y = 28.67365 + .3595465*x, range($min $max)) ///
       (scatter y x, msym(oh) jitter(3)),                     ///
       legend(order(1 "z at m+1sd" 2 "z at m-1sd")) /// 
       ytitle(Y) xtitle(X) name(conconb, replace)

/* lincom method */

/* define slopes */
global Hz "x + ($m+$sd)*xz" /* define slope for high z */
global Lz "x + ($m-$sd)*xz" /* define slope for low  z */

/* rerun original regression model */
regress y x z xz

/* slope with high moderator */
lincom $Hz
/* constant with high moderator */
lincom _cons + ($m+$sd)*z
display "equation for high z: y = 23.94895 + .6017615*x"

/* slope with low moderator */
lincom $Lz
/* constant with low moderator */
lincom _cons + ($m-$sd)*z
display "equation for low z: y = 28.67365 + .3595465*x"

/* difference in slopes */
lincom ($Hz) - ($Lz)

/* alternatively */
lincom 2*$sd*xz
