### SAS FAQ: Given a knot location, how can I fit a piecewise model using Proc Transreg?

Imagine you wish to predict temperature using time of day.  You believe that temperature increases until a certain time t and decreases after that point, and you wish to estimate two functions that meet at t.  Given the location at which you believe the relationship between your outcome and predictor changes, you can fit two functions that meet at that point in SAS using proc transreg.

For this example, we will use a fictional dataset in which our outcome variable is y, our predictor variable is x, and the relationship between x and y is clearly not a single continuous function.  This example data is borrowed from the SAS documentation examples.


data a;
   x = -0.000001;
   do i = 0 to 199;
      if mod(i, 50) = 0 then do;   /* reset the curve every 50 observations */
         c = ((x/2) - 5)**2;
         if i = 150 then c = c + 5;
         y = c;
      end;
      x = x + 0.1;
      y = y - sin(x - c);
      output;
   end;
run;

proc print data = a (obs = 5); run;
Obs       X       I       C          Y

1    0.10000    0    25.0000    24.7694
2    0.20000    1    25.0000    24.4427
3    0.30000    2    25.0000    24.0234
4    0.40000    3    25.0000    23.5155
5    0.50000    4    25.0000    22.9241
proc gplot data = a;
plot y*x;
run;

We may have some reason to believe that there is a "knot" located at x = 10 and that the relationship between x and y is linear.  By saying there is a knot here, we're suggesting that the relationship between x to y for x < 10 is different from the relationship for x > 10.  Rather than fit one line or curve where we believe there are actually two, we can indicate the location of the knot and fit a piecewise model.  We will do this using proc transreg.  We will indicate the location of the knot with the knots option and specify that we wish to fit one-degree polynomials (lines) with the degree option.  We will output a dataset, a2, that includes the coefficients for these lines.

proc transreg data = a;
model identity(y) = pspline(x / knots = 10 degree = 1);
output out = a2 coefficients;
run;

TRANSREG Univariate Algorithm Iteration History for Identity(Y)

Iteration    Average    Maximum                 Criterion
   Number     Change     Change    R-Square       Change    Note
-----------------------------------------------------------------
        1    0.00000    0.00000     0.48219               Converged



By fitting this model, we are effectively estimating one intercept and two slopes.  The output from proc transreg does not include these estimates as we would see them after fitting a regression with proc reg, but they have been added to the output dataset a2. The output dataset also includes the transformed outcome variable, as many powers of the predictor x as are needed for polynomials of the indicated degree, and as many modified versions of x as are required by the number of knots. We can look at a few records in our output dataset to see this.

proc print data = a2 (obs = 5);
where x > 9.8;
run;

Obs _TYPE_ _NAME_     Y       TY    Intercept   X_1     X_2   TIntercept   TX_1    TX_2     X

99 SCORE  ROW99  -5.85961 -5.85961     1      9.9000 0.00000      1      9.9000 0.00000  9.9000
100 SCORE  ROW100 -5.28805 -5.28805     1     10.0000 0.00000      1     10.0000 0.00000 10.0000
101 SCORE  ROW101  0.62507  0.62507     1     10.1000 0.10000      1     10.1000 0.10000 10.1000
102 SCORE  ROW102  1.32494  1.32494     1     10.2000 0.20000      1     10.2000 0.20000 10.2000
103 SCORE  ROW103  2.09263  2.09263     1     10.3000 0.30000      1     10.3000 0.30000 10.3000


In this example, we used the "identity" transformation of y, so the transformed variable ty is equal to y.  We indicated one-degree polynomials, so the variables intercept = x^0 and x_1 = x^1 = x were created; no higher powers of x were generated.  We indicated one knot placed at x = 10, so a variable x_2 was generated.  x_2 is zero when x <= 10 and is x - 10 when x > 10.  For example, we can see that when x = 10.1, x_2 = 0.1. Transformations of intercept, x_1, and x_2 also appear in the dataset; in this example, the transformed values are equal to the untransformed values.
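As an illustration outside of SAS (this Python sketch, including the hypothetical function name pspline_basis, is not part of the FAQ), the degree = 1 pspline expansion amounts to keeping x itself and adding a "truncated" term that is zero up to the knot and x - knot beyond it:

```python
# Illustration only: the degree = 1 pspline basis that proc transreg builds
# is x itself plus a truncated term, 0 below the knot and x - knot above it.
knot = 10.0

def pspline_basis(x, knot):
    """Return (x_1, x_2) for a degree-1 pspline with one knot."""
    return (x, max(0.0, x - knot))

for x in (9.9, 10.0, 10.1, 10.2):
    x_1, x_2 = pspline_basis(x, knot)
    print(x, x_1, round(x_2, 4))
```

These values match the x_1 and x_2 columns printed from the a2 dataset above.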

The last row of the generated dataset contains the coefficients from the model.  In this row, the value for _TYPE_ is "M COEFFI". We can print this line to see the coefficients.

proc print data = a2;
where _TYPE_ eq "M COEFFI";
run;

Obs    _TYPE_    _NAME_   Y   TY   Intercept   X_1   X_2   TIntercept     TX_1       TX_2    X
201   M COEFFI     TY     .    .       .        .     .      16.9614    -1.47976   3.94202   .


Here we see the intercept (16.9614), the slope of the first line (-1.47976), and the change in slope at the knot (3.94202).  Note that 3.94202 is not the slope of the second line itself; the second line's slope is the sum -1.47976 + 3.94202 = 2.46226.  We will see the same estimates if we use proc reg:

proc reg data = a2;
model y = x_1 x_2;
output out = a3 predicted = pred;
run;

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        8181.68207     4090.84103      91.72    <.0001
Error             197        8786.23476       44.60018
Corrected Total   199             16968

Root MSE              6.67834    R-Square     0.4822
Dependent Mean       12.04335    Adj R-Sq     0.4769
Coeff Var            55.45246

Parameter Estimates

Variable     Label        DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept     1              16.96137           1.26019      13.46      <.0001
X_1          X 1           1              -1.47976           0.18399      -8.04      <.0001
X_2          X 2           1               3.94202           0.32717      12.05      <.0001



We can also see that the R-square value in the above regression is the same as the one that appeared in the proc transreg output. We can plot the predicted values and see that they consist of two lines that meet at x = 10.

legend label=none value=('y' 'predicted y') position=(bottom left inside) mode=share down = 2;
proc gplot data = a3;
plot (y pred)*x / overlay legend = legend;
run;
quit;
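The continuity of the fit at the knot can also be checked by hand.  The sketch below (Python, not part of the FAQ; the helper predict is hypothetical) evaluates y-hat = b0 + b1*x + b2*max(0, x - 10) using the coefficients reported above; because the truncated term is zero at the knot, the two lines necessarily meet at x = 10:

```python
# Sketch: evaluate the fitted piecewise-linear model with the reported
# coefficients.  The x_2 term is 0 at the knot, so the two lines meet there.
b0, b1, b2 = 16.9614, -1.47976, 3.94202   # intercept, slope, change in slope
knot = 10.0

def predict(x):
    return b0 + b1 * x + b2 * max(0.0, x - knot)

print(round(predict(10.0), 4))   # at the knot: 16.9614 - 1.47976*10 = 2.1638
# Past the knot, the slope is b1 + b2 = 2.46226:
print(round(predict(11.0) - predict(10.0), 5))
```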


If we have reason to believe that the relationship between x and y is quadratic, we can indicate this in proc transreg and fit two second-degree polynomials to the data. We will again run proc transreg, this time with degree = 2, and output a dataset, aquad.

proc transreg data = a;
model identity(y) = pspline(x / knots = 10 degree = 2);
output out = aquad coefficients;
run;

TRANSREG Univariate Algorithm Iteration History for Identity(Y)

Iteration    Average    Maximum                 Criterion
   Number     Change     Change    R-Square       Change    Note
-----------------------------------------------------------------
        1    0.00000    0.00000     0.46274               Converged

In this model, proc transreg is finding the combination of two quadratic functions that meet at x = 10.  The dataset aquad has several added transformed variables: x_2 is the square of x, and x_3 is the square of (x - 10) for x > 10 and 0 for x <= 10.  We can look at a few records in our output dataset to see this.

proc print data = aquad (obs = 5);
where x > 9.8;
run;

Obs   _TYPE_   _NAME_       Y         TY      Intercept     X_1       X_2
99   SCORE    ROW99    -5.85961   -5.85961       1        9.9000    98.010
100   SCORE    ROW100   -5.28805   -5.28805       1       10.0000   100.000
101   SCORE    ROW101    0.62507    0.62507       1       10.1000   102.010
102   SCORE    ROW102    1.32494    1.32494       1       10.2000   104.040
103   SCORE    ROW103    2.09263    2.09263       1       10.3000   106.090

Obs      X_3      TIntercept      TX_1       TX_2       TX_3         X
99   0.000000         1         9.9000     98.010    0.000000     9.9000
100   0.000000         1        10.0000    100.000    0.000000    10.0000
101   0.010000         1        10.1000    102.010    0.010000    10.1000
102   0.040000         1        10.2000    104.040    0.040000    10.2000
103   0.089999         1        10.3000    106.090    0.089999    10.3000
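The degree = 2 basis can be sketched the same way as the linear one (again, this Python and the hypothetical function name pspline2_basis are only an illustration, not part of the FAQ): the expansion keeps x and x**2 and adds a truncated square that is zero up to the knot:

```python
# Illustration only: the degree = 2 pspline basis is x, x**2, and a
# truncated square, 0 below the knot and (x - knot)**2 above it.
knot = 10.0

def pspline2_basis(x, knot):
    """Return (x_1, x_2, x_3) for a degree-2 pspline with one knot."""
    t = max(0.0, x - knot)
    return (x, x**2, t**2)

x_1, x_2, x_3 = pspline2_basis(10.3, knot)
print(round(x_2, 2), round(x_3, 2))   # matches the x = 10.3 row above
```

(The printed value 0.089999 in the SAS output, rather than exactly 0.09, reflects the accumulated floating-point increments of x in the data step.)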


Again, the last row of the generated dataset contains the coefficients from the model.  In this row, the value for _TYPE_ is "M COEFFI". We can print this line to see the coefficients.

proc print data = aquad;
where _TYPE_ eq "M COEFFI";
run;

Obs    _TYPE_     _NAME_   Y   TY   Intercept   X_1   X_2   X_3   TIntercept      TX_1      TX_2       TX_3    X
201    M COEFFI     TY     .    .       .        .     .     .       23.5360  -5.39639   0.36708   -0.38836   .


We will see the same estimates if we use proc reg.  As in the linear case, the coefficient for x_3 (-0.38836) is the change in the coefficient of the squared term at the knot, not the second curve's quadratic coefficient itself:

proc reg data = aquad;
model y = x_1 x_2 x_3;
output out = aquad_reg predicted = pred;
run;
Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3        7851.67021     2617.22340      56.27    <.0001
Error             196        9116.24661       46.51146
Corrected Total   199             16968

Root MSE              6.81993    R-Square     0.4627
Dependent Mean       12.04335    Adj R-Sq     0.4545
Coeff Var            56.62817

Parameter Estimates

Variable     Label       DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   Intercept    1              23.53600           1.83580      12.82      <.0001
X_1         X 1          1              -5.39639           0.64211      -8.40      <.0001
X_2         X 2          1               0.36708           0.04645       7.90      <.0001
X_3         X 3          1              -0.38836           0.08628      -4.50      <.0001



We can also see that the R-square value in the above regression is the same as the one that appeared in the proc transreg output. We can plot the predicted values and see that they consist of two curves that meet at x = 10.

legend label=none value=('y' 'predicted y') position=(bottom left inside) mode=share down = 2;
proc gplot data = aquad_reg;
plot (y pred)*x / overlay legend = legend;
run;
quit;
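As with the linear model, we can verify by hand that the two quadratic pieces join at the knot.  This sketch (Python, not part of the FAQ; the helper predict is hypothetical) evaluates the fitted model with the coefficients reported above; the x_3 term is zero at x = 10, so both curves give the same value there:

```python
# Sketch: the fitted piecewise-quadratic model.  The truncated term is 0
# at the knot, so the two quadratics meet at x = 10.
b0, b1, b2, b3 = 23.5360, -5.39639, 0.36708, -0.38836
knot = 10.0

def predict(x):
    t = max(0.0, x - knot)
    return b0 + b1 * x + b2 * x**2 + b3 * t**2

# Value where the two curves join:
print(round(predict(10.0), 4))   # 23.5360 - 53.9639 + 36.7080 = 6.2801
```

Past the knot, the coefficient on the squared term becomes b2 + b3 = -0.02128, so the second piece curves downward while the first curves upward.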


We have assumed up to this point that there is a known value at which the relationship between y and x changes. If you know that such a point exists, but do not know its location, see SAS FAQ: How can I find where to split a piecewise regression?.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.