7.2.3. Stepwise Regression
Stepwise Regression provides an answer to the question of which independent variables to include in the regression equation.
The simplest way to isolate the effects of various independent variables on the variation of the dependent variable would be to start with one independent variable and run a series of regressions, adding one independent variable at a time. An alternative would be to start with all independent variables and omit one at a time. Indeed, these are the two basic procedures most commonly used in Stepwise Regression, but with a difference. Rather than adding or omitting variables randomly, it is possible to introduce a statistically meaningful criterion to rank the sequence. The enter/omit criteria used here are the F-to-enter, F-to-remove and Tolerance parameters.

As in Linear Regression, it is possible to create interaction terms, dummy variables and lag/lead terms, select multiple dependent variables and run regressions on subsamples defined by several factor columns (see Variables Selected list).
7.2.3.1. Stepwise Selection Criteria
The next dialogue is for selecting the Tolerance, F-to-enter and F-to-remove thresholds. Either the Forward Selection or the Backward Selection method is also specified in this dialogue.

The values suggested by the program are the most commonly used limits. Of course, it is possible to enter any value of choice by editing the number in the field. UNISTAT allows entry of F-values only as enter / remove thresholds. If you wish to enter tail probability values instead, the corresponding F-values can be calculated easily using the Statistics 1 → Distribution Functions → Critical Value procedure. The complement of the desired tail probability value (1 - α) should be entered in the Probability dialogue, and numerator and denominator degrees of freedom should be entered as 1 and 100,000 (representing infinity) respectively. The critical value obtained in this way can then be used in the Stepwise Regression procedure.
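The conversion from a tail probability to an F threshold can also be sketched outside the program. The following is a minimal Python sketch, not part of UNISTAT; the function name `f_threshold` is hypothetical. It assumes the limiting case of 1 and infinitely many degrees of freedom, where the F distribution coincides with the square of a standard normal variable, so the results differ very slightly from those obtained with 100,000 denominator degrees of freedom.

```python
from statistics import NormalDist


def f_threshold(alpha: float) -> float:
    """Approximate upper-tail F critical value for 1 and infinite degrees of freedom.

    In this limit the F statistic is the square of a standard normal variable,
    so the critical value is the squared two-sided normal quantile.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


# Tail probabilities behind the default thresholds (5% to enter, 10% to remove)
print(round(f_threshold(0.05), 3))  # prints 3.841
print(round(f_threshold(0.10), 3))  # prints 2.706
```

A smaller tail probability gives a larger threshold, which is consistent with the requirement that F-to-enter be greater than F-to-remove.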
F-to-Enter: The F-to-enter statistic of an independent variable is the F-statistic for testing the significance of the regression coefficient it would have if it were in the regression equation. If this calculated value is above the one specified by the user, then the variable can enter the equation. The default value is 3.8416, corresponding to a tail probability value of 0.05 (with 1 and 100,000 degrees of freedom) and it must always be greater than the F-to-remove value. If you wish to change this default value permanently, enter and edit the following line in the [Options] section of Documents\Unistat60\Unistat60.ini file:
StepwiseFtoEnter=3.8416
F-to-Remove: The F-to-remove statistic of an independent variable which is already in the regression equation is the F-statistic for testing the significance of its regression coefficient. If this calculated value is below the one specified by the user then the variable is removed from the equation. The default value is 2.7056, corresponding to a tail probability value of 0.10 (with 1 and 100,000 degrees of freedom) and it must always be less than the F-to-enter value. If you wish to change this default value permanently, enter and edit the following line in the [Options] section of Documents\Unistat60\Unistat60.ini file:
StepwiseFtoRemove=2.7056
Tolerance: In order to avoid highly correlated variables and also to prevent accumulation of rounding errors, a Tolerance value is specified. The Tolerance of a variable which is not in the equation is defined as 1 - R-squared, where R is the multiple correlation between the variable and all variables which are in the regression equation. The default value is 0.001. If you wish to change this default value permanently, enter and edit the following line in the [Options] section of Unistat60.ini:
StepwiseTolerance=0.001
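As an illustration of the Tolerance definition, the sketch below covers only the special case where a single variable is already in the equation, so that the multiple correlation reduces to the ordinary correlation coefficient. The function name `tolerance_single` and the sample columns are hypothetical and are not taken from UNISTAT.

```python
def tolerance_single(candidate, in_equation):
    """Tolerance (1 - R-squared) of a candidate against one variable in the equation."""
    n = len(candidate)
    mean_c = sum(candidate) / n
    mean_e = sum(in_equation) / n
    # Sums of squares and cross-products about the means
    sxy = sum((c - mean_c) * (e - mean_e) for c, e in zip(candidate, in_equation))
    sxx = sum((c - mean_c) ** 2 for c in candidate)
    syy = sum((e - mean_e) ** 2 for e in in_equation)
    r_squared = sxy * sxy / (sxx * syy)
    return 1.0 - r_squared


x = [1.0, 2.0, 3.0, 4.0, 5.0]
near_copy = [1.0, 2.0, 3.0, 4.0, 5.01]   # almost identical to x
unrelated = [1.0, -1.0, 0.0, -1.0, 1.0]  # uncorrelated with x

print(tolerance_single(near_copy, x))  # very close to 0
print(tolerance_single(unrelated, x))  # close to 1
```

A candidate that nearly duplicates a variable already in the equation has a Tolerance close to zero, which is the situation the threshold is meant to catch.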
Forward/Backward Selection: If the Forward Selection method is employed, then the program will first run a regression with the most likely candidate, and then successively introduce other variables or omit existing ones. If the Backward Selection method is selected, then the program will start with all variables in the equation and display the full regression output at the beginning.
It is important to emphasise that neither the F-to-enter and F-to-remove values nor the Tolerance of a variable (whether it is in the equation or not) remain the same when a variable is added to or removed from the regression equation. Therefore, whenever an addition or omission takes place, all variables, whether in the equation or not, are subjected to the above checks. When the last of the independent variables has been tried for entry or removal and no further variable can be entered or removed, the selection process is terminated.
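The re-checking of every variable after each change can be sketched as a loop. The following Python sketch is a simplified backward elimination only: it assumes no Tolerance check and no re-entry of removed variables, so it is not UNISTAT's actual implementation. The helper names (`solve`, `rss`, `backward_eliminate`) and the small synthetic data set are hypothetical and serve purely as illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of y regressed on a constant plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(b * v for b, v in zip(beta, row))) ** 2
               for i, row in enumerate(design))


def backward_eliminate(data, y, f_to_remove=2.7056):
    """Repeatedly drop the variable with the smallest F-to-remove below the threshold."""
    selected, removed = list(data), []
    while selected:
        full = rss([data[v] for v in selected], y)
        dof = len(y) - len(selected) - 1  # residual degrees of freedom
        # F-to-remove is recomputed for every variable after each removal
        f_values = {}
        for v in selected:
            others = [data[u] for u in selected if u != v]
            f_values[v] = (rss(others, y) - full) / (full / dof)
        worst = min(f_values, key=f_values.get)
        if f_values[worst] >= f_to_remove:
            break
        selected.remove(worst)
        removed.append(worst)
    return selected, removed


# Synthetic data: y depends strongly on x1 and only weakly on x2
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
noise = [0.5, -0.5, 0.5, -0.5, -0.5, 0.5, -0.5, 0.5]
y = [2.0 + 3.0 * a + 0.1 * b + e for a, b, e in zip(x1, x2, noise)]
data = {"x1": x1, "x2": x2}

print(backward_eliminate(data, y))  # expected: (['x1'], ['x2'])
```

The sketch computes each F-to-remove by comparing the residual sum of squares with and without the variable, which is equivalent to the F-statistic for testing its regression coefficient.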
7.2.3.2. Stepwise Regression Output Options

The full output can be substantial, as a large amount of information is displayed at each step: the standard error, multiple correlation, R-squared, adjusted R-squared, change in R-squared and an Analysis of Variance table. The regression coefficient, its standard error, t-statistic, its tail probability and the calculated F-to-remove value are displayed for each independent variable. Partial correlation, Tolerance and F-to-enter values of variables which are not in the equation are also displayed.
At the end of the selection process, a summary table gives the multiple correlation, R-squared and F-statistic for each step.
7.2.3.3. Stepwise Regression Example
Example 20.1e on p. 436 from Zar, J. H. (2010).
Open REGRESS, select Statistics 1 → Regression Analysis → Stepwise Regression and select temperature, cm, mm and min (C1 to C4) as [Variable]s and ml (C5) as [Dependent]. Select Backward Selection and accept the Tolerance levels given in the next dialogue to obtain the following output:
Stepwise Regression
Backward Selection
Valid Number of Cases: 33, 0 Omitted
Dependent Variable: ml
Tolerance: 0.001
F-to-Enter: 3.8416 (5.0%)
F-to-Remove: 2.7056 (10.0%)
All uncorrelated variables entered
| Standard Error | Multiple Correlation | R-squared | Adjusted R-squared | Change in R-squared |
|---|---|---|---|---|
| 0.4238 | 0.8117 | 0.6589 | 0.6102 | 0.6589 |

| Due To | Sum of Squares | DoF | Mean Square | F-Stat | Prob |
|---|---|---|---|---|---|
| Regression | 9.717 | 4 | 2.429 | 13.524 | 0.0000 |
| Error | 5.030 | 28 | 0.180 | | |

| Variables in Equation | Coefficient | Std Error | t-Statistic | Prob | F-to-Remove |
|---|---|---|---|---|---|
| Constant | 2.9583 | | | | |
| Temperature | -0.1293 | 0.0213 | -6.0751 | 0.0000 | 36.9063 |
| cm | -0.0188 | 0.0563 | -0.3338 | 0.7410 | 0.1114 |
| mm | -0.0462 | 0.2073 | -0.2230 | 0.8252 | 0.0497 |
| min | 0.2088 | 0.0670 | 3.1141 | 0.0042 | 9.6979 |

Step 1: Variable Removed: mm
| Standard Error | Multiple Correlation | R-squared | Adjusted R-squared | Change in R-squared |
|---|---|---|---|---|
| 0.4168 | 0.8114 | 0.6583 | 0.6230 | -0.0006 |

| Due To | Sum of Squares | DoF | Mean Square | F-Stat | Prob |
|---|---|---|---|---|---|
| Regression | 9.708 | 3 | 3.236 | 18.625 | 0.0000 |
| Error | 5.039 | 29 | 0.174 | | |

| Variables in Equation | Coefficient | Std Error | t-Statistic | Prob | F-to-Remove |
|---|---|---|---|---|---|
| Constant | 2.6725 | | | | |
| Temperature | -0.1305 | 0.0203 | -6.4232 | 0.0000 | 41.2572 |
| cm | -0.0154 | 0.0533 | -0.2892 | 0.7745 | 0.0837 |
| min | 0.2045 | 0.0632 | 3.2356 | 0.0030 | 10.4694 |

| Variables not in Equation | Partial Corr | Tolerance | F-to-Enter |
|---|---|---|---|
| mm | -0.0421 | 0.8518 | 0.0497 |

Step 2: Variable Removed: cm
| Standard Error | Multiple Correlation | R-squared | Adjusted R-squared | Change in R-squared |
|---|---|---|---|---|
| 0.4104 | 0.8108 | 0.6573 | 0.6345 | -0.0010 |

| Due To | Sum of Squares | DoF | Mean Square | F-Stat | Prob |
|---|---|---|---|---|---|
| Regression | 9.694 | 2 | 4.847 | 28.775 | 0.0000 |
| Error | 5.053 | 30 | 0.168 | | |

| Variables in Equation | Coefficient | Std Error | t-Statistic | Prob | F-to-Remove |
|---|---|---|---|---|---|
| Constant | 2.5520 | | | | |
| Temperature | -0.1324 | 0.0189 | -6.9993 | 0.0000 | 48.9907 |
| min | 0.2013 | 0.0613 | 3.2850 | 0.0026 | 10.7910 |

| Variables not in Equation | Partial Corr | Tolerance | F-to-Enter |
|---|---|---|---|
| mm | -0.0261 | 0.9176 | 0.0198 |
| cm | -0.0536 | 0.8652 | 0.0837 |

Summary Table
Dependent Variable: ml
| Step | In/Out | Variable | Multiple Corr | R-squared | F-Stat | Prob |
|---|---|---|---|---|---|---|
| 1 | Out | mm | 0.8114 | 0.6583 | 18.6251 | 0.0000 |
| 2 | Out | cm | 0.8108 | 0.6573 | 28.7748 | 0.0000 |
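
The figures in the output above can be cross-checked against one another with standard regression identities. The Python sketch below uses only the rounded values printed in the tables, so small discrepancies in the last digits are to be expected. The function names are hypothetical, and the identities are textbook relations rather than a description of how UNISTAT computes its output.

```python
def r_squared(ss_reg, ss_err):
    """R-squared from the regression and error sums of squares."""
    return ss_reg / (ss_reg + ss_err)


def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n cases and p independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def overall_f(ss_reg, ss_err, n, p):
    """Overall F-statistic: regression mean square over error mean square."""
    return (ss_reg / p) / (ss_err / (n - p - 1))


def f_to_enter_from_partial(r_partial, n, p_after_entry):
    """F-to-enter implied by a partial correlation.

    p_after_entry is the number of independent variables the equation
    would contain once the candidate has been entered.
    """
    dof = n - p_after_entry - 1
    return r_partial ** 2 * dof / (1.0 - r_partial ** 2)


n = 33  # valid number of cases

# Initial regression with all four variables
r2 = r_squared(9.717, 5.030)
print(round(r2, 4))                               # reported R-squared: 0.6589
print(round(adjusted_r_squared(r2, n, 4), 4))     # reported adjusted R-squared: 0.6102
print(round(overall_f(9.717, 5.030, n, 4), 3))    # reported F-Stat: 13.524

# F-to-remove equals the squared t-statistic (Temperature, initial regression)
print(round((-6.0751) ** 2, 3))                   # reported F-to-Remove: 36.9063

# F-to-enter of mm after Step 1, from its partial correlation
print(round(f_to_enter_from_partial(-0.0421, n, 4), 4))  # reported F-to-Enter: 0.0497
```

Note that the F-to-enter of mm after Step 1 matches its F-to-remove in the initial regression, since both test the same coefficient in the same four-variable equation.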