UNISTAT - the ultimate Excel statistics add-in

7.2.3. Stepwise Regression

Stepwise Regression provides an answer to the question of which independent variables to include in the regression equation.

The simplest way to isolate the effects of various independent variables on the variation of the dependent variable would be to start with one independent variable and run a series of regressions, adding one independent variable at a time. An alternative would be to start with all independent variables and omit one at a time. Indeed, these are the two basic procedures most commonly used in Stepwise Regression, but with a difference. Rather than adding or omitting variables randomly, it is possible to introduce a statistically meaningful criterion to rank the sequence. The enter/omit criteria used here are the F-to-enter, F-to-remove and Tolerance parameters.


As in Linear Regression, it is possible to create interaction terms, dummy variables and lag/lead terms, to select multiple dependent variables, and to run regressions on subsamples defined by several factor columns (see the Variables Selected list).

7.2.3.1. Stepwise Selection Criteria

The next dialogue is for selecting the Tolerance, F-to-enter and F-to-remove thresholds. The Forward Selection or Backward Selection method is also chosen in this dialogue.


The values suggested by the program are the most commonly used limits. Any other value can be entered by editing the number in the field. UNISTAT accepts only F-values as enter/remove thresholds. If you wish to work with tail probability values instead, the corresponding F-values can be calculated easily using the Statistics 1 → Distribution Functions → Critical Value procedure. Enter the complement of the desired tail probability value (1 - α) in the Probability dialogue, and enter 1 and 100,000 (representing infinity) as the numerator and denominator degrees of freedom respectively. The critical value obtained in this way can then be used in the Stepwise Regression procedure.
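The conversion from a tail probability to an F threshold can also be approximated outside UNISTAT. The following is a minimal sketch, not UNISTAT's own routine: it assumes the denominator degrees of freedom are effectively infinite, in which case an F variable with 1 numerator degree of freedom behaves like the square of a standard normal variable. The function name `f_threshold` is hypothetical.

```python
from statistics import NormalDist


def f_threshold(alpha: float) -> float:
    """Approximate upper-alpha critical value of F(1, infinity).

    Assumption: with 1 numerator degree of freedom and a very large
    denominator, F is distributed as the square of a standard normal
    variable, so the critical value is the squared two-sided normal
    quantile.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


# Thresholds for the two tail probabilities mentioned in the text.
for alpha in (0.05, 0.10):
    print(alpha, round(f_threshold(alpha), 4))
```

Because the sketch uses the infinite-degrees-of-freedom limit rather than 100,000, its results differ from the program defaults of 3.8416 and 2.7056 only in about the fourth decimal place.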

F-to-Enter: The F-to-enter statistic of an independent variable is the F-statistic for testing the significance of the regression coefficient it would have if it were in the regression equation. If this calculated value is above the one specified by the user, then the variable can enter the equation. The default value is 3.8416, corresponding to a tail probability value of 0.05 (with 1 and 100,000 degrees of freedom) and it must always be greater than the F-to-remove value. If you wish to change this default value permanently, enter and edit the following line in the [Options] section of Documents\Unistat60\Unistat60.ini file:

   StepwiseFtoEnter=3.8416

F-to-Remove: The F-to-remove statistic of an independent variable which is already in the regression equation is the F-statistic for testing the significance of its regression coefficient. If this calculated value is below the one specified by the user then the variable is removed from the equation. The default value is 2.7056, corresponding to a tail probability value of 0.10 (with 1 and 100,000 degrees of freedom) and it must always be less than the F-to-enter value. If you wish to change this default value permanently, enter and edit the following line in the [Options] section of Documents\Unistat60\Unistat60.ini file:

   StepwiseFtoRemove=2.7056

Tolerance: In order to avoid highly correlated variables and also to prevent accumulation of rounding errors, a Tolerance value is specified. The Tolerance of a variable which is not in the equation is defined as 1 - R-squared, where R is the multiple correlation between that variable and all variables which are already in the regression equation. The default value is 0.001. If you wish to change this default value permanently, enter and edit the following line in the [Options] section of Unistat60.ini:

   StepwiseTolerance=0.001
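To illustrate the Tolerance definition, here is a small sketch for the simplest case, where only one variable is already in the equation, so that the multiple correlation R reduces to the ordinary Pearson correlation. The data and the function names `pearson_r` and `tolerance_single` are hypothetical and are not part of UNISTAT.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tolerance_single(candidate, in_equation):
    """Tolerance = 1 - R-squared, for a single variable in the equation."""
    r = pearson_r(candidate, in_equation)
    return 1.0 - r * r


# Hypothetical data: one variable in the equation and two candidates.
in_equation = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
near_copy = [2.0, 4.0, 6.0, 8.0, 10.0, 12.01]   # almost exactly 2 * in_equation
other = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

for name, column in (("near_copy", near_copy), ("other", other)):
    tol = tolerance_single(column, in_equation)
    print(name, tol, "passes" if tol > 0.001 else "fails", "the 0.001 check")
```

With more than one variable in the equation, R-squared would instead come from regressing the candidate on all of them.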

Forward/Backward Selection: If the Forward Selection method is employed, then the program will first run a regression with the most likely candidate, and then successively introduce other variables or omit existing ones. If the Backward Selection method is selected, then the program starts with the full regression, with all variables that pass the Tolerance check entered, and displays its output at the beginning.

It is important to emphasise that neither the F-to-enter and F-to-remove values nor the Tolerance of a variable (whether it is in the equation or not) remain the same when a variable is added to or removed from the regression equation. Therefore, whenever an addition or omission takes place, all variables, regardless of being in the equation or not, are made subject to the above checks. When every independent variable has been tried for entry or removal and none can be entered or removed, the selection process is terminated.
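The selection loop described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not UNISTAT's actual code: F-to-enter and F-to-remove are computed from changes in the residual sum of squares, removal is checked before entry at each pass, and the helper names `rss` and `stepwise` and the demonstration data are hypothetical. It assumes noisy data (no exact fit) and non-constant columns.

```python
def rss(y, xs):
    """Residual sum of squares from least squares of y on columns xs plus a constant."""
    n = len(y)
    cols = [[1.0] * n] + [list(map(float, x)) for x in xs]
    k = len(cols)
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols]
         + [sum(ci[t] * y[t] for t in range(n))] for ci in cols]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    b = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b[j] * cols[j][t] for j in range(k)) for t in range(n)]
    return sum((y[t] - fitted[t]) ** 2 for t in range(n))


def stepwise(y, data, backward=False, f_enter=3.8416, f_remove=2.7056, tol=0.001):
    """Return the names of the selected variables; data maps name -> column."""
    n = len(y)

    def cols(names):
        return [data[v] for v in names]

    def tolerance(v, inside):
        # 1 - R-squared of v regressed on the variables already in the equation.
        return rss(data[v], cols(inside)) / rss(data[v], [])

    inside = []
    if backward:
        # Start from the full regression: enter everything passing Tolerance.
        for v in data:
            if tolerance(v, inside) > tol:
                inside.append(v)
    while True:
        base = rss(y, cols(inside))
        if inside:
            # Removal check: drop the variable with the smallest F-to-remove.
            ms_err = base / (n - len(inside) - 1)
            f_out = {v: (rss(y, cols([u for u in inside if u != v])) - base) / ms_err
                     for v in inside}
            worst = min(f_out, key=f_out.get)
            if f_out[worst] < f_remove:
                inside.remove(worst)
                continue
        # Entry check: add the variable with the largest F-to-enter.
        f_in = {}
        for v in data:
            if v not in inside and tolerance(v, inside) > tol:
                new = rss(y, cols(inside + [v]))
                f_in[v] = (base - new) / (new / (n - len(inside) - 2))
        if f_in:
            best = max(f_in, key=f_in.get)
            if f_in[best] > f_enter:
                inside.append(best)
                continue
        return inside  # nothing can be entered or removed


# Hypothetical demonstration data: y depends on x1 and x2 only.
data = {
    "x1": [1, 1, 1, 1, -1, -1, -1, -1],
    "x2": [1, 1, -1, -1, 1, 1, -1, -1],
    "x3": [1, -1, 1, -1, 1, -1, 1, -1],
    "x1_copy": [1, 1, 1, 1, -1, -1, -1, -1],  # duplicate of x1, fails Tolerance
}
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
y = [2 * a - 3 * b + e for a, b, e in zip(data["x1"], data["x2"], noise)]

print("forward: ", stepwise(y, data))
print("backward:", stepwise(y, data, backward=True))
```

Every pass recomputes the statistics for all variables, which mirrors the point that F-to-enter, F-to-remove and Tolerance change whenever the equation changes. Keeping the F-to-enter threshold above the F-to-remove threshold is what stops a variable from being entered and removed in an endless cycle.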

7.2.3.2. Stepwise Regression Output Options


The full output can be substantial, as a large amount of information is displayed at each step: the standard error, multiple correlation, R-squared, adjusted R-squared, change in R-squared and an Analysis of Variance table. The regression coefficient, its standard error, t‑statistic, its tail probability and the calculated F-to-remove value are displayed for each independent variable in the equation. Partial correlation, Tolerance and F-to-enter values of variables which are not in the equation are also displayed.

At the end of the selection process, a summary table gives the multiple correlation, R-squared and F-statistic for each step.

7.2.3.3. Stepwise Regression Example

Example 20.1e on p. 436 from Zar, J. H. (2010).

Open REGRESS, select Statistics 1 → Regression Analysis → Stepwise Regression and select temperature, cm, mm and min (C1 to C4) as [Variable]s and ml (C5) as [Dependent]. Select Backward Selection and accept the default threshold values given in the next dialogue to obtain the following output:

Stepwise Regression

Backward Selection

Valid Number of Cases: 33, 0 Omitted

Dependent Variable: ml

Tolerance: 0.001

F-to-Enter: 3.8416 (5.0%)

F-to-Remove: 2.7056 (10.0%)

 

All uncorrelated variables entered

Standard Error  Multiple Correlation  R-squared  Adjusted R-squared  Change in R-squared
        0.4238                0.8117     0.6589              0.6102               0.6589

Due To      Sum of Squares  DoF  Mean Square  F-Stat    Prob
Regression           9.717    4        2.429  13.524  0.0000
Error                5.030   28        0.180

Variables in Equation  Coefficient  Std Error  t-Statistic    Prob  F-to-Remove
Constant                    2.9583
Temperature                -0.1293     0.0213      -6.0751  0.0000      36.9063
cm                         -0.0188     0.0563      -0.3338  0.7410       0.1114
mm                         -0.0462     0.2073      -0.2230  0.8252       0.0497
min                         0.2088     0.0670       3.1141  0.0042       9.6979

Step 1: Variable Removed: mm

Standard Error  Multiple Correlation  R-squared  Adjusted R-squared  Change in R-squared
        0.4168                0.8114     0.6583              0.6230              -0.0006

Due To      Sum of Squares  DoF  Mean Square  F-Stat    Prob
Regression           9.708    3        3.236  18.625  0.0000
Error                5.039   29        0.174

Variables in Equation  Coefficient  Std Error  t-Statistic    Prob  F-to-Remove
Constant                    2.6725
Temperature                -0.1305     0.0203      -6.4232  0.0000      41.2572
cm                         -0.0154     0.0533      -0.2892  0.7745       0.0837
min                         0.2045     0.0632       3.2356  0.0030      10.4694

Variables not in Equation  Partial Corr  Tolerance  F-to-Enter
mm                              -0.0421     0.8518      0.0497

Step 2: Variable Removed: cm

Standard Error  Multiple Correlation  R-squared  Adjusted R-squared  Change in R-squared
        0.4104                0.8108     0.6573              0.6345              -0.0010

Due To      Sum of Squares  DoF  Mean Square  F-Stat    Prob
Regression           9.694    2        4.847  28.775  0.0000
Error                5.053   30        0.168

Variables in Equation  Coefficient  Std Error  t-Statistic    Prob  F-to-Remove
Constant                    2.5520
Temperature                -0.1324     0.0189      -6.9993  0.0000      48.9907
min                         0.2013     0.0613       3.2850  0.0026      10.7910

Variables not in Equation  Partial Corr  Tolerance  F-to-Enter
mm                              -0.0261     0.9176      0.0198
cm                              -0.0536     0.8652      0.0837

Summary Table

Dependent Variable: ml

Step  In/Out  Variable  Multiple Corr  R-squared   F-Stat    Prob
   1  Out     mm               0.8114     0.6583  18.6251  0.0000
   2  Out     cm               0.8108     0.6573  28.7748  0.0000
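Several figures in this output can be cross-checked against one another with standard regression identities. The sketch below uses only numbers copied from the output above and assumes the usual relationships hold: with one degree of freedom, F-to-remove is the square of the t-statistic, and F-to-enter follows from the partial correlation and the error degrees of freedom the model would have after entry. Small discrepancies are expected because the printed values are rounded.

```python
# Figures copied from the "All uncorrelated variables entered" block.
n, k = 33, 4                      # valid cases, variables in the equation
r2 = 0.6589
ss_reg, ss_err = 9.717, 5.030
t_stats = {"Temperature": -6.0751, "cm": -0.3338, "mm": -0.2230, "min": 3.1141}
f_remove_threshold = 2.7056

# Adjusted R-squared and the overall F-statistic.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_stat = (ss_reg / k) / (ss_err / (n - k - 1))

# F-to-remove of each variable is the square of its t-statistic.
f_to_remove = {v: t * t for v, t in t_stats.items()}

# Backward Selection removes the variable with the smallest F-to-remove,
# provided it is below the threshold.
first_out = min(f_to_remove, key=f_to_remove.get)
removable = f_to_remove[first_out] < f_remove_threshold

# Step 1: F-to-enter of mm from its partial correlation. If mm re-entered,
# the equation would again hold 4 variables, leaving n - 4 - 1 error DoF.
partial_r = -0.0421
f_to_enter_mm = partial_r ** 2 / (1 - partial_r ** 2) * (n - 4 - 1)

print("Adjusted R-squared:", round(adj_r2, 4))
print("F-Stat:", round(f_stat, 3))
print("First variable out:", first_out, "removable:", removable)
print("F-to-enter (mm, Step 1):", round(f_to_enter_mm, 4))
```

The check reproduces the order of removal: mm has the smallest F-to-remove in the full regression and is removed at Step 1.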