UNISTAT - the ultimate Excel statistics add-in

7.2.1. Linear Regression

The Linear Regression procedure is suitable for estimating weighted or nonweighted linear regression models with or without a constant term, including nonlinear models such as multiplicative, exponential or reciprocal regressions that can be linearised by logarithmic or exponential transformations. It is possible to run regressions without an independent variable, which is equivalent to running a noconstant regression against a unity independent variable.

Regressions may be run on a subset of cases as determined by the combination of levels of an unlimited number of factor columns. An unlimited number of dependent variables can be selected in order to run the same model on each of them. Interaction terms, dummy variables and lag/lead variables can be included in the model without having to create them as data columns in the spreadsheet first.

In regression plots, outliers can be omitted interactively by pressing the right mouse button on a data point and then pressing <Delete>.

Predictions (Interpolations): UNISTAT provides the user with three different methods of making predictions for a Linear Regression model, each of which has its advantages depending on the purpose of the exercise:

1)       In Stand-Alone Mode, the spreadsheet function Reg (see 3.4.2.6.3. UNISTAT Functions).

2)       Actual and Fitted Values (see 7.2.1.2.1. Linear Regression Main Output Options),

3)       Confidence Intervals for Mean and Actual Y-values (see 7.2.1.2.1. Linear Regression Main Output Options),

      The first option is explained in detail in its own section. However, it is important here to explain how the last two options work. There are three conditions under which predictions will be computed for estimated Y values and their confidence intervals:

1)       If, for a case, all independent variables are non-missing, but only the dependent variable is missing,

2)       If a case does not contain missing values but it has been omitted from the analysis by Data Processor’s DataSelect Row function,

3)       If a case does not contain missing values but it has been omitted from the analysis by selecting subsamples from the Variable Selection Dialogue (see 2.1.2. Categorical Data Analysis).

      Such cases are not used in estimating the regression coefficients. When, however, one of the Actual and Fitted Values or Confidence Intervals for Mean and Actual Y-values options is selected, the program will detect these cases and compute and display the fitted (estimated) Y values and their confidence intervals. It is therefore sufficient to include the cases for which predictions are to be made in the data matrix during the data preparation phase, and then exclude them from the analysis by one of the above three methods.
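      The idea of the first condition (dependent variable missing, independent variables present) can be sketched as follows. This is an illustration only, not UNISTAT code: the data, the helper name fit_simple_ols and the use of None for a missing value are assumptions, and a single independent variable is used to keep the arithmetic short.

```python
# Sketch: estimate on complete cases only, then predict rows whose Y is missing.
# Data values and helper names are hypothetical.

def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# (X, Y) rows; None marks a missing dependent value.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, None), (6.0, None)]

complete = [(x, y) for x, y in rows if y is not None]
intercept, slope = fit_simple_ols([x for x, _ in complete],
                                  [y for _, y in complete])

# Fitted values for every row; predicted rows are flagged with an asterisk.
for x, y in rows:
    marker = "*" if y is None else " "
    print(f"{x:4.1f} {intercept + slope * x:8.4f} {marker}")

predicted = [intercept + slope * x for x, y in rows if y is None]
```

      The rows with a missing Y do not influence the coefficients, yet they receive a fitted value, which mirrors the behaviour described above.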

Multicollinearity: One of the basic assumptions of the method of least squares estimation is that no linear dependency exists between the regression variables. However, the present implementation will detect the variables causing multicollinearity and display them at the end of the coefficients table. If you do not wish to display these variables, enter the following line in the [Options] section of the Documents\Unistat60\Unistat60.ini file:

DispCollin=0

      The rest of the coefficients will be determined as if the regression were run without the variables causing collinearity.

Perfect Fit: If the current set of independent variables fully explains the variation of the dependent variable, the program displays a restricted number of results options, the main results themselves being confined to the estimated regression coefficients only.

      Perfect fit will be reported under the following two circumstances:

1)   if R-squared > 0.99999 or

2)   if the number of variables (including the dependent variable) is equal to the number of observations, in which case R-squared is not computed.

7.2.1.1. Linear Regression Variable Selection

All columns selected for this procedure should have an equal size. The buttons on the Variable Selection Dialogue have the following tasks:

Variable: Click on [Variable] to select a column containing continuous numeric data as an independent variable.

Interaction: This button is used to create independent variables, which are the products of existing numeric variables. If only one variable is highlighted, then the new independent variable created will be the product of the selected variable by itself. If two or more variables are highlighted, then the new term will be the product of these variables. Maximum three-way interactions are allowed. Interactions of dummy variables or lags are not allowed. In order to create interaction terms for dummy variables, create interactions first, and then create dummy variables for them. For further information see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables.

Dummy: This button is used to create n or n ‑ 1 new independent (dummy or indicator) variables for a factor column containing n levels. Each dummy variable corresponds to a level of the factor column. A case in a dummy column will have the value of 1 if the factor contains the corresponding level in the same row, and 0 otherwise. If the selected variable is an interaction term, then dummy variables will be created for this interaction term. Up to three-way interactions are allowed and columns containing short or Long Strings can be selected as factors. It is possible to include all n levels or to omit the first or the last level in order to remove the inherent over-parameterisation of the model. For further information see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables.
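      As a rough illustration of the coding described above (not UNISTAT's own implementation), the sketch below builds 0/1 dummy columns from a factor. The function name make_dummies, the omit argument and the alphabetical ordering of levels are assumptions made for the example.

```python
# Sketch: n or n - 1 dummy (indicator) columns for a factor with n levels.
# Level ordering (sorted) is an assumption; UNISTAT's ordering may differ.

def make_dummies(factor, omit=None):
    """omit=None keeps all n levels; 'first' or 'last' drops one level."""
    levels = sorted(set(factor))
    if omit == "first":
        levels = levels[1:]
    elif omit == "last":
        levels = levels[:-1]
    # One column per remaining level: 1 where the row has that level, else 0.
    return {level: [1 if value == level else 0 for value in factor]
            for level in levels}

region = ["north", "south", "east", "south", "north"]  # hypothetical factor

all_levels = make_dummies(region)             # 3 columns (over-parameterised)
reduced = make_dummies(region, omit="first")  # 2 columns, 'east' omitted
```

      With all n columns plus a constant term the columns sum to the column of 1s, which is the over-parameterisation that omitting one level removes.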

Full: This button becomes activated when two or more categorical variables are highlighted. Like the [Dummy] button, it is also used to create dummy variables. The only difference is that this button will create all necessary dummy variables and their interactions to specify a complete model. For instance, if two categorical variables are highlighted, this button will create two sets of dummy variables representing the main effects and a third set representing the interaction term between the two factors. Maximum three-way interactions are supported.

Lag/Lead: This button is used to create new independent variables by shifting the rows of an existing variable up or down. When a lag variable is specified in the Variable Selection Dialogue, then a further dialogue will ask for the number of lags (or leads) for each item selected. Negative integers represent the lags and positive integers the leads. For further information see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables.
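      The sign convention for lags and leads can be illustrated with a small sketch. The function name shift and the use of None for rows that fall outside the column are assumptions for the example, not a description of how UNISTAT stores the shifted column.

```python
# Sketch: negative k gives a lag of |k| rows, positive k a lead of k rows.

def shift(column, k):
    """Row i of the result holds row i + k of the original column."""
    n = len(column)
    return [column[i + k] if 0 <= i + k < n else None for i in range(n)]

series = [10, 20, 30, 40]   # hypothetical data column

lag_1 = shift(series, -1)   # each row sees the previous row's value
lead_1 = shift(series, 1)   # each row sees the next row's value
```

      Rows that end up with None cannot enter the estimation, so each lag or lead reduces the number of valid cases.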

Dependent: It is compulsory to select at least one column containing numeric data as a dependent variable. When more than one dependent variable is selected, the analysis will be repeated as many times as the number of dependent variables, each time only changing the dependent variable and keeping the rest of selections unchanged.

Factor: This allows you to run regressions on subsamples of rows (cases). With some time series or panel data it is desirable to run regressions on some, rather than all rows of the data matrix. Although it is possible to extract subsets using a Data Processor function such as If() (see 3.4.2.7. Conditional Functions), DataRecode Column, Subsample Save (see 2.4.1.6.3. Options) or DataSelect Row, it is much more convenient to use the selection facility provided here.

      To make use of this facility the data matrix should contain at least one factor column. An unlimited number of factors can be selected. These can be numeric or String Data columns, but each column must contain a limited number of distinct values. Select the factor columns from the Variables Available list by clicking on [Factor]. The program will then display a dialogue where all possible combinations of factor levels are displayed in a list of check boxes. For instance, if one factor containing three levels is selected, only three check boxes will be displayed, one for each level. Only the rows of the data matrix corresponding to the checked boxes will be included in the analysis. If two factors are selected, say one having two levels and the other three, then the list will contain six check boxes, 1x1, 1x2, 1x3, 2x1, 2x2, 2x3. Suppose the check boxes 1x2 and 2x2 are checked. Then only those rows of the data matrix containing 1 in the first factor column and 2 in the second, or 2 in the first and 2 in the second, will be included in the analysis. When more than one selection is made, it is possible to run a single Regression Analysis on all selected rows combined, or to run a separate analysis for each selection.

      With subsample selection, the possibility of getting an Insufficient degrees of freedom message will increase considerably. Also, interpretation of the Durbin-Watson statistic may not be obvious.

Weight: It was mentioned at the beginning of this chapter that when a column is selected as a weights variable, the program will normalise this column so that its sum is equal to the number of valid cases, and then multiply each independent variable by its square root. In a weighted regression run with constant term included, the column of 1s in the X matrix should also be multiplied by the square root of weights, as the Regression Analysis considers the constant term just like any other coefficient. The algorithm used here produces exactly the same effect without having to take the square root of weights, thus achieving higher accuracy.
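      The square-root transformation mentioned above can be sketched for a single independent variable. This is the textbook formulation, in which the dependent variable is scaled by the square root of the weights as well; it is not the algorithm UNISTAT uses (which, as noted, avoids taking square roots). The function name weighted_line and the data are assumptions.

```python
import math

# Sketch: weighted least squares via the square-root-of-weights transformation.

def weighted_line(xs, ys, weights):
    n = len(xs)
    total = sum(weights)
    w = [wi * n / total for wi in weights]   # normalised so that sum(w) == n
    root = [math.sqrt(wi) for wi in w]
    ones = root[:]                           # column of 1s times sqrt(w)
    xt = [r * x for r, x in zip(root, xs)]
    yt = [r * y for r, y in zip(root, ys)]
    # Normal equations of the no-constant regression of yt on (ones, xt).
    a11 = sum(o * o for o in ones)
    a12 = sum(o * x for o, x in zip(ones, xt))
    a22 = sum(x * x for x in xt)
    b1 = sum(o * y for o, y in zip(ones, yt))
    b2 = sum(x * y for x, y in zip(xt, yt))
    det = a11 * a22 - a12 * a12
    intercept = (b1 * a22 - a12 * b2) / det
    slope = (a11 * b2 - a12 * b1) / det
    return intercept, slope, w

xs = [1.0, 2.0, 3.0, 4.0]        # hypothetical data
ys = [3.1, 4.9, 7.2, 8.8]
weights = [1.0, 1.0, 2.0, 4.0]
intercept, slope, w = weighted_line(xs, ys, weights)
```

      Because the constant is treated like any other coefficient, its column of 1s is transformed together with the other columns.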

7.2.1.2. Linear Regression Output Options

When the calculations are finished, an Output Options Dialogue will provide access to the following options.

7.2.1.2.1. Linear Regression Main Output Options

Regression Results: The main regression output displays a table for coefficients of the estimated regression equation, their standard errors, t-statistics, probability values (from the t-distribution) and confidence intervals for the significance level specified in the regression Variable Selection Dialogue. If any independent variables have been omitted due to multicollinearity, they are reported at the end of the table. If you do not wish to display these variables enter the following line in the [Options] section of Documents\Unistat60\Unistat60.ini file:

DispCollin=0

      When the model contains dummy variables with very long string values, output may look cramped. You can display the numbers of levels, instead of their string values, by entering the following line in the [Options] section of Unistat60.ini:

OLSFullLabel=0

      In Stand-Alone Mode, the estimated regression coefficients can be saved to data matrix by clicking on the UNISTAT icon (which becomes visible on the right of the toolbar in Output Window and Data Processor after running a procedure). The same coefficients will also be saved automatically in the file POLYCOEF.TXT, in the order of C0 (constant term, if any), C1, C2, ..., Cr.

      The rest of the output consists of the following statistics: residual sum of squares, standard error of regression, mean and standard deviation of the dependent variable, R-squared, R-squared adjusted for the degrees of freedom, F-statistic and its tail probability, the Durbin-Watson statistic, press statistic and the log of likelihood function. The Durbin-Watson statistic will be adjusted for the number of gaps in data caused by missing values. The number of rows omitted due to missing values is reported if it is other than zero.
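      Two of these statistics can be checked by hand from the case-level output. The sketch below uses the residuals and press residuals printed for Example 1 in section 7.2.1.3; it applies the standard definitions (sum of squared successive differences over the residual sum of squares, and sum of squared press residuals) and does not attempt the adjustment for gaps caused by missing values.

```python
# Sketch: Durbin-Watson and Press statistics from Example 1 case output.

residuals = [0.4325, 0.1601, -1.0893, -1.3562, -2.6376, 4.4905]
press_residuals = [15.7637, 0.3924, -10.1667, -15.9868, -4.1581, 6.0821]

# Residual sum of squares.
rss = sum(e * e for e in residuals)

# Durbin-Watson: squared differences of successive residuals over RSS.
durbin_watson = sum((b - a) ** 2
                    for a, b in zip(residuals, residuals[1:])) / rss

# Press statistic: sum of squared press (deleted) residuals.
press = sum(p * p for p in press_residuals)

print(f"RSS = {rss:.4f}, DW = {durbin_watson:.4f}, Press = {press:.4f}")
```

      Up to rounding of the four-decimal inputs, the results agree with the Residual Sum of Squares, Durbin-Watson Statistic and Press Statistic reported in Example 1.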

ANOVA of Regression: The total variation of the dependent variable is partitioned into the Regression (or explained) part which is due to the linear influence of independent variables and the Error (or unexplained) part which is expressed in residuals.

      The F‑value is the ratio of the mean squares for regression and mean squares for the error term. The null hypothesis of “no relationship between the dependent variable and independent variables as a whole” can be tested by means of the probability value reported.
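      As a worked check of these ratios, the sketch below recomputes R-squared, adjusted R-squared and the F-value from the sums of squares reported in the ANOVA table of Example 1 (section 7.2.1.3). The variable names are chosen for the example; k counts the constant term.

```python
# Sketch: summary statistics from the ANOVA sums of squares of Example 1.

n, k = 6, 4                          # cases, coefficients (incl. constant)
ss_regression, ss_error = 71.640, 30.360
ss_total = ss_regression + ss_error

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k)

# F = mean square for regression / mean square for error.
f_stat = (ss_regression / (k - 1)) / (ss_error / (n - k))

print(f"R2 = {r_squared:.4f}, adj R2 = {adjusted_r_squared:.4f}, "
      f"F({k - 1},{n - k}) = {f_stat:.4f}")
```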

      Individual contributions of independent variables to the regression (explained) sum of squares are also displayed. The sum of individual contributions will be equal to the regression sum of squares. However, it is important here to emphasise that these individual contributions are specific to the order in which independent variables enter into the regression equation. Normally, when two independent variables change place, their individual contributions to the regression sum of squares will also change.

Correlation Matrix of Regression Coefficients: This is a symmetric matrix with unity diagonal elements. It gives the correlation between the regression coefficients and is obtained by dividing each element of the (X’X)⁻¹ matrix by the square roots of the diagonal elements corresponding to its row and column.

Covariance Matrix of Regression Coefficients: This option displays a symmetric matrix where diagonal elements are the variances and off-diagonal elements are the covariances of the estimated regression coefficients. This matrix is sometimes referred to as the dispersion matrix and it can also be obtained by multiplying the (X’X)⁻¹ matrix by the estimated variance of the error terms.
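      The link between the two matrices can be illustrated with the covariance matrix printed for Example 1 (section 7.2.1.3). Since the covariance matrix is (X’X)⁻¹ scaled by a constant, dividing each covariance by the two corresponding standard errors gives the same correlations. The sketch is illustrative and the variable names are assumptions.

```python
import math

# Sketch: standard errors and correlation matrix from the covariance matrix
# of regression coefficients reported in Example 1.

names = ["Constant", "Motiv", "Qual", "Grade"]
cov = [
    [82.1859, -6.7083,  0.6870, -1.1338],
    [-6.7083,  0.7606, -0.0908, -0.0649],
    [ 0.6870, -0.0908,  0.3471, -0.2838],
    [-1.1338, -0.0649, -0.2838,  0.4176],
]

# Square roots of the diagonal are the coefficient standard errors.
std_errors = [math.sqrt(cov[i][i]) for i in range(len(names))]

# Divide each element by the standard errors of its row and column.
corr = [[cov[i][j] / (std_errors[i] * std_errors[j])
         for j in range(len(names))]
        for i in range(len(names))]

for name, row in zip(names, corr):
    print(f"{name:<8}", "  ".join(f"{value:7.4f}" for value in row))
```

      The square roots of the diagonal reproduce the Standard Error column of the coefficients table, and the off-diagonal ratios reproduce the correlation matrix, up to rounding.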

Actual and Fitted Values: A character plot of estimated and observed values is generated for the dependent variable. The estimated values are displayed and, like residuals, they can be added to the data matrix for further analysis. Among other purposes, these values can be used in estimating two or three stage least squares models and analysis of instrumental variables in general. This option can also be used to make predictions for the dependent variable (see 7.2.1. Linear Regression). When a fitted Y value is predicted, it will be marked by an asterisk (*).

Residuals: This option will display the difference between observed and estimated values of the dependent variable for each observation in the form of a character plot. Scaling will be made according to the largest deviation and all residuals will be plotted between minus and plus the maximum deviation. Alongside the plot, values of residuals will also be displayed.

Confidence Intervals for Mean and Actual Y-values: The two confidence intervals are computed as follows:

      1) Confidence interval for the mean of Y:

            $X\hat{\beta} \pm t_{\alpha/2}\, S \sqrt{X (X'X)^{-1} X'}$

      2) Confidence interval for actual values of Y:

            $X\hat{\beta} \pm t_{\alpha/2}\, S \sqrt{1 + X (X'X)^{-1} X'}$

      where X is the given vector of independent variable values, $\hat{\beta}$ is the vector of estimated coefficients, $t_{\alpha/2}$ is the critical value from the t-distribution for an α / 2 level of significance and n - k degrees of freedom and S is the standard error of the regression. Any significance level can be entered.

      This option can also be used to make predictions for the dependent variable and their confidence intervals (see 7.2.1. Linear Regression). When a case is predicted, it will be marked by an asterisk (*).

      The confidence intervals can also be plotted for bivariate regressions as an option on X-Y line plots (see 4.1.1.1.1. Line).
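      As a numerical check, the sketch below reproduces the two intervals for the first case of Example 1 (section 7.2.1.3) from quantities printed in the output. It assumes that the half-width for the mean is the critical t value times the Standard Error of Fitted, and that the half-width for an actual value additionally includes the regression Standard Error; the critical value 4.3027 (two-sided 95%, 2 degrees of freedom) is entered by hand rather than computed.

```python
import math

# Sketch: 95% intervals for case 1 of Example 1, using reported output values.

t_crit = 4.3027        # t critical value, alpha/2 = 0.025, n - k = 2 (assumed)
s = 3.8961             # Standard Error from Regression Results
fitted = 17.5675       # Fitted Y for case 1
se_fitted = 3.8423     # Standard Error of Fitted for case 1

half_mean = t_crit * se_fitted
half_actual = t_crit * math.sqrt(s ** 2 + se_fitted ** 2)

mean_interval = (fitted - half_mean, fitted + half_mean)
actual_interval = (fitted - half_actual, fitted + half_actual)

print("mean of Y:", [round(v, 4) for v in mean_interval])
print("actual Y :", [round(v, 4) for v in actual_interval])
```

      Up to rounding, the results match the lb and ub columns of the first row in the 95% Confidence Intervals table of Example 1.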

7.2.1.2.2. Linear Regression Further Output Options

Case (Diagnostic) Statistics: These statistics are useful in determining the influence of individual observations on the overall fit of the model. Looking at outliers (cases with a large residual value) is an effective way of determining whether the model fitted explains well the variation in data. However, residuals alone cannot explain all types of influence of individual cases on the regression. Suppose, for instance, a data set where most observations are clustered together but only one point lies outside the cluster. Suppose also that the regression line passes near this point so that it does not have a large residual. Nevertheless, removing this single point from the regression may have substantial effects on the estimated coefficients (called leverage).

      Most of the regression diagnostic statistics below measure such effects by answering the question of what would happen if a case were removed from the regression. Fortunately, there is no need to re-estimate the entire model after deleting each case; the same results can be computed by applying the following algebraic manipulations.

      Let:

      n = valid number of cases and

      m = number of coefficients in the model.

      Therefore:

      m = 1 + number of independent variables (with constant) and

      m = number of independent variables (with no constant).

      Also, each row of the data matrix is defined as:

      $X_i = (1, X_{1i}, \ldots, X_{m-1,i})$ for regressions with constant and

      $X_i = (X_{1i}, \ldots, X_{mi})$ for regressions without constant.

      Central to most diagnostic statistics are the definitions of the root mean square of residuals:

      $s = \sqrt{\frac{1}{n - m} \sum_{i=1}^{n} e_i^2}$

and the diagonal vector of the projection matrix:

      $h_i = X_i (X'X)^{-1} X_i'$

      Statistics available under this option can then be defined as follows.

      Fitted Values:

           $\hat{Y}_i = X_i \hat{\beta}$

      Standard Error of Fitted:

           $s \sqrt{h_i}$

      Standardised Fitted:

           $(\hat{Y}_i - \bar{\hat{Y}}) / s_{\hat{Y}}$, where $\bar{\hat{Y}}$ and $s_{\hat{Y}}$ are the mean and the sample standard deviation of the fitted values

      Adjusted Fitted:

           $Y_i - \text{Press Residual}_i$

      Residuals:

           $e_i = Y_i - \hat{Y}_i$

      Standard Error of Residuals:

           $s \sqrt{1 - h_i}$

      Standardised Residuals:

           $e_i / s$

      Studentised (Jackknife) Residuals:

           $e_i / (s \sqrt{1 - h_i})$

      Press (Deleted) Residuals:

           $e_i / (1 - h_i)$

      A press (deleted) residual is the residual of a case when the model is estimated without that case. It can be computed from the above formula, without having to run n regressions.

      Studentised Press Residuals:

           $t_i^* = e_i / (s_{(i)} \sqrt{1 - h_i})$

where:

      $s_{(i)} = \sqrt{\frac{(n - m)\, s^2 - e_i^2 / (1 - h_i)}{n - m - 1}}$

      Leverage:

           $h_i$ for regression without a constant term and

           $h_i - 1/n$ for regression with a constant term.

      Cook’s Distance:

           $\frac{e_i^2\, h_i}{m\, s^2 (1 - h_i)^2}$

      Mahalanobis Distance:

           (formula not available) for regression without a constant term and

           $(n - 1)(h_i - 1/n)$ for regression with a constant term.

      Welsch Distance:

           $t_i^* \sqrt{\frac{h_i}{1 - h_i}} \sqrt{\frac{n - 1}{1 - h_i}}$

      Covratio:

           $\left[ (1 - h_i) \left( \frac{n - m - 1 + t_i^{*2}}{n - m} \right)^{m} \right]^{-1}$

      DfFit:

           $\frac{h_i\, e_i}{1 - h_i}$

      Standardised DfFit:

           $t_i^* \sqrt{\frac{h_i}{1 - h_i}}$

      Delta-Betaj:

           $\Delta\beta_{j(i)} = \left[ (X'X)^{-1} X_i' \right]_j \frac{e_i}{1 - h_i}$

      Delta-beta is defined as the change in an estimated coefficient when a case is omitted from the analysis. As with press residuals, it can be computed from the above formula, without having to run n regressions.

      Standardised Delta-Beta:

           $\frac{\Delta\beta_{j(i)}}{s_{(i)} \sqrt{\left[ (X'X)^{-1} \right]_{jj}}}$
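      The claim that deletion statistics need no refitting can be verified on a small data set. The sketch below is an illustration only (not UNISTAT code), restricted to one independent variable plus a constant so that the diagonal of the projection matrix has the closed form 1/n + (x - mean)² / Sxx. It computes press residuals, Cook’s distance and DfFit from a single fit and then compares the press residuals with a brute-force leave-one-out loop. The data values and names are assumptions.

```python
# Sketch: case diagnostics from one fit versus explicit leave-one-out refits.

def fit(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]   # last point lies far from the cluster
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 9.0]
n, m = len(xs), 2                      # cases, coefficients (incl. constant)

b0, b1 = fit(xs, ys)
mx = sum(xs) / n
sxx = sum((x - mx) ** 2 for x in xs)

resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s2 = sum(e * e for e in resid) / (n - m)            # squared RMS of residuals
h = [1 / n + (x - mx) ** 2 / sxx for x in xs]       # projection diagonal
press = [e / (1 - hi) for e, hi in zip(resid, h)]
cook = [e * e * hi / (m * s2 * (1 - hi) ** 2) for e, hi in zip(resid, h)]
dffit = [hi * p for hi, p in zip(h, press)]

# Brute force: refit without case i and predict the omitted case.
loo_press = []
for i in range(n):
    a0, a1 = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    loo_press.append(ys[i] - (a0 + a1 * xs[i]))

max_gap = max(abs(p - q) for p, q in zip(press, loo_press))
print(f"largest difference between formula and refit: {max_gap:.2e}")
```

      The isolated last point has by far the largest projection diagonal, which is the situation described at the start of this section: a case can be influential without having a large residual.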

Plot of Actual and Fitted Values: Select this option to plot actual and fitted Y values and their confidence intervals against row numbers (index), residuals or against any independent variable. A further dialogue will enable you to choose the X-axis variable from a list containing Row Numbers, Residuals and all independent variables.

      By default, a line graph of the two series is plotted. However, since this procedure (like the plot of residuals) uses the X-Y Plots engine, it has almost all controls and options available for X-Y Plots, except for error bars and right Y-axes. This means that, as well as being able to edit all aspects of the graph, you can connect data points with lines, curves or display confidence intervals.

      The data points on the graph will also respond to the right mouse button in the way X-Y Plots does; the point is highlighted, a panel displays information about the point and in Stand-Alone Mode, the row of the spreadsheet containing the data point is also highlighted (a procedure which is also known as Brushing or Point identification). While the point is highlighted you can press <Delete> to omit the particular row containing the point. The entire Regression Analysis will be run again without the deleted row. If you want to restore the original regression, you will need to take one of the following two actions depending on the way you run UNISTAT:

1.       In Stand-Alone Mode, go back to Data Processor and delete or deactivate the Select Row column created by the program.

2.       In Excel Add-In Mode, highlight a different block of data to remove the effect of the internal Select Row column.

Plot of Residuals: Residuals can be plotted against row numbers (index), fitted values or against any independent variable. A further dialogue will enable you to choose the X-axis variable from a list containing Row Numbers, Fitted Values and all independent variables.

      By default a scatter graph of residuals is plotted. For more information on available options see Plot of Actual and Fitted Values above.

Normal Plot of Residuals: Residuals are plotted against the normal probability (probit) axis. For more information on available options see Plot of Actual and Fitted Values above (also see 5.3.4. Normal Probability Plot).

7.2.1.3. Linear Regression Examples

Example 1

Table 5.1 on p. 134 from Tabachnick, B. G. & L. S. Fidell (1989).

Open REGRESS, select Statistics 1 → Regression Analysis → Linear Regression and select Motiv, Qual and Grade (C6 to C8) as [Variable]s and Compr (C9) as [Dependent]. Select all output options to obtain the following results:

Linear Regression

Regression Results

Valid Number of Cases: 6, 0 Omitted

Dependent Variable: Compr

 

 

          Coefficient  Standard Error  t-Statistic  Probability  Lower 95%  Upper 95%
Constant      -4.7218          9.0656      -0.5208       0.6544   -43.7281    34.2845
Motiv          0.6583          0.8721       0.7548       0.5292    -3.0942     4.4107
Qual           0.2720          0.5891       0.4618       0.6896    -2.2627     2.8068
Grade          0.4160          0.6462       0.6438       0.5857    -2.3643     3.1964

Residual Sum of Squares  =   30.3599
Standard Error           =    3.8961
Mean of Y                =   10.0000
Stand Dev of y           =    4.5166
Correlation Coefficient  =    0.8381
R-squared                =    0.7024
Adjusted R-squared       =    0.2559
F(3,2)                   =    1.5731
Probability of F         =    0.4114
Durbin-Watson Statistic  =    1.7838
log of likelihood        =  -14.6736
Press Statistic          =  661.8681

 

ANOVA of Regression

Due To      Sum of Squares  DoF  Mean Square  F-Stat    Prob
Motiv               35.042    1       35.042   2.308  0.2680
Qual                30.306    1       30.306   1.996  0.2932
Grade                6.292    1        6.292   0.415  0.5857
Regression          71.640    3       23.880   1.573  0.4114
Error               30.360    2       15.180
Total              102.000    5       20.400   1.344  0.4787

 

Correlation Matrix of Regression Coefficients

 

          Constant     Motiv      Qual     Grade
Constant    1.0000   -0.8485    0.1286   -0.1935
Motiv      -0.8485    1.0000   -0.1768   -0.1151
Qual        0.1286   -0.1768    1.0000   -0.7455
Grade      -0.1935   -0.1151   -0.7455    1.0000

 

Covariance Matrix of Regression Coefficients

 

          Constant     Motiv      Qual     Grade
Constant   82.1859   -6.7083    0.6870   -1.1338
Motiv      -6.7083    0.7606   -0.0908   -0.0649
Qual        0.6870   -0.0908    0.3471   -0.2838
Grade      -1.1338   -0.0649   -0.2838    0.4176

 

Actual and Fitted Values

Row  Actual *  Fitted +  5.0000                                        18.0000
1     18.0000   17.5675                                                    + *
2      9.0000    8.8399                   +*
3      8.0000    9.0893                *    +
4      8.0000    9.3562                *     +
5      5.0000    7.6376  *           +
6     12.0000    7.5095             +                    *

 

Residuals

Row  Residuals  -4.4905                                               4.4905
1       0.4325                                  *
2       0.1601                                 *
3      -1.0893                        *
4      -1.3562                       *
5      -2.6376              *
6       4.4905                                                             *

 

95% Confidence Intervals for Mean and Actual Y Values

 

     lb Actual Y  lb Mean of Y  Fitted Y  ub Mean of Y  ub Actual Y
1        -5.9768        1.0353   17.5675       34.0996      41.1118
2       -12.3119       -4.0588    8.8399       21.7387      29.9918
3       -13.9745       -6.7510    9.0893       24.9295      32.1530
4       -13.8430       -6.6807    9.3562       25.3931      32.5555
5       -11.9529       -2.4997    7.6376       17.7748      27.2281
6       -11.3204       -1.0661    7.5095       16.0851      26.3394

 

Case (Diagnostic) Statistics

 

     Fitted Values  Standard Error of Fitted  Standardised Fitted  Adjusted Fitted  Residuals  Standard Error of Residuals
1          17.5675                    3.8423               1.9992           2.2363     0.4325                       0.6454
2           8.8399                    2.9979              -0.3065           8.6076     0.1601                       2.4885
3           9.0893                    3.6815              -0.2406          18.1667    -1.0893                       1.2753
4           9.3562                    3.7272              -0.1701          23.9868    -1.3562                       1.1348
5           7.6376                    2.3561              -0.6241           9.1581    -2.6376                       3.1031
6           7.5095                    1.9931              -0.6580           5.9179     4.4905                       3.3478

     Standardised Residuals  Studentised Residuals  Press (Deleted) Residuals  Studentised Press Residuals  Leverage  Cook’s Distance
1                    0.1110                 0.6702                    15.7637                       0.5382    0.8059           3.9802
2                    0.0411                 0.0643                     0.3924                       0.0455    0.4254           0.0015
3                   -0.2796                -0.8541                   -10.1667                      -0.7578    0.7262           1.5199
4                   -0.3481                -1.1951                   -15.9868                      -1.5807    0.7485           3.8520
5                   -0.6770                -0.8500                    -4.1581                      -0.7520    0.1990           0.1041
6                    1.1525                 1.3413                     6.0821                       2.9934    0.0950           0.1594

     Mahalanobis Distance  Welsch Distance  Covratio     DfFit  Standardised DfFit  Delta-Beta Constant
1                  4.0295          43.2522  210.8306   15.3312              3.2040             -20.9957
2                  2.1269           0.1920   38.8964    0.2323              0.0549               0.1779
3                  3.6310         -14.9437   24.3153   -9.0774             -2.1875             -13.0714
4                  3.7425         -39.8563    1.2590  -14.6305             -5.1916              13.6116
5                  0.9951          -1.6031    4.1991   -1.5205             -0.5710              -3.3462
6                  0.4751           4.6377    0.0022    1.5916              1.7821               3.8897

     Delta-Beta Motiv  Delta-Beta Qual  Delta-Beta Grade  Standardised Delta-Beta Constant  Standardised Delta-Beta Motiv  Standardised Delta-Beta Qual
1              1.0201           0.6412            0.5191                           -1.8597                         0.9392                        0.8739
2              0.0036           0.0319           -0.0421                            0.0139                         0.0029                        0.0384
3              1.6339           0.3631           -0.9077                           -1.2792                         1.6622                        0.5468
4             -2.1874           1.6812           -0.8213                            1.9858                        -3.3171                        3.7744
5              0.1201          -0.1731            0.2728                           -0.3266                         0.1218                       -0.2600
6             -0.1289          -0.1386           -0.0043                            0.9575                        -0.3298                       -0.5251

     Standardised Delta-Beta Grade
1                           0.6451
2                          -0.0461
3                          -1.2463
4                          -1.6809
5                           0.3735
6                          -0.0149

 

Example 2

Table 4.3.1 on p. 296 from Elliot, M. A., J. S. Reisch, N. P. Campbell (1989). This data set is known as Longley’s data and it is particularly sensitive to rounding-off errors and the regression algorithm used.

Open REGRESS, select Statistics 1 → Regression Analysis → Linear Regression and select GNP Deflator, GNP, Unemployment, Arm Forces Empl, Population and Time (C10 to C15) as [Variable]s and Total (C16) as [Dependent]. Select only the Regression Results output option to obtain the following:

Linear Regression

Regression Results

Valid Number of Cases: 16, 0 Omitted

Dependent Variable: Total

 

 

                      Coefficient    Standard Error       t-Statistic
Constant          -3482258.634596    890420.3836074   -3.910802918154
GNP Deflator       15.06187227137    84.91492577477     0.17737602823
GNP              -0.0358191792926   0.0334910077722   -1.069516317221
Unemployment      -2.020229803817    0.488399681652   -4.136427355941
Arm Forces Empl   -1.033226867174    0.214274163162   -4.821985310445
Population       -0.0511041056536    0.226073200069   -0.226051144664
Time               1829.151464614    455.4784991422     4.01588981271

                 Probability      Lower 95%      Upper 95%
Constant              0.0036  -5496534.8253  -1467982.4439
GNP Deflator          0.8631      -177.0295       207.1533
GNP                   0.3127        -0.1116         0.0399
Unemployment          0.0025        -3.1251        -0.9154
Arm Forces Empl       0.0009        -1.5179        -0.5485
Population            0.8262        -0.5625         0.4603
Time                  0.0030       798.7848      2859.5181

Residual Sum of Squares  =  836424.0555059
Standard Error           =  304.854073562
Mean of Y                =  65317
Stand Dev of y           =  3511.96835597
Correlation Coefficient  =  0.997736941572
R-squared                =  0.995479004577
Adjusted R-squared       =  0.992465007629
F(6,9)                   =  330.2853392354
Probability of F         =  0.0000
Durbin-Watson Statistic  =  2.559487689283
log of likelihood        =  -110.7203479964
Press Statistic          =  2886892.562947

 

WARNING! Table 4.3.1 contains a misprint in row 13 of the (X5) variable. The above results have been obtained by using the correct value of 123366.

Example 3

Example 20.1c on p. 426 from Zar, J. H. (2010).

Open REGRESS, select Statistics 1 → Regression Analysis → Linear Regression and select temperature, cm, mm and min (C1 to C4) as [Variable]s and ml (C5) as [Dependent]. Select only the Regression Results and ANOVA of Regression output options to obtain the following results:

Linear Regression

Regression Results

Valid Number of Cases: 33, 0 Omitted

Dependent Variable: ml

 

 

             Coefficient  Standard Error  t-Statistic  Probability  Lower 95%  Upper 95%
Constant          2.9583          1.3636       2.1695       0.0387     0.1651     5.7515
temperature      -0.1293          0.0213      -6.0751       0.0000    -0.1729    -0.0857
cm               -0.0188          0.0563      -0.3338       0.7410    -0.1341     0.0965
mm               -0.0462          0.2073      -0.2230       0.8252    -0.4708     0.3784
min               0.2088          0.0670       3.1141       0.0042     0.0714     0.3461

Residual Sum of Squares  =    5.0299
Standard Error           =    0.4238
Mean of Y                =    2.4742
Stand Dev of y           =    0.6789
Correlation Coefficient  =    0.8117
R-squared                =    0.6589
Adjusted R-squared       =    0.6102
F(4,28)                  =   13.5235
Probability of F         =    0.0000
Durbin-Watson Statistic  =    1.9947
log of likelihood        =  -15.9976
Press Statistic          =    7.1248

 

ANOVA of Regression

Due To       Sum of Squares  DoF  Mean Square  F-Stat    Prob
temperature           7.876    1        7.876  43.845  0.0000
cm                    0.013    1        0.013   0.073  0.7888
mm                    0.086    1        0.086   0.478  0.4950
min                   1.742    1        1.742   9.698  0.0042
Regression            9.717    4        2.429  13.524  0.0000
Error                 5.030   28        0.180
Total                14.747   32        0.461   2.565  0.0066