Unistat Statistics Software | Regression and ANOVA-Matrix Data

7.0.1. Matrix Data

Raw data arranged in a matrix is the data type used by all regression procedures. An unlimited number of data columns can be selected as independent variables from the Variables Available list. Each row of the data matrix should correspond to a single case (such as a time period, or a patient in a hospital). Therefore, all columns are expected to have the same length.

Missing data handling: Any rows containing one or more missing observations are omitted. Here, "row" refers to a row of the matrix formed by the selected variables only, not to a row across all columns in the data matrix. Therefore, different combinations of selected columns may lead to different numbers of omitted rows.
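The following minimal sketch illustrates row omission restricted to the selected columns. It assumes missing observations are stored as `None`; the function `complete_rows` and the example data are hypothetical and are not part of Unistat.

```python
# Sketch of listwise (row-wise) omission restricted to the selected columns.
# Assumption: missing observations are represented by None.

def complete_rows(data, selected):
    """Return indices of rows with no missing value in any selected column."""
    n = len(data[selected[0]])
    return [i for i in range(n)
            if all(data[col][i] is not None for col in selected)]

# Hypothetical data; column "z" is in the data matrix but is never selected.
data = {
    "y":  [1.0, 2.0, None, 4.0, 5.0],
    "x1": [0.5, None, 1.5, 2.0, 2.5],
    "x2": [3.0, 3.1, 3.2, 3.3, 3.4],
    "z":  [None, 1.0, 1.0, 1.0, 1.0],
}

rows_a = complete_rows(data, ["y", "x1", "x2"])  # rows 1 and 2 omitted
rows_b = complete_rows(data, ["y", "x2"])        # only row 2 omitted
print(rows_a, rows_b)
```

Row 0 is kept in both cases even though `z` is missing there, because `z` is not among the selected variables.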

An important special case for missing data handling arises when more than one dependent variable is selected. In this case, rows with missing values are omitted separately for each run (one run per dependent variable). Consider, for instance, a hypothetical data set from which we select two dependent variables, where only the first one contains a missing value. Also assume that no other variables contain any missing values. In this case, the first run will report one case omitted due to missing values, and in the second run (with the second dependent variable) no cases will be omitted.
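The hypothetical example above can be sketched as follows, treating each dependent variable as its own run. This only illustrates the rule described in the text; the helper names are illustrative, and missing values are again assumed to be `None`.

```python
def complete_rows(data, selected):
    """Return indices of rows with no missing value in any selected column."""
    n = len(data[selected[0]])
    return [i for i in range(n)
            if all(data[c][i] is not None for c in selected)]

def rows_per_run(data, dependents, independents):
    """One run per dependent variable; omission is decided separately for each run."""
    return {dep: complete_rows(data, [dep] + independents) for dep in dependents}

# Hypothetical data: only the first dependent variable has a missing value.
data = {
    "y1": [1.0, None, 3.0, 4.0],
    "y2": [2.0, 2.5, 3.0, 3.5],
    "x":  [0.1, 0.2, 0.3, 0.4],
}
runs = rows_per_run(data, ["y1", "y2"], ["x"])
n_rows = len(data["x"])
for dep, rows in runs.items():
    print(dep, "cases omitted:", n_rows - len(rows))
```

The run for `y1` omits one case and the run for `y2` omits none, matching the description above.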

It is also important to remember that in Stepwise Regression, rows containing missing values are omitted at the beginning. This means that rows are omitted on the basis of the matrix formed by all variables selected for the stepwise analysis, rather than the final set of variables included in the regression equation. This matters if you later rerun the final stepwise equation with the Linear Regression procedure (for example, to obtain diagnostic statistics): to reproduce the stepwise sample, you may need to delete manually the rows that were omitted because of missing values in variables that did not enter the final equation.
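A minimal sketch of how the rows to delete could be identified before rerunning the final equation. The variable names, the data and the "final equation" are hypothetical; the sketch only compares the sample defined by all stepwise candidates with the sample a fresh run on the final variables would use.

```python
def complete_rows(data, selected):
    """Return indices of rows with no missing value in any selected column."""
    n = len(data[selected[0]])
    return [i for i in range(n)
            if all(data[c][i] is not None for c in selected)]

# Hypothetical data: x2 and x3 have missing values but do not enter the final model.
data = {
    "y":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "x1": [1.1, 1.9, 3.2, 3.9, 5.1],
    "x2": [0.3, 0.1, 0.4, None, 0.2],
    "x3": [2.0, None, 2.2, 2.1, 2.3],
}
candidates = ["y", "x1", "x2", "x3"]  # all variables selected for the stepwise run
final = ["y", "x1"]                   # hypothetical final stepwise equation

stepwise_rows = complete_rows(data, candidates)  # omission done at the start
rerun_rows = complete_rows(data, final)          # what a fresh Linear Regression run would use
to_delete = sorted(set(rerun_rows) - set(stepwise_rows))
print(to_delete)
```

Rows 1 and 3 would be used by a fresh run on `y` and `x1` but were excluded from the stepwise sample, so they are the ones to delete to reproduce it.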

Subsample selection: Most procedures requiring matrix data also allow a subsample of rows to be selected by one or more factor columns. A factor is a categorical variable that contains a limited number of distinct numeric or string values (levels). An unlimited number of factor columns may be selected from the Variables Available list. When two or more factors are selected, you can include in the analysis any rows defined by combinations of their levels. Details are given in the documentation for each regression procedure.
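Conceptually, selecting rows by combinations of factor levels amounts to a filter such as the one below. In Unistat the selection is made in the procedure's dialog; this sketch, with hypothetical factors `region` and `sex` and a hypothetical helper `select_subsample`, only illustrates which rows end up in the analysis.

```python
# Hypothetical rows with two factor columns and one numeric variable.
rows = [
    {"region": "North", "sex": "F", "y": 1.2},
    {"region": "North", "sex": "M", "y": 2.0},
    {"region": "South", "sex": "F", "y": 1.5},
    {"region": "South", "sex": "M", "y": 2.4},
    {"region": "North", "sex": "F", "y": 1.1},
]

def select_subsample(rows, factors, allowed):
    """Keep rows whose combination of factor levels is in the allowed set."""
    return [r for r in rows if tuple(r[f] for f in factors) in allowed]

# Include only the (North, F) and (South, M) cells.
sub = select_subsample(rows, ["region", "sex"], {("North", "F"), ("South", "M")})
print([r["y"] for r in sub])
```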

The Nonlinear Regression procedure allows only one factor variable to be selected, and this variable is not used to select subsamples of rows. For more information see 7.2.4. Nonlinear Regression.

Weights: With data in matrix form, one column can be selected as weights for the analysis of the remaining selected columns. When, for instance, rows of data correspond to different regions in a country, you may eliminate the effect of population differences by selecting a variable containing region populations as weights. In this case, the program will first normalise the weights column so that its sum is equal to the valid number of cases (the number of rows after omission of missing rows), and then run the regression after multiplying all selected columns by the square root of the normalised weights column.
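The normalise-then-transform procedure can be sketched for a straight-line fit as below. This is not Unistat's code: the helper names and data are hypothetical, and we assume the constant term is also multiplied by the square root of the weights, as standard weighted least squares requires. The sketch checks the result against the textbook weighted least squares formula computed from the raw (unnormalised) weights.

```python
import math

def normalise_weights(w):
    """Rescale weights so that they sum to the number of valid cases."""
    n, total = len(w), sum(w)
    return [wi * n / total for wi in w]

def ols_two_columns(c0, c1, y):
    """Least squares of y on columns c0 and c1 via the 2x2 normal equations."""
    a = sum(u * u for u in c0)
    b = sum(u * v for u, v in zip(c0, c1))
    d = sum(v * v for v in c1)
    r0 = sum(u * t for u, t in zip(c0, y))
    r1 = sum(v * t for v, t in zip(c1, y))
    det = a * d - b * b
    return (d * r0 - b * r1) / det, (a * r1 - b * r0) / det

def weighted_fit_by_transform(x, y, w):
    """Multiply every column (constant included) by sqrt(normalised weight), then run OLS."""
    s = [math.sqrt(wi) for wi in normalise_weights(w)]
    const = s  # the constant column of ones, scaled
    xs = [si * xi for si, xi in zip(s, x)]
    ys = [si * yi for si, yi in zip(s, y)]
    return ols_two_columns(const, xs, ys)

def weighted_fit_direct(x, y, w):
    """Textbook weighted least squares for a straight line, using weighted means."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ym - slope * xm, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
population = [10.0, 40.0, 25.0, 5.0, 20.0]  # hypothetical region populations

print(normalise_weights(population))  # [0.5, 2.0, 1.25, 0.25, 1.0]
print(weighted_fit_by_transform(x, y, population))
print(weighted_fit_direct(x, y, population))
```

Both fits give the same intercept and slope, which also shows that rescaling all weights by a constant leaves the coefficient estimates unchanged.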

A weighted regression is thus effectively run on a transformed data set, although the data columns in the original data set are left unchanged. Given also the way missing data are handled (omission of rows containing missing observations) and the creation of dummy and lag/lead variables, it may not be obvious exactly which data the regression is run on. For this reason the menu option Statistics 1 → Matrix Statistics is provided. Under this menu option it is possible to compute descriptive statistics, zero order (Pearson) Correlation Coefficients, and variance-covariance and moment matrices for the same data set that is used in the Regression Analysis. The weights option and missing data handling are exactly the same as in the Regression Analysis.

Stepwise Regression and Logit / Probit / Gompit do not support weight variables. In Logistic Regression, Multinomial Regression and Poisson Regression, weights are treated as frequency weights.
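A frequency weight of k means a row counts as k identical observations. The sketch below illustrates this general definition with an intercept-only Poisson model; it is not Unistat's implementation, and the helper `poisson_loglik` and the data are hypothetical. The weighted log-likelihood equals the log-likelihood of the data with each row repeated according to its weight.

```python
import math

def poisson_loglik(y, mu, freq=None):
    """Poisson log-likelihood of counts y at mean mu; freq are frequency weights."""
    if freq is None:
        freq = [1] * len(y)
    return sum(f * (yi * math.log(mu) - mu - math.lgamma(yi + 1))
               for yi, f in zip(y, freq))

y = [0, 1, 3]
freq = [2, 4, 1]  # hypothetical integer frequency weights
expanded = [yi for yi, f in zip(y, freq) for _ in range(f)]  # rows repeated f times

# Maximum likelihood estimate of mu in an intercept-only Poisson model:
# the frequency-weighted mean of the counts.
mu_hat = sum(f * yi for yi, f in zip(y, freq)) / sum(freq)

for mu in (0.5, mu_hat, 2.0):
    print(mu, poisson_loglik(y, mu, freq), poisson_loglik(expanded, mu))
```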
