Unistat Statistics Software | Multinomial Regression

7.2.7. Multinomial Regression

The Multinomial Regression procedure (which is also known as Multinomial Logistic or Polytomous regression) is suitable for estimating models where the dependent variable is a categorical variable. If the dependent variable contains only two categories, its results are identical to that of Logistic Regression. Therefore, Multinomial Regression can be considered as an extension of Logistic Regression.

In Multinomial Regression, a set of coefficients are estimated for each category of the dependent variable. This makes its output format fundamentally different from other types of regression (see 7.2.7.3. Multinomial Regression Output Options). Since the estimated model is over-determined, the coefficients are scaled according to one of the categories. By default, the most frequent category is selected as the base category, though this can be changed by the user. The estimated coefficients are displayed relative to the base category and their exponentials are called Relative Risk Ratios.

Multinomial Regression is also closely related to Discriminant Analysis in the sense that both procedures are used to estimate the membership of cases to the groups defined by a categorical variable (see 8.2. Discriminant Analysis). Multinomial Regression should be preferred when the list of independent variables contains dummy variables. Discriminant Analysis is more powerful than Multinomial Regression when all independent variables are continuous and meet the assumptions of multivariate normality.

Predictions (interpolations) and multicollinearity are handled as in other regression options (see, for instance, Logistic Regression). Predicted cases are identified by an asterisk in Classification by Case, Probabilities and Index Values output options (see 7.2.7.3. Multinomial Regression Output Options).

7.2.7.1. Multinomial Regression Model Description

The multinomial logistic function is an extension of the logit function discussed in Logit / Probit / Gompit (see 7.2.5.1. Logit / Probit / Gompit Model Description).

Let J + 1 be the number of distinct categories in the dependent variable and assume that the category 0 is selected as the base category. Then the probabilities given by the multinomial logistic function are:

Multinomial Regression

where is the vector of estimated coefficients for the j^th category and is the i^th case (row) of the data matrix.

The relative risk ratio for case i relative to the base category is:

The logarithm of the likelihood function is given as:

Multinomial Regression

and the first derivatives as:

where:

d_ij = 1 if Y_i = j and

d_ij = 0 otherwise.

A Newton-Raphson type maximum likelihood algorithm is employed to minimise the negative of the log likelihood function. The nature of this method implies that a solution (convergence) cannot always be achieved. In such cases, you are advised to edit the convergence parameters provided, in order to find the right levels for the particular problem at hand.

7.2.7.2. Multinomial Regression Variable Selection

Multinomial Regression

Like Logistic Regression, Multinomial Regression can be used to estimate models with or without a constant term, with or without weights and regressions can be run on a subset of cases as determined by the levels of an unlimited number of factor columns. An unlimited number of dependent variables (numeric or string) can be selected in order to run the same model on different dependent variables. It is also possible to include interaction terms, dummy and lag/lead variables in the model, without having to create them as spreadsheet columns first (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).

It is compulsory to select at least one categorical data column containing numeric or String Data as a dependent variable. When more than one dependent variable is selected, the analysis will be repeated as many times as the number of dependent variables, each time only changing the dependent variable and keeping the rest of selections unchanged.

A column containing numeric data can be selected as a weights column. Weights are frequency weights and all independent variables are multiplied by this column internally by the program.

An intermediate inputs dialogue is displayed next.

Multinomial Regression

Tolerance: This value is used to control the sensitivity of nonlinear minimisation procedure employed. Under normal circumstances, you do not need to edit this value. If convergence cannot be achieved, then larger values of this parameter can be tried by removing one or more zeros.

Maximum Number of Iterations: When convergence cannot be achieved with the default value of 100 function evaluations, a higher value can be tried.

Omit Level: This field will appear only when one or more dummy variables have been included in the model from the Variable Selection Dialogue. Three options are available; (0) do not omit any levels, (1) omit the first level and (2) omit the last level. When no levels are omitted, the model will usually be over-parameterised (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).

Base Category: By default, the most frequent category of the dependent variable is suggested as the base category. To scale the estimated coefficients according to a particular category, enter this category here. The Relative Risk Ratios will be relative to this category.

7.2.7.3. Multinomial Regression Output Options

When the calculations are finished an Output Options Dialogue will provide access to the following options.

Multinomial Regression

Regression Results: The main regression output displays a table of estimated coefficients for each category of the dependent variable, except for the base category. Standard errors, Wald statistics, probability values and confidence intervals are also displayed for the estimated regression coefficients.

Wald Statistic: This is defined as:

Multinomial Regression

and has a chi-square distribution with one degree of freedom.

Confidence Intervals: The confidence intervals for regression coefficients are computed from:

, i = 1, …, k, j = 1, …, J,

where k is the number of independent variables in the model and each coefficient’s standard error, σ_ij, is the square root of the diagonal element of covariance matrix.

Goodness of Fit Tests: See 7.2.6.4.1. Logistic Regression Results for details.

Relative Risk Ratio:

Values of the relative risk ratio indicate the influence of one unit change in a covariate on the regression.

, i = 1, …, k, j = 1, …, J.

The standard error of the relative risk ratio is:

where σ_ij is the standard error of the i^th independent variable for the j^th category of the dependent variable. Coefficient confidence intervals are:

which are simply the exponential of the coefficient confidence intervals.

Correlation Matrix of Regression Coefficients: This is a symmetric matrix with unity diagonal elements. The off-diagonal elements give correlations between the regression coefficients.

Covariance Matrix of Regression Coefficients: This is a symmetric matrix where diagonal elements are the square of parameter standard errors. The off-diagonal elements are covariances between the regression coefficients.

Classification by Group: A table is formed with rows and columns corresponding to observed and estimated group memberships respectively. Each cell displays the number of elements and its percentage with respect to the row total. The diagonal elements are the cases classified correctly and the off-diagonal elements are the misclassified cases.

Classification by Case: The observed and estimated group memberships are listed and pairs with disagreement are marked. The estimated group is the one with the highest probability level.

Probabilities: These are calculated as in the first two equations of Multinomial Regression Model Description. The predicted values are marked with two asterisks.

Index Values: These are the fitted values given as:

The predicted values are marked with two asterisks.

7.2.7.4. Multinomial Regression Examples

Open ANOVA and select Statistics 1 → Regression Analysis → Multinomial Regression. From the Variable Selection Dialogue select Period (C29) as [Dependent], Score (C26) as [Variable] and Noise (S30) as [Dummy]. On Step 2 dialogue enter 1 for Omit Level and leave other entries unchanged. Check all output options to obtain the following output. Some tables have been shortened.

Multinomial Regression

Dependent Variable: Period

Base Category: 1

Valid Number of Cases: 54, 0 Omitted

Regression Results

	Coefficient	Standard Error	Wald Statistic	Probability	Lower 95%	Upper 95%
Period = 2 Constant	4.2452	1.9258	4.8596	0.0275	0.4708	8.0196
Score	-0.0818	0.0358	5.2104	0.0225	-0.1519	-0.0116
Noise = Low	-0.4319	0.7490	0.3325	0.5642	-1.8999	1.0362
Period = 3 Constant	8.9299	2.3890	13.9723	0.0002	4.2476	13.6122
Score	-0.1924	0.0503	14.6346	0.0001	-0.2910	-0.0938
Noise = Low	-1.1811	0.9236	1.6352	0.2010	-2.9914	0.6292

Goodness of Fit Tests

	-2 Log likelihood
Initial Model	118.6501
Final Model	91.6682

	Chi-Square Statistic	Degrees of Freedom	Right-Tail Probability
Likelihood Ratio	26.9820	4	0.0000

	Pseudo R-squared
McFadden	0.2274
Adjusted McFadden	0.1768
Cox & Snell	0.3933
Nagelkerke	0.4424

Correlation Matrix of Regression Coefficients

	Period = 2 Constant	Score	Noise = Low	Period = 3 Constant	Score	Noise = Low
Period = 2 Constant	1.0000	-0.9642	-0.4271	0.6576	-0.5719	-0.2615
Score	-0.9642	1.0000	0.2573	-0.6220	0.5703	0.1609
Noise = Low	-0.4271	0.2573	1.0000	-0.2553	0.1441	0.5618
Period = 3 Constant	0.6576	-0.6220	-0.2553	1.0000	-0.9635	-0.4907
Score	-0.5719	0.5703	0.1441	-0.9635	1.0000	0.3417
Noise = Low	-0.2615	0.1609	0.5618	-0.4907	0.3417	1.0000

Covariance Matrix of Regression Coefficients

	Period = 2 Constant	Score	Noise = Low	Period = 3 Constant	Score	Noise = Low
Period = 2 Constant	3.7085	-0.0665	-0.6160	3.0251	-0.0554	-0.4651
Score	-0.0665	0.0013	0.0069	-0.0532	0.0010	0.0053
Noise = Low	-0.6160	0.0069	0.5610	-0.4568	0.0054	0.3886
Period = 3 Constant	3.0251	-0.0532	-0.4568	5.7072	-0.1158	-1.0828
Score	-0.0554	0.0010	0.0054	-0.1158	0.0025	0.0159
Noise = Low	-0.4651	0.0053	0.3886	-1.0828	0.0159	0.8531

Relative Risk Ratio

	Relative Risk Ratio	Standard Error	Lower 95%	Upper 95%
Period = 2 Score	0.9215	0.0330	0.8590	0.9885
Noise = Low	0.6493	0.4863	0.1496	2.8184
Period = 3 Score	0.8249	0.0415	0.7475	0.9104
Noise = Low	0.3069	0.2835	0.0502	1.8761

Classification by Group

Observed\Predicted	1	2	3
1	12	4	2
	66.67%	22.22%	11.11%
2	7	5	6
	38.89%	27.78%	33.33%
3	0	5	13
	0.00%	27.78%	72.22%

Correctly Classified =

55.56%

Classification by Case

	Actual Group	Misclassified	Estimated Group	Probability
1	1	*	2	0.4327
2	1		1	0.4552
3	1		1	0.6290
4	2	*	3	0.4842
5	2	*	1	0.4283
6	2	*	1	0.5585
…	…	…	…	…

Probabilities

	1	2	3
1	0.2456	0.4327	0.3217
2	0.4552	0.4170	0.1279
3	0.6290	0.3251	0.0459
4	0.1413	0.3745	0.4842
5	0.4283	0.4258	0.1459
6	0.5585	0.3689	0.0726
7	0.0235	0.1661	0.8105
8	0.0953	0.3229	0.5819
9	0.2700	0.4383	0.2917
10	0.0716	0.2858	0.6426
11	0.1595	0.3896	0.4509
12	0.3744	0.4383	0.1873
13	0.0328	0.1969	0.7703
14	0.0953	0.3229	0.5819
15	0.2952	0.4417	0.2631
…	…	…	…

Index Values

	1	2	3
1	0.0000	0.5663	0.2698
2	0.0000	-0.0877	-1.2698
3	0.0000	-0.6600	-2.6169
4	0.0000	0.9751	1.2320
5	0.0000	-0.0059	-1.0773
6	0.0000	-0.4147	-2.0396
7	0.0000	1.9561	3.5414
8	0.0000	1.2204	1.8094
9	0.0000	0.4846	0.0774
10	0.0000	1.3839	2.1943
11	0.0000	0.8934	1.0396
12	0.0000	0.1576	-0.6924
13	0.0000	1.7926	3.1565
14	0.0000	1.2204	1.8094
15	0.0000	0.4028	-0.1151
…	…	…	…

Previous topic | Next topic