7.2.6. Logistic Regression
The Logistic Regression procedure is suitable for estimating Linear Regression models when the dependent variable is a binary (or dichotomous) variable, that is, it consists of two values such as Yes or No, or in general 0 and 1. In such cases, where the dependent variable has an underlying binomial distribution (and thus the predicted Y values should lie between 0 and 1) the Linear Regression procedure cannot be employed.
Like Linear Regression, Logistic Regression can be used to estimate models with or without a constant term and regressions may be run on a subset of cases as determined by the levels of an unlimited number of factor columns. An unlimited number of dependent variables (numeric or string) can be selected in order to run the same model on different dependent variables. It is also possible to include interaction terms, dummy and lag/lead variables in the model, without having to create them as spreadsheet columns first.
Logistic Regression is closely related to Logit / Probit / Gompit. For a brief discussion of similarities and differences of these two procedures see 7.2.5. Logit / Probit / Gompit.
As of this version of UNISTAT, a comprehensive implementation of ROC (Receiver Operating Characteristic) analysis is included in the Logistic Regression procedure. The two output options Classification by Group and ROC Analysis, as well as the two graphics options, will provide a complete ROC analysis output. It is possible to compute AUC (area under the curve) and plot ROC curves with covariates and plot multiple ROC curves with multiple comparisons between AUCs.
7.2.6.1. Logistic Regression Model Description
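Logistic Regression employs the logit model as explained in Logit / Probit / Gompit (see 7.2.5.1. Logit / Probit / Gompit Model Description). However, the log-likelihood function for the logistic model can be expressed more explicitly as:
$$\ln L = \sum_{i=1}^{n} \left[ y_{i} \ln p_{i} + (1 - y_{i}) \ln (1 - p_{i}) \right]$$
with first derivatives:
$$\frac{\partial \ln L}{\partial \beta_{j}} = \sum_{i=1}^{n} (y_{i} - p_{i}) x_{ij}$$
where:
$$p_{i} = \frac{1}{1 + e^{-x_{i} \beta}}$$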
7.2.6.2. Logistic Regression Variable Selection
As in Linear Regression, it is possible to create interaction terms, dummy variables, lag/lead terms, select multiple dependent variables (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables) and run regressions on subsamples defined by several factor columns with or without weights (see 7.2.1.1. Linear Regression Variable Selection).
It is compulsory to select at least one column containing numeric or String Data as a dependent variable. The program encodes the dependent variable internally such that, by default, the minimum value that occurs in the column is 0 and the rest are 1. It is possible to reverse this condition and encode the maximum of the dependent variable as 0 and the rest of the values as 1 using the Dependent Variable Encoding control in the Intermediate Inputs dialogue.
If the column selected as the dependent variable is not categorical (for example, a continuous measurement), the encoded dependent variable may contain too few 0s and too many 1s, and convergence may not be achieved.
When more than one dependent variable is selected, the analysis will be repeated as many times as the number of dependent variables, each time only changing the dependent variable and keeping the rest of selections unchanged.
When more than one independent variable is selected, you will be presented with the option to run a single analysis (see 7.2.6.3. Logistic Regression Intermediate Inputs) including all independent variables (which is the default case in earlier versions of UNISTAT) or to run a separate regression for each independent variable, while holding the dependent variable unchanged. The primary use of this option is to compare the areas enclosed under the ROC curves for each independent variable.
A column containing numeric data can be selected as a weights column. Unlike the Linear Regression procedure, however, weights here are frequency weights. All independent variables are multiplied by this column internally by the program.
7.2.6.3. Logistic Regression Intermediate Inputs
The number and kind of controls that appear on this dialogue depend on the selections made in the previous dialogue. For instance, if a dummy or lag variable was created, the dialogue will display one or more other boxes (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables). The specific tasks of these controls are as follows:
Tolerance: This value controls the sensitivity of the nonlinear minimisation procedure employed. Under normal circumstances, you do not need to edit this value. If convergence cannot be achieved, larger values of this parameter can be tried by removing one or more zeros.
Maximum Number of Iterations: When convergence cannot be achieved with the default value of 100 function evaluations, a higher value can be tried.
Dependent Variable Encoding: By default, the program will internally encode the dependent variable values such that the minimum is 0 and the rest of the values are 1. If one is entered into this box, the program will encode the maximum value as 0 and the rest as 1.
Making a change in this control will normally reverse the signs of the estimated coefficients and will affect other output as well. If your aim in changing this control is to obtain the correct 2 x 2 table where the maximum value of the dependent variable corresponds to a positive state, e.g. the presence of a disease or vice versa, this can be done in the Output Options Dialogue more efficiently, without affecting the estimated coefficients and other output.
ROC Optimality Criterion: In Logistic Regression, the fitted Y is a continuous variable consisting of probability values. The estimated group membership (in terms of 0 and 1) is dependent on a critical cutoff probability which is also called the Classification threshold probability. The estimated group membership is 0 for any case with an estimated probability (fitted Y value) less than this critical probability and it is 1 otherwise.
In earlier versions of UNISTAT, the default value of the Classification threshold probability was fixed at 0.5 and the user could change this value manually to explore different what-if scenarios. As of this version of UNISTAT, the Classification threshold probability is estimated by the program using one of the following two methods:
Maximum sum of sensitivity and specificity: This is also known as Youden’s index and represents the point on the curve furthest away from the 45° line. It is defined as:
Max(Sensitivity + Specificity)
Point nearest to the top-left corner of ROC plot: This is given as:
Min((1 – Sensitivity)^2 + (1 – Specificity)^2)
The estimated Classification threshold probability can be edited to observe the effect of different cutoff values on the 2 x 2 Table and Statistics For Diagnostic Tests output.
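The two optimality criteria can be illustrated with a minimal Python sketch. The function name `best_cutoff` and its arguments are hypothetical and are not part of UNISTAT; the three rows are taken from the ROC Table of Example 1 (cases 10 to 12).

```python
def best_cutoff(rows, criterion="youden"):
    """Pick the best row from (cutoff, sensitivity, specificity) tuples.

    criterion="youden":  maximise Sensitivity + Specificity
    criterion="topleft": minimise (1 - Sensitivity)^2 + (1 - Specificity)^2
    """
    if criterion == "youden":
        return max(rows, key=lambda r: r[1] + r[2])
    if criterion == "topleft":
        return min(rows, key=lambda r: (1 - r[1]) ** 2 + (1 - r[2]) ** 2)
    raise ValueError("unknown criterion: " + criterion)


# (cutoff probability, sensitivity, specificity) from Example 1, cases 10-12
roc_rows = [
    (0.5699, 0.7273, 0.9048),
    (0.5291, 0.7273, 0.8571),
    (0.4811, 0.8182, 0.8571),
]

youden_choice = best_cutoff(roc_rows, "youden")
topleft_choice = best_cutoff(roc_rows, "topleft")
print(youden_choice[0], topleft_choice[0])
```

For these three rows both criteria select the same cutoff; on other data the two criteria can disagree.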
ROC Confidence Intervals: The ROC Table output option (see 7.2.6.4.4. ROC Analysis) can display all cases for a large number of test statistics together with their confidence intervals. Here you can choose the type of confidence intervals as:
0: Asymptotic normal (Wald), or
1: Exact binomial (Clopper-Pearson).
Run regression with all independent variables: This is the default option and produces one set of output with all independent variables included as in earlier versions of UNISTAT.
Run a separate regression for each independent variable: As of this version of UNISTAT, it is possible to run a separate regression for each independent variable, while holding the dependent variable unchanged. The primary use of this option is to compare the areas enclosed under the ROC curves for each independent variable.
7.2.6.4. Logistic Regression Output Options
The Sensitivity-Specificity Plot option will not be available when the Run a separate regression for each independent variable option is selected in the Intermediate Inputs dialogue.
Classification Threshold Probability: As described above for ROC Optimality Criterion, the estimated Classification threshold probability can be edited to try different scenarios. Changing this value will affect the output in Classification by Group and the Predicted Group and misclassifications in the Case (Diagnostic) Statistics output.
Positive Outcome: When calculating the Statistics For Diagnostic Tests output (e.g. sensitivity, specificity), UNISTAT assumes that the positive outcome is represented by 1 in the dependent variable and the true positive outcome of the test is represented in cell (1,1) of the 2 x 2 Table. Here you can control which value of the dependent variable represents the positive outcome, without affecting the rest of the Logistic Regression output.
Multicollinearity: Variables causing multicollinearity will be displayed with a zero coefficient at the end of the coefficients table. If you do not wish to display these variables enter the following line in the [Options] section of Documents\Unistat10\Unistat10.ini file:
DispCollin=0
The rest of the coefficients will be determined as if the regression were run without the variables causing collinearity.
7.2.6.4.1. Logistic Regression Results
The main regression output displays a table for coefficients of the estimated regression equation, their standard errors, Wald statistics, probability values and confidence intervals for the significance level specified in the Variable Selection Dialogue. If any independent variables have been omitted due to multicollinearity, they are reported at the end of the table with a zero coefficient.
Regression Coefficients:
The Wald statistic is defined as:
$$W_{i} = \left( \frac{\hat{\beta}_{i}}{\sigma_{i}} \right)^{2}$$
and has a chi-square distribution with one degree of freedom.
The confidence intervals for regression coefficients are computed from:
$$\hat{\beta}_{i} \pm z_{\alpha/2} \, \sigma_{i}$$
where each coefficient’s standard error, \sigma_{i}, is the square root of the diagonal element of the covariance matrix.
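As a cross-check, the Wald statistic, its probability and the 95% limits for the GPA row of Example 1 can be reproduced with a short Python sketch. It assumes the usual definitions (squared ratio of coefficient to standard error, and normal-based limits); the helper name `wald_summary` is hypothetical.

```python
import math


def wald_summary(coef, se, z=1.959964):
    """Return (Wald statistic, right-tail probability, lower, upper).

    The probability uses the identity that a chi-square variable with one
    degree of freedom has right-tail probability erfc(sqrt(x / 2)).
    """
    wald = (coef / se) ** 2
    prob = math.erfc(math.sqrt(wald / 2.0))
    return wald, prob, coef - z * se, coef + z * se


# GPA row of Example 1: coefficient 2.8261, standard error 1.2629
wald, prob, lower, upper = wald_summary(2.8261, 1.2629)
print(f"{wald:.3f} {prob:.4f} {lower:.4f} {upper:.4f}")
```

Small differences in the last digits are expected, because the inputs are already rounded to four decimals.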
Goodness of Fit Tests:
-2 Log-Likelihood for Initial Model: This is -2 times the value of the log-likelihood function when all independent variables are excluded from the model:
-2 Log-Likelihood for Final Model: This is -2 times the value of the log-likelihood function when convergence is achieved.
Likelihood Ratio: This is a test statistic for the null hypothesis that “all regression coefficients for covariates are zero”. It is equal to -2 times the difference between the initial and final model log-likelihood values and has a chi-square distribution with k degrees of freedom (the number of independent variables in the model).
Goodness of Fit: This is also known as Pearson’s chi-square statistic and is for the observed versus expected number of responses. It has a chi-square distribution with n – k degrees of freedom (the number of valid cases minus the number of independent variables, including the constant term, if any).
Hosmer-Lemeshow Test: This is a test for lack of fit. The observations are sorted according to their fitted Y values (estimated probabilities) in ascending order. Cases with identical values of the independent variables are formed into blocks. Then the cases are grouped into approximately ten classes without splitting the blocks.
The test statistic is defined as:
$$C = \sum_{j=1}^{g} \frac{(O_{j} - E_{j})^{2}}{E_{j} (1 - E_{j} / n_{j})}$$
with g – 2 degrees of freedom, where:
g is the number of classes,
n_{j} is the number of observations in the j^{th} class,
O_{j} is the observed number of cases in the j^{th} class,
E_{j} is the expected number of cases in the j^{th} class.
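The Hosmer-Lemeshow statistic can be recomputed from the Hosmer-Lemeshow Table of Example 1. The sketch below assumes the standard form of the statistic, the sum over classes of (O – E)^2 / (E (1 – E / n)), with O and E taken from the Y = 1 columns; the function name is hypothetical.

```python
def hosmer_lemeshow(classes):
    """classes: iterable of (n_j, O_j, E_j) for the Y = 1 outcome."""
    return sum((o - e) ** 2 / (e * (1.0 - e / n)) for n, o, e in classes)


# (Total, Actual Y = 1, Expected Y = 1) for the ten classes of Example 1
classes = [
    (4, 0, 0.1035), (3, 0, 0.1036), (3, 0, 0.1647), (3, 1, 0.2835),
    (3, 1, 0.5705), (4, 0, 1.2322), (3, 2, 1.5801), (3, 2, 1.8861),
    (3, 3, 2.3735), (3, 2, 2.7023),
]

hl_stat = hosmer_lemeshow(classes)
hl_df = len(classes) - 2  # g - 2 degrees of freedom
print(f"{hl_stat:.2f} with {hl_df} degrees of freedom")
```

The result agrees with the reported statistic to within the rounding of the tabulated expected values.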
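Pseudo R-squared: In Logistic Regression (as well as in other maximum likelihood procedures), an R-squared statistic as in Linear Regression is not available. This is because Logistic Regression employs an iterative maximum likelihood estimation method. Equivalent statistics to test the goodness of fit have been proposed using the initial (L_{0}) and maximum (L_{1}) likelihood values. Below, k is the number of estimated coefficients (including the constant term, if any) and n is the number of valid cases.
McFadden:
$$1 - \frac{\ln L_{1}}{\ln L_{0}}$$
Adjusted McFadden:
$$1 - \frac{\ln L_{1} - k}{\ln L_{0}}$$
Cox & Snell:
$$1 - \left( \frac{L_{0}}{L_{1}} \right)^{2/n}$$
Nagelkerke:
$$\frac{1 - (L_{0} / L_{1})^{2/n}}{1 - L_{0}^{2/n}}$$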
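The four pseudo R-squared values of Example 1 can be reproduced from the two -2 log-likelihood values alone. The sketch below uses the standard textbook definitions, which is an assumption about what UNISTAT computes, although the results agree with the example output; `pseudo_r2` is a hypothetical helper.

```python
import math


def pseudo_r2(m2ll_initial, m2ll_final, n, k):
    """Pseudo R-squared values from -2 log-likelihoods.

    n: number of valid cases, k: number of estimated coefficients
    (including the constant term).
    """
    ll0 = -m2ll_initial / 2.0  # log-likelihood of the initial model
    ll1 = -m2ll_final / 2.0    # log-likelihood of the final model
    cox_snell = 1.0 - math.exp((m2ll_final - m2ll_initial) / n)
    return {
        "mcfadden": 1.0 - ll1 / ll0,
        "adj_mcfadden": 1.0 - (ll1 - k) / ll0,
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / (1.0 - math.exp(-m2ll_initial / n)),
    }


# Example 1: -2LL initial 41.1835, final 25.7793, 32 cases, 4 coefficients
r2 = pseudo_r2(41.1835, 25.7793, n=32, k=4)
for name, value in r2.items():
    print(f"{name:13s} {value:.4f}")
```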
Correlation Matrix of Regression Coefficients: This is a symmetric matrix with unity diagonal elements. The off-diagonal elements give correlations between regression coefficients.
Covariance Matrix of Regression Coefficients: This is a symmetric matrix where the square roots of the diagonal elements are the parameter standard errors. The off-diagonal elements are covariances between the regression coefficients.
Odds Ratio: The odds ratio gives the multiplicative change in the odds of the outcome for a one-unit change in a covariate. It is defined as:
$$OR_{i} = e^{\hat{\beta}_{i}}$$
The standard error of the odds ratio is found as:
$$e^{\hat{\beta}_{i}} \sigma_{i}$$
where \sigma_{i} is the i^{th} coefficient’s standard error, and its confidence intervals as:
$$e^{\hat{\beta}_{i} \pm z_{\alpha/2} \, \sigma_{i}}$$
which are simply the exponential of the coefficient confidence intervals.
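The odds ratio row for GPA in Example 1 can be checked from the coefficient table. This sketch assumes the odds ratio is the exponential of the coefficient and that its standard error is the odds ratio multiplied by the coefficient's standard error, which matches the example output; the function name is hypothetical.

```python
import math


def odds_ratio_row(coef, se, coef_lower, coef_upper):
    """Return (odds ratio, its standard error, lower limit, upper limit)."""
    odds = math.exp(coef)
    return odds, odds * se, math.exp(coef_lower), math.exp(coef_upper)


# GPA row of Example 1: coefficient, standard error and 95% limits
or_gpa, or_se, or_lower, or_upper = odds_ratio_row(2.8261, 1.2629, 0.3508, 5.3014)
print(f"{or_gpa:.2f} {or_se:.2f} {or_lower:.3f} {or_upper:.1f}")
```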
Hosmer-Lemeshow Table: The contingency table described above in Hosmer-Lemeshow Test is displayed. The observed and expected values for both values of the dependent variable are listed for all classes.
7.2.6.4.2. Logistic Regression Case (Diagnostic) Statistics
Case statistics are useful to determine the influence of individual observations on the overall fit of the model. For further information see 7.2.1.2.2. Linear Regression Case Output.
Predictions (Interpolations): There are three conditions under which predictions will be computed for estimated Y values:
1) If, for a case, all independent variables are non-missing, but only the dependent variable is missing,
2) if a case does not contain missing values but it has been omitted from the analysis by Data Processor’s Data → Select Row function and
3) if a case does not contain missing values but it has been omitted from the analysis by selecting subsamples from the Variable Selection Dialogue (see 2.1.2. Categorical Data Analysis).
Such cases are not included in the estimation of the model. When, however, Case (Diagnostic) Statistics option is selected, the program will detect these cases and compute and display the fitted (estimated) Y values, as well as their confidence intervals and some other related statistics. Therefore, it will be a good idea to include the cases for which predictions are to be made in the data matrix during the data preparation phase, and then exclude them from the analysis by one of the above three methods. When a case is predicted, its label will be prefixed by an asterisk (*).
In Stand-Alone Mode, the spreadsheet function Reg can also be used to make predictions (see 3.4.2.6.3. UNISTAT Functions).
This will give the logit of predicted values, which can be transformed back as:
$$\hat{y} = \frac{1}{1 + e^{-z}}$$
where z is the logit value returned by Reg.
Statistics available under Case (Diagnostic) Statistics option are as follows.
Case Labels: If row labels exist in data, they are displayed as case labels. Otherwise the row numbers are displayed. If a fitted Y value is predicted (see Fitted Y Values below) its label is marked by an asterisk (*). The misclassified cases (i.e. where an actual Y value differs from a predicted group) are divided into two groups as False Positive and False Negative and are marked by F+ and F- respectively.
Actual Y: Encoded values of the dependent variable (y_{i}) are displayed.
Fitted Y: These are the estimated values for the dependent variable:
$$\hat{y}_{i} = \frac{1}{1 + e^{-z_{i}}}$$
where:
$$z_{i} = x_{i} \hat{\beta}$$
If, for a case, all independent variables are non-missing, but the dependent variable is missing, the fitted (i.e. predicted) Y value will be displayed (see 7.2.6. Logistic Regression). Such cases are marked by a single asterisk (*) in their label.
Predicted Group: If the fitted Y value is greater than or equal to the cutoff (classification threshold) probability, then the group is 1, and 0 otherwise.
Leverage:
where the vector:
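Cook’s Distance:
$$D_{i} = \frac{z_{i}^{2} h_{i}}{1 - h_{i}}$$
where h_{i} is the leverage and z_{i} is the standardised residual as defined below.
Deviance:
$$d_{i} = \sqrt{-2 \ln \hat{y}_{i}}$$
if y_{i} = 1 and
$$d_{i} = -\sqrt{-2 \ln (1 - \hat{y}_{i})}$$
otherwise, where \hat{y}_{i} is the fitted Y value.
Residuals:
$$e_{i} = y_{i} - \hat{y}_{i}$$
Standardised Residuals:
$$z_{i} = \frac{e_{i}}{\sqrt{\hat{y}_{i} (1 - \hat{y}_{i})}}$$
Logit Residuals:
$$\frac{e_{i}}{\hat{y}_{i} (1 - \hat{y}_{i})}$$
Studentised Residuals:
$$\frac{d_{i}}{\sqrt{1 - h_{i}}}$$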
Delta-Beta:
Delta-beta is defined as the change in an estimated coefficient when a case is omitted from the analysis. An estimate can be computed from the above formula without having to run n regressions.
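Several of the case statistics can be reproduced from the actual Y, fitted Y and leverage values alone. The Python sketch below uses definitions inferred from the Example 1 output (case 32), not formulas confirmed by the manual, so it should be read as an illustration only; the function name is hypothetical.

```python
import math


def case_statistics(y, fitted, leverage):
    """Residual-type statistics for one case (definitions are assumptions)."""
    resid = y - fitted
    var = fitted * (1.0 - fitted)
    standardised = resid / math.sqrt(var)
    # Deviance carries the sign of the residual
    if y == 1:
        deviance = math.sqrt(-2.0 * math.log(fitted))
    else:
        deviance = -math.sqrt(-2.0 * math.log(1.0 - fitted))
    return {
        "residual": resid,
        "standardised": standardised,
        "logit": resid / var,
        "deviance": deviance,
        "studentised": deviance / math.sqrt(1.0 - leverage),
        "cook": standardised ** 2 * leverage / (1.0 - leverage),
    }


# Case 32 of Example 1: actual Y = 1, fitted Y = 0.1110, leverage = 0.1299
case32 = case_statistics(1, 0.1110, 0.1299)
for name, value in case32.items():
    print(f"{name:13s} {value:.3f}")
```

The values agree with the example output to about three significant digits; the remaining differences come from the rounding of the fitted Y value.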
7.2.6.4.3. Classification by Group
2 x 2 Table: The observed group membership (the actual Y in terms of 0 and 1) is tabulated against the estimated group membership. By default, it is assumed that the dependent variable value 1 represents the positive outcome. The estimated group membership is 1 for any case with an estimated probability (fitted Y value) greater than or equal to the Classification Threshold Probability and it is 0 otherwise. If 1 does not represent the positive outcome of the test, then you can change this by entering 0 in the Positive Outcome box on the Output Options Dialogue. This will not force a re-estimation of the model.
                     Positive Actual    Negative Actual    Total
Positive Estimate    TP                 FP                 TP + FP
Negative Estimate    FN                 TN                 FN + TN
Total                TP + FN            FP + TN            TOTAL
The table entries are defined as:
TP: True Positive: Correct acceptance,
TN: True Negative: Correct rejection,
FP: False Positive: False alarm (Type I error),
FN: False Negative: Missed detection (Type II error).
Statistics for Diagnostic Tests: These are the tests to determine how good a diagnostic method is, for instance, in detecting a positive outcome (i.e. sensitivity) or a negative outcome (i.e. specificity). Many of the statistics displayed here are proportions and their confidence intervals are computed employing the Wald (asymptotic) and Clopper-Pearson (exact) methods for binomial proportions (see 6.4.3.2. Binomial Test). Confidence intervals for likelihood ratios are computed as in Simel D., Samsa G., Matchar D. (1991).
The tests covered under this topic are also available in other procedures. When the data consists of two binary variables Actual and Estimate, you can use the Paired Proportions (see 6.4.5.6. Statistics for Diagnostic Tests) or Cross-Tabulation procedures. Alternatively, when you have an already formed 2 x 2 table, you can use the Contingency Table procedure.
Sensitivity: True positive rate or the probability of diagnosing a case as positive when it is actually positive.
TP / (TP + FN)
Specificity: True negative rate or the probability of diagnosing a case as negative when it is actually negative.
TN / (TN + FP)
Accuracy: The rate of correctly classified cases, i.e. the probability of a correct result, whether true positive or true negative.
Sensitivity * Prevalence + Specificity * (1 – Prevalence)
(TP + TN) / TOTAL
Prevalence: The actual positive rate.
(TP + FN) / TOTAL
Apparent Prevalence: The estimated positive rate.
(TP + FP) / TOTAL
Youden’s Index: Confidence intervals are calculated as in Bangdiwala S.I., Haedo A.S., Natal M.L. (2008).
Sensitivity + Specificity – 1
TP / (TP + FN) + TN / (FP + TN) – 1
Positive Predictive Value: PPV
TP / (TP + FP)
Negative Predictive Value: NPV
TN / (FN + TN)
Positive Likelihood Ratio: LR+
Sensitivity / (1 – Specificity)
(TP / (TP + FN)) / (1 – (TN / (FP + TN)))
Negative Likelihood Ratio: LR-
(1 – Sensitivity) / Specificity
(1 – (TP / (TP + FN))) / (TN / (FP + TN))
Diagnostic Odds Ratio: Confidence intervals are calculated as in Scott I.A., Greenburg P.B., Poole P.J. (2008).
Positive Likelihood Ratio / Negative Likelihood Ratio
(TP * TN) / (FP * FN)
Weighted Positive Likelihood Ratio: WLR+. LR+ is weighted by prevalence.
(Prevalence * Sensitivity) / ((1 – Prevalence) * (1 – Specificity))
TP / FP
Weighted Negative Likelihood Ratio: WLR-. LR- is weighted by prevalence.
(Prevalence * (1 – Sensitivity)) / ((1 – Prevalence) * Specificity)
FN / TN
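The statistics in this section can all be computed from the four cells of the 2 x 2 Table. The following Python sketch applies the formulas above to the table of Example 1 (TP = 9, FP = 3, FN = 2, TN = 18). Youden's index is taken here as Sensitivity + Specificity – 1, which reproduces the value 0.6753 shown in the example. The function name and dictionary keys are hypothetical.

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Point estimates of the diagnostic test statistics (no intervals)."""
    total = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": (tp + tn) / total,
        "prevalence": (tp + fn) / total,
        "apparent_prevalence": (tp + fp) / total,
        "youden": sens + spec - 1,
        "ppv": tp / (tp + fp),
        "npv": tn / (fn + tn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "dor": (tp * tn) / (fp * fn),
        "wlr_pos": tp / fp,
        "wlr_neg": fn / tn,
    }


stats = diagnostic_stats(tp=9, fp=3, fn=2, tn=18)
for name, value in stats.items():
    print(f"{name:20s} {value:.4f}")
```

Confidence intervals are not covered by this sketch. A table with a zero cell would need special handling, since several of the ratios would divide by zero.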
7.2.6.4.4. ROC Analysis
ROC analysis is widely used in assessing the statistical significance of diagnostic laboratory tests. Sensitivity and specificity values are computed for all fitted Y values. The best cutoff point is determined according to the ROC Optimality Criterion selected in the Intermediate Inputs dialogue. This can be Maximum sum of sensitivity and specificity (i.e. Youden’s index) or the value nearest to the top-left corner of the ROC curve. The ROC curve is obtained by plotting the sensitivity values against 1 – specificity.
The output options for multiple ROC curves will not be available unless the Run a separate regression for each independent variable option is selected in the Intermediate Inputs dialogue.
Area Under the Curve (AUC): The area enclosed under the ROC curve is calculated by employing the algorithm developed by DeLong E.R., DeLong D.M., Clarke-Pearson D.L. (1988), which is based on the Mann-Whitney U test statistic. This not only produces an identical result to the area calculated by the trapezoidal rule, but also provides the standard errors (covariances) necessary to statistically compare AUCs, based on the nonparametric U-distribution (see 6.4.1.1. Mann-Whitney U Test).
AUC is a measure of the predictive power of the model. A value of 0.5 (which means that the curve is a 45º line) shows that the model has no power. A value of 1 (the theoretical maximum) means the full, 100% explanatory power.
The output includes AUCs, their standard errors, tail probabilities and the confidence intervals (asymptotic normal).
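The link between the AUC and the Mann-Whitney U statistic can be sketched in a few lines of Python: the AUC is the proportion of (positive, negative) pairs in which the positive case has the higher fitted probability, with ties counted as one half. The fitted values below are hypothetical and only illustrate the calculation; standard errors are not computed.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the scaled Mann-Whitney U statistic (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical fitted probabilities for actual positives and negatives
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.2]
auc = auc_mann_whitney(positives, negatives)
print(f"{auc:.4f}")
```

Here 8 of the 9 pairs are correctly ordered, so the AUC is 8/9.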
Correlation and Covariance Matrices for Areas: These options are available only when the Run a separate regression for each independent variable option is selected in the Intermediate Inputs dialogue.
Multiple Comparisons for Areas: This option is available only when the Run a separate regression for each independent variable option is selected in the Intermediate Inputs dialogue. The difference between all possible pairs of AUCs, their standard errors and confidence intervals are displayed.
The output includes the difference between AUCs, their standard errors, tail probabilities and confidence intervals (asymptotic normal) and a further chi-square test with 1 degree of freedom.
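The comparison of two correlated AUCs can be sketched as follows, assuming that the standard error of the difference is obtained from the two variances and the covariance of the areas, and that the chi-square statistic is the square of the Z-statistic (which is consistent with the Example 2 output). The inputs are the rounded GPA and TUCE values from Example 2; the function name is hypothetical.

```python
import math


def compare_aucs(auc1, se1, auc2, se2, cov):
    """Return (difference, standard error, Z, 2-tail probability, chi-square)."""
    diff = auc1 - auc2
    se = math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * cov)
    z = diff / se
    p_two_tail = math.erfc(abs(z) / math.sqrt(2.0))  # normal 2-tail area
    return diff, se, z, p_two_tail, z * z


# GPA versus TUCE, using AUCs, standard errors and covariance from Example 2
diff, se, z, p2, chi2 = compare_aucs(0.7944, 0.1001, 0.6688, 0.1075, 0.0024)
print(f"{diff:.4f} {se:.4f} {z:.2f} {p2:.3f} {chi2:.2f}")
```

Because the inputs are rounded to four decimals, the results match the reported values only to two or three digits.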
ROC Table: All Statistics for Diagnostic Tests and their confidence intervals are computed for all fitted Y values (classification threshold probabilities). By default, only the sensitivity and specificity values and their confidence intervals are displayed. However, the user can choose to display any statistic with or without confidence intervals. The case (row) corresponding to the Classification Threshold Probability (the best cutoff point) is marked by an asterisk.
7.2.6.4.5. Receiver Operating Characteristic (ROC) Plot
Sensitivity and specificity values are computed for all classification threshold (cutoff) probabilities. For each probability, the sensitivity value is plotted against 1 – specificity. The best cutoff point is marked by a symbol on the curve.
When only one ROC curve is plotted, its confidence intervals are also displayed on the graph. The AUC, its confidence interval, sensitivity and specificity values are displayed in the legend. If there is only one independent variable, then its value corresponding to the best cutoff point is also displayed. When there are multiple independent variables, the best cutoff probability is displayed instead. The display of confidence intervals can be switched on or off from the Edit → Data Series dialogue.
If the Run a separate regression for each independent variable option is selected in the Intermediate Inputs dialogue (i.e. if multiple ROCs are compared), the ROC plot is not output for each independent variable separately. In this case only one plot is drawn at the end of the output, with multiple ROC curves and without confidence intervals. For each curve, the area enclosed under the curve (AUC) is displayed in the legend. The best cutoff point is also marked by a symbol on each curve.
7.2.6.4.6. Sensitivity-Specificity Plot
This plot is similar to the ROC plot, except that sensitivity and specificity values are plotted against all values of the classification probability. This plot is not available when the Run a separate regression for each independent variable option is selected in the Intermediate Inputs dialogue.
7.2.6.5. Logistic Regression Examples
Example 1
Open LOGIT and select Statistics 1 → Regression Analysis → Logistic Regression. From the Variable Selection Dialogue select GPA, TUCE and PSI (C12 to C14) as [Variable]s and GRADE (C15) as [Dependent]. On the Intermediate Inputs dialogue select the Run regression with all independent variables option.
Some of the following results have been shortened to save space.
Logistic Regression
Dependent Variable: GRADE
Minimum of dependent variable is encoded as 0 and the rest as 1.
Valid Number of Cases: 32, 0 Omitted
Regression Coefficients
            Coefficient  Standard Error  Wald Statistic  Probability  Lower 95%  Upper 95%
Constant       -13.0213          4.9313          6.9724       0.0083   -22.6866    -3.3561
GPA              2.8261          1.2629          5.0074       0.0252     0.3508     5.3014
TUCE             0.0952          0.1416          0.4519       0.5014    -0.1823     0.3726
PSI              2.3787          1.0646          4.9926       0.0255     0.2922     4.4652
Goodness of Fit Tests
                   -2 Log likelihood
Initial Model                41.1835
Final Model                  25.7793

                   Chi-Square Statistic  Degrees of Freedom  Right-Tail Probability
Pearson                         27.2571                  27                  0.4500
Likelihood Ratio                15.4042                   3                  0.0015
Hosmer-Lemeshow                  7.4526                   8                  0.4887

                    Pseudo R-squared
McFadden                      0.3740
Adjusted McFadden             0.1798
Cox & Snell                   0.3821
Nagelkerke                    0.5278
Correlation Matrix of Regression Coefficients
            Constant     GPA    TUCE     PSI
Constant      1.0000  0.7343  0.4960  0.4494
GPA           0.7343  1.0000  0.2065  0.3181
TUCE          0.4960  0.2065  1.0000  0.0990
PSI           0.4494  0.3181  0.0990  1.0000
Covariance Matrix of Regression Coefficients
            Constant     GPA    TUCE     PSI
Constant     24.3180  4.5735  0.3463  2.3592
GPA           4.5735  1.5950  0.0369  0.4276
TUCE          0.3463  0.0369  0.0200  0.0149
PSI           2.3592  0.4276  0.0149  1.1333
Odds Ratio
        Odds Ratio  Standard Error  Lower 95%  Upper 95%
GPA        16.8797         21.3181     1.4202   200.6239
TUCE        1.0998          0.1557     0.8334     1.4515
PSI        10.7907         11.4874     1.3393    86.9380
Hosmer-Lemeshow Table
      Actual Y = 0  Expected Y = 0  Actual Y = 1  Expected Y = 1  Total
1                4          3.8965             0          0.1035      4
2                3          2.8964             0          0.1036      3
3                3          2.8353             0          0.1647      3
4                2          2.7165             1          0.2835      3
5                2          2.4295             1          0.5705      3
6                4          2.7678             0          1.2322      4
7                1          1.4199             2          1.5801      3
8                1          1.1139             2          1.8861      3
9                0          0.6265             3          2.3735      3
10               1          0.2977             2          2.7023      3
Case (Diagnostic) Statistics
         Actual Y  Fitted Y  Predicted Group  Leverage  Cook’s Distance  Deviance
1          0.0000    0.0266           0.0000    0.0390           0.0011   -0.2321
2          0.0000    0.0595           0.0000    0.0545           0.0036   -0.3503
3          0.0000    0.1873           0.0000    0.0889           0.0225   -0.6440
…               …         …                …         …                …         …
30         1.0000    0.9453           1.0000    0.0853           0.0054    0.3353
F+ 31      0.0000    0.5291           1.0000    0.1171           0.1490   -1.2273
F- 32      1.0000    0.1110           0.0000    0.1299           1.1953    2.0966

         Residuals  Standardised Residuals  Logit Residuals  Studentised Residuals
1          -0.0266                 -0.1652          -1.0273                -0.2368
2          -0.0595                 -0.2515          -1.0633                -0.3602
3          -0.1873                 -0.4800          -1.2304                -0.6747
…                …                       …                …                      …
30          0.0547                  0.2405           1.0578                 0.3506
F+ 31      -0.5291                 -1.0600          -2.1237                -1.3061
F- 32       0.8890                  2.8296           9.0065                 2.2477
F+: False Positive
F-: False Negative
2 x 2 Table
Estimated \ Actual    Positive    Negative    Total
Positive                     9           3       12
                        81.82%      14.29%
Negative                     2          18       20
                        18.18%      85.71%
Total                       11          21       32
                       100.00%     100.00%
Classification Threshold Probability = 0.4211
Statistics for Diagnostic Tests
Confidence Intervals: Row 1: Asymptotic Normal, Row 2: Exact Binomial
                                       Value  Standard Error  Lower 95%  Upper 95%
Sensitivity                           0.8182          0.1163     0.5903     1.0000
                                                                 0.4822     0.9772
Specificity                           0.8571          0.0764     0.7075     1.0000
                                                                 0.6366     0.9695
Accuracy                              0.8438          0.0642     0.7179     0.9696
                                                                 0.6721     0.9472
Prevalence                            0.3438          0.0840     0.1792     0.5083
                                                                 0.1857     0.5319
Apparent Prevalence                   0.3750          0.0856     0.2073     0.5427
                                                                 0.2110     0.5631
Youden’s Index                        0.6753
                                                                 0.1188     0.9467
Positive Predictive Value             0.7500          0.1250     0.5050     0.9950
                                                                 0.4281     0.9451
Negative Predictive Value             0.9000          0.0671     0.7685     1.0000
                                                                 0.6830     0.9877
Positive Likelihood Ratio             5.7273                     1.9371    16.9334
Negative Likelihood Ratio             0.2121                     0.0596     0.7547
Diagnostic Odds Ratio                27.0000                     3.8033   191.6750
Weighted Positive Likelihood Ratio    3.0000                     1.0678     8.4284
Weighted Negative Likelihood Ratio    0.1111                     0.0295     0.4183
Logistic Regression
Dependent Variable: GRADE
Minimum of dependent variable is encoded as 0 and the rest as 1.
Valid Number of Cases: 32, 0 Omitted
Area Under the Curve
   AUC  Standard Error  Z-Statistic  1-Tail Probability  2-Tail Probability  Lower 95%  Upper 95%
0.8831          0.0621       6.1721              0.0000              0.0000     0.7615     1.0048
ROC Table
* marks the best cutoff case.
Optimality criterion: Max(Sensitivity + Specificity)
Confidence Intervals: Asymptotic Normal
        Cutoff P <=  Sensitivity  Lower 95%  Upper 95%  Specificity  Lower 95%  Upper 95%
1            0.9453       0.0909     0.0000     0.2608       1.0000     1.0000     1.0000
2            0.9048       0.1818     0.0000     0.4097       1.0000     1.0000     1.0000
3            0.8521       0.1818     0.0000     0.4097       0.9524     0.8613     1.0000
…                 …            …          …          …            …          …          …
10           0.5699       0.7273     0.4641     0.9905       0.9048     0.7792     1.0000
11           0.5291       0.7273     0.4641     0.9905       0.8571     0.7075     1.0000
* 12         0.4811       0.8182     0.5903     1.0000       0.8571     0.7075     1.0000
…                 …            …          …          …            …          …          …
30           0.0265       1.0000     1.0000     1.0000       0.0952     0.0000     0.2208
31           0.0259       1.0000     1.0000     1.0000       0.0476     0.0000     0.1387
32           0.0245       1.0000     1.0000     1.0000       0.0000     0.0000     0.0000
Range of best Classification Threshold Probability = 0.3610 <> 0.4811
Example 2
Continuing from the above example, this time select the Run a separate regression for each independent variable option on the Intermediate Inputs dialogue. Also uncheck the first three output options Regression Results, Case (Diagnostic) Statistics and Classification by Group.
Logistic Regression
Dependent Variable: GRADE
Minimum of dependent variable is encoded as 0 and the rest as 1.
Valid Number of Cases: 32, 0 Omitted
Area Under the Curve
         AUC  Standard Error  Z-Statistic  1-Tail Prob  2-Tail Prob  Lower 95%  Upper 95%
GPA   0.7944          0.1001       2.9421       0.0016       0.0033     0.5983     0.9905
TUCE  0.6688          0.1075       1.5702       0.0582       0.1164     0.4581     0.8796
PSI   0.7208          0.0867       2.5477       0.0054       0.0108     0.5509     0.8906
Correlation Matrix of Areas
           GPA     TUCE      PSI
GPA     1.0000   0.2215  -0.2182
TUCE    0.2215   1.0000  -0.0997
PSI    -0.2182  -0.0997   1.0000
Covariance Matrix of Areas
           GPA     TUCE      PSI
GPA     0.0100   0.0024  -0.0019
TUCE    0.0024   0.0116  -0.0009
PSI    -0.0019  -0.0009   0.0075
Multiple Comparisons for Areas
             Difference  Standard Error  Z-Statistic  1-Tail Probability  2-Tail Probability
GPA – TUCE       0.1255          0.1296       0.9684              0.1664              0.3328
GPA – PSI        0.0736          0.1460       0.5042              0.3071              0.6141
TUCE – PSI      -0.0519          0.1447      -0.3591              0.3598              0.7195

             Lower 95%  Upper 95%  Chi-Square Statistic  Right-Tail Probability
GPA – TUCE     -0.1285     0.3796                0.9378                  0.3328
GPA – PSI      -0.2125     0.3597                0.2542                  0.6141
TUCE – PSI     -0.3355     0.2316                0.1289                  0.7195