7.2.5. Logit / Probit / Gompit
The logit, probit and gompit regressions can be used to estimate models with binary dependent variables (dependent variables that consist of two values) as well as the aggregated models where data contains a variable on the number of positive (or negative) responses and another variable giving the total number of subjects. All three regressions work with similar inputs, employ the same maximum likelihood method and share the same output format, but differ in the objective function used.
The Logit and Logistic Regression procedures are closely related. A logit analysis with a binary dependent variable will produce the same coefficients as those estimated by Logistic Regression. However, for such problems the Logistic Regression procedure should be preferred, since it:
1) reports odds ratios and their confidence intervals,
2) creates a classification table for the predicted and observed group memberships,
3) draws Receiver Operating Characteristic (ROC) and Sensitivity and Specificity curves, and
4) reports a wide range of Case (Diagnostic) Statistics.
On the other hand, the Logit / Probit / Gompit procedure must be used when:
1) the dependent variable is not in binary format (it is in aggregated format),
2) the natural response rate is to be estimated, or
3) a probit or gompit model is to be estimated.
Like other regression options, Logit / Probit / Gompit also allows for automatic creation of interaction terms and dummy variables.
7.2.5.1. Logit / Probit / Gompit Model Description
The logit function is the natural logarithm of the odds for a given probability value, e.g.:
Logit(p) = Ln(p/(1-p))
Logit(0.025) = -3.66
Logit(0.95) = 2.94.
Probit is the inverse of the standard normal cumulative distribution function for a given probability value, e.g.:
Probit(0.025) = -1.96
Probit(0.95) = 1.64.
The gompit function, which is the inverse of the complementary log-log (or cloglog) function, is defined as:
Gompit(x) = 1-Exp(-Exp(x))
The cloglog function itself is available as an axis scaling option in graphics (see Scale Type).
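The three functions described above can be checked numerically with a few lines of Python. This is only an illustrative sketch and not part of UNISTAT: the function names are chosen here for convenience, and the probit is taken from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def logit(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Inverse of the standard normal cumulative distribution function."""
    return NormalDist().inv_cdf(p)

def cloglog(p):
    """Complementary log-log transform of a probability."""
    return math.log(-math.log(1.0 - p))

def gompit(x):
    """Inverse of cloglog: maps any real value back to a probability."""
    return 1.0 - math.exp(-math.exp(x))

# The values quoted in the text, rounded to two decimals.
print(round(logit(0.025), 2), round(logit(0.95), 2))
print(round(probit(0.025), 2), round(probit(0.95), 2))
# Applying gompit to cloglog(p) recovers p.
print(gompit(cloglog(0.3)))
```

The last line illustrates the inverse relationship between the gompit and cloglog functions.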
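The logarithm of the likelihood function is:
L = Σi [ ri Ln(Fi) + (si - ri) Ln(1 - Fi) ]
and its first derivative with respect to the coefficient vector b is:
dL/db = Σi [ ri/Fi - (si - ri)/(1 - Fi) ] Gi xi
where:
ri is the number of responses,
si is the number of subjects,
xi is the vector of independent variables for case i and zi = xi'b is the linear predictor.
For the logit model:
Fi = 1/(1 + Exp(-zi))
Gi = Fi(1 - Fi)
for the probit model:
Fi is the cumulative normal probability and
Gi is the normal frequency (density) at zi,
and for the gompit model:
Fi = 1 - Exp(-Exp(zi))
Gi = Exp(zi) Exp(-Exp(zi))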
With a dichotomous dependent variable ri = yi (0 or 1) and si = 1.
A Newton-Raphson type maximum likelihood algorithm is employed to minimise the negative of the log likelihood function. The nature of this method implies that a solution (convergence) cannot always be achieved. In such cases, you are advised to edit the convergence parameters provided, in order to find the right levels for the particular problem at hand.
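To make the iteration concrete, the following is a minimal pure-Python sketch of a Newton-Raphson fit for the logit model on aggregated data. It is not UNISTAT's implementation: the function names, the zero starting values and the step-size stopping rule are assumptions made for illustration. The data are those of Example 2 in section 7.2.5.4, so the result can be compared with the coefficients reported there.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_logit(responses, subjects, rows, tol=1e-10, max_iter=100):
    """Newton-Raphson for a grouped logit model (constant term added first)."""
    xs = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(xs[0])
    beta = [0.0] * k                      # assumed starting values
    for _ in range(max_iter):
        grad = [0.0] * k                  # first derivative of the log likelihood
        info = [[0.0] * k for _ in range(k)]  # negative second derivative
        for r, s, x in zip(responses, subjects, xs):
            z = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(k):
                grad[i] += (r - s * p) * x[i]
                for j in range(k):
                    info[i][j] += s * p * (1.0 - p) * x[i] * x[j]
        step = solve(info, grad)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            return beta
    raise RuntimeError("convergence not achieved; try more iterations")

total = [477, 231, 63, 94, 150, 378, 32, 169, 12, 13, 7, 12, 11, 45, 4, 31]
good = [84, 75, 13, 35, 67, 201, 16, 102, 2, 7, 4, 8, 3, 27, 1, 23]
# Columns A, B, C, D for the 16 factor combinations.
factors = [(a, b, c, d) for d in (0, 1) for c in (0, 1) for b in (0, 1) for a in (0, 1)]

beta = fit_logit(good, total, factors)
print([round(b, 4) for b in beta])
```

The `RuntimeError` branch corresponds to the situation described above, where convergence is not reached within the permitted number of iterations.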
7.2.5.2. Logit / Probit / Gompit Variable Selection
Logit / Probit / Gompit can analyse data in two different formats:
1) Binary (casewise) data where the dependent variable is a dichotomous variable (it consists of two values), and

2) Aggregated (grouped) data, where similar cases are collapsed into groups to generate two columns, one containing the number of responses and the other the total number of subjects in the group.

When the first data option is selected, the dependent variable should ideally contain only two distinct values (numeric or string). However, UNISTAT will accept any column as the dependent variable and then consider those values which are equal to the minimum of this column as 0 and any other values as 1. This approach has the advantage and flexibility of running Logit / Probit / Gompit models on columns containing any type of categorical data. For instance, when a logit analysis is run on a column containing years 1995, 1996 and 1997, UNISTAT will internally code all 1995 entries as 0 and all 1996 and 1997 entries as 1. However, it is left to the user to ensure that the dependent variable selected contains sensible values.
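The encoding rule described above can be sketched in Python as follows. The function name `encode_binary` is hypothetical and the snippet only mirrors the stated rule (minimum value becomes 0, everything else becomes 1); it is not taken from UNISTAT itself.

```python
def encode_binary(values):
    """Code entries equal to the column minimum as 0 and all others as 1."""
    lowest = min(values)
    return [0 if v == lowest else 1 for v in values]

years = [1995, 1996, 1997, 1995, 1997]
print(encode_binary(years))
```

For the years example, all 1995 entries are coded as 0 and the 1996 and 1997 entries as 1.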
The following is an example for the first data type, where there is one dichotomous dependent variable and one independent variable:
| Dependent | Independent |
|---|---|
| 0 | 1.3 |
| 0 | 2.7 |
| 1 | 2.1 |
| 0 | 2.7 |
| 0 | 1.3 |
| 1 | 2.1 |
| 1 | 2.7 |
| 1 | 1.9 |
| 1 | 1.3 |
| 0 | 2.1 |
| 0 | 2.7 |
| 1 | 1.3 |
| 1 | 2.1 |
| 0 | 1.3 |
| 0 | 2.1 |
| 0 | 1.9 |
The same data set can be grouped (or collapsed) into the second (aggregated) format as follows:
| Responses | Subjects | Independent |
|---|---|---|
| 2 | 5 | 1.3 |
| 1 | 2 | 1.9 |
| 3 | 5 | 2.1 |
| 1 | 4 | 2.7 |
where the first variable is called the response variable (which represents the number of true values within the group), and the second the subjects (which represents the total number of cases in that group).
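The collapsing of casewise data into the aggregated format can be illustrated with a short Python sketch. The function name `collapse` is an assumption made for this illustration; the data are the sixteen cases listed in the binary example.

```python
def collapse(dependent, independent):
    """Group cases by the independent value; count responses and subjects."""
    groups = {}
    for y, x in zip(dependent, independent):
        responses, subjects = groups.get(x, (0, 0))
        groups[x] = (responses + y, subjects + 1)
    # One row per group: (Responses, Subjects, Independent).
    return [(r, s, x) for x, (r, s) in sorted(groups.items())]

dependent = [0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
independent = [1.3, 2.7, 2.1, 2.7, 1.3, 2.1, 2.7, 1.9,
               1.3, 2.1, 2.7, 1.3, 2.1, 1.3, 2.1, 1.9]

for row in collapse(dependent, independent):
    print(row)
```

Each printed row satisfies 0 ≤ Responses ≤ Subjects, and the subjects add up to the original sixteen cases.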
UNISTAT will first ask for the type of the dependent variable. If it is binary as described in (1) above, then select a dependent variable (by clicking on [Dependent]) which contains numeric or string categorical data, and any number of independent variables, which contain numeric data. If the data is in aggregated (or collapsed) form as described in (2) above, then select one column as Response (by clicking on [Response]) and one column as Subjects (by clicking on [Subject]). The following relation should hold for each case:
0 ≤ Response ≤ Subjects
Cases that do not conform to this will be considered as missing. As in Linear Regression, it is possible to create interaction terms and dummy variables, but not lag/lead terms (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).
It is also possible to select a factor (categorical) variable (by clicking on [Factor]), in which case the program will perform the analysis on a subgroup as defined by the user (see 7.2.1.1. Linear Regression Variable Selection).
Next, an intermediate inputs dialogue will pop up.

Tolerance: This value is used to control the sensitivity of the nonlinear minimisation procedure employed. Under normal circumstances, you do not need to edit this value. If convergence cannot be achieved, then larger values of this parameter can be tried by removing one or more zeros.
Maximum Number of Iterations: When convergence cannot be achieved with the default value of 100 function evaluations, a higher value can be tried.
Omit Level: This field will appear only when one or more dummy variables have been created in the Variable Selection Dialogue. Three options are available: (0) do not omit any levels, (1) omit the first level and (2) omit the last level. When no levels are omitted, the model will usually be over-parameterised (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).
Logit / Probit / Gompit: Select the model to be estimated.
When the aggregated data option is selected, UNISTAT will also ask whether a natural response rate is to be estimated or a fixed one will be given by the user. When a natural response rate is estimated, it will appear in the output just like any other estimated coefficient.

7.2.5.3. Logit / Probit / Gompit Output Options

Regression Results: The final value of the objective function, a goodness of fit test, parameter estimates, their standard errors, Z-statistics, tail probabilities and confidence intervals are displayed. The Goodness of Fit test compares the observed and expected number of responses. This is also known as Pearson’s chi-square statistic and it has a chi-square distribution with n – 1 – k degrees of freedom.
Expected Frequencies: Observed and expected responses, their differences and the expected probabilities are displayed.
Correlation Matrix for Regression Coefficients: Correlations between the estimated coefficients are displayed.
Covariance Matrix for Regression Coefficients: Diagonal elements are the coefficient variances and off diagonal elements are the covariances between coefficients.
7.2.5.4. Logit / Probit / Gompit Examples
Example 1
Table 12.19 on p. 353 from Altman, Douglas (1991). Open LOGIT, select Statistics 1 → Regression Analysis → Logit / Probit / Gompit and select the data option Two Columns Contain Number of Subjects and Number of Responses. Then select Total (C7) as [Subject], Hypertension (C8) as [Response] and Smoking, Obesity and Snoring (C9 to C11) as [Variable]s. Select only the Regression Results output option to obtain the following results:
Logit / Probit / Gompit
Regression Results
Model selected: Logit
Valid Number of Cases: 8, 0 Omitted
Response Variable: Hypertension
Subject Variable: Total
| | Coefficient | Standard Error | Z-Statistic | 1-Tail Probability | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Constant | -2.3777 | 0.3802 | -6.2540 | 0.0000 | -3.1228 | -1.6325 |
| Smoking | -0.0678 | 0.2781 | -0.2437 | 0.8075 | -0.6129 | 0.4773 |
| Obesity | 0.6953 | 0.2851 | 2.4390 | 0.0147 | 0.1366 | 1.2541 |
| Snoring | 0.8719 | 0.3976 | 2.1932 | 0.0283 | 0.0927 | 1.6512 |

-2 Log likelihood = 398.9164
Goodness of Fit:
Chi-Square Statistic = 1.3643
Degrees of Freedom = 4
Right-Tail Probability = 0.8504
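The Z-statistic and the 95% limits in the table above are consistent with the usual normal-approximation formulas: the coefficient divided by its standard error, and the coefficient plus or minus the normal critical value times the standard error. The following Python sketch reproduces the Constant row under that assumption; the function name `wald_summary` is hypothetical, and small differences arise because the printed inputs are rounded to four decimals.

```python
from statistics import NormalDist

def wald_summary(coefficient, standard_error, level=0.95):
    """Return the Z-statistic and the lower and upper confidence limits."""
    z = coefficient / standard_error
    critical = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    half_width = critical * standard_error
    return z, coefficient - half_width, coefficient + half_width

# Constant row of the Example 1 output.
z, lower, upper = wald_summary(-2.3777, 0.3802)
print(round(z, 3), round(lower, 3), round(upper, 3))
```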
Go back to the Variable Selection Dialogue, omit Smoking (C9) from the independent variable list and run the analysis again.
Regression Results
Model selected: Logit
Valid Number of Cases: 8, 0 Omitted
Response Variable: Hypertension
Subject Variable: Total
| | Coefficient | Standard Error | Z-Statistic | 1-Tail Probability | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Constant | -2.3921 | 0.3757 | -6.3662 | 0.0000 | -3.1285 | -1.6556 |
| Obesity | 0.6954 | 0.2851 | 2.4395 | 0.0147 | 0.1367 | 1.2541 |
| Snoring | 0.8655 | 0.3967 | 2.1819 | 0.0291 | 0.0880 | 1.6429 |

-2 Log likelihood = 398.9761
Goodness of Fit:
Chi-Square Statistic = 1.3854
Degrees of Freedom = 5
Right-Tail Probability = 0.9259
Example 2
Example 12.10 on p. 429 from Armitage, P. & G. Berry (1994). Data given in Table 12.9 needs to be transformed into a suitable format where the main effects of the four factors A, B, C and D can be analysed. This is done by creating a new column for each factor such that it contains the value one if the factor occurs in the factor combination column and zero otherwise. The data matrix would then look like this:
| Total | Good | A | B | C | D |
|---|---|---|---|---|---|
| 477 | 84 | 0 | 0 | 0 | 0 |
| 231 | 75 | 1 | 0 | 0 | 0 |
| 63 | 13 | 0 | 1 | 0 | 0 |
| 94 | 35 | 1 | 1 | 0 | 0 |
| 150 | 67 | 0 | 0 | 1 | 0 |
| 378 | 201 | 1 | 0 | 1 | 0 |
| 32 | 16 | 0 | 1 | 1 | 0 |
| 169 | 102 | 1 | 1 | 1 | 0 |
| 12 | 2 | 0 | 0 | 0 | 1 |
| 13 | 7 | 1 | 0 | 0 | 1 |
| 7 | 4 | 0 | 1 | 0 | 1 |
| 12 | 8 | 1 | 1 | 0 | 1 |
| 11 | 3 | 0 | 0 | 1 | 1 |
| 45 | 27 | 1 | 0 | 1 | 1 |
| 4 | 1 | 0 | 1 | 1 | 1 |
| 31 | 23 | 1 | 1 | 1 | 1 |
Open LOGIT, select Statistics 1 → Regression Analysis → Logit / Probit / Gompit and select the data option Two Columns Contain Number of Subjects and Number of Responses. Then select Total (C1) as [Subject], Good (C2) as [Response] and A, B, C, D (C3 to C6) as [Variable]s. Select all output options for the following results:
Logit / Probit / Gompit
Regression Results
Model selected: Logit
Valid Number of Cases: 16, 0 Omitted
Response Variable: Good
Subject Variable: Total
| | Coefficient | Standard Error | Z-Statistic | 1-Tail Probability | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Constant | -1.4604 | 0.0964 | -15.1490 | 0.0000 | -1.6494 | -1.2715 |
| A | 0.6498 | 0.1154 | 5.6298 | 0.0000 | 0.4236 | 0.8760 |
| B | 0.3101 | 0.1222 | 2.5377 | 0.0112 | 0.0706 | 0.5496 |
| C | 0.9806 | 0.1107 | 8.8560 | 0.0000 | 0.7636 | 1.1976 |
| D | 0.4204 | 0.1910 | 2.2011 | 0.0277 | 0.0461 | 0.7947 |

-2 Log likelihood = 2104.1204
Goodness of Fit:
Chi-Square Statistic = 13.6067
Degrees of Freedom = 11
Right-Tail Probability = 0.2555
Expected Frequencies
| Row | Obs Responses | Exp Responses | Residuals | Probability |
|---|---|---|---|---|
| 1 | 84.0000 | 89.8673 | -5.8673 | 0.1884 |
| 2 | 75.0000 | 71.0915 | 3.9085 | 0.3078 |
| 3 | 13.0000 | 15.1470 | -2.1470 | 0.2404 |
| 4 | 35.0000 | 35.4771 | -0.4771 | 0.3774 |
| 5 | 67.0000 | 57.3442 | 9.6558 | 0.3823 |
| 6 | 201.0000 | 205.0245 | -4.0245 | 0.5424 |
| 7 | 16.0000 | 14.6455 | 1.3545 | 0.4577 |
| 8 | 102.0000 | 104.4028 | -2.4028 | 0.6178 |
| 9 | 2.0000 | 3.1336 | -1.1336 | 0.2611 |
| 10 | 7.0000 | 5.2474 | 1.7526 | 0.4036 |
| 11 | 4.0000 | 2.2764 | 1.7236 | 0.3252 |
| 12 | 8.0000 | 5.7596 | 2.2404 | 0.4800 |
| 13 | 3.0000 | 5.3365 | -2.3365 | 0.4851 |
| 14 | 27.0000 | 28.9549 | -1.9549 | 0.6434 |
| 15 | 1.0000 | 2.2493 | -1.2493 | 0.5623 |
| 16 | 23.0000 | 22.0422 | 0.9578 | 0.7110 |
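The goodness of fit statistic reported for this model can be reproduced from the Expected Frequencies table. The sketch below assumes the usual Pearson form for grouped binomial data, the sum of (observed − expected)² / (expected × (1 − probability)), which agrees with the printed value up to rounding of the four-decimal inputs. The variable and function names are chosen for this illustration only.

```python
observed = [84, 75, 13, 35, 67, 201, 16, 102, 2, 7, 4, 8, 3, 27, 1, 23]
expected = [89.8673, 71.0915, 15.1470, 35.4771, 57.3442, 205.0245,
            14.6455, 104.4028, 3.1336, 5.2474, 2.2764, 5.7596,
            5.3365, 28.9549, 2.2493, 22.0422]
probability = [0.1884, 0.3078, 0.2404, 0.3774, 0.3823, 0.5424,
               0.4577, 0.6178, 0.2611, 0.4036, 0.3252, 0.4800,
               0.4851, 0.6434, 0.5623, 0.7110]

def pearson_chi_square(obs, exp, prob):
    """Pearson chi-square for grouped binomial data."""
    return sum((o - e) ** 2 / (e * (1.0 - p))
               for o, e, p in zip(obs, exp, prob))

chi_square = pearson_chi_square(observed, expected, probability)
# n - 1 - k, with n = 16 cases and k = 4 independent variables (A, B, C, D).
degrees_of_freedom = len(observed) - 1 - 4
print(round(chi_square, 2), degrees_of_freedom)
```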
Correlation Matrix of Regression Coefficients
| | Constant | A | B | C | D |
|---|---|---|---|---|---|
| Constant | 1.0000 | -0.5095 | -0.1961 | -0.3929 | -0.0716 |
| A | -0.5095 | 1.0000 | -0.1534 | -0.2952 | -0.0430 |
| B | -0.1961 | -0.1534 | 1.0000 | -0.0014 | -0.0810 |
| C | -0.3929 | -0.2952 | -0.0014 | 1.0000 | -0.0569 |
| D | -0.0716 | -0.0430 | -0.0810 | -0.0569 | 1.0000 |
Covariance Matrix of Regression Coefficients
| | Constant | A | B | C | D |
|---|---|---|---|---|---|
| Constant | 0.0093 | -0.0057 | -0.0023 | -0.0042 | -0.0013 |
| A | -0.0057 | 0.0133 | -0.0022 | -0.0038 | -0.0009 |
| B | -0.0023 | -0.0022 | 0.0149 | -0.0000 | -0.0019 |
| C | -0.0042 | -0.0038 | -0.0000 | 0.0123 | -0.0012 |
| D | -0.0013 | -0.0009 | -0.0019 | -0.0012 | 0.0365 |
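The link between the covariance matrix, the standard errors and the correlation matrix can be illustrated with the printed values. In the Python sketch below, the standard errors are the square roots of the diagonal elements and each correlation is a covariance divided by the product of the two standard errors. Because the covariances are printed to only four decimals, the results agree with the tables above approximately rather than exactly; the function names are assumptions for this illustration.

```python
import math

names = ["Constant", "A", "B", "C", "D"]
covariance = [
    [0.0093, -0.0057, -0.0023, -0.0042, -0.0013],
    [-0.0057, 0.0133, -0.0022, -0.0038, -0.0009],
    [-0.0023, -0.0022, 0.0149, -0.0000, -0.0019],
    [-0.0042, -0.0038, -0.0000, 0.0123, -0.0012],
    [-0.0013, -0.0009, -0.0019, -0.0012, 0.0365],
]

def standard_errors(cov):
    """Square roots of the diagonal (coefficient variances)."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]

def correlation_matrix(cov):
    """Scale each covariance by the two corresponding standard errors."""
    se = standard_errors(cov)
    n = len(cov)
    return [[cov[i][j] / (se[i] * se[j]) for j in range(n)] for i in range(n)]

se = standard_errors(covariance)
corr = correlation_matrix(covariance)
for name, value in zip(names, se):
    print(name, round(value, 3))
print(round(corr[0][1], 2))
```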
Next go back to the Variable Selection Dialogue and check the Probit option. Select only the Regression Results output option.
Regression Results
Model selected: Probit
Valid Number of Cases: 16, 0 Omitted
Response Variable: Good
Subject Variable: Total
| | Coefficient | Standard Error | Z-Statistic | 1-Tail Probability | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Constant | -0.8933 | 0.0561 | -15.9286 | 0.0000 | -1.0032 | -0.7833 |
| A | 0.3963 | 0.0698 | 5.6740 | 0.0000 | 0.2594 | 0.5332 |
| B | 0.1890 | 0.0749 | 2.5238 | 0.0116 | 0.0422 | 0.3359 |
| C | 0.6027 | 0.0675 | 8.9292 | 0.0000 | 0.4704 | 0.7350 |
| D | 0.2584 | 0.1169 | 2.2106 | 0.0271 | 0.0293 | 0.4876 |

-2 Log likelihood = 2103.5279
Goodness of Fit:
Chi-Square Statistic = 12.9949
Degrees of Freedom = 11
Right-Tail Probability = 0.2937
Example 3
Example 19.1 on p. 876 from Greene, W. H. (1997). Data is given in Table 19.1 and results for all three models are given in Table 19.2. Table 19.3 on p. 886 also displays the standard errors for logit and probit models. The binary dependent variable GRADE indicates whether a student’s grade on an examination improved after exposure to a new method of teaching.
Open LOGIT, select Statistics 1 → Regression Analysis → Logit / Probit / Gompit and select the data option Dependent Variable Contains Binary Data. Then select GRADE (C12) as [Dependent] and GPA, TUCE and PSI (C13 to C15) as [Variable]s. Select only the Regression Results output option and then run the example for logit, probit and gompit models separately.
Logit / Probit / Gompit
Regression Results
Model selected: Logit
Valid Number of Cases: 32, 0 Omitted
Dependent Variable: GRADE
Minimum of dependent variable is encoded as 0 and the rest as 1.
| | Coefficient | Standard Error | Z-Statistic | 1-Tail Probability | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Constant | -13.0213 | 4.9313 | -2.6405 | 0.0134 | -22.6866 | -3.3561 |
| GPA | 2.8261 | 1.2629 | 2.2377 | 0.0334 | 0.3508 | 5.3014 |
| TUCE | 0.0952 | 0.1416 | 0.6722 | 0.5069 | -0.1823 | 0.3726 |
| PSI | 2.3787 | 1.0646 | 2.2344 | 0.0336 | 0.2922 | 4.4652 |

-2 Log likelihood = 25.7793
Goodness of Fit:
Chi-Square Statistic = 27.2571
Degrees of Freedom = 27
Right-Tail Probability = 0.4500
Regression Results
Model selected: Probit
Valid Number of Cases: 32, 0 Omitted
Dependent Variable: GRADE
Minimum of dependent variable is encoded as 0 and the rest as 1.
| | Coefficient | Standard Error | Z-Statistic | 1-Tail Probability | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Constant | -7.4523 | 2.5425 | -2.9311 | 0.0067 | -12.4355 | -2.4692 |
| GPA | 1.6258 | 0.6939 | 2.3431 | 0.0265 | 0.2658 | 2.9858 |
| TUCE | 0.0517 | 0.0839 | 0.6166 | 0.5425 | -0.1127 | 0.2162 |
| PSI | 1.4263 | 0.5950 | 2.3970 | 0.0234 | 0.2601 | 2.5926 |

-2 Log likelihood = 25.6376
Goodness of Fit:
Chi-Square Statistic = 26.2516
Degrees of Freedom = 27
Right-Tail Probability = 0.5047
Regression Results
Model selected: Gompit
Valid Number of Cases: 32, 0 Omitted
Dependent Variable: GRADE
Minimum of dependent variable is encoded as 0 and the rest as 1.
| | Coefficient | Standard Error | Z-Statistic | 1-Tail Probability | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Constant | -10.0314 | 3.4608 | -2.8986 | 0.0072 | -16.8144 | -3.2484 |
| GPA | 2.2936 | 1.1096 | 2.0670 | 0.0481 | 0.1187 | 4.4684 |
| TUCE | 0.0412 | 0.2447 | 0.1682 | 0.8676 | -0.4384 | 0.5207 |
| PSI | 1.5623 | 0.9675 | 1.6148 | 0.1176 | -0.3340 | 3.4585 |

-2 Log likelihood = 26.0160
Goodness of Fit:
Chi-Square Statistic = 27.9993
Degrees of Freedom = 27
Right-Tail Probability = 0.4110