UNISTAT - the ultimate Excel statistics add-in

7.2.5. Logit / Probit / Gompit

The logit, probit and gompit regressions can be used to estimate models with binary dependent variables (dependent variables that consist of two values) as well as the aggregated models where data contains a variable on the number of positive (or negative) responses and another variable giving the total number of subjects. All three regressions work with similar inputs, employ the same maximum likelihood method and share the same output format, but differ in the objective function used.

The logit and Logistic Regression procedures are closely related. A logit analysis with a binary dependent variable will produce the same coefficients as those estimated by Logistic Regression. However, for such problems the Logistic Regression procedure should be preferred, since it:

1)       reports odds ratios and their confidence intervals,

2)       creates a classification table for the predicted and observed group memberships,

3)       draws Receiver Operating Characteristic (ROC) and Sensitivity and Specificity curves, and

4)       reports a wide range of Case (Diagnostic) Statistics.

On the other hand, the Logit / Probit / Gompit procedure must be used when:

1)       the dependent variable is not in binary format (it is in aggregated format),

2)       the natural response rate is to be estimated or

3)       a probit or gompit model is to be estimated.

Like other regression options, Logit / Probit / Gompit also allows for automatic creation of interaction terms and dummy variables.

7.2.5.1. Logit / Probit / Gompit Model Description

The logit function is the log odds function for a given probability value, e.g.:

Logit(p) = Ln(p/(1-p))

Logit(0.025) = -3.66

Logit(0.95) = 2.94.

Probit is the inverse standard cumulative normal distribution for a given probability value, e.g.:

Probit(0.025) = -1.96

Probit(0.95) = 1.64.

The gompit function, which is the inverse of the complementary log-log (or cloglog) function, is defined as:

Gompit(p) = 1-Exp(-Exp(p))

The cloglog function itself is available as an axis scaling option in graphics (see Scale Type).
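The three functions above can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names are illustrative and are not part of UNISTAT. The gompit function follows the definition given above, and cloglog is included only to show that the two are inverses of each other.

```python
import math
from statistics import NormalDist

def logit(p):
    """Log odds of a probability p, 0 < p < 1."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Inverse of the standard normal cumulative distribution."""
    return NormalDist().inv_cdf(p)

def cloglog(p):
    """Complementary log-log transform of a probability p."""
    return math.log(-math.log(1.0 - p))

def gompit(x):
    """Inverse of cloglog, as defined in the text: 1 - Exp(-Exp(x))."""
    return 1.0 - math.exp(-math.exp(x))

for p in (0.025, 0.95):
    print(f"p={p}: logit={logit(p):.4f}, probit={probit(p):.4f}, "
          f"gompit(cloglog(p))={gompit(cloglog(p)):.4f}")
```

The printed logit and probit values agree with the examples above to the two decimals shown there, and gompit(cloglog(p)) returns p.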

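The logarithm of the likelihood function is:

$$\ln L = \sum_i \left[ r_i \ln F_i + (s_i - r_i) \ln (1 - F_i) \right]$$

and its first derivative is:

$$\frac{\partial \ln L}{\partial \beta} = \sum_i \left[ \frac{r_i}{F_i} - \frac{s_i - r_i}{1 - F_i} \right] G_i x_i$$

where:

$r_i$ is the number of responses,

$s_i$ is the number of subjects,

$x_i$ is the vector of independent variables for case $i$ and $\beta$ is the vector of coefficients.

For the logit model:

$$F_i = \frac{1}{1 + e^{-\beta' x_i}}$$

$$G_i = \frac{e^{-\beta' x_i}}{\left(1 + e^{-\beta' x_i}\right)^2}$$

for the probit model:

$F_i$ is the cumulative normal probability and

$G_i$ is the normal frequency (density) at $\beta' x_i$,

and for the gompit model:

$$F_i = 1 - \exp\left(-\exp(\beta' x_i)\right)$$

$$G_i = \exp(\beta' x_i) \exp\left(-\exp(\beta' x_i)\right)$$

With a dichotomous dependent variable $r_i = y_i$ (0 or 1) and $s_i = 1$.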

A Newton-Raphson type maximum likelihood algorithm is employed to minimise the negative of the log likelihood function. The nature of this method implies that a solution (convergence) cannot always be achieved. In such cases, you are advised to edit the convergence parameters provided, in order to find the right levels for the particular problem at hand.
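To illustrate how such an iteration proceeds, and why the tolerance and iteration limit matter, here is a minimal Python sketch of Newton-Raphson for a grouped logit model with a constant and a single independent variable. It is not UNISTAT's implementation: the function name, the step-size stopping rule and the default values of tol and max_iter are assumptions made for this example. The data are the four groups of the small aggregated data set used in the Variable Selection section.

```python
import math

def fit_logit_grouped(r, s, x, tol=1e-8, max_iter=100):
    """Newton-Raphson for a logit model with a constant and one slope.

    r, s, x are per-group responses, subjects and regressor values.
    Returns (b0, b1, iterations, converged).
    """
    b0 = b1 = 0.0
    for it in range(1, max_iter + 1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for ri, si, xi in zip(r, s, x):
            f = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # response probability
            w = si * f * (1.0 - f)                       # binomial variance weight
            g0 += ri - si * f                            # score for the constant
            g1 += (ri - si * f) * xi                     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det                 # solve the 2x2 system H d = g
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:                  # assumed stopping rule
            return b0, b1, it, True
    return b0, b1, max_iter, False

r = [2, 1, 3, 1]            # responses per group
s = [5, 2, 5, 4]            # subjects per group
x = [1.3, 1.9, 2.1, 2.7]    # independent variable

b0, b1, iterations, converged = fit_logit_grouped(r, s, x)
print(b0, b1, iterations, converged)
```

If the function is called with a very small max_iter, it returns converged = False, which corresponds to the situation where the convergence parameters need to be adjusted.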

7.2.5.2. Logit / Probit / Gompit Variable Selection

Logit / Probit / Gompit can analyse data in two different formats:

1)       Binary (casewise) data where the dependent variable is a dichotomous variable (it consists of two values), and

2)       Aggregated (grouped) data, where similar cases are collapsed into groups to generate two columns, one containing the number of responses the other the total number of subjects in the group.

When the first data option is selected, the dependent variable should ideally contain only two distinct values (numeric or string). However, UNISTAT will accept any column as the dependent variable and then consider those values which are equal to the minimum of this column as 0 and any other values as 1. This approach has the advantage and flexibility of running Logit / Probit / Gompit models on columns containing any type of categorical data. For instance, when a logit analysis is run on a column containing years 1995, 1996 and 1997, UNISTAT will internally code all 1995 entries as 0 and all 1996 and 1997 entries as 1. However, it is left to the user to ensure that the dependent variable selected contains sensible values.
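The coding rule described above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes that the values in the column can be compared with each other (how UNISTAT orders string values is not stated here).

```python
def encode_binary(values):
    """Code entries equal to the column minimum as 0 and all others as 1."""
    lowest = min(values)
    return [0 if v == lowest else 1 for v in values]

years = [1995, 1996, 1997, 1995, 1997]
coded = encode_binary(years)
print(coded)
```

All 1995 entries are coded as 0 and the 1996 and 1997 entries as 1, as in the example in the text.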

The following is an example for the first data type, where there is one dichotomous dependent variable and one independent variable:

 

Dependent   Independent
0           1.3
0           2.7
1           2.1
0           2.7
0           1.3
1           2.1
1           2.7
1           1.9
1           1.3
0           2.1
0           2.7
1           1.3
1           2.1
0           1.3
0           2.1
0           1.9

 

The same data set can be grouped (or collapsed) into the second (aggregated) format as follows:

 

Responses   Subjects   Independent
2           5          1.3
1           2          1.9
3           5          2.1
1           4          2.7

where the first variable is called the response variable (which represents the number of true values within the group), and the second the subjects (which represents the total number of cases in that group).
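The collapsing of casewise data into the aggregated format can be sketched as follows. This is an illustrative Python example, not a UNISTAT procedure; the function name is hypothetical. It groups the sixteen binary cases listed above by the value of the independent variable.

```python
def aggregate(dependent, independent):
    """Collapse casewise 0/1 data into (responses, subjects, x) rows."""
    groups = {}
    for y, x in zip(dependent, independent):
        responses, subjects = groups.get(x, (0, 0))
        groups[x] = (responses + y, subjects + 1)
    return [(r, s, x) for x, (r, s) in sorted(groups.items())]

dependent = [0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
independent = [1.3, 2.7, 2.1, 2.7, 1.3, 2.1, 2.7, 1.9,
               1.3, 2.1, 2.7, 1.3, 2.1, 1.3, 2.1, 1.9]

grouped = aggregate(dependent, independent)
for row in grouped:
    print(row)
```

Each printed row gives the number of responses, the number of subjects and the value of the independent variable, matching the aggregated table above.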

UNISTAT will first ask for the type of the dependent variable. If it is binary as described in (1) above, then select a dependent variable (by clicking on [Dependent]) which contains numeric or string categorical data, and any number of independent variables, which contain numeric data. If the data is in aggregated (or collapsed) form as described in (2) above, then select one column as Response (by clicking on [Response]) and one column as Subjects (by clicking on [Subject]). The following relation should hold for each case:

      0 ≤ Response ≤ Subjects

Cases that do not conform to this will be considered as missing. As in Linear Regression, it is possible to create interaction terms and dummy variables, but not lag/lead terms (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).
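As a small illustration of the validity rule 0 ≤ Response ≤ Subjects, the following Python sketch separates conforming rows from those that would be treated as missing. The function name and the sample values are hypothetical.

```python
def split_valid_cases(responses, subjects):
    """Return (valid, missing) row indices under 0 <= Response <= Subjects."""
    valid, missing = [], []
    for i, (r, s) in enumerate(zip(responses, subjects)):
        (valid if 0 <= r <= s else missing).append(i)
    return valid, missing

# Hypothetical data: rows 1 and 2 violate the relation.
responses = [2, 6, -1, 3]
subjects = [5, 5, 4, 3]

valid, missing = split_valid_cases(responses, subjects)
print(valid, missing)
```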

It is also possible to select a factor (categorical) variable (by clicking on [Factor]) in which case the program will perform the analysis on a sub group as defined by the user (see 7.2.1.1. Linear Regression Variable Selection).

Next, an intermediate inputs dialogue will pop up.

Tolerance: This value is used to control the sensitivity of the nonlinear minimisation procedure employed. Under normal circumstances, you do not need to edit this value. If convergence cannot be achieved, then larger values of this parameter can be tried by removing one or more zeros.

Maximum Number of Iterations: When convergence cannot be achieved with the default value of 100 function evaluations, a higher value can be tried.

Omit Level: This field will appear only when one or more dummy variables have been created in the Variable Selection Dialogue. Three options are available: (0) do not omit any levels, (1) omit the first level and (2) omit the last level. When no levels are omitted, the model will usually be over-parameterised (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).

Logit / Probit / Gompit: Select the model to be estimated.

When the aggregated data option is selected, UNISTAT will also ask whether a natural response rate is to be estimated or a fixed one will be given by the user. When a natural response rate is estimated, it will appear in the output just like any other estimated coefficient.

7.2.5.3. Logit / Probit / Gompit Output Options

Regression Results: Final value of the objective function, a goodness of fit test, parameter estimates and their standard errors are displayed.

Goodness of Fit: A test between the observed and expected number of responses. This is also known as Pearson’s chi-square statistic and it has a chi-square distribution with n – 1 – k degrees of freedom.

Expected Frequencies: Observed and expected responses, their differences and the expected probabilities are displayed.

Correlation Matrix for Regression Coefficients: Correlations between the estimated coefficients are displayed.

Covariance Matrix for Regression Coefficients: Diagonal elements are the coefficient variances and off diagonal elements are the covariances between coefficients.

7.2.5.4. Logit / Probit / Gompit Examples

Example 1

Table 12.19 on p. 353 from Altman, Douglas (1991). Open LOGIT, select Statistics 1 → Regression Analysis → Logit / Probit / Gompit and select the data option Two Columns Contain Number of Subjects and Number of Responses. Then select Total (C7) as [Subject], Hypertension (C8) as [Response] and Smoking, Obesity and Snoring (C9 to C11) as [Variable]s. Select only the Regression Results output option to obtain the following results:

Logit / Probit / Gompit

Regression Results

Model selected: Logit
Valid Number of Cases: 8, 0 Omitted
Response Variable: Hypertension
Subject Variable: Total

             Coefficient   Standard Error   Z-Statistic   2-Tail Probability   Lower 95%   Upper 95%
Constant         -2.3777           0.3802       -6.2540               0.0000     -3.1228     -1.6325
Smoking          -0.0678           0.2781       -0.2437               0.8075     -0.6129      0.4773
Obesity           0.6953           0.2851        2.4390               0.0147      0.1366      1.2541
Snoring           0.8719           0.3976        2.1932               0.0283      0.0927      1.6512

-2 Log likelihood = 398.9164

Goodness of Fit:
Chi-Square Statistic = 1.3643
Degrees of Freedom = 4
Right-Tail Probability = 0.8504
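The derived columns of this table can be reproduced from the coefficient and its standard error. The Python sketch below is an illustration rather than UNISTAT's own code; the function name is hypothetical, and it assumes a standard normal reference distribution, which reproduces the printed values for this aggregated example when the probability is computed as a two-sided tail area.

```python
from statistics import NormalDist

def wald_summary(coefficient, std_error, level=0.95):
    """Return (z, two-sided probability, lower limit, upper limit)."""
    norm = NormalDist()
    z = coefficient / std_error
    p_two_sided = 2.0 * (1.0 - norm.cdf(abs(z)))
    half_width = norm.inv_cdf(0.5 + level / 2.0) * std_error
    return z, p_two_sided, coefficient - half_width, coefficient + half_width

# Obesity row of the table above: coefficient 0.6953, standard error 0.2851.
z, p, lower, upper = wald_summary(0.6953, 0.2851)
print(f"z={z:.4f}  p={p:.4f}  lower={lower:.4f}  upper={upper:.4f}")
```

Small differences in the last decimal place are to be expected, because the inputs used here are already rounded to four decimals.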

 

Go back to the Variable Selection Dialogue, omit Smoking (C9) from the independent variable list and run the analysis again.

Regression Results

Model selected: Logit
Valid Number of Cases: 8, 0 Omitted
Response Variable: Hypertension
Subject Variable: Total

             Coefficient   Standard Error   Z-Statistic   2-Tail Probability   Lower 95%   Upper 95%
Constant         -2.3921           0.3757       -6.3662               0.0000     -3.1285     -1.6556
Obesity           0.6954           0.2851        2.4395               0.0147      0.1367      1.2541
Snoring           0.8655           0.3967        2.1819               0.0291      0.0880      1.6429

-2 Log likelihood = 398.9761

Goodness of Fit:
Chi-Square Statistic = 1.3854
Degrees of Freedom = 5
Right-Tail Probability = 0.9259

 

Example 2

Example 12.10 on p. 429 from Armitage, P. & G. Berry (1994). Data given in Table 12.9 needs to be transformed into a suitable format where the main effects of the four factors A, B, C and D can be analysed. This is done by creating a new column for each factor such that it contains the value one if the factor occurs in the factor combination column and zero otherwise. The data matrix would then look like this:

 

Total   Good   A   B   C   D
477     84     0   0   0   0
231     75     1   0   0   0
63      13     0   1   0   0
94      35     1   1   0   0
150     67     0   0   1   0
378     201    1   0   1   0
32      16     0   1   1   0
169     102    1   1   1   0
12      2      0   0   0   1
13      7      1   0   0   1
7       4      0   1   0   1
12      8      1   1   0   1
11      3      0   0   1   1
45      27     1   0   1   1
4       1      0   1   1   1
31      23     1   1   1   1
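The indicator columns A to D can be generated mechanically from a factor combination label. The Python sketch below illustrates this; the labels in the combinations list (including "(1)" for the combination with no factor present) are assumed for the example, since Table 12.9 itself is not reproduced here.

```python
FACTORS = ["A", "B", "C", "D"]

def indicator_row(combination):
    """Return 1 for each factor named in the combination label, else 0."""
    return [1 if factor in combination else 0 for factor in FACTORS]

# Hypothetical labels, in the row order of the data matrix above.
combinations = ["(1)", "A", "B", "AB", "C", "AC", "BC", "ABC",
                "D", "AD", "BD", "ABD", "CD", "ACD", "BCD", "ABCD"]

indicators = [indicator_row(c) for c in combinations]
for label, row in zip(combinations, indicators):
    print(f"{label:<5}", *row)
```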

 

Open LOGIT, select Statistics 1 → Regression Analysis → Logit / Probit / Gompit and select the data option Two Columns Contain Number of Subjects and Number of Responses. Then select Total (C1) as [Subject], Good (C2) as [Response] and A, B, C, D (C3 to C6) as [Variable]s. Select all output options for the following results:

Logit / Probit / Gompit

Regression Results

Model selected: Logit
Valid Number of Cases: 16, 0 Omitted
Response Variable: Good
Subject Variable: Total

             Coefficient   Standard Error   Z-Statistic   2-Tail Probability   Lower 95%   Upper 95%
Constant         -1.4604           0.0964      -15.1490               0.0000     -1.6494     -1.2715
A                 0.6498           0.1154        5.6298               0.0000      0.4236      0.8760
B                 0.3101           0.1222        2.5377               0.0112      0.0706      0.5496
C                 0.9806           0.1107        8.8560               0.0000      0.7636      1.1976
D                 0.4204           0.1910        2.2011               0.0277      0.0461      0.7947

-2 Log likelihood = 2104.1204

Goodness of Fit:
Chi-Square Statistic = 13.6067
Degrees of Freedom = 11
Right-Tail Probability = 0.2555
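For readers who wish to check these logit estimates independently, the following Python sketch fits the same grouped model by Newton-Raphson using only the standard library. It is an illustration under stated assumptions, not UNISTAT's algorithm: the function names, the starting values (all zero) and the step-size stopping rule are choices made for this example.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def prob(beta, row):
    """Logit response probability for one design row."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))

def fit_logit(r, s, design, tol=1e-10, max_iter=100):
    """Newton-Raphson estimates for a grouped logit model."""
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for ri, si, row in zip(r, s, design):
            f = prob(beta, row)
            for j in range(k):
                score[j] += (ri - si * f) * row[j]
                for l in range(k):
                    info[j][l] += si * f * (1.0 - f) * row[j] * row[l]
        step = solve(info, score)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta

def neg2_log_likelihood(beta, r, s, design):
    """-2 times the binomial log likelihood (without the constant term)."""
    loglik = 0.0
    for ri, si, row in zip(r, s, design):
        f = prob(beta, row)
        loglik += ri * math.log(f) + (si - ri) * math.log(1.0 - f)
    return -2.0 * loglik

subjects = [477, 231, 63, 94, 150, 378, 32, 169, 12, 13, 7, 12, 11, 45, 4, 31]
good = [84, 75, 13, 35, 67, 201, 16, 102, 2, 7, 4, 8, 3, 27, 1, 23]
# Rows are [constant, A, B, C, D] in the order of the data matrix above.
design = [[1, i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1] for i in range(16)]

beta = fit_logit(good, subjects, design)
print([round(b, 4) for b in beta])
print(round(neg2_log_likelihood(beta, good, subjects, design), 4))
```

The estimates should agree with the Coefficient column and the -2 Log likelihood value above up to rounding.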

 

Expected Frequencies

Row   Obs Responses   Exp Responses   Residuals   Probability
1           84.0000         89.8673     -5.8673        0.1884
2           75.0000         71.0915      3.9085        0.3078
3           13.0000         15.1470     -2.1470        0.2404
4           35.0000         35.4771     -0.4771        0.3774
5           67.0000         57.3442      9.6558        0.3823
6          201.0000        205.0245     -4.0245        0.5424
7           16.0000         14.6455      1.3545        0.4577
8          102.0000        104.4028     -2.4028        0.6178
9            2.0000          3.1336     -1.1336        0.2611
10           7.0000          5.2474      1.7526        0.4036
11           4.0000          2.2764      1.7236        0.3252
12           8.0000          5.7596      2.2404        0.4800
13           3.0000          5.3365     -2.3365        0.4851
14          27.0000         28.9549     -1.9549        0.6434
15           1.0000          2.2493     -1.2493        0.5623
16          23.0000         22.0422      0.9578        0.7110
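The goodness of fit statistic reported for this example can be recovered from the Expected Frequencies table. The Python sketch below sums the squared residuals divided by the binomial variance of each group; this form of Pearson's statistic is an assumption about the computation, but it reproduces the reported value of 13.6067 to within the rounding of the tabulated figures.

```python
observed = [84, 75, 13, 35, 67, 201, 16, 102, 2, 7, 4, 8, 3, 27, 1, 23]
expected = [89.8673, 71.0915, 15.1470, 35.4771, 57.3442, 205.0245,
            14.6455, 104.4028, 3.1336, 5.2474, 2.2764, 5.7596,
            5.3365, 28.9549, 2.2493, 22.0422]
probability = [0.1884, 0.3078, 0.2404, 0.3774, 0.3823, 0.5424,
               0.4577, 0.6178, 0.2611, 0.4036, 0.3252, 0.4800,
               0.4851, 0.6434, 0.5623, 0.7110]

def pearson_chi_square(obs, exp, prob):
    """Sum of (obs - exp)^2 / (exp * (1 - p)) over all groups."""
    return sum((o - e) ** 2 / (e * (1.0 - p))
               for o, e, p in zip(obs, exp, prob))

chi_square = pearson_chi_square(observed, expected, probability)
degrees_of_freedom = len(observed) - 1 - 4   # n - 1 - k with k = 4 variables
print(round(chi_square, 2), degrees_of_freedom)
```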

 

Correlation Matrix of Regression Coefficients

             Constant          A          B          C          D
Constant       1.0000    -0.5095    -0.1961    -0.3929    -0.0716
A             -0.5095     1.0000    -0.1534    -0.2952    -0.0430
B             -0.1961    -0.1534     1.0000    -0.0014    -0.0810
C             -0.3929    -0.2952    -0.0014     1.0000    -0.0569
D             -0.0716    -0.0430    -0.0810    -0.0569     1.0000

 

Covariance Matrix of Regression Coefficients

             Constant          A          B          C          D
Constant       0.0093    -0.0057    -0.0023    -0.0042    -0.0013
A             -0.0057     0.0133    -0.0022    -0.0038    -0.0009
B             -0.0023    -0.0022     0.0149    -0.0000    -0.0019
C             -0.0042    -0.0038    -0.0000     0.0123    -0.0012
D             -0.0013    -0.0009    -0.0019    -0.0012     0.0365
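The covariance matrix, the standard errors and the correlation matrix are linked: the standard errors are the square roots of the diagonal elements, and each correlation is a covariance divided by the product of the two standard errors. The Python sketch below illustrates this using the four-decimal values printed above, so its results agree with the reported standard errors and correlations only approximately. The function names are illustrative.

```python
import math

names = ["Constant", "A", "B", "C", "D"]
cov = [
    [0.0093, -0.0057, -0.0023, -0.0042, -0.0013],
    [-0.0057, 0.0133, -0.0022, -0.0038, -0.0009],
    [-0.0023, -0.0022, 0.0149, 0.0, -0.0019],
    [-0.0042, -0.0038, 0.0, 0.0123, -0.0012],
    [-0.0013, -0.0009, -0.0019, -0.0012, 0.0365],
]

def standard_errors(c):
    """Square roots of the diagonal elements of a covariance matrix."""
    return [math.sqrt(c[i][i]) for i in range(len(c))]

def correlations(c):
    """Convert a covariance matrix into a correlation matrix."""
    se = standard_errors(c)
    n = len(c)
    return [[c[i][j] / (se[i] * se[j]) for j in range(n)] for i in range(n)]

se = standard_errors(cov)
corr = correlations(cov)
for name, e, row in zip(names, se, corr):
    print(f"{name:<10}", f"{e:.4f}", " ".join(f"{v:8.4f}" for v in row))
```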

 

Next go back to the Variable Selection Dialogue and check the Probit option. Select only the Regression Results output option.

Regression Results

Model selected: Probit
Valid Number of Cases: 16, 0 Omitted
Response Variable: Good
Subject Variable: Total

             Coefficient   Standard Error   Z-Statistic   2-Tail Probability   Lower 95%   Upper 95%
Constant         -0.8933           0.0561      -15.9286               0.0000     -1.0032     -0.7833
A                 0.3963           0.0698        5.6740               0.0000      0.2594      0.5332
B                 0.1890           0.0749        2.5238               0.0116      0.0422      0.3359
C                 0.6027           0.0675        8.9292               0.0000      0.4704      0.7350
D                 0.2584           0.1169        2.2106               0.0271      0.0293      0.4876

-2 Log likelihood = 2103.5279

Goodness of Fit:
Chi-Square Statistic = 12.9949
Degrees of Freedom = 11
Right-Tail Probability = 0.2937

 

Example 3

Example 19.1 on p. 876 from Greene, W. H. (1997). Data is given in Table 19.1 and results for all three models are given in Table 19.2. Table 19.3 on p. 886 also displays the standard errors for the logit and probit models. The binary dependent variable GRADE indicates whether a student’s grade on an examination improved after exposure to a new method of teaching.

Open LOGIT, select Statistics 1 → Regression Analysis → Logit / Probit / Gompit and select the data option Dependent Variable Contains Binary Data. Then select GRADE (C12) as [Dependent] and GPA, TUCE and PSI (C13 to C15) as [Variable]s. Select only the Regression Results output option and then run the example for the logit, probit and gompit models separately.

Logit / Probit / Gompit

Regression Results

Model selected: Logit
Valid Number of Cases: 32, 0 Omitted
Dependent Variable: GRADE
Minimum of dependent variable is encoded as 0 and the rest as 1.

             Coefficient   Standard Error   Z-Statistic   2-Tail Probability   Lower 95%   Upper 95%
Constant        -13.0213           4.9313       -2.6405               0.0134    -22.6866     -3.3561
GPA               2.8261           1.2629        2.2377               0.0334      0.3508      5.3014
TUCE              0.0952           0.1416        0.6722               0.5069     -0.1823      0.3726
PSI               2.3787           1.0646        2.2344               0.0336      0.2922      4.4652

-2 Log likelihood = 25.7793

Goodness of Fit:
Chi-Square Statistic = 27.2571
Degrees of Freedom = 27
Right-Tail Probability = 0.4500

 

Regression Results

Model selected: Probit
Valid Number of Cases: 32, 0 Omitted
Dependent Variable: GRADE
Minimum of dependent variable is encoded as 0 and the rest as 1.

             Coefficient   Standard Error   Z-Statistic   2-Tail Probability   Lower 95%   Upper 95%
Constant         -7.4523           2.5425       -2.9311               0.0067    -12.4355     -2.4692
GPA               1.6258           0.6939        2.3431               0.0265      0.2658      2.9858
TUCE              0.0517           0.0839        0.6166               0.5425     -0.1127      0.2162
PSI               1.4263           0.5950        2.3970               0.0234      0.2601      2.5926

-2 Log likelihood = 25.6376

Goodness of Fit:
Chi-Square Statistic = 26.2516
Degrees of Freedom = 27
Right-Tail Probability = 0.5047

 

Regression Results

Model selected: Gompit
Valid Number of Cases: 32, 0 Omitted
Dependent Variable: GRADE
Minimum of dependent variable is encoded as 0 and the rest as 1.

             Coefficient   Standard Error   Z-Statistic   2-Tail Probability   Lower 95%   Upper 95%
Constant        -10.0314           3.4608       -2.8986               0.0072    -16.8144     -3.2484
GPA               2.2936           1.1096        2.0670               0.0481      0.1187      4.4684
TUCE              0.0412           0.2447        0.1682               0.8676     -0.4384      0.5207
PSI               1.5623           0.9675        1.6148               0.1176     -0.3340      3.4585

-2 Log likelihood = 26.0160

Goodness of Fit:
Chi-Square Statistic = 27.9993
Degrees of Freedom = 27
Right-Tail Probability = 0.4110