UNISTAT - the ultimate Excel statistics add-in

7.2.8. Poisson Regression

The Poisson Regression procedure is suitable for models where the dependent variable is a frequency (count) variable consisting of nonnegative integers. The exponentials of the estimated regression coefficients are called Incidence Rate Ratios; each gives the factor by which the estimated rate of events changes for a one-unit increase in the corresponding independent variable. The estimated rate can be multiplied by an Exposure variable to obtain the expected frequencies; the logarithm of this variable enters the model with a coefficient constrained to 1.

Predictions (interpolations) and multicollinearity are handled as in other regression options (see, for instance, 7.2.6. Logistic Regression). Predicted cases are identified by an asterisk in Actual and Fitted Values and Case (Diagnostic) Statistics output options (see 7.2.8.3. Poisson Regression Output Options). Note that the spreadsheet Reg function (see 3.4.2.6.3. UNISTAT Functions) will give the natural log of predicted values, which should be exponentiated to obtain the expected frequencies.

7.2.8.1. Poisson Regression Model Description

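Poisson Regression assumes that the actual frequencies $y_i$ are drawn from a Poisson distribution with parameters $\lambda_i$, i = 1, …, n. The associated probabilities are given as:

      $$P(Y_i = y_i) = \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots$$

where:

      $$\ln \lambda_i = \beta' x_i, \quad i = 1, \ldots, n$$

is known as the loglinear model. Here $\beta$ is the vector of regression coefficients and $x_i$ is the vector of independent variables for case i.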
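As a minimal sketch of the model above, the following Python fragment evaluates the Poisson probability of a count for one case under the loglinear model. The function name and the coefficient values are hypothetical and chosen only for illustration; they are not taken from UNISTAT.

```python
import math

def poisson_probability(y, beta, x):
    """P(Y = y) for a case with covariates x, assuming ln(lambda) = beta'x."""
    lam = math.exp(sum(b * v for b, v in zip(beta, x)))
    return math.exp(-lam) * lam ** y / math.factorial(y)

# Hypothetical case: a constant term plus one covariate set to zero,
# so that lambda = exp(ln 2) = 2.
beta = [math.log(2.0), 0.5]
x = [1.0, 0.0]
probs = [poisson_probability(y, beta, x) for y in range(6)]
```

The probabilities over all nonnegative integers sum to one, which is a quick way to check an implementation.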

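The logarithm of the likelihood function is given as:

      $$\ln L = \sum_{i=1}^{n} \left( -\lambda_i + y_i \beta' x_i - \ln y_i! \right)$$

and the first derivatives are:

      $$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} (y_i - \lambda_i) x_i$$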

A Newton-Raphson type maximum likelihood algorithm is employed to minimise the negative of the log likelihood function. The nature of this method implies that a solution (convergence) cannot always be achieved. In such cases, you are advised to edit the convergence parameters provided and try again.
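The iteration can be sketched in Python as follows. This is an illustration only, not the UNISTAT implementation: the zero starting values, the stopping rule based on the largest coefficient change, and the function names `solve` and `fit_poisson` are all assumptions. The gradient is the sum of (y_i - λ_i) x_i and the negative Hessian is the sum of λ_i x_i x_i'.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_poisson(x_rows, y, tol=1e-8, max_iter=100):
    """Newton-Raphson maximum likelihood for ln(lambda_i) = beta'x_i."""
    k = len(x_rows[0])
    beta = [0.0] * k  # assumed starting values
    for iteration in range(1, max_iter + 1):
        lam = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in x_rows]
        # First derivatives of the log likelihood
        grad = [sum((yi - li) * row[j] for yi, li, row in zip(y, lam, x_rows))
                for j in range(k)]
        # Negative of the matrix of second derivatives
        hess = [[sum(li * row[j] * row[l] for li, row in zip(lam, x_rows))
                 for l in range(k)] for j in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:  # assumed convergence criterion
            return beta, iteration
    raise RuntimeError("convergence not achieved")

# Hypothetical data: a constant and one 0/1 dummy; group means are 3 and 9.
x_rows = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [2, 4, 6, 12]
beta, iterations = fit_poisson(x_rows, y)
```

With a constant and a single dummy the fitted rates equal the group means, so both coefficients here converge to ln 3. If the loop ends without meeting the criterion, a looser tolerance or a higher iteration limit plays the same role as the convergence parameters mentioned above.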

7.2.8.2. Poisson Regression Variable Selection


As in other regression procedures, Poisson Regression can be used to estimate models with or without a constant term and with or without weights, and regressions can be run on a subset of cases as determined by the levels of an unlimited number of factor columns. An unlimited number of dependent variables can be selected in order to run the same model on different dependent variables. It is also possible to include interaction terms, dummy and lag/lead variables in the model, without having to create them as spreadsheet columns first (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).

It is compulsory to select at least one numeric data column containing frequency counts as a dependent variable. When more than one dependent variable is selected, the analysis will be repeated as many times as the number of dependent variables, each time only changing the dependent variable and keeping the rest of selections unchanged.

A column containing numeric data can be selected as a weights column. Weights are frequency weights and all independent variables are multiplied by this column internally by the program.

An intermediate inputs dialogue is displayed next.


Tolerance: This value is used to control the sensitivity of the nonlinear minimisation procedure employed. Under normal circumstances, you do not need to edit this value. If convergence cannot be achieved, then larger values of this parameter can be tried by removing one or more zeros.

Maximum Number of Iterations: When convergence cannot be achieved with the default value of 100 function evaluations, a higher value can be tried.

Omit Level: This field will appear only when one or more columns have been assigned the task of [Dummy] in Variable Selection Dialogue. Three options are available: (0) do not omit any levels, (1) omit the first level and (2) omit the last level. When no levels are omitted, the model will usually be over-parameterised (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).
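The effect of the three Omit Level settings can be illustrated with a short Python sketch. The function name `dummy_columns` is hypothetical, and ordering the levels alphabetically is an assumption made for the example; the program may order levels differently.

```python
def dummy_columns(values, omit_level=1):
    """Return (levels, rows) of 0/1 indicator columns for one factor.

    omit_level: 0 = keep all levels, 1 = omit the first level,
                2 = omit the last level.
    """
    levels = sorted(set(values))  # assumed level order
    if omit_level == 1:
        levels = levels[1:]
    elif omit_level == 2:
        levels = levels[:-1]
    rows = [[1 if v == lev else 0 for lev in levels] for v in values]
    return levels, rows

factor = ["A", "B", "C", "A"]
levels_all, rows_all = dummy_columns(factor, omit_level=0)
levels_first, rows_first = dummy_columns(factor, omit_level=1)
```

With no level omitted, every row of indicator columns sums to 1, which duplicates the constant term; this is why such a model is over-parameterised.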

Exposure / Offset: This field will appear only if a column has been assigned the task of [Exposure] in Variable Selection Dialogue. When the value of this field is 0, the selected variable (E) will enter the model as Exposure:

      $$\ln \lambda_i = \ln E_i + \beta' x_i, \quad i = 1, \ldots, n$$

and for any other value it will enter the model as Offset:

      $$\ln \lambda_i = E_i + \beta' x_i, \quad i = 1, \ldots, n.$$
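The difference between the two settings can be sketched in Python, assuming that an Exposure variable enters the linear predictor as its natural logarithm and an Offset variable enters as it is, in both cases with a coefficient fixed at 1. The function name and all numbers are hypothetical.

```python
import math

def expected_count(beta, x, e, field_value=0):
    """Expected frequency for one case.

    field_value == 0: e is an Exposure, lambda = e * exp(beta'x)
    otherwise:        e is an Offset,   lambda = exp(e + beta'x)
    """
    eta = sum(b * v for b, v in zip(beta, x))
    if field_value == 0:
        return e * math.exp(eta)
    return math.exp(e + eta)

beta = [-6.0, 0.4]        # hypothetical coefficients
x = [1.0, 1.0]
months = 1200.0           # hypothetical exposure
as_exposure = expected_count(beta, x, months, field_value=0)
as_offset = expected_count(beta, x, math.log(months), field_value=1)
```

Under this assumption, supplying the logarithm of a column as an Offset gives the same expected counts as supplying the column itself as an Exposure.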

7.2.8.3. Poisson Regression Output Options


Regression Results: The main regression output displays a table of estimated regression coefficients, together with their standard errors, Wald statistics, probability values and confidence intervals.

      Wald Statistic: This is defined as:

      $$W_i = \left( \frac{\beta_i}{\sigma_i} \right)^2$$

      and has a chi-square distribution with one degree of freedom.

      Confidence Intervals: The confidence intervals for regression coefficients are computed from:

           $$\beta_i \pm z_{\alpha/2} \, \sigma_i, \quad i = 1, \ldots, k$$

      where k is the number of independent variables in the model, $z_{\alpha/2}$ is the critical value of the standard normal distribution and each coefficient’s standard error, $\sigma_i$, is the square root of the corresponding diagonal element of the covariance matrix.
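As a small worked sketch, the Wald statistic and a confidence interval can be computed from a coefficient and its standard error. The code assumes the Wald statistic is the squared ratio of the coefficient to its standard error and that the interval uses a standard normal critical value; both assumptions reproduce the Constant row of Example 1 (coefficient -9.3248, standard error 0.2045) up to rounding of the inputs. The function name is hypothetical.

```python
from statistics import NormalDist

def wald_and_interval(coef, se, level=0.95):
    """Return (Wald statistic, lower limit, upper limit) for one coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    wald = (coef / se) ** 2
    return wald, coef - z * se, coef + z * se

wald, lower, upper = wald_and_interval(-9.3248, 0.2045)
```

The results are close to the printed values 2079.0648, -9.7257 and -8.9240; the small differences arise because the inputs are rounded to four decimal places.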

      -2 Log-Likelihood Initial Model: This is the value when all independent variables are excluded from the model:

           Poisson Regression, i = 1, …, k.

      -2 Log-Likelihood Final Model: This is ‑2 times the value of the log likelihood function when convergence is achieved.

      Pseudo R-squared: In Poisson Regression, an R-squared statistic as in OLS regression is not available. This is because Poisson Regression employs an iterative maximum likelihood estimation method. Equivalent statistics to test the goodness of fit have been proposed using the initial (L0) and maximum (L1) likelihood values.

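            McFadden:

                 $$R^2 = 1 - \frac{\ln L_1}{\ln L_0}$$

           Adjusted McFadden:

                 $$R^2 = 1 - \frac{\ln L_1 - K}{\ln L_0}$$

           Cox & Snell:

                 $$R^2 = 1 - \left( \frac{L_0}{L_1} \right)^{2/n}$$

           Nagelkerke:

                 $$R^2 = \frac{1 - (L_0 / L_1)^{2/n}}{1 - L_0^{2/n}}$$

           where n is the number of valid cases and K is the number of estimated coefficients, including the constant term.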
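The four statistics can be sketched in Python from the two -2 Log-Likelihood values. The formulas in the comments are the standard definitions and are an assumption about what the program computes; with the figures of Example 2 (initial 244.1948, final 136.8291, 34 valid cases and 9 estimated coefficients including the constant) they reproduce the printed values 0.4397, 0.3660, 0.9575 and 0.9582. The function name is hypothetical.

```python
import math

def pseudo_r_squared(m2ll_initial, m2ll_final, n_cases, n_params):
    """Pseudo R-squared statistics from -2 log-likelihood values."""
    ll0 = -0.5 * m2ll_initial   # ln L0
    ll1 = -0.5 * m2ll_final     # ln L1
    mcfadden = 1.0 - ll1 / ll0
    adjusted = 1.0 - (ll1 - n_params) / ll0
    # 1 - (L0/L1)^(2/n), computed on the log scale to avoid underflow
    cox_snell = 1.0 - math.exp(2.0 * (ll0 - ll1) / n_cases)
    # Cox & Snell divided by its maximum attainable value 1 - L0^(2/n)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll0 / n_cases))
    return mcfadden, adjusted, cox_snell, nagelkerke

mcfadden, adjusted, cox_snell, nagelkerke = pseudo_r_squared(
    244.1948, 136.8291, n_cases=34, n_params=9)
```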

      Likelihood Ratio: This is a test statistic for the null hypothesis that “all regression coefficients for covariates are zero”. It is equal to the -2 Log-Likelihood of the initial model minus that of the final model, that is, -2 times the difference between the initial and final model log-likelihood values, and has a chi-square distribution with k degrees of freedom (the number of independent variables in the model).
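A minimal Python sketch of this test follows. The right-tail chi-square probability is computed with a standard recurrence for the incomplete gamma function so that only the standard library is needed; the helper name `chi2_sf` is hypothetical and it handles integer degrees of freedom only. The inputs are the -2 Log-Likelihood values printed in Example 2.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # upper tail for 2 df
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # upper tail for 1 df
    # Q(a + 1, h) = Q(a, h) + h^a exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

m2ll_initial, m2ll_final = 244.1948, 136.8291
lr_statistic = m2ll_initial - m2ll_final
lr_probability = chi2_sf(lr_statistic, df=8)
```

The same helper applies to the Goodness of Fit statistics and to Wald statistics (with one degree of freedom); for instance `chi2_sf(38.9626, 24)` is close to the right-tail probability printed for the goodness of fit in Example 2.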

      Goodness of Fit: This is a test statistic to measure the goodness of fit of the expected counts on the observed counts. It has a chi-square distribution with n - k degrees of freedom (the number of valid cases minus the number of independent variables, including the constant term, if any).

      Poisson Regression

Incidence Rate Ratio: The incidence rate ratio of a covariate is the factor by which the expected rate is multiplied when that covariate increases by one unit:

           $$IRR_i = e^{\beta_i}, \quad i = 1, \ldots, k.$$

      The standard error of the incidence rate ratio is:

           $$e^{\beta_i} \sigma_i$$

      where $\sigma_i$ is the standard error of the ith regression coefficient. Confidence intervals for the incidence rate ratios are:

           $$e^{\beta_i \pm z_{\alpha/2} \sigma_i}$$

      which are simply the exponentials of the coefficient confidence intervals.
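The relation between a coefficient and its incidence rate ratio can be sketched in Python. The standard error is assumed to be the exponential of the coefficient times the coefficient's standard error, and the limits the exponentials of the normal-based coefficient limits; these assumptions reproduce the Type B row of Example 2 (coefficient -0.5447, standard error 0.1776). The function name is hypothetical.

```python
import math
from statistics import NormalDist

def incidence_rate_ratio(coef, se, level=0.95):
    """Return (IRR, standard error, lower limit, upper limit)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    irr = math.exp(coef)
    return irr, irr * se, math.exp(coef - z * se), math.exp(coef + z * se)

irr, irr_se, irr_lower, irr_upper = incidence_rate_ratio(-0.5447, 0.1776)
```

An incidence rate ratio of about 0.58 means that, other things being equal, the expected rate for this level is about 58% of the rate for the omitted base level.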

Correlation Matrix of Regression Coefficients: This is a symmetric matrix with unity diagonal elements. The off-diagonal elements give correlations between the regression coefficients.

Covariance Matrix of Regression Coefficients: This is a symmetric matrix where diagonal elements are the square of parameter standard errors. The off-diagonal elements are covariances between the regression coefficients.
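The two matrices are linked by the usual scaling: each correlation is the covariance divided by the product of the two standard errors. A minimal Python sketch, using the Constant and Type B entries of the covariance matrix printed in Example 2 (the function name is hypothetical):

```python
import math

def correlation_from_covariance(cov):
    """Convert a covariance matrix (list of lists) to a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]

cov = [[0.0473, -0.0314],
       [-0.0314, 0.0315]]
corr = correlation_from_covariance(cov)
```

Because the printed covariances are rounded to four decimal places, the result agrees with the printed correlation of -0.8115 only to about two decimal places.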


Case (Diagnostic) Statistics: Case statistics are useful to determine the influence of individual observations on the overall fit of the model. For further information see 7.2.1.2.2. Linear Regression Further Output Options.

      Statistics available under this option are defined as follows.

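      Fitted Values: Also known as expected values:

           $$\hat{\lambda}_i = \exp(\hat{\beta}' x_i), \quad i = 1, \ldots, n$$

      Standard Error of Fitted:

           $$s_i = \sqrt{x_i' V x_i}$$

      where V is the covariance matrix of regression coefficients.

      Confidence Intervals of Fitted:

           $$\hat{\lambda}_i \pm z_{\alpha/2} \, s_i$$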

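      Deviance:

           $$d_i = 2 \left[ y_i \ln \frac{y_i}{\hat{\lambda}_i} - (y_i - \hat{\lambda}_i) \right]$$

      where the first term is taken as zero when $y_i = 0$.

      Residuals:

           $$e_i = y_i - \hat{\lambda}_i$$

      Standardised Residuals:

           $$\frac{e_i}{\sqrt{\hat{\lambda}_i}}$$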
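The following Python sketch computes three of these statistics for a single case. The definitions in the code (raw residual, residual divided by the square root of the fitted value, and the usual Poisson deviance contribution) are assumptions; they reproduce rows 1 and 3 of the Case (Diagnostic) Statistics table in Example 2. The function name is hypothetical.

```python
import math

def case_statistics(y, fitted):
    """Return (residual, standardised residual, deviance) for one case."""
    residual = y - fitted
    standardised = residual / math.sqrt(fitted)
    # y * ln(y / fitted) is taken as zero when y = 0
    log_term = y * math.log(y / fitted) if y > 0 else 0.0
    deviance = 2.0 * (log_term - residual)
    return residual, standardised, deviance

row1 = case_statistics(0, 0.2104)
row3 = case_statistics(3, 3.6382)
```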

Plot of Actual and Fitted Values: Select this option to plot actual and fitted Y values against row numbers (index), residuals or against any independent variable. A further dialogue will enable you to choose the X-axis variable from a list.


      By default, a line graph of the two series is plotted. However, since this procedure (like the plot of residuals) uses the X-Y Plots engine, it has almost all controls and options available for X-Y Plots, except for error bars and right Y-axes.

      The data points on the graph will also respond to the right mouse button in the way X-Y Plots does; the point is highlighted, a panel displays information about the point and in Stand-Alone Mode, the row of the spreadsheet containing the data point is also highlighted (a procedure which is also known as Brushing or Point identification). While the point is highlighted you can press <Delete> to omit the particular row containing the point. The entire Regression Analysis will be run again without the deleted row. If you want to restore the original regression, you will need to take one of the following two actions depending on the way you run UNISTAT:

1.       In Stand-Alone Mode, go back to the Data Processor and delete or deactivate the Select Row column created by the program.

2.       In Excel Add-In Mode, highlight a different block of data to remove the effect of the internal Select Row column.

Plot of Residuals: Residuals can be plotted against row numbers (index), fitted values or against any independent variable. A further dialogue will enable you to choose the X-axis variable from a list containing Row Numbers, Fitted Values and all independent variables.

      By default a scatter graph of residuals is plotted. For more information on available options see Plot of Actual and Fitted Values above.

7.2.8.4. Poisson Regression Examples

Example 1

Example 12.12 on p. 433 of Armitage, P. & G. Berry (1994). The aim is to assess whether there is a significant difference in cancer risk between veterans and non-veterans. The servicemen are divided into 11 age groups and their experience is given in terms of subject-years.

Open POISSON and select Statistics 1 → Regression Analysis → Poisson Regression. Select Status and Age group (L1 and L2) as [Dummy], Number of cancers (C3) as [Dependent] and Subject-years (C4) as [Exposure]. On Step 2 dialogue enter 1 for Omit Level and leave other entries unchanged. Check all output options and click [Finish]. Some tables have been shortened to save space.

Alternatively, the natural logarithm of Subject-years can be entered in the model as an Offset variable, which is equivalent to an Exposure variable.

Regression results show that the p-value of the Status = Veteran variable is 0.9493. As this is much greater than 5%, we can conclude that there is no significant difference in cancer risk between veterans and non-veterans.

Poisson Regression

Regression Results

Valid Number of Cases: 22, 0 Omitted
Dependent Variable: Number of cancers
Exposure: Subject-years

                      Coefficient  Standard Error  Wald Statistic  Significance  Lower 95%  Upper 95%
Constant                  -9.3248          0.2045       2079.0648        0.0000    -9.7257    -8.9240
Status = Veteran          -0.0035          0.0555          0.0040        0.9493    -0.1123     0.1052
Age group = 25-29          0.6793          0.2325          8.5373        0.0035     0.2236     1.1350
            30-34          1.3711          0.2177         39.6626        0.0000     0.9444     1.7978
            35-39          1.9396          0.2121         83.6581        0.0000     1.5240     2.3553
            40-44          2.0343          0.2161         88.6203        0.0000     1.6108     2.4579
            45-49          2.7266          0.2222        150.5414        0.0000     2.2910     3.1621
            50-54          3.2029          0.2206        210.7121        0.0000     2.7704     3.6353
            55-59          3.7162          0.2178        291.1924        0.0000     3.2894     4.1430
            60-64          4.0927          0.2177        353.4845        0.0000     3.6660     4.5193
            65-69          4.2362          0.2242        356.9375        0.0000     3.7967     4.6757
            70-            4.3637          0.2274        368.3262        0.0000     3.9181     4.8094

-2 Log likelihood:
      Initial Model =              2199.7137
      Final Model =                 132.0133
Pseudo R-squared:
      McFadden =                      0.9400
      Adjusted McFadden =             0.9291
      Cox & Snell =                   1.0000
      Nagelkerke =                    1.0000
Likelihood Ratio Statistic:
      Chi-Square Statistic =       2067.7004
      Degrees of Freedom =                11
      Right-Tail Probability =        0.0000
Goodness of Fit:
      Chi-Square Statistic =          5.2166
      Degrees of Freedom =                 9
      Right-Tail Probability =        0.8150

 

Example 2

Example 19.21 on p. 932 of Greene, W. H. (1997). The number of accidents per service month is given for a sample of ship types. There are five types of ships constructed in four different time periods and observed in two time periods.

Open POISSON and select Statistics 1 → Regression Analysis → Poisson Regression. From the Variable Selection Dialogue select Type, Constructed and Operated (C5-C7) as [Dummy], Accidents (C9) as [Dependent] and Months (C8) as [Exposure]. On Step 2 dialogue enter 1 for Omit Level and leave other entries unchanged. Check all output options to obtain the following output. Some tables have been shortened to save space.

Poisson Regression

Regression Results

Valid Number of Cases: 34, 6 Omitted
Dependent Variable: Accidents
Exposure: Months

                         Coefficient  Standard Error  Wald Statistic  Significance  Lower 95%  Upper 95%
Constant                     -6.4029          0.2175         866.444        0.0000    -6.8292    -5.9765
Type = Type B                -0.5447          0.1776          9.4055        0.0022    -0.8928    -0.1966
       Type C                -0.6888          0.3290          4.3818        0.0363    -1.3337    -0.0439
       Type D                -0.0743          0.2906          0.0654        0.7981    -0.6438     0.4952
       Type E                 0.3205          0.2358          1.8485        0.1740    -0.1415     0.7826
Constructed = C 65-69         0.6958          0.1497         21.6190        0.0000     0.4025     0.9892
              C 70-74         0.8175          0.1698         23.1665        0.0000     0.4846     1.1503
              C 75-79         0.4450          0.2332          3.6397        0.0564    -0.0122     0.9021
Operated = O 75-79            0.3839          0.1183         10.5357        0.0012     0.1521     0.6156

-2 Log likelihood:
      Initial Model =               244.1948
      Final Model =                 136.8291
Pseudo R-squared:
      McFadden =                      0.4397
      Adjusted McFadden =             0.3660
      Cox & Snell =                   0.9575
      Nagelkerke =                    0.9582
Likelihood Ratio Statistic:
      Chi-Square Statistic =        107.3657
      Degrees of Freedom =                 8
      Right-Tail Probability =        0.0000
Goodness of Fit:
      Chi-Square Statistic =         38.9626
      Degrees of Freedom =                24
      Right-Tail Probability =        0.0276

 

Incidence Rate Ratio

                         Incidence Rate Ratio  Standard Error  Lower 95%  Upper 95%
Type = Type B                          0.5800          0.1030     0.4095     0.8215
       Type C                          0.5022          0.1652     0.2635     0.9571
       Type D                          0.9284          0.2697     0.5253     1.6408
       Type E                          1.3779          0.3248     0.8680     2.1871
Constructed = C 65-69                  2.0054          0.3001     1.4956     2.6890
              C 70-74                  2.2647          0.3846     1.6235     3.1592
              C 75-79                  1.5604          0.3640     0.9879     2.4648
Operated = O 75-79                     1.4679          0.1736     1.1642     1.8509

 

Correlation Matrix of Regression Coefficients

                              Constant  Type = Type B         Type C         Type D         Type E
Constant                        1.0000        -0.8115        -0.3783        -0.3706        -0.4680
Type = Type B                  -0.8115         1.0000         0.4331         0.4469         0.5698
       Type C                  -0.3783         0.4331         1.0000         0.2377         0.3132
       Type D                  -0.3706         0.4469         0.2377         1.0000         0.3349
       Type E                  -0.4680         0.5698         0.3132         0.3349         1.0000
Constructed = C 65-69          -0.4847         0.0862         0.0360         0.0277        -0.0052
              C 70-74          -0.5509         0.2722         0.0458         0.0284        -0.0377
              C 75-79          -0.4015         0.2285         0.0968        -0.0962         0.0476
Operated = O 75-79             -0.2164         0.0256        -0.0030        -0.0047         0.0263

                         Constructed = C 65-69    C 70-74    C 75-79  Operated = O 75-79
Constant                               -0.4847    -0.5509    -0.4015             -0.2164
Type = Type B                           0.0862     0.2722     0.2285              0.0256
       Type C                           0.0360     0.0458     0.0968             -0.0030
       Type D                           0.0277     0.0284    -0.0962             -0.0047
       Type E                          -0.0052    -0.0377     0.0476              0.0263
Constructed = C 65-69                   1.0000     0.6337     0.4758             -0.1196
              C 70-74                   0.6337     1.0000     0.5494             -0.2629
              C 75-79                   0.4758     0.5494     1.0000             -0.3150
Operated = O 75-79                     -0.1196    -0.2629    -0.3150              1.0000

 

Covariance Matrix of Regression Coefficients

                              Constant  Type = Type B         Type C         Type D         Type E
Constant                        0.0473        -0.0314        -0.0271        -0.0234        -0.0240
Type = Type B                  -0.0314         0.0315         0.0253         0.0231         0.0239
       Type C                  -0.0271         0.0253         0.1083         0.0227         0.0243
       Type D                  -0.0234         0.0231         0.0227         0.0844         0.0229
       Type E                  -0.0240         0.0239         0.0243         0.0229         0.0556
Constructed = C 65-69          -0.0158         0.0023         0.0018         0.0012        -0.0002
              C 70-74          -0.0204         0.0082         0.0026         0.0014        -0.0015
              C 75-79          -0.0204         0.0095         0.0074        -0.0065         0.0026
Operated = O 75-79             -0.0056         0.0005        -0.0001        -0.0002         0.0007

                         Constructed = C 65-69    C 70-74    C 75-79  Operated = O 75-79
Constant                               -0.0158    -0.0204    -0.0204             -0.0056
Type = Type B                           0.0023     0.0082     0.0095              0.0005
       Type C                           0.0018     0.0026     0.0074             -0.0001
       Type D                           0.0012     0.0014    -0.0065             -0.0002
       Type E                          -0.0002    -0.0015     0.0026              0.0007
Constructed = C 65-69                   0.0224     0.0161     0.0166             -0.0021
              C 70-74                   0.0161     0.0288     0.0218             -0.0053
              C 75-79                   0.0166     0.0218     0.0544             -0.0087
Operated = O 75-79                     -0.0021    -0.0053    -0.0087              0.0140

 

Actual and Fitted Values

Row  Actual *  Fitted +  0.0000                                     58.0000
  1    0.0000    0.2104  *
  2    0.0000    0.1532  *
  3    3.0000    3.6382     *+

 

Residuals

Row  Residuals  -4.8229                                               4.8229
  1    -0.2104                              *
  2    -0.1532                               *
  3    -0.6382                            *

 

Case (Diagnostic) Statistics

Row  Fitted Values  Standard Error of Fitted  Lower 95%  Upper 95%  Deviance  Residuals  Standardised Residuals
  1         0.2104                    0.2175    -0.2159     0.6367    0.4208    -0.2104                 -0.4587
  2         0.1532                    0.2240    -0.2858     0.5922    0.3064    -0.1532                 -0.3914
  3         3.6382                    0.1953     3.2553     4.0210    0.1191    -0.6382                 -0.3346

 
