7.2.8. Poisson Regression
The Poisson Regression procedure is suitable for models where the dependent variable is a frequency (count) variable consisting of nonnegative integers. The exponential of estimated regression coefficients are called Incidence Rate Ratios, which give the estimated rate at which events occur. This rate can be multiplied by an Exposure variable to obtain the expected frequencies, which enters the model with a coefficient constrained as 1.
Predictions (interpolations) and multicollinearity are handled as in other regression options (see, for instance, Logistic Regression). Predicted cases are identified by an asterisk in Case (Diagnostic) Statistics output options (see 7.2.8.3. Poisson Regression Output Options). The spreadsheet Reg function (see 3.4.2.6.3. UNISTAT Functions) will give the natural log of predicted values, which should be exponentiated to obtain the expected frequencies.
7.2.8.1. Poisson Regression Model Description
Poisson Regression assumes that actual frequencies y_{i} are drawn from a Poisson distribution with parameters λ_{i}, i = 1, …, n. The associated probabilities are given as:
where:
is known as the loglinear model.
The logarithm of the likelihood function is given as:
and the first derivatives are:
A NewtonRaphson type maximum likelihood algorithm is employed to minimise the negative of the log likelihood function. The nature of this method implies that a solution (convergence) cannot always be achieved. In such cases, you are advised to edit the convergence parameters provided and try again.
7.2.8.2. Poisson Regression Variable Selection
As in other regression procedures, Poisson Regression can be used to estimate models with or without a constant term, with or without weights and regressions can be run on a subset of cases as determined by the levels of an unlimited number of factor columns. An unlimited number of dependent variables can be selected in order to run the same model on different dependent variables. It is also possible to include interaction terms, dummy and lag/lead variables in the model, without having to create them as spreadsheet columns first (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).
It is compulsory to select at least one numeric data column containing frequency counts as a dependent variable. When more than one dependent variable is selected, the analysis will be repeated as many times as the number of dependent variables, each time only changing the dependent variable and keeping the rest of selections unchanged.
A column containing numeric data can be selected as a weights column. Weights are frequency weights and all independent variables are multiplied by this column internally by the program.
An intermediate inputs dialogue is displayed next.
Tolerance: This value is used to control the sensitivity of nonlinear minimisation procedure employed. Under normal circumstances, you do not need to edit this value. If convergence cannot be achieved, then larger values of this parameter can be tried by removing one or more zeros.
Maximum Number of Iterations: When convergence cannot be achieved with the default value of 100 function evaluations, a higher value can be tried.
Omit Level: This field will appear only when one or more dummy variables have been included in the model from the Variable Selection Dialogue. Three options are available; (0) do not omit any levels, (1) omit the first level and (2) omit the last level. When no levels are omitted, the model will usually be overparameterised (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).
Exposure / Offset: This field will appear only if a column has been assigned the task of [Exposure] in Variable Selection Dialogue. When the value of this field 0 the variable selected (E) will enter the model as Exposure:
and for any other value it will enter the model as Offset:
7.2.8.3. Poisson Regression Output Options
The Actual and Fitted Values, Residuals and Confidence Intervals for Mean and Actual Y Values options that existed in earlier versions of UNISTAT have now been merged with the Case (Diagnostic) Statistics option. See 7.2.1.2.2. Linear Regression Case Output.
Regression Results: The main regression output displays a table of estimated coefficients for each category of the dependent variable, except for the base category. Standard errors, Wald statistics, probability values and confidence intervals are also displayed for the estimated regression coefficients.
Wald Statistic: This is defined as:
and has a chisquare distribution with one degree of freedom.
Confidence Intervals: The confidence intervals for regression coefficients are computed from:
where k is the number of independent variables in the model and each coefficient’s standard error, σ_{i}, is the square root of the diagonal element of covariance matrix.
Goodness of Fit Tests: See 7.2.6.4.1. Logistic Regression Results for details.
Correlation Matrix of Regression Coefficients: This is a symmetric matrix with unity diagonal elements. The offdiagonal elements give correlations between the regression coefficients.
Covariance Matrix of Regression Coefficients: This is a symmetric matrix where diagonal elements are the square of parameter standard errors. The offdiagonal elements are covariances between the regression coefficients.
Incidence Rate Ratio: Values of the incidence rate ratio indicate the influence of one unit change in a covariate on the regression.
The standard error of the incidence rate ratio is:
where σ_{i} is the standard error of the i^{th} independent variable for the j^{th} category of the dependent variable. Coefficient confidence intervals are:
which are simply the exponential of the coefficient confidence intervals.
Case (Diagnostic) Statistics: Case statistics are useful to determine the influence of individual observations on the overall fit of the model. For further information see 7.2.1.2.2. Linear Regression Case Output.
Statistics available under this option are defined as follows.
Actual Y: Observed values of the dependent variable.
Fitted Y: Also known as expected values:
Standard Error of Fitted:
Confidence Intervals of Fitted:
Deviance:
Residuals:
Standardised Residuals:
Plot of Actual and Fitted Values: Select this option to plot actual and fitted Y values against row numbers (index), residuals or against any independent variable. A further dialogue will enable you to choose the Xaxis variable from a list.
By default, a line graph of the two series is plotted. However, since this procedure (like the plot of residuals) uses the XY Plots engine, it has almost all controls and options available for XY Plots, except for error bars and right Yaxes.
The data points on the graph will also respond to the right mouse button in the way XY Plots does; the point is highlighted, a panel displays information about the point and in StandAlone Mode, the row of the spreadsheet containing the data point is also highlighted (a procedure which is also known as Brushing or Point identification). While the point is highlighted you can press <Delete> to omit the particular row containing the point. The entire Regression Analysis will be run again without the deleted row. If you want to restore the original regression, you will need to take one of the following two actions depending on the way you run UNISTAT:
1. In StandAlone Mode, go back to the Data Processor and delete or deactivate the Select Row column created by the program.
2. In Excel AddIn Mode, highlight a different block of data to remove the effect of the internal Select Row column.
Plot of Residuals: Residuals can be plotted against row numbers (index), fitted values or against any independent variable. A further dialogue will enable you to choose the Xaxis variable from a list containing Row Numbers, Fitted Values and all independent variables.
By default a scatter graph of residuals is plotted. For more information on available options see Plot of Actual and Fitted Values above.
7.2.8.4. Poisson Regression Examples
Example 1
Example 14.4 on p. 501 Armitage & Berry (2002). The aim is to assess whether there is a significant difference in cancer risk between veterans and nonveterans. The servicemen are divided into 11 age groups and their experience is given in terms of subjectyears.
Open POISSON and select Statistics 1 → Regression Analysis → Poisson Regression. Select Status and Age group (L1 and L2) as [Dummy], Number of cancers (C3) as [Dependent] and Subjectyears (C4) as [Exposure]. On Step 2 dialogue enter 1 for Omit Level and leave other entries unchanged. Uncheck all output options except for Regression Results and click [Finish].
Armitage and Berry include the logarithm of the Subjectyears variable in the model as an Offset variable, which is equivalent to an Exposure variable.
Regression results show that the pvalue of Status = Veteran variable is 0.9493. As this is much greater than 5%, we can conclude that there is no significant difference in cancer risk between veterans and nonveterans.
Poisson Regression
Dependent Variable: Number of cancers
Exposure: Subjectyears
Valid Number of Cases: 22, 0 Omitted
Regression Results

Coeffi cient 
Standard Error 
Wald Statistic 
Signifi cance 
Lower 95% 
Upper 95% 
Constant 
9.3248 
0.2045 
2079.0648 
0.0000 
9.7257 
8.9240 
Status = Veteran 
0.0035 
0.0555 
0.0040 
0.9493 
0.1123 
0.1052 
Age group = 2529 
0.6793 
0.2325 
8.5373 
0.0035 
0.2236 
1.1350 
3034 
1.3711 
0.2177 
39.6626 
0.0000 
0.9444 
1.7978 
3539 
1.9396 
0.2121 
83.6581 
0.0000 
1.5240 
2.3553 
4044 
2.0343 
0.2161 
88.6203 
0.0000 
1.6108 
2.4579 
4549 
2.7266 
0.2222 
150.5414 
0.0000 
2.2910 
3.1621 
5054 
3.2029 
0.2206 
210.7121 
0.0000 
2.7704 
3.6353 
5559 
3.7162 
0.2178 
291.1924 
0.0000 
3.2894 
4.1430 
6064 
4.0927 
0.2177 
353.4845 
0.0000 
3.6660 
4.5193 
6569 
4.2362 
0.2242 
356.9375 
0.0000 
3.7967 
4.6757 
70 
4.3637 
0.2274 
368.3262 
0.0000 
3.9181 
4.8094 
Example 2
Example 19.21 on p. 932 Greene (1997). The number of accidents per service month is given for a sample of ship types. There are five types of ships constructed in four different time periods and observed in two time periods.
Open POISSON and select Statistics 1 → Regression Analysis → Poisson Regression. From the Variable Selection Dialogue select Type, Constructed and Operated (C5C7) as [Dummy], Accidents (C9) as [Dependent] and Months (C8) as [Exposure]. On Step 2 dialogue enter 1 for Omit Level and leave other entries unchanged. Check all output options to obtain the following output. Some tables have been shortened to save space.
Poisson Regression
Dependent Variable: Accidents
Exposure: Months
Valid Number of Cases: 34, 6 Omitted
Regression Results

Coeffi cient 
Standard Error 
Wald Statistic 
Signi ficance 
Lower 95% 
Upper 95% 
Constant 
6.4029 
0.2175 
866.444 
0.0000 
6.8292 
5.9765 
Type = Type B 
0.5447 
0.1776 
9.4055 
0.0022 
0.8928 
0.1966 
Type C 
0.6888 
0.3290 
4.3818 
0.0363 
1.3337 
0.0439 
Type D 
0.0743 
0.2906 
0.0654 
0.7981 
0.6438 
0.4952 
Type E 
0.3205 
0.2358 
1.8485 
0.1740 
0.1415 
0.7826 
Constructed = C 6569 
0.6958 
0.1497 
21.6190 
0.0000 
0.4025 
0.9892 
C 7074 
0.8175 
0.1698 
23.1665 
0.0000 
0.4846 
1.1503 
C 7579 
0.4450 
0.2332 
3.6397 
0.0564 
0.0122 
0.9021 
Operated = O 7579 
0.3839 
0.1183 
10.5357 
0.0012 
0.1521 
0.6156 
Goodness of Fit Tests

2 Log likelihood 
Initial Model 
244.1948 
Final Model 
136.8291 

ChiSquare Statistic 
Degrees of Freedom 
RightTail Probability 
Pearson 
38.9626 
24 
0.0276 
Likelihood Ratio 
107.3657 
8 
0.0000 

Pseudo Rsquared 
McFadden 
0.4397 
Adjusted McFadden 
0.3660 
Cox & Snell 
0.9575 
Nagelkerke 
0.9582 
Correlation Matrix of Regression Coefficients

Constant 
Type = Type B 
Type C 
Type D 
Type E 
Constant 
1.0000 
0.8115 
0.3783 
0.3706 
0.4680 
Type = Type B 
0.8115 
1.0000 
0.4331 
0.4469 
0.5698 
Type C 
0.3783 
0.4331 
1.0000 
0.2377 
0.3132 
Type D 
0.3706 
0.4469 
0.2377 
1.0000 
0.3349 
Type E 
0.4680 
0.5698 
0.3132 
0.3349 
1.0000 
Constructed = C 6569 
0.4847 
0.0862 
0.0360 
0.0277 
0.0052 
C 7074 
0.5509 
0.2722 
0.0458 
0.0284 
0.0377 
C 7579 
0.4015 
0.2285 
0.0968 
0.0962 
0.0476 
Operated = O 7579 
0.2164 
0.0256 
0.0030 
0.0047 
0.0263 

Constructed = C 6569 
C 7074 
C 7579 
Operated = O 7579 
Constant 
0.4847 
0.5509 
0.4015 
0.2164 
Type = Type B 
0.0862 
0.2722 
0.2285 
0.0256 
Type C 
0.0360 
0.0458 
0.0968 
0.0030 
Type D 
0.0277 
0.0284 
0.0962 
0.0047 
Type E 
0.0052 
0.0377 
0.0476 
0.0263 
Constructed = C 6569 
1.0000 
0.6337 
0.4758 
0.1196 
C 7074 
0.6337 
1.0000 
0.5494 
0.2629 
C 7579 
0.4758 
0.5494 
1.0000 
0.3150 
Operated = O 7579 
0.1196 
0.2629 
0.3150 
1.0000 
Covariance Matrix of Regression Coefficients

Constant 
Type = Type B 
Type C 
Type D 
Type E 
Constant 
0.0473 
0.0314 
0.0271 
0.0234 
0.0240 
Type = Type B 
0.0314 
0.0315 
0.0253 
0.0231 
0.0239 
Type C 
0.0271 
0.0253 
0.1083 
0.0227 
0.0243 
Type D 
0.0234 
0.0231 
0.0227 
0.0844 
0.0229 
Type E 
0.0240 
0.0239 
0.0243 
0.0229 
0.0556 
Constructed = C 6569 
0.0158 
0.0023 
0.0018 
0.0012 
0.0002 
C 7074 
0.0204 
0.0082 
0.0026 
0.0014 
0.0015 
C 7579 
0.0204 
0.0095 
0.0074 
0.0065 
0.0026 
Operated = O 7579 
0.0056 
0.0005 
0.0001 
0.0002 
0.0007 

Constructed = C 6569 
C 7074 
C 7579 
Operated = O 7579 
Constant 
0.0158 
0.0204 
0.0204 
0.0056 
Type = Type B 
0.0023 
0.0082 
0.0095 
0.0005 
Type C 
0.0018 
0.0026 
0.0074 
0.0001 
Type D 
0.0012 
0.0014 
0.0065 
0.0002 
Type E 
0.0002 
0.0015 
0.0026 
0.0007 
Constructed = C 6569 
0.0224 
0.0161 
0.0166 
0.0021 
C 7074 
0.0161 
0.0288 
0.0218 
0.0053 
C 7579 
0.0166 
0.0218 
0.0544 
0.0087 
Operated = O 7579 
0.0021 
0.0053 
0.0087 
0.0140 
Incidence Rate Ratio

Incidence Rate Ratio 
Standard Error 
Lower 95% 
Upper 95% 
Type = Type B 
0.5800 
0.1030 
0.4095 
0.8215 
Type C 
0.5022 
0.1652 
0.2635 
0.9571 
Type D 
0.9284 
0.2697 
0.5253 
1.6408 
Type E 
1.3779 
0.3248 
0.8680 
2.1871 
Constructed = C 6569 
2.0054 
0.3001 
1.4956 
2.6890 
C 7074 
2.2647 
0.3846 
1.6235 
3.1592 
C 7579 
1.5604 
0.3640 
0.9879 
2.4648 
Operated = O 7579 
1.4679 
0.1736 
1.1642 
1.8509 
Case (Diagnostic) Statistics

Actual Y 
Fitted Y 
95% lb Actual Y 
95% ub Actual Y 
Standard Error of Fitted 
1 
0.0000 
0.2104 
0.2159 
0.6367 
0.2175 
2 
0.0000 
0.1532 
0.2858 
0.5922 
0.2240 
3 
3.0000 
3.6382 
3.2553 
4.0210 
0.1953 
… 
… 
… 
… 
… 
… 

Deviance 
Residuals 
Standardised Residuals 
1 
0.4208 
0.2104 
0.4587 
2 
0.3064 
0.1532 
0.3914 
3 
0.1191 
0.6382 
0.3346 
… 
… 
… 
… 