2.1.4. Creating Interaction, Dummy and Lag/Lead Variables

In most regression models, it is possible to add new terms to the model using transformations of existing variables, thus eliminating the need to create them as data columns in the spreadsheet first. In these procedures, interaction, dummy and lag/lead terms can be specified during the variable selection phase. The program will then create these terms in its temporary memory internally. This feature is available in the following procedures:
Statistics 1 →
Statistics 1 → Regression Analysis →
Statistics 2 → Survival Analysis →
Note that, although it is not a regression procedure as of the models selected in other regression procedures. A particularly useful feature here is the option to send the entire final raw data (X) matrix of the regression model to the Output Medium, so that you can see the actual values of the interaction, dummy and lag/lead terms generated by the program internally. In Stand-Alone Mode, this output can then be sent to the Data Processor and used in other procedures if necessary (see 7.1. Matrix Statistics).
On the Variable Selection Dialogue of the above procedures, up to four smaller buttons [Interaction], [Dummy], [Full] and [Lag/Lead] will appear just under the [Variable] button. The Cox Regression procedure does not have a [Lag/Lead] button, as it is irrelevant for this procedure.
The behaviour of these buttons differs from other standard task buttons in that:
1) They can be used to select items from both the Variables Selected list, as well as the Variables Available list.
2) When they are used on a selection, the source (selected) items are not omitted from the list.
3) To de-select the new variables created by these buttons from the Variables Selected list, the [Variable] button should be used (not the button with which they were created).
The functions of these buttons are as follows.
Interaction: This button becomes activated when one or more items are selected from one of the Variables Available or Variables Selected lists. If only one variable is highlighted, then a new term will be added to the Variables Selected list, which is the product of this variable by itself, e.g. C2 Wages × C2 Wages. If two variables are highlighted, then the new term will be the product of these two variables, e.g. C10 Region × C11 Type. Maximum three-way interactions are allowed. Interactions of string, date, time, dummy or lag/lead variables are not allowed. In order to create interaction terms for dummy variables, create interactions first, and then create dummy variables for them.
Note that a special cross sign is used between the variables in an interaction term. If there is a problem with this character on a non-English operating system, you can enter and edit the following line in Unistat60.ini file under the [Options] group to display any other character (say *):
InteractionCross=×
This sign also appears in ANOVA and GLM output with interaction terms.
Dummy: This button is used to create n new (dummy) variables for a factor column containing n levels. Each dummy variable corresponds to a level of the factor column. A case in a dummy column will have the value of 1 if the factor contains the corresponding level in the same row, and 0 otherwise.
One or more categorical variables can be selected from one of the Variables Available or independent variables lists. A new entry will be created in the independent variables list in the form of Dummy(C10 Region). If the selection contains interactions, dummies will be created in the form of Dummy(C10 Region × C11 Type).
Sometimes, it may be important to select the interaction terms in a particular order, an order which is different from what they have in the Variables Available list. In this case, you can select the columns for the independent variables list by clicking on the [Variable] button in the desired order before selecting them for interactions or dummies.

If you wish to omit levels other than the first or to apply contrasts), then you are recommended to construct dummy variables as data columns first. In Stand-Alone Mode, you can do this automatically using the Data Processor’s Dummy() function (see iable]s.
Full: This button becomes activated when two or more items are selected from one of the independent variables or Variables Available lists. Dummy variables for each column selected, as well as dummies for all possible interaction terms (up to 3-way) will be created automatically. The number of new independent variables added to the model is determined by the number of distinct values (levels) of the selected columns. For each interaction term, this number is equal to the product of the number of levels in all variables in the term.
For example, suppose two columns are highlighted, C10 Region and C11 Type. Then the following three terms will be created: Dummy(C10 Region), Dummy(C11 Type), Dummy(C10 Region*C11 Type). If four columns are highlighted, C1, C2, C3, C4, then 14 new dummy terms will be created in the following order: Dummy(C1), Dummy(C2), Dummy(C3), Dummy(C4), Dummy(C1*C2), Dummy(C1*C3), Dummy(C1*C4), Dummy(C2*C3), Dummy(C2*C4), Dummy(C3*C4), Dummy(C1*C2*C3), Dummy(C1*C2*C4), Dummy(C1*C3*C4), Dummy(C2*C3*C4). Also suppose that C1, C2, C3, C4 contain 2, 3, 4, 5 levels respectively. Then the total number of variables added to the model will be 239.
WARNING! Ensure that the columns selected as Dummy are categorical variables containing a limited number of distinct values.
Lag/Lead: This button is used to create new variables by shifting the rows of an existing variable up or down. If you highlight some columns and click on the [Lag/Lead] button, these will be transferred to the Variables Selected list as Lag(C1 Label1;0), Lag(C2 Label2;0), etc. In this case, after clicking [Next] a further dialogue will ask for the size of the lags (or leads) for each [Lag/Lead] variable selected.

You can select the same variable an unlimited number of Enter a negative integer to define a selection as a lag and a positive integer for a lead. Leaving an entry as zero means that the selected variable will be included in the model without modification. When, for instance, -2 is entered for a lag, the program will create a new variable internally, starting from the third case of the original column. Therefore this variable will have two observations missing at the end. On the other hand, when 2 is entered for a lead variable, its first observation will correspond to the third row of the data matrix, the first two cases will be defined as missing and the last two cases will be omitted.
WARNING! Selection of lag/lead variables results in a loss of degrees of freedom. You must ensure that there is a sufficient number of cases (rows) in the data matrix when lags and / or leads are selected.
WARNING! Interpretation of lag/lead variables may not be clear when they are selected along with factors for categorical analysis (see 2.1.2. Categorical Data Analysis) or with the Data Processor’s Data → Select Row function. The use of lag/lead variables is not prevented in such cases you should ensure that their effect is unambiguous.