Interaction terms in non-linear models


This discussion is primarily based on the following article, but see also:(1,2)

Ai, C. and Norton, E.C. (2003). Interaction terms in logit and probit models. Economics Letters, 80, 123-129.
The magnitude of the interaction effect in nonlinear models does not equal the marginal effect of the interaction term, can be of opposite sign, and its statistical significance is not calculated by standard software. We present the correct way to estimate the magnitude and standard errors of the interaction effect in nonlinear models.

The main ideas of this article is that both the test and interpretation of an interaction term in a GLM are done in the wrong way. Instead of interpreting the true interaction coefficient, discussion often relies on the marginal effect of the interaction term. What are marginal effect? In this context, it means that an interaction between two factors should be combined with the main effects marginal to that interaction.(6,8) Using the notation of the authors, the interaction effect is the cross derivative of the expected value of y:

\frac{\partial^2\Phi(\cdot)}{\partial x_1x_2}=\underbrace{\beta_{12}\Phi’(\cdot)}_{\text{marginal\ effect}}+(\beta_1+\beta_{12}x_2)(\beta_2+\beta_{12}x_1)\Phi^{’‘}(\cdot)

If β12=0, we have

\left.\frac{\partial^2 \phi(\cdot)}{\partial x_1 \partial x_2}\right|_{\beta_{12}=0}=\beta_1\beta_2\phi’‘(\cdot)

In addition, the authors highlight the fact that the interaction effect might be not negligible even if the corresponding β=0. Further, the interaction effect is conditional on the independent variables. Finally, the sign of the interaction term does not necessarily indicate the sign of the interaction effect as it may have different signs for different values of the covariates. Altogether, these remarks should give support to the practitioner so that a careful examination of the model fit is done before concluding or interpreting the interaction effect without caution.

What’s about the computation of the odds-ratio associated to the main effects when the interaction is significant? For instance, consider a logistic framework with two main effects, say β1 and β2 and a third coefficient β12 that represents the interaction. For the purpose of the illustration, we can easily imagine two such factors, e.g. the smoking status (smoker or not) and the drinking status (occasional vs. regular drinker). The response variable might be any medical outcome of potential interest (cancer, malignant affection, etc.), coded as a binary (0/1) variable. If the interaction is significant, then talking about the odds-ratio (OR) associated to the smoking variable has no sense at all. Instead, one must describe two OR: the OR for subjects who are smoker but occasional drinker, exp(β1) and the OR for subjects who are smoker and regular drinker, exp(β112).

The two authors hold the necessary Stata code on their homepage. However, I would like to illustrate the issues raised by the interpretation of interaction terms when using non-linear models with R.

The effects package facilitates in some way the graphical display of effects sizes and we will use it in the short application proposed in the next few paragraphs.

Let’s consider the Arrests data, which is also used in (5) (See also this brief report, and (9), Chap. II.2, p. 57) and aims at studying the probability of release of individuals arrested in Toronto for simple possession of small quantities of marijuana. Characteristics of interest are: subjects’ race, age, employment, citizenship, previous recording in police databases.

opar <- par(mfrow=c(2,2),las=1)
quali.var <- c(2,5,6,7)
for (i in quali.var)

The above figure could hardly be interpreted as is because we need to consider both marginal (not shown) and conditional (these plots) distributions at the same time. However, we can run a reduced (compared to that used in (5)) model including colour, age and sex, as well as colour × age. This is done as follows:

arrests.glm <- glm(released ~ colour + age + sex + colour:age,
                     family=binomial, data=Arrests)

and here is the resulting output:

Effect Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.853219 0.241020 3.540 0.0004 ***
colourWhite 1.645338 0.241690 6.808 9.92e-12 ***
age 0.014389 0.007822 1.839 0.0658 .
sexMale -0.161870 0.142805 -1.133 0.2570
colourWhite:age -0.037299 0.009362 -3.984 6.78e-05 ***

At first glance, the model seems quite satisfactory and no deviations from standard assumptions are noticed (see next Figure).

Now, we can get an ANOVA-like summary by issuing(5):


The results are shown below:

Anova Table (Type II tests)   
Response: released
           LR Chisq Df Pr(>Chisq)
colour       81.854  1  < 2.2e-16
age           5.947  1    0.01474
sex           1.325  1    0.24962
colour:age   16.479  1  4.918e-05

Note that the vertical axis is labelled on the probability scale (i.e. the response scale) while the estimated effects are plotted on the scale of the linear predictor. The 95% (pointwise) confidence interval is wider at the extreme values of the age variable.

Now, we can proceed to the estimation of the interaction effect using the method proposed by Ai and Norton.


  1. Ai, C. and Norton, E.C. (2000). Interaction terms in nonlinear models. Unpublished draft manuscript.
  2. Norton, E.C., Wang, H., and Ai, C. (2004). Computing interaction effects and standard errors in logit and probit models. The Stata Journal, 4(2), 154-167.
  3. Hoetker, G. (2003). Confounded coefficients: Accurately comparing logit and probit coefficients across groups. College of Business Working Papers, University of Illinois.
  4. Berry, W.D., Esarey, J., and Rubin, J.H. (2007). Testing for interaction in binary logit and probit models: Is a product term essential? Working Papers of the Society for Political Methodology.
  5. Fox, J. (2003). Effect displays in R for generalized linear models. Journal of Statistical Software, 8(15), 18 pp.
  6. Fox, J. (1987). Effect displays for generalized linear models. In Clogg, C.C. (Ed.), Sociological Methodology, pp. 347-361. American Sociological Association, Washington DC.
  7. Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2001). Maximum likelihood estimation of generalized linear models with covariate measurement errors. The Stata Journal, 1(1), 26 pp.
  8. Bouyer, J., Hémon, D., Cordier, S., Derriennic, F., Stücker, I., Stengel, B., and Clavel, J. (1995). Épidémiologie, Principes et méthodes quantitatives. Éditions INSERM.
  9. Chen, C.-H., Härdle, W., and Unwin, A. (Eds.) (2008). Handbook of Data Visualization. Springer Verlag.
  10. Tomz, M., Wittenberg, J., and King, G. (2003). Clarify: Software for interpreting and presenting statistical results. Journal of Statistical Software, 8(1), 29 pp.

Articles with the same tag(s):

Multi-Group comparison in Partial Least Squares Path Models
Data Science from Scratch
Yet another gray theme for R base graphics
Writing a book
Stata for health researchers
R, pipes and Co.
R Graphs Cookbook
Bad Data
Data science at the command-line
Emacs Org-mode and literate programming