Misspecified Poisson regression models for large-scale registry data: inference for 'large n and small p'

Research output: Contribution to journal › Journal article › Research › peer-review

Poisson regression is an important tool in register-based epidemiology where it is used to study the association between exposure variables and event rates. In this paper, we will discuss the situation with 'large n and small p', where n is the sample size and p is the number of available covariates. Specifically, we are concerned with modeling options when there are time-varying covariates that can have time-varying effects. One problem is that tests of the proportional hazards assumption, of no interactions between exposure and other observed variables, or of other modeling assumptions have large power due to the large sample size and will often indicate statistical significance even for numerically small deviations that are unimportant for the subject matter. Another problem is that information on important confounders may be unavailable. In practice, this situation may lead to simple working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods are illustrated using data from the Danish national registries investigating the diabetes incidence for individuals treated with antipsychotics compared with the general unexposed population.

Original language	English
Journal	Statistics in Medicine
Volume	35
Issue number	7
Pages (from-to)	1117-1129
Number of pages	13
ISSN	0277-6715
DOIs	https://doi.org/10.1002/sim.6755
Publication status	Published - 30 Mar 2016

ID: 157491044

Department of Public Health
