Misspecified poisson regression models for large-scale registry data: inference for 'large n and small p'

Research output: Contribution to journalJournal articleResearchpeer-review

Poisson regression is an important tool in register-based epidemiology where it is used to study the association between exposure variables and event rates. In this paper, we will discuss the situation with 'large n and small p', where n is the sample size and p is the number of available covariates. Specifically, we are concerned with modeling options when there are time-varying covariates that can have time-varying effects. One problem is that tests of the proportional hazards assumption, of no interactions between exposure and other observed variables, or of other modeling assumptions have large power due to the large sample size and will often indicate statistical significance even for numerically small deviations that are unimportant for the subject matter. Another problem is that information on important confounders may be unavailable. In practice, this situation may lead to simple working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods are illustrated using data from the Danish national registries investigating the diabetes incidence for individuals treated with antipsychotics compared with the general unexposed population.

Original languageEnglish
JournalStatistics in Medicine
Volume35
Issue number7
Pages (from-to)1117-1129
Number of pages13
ISSN0277-6715
DOIs
Publication statusPublished - 30 Mar 2016

ID: 157491044