Andreas Nordland PhD Defense - Statistical policy learning with industrial applications

Invitation to PhD Defense

Andreas Nordland

Date: Friday, February 16, 2024, at 14.00
Venue: Øster Farimagsgade 5, 1353 Copenhagen K, room 2.1.12
After the defense, a reception will be held in room 5-2-46.

Academic advisors:
Professor Torben Martinussen, Section of Biostatistics, Department of Public Health, University of Copenhagen.
Principal Scientist Klaus Kähler Holst, Novo Nordisk.

Assessment committee:
Associate Professor Erin Evelyn Gabriel (Chairperson), Section of Biostatistics, Department of Public Health, University of Copenhagen.
Associate Professor Stefan Nygaard Hansen, Department of Public Health, Aarhus University.
Professor Xavier de Luna, Umeå School of Business, Economics and Statistics, Umeå University.


This industrial PhD thesis is the result of a collaboration between the Section of Biostatistics at the University of Copenhagen and the Maersk Research Team.

Transforming Maersk into a data-driven company hinges on successfully leveraging historical data to implement decision policies wherever applicable. Estimating optimal policies is a causal problem that motivates new ways of collecting, documenting, and generating data. Policy learning is a vast research field spanning many disciplines. However, recent developments in nonparametric and doubly robust policy learning in statistics and economics have yet to find applications in logistics and other industries. We see massive potential for these assumption-lean techniques to leverage the vast amounts of historical data at Maersk. The primary objective of this project is to ease the practical application of the latest theoretical developments. A key contribution is the comprehensive R package polle, which unifies existing policy learning methods, introduces new functionality, and ensures consistent policy evaluation.

To illustrate the usefulness of this implementation, we present a novel application aimed at optimizing maintenance and repair policies to maximize the long-term utility of reefers (refrigerated containers). A central challenge in this application is addressing practical positivity violations that arise from limited variation in the decision process. We advocate a solution based on an action probability threshold restriction, which yields an estimate of the optimal realistic work order policy. Our findings indicate a significant gain in value, amounting to an estimated $7.5 million increase in annual profits. For cases involving extended follow-up periods, obtaining an early indication of the effectiveness of an action or treatment using a post-randomization response indicator is highly valuable.
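The idea of a threshold-restricted ("realistic") policy can be sketched in a few lines. This is a minimal, single-stage illustration, not the polle API and not the thesis's multi-stage implementation. It assumes that estimates of the outcome regression Q(x, a) and of the action probabilities g(a | x) are already available for each observation. The policy picks the action with the highest estimated Q-value among the actions whose estimated probability is at least a threshold alpha, and its value is estimated with the standard doubly robust (AIPW) formula. Because the chosen action always satisfies g ≥ alpha, the inverse-probability weights stay bounded by 1/alpha. All function names and numbers below are hypothetical.

```python
from typing import List, Sequence


def realistic_policy(q: Sequence[Sequence[float]],
                     g: Sequence[Sequence[float]],
                     alpha: float) -> List[int]:
    """Hypothetical helper: for each observation, choose the action with the
    highest estimated Q-value among actions with estimated probability
    g(a | x) >= alpha. If no action passes the threshold, fall back to the
    most likely observed action."""
    actions = []
    for q_i, g_i in zip(q, g):
        allowed = [a for a, p in enumerate(g_i) if p >= alpha]
        if not allowed:
            allowed = [max(range(len(g_i)), key=lambda a: g_i[a])]
        actions.append(max(allowed, key=lambda a: q_i[a]))
    return actions


def dr_value(y: Sequence[float],
             a_obs: Sequence[int],
             q: Sequence[Sequence[float]],
             g: Sequence[Sequence[float]],
             policy: Sequence[int]) -> float:
    """Doubly robust (AIPW) estimate of the mean outcome under a deterministic
    single-stage policy: Q(x, d(x)) plus an inverse-probability-weighted
    residual for observations whose observed action matches the policy."""
    terms = []
    for y_i, a_i, q_i, g_i, d_i in zip(y, a_obs, q, g, policy):
        correction = (y_i - q_i[a_i]) / g_i[d_i] if a_i == d_i else 0.0
        terms.append(q_i[d_i] + correction)
    return sum(terms) / len(terms)


# Hypothetical data: 3 observations, 2 actions (0 = no work order, 1 = repair).
q_hat = [[1.0, 2.0], [3.0, 1.5], [0.5, 4.0]]     # estimated Q(x_i, a)
g_hat = [[0.6, 0.4], [0.9, 0.1], [0.98, 0.02]]   # estimated g(a | x_i)
y = [2.5, 2.0, 1.0]                              # observed outcomes
a_obs = [1, 0, 0]                                # observed actions

policy = realistic_policy(q_hat, g_hat, alpha=0.05)  # -> [1, 0, 0]
value = dr_value(y, a_obs, q_hat, g_hat, policy)
print(policy, round(value, 3))
```

In the third observation, the unrestricted argmax would recommend action 1, which has an estimated probability of only 0.02. The threshold excludes it, so the recommended policy stays within the support of the historical decision process.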

The final contribution of this project focuses on the treatment effect among responders, defined as a principal stratum. For a survival analysis setup, we make novel contributions to handling right censoring and construct a nonparametric efficient estimator for the target parameter. This target parameter can be used for subgroup analysis or, combined with policy learning techniques, for designing optimal treatment-switching policies.