A new pipeline for the normalization and pooling of metabolomics data

Research output: Contribution to journal › Journal article › Research › peer-review

Vivian Viallon
Mathilde His
Sabina Rinaldi
Marie Breeur
Audrey Gicquiau
Bertrand Hemon
Kim Overvad
Joseph A Rothwell
Lucie Lecuyer
Gianluca Severi
Rudolf Kaaks
Theron Johnson
Matthias B. Schulze
Domenico Palli
Claudia Agnoli
Salvatore Panico
Rosario Tumino
Fulvio Ricceri
W. M. Monique Verschuren
Peter Engelfriet
Charlotte Onland-Moret
Roel Vermeulen
Therese Haugdahl Nøst
Ilona Urbarova
Raul Zamora-Ros
Miguel Rodriguez-Barranco
Pilar Amiano
José Maria Huerta
Eva Ardanaz
Olle Melander
Filip Ottosson
Linda Vidman
Matilda Rentoft
Julie A. Schmidt
Ruth C. Travis
Elisabete Weiderpass
Mattias Johansson
Laure Dossus
Mazda Jenab
Marc J Gunter
Justo Lorenzo Bermejo
Dominique Scherer
Reza M Salek
Pekka Keski-Rahkonen
Pietro Ferrari

Pooling metabolomics data across studies is often desirable because it increases statistical power. It also raises methodological challenges, because several preanalytical and analytical factors can introduce differences in measured concentrations and in variability between datasets. Specifically, studies may use different sample types (e.g., serum versus plasma) that were collected, treated, and stored according to different protocols, and assayed in different laboratories on different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a sequence of steps: (i) exclusion of the least informative observations and metabolites, removal of outliers, and imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; and (iii) application of linear mixed models to remove unwanted variability, including that due to the samples' originating study and batch, while preserving biological variation and accounting for potential differences in residual variances across studies. The pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of the metabolomics measurements indicated that the pipeline improved the comparability of data across studies. The pipeline can be adapted to normalize other molecular data, including biomarker and proteomics data, and could be used to pool molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility makes the work of potential interest to molecular epidemiologists.
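To make step (ii) more concrete, the following is a minimal Python sketch of a PC-PR2-style computation, not the authors' implementation. It assumes a complete (already imputed) samples-by-metabolites matrix, numeric covariate columns, a hypothetical 80% cumulative-variance threshold for selecting components, and the usual partial R-square definition (R2_full - R2_reduced) / (1 - R2_reduced). The function names, the threshold, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def r_squared(design, y):
    """R^2 of an ordinary least-squares fit of y on design (intercept column included)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - float(resid @ resid) / float(centered @ centered)


def pc_pr2(X, covariates, var_threshold=0.8):
    """Weighted partial R^2 of each covariate across the leading principal components.

    X          : (n_samples, n_metabolites) array without missing values.
    covariates : dict mapping a name to an (n_samples, k) numeric array.
    var_threshold is an assumed cut-off, not a value taken from the paper.
    """
    Xc = X - X.mean(axis=0)
    # PCA through the singular value decomposition of the centered matrix.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2
    explained = eigvals / eigvals.sum()
    # Smallest number of components whose cumulative variance reaches the threshold.
    n_comp = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    n_comp = min(n_comp, len(s))
    pc_scores = U[:, :n_comp] * s[:n_comp]
    weights = eigvals[:n_comp] / eigvals[:n_comp].sum()

    intercept = np.ones((X.shape[0], 1))
    full = np.hstack([intercept] + list(covariates.values()))
    result = {}
    for name in covariates:
        # Reduced model: every covariate except the one under study.
        others = [v for k, v in covariates.items() if k != name]
        reduced = np.hstack([intercept] + others)
        partial = []
        for j in range(n_comp):
            y = pc_scores[:, j]
            r2_full = r_squared(full, y)
            r2_red = r_squared(reduced, y)
            partial.append((r2_full - r2_red) / (1.0 - r2_red))
        # Average the per-component partial R^2, weighted by component variance.
        result[name] = float(np.dot(weights, partial))
    return result


# Simulated example: a strong "study" shift and an "age" variable unrelated to X.
rng = np.random.default_rng(0)
n_samples, n_metabolites = 200, 30
study = rng.integers(0, 2, size=n_samples).astype(float)
age = rng.normal(size=n_samples)
X = rng.normal(size=(n_samples, n_metabolites)) + 2.0 * study[:, None]

scores_by_covariate = pc_pr2(X, {"study": study[:, None], "age": age[:, None]})
print(scores_by_covariate)
```

In this simulated setting the study indicator should account for a much larger share of the variability than age, which is the kind of signal that would motivate treating study as a source of unwanted variation in step (iii). A categorical covariate with more than two levels would need to be dummy-coded before being passed in.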

Original language	English
Article number	631
Journal	Metabolites
Volume	11
Issue number	9
Number of pages	18
ISSN	2218-1989
DOIs	https://doi.org/10.3390/metabo11090631
Publication status	Published - 2021
Externally published	Yes

Research areas

Cancer epidemiology, Metabolites, Metabolomics, Normalization, Pooling, Technical variability

Department of Public Health