Multilaboratory Untargeted Mass Spectrometry Metabolomics Collaboration to Identify Bottlenecks and Comprehensively Annotate a Single Dataset

Published in Analytical Chemistry, 2025

Joelle Houriet, Preston K Manwill, Armando Alcázar Magaña, Victoria M Anderson, Mehdi A Beniddir, Samuel Bertrand, Jaewoo Choi, Trevor N Clark, Leonard J Foster, Maria Halabalaki, Alan K Jarmusch, Niek F de Jonge, Aswad Khadilkar, John B MacMillan, Claudia S Maier, Luke C Marney, Guillaume Marti, Eleni V Mikropoulou, Damien Olivier-Jimenez, Amélie Perez, Justin JJ van der Hooft, Mitja M Zdouc, Roger G Linington, Nadja B Cech

Abstract: Annotation is the process of assigning features in mass spectrometry metabolomics datasets to putative chemical structures, or “analytes.” The purpose of this study was to identify challenges in the annotation of untargeted mass spectrometry metabolomics datasets and to suggest strategies for overcoming them. Toward this goal, we analyzed an extract of the plant ashwagandha (Withania somnifera) by liquid chromatography–mass spectrometry on two different platforms (an Orbitrap and a Q-ToF) using various acquisition modes. The resulting 12 datasets were shared with ten teams with established expertise in metabolomics data interpretation. Each team annotated at least one positive ion mode dataset using its own approaches. Eight teams selected the positive ion mode data-dependent acquisition (DDA) data collected on the Orbitrap platform, so the results reported for that dataset were chosen for an in-depth comparison. We compiled and cross-checked each laboratory's annotations of this dataset to arrive at a “consensus annotation” of 142 putative analytes, 13 of which were confirmed by comparison with standards. Each team reported only a subset (24 to 57%) of the analytes in the consensus list. Correct assignment of ion species (clusters and fragments) in mass spectra was a major bottleneck. In many cases, redundant features generated in the ion source were mistakenly treated as independent analytes, causing annotation errors and an overestimation of sample complexity. Our results suggest that better tools and approaches are needed to assign feature identity, group related mass features, and query published spectral and taxonomic data when assigning putative analyte structures.
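To illustrate why in-source redundancy inflates apparent sample complexity, the sketch below groups co-eluting features whose m/z values can all be explained by one neutral mass under different ion species. This is a minimal heuristic written for illustration, not the method used by any participating team. The adduct list, the tolerances (0.05 min retention time, 5 ppm mass), and the feature IDs are hypothetical assumptions. The m/z values are theoretical, computed from the monoisotopic mass of withaferin A (C28H38O6, 470.2668 Da); they are not data from this study. Only singly charged ions are handled, and each neutral mass is back-calculated as M = (m/z − shift) / n, where n is the number of molecules in the ion.

```python
from dataclasses import dataclass
from itertools import combinations

PROTON = 1.007276  # proton mass (Da)

# Hypothetical, deliberately short list of singly charged positive-mode ion species:
# name -> (number of M molecules in the ion, mass shift in Da added to n*M)
ADDUCTS = {
    "[M+H]+": (1, PROTON),
    "[M+Na]+": (1, 22.989218),
    "[M+NH4]+": (1, 18.033823),
    "[M+H-H2O]+": (1, PROTON - 18.010565),  # in-source water loss (fragment)
    "[2M+H]+": (2, PROTON),                  # proton-bound dimer (cluster)
}


@dataclass(frozen=True)
class Feature:
    fid: str
    mz: float
    rt: float  # retention time, minutes


def neutral_mass(mz: float, adduct: str) -> float:
    """Back-calculate the neutral monoisotopic mass M implied by an ion species."""
    n_mol, shift = ADDUCTS[adduct]
    return (mz - shift) / n_mol


def ppm_error(observed: float, reference: float) -> float:
    return abs(observed - reference) / reference * 1e6


def group_features(features, rt_tol=0.05, ppm_tol=5.0):
    """Union co-eluting features whose m/z values imply the same neutral mass."""
    parent = {f.fid: f.fid for f in features}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    explanations = []
    for a, b in combinations(features, 2):
        if abs(a.rt - b.rt) > rt_tol:
            continue  # different retention times: treat as different analytes
        for ion_a in ADDUCTS:
            for ion_b in ADDUCTS:
                if ion_a == ion_b:
                    continue  # same species at same RT would be a duplicate, not an adduct pair
                m_a = neutral_mass(a.mz, ion_a)
                m_b = neutral_mass(b.mz, ion_b)
                if ppm_error(m_b, m_a) <= ppm_tol:
                    parent[find(b.fid)] = find(a.fid)
                    explanations.append((a.fid, ion_a, b.fid, ion_b, round(m_a, 4)))

    groups = {}
    for f in features:
        groups.setdefault(find(f.fid), []).append(f.fid)
    return list(groups.values()), explanations


# Hypothetical feature table: theoretical m/z for M = 470.2668 Da plus two decoys
features = [
    Feature("F1", 471.2741, 5.01),  # [M+H]+
    Feature("F2", 493.2561, 5.02),  # [M+Na]+
    Feature("F3", 453.2636, 5.00),  # [M+H-H2O]+
    Feature("F4", 941.5410, 5.03),  # [2M+H]+
    Feature("F5", 471.2741, 7.80),  # same m/z as F1 but different RT
    Feature("F6", 300.1234, 5.01),  # co-eluting but unrelated mass
]

groups, explanations = group_features(features)
for group in groups:
    print(group)
# prints:
# ['F1', 'F2', 'F3', 'F4']
# ['F5']
# ['F6']
```

Six features collapse to three putative analytes here, which mirrors the kind of overcounting described above when clusters and in-source fragments are not recognized. Mass arithmetic and co-elution alone can produce coincidental matches in a real dataset, so practical tools typically add further evidence such as peak-shape correlation and isotope patterns before grouping features.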