Strategies for community-sourced biocuration in bioinformatics: a case study on MIBiG 4.0
Published in Briefings in Bioinformatics, 2025
Kai Blin, Catarina Loureiro, Nico L L Louwen, Jorge C Navarro-Muñoz, Hans Gerstmans, Serina L Robinson, Adriano Rutz, Zachary L Reitz, Drew T Doering, Justin J J van der Hooft, Tilmann Weber, Marnix H Medema, Mitja M Zdouc
Abstract: Biocuration is essential to transform molecular sequence data into standardized, machine-readable resources. Such curated datasets enable comparative analysis, predictive modeling, and data integration across bioinformatics platforms. While professional biocuration is resource-intensive and usually limited to institutional settings, community-driven approaches can mobilize large-scale annotation of specialized datasets and are more resilient to disruptions in scientific funding. Here, we present a model for community-powered curation applied to the Minimum Information about a Biosynthetic Gene Cluster (MIBiG) repository. Through a framework of workflows for metadata capture, annotation validation, and contributor coordination, the MIBiG 4.0 initiative recruited 267 scientists across 178 institutions from 33 countries, who volunteered an estimated 4000 h of work. These efforts expanded the MIBiG repository by 22% and enhanced its usability in downstream molecular data analyses, including comparative genomics, natural product discovery, and machine learning applications. We provide strategies and actionable lessons for adopting this model, supporting the sustainability of curated bioinformatics resources central to nucleic acid research and related fields.