Input/Output

FERMO accepts several input data formats and produces a variety of output data files, as described below.

Input Data Formats

This is a comprehensive list of input data formats currently accepted by FERMO. Detailed information can be found below. If your favorite data format is missing, feel free to contact the developers.

Data type	Accepted formats	Mandatory/Optional	Example
Molecular feature peaktable	mzmine3	Mandatory	mzmine3
MS/MS spectrum information	mgf(mzmine3)	Optional	mgf(mzmine3)
Group metadata	fermo	Optional	fermo
Phenotype/bioactivity data	qualitative, quantitative-percentage, quantitative-concentration	Optional	qualitative, quantitative-percentage, quantitative-concentration
Spectral library	mgf(GNPS)	Optional	mgf(GNPS)
MS2Query results file	ms2query(fermo-modified)	Optional	ms2query(fermo-modified)
AntiSMASH results	antiSMASH(KnownClusterBlast)	Optional	antiSMASH(KnownClusterBlast)

Output Data Formats

All output files created by FERMO are written into a results directory that is created in the directory in which the provided peaktable file resides.

All output files are starting with out.fermo and ending with a suffix that specifies their type (see the table below):

Suffix	Description
.session.json	Main output of fermo_core, is used for visualization in fermo_gui
.graphml	Spectral similarity (=molecular) network file, can be imported into Cytoscape
.fermo.abbrev.csv	Feature annotation file, can be imported into Cytoscape
.fermo.full.csv	Modified full peaktable, can be used in downstream processing
.log	A log file describing the steps performed and any warnings/errors that were registered.
.summary.txt	A summary of the steps taken and parameters applied in human-readable form. Can be adapted for the methods section of a manuscript (beware of copy-pasting and a potential flag by a plagiarism software)

Details Input Data Formats

Molecular feature peaktable

The peaktable must:

Derive from liquid chromatography electrospray ionization (tandem) mass spectrometry (LC-ESI-(MS/)MS)
Constitute of samples acquired at identical concentration/dilution and identical injection volume
Be acquired using untargeted data-dependent acquisition (DDA)
Be of high resolution (ideally, <20 ppm mass deviation)
Be in a single polarity (either positive or negative ion mode)

mzmine3

Peaktable produced by created by MZmine (version 3.x). To generate the peaktable in MZmine3, perform the usual pre-processing steps and export the table using Feature list methods -> Export feature list -> Molecular networking files. In the resulting window (see below), keep defaults except Filter rows: ALL, Feature intensity: Height, and CSV export: COMPREHENSIVE.

MZmine3 peaktable export

For an example, see here.

MSMS spectrum information

mgf(mzmine3)

The MS/MS (MS2) fragmentation data file is automatically generated during MZmine (version 3.x) peaktable export (see above).

BEGIN IONS
FEATURE_ID=13
PEPMASS=610.3346
SCANS=13
RTINSECONDS=572.766
CHARGE=0+
MSLEVEL=2
98.8724 4.9E1
103.7477 2.9E1
... ...
END IONS

Group Metadata

fermo

.csv-file (see example below). Specifically, the file must have:

A single column labeled sample_name specifying the samples IDs. Entries in this column must be unique.
One or more columns specifying group categories. In these columns, all values must be strings (no numbers).

Optionally, the file may have:

One or more samples designated as sample blanks with the signal word BLANK.

sample_name,phylogroup,medium
sample1.mzXML,group1,mediumA
sample2.mzXML,group1,mediumB
sample3.mzXML,group2,mediumA
sample4.mzXML,group2,mediumB
sample1.mzXML,BLANK,BLANK
sampleN.mzXML,groupX,mediumY

Phenotype (bioactivity) data

qualitative

.csv-file (see example below). Specifically, the file must have:

A single column labeled sample_name specifying the samples considered “positive”

sample_name
sample1.mzXML
sample2.mzXML
sample3.mzXML
sampleN.mzXML

quantitative-percentage

.csv-file (see example below). Specifically, the file must have:

A column labeled sample_name specifying the sample identifiers
A column labeled well specifying the well number. Numbers in this column must be occurring only once. Note that the label well stands for any measurement reference (vial, rack position, etc.)
One to six columns labeled with assay:..., which indicate different assays (or one assay at different concentrations). The values indicate the percentage inhibition measured for this sample. Negative percentages are considered as 0 (zero).
Measurements for at least 10 samples

sample_name,well,assay:assay1_conc1,assay:assay1_conc2,assay2_conc1
sample1.mzXML,1,6,30,98
sample1.mzXML,2,-5,22,80
sample2.mzXML,3,3,15,-20
sample2.mzXML,4,18,17,32
sampleN.mzXML,M,X,Y,Z

quantitative-concentration

.csv-file (see example below). Specifically, the file must have:

A column labeled sample_name specifying the sample identifiers
A column labeled well specifying the well number. Numbers in this column must be occurring only once. Note that the label well stands for any measurement reference (vial, rack position, etc.)
One to six columns labeled with assay:..., which indicate different assays. The values indicate the minimal inhibitory concentration and must be positive numbers. Samples with no measured activity are indicated with 0 (zero).
Measurements for at least 10 samples

sample_name,well,assay:bactericidal
sample1.mzXML,1,256
sample1.mzXML,2,256
sample1.mzXML,3,64
sample1.mzXML,4,32
sample1.mzXML,5,0
sample1.mzXML,6,0
sampleN.mzXML,M,X

Spectral Library

mgf(GNPS)

A .mgf-file (Mascot generic format) as produced by GNPS. Specifically, this file must:

Start with BEGIN IONS
Have a PEPMASS entry indicating the precursor ion m/z (must not be 0.0 or 1.0)
Have a number of fragment-intensity pairs
End with END IONS

Optionally, the following fields are supported: - SMILES - INCHI - NAME

Other fields may be present but are not actively parsed.

BEGIN IONS
PEPMASS=1649.45
NAME=Fakeomycin
SMILES=CCCCCC
INCHI=AKFVOKPQHFBYCA
SCANS=1
172.073334  80.0
190.080322  201.0
... ...
END IONS

MS2Query Results File

ms2query(fermo-modified)

A .csv-file (MSQuery version 1.4.0). Specifically, it must have:

A column labeled id with numbers matching molecular feature IDs (not found in standard MS2Query results files
A column labeled analog_compound_name
A column labeled ms2query_model_prediction
A column labeled precursor_mz_difference
A column labeled precursor_mz_analog
A column labeled smiles
A column labeled inchikey
A column labeled npc_class_results

ms2query_model_prediction,precursor_mz_difference,precursor_mz_query_spectrum,precursor_mz_analog,inchikey,analog_compound_name,smiles,id,npc_class_results
0.3459,47.0324,247.1276,294.1600,AAAAAAAAAA,”fakeomycin”,CCCCC,3,Carboline alkaloids

AntiSMASH Results

Online (`fermo_gui`)

In the online version, FERMO requires only an existing antiSMASH job ID, which it will use to automatically fetch the required data. An antiSMASH job ID has the format taxon-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeee. The ID can be found in the antiSMASH notification email or in the URL when browsing the job results.

Offline CLI (`fermo_core`)

The fermo_core CLI requires a single antiSMASH results directory (at the time of writing, version 7.1) containing a knownclusterblast results directory.

Input/Output

Input Data Formats

Output Data Formats

Details Input Data Formats

Molecular feature peaktable

mzmine3

MSMS spectrum information

mgf(mzmine3)

Group Metadata

fermo

Phenotype (bioactivity) data

qualitative

quantitative-percentage

quantitative-concentration

Spectral Library

mgf(GNPS)

MS2Query Results File

ms2query(fermo-modified)

AntiSMASH Results

Online (fermo_gui)

Offline CLI (fermo_core)

Online (`fermo_gui`)

Offline CLI (`fermo_core`)