Ensuring Marine Microbiome Data Integrity: A Comprehensive Guide to MIMAG Standards for Marinomonas Genome Quality

Adrian Campbell, Jan 12, 2026

Abstract

This article provides a critical analysis of the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards as applied to Marinomonas and other marine microbiome genomes. Aimed at researchers, scientists, and drug development professionals, it explores the foundational principles of MIMAG, details methodological workflows for compliance, offers troubleshooting strategies for common genome assembly and binning challenges in marine samples, and compares MIMAG with other genomic quality frameworks. The goal is to equip professionals with the knowledge to generate high-quality, reproducible, and clinically relevant microbial genome data from complex marine environments for applications in biodiscovery and therapeutic development.

What Are MIMAG Standards and Why Are They Critical for Marine Microbiome Research?

The Minimum Information about a Metagenome-Assembled Genome (MIMAG) standard, established by the Genomic Standards Consortium, provides a critical framework for reporting metagenome-assembled genome (MAG) quality and completeness. This framework is essential for comparative genomics, ecological studies, and bioprospecting, particularly for candidate phyla like Marinisomatota. This document details application notes and protocols for applying MIMAG standards within Marinisomatota genome quality research, a key thesis context for understanding the genomic potential of this elusive bacterial lineage.

MIMAG Standards: Core Criteria and Quantitative Benchmarks

The MIMAG standard defines four genome categories: finished, high-quality draft, medium-quality draft, and low-quality draft. The two draft tiers most relevant to MAG studies, high-quality and medium-quality, are assigned from estimated completeness, estimated contamination, and the presence of rRNA and tRNA genes. The following table summarizes the quantitative thresholds.

Table 1: MIMAG Quality Tier Specifications for Bacterial Genomes

Criterion	High-Quality Draft	Medium-Quality Draft
Completeness (CheckM/CheckM2)	>90%	≥50%
Contamination (CheckM/CheckM2)	<5%	<10%
tRNA genes	≥18 tRNAs	Presence reported
5S, 16S, 23S rRNA genes	Full set (or >50% length fragments)	Presence reported
Gene annotation	Yes (e.g., IMG, NCBI PGAP)	Encouraged
Assembly Quality	Preferably closed (contig N50 reported)	Contig N50 reported

Table 2: Typical Marinisomatota MAG Statistics from Public Repositories (Example Data)

Study/Source	# MAGs	Avg. Completeness	Avg. Contamination	MIMAG Tier
Marine Sediment Study A	12	94.2% (±3.1)	1.8% (±0.9)	High-quality
Hydrothermal Vent Study B	7	78.5% (±12.4)	5.5% (±2.3)	Medium-quality
Thesis Context: Coastal Plume	5	99.1% (±0.5)	0.5% (±0.2)	High-quality

Protocols for MIMAG-Compliant Marinisomatota Genome Analysis

Protocol 1: Genome-Resolved Metagenomic Assembly and Binning

Objective: Recover Marinisomatota MAGs from complex environmental sequence data.

Quality Trimming: Use Fastp v0.23.2 with parameters: -q 20 -u 30 --length_required 100.
Co-assembly: Perform de novo assembly using MEGAHIT v1.2.9: megahit -1 read1.fq -2 read2.fq -o assembly_output --min-contig-len 1000.
Coverage Profiling: Map reads back to contigs using Bowtie2 v2.4.5 and generate depth files with SAMtools v1.17.
Binning: Execute automated binning with MetaBAT2 v2.15: metabat2 -i contigs.fa -a depth.txt -o bin_dir/bin.
Bin Refinement: Use DAS Tool v1.1.6 to integrate results from multiple binners (e.g., MetaBAT2, MaxBin2) and produce a consolidated set of bins.
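
The coverage-profiling and binning steps above can be scripted. The following is a minimal sketch that only builds the command lines so they can be reviewed before execution. The function name, the file names, and the sample prefix are hypothetical. It also assumes that the depth table is produced with jgi_summarize_bam_contig_depths, the helper distributed with MetaBAT2, after SAMtools has sorted and indexed the alignments.

```python
import shlex


def coverage_and_binning_commands(contigs, read1, read2, threads=8, prefix="sample"):
    """Return shell commands for read mapping, depth profiling, and binning.

    Nothing is executed here; the caller decides how to run the commands.
    """
    q = shlex.quote
    index = f"{prefix}_index"       # hypothetical Bowtie2 index prefix
    bam = f"{prefix}.sorted.bam"    # hypothetical sorted alignment file
    return [
        # Build the Bowtie2 index from the assembled contigs.
        f"bowtie2-build {q(contigs)} {q(index)}",
        # Map the paired reads and sort the alignments in one pipe.
        f"bowtie2 -x {q(index)} -1 {q(read1)} -2 {q(read2)} -p {threads} "
        f"| samtools sort -@ {threads} -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # Depth table in the format MetaBAT2 expects (assumed helper, see above).
        f"jgi_summarize_bam_contig_depths --outputDepth depth.txt {q(bam)}",
        # Binning command as given in the protocol.
        f"metabat2 -i {q(contigs)} -a depth.txt -o bin_dir/bin",
    ]


for command in coverage_and_binning_commands("contigs.fa", "read1.fq", "read2.fq"):
    print(command)
```

Returning strings rather than running them keeps the sketch free of side effects; in a real pipeline a workflow manager would typically take over this role.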

Protocol 2: MIMAG Quality Assessment and Tier Assignment

Objective: Evaluate bin quality against MIMAG criteria.

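Completeness/Contamination: Run CheckM2 v1.0.1, which predicts both metrics with machine-learning models rather than a lineage workflow: checkm2 predict --threads 20 --input bins_dir --output-directory checkm2_results (add -x fa if the bins use the .fa extension).
tRNA Detection: Run tRNAscan-SE v2.0.9 on each bin with the bacterial model (-B): tRNAscan-SE -B -Q -o tRNA.out bin.fa.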
rRNA Gene Identification: Employ Barrnap v0.9: barrnap --kingdom bac bins.fa > rrna_genes.gff.
Taxonomic Assignment: Classify bins using GTDB-Tk v2.3.0: gtdbtk classify_wf --genome_dir bins_dir --out_dir gtdbtk_out --cpus 20. Filter for classification within the Marinisomatota phylum (e.g., p__Marinisomatota).
Tier Assignment: Compile results from steps 1-3 and assign MIMAG tier based on Table 1 thresholds.
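
The tier assignment step can be expressed as a small decision function. The sketch below is one possible encoding, assuming the thresholds of the MIMAG publication (>90% completeness and <5% contamination plus 5S, 16S, and 23S rRNA genes and at least 18 tRNAs for a high-quality draft; ≥50% completeness and <10% contamination for a medium-quality draft). The BinMetrics class and its field names are hypothetical, and the inputs would come from the CheckM2, tRNAscan-SE, and Barrnap outputs.

```python
from dataclasses import dataclass


@dataclass
class BinMetrics:
    completeness: float    # percent, e.g. from CheckM2
    contamination: float   # percent, e.g. from CheckM2
    trna_count: int        # tRNA genes reported by tRNAscan-SE
    has_5s: bool           # rRNA presence flags, e.g. from Barrnap
    has_16s: bool
    has_23s: bool


def assign_mimag_tier(m: BinMetrics) -> str:
    """Assign a MIMAG draft tier from the compiled metrics."""
    if m.contamination >= 10:
        return "does not meet MIMAG draft thresholds"
    full_rrna_set = m.has_5s and m.has_16s and m.has_23s
    if (m.completeness > 90 and m.contamination < 5
            and full_rrna_set and m.trna_count >= 18):
        return "high-quality draft"
    if m.completeness >= 50:
        return "medium-quality draft"
    return "low-quality draft"


example = BinMetrics(92.5, 1.8, 20, True, True, True)
print(assign_mimag_tier(example))
```

A bin with excellent completeness but a missing rRNA gene falls to the medium-quality tier under this logic, which is a common outcome for short-read MAGs because rRNA operons often fail to assemble or bin.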

Protocol 3: Functional Annotation for Drug Development Context

Objective: Annotate high-quality Marinisomatota MAGs to identify biosynthetic gene clusters (BGCs).

Gene Calling & Annotation: Use Prokka v1.14.6 for rapid annotation: prokka --kingdom Bacteria --outdir prokka_annotation --prefix mag bin.fa.
BGC Discovery: Run antiSMASH v7.0: antismash bin.fa --cb-knownclusters --cb-subclusters --genefinding-tool prodigal -c 20 --output-dir antismash_result.
Resistance & Virulence: Screen for AMR genes using RGI (CARD): rgi main -i protein.faa -o rgi_output --input_type protein.
Comparative Analysis: Generate a protein family (pangenome) profile using Roary v3.13.0 for multiple Marinisomatota MAGs.

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials for MIMAG-Compliant Marinisomatota Research

Item/Category	Function/Application	Example Product/Software
High-Throughput Sequencer	Generate raw metagenomic reads from environmental DNA.	Illumina NovaSeq X, PacBio Revio
Metagenomic Assembly Software	Reconstruct long contiguous sequences (contigs) from short reads.	MEGAHIT, SPAdes
Binning Algorithm	Cluster contigs into draft genomes (MAGs) based on sequence composition and abundance.	MetaBAT2, MaxBin2
Quality Assessment Tool	Quantify genome completeness and contamination using single-copy marker genes.	CheckM2, BUSCO
Taxonomic Classifier	Assign phylogenetic lineage to recovered MAGs.	GTDB-Tk
Functional Annotation Pipeline	Predict genes and assign functional categories.	Prokka, DRAM
BGC Detection Suite	Identify genomic regions encoding secondary metabolites (drug leads).	antiSMASH, PRISM
High-Performance Computing (HPC) Cluster	Provides computational resources for data-intensive workflows.	Local or cloud-based HPC infrastructure

Visualizations

Workflow for MIMAG-compliant MAG generation

MIMAG quality tier decision logic

Application Notes and Protocols

This document outlines the specific challenges and methodological frameworks for marine microbial genome-resolved metagenomics, contextualized within the broader thesis goal of establishing high-quality reference genomes for the candidate phylum Marinisomatota in accordance with the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards.

Challenge 1: Sample Complexity and Biomass Limitations

Marine samples, particularly from deep pelagic zones, exhibit extreme microbial diversity with low biomass, complicating DNA extraction and sequencing depth requirements.

Table 1: Quantitative Challenges in Marine Sample Processing

Parameter	Typical Range/Value	Impact on Genome Quality
Microbial Cells per mL (Open Ocean)	10^5 - 10^6	Limits total genomic DNA yield.
Dominant Taxon Relative Abundance	Often <1%	Requires deep sequencing for coverage.
Estimated Genomic Diversity per Sample	10^3 - 10^5 Species/OTUs	Increases assembly complexity and fragmentation.
Target Sequencing Depth for MAGs	>100X coverage	Necessitates high-volume filtration or amplification.
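
The relationship between relative abundance and sequencing effort in the table above can be made concrete with a short calculation. This is a rough sketch under stated assumptions: the fraction of reads from the target equals its relative abundance in the extracted DNA, coverage is uniform, and the 3.5 Mbp genome size is a hypothetical value chosen for illustration.

```python
import math


def sequencing_requirement(genome_size_bp, relative_abundance,
                           target_coverage, read_length_bp=150):
    """Estimate total bases and paired-end read pairs needed for a target genome.

    relative_abundance is a fraction (0.01 means 1% of the community DNA).
    """
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be a fraction in (0, 1]")
    # Bases that must land on the target, scaled up by how rare the target is.
    total_bases = target_coverage * genome_size_bp / relative_abundance
    read_pairs = math.ceil(total_bases / (2 * read_length_bp))
    return total_bases, read_pairs


# Hypothetical 3.5 Mbp genome at 1% relative abundance, 100X target coverage.
bases, pairs = sequencing_requirement(3_500_000, 0.01, 100)
print(f"{bases / 1e9:.1f} Gbp, {pairs / 1e6:.0f} million read pairs")
```

Under these assumptions a taxon at 1% abundance needs about 35 Gbp of 2x150 bp data, and the requirement grows in inverse proportion as abundance drops, which is why taxa below 1% often call for larger filtration volumes or enrichment.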

Protocol 1.1: Concentrated Biomass Collection and Preservation

Materials: Sterilized Niskin bottles, peristaltic pump, in-line serial filtration system (e.g., 3.0μm pre-filter, 0.22μm sterivex capsule), RNAlater or DNA/RNA Shield preservation buffer.
Method: Collect >50L seawater. Perform in-line sequential filtration under gentle pressure (<5 psi). Immediately upon filter retrieval, aseptically add 1.5mL of preservation buffer to the filter capsule. Flash-freeze in liquid nitrogen and store at -80°C.

Challenge 2: Co-Extracted Contaminants and Host Contamination

Marine samples contain PCR inhibitors (humics, salts, polysaccharides) and, for host-associated Marinisomatota, overwhelming host DNA.

Table 2: Common Contaminants and Mitigation Strategies

Contaminant Type	Source	Mitigation Reagent/Kit	Post-Extraction QC Metric
Polysaccharides & Humics	Dissolved Organic Matter	PVPP (Polyvinylpolypyrrolidone) addition to lysis buffer.	A260/A230 ratio (<1.8 indicates carryover).
Salt (NaCl, MgCl₂)	Seawater	Ethanol-based wash buffers; Size-selection cleanup beads.	Fluorometric quantification (Qubit).
Host Genomic DNA (e.g., sponge)	Eukaryotic Host Tissue	Benzonase digestion prior to lysis; Differential lysis.	qPCR for universal 18S vs. 16S rRNA genes.

Protocol 1.2: Inhibitor-Robust Metagenomic DNA Extraction

Materials: DNeasy PowerWater Sterivex Kit (Qiagen) with modifications; PVPP powder; Zymo DNA Clean & Concentrator-5 kit.
Method: Add 0.1g PVPP to the initial SL1 lysis buffer. Follow kit protocol with extended bead-beating (5min). Perform post-elution cleanup using a 0.8X bead-to-sample ratio to remove short fragments and salts. Elute in 10mM Tris-HCl (pH 8.5).

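Challenge 3: Achieving MIMAG-Standard Genome Completeness and Contamination

The MIMAG standard for a high-quality draft genome requires >90% completeness and <5% contamination, together with the 5S, 16S, and 23S rRNA genes and at least 18 tRNAs. These thresholds are difficult to reach for low-abundance marine microbes, whose genomes receive little sequencing coverage.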

Protocol 1.3: Single-Assemblage, Multi-Depth Sequencing and Binning

Methodology: Split extracted DNA from a single filter into two libraries: 1) Illumina NovaSeq 2x150bp for high-depth (~200M read pairs) assembly, and 2) Oxford Nanopore Technologies (ONT) ligation sequencing for long reads.
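Hybrid Assembly & Binning: Assemble the Illumina and ONT reads together using metaSPAdes in hybrid mode (ONT reads supplied via --nanopore). Medaka polishes long-read assemblies with ONT reads, so apply it only if the ONT reads are assembled separately (e.g., with metaFlye), followed by short-read polishing. Perform binning on the resulting assembly using metaWRAP (Bin_refinement module) with MetaBAT2, MaxBin2, and CONCOCT. Check all bins against the MIMAG checklist.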

Table 3: MIMAG Quality Metrics for a Hypothetical Marinisomatota Bin

MIMAG Quality Metric	Minimum Standard (High-Quality Draft)	Example Bin Result	Tool for Assessment
Completeness	>90%	92.5%	CheckM2
Contamination	<5%	1.8%	CheckM2
Presence of 16S rRNA	Required, together with 5S and 23S (full-length preferred)	Full-length 16S recovered	Barrnap
Presence of tRNA genes	≥18 tRNAs	tRNAs for all 20 aa found	tRNAscan-SE
# of Contigs	--	42	QUAST
N50 (bp)	--	185,450	QUAST
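
The N50 value in the table is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. QUAST reports it directly; the sketch below only illustrates the calculation, and the contig lengths in the example are made up.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths (bp) for illustration.
print(n50([80_000, 70_000, 50_000, 40_000, 30_000, 20_000]))
```

Because N50 depends only on the length distribution, it says nothing about correctness; a chimeric bin can have a high N50, which is why MIMAG pairs assembly statistics with completeness and contamination estimates.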

The Scientist's Toolkit: Research Reagent Solutions

Item	Function/Benefit
Sterivex Filter Capsules (0.22μm)	Closed-system, in-line filtration; minimizes contamination risk.
DNA/RNA Shield (Zymo Research)	Inactivates nucleases and preserves nucleic acid integrity at ambient temp for transport.
PVPP (Sigma-Aldrich)	Binds polyphenolic inhibitors (humics) common in marine samples.
Mag-Bind TotalPure NGS Beads (Omega Bio-tek)	Size-selective cleanup; removes short fragments and salts.
NEBNext Ultra II FS DNA Library Prep Kit (New England Biolabs)	Fast, robust library prep for low-input and inhibitor-tolerant workflows.
Adaptive sampling (Read Until; Oxford Nanopore)	Software-based, real-time selective sequencing to enrich for target Marinisomatota reads.

Visualization

Title: Marine Metagenome Assembly Workflow for MIMAG

Title: Linking Strategies to MIMAG Genome Quality Metrics

Application Notes

Marinomonas species are Gram-negative, aerobic, heterotrophic Gammaproteobacteria, predominantly isolated from marine environments. The genus (family Marinomonadaceae in the GTDB taxonomy) is taxonomically distinct from the phylum Marinisomatota (formerly Marinimicrobia, the SAR406 clade). As a readily cultivated marine lineage, it serves as a model for studying genomic adaptation to pelagic and epiphytic niches and for harnessing marine microbial enzymology.

Ecological Significance & Quantitative Metrics

Marinomonas spp. are key players in biogeochemical cycles, particularly in polar, temperate, and deep-sea ecosystems. Their prevalence and functional roles are quantified below.

Table 1: Ecological Prevalence and Functional Metrics of Marinomonas

Metric	Typical Range / Value	Environmental Context	Measurement Method
Abundance in coastal seawater	10^2 - 10^4 cells/L	Temperate surface waters	16S rRNA qPCR / FISH
Biofilm formation enhancement	50-70% increase in biovolume	On marine phytoplankton (e.g., Phaeocystis)	Confocal Laser Scanning Microscopy
Degradation rate of alginate	0.5-1.2 µM C/hr	Polymeric carbon turnover	Substrate-specific respiration
EPS (Exopolysaccharide) production	100-500 mg/L	Under P-limitation	Phenol-sulfuric acid assay
Cold-active enzyme (e.g., protease) activity Q₁₀	1.5-2.5	4°C to 14°C	Spectrophotometric assay
Antarctic sea ice brine salinity tolerance	Up to 15% NaCl	Survival & growth	Plate counts / MPN
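
The Q₁₀ values in the table express how much a reaction rate changes for a 10°C rise in temperature: Q₁₀ = (R₂/R₁)^(10/(T₂−T₁)). A minimal sketch of the calculation follows; the activity values in the example are hypothetical and are not measurements from this article.

```python
def q10(rate_low, rate_high, temp_low_c, temp_high_c):
    """Temperature coefficient Q10 from two rates measured at two temperatures (°C)."""
    if rate_low <= 0 or rate_high <= 0:
        raise ValueError("rates must be positive")
    if temp_high_c == temp_low_c:
        raise ValueError("temperatures must differ")
    return (rate_high / rate_low) ** (10.0 / (temp_high_c - temp_low_c))


# Hypothetical protease activities (U/mg) at 4 °C and 14 °C.
print(round(q10(120.0, 240.0, 4.0, 14.0), 2))  # a doubling over 10 °C gives 2.0
```

Cold-active enzymes tend to show lower Q₁₀ values than mesophilic homologs because they retain a larger share of their activity as temperature falls.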

Biotechnological Potential & Performance Data

The biotechnological value of Marinomonas lies in its repertoire of stress-adapted enzymes and bioactive compounds.

Table 2: Biotechnological Enzymes and Products from Marinomonas

Product/Enzyme	Source Species	Optimal Activity	Reported Yield/Activity	Potential Application
Cold-active Alkaline Phosphatase	M. primoryensis	pH 9.5, 10°C	250 U/mg	Marine molecular diagnostics, phosphate monitoring
Psychrophilic Serine Protease	M. protea	pH 8.0, 15°C	1800 U/mg	Food processing (low-temperature), detergents
Agarase	M. foliarum	pH 7.5, 25°C	50 U/mL	Agarose sugar recovery, protoplast isolation
Carotenoid (Zeaxanthin)	M. mediterranea	N/A	0.8 mg/g dry cell weight	Nutraceutical, antioxidant
Bioflocculant EPS	M. communis	N/A	92% flocculation efficiency	Wastewater treatment, mining
Halotolerant Lipase	M. arctica	12% NaCl, 20°C	120 U/mg	Bioremediation of oily saline waste

Experimental Protocols

Protocol: Genome-Resolved Metagenomic Analysis for Marinomonas MIMAG Compliance

Objective: To extract, sequence, assemble, and annotate a high-quality draft genome of a Marinomonas sp. from a seawater sample meeting MIMAG (Minimum Information about a Metagenome-Assembled Genome) standards.

Materials: See Scientist's Toolkit.

Workflow:

Sample Collection & Filtration: Collect 1L of seawater. Pre-filter through 3.0 µm pore-size polycarbonate membrane to remove eukaryotes and large particles. Retain filtrate.
Cell Concentration & DNA Extraction: Filter the filtrate through a 0.22 µm Sterivex-GP pressure filter unit. Use the PowerWater Sterivex DNA Isolation Kit. Follow kit protocol, including lysozyme incubation (30 min, 37°C) for Gram-negative lysis. Elute DNA in 50 µL.
DNA QC & Library Prep: Quantify using Qubit dsDNA HS Assay. Assess integrity via gel electrophoresis. Prepare library using Illumina DNA Prep and IDT 10bp UDI indices for paired-end (2x150bp) sequencing on Illumina NovaSeq. For completeness, prepare a Nanopore library using SQK-LSK114 for hybrid assembly.
Hybrid Genome Assembly & Binning: Process Illumina reads with fastp (v0.23.2) for adapter and quality trimming. Basecall Nanopore reads with Guppy (v6+). Perform hybrid assembly with a metagenome-aware assembler such as metaSPAdes (v3.15.5, --meta with --nanopore); Unicycler (v0.5.0) is designed for single isolates and suits this step only when the input is an isolate or enrichment culture. Recover genomes by binning the assembled contigs (>1000 bp) using MetaBAT2 (v2.15).
MIMAG-Standard Genome QC: Assess the Marinomonas bin using CheckM2 (v1.0.1) for completeness and contamination. Classify phylogenetically with GTDB-Tk (v2.3.0). Annotate using Prokka (v1.14.6) and DRAM (v1.4.0). The genome must meet MIMAG "High-quality Draft" standard: >90% completeness, <5% contamination, presence of 16S, 23S, 5S rRNA genes, and ≥18 tRNAs.

Title: Workflow for MIMAG-Compliant Marinomonas Genome Recovery

Protocol: High-Throughput Screening for Cold-Active Enzyme Activity

Objective: To rapidly screen Marinomonas isolates for extracellular protease activity at low temperatures.

Materials: See Scientist's Toolkit.

Workflow:

Culture Preparation: Inoculate Marinomonas isolates in Marine Broth (MB) and incubate at target temperature (e.g., 15°C) for 48-72 hrs with shaking (180 rpm).
Cell-Free Supernatant (CFS) Collection: Transfer 1 mL culture to a microcentrifuge tube. Centrifuge at 13,000 x g for 5 min at 4°C. Filter supernatant through a 0.2 µm syringe filter.
Substrate Plate Preparation: Prepare a 1.5% w/v agar solution in 50 mM Tris-HCl buffer (pH 8.0). Autoclave and cool to ~50°C. Add 1% w/v sterile skim milk (final concentration) and mix gently. Pour 10 mL into sterile 90 mm Petri dishes.
Activity Assay: Using a sterile cork borer or pipette tip, create wells in the skim milk agar. Load 50 µL of CFS into each well. Incubate plates at 10°C and 25°C (for comparison) for 24-48 hrs.
Quantitative Analysis: Measure the diameter of the clear hydrolysis zone around each well. Calculate activity units relative to a trypsin standard curve. Plot activity vs. temperature for psychrophilic signature (higher relative activity at 10°C).
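
The quantitative analysis step can be sketched as a standard-curve fit. The example assumes that hydrolysis zone diameter is approximately linear in the logarithm of enzyme units over the working range, a common approximation for well-diffusion assays that should be verified against the actual trypsin dilutions. All numbers are hypothetical.

```python
import math


def fit_standard_curve(units, diameters_mm):
    """Least-squares fit of zone diameter (mm) against log10(enzyme units)."""
    xs = [math.log10(u) for u in units]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(diameters_mm) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, diameters_mm))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def units_from_diameter(diameter_mm, slope, intercept):
    """Invert the standard curve to estimate activity units for a sample."""
    return 10 ** ((diameter_mm - intercept) / slope)


# Hypothetical trypsin standards (units per well) and measured zone diameters.
slope, intercept = fit_standard_curve([1, 10, 100], [10.0, 15.0, 20.0])
activity_10c = units_from_diameter(16.0, slope, intercept)
activity_25c = units_from_diameter(18.0, slope, intercept)
print(f"relative activity at 10 °C: {activity_10c / activity_25c:.2f}")
```

The ratio of activity at 10°C to activity at 25°C gives a single number per isolate, which makes it straightforward to rank isolates by how much activity they retain in the cold.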

Title: Screening Protocol for Cold-Active Protease Activity

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials for Marinomonas Genomics and Enzymology

Item/Catalog Number	Vendor (Example)	Function in Protocol
Sterivex-GP Pressure Filter Unit (0.22 µm)	MilliporeSigma	Concentration of bacterial cells from large volume seawater for DNA.
PowerWater Sterivex DNA Isolation Kit	Qiagen	Extraction of high-quality, inhibitor-free metagenomic DNA from filters.
Illumina DNA Prep with UDI Indexes	Illumina	Preparation of multiplexed Illumina sequencing libraries from genomic DNA.
SQK-LSK114 Ligation Sequencing Kit	Oxford Nanopore	Preparation of libraries for long-read sequencing on Nanopore devices.
Marine Broth 2216	BD Difco / Himedia	Standardized medium for cultivation and maintenance of Marinomonas.
Skim Milk, Powdered	BD Bacto / Sigma	Substrate for detecting extracellular protease activity in agar plates.
Qubit dsDNA HS Assay Kit	Thermo Fisher Scientific	Highly sensitive, selective quantification of double-stranded DNA.
Fastp (v0.23.2) Software	GitHub (Open Source)	Rapid all-in-one preprocessing of Illumina sequencing reads.
CheckM2 (v1.0.1) Software	GitHub (Open Source)	Accurate assessment of genome completeness and contamination.
GTDB-Tk (v2.3.0) Toolkit	GitHub (Open Source)	Phylogenomic classification of genomes against the Genome Taxonomy Database.

Application Notes: Implementing MIMAG Standards for Marinisomatota Genome Research

The application of the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standard is critical for ensuring the reproducibility and comparative analysis of genomes from uncultivated microorganisms, such as those within the phylum Marinisomatota. This standard provides a structured framework for reporting genome quality, completeness, contamination, and other key metrics, which is essential for downstream functional annotation and metabolic pathway reconstruction used in drug discovery pipelines.

For Marinisomatota, a phylum of marine bacteria often studied for novel biosynthetic gene clusters, rigorous MIMAG compliance allows researchers to confidently prioritize high-quality genomes for further experimental characterization. The core checklist mandates reporting on assembly statistics, completeness and contamination estimates via single-copy marker genes, tRNA/rRNA presence, and taxonomic classification.

Key Quantitative Metrics & Benchmarks

The following tables summarize the core quantitative thresholds as defined by MIMAG and recent application-specific benchmarks for Marinisomatota genomes.

Table 1: MIMAG Quality Tier Definitions for Draft Genomes

Metric	High-Quality Draft	Medium-Quality Draft
Completeness	>90%	≥50%
Contamination	<5%	<10%
rRNA genes	5S, 16S, and 23S present	Not required
tRNA	≥18 genes	Not required
N50	Reported (no minimum threshold)	Reported (no minimum threshold)
Gene Calling	Complete	Partial

Table 2: Recommended Marinisomatota-Specific Assembly Targets

Metric	Optimal Target	Tool for Assessment
Total Assembly Length	2.5 - 4.5 Mbp	QUAST
Number of Contigs	Minimized (<200)	QUAST
CheckM2 Completeness	>90%	CheckM2
GTDB-Tk Classification	p__Marinisomatota	GTDB-Tk v2.3.0
BUSCO (bacteria_odb10)	≥90% (Complete)	BUSCO

Experimental Protocols

Protocol 1: Genome-Resolved Metagenomic Assembly and Binning for Marinisomatota

Objective: To reconstruct high-quality metagenome-assembled genomes (MAGs) from marine metagenomic data, specifically targeting the Marinisomatota phylum.

Materials:

Marine environmental DNA (e.g., from filtrate of 0.22 µm filter).
High-molecular-weight DNA extraction kit.
Illumina NovaSeq 6000 platform (150bp paired-end) and/or PacBio HiFi sequencing.
High-performance computing cluster (≥64 GB RAM, 16+ cores).

Procedure:

Quality Control:
- Process raw FASTQ files using fastp (v0.23.2) for adapter and quality trimming.
Co-Assembly:
- Assemble quality-filtered reads from multiple related samples using metaSPAdes (v3.15.5).
Read Mapping and Binning:
- Map reads from each sample back to the co-assembly using Bowtie2 (v2.5.1) and generate sorted BAM files with samtools.
- Perform metagenomic binning using a combination of:
  - MetaBAT2 (v2.15) on depth tables.
  - MaxBin2 (v2.2.7).
  - CONCOCT (v1.1.0).
- Integrate results using DAS Tool (v1.1.6) to obtain a consensus set of bins.
Marinisomatota-Specific Bin Retrieval:
- Classify all bins using GTDB-Tk (v2.3.0) with the classify_wf command.
- Extract bins classified under p__Marinisomatota for downstream quality assessment.
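
The bin retrieval step amounts to filtering the GTDB-Tk summary table by phylum. The sketch below assumes the tab-separated summary written by classify_wf (for bacteria, gtdbtk.bac120.summary.tsv) with its user_genome and classification columns. The function name and the example rows are illustrative only.

```python
import csv
import io


def bins_in_phylum(summary_tsv_text, phylum="p__Marinisomatota"):
    """Return genome IDs whose GTDB classification string contains the phylum rank."""
    reader = csv.DictReader(io.StringIO(summary_tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        # Classification strings are semicolon-separated ranks: d__;p__;c__;...
        ranks = row["classification"].split(";")
        if phylum in ranks:
            hits.append(row["user_genome"])
    return hits


# Illustrative rows only; real summary files contain many more columns.
example = (
    "user_genome\tclassification\n"
    "bin.1\td__Bacteria;p__Marinisomatota;c__;o__;f__;g__;s__\n"
    "bin.2\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__;f__;g__;s__\n"
)
print(bins_in_phylum(example))
```

Matching on the exact rank token rather than a substring avoids false hits from lineages whose names merely contain the same letters.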

Protocol 2: MIMAG-Compliant Quality Assessment and Curation

Objective: To assess and refine Marinisomatota MAGs against the MIMAG checklist, producing a standardized genome report.

Procedure:

Assembly Metrics:
- Run QUAST (v5.2.0) on each MAG to report total length, N50, contig count, and GC%.
Completeness & Contamination:
- Run CheckM2 (v1.0.1) for the most accurate estimation of completeness and contamination using machine learning models.
Gene Calling & Functional Annotation:
- Predict protein-coding genes with Prokka (v1.14.6) or bakta (v1.9.3).
- Identify tRNA genes using tRNAscan-SE (v2.0.9).
- Identify 5S, 16S, and 23S rRNA genes using barrnap (v0.9), which searches with HMM profiles; full-length 16S sequences can then be compared against the SILVA database.
Genome Curation (if needed):
- Perform manual refinement using Anvi'o (v7.1) interactive interface to remove obvious contaminant contigs based on differential coverage and tetranucleotide frequency outliers.
Report Generation:
- Compile all metrics into a standardized table (see Table 1 & 2).
- Assign a final MIMAG quality tier (High/Medium).

Mandatory Visualizations

Title: MIMAG-Compliant Genome Analysis Workflow

Title: MIMAG Quality Tier Decision Logic

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials for MIMAG-Compliant Marinisomatota Genome Research

Item	Function in Workflow	Example Product/Kit
High-Throughput DNA Extraction Kit	Efficient lysis and purification of microbial DNA from complex marine filters, minimizing bias.	DNeasy PowerWater Kit (QIAGEN)
Long-Read Sequencing Chemistry	Generates long reads (>10 kbp) essential for resolving repetitive regions and improving assembly contiguity.	PacBio HiFi SMRTbell libraries
Short-Read Sequencing Platform	Provides high-accuracy, high-coverage data for error correction and binning.	Illumina NovaSeq 6000 S4 Flow Cell
Metagenomic Assembly Software	Integrates multiple k-mer strategies to reconstruct complex microbial communities.	metaSPAdes (v3.15.5)
Binning Algorithm Suite	Utilizes sequence composition and coverage differentials to cluster contigs into genomes.	MetaBAT2, MaxBin2, CONCOCT
Quality Assessment Pipeline	Estimates completeness/contamination using lineage-specific marker genes or ML models.	CheckM2 (v1.0.1)
Taxonomic Classification Database	Provides a standardized genomic taxonomy for accurate phylum-level classification.	GTDB (Genome Taxonomy Database) Release 214, the release paired with GTDB-Tk v2.3.0
Genome Curation & Visualization Tool	Enables manual inspection and refinement of bins based on coverage and sequence signatures.	Anvi'o (v7.1)
Standardized Reporting Template	Ensures all MIMAG-required metrics are consistently reported for publication and databases.	GSC MIMAG Checklist (v1.2)

Application Notes

Enhancing Metagenome-Assembled Genome (MAG) Binning Through Standardized Metadata: The application of Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards ensures that all genomic data submitted to repositories like GenBank or the European Nucleotide Archive (ENA) is accompanied by uniform, high-quality metadata. This includes essential parameters such as sequencing depth, assembly and binning software (e.g., metaSPAdes, MaxBin2), and checkM completeness/contamination metrics. This standardization allows researchers to accurately assess, compare, and reuse MAGs from disparate studies, directly facilitating the discovery of novel lineages within the Marinisomatota phylum and reducing time spent on data validation.
Facilitating Cross-Study Comparative Genomics in Marinisomatota: Standardized data formats for genome annotations (e.g., using PROKKA or DRAM with consistent databases) enable direct functional and phylogenetic comparisons across collaborative networks. By adhering to MIMAG reporting standards for gene calling, rRNA/tRNA presence, and functional annotation tools, research groups can reliably pool genomic data. This accelerates the identification of conserved metabolic pathways, such as those for polysaccharide degradation or vitamin biosynthesis, which are critical for understanding the ecological role and biotechnological potential of Marinisomatota.
Streamlining Data Integration for Drug Discovery Pipelines: In drug development, particularly for antimicrobials, standardized genome quality data is crucial for target identification. High-quality, MIMAG-compliant genomes of Marinisomatota and associated biosynthetic gene cluster (BGC) predictions (using antiSMASH with standardized parameters) provide a reliable, reproducible dataset for in-silico screening of novel secondary metabolites. This reduces ambiguity in early-stage discovery and enables seamless data sharing between academic research teams and pharmaceutical R&D departments.

Protocols

Protocol 1: Generation of a MIMAG-Compliant Marinisomatota Genome Draft

Objective: To produce a metagenome-assembled genome (MAG) that meets MIMAG standards for medium-quality or high-quality draft status from marine sediment metagenomic data.

Materials:

Marine sediment genomic DNA extract (>1 µg, fragmented to ~350bp).
Illumina NovaSeq 6000 platform (or equivalent) for paired-end sequencing (2x150 bp).
High-performance computing (HPC) cluster with ≥ 64 GB RAM.

Procedure:

Sequencing & Quality Control:
- Perform shotgun sequencing to a minimum depth of 20x estimated genome coverage.
- Use FastQC v0.11.9 for initial read quality assessment.
- Trim adapters and low-quality bases using Trimmomatic v0.39 with parameters: ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36.

Assembly & Binning:
- Perform de novo co-assembly of quality-filtered reads using metaSPAdes v3.15.4 with -k 21,33,55,77 and --meta flags.
- Map quality-filtered reads back to contigs using Bowtie2 v2.4.5 to generate sorted BAM files.
- Perform binning on contigs ≥ 1500 bp using MetaBAT2 v2.15, MaxBin2 v2.2.7, and CONCOCT v1.1.0.
- Generate a consensus set of bins using DAS Tool v1.1.4.
MIMAG Quality Assessment & Annotation:
- Assess each bin's quality using checkM v1.2.2 lineage_wf to determine completeness and contamination.
- Classify taxonomy using GTDB-Tk v2.1.1 against the Genome Taxonomy Database.
- Annotate the MAG using PROKKA v1.14.6 with default parameters and the --metagenome flag.
- Identify rRNA genes using barrnap v0.9 and tRNA genes using tRNAscan-SE v2.0.9.
- Classify the MAG according to MIMAG standards (see Table 1).

Table 1: MIMAG Quality Tier Classification for Generated Marinisomatota MAG

MIMAG Tier	Completeness (checkM)	Contamination (checkM)	rRNA Genes Present?	tRNA Genes Present?	Assembly Status
High-quality draft	>90%	<5%	Full set (5S, 16S, 23S)	≥ 18	Near-complete
Medium-quality draft	≥50%	<10%	Partial or missing	May be missing	Draft
Low-quality draft	<50%	<10%	Not required	Not required	Draft

Protocol 2: Standardized Comparative Genomic Analysis for Pathway Discovery

Objective: To reproducibly identify and compare specific metabolic pathways (e.g., TCA cycle, BGCs) across a curated set of MIMAG-standardized Marinisomatota genomes.

Materials:

A collection of ≥10 MIMAG-classified Marinisomatota genomes in FASTA format.
HPC cluster with Python and R environments.

Procedure:

Data Curation:
- Create a manifest file listing all genome IDs, file paths, and key MIMAG metrics (completeness, contamination, taxonomy).

Functional Profiling:
- Perform uniform functional annotation on all genomes using DRAM v1.4.4 (DRAM.py annotate with the --use_uniref flag), then summarize the annotations with DRAM.py distill.
- Extract KEGG Orthology (KO) identifiers from the DRAM output for each genome.
Pathway Presence/Absence Analysis:
- Use the KEGGDecoder tool (v1.3) with the KO profiles to generate a presence/absence matrix for KEGG metabolic modules (e.g., M00009, TCA cycle).
- Visualize the pattern of pathway conservation across genomes as a heatmap using the pheatmap package in R (script provided in Appendix).
- For BGC analysis, run antiSMASH v6.1.1 on all genomes with identical parameters --genefinding-tool prodigal -c 12.
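
The presence/absence analysis can be illustrated with a simplified scoring scheme. This sketch is not KEGGDecoder's algorithm: it treats a module as a list of steps, each satisfied by any one of several alternative KOs, and calls the module present when the satisfied fraction reaches a threshold. The KO identifiers, module name, and 0.75 threshold are hypothetical placeholders.

```python
def module_completeness(genome_kos, module_steps):
    """Fraction of module steps for which the genome has at least one KO."""
    satisfied = sum(1 for alternatives in module_steps if alternatives & genome_kos)
    return satisfied / len(module_steps)


def presence_matrix(genomes, modules, threshold=0.75):
    """Build {genome: {module: bool}} from KO sets and module definitions."""
    return {
        genome_id: {
            name: module_completeness(kos, steps) >= threshold
            for name, steps in modules.items()
        }
        for genome_id, kos in genomes.items()
    }


# Placeholder KO names; real profiles would use KEGG identifiers from DRAM output.
genomes = {"mag1": {"KO_a", "KO_c"}, "mag2": {"KO_a"}}
modules = {"toy_module": [{"KO_a", "KO_b"}, {"KO_c"}]}
print(presence_matrix(genomes, modules))
```

Scoring by fraction of steps rather than requiring every gene tolerates the gaps expected in medium-quality MAGs, so the threshold should be reported alongside the matrix.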

Visualizations

Diagram Title: MIMAG-Compliant Genome Workflow for Collaborative Research

Diagram Title: Logic of Standardization Impact on Science

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Materials for MIMAG-Standard Marinisomatota Genome Research

Item / Solution	Function / Purpose
DNeasy PowerSoil Pro Kit (QIAGEN)	High-yield, inhibitor-free genomic DNA extraction from complex marine sediments, essential for downstream sequencing.
Illumina DNA Prep Kit	Library preparation for Illumina short-read sequencing, providing standardized insert sizes and adapter ligation.
MetaGeneMark v3.25 Gene Prediction Database	Ab initio gene-calling algorithm for metagenomic sequences, providing uniform protein-coding gene prediction; Prokka itself calls genes with Prodigal.
GTDB (Genome Taxonomy Database) Release 214	Standardized, phylogenetically consistent taxonomic framework for classifying Marinisomatota and related bacteria.
checkM Database (v1.2.2)	Curated set of lineage-specific marker genes used to universally assess genome completeness and contamination.
antiSMASH v6.1.1 Database	Standardized repository of Hidden Markov Models (HMMs) for identifying Biosynthetic Gene Clusters (BGCs) reproducibly.
KEGG (Kyoto Encyclopedia of Genes and Genomes)	Reference pathway database used with tools like KEGGDecoder for uniform metabolic pathway annotation and comparison.

A Step-by-Step Workflow: Applying MIMAG Standards to Your Marinomonas Genome Project

Rigorous sample collection and metadata curation are the foundation of Marinisomatota genome quality research under the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards. Marine environments present unique challenges, including physicochemical gradients, diverse microbial communities, and dynamic conditions. Standardized protocols ensure reproducibility, interoperability of datasets, and the generation of high-quality genomes suitable for downstream applications in biotechnology and drug discovery.

Core Principles & Quantitative Benchmarks

Adherence to the following principles is critical for MIMAG-compliant Marinisomatota research.

Table 1: Minimum Metadata Requirements for Marine Genomic Samples

Metadata Category	Specific Parameter	Recommended Measurement Method	MIMAG Compliance Note
Geographic	Latitude, Longitude	GPS (error < 10m)	Mandatory
Depth	Sampling Depth (m)	CTD-rosette or pressure sensor	Mandatory; record offset from sea surface.
Physicochemical	Temperature (°C)	CTD with calibrated probe	Mandatory for context.
Physicochemical	Salinity (PSU)	CTD with calibrated sensor	Mandatory for context.
Physicochemical	Dissolved Oxygen (mg/L)	CTD sensor or Winkler titration	Highly Recommended.
Physicochemical	pH	Spectrophotometric or electrode	Highly Recommended for carbonate system.
Biological	Chlorophyll-a (µg/L)	Fluorescence sensor or extraction	Recommended for productivity context.
Temporal	Date & Time (UTC)	-	Mandatory.
Methodological	Filtration Pore Size (µm)	-	Mandatory for biomass collection.
Methodological	Volume Filtered (L)	Flowmeter or graduated cylinder	Mandatory.
Methodological	Preservative (e.g., RNAlater, freezing)	-	Mandatory.
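
The mandatory rows of Table 1 lend themselves to an automated completeness check before a sample is accepted into the project database. The following is a minimal sketch, assuming each sample is stored as a dictionary; the field names are hypothetical and should be mapped to whatever keys your own metadata schema uses.

```python
# Hypothetical field names for the rows marked "Mandatory" in Table 1.
MANDATORY_FIELDS = (
    "latitude", "longitude", "depth_m", "temperature_c", "salinity_psu",
    "datetime_utc", "filter_pore_size_um", "volume_filtered_l", "preservative",
)


def missing_mandatory_fields(record):
    """Return the mandatory fields that are absent or empty in a sample record."""
    return [field for field in MANDATORY_FIELDS if record.get(field) in (None, "")]


sample = {
    "latitude": 36.75, "longitude": -122.03, "depth_m": 500,
    "temperature_c": 5.9, "salinity_psu": "",  # salinity was not recorded
    "datetime_utc": "2025-06-01T14:30:00Z", "filter_pore_size_um": 0.22,
    "volume_filtered_l": 2.0, "preservative": "RNAlater",
}
print(missing_mandatory_fields(sample))
```

A non-empty result is a reason to return to the field log before the sample moves on to extraction.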

Table 2: Sample Handling Benchmarks for Optimal Nucleic Acid Yield & Quality

Process Step	Target Benchmark	Quality Control Method
Filtration Time	< 30 min from collection to preservation	Procedural logging.
Biomass Preservation	Flash-freeze in liquid N₂ or immerse in RNAlater at 4:1 (v/v) ratio	Monitor storage temperature consistently at -80°C.
DNA Yield (0.22µm filter)	> 500 ng (for a typical 1-2 L seawater sample)	Qubit dsDNA HS Assay.
DNA Purity	A260/A280 = 1.8-2.0; A260/A230 > 2.0	Nanodrop/TapeStation.
RNA Integrity	RIN (RNA Integrity Number) > 7.0	Bioanalyzer.
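
The DNA benchmarks in Table 2 can be applied programmatically once Qubit and absorbance readings are logged. This is an illustrative sketch only; the function name and the failure labels are assumptions, while the thresholds are taken directly from the table.

```python
def dna_qc_failures(yield_ng, a260_280, a260_230):
    """Return the Table 2 DNA benchmarks that an extract does not meet."""
    failures = []
    if not yield_ng > 500:            # DNA yield > 500 ng
        failures.append("yield")
    if not 1.8 <= a260_280 <= 2.0:    # A260/A280 between 1.8 and 2.0
        failures.append("A260/A280")
    if not a260_230 > 2.0:            # A260/A230 above 2.0
        failures.append("A260/A230")
    return failures


print(dna_qc_failures(yield_ng=750, a260_280=1.85, a260_230=2.1))
print(dna_qc_failures(yield_ng=300, a260_280=1.60, a260_230=2.1))
```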

Detailed Protocols

Protocol: Sterile Seawater Collection for Omics

Objective: To collect particulate microbial biomass, including Marinisomatota, from a defined water depth without contamination.

Materials: CTD-rosette with Niskin bottles, peristaltic pump, tubing, in-line filter holders, sterile polyethersulfone (PES) membrane filters (0.22µm and 0.1µm), sterile forceps, preservative (RNAlater, -80°C freezer), power supply for pump.

Procedure:

1. Pre-deployment: Assemble filtration rig on deck. Load sequential filter membranes (e.g., 3.0µm pre-filter, 0.22µm primary) into sterile in-line holders. Connect to peristaltic pump.
2. Collection: Deploy CTD-rosette to target depth. Trigger closure of Niskin bottle(s). Retrieve rosette.
3. Filtration: Immediately transfer seawater from Niskin bottle into a sterile collection carboy. Begin filtration within 10 minutes of rosette retrieval. Process typically 1-2L per sample, recording exact volume via flowmeter or graduated carboy.
4. Biomass Preservation: Using sterile forceps, aseptically transfer the 0.22µm filter to a cryovial containing 1-2 mL of RNAlater. Incubate at 4°C for 24h, then store at -80°C. Alternatively, flash-freeze the filter in liquid nitrogen.
5. Metadata Recording: Record all parameters from Table 1 in the field log and electronic database simultaneously. Assign a unique, persistent sample ID.

Protocol: Metagenomic DNA Extraction from Marine Filters

Objective: To obtain high-molecular-weight, inhibitor-free DNA suitable for long-read sequencing and MIMAG-grade genome assembly.

Materials: PowerSoil Pro Kit (Qiagen) or similar, lysis tubes, bead beater, centrifuge, 70°C water bath, molecular grade ethanol, nuclease-free water.

Procedure:

1. Lysis: Using sterile tools, cut a portion (e.g., 1/4) of the frozen filter and place in a lysis tube. Include kit-provided beads and solution C1.
2. Homogenize: Secure tubes in a bead beater and homogenize at maximum speed for 45 seconds. Incubate at 70°C for 10 minutes.
3. Inhibitor Removal: Centrifuge briefly. Transfer supernatant to a clean tube. Add solution C2, vortex, incubate on ice for 5 min, then centrifuge at 10,000 x g for 1 minute.
4. DNA Binding: Transfer supernatant to a tube with solution C3, mix, and load onto a MB Spin Column. Centrifuge.
5. Wash: Wash with solution C4 and then with 80% ethanol, centrifuging after each step.
6. Elution: Dry column by centrifugation. Elute DNA in 50-100 µL of nuclease-free water (pre-heated to 70°C). Centrifuge for 1 minute.
7. QC: Quantify yield and purity (see Table 2). Assess fragment size via gel electrophoresis or FemtoPulse system.

Signaling Pathway & Workflow Visualizations

Diagram Title: Workflow for Marine MAG Generation

Diagram Title: MAG Curation & Quality Control Pipeline

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Reagents & Kits for Marine Omics Sample Processing

Item	Function & Rationale	Example Product/Brand
RNAlater Stabilization Solution	Preserves RNA and DNA integrity at the point of collection by penetrating tissues and inactivating RNases/DNases. Critical for transcriptomic studies.	Thermo Fisher Scientific RNAlater
PowerSoil Pro DNA/RNA Extraction Kit	Efficiently lyses tough microbial cells and removes humic acids, polysaccharides, and other PCR inhibitors common in marine samples.	Qiagen DNeasy PowerSoil Pro
Polyethersulfone (PES) Membrane Filters	Low protein binding, high flow rate filters for biomass concentration. Available in sterile, pre-packaged formats for contamination control.	Sterivex-GP (0.22µm) or Pall Supor
CTD Profiling System with Niskin Bottles	Provides accurate, depth-resolved measurements of conductivity (salinity), temperature, depth, and other parameters with simultaneous water collection.	Sea-Bird Scientific SBE 911plus
ZymoBIOMICS Microbial Community Standard	Mock community used as a positive control for DNA extraction and sequencing to benchmark bias and recovery efficiency.	Zymo Research D6300
Nuclease-Free Water	Used for elution and reagent preparation to prevent nucleic acid degradation.	Invitrogen UltraPure DNase/RNase-Free Water
DNeasy Blood & Tissue Kit	An alternative for high-molecular-weight DNA extraction from filter pieces, often used in tandem with bead-beating for marine samples.	Qiagen 69504
Qubit dsDNA HS Assay Kit	Fluorometric quantification specifically for double-stranded DNA, more accurate than UV absorbance for low-concentration, potentially contaminated samples.	Thermo Fisher Scientific Q32851

Application Notes

This protocol details a bioinformatics pipeline for reconstructing metagenome-assembled genomes (MAGs) from marine metagenomic data, with a specific focus on achieving the high-quality standards defined by the MIMAG (Minimum Information about a Metagenome-Assembled Genome) framework. The workflow is contextualized for research on under-represented phyla, such as Marinisomatota (formerly known as SAR406), to enable genomic insights into their metabolic potential and role in marine biogeochemical cycles.

Table 1: Key MIMAG Standards for Genome Quality Tier Classification

Quality Tier	Completeness	Contamination	# of Contigs	Presence of rRNA Genes	tRNA Genes
High-quality draft (HQ)	>90%	<5%	<200 (project-specific target; not a MIMAG criterion)	At least 16S, 23S, 5S	≥18
Medium-quality draft (MQ)	≥50%	<10%	No strict limit	Not required	Not required
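
The tier thresholds above translate into a small decision function. The sketch below is one possible encoding, not an official MIMAG implementation; the function name and the returned labels are assumptions, and contig count is deliberately left out because it is not used as a gate here.

```python
def mimag_tier(completeness, contamination, has_5s, has_16s, has_23s, trna_count):
    """Assign a quality tier from the completeness, contamination, rRNA and tRNA criteria."""
    full_rrna_set = has_5s and has_16s and has_23s
    if completeness > 90 and contamination < 5 and full_rrna_set and trna_count >= 18:
        return "HQ"
    if completeness >= 50 and contamination < 10:
        return "MQ"
    return "below MQ"


# A bin with excellent completeness still drops to MQ if the 23S gene is missing.
print(mimag_tier(96.2, 1.8, True, True, True, 42))
print(mimag_tier(96.2, 1.8, True, True, False, 42))
```

This makes explicit a point that is easy to overlook: many bins that pass the completeness and contamination thresholds still fail the high-quality tier because short-read assemblies often lose the rRNA operon.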

Table 2: Typical Quantitative Output from a Marine Metagenome Assembly/Binning Run

Metric	Pre-QC Reads	Post-QC/Filtered Reads	Total Assembly Contigs	Total Assembly Length (bp)	N50 (bp)	Bins Retrieved	HQ MAGs	MQ MAGs
Example Value	150 million	135 million	1.2 million	2.1 Gbp	4,150	125	22	48
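
Two derived figures are useful when reading a run summary like the one above: the fraction of reads surviving QC and the fraction of bins that reach at least medium quality. The short calculation below uses the example values from Table 2; the helper name is an assumption.

```python
def percent(part, whole):
    """Express part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


read_retention = percent(135_000_000, 150_000_000)  # post-QC reads / pre-QC reads
mag_yield = percent(22 + 48, 125)                   # (HQ + MQ MAGs) / bins retrieved

print(read_retention)  # 90.0
print(mag_yield)       # 56.0
```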

Protocols

1. Sample Processing and Quality Control

Input: Paired-end metagenomic FASTQ files.
Methodology:
- Adapter Removal & Quality Trimming: Use fastp (v0.23.4) with parameters: --detect_adapter_for_pe --cut_front --cut_tail --average_qual 20.
- Host Read Removal: Align reads to a host genome (e.g., human, sea sponge) using Bowtie2 (v2.5.1). Retain unmapped reads using samtools (v1.20).
- Error Correction: Optional but recommended for improving assembly. Use BBTools (v39.06) tadpole.sh in correction mode.

2. Co-Assembly and Individual Sample Assembly

Methodology:
- Co-Assembly: Combine all quality-filtered reads from a study region using MEGAHIT (v1.2.9). Parameters: --k-min 27 --k-max 127 --k-step 10.
- Individual Assembly: Perform separate assemblies for each sample using SPAdes (v3.15.5) in --meta mode for comparison. Parameters: -k 21,33,55,77.
- Assembly Evaluation: Assess assemblies with metaQUAST (v5.2.0) to compare total length, N50, and gene content.
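
For readers unfamiliar with the N50 statistic reported by metaQUAST, the following sketch shows how it is derived from a list of contig lengths. It is a plain re-implementation for illustration, not metaQUAST's own code.

```python
def n50(contig_lengths):
    """Return the length L such that contigs of length >= L hold at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

Because N50 is dominated by the longest contigs, it should be read together with total assembly length when comparing the MEGAHIT and SPAdes outputs.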

3. Read Mapping and Binning

Methodology:
- Read Mapping: Map quality-controlled reads from each sample back to the chosen assembly using Bowtie2. Convert to sorted BAM files using samtools.
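- Contig Coverage Profiling: Generate per-sample depth files using CoverM (v0.6.1) in contig mode: coverm contig --coupled reads_1.fq reads_2.fq --reference contigs.fa --methods metabat --output-file depth.txt. The metabat method writes the depth table in the format MetaBAT2 expects.
- Contig Annotation: Predict open reading frames with Prodigal (v2.6.3) in meta mode (-p meta). GTDB-Tk (v2.3.2) classifies whole genomes rather than individual contigs, so reserve it for the bins produced in step 4; if contig-level taxonomy is needed before binning, use a dedicated contig classifier such as CAT.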
- Binning: Execute multiple binners for optimal recovery.
  - Run MetaBAT2 (v2.15): metabat2 -i contigs.fa -a depth.txt -o bin_dir/bin.
  - Run MaxBin2 (v2.2.7): run_MaxBin.pl -contig contigs.fa -abund depth.txt -out maxbin_out.
  - Run CONCOCT (v1.1.0) via metaWRAP pipeline.
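
Composition-based binners rely on the observation that contigs from the same genome share a similar tetranucleotide signature. The sketch below computes such a profile for a single sequence. It is a simplified illustration of the signal and does not reproduce the internal model of MetaBAT2, MaxBin2, or CONCOCT.

```python
from collections import Counter


def tetranucleotide_freqs(sequence):
    """Return relative frequencies of overlapping 4-mers, skipping any with ambiguous bases."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


profile = tetranucleotide_freqs("ACGTACGT")
print(profile)
```

In practice these profiles are combined with per-sample coverage, which is why the depth file from the read-mapping step is passed to every binner.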

4. Bin Refinement, Dereplication, and Quality Assessment

Methodology:
- Bin Refinement: Use metaWRAP (v1.3.2) BIN_REFINEMENT module to consolidate bins from multiple tools, optimizing for completeness and contamination.
- Dereplication: Cluster redundant MAGs across samples using dRep (v3.4.3) with a 95% average nucleotide identity (ANI) threshold.
- Quality Assessment: Check completeness and contamination of final MAGs using CheckM2 (v1.0.1). Annotate with DRAM (v1.4.4) for metabolic profiling. Classify taxonomy definitively with GTDB-Tk.

Visualizations

Marine Metagenome Analysis Pipeline Workflow

MIMAG Genome Quality Classification Decision Tree

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Computational Tools & Databases for Marine MAG Recovery

Tool/Database	Function	Key Parameter/Note
fastp	FASTQ pre-processing, adapter trimming, quality filtering.	Enables single-pass, rapid QC. Critical for HiSeq/NovaSeq data.
Bowtie2 / BWA	Read alignment for host removal & coverage calculation.	Use `--very-sensitive` preset for host screening.
MEGAHIT	Efficient metagenomic assembler for complex communities.	Preferred for large, diverse marine datasets due to speed.
MetaBAT2	Coverage and composition-based binning algorithm.	Primary binner; relies on tetranucleotide frequency and depth.
CheckM2	Fast estimation of MAG completeness and contamination.	Uses machine learning; faster than CheckM1.
GTDB-Tk	Genome taxonomic classification against Genome Taxonomy Database.	Essential for accurate placement of novel Marinisomatota MAGs.
DRAM	Distilled and Refined Annotation of Metabolism.	Assigns KEGG, Pfam, and CAZy annotations; generates metabolism summaries.
NCBI SRA / ENA	Public repositories for raw sequence data deposition.	Mandatory for publication (MIMAG compliance).

Application Notes: The Role of MIMAG Standards in Marinisomatota Research

Within the framework of a thesis on Marinisomatota genome quality research, adherence to the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards is paramount for generating comparable, high-quality reference genomes. This is especially critical for drug development professionals investigating novel biosynthetic gene clusters in these marine bacteria. Key quantitative metrics mandated by MIMAG include genome completeness, contamination, and the presence of standard marker genes. These metrics provide a foundational assessment of draft genome quality before downstream functional analysis.

Core MIMAG Metrics for Marinisomatota:

Completeness & Contamination: Primarily assessed using CheckM, which employs a set of lineage-specific, single-copy marker genes to estimate how complete the genome is and the level of sequence contamination from co-assembled genomes.
rRNA & tRNA Genes: The presence of a full complement of ribosomal RNA genes (5S, 16S, 23S) and a sufficient number of tRNA genes are indicators of a well-assembled, less-fragmented genome. These are typically identified using tools like barrnap and tRNAscan-SE.

Quantitative Data Summary:

Table 1: MIMAG Quality Tiers for Bacterial Genomes (Adapted)

Quality Tier	Completeness	Contamination	rRNA Genes (5S, 16S, 23S)	tRNA Genes
High	>90%	<5%	All present	≥18
Medium	≥50%	<10%	Not required	-
Low	<50%	<10%	-	-

Table 2: Example Output from a Marinisomatota Bin Analysis

Metric	Tool Used	Result	Interpretation
Completeness	CheckM2	96.2%	High-quality, near-complete genome.
Contamination	CheckM2	1.8%	Low level of foreign sequence.
Strain Heterogeneity	CheckM	0%	Likely a single strain.
16S rRNA Gene	barrnap	Present	Enables phylogenetic placement.
23S rRNA Gene	barrnap	Present	Indicates good assembly continuity.
tRNA Genes	tRNAscan-SE	42	Adequate for translation.

Experimental Protocols

Protocol 1: Assessing Genome Completeness and Contamination with CheckM2

Objective: To calculate the completeness and contamination of a Marinisomatota draft genome bin using CheckM2, the updated and faster machine learning-based tool.

Materials:

Isolated draft genome in FASTA format (Marinisomatota_bin.fa).
A computing environment (Linux/Unix) with CheckM2 installed (preferably via conda).

Methodology:

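Database Setup: Ensure the CheckM2 database is downloaded and installed, e.g.: checkm2 database --download.

Run CheckM2 Analysis: Execute the checkm2 predict command on your genome bin, e.g.: checkm2 predict --input Marinisomatota_bin.fa --output-directory checkm2_results --threads 8.
Interpret Output: The primary results are in checkm2_results/quality_report.tsv. Key columns are Completeness and Contamination. CheckM2 does not report strain heterogeneity; if that value is needed (as in Table 2), obtain it from CheckM (lineage_wf).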
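
Once the report exists, filtering bins against the high-quality thresholds is straightforward to script. The sketch below assumes the TSV carries columns named Name, Completeness, and Contamination; verify the header of your own quality_report.tsv before relying on it. The example rows are made up for illustration.

```python
import csv
import io


def read_quality_report(handle):
    """Yield (name, completeness, contamination) from a CheckM2-style TSV report."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield row["Name"], float(row["Completeness"]), float(row["Contamination"])


def passes_hq_thresholds(completeness, contamination):
    """Completeness >90% and contamination <5% (rRNA/tRNA checks are done separately)."""
    return completeness > 90 and contamination < 5


# Illustrative report content; in practice open checkm2_results/quality_report.tsv.
example = (
    "Name\tCompleteness\tContamination\n"
    "Marinisomatota_bin\t96.2\t1.8\n"
    "bin_2\t71.0\t6.4\n"
)
hq_bins = [name for name, comp, cont in read_quality_report(io.StringIO(example))
           if passes_hq_thresholds(comp, cont)]
print(hq_bins)
```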

Protocol 2: Identifying rRNA and tRNA Genes

Objective: To detect the presence of ribosomal RNA and transfer RNA genes in the assembled bin.

Materials:

Draft genome FASTA file.
barrnap (for rRNA) and tRNAscan-SE (for tRNA) installed.

Methodology for rRNA (barrnap):

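Run Prediction: Use barrnap in quiet mode for simple output, e.g.: barrnap --quiet --kingdom bac Marinisomatota_bin.fa > rrna_results.gff (the output file name is an example).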

Check Output: Examine the GFF file to confirm hits for 16S_rRNA, 23S_rRNA, and 5S_rRNA.
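
Checking the GFF by eye works for one bin but not for hundreds. The following sketch parses barrnap-style GFF3 text and reports which of the three rRNA types are present. It assumes the ninth column carries a Name attribute such as Name=16S_rRNA; the example lines are invented for illustration, and partial hits are not distinguished from full-length ones here.

```python
EXPECTED_RRNAS = {"5S_rRNA", "16S_rRNA", "23S_rRNA"}


def rrna_types_found(gff_text):
    """Return the set of expected rRNA gene names that appear in GFF3 text."""
    found = set()
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) < 9:
            continue
        attributes = dict(
            item.split("=", 1) for item in columns[8].split(";") if "=" in item
        )
        if attributes.get("Name") in EXPECTED_RRNAS:
            found.add(attributes["Name"])
    return found


example_gff = (
    "##gff-version 3\n"
    "contig_7\tbarrnap\trRNA\t120\t1650\t0\t+\t.\tName=16S_rRNA;product=16S ribosomal RNA\n"
    "contig_7\tbarrnap\trRNA\t5200\t5310\t0\t+\t.\tName=5S_rRNA;product=5S ribosomal RNA\n"
)
found = rrna_types_found(example_gff)
print(sorted(EXPECTED_RRNAS - found))  # rRNA types still missing from the bin
```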

Methodology for tRNA (tRNAscan-SE):

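Run Prediction: Use the bacterial model, e.g.: tRNAscan-SE -B -o trna_results.txt Marinisomatota_bin.fa.

Check Output: Each data row in trna_results.txt describes one predicted tRNA, so the number of rows below the header gives the total number of tRNA genes found. Adding the -m option writes a separate summary statistics file.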

Visualizations

Title: MIMAG Genome Quality Assessment Workflow

Title: Logic of MIMAG High-Quality Tier

The Scientist's Toolkit

Table 3: Research Reagent Solutions for MIMAG Metric Calculation

Item	Function in Analysis	Example/Note
CheckM2	Estimates genome completeness and contamination using machine learning on a large protein database.	Replaces CheckM; faster and does not require lineage-specific marker sets.
GTDB-Tk	Provides accurate taxonomic classification, which can inform CheckM lineage selection.	Critical for placing novel Marinisomatota bins.
barrnap	Rapid ribosomal RNA gene prediction.	Outputs GFF3 file of rRNA locations.
tRNAscan-SE 2.0	Detects tRNA genes with high accuracy.	Uses covariance models for diverse bacteria.
Binning Tools (e.g., MetaBAT2, MaxBin2)	Generate initial genome bins from assembly.	Marinisomatota bins often originate from marine metagenomes.
QUAST	Evaluates assembly statistics (N50, contig count) complementary to MIMAG metrics.	Assesses assembly continuity.
Python/Biopython	For scripting and parsing the outputs of the above tools into summary tables.	Essential for automating pipelines.

Application Notes

Within the framework of a thesis on MIMAG (Minimum Information about a Metagenome-Assembled Genome) standards and Marinisomatota genome quality research, confident assignment of a MAG to the Marinomonas genus is critical. Marinomonas species are aerobic, heterotrophic, Gram-negative bacteria within the family Oceanospirillaceae, order Oceanospirillales, class Gammaproteobacteria. They are frequently recovered from marine environments. Accurate classification requires a multi-layered approach that moves beyond basic 16S rRNA similarity to meet contemporary genomic standards. This protocol integrates phylogenetic, genomic, and phenotypic (in silico) analyses to provide high-confidence genus assignment.

Key Genomic & Phenotypic Markers for Marinomonas

Table 1: Discriminatory Genomic & Phenotypic Features of Marinomonas

Feature	Typical Characteristic in Marinomonas	Confirmation Method	Importance for Classification
16S rRNA Gene Identity	≥94.5% to Marinomonas type species	BLASTn vs. NR/GTDB	Primary screening; necessary but not sufficient.
Average Amino Acid Identity (AAI)	≥60% against Marinomonas clade	CompareM (aai_wf)	Robust proxy for genus-level relatedness.
Percentage of Conserved Proteins (POCP)	>50% within genus	Custom BLASTP analysis	Confirms genus-level membership based on proteome.
Core Gene Phylogeny	Monophyly with Marinomonas clade	IQ-TREE/RAxML (120 marker genes)	Gold standard for evolutionary placement.
GC Content	38-48 mol%	Genome sequence analysis	Consistent with known range.
Presence of Polar Flagella	Typically single polar flagellum	In silico detection of fla, mot genes	Common phenotypic trait.
Halotolerance	Growth in 3-12% NaCl	Inferred from presence of osmolyte synthesis genes	Ecological consistency.
Catalase & Oxidase	Positive	In silico detection of katG, ccoN homologs	Key metabolic traits.
Fatty Acid Profile	C_16:1 ω7c, C_16:0, C_18:1 ω7c predominant	Not from MAG; reference data for validation	Matches described chemotaxonomy.
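
Of the features in Table 1, GC content is the simplest to verify directly from the MAG. The sketch below computes GC percentage over a set of contig sequences and checks it against the 38-48 mol% range given in the table; ambiguous bases are ignored, and the function names are assumptions.

```python
def gc_percent(sequences):
    """Return GC content (%) across sequences, counting only unambiguous A/C/G/T bases."""
    gc = at = 0
    for sequence in sequences:
        seq = sequence.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def gc_in_reported_range(gc, low=38.0, high=48.0):
    """True if the GC value falls inside the range listed in Table 1."""
    return low <= gc <= high


print(gc_percent(["GGCC", "AATT"]))  # 50.0
print(gc_in_reported_range(42.1))
```

A value outside the range does not rule out Marinomonas on its own, but it is a prompt to re-examine the bin for contamination before running the more expensive analyses.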

Table 2: MIMAG Standard Compliance for Marinomonas MAG Classification

MIMAG Quality Tier	Required for Genus Assignment?	Key Metrics Relevant to Marinomonas Analysis
High-quality draft	Recommended	Completeness >90%, Contamination <5%, rRNA/tRNA presence.
Medium-quality draft	Minimum	Completeness ≥50%, Contamination <10%. Allows for initial placement.
CheckM2/CheckM Lineage	Mandatory	Use specific Oceanospirillaceae lineage dataset for accurate metrics.

Experimental Protocols

Protocol 1: Phylogenomic Tree Reconstruction for Genus Assignment

Objective: To determine if the MAG forms a monophyletic clade with validated Marinomonas type genomes.

Materials:

High/medium-quality MAG (fasta format).
Reference genomes (from GTDB Rxx or NCBI RefSeq).
Software: GTDB-Tk v2.3.0 (recommended), CheckM2, IQ-TREE 2, FastANI.

Method:

Curate Reference Dataset: Download all type genomes for Marinomonas (approx. 30 species) and closely related genera (e.g., Amphritea, Neptuniibacter) from GTDB.
Perform Taxonomic Classification: Run GTDB-Tk (classify_wf) with your MAG and the reference set. This pipeline automates:
- Identification of 120 bacterial marker genes.
- Multiple sequence alignment and trimming.
- Placement within a reference tree.
Build Custom Phylogeny:
- Extract the marker gene alignment from GTDB-Tk output.
- Construct a maximum-likelihood tree: iqtree2 -s alignment.fasta -m MFP -B 1000 -T AUTO.
- Visualize tree (e.g., iTOL). High-confidence assignment is supported if MAG is placed within a monophyletic Marinomonas clade with high bootstrap support (>70%).

Protocol 2: Calculation of Average Amino Acid Identity (AAI) & POCP

Objective: To quantitatively assess genomic relatedness to the Marinomonas genus.

Materials: MAG and reference proteomes (.faa files), CompareM (v0.1.2), BLASTP+.

Method for AAI:

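Use CompareM: comparem aai_wf --proteins -x faa --cpus 20 proteome_dir aai_output, where proteome_dir holds the MAG and reference .faa files together (aai_wf takes a single input directory or file list, followed by the output directory).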
The output matrix provides pairwise AAI values. An AAI ≥60% with members of the Marinomonas genus, and significantly lower values with outgroups, supports inclusion.

Method for POCP:

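Perform reciprocal BLASTP searches between the proteomes of your MAG and a reference Marinomonas genome. Count a protein as conserved when its best hit has E-value < 1e-5, >40% identity, and an alignable region covering >50% of the query length.
Calculate POCP as originally defined by Qin et al. (2014): POCP = [(C1 + C2) / (N1 + N2)] * 100%, where C1/C2 are the conserved protein counts and N1/N2 are the total proteins in each genome. A value >50% indicates genus-level relationship.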
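
As a worked example of the POCP arithmetic, the sketch below uses the pooled form from the original POCP proposal (Qin et al., 2014), that is (C1 + C2) / (N1 + N2) × 100. The protein counts are invented for illustration.

```python
def pocp(c1, c2, n1, n2):
    """Percentage of conserved proteins: pooled conserved counts over pooled proteome sizes."""
    if n1 + n2 == 0:
        raise ValueError("both proteomes are empty")
    return 100.0 * (c1 + c2) / (n1 + n2)


# Hypothetical counts: 2,000 of 3,500 MAG proteins and 2,100 of 3,700 reference proteins conserved.
value = pocp(c1=2000, c2=2100, n1=3500, n2=3700)
print(round(value, 2))  # 56.94
print(value > 50)       # meets the genus-level threshold used above
```

For incomplete MAGs, N1 underestimates the true proteome size, so POCP values close to the 50% boundary should be interpreted together with AAI and the phylogenomic placement.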

Protocol 3: In Silico Phenotype Profiling

Objective: To confirm the MAG encodes traits characteristic of Marinomonas.

Materials: MAG (.gff/.faa), HMMER, EggNOG-mapper, dbCAN3, specific HMM profiles (e.g., TIGRFAMs for flagella).

Method:

Flagellar Machinery: Search for core structural genes (flgBC, fliC, motAB) using hmmsearch against the PFAM/TIGRFAM databases.
Oxidative Metabolism: Identify catalase (katG) and cytochrome c oxidase (ccoN) homologs via eggNOG-mapper KEGG/COG annotations.
Halotolerance: Screen for genes involved in osmolyte synthesis (e.g., ectoine: ectABC, betaine) using curated HMM profiles.
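
After annotation, the trait screen reduces to asking which marker genes were found. The sketch below assumes you have already collected the gene symbols assigned to the MAG (for example from eggNOG-mapper output) into a list; the trait groupings mirror the genes named in the steps above, and the data structure itself is an assumption.

```python
TRAIT_MARKERS = {
    "flagellar machinery": {"flgB", "flgC", "fliC", "motA", "motB"},
    "catalase": {"katG"},
    "cytochrome c oxidase": {"ccoN"},
    "ectoine synthesis": {"ectA", "ectB", "ectC"},
}


def missing_markers_by_trait(annotated_genes):
    """Map each trait to the sorted list of its marker genes absent from the annotation."""
    genes = set(annotated_genes)
    return {trait: sorted(markers - genes) for trait, markers in TRAIT_MARKERS.items()}


annotated = ["flgB", "flgC", "fliC", "motA", "motB", "katG", "ccoN", "ectA", "ectB"]
report = missing_markers_by_trait(annotated)
print(report)
```

Absence of a marker in a MAG is weaker evidence than presence, since the gene may simply sit on an unbinned contig; completeness estimates should temper any conclusion drawn from missing genes.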

Visualization

Title: MAG Assignment Workflow to Marinomonas

The Scientist's Toolkit

Table 3: Research Reagent Solutions for MAG Classification Analysis

Item/Resource	Function in Marinomonas Classification	Example/Source
GTDB (Genome Taxonomy Database)	Provides standardized, phylogeny-based reference genomes and taxonomy. Essential for phylogenomic placement.	https://gtdb.ecogenomic.org/
GTDB-Tk Software Toolkit	Automates phylogenomic workflow: identifies markers, places MAG in reference tree. Simplifies genus-level assignment.	https://github.com/ecogenomics/gtdbtk
CheckM2 & CheckM Lineage	Estimates MAG completeness and contamination using lineage-specific marker sets critical for quality assessment.	https://github.com/chklovski/CheckM2
CompareM / pyANI	Calculates quantitative genomic relatedness metrics (AAI, ANI) between MAG and reference genomes.	https://github.com/dparks1134/CompareM
IQ-TREE 2	Efficient software for maximum likelihood phylogenetic inference. Used to build robust trees from marker alignments.	http://www.iqtree.org/
EggNOG-mapper / PROKKA	Provides rapid functional annotation of MAG proteins, enabling in silico phenotypic profiling.	http://eggnog-mapper.embl.de/
TIGRFAM & PFAM HMMs	Curated protein family models for identifying specific functional genes (e.g., flagellar, metabolic).	https://www.jcvi.org/research/tigrfams
MIMAG Standard Guidelines	Framework for reporting MAG quality, ensuring results are comparable and credible for downstream research/drug discovery.	Bowers et al., 2017, Nature Biotechnology

Application Notes

The Minimum Information about a Metagenome-Assembled Genome (MIMAG) standard has established a critical baseline for reporting genome quality, including metrics like completeness, contamination, and strain heterogeneity. For the phylum Marinisomatota (formerly known as SAR406), often recovered from marine and host-associated environments, achieving a "high-quality" MIMAG draft is the first step. However, in the context of drug discovery, particularly for identifying novel biosynthetic gene clusters (BGCs) for antimicrobials or other therapeutics, the standards for analysis must extend far beyond MIMAG's core genomic metrics. This necessitates advanced annotation and functional analysis pipelines to transform genomic sequences into testable biological hypotheses.

Key Insights:

Post-MIMAG Annotation Depth: MIMAG quality (completeness >90%, contamination <5%) enables reliable structural annotation, but functional annotation requires layering multiple complementary tools (e.g., eggNOG-mapper, InterProScan, KEGG, TIGRFAMs) to assign Gene Ontology terms, EC numbers, and pathway membership with confidence.
Specialized BGC Detection: Standard functional annotators often miss or misannotate BGCs. Dedicated antiSMASH (or similar) analysis is non-negotiable for drug discovery, as it identifies clusters for polyketide synthases (PKS), non-ribosomal peptide synthetases (NRPS), ribosomally synthesized and post-translationally modified peptides (RiPPs), and other specialized metabolites.
Prioritization via Comparative Genomics: Functional annotation data becomes actionable when placed in a comparative context. Creating pangenomes and assessing gene presence/absence across Marinisomatota strains from different ecological niches (e.g., free-living vs. host-associated) can highlight niche-specific adaptations and unique BGCs worthy of heterologous expression and screening.

Quantitative Data Comparison: Standard vs. Advanced Annotation

Table 1: Comparison of Annotation Outputs for a Hypothetical High-Quality Marinisomatota MAG

Annotation Metric	Basic Prokka Pipeline	Advanced Integrated Pipeline	Implication for Drug Discovery
Protein-Coding Genes	3,450	3,450 (consistent)	Baseline gene count established.
Genes with Functional Annotation	2,580 (~75%)	3,100 (~90%)	Higher confidence in gene function expands target space.
Assigned KEGG Orthologs (KOs)	1,850	2,400	Improved pathway reconstruction.
Complete KEGG Modules Identified	120	185	Better understanding of organism's metabolic capabilities.
Biosynthetic Gene Clusters (BGCs)	4 (putative, generic)	8 (specific types assigned)	Directly identifies candidate compound-producing machinery.
CRISPR Arrays Identified	1	3	Insights into phage defense, can be linked to BGC regulation.
Antibiotic Resistance Genes	2	5	Identifies potential self-resistance genes linked to BGCs.

Experimental Protocols

Protocol 1: Advanced Functional Annotation Pipeline for Marinisomatota MAGs

Objective: To generate comprehensive functional annotations for a high-quality Marinisomatota MAG, exceeding basic MIMAG checklist requirements for drug discovery insights.

Materials/Software:

Input: High-quality MAG (FASTA format, completeness >90%, contamination <5% as per MIMAG).
Computational Resources: High-performance computing cluster or server with >=32 GB RAM, multi-core processors.
Conda environment (e.g., Bioconda, Anaconda).
Docker/Singularity (optional, for containerized tools).

Procedure:

Step 1: Structural Gene Calling & Annotation

Use prokka (v1.14.6) for rapid structural annotation: prokka --kingdom Bacteria --outdir prokka_out --prefix marinisoma MAG.fasta.
For improved gene prediction, especially for non-standard start codons common in certain bacteria, consider a two-step approach using Prodigal (v2.6.3) in meta mode: prodigal -i MAG.fasta -a proteins.faa -d genes.fna -o coords.gbk -p meta. Use the resulting protein file for downstream analyses.

Step 2: Comprehensive Functional Annotation

Run eggNOG-mapper (v2.1.9) for orthology assignment, GO terms, and KEGG pathways: emapper.py -i proteins.faa -o eggnog_out --cpu 8.
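Run InterProScan (v5.59-91.0) for protein domain/family identification: interproscan.sh -i proteins.faa -dp -cpu 8 -appl Pfam,TIGRFAM,SMART,CDD,PRINTS -f tsv,gff3 -b ipr_out. The -b option sets an output base name, which is needed here because -o accepts only a single output format.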
(Optional but recommended) Annotate against the dbCAN3 database for carbohydrate-active enzymes: run_dbcan.py proteins.faa protein --out_dir dbcan_out.

Step 3: Specialized Metabolite/BGC Annotation

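Run antiSMASH (v7.0) on the annotated GenBank file from Step 1 to identify BGCs: antismash prokka_out/marinisoma.gbk --output-dir antismash_out --taxon bacteria --genefinding-tool prodigal-m. Gene finding is only invoked when the input record lacks gene annotations.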
Analyze antiSMASH results manually via the web interface or parse the .json output to extract cluster types, core biosynthetic genes, and predicted products.

Step 4: Data Integration & Visualization

Integrate results from eggNOG, InterProScan, and antiSMASH into a unified annotation table using custom Python/R scripts.
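Tabulate KO and GO term occurrences per MAG (e.g., with pandas in Python or dplyr in R) to generate a count matrix across multiple MAGs for comparative analysis. Read-counting tools such as featureCounts are not designed for this task.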
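
The integration step can be sketched with the standard library alone. The example below merges per-gene annotations from several tools into one record per gene and builds a KO count matrix across MAGs. The input shapes (dictionaries keyed by gene ID, and lists of KO identifiers per MAG) are assumptions about how you have parsed the tool outputs, and the identifiers shown are placeholders.

```python
from collections import Counter


def merge_annotations(*sources):
    """Combine (tool_label, {gene_id: annotation}) pairs into one record per gene."""
    merged = {}
    for label, table in sources:
        for gene_id, annotation in table.items():
            merged.setdefault(gene_id, {})[label] = annotation
    return merged


def ko_count_matrix(mag_to_kos):
    """Return (sorted KO list, {mag: counts aligned to that list})."""
    counts = {mag: Counter(kos) for mag, kos in mag_to_kos.items()}
    all_kos = sorted({ko for counter in counts.values() for ko in counter})
    return all_kos, {mag: [counter[ko] for ko in all_kos] for mag, counter in counts.items()}


merged = merge_annotations(
    ("eggnog", {"gene_1": "K00001", "gene_2": "K02406"}),
    ("interpro", {"gene_1": "PF00107"}),
)
kos, matrix = ko_count_matrix({
    "MAG_A": ["K00001", "K00001", "K02406"],
    "MAG_B": ["K02406"],
})
print(merged["gene_1"], kos, matrix)
```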

Protocol 2: Comparative Genomic Analysis for BGC Prioritization

Objective: To prioritize BGCs from a set of Marinisomatota MAGs for heterologous expression based on novelty and ecological context.

Procedure:

Perform Protocol 1 on all MAGs in the dataset.
Create a pangenome using Roary (v3.13.0): roary -e -n -p 8 -i 90 -cd 99 *.gff. The -e and -n options produce the core gene alignment needed for the tree in the next step.
Construct a phylogenetic tree (from core genome alignment) using FastTree (v2.1.11).
Correlate BGC presence/absence matrix (from antiSMASH) with phylogeny and metadata (e.g., isolation depth, geography, host). Use ggplot2 (R) or seaborn (Python) for visualization.
Prioritize BGCs that are: a) phylogenetically restricted (present in a single clade), b) associated with a specific ecological niche, and c) of a rare or hybrid cluster type.
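
The three prioritization criteria can be turned into a simple additive score for ranking candidates. This is a hypothetical scheme offered as a starting point: the equal weighting, the input representation, and the user-supplied set of "rare" cluster types are all assumptions rather than part of any published method.

```python
def bgc_priority(clades, niches, cluster_types, rare_types=frozenset()):
    """Score a BGC family from 0 to 3, one point per prioritization criterion met."""
    score = 0
    if len(set(clades)) == 1:       # a) restricted to a single clade
        score += 1
    if len(set(niches)) == 1:       # b) seen in only one ecological niche
        score += 1
    types = set(cluster_types)
    if len(types) > 1 or types & set(rare_types):  # c) hybrid or rare cluster type
        score += 1
    return score


# A hybrid NRPS/T1PKS cluster found only in host-associated MAGs of one clade.
print(bgc_priority(["clade_1"], ["host-associated"], ["NRPS", "T1PKS"]))
```

Scores like this are best used to order a shortlist for manual review, since phylogenetic restriction inferred from incomplete MAGs can reflect missing data as much as true absence.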

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Reagents and Tools for Functional Analysis & Validation

Item	Function in Analysis Pipeline
antiSMASH Database	Reference database of known BGCs and rules for identifying novel clusters in genomic data.
eggNOG Orthology Database	Provides functional annotation across thousands of genomes via evolutionary relationships.
InterProScan & Member Databases (Pfam, TIGRFAM)	Identifies protein domains, families, and conserved sites, crucial for inferring enzyme function.
KEGG PATHWAY & MODULE	Maps annotated genes to biological pathways and functional modules for systems-level understanding.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) Recognition Tool (e.g., CRISPRCasFinder)	Identifies CRISPR-Cas systems, which can be associated with the regulation of defense metabolites.
Heterologous Expression Host (e.g., Streptomyces coelicolor, E. coli strains with BGC expression kits)	Essential for validating the function of in silico-predicted BGCs by expressing them in a lab-controlled host and screening for metabolite production.
LC-MS/MS Metabolomics Standards	Chemical standards used to compare retention times and mass spectra from culture extracts against libraries, linking BGC expression to novel compounds.

Visualizations

Advanced Analysis Workflow

NRPS Biosynthetic Pathway Logic

Solving Common Pitfalls: How to Improve Genome Quality and MIMAG Compliance for Marine MAGs

Diagnosing and Remedying High Contamination Levels in Marine Bins.

Application Notes and Protocols

Thesis Context: This protocol is framed within a broader thesis research effort focused on applying MIMAG (Minimum Information about a Metagenome-Assembled Genome) and genome quality standards to genomes recovered from the phylum Marinisomatota (formerly known as SAR406). High-quality, contamination-free genomes are critical for accurate phylogenetic placement, metabolic inference, and downstream drug discovery targeting marine microbiomes. Marine sediment bins often suffer from high contamination levels, compromising these goals.

Diagnostic Protocol: Quantifying and Identifying Contamination

Objective: To assess contamination levels and identify contaminant sources within metagenome-assembled bins (MAGs) attributed to Marinisomatota.

Experimental Workflow & Key Metrics

Table 1: Key Quality and Contamination Metrics for MIMAG Standards

Metric	Tool	Target (MIMAG High-Quality)	Interpretation for Marinisomatota Bins
Completeness	CheckM2	>90%	Estimates percentage of conserved single-copy genes present.
Contamination	CheckM2	<5%	Estimates percentage of single-copy genes present in multiple copies.
Strain Heterogeneity	CheckM	<5% (preferred)	Indicates multiple strains within a bin.
SSU rRNA Count	CheckM, barrnap	0, 1, or 2	Multiple full-length SSU genes suggest contamination.
Taxonomic Consistency	GUNC, GTDB-Tk	Consistent lineage	Detects chimerism; all genes should point to related taxa.

Detailed Protocol 1.1: Integrated Contamination Screening

Initial Quality Assessment: Run CheckM2 (checkm2 predict) on all bins to estimate completeness and contamination.
Taxonomic Profiling: Classify the bin using GTDB-Tk (gtdbtk classify_wf). Note the reference database taxonomy.
Chimerism Detection with GUNC:
- Command: gunc run --input_fasta bin.fna --db_file gunc_db_progenomes2.1.dmnd --threads 8
- A bin is considered "chimeric" if the pass.GUNC column is False. Examine the taxonomic_level, clade_separation_score, and contamination_portion columns to judge at which rank, and how strongly, the bin mixes lineages.
Visualization with BlobTools (v1 command syntax shown):
- Create a BlobDB: blobtools create -i bin.fna -t tax_file.tsv -o blob_out (add -b with a BAM file or -c with a coverage file so that coverage is available for plotting)
- Generate the blob plot: blobtools plot -i blob_out.blobDB.json
- Visually inspect for GC-coverage clusters with divergent taxonomies, which indicate contaminant scaffolds.

The Scientist's Toolkit: Research Reagent Solutions

Item	Function & Relevance
ZymoBIOMICS DNA Miniprep Kit	Standardized, inhibitor-free DNA extraction from marine sediments for reproducible sequencing.
Pacific Biosciences SMRTbell Prep Kit 3.0	Preparation of libraries for HiFi long-read sequencing to resolve repetitive regions and improve assembly.
Illumina DNA Prep Kit	Preparation of high-accuracy short-read libraries for polishing long-read assemblies or co-assembly.
MetaPhlAn 4 Database	Profiling community composition to identify potential contaminant species in the sample.
GTDB-Tk Reference Data (r214)	Essential for accurate taxonomic classification against the current genome taxonomy database.
BUSCO Database (bacteria_odb10)	Provides a universal set of single-copy orthologs for independent completeness/contamination assessment.

Objective: To apply iterative, targeted refinement procedures to reduce contamination in Marinisomatota bins while preserving genomic completeness.

Remediation Decision Pathway

Detailed Protocol 2.1: Targeted Subtractive Binning

Use when a known, abundant contaminant (e.g., Pseudomonas) is identified.

Extract scaffold IDs of the contaminant from BlobTools/GUNC output.
Create a "contaminant scaffold list" file.
Re-run the binning tool (e.g., MetaBAT2) on the assembly with these scaffolds excluded. MetaBAT2 has no option for excluding named scaffolds, so remove them from the assembly FASTA (and from the depth table) before binning.
Extract the new Marinisomatota bin and re-evaluate.
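
One way to make sure the listed contaminant scaffolds never reach the binner is to drop them from the assembly FASTA first. The sketch below does this in plain Python on an in-memory example; the function name filter_fasta and the toy sequences are hypothetical, and a real assembly would be streamed from disk rather than held in a string.

```python
def filter_fasta(fasta_text, exclude_ids):
    """Return FASTA text without the records whose ID is in exclude_ids."""
    kept_lines = []
    keep_record = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The record ID is the header text up to the first whitespace.
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            keep_record = record_id not in exclude_ids
        if keep_record:
            kept_lines.append(line)
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")


# Toy assembly and contaminant list, for illustration only.
ASSEMBLY = (
    ">scaffold_001 len=8\nACGTACGT\n"
    ">scaffold_078\nGGGCCC\nGGG\n"
    ">scaffold_200\nTTTT\n"
)
CONTAMINANTS = {"scaffold_078"}

if __name__ == "__main__":
    print(filter_fasta(ASSEMBLY, CONTAMINANTS), end="")
```

The same ID set should be used to filter the coverage table so that the FASTA and depth inputs stay in sync.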

Detailed Protocol 2.2: Manual Curation Based on Coverage and Taxonomy

Use for removing a limited number of contaminant scaffolds.

From the BlobTools plot, identify scaffolds with anomalous GC%, coverage, or taxonomy.
Extract coverage data for each scaffold from the mapping file (using samtools bedcov).

Create a table for manual inspection:

Table 2: Scaffold Curation Decision Matrix (Example)

Scaffold	Length (bp)	GC%	Avg. Coverage	Predicted Taxonomy (GTDB)	Action
scaffold_001	250,500	42.1	45.2	Marinisomataceae	Keep
scaffold_078	18,750	65.3	8.1	Pseudomonadaceae	Remove
scaffold_112	95,200	41.8	3.5	Flavobacteriaceae	Remove

Create a new, cleaned FASTA file excluding the "Remove" scaffolds: seqtk subseq bin.fna keep_list.txt > bin_clean.fna
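
The decision matrix can be expressed as a simple rule so that larger bins are triaged consistently. This is a sketch under stated assumptions: the thresholds (GC tolerance of 5 percentage points, coverage below 25% of the target mean) and the "two anomalies means Remove" rule are hypothetical choices that happen to reproduce the example table, not values prescribed by MIMAG or by any tool.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str
    length: int      # bp
    gc: float        # percent GC
    coverage: float  # mean read depth
    family: str      # GTDB family assignment


# Hypothetical thresholds, chosen only for illustration.
TARGET_FAMILY = "Marinisomataceae"
GC_TOLERANCE = 5.0          # allowed deviation in GC percentage points
MIN_COVERAGE_RATIO = 0.25   # fraction of the target's mean coverage


def curate(scaffolds, target_family=TARGET_FAMILY):
    """Label each scaffold Keep, Review, or Remove from three independent signals."""
    target = [s for s in scaffolds if s.family == target_family]
    total_len = sum(s.length for s in target)
    if total_len == 0:
        raise ValueError("no scaffolds assigned to the target family")
    # Length-weighted reference profile built from target-family scaffolds.
    ref_gc = sum(s.gc * s.length for s in target) / total_len
    ref_cov = sum(s.coverage * s.length for s in target) / total_len

    decisions = {}
    for s in scaffolds:
        anomalies = sum(
            [
                s.family != target_family,
                abs(s.gc - ref_gc) > GC_TOLERANCE,
                s.coverage < MIN_COVERAGE_RATIO * ref_cov,
            ]
        )
        if anomalies >= 2:
            decisions[s.name] = "Remove"
        elif anomalies == 1:
            decisions[s.name] = "Review"
        else:
            decisions[s.name] = "Keep"
    return decisions


# The three rows of the example table above.
EXAMPLE = [
    Scaffold("scaffold_001", 250500, 42.1, 45.2, "Marinisomataceae"),
    Scaffold("scaffold_078", 18750, 65.3, 8.1, "Pseudomonadaceae"),
    Scaffold("scaffold_112", 95200, 41.8, 3.5, "Flavobacteriaceae"),
]

if __name__ == "__main__":
    for name, action in curate(EXAMPLE).items():
        print(name, action)
```

Scaffolds labeled Keep form the keep_list.txt passed to seqtk; anything labeled Review is best inspected by hand before a decision is made.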

Detailed Protocol 2.3: Hybrid Reassembly and Re-binning

Use for deeply entangled bins from short-read assemblies.

Map both Illumina and available long-reads (if any) to the contaminated bin.
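Extract reads mapping to the bin, for example with samtools fastq -F 4, which skips unmapped reads and keeps the base qualities that the assembler needs.
Perform a hybrid or long-read-only assembly of these extracted reads using Flye or SPAdes.
Re-bin this new, focused assembly using stringent parameters (e.g., a higher -prob_threshold in MaxBin2, or a larger minimum contig length in VAMB or MetaBAT2).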

Final Validation: After any remediation step, re-run the full Diagnostic Protocol (CheckM2, GTDB-Tk, GUNC). The goal is a MIMAG high-quality draft genome: >90% completeness, <5% contamination, the 23S, 16S, and 5S rRNA genes, and at least 18 tRNAs. A non-chimeric GUNC result is an additional check rather than a MIMAG requirement, and together these criteria make the bin suitable for definitive Marinisomatota research and downstream applications.

Strategies for Recovering Missing rRNA Operons and Key Genes

Application Notes

The implementation of Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards has highlighted significant gaps in many bacterial genomes, particularly within phyla like Marinisomatota (formerly Marinimicrobia). A common shortfall is the absence of full-length rRNA operons (16S, 23S, 5S) and other single-copy marker genes, which are critical for phylogenetic placement, genome completeness estimation, and metabolic pathway inference. This compromises downstream applications in comparative genomics and drug target discovery. Recent strategies leverage hybrid assembly and targeted enrichment to recover these missing genomic elements, thereby elevating genome quality to MIMAG's "high-quality draft" or "finished" status.

Key Quantitative Data on Common Genome Completeness Tools

Tool Name	Primary Method	Key Output Metric	Strengths for rRNA Recovery	Limitations
CheckM2	Machine learning on marker gene sets	Completeness, Contamination	Fast, accurate for overall completeness	Does not target rRNA operon structure
BUSCO (v5)	Homology search against lineage-specific datasets	% of expected single-copy orthologs	Broad phylogenetic breadth, standardized scores	Marker sets are protein-coding genes, so rRNA genes are not assessed
rna_hmm3	HMMER search with rRNA-specific models	Presence of 5S/16S/23S genes	Specialized for rRNA detection	Does not resolve operon continuity
metaEuk	Homology-based gene prediction with eukaryotic focus	Eukaryotic protein-coding genes	Effective for complex microbiomes	Does not predict rRNA genes and is not designed for bacterial genomes
PhyloFlash (v3.4)	Mapping reads to rRNA databases	rRNA sequence and abundance	Recovers rRNA from raw reads pre-assembly	Operon structure not assembled

Protocol 1: Hybrid Assembly for rRNA Operon Recovery

Objective: To generate a contiguous assembly that includes full-length rRNA operons by integrating long-read and short-read sequencing data.

Materials:

Purified genomic DNA (>20 kb fragment size).
Illumina DNA Prep kit and NovaSeq platform (short-reads).
Oxford Nanopore Ligation Sequencing Kit (SQK-LSK114) & GridION, or PacBio HiFi library prep.
High-molecular-weight DNA isolation kit (e.g., Nanobind CBB Big DNA Kit).
Computational resources (≥32 GB RAM, multi-core server).

Procedure:

Library Preparation & Sequencing:
- Perform short-read (2x150 bp) sequencing on the Illumina platform following the manufacturer's protocol.
- In parallel, prepare a long-read library. For Nanopore, use the Ligation Sequencing Kit, load onto an R10.4.1 flow cell (the Kit 14 chemistry of SQK-LSK114 is designed for R10.4.1 rather than R9.4.1), and run for ≥48 hours. For PacBio, prepare a HiFi SMRTbell library.
Quality Control:
- Trim short-reads using fastp (v0.23.2) with default parameters.
- Filter long-reads: For Nanopore, use Filtlong or an equivalent read filter to keep reads of at least 1,000 bp with a mean quality of at least Q10. For PacBio HiFi, use default quality values.
Hybrid Assembly:
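- Assemble long-reads de novo using Flye (v2.9.3): flye --nano-raw [reads.fastq] --out-dir flye_output --threads 16 (add --meta for metagenomic samples; use --nano-hq for recent high-accuracy Nanopore reads or --pacbio-hifi for HiFi reads).
- Polish the Flye assembly using the high-accuracy short-reads with POLCA (from MaSuRCA): polca.sh -a flye_output/assembly.fasta -r '[R1.fastq] [R2.fastq]' -t 16 (POLCA writes the polished assembly to the working directory as assembly.fasta.PolcaCorrected.fa).
rRNA Operon Identification:
- Run Barrnap (v0.9) on the polished assembly to predict rRNA loci: barrnap --outseq rrna_sequences.fasta assembly.fasta.PolcaCorrected.fa > rrna.gff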
- Visually verify operon contiguity (16S-ITS-23S-ITS-5S) by mapping reads back to the assembly in a viewer like IGV.
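
Before opening a genome viewer, the Barrnap GFF3 output can be screened for contigs that carry all three rRNA genes close together. The sketch below assumes the attribute layout that Barrnap normally writes (Name=16S_rRNA;product=...) and treats a 10 kb span as the upper bound for one operon; that span limit, the function name find_operons, and the example records are assumptions for illustration.

```python
from collections import defaultdict

# Illustrative Barrnap-style GFF3 records (tab-separated); not real predictions.
EXAMPLE_GFF = (
    "##gff-version 3\n"
    "contig_1\tbarrnap:0.9\trRNA\t1000\t2530\t0\t+\t.\tName=16S_rRNA;product=16S ribosomal RNA\n"
    "contig_1\tbarrnap:0.9\trRNA\t2900\t5800\t0\t+\t.\tName=23S_rRNA;product=23S ribosomal RNA\n"
    "contig_1\tbarrnap:0.9\trRNA\t5950\t6060\t0\t+\t.\tName=5S_rRNA;product=5S ribosomal RNA\n"
    "contig_7\tbarrnap:0.9\trRNA\t40\t800\t0\t-\t.\tName=16S_rRNA;product=16S ribosomal RNA (partial)\n"
)

REQUIRED = {"5S_rRNA", "16S_rRNA", "23S_rRNA"}
MAX_OPERON_SPAN = 10_000  # assumed upper bound (bp) for a single operon


def find_operons(gff_text, max_span=MAX_OPERON_SPAN):
    """Return (contig, strand, span) where full-length 5S, 16S, and 23S sit together.

    Limitation: a contig with several operons on one strand exceeds max_span
    and is not reported; such contigs need a per-cluster analysis.
    """
    hits = defaultdict(list)
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "rRNA":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "(partial)" in attrs.get("product", ""):
            continue  # partial genes do not count toward a complete operon
        hits[(cols[0], cols[6])].append((int(cols[3]), int(cols[4]), attrs.get("Name", "")))

    operons = []
    for (contig, strand), genes in hits.items():
        names = {name for _, _, name in genes}
        span = max(end for _, end, _ in genes) - min(start for start, _, _ in genes) + 1
        if REQUIRED <= names and span <= max_span:
            operons.append((contig, strand, span))
    return operons


if __name__ == "__main__":
    print(find_operons(EXAMPLE_GFF))
```

Contigs reported here are good candidates for the read-level contiguity check in IGV; contigs with only partial hits point to operons that were broken during assembly.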

Protocol 2: Targeted Enrichment Using rRNA Probes

Objective: To selectively capture genomic fragments containing rRNA genes from complex or low-biomass samples prior to sequencing.

Materials:

MyBaits rRNA Custom Kit (Arbor Biosciences) or xGen Hybridization Capture Kit (IDT).
Biotinylated 80-mer DNA probes tiling the full length of conserved bacterial 16S, 23S, and 5S rRNA genes.
Magnetic streptavidin beads.
Hybridization oven or thermocycler with heated lid.

Procedure:

Library Preparation:
- Prepare a standard Illumina paired-end library from the gDNA. Do not amplify excessively (≤8 PCR cycles).
Hybridization Capture:
- Pool the library with blocking oligonucleotides and the biotinylated rRNA probe pool.
- Denature at 95°C for 10 minutes and incubate at 65°C for 16-24 hours to allow probes to hybridize to library fragments that carry rRNA gene sequence.
Capture and Wash:
- Add streptavidin beads to the hybridization mix, incubate to bind biotinylated probe-target complexes.
- Wash beads with increasingly stringent buffers (following kit protocol) to remove non-specifically bound DNA.
Elution and Amplification:
- Elute the captured DNA from the beads in a low-salt buffer.
- Amplify the enriched library with 12-14 cycles of PCR.
Sequencing and Analysis:
- Sequence the enriched library on an Illumina MiSeq or NextSeq platform (2x300 bp recommended).
- Assemble reads using a dedicated rRNA assembler like PhyloFlash or integrate into the hybrid assembly from Protocol 1 as "trusted" contigs.

Diagram Title: Hybrid Assembly Workflow for rRNA Recovery

Diagram Title: Targeted rRNA Enrichment Protocol

The Scientist's Toolkit: Research Reagent Solutions

Item	Function & Relevance
Nanobind CBB Big DNA Kit	Purifies ultra-high molecular weight DNA (>50 kb) essential for long-read sequencing and intact operon analysis.
Oxford Nanopore Ligation Kit (SQK-LSK114)	Prepares DNA libraries for nanopore sequencing, enabling multi-kb reads that span repetitive rRNA operons.
MyBaits Custom rRNA Probe Set	Biotinylated oligonucleotides designed to tile bacterial rRNA genes for targeted enrichment from complex samples.
Streptavidin Magnetic Beads	Solid-phase support for capturing probe-bound target DNA during hybridization selection protocols.
Phusion High-Fidelity DNA Polymerase	Provides high-fidelity amplification of post-capture libraries with minimal bias, crucial for accurate representation.
CheckM2 Database	Provides the most current set of marker genes for robust assessment of genome completeness and contamination post-recovery.

Optimizing Assembly Parameters for Complex, Low-Abundance Marine Communities

Research on complex, low-abundance marine microbial communities is critical for bioprospecting and understanding ecosystem functions. This work is framed within the broader thesis of advancing genome quality research for the phylum Marinisomatota (formerly Marinimicrobia), aligning with the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards. Achieving high-quality, nearly complete genomes with low contamination from these communities requires optimized assembly strategies to overcome challenges of high diversity, uneven abundance, and genomic novelty.

The following parameters are critical for optimizing genome recovery from marine metagenomic data. Optimal ranges are derived from recent literature and benchmark studies.

Table 1: Optimization Parameters for Metagenome Assembly

Parameter	Typical Default	Optimized Range for Low-Abundance Communities	Impact on Assembly
k-mer Size(s)	Single (e.g., 77)	Multiple, iterative (e.g., 21, 33, 55, 77, 99, 127)	Balances contiguity vs. strain resolution. Smaller k-mers capture low-abundance taxa.
Minimum Contig Length	500 - 1000 bp	1500 - 2500 bp for initial binning	Increases binning accuracy but may discard fragments from rare organisms.
Read Depth Filtering	Off or lenient	Pre-assembly: ≥5x coverage	Reduces noise from very low-coverage sequences, streamlining assembly.
MetaSPAdes `--meta` flag	Not set	Always enabled	Configures assembler for uneven coverage and high diversity.
MEGAHIT `--min-count`	2	1 (lower than the default of 2)	Retains k-mers seen only once, which helps low-abundance members but admits more sequencing-error k-mers and increases memory use.
MEGAHIT `--k-list`	21,29,39,59,79,99,119,141	Step of 12 (e.g., 27,39,51...)	Finer granularity improves graph connectivity for diverse communities.
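
Because a k-list is easy to mistype, it can help to generate it. The helper below is a small sketch that builds a comma-separated list of odd k values with a fixed even step, which is the shape MEGAHIT expects; the function name build_k_list and its defaults are hypothetical, and the allowed range of k should be checked against the help text of the installed assembler.

```python
def build_k_list(k_min=27, k_max=127, step=12):
    """Return a comma-separated list of odd k-mer sizes from k_min up to at most k_max."""
    if k_min % 2 == 0:
        raise ValueError("k_min must be odd")
    if step <= 0 or step % 2 != 0:
        raise ValueError("step must be a positive even number so that every k stays odd")
    if k_max < k_min:
        raise ValueError("k_max must not be smaller than k_min")
    return ",".join(str(k) for k in range(k_min, k_max + 1, step))


if __name__ == "__main__":
    # Value to pass to the assembler's k-list option.
    print(build_k_list(27, 127, 12))
```

The resulting string can be pasted directly after the k-list option of the assembler command.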

Table 2: Post-Assembly Binning & Refinement Parameters

Tool/Step	Key Parameter	Recommended Setting	Rationale
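MetaBAT2	`--minContig`	2500	MetaBAT2 default (1500 is the minimum it accepts); longer contigs give a more reliable composition signal. MIMAG itself sets no contig-length threshold.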
MaxBin2	`-min_contig_length`	1500	Slightly lower to capture more fragments.
CONCOCT	`--length_threshold`	1000	Aggressive for complex communities.
DAS Tool	Integration	Use all above	Consensus binning maximizes recovery.
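CheckM	Marker set selection	Use `lineage_wf`, or `taxonomy_wf` with the target rank and taxon (`-x` only sets the bin file extension)	Critical for accurate completeness/contamination estimates for target phylum.
RefineM	`--genome_ext`	fa	Sets the bin file extension; RefineM then uses taxonomy and genomic metrics to purify bins.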

Detailed Experimental Protocols

Protocol 3.1: Optimized Hybrid Assembly Workflow

Objective: Recover high-quality MAGs from marine microbial communities. Input: Paired-end Illumina reads (150bp) and long-read PacBio HiFi/ONT data. Duration: ~5-7 days of computation.

Preprocessing:
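- Trim adapters and low-quality bases using fastp (v0.23.2).
- Remove host/organellar reads via mapping to the relevant host and organelle reference genomes. (SILVA NR99 is an rRNA database; mapping against it removes rRNA reads rather than host reads.)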
Co-assembly:
- For Illumina-only: Use metaSPAdes (v3.15.5) with multi-kmer strategy.
- For Hybrid: Use metaFlye (v2.9.2) on long reads, then polish with short reads.
Binning:
- Map all reads to the assembly using Bowtie2 and samtools.
- Run multiple binners (MetaBAT2, MaxBin2, CONCOCT) on coverage profiles and contigs ≥1500bp.
- Generate consensus bins with DAS Tool (v1.1.4).
Refinement & Quality Assessment:
- Run CheckM2 for rapid quality estimates.
- Perform taxonomy-aware refinement with RefineM.
- Apply MIMAG standards: Bins with ≥50% completeness and <10% contamination are medium-quality; target ≥90% completeness and <5% contamination for Marinisomatota.

Protocol 3.2: Targeted Marinisomatota Enrichment Verification via 16S rRNA Gene Phylogeny

Objective: Confirm the presence and phylogenetic placement of target phylum in bins.

Extract 16S rRNA genes from MAGs using barrnap (v0.9).
Align to a curated SILVA SSU Ref NR 99 database using SINA (v1.7.2).
Build a maximum-likelihood tree with IQ-TREE (v2.2.0).

Visualize tree to confirm clustering within the Marinisomatota clade.

Diagrams

Title: MAG Recovery Workflow from Marine Metagenomes

Title: MIMAG Standards and Quality Thresholds

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials and Computational Tools

Item / Reagent / Tool	Function / Purpose	Key Consideration
DNeasy PowerWater Kit (QIAGEN)	High-yield DNA extraction from marine filters (0.22µm).	Minimizes bias against Gram-positive cells; critical for low-biomass.
PacBio HiFi or ONT Ultra-Long Read Chemistry	Generates long reads (≥10 kb).	Enables assembly through repetitive regions, resolving complex genomes.
metaSPAdes / metaFlye Assemblers	Core assembly engines for short and long reads.	Must be run with `--meta` flag to handle uneven coverage.
GTDB-Tk (v2.3.0) with GTDB reference data	Provides accurate genome taxonomy.	Essential for placing novel Marinisomatota bins in current taxonomy.
CheckM/CheckM2 Software	Assesses MAG completeness & contamination.	Use lineage-specific marker sets for accurate phylum-level estimates.
RefineM Software Package	Refines bins using genomic properties & taxonomy.	Key for reducing cross-phylum contamination in final bins.
PhyloFlash (v3.4)	Rapid 16S rRNA recovery & community profile.	Quick verification of Marinisomatota presence pre-assembly.
Anti-Carryover Reagents (e.g., UDG)	For low-input library prep.	Reduces background noise in sequencing of low-abundance communities.

Addressing Strain Heterogeneity and Fragmentation in Marinomonas MAGs

Application Notes: The Challenge of Strain Heterogeneity in Marinomonas MAGs

Within the framework of MIMAG (Minimum Information about a Metagenome-Assembled Genome) standards and Marinisomatota genome quality research, the genus Marinomonas presents specific challenges. As a ubiquitous marine gammaproteobacterium, it exhibits significant strain-level genomic and functional diversity, complicating the generation of high-quality, representative MAGs. Heterogeneity within a population leads to fragmented assemblies and composite genomes that do not accurately represent a single microbial lineage, undermining downstream ecological interpretation and bioprospecting efforts for drug discovery.

Table 1: Common Quality Metrics for Marinomonas MAGs Against MIMAG Standards

MIMAG Tier	Completeness (%)	Contamination (%)	tRNA Count	5S, 16S, 23S rRNA	Assembly Fragmentation (N50, bp)	Strain Heterogeneity Indicator
High-quality draft	>90	<5	≥18	All three genes present	>50,000	Low (CheckM v1 strain heterogeneity <10%)
Medium-quality draft	≥50	<10	-	Partial or absent	>10,000	Moderate to High
Typical Marinomonas Challenge	High (often >95)	Variable (can be elevated)	Often complete	Often missing	Often low (<20,000)	Frequently High

Table 2: Quantitative Impact of Strain Heterogeneity on Assembly

Bioinformatics Metric	Value in Homogeneous Population	Value in Heterogeneous Population	Implication for MAG Quality
Assembly N50 (bp)	>100,000	<50,000	Increased fragmentation
Number of Contigs	Low (e.g., 50-200)	High (e.g., 500-2000)	Difficult to close genome
CheckM (v1) strain heterogeneity (CheckM2 does not report this metric)	<5%	>15%	MAG is a composite of multiple strains
Percent of Single-Copy Core Genes (SCGs) with multiple sequence variants	<1%	5-20%	Clear signal of multiple strains in bin

Detailed Experimental Protocols

Protocol 2.1: Pre-assembly Filtering to Reduce Heterogeneity

Objective: To enrich sequence data from a target strain prior to assembly. Materials: Raw paired-end metagenomic reads (FASTQ), host/adapter trimming tool (e.g., fastp), k-mer frequency analysis tool (KmerGenie). Procedure:

Quality Trimming: Use fastp with parameters -q 20 -u 30 --trim_poly_g to remove low-quality bases and adapters.
k-mer Spectrum Analysis: Run KmerGenie on trimmed reads to generate an optimal k-mer size report and visualize k-mer frequency distribution. A broad peak indicates heterogeneity.
Digital Normalization: Optional: Use bbnorm.sh from BBTools to normalize coverage (target=100 min=5), reducing data volume from dominant strains. Note that min=5 discards reads with an apparent depth below 5×, so lower it when rare variants that matter for Marinomonas diversity must be retained.

Protocol 2.2: Co-assembly and Iterative Binning with Heterogeneity Check

Objective: Generate and refine MAGs with explicit checks for strain mixtures. Materials: Metagenomic assemblies from metaSPAdes or MEGAHIT, binning software (MetaBAT2, MaxBin2), refinement tool (metaWRAP bin_refinement module), quality tools (CheckM2 for completeness and contamination; CheckM v1 for strain heterogeneity). Procedure:

Co-assembly: Assemble all quality-filtered reads from related samples using metaspades.py -k 21,33,55,77 (equivalent to spades.py --meta).
Initial Binning: Create initial bins from the co-assembly contigs using both MetaBAT2 and MaxBin2. Use metawrap binning -o INITIAL_BINS -a assembly.fasta --metabat2 --maxbin2 reads_1.fastq reads_2.fastq.
Quality Screening: Run checkm2 predict --input INITIAL_BINS/metabat2_bins -x fa --output-directory CHECKM2_OUT for completeness and contamination, and repeat for each bin set. CheckM2 does not report strain heterogeneity, so also run CheckM v1 (checkm lineage_wf -x fa INITIAL_BINS/metabat2_bins CHECKM_OUT) and flag bins whose strain heterogeneity exceeds 10%.
Iterative Refinement: For flagged Marinomonas bins, use metawrap bin_refinement -o REFINED -A INITIAL_BINS/metabat2_bins -B INITIAL_BINS/maxbin2_bins -c 90 -x 5. This leverages consensus of multiple binners to improve purity.

Protocol 2.3: Single-Nucleotide Variant (SNV) Analysis for Strain Deconvolution

Objective: Identify and, if possible, separate strains within a candidate MAG. Materials: High-quality but heterogeneous Marinomonas MAG, original quality reads mapped to the MAG (BAM files), variant caller (bcftools). Procedure:

Read Mapping: Map all reads back to the MAG using bowtie2 and convert to sorted BAM with samtools.
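Variant Calling: Call variants using bcftools mpileup -Ou -a FORMAT/AD,FORMAT/DP -f MAG.fasta mappings.bam | bcftools call -mv -Oz -o variants.vcf.gz. The AD annotation records per-allele read depths, which the frequency analysis in the next step needs. Filter for high-quality SNVs (QUAL>20 & DP>10).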
Variant Frequency Plotting: Plot the frequency distribution of alternate alleles for all SNV positions. A bimodal distribution (e.g., peaks at ~50% and ~100%) indicates two major strains.
Read Separation (if clear bimodality): Use tools like Strainberry or metaVaR to attempt in silico separation of reads belonging to different strains for re-assembly.
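
The allele-frequency step can be prototyped without plotting libraries. The sketch below assumes a single-sample VCF whose FORMAT column includes AD (per-allele read depths); it computes the alternate-allele fraction at each site, bins the values, and reports local maxima as candidate peaks. The function names, the bin count, and the min_count cut-off are illustrative assumptions, and the peak rule is deliberately crude.

```python
def alt_allele_frequencies(vcf_text, min_depth=10):
    """Return alt/(ref+alt) for each site with an AD field and enough depth."""
    freqs = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 10:
            continue
        keys = cols[8].split(":")
        values = cols[9].split(":")
        if "AD" not in keys:
            continue
        depths = [int(x) for x in values[keys.index("AD")].split(",") if x != "."]
        if len(depths) < 2:
            continue
        ref, alt = depths[0], depths[1]
        if ref + alt >= min_depth:
            freqs.append(alt / (ref + alt))
    return freqs


def histogram(freqs, n_bins=10):
    """Count frequencies in n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for f in freqs:
        counts[min(int(f * n_bins), n_bins - 1)] += 1
    return counts


def peak_bins(counts, min_count=2):
    """Return indices of bins that are local maxima with at least min_count sites."""
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < len(counts) - 1 else 0
        if c >= min_count and c >= left and c >= right:
            peaks.append(i)
    return peaks


# Illustrative single-sample records; the last site is skipped for low depth.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "contig_1\t100\t.\tA\tG\t50\tPASS\tDP=20\tGT:AD\t0/1:10,10\n"
    "contig_1\t250\t.\tC\tT\t60\tPASS\tDP=20\tGT:AD\t1/1:0,20\n"
    "contig_1\t400\t.\tG\tA\t30\tPASS\tDP=5\tGT:AD\t0/1:3,2\n"
)

if __name__ == "__main__":
    freqs = alt_allele_frequencies(EXAMPLE_VCF)
    print(freqs, peak_bins(histogram(freqs), min_count=1))
```

Two well-separated peak bins are a prompt to attempt strain separation; a single broad peak or a flat distribution suggests that read separation is unlikely to succeed.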

Visualizations

Diagram 1: Workflow for Addressing Strain Heterogeneity

Diagram 2: Strain Heterogeneity Detection via SNV Analysis

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials for Strain-Resolved Marinomonas MAG Generation

Item/Category	Specific Product or Tool (Example)	Function in Protocol	Key Notes for Researchers
Sequencing Kit	Illumina NovaSeq 6000 S4 Reagent Kit (300 cycles)	Generate high-depth, paired-end (2x150 bp) metagenomic reads.	Short reads alone rarely span repeats; add long reads to resolve repetitive regions common in Marinomonas.
DNA Extraction Kit	DNeasy PowerWater Kit (Qiagen)	Extract high-molecular-weight DNA from marine filter samples.	Bead-beating lysis limits extraction bias across cell types, including Gram-negative bacteria like Marinomonas.
Assembly Software	metaSPAdes v3.15.5	Perform co-assembly of complex metagenomes.	Uses multiple k-mer sizes to improve assembly of diverse strains.
Binning Software Suite	MetaBAT2, MaxBin2, CONCOCT	Automated clustering of contigs into draft genomes (bins).	Using multiple tools is crucial for consensus binning.
Bin Refinement Tool	MetaWRAP v1.3.2 "BIN_REFINEMENT" module	Improves bin completeness and purity by leveraging multiple bin sets.	Effectively reduces contamination from other species.
Quality Assessment Tool	CheckM2 v1.0.1 (with CheckM v1 for strain heterogeneity)	Assess MAG completeness and contamination; CheckM v1 adds the strain heterogeneity estimate.	CheckM2 does not report strain heterogeneity, so the CheckM v1 value is the diagnostic for mixed strains.
Variant Calling Tool	BCFtools v1.17	Identify single-nucleotide variants from read mappings.	Used for strain deconvolution analysis within a MAG.
Reference Database	GTDB (Genome Taxonomy Database) r214	Taxonomic classification of Marinomonas bins.	Confirms placement within Gammaproteobacteria (phylum Pseudomonadota) and distinguishes Marinomonas bins from co-occurring lineages such as Marinisomatota.

Application Notes

The recovery and analysis of high-quality metagenome-assembled genomes (MAGs) from marine samples are critical for exploring the "microbial dark matter" of the ocean, with significant implications for biotechnology and natural product discovery. This work is framed within the context of thesis research advancing the MIMAG (Minimum Information about a Metagenome-Assembled Genome) standards, specifically for the novel candidate phylum Marinisomatota. Selecting appropriate computational tools at each stage, from binning to quality assessment and taxonomic classification, is essential for generating robust, publication-ready genomes that meet MIMAG's "high-quality draft" or "finished" specifications.

Binners for Marine Metagenomes

Marine datasets, especially from pelagic zones, often feature high microbial diversity, uneven abundance, and related strains (microdiversity). Modern binners use complementary strategies:

Coverage/composition-based (e.g., MetaBAT 2): Effective for dominant populations but can struggle with highly diverse communities.
Deep learning (e.g., VAMB): Excels at separating species with varying abundance patterns, showing superior performance on complex marine data by leveraging sequence composition and co-abundance.
Hybrid consensus (e.g., metaWRAP Bin_refinement): Combines the bin sets produced by several binners and keeps the best-scoring version of each bin, which improves purity when closely related strains are present.

CheckM-like Tools for Quality Assessment

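These tools estimate genome completeness and contamination, the core quantitative metrics for MIMAG compliance; CheckM v1 additionally reports strain heterogeneity.

CheckM/CheckM2: CheckM v1 uses lineage-specific marker gene sets selected by placing the genome on a reference tree, while CheckM2 uses machine-learning models. Essential for benchmarking, but CheckM v1 estimates can be biased by the reference tree.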
BUSCO: Assesses completeness based on near-universal single-copy orthologs (e.g., using the bacteria_odb10 dataset). Provides a standardized, lineage-independent metric highly valued in MIMAG reports.
GRATE: Evaluates genome quality via the consistency of the underlying assembly graph, identifying potential mis-assemblies not flagged by marker-gene approaches. Critical for ensuring structural accuracy.

Classifiers for Taxonomic Assignment

Precise classification of novel marine lineages like Marinisomatota requires tools that handle phylogenetic novelty.

GTDB-Tk: The current standard for MAG classification. It places genomes within the Genome Taxonomy Database (GTDB) framework, which provides a standardized bacterial/archaeal taxonomy. Crucial for identifying novel families/orders.
CAT/BAT (or Kaiju): Fast, alignment-based classifiers for initial screening of contigs or bins. Useful for identifying contamination from non-target domains.
PhyloPhlAn: For in-depth phylogenetic placement using a large set of conserved markers, helping to elucidate evolutionary relationships of novel phyla.

Quantitative Tool Comparison

Table 1: Performance Metrics of Binning Tools on Marine Datasets

Tool	Algorithm Type	Key Strength	Reported Avg. Completeness*	Reported Avg. Contamination*	Best For
MetaBAT 2	Coverage/Composition	Robust, predictable	78%	4%	High-abundance populations
VAMB	Deep Learning (Co-abundance)	Resolves microdiversity	85%	3%	Complex, diverse communities
metaWRAP Bin_refinement	Hybrid Consensus	Increases bin quality	92%	1%	Consolidating outputs of multiple binners

*Representative values from benchmarking studies on marine mock communities (e.g., CAMI II). Actual performance is dataset-dependent.

Table 2: Genome Quality Assessment Tool Outputs

Tool	Core Metric	Method	Relevance to MIMAG Standards
CheckM2	Completeness, Contamination	Machine learning on marker genes	Primary metric for "high-quality draft" (≥90% comp., <5% cont.)
BUSCO	Completeness (Single-copy orthologs)	HMM search against conserved gene sets	Complementary, lineage-agnostic completeness score
GRATE	Graph Consistency Score	Assembly graph analysis	Identifies structural problems; supports "complete" genome criteria

Table 3: Taxonomic Classification Tools for Novel Marine Lineages

Tool	Method	Database	Speed	Use Case for Marinisomatota
GTDB-Tk	Concatenated marker phylogeny	GTDB (r207/v2)	Medium	Definitive classification & relative evolutionary divergence
CAT/BAT	DIAMOND alignment + LCA	NCBI NR	Fast	Initial domain/kingdom screening & contamination check
PhyloPhlAn	Phylogenetic placement	Up to 400 universal marker proteins	Slow	Detailed phylogenetic tree construction

Experimental Protocols

Protocol 1: Integrated Binning and Quality Control Workflow for Marine MAGs

Objective: To generate MIMAG-compliant, high-quality MAGs from a marine metagenomic assembly.

Materials & Reagents:

Computational Resources: High-performance computing cluster with ≥64 GB RAM.
Software Dependencies: Conda environment with snakemake, metaWRAP (v1.3+), CheckM2, BUSCO (v5), GTDB-Tk (v2).
Input Data: Co-assembled metagenomic contigs (FASTA) and quality-trimmed reads (FASTQ) mapped back to contigs (BAM files).

Procedure:

Pre-binning: Create coverage profiles for each sample.

Consensus Binning: Use metaWRAP's Bin_refinement module to integrate bins from multiple tools.
Quality Assessment: Run CheckM2 and BUSCO on the refined bins.
Taxonomic Classification: Classify bins that meet "high-quality draft" standards (≥90% completeness, <5% contamination).
MIMAG Compliance Table: Generate a summary table integrating all metrics for manuscript reporting.
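
The summary table in the last step can be assembled by joining the per-bin outputs on the bin name. The sketch below assumes a CheckM2 quality report with Name, Completeness, and Contamination columns and a GTDB-Tk summary with user_genome and classification columns; the example rows and the helper names are invented for illustration, and bins that were not classified are marked as such rather than dropped.

```python
import csv
import io

# Illustrative inputs; the values are made up, not real results.
CHECKM2_TSV = (
    "Name\tCompleteness\tContamination\n"
    "bin_01\t95.2\t1.3\n"
    "bin_02\t62.0\t7.8\n"
)
GTDBTK_TSV = (
    "user_genome\tclassification\n"
    "bin_01\td__Bacteria;p__Marinisomatota\n"
)

FIELDS = ["bin", "completeness", "contamination", "classification"]


def read_tsv(text, key):
    """Index the rows of a tab-separated table by the given key column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[key]: row for row in reader}


def build_summary(checkm2_text, gtdbtk_text):
    """Join quality and taxonomy tables into one list of per-bin records."""
    quality = read_tsv(checkm2_text, "Name")
    taxonomy = read_tsv(gtdbtk_text, "user_genome")
    rows = []
    for name in sorted(quality):
        q = quality[name]
        rows.append(
            {
                "bin": name,
                "completeness": float(q["Completeness"]),
                "contamination": float(q["Contamination"]),
                "classification": taxonomy.get(name, {}).get(
                    "classification", "not classified"
                ),
            }
        )
    return rows


def to_tsv(rows):
    """Serialize the summary rows as a tab-separated table with a header."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_tsv(build_summary(CHECKM2_TSV, GTDBTK_TSV)), end="")
```

Further columns (BUSCO scores, rRNA and tRNA counts) can be joined the same way before the table goes into a manuscript.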

Protocol 2: Phylogenetic Placement of a Novel Marinisomatota MAG

Objective: To determine the phylogenetic position of a recovered Marinisomatota MAG relative to existing GTDB taxa.

Materials:

Input: High-quality Marinisomatota MAG (MAG_001.fasta).
Software: GTDB-Tk (v2), IQ-TREE2, FigTree.
Database: GTDB reference data (r207/v2).

Procedure:

Run GTDB-Tk De Novo Workflow: This generates a multiple sequence alignment of 120 bacterial marker genes.

Model Testing and Tree Inference: Use IQ-TREE2 on the alignment (gtdbtk_denovo/align/concatenated.align) to build a robust tree.
Tree Visualization and Annotation: Root the resulting tree (concatenated.align.treefile) on the specified outgroup and visualize in FigTree to confirm placement within the candidate phylum and proximity to defined orders/families.

Visualizations

MAG Generation and Quality Control Workflow

Phylogenetic Analysis Protocol for Novel MAGs

The Scientist's Toolkit: Research Reagent Solutions

Table 4: Essential Computational Resources for Marine MAG Research

Item	Function/Description	Example/Note
High-Performance Computing (HPC) Cluster	Essential for memory- and CPU-intensive tasks like assembly, binning, and phylogenetic inference.	Minimum 64-128 GB RAM, 20+ cores per job. Cloud options (AWS, GCP) offer scalability.
Conda/Mamba Environments	Manages isolated software installations with specific version dependencies to ensure reproducibility.	Use `environment.yml` files to share tool versions (e.g., checkm2=1.0.1, gtdbtk=2.3.0).
Snakemake/Nextflow Workflow Manager	Automates multi-step analytical pipelines, managing software dependencies and parallel execution.	Critical for reproducible analysis from raw reads to final MAG table.
GTDB Reference Database (r207/v2)	Standardized microbial taxonomy and aligned marker gene database for phylogenetically consistent classification.	Requires ~60 GB storage. Updated periodically; must cite release.
BUSCO Lineage Dataset (e.g., bacteria_odb10)	Dataset of near-universal single-copy orthologs used as an independent benchmark for genome completeness.	Provides a standardized score comparable across studies.
Interactive Tree of Life (iTOL)	Web-based tool for visualizing, annotating, and publishing phylogenetic trees generated by GTDB-Tk or IQ-TREE.	Enhances figures for publications and exploratory analysis.

Benchmarking Quality: How MIMAG for Marinomonas Compares to Other Genomic Standards

Application Notes

Reporting standards are critical for ensuring data reproducibility, interoperability, and meta-analysis in genomics. MIMAG (Minimum Information about a Metagenome-Assembled Genome) and MIxS (Minimum Information about any (x) Sequence) are two established frameworks with distinct scopes and levels of specificity. Within the context of Marinisomatota genome quality research, selecting the appropriate standard is paramount for comparative studies and database submissions.

Comparative Scope and Specificity

The primary distinction lies in their focus. MIxS is a broad, umbrella standard encompassing checklists for various sequence types (MIGS for genomes, MIMS for metagenomes, MIMARKS for marker genes). MIMAG is a highly specific standard developed to report the quality and completeness of single metagenome-assembled genomes (MAGs), a central activity in Marinisomatota research.

Table 1: Core Comparison of MIMAG and MIxS Standards

Feature	MIMAG Standard	MIxS (MIGS/MIMS) Standard
Primary Scope	Quality reporting for individual MAGs	General contextual data for any sequence
Key Metrics	Completeness, contamination, rRNA and tRNA gene content, assembly quality.	Environmental, host-associated, or specimen details.
Required Fields	~20 core fields specific to MAG quality (e.g., compl_score, contam_score).	~40 core fields + environment-specific packages.
Genomic Context	Mandatory for the genome being described.	Ancillary to the sequenced sample.
Typical Use Case	Submitting/describing a curated MAG to a sequence database (e.g., GenBank, ENA).	Submitting raw sequences or reads with environmental metadata to SRA/ENA.

Table 2: Quantitative Quality Tiers Defined by MIMAG

Quality Tier	Completeness	Contamination	Strain Heterogeneity	tRNA Genes	rRNA Genes	Use in Marinisomatota Taxonomy
High-quality draft (HQ)	>90%	<5%	Not defined by MIMAG (report if available)	≥18	23S, 16S, and 5S present	Species-level proposal
Medium-quality draft (MQ)	≥50%	<10%	Not required	Not required	Not required	Genus/Family-level analysis
Low-quality draft	<50%	<10%	Not required	Not required	Not required	Limited phylogenetic placement
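
The tier assignment can be written as a small function so that every bin is judged by the same rule. This is a minimal sketch of the draft tiers only (the "finished" category needs manual review and is not covered); the function name mimag_tier is hypothetical. The MIMAG publication words the high-quality completeness threshold as ">90%", so the comparison below is strict; adjust it if your reporting convention uses "≥90%".

```python
def mimag_tier(completeness, contamination, has_5s, has_16s, has_23s, trna_count):
    """Assign a MIMAG draft tier from completeness/contamination (%) and gene content."""
    if contamination >= 10:
        # None of the MIMAG draft tiers admits 10% contamination or more.
        return "Unclassified (contamination >= 10%)"
    has_all_rrna = has_5s and has_16s and has_23s
    # Strict ">" follows the wording of the MIMAG publication (assumption noted above).
    if completeness > 90 and contamination < 5 and has_all_rrna and trna_count >= 18:
        return "High-quality draft"
    if completeness >= 50:
        return "Medium-quality draft"
    return "Low-quality draft"


if __name__ == "__main__":
    # A near-complete bin that lacks a 16S gene falls back to medium quality.
    print(mimag_tier(95.2, 1.3, True, False, True, 20))
```

The example illustrates the most common outcome for short-read MAGs: high completeness and low contamination, but a missing rRNA gene keeps the bin in the medium-quality tier.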

Integration in a Marinisomatota Genome Study Workflow

For comprehensive reporting, both standards are often used in tandem. MIxS (specifically the MIMS checklist) describes the metagenomic sample from which the MAG was derived (e.g., marine sediment depth, salinity, pH). MIMAG then describes the individual Marinisomatota MAG extracted from that sample, detailing its assembly and quality metrics. This dual approach ensures both environmental context and genomic rigor.

Experimental Protocols

Protocol: Generating a MIMAG-Compliant Marinisomatota MAG

This protocol details the steps from raw metagenomic data to a MIMAG-ready genome assembly.

Title: Genome-Resolved Metagenomics for Marinisomatota MAGs

Objective: To reconstruct and quality-check a metagenome-assembled genome (MAG) from a complex environmental sample, adhering to MIMAG reporting requirements.

Materials & Reagents: See "Scientist's Toolkit" below.

Procedure:

Metagenomic Sequencing & Data Acquisition:
- Isolate total DNA from the environmental sample (e.g., marine filter) using a kit optimized for low-biomass and high-inhibitor samples (e.g., DNeasy PowerSoil Pro Kit).
- Prepare a sequencing library (e.g., using Illumina DNA Prep) and sequence on an Illumina NovaSeq platform to obtain ≥10 Gbp of paired-end (2x150 bp) data. For improved assembly, supplement with long-read data (PacBio HiFi or Oxford Nanopore).
Pre-processing of Sequence Reads:
- Use FastQC v0.12.1 for initial quality assessment.
- Trim adapters and low-quality bases using Trimmomatic v0.39 with parameters: ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:50.
Co-assembly and Binning:
- Perform de novo co-assembly of quality-filtered reads from multiple related samples using metaSPAdes v3.15.5 with k-mer sizes 21,33,55,77,99,127.
- Map reads back to contigs using Bowtie2 v2.5.1 to generate coverage profiles.
- Perform binning using MetaBAT2 v2.15, MaxBin2 v2.2.7, and CONCOCT v1.1.0. Generate a consensus set of bins using DAS Tool v1.1.6.
MAG Quality Assessment (MIMAG Core):
- Run CheckM2 v1.0.1 (checkm2 predict) on each bin to estimate completeness and contamination; lineage_wf is a CheckM v1 command.
- Use GUNC v1.0.6 to assess chimerism (clade separation score).
- Annotate the MAG using Prokka v1.14.6 or DRAM v1.4.0. Verify the presence of tRNA genes (≥18) using tRNAscan-SE v2.0.9 and a complete rRNA operon using Barrnap v0.9.
Taxonomic Classification & Curation:
- Classify the MAG using GTDB-Tk v2.3.2 with GTDB reference data (release R214) to confirm placement within the Marinisomatota phylum.
- Manually curate the bin by removing putative contaminant contigs (e.g., outliers in GC-content, coverage, or taxonomic assignment).
- Re-assess quality metrics post-curation.
MIMAG Reporting:
- Compile all metrics into the MIMAG checklist.
- Assign a final quality tier (High, Medium, Low) based on Table 2 thresholds.
- Submit the MAG sequence with MIMAG metadata to an appropriate repository (e.g., GenBank via the Microbial Genome Submission portal).
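
The manual curation step in the procedure above (removing contigs that are outliers in GC content) can be pre-screened automatically. The following is a minimal sketch, not part of any tool named in this protocol; the function names and the 0.10 deviation cutoff are illustrative assumptions, and flagged contigs should still be inspected by hand together with coverage and taxonomy.

```python
import statistics


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def flag_gc_outliers(contigs: dict, max_dev: float = 0.10) -> list:
    """Return IDs of contigs whose GC fraction differs from the bin median
    by more than max_dev (hypothetical cutoff; tune per dataset)."""
    gc = {name: gc_fraction(seq) for name, seq in contigs.items()}
    if not gc:
        return []
    median = statistics.median(gc.values())
    return sorted(name for name, value in gc.items() if abs(value - median) > max_dev)


if __name__ == "__main__":
    toy_bin = {"c1": "ATGCATGC", "c2": "AATTGGCC", "c3": "GGGGCCCC"}
    print(flag_gc_outliers(toy_bin))
```

A median is used instead of a mean so that a few contaminant contigs do not shift the reference value they are compared against.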

Protocol: Applying MIxS Metadata to the Source Metagenome

Title: Contextual Metadata Curation Using MIxS

Objective: To annotate the source metagenomic sample with relevant environmental and experimental metadata following the MIxS (MIMS) checklist.

Procedure:

Checklist Selection: Identify the appropriate MIxS checklist. For a marine water sample, use the MIMS (Minimum Information about a Metagenome Sequence) checklist with the "water" environmental package.
Data Collection: Populate mandatory core fields (e.g., investigation type, project name, lat_lon, collection date).
Environmental Package Fields: Populate fields specific to the marine environment: depth, salinity, temp, pressure, samp_mat_process, etc.
Integration: Link this MIxS-compliant sample metadata (typically via a BioSample accession) to the raw sequence read archive (SRA) submission and, subsequently, to the derived Marinisomatota MAG accession.
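
Before submission, the populated record can be checked for empty fields. The sketch below assumes the record is held as a simple dictionary; the field keys are adapted from the names listed in this procedure (the underscore spellings of the first four are assumptions), and the exact set of mandatory fields depends on the MIxS checklist version in use.

```python
# Field names taken from the procedure above; exact key spellings and the
# mandatory/optional status of each field are assumptions for illustration.
CORE_FIELDS = ("investigation_type", "project_name", "lat_lon", "collection_date")
WATER_FIELDS = ("depth", "salinity", "temp", "pressure", "samp_mat_process")


def missing_fields(record: dict) -> list:
    """Return required fields that are absent or blank in a sample record."""
    required = CORE_FIELDS + WATER_FIELDS
    return [name for name in required if not str(record.get(name, "")).strip()]


if __name__ == "__main__":
    sample = {
        "investigation_type": "metagenome",
        "project_name": "example marine survey",  # hypothetical value
        "lat_lon": "",
        "depth": "200",
    }
    print(missing_fields(sample))
```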

Diagrams

Title: Integrated MAG Recovery & Reporting Workflow

Title: Relationship Between MIxS and MIMAG Standards

The Scientist's Toolkit

Table 3: Key Research Reagent Solutions for MIMAG-Compliant Marinisomatota Studies

Item / Kit	Function in Protocol	Key Feature for Marinisomatota Research
DNeasy PowerSoil Pro Kit (QIAGEN)	Total DNA extraction from environmental samples.	Effective lysis of difficult-to-break cells and removal of potent PCR inhibitors common in marine sediments.
Illumina DNA Prep Kit	Library preparation for short-read sequencing.	Efficient tagmentation-based workflow for low-input DNA, suitable for metagenomic samples.
SMRTbell Prep Kit 3.0 (PacBio)	Library prep for HiFi long-read sequencing.	Generates highly accurate long reads (>10 kb) crucial for resolving repetitive regions in MAG assembly.
Trimmomatic	Read trimming and adapter removal.	Critical pre-processing step to ensure assembly quality; removes low-quality ends.
metaSPAdes Assembler	De novo metagenomic co-assembly.	Specifically designed for heterogeneous metagenomic data, improving contiguity of MAGs.
CheckM2 / GUNC	MAG quality assessment (comp/contam) and chimerism detection.	Provides the core metrics required by MIMAG for tier classification. More accurate than CheckM1.
GTDB-Tk & Reference Data	Precise taxonomic classification of prokaryotic MAGs.	Essential for placing novel MAGs within the updated Marinisomatota phylogeny.
tRNAscan-SE / Barrnap	Detection of tRNA and rRNA genes.	Validates the presence of essential genetic elements for MIMAG high-quality tier.

This application note provides a detailed comparative analysis of Metagenome-Assembled Genome (MAG) quality tiers, specifically High-Quality (HQ) and Medium-Quality Draft (MQD), within the framework of the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards. For researchers in microbial ecology, evolution, and drug discovery, accurately assessing and reporting MAG quality is paramount for ensuring the reliability and reproducibility of downstream analyses, including genomic mining for novel biosynthetic gene clusters (BGCs) with therapeutic potential.

The MIMAG standard defines quality tiers based on metrics of completeness, contamination, and the presence of standard marker genes. The following table summarizes the key quantitative thresholds.

Table 1: MIMAG Quality Tier Classification Criteria

Quality Metric	High-Quality Draft (HQ)	Medium-Quality Draft (MQD)
Completeness (CheckM)	≥90%	≥50% and <90%
Contamination (CheckM)	<5%	<10%
rRNA Genes	Presence of 5S, 16S, 23S	Not required
tRNA Genes	≥18 tRNAs	Not required
Assembly Contiguity	No defined threshold (report contig count)	No defined threshold
N50	No defined threshold	No defined threshold
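
The completeness, contamination, rRNA, and tRNA rows of Table 1 can be expressed as a small decision function. This is a sketch under the thresholds stated in the table; the function name and argument layout are illustrative, and bins that meet neither tier are labelled low-quality here.

```python
def assign_mimag_tier(completeness, contamination, rrna_found, trna_count):
    """Classify a MAG from percentages (0-100), a collection of detected
    rRNA types such as {"5S", "16S", "23S"}, and a tRNA gene count."""
    has_all_rrna = {"5S", "16S", "23S"} <= set(rrna_found)
    if completeness >= 90 and contamination < 5 and has_all_rrna and trna_count >= 18:
        return "High-quality draft"
    if completeness >= 50 and contamination < 10:
        return "Medium-quality draft"
    return "Low-quality draft"


if __name__ == "__main__":
    print(assign_mimag_tier(95.0, 1.0, {"16S"}, 30))
```

Note that a bin with high completeness but a missing rRNA gene falls into the medium-quality tier, which is a common outcome for short-read MAGs.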

Table 2: Implications for Downstream Analysis

Analysis Type	High-Quality Draft (HQ)	Medium-Quality Draft (MQD)
Phylogenomic Placement	Suitable for robust genus/species-level assignment.	May be limited to higher taxonomic ranks (family, order).
Metabolic Pathway Inference	High-confidence reconstruction of core and secondary metabolism.	Gaps likely; pathway completeness must be reported with caveats.
Pangenome Studies	Preferred for gene presence/absence and evolutionary analysis.	Use with caution; may skew results due to fragmentation.
Drug Discovery (BGC Screening)	High confidence in BGC structure and novelty assessment.	Potential for fragmented BGCs; requires careful manual curation.

Experimental Protocols for MAG Quality Assessment

Protocol 1: Genome Binning and Initial Quality Check

Objective: To reconstruct MAGs from metagenomic assemblies and perform initial completeness/contamination assessment.

Materials: Assembled metagenomic contigs (FASTA), sample-specific or co-assembly read mappings (BAM files).

Reagents/Software: MetaBAT2, MaxBin2, CONCOCT, CheckM, DAS Tool.

Procedure:

Binning: Execute multiple binning tools (e.g., MetaBAT2, MaxBin2) on the assembled contigs using the provided read depth profiles.
- MetaBAT2 command: metabat2 -i assembled_contigs.fasta -a depth.txt -o metabat2_bins/bin
Bin Consolidation: Use DAS Tool to integrate results from multiple binners and generate a consensus, refined set of bins.
- DAS Tool command: DAS_Tool -i metabat2.csv,maxbin2.csv -l metabat,maxbin -c assembled_contigs.fasta -o das_output --write_bins (DAS Tool releases before v1.1.3 expect --write_bins 1)
Initial Quality Screening: Run CheckM lineage_wf on the final bin set to estimate completeness and contamination.
- CheckM command: checkm lineage_wf bins_dir checkm_output -x fa -t 20
Filtering: Retain bins with ≥50% completeness and <10% contamination (medium-quality draft or better) for further analysis.
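
The filtering step can be scripted once CheckM results are available as a tab-separated table. The sketch below assumes CheckM was asked to write a tab-delimited summary and that the header uses the column names "Bin Id", "Completeness", and "Contamination"; check these against your own output before relying on it.

```python
import csv
import io


def filter_checkm_bins(tab_table_text, min_completeness=50.0, max_contamination=10.0):
    """Return bin IDs with completeness >= min_completeness and
    contamination < max_contamination from a CheckM tab-separated table."""
    reader = csv.DictReader(io.StringIO(tab_table_text), delimiter="\t")
    kept = []
    for row in reader:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness >= min_completeness and contamination < max_contamination:
            kept.append(row["Bin Id"])
    return kept


if __name__ == "__main__":
    demo = "Bin Id\tCompleteness\tContamination\nbin.1\t93.1\t2.0\nbin.2\t41.0\t0.5\n"
    print(filter_checkm_bins(demo))
```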

Protocol 2: Comprehensive MAG Curation and Tier Assignment

Objective: To curate bins, identify marker genes, and assign final MIMAG quality tiers.

Materials: Bins from Protocol 1 (FASTA), CheckM results.

Reagents/Software: CheckM, GTDB-Tk, Barrnap, tRNAscan-SE, Prokka or Bakta.

Procedure:

Contamination Refinement: Manually inspect bins flagged with >5% contamination in CheckM. Use tools like GUNC or anvi-refine to identify and remove contaminant contigs.
Marker Gene Identification:
- rRNA Genes: Predict using Barrnap.
  - barrnap --kingdom bac bin.fasta > rrna_genes.gff
- tRNA Genes: Predict using tRNAscan-SE.
  - tRNAscan-SE -B -o trna_output.txt bin.fasta
Taxonomic Classification: Perform standardized taxonomy assignment using GTDB-Tk.
- gtdbtk classify_wf --genome_dir bins_dir --out_dir gtdbtk_out -x fa --cpus 20
Tier Assignment: Compile all metrics. Assign as High-Quality if the bin meets all criteria in Table 1. Assign as Medium-Quality Draft if it meets the medium-tier completeness/contamination thresholds (≥50%, <10%) but fails any High-Quality criterion, for example an incomplete rRNA/tRNA complement.
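
To feed the rRNA criterion into tier assignment, the Barrnap GFF produced above can be summarized programmatically. This is a minimal sketch that assumes Barrnap's GFF3 attribute layout (a Name value such as 16S_rRNA and the word "partial" in the product attribute of truncated hits); verify both against the output of your Barrnap version.

```python
def rrna_types_from_barrnap(gff_text, include_partial=False):
    """Return the set of rRNA types (e.g. {"5S", "16S", "23S"}) found in
    Barrnap GFF3 text, skipping hits marked partial unless requested."""
    found = set()
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "rRNA":
            continue
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        if "partial" in attrs.get("product", "") and not include_partial:
            continue
        name = attrs.get("Name", "")
        if name.endswith("_rRNA"):
            found.add(name[: -len("_rRNA")])
    return found


if __name__ == "__main__":
    with open("rrna_genes.gff") as handle:  # file written by the barrnap step
        print(sorted(rrna_types_from_barrnap(handle.read())))
```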

Protocol 3: Downstream Analysis Suitability for BGC Discovery

Objective: To evaluate MAG suitability for biosynthetic gene cluster mining, highlighting differences between HQ and MQD tiers.

Materials: Curated HQ and MQD MAGs (FASTA).

Reagents/Software: antiSMASH, BiG-SCAPE, PRISM.

Procedure:

BGC Prediction: Run antiSMASH on both HQ and MQD MAGs with standard parameters.
- antismash bin.fasta --output-dir antismash_result --genefinding-tool prodigal
Cluster Comparison: For a target BGC family (e.g., Type I PKS), extract gene cluster sequences from results.
Structural Analysis: Compare gene cluster architecture. Note fragmentation points in MQD MAGs (e.g., partial core genes, missing regulatory elements).
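Network Analysis: Use BiG-SCAPE to build a BGC sequence similarity network and group clusters into gene cluster families, then compare those families against the phylogenetic placement of their source MAGs, using BGCs from HQ MAGs as the reference for completeness. Note how fragmented MQD BGCs may form separate, potentially artifactual, branches in similarity networks.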

Visualizations

Diagram 1: MAG Quality Assessment Workflow

Diagram 2: Downstream Analysis Impact of Quality Tiers

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Tools and Materials for MAG Quality Research

Item	Function/Description	Key Application
CheckM / CheckM2	Assesses MAG completeness and contamination; CheckM uses lineage-specific single-copy marker genes, whereas CheckM2 uses machine-learning models.	Primary quality scoring for MIMAG tier assignment.
GTDB-Tk	Provides standardized taxonomic classification against the Genome Taxonomy Database.	Consistent phylogenetic placement of HQ/MQD MAGs.
antiSMASH	Identifies, annotates, and analyzes biosynthetic gene clusters in microbial genomes.	Core tool for drug discovery potential in HQ MAGs.
DAS Tool	Integrates results from multiple binning algorithms to produce an optimal set of MAGs.	Improves bin quality before CheckM assessment, enhancing HQ yield.
Barrnap & tRNAscan-SE	Predicts ribosomal and transfer RNA genes, respectively.	Essential for verifying HQ MAG criteria (rRNA/tRNA presence).
Anvi'o / metaWRAP	Interactive visualization and refinement platforms for metagenomic data.	Manual curation of bins, crucial for resolving contamination.
High-Molecular-Weight DNA Kit	Extraction of long, intact DNA from environmental samples.	Improves assembly contiguity, foundational for HQ MAGs.
Long-Read Sequencing (PacBio, Nanopore)	Generates reads spanning repetitive regions and complex BGCs.	Critical for assembling complete, uninterrupted MAGs and BGCs.

This application note is framed within thesis research on applying Minimum Information about a MAG (MIMAG) standards to genome quality assessment in marine bacteria, including the phylum Marinisomatota (formerly Marinimicrobia). The genus Marinomonas, which belongs to the class Gammaproteobacteria rather than to Marinisomatota, serves as a useful case study because of its ecological relevance in marine environments and the growing availability of both isolate genomes and Metagenome-Assembled Genomes (MAGs). This document provides protocols and comparative data to evaluate MAG quality against the traditional "gold standard" of isolate sequencing.

Quantitative Comparison: MAGs vs. Isolate Genomes

Table 1: Summary Statistics of Publicly Available Marinomonas Genomes (as of 2024)

Metric	Isolate Genomes (n=~45)	Medium/High-Quality MAGs (n=~120)	MIMAG Standard (High-Quality)
Average Completeness (%)	99.8	92.5	≥90
Average Contamination (%)	0.1	2.8	<5
Presence of 16S rRNA gene	100%	31%	Complete (≥1 copy)
tRNA genes (avg. count)	46	38	≥18
N50 (avg. kb)	3,452	187	N/A
# Contigs (avg.)	1 (Complete)	52	N/A
CheckM2 Quality Score	0.97	0.85	N/A

Table 2: Functional Gene Comparison (% of BUSCO genes present)

Gene Set (Marine Bacterium)	Isolate Average	MAG Average	Key Functional Gaps in MAGs
Single-Copy Core Genes	99.5%	91.2%	Energy metabolism, Transcription
Secondary Metabolism	98%	75%	Biosynthetic gene clusters (BGCs)
Stress Response	97%	82%	Oxidative stress regulators
Cell Wall & Membrane	99%	88%	Peptidoglycan biosynthesis

Experimental Protocols

Protocol 3.1: Generating a High-Quality Marinomonas MAG from Metagenomic Data

Objective: Reconstruct a MAG meeting MIMAG high-quality standards from marine metagenomic samples.

Materials: See "Scientist's Toolkit" (Section 6).

Procedure:

Sequencing & Quality Control:
- Perform shotgun metagenomic sequencing (Illumina NovaSeq & PacBio HiFi recommended for hybrid assembly).
- Use FastQC v0.12.1 and Trimmomatic v0.39 for read QC and adapter trimming.
Co-assembly & Binning:
- Perform co-assembly of multiple related samples using metaSPAdes v3.15.5 or Flye v2.9.2 (for long reads).
- Bin contigs >1500bp into draft genomes using metaWRAP v1.3.2 pipeline: run MaxBin2, MetaBAT2, and CONCOCT independently.
- Use the metaWRAP Bin_refinement module to consolidate bins, selecting the optimal set based on completeness >90% and contamination <5%.
Quality Assessment & Dereplication:
- Run CheckM2 v1.0.1 to estimate completeness/contamination.
- Use GTDB-Tk v2.3.0 to assign taxonomy.
- Dereplicate genomes using dRep v3.4.0 (95% ANI secondary clustering threshold) to avoid redundant Marinomonas MAGs.
MIMAG Compliance Check:
- Use Barrnap v0.9 to identify 16S rRNA genes.
- Use tRNAscan-SE v2.0.9 to count tRNAs.
- Annotate with Prokka v1.14.6 or PGAP.
- Compile all metrics into a standard MIMAG report.

Protocol 3.2: Wet-Lab Validation of a Marinomonas MAG

Objective: Validate key genomic features predicted in a MAG via PCR and cultivation attempts.

Procedure:

Design of Validation Probes:
- Identify 3-5 single-copy core genes unique to the Marinomonas clade of interest from the MAG.
- Design PCR primers (18-22 bp, Tm ~60°C) targeting these genes.
PCR from Source Metagenomic DNA:
- Use the original environmental DNA as template.
- Perform PCR with high-fidelity polymerase (e.g., Q5).
- Sequence amplicons and confirm 100% identity to MAG sequence.
Fluorescence In Situ Hybridization (FISH):
- Design a specific oligonucleotide probe (15-25 nt) complementary to a unique 16S rRNA region of the MAG.
- Label probe with Cy3 fluorescent dye.
- Apply FISH to fixed environmental sample filters; visualize cells via epifluorescence microscopy to confirm physical presence and morphology.
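
Candidate primers from the design step above can be pre-screened against the stated length (18-22 bp) and Tm (~60°C) targets. The sketch uses the basic GC-content approximation Tm = 64.9 + 41 × (G + C − 16.4) / N, which is only a coarse estimate; nearest-neighbour calculators in dedicated primer-design software are preferable for final designs. The ±3°C tolerance and function names are assumptions for illustration.

```python
def primer_tm(primer):
    """Approximate melting temperature (°C) from GC content and length."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T string")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def primer_passes(primer, min_len=18, max_len=22, target_tm=60.0, tolerance=3.0):
    """Check length and approximate Tm against the protocol's targets."""
    if not min_len <= len(primer) <= max_len:
        return False
    return abs(primer_tm(primer) - target_tm) <= tolerance


if __name__ == "__main__":
    candidate = "GCGCGCGCGCGCGCATATAT"  # toy sequence, not a real primer
    print(round(primer_tm(candidate), 2), primer_passes(candidate))
```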

Visualization of Workflows

MAG Generation & Validation Workflow

MAG Suitability Decision Tree

Application Notes

For Drug Development (BGC Discovery): Rely on isolate genomes for complete biosynthetic gene cluster (BGC) characterization. Use MAGs for initial discovery but expect fragmentation; prioritize MAGs with high contiguity (N50 > 100 kb) and confirm key adenylation (A) and ketosynthase (KS) domains via PCR.
For Evolutionary Studies: MAGs dramatically increase population sampling. Use phylogenies built from >50 single-copy core genes. Always filter genomes by completeness before building trees to avoid artifacts from missing data.
For Metabolic Modeling (GEMs): High-quality MAGs (completeness >95%, contamination <1%) can yield draft models. Use isolate genomes as templates for gap-filling. The lack of a physical isolate prevents experimental validation of growth predictions.
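
The N50 cutoff mentioned for BGC discovery can be computed directly from a MAG's FASTA file. The following is a minimal sketch using the usual definition of N50 (the length of the shortest contig in the smallest set of longest contigs that covers at least half of the assembly); the function names are illustrative.

```python
def contig_lengths(fasta_text):
    """Return the length of each record in FASTA-formatted text."""
    lengths = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)
        elif line and lengths:
            lengths[-1] += len(line)
    return lengths


def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    with open("bin.fasta") as handle:  # any MAG FASTA file
        value = n50(contig_lengths(handle.read()))
    print(value, "passes" if value > 100_000 else "below 100 kb cutoff")
```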

The Scientist's Toolkit

Table 3: Essential Research Reagents & Tools for MAG-Based Marinisomatota Research

Item	Function/Description	Example Product/Software
Marine DNA Extraction Kit	Efficient lysis of Gram-negative bacteria in complex marine matrices.	DNeasy PowerWater Kit (QIAGEN)
High-Fidelity Polymerase	Accurate amplification of validation targets from low-biomass DNA.	Q5 Hot Start (NEB)
Cy3-labeled FISH Probe	Visualize target cells in environmental samples for MAG validation.	Custom Stellaris probe
CheckM2 / BUSCO	Assess genome completeness/contamination (CheckM2 via machine-learning models, BUSCO via lineage-specific single-copy orthologs).	CheckM2 (DB v2.1.0)
GTDB-Tk Database	Current taxonomic classification relative to Genome Taxonomy Database.	GTDB Release 220
antiSMASH	Annotate Biosynthetic Gene Clusters (BGCs) in MAGs/isolates.	antiSMASH v7.0
dRep	Dereplicate genome sets; crucial for managing large MAG collections.	dRep v3.4.0
Prokka	Rapid prokaryotic genome annotation for functional comparison.	Prokka v1.14.6

This application note details the protocol for evaluating publicly available Metagenome-Assembled Genomes (MAGs) of the genus Marinomonas (class Gammaproteobacteria; not a member of the phylum Marinisomatota, formerly Marinimicrobia) against the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standard. The work is framed within a broader thesis investigating genome quality and standardization in Marinisomatota research, which is critical for accurate taxonomic classification, metabolic potential inference, and downstream applications in biotechnology and drug discovery.

Key Research Reagent Solutions & Essential Materials

Item	Function / Explanation
Public Sequence Read Archive (SRA)	Primary source for raw metagenomic sequencing data associated with published Marinomonas MAGs.
CheckM2 / BUSCO	Software tools for assessing MAG completeness and contamination. Essential for MIMAG quality tier assignment.
GTDB-Tk (v2.3.0)	Toolkit for consistent taxonomic classification against the Genome Taxonomy Database, crucial for MIMAG's "taxonomy" requirement.
Barrnap / tRNAscan-SE	Tools for identifying 5S, 16S, 23S rRNA genes (Barrnap) and tRNA genes (tRNAscan-SE) to meet MIMAG's "gene annotation" thresholds.
DRAM / KofamScan	Systems for functional annotation of metabolic pathways and tailoring of metabolism-specific databases.
MiGA	Microbial Genome Atlas used for calculating ANI (Average Nucleotide Identity) to determine species boundaries.
Prokka / Bakta	Automated pipelines for consistent structural genome annotation (CDS, RNA genes).

Experimental Protocol: MIMAG Compliance Evaluation Workflow

Protocol 3.1: MAG Curation and Data Acquisition

Systematic Search: Query the NCBI Assembly and GenBank databases using the term "Marinomonas" filtered by "metagenome-assembled genome" or "MAG". Record accession numbers.
Metadata Collection: For each identified MAG, download associated publication metadata, including sampling environment (e.g., marine sediment, algal surface), sequencing platform (Illumina, PacBio), and assembly software.
Data Retrieval: Download genomic FASTA files (.fna) and annotation files (.gff, .faa) from NCBI using the NCBI datasets CLI tool or wget.

Protocol 3.2: Genome Quality Assessment (MIMAG Field: "Genome quality")

Completeness/Contamination:
- Run CheckM2: checkm2 predict --input <mag.fasta> --output-directory <output_dir> --threads 8
- Record the "Completeness" and "Contamination" percentages from the output quality_report.tsv.
Quality Tier Assignment: Classify each MAG per MIMAG:
- High-quality draft: ≥90% complete, <5% contaminated, and carrying 5S, 16S, and 23S rRNA genes plus ≥18 tRNAs (checked in Protocol 3.4).
- Medium-quality draft: ≥50% complete, <10% contaminated.
- Low-quality draft: All others.
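
A provisional tier can be read straight from the CheckM2 report. The sketch below assumes quality_report.tsv is tab-separated with "Name", "Completeness", and "Contamination" columns; bins passing the high-quality thresholds are labelled as candidates only, because the rRNA and tRNA checks are still outstanding at this point.

```python
import csv
import io


def provisional_tiers(quality_report_text):
    """Map each genome name in a CheckM2 quality report to a provisional
    MIMAG tier based on completeness and contamination alone."""
    tiers = {}
    for row in csv.DictReader(io.StringIO(quality_report_text), delimiter="\t"):
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness >= 90 and contamination < 5:
            tier = "High-quality draft candidate"
        elif completeness >= 50 and contamination < 10:
            tier = "Medium-quality draft"
        else:
            tier = "Low-quality draft"
        tiers[row["Name"]] = tier
    return tiers


if __name__ == "__main__":
    with open("quality_report.tsv") as handle:
        for name, tier in provisional_tiers(handle.read()).items():
            print(name, tier, sep="\t")
```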

Protocol 3.3: Taxonomic Classification (MIMAG Field: "Taxonomy")

Run GTDB-Tk for standardized classification: gtdbtk classify_wf --genome_dir <input_dir> --out_dir <output_dir> --cpus 8 --extension fna
Parse the gtdbtk.bac120.summary.tsv file to obtain domain to species-level classification and the ANI to reference genome.
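
GTDB-Tk reports taxonomy as a semicolon-separated string with rank prefixes (d__, p__, and so on). A minimal sketch for splitting that string into named ranks is shown below; the example lineage is illustrative and not taken from a real GTDB-Tk run.

```python
RANKS = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_gtdb_lineage(classification):
    """Split a GTDB-style lineage string into a rank -> name dictionary.
    Ranks left unassigned (e.g. a bare 's__') map to None."""
    lineage = {}
    for token in classification.split(";"):
        prefix, sep, name = token.strip().partition("__")
        if sep and prefix in RANKS:
            lineage[RANKS[prefix]] = name or None
    return lineage


if __name__ == "__main__":
    example = "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;g__Marinomonas;s__"
    print(parse_gtdb_lineage(example))
```

An empty species field is kept as None so that MAGs without a species-level match are easy to separate from those assigned to a described species.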

Protocol 3.4: Gene Annotation Assessment (MIMAG Fields: "rRNA genes", "tRNA genes")

rRNA Gene Identification: Use Barrnap: barrnap --kingdom bac <mag.fasta> > rrna_genes.gff. Count full-length 5S, 16S, 23S rRNA genes, excluding hits that Barrnap marks as partial.
tRNA Gene Identification: Use tRNAscan-SE: tRNAscan-SE -B -o <output.txt> <mag.fasta>. Count total tRNA genes and distinct anticodons.
Compliance Check: Assess against MIMAG "minimum" annotation standards: high-quality drafts require ≥1 copy each of the 5S, 16S, and 23S rRNA genes and ≥18 tRNAs; medium-quality drafts carry no rRNA/tRNA requirement.
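
The tRNA counts for the compliance check can be tallied from the tRNAscan-SE results table. This sketch assumes the default tabular output, in which a dashed separator line precedes the data rows, the tRNA type and anticodon are the fifth and sixth columns, and pseudogenes are marked in a trailing note column; confirm this layout for your tRNAscan-SE version.

```python
def summarize_trnas(output_text):
    """Count tRNA genes, distinct anticodons, and distinct tRNA types in
    tRNAscan-SE tabular output, skipping rows flagged as pseudogenes."""
    types, anticodons = [], set()
    in_table = False
    for line in output_text.splitlines():
        if line.startswith("--------"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        fields = line.split()
        if len(fields) < 9:
            continue
        if "pseudo" in " ".join(fields[9:]).lower():
            continue
        types.append(fields[4])
        anticodons.add(fields[5])
    return {
        "total": len(types),
        "distinct_anticodons": len(anticodons),
        "distinct_types": len(set(types)),
    }


if __name__ == "__main__":
    with open("trna_output.txt") as handle:
        summary = summarize_trnas(handle.read())
    print(summary, "meets >=18 tRNAs:", summary["total"] >= 18)
```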

Protocol 3.5: Functional Annotation & Metadata Completeness

Metadata Audit: Verify the presence of 17 core MIMAG metadata fields (e.g., investigation type, project name, geographic location) in the associated BioSample record.
Functional Potential: Annotate using DRAM: DRAM.py annotate -i '*.fna' -o dram_output. Review output for key metabolic pathways (e.g., hydrocarbon degradation, osmoregulation) relevant to Marinomonas ecology.

Data Presentation: Evaluation of Published Marinomonas MAGs

Table 1: MIMAG Compliance Summary for Five Representative Marinomonas MAGs

NCBI Assembly Accession	MIMAG Quality Tier	CheckM2 Completeness (%)	CheckM2 Contamination (%)	16S rRNA Count	tRNA Count (≥18?)	GTDB Taxonomy (Species)	MIMAG Metadata Fields Populated (/17)
GCA_030856005.1	High-quality draft	98.7	0.5	1	24 (Yes)	Marinomonas sp.	15
GCA_025204215.1	Medium-quality draft	87.2	1.8	0	16 (No)	Marinomonas sp.	11
GCA_022873545.1	High-quality draft	99.1	0.9	2	32 (Yes)	Marinomonas sp.	16
GCA_028846365.1	Medium-quality draft	76.5	4.1	1	22 (Yes)	Marinomonas sp.	9
GCA_026557225.1	Low-quality draft	45.3	12.5	0	9 (No)	Marinomonas sp.	7

Table 2: Analysis of Protocol-Derived Functional Annotations (DRAM)

Assembly Accession	Key Pathway 1 (Score*)	Key Pathway 2 (Score*)	Relevant Gene Cluster Identified
GCA_030856005.1	Ectoine Synthesis (4)	Polyhydroxyalkanoate Metabolism (3)	Complete ectABCD operon
GCA_025204215.1	Denitrification (2)	Sulfur Oxidation (1)	Partial narGHJI cluster
GCA_022873545.1	Alkane Degradation (4)	Cobalamin Synthesis (4)	alkB gene, cob operon
GCA_028846365.1	Flagellar Assembly (4)	Chemotaxis (4)	Full flg, fli, che clusters
GCA_026557225.1	Glycolysis (4)	TCA Cycle (3)	Core metabolic genes only

*DRAM completeness score: 0 (absent) to 4 (complete).

Visualization of Workflows and Relationships

Title: MIMAG Compliance Evaluation Workflow

Title: MIMAG Quality Tier Decision Tree

The Role of MIMAG in Database Curation (e.g., GenBank, IMG/M) and Meta-Analyses

The Minimum Information about a Metagenome-Assembled Genome (MIMAG) standard, established by the Genomic Standards Consortium (GSC), provides a critical framework for reporting genome quality. This is especially pertinent for genomes from uncultivated microorganisms, such as those from the candidate phylum Marinisomatota. The consistent application of MIMAG standards in major public repositories (GenBank, IMG/M) ensures data integrity, facilitates comparative meta-analyses, and directly supports downstream research in fields like microbial ecology and natural product discovery for drug development.

MIMAG Standards: Core Criteria & Quantitative Benchmarks

The MIMAG standard specifies two primary tiers of genome quality: Medium-quality draft (MQD) and High-quality draft (HQD), based on completeness, contamination, and the presence of a ribosomal RNA gene cluster and transfer RNA genes.

Table 1: MIMAG Quality Tiers and Quantitative Requirements

Criterion	Medium-Quality Draft (MQD)	High-Quality Draft (HQD)	Relevance to Marinisomatota
Completeness	≥50%	≥90%	Critical for accurate functional potential assessment in understudied phyla.
Contamination	<10%	<5%	Essential for confident assignment of metabolic pathways to the target genome.
rRNA Genes	Presence of 5S, 16S, 23S genes is recommended	Presence of 5S, 16S, 23S genes is required	16S gene enables phylogenetic placement and linking to 16S amplicon studies.
tRNA Genes	≥18 tRNAs recommended	≥18 tRNAs required	Indicates adequacy for translation; supports genome completeness metrics.

Application Notes: Curation in Public Databases

GenBank Submission Protocol

Step 1: Genome Quality Assessment. Prior to submission, assess your Marinisomatota MAG using CheckM2 (completeness/contamination) and barrnap/tRNAscan-SE (rRNA/tRNA). Annotate using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) or a comparable tool.
Step 2: MIMAG Metadata Compilation. Prepare a metadata table compliant with the GSC's "MIMAG of a metagenome-assembled genome" checklist. This includes assembly metrics, sequencing platform, binning method, and the quality metrics from Step 1.
Step 3: Submission via NCBI. Use the NCBI Genome submission portal. Upload the assembly FASTA file and the associated annotation. The metadata must be explicitly tagged within the submission to indicate MIMAG compliance and the achieved quality tier.
Step 4: Validation. The database curators will validate the technical compatibility of the files. Consistent community use of MIMAG terminology (e.g., "high-quality draft") streamlines this process.

IMG/M Data Integration and Querying

Protocol for Leveraging MIMAG in IMG/M: Within the IMG/M system, genomes are tagged with quality metrics. To perform a robust meta-analysis targeting Marinisomatota:
- Use the "Find Genomes" function.
- Apply filters: Ecosystem = "Marine" (or other relevant habitat), Phylogeny to include "Marinisomatota".
- Critical Step: Set the "Quality" filter to Completeness ≥ 90% and Contamination < 5% to isolate genomes that meet the MIMAG high-quality completeness and contamination thresholds.
- Use the resulting genome set for comparative analysis (e.g., KEGG pathway profile comparison, genome neighborhood analysis for biosynthetic gene clusters).

Experimental Protocols for MIMAG-Compliant MAG Generation

Wet-Lab Protocol: Metagenomic Sequencing for MAG Reconstruction

Objective: Generate high-molecular-weight DNA suitable for long-read sequencing to improve Marinisomatota genome assembly.
Materials:
- Environmental sample (e.g., marine sediment filtrate).
- Sterivex-GP 0.22 μm filter unit or equivalent for biomass concentration.
- DNA preservation buffer (e.g., ALS buffer).
- MetaPolyzyme cocktail for microbial cell lysis.
- Magnetic bead-based HMW DNA cleanup kit (e.g., AMPure XP).
- Qubit fluorometer and dsDNA HS assay kit.
- Nanopore ligation sequencing kit or PacBio SMRTbell prep kit.
Method:
- Biomass Concentration & Lysis: Filter 10-100L of seawater through a 0.22μm filter. Lyse cells on-filter using MetaPolyzyme.
- HMW DNA Extraction: Follow a phenol-chloroform-free, column- and bead-based extraction protocol designed to preserve long fragments.
- DNA Quality Control: Assess concentration (Qubit) and fragment size distribution (pulsed-field gel electrophoresis or FemtoPulse).
- Library Preparation & Sequencing: Proceed with platform-specific library prep for long-read sequencing.

Computational Protocol: From Reads to MIMAG-Classified MAG

Objective: Process raw sequence data to generate a MAG and assign its MIMAG quality tier.
Workflow: See Diagram 1.
Software & Commands:
- Quality Control & Assembly:
- Binning:
- MIMAG Quality Assessment:
- Taxonomy & Annotation:
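
As a minimal sketch of how the binning, quality assessment, and taxonomy/annotation stages could be chained, the snippet below assembles command lines that are quoted elsewhere in this article (metabat2, checkm2 predict, gtdbtk classify_wf, barrnap, tRNAscan-SE). Quality control and assembly are omitted because no command lines for them are given here. The file names, directory layout, and the -x fa extension passed to CheckM2 are assumptions; the commands are only built and printed, not executed.

```python
import shlex


def build_commands(contigs, depth, bins_dir, out_dir, threads=8):
    """Return per-project command lines (as argument lists)."""
    return [
        ["metabat2", "-i", contigs, "-a", depth, "-o", f"{bins_dir}/bin"],
        ["checkm2", "predict", "--input", bins_dir, "-x", "fa",
         "--output-directory", f"{out_dir}/checkm2", "--threads", str(threads)],
        ["gtdbtk", "classify_wf", "--genome_dir", bins_dir,
         "--out_dir", f"{out_dir}/gtdbtk", "-x", "fa", "--cpus", str(threads)],
    ]


def per_bin_commands(bin_fasta, out_dir):
    """Return per-bin rRNA/tRNA commands; barrnap writes its GFF to stdout,
    so its output must be captured or redirected by the caller."""
    return [
        ["barrnap", "--kingdom", "bac", bin_fasta],
        ["tRNAscan-SE", "-B", "-o", f"{out_dir}/trna_output.txt", bin_fasta],
    ]


if __name__ == "__main__":
    for cmd in build_commands("assembled_contigs.fasta", "depth.txt", "bins", "results"):
        print(shlex.join(cmd))
    for cmd in per_bin_commands("bins/bin.1.fa", "results"):
        print(shlex.join(cmd))
```

Keeping each command as an argument list makes it straightforward to pass to subprocess.run later without shell quoting problems.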

Diagram 1: Computational Workflow for MIMAG-Compliant MAG Generation

Title: MAG Generation and MIMAG Assessment Workflow

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Reagents and Tools for MIMAG-Compliant Marinisomatota Research

Item Name	Category	Function/Benefit
MetaPolyzyme	Wet-Lab Reagent	Enzymatic cocktail for efficient lysis of diverse microbial cell walls in environmental samples, maximizing DNA yield.
AMPure XP Beads	Wet-Lab Reagent	Magnetic beads for size-selective purification of HMW DNA, crucial for long-read sequencing.
CheckM2 Database	Computational Tool	Reference database that CheckM2 uses to annotate predicted genes before its machine-learning models estimate genome completeness and contamination.
GTDB-Tk (v2.3.0+)	Computational Tool	Standardized tool for assigning accurate taxonomy to MAGs, essential for phylum-level identification (e.g., Marinisomatota).
DRAM (v1.4+)	Computational Tool	Distills functional annotations (KEGG, Pfam) into metabolic pathways and highlights potential biosynthetic gene clusters for drug discovery.
NCBI PGAP Pipeline	Curation Service	Provides consistent, high-quality annotation required for GenBank submission, enabling comparative meta-analyses.

Meta-Analysis Protocol Using MIMAG-Curated Data

Objective: Identify conserved and unique biosynthetic gene clusters (BGCs) across high-quality Marinisomatota genomes.
Method:
- Dataset Curation: From IMG/M or GenBank, download all genomes labeled as "Marinisomatota" and filter for those meeting MIMAG HQD criteria (Table 1).
- Standardized Re-annotation: Re-annotate the filtered genome set uniformly using antiSMASH or DRAM with identical parameters to ensure comparability.
- BGC Profiling: Extract BGC types (e.g., NRPS, PKS, terpene) and their genomic loci from the annotation outputs.
- Comparative Analysis: Create a presence/absence matrix of BGC types per genome. Use clustering (UPGMA) and ordination (PCoA) to visualize patterns. Perform phylogenomic analysis (using a set of conserved single-copy marker genes) and map BGC profiles onto the tree to assess phylogenetic conservation.
Data Presentation: Results should be summarized in two tables:
- Table 3: Summary of the curated dataset (N genomes, average completeness/contamination, list of primary habitats).
- Table 4: Count and frequency of each BGC type across the HQD genome set, highlighting unique clusters.
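
The presence/absence matrix described in the comparative analysis step can be built with plain Python before handing it to a clustering or ordination package. This is a sketch under the assumption that BGC types per genome have already been extracted into a dictionary; the genome names and types in the example are made up, and Jaccard distance is one reasonable choice for binary profiles rather than a method prescribed by this protocol.

```python
def presence_absence(bgcs_by_genome):
    """Return (sorted BGC types, {genome: 0/1 row}) from a mapping of
    genome name -> collection of BGC types."""
    types = sorted({t for found in bgcs_by_genome.values() for t in found})
    matrix = {
        genome: [1 if t in found else 0 for t in types]
        for genome, found in bgcs_by_genome.items()
    }
    return types, matrix


def jaccard_distance(row_a, row_b):
    """Jaccard distance between two 0/1 rows (0.0 if both are empty)."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


if __name__ == "__main__":
    profiles = {"mag1": {"NRPS", "terpene"}, "mag2": {"terpene"}}  # toy data
    types, matrix = presence_absence(profiles)
    print(types, matrix, jaccard_distance(matrix["mag1"], matrix["mag2"]))
```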

Diagram 2: Meta-Analysis of MIMAG-Curated Genomes

Title: Meta-Analysis Flow Using MIMAG Filters

Conclusion

Adherence to MIMAG standards is not merely an administrative hurdle but a fundamental practice for ensuring the reliability and utility of Marinomonas and marine microbiome genomes. This synthesis highlights that robust foundational understanding, meticulous methodological application, proactive troubleshooting, and rigorous comparative validation are all interconnected pillars supporting high-quality genomic science. For biomedical and clinical research, particularly in marine biodiscovery, MIMAG-compliant genomes provide a trusted foundation for identifying novel biosynthetic gene clusters, understanding pathogenicity or symbiosis mechanisms, and prioritizing strains for further development. Future directions include the integration of long-read sequencing to overcome current fragmentation limits, the development of marine-specific contamination markers, and the potential evolution of standards to encompass functional and epigenetic data. Ultimately, widespread adoption of these standards will accelerate the translation of marine microbial diversity into tangible therapeutic and biotechnological breakthroughs.

Ensuring Marine Microbiome Data Integrity: A Comprehensive Guide to MIMAG Standards for Marinomonas Genome Quality

Abstract

What Are MIMAG Standards and Why Are They Critical for Marine Microbiome Research?

MIMAG Standards: Core Criteria and Quantitative Benchmarks

Protocols for MIMAG-Compliant Marinisomatota Genome Analysis

Protocol 1: Genome-Resolved Metagenomic Assembly and Binning

Protocol 2: MIMAG Quality Assessment and Tier Assignment

Protocol 3: Functional Annotation for Drug Development Context

The Scientist's Toolkit: Research Reagent Solutions

Visualizations

Application Notes

Ecological Significance & Quantitative Metrics

Biotechnological Potential & Performance Data

Experimental Protocols

Protocol: Genome-Resolved Metagenomic Analysis for Marinisomatota MIMAG Compliance

Protocol: High-Throughput Screening for Cold-Active Enzyme Activity

The Scientist's Toolkit: Research Reagent Solutions

Application Notes: Implementing MIMAG Standards for Marinisomatota Genome Research

Key Quantitative Metrics & Benchmarks

Experimental Protocols

Protocol 1: Genome-Resolved Metagenomic Assembly and Binning for Marinisomatota

Protocol 2: MIMAG-Compliant Quality Assessment and Curation

Mandatory Visualizations

The Scientist's Toolkit: Research Reagent Solutions

Application Notes

Protocols

Protocol 1: Generation of a MIMAG-Compliant Marinisomatota Genome Draft

Protocol 2: Standardized Comparative Genomic Analysis for Pathway Discovery

Visualizations

The Scientist's Toolkit: Research Reagent Solutions

A Step-by-Step Workflow: Applying MIMAG Standards to Your Marinomonas Genome Project

Core Principles & Quantitative Benchmarks

Detailed Protocols

Protocol: Sterile Seawater Collection for Omics

Protocol: Metagenomic DNA Extraction from Marine Filters

Signaling Pathway & Workflow Visualizations

The Scientist's Toolkit: Research Reagent Solutions

Application Notes: The Role of MIMAG Standards in Marinisomatota Research

Experimental Protocols

Protocol 1: Assessing Genome Completeness and Contamination with CheckM2

Protocol 2: Identifying rRNA and tRNA Genes

Visualizations

The Scientist's Toolkit

Application Notes

Key Genomic & Phenotypic Markers for Marinomonas

Experimental Protocols

Protocol 1: Phylogenomic Tree Reconstruction for Genus Assignment

Protocol 2: Calculation of Average Amino Acid Identity (AAI) & POCP

Protocol 3: In Silico Phenotype Profiling

Visualization

The Scientist's Toolkit

Application Notes

Experimental Protocols

Protocol 1: Advanced Functional Annotation Pipeline for Marinisomatota MAGs

Protocol 2: Comparative Genomic Analysis for BGC Prioritization

The Scientist's Toolkit: Research Reagent Solutions

Visualizations

Solving Common Pitfalls: How to Improve Genome Quality and MIMAG Compliance for Marine MAGs

Diagnostic Protocol: Quantifying and Identifying Contamination

Remediation Protocol: Decontamination and Refinement

Optimizing Assembly Parameters for Complex, Low-Abundance Marine Communities

Detailed Experimental Protocols

Protocol 3.1: Optimized Hybrid Assembly Workflow

Protocol 3.2: Targeted Marinisomatota Enrichment Verification via 16S rRNA Gene Phylogeny

Diagrams

The Scientist's Toolkit: Research Reagent Solutions

Addressing Strain Heterogeneity and Fragmentation in Marinomonas MAGs

Application Notes: The Challenge of Strain Heterogeneity in Marinomonas MAGs

Detailed Experimental Protocols

Protocol 2.1: Pre-assembly Filtering to Reduce Heterogeneity

Protocol 2.2: Co-assembly and Iterative Binning with Heterogeneity Check

Protocol 2.3: Single-Nucleotide Variant (SNV) Analysis for Strain Deconvolution

Visualizations

Diagram 1: Workflow for Addressing Strain Heterogeneity

Diagram 2: Strain Heterogeneity Detection via SNV Analysis

The Scientist's Toolkit: Research Reagent Solutions

Application Notes

Binners for Marine Metagenomes

CheckM-like Tools for Quality Assessment