Submission Agent

Omi's specialised co-pilot for submission work

I prepare your dataset and metadata for submission to ENA, EGA, GEO, ArrayExpress, or CELLxGENE — the soul-crushing paperwork step that stands between you and a published paper. I check schemas, validate metadata, build the file structures, and hold your hand through the upload portals.

What I can do for you

I assemble your raw and processed files into the exact structure each archive expects (FASTQ + MD5 for ENA/EGA, processed matrices + metadata TSV for GEO, .h5ad with required obs columns for CELLxGENE) and generate the manifest files automatically.
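As a rough illustration of the manifest step for FASTQ files, the sketch below pairs each file with its MD5 checksum and writes a two-column TSV. The function names, the `.fastq.gz` suffix filter, and the TSV layout are assumptions made for this example; they are not an official ENA or EGA manifest format, and not necessarily the exact code I run.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir, suffix=".fastq.gz"):
    """Pair every file ending in `suffix` with its MD5, sorted by file name."""
    files = sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )
    return [(p.name, md5sum(p)) for p in files]


def write_manifest(rows, out_path):
    """Write a two-column TSV (hypothetical layout, not an archive's official format)."""
    lines = ["file_name\tmd5"] + [f"{name}\t{md5}" for name, md5 in rows]
    Path(out_path).write_text("\n".join(lines) + "\n")
```

Calling `write_manifest(build_manifest("fastq/"), "manifest.tsv")` would list every `.fastq.gz` file in a (hypothetical) `fastq/` directory alongside its checksum, which can then be compared with the checksums the archive reports after upload.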

I validate your metadata against the target schema — CELLxGENE schema 5.x, GEO Series matrix, MIAME/MINSEQE, EGA sample/experiment/run XML — and tell you exactly which fields are missing or malformed before the portal rejects your submission at 11pm on a Friday.
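To make the idea of a pre-submission check concrete, here is a minimal sketch that scans a metadata table for missing columns and empty values. The column list is an illustrative subset of CELLxGENE `obs` fields chosen for this example, and `find_metadata_problems` is a hypothetical helper; the authoritative list of required fields is the schema document for the version you are targeting.

```python
# Illustrative subset only; consult the CELLxGENE schema for the full required set.
REQUIRED_OBS_COLUMNS = (
    "assay_ontology_term_id",
    "cell_type_ontology_term_id",
    "disease_ontology_term_id",
    "donor_id",
    "tissue_ontology_term_id",
)


def find_metadata_problems(rows, required=REQUIRED_OBS_COLUMNS):
    """Return human-readable problems for a table given as a list of dicts.

    `rows` has the shape produced by csv.DictReader: one dict per cell or sample.
    """
    if not rows:
        return ["metadata table has no rows"]

    columns = set().union(*(row.keys() for row in rows))
    problems = []
    for column in required:
        if column not in columns:
            problems.append(f"missing column: {column}")
            continue
        # Treat None, "" and whitespace-only strings as empty.
        empty = [i for i, row in enumerate(rows)
                 if not str(row.get(column) or "").strip()]
        if empty:
            problems.append(
                f"{column}: empty value in {len(empty)} row(s), first at row {empty[0]}"
            )
    return problems
```

An empty return value means every listed column is present and filled in. It does not mean the values are valid ontology terms, which is a separate check.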

I harmonise your cell-type labels and sample metadata to ontologies (Cell Ontology for cell types, UBERON for tissues, EFO for assays, NCBI Taxonomy for organisms, MONDO for diseases). This is required for CELLxGENE and increasingly expected by reviewers, but extremely tedious to do manually.
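A simplified sketch of the label-mapping step is shown below. It normalises free-text cell-type labels and looks them up in a small hand-written table of Cell Ontology IDs. The table, the synonym list, and the function names are assumptions made for illustration; a real mapping would draw on a current Cell Ontology release, and anything left unmapped needs manual review.

```python
# Hand-written lookup for illustration; verify IDs against the current Cell Ontology.
CL_LOOKUP = {
    "t cell": "CL:0000084",
    "b cell": "CL:0000236",
    "natural killer cell": "CL:0000623",
    "monocyte": "CL:0000576",
}

# Hypothetical lab shorthand mapped to the canonical names used in CL_LOOKUP.
SYNONYMS = {
    "t cells": "t cell",
    "b cells": "b cell",
    "nk cell": "natural killer cell",
    "nk cells": "natural killer cell",
    "monocytes": "monocyte",
}


def normalise(label):
    """Lowercase, turn '_' and '-' into spaces, and collapse repeated whitespace."""
    cleaned = label.replace("_", " ").replace("-", " ").lower()
    return " ".join(cleaned.split())


def map_labels(labels):
    """Return ({original label: CL ID}, [labels with no match])."""
    mapped, unmapped = {}, []
    for label in labels:
        key = normalise(label)
        key = SYNONYMS.get(key, key)
        if key in CL_LOOKUP:
            mapped[label] = CL_LOOKUP[key]
        else:
            unmapped.append(label)
    return mapped, unmapped
```

Returning the unmapped labels separately, rather than guessing a nearest match, keeps ambiguous cases visible so they can be resolved by someone who knows the biology.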

I write the dataset description, build the README, generate the validation report with the cellxgene-schema CLI, and walk you through the submission portal with screenshots of what to click, so that first-time submissions are actually finishable.

Examples of what you can ask me

Copy any of these straight into the demo, or adapt them to your data.

  • "Prepare my dataset for CELLxGENE submission and validate the schema."
  • "Map my cell type labels to the Cell Ontology."
  • "Build the GEO submission package for my scRNA-seq experiment."
  • "Generate the ENA sample and experiment XMLs for my FASTQ files."
  • "Validate my .h5ad against CELLxGENE schema 5.2."
  • "Write the dataset description and README for ArrayExpress."
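To give a sense of what the ENA request above involves, the sketch below builds a sample XML document from a list of sample records. The element names follow the general shape of ENA sample XML (a `SAMPLE_SET` of `SAMPLE` elements with a title, taxon ID, and tag/value attributes), but the input dictionary keys and the `build_sample_set` function are assumptions made for this example. The output is not validated against ENA's XSD and omits checklist-specific attributes.

```python
import xml.etree.ElementTree as ET


def build_sample_set(samples):
    """Serialise sample records to a SAMPLE_SET XML string.

    Each record is a dict with 'alias', 'title', 'taxon_id' and an optional
    'attributes' dict of tag -> value pairs (a hypothetical input shape).
    """
    root = ET.Element("SAMPLE_SET")
    for record in samples:
        sample = ET.SubElement(root, "SAMPLE", alias=record["alias"])
        ET.SubElement(sample, "TITLE").text = record["title"]

        name = ET.SubElement(sample, "SAMPLE_NAME")
        ET.SubElement(name, "TAXON_ID").text = str(record["taxon_id"])

        attributes = record.get("attributes") or {}
        if attributes:
            # Only emit the container when there is at least one attribute.
            container = ET.SubElement(sample, "SAMPLE_ATTRIBUTES")
            for tag, value in attributes.items():
                attribute = ET.SubElement(container, "SAMPLE_ATTRIBUTE")
                ET.SubElement(attribute, "TAG").text = tag
                ET.SubElement(attribute, "VALUE").text = str(value)

    return ET.tostring(root, encoding="unicode")
```

Building the tree with `xml.etree.ElementTree` rather than string formatting means special characters in titles and attribute values are escaped automatically.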

How I work

I run real Scanpy (Python) or Seurat (R) code on the secure MCP server — no hallucinations, no made-up gene lists. Every result comes with the exact code I executed and the parameters I used, so your analysis is fully reproducible and ready for the Methods section.

Best for

Anyone submitting single-cell data to public archives — first-time submitters, lab managers handling consortium uploads, postdocs preparing data for paper revisions, and PIs who'd rather review the science than fight an XML schema.

References

  • CELLxGENE schema (CZI Single-Cell Biology Program, 2024)
  • GEO (Edgar et al., 2002) – Nucleic Acids Research
  • ENA (Burgin et al., 2023) – Nucleic Acids Research
  • EGA (Freeberg et al., 2022) – Nucleic Acids Research
  • ArrayExpress / BioStudies (Sarkans et al., 2018) – Nucleic Acids Research
  • Cell Ontology (Diehl et al., 2016) – Journal of Biomedical Semantics
  • UBERON (Mungall et al., 2012) – Genome Biology

Try Submission now

Jump into the demo with a starter prompt already loaded. Upload your data, or play with our example dataset first.
