Submission Agent
Omi's specialised co-pilot for submission work
I prepare your dataset and metadata for submission to ENA, EGA, GEO, ArrayExpress, or CELLxGENE — the soul-crushing paperwork step that stands between you and a published paper. I check schemas, validate metadata, build the file structures, and hold your hand through the upload portals.
What I can do for you
I assemble your raw and processed files into the exact structure each archive expects (FASTQ + MD5 for ENA/EGA, processed matrices + metadata TSV for GEO, .h5ad with required obs columns for CELLxGENE) and generate the manifest files automatically.
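Here is a minimal sketch of the kind of checksum manifest I build for FASTQ files. It is not the exact code I run on the server. The function names, the `*.fastq.gz` pattern, and the two-column TSV layout are assumptions for illustration; each archive defines its own manifest format.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path, pattern: str = "*.fastq.gz") -> list:
    """Collect (file name, MD5) pairs for every file matching the pattern."""
    return [(p.name, md5sum(p)) for p in sorted(data_dir.glob(pattern))]


def write_manifest(rows, out_path: Path) -> None:
    """Write the pairs as a two-column TSV (hypothetical layout, not an archive format)."""
    lines = ["file_name\tmd5"] + [f"{name}\t{md5}" for name, md5 in rows]
    out_path.write_text("\n".join(lines) + "\n")
```

Calling `write_manifest(build_manifest(Path("fastq")), Path("manifest.tsv"))` would list every matching file in a `fastq` directory alongside its checksum.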
I validate your metadata against the target schema — CELLxGENE schema 5.x, GEO Series matrix, MIAME/MINSEQE, EGA sample/experiment/run XML — and tell you exactly which fields are missing or malformed before the portal rejects your submission at 11pm on a Friday.
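To give a feel for the metadata check, here is a small sketch that looks for missing or empty required columns. It works on a plain dictionary of column values rather than a real `.h5ad` file, and the column list is an illustrative subset I am assuming here; the authoritative list is the published CELLxGENE schema for the version you target.

```python
# Illustrative subset of required obs columns (assumption: check the
# published schema version you are targeting for the full list).
REQUIRED_OBS_COLUMNS = (
    "assay_ontology_term_id",
    "cell_type_ontology_term_id",
    "disease_ontology_term_id",
    "tissue_ontology_term_id",
    "donor_id",
    "is_primary_data",
    "suspension_type",
)


def check_obs(obs: dict) -> list:
    """Return human-readable problems: missing columns and empty values."""
    problems = []
    for column in REQUIRED_OBS_COLUMNS:
        if column not in obs:
            problems.append(f"missing column: {column}")
            continue
        # Treat None and blank strings as empty entries.
        empty = sum(1 for v in obs[column] if v is None or str(v).strip() == "")
        if empty:
            problems.append(f"{column}: {empty} empty value(s)")
    return problems
```

An empty list back from `check_obs` means this subset of columns is present and filled in; it says nothing about whether the values themselves are valid ontology terms.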
I harmonise your cell-type labels and sample metadata to ontologies (Cell Ontology for cell types, UBERON for tissues, EFO for assays, NCBI Taxonomy for organisms, MONDO for diseases). This is required for CELLxGENE and increasingly expected by reviewers, but extremely tedious to do manually.
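The label-to-ontology step can be sketched as a normalise-then-look-up pass. The lookup and synonym tables below are tiny hand-written examples that I am assuming for illustration; a real mapping would draw on the full Cell Ontology and would still need a human to review anything left unmapped.

```python
# Tiny hand-written lookup for illustration only (assumption: a real run
# would use the full Cell Ontology rather than this table).
CL_LOOKUP = {
    "t cell": "CL:0000084",
    "b cell": "CL:0000236",
    "monocyte": "CL:0000576",
}

# Hypothetical synonym table covering simple plural forms.
SYNONYMS = {"t cells": "t cell", "b cells": "b cell", "monocytes": "monocyte"}


def normalise(label: str) -> str:
    """Lower-case, replace separators with spaces, and collapse whitespace."""
    cleaned = label.lower().replace("_", " ").replace("-", " ")
    return " ".join(cleaned.split())


def map_labels(labels):
    """Return (label -> CL term ID, list of labels that need manual review)."""
    mapped, unmapped = {}, []
    for label in labels:
        key = normalise(label)
        key = SYNONYMS.get(key, key)
        if key in CL_LOOKUP:
            mapped[label] = CL_LOOKUP[key]
        else:
            unmapped.append(label)
    return mapped, unmapped
```

Keeping the unmapped labels as a separate list matters: free-text cluster names rarely map cleanly, and silently guessing a term would be worse than flagging it.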
I write the dataset description, build the README, generate the validation report with the cellxgene-schema CLI, and walk you through the submission portal with screenshots of what to click, so first-time submissions are actually finishable.
Examples of what you can ask me
Copy any of these straight into the demo, or adapt them to your data.
- "Prepare my dataset for CELLxGENE submission and validate the schema."
- "Map my cell type labels to the Cell Ontology."
- "Build the GEO submission package for my scRNA-seq experiment."
- "Generate the ENA sample and experiment XMLs for my FASTQ files."
- "Validate my .h5ad against CELLxGENE schema 5.2."
- "Write the dataset description and README for ArrayExpress."
How I work
I run real Scanpy (Python) or Seurat (R) code on the secure MCP server — no hallucinations, no made-up gene lists. Every result comes with the exact code I executed and the parameters I used, so your analysis is fully reproducible and ready for the Methods section.
Best for
Anyone submitting single-cell data to public archives — first-time submitters, lab managers handling consortium uploads, postdocs preparing data for paper revisions, and PIs who'd rather review the science than fight an XML schema.
References
- CELLxGENE schema (CZI Single-Cell Biology Program, 2024)
- GEO (Edgar et al., 2002) – Nucleic Acids Research
- ENA (Burgin et al., 2023) – Nucleic Acids Research
- EGA (Freeberg et al., 2022) – Nucleic Acids Research
- ArrayExpress / BioStudies (Sarkans et al., 2018) – Nucleic Acids Research
- Cell Ontology (Diehl et al., 2016) – Journal of Biomedical Semantics
- UBERON (Mungall et al., 2012) – Genome Biology
Try Submission now
Jump into the demo with a starter prompt already loaded. Upload your data, or play with our example dataset first.