ML / AI Agent
Omi's specialised co-pilot for ML / AI work
I learn the associations between your single-cell data and clinical outcomes (response vs non-response, progression-free survival, disease subtype) and build models that actually generalise. Think of me as the friendly statistician who also knows deep learning, single-cell quirks, and how not to overfit your 12-patient cohort.
What I can do for you
I train classifiers (logistic regression, random forests, XGBoost, MLPs) on cell-type proportions, pseudobulk expression, or learned embeddings to predict clinical outcomes — with proper cross-validation, calibration, and confidence intervals so you don't fool yourself.
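To make "proper cross-validation and confidence intervals" concrete, here is a minimal sketch, assuming scikit-learn and NumPy. The data are synthetic stand-ins (random cell-type proportions with arbitrary labels), so the AUC should land near chance. The cohort size, number of cell types, and bootstrap settings are illustrative assumptions, not the exact code I run.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 40 patients x 8 cell-type proportions (each row sums to 1).
n_patients, n_cell_types = 40, 8
X = rng.dirichlet(np.ones(n_cell_types), size=n_patients)
y = np.array([0, 1] * (n_patients // 2))  # hypothetical responder labels

# Scaling lives inside the pipeline so it is refit on each training fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold probabilities: each patient is scored by a model that never saw them.
oof_prob = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
auc = roc_auc_score(y, oof_prob)

# Percentile bootstrap over patients for a rough 95% interval on the AUC.
boot = []
for _ in range(1000):
    idx = rng.integers(0, n_patients, n_patients)
    if len(np.unique(y[idx])) < 2:  # AUC is undefined when only one class is drawn
        continue
    boot.append(roc_auc_score(y[idx], oof_prob[idx]))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"OOF AUC = {auc:.2f} (95% bootstrap CI {ci_low:.2f} to {ci_high:.2f})")
```

With a cohort this small the interval is typically wide, which is exactly the information a single headline AUC hides.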
I build patient-level representations from single-cell data using MIL (multiple-instance learning), scPoli, scFoundation, or Geneformer embeddings, and use them to predict response, survival, or subtype in a way that respects the hierarchical (cells-in-patients) structure.
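The simplest way to picture a patient-level representation is mean pooling: average each patient's cell embeddings into one vector. The sketch below assumes NumPy and random embeddings; `mean_pool` is a hypothetical helper written for this illustration. Mean pooling is only a baseline aggregator and is not how scPoli, scFoundation, Geneformer, or attention-based MIL models work internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 6 hypothetical patients, 100 cells each, 16-dim embeddings.
n_patients, cells_per_patient, n_dims = 6, 100, 16
patient_ids = np.repeat(np.arange(n_patients), cells_per_patient)
cell_embeddings = rng.normal(size=(n_patients * cells_per_patient, n_dims))


def mean_pool(embeddings, groups):
    """Average cell embeddings within each patient; returns one row per patient."""
    patients = np.unique(groups)
    pooled = np.vstack([embeddings[groups == p].mean(axis=0) for p in patients])
    return patients, pooled


patients, patient_matrix = mean_pool(cell_embeddings, patient_ids)
print(patient_matrix.shape)  # one row per patient, one column per embedding dimension
```

The resulting matrix has one row per patient, so any downstream classifier is trained and validated at the patient level rather than treating cells as independent samples.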
I run feature importance (SHAP, permutation), surface the genes and cell populations driving the prediction, and translate that into testable biological hypotheses, so the model is explainable rather than a black box.
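As an illustration of the permutation approach, here is a minimal sketch, assuming scikit-learn's `permutation_importance` and synthetic data. The feature names are hypothetical, and the label is deliberately driven by the first feature so the expected ranking is known in advance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical cell-population features; the values are synthetic.
feature_names = ["CD8_T", "CD4_T", "NK", "B", "Monocyte"]
X = rng.normal(size=(200, 5))
# The label depends on the first feature only (plus a little noise).
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure the drop in AUC.
result = permutation_importance(
    clf, X_test, y_test, scoring="roc_auc", n_repeats=20, random_state=0
)

ranking = np.argsort(result.importances_mean)[::-1]
for i in ranking:
    print(f"{feature_names[i]}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

Importance is computed on held-out data here, so a feature the model merely memorised in training does not get credit.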
I handle small-cohort reality with stratified CV, nested CV for hyperparameter tuning, leakage checks, and class-imbalance strategies — and I'll tell you honestly when your sample size just isn't enough rather than letting you publish a 0.99 AUC that won't replicate.
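Nested CV can be sketched as follows, assuming scikit-learn. An inner loop tunes the hyperparameter and an outer loop estimates performance on data the tuning never saw. The data, the grid of `C` values, and the fold counts are illustrative assumptions, and `class_weight="balanced"` is just one of several class-imbalance strategies.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic, imbalanced stand-in: 60 patients, 20 features, 15 positives.
X = rng.normal(size=(60, 20))
y = np.array([1] * 15 + [0] * 45)

# Preprocessing sits inside the pipeline, so it is fit on training folds only
# (fitting the scaler on all data first would leak test information).
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0]}

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)  # tuning
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # evaluation

search = GridSearchCV(pipe, param_grid, cv=inner, scoring="roc_auc")
outer_scores = cross_val_score(search, X, y, cv=outer, scoring="roc_auc")

print(f"Nested CV AUC: {outer_scores.mean():.2f} +/- {outer_scores.std():.2f}")
```

Because the features are pure noise, the outer score should hover around 0.5. Reporting the best inner-loop score instead would usually be optimistically biased.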
Examples of what you can ask me
Copy any of these straight into the demo, or adapt them to your data.
- "Train a model to predict immunotherapy response from baseline scRNA."
- "Build a classifier for COVID severity using PBMC composition."
- "Which cell populations are most predictive of relapse?"
- "Use Geneformer embeddings to predict patient subtype."
- "Run SHAP on my XGBoost model and explain the top features biologically."
- "Cross-validate my classifier with leave-one-patient-out CV."
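For the leave-one-patient-out prompt, the key idea is that all cells from a patient are held out together. Here is a minimal sketch, assuming scikit-learn's `LeaveOneGroupOut` and synthetic cell-level data. Cohort size, feature count, and the mean-probability aggregation are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-in: 10 hypothetical patients, 50 cells each, 12 features per cell.
n_patients, cells_per_patient = 10, 50
groups = np.repeat(np.arange(n_patients), cells_per_patient)  # patient ID per cell
patient_label = np.array([0, 1] * 5)
y = patient_label[groups]  # every cell inherits its patient's label
X = rng.normal(size=(len(groups), 12))

logo = LeaveOneGroupOut()

# Sanity check: no patient contributes cells to both train and test in any fold.
for train_idx, test_idx in logo.split(X, y, groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])

# Each cell is scored by a model trained without any cell from the same patient.
cell_prob = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y,
    cv=logo, groups=groups, method="predict_proba",
)[:, 1]

# Aggregate to one score per patient, then evaluate at the patient level.
patient_prob = np.array([cell_prob[groups == p].mean() for p in range(n_patients)])
auc = roc_auc_score(patient_label, patient_prob)
print(f"Patient-level AUC (leave-one-patient-out): {auc:.2f}")
```

Each held-out fold contains a single patient and therefore a single class, so the AUC is computed once over the pooled patient-level predictions rather than per fold.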
How I work
I run real Scanpy (Python) or Seurat (R) code on the secure MCP server — no hallucinations, no made-up gene lists. Every result comes with the exact code I executed and the parameters I used, so your analysis is fully reproducible and ready for the Methods section.
Best for
Translational researchers, clinician-scientists with patient cohorts, biomarker discovery teams, and computational biologists building predictive models from single-cell data who want rigour, not hype.
References
- XGBoost (Chen & Guestrin, 2016) – KDD
- SHAP (Lundberg & Lee, 2017) – NeurIPS
- scPoli (De Donno et al., 2023) – Nature Methods
- Geneformer (Theodoris et al., 2023) – Nature
- scFoundation (Hao et al., 2024) – Nature Methods
- scikit-learn (Pedregosa et al., 2011) – JMLR
Try ML / AI now
Jump into the demo with a starter prompt already loaded. Upload your data, or play with our example dataset first.