What if your lab goggles could stop you from making a mistake mid-experiment? What if an AI riding along inside a surgeon’s smart glasses could flag a wrong tool, identify an anatomical structure in real time, and coordinate a robotic arm — all before a single error cascades into harm? That future has arrived, and it has a name: LabOS — and its clinical evolution, MedOS.
Born out of the Stanford-Princeton AI Coscientist Team and led by Dr. Le Cong (Stanford) and Dr. Mengdi Wang (Princeton), these platforms are not software demos or speculative research. They are deployed, benchmarked, and expanding — across elite universities, hospitals, and now operating theatre simulations.
Origins
LabOS: The Co-Scientist That Learns by Watching You Work
Science has always suffered from a gap between what AI can compute and what researchers can physically execute at the bench. LabOS was designed to close that gap permanently.
Published on arXiv in October 2025 and updated in December, LabOS is formally described as the first AI co-scientist to unite computational reasoning with physical experimentation. At its technical heart is the STELLA framework — a self-evolving multi-agent architecture that never stops learning. Four specialized agents — a Manager, Developer, Critic, and Tool-Creator — collaborate through a shared “Tool Ocean” of over 98 biomedical tools, expanding their capabilities at runtime with every interaction.
⚡ Key Innovation
LabOS doesn’t just run pre-programmed workflows. It grows its own tools. Every experiment becomes training data. Every protocol deviation feeds the model. The result is a system that gets measurably smarter the more it’s used — a concept the team calls self-evolving scientific infrastructure.
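To make this concrete, here is a minimal sketch of how a self-evolving agent loop of this kind could be wired together. The four agent roles and the shared Tool Ocean come from the paper's description; every class, method, and the toy planning and critique logic below are illustrative placeholders, not the published STELLA interface.

```python
# Illustrative sketch of a self-evolving multi-agent loop in the spirit of STELLA.
# Agent roles (Manager, Developer, Critic, Tool-Creator) follow the paper's description;
# all names and logic below are simplified placeholders, not the published API.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToolOcean:
    """Shared, runtime-extensible registry of callable biomedical tools."""
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn  # a newly created tool becomes available to every agent

    def call(self, name: str, **kwargs) -> str:
        return self.tools[name](**kwargs)


class Manager:
    def plan(self, goal: str) -> list:
        # Decompose the scientific goal into tool-level steps (an LLM call in practice).
        return [f"design assay for {goal}", f"analyze readout for {goal}"]


class Developer:
    def execute(self, step: str, ocean: ToolOcean) -> str:
        # Run the tool that matches this step from the shared Tool Ocean.
        return ocean.call(step)


class Critic:
    def review(self, result: str) -> bool:
        # Accept or reject the result; a real critic checks it against the protocol.
        return bool(result)


class ToolCreator:
    def maybe_build_tool(self, step: str, ocean: ToolOcean) -> None:
        # Self-evolution: if no existing tool covers the step, synthesize and register one.
        if step not in ocean.tools:
            ocean.register(step, lambda **kw: f"auto-generated tool output for '{step}'")


def run_experiment(goal: str) -> list:
    ocean, results = ToolOcean(), []
    manager, dev, critic, creator = Manager(), Developer(), Critic(), ToolCreator()
    for step in manager.plan(goal):
        creator.maybe_build_tool(step, ocean)   # the Tool Ocean grows at runtime
        result = dev.execute(step, ocean)
        if critic.review(result):               # only accepted results are recorded
            results.append(result)
    return results


print(run_experiment("NK cell target screen"))
```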
To give the system genuine eyes in the lab, the team built LabSuperVision (LSV), a vision-language model (VLM) benchmark assembled from over 200 egocentric lab videos captured by seven researchers across benches, tissue culture rooms, and instrument bays. When tested against it, top commercial models (GPT-4o, Gemini 2.5 Pro, Cosmos-1) all scored below 3 out of 5 on protocol accuracy. LabOS's fine-tuned model, LabOS-VLM-235B, exceeded 90%, catching sterile breaches, incorrect incubation times, and procedural missteps in real time.
How the XR Layer Works
Researchers wear augmented reality glasses that stream live egocentric video to LabOS servers. The system compares what it sees against the written protocol, returning JSON with step alignment, warnings, and suggestions — displayed as a heads-up overlay on the wearer’s view. It talks the scientist through each step, flags deviations, and documents actions with timestamps for automatic reproducibility records.
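The article does not publish the response schema, but the per-step feedback the headset consumes might look roughly like the following. The field names, example values, and overlay logic here are assumptions for illustration only.

```python
# Hypothetical shape of the per-frame feedback a LabOS server might return to the
# headset. The article only states that the server returns JSON with step alignment,
# warnings, and suggestions; every field name and value below is an assumption.
import json

server_response = json.dumps({
    "protocol_step": 7,                   # step the wearer appears to be executing
    "step_alignment": "deviation",        # e.g. "on_track" | "deviation" | "unknown"
    "warnings": ["Pipette tip contacted a non-sterile surface at 00:14:32"],
    "suggestions": ["Discard the tip and re-aspirate 200 µL of media"],
    "timestamp": "2025-10-03T14:32:11Z",  # logged for the reproducibility record
})


def render_overlay(payload: dict) -> str:
    """Turn structured feedback into short lines for the heads-up display."""
    lines = [f"Step {payload['protocol_step']}: {payload['step_alignment']}"]
    lines += [f"WARNING: {w}" for w in payload["warnings"]]
    lines += [f"Suggestion: {s}" for s in payload["suggestions"]]
    return "\n".join(lines)


print(render_overlay(json.loads(server_response)))
```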
“We can have 1,000 chatbots, 1,000 AI scientists trying to tell real scientists what to do — but if AI isn’t wired into the physical experiment, we never have anything verifiable.”
— Dr. Mengdi Wang, Princeton AI Lab
LabOS has also demonstrated performance breakthroughs on biomedical reasoning benchmarks — achieving top scores on HLE: Biomedicine, LAB-Bench DBQA, and LAB-Bench LitQA, outperforming frontier models. Its performance scales with inference-time compute, meaning more resources yield smarter decisions in real time.
Real-World Applications Already Deployed
Applications span three major scientific domains:
🧬 Cancer Immunotherapy — Functional screening to identify NK cell targets, accelerating discovery pipelines from months to weeks.
🔬 Stem Cell Engineering — XR-guided gene editing of iPSCs with real-time deviation alerts and micro-action documentation.
⚗️ Materials Science — Autonomous hypothesis testing and mechanistic investigation, with AI proposing and iterating experiments.
Stanford University and Princeton University have filed a patent application related to this work, underscoring its commercial and translational potential.
Evolution
MedOS: The World Model for Medicine — Perception, Action, Simulation
If LabOS taught AI to see through a scientist’s eyes, MedOS puts that intelligence into scrubs. Launched February 11, 2026 by the same Stanford-Princeton team — now with the addition of clinical collaborators Dr. Rebecca Rojansky and Dr. Christina Curtis, and materials scientist Dr. Zhenan Bao — MedOS is described as “the first AI-XR-Cobot system designed to actively assist clinicians inside real clinical environments.”
The system combines three hardware modalities into a single cognitive loop: smart glasses (for real-time egocentric perception), robotic arms (for physical assistance and sample handling), and a multi-agent AI brain that reasons, plans, and acts — all in the time it takes a surgeon to reach for a tool.
🌐 World Model for Medicine
MedOS introduces a continuous feedback architecture that merges perception (what the clinician sees), intervention (what the robot does), and simulation (what the AI predicts) into a single closed loop — a true world model for clinical environments, operating in 3D spatial intelligence.
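As a rough sketch, under the assumption of placeholder interfaces (MedOS's actual APIs are not described in this article), a perception-intervention-simulation loop of this kind might be structured as follows.

```python
# Placeholder sketch of the closed perception -> simulation -> intervention loop
# described above. The world-state representation and every function here are
# assumptions for illustration, not MedOS interfaces.
from dataclasses import dataclass


@dataclass
class WorldState:
    """Running 3D model of the clinical scene (instruments, anatomy, staff positions)."""
    frame_id: int
    objects: dict


def perceive(glasses_frame: bytes, prior: WorldState) -> WorldState:
    # Update the spatial model from the latest egocentric frame (VLM + 3D tracking).
    return WorldState(frame_id=prior.frame_id + 1, objects=prior.objects)


def simulate(state: WorldState, candidate_action: str) -> float:
    # Roll the world model forward and score the predicted outcome of an action.
    return 1.0 if candidate_action in state.objects.get("safe_actions", []) else 0.0


def intervene(action: str) -> None:
    # Dispatch the chosen action to the robotic arm, or surface it on the HUD.
    print(f"executing: {action}")


def clinical_loop(frames, candidate_actions):
    state = WorldState(frame_id=0, objects={"safe_actions": ["hand over retractor"]})
    for frame in frames:
        state = perceive(frame, state)                                   # perception
        best = max(candidate_actions, key=lambda a: simulate(state, a))  # simulation
        if simulate(state, best) > 0.5:
            intervene(best)                                              # intervention


clinical_loop(frames=[b"frame-0", b"frame-1"], candidate_actions=["hand over retractor"])
```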
Benchmark Performance: Beating the Frontier
The performance numbers are striking. MedOS’s multi-agent architecture achieved 97% accuracy on MedQA (USMLE) and 94% on GPQA — outperforming Gemini-3 Pro, GPT-5.2 Thinking, and Claude 4.5 Opus. This was not achieved by a single large model, but by a coordinated agent team that mirrors clinical reasoning logic, synthesizes evidence, and manages procedures simultaneously.
97% on MedQA (USMLE) — beating all frontier AI models tested
94% on GPQA — surpassing GPT-5.2 Thinking and Gemini-3 Pro
72% → 91% medical student performance improvement with MedOS assistance
MedSuperVision: The Largest Medical Video Dataset Ever Built
Underpinning MedOS’s clinical vision intelligence is MedSuperVision — the largest open-source medical video dataset in existence, featuring over 85,000 minutes of surgical footage captured from 1,882 clinical experts. Available at medos-ai.github.io, this dataset gives MedOS an unmatched visual training base for interpreting OR scenes, identifying anatomical structures, and aligning tool movements with procedural intent.
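The annotation schema is not described in this article, but an annotated clip in a surgical video corpus of this kind might plausibly carry fields like the ones below. Every field name and value here is hypothetical; consult medos-ai.github.io for the actual released format.

```python
# Purely hypothetical example of what one annotated clip record could look like in a
# surgical video dataset of this kind. The real MedSuperVision schema is not described
# in this article; see medos-ai.github.io for the released format.
clip_record = {
    "clip_id": "or_case_0412_clip_087",
    "duration_s": 42.5,
    "annotator_role": "attending surgeon",        # one of the contributing clinical experts
    "procedure": "laparoscopic cholecystectomy",
    "procedural_step": "dissection of Calot's triangle",
    "anatomy_labels": ["cystic duct", "cystic artery"],
    "tool_tracks": [
        {"tool": "Maryland dissector", "frames": [0, 1275], "intent": "blunt dissection"},
    ],
}
```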
Addressing Physician Burnout at Scale
The clinical motivation is urgent. More than 60% of U.S. physicians report burnout symptoms according to recent studies. MedOS was explicitly built as a response — not by removing doctors from the equation, but by eliminating the cognitive overhead that breaks them.
“The goal is not to replace doctors. It is to amplify their intelligence, extend their abilities, and reduce the risks posed by fatigue, oversight, or complexity. MedOS is not just an assistant. It is the beginning of a new era of AI as a true clinical partner.”
— Dr. Le Cong, Associate Professor, Stanford University
Clinical testing results speak directly to this: registered nurses improved from 49% to 77% in precision tasks with MedOS assistance. Medical students rose from 72% to 91%. These are not marginal improvements — they represent nurses closing the gap with attending physicians in fatigue-prone scenarios.
Beyond performance testing, MedOS has already surfaced novel clinical insights in the wild: uncovering immune side effects of the GLP-1 agonist semaglutide (Wegovy) from FDA database analysis, and identifying prognostic implications of driver gene co-mutations for cancer patient survival, discoveries that emerged from MedOS's reasoning layer operating across structured medical data.
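The article does not detail how MedOS queried FDA data, but analyses of this kind typically begin with adverse-event reports. As a rough, unaffiliated illustration of that starting point, the public openFDA endpoint can tally reported reaction terms for semaglutide; the query below is an example only, not MedOS's method.

```python
# Rough illustration only: counting reported adverse-event terms for semaglutide via
# the public openFDA FAERS endpoint. This is not MedOS's pipeline; the article does
# not describe how its reasoning layer analyzed FDA data.
import requests

OPENFDA_EVENTS = "https://api.fda.gov/drug/event.json"

params = {
    "search": 'patient.drug.openfda.generic_name:"semaglutide"',
    "count": "patient.reaction.reactionmeddrapt.exact",  # tally MedDRA reaction terms
    "limit": 20,
}

resp = requests.get(OPENFDA_EVENTS, params=params, timeout=30)
resp.raise_for_status()

for item in resp.json()["results"]:
    print(f'{item["term"]:40s} {item["count"]}')
```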
Infrastructure
LabClaw: The Open Skill Operating Layer
Running alongside both platforms is an open-source ecosystem called LabClaw — a modular skill library that teaches AI agents when to use a tool, how to call it, and what output to produce. LabClaw packages over 240 production-ready SKILL.md files spanning biology, lab automation, vision/XR, drug discovery, medicine, data science, and scientific visualization. Built for OpenClaw-compatible agents, it is designed as a practical, research-grade alternative to generic prompt bundles — the connective tissue between LabOS’s dry-lab reasoning and its wet-lab XR execution.
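Neither the SKILL.md format nor OpenClaw's loading API is documented in this article, but a library of this shape is straightforward to index. The sketch below assumes a simple front-matter convention with a name field; the file layout and parsing are hypothetical.

```python
# Hypothetical loader for a directory of SKILL.md files. LabClaw's real file format and
# OpenClaw's loading API are not documented here; the front-matter convention assumed
# below (--- delimited key: value pairs, then free-text instructions) is illustrative.
from pathlib import Path


def parse_skill(path: Path) -> dict:
    """Split a SKILL.md file into front-matter metadata and free-text instructions."""
    text = path.read_text(encoding="utf-8")
    meta, body = {}, text
    if text.startswith("---"):
        header, _, body = text[3:].partition("---")
        for line in header.strip().splitlines():
            key, _, value = line.partition(":")
            if value:
                meta[key.strip()] = value.strip()
    return {"meta": meta, "instructions": body.strip(), "source": str(path)}


def load_skill_library(root: str) -> dict:
    """Index every SKILL.md under root by its declared name (or parent folder name)."""
    library = {}
    for path in sorted(Path(root).rglob("SKILL.md")):
        skill = parse_skill(path)
        name = skill["meta"].get("name", path.parent.name)
        library[name] = skill
    return library


# Example usage (hypothetical paths and skill names):
# skills = load_skill_library("labclaw/skills")
# print(skills["crispr_design"]["instructions"])
```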
Deployment
Where It’s Live: From Stanford Benches to Hospital Corridors
LabOS and MedOS are not theoretical. Early pilots are active at Stanford, Princeton, and the University of Washington. Hospital logistics is the immediate clinical entry point: MedOS is being deployed to move blood samples and supplies, a setting with no patient-facing interaction where safety validation is more tractable.
The team is in active conversation with hospital systems in the Northwest and on the East Coast. Surgical simulation testing on mock bodies is underway, with rigorous multi-level physician evaluations required before any patient-facing deployment. MedOS launched with backing from NVIDIA, AI4Science, and Nebius, making its public debut at NVIDIA's GPU Technology Conference (GTC) 2026 with a dedicated session at Stanford.
Clinical collaborators can request early access at ai4med.stanford.edu. Laboratory researchers can explore LabOS at ai4lab.stanford.edu.
Significance
Why This Matters: A New Paradigm for Human-AI Collaboration
LabOS and MedOS represent something genuinely new in the AI landscape — not a chatbot, not a pipeline, not a passive recommendation engine. They are participatory intelligences: systems that enter physical environments, perceive them in 3D, reason about them in real time, and act within them alongside humans. The “world model” framing is apt — these systems construct and update internal models of physical and procedural reality, not just statistical correlations over text.
As Dr. Wang describes it, MedOS creates “a collaborative loop that helps clinicians manage complexity in real time.” LabOS turns the laboratory into “a living intelligence loop” where every experiment tightens the feedback between human intuition and machine reasoning.
The trajectory is clear. LabOS laid the foundation: AI that can see a bench, understand a protocol, and guide a scientist’s hands. MedOS is the evolution: AI that can read a surgical scene, model a clinical decision tree, and extend a clinician’s capabilities past the limits of fatigue and attention. Both are live. Both are growing. And both suggest that the most consequential AI systems of the coming decade will not be the ones writing text — they’ll be the ones working alongside us in the physical world.





