Latest research
August 28, 2025
OLMoASR: A series of open speech recognition models
We release OLMoASR, a family of open automatic speech recognition (ASR) models trained from scratch on a curated, large-scale dataset.
August 26, 2025
Asta: Accelerating science through trustworthy agentic AI
We announce Asta, our bold initiative to accelerate science through trustworthy, truly open agentic AI.
August 26, 2025
AstaBench: Rigorous benchmarking of AI agents with a holistic scientific research suite
Introducing AstaBench, a novel AI agents evaluation framework and scientific research benchmark suite.
August 19, 2025
Signal and Noise: Reducing uncertainty in language model evaluation
We find that two simple metrics, signal and noise, reveal key differences in the utility of current LLM benchmarks.
August 18, 2025
MoNaCo: More natural questions for reasoning across dozens of documents
Introducing MoNaCo, a benchmark of highly challenging questions spanning dozens of documents for evaluating large language models.
August 12, 2025
MolmoAct: An Action Reasoning Model that reasons in 3D space
MolmoAct is the first model able to “think” in three dimensions, trained efficiently and delivering benchmark-topping performance.
July 22, 2025
Contextualized Evaluations: Judging language model responses to underspecified queries
How do we evaluate LLMs on underspecified queries? We show that adding clarifying context flips model rankings and uncovers model biases.
July 18, 2025
AutoDS: A prototype engine for autonomous, open-ended scientific discovery
AutoDS goes beyond standard data crunching by building upon its own findings and uncovering insights that may not be immediately apparent even to experienced researchers.
July 9, 2025