Ai2

Latest research

August 28, 2025

OLMoASR: A series of open speech recognition models

We release OLMoASR, a family of open automatic speech recognition (ASR) models trained from scratch on a curated, large-scale dataset.
August 26, 2025

Asta: Accelerating science through trustworthy agentic AI

We announce Asta, our bold initiative to accelerate science through trustworthy, truly open agentic AI.
August 26, 2025

AstaBench: Rigorous benchmarking of AI agents with a holistic scientific research suite

Introducing AstaBench, a novel AI agent evaluation framework and scientific research benchmark suite.
August 19, 2025

Signal and Noise: Reducing uncertainty in language model evaluation

We find that two simple metrics, signal and noise, reveal key differences in the utility of current LLM benchmarks.
August 18, 2025

MoNaCo: More natural questions for reasoning across dozens of documents

Introducing MoNaCo, a benchmark of highly challenging questions spanning dozens of documents for evaluating large language models.
August 12, 2025

MolmoAct: An Action Reasoning Model that reasons in 3D space

MolmoAct is the first model able to “think” in three dimensions, trained efficiently and delivering benchmark-topping performance.
July 22, 2025

Contextualized Evaluations: Judging language model responses to underspecified queries

How do we evaluate LLMs on underspecified queries? We show that adding clarifying context flips model rankings and uncovers model biases.
July 18, 2025

AutoDS: A prototype engine for autonomous, open-ended scientific discovery

AutoDS goes beyond standard data crunching by building upon its own findings and uncovering insights that may not be immediately apparent even to experienced researchers.
July 9, 2025

Introducing FlexOlmo: a new paradigm for language model training and data collaboration

Explore how FlexOlmo enables collaborative language model training without sacrificing data privacy or control, introducing a new, flexible approach to building shared AI models.