About
I’m a Ph.D. candidate in the CILVR lab at NYU Courant, co-advised by Rob Fergus and Lerrel Pinto. My research is supported by a DeepMind Ph.D. Scholarship and an NSF Graduate Research Fellowship.
I’m interested in large generative models that can solve hard tasks in settings like code synthesis, reasoning, decision-making, and open-ended interaction. Recently, I’ve been thinking about:
- How should we train language/vision-language models to exhibit better inference-time trade-offs between performance and computational budget? Can inference-time compute close the performance gap between models of different scales?
- What is the most efficient way to generate synthetic data for improving language model capabilities?
- When is next-token prediction a sufficient pretraining objective for reasoning and decision-making? How does the structure of data impact the downstream expressivity of model representations?
My work touches on generative modeling and reinforcement learning across modalities (vision, natural language, simulators, etc.). I’m also broadly interested in scientific applications of deep learning, such as weather and climate modeling.
I’ve spent time working on improving small language model reasoners with the GenAI/AI Frontiers Teams at Microsoft Research and studying ML-powered weather/climate simulators with the Applied Science Team at Google Research. I did my undergrad in mathematics at MIT, where I was exceptionally lucky to be mentored by Kelsey R. Allen, Gigliola Staffilani, and Raffaele Ferrari.