Ulyana Piterbarg

I'm a visiting researcher at Meta FAIR and a final-year Ph.D. student at NYU CILVR, where I'm co-advised by Rob Fergus and Lerrel Pinto and supported by the NSF GRFP and a scholarship from Google DeepMind.

My research focuses on the intersection of scaling and imitation/reinforcement learning for long-horizon, open-ended agent tasks. I'm especially interested in training sandboxes and modeling recipes that can enable LMs & VLMs to solve, or assist humans with, tasks that take tens to thousands of hours to complete.

Previously, as a research scientist intern, I worked on agent RL for production MoE LLMs (Meta Llama Team), efficient distillation algorithms for Phi models (Microsoft), and neural nets for solving PDEs (Google Research). Before that, I did my undergrad in mathematics and computer science at MIT, during which I was exceptionally lucky to be mentored by Kelsey R. Allen and Josh Tenenbaum.

Once upon a time, I was a design assistant to the director of the Exhibitions Lab of the American Museum of Natural History.

You can reach me at up2021 [at] nyu.edu.

Recent News

Sep 2025 ARE & Gaia2 are out! (🤗 blogpost / demo, 📰 coverage by VentureBeat)
Jul 2025 Our paper on mid-training code LMs to explore diff-by-diff was accepted to COLM 2025
Jun 2025 Presented my PhD work at Cohere and the Deep Learning Ulaanbaatar Summer School in Mongolia
Apr 2025 LintSeq received an outstanding paper award at NENLP 2025 🎉
Apr 2025 Gave a talk on post-training LMs for hard agentic tasks at the ICLR 2025 workshop on self-improving foundation models (...recording out now 📷!)
Apr 2025 I moved to Paris to intern with Gregoire Mialon on the agent research arm of the Llama Team. See you in France!
Jan 2025 BALROG and LintSeq were accepted to ICLR 2025 in Singapore

Research

LLMs are becoming increasingly capable agents. I'm interested in the data, algorithms, and environments that will enable models to autonomously complete and/or collaborate with humans on tasks that lie at the "edge of simulation," e.g. solving open problems in mathematics, developing maintainable & reliable software, or ascending in NetHack.

Unlike in question-answering and short-horizon tool use, for such tasks it is difficult (and in some cases impossible) to collect demonstration data or to train models with "vanilla" online RL.
Throughout my Ph.D., I've worked towards this setting by:

  • developing methods for efficiently training MoEs on multi-agent tool-use with RL (unreleased - Llama 4)
  • studying hierarchical policy learning & data scaling laws for tiny VLMs/LMs on the ultra long-horizon video game NetHack (HiHack, diff History)
  • showing that training code LMs to act and explore diff-by-diff can improve pass@k scaling laws (LintSeq)
  • contributing to benchmarks and platforms for agent evaluations + RL on long tasks (BALROG, Gaia2/ARE)
  • mid-training LMs for more human-like code exploration (D3, FAIR CodeGen)

More recently, I've been thinking about LLM agent alignment and leveraging the strong language representations of models for better exploration during RL.

Selected Publications and Preprints

Meta Superintelligence Labs, Agents Team
arXiv:2509.17158 / code release / blog

Davide Paglieri, Bartłomiej Cupiał, Jonathan Cook, Ulyana Piterbarg, Jens Tuyls, Edward Grefenstette, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel
arXiv:2509.03581

Ulyana Piterbarg, Kanishk Gandhi, Lerrel Pinto, Noah Goodman‡, Rob Fergus‡
COLM 2025

Ulyana Piterbarg, Lerrel Pinto, Rob Fergus
ICLR 2025, NENLP 2025 (Outstanding Paper Award)

Davide Paglieri, Bartłomiej Cupiał*, Samuel Coward, Ulyana Piterbarg, Maciej Wolczyk, Akbir Khan, Eduardo Pignatelli, Łukasz Kuciński, Lerrel Pinto, Rob Fergus, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel
ICLR 2025

Ulyana Piterbarg, Lerrel Pinto, Rob Fergus
ICML 2024

Ulyana Piterbarg, Lerrel Pinto, Rob Fergus
NeurIPS 2023