Ulyana Piterbarg

up2021 [at] cims.nyu.edu

[Twitter] [Google Scholar] [Github] [CV]

Papers


diff History for Neural Language Agents


Ulyana Piterbarg, Lerrel Pinto, Rob Fergus

arXiv Preprint, 2023
arXiv / project page / code

Neural Language Models (LMs) offer an exciting solution for general-purpose embodied control. However, a key technical issue arises when using an LM-based controller: environment observations must be converted to text, which, coupled with history, results in long and verbose textual prompts. As a result, prior work in LM agents is limited to restricted domains with small observation sizes, minimal needs for interaction history, or domain-specific instruction tuning. In this paper, we introduce diff history, a simple and highly effective solution to these issues. By applying the Unix diff command to consecutive text observations in the interaction histories used to prompt LM policies, we can both abstract away redundant information and focus the content of textual input on the salient changes in the environment. On NetHack, an unsolved video game that requires long-horizon reasoning for decision-making, LMs tuned with diff history match state-of-the-art performance for neural agents while requiring 1800x less training data than prior work. Even on the simpler BabyAI-Text environment with concise text observations, we find that although diff history increases the length of prompts, the representation it provides yields a 25% improvement in the efficiency of instruction tuning. Further, we show that diff history scales favorably across different tuning dataset sizes.
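
The preprocessing idea can be illustrated with Python's built-in difflib; this is a hedged sketch of the diffing step only, not the paper's exact pipeline (the paper applies the Unix diff command before observations are concatenated into the LM prompt):

```python
import difflib

def diff_observation(prev_obs: str, curr_obs: str) -> str:
    """Summarize a new text observation as a unified diff against the previous one.

    Illustrative sketch: only the lines that changed between timesteps are kept,
    so long interaction histories stay compact when included in the prompt.
    """
    delta = difflib.unified_diff(
        prev_obs.splitlines(),
        curr_obs.splitlines(),
        lineterm="",
        n=0,  # no context lines: keep only what changed
    )
    return "\n".join(delta)

# Two consecutive observations that differ in a single line yield a short diff.
prev_obs = "Agent: (3, 4)\nHP: 14/14\nYou see a door to the north."
curr_obs = "Agent: (3, 5)\nHP: 14/14\nYou see a door to the north."
print(diff_observation(prev_obs, curr_obs))
```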


NetHack is Hard to Hack


Ulyana Piterbarg, Lerrel Pinto, Rob Fergus

37th Conference on Neural Information Processing Systems (NeurIPS), 2023
arXiv / project page / code

Neural policy learning methods struggle in long-horizon tasks, especially in open-ended environments with multi-modal observations, such as the popular dungeon-crawler game, NetHack. Intriguingly, the NeurIPS 2021 NetHack Challenge revealed that symbolic agents outperformed neural approaches by over four times in median game score. In this paper, we delve into the reasons behind this performance gap and present an extensive study on neural policy learning for NetHack. To conduct this study, we analyze the winning symbolic agent, extending its codebase to track internal strategy selection in order to generate one of the largest available demonstration datasets. Utilizing this dataset, we examine (i) the advantages of an action hierarchy; (ii) enhancements in neural architecture; and (iii) the integration of reinforcement learning with imitation learning. Our investigations produce a state-of-the-art neural agent that surpasses previous fully neural policies by 127% in offline settings and 25% in online settings on median game score. However, we also demonstrate that mere scaling is insufficient to bridge the performance gap with the best symbolic models or even the top human players.


Capturing missing physics in climate model parameterizations using neural differential equations


Ali Ramadhan, John C Marshall, Andre Nogueira Souza, Xin Kai Lee, Ulyana Piterbarg, Adeline Hillier, Gregory LeClaire Wagner, Christopher Rackauckas, Chris Hill, Jean-Michel Campin, Raffaele Ferrari

Earth and Space Science Open Archive (ESSOAR), 2022
arXiv / code

We explore how neural differential equations (NDEs) may be trained on highly resolved fluid-dynamical models of unresolved scales, providing an ideal framework for data-driven parameterizations in climate models. NDEs overcome some of the limitations of traditional neural networks (NNs) in fluid-dynamical applications in that they can readily incorporate conservation laws and boundary conditions and are stable when integrated over time. We advocate a “residual” approach, in which the NN is used to improve upon an existing parameterization by representing the residual fluxes that are not captured by the base parameterization.
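
A minimal Python sketch of the residual idea is below; the class and argument names are illustrative assumptions (the paper's implementation uses Julia and NDE tooling), and `base_flux` stands in for any existing physics-based closure:

```python
import torch
import torch.nn as nn

class ResidualFluxClosure(nn.Module):
    """Predict only the flux missing from an existing base parameterization.

    Illustrative sketch: the network learns the residual between the base
    closure and the fluxes diagnosed from a highly resolved simulation.
    """

    def __init__(self, n_levels: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_levels, hidden),
            nn.Tanh(),
            nn.Linear(hidden, n_levels),
        )

    def forward(self, column_state: torch.Tensor, base_flux: torch.Tensor) -> torch.Tensor:
        # Total parameterized flux = physics-based closure + learned residual.
        return base_flux + self.net(column_state)
```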


Abstract strategy learning underlies flexible transfer in physical problem solving


Kelsey R. Allen, Kevin A. Smith, Ulyana Piterbarg, Robert Chen, Josh B. Tenenbaum

42nd Annual Meeting of the Cognitive Science Society, 2020


What do people learn when they repeatedly try to solve a set of related problems? Across three exploratory physical problem-solving experiments, participants consistently learn strategies rather than generically better world models. Our results suggest that people can make use of limited experience to learn abstract strategies that go beyond simple model-free policies and are instead object-oriented, adaptable, and parameterizable by model-based variables such as weight.


Unpublished Work



Biped Locomotion from Human Demonstrations with Motion Imitation via RL

MIT 6.832: Underactuated Robotics (graduate) (2021-05-17)

I experimented with reinforcement learning as a basis for learning bipedal locomotion skills from human demonstrations, using data from the CMU Motion Capture Database as the demonstration source and the NASA Valkyrie (R5) robot as the target system for motion imitation. This work draws from Peng et al. 2020 and Xie et al. 2019.



Experiments with Quasi-Geostrophic Flows

MIT Ferrari Lab (2021-01-10)

Under the supervision of Andre N. Souza, I experimented with turbulent regimes of quasi-geostrophic flows, numerically approximated in Julia.



A Bayesian Approach to Modeling Infection-Based Social Distancing in the SARS-CoV-2 Pandemic

MIT IDS.147/15.077: Statistical Learning and Data Mining (graduate) (2020-05-20)

This project develops a simple extension of the classical SIR compartmental model of disease transmission that parameterizes infection-based social distancing policy, i.e., the feedback SIR (fSIR) model proposed by Dr. Elisa Franco. I fit this model with the probabilistic programming package PyMC3 to reported infection statistics from four countries with sharply contrasting responses to the pandemic, yielding posteriors that enable a rough numerical comparison of policy efficacy.
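
As a rough sketch of the kind of feedback involved: the contact rate is scaled down as infections rise. The 1/(1 + k·I) scaling and all parameter values below are assumptions for illustration, not necessarily the exact functional form or fit from the project:

```python
import numpy as np
from scipy.integrate import solve_ivp

def fsir(t, y, beta, gamma, k):
    """Feedback SIR: transmission is damped as the infected fraction grows.

    Illustrative sketch; the 1 / (1 + k * I) feedback term is one simple choice.
    """
    S, I, R = y
    beta_eff = beta / (1.0 + k * I)  # infection-based social distancing
    dS = -beta_eff * S * I
    dI = beta_eff * S * I - gamma * I
    dR = gamma * I
    return [dS, dI, dR]

# Fractions of the population; in the project, parameters like (beta, gamma, k)
# were given priors and inferred with PyMC3 against reported case statistics.
sol = solve_ivp(fsir, (0.0, 180.0), [0.999, 0.001, 0.0],
                args=(0.3, 0.1, 50.0), t_eval=np.linspace(0.0, 180.0, 181))
```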



Exploring strategy learning in the “Tools Environment”

MIT 9.66/9.660/6.804: Computational Cognitive Science (2018-12-10)

I worked with Kevin A. Smith and Kelsey R. Allen to design and run a preliminary study testing the Virtual Tools Game as a testbed for behavioral experiments on abstract strategy learning in humans. I also designed a hierarchical Bayesian mixture model to identify the abstract strategies learned by study participants directly from data.



Investigating Option-Conditional Value Prediction in Reinforcement Learning

EPFL Life Sciences Summer Research Program Colloquium (2018-08-15)

Supervised by Johanni Brea and Wulfram Gerstner, I investigated the efficacy of option-conditional value prediction in reinforcement learning (RL) by adapting the Value Prediction Network (VPN) to tabular environments and by implementing the algorithm from Oh et al.’s original paper, using a combination of temporal-difference search (TD search) and n-step Q-learning for training.
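
For reference, the n-step Q-learning target used in that kind of training looks roughly like the following generic tabular sketch (an assumption-level illustration, not the project's actual training code):

```python
def n_step_q_target(rewards, bootstrap_q, gamma):
    """n-step return: discounted sum of rewards plus a bootstrapped value.

    Illustrative sketch; `bootstrap_q` is max_a Q(s_{t+n}, a) under the
    current value estimates, and `rewards` holds r_t, ..., r_{t+n-1}.
    """
    target = bootstrap_q
    for r in reversed(rewards):
        target = r + gamma * target
    return target

# Example: 3-step target with gamma = 0.99
print(n_step_q_target([0.0, 0.0, 1.0], bootstrap_q=0.5, gamma=0.99))
```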



Anamorphic Entrance Sculpture Prototype

AMNH Exhibitions Department (2017-06-10)

I co-designed the prototype of an anamorphic entrance sculpture to the American Museum of Natural History special exhibition “Our Senses: An Immersive Experience.” The final sculpture was included in the New York Times feature on the exhibition, “This Exhibition Will Help You Make Sense of Your Senses”.