FAQs
I've never heard of NetHack. What is it? NetHack is an open-source roguelike video game first released in 1987. The game is procedurally generated, features ASCII graphics, and is notoriously difficult to play. The source code is written in C.
In 2020, Küttler et al. released the NetHack Learning Environment (NLE), wrapping the video game into a reinforcement learning (RL) environment for ML/AI research. This environment was presented at NeurIPS 2020.
A year later at NeurIPS 2021, Hambro et al. organized a competition centered on NetHack, dubbed the NetHack Challenge (NHC). This competition spurred AI researchers, machine learning enthusiasts, and members of the community at large to benchmark and develop neural and symbolic methods in NLE. Symbolic methods beat out neural ones in average in-game score by a staggering margin, leaving purely data-driven neural policies in the dust (Figure 1).
You can read more about NetHack via the game homepage or by checking out the community-maintained NetHack Wiki.
Figure 1: Selected results from the NeurIPS 2021 NetHack Challenge (NHC) compared against results from "NetHack is Hard to Hack" (NeurIPS 2023), showing game score on a log scale (Piterbarg et al., 2023).
Why NetHack?
1. NetHack remains unsolved.
2. Due to its procedurally generated nature, NetHack truly probes policy generalization. Each random seed of the game features a completely unique layout of dungeons, monster encounters, and items to gather.
3. NetHack is compiled in C, yielding blazingly fast simulation. Training neural policies in NLE with RL remains tractable despite the high complexity of the game (see the minimal rollout sketch after this list).
4. HiHack. Unlike other open-ended RL environments like Habitat, Minecraft, or AI2-THOR, the state-of-the-art (SOTA) artificial agent in NetHack is an open-source, hard-coded, symbolic, and hierarchical bot, which we "hack" to generate hierarchically labeled demonstration data. HiHack provides the community with a unique opportunity to explore the impact of ground-truth hierarchical behavioral priors on learning, in a data-unlimited setting.
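To make the third point concrete, here is a minimal rollout sketch following standard NLE/Gym usage. It is illustrative rather than code from the paper; the task name and the exact return signature of step depend on the installed NLE and Gym versions.

import gym  # NLE registers its environments with Gym when imported
import nle  # noqa: F401  (assumes NLE is installed, e.g. via pip install nle)

# "NetHackScore-v0" is one of the standard NLE tasks; available names vary by NLE version.
env = gym.make("NetHackScore-v0")
obs = env.reset()

done = False
total_reward = 0.0
while not done:
    # Random keypresses purely to illustrate the loop; a real agent chooses actions here.
    obs, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward

print("episode return:", total_reward)

Because the simulator is compiled C, stepping the environment is cheap; this is the "blazingly fast simulation" referred to in point 3.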
How large is HiHack exactly? In games? Keypresses? Gigabytes? HiHack contains 109,907 games and 3,244,729,367 keypresses (Table 1). After extraction to ttyrec4.bz2 files, the full dataset is 99 GB.
Despite its scale, neural policies trained on HiHack with imitation learning fail to match the bot in NLE score, even when RL finetuning is thrown into the mix. In the purely offline regime, neural policy architectures based on LSTMs and even transformers (Figure 2) exhibit sub log-linear scaling in NLE score with demonstration count (Piterbarg et al., 2023).
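To make "sub log-linear scaling" concrete: under log-linear scaling, every 10x increase in demonstrations adds a roughly constant number of score points; sub log-linear scaling means even that slows down, with each additional 10x of data adding less than the previous one. A tiny numpy sketch, with made-up numbers used purely for illustration (not results from the paper):

import numpy as np

# Hypothetical demonstration counts and mean NLE scores, for illustration only.
n_demos = np.array([1_000, 10_000, 100_000])
mean_score = np.array([350.0, 600.0, 750.0])

# Fit score ~ a * log10(N) + b.
a, b = np.polyfit(np.log10(n_demos), mean_score, deg=1)

# Under log-linear scaling, each 10x increase in data would add ~a points.
# Here the observed gains shrink (250, then 150), i.e. growth is sub log-linear.
print("fitted gain per 10x data:", a)
print("observed gains per 10x data:", np.diff(mean_score))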
What data format is HiHack saved in? HiHack is saved in the ttyrec data format native to NetHack.
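For intuition about the on-disk layout, here is a minimal reader for the classic ttyrec framing: each frame is a 12-byte little-endian header of seconds, microseconds, and payload length, followed by the raw terminal bytes. NLE's versioned ttyrec files extend this per-frame header, so this sketch is illustrative and is not a drop-in HiHack loader; use NLE's own dataset readers for the real files.

import bz2
import struct

def read_classic_ttyrec(path):
    """Yield (seconds, microseconds, payload) frames from a classic ttyrec file."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as f:
        while True:
            header = f.read(12)
            if len(header) < 12:
                return
            sec, usec, length = struct.unpack("<III", header)
            yield sec, usec, f.read(length)

# Example usage with a hypothetical file path:
# for sec, usec, payload in read_classic_ttyrec("game.ttyrec.bz2"):
#     print(sec, usec, len(payload))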
Table 1: Comparing HiHack to NLD-AA (Hambro et al., 2022).
How does HiHack compare to other offline datasets for NetHack? See Table 1 for a comparison of HiHack to the AutoAscend NetHack Learning Dataset (NLD-AA), the latter containing demonstrations with keypress labels only.
How was HiHack generated? HiHack was generated by adding explicit "strategy" tracking to the AutoAscend source code and to the ttyrec read/write code of NLE. We open-source all code employed for data generation.
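Conceptually, the instrumentation amounts to tagging every keypress the bot emits with the high-level strategy that produced it, and writing that tag out alongside the keypress stream. The sketch below is hypothetical: the class, method, and strategy names are invented for illustration and are not the actual AutoAscend/NLE code.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class StrategyLogger:
    """Hypothetical recorder pairing each emitted keypress with the active strategy."""
    active_strategy: str = "none"
    records: List[Tuple[str, int]] = field(default_factory=list)

    def enter_strategy(self, name: str) -> None:
        # Called whenever the bot's high-level controller switches strategies.
        self.active_strategy = name

    def emit_keypress(self, keycode: int) -> None:
        # Every low-level action is stored with its hierarchical label.
        self.records.append((self.active_strategy, keycode))

# Example usage with made-up strategy names and keycodes:
log = StrategyLogger()
log.enter_strategy("explore")
log.emit_keypress(ord("j"))
log.enter_strategy("fight")
log.emit_keypress(ord("F"))
print(log.records)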
Figure 2: Policy architectures from "NetHack is Hard to Hack" (Piterbarg et al., 2023). Left: Hierarchical LSTM policy, where g_t is the high-level strategy prediction (purple) used to select among the k low-level policies (yellow). "Straight-through" gradients via Gumbel-Softmax are employed to train the bilevel set of decoders end-to-end. Right: Transformer-LSTM policy, where the LSTM encoder (grey) provides a long temporal context h_t to the transformer.
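For readers who prefer code to diagrams, here is a minimal, self-contained PyTorch sketch of the strategy-selection idea in the hierarchical decoder. The hidden size, the number of strategies k, and the action-space size are illustrative placeholders, not the paper's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalHead(nn.Module):
    """Illustrative bilevel decoder: pick a strategy, then an action from its policy."""

    def __init__(self, hidden_dim: int = 512, num_strategies: int = 8, num_actions: int = 121):
        super().__init__()
        self.strategy_logits = nn.Linear(hidden_dim, num_strategies)
        # One low-level action decoder per strategy.
        self.low_level = nn.ModuleList(
            nn.Linear(hidden_dim, num_actions) for _ in range(num_strategies)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # g_t: hard one-hot strategy selection via straight-through Gumbel-Softmax,
        # keeping the discrete choice differentiable end-to-end.
        g = F.gumbel_softmax(self.strategy_logits(h), tau=1.0, hard=True)        # (B, k)
        action_logits = torch.stack([dec(h) for dec in self.low_level], dim=1)   # (B, k, A)
        return (g.unsqueeze(-1) * action_logits).sum(dim=1)                      # (B, A)

# Example: h would come from the LSTM (or Transformer-LSTM) core over NLE observations.
head = HierarchicalHead()
h = torch.randn(4, 512)   # a batch of 4 hypothetical recurrent states
print(head(h).shape)      # torch.Size([4, 121])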
How well do state-of-the-art algorithms and models perform in NetHack when pre-trained on HiHack? As of October 30, 2023, the SOTA for data-driven neural policies in NLE is set by agents trained on HiHack, both in the purely offline and offline + online settings (Piterbarg et al., 2023). See Table 2 below and our paper for more details.
Table 2: Aggregate results from "NetHack is Hard to Hack" (Piterbarg et al., 2023), evaluating the impact of hierarchical labels and architectural improvements on the performance of policies trained with behavioral cloning (BC) alone, as well as with combined BC and asynchronous proximal policy optimization (APPO). Behavioral cloning losses were computed over batches from the HiHack Dataset. All policies were trained for 48 hours on a single GPU. Metrics annotated with (†) were computed only for the top-scoring neural policy seed (out of 6) across each model class.
I have more questions. Who should I contact? Please send any inquiries to up2021 -at- cims.nyu.edu.
@misc{piterbarg2023nethack,
  title={NetHack is Hard to Hack},
  author={Ulyana Piterbarg and Lerrel Pinto and Rob Fergus},
  year={2023},
  eprint={2305.19240},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}