The Teleport Coding Challenge
Port NetHack 5.0 from C to JavaScript. Bit-exact parity. Highest score wins.
NetHack is one of the most complex and longest-lived open source programs ever written. After 46 years of continuous development — tracing its lineage from Rogue (1980) to Hack (1982) to NetHack — v5.0 just shipped: the first major version bump since 3.0 in 1989. That’s 442,901 lines of C and Lua to port.
This contest asks: can a swarm of LLM coding assistants enable a single person to work with a program of this scale and complexity? Can agents create hundreds of thousands of lines of code that humans would actually want to own afterwards?
Leaderboard
| # | Team | Points | PRNG | Screens | Sessions | Last | Play | Tests |
|---|------|--------|------|---------|----------|------|------|-------|
No contestants yet. Be the first →
Category reflects how the team plans to produce most (over 50%) of the code: Agentic codebases are mostly produced by generating code with an LLM, while Transpiled codebases rely mostly on transpiling the C sources with tools. Points = matched screens across the 44 public sessions: each step where your fork’s rendered 80×24 terminal exactly matches the C reference is worth one point, so the maximum is the total number of recorded screens. PRNG is shown as advisory progress; it is the structural prerequisite for screens to match, because once your PRNG drifts, the game state drifts and screens can no longer line up. The Sessions column tallies sessions where every screen and every PRNG call matched. Play opens the build in your browser; Tests opens the Session Viewer scoped to it.
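To make the PRNG prerequisite concrete, here is a minimal sketch of a parity harness around the RNG layer. `rn2` and `rnd` mirror the names of NetHack’s C helpers; everything else (the LCG core and its constants, the seeding, the trace format, `firstDivergence`) is an assumption for illustration, since the reference traces from the C build define the real behavior.

```js
// Sketch of a PRNG parity harness. rn2()/rnd() mirror NetHack's C helpers
// of the same names; the 64-bit LCG core below is a stand-in -- the
// contest's actual generator and seeding rules come from the C reference.

let seed = 0n;

function srandom(s) {
  seed = BigInt(s) & 0xffffffffffffffffn;
}

// Hypothetical core generator (LCG constants from Knuth's MMIX, as an example).
function nextRaw() {
  seed = (seed * 6364136223846793005n + 1442695040888963407n) & 0xffffffffffffffffn;
  return Number(seed >> 33n); // keep the high bits, which are the most uniform
}

const trace = []; // every call is logged so it can be diffed against the C trace

function rn2(x) { // uniform integer in [0, x)
  const v = nextRaw() % x;
  trace.push(`rn2(${x})=${v}`);
  return v;
}

function rnd(x) { // uniform integer in [1, x]
  const v = (nextRaw() % x) + 1;
  trace.push(`rnd(${x})=${v}`);
  return v;
}

// Parity check: the first index where your trace departs from the reference
// is where the game state starts to drift.
function firstDivergence(reference) {
  const n = Math.min(trace.length, reference.length);
  for (let i = 0; i < n; i++) {
    if (trace[i] !== reference[i]) return i;
  }
  return trace.length === reference.length ? -1 : n;
}
```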
Play it in your browser
Hit any Play button on the leaderboard above to open that contestant’s port directly in your browser — running entirely client-side, no install. Tests opens the Session Viewer scoped to that same fork: scrub through any of the 44 public sessions and see, frame by frame, where the JS port diverges from the official C trace.
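As a sketch of what “exactly matches” means at the scoring level, the comparison below awards one point per step whose normalized 80×24 grid is identical to the reference frame. `renderStep` and the frame shapes are hypothetical stand-ins, not the contest’s actual harness.

```js
// Sketch of per-step screen scoring: one point per step whose rendered
// 80x24 terminal equals the C reference frame. A screen here is an array
// of row strings; renderStep(i) is a hypothetical interface to your port.

const COLS = 80, ROWS = 24;

// Normalize a screen to one comparable string: 24 rows of exactly 80 cells.
function normalize(screen) {
  return screen
    .slice(0, ROWS)
    .map((row) => row.padEnd(COLS).slice(0, COLS))
    .join("\n");
}

function scoreSession(referenceFrames, renderStep) {
  let points = 0;
  let firstMismatch = -1;
  referenceFrames.forEach((frame, i) => {
    if (normalize(renderStep(i)) === normalize(frame)) {
      points += 1;
    } else if (firstMismatch === -1) {
      firstMismatch = i; // where to start scrubbing in the Session Viewer
    }
  });
  return { points, firstMismatch };
}
```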
Two phases
Phase 1 — Foundation
Standard parity contest against NetHack 5.0. Score against 88 sessions (44 public and 44 held-out). Top 10 teams qualify for Phase 2.
Phase 2 — Generalization
Judges pick a “5.1” target: a set of selected changes to the baseline codebase. Your Phase 2 score is your parity against 5.1, divided by a penalty proportional to how much your js/ changed from your Phase 1 submission. Ports with maintainable code win.
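The penalty function itself is the judges’ call and is not specified here; purely as an illustration, assume a score of the form parity / (1 + k · changed-fraction):

```js
// Purely illustrative: the real Phase 2 penalty formula is set by the judges.
// Assumed form: score = parity / (1 + k * changedFraction), for some constant k.
function phase2Score(parityPoints, linesChanged, linesTotal, k = 4) {
  const changedFraction = linesChanged / linesTotal;
  return parityPoints / (1 + k * changedFraction);
}

// Two hypothetical ports with equal parity (1000 points) against "5.1":
// the one that needed to touch less of its Phase 1 js/ tree wins.
phase2Score(1000, 5_000, 250_000);  // ~926  (2% of lines changed)
phase2Score(1000, 75_000, 250_000); // ~455  (30% changed: heavy rework)
```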
Best Method award
Throughout both phases, judges spotlight team writeups on the leaderboard. After Phase 2, a separate Best Method award is judged on the quality and reproducibility of the writeup — independent of where you placed in the parity ranking. The goal: capture and share the actual techniques that worked.