The First Mimic: a brief report from Berkeley on serteal’s near-perfect transpilation score, the first in the qualifying round, and the unofficial trophy he took home. The contest remains wide open.
Several teleport-contest bugs fixed thanks to careful reports from
@xeophon (#5) and @serteal (#6):
the public session corpus is back in sync between contest
and judge (38 files re-recorded), seed0030's
seg-0 character mismatch is fixed, and the scorer now
requires the cursor to land in the recorded position for a
screen to count as matched. Pull the latest template and
re-run bash frozen/score.sh to pick up the
corrections.
Contest updates: several improvements. Each fork
now declares a category (agentic,
transpiled, or other) by running
set-category.sh once before it will be scored. Animation-frame
parity is scored as a supplemental metric, supported by a new
API. /play/<owner>/ now supports
saving, loading, and an in-browser options editor at
/nethackrc/, and the persistence API
was simplified to a single opts.storage handle so
save/restore survives a browser reload. (serteal’s port
hasn’t regressed; their sessions just need a small
migration to fit the new API.) The corpus was re-recorded with
instrumentation fixes. Phase 2 is clarified as a test of
maintainability.
serteal has submitted the first transpiled solution — an Emscripten compilation of the C source into a JavaScript emulation of the C state machine, including a simulated C heap. Click serteal’s name to inspect the JavaScript, and Play to play the working game in the browser. It is not yet a readable JS port, and there is still plenty of time to write one! Can you build a port that beats the transpiler in Phase 2?
| # | Team | Points / 22,670 | PRNG | Screen | Anim† | Speed† | Playable† | Sessions / 88 | Progress | ||
|---|---|---|---|---|---|---|---|---|---|---|---|
Category is based on how the team plans to
produce most (over 50%) of the code. Agentic
codebases are mostly produced by generating code with an LLM,
and Transpiled codebases rely mostly on
transpiling the C sources with tools.
Points are shown as public + held-out and count matched 80×24
screens: one point per recorded step where the fork’s render matches
C exactly. The 88 sessions split 44/44 across the public corpus and a held-out
set kept private until contest end. PRNG is advisory: it is
the structural prerequisite for screens to match, but earns no points on its own.
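As a rough illustration of the points rule, the sketch below counts one point per recorded step whose 80×24 screen and cursor position both match. It is not the judge's scorer: the Screen type, screen_matches, and count_points are hypothetical names invented for this example, and the real comparison in frozen/score.sh may differ in detail.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Screen:
    """Hypothetical snapshot of one step: 24 rows of 80 characters plus the cursor."""
    rows: Tuple[str, ...]
    cursor: Tuple[int, int]  # (row, column)


def screen_matches(recorded: Screen, rendered: Screen) -> bool:
    # A screen counts only if every character AND the cursor position agree.
    return recorded.rows == rendered.rows and recorded.cursor == rendered.cursor


def count_points(recorded: Sequence[Screen], rendered: Sequence[Screen]) -> int:
    # One point per recorded step with an exact match. Steps the fork never
    # rendered (rendered is shorter than recorded) simply earn nothing.
    return sum(1 for rec, got in zip(recorded, rendered) if screen_matches(rec, got))


if __name__ == "__main__":
    blank = tuple(" " * 80 for _ in range(24))
    good = Screen(blank, (0, 0))
    cursor_off = Screen(blank, (0, 1))  # same characters, cursor one column off
    print(count_points([good, good, good], [good, cursor_off, good]))
```

Under these assumptions a step with identical characters but a misplaced cursor scores nothing, which mirrors the cursor requirement described in the bug-fix note.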
Anim† is a supplemental count of
matched animation frames: forks that opt in via the new animation API
render in-between frames (dart trajectories, explosion expansion, etc.)
and earn one credit per matched frame. Reported as a raw integer rather
than a percentage so a fork that opts in stays visible even when most
contestants haven’t wired up the API. Not part of the official
ranking.
Speed† is a linear fit on the offline
scoring path of the form startup_ms + per_move_ms × moves,
computed against the same 88 sessions every fork is scored on.
Two roughly comparable numbers in one cell: how much fixed cost a
session pays before the first move, and how much each move adds.
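To make the two Speed† numbers concrete, here is a minimal sketch that recovers startup_ms and per_move_ms from per-session timings. The text only says "linear fit"; ordinary least squares is an assumption here, and fit_speed and its inputs are hypothetical names, not part of the judge.

```python
from typing import Sequence, Tuple


def fit_speed(moves: Sequence[float], total_ms: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of total_ms ~ startup_ms + per_move_ms * moves.

    moves[i] is the number of moves in session i and total_ms[i] is the
    wall-clock time that session took. Returns (startup_ms, per_move_ms).
    """
    n = len(moves)
    if n != len(total_ms) or n < 2:
        raise ValueError("need at least two sessions with paired measurements")

    mean_x = sum(moves) / n
    mean_y = sum(total_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in moves)
    if sxx == 0:
        raise ValueError("all sessions have the same move count; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(moves, total_ms))

    per_move_ms = sxy / sxx                       # slope: cost added by each move
    startup_ms = mean_y - per_move_ms * mean_x    # intercept: fixed cost per session
    return startup_ms, per_move_ms


if __name__ == "__main__":
    # Synthetic data: 12 ms of startup plus 0.5 ms per move, no noise.
    session_moves = [100, 200, 400]
    session_ms = [12 + 0.5 * m for m in session_moves]
    startup, per_move = fit_speed(session_moves, session_ms)
    print(f"startup_ms={startup:.2f} per_move_ms={per_move:.2f}")
```

The intercept corresponds to the fixed cost a session pays before its first move, and the slope to the cost each additional move adds.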
Playable† is a two-part
browser-playability check. First, the judge loads the fork's
index.html in real headless Chromium and watches for
failed module fetches, top-level script errors, or 4xx/5xx on any
subresource. This catches deploy mistakes (missing files, broken
import paths) that look fine in offline scoring but break the
actual play page. Second, it drives the fork the same way the
browser does (one moveloop call per keypress) and asks whether the
aggregate ms/move stays under 5 ms. Both must pass.
Neither column is part of the official ranking; both are reported as
diagnostics.
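The second half of the Playable† check can be pictured with the sketch below, which drives a step function once per keypress and compares the aggregate ms/move against the 5 ms limit. It assumes "aggregate" means total elapsed time divided by the number of moves; step, aggregate_ms_per_move, and passes_speed_check are hypothetical stand-ins for the fork's moveloop and the judge's harness, which actually runs in headless Chromium.

```python
import time
from typing import Callable, Iterable

MS_PER_MOVE_LIMIT = 5.0  # threshold quoted in the text


def aggregate_ms_per_move(step: Callable[[str], None], keys: Iterable[str]) -> float:
    """Call step once per keypress and return total elapsed ms divided by moves."""
    count = 0
    start = time.perf_counter()
    for key in keys:
        step(key)  # one moveloop-style call per keypress, as the browser would do
        count += 1
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if count == 0:
        raise ValueError("no keypresses were supplied")
    return elapsed_ms / count


def passes_speed_check(ms_per_move: float) -> bool:
    # The text says the aggregate must stay under 5 ms, so the bound is strict.
    return ms_per_move < MS_PER_MOVE_LIMIT


if __name__ == "__main__":
    pressed = []
    ms = aggregate_ms_per_move(pressed.append, "hjklhjkl")
    print(f"{ms:.4f} ms/move, pass={passes_speed_check(ms)}")
```

Because the measurement is an aggregate, a single slow move does not fail the check on its own as long as the average over the session stays below the limit.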
Play opens that contestant’s build in your browser.
Tests opens the Session Viewer scoped to the fork —
scrub through each public session frame by frame and see where it diverges
from C.