
Introducing Trajectories

Upload Harbor jobs with one command, get a shareable URL, and inspect every step of every trial. Built for the terminal-bench community.

Christopher Settles
3 min read

Today we're releasing trajectories.sh, a hosting platform for agent trajectories. It's the place you go when you want to share a Harbor job with a colleague, embed a failing run in a bug report, or browse what other teams have built.

We built it because the terminal-bench community keeps shipping new benchmarks adjacent to Harbor, and every team ended up rolling their own viewer for the output. Trajectories is the missing shared surface: push once from the CLI, get a permanent URL, share a specific step with anyone.

01

Upload any Harbor job

Point the CLI at any Harbor-compatible job directory and it hashes, packages, and uploads the whole run in one command.

One command
npx trajectories-sh upload trajectory ./my-run --visibility unlisted

Every upload gets a shareable URL, a deterministic content hash (so you can spot duplicates), and a permanent home alongside the rest of your org's runs. Visibility is per-upload: private for your team, unlisted for link-only sharing, or public to show up on /browse.
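The announcement doesn't specify how the content hash is computed, but the idea is straightforward: hash the job directory's files in a stable order so that byte-identical runs always produce the same digest. A minimal sketch, assuming a SHA-256 digest over sorted relative paths and file contents (the function name and scheme are illustrative, not the actual CLI internals):

```python
import hashlib
from pathlib import Path

def job_content_hash(job_dir: str) -> str:
    """Illustrative sketch: hash every file's relative path and bytes
    in sorted order, so identical runs always yield identical digests."""
    digest = hashlib.sha256()
    root = Path(job_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(str(path.relative_to(root)).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

Because the traversal order is sorted and only relative paths are hashed, re-uploading the same run from a different machine or parent directory produces the same hash, which is what makes duplicate detection possible.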

PUBLIC BROWSE
Public trajectories list on trajectories.sh
02

Step-by-step viewer

The step viewer renders every trial as a timeline: agent messages and tool calls on the right, the screenshot or terminal output the agent was looking at on the left. Every step has a stable anchor, so a link like /t/<jobId>?trial=X&step=42 opens the exact moment you want someone to see. The social preview card pulls in that specific screenshot too.
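Stable step anchors mean deep links can be built programmatically, e.g. from a CI script that flags a failing step. A minimal sketch using the URL scheme above (the helper name is hypothetical; only the `/t/<jobId>?trial=&step=` format comes from the viewer):

```python
from urllib.parse import urlencode

def step_link(job_id: str, trial: int, step: int) -> str:
    """Build a deep link to one step of one trial, matching the
    /t/<jobId>?trial=X&step=Y anchor format the viewer exposes."""
    query = urlencode({"trial": trial, "step": step})
    return f"https://trajectories.sh/t/{job_id}?{query}"
```

For example, `step_link("abc123", 0, 42)` returns `https://trajectories.sh/t/abc123?trial=0&step=42`.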

STEP VIEWER
Step viewer showing a browser-agent trial on trajectories.sh

Arrow keys navigate, space toggles auto-play, f goes fullscreen. The viewer also surfaces the trial's verifier output: pytest summary for deterministic graders, per-criterion verdicts and the judge's reasoning for rubric graders, plus a clear "grading errored" state when an LLM judge couldn't produce a verdict.
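The three verifier outcomes the viewer distinguishes can be thought of as a simple dispatch over the grading result. A sketch of that logic, with field names that are assumptions for illustration rather than the platform's real schema:

```python
def verdict_label(result: dict) -> str:
    """Illustrative dispatch over the three verifier states the viewer
    surfaces (all field names here are assumed, not the real schema)."""
    if result.get("error"):
        # An LLM judge failed to produce a verdict at all.
        return "grading errored"
    if result.get("grader") == "pytest":
        s = result["summary"]
        return f"pytest: {s['passed']} passed, {s['failed']} failed"
    # Rubric grader: one verdict per criterion, judge reasoning alongside.
    criteria = result.get("criteria", [])
    passed = sum(1 for c in criteria if c["verdict"] == "pass")
    return f"rubric: {passed}/{len(criteria)} criteria passed"
```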

03

Task visualizer

Trajectories ships with an embedding-space visualizer for task sets at /visualizer. Every task across your linked datasets gets a t-SNE point; hover to read the instruction, filter by tag or author, and add public tasksets to your view for comparison. Useful for spotting duplicates and coverage gaps before you commit to a new benchmark run.

TASK VISUALIZER
Task embedding scatter plot on trajectories.sh/visualizer
04

Examples to open

Two public trajectories worth clicking around in:

Computer use · NASA Eyes on the Solar System
A Conduit run where Claude Opus 4.6 drives the Eyes on the Solar System web app to find the month between 2005 and 2010 with the most active spacecraft. Good showcase of the step viewer: 200+ frames, scrollable timeline, shareable deep links. Open the trajectory →
Coding · RISC-V Firmware (PR #228)
GPT-5.2 on terminus-2 attempting a PR-sized firmware change from the TBench + Harbor community set. Real codebase, real tests, deterministic pytest grading. Open the trajectory →
05

Get started

Sign in at trajectories.sh, generate an API key from settings, and push your first job. The CLI is published on npm as trajectories-sh.

If you're building a benchmark adjacent to terminal-bench and want tighter integration, pricing, or org-wide private uploads, get in touch.