Today we're releasing trajectories.sh, a hosting platform for agent trajectories. It's the place you go when you want to share a Harbor job with a colleague, embed a failing run in a bug report, or browse what other teams have built.
We built it because the terminal-bench community keeps shipping new benchmarks adjacent to Harbor, and every team ends up rolling its own viewer for the output. Trajectories is the missing shared surface: push once from the CLI, get a permanent URL, share a specific step with anyone.
Upload any Harbor job
Point the CLI at any Harbor-compatible job directory and it hashes, packages, and uploads the whole run in one command.
npx trajectories-sh upload trajectory ./my-run --visibility unlisted
Every upload gets a shareable URL, a deterministic content hash (so you can spot duplicates), and a permanent home alongside the rest of your org's runs. Visibility is per-upload: private for your team, unlisted for link-only sharing, or public to show up on /browse.
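To make the idea of a deterministic content hash concrete, here is a minimal Python sketch. It is not the CLI's actual hashing scheme, which this post does not describe. The function name `directory_content_hash` and the choice of SHA-256 over sorted relative paths plus file bytes are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def directory_content_hash(root: str) -> str:
    """Return a SHA-256 hex digest over relative paths and file bytes under root.

    Hypothetical helper: the real trajectories-sh CLI may hash differently.
    """
    base = Path(root)
    digest = hashlib.sha256()
    # Sort by POSIX-style relative path so traversal order never affects the result.
    files = sorted(
        (p for p in base.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(base).as_posix(),
    )
    for path in files:
        rel = path.relative_to(base).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so path and content boundaries are unambiguous.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Two job directories with identical files produce the same digest no matter what order the files were written in, which is the property that makes duplicate detection possible. Changing a single byte in any file changes the digest.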

Step-by-step viewer
The step viewer renders every trial as a timeline: agent messages and tool calls on the right, the screenshot or terminal output the agent was looking at on the left. Every step has a stable anchor, so a link like /t/<jobId>?trial=X&step=42 opens the exact moment you want someone to see. The social preview card pulls in that specific screenshot too.
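If you want to generate step links programmatically, for example from a CI script that files bug reports, the pattern above can be sketched in Python as follows. The `https://trajectories.sh` base URL and the helper name `step_link` are assumptions for illustration; only the `/t/<jobId>?trial=X&step=42` shape comes from this post.

```python
from urllib.parse import quote, urlencode

# Assumption: the viewer is served from this origin.
BASE_URL = "https://trajectories.sh"


def step_link(job_id: str, trial: str, step: int) -> str:
    """Build a deep link to one step of one trial (hypothetical helper)."""
    # Percent-encode the job id so unusual characters cannot break the path.
    path = f"/t/{quote(job_id, safe='')}"
    # urlencode escapes the query values and preserves insertion order.
    query = urlencode({"trial": trial, "step": step})
    return f"{BASE_URL}{path}?{query}"


print(step_link("abc123", "trial-1", 42))
# prints https://trajectories.sh/t/abc123?trial=trial-1&step=42
```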

Arrow keys navigate, space toggles auto-play, f goes fullscreen. The viewer also surfaces the trial's verifier output: pytest summary for deterministic graders, per-criterion verdicts and the judge's reasoning for rubric graders, plus a clear "grading errored" state when an LLM judge couldn't produce a verdict.
Task visualizer
Trajectories ships with an embedding-space visualizer for task sets at /visualizer. Every task across your linked datasets gets a t-SNE point; hover to read the instruction, filter by tag or author, and add public tasksets to your view for comparison. Useful for spotting duplicates and coverage gaps before you commit to a new benchmark run.

Examples to open
Two public trajectories worth clicking around in:
terminus-2 attempting a PR-sized firmware change from the TBench + Harbor community set. Real codebase, real tests, deterministic pytest grading. Open the trajectory →
Get started
Sign in at trajectories.sh, generate an API key from settings, and push your first job. The CLI source lives at npm / trajectories-sh.
If you're building a benchmark adjacent to terminal-bench and want tighter integration, pricing, or org-wide private uploads, get in touch.
