Changelog

How sem evolved — including the mistakes.

01edf1f feat Initial release

First working version. sem diff with tree-sitter parsing for TypeScript, JavaScript, Python, Go, Rust. JSON/YAML/TOML/CSV/Markdown support via custom parsers. Three-phase entity matching (exact ID, content hash, fuzzy similarity).
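The three matching phases can be sketched roughly as follows. This is a Python illustration only, not sem's actual code: the Entity shape, the match_entities name, the SHA-256 hash, the difflib similarity measure, and the 0.8 threshold are all assumptions made for the example, and it assumes entity IDs are unique within each side.

```python
import hashlib
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Entity:
    id: str       # hypothetical stable ID, e.g. "a.py::f"
    content: str  # source text of the entity


def content_hash(entity):
    return hashlib.sha256(entity.content.encode("utf-8")).hexdigest()


def match_entities(before, after, threshold=0.8):
    """Return (old_id, new_id, phase) triples for entities judged to be the same."""
    matches = []

    # Phase 1: exact ID match.
    new_by_id = {e.id: e for e in after}
    unmatched_old = []
    for old in before:
        new = new_by_id.pop(old.id, None)
        if new is None:
            unmatched_old.append(old)
        else:
            matches.append((old.id, new.id, "exact_id"))
    unmatched_new = list(new_by_id.values())

    # Phase 2: identical content hash (entity moved or renamed, body unchanged).
    new_by_hash = {}
    for e in unmatched_new:
        new_by_hash.setdefault(content_hash(e), []).append(e)
    still_unmatched = []
    for old in unmatched_old:
        bucket = new_by_hash.get(content_hash(old))
        if bucket:
            matches.append((old.id, bucket.pop(0).id, "content_hash"))
        else:
            still_unmatched.append(old)
    unmatched_new = [e for bucket in new_by_hash.values() for e in bucket]

    # Phase 3: fuzzy similarity, keeping the best candidate at or above the threshold.
    for old in still_unmatched:
        best, best_score = None, threshold
        for new in unmatched_new:
            score = SequenceMatcher(None, old.content, new.content).ratio()
            if score >= best_score:
                best, best_score = new, score
        if best is not None:
            unmatched_new.remove(best)
            matches.append((old.id, best.id, "fuzzy"))

    return matches
```

Running the cheap phases first means the quadratic fuzzy comparison only sees entities that neither an ID nor a hash could pair up.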

Node.js + TypeScript • tsup build • better-sqlite3 for storage • ~290ms cold start

Mistake: Node.js startup overhead. V8 startup alone took ~120ms before any real work began, and module resolution plus dynamic imports of the tree-sitter WASM binaries added more on top. For a CLI tool that should feel instant, this was a problem we didn't think about until we started measuring.

9f7f1c7 feat 7 new commands

Added blame, status, watch, review, history, label, comment. Entity-level blame (who last touched each function), semantic PR review with risk signals, real-time file watching, threaded comments on entities.
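Entity-level blame can be thought of as folding ordinary line-level blame over each entity's line range. The sketch below is a Python illustration under that assumption. BlameLine, EntitySpan, and entity_blame are hypothetical names, and "last touched" is taken to mean the newest commit timestamp on any line inside the entity.

```python
from dataclasses import dataclass


@dataclass
class BlameLine:
    line: int        # 1-based line number in the file
    commit: str
    author: str
    timestamp: int   # commit time, seconds since epoch


@dataclass
class EntitySpan:
    name: str
    start: int       # first line of the entity (inclusive)
    end: int         # last line of the entity (inclusive)


def entity_blame(spans, blame_lines):
    """Map each entity name to the most recent BlameLine within its span."""
    by_line = {b.line: b for b in blame_lines}
    result = {}
    for span in spans:
        touched = [by_line[n] for n in range(span.start, span.end + 1) if n in by_line]
        if touched:
            result[span.name] = max(touched, key=lambda b: b.timestamp)
    return result
```

With this shape, a function whose body was edited by one person and whose signature was written by another is attributed to whoever committed most recently.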

+1,191 lines • 12 files • 15 functions, 12 interfaces, 3 variables, 1 class added

fffb38f perf 591ms → 260ms (56% faster)

Parallelized git operations. Replaced sequential simple-git calls with batched diff --name-status + show. Lazy-loaded parsers instead of importing all grammars upfront. Added internal benchmarking harness.
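The lazy-loading idea, sketched in Python rather than the original TypeScript: register a cheap loader per file extension and only pay the grammar's load cost the first time that extension shows up. LazyParserRegistry and its methods are hypothetical names for illustration, not sem's API.

```python
class LazyParserRegistry:
    """Holds loader callables and instantiates each parser on first use only."""

    def __init__(self):
        self._loaders = {}   # extension -> zero-argument callable that builds a parser
        self._loaded = {}    # extension -> parser instance, filled on demand

    def register(self, extension, loader):
        self._loaders[extension] = loader

    def get(self, extension):
        # Load once, then reuse the cached parser for later files.
        if extension not in self._loaded:
            self._loaded[extension] = self._loaders[extension]()
        return self._loaded[extension]

    def loaded_extensions(self):
        return sorted(self._loaded)
```

A diff that only touches Python files then never loads the Go or Rust grammars at all.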

+705 lines • 32 added, 10 modified, 3 deleted entities across 5 types

Mistake: optimizing the wrong layer. We spent a week shaving milliseconds off the Node.js implementation: lazy imports, worker threads, caching git objects. We got from 591ms to 260ms, which felt like a win. But ~120ms of that was still V8 startup, which no amount of application-level optimization could touch. The real fix was switching runtimes entirely.

ae576ab rewrite Rust rewrite: 30ms (10x faster)

Rewrote sem diff as a compiled Rust binary. git2 for in-process git operations (no subprocess spawning), tree-sitter grammars compiled directly into the binary (no WASM, no dynamic loading). Cargo workspace: sem-core (library) + sem-cli (binary). All 7 parser plugins ported.

+3,905 lines • 278 entities • 11 entity types including Rust-specific (struct, impl, trait)

Lesson: where the time actually goes. Node.js breakdown: ~120ms V8 startup, ~80ms module resolution, ~50ms WASM grammar load, ~40ms git subprocess spawn. Rust breakdown: ~3ms binary load, ~5ms git2 repo open, ~17ms parse + match. The entire Rust binary finishes before Node even loads its first module. We should have started in Rust — but building in Node first let us iterate on the algorithm fast, and the Rust port was straightforward because the architecture was already proven.
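The breakdown is easier to compare when summed. A small Python check using only the approximate figures quoted above. Because they are rounded estimates, the totals need not match the headline timings exactly.

```python
# Approximate per-phase costs in milliseconds, copied from the breakdown above.
node_ms = {
    "V8 startup": 120,
    "module resolution": 80,
    "WASM grammar load": 50,
    "git subprocess spawn": 40,
}
rust_ms = {
    "binary load": 3,
    "git2 repo open": 5,
    "parse + match": 17,
}

node_total = sum(node_ms.values())   # overhead on the Node.js side
rust_total = sum(rust_ms.values())   # the whole Rust run

# The Rust run ends before Node's V8 startup alone would have finished.
rust_beats_v8_startup = rust_total < node_ms["V8 startup"]

print(f"Node.js overhead: ~{node_total}ms, Rust total: ~{rust_total}ms")
```

The point is that every Node.js line item is fixed overhead unrelated to parsing or matching, which is why application-level tuning hit a floor.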

bench bench Agent accuracy benchmark

Found that AI agents are 2.3x more accurate at answering questions about code changes when given sem diff JSON instead of raw git diff output. Tested with Claude Sonnet 4.5 across 4 question types and 3 commits of varying size. Core finding: line diffs cause systematic failures. The model confuses lines with entities, can't distinguish adds from modifications, and has no entity type vocabulary.

24 API calls • sem 95.9% avg accuracy vs git 41.5% • +54.4 percentage points
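The headline numbers follow from the two averages. A quick Python check of the arithmetic. The 4 x 3 x 2 split of the 24 calls (question types, commits, input formats) is an assumption that happens to fit the stated totals, not something the benchmark notes spell out.

```python
sem_accuracy = 95.9   # average accuracy with sem diff JSON, in percent
git_accuracy = 41.5   # average accuracy with raw git diff, in percent

# Absolute gap in percentage points, and the relative improvement factor.
delta_points = round(sem_accuracy - git_accuracy, 1)
ratio = round(sem_accuracy / git_accuracy, 1)

# Assumed call breakdown: question types x commits x input formats.
assumed_calls = 4 * 3 * 2

print(f"+{delta_points} points, {ratio}x, {assumed_calls} calls")
```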

next feat Homebrew distribution

brew install sem-diff — single command install via Homebrew tap. Formula builds from source using Cargo, compiles all tree-sitter grammars in, links against system libgit2. Includes a test that verifies entity extraction on a Python function.

3 install methods: Homebrew (recommended) • curl installer • cargo install from source

Lesson: distribution matters as much as the tool. A fast binary that's hard to install doesn't get adopted. We started with npm install -g, then a curl script, and finally Homebrew. Each step removed friction — Homebrew handles updates, dependencies, and uninstall for free.