How sem evolved — including the mistakes.
First working version. sem diff with tree-sitter parsing for TypeScript, JavaScript, Python, Go, Rust.
JSON/YAML/TOML/CSV/Markdown support via custom parsers. Three-phase entity matching (exact ID, content hash, fuzzy similarity).
Node.js + TypeScript • tsup build • better-sqlite3 for storage • ~290ms cold start
Added blame, status, watch, review, history, label, comment.
Entity-level blame (who last touched each function), semantic PR review with risk signals, real-time file watching, threaded comments on entities.
+1,191 lines • 12 files • 15 functions, 12 interfaces, 3 variables, 1 class added
Parallelized git operations. Replaced sequential simple-git calls with batched diff --name-status + show. Lazy-loaded parsers instead of importing all grammars upfront. Added internal benchmarking harness.
+705 lines • 32 added, 10 modified, 3 deleted entities across 5 types
Rewrote sem diff as a compiled Rust binary. git2 for in-process git operations (no subprocess spawning), tree-sitter grammars compiled directly into the binary (no WASM, no dynamic loading).
Cargo workspace: sem-core (library) + sem-cli (binary). All 7 parser plugins ported.
+3,905 lines • 278 entities • 11 entity types including Rust-specific (struct, impl, trait)
Proved that AI agents are 2.3x more accurate at answering questions about code changes when given sem diff JSON vs raw git diff output.
Tested with Claude Sonnet 4.5 across 4 question types and 3 commits of varying size.
Core finding: line diffs cause systematic failures — the model confuses lines with entities, can't distinguish adds from modifications, and has no entity type vocabulary.
24 API calls • sem 95.9% avg accuracy vs git 41.5% • +54.4% delta
brew install sem-diff — single command install via Homebrew tap.
Formula builds from source using Cargo, compiles all tree-sitter grammars in, links against system libgit2.
Includes a test that verifies entity extraction on a Python function.
3 install methods: Homebrew (recommended) • curl installer • cargo install from source
npm install -g, then a curl script, and finally Homebrew.
Each step removed friction — Homebrew handles updates, dependencies, and uninstall for free.