Changelog

How sem evolved — including the mistakes.

f2f57ef feat Full git diff syntax support

sem diff now supports the same positional argument syntax as git diff. Single ref (sem diff HEAD~3), two-dot range (sem diff main..feature), three-dot merge-base (sem diff main...feature), pathspec filtering (sem diff -- src/), and --cached alias for --staged. Also adds --format plain to the Rust CLI. Thanks @taciturnaxolotl for the contribution (#37).

+400 lines • 11 files • new RefToWorking scope, merge-base resolution, pathspec filtering
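
The positional syntax described above can be sketched as a small classifier. This is an illustrative sketch, not sem's actual parser: only the name `RefToWorking` comes from the changelog, while `DiffScope`, `Range`, `MergeBase`, `parse_ref_spec`, and `parse_diff_args` are hypothetical names, and git shorthands such as `main..` (an omitted side meaning HEAD) are not handled.

```rust
/// Hypothetical scope type. Only the name `RefToWorking` appears in the
/// changelog; the other variants are illustrative.
#[derive(Debug, PartialEq)]
enum DiffScope {
    /// `sem diff <ref>`: compare a ref against the working tree.
    RefToWorking(String),
    /// `sem diff a..b`: compare two refs directly.
    Range(String, String),
    /// `sem diff a...b`: compare merge-base(a, b) against b.
    MergeBase(String, String),
}

/// Classifies a single ref argument.
fn parse_ref_spec(spec: &str) -> DiffScope {
    // Check `...` before `..`: every three-dot spec also contains two dots.
    if let Some((a, b)) = spec.split_once("...") {
        DiffScope::MergeBase(a.to_string(), b.to_string())
    } else if let Some((a, b)) = spec.split_once("..") {
        DiffScope::Range(a.to_string(), b.to_string())
    } else {
        DiffScope::RefToWorking(spec.to_string())
    }
}

/// Splits positional args into a scope and pathspecs (everything after `--`).
fn parse_diff_args(args: &[&str]) -> (Option<DiffScope>, Vec<String>) {
    let split = args.iter().position(|a| *a == "--");
    let (refs, paths) = match split {
        Some(i) => (&args[..i], &args[i + 1..]),
        None => (args, &args[args.len()..]),
    };
    let scope = refs.first().map(|spec| parse_ref_spec(spec));
    (scope, paths.iter().map(|p| p.to_string()).collect())
}

fn main() {
    let (scope, paths) = parse_diff_args(&["main...feature", "--", "src/"]);
    println!("{:?} {:?}", scope, paths);
}
```

Testing for three dots first matters because a naive two-dot split would turn `main...feature` into a range whose right side starts with a stray dot.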

30816d4 feat Markdown diff formatting

New --format markdown output for sem diff. Renders entity changes as clean markdown tables with status symbols, useful for PR comments and CI pipelines. Thanks to the contributor of #32.

+200 lines • Rust + TypeScript formatters

8c10b73 fix JS/TS scope boundary extraction

Fixed entity extraction for JavaScript and TypeScript to respect scope boundaries. Local variables inside function bodies were being extracted as top-level entities. Now only extracts declarations at module/class scope. Important correctness fix for weave merge quality. Thanks @c22 (#35).

+300 lines • 7 new tests • entity_extractor.rs, languages.rs

7f781d2 feat Verbose inline content diffs

New sem diff -v flag shows inline before/after content for each modified entity. Lets you see exactly what changed inside a function without switching to git diff.

+70 lines • terminal formatter

807318b feat ERB parser plugin

New parser for ERB templates (.erb) via tree-sitter-embedded-template. Extracts blocks, expressions, and code tags as entities. Handles all ERB variants including <%- dash directives.

+414 lines • 21st language

2e6a8f8 fix Go grouped declaration extraction

Fixed extraction of Go grouped var(), const(), and type() declarations. Previously only extracted the first declaration in each group. Now extracts each individual declaration as a separate entity.

sem-core 0.3.12

070d6d4 feat Published to crates.io + Homebrew

sem-core and sem-cli are now on crates.io. Install with brew install sem-cli or cargo install sem-cli.

v0.3.10 • crates.io + homebrew-core

59f40ed feat v0.3.1: --file-exts flag for language filtering

New --file-exts flag on sem diff, sem graph, and sem impact. Lets you scope analysis to specific languages in multi-language repos. Example: sem graph --file-exts .py to build the dependency graph for Python files only. Accepts extensions with or without the leading dot. First feature shipped from a community request (#1).

+63 lines • 4 files • diff, graph, impact commands • closes #1
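
Accepting extensions with or without the leading dot comes down to normalizing the flag values before comparing them to file paths. A minimal sketch, assuming hypothetical helpers `normalize_exts` and `matches_exts` (neither name is from sem); the case-insensitive comparison is also an assumption, not documented behavior.

```rust
use std::path::Path;

/// Normalizes user-supplied extensions so `.py` and `py` compare equal.
/// Lowercasing is an assumption made for this sketch.
fn normalize_exts(raw: &[&str]) -> Vec<String> {
    raw.iter()
        .map(|e| e.trim_start_matches('.').to_ascii_lowercase())
        .filter(|e| !e.is_empty())
        .collect()
}

/// Returns true if `path` has one of the normalized extensions.
/// An empty filter means "no filtering".
fn matches_exts(path: &str, exts: &[String]) -> bool {
    if exts.is_empty() {
        return true;
    }
    Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| exts.iter().any(|x| x.eq_ignore_ascii_case(e)))
        .unwrap_or(false)
}

fn main() {
    let exts = normalize_exts(&[".py"]);
    for f in &["app.py", "lib.rs"] {
        println!("{} -> {}", f, matches_exts(f, &exts));
    }
}
```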

f1cf19a perf Incremental graph updates

EntityGraph::update_from_changes() incrementally updates the dependency graph when files change, instead of rebuilding from scratch. Handles Added, Modified, Deleted, and Renamed files. Re-extracts only changed files, rebuilds symbol table, re-resolves references. Much faster than full rebuild on large repos when only a few files changed.

+444 lines • 3 files • 4 new tests • 18 total tests passing
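
The incremental idea can be illustrated with a toy model. This is a sketch under heavy assumptions: `ToyGraph`, `FileChange`, and the closure-based `extract` parameter are hypothetical, the real `EntityGraph::update_from_changes()` signature is not shown in this changelog, and reference re-resolution is left out. The point is only that the parser runs for changed files while the symbol table is rebuilt from cached entities.

```rust
use std::collections::HashMap;

/// Hypothetical change kinds, mirroring the four cases named above.
enum FileChange {
    Added(String),
    Modified(String),
    Deleted(String),
    Renamed { from: String, to: String },
}

/// Toy graph: file path -> entity names extracted from that file.
#[derive(Default)]
struct ToyGraph {
    entities_by_file: HashMap<String, Vec<String>>,
    /// Entity name -> file path; rebuilt after every update.
    symbols: HashMap<String, String>,
}

impl ToyGraph {
    /// `extract` stands in for the parser and is only called for changed files.
    fn update_from_changes<F>(&mut self, changes: &[FileChange], mut extract: F)
    where
        F: FnMut(&str) -> Vec<String>,
    {
        for change in changes {
            match change {
                FileChange::Added(p) | FileChange::Modified(p) => {
                    self.entities_by_file.insert(p.clone(), extract(p.as_str()));
                }
                FileChange::Deleted(p) => {
                    self.entities_by_file.remove(p);
                }
                // Assumption: a renamed file is re-extracted under its new path.
                FileChange::Renamed { from, to } => {
                    self.entities_by_file.remove(from);
                    self.entities_by_file.insert(to.clone(), extract(to.as_str()));
                }
            }
        }
        // Rebuild the symbol table from cached entities; no parsing happens here.
        self.symbols.clear();
        for (file, names) in &self.entities_by_file {
            for name in names {
                self.symbols.insert(name.clone(), file.clone());
            }
        }
    }
}

fn main() {
    let mut graph = ToyGraph::default();
    let changes = [
        FileChange::Added("a.rs".to_string()),
        FileChange::Modified("b.rs".to_string()),
        FileChange::Deleted("old.rs".to_string()),
        FileChange::Renamed { from: "a.rs".to_string(), to: "c.rs".to_string() },
    ];
    graph.update_from_changes(&changes, |path| vec![format!("{}::f", path)]);
    println!("{} files, {} symbols", graph.entities_by_file.len(), graph.symbols.len());
}
```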

da9f34f feat sem impact: change impact analysis

New sem impact <entity_name> command shows transitive dependents — what would break if you changed an entity. Uses BFS over the entity dependency graph. Output grouped by file, with --json flag for machine consumption. Shows the blast radius at every depth level: direct dependents, their dependents, and so on.

+177 lines • 4 files • terminal + JSON output
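
The depth-by-depth blast radius can be sketched as a breadth-first search over reverse dependency edges. This is a minimal illustration, not sem's code: `impact_levels` and the map layout (entity name to the names that depend on it) are assumptions made for the example.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns transitive dependents of `start`, grouped by BFS depth
/// (index 0 holds direct dependents, index 1 their dependents, and so on).
fn impact_levels(reverse_deps: &HashMap<&str, Vec<&str>>, start: &str) -> Vec<Vec<String>> {
    let mut seen: HashSet<&str> = HashSet::new();
    seen.insert(start);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::new();
    queue.push_back((start, 0));
    let mut levels: Vec<Vec<String>> = Vec::new();

    while let Some((node, depth)) = queue.pop_front() {
        for &dep in reverse_deps.get(node).map(|v| v.as_slice()).unwrap_or(&[]) {
            // `insert` returns false for already-visited nodes, which also
            // keeps the search from looping on cyclic dependencies.
            if seen.insert(dep) {
                if levels.len() <= depth {
                    levels.push(Vec::new());
                }
                levels[depth].push(dep.to_string());
                queue.push_back((dep, depth + 1));
            }
        }
    }
    levels
}

fn main() {
    let mut reverse_deps: HashMap<&str, Vec<&str>> = HashMap::new();
    reverse_deps.insert("parse_config", vec!["load_app", "run_cli"]);
    reverse_deps.insert("load_app", vec!["main"]);
    reverse_deps.insert("run_cli", vec!["main"]);
    for (depth, names) in impact_levels(&reverse_deps, "parse_config").iter().enumerate() {
        println!("depth {}: {:?}", depth + 1, names);
    }
}
```

Because BFS visits nodes in order of distance, each entity is reported at its shallowest depth, even when it is reachable through several paths.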

e6dd39f feat Structural hash + entity dependency graph

Two major features. Structural hash (Unison-inspired): computes AST-based hash that strips comments and normalizes whitespace. Two entities with the same structural_hash are logically identical even if formatting differs. Used for rename detection. Entity dependency graph: two-pass extraction (entities → symbol table → reference edges). Intra-file and cross-file resolution. Forward + reverse dependency lookup. Transitive impact analysis via BFS. New sem graph CLI command.

+749 lines • 15 files • 14 tests (up from 10) • structural_hash, EntityGraph, sem graph, sem impact

Lesson: two-pass extraction is key. You can't resolve references in a single pass because forward declarations aren't available yet. Building the full symbol table first (name → entity ID), then resolving references in a second pass, is the same approach compilers use. Obvious in hindsight, but the first attempt tried single-pass and produced broken edges for every forward reference.
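
The difference between the two approaches shows up even in a toy model. In this sketch the `Entity` struct and all function names are hypothetical and heavily simplified (sem's real types are not shown in this changelog); the single-pass variant is included only to show the forward-reference failure.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified entity: a name plus the names it references.
struct Entity {
    name: &'static str,
    references: Vec<&'static str>,
}

/// Pass 1: build the symbol table (name -> entity ID) from every entity.
fn build_symbol_table(entities: &[Entity]) -> HashMap<&'static str, usize> {
    entities.iter().enumerate().map(|(id, e)| (e.name, id)).collect()
}

/// Pass 2: resolve each reference against the complete table.
/// Returns (from_id, to_id) edges; unknown names are skipped.
fn resolve_edges(entities: &[Entity]) -> Vec<(usize, usize)> {
    let symbols = build_symbol_table(entities);
    let mut edges = Vec::new();
    for (from, e) in entities.iter().enumerate() {
        for name in &e.references {
            if let Some(&to) = symbols.get(name) {
                edges.push((from, to));
            }
        }
    }
    edges
}

/// For contrast: a single pass only knows entities declared earlier,
/// so a reference to something declared later cannot be resolved.
fn resolve_edges_single_pass(entities: &[Entity]) -> Vec<(usize, usize)> {
    let mut symbols: HashMap<&'static str, usize> = HashMap::new();
    let mut edges = Vec::new();
    for (from, e) in entities.iter().enumerate() {
        for name in &e.references {
            if let Some(&to) = symbols.get(name) {
                edges.push((from, to));
            }
        }
        symbols.insert(e.name, from);
    }
    edges
}

fn main() {
    // `main` references `helper`, which is declared after it.
    let entities = vec![
        Entity { name: "main", references: vec!["helper"] },
        Entity { name: "helper", references: vec![] },
    ];
    println!("two-pass:    {:?}", resolve_edges(&entities));
    println!("single-pass: {:?}", resolve_edges_single_pass(&entities));
}
```
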

01edf1f feat Initial release

First working version. sem diff with tree-sitter parsing for TypeScript, JavaScript, Python, Go, Rust. JSON/YAML/TOML/CSV/Markdown support via custom parsers. Three-phase entity matching (exact ID, content hash, fuzzy similarity).

Node.js + TypeScript • tsup build • better-sqlite3 for storage • ~290ms cold start
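
The three matching phases can be sketched as successive passes, each one only considering entities the previous phases left unmatched. Everything here is an assumption for illustration: the `Entity` fields, the `entity` helper, the token-based Jaccard `similarity` (a stand-in for whatever fuzzy metric sem uses), and the threshold parameter.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical entity record; the field names are assumptions.
#[derive(Debug, Clone)]
struct Entity {
    id: String,
    content_hash: u64,
    content: String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum MatchKind {
    ExactId,
    ContentHash,
    Fuzzy,
}

/// Builds an entity and hashes its content.
fn entity(id: &str, content: &str) -> Entity {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    Entity { id: id.to_string(), content_hash: hasher.finish(), content: content.to_string() }
}

/// Jaccard similarity over whitespace-separated tokens (a stand-in metric).
fn similarity(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0;
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

/// Returns (before_index, after_index, kind) for every matched pair.
fn match_entities(before: &[Entity], after: &[Entity], threshold: f64) -> Vec<(usize, usize, MatchKind)> {
    let mut used_before = vec![false; before.len()];
    let mut used_after = vec![false; after.len()];
    let mut pairs = Vec::new();

    // Phases 1 and 2: the first unused candidate with an equal ID, then hash.
    for &kind in &[MatchKind::ExactId, MatchKind::ContentHash] {
        for (bi, b) in before.iter().enumerate() {
            if used_before[bi] {
                continue;
            }
            let hit = (0..after.len()).find(|&ai| {
                !used_after[ai]
                    && match kind {
                        MatchKind::ExactId => after[ai].id == b.id,
                        _ => after[ai].content_hash == b.content_hash,
                    }
            });
            if let Some(ai) = hit {
                used_before[bi] = true;
                used_after[ai] = true;
                pairs.push((bi, ai, kind));
            }
        }
    }

    // Phase 3: the most similar unused candidate at or above the threshold.
    for (bi, b) in before.iter().enumerate() {
        if used_before[bi] {
            continue;
        }
        let best = (0..after.len())
            .filter(|&ai| !used_after[ai])
            .map(|ai| (ai, similarity(&b.content, &after[ai].content)))
            .filter(|&(_, score)| score >= threshold)
            .max_by(|x, y| x.1.total_cmp(&y.1));
        if let Some((ai, _)) = best {
            used_after[ai] = true;
            pairs.push((bi, ai, MatchKind::Fuzzy));
        }
    }
    pairs
}

fn main() {
    let before = vec![entity("a.ts::load", "return read(path)")];
    let after = vec![entity("a.ts::load", "return read(path, opts)")];
    println!("{:?}", match_entities(&before, &after, 0.6));
}
```

Running the cheap exact phases across all entities before any fuzzy comparison keeps an unambiguous ID match from being claimed by a merely similar entity.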

Mistake: Node.js startup overhead. V8 init alone took ~120ms, and module resolution plus dynamic imports of tree-sitter WASM binaries added more before any real work began. For a CLI tool that should feel instant, this was a problem we didn't think about until we started measuring.

9f7f1c7 feat 7 new commands

Added blame, status, watch, review, history, label, comment. Entity-level blame (who last touched each function), semantic PR review with risk signals, real-time file watching, threaded comments on entities.

+1,191 lines • 12 files • 15 functions, 12 interfaces, 3 variables, 1 class added

fffb38f perf 591ms → 260ms (56% faster)

Parallelized git operations. Replaced sequential simple-git calls with batched diff --name-status + show. Lazy-loaded parsers instead of importing all grammars upfront. Added internal benchmarking harness.

+705 lines • 32 added, 10 modified, 3 deleted entities across 5 types

Mistake: optimizing the wrong layer. We spent a week shaving milliseconds off the Node.js implementation — lazy imports, worker threads, caching git objects. Got from 591ms to 260ms, which felt like a win. But 120ms of that was still V8 startup that no amount of application-level optimization could touch. The real fix was switching runtimes entirely.

ae576ab rewrite Rust rewrite: 30ms (10x faster)

Rewrote sem diff as a compiled Rust binary. git2 for in-process git operations (no subprocess spawning), tree-sitter grammars compiled directly into the binary (no WASM, no dynamic loading). Cargo workspace: sem-core (library) + sem-cli (binary). All 7 parser plugins ported.

+3,905 lines • 278 entities • 11 entity types including Rust-specific (struct, impl, trait)

Lesson: where the time actually goes. Node.js breakdown: ~120ms V8 startup, ~80ms module resolution, ~50ms WASM grammar load, ~40ms git subprocess spawn. Rust breakdown: ~3ms binary load, ~5ms git2 repo open, ~17ms parse + match. The entire Rust binary finishes before Node even loads its first module. We should have started in Rust — but building in Node first let us iterate on the algorithm fast, and the Rust port was straightforward because the architecture was already proven.

bench bench Agent accuracy benchmark

Proved that AI agents are 2.3x more accurate at answering questions about code changes when given sem diff JSON vs raw git diff output. Tested with Claude Sonnet 4.5 across 4 question types and 3 commits of varying size. Core finding: line diffs cause systematic failures — the model confuses lines with entities, can't distinguish adds from modifications, and has no entity type vocabulary.

24 API calls • sem 95.9% avg accuracy vs git 41.5% • +54.4 percentage points

next feat Homebrew distribution

brew install sem-diff — single command install via Homebrew tap. Formula builds from source using Cargo, compiles all tree-sitter grammars in, links against system libgit2. Includes a test that verifies entity extraction on a Python function.

3 install methods: Homebrew (recommended) • curl installer • cargo install from source

Lesson: distribution matters as much as the tool. A fast binary that's hard to install doesn't get adopted. We started with npm install -g, then a curl script, and finally Homebrew. Each step removed friction — Homebrew handles updates, dependencies, and uninstall for free.