How sem evolved — including the mistakes.
sem diff now supports the same positional argument syntax as git diff.
Supported forms: a single ref (sem diff HEAD~3), a two-dot range (sem diff main..feature),
a three-dot merge-base diff (sem diff main...feature), pathspec filtering (sem diff -- src/),
and --cached as an alias for --staged.
Also adds --format plain to the Rust CLI.
Thanks @taciturnaxolotl for the contribution (#37).
+400 lines • 11 files • new RefToWorking scope, merge-base resolution, pathspec filtering
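As a rough illustration of the argument forms listed above, the sketch below classifies positional arguments into a scope. The `DiffScope` type, the `parse_diff_args` function, and the scope names other than RefToWorking are hypothetical and not taken from sem's source; two separate ref arguments are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiffScope:
    # kind is one of: "working", "ref_to_working", "range", "merge_base"
    kind: str
    base: Optional[str] = None
    head: Optional[str] = None
    pathspecs: List[str] = field(default_factory=list)


def parse_diff_args(args: List[str]) -> DiffScope:
    """Classify positional arguments in the style of git diff (assumed rules)."""
    # Everything after a literal "--" is treated as a pathspec.
    if "--" in args:
        split = args.index("--")
        refs, paths = args[:split], args[split + 1:]
    else:
        refs, paths = list(args), []

    if not refs:
        return DiffScope("working", pathspecs=paths)
    if len(refs) > 1:
        raise ValueError("this sketch handles at most one ref argument")

    spec = refs[0]
    # Check "..." before ".." because the former contains the latter.
    if "..." in spec:
        base, head = spec.split("...", 1)
        return DiffScope("merge_base", base, head, paths)
    if ".." in spec:
        base, head = spec.split("..", 1)
        return DiffScope("range", base, head, paths)
    return DiffScope("ref_to_working", base=spec, pathspecs=paths)
```

Testing for the three-dot form first matters: a plain substring check for ".." would also match "main...feature" and misclassify it as a range.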
New --format markdown output for sem diff.
Renders entity changes as clean markdown tables with status symbols, useful for PR comments and CI pipelines.
Thanks to the contributor of #32.
+200 lines • Rust + TypeScript formatters
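A minimal sketch of what a markdown table formatter along these lines could look like. The column layout, the `STATUS_SYMBOLS` mapping, and the `render_markdown` name are assumptions for illustration; the real output of --format markdown may differ.

```python
from typing import Iterable, Tuple

# Hypothetical status symbols; the real formatter may use different ones.
STATUS_SYMBOLS = {"added": "+", "modified": "~", "deleted": "-"}


def render_markdown(changes: Iterable[Tuple[str, str, str, str]]) -> str:
    """Render (status, entity_type, name, file) rows as a markdown table."""
    lines = ["| Status | Type | Entity | File |", "| --- | --- | --- | --- |"]
    for status, entity_type, name, path in changes:
        symbol = STATUS_SYMBOLS.get(status, "?")
        cells = [symbol, entity_type, f"`{name}`", path]
        # Escape pipes so an entity name cannot break the table layout.
        lines.append("| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |")
    return "\n".join(lines)
```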
Fixed entity extraction for JavaScript and TypeScript to respect scope boundaries. Local variables inside function bodies were being extracted as top-level entities. Now only extracts declarations at module/class scope. Important correctness fix for weave merge quality. Thanks @c22 (#35).
+300 lines • 7 new tests • entity_extractor.rs, languages.rs
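The scoping rule can be illustrated by analogy. sem's fix applies to JavaScript and TypeScript via tree-sitter; the sketch below uses Python's standard `ast` module instead, purely to show the idea of recording declarations at module and class scope without descending into function bodies. The `top_level_entities` name is hypothetical.

```python
import ast
from typing import List


def top_level_entities(source: str) -> List[str]:
    """Collect names declared at module or class scope, skipping function bodies."""
    found: List[str] = []

    def visit(body, prefix=""):
        for node in body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Record the function itself but do not descend into its body,
                # so its local variables never become entities.
                found.append(prefix + node.name)
            elif isinstance(node, ast.ClassDef):
                found.append(prefix + node.name)
                visit(node.body, prefix + node.name + ".")
            elif isinstance(node, ast.Assign):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        found.append(prefix + target.id)

    visit(ast.parse(source).body)
    return found
```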
New sem diff -v flag shows inline before/after content for each modified entity.
Lets you see exactly what changed inside a function without switching to git diff.
+70 lines • terminal formatter
New parser for ERB templates (.erb) via tree-sitter-embedded-template.
Extracts blocks, expressions, and code tags as entities. Handles all ERB variants including <%- dash directives.
+414 lines • 21st language
Fixed extraction of Go grouped var(), const(), and type() declarations.
Previously only extracted the first declaration in each group. Now extracts each individual declaration as a separate entity.
sem-core 0.3.12
sem-core and sem-cli are now on crates.io.
Install with brew install sem-cli or cargo install sem-cli.
v0.3.10 • crates.io + homebrew-core
New --file-exts flag on sem diff, sem graph, and sem impact.
Lets you scope analysis to specific languages in multi-language repos.
Example: sem graph --file-exts .py to build the dependency graph for Python files only.
Accepts extensions with or without the leading dot. First feature shipped from a community request (#1).
+63 lines • 4 files • diff, graph, impact commands • closes #1
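Accepting extensions with or without the leading dot comes down to a small normalization step. The helper names below (`normalize_exts`, `filter_files`) are hypothetical, and lowercasing is an assumption of this sketch rather than documented behavior.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def normalize_exts(raw: Iterable[str]) -> Set[str]:
    """Accept 'py' or '.py' and return lowercase, dot-prefixed extensions."""
    cleaned = (ext.strip().lstrip(".").lower() for ext in raw)
    return {"." + ext for ext in cleaned if ext}


def filter_files(paths: Iterable[str], exts: Iterable[str]) -> List[str]:
    """Keep only paths whose suffix matches one of the requested extensions."""
    wanted = normalize_exts(exts)
    if not wanted:
        return list(paths)  # no filter given: keep everything
    return [p for p in paths if PurePosixPath(p).suffix.lower() in wanted]
```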
EntityGraph::update_from_changes() incrementally updates the dependency graph when files change, instead of rebuilding from scratch.
Handles Added, Modified, Deleted, and Renamed files. Re-extracts only changed files, rebuilds symbol table, re-resolves references.
Much faster than full rebuild on large repos when only a few files changed.
+444 lines • 3 files • 4 new tests • 18 total tests passing
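The incremental update described above can be sketched as follows. This is not the EntityGraph implementation: the `Graph` class, the `(kind, path, old_path)` change tuples, and the `extract` callback are assumptions, and entity names are treated as globally unique to keep the example short.

```python
from typing import Callable, Dict, List, Set, Tuple

# A change is (kind, path, old_path); old_path is only used for "renamed".
Change = Tuple[str, str, str]
# extract(path) returns {entity_name: set_of_referenced_names} for one file.
Extractor = Callable[[str], Dict[str, Set[str]]]


class Graph:
    def __init__(self) -> None:
        self.by_file: Dict[str, Dict[str, Set[str]]] = {}
        self.edges: Dict[str, Set[str]] = {}

    def update_from_changes(self, changes: List[Change], extract: Extractor) -> None:
        for kind, path, old_path in changes:
            if kind == "deleted":
                self.by_file.pop(path, None)
            elif kind == "renamed":
                self.by_file.pop(old_path, None)
                self.by_file[path] = extract(path)
            else:  # "added" or "modified": re-extract only this file
                self.by_file[path] = extract(path)
        self._resolve()

    def _resolve(self) -> None:
        # Rebuild the symbol table from cached per-file results, then keep
        # only references that point at a known entity.
        symbols = {name for ents in self.by_file.values() for name in ents}
        self.edges = {
            name: refs & symbols
            for ents in self.by_file.values()
            for name, refs in ents.items()
        }
```

The expensive step (parsing and extraction) runs only for changed files; the symbol table and reference resolution are recomputed from cached per-file results.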
New sem impact <entity_name> command shows transitive dependents — what would break if you changed an entity.
Uses BFS over the entity dependency graph. Output grouped by file, with --json flag for machine consumption.
Shows the blast radius at every depth level: direct dependents, their dependents, and so on.
+177 lines • 4 files • terminal + JSON output
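A minimal sketch of BFS over reverse dependencies, grouped by depth. The `impact` function and the `reverse_deps` mapping (entity name to the set of entities that depend on it) are hypothetical stand-ins for sem's internal graph types.

```python
from collections import deque
from typing import Dict, List, Set


def impact(reverse_deps: Dict[str, Set[str]], start: str) -> List[List[str]]:
    """Return dependents of `start` grouped by BFS depth (first level = direct)."""
    seen = {start}
    levels: List[List[str]] = []
    frontier = deque([start])
    while frontier:
        next_level: List[str] = []
        # Process exactly the nodes that make up the current depth level.
        for _ in range(len(frontier)):
            current = frontier.popleft()
            for dependent in sorted(reverse_deps.get(current, ())):
                if dependent not in seen:
                    seen.add(dependent)
                    next_level.append(dependent)
                    frontier.append(dependent)
        if next_level:
            levels.append(next_level)
    return levels
```

The `seen` set ensures each entity is reported once, at its shallowest depth, and keeps the traversal from looping on cyclic dependencies.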
Two major features. Structural hash (Unison-inspired): computes an AST-based hash that strips comments and normalizes whitespace. Two entities with the same structural_hash are logically identical even if their formatting differs. Used for rename detection.
Entity dependency graph: two-pass extraction (entities → symbol table → reference edges). Intra-file and cross-file resolution. Forward + reverse dependency lookup. Transitive impact analysis via BFS.
New sem graph CLI command.
+749 lines • 15 files • 14 tests (up from 10) • structural_hash, EntityGraph, sem graph, sem impact
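The structural hash idea from this entry can be illustrated with Python's standard `ast` module. This is an analogy only: sem hashes tree-sitter syntax trees, and the function below is a hypothetical sketch, not its algorithm.

```python
import ast
import hashlib


def structural_hash(source: str) -> str:
    """Hash the parsed AST so comments and formatting do not affect the result."""
    tree = ast.parse(source)
    # ast.dump omits line and column attributes by default, and the parser has
    # already discarded comments and whitespace, so only structure is hashed.
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()
```

Two differently formatted copies of the same function produce the same digest, while a change to an operator does not. This sketch also hashes the entity's name; a rename detector would presumably need to mask the name first, which is not shown here.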
First working version. sem diff with tree-sitter parsing for TypeScript, JavaScript, Python, Go, Rust.
JSON/YAML/TOML/CSV/Markdown support via custom parsers. Three-phase entity matching (exact ID, content hash, fuzzy similarity).
Node.js + TypeScript • tsup build • better-sqlite3 for storage • ~290ms cold start
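The three matching phases can be sketched roughly as below. The entity representation, the 0.8 threshold, and the use of `difflib.SequenceMatcher` for similarity are all assumptions made for illustration; the entry does not say which similarity measure sem uses.

```python
import hashlib
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Entity = Tuple[str, str]  # (entity_id, content)


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def match_entities(old: List[Entity], new: List[Entity],
                   threshold: float = 0.8) -> Dict[str, Tuple[str, str]]:
    """Map each matched new entity id to (old entity id, phase that matched)."""
    unmatched_old = dict(old)
    matches: Dict[str, Tuple[str, str]] = {}

    # Phase 1: exact ID.
    for new_id, _ in new:
        if new_id in unmatched_old:
            matches[new_id] = (new_id, "exact_id")
            del unmatched_old[new_id]

    # Phase 2: identical content hash (for example a pure rename or move).
    by_hash = {_digest(content): old_id for old_id, content in unmatched_old.items()}
    for new_id, content in new:
        old_id = by_hash.get(_digest(content))
        if new_id not in matches and old_id in unmatched_old:
            matches[new_id] = (old_id, "content_hash")
            del unmatched_old[old_id]

    # Phase 3: best fuzzy match, accepted only above the threshold.
    for new_id, content in new:
        if new_id in matches or not unmatched_old:
            continue
        best = max(unmatched_old, key=lambda oid: _similarity(unmatched_old[oid], content))
        if _similarity(unmatched_old[best], content) >= threshold:
            matches[new_id] = (best, "fuzzy")
            del unmatched_old[best]
    return matches
```

Running the cheap, unambiguous phases first shrinks the candidate set before the comparatively expensive pairwise similarity pass.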
Added blame, status, watch, review, history, label, comment.
Entity-level blame (who last touched each function), semantic PR review with risk signals, real-time file watching, threaded comments on entities.
+1,191 lines • 12 files • 15 functions, 12 interfaces, 3 variables, 1 class added
Parallelized git operations. Replaced sequential simple-git calls with batched diff --name-status + show. Lazy-loaded parsers instead of importing all grammars upfront. Added internal benchmarking harness.
+705 lines • 32 added, 10 modified, 3 deleted entities across 5 types
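Lazy parser loading, as mentioned in this entry, generally means deferring each grammar import until a file of that type is first seen. The sketch below shows the pattern in Python with a cached factory lookup; the registry, the factory functions, and the `LOADED` list are hypothetical, and the original code was TypeScript.

```python
from functools import lru_cache
from typing import Callable, Dict, List

# Records which parsers were actually built, to make the laziness observable.
LOADED: List[str] = []


def _make_factory(language: str) -> Callable[[], str]:
    def factory() -> str:
        LOADED.append(language)  # stands in for an expensive grammar import
        return f"{language}-parser"
    return factory


# Hypothetical registry: extension -> factory that builds a parser on demand.
FACTORIES: Dict[str, Callable[[], str]] = {
    ".py": _make_factory("python"),
    ".go": _make_factory("go"),
    ".rs": _make_factory("rust"),
}


@lru_cache(maxsize=None)
def parser_for(ext: str) -> str:
    """Build the parser for `ext` on first use and reuse it afterwards."""
    return FACTORIES[ext]()
```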
Rewrote sem diff as a compiled Rust binary. git2 for in-process git operations (no subprocess spawning), tree-sitter grammars compiled directly into the binary (no WASM, no dynamic loading).
Cargo workspace: sem-core (library) + sem-cli (binary). All 7 parser plugins ported.
+3,905 lines • 278 entities • 11 entity types including Rust-specific (struct, impl, trait)
Benchmarked AI agents answering questions about code changes: they were 2.3x more accurate when given sem diff JSON than when given raw git diff output.
Tested with Claude Sonnet 4.5 across 4 question types and 3 commits of varying size.
Core finding: line diffs cause systematic failures — the model confuses lines with entities, can't distinguish adds from modifications, and has no entity type vocabulary.
24 API calls • sem 95.9% avg accuracy vs git 41.5% • +54.4 percentage points
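For reference, the headline ratio and the gap both follow directly from the two averages reported in this entry:

```python
sem_accuracy = 95.9  # percent, from the entry above
git_accuracy = 41.5  # percent, from the entry above

ratio = sem_accuracy / git_accuracy  # relative improvement
delta = sem_accuracy - git_accuracy  # absolute gap in percentage points

print(f"{ratio:.1f}x, +{delta:.1f} points")  # prints "2.3x, +54.4 points"
```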
brew install sem-diff — single command install via Homebrew tap.
Formula builds from source using Cargo, compiles all tree-sitter grammars in, links against system libgit2.
Includes a test that verifies entity extraction on a Python function.
3 install methods: Homebrew (recommended) • curl installer • cargo install from source
Installation went from npm install -g, to a curl script, and finally to Homebrew.
Each step removed friction — Homebrew handles updates, dependencies, and uninstall for free.