Entity-level diffs on top of Git. Structured output for AI agents, CI pipelines, and humans who want more than line numbers.
    $ sem diff

    ┌─ src/auth/login.ts ──────────────────────────────────
    │
    │  ⊕ function validateToken        [added]
    │  ∆ function authenticateUser     [modified]
    │  ⊖ function legacyAuth           [deleted]
    │
    └──────────────────────────────────────────────────────

    ┌─ config/database.yml ────────────────────────────────
    │
    │  ∆ property production.pool_size [modified]
    │    - 5
    │    + 20
    │
    └──────────────────────────────────────────────────────

    Summary: 1 added, 1 modified, 1 deleted across 2 files
Pass --format json for machine-readable output. More commands are coming soon.
    -s, --staged               Show staged changes only
    -c, --commit <sha>         Diff a specific commit
    --from <ref> --to <ref>    Diff a commit range
    -f, --format <fmt>         terminal (default) or json

| Format | Extensions | Entities |
|---|---|---|
| TypeScript | .ts .tsx | functions, classes, interfaces, types, enums |
| JavaScript | .js .jsx .mjs .cjs | functions, classes, variables |
| Python | .py | functions, classes, decorators |
| Go | .go | functions, methods, types |
| Rust | .rs | functions, structs, enums, impls, traits |
| JSON | .json | properties, objects (RFC 6901 paths) |
| YAML | .yml .yaml | sections, properties (dot paths) |
| TOML | .toml | sections, properties |
| CSV | .csv .tsv | rows (first column as ID) |
| Markdown | .md .mdx | heading-based sections |
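
To make the "properties (RFC 6901 paths)" row concrete, here is a minimal Python sketch of how JSON leaves could be addressed by JSON Pointer and compared between two versions of a file. It is an illustration only, not sem's implementation: the function names `escape_token`, `flatten`, and `diff_properties` are hypothetical, and treating only scalar leaves as entities is an assumption.

```python
import json


def escape_token(key: str) -> str:
    # RFC 6901 escaping: "~" becomes "~0" and "/" becomes "~1" ("~" must go first).
    return key.replace("~", "~0").replace("/", "~1")


def flatten(value, pointer=""):
    """Yield (json_pointer, leaf_value) pairs for every scalar in a JSON value.

    Assumption for this sketch: only scalar leaves count as entities, so empty
    objects and arrays produce no entries.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{pointer}/{escape_token(key)}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{pointer}/{index}")
    else:
        yield pointer, value


def diff_properties(before_text: str, after_text: str) -> dict:
    """Compare two JSON documents property by property, keyed by JSON Pointer."""
    before = dict(flatten(json.loads(before_text)))
    after = dict(flatten(json.loads(after_text)))
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in shared if before[p] != after[p]),
    }
```

Because each property is keyed by its path rather than its line, reordering keys or reindenting the file produces no changes in this sketch, which is the main difference from a line diff.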
Three-phase algorithm that detects additions, modifications, deletions, renames, and moves.
Phase 1 — Exact ID
Same entity ID in before/after? Modified or unchanged.
Phase 2 — Content hash
Same SHA-256, different name? Renamed or moved.
Phase 3 — Fuzzy similarity
>80% Jaccard token overlap? Probable rename.
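
The three phases above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not sem's actual code: the `Entity` shape, the word-level tokenizer, the greedy best-match in phase 3, and the change labels are all hypothetical, and `content` is assumed to exclude the entity's own name so that a pure rename keeps the same SHA-256.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str  # e.g. "src/auth.ts::function::validateToken"
    content: str    # entity body; assumed here to exclude the entity's name


def content_hash(entity: Entity) -> str:
    return hashlib.sha256(entity.content.encode("utf-8")).hexdigest()


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over sets of word tokens (tokenizer is an assumption)."""
    ta, tb = set(re.findall(r"\w+", a)), set(re.findall(r"\w+", b))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def match_entities(before, after, threshold=0.8):
    """Return (change_type, old_id, new_id) tuples for two entity lists."""
    changes = []
    old = {e.entity_id: e for e in before}
    new = {e.entity_id: e for e in after}

    # Phase 1: same ID on both sides means modified or unchanged.
    for eid in sorted(old.keys() & new.keys()):
        if old[eid].content != new[eid].content:
            changes.append(("modified", eid, eid))
    removed = [old[k] for k in sorted(old.keys() - new.keys())]
    added = [new[k] for k in sorted(new.keys() - old.keys())]

    # Phase 2: identical SHA-256 under a different ID means renamed or moved.
    by_hash = {}
    for e in added:
        by_hash.setdefault(content_hash(e), []).append(e)
    still_removed = []
    for e in removed:
        candidates = by_hash.get(content_hash(e))
        if candidates:
            changes.append(("renamed_or_moved", e.entity_id, candidates.pop(0).entity_id))
        else:
            still_removed.append(e)
    still_added = [e for group in by_hash.values() for e in group]

    # Phase 3: token overlap above the threshold means a probable rename.
    for e in still_removed:
        best = max(still_added, key=lambda c: jaccard(e.content, c.content), default=None)
        if best is not None and jaccard(e.content, best.content) > threshold:
            still_added.remove(best)
            changes.append(("probable_rename", e.entity_id, best.entity_id))
        else:
            changes.append(("deleted", e.entity_id, None))
    changes.extend(("added", None, e.entity_id) for e in still_added)
    return changes
```

Running the cheap checks first means the fuzzy comparison, which is quadratic in the worst case, only sees entities that neither the ID nor the hash could pair up.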
Structured JSON. Pipe sem into your AI agent, CI pipeline, or automation.
    {
      "summary": {
        "fileCount": 2,
        "added": 1,
        "modified": 1,
        "deleted": 1,
        "total": 3
      },
      "changes": [
        {
          "entityId": "src/auth.ts::function::validateToken",
          "changeType": "added",
          "entityType": "function",
          "entityName": "validateToken",
          "filePath": "src/auth.ts"
        }
      ]
    }
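
As a rough sketch of what a consumer of this output might look like, the Python below groups changes by type and by file. It relies only on the field names visible in the sample above. The helper names `summarize` and `run_sem_diff` are hypothetical, and the subprocess call assumes a `sem` binary on the PATH that prints the JSON document to stdout.

```python
import json
import subprocess
from collections import Counter


def summarize(report_text: str) -> dict:
    """Group the changes in a sem JSON report by change type and by file."""
    report = json.loads(report_text)
    changes = report.get("changes", [])
    by_type = Counter(change["changeType"] for change in changes)
    by_file = {}
    for change in changes:
        label = f'{change["changeType"]} {change["entityType"]} {change["entityName"]}'
        by_file.setdefault(change["filePath"], []).append(label)
    return {"by_type": dict(by_type), "by_file": by_file}


def run_sem_diff() -> str:
    # Assumption: `sem` is installed and writes the JSON report to stdout.
    result = subprocess.run(
        ["sem", "diff", "--format", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A CI step could call `summarize(run_sem_diff())` and, for example, fail the build when `by_type` contains a `deleted` entry for a public function.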
git gives you lines. sem gives you entities — functions, properties, rows, sections.
| Feature | git diff | sem diff |
|---|---|---|
| Diff granularity | lines | entities (functions, classes, properties) |
| Code parsing | no | tree-sitter (TS, Python, Go, Rust, JS) |
| JSON / YAML / TOML | lines | key-path entities |
| CSV | lines | row + cell identity |
| Rename detection | heuristic (file-level) | 3-phase (ID + hash + fuzzy) |
| Machine-readable output | patch format | JSON |
| Agent accuracy | 41.5% avg | 95.9% avg (benchmark) |
| Speed | 20ms | 30ms |
| Adoption | - | single binary, drop into any Git repo |
Real measurements on the sem repo. 20 runs each via hyperfine, median reported. Compiled Rust binary with all tree-sitter grammars built in.
Wall-clock time including process startup. Measured with hyperfine --warmup 3.
Same commit, same repo. sem adds entity-level parsing on top of git's line diff.
Built-in instrumentation via sem diff --profile. Shows where time is spent inside the binary.
CPU time breakdown for sem diff --commit (medium, 7 files).