Entity-level diffs on top of Git. Structured output for AI agents, CI pipelines, and humans who want more than line numbers.
    $ sem diff

    ┌─ src/auth/login.ts ──────────────────────────────────
    │
    │  ⊕ function authenticateUser... 
    └──────────────────────────────────────────────────────
--format json for machine-readable output. More commands coming soon.
    -s, --staged             Show staged changes only
    -c, --commit <sha>       Diff a specific commit
    --from <ref> --to <ref>  Diff a commit range
    -f, --format <fmt>       terminal (default) or json
    --file-exts <ext>...     Only include files with these extensions (e.g. .py .rs)

    --entity <name>          Show dependencies/dependents for a specific entity
    --format <fmt>           terminal (default) or json
    --file-exts <ext>...     Only include files with these extensions

    <entity>                 Name of the entity to analyze
    --json                   Output as JSON
    --file-exts <ext>...     Only include files with these extensions

    <file>                   File to blame
    --json                   Output as JSON

| Format | Extensions | Entities |
|---|---|---|
| TypeScript | .ts .tsx | functions, classes, interfaces, types, enums, exports |
| JavaScript | .js .jsx .mjs .cjs | functions, classes, variables, exports |
| Python | .py | functions, classes, decorated definitions |
| Go | .go | functions, methods, types, vars, consts |
| Rust | .rs | functions, structs, enums, impls, traits, mods, consts |
| Java | .java | classes, methods, interfaces, enums, fields, constructors |
| C | .c .h | functions, structs, enums, unions, typedefs |
| C++ | .cpp .cc .hpp | functions, classes, structs, enums, namespaces, templates |
| C# | .cs | classes, methods, interfaces, enums, structs, properties |
| Ruby | .rb | methods, classes, modules |
| PHP | .php | functions, classes, methods, interfaces, traits, enums |
| Swift | .swift | functions, classes, protocols, structs, enums, properties |
| Elixir | .ex .exs | modules, functions, macros, guards, protocols |
| Bash | .sh | functions |
| HCL/Terraform | .hcl .tf .tfvars | blocks, attributes (qualified names) |
| Kotlin | .kt .kts | classes, interfaces, objects, functions, properties |
| Fortran | .f90 .f95 .f | functions, subroutines, modules, programs |
| Vue | .vue | template/script/style blocks + inner TS/JS entities |
| XML | .xml .plist .svg .csproj | elements (nested, tag-name identity) |
| ERB | .erb | blocks, expressions, code tags |
| JSON | .json | properties, objects (RFC 6901 paths) |
| YAML | .yml .yaml | sections, properties (dot paths) |
| TOML | .toml | sections, properties |
| CSV | .csv .tsv | rows (first column as ID) |
| Markdown | .md .mdx | heading-based sections |
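
The JSON and YAML rows above name two path styles for entity identity: RFC 6901 pointers and dot paths. As a rough illustration of how the two differ, the sketch below flattens a nested mapping into each style. The function names `dot_paths` and `json_pointers` are hypothetical and are not part of sem; lists are treated as leaf values to keep the example short.

```python
from typing import Any, Dict


def dot_paths(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested mappings into {dot.path: leaf value} (YAML/TOML style)."""
    if not isinstance(node, dict):
        return {prefix: node}
    out: Dict[str, Any] = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        out.update(dot_paths(value, path))
    return out


def json_pointers(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into {RFC 6901 pointer: leaf value}."""
    if not isinstance(node, dict):
        return {prefix: node}
    out: Dict[str, Any] = {}
    for key, value in node.items():
        # RFC 6901 escaping: "~" becomes "~0" first, then "/" becomes "~1".
        escaped = str(key).replace("~", "~0").replace("/", "~1")
        out.update(json_pointers(value, f"{prefix}/{escaped}"))
    return out
```

With `{"production": {"pool_size": 20}}` as input, the first function yields the key `production.pool_size` (the form shown in the terminal sample) and the second yields `/production/pool_size`.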
Three-phase algorithm that detects additions, modifications, deletions, renames, and moves.
Phase 1 — Exact ID
Same entity ID in before/after? Modified or unchanged.
Phase 2 — Content hash
Same SHA-256, different name? Renamed or moved.
Phase 3 — Fuzzy similarity
More than 80% Jaccard token overlap? Probable rename.
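
The three phases can be sketched roughly as follows. This is an illustration under stated assumptions, not sem's actual implementation: the `Entity` shape, the regex tokenizer, and the labels `renamed_or_moved` and `probable_rename` are all hypothetical, and the hash is assumed to cover the entity body with its name stripped out so that a pure rename keeps the same hash.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

Change = Tuple[str, Optional[str], Optional[str]]  # (change_type, before_id, after_id)


@dataclass(frozen=True)
class Entity:
    id: str    # hypothetical ID shape, e.g. "src/auth.ts::function::validateToken"
    body: str  # entity source text without its name (assumption)


def body_hash(entity: Entity) -> str:
    return hashlib.sha256(entity.body.encode("utf-8")).hexdigest()


def jaccard(a: Entity, b: Entity) -> float:
    """Token-set overlap: |A ∩ B| / |A ∪ B| over word-like tokens."""
    ta = set(re.findall(r"\w+", a.body))
    tb = set(re.findall(r"\w+", b.body))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def match_entities(before: List[Entity], after: List[Entity],
                   threshold: float = 0.8) -> List[Change]:
    old = {e.id: e for e in before}
    new = {e.id: e for e in after}
    changes: List[Change] = []

    # Phase 1: same ID on both sides means modified or unchanged.
    for eid in sorted(old.keys() & new.keys()):
        if old[eid].body != new[eid].body:
            changes.append(("modified", eid, eid))
    removed = {i: e for i, e in old.items() if i not in new}
    added = {i: e for i, e in new.items() if i not in old}

    # Phase 2: identical content hash under a different ID means renamed or moved.
    removed_by_hash = {body_hash(e): i for i, e in removed.items()}
    for new_id, entity in list(added.items()):
        old_id = removed_by_hash.pop(body_hash(entity), None)
        if old_id is not None:
            changes.append(("renamed_or_moved", old_id, new_id))
            del removed[old_id], added[new_id]

    # Phase 3: best remaining candidate above the Jaccard threshold.
    for new_id, entity in list(added.items()):
        if not removed:
            break
        best = max(removed.values(), key=lambda o: jaccard(o, entity))
        if jaccard(best, entity) > threshold:
            changes.append(("probable_rename", best.id, new_id))
            del removed[best.id], added[new_id]

    # Whatever is still unmatched is a plain deletion or addition.
    changes += [("deleted", i, None) for i in removed]
    changes += [("added", None, i) for i in added]
    return changes
```

Running the cheap exact checks first means the pairwise similarity comparison in phase 3 only sees entities that neither an ID nor a hash could match.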
Structured JSON. Pipe sem into your AI agent, CI pipeline, or automation.
    {
      "summary": {
        "fileCount": 2,
        "added": 1,
        "modified": 1,
        "deleted": 1,
        "total": 3
      },
      "changes": [
        {
          "entityId": "src/auth.ts::function::validateToken",
          "changeType": "added",
          "entityType": "function",
          "entityName": "validateToken",
          "filePath": "src/auth.ts"
        }
      ]
    }
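
A CI step or agent wrapper might consume that report along these lines. This is a minimal sketch that relies only on the field names visible in the sample above; the helper names `count_by_change_type` and `entities_in_file` are hypothetical, and the sample is embedded as a string so the example runs without sem installed.

```python
import json
from collections import Counter
from typing import List

# Copied from the sample report above.
SAMPLE_REPORT = """
{
  "summary": {"fileCount": 2, "added": 1, "modified": 1, "deleted": 1, "total": 3},
  "changes": [
    {
      "entityId": "src/auth.ts::function::validateToken",
      "changeType": "added",
      "entityType": "function",
      "entityName": "validateToken",
      "filePath": "src/auth.ts"
    }
  ]
}
"""


def count_by_change_type(report_text: str) -> Counter:
    """Tally the entries in "changes" by their changeType."""
    report = json.loads(report_text)
    return Counter(change["changeType"] for change in report["changes"])


def entities_in_file(report_text: str, file_path: str) -> List[str]:
    """List the names of changed entities that live in one file."""
    report = json.loads(report_text)
    return [c["entityName"] for c in report["changes"] if c["filePath"] == file_path]
```

In a real pipeline the text would come from the output of `sem diff --format json`, for example read from standard input, rather than from an embedded string.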
git gives you lines. sem gives you entities — functions, properties, rows, sections.
| Feature | git diff | sem diff |
|---|---|---|
| Diff granularity | lines | entities (functions, classes, properties) |
| Code parsing | no | tree-sitter (TS, Python, Go, Rust, JS) |
| JSON / YAML / TOML | lines | key-path entities |
| CSV | lines | row + cell identity |
| Rename detection | heuristic (file-level) | 3-phase (ID + hash + fuzzy) |
| Machine-readable output | patch format | JSON |
| Agent accuracy | 41.5% avg | 95.9% avg (benchmark) |
| Speed | 9ms | 8ms |
| Adoption | - | single binary, drop into any Git repo |
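
To make the "row + cell identity" entry concrete, here is a small sketch of a CSV diff keyed on the first column, as the format table describes. It is an illustration only, not how sem is implemented: the function names and the tuple layout are hypothetical, and both versions are assumed to share the same header row.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

# (change_type, row_id, column, old_value, new_value)
CellChange = Tuple[str, str, Optional[str], Optional[str], Optional[str]]


def index_rows(text: str) -> Tuple[List[str], Dict[str, List[str]]]:
    """Parse CSV text and index data rows by their first column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, {row[0]: row for row in reader if row}


def diff_csv(before: str, after: str) -> List[CellChange]:
    header, old = index_rows(before)
    _, new = index_rows(after)  # assumes the header did not change
    changes: List[CellChange] = []
    for row_id in sorted(old.keys() | new.keys()):
        if row_id not in new:
            changes.append(("deleted", row_id, None, None, None))
        elif row_id not in old:
            changes.append(("added", row_id, None, None, None))
        else:
            # Same row ID on both sides: report each cell that differs.
            for column, was, now in zip(header, old[row_id], new[row_id]):
                if was != now:
                    changes.append(("modified", row_id, column, was, now))
    return changes
```

Because rows are matched by ID rather than by position, reordering rows or inserting one in the middle does not show up as a change to every following line.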
Real measurements on the sem repo. 50 runs each via hyperfine -N, median reported. LTO-optimized Rust binary with xxHash64 and cached tree resolution.
Wall-clock time, no shell overhead. Measured with hyperfine -N --warmup 10 --runs 50 on the sem repo.
Same commit (5 files), same repo. sem adds entity-level parsing on top of git's line diff.
Built-in instrumentation via sem diff --profile. Shows where time is spent inside the binary.
CPU time breakdown for sem diff --commit (large, 13 files).