Know what changed.

Entity-level diffs on top of Git. Structured output for AI agents, CI pipelines, and humans who want more than line numbers.

$ brew install sem-cli
or: cargo install --git https://github.com/Ataraxy-Labs/sem sem-cli
~/project
$ sem diff

┌─ src/auth/login.ts ──────────────────────────────────

   function  validateToken          [added]
   function  authenticateUser       [modified]
   function  legacyAuth             [deleted]

└──────────────────────────────────────────────────────

┌─ config/database.yml ─────────────────────────────────

   property  production.pool_size   [modified]
    - 5
    + 20

└──────────────────────────────────────────────────────

Summary: 1 added, 2 modified, 1 deleted across 2 files

Commands

Use --format json (sem diff, sem graph) or --json (sem impact, sem blame) for machine-readable output. More commands coming soon.

sem diff                     Semantic diff of changes
  -s, --staged               Show staged changes only
  -c, --commit <sha>         Diff a specific commit
  --from <ref> --to <ref>    Diff a commit range
  -f, --format <fmt>         terminal (default) or json
  --file-exts <ext>...       Only include files with these extensions (e.g. .py .rs)

sem graph                    Cross-file entity dependency graph
  --entity <name>            Show dependencies/dependents for a specific entity
  --format <fmt>             terminal (default) or json
  --file-exts <ext>...       Only include files with these extensions

sem impact                   Show what breaks if an entity changes
  <entity>                   Name of the entity to analyze
  --json                     Output as JSON
  --file-exts <ext>...       Only include files with these extensions

sem blame                    Entity-level blame: who last modified each function/class
  <file>                     File to blame
  --json                     Output as JSON
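
For scripting, the sem diff flags listed above can be assembled programmatically. The sketch below is a hypothetical Python helper (the name sem_diff_argv and its parameters are illustrative, not part of sem) that builds an argument list from those flags only. It assumes --from and --to are always passed as a pair, as the listing suggests, and it does not run anything itself.

```python
from typing import List, Optional, Sequence


def sem_diff_argv(
    staged: bool = False,
    commit: Optional[str] = None,
    from_ref: Optional[str] = None,
    to_ref: Optional[str] = None,
    fmt: str = "terminal",
    file_exts: Sequence[str] = (),
) -> List[str]:
    """Build an argv list for `sem diff` from the flags documented above."""
    # Assumption: --from and --to only make sense together.
    if (from_ref is None) != (to_ref is None):
        raise ValueError("--from and --to must be given together")
    if fmt not in ("terminal", "json"):
        raise ValueError("format must be 'terminal' or 'json'")

    argv = ["sem", "diff"]
    if staged:
        argv.append("--staged")
    if commit is not None:
        argv += ["--commit", commit]
    if from_ref is not None and to_ref is not None:
        argv += ["--from", from_ref, "--to", to_ref]
    argv += ["--format", fmt]
    # The variadic flag goes last so it cannot swallow later arguments.
    if file_exts:
        argv += ["--file-exts", *file_exts]
    return argv


print(sem_diff_argv(staged=True, fmt="json"))
```

With sem on PATH, the resulting list could be handed to subprocess.run(argv, capture_output=True, text=True, check=True) and the JSON read from stdout.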

Supported formats

Format          Extensions                Entities
TypeScript      .ts .tsx                  functions, classes, interfaces, types, enums, exports
JavaScript      .js .jsx .mjs .cjs        functions, classes, variables, exports
Python          .py                       functions, classes, decorated definitions
Go              .go                       functions, methods, types, vars, consts
Rust            .rs                       functions, structs, enums, impls, traits, mods, consts
Java            .java                     classes, methods, interfaces, enums, fields, constructors
C               .c .h                     functions, structs, enums, unions, typedefs
C++             .cpp .cc .hpp             functions, classes, structs, enums, namespaces, templates
C#              .cs                       classes, methods, interfaces, enums, structs, properties
Ruby            .rb                       methods, classes, modules
PHP             .php                      functions, classes, methods, interfaces, traits, enums
Swift           .swift                    functions, classes, protocols, structs, enums, properties
Elixir          .ex .exs                  modules, functions, macros, guards, protocols
Bash            .sh                       functions
HCL/Terraform   .hcl .tf .tfvars          blocks, attributes (qualified names)
Kotlin          .kt .kts                  classes, interfaces, objects, functions, properties
Fortran         .f90 .f95 .f              functions, subroutines, modules, programs
Vue             .vue                      template/script/style blocks + inner TS/JS entities
XML             .xml .plist .svg .csproj  elements (nested, tag-name identity)
ERB             .erb                      blocks, expressions, code tags
JSON            .json                     properties, objects (RFC 6901 paths)
YAML            .yml .yaml                sections, properties (dot paths)
TOML            .toml                     sections, properties
CSV             .csv .tsv                 rows (first column as ID)
Markdown        .md .mdx                  heading-based sections
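
The table names two path styles for config entities: dot paths for YAML and RFC 6901 JSON Pointers for JSON. The following is a minimal Python sketch of how a nested mapping could be flattened into either style and compared; it is not sem's implementation. The helper names are hypothetical, the "adapter" key is invented sample data, and the source does not say how sem treats arrays or keys that contain dots, so the sketch handles nested mappings only.

```python
from typing import Any, Callable, Dict, Iterator, Tuple

Keys = Tuple[str, ...]


def leaf_paths(node: Any, prefix: Keys = ()) -> Iterator[Tuple[Keys, Any]]:
    """Yield (key_tuple, value) for every non-dict leaf in a nested dict."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_paths(value, prefix + (str(key),))
    else:
        yield prefix, node


def dot_path(keys: Keys) -> str:
    """YAML-style identity, e.g. production.pool_size."""
    return ".".join(keys)


def json_pointer(keys: Keys) -> str:
    """RFC 6901 pointer: escape "~" as "~0" first, then "/" as "~1"."""
    return "".join("/" + k.replace("~", "~0").replace("/", "~1") for k in keys)


def diff_leaves(before: dict, after: dict, to_path: Callable[[Keys], str]) -> Dict[str, str]:
    """Classify each leaf path as added, deleted, or modified."""
    old = {to_path(k): v for k, v in leaf_paths(before)}
    new = {to_path(k): v for k, v in leaf_paths(after)}
    changes: Dict[str, str] = {}
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes[path] = "added"
        elif path not in new:
            changes[path] = "deleted"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes


before = {"production": {"pool_size": 5, "adapter": "postgresql"}}
after = {"production": {"pool_size": 20, "adapter": "postgresql"}}
print(diff_leaves(before, after, dot_path))      # {'production.pool_size': 'modified'}
print(diff_leaves(before, after, json_pointer))  # {'/production/pool_size': 'modified'}
```

Keying changes by path rather than by line is what lets a reordered or reindented config file report no changes at all.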

Entity matching

Three-phase algorithm that detects additions, modifications, deletions, renames, and moves.

Phase 1: Exact ID            Same entity ID in before/after? Modified or unchanged.
Phase 2: Content hash        Same SHA-256, different name? Renamed or moved.
Phase 3: Fuzzy similarity    >80% Jaccard token overlap? Probable rename.
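
The three phases can be sketched in Python as follows. This is an illustration under stated assumptions, not sem's actual code: the Entity type, the function names, and the result labels are hypothetical; content is assumed to be the entity body without its name (so a pure rename keeps the same hash); tokens are assumed to be runs of word characters; and entities left unmatched after all three phases are assumed to be reported as deleted or added.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entity:
    id: str       # e.g. "src/auth.ts::function::validateToken"
    content: str  # entity body (assumption: excludes the entity's own name)


def content_hash(e: Entity) -> str:
    return hashlib.sha256(e.content.encode("utf-8")).hexdigest()


def tokens(e: Entity) -> frozenset:
    return frozenset(re.findall(r"\w+", e.content))


def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match_entities(before: List[Entity], after: List[Entity],
                   threshold: float = 0.8) -> List[Tuple[str, str, str]]:
    """Return (change_type, before_id, after_id); "" marks a missing side."""
    results: List[Tuple[str, str, str]] = []
    old = {e.id: e for e in before}
    new = {e.id: e for e in after}

    # Phase 1: same ID on both sides means modified or unchanged.
    for eid in sorted(old.keys() & new.keys()):
        changed = old[eid].content != new[eid].content
        results.append(("modified" if changed else "unchanged", eid, eid))
    old_left = [e for e in before if e.id not in new]
    new_left = [e for e in after if e.id not in old]

    # Phase 2: identical SHA-256 under a different ID means renamed or moved.
    by_hash: Dict[str, List[Entity]] = {}
    for e in new_left:
        by_hash.setdefault(content_hash(e), []).append(e)
    still_old: List[Entity] = []
    for e in old_left:
        candidates = by_hash.get(content_hash(e))
        if candidates:
            target = candidates.pop(0)
            new_left.remove(target)
            results.append(("renamed_or_moved", e.id, target.id))
        else:
            still_old.append(e)

    # Phase 3: best Jaccard token overlap above the threshold means probable rename.
    for e in still_old:
        best = max(new_left, key=lambda c: jaccard(tokens(e), tokens(c)), default=None)
        if best is not None and jaccard(tokens(e), tokens(best)) > threshold:
            new_left.remove(best)
            results.append(("probable_rename", e.id, best.id))
        else:
            results.append(("deleted", e.id, ""))
    results.extend(("added", "", e.id) for e in new_left)
    return results


before = [
    Entity("a.ts::function::keep", "return 1"),
    Entity("a.ts::function::oldName", "return x + y"),
    Entity("a.ts::function::parse", "const a = read input then split by comma and trim each part"),
    Entity("a.ts::function::gone", "throw new Error legacy"),
]
after = [
    Entity("a.ts::function::keep", "return 2"),
    Entity("b.ts::function::newName", "return x + y"),
    Entity("a.ts::function::parseLine", "const a = read input then split by comma and trim each item"),
    Entity("a.ts::function::fresh", "console log hello"),
]
for row in match_entities(before, after):
    print(row)
```

Running the cheap exact checks first means the pairwise similarity comparison in phase 3 only sees the entities that the first two phases could not place.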

Output

Structured JSON. Pipe sem into your AI agent, CI pipeline, or automation.

sem diff --format json | jq
{
  "summary": {
    "fileCount": 2,
    "added": 1,
    "modified": 1,
    "deleted": 1,
    "total": 3
  },
  "changes": [
    {
      "entityId": "src/auth.ts::function::validateToken",
      "changeType": "added",
      "entityType": "function",
      "entityName": "validateToken",
      "filePath": "src/auth.ts"
    }
  ]
}
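
A consumer of this output might look like the Python sketch below. It parses a copy of the sample above instead of invoking sem, groups entity IDs by changeType, and splits an entityId into its parts. The function names are hypothetical, and the "path::type::name" layout of entityId is inferred from the single sample shown, not from a documented schema.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple

# Copied from the sample output above; in practice this string would be the
# stdout of `sem diff --format json`.
SAMPLE = """
{
  "summary": {"fileCount": 2, "added": 1, "modified": 1, "deleted": 1, "total": 3},
  "changes": [
    {
      "entityId": "src/auth.ts::function::validateToken",
      "changeType": "added",
      "entityType": "function",
      "entityName": "validateToken",
      "filePath": "src/auth.ts"
    }
  ]
}
"""


def changes_by_type(report: dict) -> Dict[str, List[str]]:
    """Group entity IDs by their changeType field."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for change in report["changes"]:
        grouped[change["changeType"]].append(change["entityId"])
    return dict(grouped)


def split_entity_id(entity_id: str) -> Tuple[str, str, str]:
    """Split "path::type::name" into parts (layout inferred from the sample)."""
    file_path, entity_type, entity_name = entity_id.split("::", 2)
    return file_path, entity_type, entity_name


report = json.loads(SAMPLE)
print(changes_by_type(report))
print(split_entity_id(report["changes"][0]["entityId"]))
```

A CI job could use the same grouping to fail a build when, say, any entity under a protected path shows up as deleted.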

sem vs git

git gives you lines. sem gives you entities — functions, properties, rows, sections.

Feature                  git diff                sem diff
Diff granularity         lines                   entities (functions, classes, properties)
Code parsing             no                      tree-sitter (TS, Python, Go, Rust, JS)
JSON / YAML / TOML       lines                   key-path entities
CSV                      lines                   row + cell identity
Rename detection         heuristic (file-level)  3-phase (ID + hash + fuzzy)
Machine-readable output  patch format            JSON
Agent accuracy           41.5% avg               95.9% avg (benchmark)
Speed                    9ms                     8ms
Adoption                 -                       single binary, drop into any Git repo

Benchmarks

Real measurements on the sem repo. 50 runs each via hyperfine -N, median reported. LTO-optimized Rust binary with xxHash64 and cached tree resolution.

sem diff across scenarios

Wall-clock time, no shell overhead. Measured with hyperfine -N --warmup 10 --runs 50 on the sem repo.

Small commit (1 file)          5ms
Medium commit (5 files)        8ms
Large commit (13 files)        19ms
Range (8 commits, 30 files)    24ms
Full semantic understanding in under 25ms in every scenario. Time grows with file count, from 5ms for 1 file to 24ms for 30 files.

sem diff vs git diff

Same commit (5 files), same repo. sem adds entity-level parsing on top of git's line diff.

git diff (line-level only)    9ms
sem diff (entity-level)       8ms
On this commit, sem diff is 1ms faster than git diff (8ms vs 9ms) while adding semantic parsing, rename detection, and structural hashing.

Internal profiler

Built-in instrumentation via sem diff --profile. Shows where time is spent inside the binary.

Small commit (1 file, 8 entities)

git2 open repo        1.2ms
git diff + content    0.9ms
parse + match         1.8ms
format output         0.1ms
Total (wall)          4.9ms

Large commit (13 files, 65 entities)

git2 open repo            1.2ms
git diff + content        3.6ms
parse + match (parallel)  10.8ms
format output             0.2ms
Total (wall)              19.4ms

Range: 8 commits (30 files, 1383 entities)

git2 open repo            1.2ms
git diff + content        3.8ms
parse + match (parallel)  17.2ms
format output             0.4ms
Total (wall)              24.0ms
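
The profiler figures can be turned into per-stage shares with a little arithmetic. The Python sketch below uses only the numbers reported above; the dictionary layout and function name are illustrative. The listed stages do not add up to the wall total, and the sketch reports that gap as "unaccounted". What the gap consists of (process startup, for instance) is not stated in the profiler output, so treat any interpretation of it as an assumption.

```python
from typing import Dict, Tuple

# Stage timings in milliseconds, copied from the profiler output above.
PROFILES = {
    "small (1 file, 8 entities)": (
        {"git2 open repo": 1.2, "git diff + content": 0.9,
         "parse + match": 1.8, "format output": 0.1}, 4.9),
    "large (13 files, 65 entities)": (
        {"git2 open repo": 1.2, "git diff + content": 3.6,
         "parse + match": 10.8, "format output": 0.2}, 19.4),
    "range (30 files, 1383 entities)": (
        {"git2 open repo": 1.2, "git diff + content": 3.8,
         "parse + match": 17.2, "format output": 0.4}, 24.0),
}


def breakdown(stages: Dict[str, float], total_ms: float) -> Tuple[Dict[str, float], float]:
    """Return each stage's share of wall time (%) and the unaccounted time (ms)."""
    shares = {name: round(100 * ms / total_ms, 1) for name, ms in stages.items()}
    unaccounted = round(total_ms - sum(stages.values()), 1)
    return shares, unaccounted


for label, (stages, total_ms) in PROFILES.items():
    shares, unaccounted = breakdown(stages, total_ms)
    print(f"{label}: parse + match = {shares['parse + match']}% of wall time, "
          f"unaccounted = {unaccounted}ms")
```

On these numbers the parse + match share rises from 36.7% on the small commit to 55.7% on the large commit and 71.7% on the range, while opening the repository stays fixed at 1.2ms.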

Flame graph

CPU time breakdown for sem diff --commit (large, 13 files). Stages shown: git2 repo, git diff + content, tree-sitter parse, entity matching, format output.