Know what changed.

Entity-level diffs on top of Git. Structured output for AI agents, CI pipelines, and humans who want more than line numbers.

$ cargo install --git https://github.com/Ataraxy-Labs/sem --path crates/sem-cli
or: brew install ataraxy-labs/tap/sem
~/project
$ sem diff

┌─ src/auth/login.ts ──────────────────────────────────

   function  validateToken          [added]
   function  authenticateUser       [modified]
   function  legacyAuth             [deleted]

└──────────────────────────────────────────────────────

┌─ config/database.yml ─────────────────────────────────

   property  production.pool_size   [modified]
    - 5
    + 20

└──────────────────────────────────────────────────────

Summary: 1 added, 2 modified, 1 deleted across 2 files

Commands

--format json for machine-readable output. More commands coming soon.

sem diff                     Semantic diff of changes
  -s, --staged               Show staged changes only
  -c, --commit <sha>         Diff a specific commit
  --from <ref> --to <ref>    Diff a commit range
  -f, --format <fmt>         terminal (default) or json

Supported formats

Format      Extensions          Entities
TypeScript  .ts .tsx            functions, classes, interfaces, types, enums
JavaScript  .js .jsx .mjs .cjs  functions, classes, variables
Python      .py                 functions, classes, decorators
Go          .go                 functions, methods, types
Rust        .rs                 functions, structs, enums, impls, traits
JSON        .json               properties, objects (RFC 6901 paths)
YAML        .yml .yaml          sections, properties (dot paths)
TOML        .toml               sections, properties
CSV         .csv .tsv           rows (first column as ID)
Markdown    .md .mdx            heading-based sections
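
To make "properties (dot paths)" concrete, here is a minimal Python sketch of key-path diffing. It is an illustration under assumptions, not sem's implementation: `flatten` and `diff_properties` are hypothetical names, the input is already-parsed YAML represented as nested dicts, and the `adapter` key is made-up sample data.

```python
from typing import Any, Dict, List, Tuple


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Map each leaf of a nested dict to a dot path such as 'production.pool_size'."""
    if not isinstance(node, dict):
        return {prefix: node}
    leaves: Dict[str, Any] = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        leaves.update(flatten(value, path))
    return leaves


def diff_properties(before: dict, after: dict) -> List[Tuple[str, str, Any, Any]]:
    """Return (path, change_type, old_value, new_value) for each differing property."""
    old, new = flatten(before), flatten(after)
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes.append((path, "added", None, new[path]))
        elif path not in new:
            changes.append((path, "deleted", old[path], None))
        elif old[path] != new[path]:
            changes.append((path, "modified", old[path], new[path]))
    return changes


# Hypothetical before/after contents of config/database.yml, already parsed.
before = {"production": {"pool_size": 5, "adapter": "postgres"}}
after = {"production": {"pool_size": 20, "adapter": "postgres"}}
print(diff_properties(before, after))
# [('production.pool_size', 'modified', 5, 20)]
```

Because the path identifies the property, reordering keys or reindenting the file would not register as a change in this sketch, which is the point of diffing entities rather than lines.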

Entity matching

Three-phase algorithm that detects additions, modifications, deletions, renames, and moves.

Phase 1  Exact ID          Same entity ID in before/after? Modified or unchanged.
Phase 2  Content hash      Same SHA-256, different name? Renamed or moved.
Phase 3  Fuzzy similarity  >80% Jaccard token overlap? Probable rename.
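
The three phases can be sketched in a few lines of Python. This is a simplified illustration, not sem's actual code: the `Entity` type, the function names, the word-level tokenizer, the greedy best-match strategy, and the change-type labels are all assumptions made for the example. Only the phase order, SHA-256, and the 80% Jaccard threshold come from the description above.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

Change = Tuple[str, Optional[str], Optional[str]]  # (change_type, old_id, new_id)


@dataclass(frozen=True)
class Entity:
    id: str       # assumed ID shape, e.g. "src/auth.ts::function::validateToken"
    content: str  # source text of the entity


def content_hash(entity: Entity) -> str:
    return hashlib.sha256(entity.content.encode("utf-8")).hexdigest()


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the word-token sets of two strings."""
    ta, tb = set(re.findall(r"\w+", a)), set(re.findall(r"\w+", b))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def match_entities(before: List[Entity], after: List[Entity],
                   threshold: float = 0.8) -> List[Change]:
    old_ids = {e.id for e in before}
    new_by_id = {e.id: e for e in after}
    changes: List[Change] = []

    # Phase 1: same ID on both sides -> modified (unchanged entities are skipped).
    for e in before:
        twin = new_by_id.get(e.id)
        if twin is not None and twin.content != e.content:
            changes.append(("modified", e.id, twin.id))

    old_left = [e for e in before if e.id not in new_by_id]
    new_left = [e for e in after if e.id not in old_ids]

    # Phase 2: identical SHA-256 under a different ID -> renamed or moved.
    for e in list(old_left):
        twin = next((n for n in new_left if content_hash(n) == content_hash(e)), None)
        if twin is not None:
            changes.append(("renamed_or_moved", e.id, twin.id))
            old_left.remove(e)
            new_left.remove(twin)

    # Phase 3: token overlap above the threshold -> probable rename.
    for e in list(old_left):
        best = max(new_left, key=lambda n: jaccard(e.content, n.content), default=None)
        if best is not None and jaccard(e.content, best.content) > threshold:
            changes.append(("probable_rename", e.id, best.id))
            old_left.remove(e)
            new_left.remove(best)

    # Anything still unmatched was deleted or added.
    changes += [("deleted", e.id, None) for e in old_left]
    changes += [("added", None, e.id) for e in new_left]
    return changes
```

Running the cheap exact checks first means the pairwise similarity pass only sees entities that phases 1 and 2 could not place, which keeps the quadratic comparison small in a typical commit.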

Output

Structured JSON. Pipe sem into your AI agent, CI pipeline, or automation.

sem diff --format json | jq
{
  "summary": {
    "fileCount": 2,
    "added": 1,
    "modified": 1,
    "deleted": 1,
    "total": 3
  },
  "changes": [
    {
      "entityId": "src/auth.ts::function::validateToken",
      "changeType": "added",
      "entityType": "function",
      "entityName": "validateToken",
      "filePath": "src/auth.ts"
    }
  ]
}
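
A small Python sketch of consuming this output from a script or CI job. The field names (`changes`, `changeType`, `entityId`) and the CLI flags are taken from the sample and the command list above; the helper names `run_sem_diff` and `entities_by_change` are hypothetical, and the sketch assumes the real output always has the shape of the sample.

```python
import json
import subprocess
from typing import Dict, List


def run_sem_diff(staged: bool = False) -> dict:
    """Run sem and parse its JSON report. Needs sem on PATH and a Git repo as cwd."""
    cmd = ["sem", "diff", "--format", "json"]
    if staged:
        cmd.append("--staged")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def entities_by_change(report: dict) -> Dict[str, List[str]]:
    """Group entity IDs by changeType ("added", "modified", "deleted", ...)."""
    grouped: Dict[str, List[str]] = {}
    for change in report.get("changes", []):
        grouped.setdefault(change["changeType"], []).append(change["entityId"])
    return grouped


# The sample report from above, inlined so the example runs without sem installed.
SAMPLE = """
{
  "summary": {"fileCount": 2, "added": 1, "modified": 1, "deleted": 1, "total": 3},
  "changes": [
    {
      "entityId": "src/auth.ts::function::validateToken",
      "changeType": "added",
      "entityType": "function",
      "entityName": "validateToken",
      "filePath": "src/auth.ts"
    }
  ]
}
"""

report = json.loads(SAMPLE)
print(entities_by_change(report))
# {'added': ['src/auth.ts::function::validateToken']}
```

In a pipeline, swapping `json.loads(SAMPLE)` for `run_sem_diff(staged=True)` and exiting non-zero when the "deleted" group is non-empty would be one way to turn the report into a gate.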

sem vs git

git gives you lines. sem gives you entities — functions, properties, rows, sections.

Feature                  git diff                sem diff
Diff granularity         lines                   entities (functions, classes, properties)
Code parsing             no                      tree-sitter (TS, Python, Go, Rust, JS)
JSON / YAML / TOML       lines                   key-path entities
CSV                      lines                   row + cell identity
Rename detection         heuristic (file-level)  3-phase (ID + hash + fuzzy)
Machine-readable output  patch format            JSON
Agent accuracy           41.5% avg               95.9% avg (benchmark)
Speed                    22ms                    30ms
Adoption                 -                       single binary, drop into any Git repo

Benchmarks

Real measurements on the sem repo. 20 runs each via hyperfine, median reported. Compiled Rust binary with all tree-sitter grammars built in.

sem diff across scenarios

Wall-clock time including process startup. Measured with hyperfine --warmup 3.

Small commit (1 file)               11ms
Large commit (11 files)             22ms
Medium commit (7 files, TS + JSON)  30ms
Range (8 commits, 19 files)         40ms
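Full semantic understanding in 40ms or less across these scenarios. Time tracks the amount of code parsed more than raw file count: the 7-file TS + JSON commit (30ms) takes longer than the 11-file commit (22ms).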

sem diff vs git diff

Same commit, same repo. sem adds entity-level parsing on top of git's line diff.

git diff (line-level only)  22ms
sem diff (entity-level)     30ms
+8ms for full semantic parsing — functions, classes, properties, rename detection.

Internal profiler

Built-in instrumentation via sem diff --profile. Shows where time is spent inside the binary.

Small commit (1 file, 7ms)

git2 open repo      3.23ms
git diff + content  3.26ms
parse + match       0.62ms
format output       0.12ms
Total               7.23ms

Medium commit (7 files, 23ms)

git2 open repo      1.22ms
git diff + content  4.68ms
parse + match       16.78ms
format output       0.35ms
Total               23.05ms

Large commit (11 files, 14ms)

git2 open repo      1.24ms
git diff + content  4.24ms
parse + match       8.83ms
format output       0.11ms
Total               14.42ms

Range — 8 commits (19 files, 31ms)

git2 open repo      1.27ms
git diff + content  6.62ms
parse + match       23.03ms
format output       0.55ms
Total               31.47ms
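
To read the four profiles side by side, it helps to turn the stage timings into shares of the run. The Python sketch below does only that arithmetic on the numbers listed above; `stage_shares` is a hypothetical helper, and shares are taken over the sum of the four listed stages (for the medium commit that sum is 23.03ms, slightly under the reported 23.05ms total).

```python
from typing import Dict

# Stage timings in milliseconds, copied from the profiler output above.
PROFILES: Dict[str, Dict[str, float]] = {
    "small (1 file)": {
        "git2 open repo": 3.23, "git diff + content": 3.26,
        "parse + match": 0.62, "format output": 0.12,
    },
    "medium (7 files)": {
        "git2 open repo": 1.22, "git diff + content": 4.68,
        "parse + match": 16.78, "format output": 0.35,
    },
    "large (11 files)": {
        "git2 open repo": 1.24, "git diff + content": 4.24,
        "parse + match": 8.83, "format output": 0.11,
    },
    "range (8 commits, 19 files)": {
        "git2 open repo": 1.27, "git diff + content": 6.62,
        "parse + match": 23.03, "format output": 0.55,
    },
}


def stage_shares(stages: Dict[str, float]) -> Dict[str, float]:
    """Return each stage's fraction of the summed stage time."""
    total = sum(stages.values())
    return {name: ms / total for name, ms in stages.items()}


for label, stages in PROFILES.items():
    share = stage_shares(stages)["parse + match"]
    print(f"{label:<28} parse + match = {share:.0%} of {sum(stages.values()):.2f}ms")
```

On these numbers, parse + match is under a tenth of the small commit, where opening the repo and reading the diff dominate, and well over half of the other three runs.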

Flame graph

CPU time breakdown for sem diff --commit (medium, 7 files). Segments: git2 repo, git diff + content, tree-sitter parse, entity matching, format output.