# sem

> Entity-level semantic diff on top of Git. Functions, classes, methods instead of lines.

## Overview

sem extends Git with entity-level operations. Instead of tracking lines, sem tracks functions, classes, methods, and types. Uses tree-sitter for parsing and AST-normalized structural hashing to detect cosmetic vs structural changes.

Git tracks lines. Developers think in functions. sem bridges the gap.

## Install

```
brew install sem-cli
```

Or build from source:

```
git clone https://github.com/Ataraxy-Labs/sem
cd sem/crates && cargo install --path sem-cli
```

Binary at `crates/target/release/sem`.

## Commands

### sem diff

Entity-level diff. Shows which functions/classes were added, modified, deleted, or renamed. Distinguishes cosmetic changes (whitespace/formatting) from structural changes (logic).

```
sem diff                           # working changes
sem diff --staged                  # staged only
sem diff --commit abc1234          # specific commit
sem diff --from HEAD~5 --to HEAD   # commit range
sem diff file1.ts file2.ts         # compare two files (no git needed)
sem diff --format json             # JSON output for agents/CI
sem diff --format plain            # git-status style
sem diff --format markdown         # markdown tables
sem diff --stdin --format json     # read file changes from stdin
sem diff --file-exts .py .rs       # filter by extension
sem diff -v                        # verbose inline content diffs
```

### sem blame

Entity-level blame. Who last modified each function/class, not each line.

```
sem blame src/auth.ts
sem blame src/auth.ts --json
```

### sem graph

Cross-file entity dependency graph. Shows what each function calls and what calls it.

```
sem graph
sem graph --entity validateToken
sem graph --file-exts .py
sem graph --format json
sem graph --no-default-excludes
```

### sem impact

Transitive impact analysis. If this entity changes, what else is affected? BFS through dependency graph.

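
The breadth-first traversal described above can be pictured with a small sketch. This is not sem's implementation: it assumes a hypothetical in-memory graph that maps each entity to the entities it calls, inverts that graph, and walks the callers level by level. The function name `transitive_impact`, the `calls` mapping, and every entity name other than `validateToken` are illustrative only.

```python
from collections import deque


def transitive_impact(calls, changed):
    """Return the entities affected by a change to `changed`, in BFS order.

    `calls` is a hypothetical mapping: entity -> list of entities it calls.
    """
    # Invert the call graph: callee -> list of direct callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)

    affected = []
    seen = {changed}  # guards against cycles and duplicate visits
    queue = deque([changed])
    while queue:
        entity = queue.popleft()
        for caller in callers.get(entity, []):
            if caller not in seen:
                seen.add(caller)
                affected.append(caller)
                queue.append(caller)
    return affected


# Illustrative graph, not taken from a real repository.
calls = {
    "handleLogin": ["validateToken", "loadUser"],
    "refreshSession": ["validateToken"],
    "router": ["handleLogin"],
    "loadUser": [],
}
print(transitive_impact(calls, "validateToken"))
# → ['handleLogin', 'refreshSession', 'router']
```

Direct callers come first and indirect callers follow, so the position of an entity in the result gives a rough sense of how far it sits from the change.
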
```
sem impact validateToken
sem impact validateToken --json
sem impact validateToken --file-exts .py
sem impact validateToken --no-default-excludes
```

## Key Features

- 27 languages with full entity extraction via tree-sitter
- Structural hashing: AST-normalized hashes that ignore whitespace, comments, formatting
- Cosmetic vs structural change detection in diff
- Entity-level blame (per function, not per line)
- Cross-file dependency graph via call/reference analysis
- Transitive impact analysis (BFS through dependency graph)
- Three-phase entity matching: exact ID, structural hash (rename detection), fuzzy similarity
- JSON output for AI agents and CI pipelines
- Stdin mode for non-git usage

## Language Support

27 programming languages:

| Language | Extensions | Entity Types |
|----------|-----------|--------------|
| TypeScript | .ts .tsx .mts .cts | functions, classes, interfaces, types, enums, exports |
| JavaScript | .js .jsx .mjs .cjs | functions, classes, variables, exports |
| Python | .py | functions, classes, decorated definitions |
| Go | .go | functions, methods, types, vars, consts |
| Rust | .rs | functions, structs, enums, impls, traits, mods, consts |
| Java | .java | classes, methods, interfaces, enums, fields, constructors |
| C | .c .h | functions, structs, enums, unions, typedefs |
| C++ | .cpp .cc .hpp | functions, classes, structs, enums, namespaces, templates |
| C# | .cs | classes, methods, interfaces, enums, structs, properties |
| Ruby | .rb | methods, classes, modules |
| PHP | .php | functions, classes, methods, interfaces, traits, enums |
| Swift | .swift | functions, classes, protocols, structs, enums, properties |
| Elixir | .ex .exs | modules, functions, macros, guards, protocols |
| Bash | .sh | functions |
| HCL/Terraform | .hcl .tf .tfvars | blocks, attributes (qualified names) |
| Kotlin | .kt .kts | classes, interfaces, objects, functions, properties |
| Fortran | .f90 .f95 .f | functions, subroutines, modules, programs |
| Vue | .vue | template/script/style blocks + inner TS/JS entities |
| XML | .xml .plist .svg .csproj | elements (nested, tag-name identity) |
| ERB | .erb | blocks, expressions, code tags |
| Svelte | .svelte .svelte.js .svelte.ts | component blocks, rune modules + inner JS/TS entities |
| Dart | .dart | classes, mixins, extensions, enums, type aliases, functions |
| Perl | .pl .pm .t | subroutines, packages |
| OCaml | .ml .mli | values, modules, types, classes, externals |
| Scala | .scala .sc .sbt | classes, objects, traits, enums, functions, vals, extensions |
| Nix | .nix | bindings, inherit declarations |
| Zig | .zig | functions, tests, variables |

Plus structured data formats:

| Format | Extensions | Entity Types |
|--------|-----------|--------------|
| JSON | .json | properties, objects (RFC 6901 paths) |
| YAML | .yml .yaml | sections, properties (dot paths) |
| TOML | .toml | sections, properties |
| CSV | .csv .tsv | rows (first column as ID) |
| Markdown | .md .mdx | heading-based sections |

Everything else falls back to chunk-based diffing.

## JSON Output

```
sem diff --format json
```

Returns:

```json
{
  "summary": {
    "fileCount": 2,
    "added": 1,
    "modified": 1,
    "deleted": 1,
    "moved": 0,
    "renamed": 0,
    "reordered": 0,
    "orphan": 0,
    "total": 3
  },
  "changes": [
    {
      "entityId": "src/auth.ts::function::validateToken",
      "changeType": "added",
      "entityType": "function",
      "entityName": "validateToken",
      "startLine": 12,
      "endLine": 18,
      "oldStartLine": null,
      "oldEndLine": null,
      "filePath": "src/auth.ts"
    }
  ]
}
```

The named change-type buckets (added, modified, deleted, moved, renamed, reordered) always sum to total. Orphan is metadata for module-level changes already included in those buckets.

## Architecture

Cargo workspace: sem-core (library) + sem-cli (binary).

- tree-sitter for code parsing (native Rust, not WASM)
- git2 for in-process Git operations
- rayon for parallel file processing
- xxhash for structural hashing
- Plugin system for adding new languages and formats

## Performance

- Small commit (1 file): 5ms
- Medium commit (5 files): 8ms
- Large commit (13 files): 19ms
- Range (8 commits, 30 files): 24ms
- Faster than git diff for equivalent operations

## As a Library

sem-core can be used as a Rust library dependency:

```toml
[dependencies]
sem-core = { git = "https://github.com/Ataraxy-Labs/sem", version = "0.3" }
```

Used by weave (semantic merge driver) and inspect (entity-level code review).

## Links

- GitHub: https://github.com/Ataraxy-Labs/sem
- Website: https://ataraxy-labs.github.io/sem
- License: MIT OR Apache-2.0