# sem

> Semantic version control built on Git. Entity-level diff, blame, graph, and impact analysis.

## Overview

sem extends git with entity-level operations. Instead of tracking lines, sem tracks functions, classes, methods, and types. Uses tree-sitter for parsing and AST-normalized structural hashing to detect cosmetic vs structural changes.

Git tracks lines. Developers think in functions. sem bridges the gap.

## Install

```
cd sem/crates && cargo build --release
```

Binary at `crates/target/release/sem`.

## Commands

### sem diff

Entity-level diff between commits. Shows which functions/classes were added, modified, deleted, or renamed. Distinguishes cosmetic changes (whitespace/formatting) from structural changes (logic).

```
sem diff HEAD~1
sem diff main..feature-branch
sem diff HEAD~1 --file src/auth.ts
```

### sem blame

Entity-level blame. Shows who last modified each function/class, not each line.

```
sem blame src/auth.ts
```

### sem graph

Cross-file entity dependency graph. Shows what each function calls and what calls it.

```
sem graph
sem graph --entity validateToken
sem graph --file-exts .py
```

### sem impact

Transitive impact analysis. If this entity changes, what else is affected? BFS through dependency graph.

```
sem impact validateToken
sem impact validateToken --file-exts .py
```

## Global Flags

### --file-exts

Available on `sem diff`, `sem graph`, and `sem impact`. Filters analysis to only include files with the specified extensions. Useful for multi-language repos where you want to scope to one language.
```
sem diff --file-exts .py .rs
sem graph --file-exts .py
```

## Key Features

- 13 languages: TypeScript, TSX, JavaScript, Python, Go, Rust, Java, C, C++, Ruby, C#, PHP, Fortran
- Structural hashing: AST-normalized hashes that ignore whitespace, comments, and formatting
- Cosmetic vs structural change detection in diff
- Entity-level blame (per function, not per line)
- Cross-file dependency graph via call/reference analysis
- Transitive impact analysis (BFS through dependency graph)
- Incremental graph updates (only re-parse changed files)

## Architecture

Cargo workspace: sem-core (library) + sem-cli (binary).

### sem-core

- Parser plugins for 13 languages via tree-sitter
- Entity extraction: functions, classes, methods, interfaces, types, enums, imports
- Structural hashing: normalize AST, strip whitespace/comments, SHA-256
- Dependency graph: cross-file call/reference tracking via petgraph
- Region extraction: split files into Entity and Interstitial regions

### Language Support

| Language | Extensions | Entity Types |
|----------|-----------|--------------|
| TypeScript | .ts | functions, classes, interfaces, types, enums |
| TSX | .tsx | functions, classes, interfaces, types, enums |
| JavaScript | .js .jsx .mjs .cjs | functions, classes, variables |
| Python | .py | functions, classes, decorators |
| Go | .go | functions, methods, types |
| Rust | .rs | functions, structs, enums, impls, traits, mods |
| Java | .java | classes, methods, interfaces, enums, fields |
| C | .c .h | functions, structs, enums, unions, typedefs |
| C++ | .cpp .cc .cxx .hpp | functions, classes, structs, enums, namespaces, templates |
| Ruby | .rb | methods, classes, modules |
| C# | .cs | methods, classes, interfaces, enums, structs, namespaces |
| PHP | .php | functions, classes, methods, interfaces, traits, enums, namespaces |
| Fortran | .f90 .f95 .f03 .f08 .f .for | functions, subroutines, modules, programs, interfaces |

## Performance

Parallel entity extraction via rayon.
Zero-allocation graph traversal.

- Small commit (1 file): 5ms
- Medium commit (5 files): 8ms
- Large commit (13 files, 65 entities): 19ms
- Range (8 commits, 30 files, 1383 entities): 24ms
- sem diff vs git diff: +9ms overhead for full semantic parsing
- Optimizations: LTO, xxHash64 (replaces SHA-256), cached tree resolution, zero-alloc structural hashing

## Used By

- **weave**: Entity-level merge driver for Git (uses sem-core for entity extraction)
- **inspect**: Entity-level code review CLI (uses sem-core for graph and risk scoring)
- **agenthub**: Agent-native GitHub platform (uses sem-core for code graph)

## Links

- GitHub: https://github.com/Ataraxy-Labs/sem
- License: MIT
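
The transitive impact analysis described under `sem impact` (a BFS through the dependency graph) can be sketched roughly as follows. This is a minimal illustration and not sem-core's code: sem-core tracks dependencies via petgraph, whereas this sketch assumes a plain `HashMap` from each entity to the entities that call it. The `Dependents` alias, the `impacted` and `example_graph` functions, and the entity names other than `validateToken` are all hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical reverse-dependency map: entity name -> entities that call or reference it.
type Dependents = HashMap<String, Vec<String>>;

/// Returns every entity transitively affected by a change to `changed`,
/// in breadth-first order (direct dependents first). The changed entity
/// itself is not included in the result.
fn impacted(dependents: &Dependents, changed: &str) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    let mut order = Vec::new();

    seen.insert(changed);
    queue.push_back(changed);

    while let Some(current) = queue.pop_front() {
        if let Some(callers) = dependents.get(current) {
            for caller in callers {
                // `insert` returns false for entities already visited,
                // so shared callers and cycles are handled once.
                if seen.insert(caller.as_str()) {
                    order.push(caller.clone());
                    queue.push_back(caller.as_str());
                }
            }
        }
    }
    order
}

/// Builds a small made-up graph: two functions call `validateToken`,
/// and both of those are called by `handleRequest`.
fn example_graph() -> Dependents {
    let mut g = Dependents::new();
    g.insert(
        "validateToken".to_string(),
        vec!["login".to_string(), "refreshSession".to_string()],
    );
    g.insert("login".to_string(), vec!["handleRequest".to_string()]);
    g.insert("refreshSession".to_string(), vec!["handleRequest".to_string()]);
    g
}
```

Calling `impacted(&example_graph(), "validateToken")` yields `login`, `refreshSession`, and `handleRequest`, with `handleRequest` reported once even though it is reachable through two callers. Walking the reverse edges (callers, not callees) is what makes the traversal answer "what else is affected?" instead of "what does this depend on?".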