sem-mcp performance

6 MCP tools. Entity-level intelligence for agents.
Real numbers from sem's own codebase (48 Rust files).

6 MCP tools
75% fewer tokens
4.7x faster with caching
2.3x agent accuracy

Token efficiency

How many tokens does an agent need to understand EntityGraph and everything it affects?

Read all 73 source files: ~32,000 tokens
sem_context (8K budget): 8,000 tokens · 121 entities packed
sem_context (4K budget): 4,000 tokens · 44 entities packed
Read just graph.rs: ~3,906 tokens · 1 file, no cross-file deps
sem_context (2K budget): 1,887 tokens · target entity only
sem_context packs the target entity + all dependents + transitive signatures into a token budget. The agent gets the blast radius, not the whole repo. At 8K tokens, it fits 121 entities from across the codebase.
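The packing step can be sketched as a greedy fill against the token budget. This is a minimal illustration, not sem's actual implementation: the entity names, token estimates, and ranking order below are assumptions.

```rust
// Hypothetical sketch: pack ranked entities (target first, then dependents,
// then transitive signatures) until the token budget is exhausted.
// Names and token costs are illustrative, not sem's actual data.

struct Entity {
    name: String,
    tokens: usize, // estimated token cost of including this entity
}

fn pack_context(ranked: &[Entity], budget: usize) -> Vec<String> {
    let mut used = 0;
    let mut packed = Vec::new();
    for e in ranked {
        if used + e.tokens > budget {
            continue; // doesn't fit; a smaller entity later still might
        }
        used += e.tokens;
        packed.push(e.name.clone());
    }
    packed
}

fn main() {
    let ranked = vec![
        Entity { name: "EntityGraph".into(), tokens: 900 },
        Entity { name: "GraphBuilder::build".into(), tokens: 700 },
        Entity { name: "resolve_refs (signature only)".into(), tokens: 300 },
    ];
    // A 1.5K budget fits the target plus one signature, not the full dependent.
    let packed = pack_context(&ranked, 1_500);
    assert_eq!(packed, vec!["EntityGraph", "resolve_refs (signature only)"]);
}
```

The greedy skip-and-continue is what lets a tight budget still pack many small signatures around one large target entity.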

Impact precision

"How many things break if EntityGraph changes?"

grep EntityGraph: 30 string matches (imports, comments, type annotations)
sem_impact (transitive): 304 entities in the transitive dependency chain
grep finds string matches. sem_impact walks the entity dependency graph and finds everything that transitively depends on the target. No hallucination, no false positives, no missed cross-file callers. In 56ms.
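A walk of this kind can be sketched as a breadth-first traversal over reverse dependency edges. This is an illustration of the technique only; the map shape, entity names, and function are assumptions, not sem's internals.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical sketch: `dependents` maps an entity to the entities that
// directly depend on it; a BFS collects everything transitively affected.
fn transitive_impact(dependents: &HashMap<&str, Vec<&str>>, target: &str) -> HashSet<String> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([target]);
    while let Some(node) = queue.pop_front() {
        for &dep in dependents.get(node).into_iter().flatten() {
            if seen.insert(dep.to_string()) {
                queue.push_back(dep); // visit each dependent exactly once
            }
        }
    }
    seen
}

fn main() {
    let mut dependents = HashMap::new();
    dependents.insert("EntityGraph", vec!["GraphBuilder", "sem_impact"]);
    dependents.insert("GraphBuilder", vec!["sem_context"]);
    let impacted = transitive_impact(&dependents, "EntityGraph");
    // sem_context is reached only transitively, through GraphBuilder.
    assert_eq!(impacted.len(), 3);
    assert!(impacted.contains("sem_context"));
}
```

Because the traversal follows actual dependency edges rather than text, a caller two files away is found even when it never mentions the target by name.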

Test targeting

"Which tests should I run after changing EntityGraph?"

cargo test (run everything): 44 tests
sem_impact(mode="tests"): 24 tests that actually depend on EntityGraph
Impact analysis filtered to test entities. Agents know exactly which tests matter for their change. 45% fewer tests to run, zero guessing.
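The mode="tests" behavior amounts to the same transitive walk followed by a filter on entity kind. A minimal sketch, assuming a hypothetical Kind enum and entity names that are not sem's actual data model:

```rust
// Hypothetical sketch: mode="tests" is impact analysis plus a filter on
// entity kind. Kind and Entity are illustrative stand-ins.

#[derive(PartialEq)]
enum Kind {
    Function,
    Test,
}

struct Entity {
    name: &'static str,
    kind: Kind,
}

fn impacted_tests(impacted: Vec<Entity>) -> Vec<&'static str> {
    impacted
        .into_iter()
        .filter(|e| e.kind == Kind::Test) // keep only test entities
        .map(|e| e.name)
        .collect()
}

fn main() {
    let impacted = vec![
        Entity { name: "GraphBuilder::build", kind: Kind::Function },
        Entity { name: "test_graph_roundtrip", kind: Kind::Test },
        Entity { name: "test_impact_transitive", kind: Kind::Test },
    ];
    assert_eq!(
        impacted_tests(impacted),
        vec!["test_graph_roundtrip", "test_impact_transitive"]
    );
}
```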

Agent accuracy

Same questions about code changes, answered by Claude Sonnet 4.5. One gets sem diff JSON, the other gets raw git diff.

Question (metric): sem diff vs git diff
Q1: List added functions (F1): 93% vs 75%
Q2: Files with modified entities (F1): 100% vs 55%
Q3: Entity type counts (accuracy): 91% vs 13%
Q4: Added/modified/deleted counts (exact): 100% vs 22%
Overall average: 96% vs 41%
+131% improvement with structured entity diffs.

See detailed findings and failure modes →

Graph caching

Agents call multiple graph tools per session. Without caching, each tool rebuilds the entity graph from scratch. With caching, the first call builds it once and every subsequent call reuses it.

6 tool calls in sequence: entities + diff + blame + impact + log + context, 5 of which need the entity graph.

Without caching (5 rebuilds, separate processes): 495ms
With caching (1 build + 4 memory-cache hits, one session): 106ms
4.7x faster. 389ms saved per session.
Cold start (no cache): 114ms
Warm start (SQLite): 96ms
Avg per call (cached): 21ms
Two cache layers: in-memory (keyed by file manifest hash) and SQLite at .sem/cache.db. Memory cache serves sequential calls in the same session. SQLite cache survives process restarts. If any file's mtime changes, the cache is invalidated and the graph rebuilds.
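The keying and invalidation scheme can be sketched as hashing the (path, mtime) manifest into a cache key. Illustrative only: the hasher choice, the String stand-in for the graph, and the GraphCache type are assumptions, not sem's implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical sketch: the key hashes every (path, mtime) pair, so touching
// any file yields a new key and a stale graph is never served.
fn manifest_key(files: &[(&str, u64)]) -> u64 {
    let mut h = DefaultHasher::new();
    for (path, mtime) in files {
        path.hash(&mut h);
        mtime.hash(&mut h);
    }
    h.finish()
}

struct GraphCache {
    memory: HashMap<u64, String>, // manifest key -> built graph (stand-in type)
}

impl GraphCache {
    // First call with a key builds; later calls with the same key hit cache.
    fn get_or_build(&mut self, key: u64, build: impl FnOnce() -> String) -> &String {
        self.memory.entry(key).or_insert_with(build)
    }
}

fn main() {
    let before = manifest_key(&[("graph.rs", 100), ("lib.rs", 100)]);
    let after = manifest_key(&[("graph.rs", 101), ("lib.rs", 100)]); // mtime bump
    assert_ne!(before, after); // any file change invalidates the key

    let mut cache = GraphCache { memory: HashMap::new() };
    cache.get_or_build(before, || "graph v1".to_string());
    let hit = cache.get_or_build(before, || unreachable!("cache hit expected"));
    assert_eq!(hit, "graph v1");
}
```

Keying by the whole manifest, rather than tracking dirty files individually, trades incremental rebuilds for a check that is trivially correct: any change anywhere produces a different key.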

Latency (per tool)

Every tool, measured against sem's own codebase (48 Rust files). Single-process cold calls.

sem_entities: 30ms
sem_diff: 47ms
sem_blame: 40ms
sem_log: 85ms
sem_impact: 120ms
sem_context: 136ms

Cold per-process calls. In a live session with graph caching, graph tools (impact, context) average 21ms after the first call.