The same questions about code changes were put to Claude Sonnet 4.5 twice: one run received sem diff JSON, the other received raw git diff output. Answers were scored against ground truth across 3 commits (small, medium, large).
Tested on 3 commits from this repo. Each exposes a different failure mode when agents reason about raw line diffs.
git diff has no concept of "entity." When asked to count added entities, the model counts + lines instead.
On the speed optimization commit (fffb38f), git-based Claude reported 238 added entities, which is the number of + lines in the diff.
The actual count is 32 added entities. On the Rust rewrite (ae576ab), it reported 1,122 against a true count of 259.
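The gap between the two counts can be sketched in a few lines of Python. This is a minimal illustration, not sem's implementation: the top-level "changes" key and the "entityName" field are assumptions about the JSON shape, and the diff and file name are made up. Only changeType comes from the text above.

```python
import json

def count_plus_lines(diff_text: str) -> int:
    """Count added lines in a unified diff, skipping the '+++' file header."""
    return sum(
        1
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )

def count_added_entities(sem_json: str) -> int:
    """Count records tagged changeType == 'added'."""
    changes = json.loads(sem_json)["changes"]  # "changes" key is an assumption
    return sum(1 for c in changes if c["changeType"] == "added")

# Made-up diff: one new function spread over four + lines.
diff_text = """\
--- a/src/util.ts
+++ b/src/util.ts
@@ -1,1 +1,5 @@
 export const VERSION = "1";
+export function parse(input: string): string[] {
+  const parts = input.split(",");
+  return parts.map((p) => p.trim());
+}
"""

# Made-up entity-level view of the same change ("entityName" is hypothetical).
sem_json = json.dumps({
    "changes": [
        {"entityType": "function", "entityName": "parse", "changeType": "added"}
    ]
})

print(count_plus_lines(diff_text))      # 4
print(count_added_entities(sem_json))   # 1
```

One new function yields four + lines but a single added entity, which is the same kind of inflation seen in the 238 vs 32 result.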
Line diffs have no AST. On commit 9f7f1c7 (7 new commands), git-based Claude returned {"file": 11}: it counted files, not entities.
Truth: {"interface": 12, "function": 15, "variable": 3, "class": 1}.
On the Rust rewrite, it found 16 functions when there are 87, and completely missed three entity types: chunk (80), property (29), and impl (10).
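With entity-level records, the per-type breakdown is a simple tally. The sketch below assumes each record is a dict carrying entityType and changeType (the record shape is an assumption), and builds synthetic records that match the ground-truth counts reported for 9f7f1c7.

```python
from collections import Counter

def added_by_type(changes):
    """Tally added entities per entityType, ignoring other change types."""
    return dict(
        Counter(c["entityType"] for c in changes if c["changeType"] == "added")
    )

# Ground-truth breakdown for 9f7f1c7, as reported above.
truth = {"interface": 12, "function": 15, "variable": 3, "class": 1}

# Synthetic records matching those counts (real records would carry more fields).
changes = [
    {"entityType": entity_type, "changeType": "added"}
    for entity_type, count in truth.items()
    for _ in range(count)
]
# One modified function, to show that it is excluded from the tally.
changes.append({"entityType": "function", "changeType": "modified"})

print(added_by_type(changes))
```

A raw line diff offers nothing to group by except file paths, which is consistent with the {"file": 11} answer.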
In a line diff, a modified function shows + and - hunks just as a new function in a changed file does, so the two are easy to confuse. On fffb38f, git-based Claude listed 9 "added" functions, of which 4 were actually modified (detectJsonChanges, parseDiffNameStatus, detectAndGetFiles, populateContent).
Precision dropped to 55.6% (5 of 9 correct). sem tags each entity with changeType: "added" vs changeType: "modified".
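The 55.6% figure follows directly from the counts: 9 functions listed, 4 of them modified rather than added, leaving 5 correct. A small worked check is below. The four modified names are from the text; the five correctly identified functions are not named in the source, so the added_fn_N names are placeholders.

```python
def precision(predicted: set, truth: set) -> float:
    """Share of predicted items that are actually in the ground truth."""
    return len(predicted & truth) / len(predicted) if predicted else 0.0

# Functions the git-based run mislabeled as added (named in the text).
actually_modified = {
    "detectJsonChanges",
    "parseDiffNameStatus",
    "detectAndGetFiles",
    "populateContent",
}
# Placeholder names for the 5 correct answers (hypothetical, not from the repo).
truly_added = {f"added_fn_{i}" for i in range(1, 6)}

predicted = actually_modified | truly_added   # the 9 "added" functions listed
print(f"{precision(predicted, truly_added):.1%}")   # 55.6%
```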
JSON/YAML/TOML changes appear as raw +/- key-value lines. The model doesn't classify these as "entities."
On fffb38f, git-based Claude failed to list package.json and package-lock.json among the files containing modified entities (recall dropped to 66.7%).
sem reports entityType: "property" for each changed key.
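To make the idea of a changed key as an entity concrete, here is a rough sketch that compares two versions of a JSON file and emits one property record per changed key. It is not how sem works internally. The dotted key paths, the "entityName" field, and the "deleted" change type are assumptions; the text above only confirms entityType: "property" and the changeType values "added" and "modified".

```python
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"a.b": value} form (lists are kept as values)."""
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

def property_changes(before: str, after: str) -> list:
    """Report each changed key as one 'property' entity."""
    old, new = flatten(json.loads(before)), flatten(json.loads(after))
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            kind = "added"
        elif key not in new:
            kind = "deleted"      # assumed label, not confirmed by the text
        elif old[key] != new[key]:
            kind = "modified"
        else:
            continue              # unchanged key, nothing to report
        changes.append(
            {"entityType": "property", "entityName": key, "changeType": kind}
        )
    return changes

# Made-up package.json contents for illustration.
before = '{"name": "demo", "version": "0.1.0", "scripts": {"build": "tsc"}}'
after = '{"name": "demo", "version": "0.2.0", "scripts": {"build": "tsc", "test": "vitest"}}'

for change in property_changes(before, after):
    print(change)
```

Here the version bump surfaces as a modified property and scripts.test as an added one, rather than as anonymous +/- lines.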
Both tools degrade on the 3,905-line Rust rewrite (ae576ab). git diff was truncated at 100KB — the model found 25/67 added functions (37% recall).
sem's stripped JSON is much more compact (no source code), so the model saw all 278 entities but still only extracted 43/67 (64% recall).
sem wins, but large diffs are where attention limits hurt regardless of format.
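The recall figures for the large commit can be checked from the reported counts alone:

```python
def recall(found: int, total: int) -> float:
    """Share of ground-truth items that the model recovered."""
    return found / total

TOTAL_ADDED_FUNCTIONS = 67  # ground truth for ae576ab, as reported above

git_recall = recall(25, TOTAL_ADDED_FUNCTIONS)
sem_recall = recall(43, TOTAL_ADDED_FUNCTIONS)

print(f"git diff: {git_recall:.0%}")   # 37%
print(f"sem:      {sem_recall:.0%}")   # 64%
```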
3 commits × 4 questions × 2 tools = 24 API calls. Claude Sonnet 4.5, temperature 0. Content fields stripped from sem JSON for fair comparison.
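The experiment grid can be enumerated as a Cartesian product. The commit hashes are the ones cited above; the question labels are hypothetical shorthand inferred from the failure modes discussed, not the exact prompts used.

```python
from itertools import product

commits = ["fffb38f", "9f7f1c7", "ae576ab"]
# Hypothetical labels for the 4 questions; the real prompt wording is not shown here.
questions = [
    "count_added_entities",
    "entity_type_breakdown",
    "list_added_functions",
    "files_with_modified_entities",
]
tools = ["sem", "git"]

# One API call per (commit, question, tool) combination.
runs = list(product(commits, questions, tools))
print(len(runs))   # 24
```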