Real-world and synthetic merge benchmarks. Reproduce with weave bench-repo <path>.
How the benchmarks work.
Across 4,517 file merges from 5 repos, weave resolves 83 merges that git cannot. It introduces 0 regressions in the C, Python, and Go repos and 3 in TypeScript.
| Repository | Language | Files Tested | Both Clean | Weave Wins | Both Conflict | Regressions | Human Match |
|---|---|---|---|---|---|---|---|
| git/git | C | 1,319 | 1,009 | 39 | 271 | 0 | 64% |
| Flask | Python | 56 | 30 | 14 | 12 | 0 | 57% |
| CPython | C / Python | 256 | 201 | 7 | 48 | 0 | 29% |
| Go | Go | 1,247 | 1,000 | 19 | 228 | 0 | 58% |
| TypeScript | TypeScript | 1,639 | 1,340 | 4 | 292 | 3 | 75% |
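The table can be cross-checked mechanically. The sketch below copies the counts from the rows above into Python and verifies that the four outcome columns in each row add up to Files Tested. The `Row` type and its field names are illustrative only; they are not weave's output format.

```python
from typing import NamedTuple


class Row(NamedTuple):
    repo: str
    files: int
    both_clean: int
    wins: int
    both_conflict: int
    regressions: int


# Counts copied from the results table.
ROWS = [
    Row("git/git", 1319, 1009, 39, 271, 0),
    Row("Flask", 56, 30, 14, 12, 0),
    Row("CPython", 256, 201, 7, 48, 0),
    Row("Go", 1247, 1000, 19, 228, 0),
    Row("TypeScript", 1639, 1340, 4, 292, 3),
]


def row_is_consistent(row: Row) -> bool:
    """Every tested file merge falls into exactly one outcome column."""
    outcomes = row.both_clean + row.wins + row.both_conflict + row.regressions
    return outcomes == row.files


total_wins = sum(r.wins for r in ROWS)
total_regressions = sum(r.regressions for r in ROWS)

for r in ROWS:
    print(f"{r.repo:<12} consistent={row_is_consistent(r)}")
print("total wins:", total_wins, "| total regressions:", total_regressions)
```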
git/git: the git source code itself. 1,319 file merges from 500 merge commits, mostly C header and source files.
25 of 39 wins produce output identical to the human merge. The remaining 14 differ in entity ordering (e.g. weave places a struct above a function where the human placed it below). These are stylistic differences, not semantic errors.
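One way to separate ordering-only differences from content differences is to compare the two outputs as unordered collections of entities. The sketch below is an illustration, not weave's matching logic: it assumes, for simplicity, that top-level entities are separated by blank lines, and `split_entities` and `differs_only_in_order` are hypothetical helper names.

```python
from collections import Counter


def split_entities(source: str) -> list[str]:
    """Split source text into blank-line-separated blocks (a simplifying assumption)."""
    blocks = [block.strip() for block in source.split("\n\n")]
    return [block for block in blocks if block]


def differs_only_in_order(weave_out: str, human_out: str) -> bool:
    """True when both outputs hold the same entities but in a different order."""
    weave_entities = split_entities(weave_out)
    human_entities = split_entities(human_out)
    same_order = weave_entities == human_entities
    same_content = Counter(weave_entities) == Counter(human_entities)
    return same_content and not same_order


# The human placed the function first; this output places the struct first.
human = "int f(void) { return 0; }\n\nstruct point { int x; int y; };\n"
weave = "struct point { int x; int y; };\n\nint f(void) { return 0; }\n"

print(differs_only_in_order(weave, human))
```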
Common win patterns: both branches add different extern declarations to a header; both branches add functions to different sections of a .c file; both branches change an import block in ways that git sees as overlapping lines.
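The first pattern shows why merging by entity helps. When both branches insert a different line at the same position, a line-based merge sees overlapping edits; keyed by entity name, the two additions are independent. The following is a minimal sketch of a three-way merge over named entities, not weave's actual algorithm. The dict-based representation and the `merge_entities` function are assumptions made for illustration.

```python
from typing import Optional


def merge_entities(
    base: dict[str, str], ours: dict[str, str], theirs: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Three-way merge of dicts mapping entity name -> source text.

    Returns the merged entities and the names that genuinely conflict.
    """
    merged: dict[str, str] = {}
    conflicts: list[str] = []
    # Keep our ordering, then append entities that exist only in theirs.
    names = list(ours) + [name for name in theirs if name not in ours]
    for name in names:
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        result: Optional[str]
        if o == t:
            result = o  # both sides agree
        elif o == b:
            result = t  # only theirs changed (or deleted) it
        elif t == b:
            result = o  # only ours changed (or added) it
        else:
            conflicts.append(name)  # both sides changed it differently
            continue
        if result is not None:
            merged[name] = result
    return merged, conflicts


base = {"foo": "extern int foo(void);"}
ours = {**base, "bar": "extern int bar(void);"}
theirs = {**base, "baz": "extern int baz(void);"}

merged, conflicts = merge_entities(base, ours, theirs)
print(list(merged), conflicts)
```

In this sketch, a conflict is reported only when both branches change the same named entity in different ways, which corresponds to the Both Conflict column.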
Flask: a Python web framework. 56 file merges from 500 merge commits. Highest resolution rate of all tested repos (14 of 26 git conflicts, 54%).
Flask's codebase is well structured, with clear function and class boundaries, which suits entity-level merge. Weave resolves over half of git's conflicts. Common patterns: both branches modifying different methods in app.py, and import additions to __init__.py.
CPython: the Python interpreter. 256 file merges from 500 merge commits. Mix of C source and Python test files.
The human match rate is lower here (29%, 2 of 7 wins) because CPython's C code relies heavily on macros and preprocessor directives, which lead to entity ordering differences. The wins themselves are clean: header file declarations and test method additions on which git reports false conflicts.
Go: the Go compiler and standard library. 1,247 file merges from 500 merge commits.
Go's explicit structure (top-level functions, clear type declarations) works well with entity-level merge: 11 of 19 wins (58%) match the human merge. Common patterns: both branches adding different functions, and struct field additions in different types.
TypeScript: the TypeScript compiler. 1,639 file merges from 500 merge commits. Highest human match rate, but 3 regressions.
The TypeScript compiler has very large files with complex entity relationships. The 3 regressions are under investigation. The 75% human match rate (3 of 4 wins, the highest of all repos) suggests that when weave does resolve a conflict, its output closely matches developer intent.
What the numbers mean.
| Term | Definition |
|---|---|
| Files Tested | Number of individual file merges where both branches touched the same file (both-touched files across all merge commits). |
| Both Clean | Both git and weave merged cleanly. No conflict from either tool. |
| Win | Git produced a conflict, but weave resolved cleanly. A false conflict eliminated. |
| Both Conflict | Both git and weave produced conflicts. A real semantic collision that requires human judgment. |
| Regression | Git merged cleanly, but weave produced a different result than the human. Weave introduced an error where git was fine. |
| Human Match | Percentage of wins whose output is identical to the merge the developer actually committed. Higher means weave's merge matches human intent more often. |
| Resolution Rate | Wins / (Wins + Both Conflict). The percentage of git's conflicts that weave eliminates. |
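As a worked example of the two derived metrics, the sketch below applies the definitions to the counts in the results table. The function names are illustrative. The number of matching wins (25) is stated only for git/git, so Human Match is computed for that repo alone.

```python
def resolution_rate(wins: int, both_conflict: int) -> float:
    """Wins / (Wins + Both Conflict): share of git's conflicts that weave eliminates."""
    git_conflicts = wins + both_conflict
    return wins / git_conflicts if git_conflicts else 0.0


def human_match(matching_wins: int, wins: int) -> float:
    """Share of wins whose output is identical to the human merge."""
    return matching_wins / wins if wins else 0.0


# (wins, both_conflict) per repository, copied from the results table.
COUNTS = {
    "git/git": (39, 271),
    "Flask": (14, 12),
    "CPython": (7, 48),
    "Go": (19, 228),
    "TypeScript": (4, 292),
}

RATES = {repo: resolution_rate(w, c) for repo, (w, c) in COUNTS.items()}

for repo, rate in RATES.items():
    print(f"{repo:<12} resolution rate = {rate:.1%}")

print(f"git/git human match = {human_match(25, 39):.0%}")
```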
Run the benchmarks yourself.
    # Clone a repo
    $ git clone --bare https://github.com/git/git /tmp/git-bench

    # Run benchmark (scans up to 500 merge commits)
    $ weave bench-repo /tmp/git-bench

    # Show diffs for non-matching cases
    $ weave bench-repo /tmp/git-bench --show-diff

    # Save base/ours/theirs/human/weave for each case
    $ weave bench-repo /tmp/git-bench --save benchmarks/git
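To repeat these steps for several repositories, the commands can be scripted. The sketch below uses only the subcommand and flags shown above (`bench-repo`, `--show-diff`, `--save`). The `bench_commands` and `run_all` helpers are hypothetical, and because it is not stated whether the flags can be combined, each flag gets its own invocation.

```python
import subprocess


def bench_commands(repo_url: str, clone_dir: str, save_dir: str) -> list[list[str]]:
    """Build the argv lists for one repository, mirroring the commands above."""
    return [
        ["git", "clone", "--bare", repo_url, clone_dir],
        ["weave", "bench-repo", clone_dir],
        ["weave", "bench-repo", clone_dir, "--show-diff"],
        ["weave", "bench-repo", clone_dir, "--save", save_dir],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = bench_commands(
    "https://github.com/git/git", "/tmp/git-bench", "benchmarks/git"
)
for argv in commands:
    print(" ".join(argv))

# Call run_all(commands) to execute; this requires git and weave on PATH.
```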