Benchmarks

Real-world and synthetic merge benchmarks. Reproduce with weave bench-repo <path>.

weave vs git merge

git merges lines. mergiraf merges tree nodes. weave merges entities.

Scenario | git merge | mergiraf | weave merge
Two agents edit different functions | CONFLICT (adjacent lines) | auto-resolved | auto-resolved
One adds function, one modifies another | often conflicts | auto-resolved | auto-resolved
Both modify the same function identically | CONFLICT | auto-resolved | detected as identical, uses either
Both modify the same function differently | CONFLICT | CONFLICT | attempts 3-way merge on entity body
One deletes, one modifies | silent data loss possible | depends on context | modify/delete conflict reported
Both add functions at same position | CONFLICT | CONFLICT | auto-resolved (unordered entities)
Python: both add decorators to function | CONFLICT | CONFLICT | auto-resolved (decorator bundling)
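To make the entity-level idea concrete, here is a minimal sketch of a 3-way merge over named entities. This is an illustration only, not weave's actual implementation: it assumes each file has already been parsed into a mapping from entity name (e.g. function name) to source text, and merges each entity independently.

```python
# Illustrative sketch of entity-level 3-way merge (NOT weave's real code).
# Each side of the merge is a {entity_name: body_text} dict.

def merge_entities(base, ours, theirs):
    """Merge three {name: body} dicts; return (merged, conflict_names)."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:                  # identical on both sides (or both deleted)
            if o is not None:
                merged[name] = o    # "detected as identical, uses either"
        elif o == b:                # only theirs changed this entity
            if t is not None:
                merged[name] = t
        elif t == b:                # only ours changed this entity
            if o is not None:
                merged[name] = o
        else:                       # both changed it differently, or
            conflicts.append(name)  # modify/delete: reported, never silent
    return merged, conflicts
```

Because entities are keyed by name rather than position, two branches that each add a new function can never collide on "the same lines", which is why the position-based conflicts in the table above disappear.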

Methodology

How the benchmarks work.

1. Clone a real repo. We pick major open-source repos with long merge histories: git/git (C), Flask (Python), CPython (C/Python), Go (Go), TypeScript (TS).
2. Walk merge commits. For each merge commit with two parents, extract the base (merge-base), ours (parent 1), theirs (parent 2), and the human result (the merge commit itself).
3. Replay each file merge. For every file that both parents touched, run git's line-level merge and weave's entity-level merge on the same (base, ours, theirs) triple.
4. Compare against human. A win is when git conflicts but weave resolves cleanly. A regression is when git resolves cleanly but weave's output differs from the human result. Human match checks whether weave's output is identical to what the developer wrote.

Summary

Across 4,517 file merges from 5 repos, weave resolves 83 merges that git cannot, with 0 regressions on C, Python, and Go.

Repository | Language | Files Tested | Both Clean | Weave Wins | Both Conflict | Regressions | Human Match
git/git | C | 1,319 | 1,009 | 39 | 271 | 0 | 64%
Flask | Python | 56 | 30 | 14 | 12 | 0 | 57%
CPython | C / Python | 256 | 201 | 7 | 48 | 0 | 29%
Go | Go | 1,247 | 1,000 | 19 | 228 | 0 | 58%
TypeScript | TypeScript | 1,639 | 1,340 | 4 | 292 | 3 | 75%
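As a sanity check, the headline totals can be recomputed from the table. A quick sketch, with the row values transcribed from above:

```python
# Per-repo rows transcribed from the summary table:
# (files tested, both clean, weave wins, both conflict, regressions)
rows = {
    "git/git":    (1319, 1009, 39, 271, 0),
    "Flask":      (56, 30, 14, 12, 0),
    "CPython":    (256, 201, 7, 48, 0),
    "Go":         (1247, 1000, 19, 228, 0),
    "TypeScript": (1639, 1340, 4, 292, 3),
}

total_files = sum(r[0] for r in rows.values())
total_wins = sum(r[2] for r in rows.values())
total_regressions = sum(r[4] for r in rows.values())

print(total_files, total_wins, total_regressions)  # 4517 83 3
```

Note that each row also balances internally: Both Clean + Weave Wins + Both Conflict + Regressions equals Files Tested for every repo.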

Synthetic benchmarks

31 hand-crafted merge scenarios across 7 languages. Run weave bench to reproduce.

Scenario | weave | mergiraf | git
Different functions modified | clean | clean | clean
Different class methods modified | clean | clean | clean
Both add different imports (TS) | clean | clean | CONFLICT
Class: different methods among 4 | clean | clean | clean
One adds, other modifies | clean | clean | clean
Adjacent function changes | clean | clean | clean
Python: different class methods | clean | clean | clean
Python: adjacent methods (4-method class) | clean | clean | clean
Both add exports at end of file | clean | CONFLICT | CONFLICT
Reformat vs modify (whitespace-aware) | clean | clean | CONFLICT
Both add functions at end of file | clean | CONFLICT | CONFLICT
Both add methods to class at end | clean | clean | CONFLICT
Rust: both add different use statements | clean | clean | CONFLICT
Python: both add different imports | clean | clean | CONFLICT
Class: modify method + add new | clean | clean | clean
Both add functions between existing | clean | CONFLICT | CONFLICT
Python: both add different decorators | clean | CONFLICT | CONFLICT
Decorator + body change | clean | clean | clean
TS: class method decorators | clean | CONFLICT | CONFLICT
TS: interface field additions | clean | clean | CONFLICT
Rust: enum variant additions | clean | clean | CONFLICT
Java: different methods in same class | clean | clean | clean
Java: both add annotations | clean | clean | CONFLICT
C: different functions modified | clean | clean | clean
TS: method reorder + modification | clean | clean | clean
Python: both add class methods | clean | clean | CONFLICT
Rust: both add impl methods | clean | clean | CONFLICT
TS: enum modify + add variant | clean | clean | clean
TS: add JSDoc + modify body | clean | clean | clean
Rust: both add doc comments to different fns | clean | clean | clean
Go: both add different functions | clean | clean | CONFLICT

weave: 31/31 clean (100%) vs mergiraf: 26/31 (83%) vs git: 15/31 (48%). Full benchmark suite runs in 11ms. Individual merges take 65-374µs. Entity extraction powered by sem-core.

git/git

The git source code itself. 1,319 file merges from 500 merge commits. Mostly C header and source files.

Wins: 39 · Regressions: 0 · Human Match: 64% · Resolution Rate: 13%

25 of 39 wins produce output identical to the human merge. The remaining 14 differ in entity ordering (e.g. weave places a struct above a function where the human placed it below). These are stylistic differences, not semantic errors.

Common win patterns: both branches add different extern declarations to a header, both branches add functions to different sections of a .c file, import block changes that git sees as overlapping lines.

Flask

Python web framework. 56 file merges from 500 merge commits. Highest resolution rate of all tested repos.

Wins: 14 · Regressions: 0 · Human Match: 57% · Resolution Rate: 54%

Flask's codebase is well-structured with clear function and class boundaries, making it ideal for entity-level merge. Over half of all git conflicts are resolved by weave. Common patterns: both branches modifying different methods in app.py, import additions to __init__.py.

CPython

The Python interpreter. 256 file merges from 500 merge commits. Mix of C source and Python test files.

Wins: 7 · Regressions: 0 · Human Match: 29% · Resolution Rate: 13%

Lower human match rate due to CPython's heavy use of macros and preprocessor directives in C code, which create entity ordering differences. The wins are clean: header file declarations and test method additions that git falsely conflicts on.

Go

The Go compiler and standard library. 1,247 file merges from 500 merge commits.

Wins: 19 · Regressions: 0 · Human Match: 58% · Resolution Rate: 8%

Go's explicit structure (top-level functions, clear type declarations) works well with entity-level merge. 58% human match rate. Common patterns: both branches adding different functions, struct field additions in different types.

TypeScript

The TypeScript compiler. 1,639 file merges from 500 merge commits. Highest human match rate but 3 regressions.

Wins: 4 · Regressions: 3 · Human Match: 75% · Resolution Rate: 1%

The TypeScript compiler has very large files with complex entity relationships. The 3 regressions are under investigation. The 75% human match rate (highest of all repos) shows that when weave does resolve, it closely matches developer intent.

Glossary

What the numbers mean.

Term | Definition
Files Tested | Number of individual file merges where both branches touched the same file (both-touched files across all merge commits).
Both Clean | Both git and weave merged cleanly. No conflict from either tool.
Win | Git produced a conflict, but weave resolved cleanly. A false conflict eliminated.
Both Conflict | Both git and weave produced conflicts. A real semantic collision that requires human judgment.
Regression | Git merged cleanly, but weave produced a different result than the human. Weave introduced an error where git was fine.
Human Match | Of the wins, how many produce output identical to what the developer actually wrote. Higher = weave's merge matches human intent.
Resolution Rate | Wins / (Wins + Both Conflict). What percentage of git's conflicts weave eliminates.
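For example, applying the Resolution Rate formula to the git/git row of the summary table (39 wins, 271 both-conflict):

```python
# Resolution Rate = Wins / (Wins + Both Conflict), per the glossary above.
wins, both_conflict = 39, 271      # git/git row
rate = wins / (wins + both_conflict)
print(f"{rate:.0%}")               # 13%, matching the git/git section
```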

Reproduce

Run the benchmarks yourself.

# Clone a repo
$ git clone --bare https://github.com/git/git /tmp/git-bench

# Run benchmark (scans up to 500 merge commits)
$ weave bench-repo /tmp/git-bench

# Show diffs for non-matching cases
$ weave bench-repo /tmp/git-bench --show-diff

# Save base/ours/theirs/human/weave for each case
$ weave bench-repo /tmp/git-bench --save benchmarks/git