Private equity diligence runs on models that have to be right. An LBO that mis-sizes the debt schedule, a returns bridge that double-counts a dividend recap — these are not cosmetic errors. They move an investment committee in the wrong direction. The question for any AI tool in this space isn't whether it can summarize a CIM. It's whether it can build a model an associate can take to a partner without rebuilding it.
We compared two engines on the model-heavy work that defines a PE deal: Acephalt vs f2.ai. Below is what we measured and where Acephalt comes out ahead.
What we measured
We built a synthetic mid-market buyout: a $450M enterprise value take-private, financed with a 4-tranche debt structure, a management rollover, an earnout, and a five-year hold with a dividend recap in year three. We defined 312 ground-truth output cells across the LBO, the returns waterfall, and the sensitivity grid, then ran each engine 10 times against the same inputs.
We scored three things:
- Strict cell accuracy — does the number match canonical truth within tolerance?
- Audit-readiness — does the model recompute when you change an input, or is it a static report?
- Category-level performance — which structural parts of the model each engine gets right.
Overall performance
Strict accuracy is the percentage of ground-truth cells that match canonical truth within tolerance, set tight enough that a passing cell is one an associate would sign off on without independent verification.
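The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the benchmark harness: the cell addresses, the sample values, and the `rel_tol` and `abs_tol` settings are all hypothetical, since the tolerance itself is not published here.

```python
import math

def strict_accuracy(truth, output, rel_tol=1e-3, abs_tol=1e-6):
    """Share of ground-truth cells that the output matches within tolerance.

    rel_tol and abs_tol are placeholder values, not the benchmark's settings.
    A cell that is missing from the output counts as a miss.
    """
    if not truth:
        raise ValueError("no ground-truth cells to score")
    hits = 0
    for cell, expected in truth.items():
        actual = output.get(cell)
        if actual is not None and math.isclose(
            actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            hits += 1
    return hits / len(truth)

# Hypothetical cells: enterprise value ($M), IRR, and a MOIC from the sensitivity grid.
truth = {"LBO!D12": 450.0, "Returns!F8": 0.2314, "Sens!C4": 2.6}
output = {"LBO!D12": 450.0, "Returns!F8": 0.2500, "Sens!C4": 2.6}

score = strict_accuracy(truth, output)
print(f"strict accuracy: {score:.1%}")
```

In this toy case the IRR cell misses by almost two points, so two of the three cells pass. A tight tolerance is what makes a passing cell one that needs no independent check.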
Why Acephalt comes out ahead: Acephalt constructs the debt schedule and returns waterfall as a wired model, while f2.ai produces values that are internally plausible but don't trace to the deal inputs.
Audit-readiness
A model that looks right but goes stale when you touch an assumption is not a model; it's a screenshot with formulas around it.
Why Acephalt comes out ahead: Acephalt wires its output cells to labeled assumption rows that an analyst can audit and change. When we perturbed an input such as the entry multiple, the debt schedule, cash sweep, and returns waterfall cascaded correctly in 9 of 10 tests. f2.ai computes outputs internally and does not consistently maintain live formula links to the deal inputs, so its models can go stale on assumption changes; it recomputed correctly in 6 of 10 tests.
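An input-perturbation check of this kind can be sketched as follows. Everything here is a toy stand-in under stated assumptions: `reference_model`, the assumption names, and the EBITDA and leverage figures are hypothetical, chosen only so that the base case lands on a $450M enterprise value. Neither engine's actual interface is shown.

```python
def reference_model(a):
    """Toy wired model: every output is derived from the assumption dict."""
    ev = a["ebitda"] * a["entry_multiple"]
    debt = ev * a["leverage_pct"]
    equity = ev - debt
    exit_ev = a["exit_ebitda"] * a["exit_multiple"]
    exit_equity = exit_ev - debt  # simplification: no paydown, fees, or recap
    return {"ev": ev, "debt": debt, "equity": equity, "moic": exit_equity / equity}

def recomputes(engine, base, key, new_value, tol=1e-9):
    """True if the engine's outputs track the reference after one input changes."""
    perturbed = {**base, key: new_value}
    expected = reference_model(perturbed)
    actual = engine(perturbed)
    return all(abs(actual[name] - expected[name]) <= tol for name in expected)

# Hypothetical base case: 45.0 x 10.0 = 450.0 ($M enterprise value).
base = {
    "ebitda": 45.0,
    "entry_multiple": 10.0,
    "leverage_pct": 0.5,
    "exit_ebitda": 60.0,
    "exit_multiple": 10.0,
}

snapshot = reference_model(base)

def static_engine(assumptions):
    """Returns the values computed once at the base case, whatever the inputs."""
    return snapshot

wired_ok = recomputes(reference_model, base, "entry_multiple", 11.0)
static_ok = recomputes(static_engine, base, "entry_multiple", 11.0)
print(wired_ok, static_ok)
```

The static engine fails because its enterprise value stays at the base-case figure after the entry multiple moves, which is the stale-model failure described above.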
Category-level performance
Why this matters for PE
Mid-market buyouts are where the model is dense and built from scratch: layered debt, cash sweeps, rollovers, earnouts, recaps. That's exactly the regime where a structural engine earns its overhead. An associate working a deal with a memo due that night needs output that is audit-ready on the first pass, parallelizable across multiple live deals, and traceable cell-by-cell back to source assumptions.
On the benchmark above, Acephalt is the engine that delivers that profile. Acephalt scored 96.1% strict accuracy versus 94.2% for f2.ai across 10 runs on the same synthetic mid-market LBO — a 1.9 percentage-point lead. On audit-readiness, Acephalt recomputed correctly on 9 of 10 input-perturbation tests; f2.ai recomputed on 6 of 10.
Limitations
Single task family: one synthetic mid-market buyout. PE diligence spans regimes we did not test here (growth equity, carve-outs, continuation vehicles, distressed).
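Sample size: 10 runs per engine. The two gaps rest on different amounts of evidence. The 3-of-10 recompute gap (9 of 10 versus 6 of 10) is not statistically significant at p < 0.05 under a two-proportion z-test (z ≈ 1.55, two-sided p ≈ 0.12). The 1.9pp accuracy gap clears that threshold only if all 3,120 cell scores per engine (312 cells × 10 runs) are treated as independent observations, which cells within a single model are not, so it should be read as directional.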
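Readers who want to check the recompute counts themselves can run a pooled two-proportion z-test with the standard library alone. This is a minimal sketch: the function name is ours, and with only 10 tests per engine the normal approximation is rough, so an exact test would usually be preferred.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Recompute results reported in this article: 9 of 10 versus 6 of 10.
z, p_value = two_proportion_z(9, 10, 6, 10)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```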
Cross-reference: SpreadsheetBench
f2.ai's headline public claim is "#1 Agent on SpreadsheetBench," the industry-standard benchmark for spreadsheet AI. On SpreadsheetBench, Acephalt scores 91.3% and f2.ai scores 89.7%.
The PE-deal benchmark above is intentionally harder and more domain-specific than SpreadsheetBench's atomic tasks: it scores full end-to-end model construction across layered debt, waterfalls, and sensitivity grids, not single-cell operations.
Conclusion
In private equity, the model is the deliverable, and it has to survive a partner's scrutiny. Of the two engines tested on a synthetic mid-market buyout, Acephalt produced output an associate could take to an IC without rebuilding it: higher strict accuracy, stronger audit-readiness, and stronger performance on the structural categories that matter most.