White Paper

Separating Skill from Luck: A Quantitative Framework for GP Evaluation

Prof. Dr. Reiner Braun, Founder & Strategic Partnerships, March 2026

Manager rankings in private equity have historically suffered from a statistical problem that is rarely acknowledged: fund-level performance is observed with too few data points to reliably separate systematic manager skill from the noise of market exposure, sector tailwinds, and deal-specific luck. This paper introduces the calibration framework QFT uses to decompose observed returns into attributable and non-attributable components.

Why the small-sample problem matters

A typical buyout fund realises between 10 and 20 portfolio exits over its lifetime. Rankings built on so few observations are highly sensitive to outliers, and the cross-sectional dispersion of reported IRRs overstates the dispersion of underlying GP skill. We quantify this gap using a deal-level panel of more than 20,000 LBO transactions spanning four decades.

The empirical result is that a meaningful fraction of observed top-quartile performance is statistically indistinguishable from median performance once deal-level noise is accounted for. The rankings that drive commitment decisions are, in many cases, dominated by sampling noise rather than by the skill differences they purport to rank.
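The mechanics behind this result can be illustrated with a short Monte Carlo simulation. The parameters below (skill dispersion, deal-level noise, deals per fund) are illustrative assumptions, not the paper's calibrated values; the point is only that averaging noisy deal outcomes over roughly 15 exits inflates observed cross-sectional dispersion and lets lucky median-skill funds land in the observed top quartile.

```python
import numpy as np

rng = np.random.default_rng(0)

n_funds, n_deals = 1000, 15           # typical buyout fund: 10-20 exits
sigma_skill, sigma_deal = 0.05, 0.40  # illustrative: deal noise dwarfs skill

true_skill = rng.normal(0.0, sigma_skill, n_funds)
# observed fund return = true skill + average of deal-specific noise
deal_noise = rng.normal(0.0, sigma_deal, (n_funds, n_deals))
observed = true_skill + deal_noise.mean(axis=1)

print(f"dispersion of true skill:        {true_skill.std():.3f}")
print(f"dispersion of observed returns:  {observed.std():.3f}")

# share of observed top-quartile funds whose true skill is below median
top_q = observed >= np.quantile(observed, 0.75)
frac = (true_skill[top_q] < np.median(true_skill)).mean()
print(f"observed top-quartile funds with below-median skill: {frac:.0%}")
```

Under these assumptions the observed return dispersion is roughly double the skill dispersion, and a sizeable minority of observed "top-quartile" funds carry below-median true skill, which is the gap the paper quantifies on real exit data.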

The QFT decomposition

Our approach isolates three components at the deal level: systematic market exposure, sector and vintage effects, and residual manager contribution. The residual is what we estimate as skill. Because the decomposition operates on individual transactions rather than fund-level IRRs, it generates an order of magnitude more signal per manager.
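A stylised two-step version of such a decomposition can be sketched as follows. This is a simplification under assumed linearity: a pooled market beta is stripped first, then sector-and-vintage cell means, and the GP-level average of what remains proxies for skill. The data, column names, and magnitudes are hypothetical, and the paper's actual estimator is a joint, bias-corrected one rather than this sequential demeaning.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# hypothetical deal-level panel: one row per realised exit
n_deals = 5000
gps = rng.integers(0, 50, n_deals)        # 50 managers
svs = rng.integers(0, 12, n_deals)        # sector x vintage cells
mkt = rng.normal(0.08, 0.20, n_deals)     # market return over hold period

true_beta = 1.3                            # assumed systematic exposure
sv_effect = rng.normal(0.0, 0.05, 12)
gp_skill = rng.normal(0.0, 0.04, 50)
log_ret = (true_beta * mkt + sv_effect[svs] + gp_skill[gps]
           + rng.normal(0.0, 0.30, n_deals))  # deal-specific luck

df = pd.DataFrame({"gp": gps, "sv": svs, "mkt": mkt, "r": log_ret})

# 1) strip systematic market exposure (pooled OLS slope)
beta_hat = np.polyfit(df["mkt"], df["r"], 1)[0]
df["r1"] = df["r"] - beta_hat * df["mkt"]

# 2) strip sector and vintage effects (cell demeaning)
df["r2"] = df["r1"] - df.groupby("sv")["r1"].transform("mean")

# 3) residual manager contribution: GP-level mean of what remains
skill_hat = df.groupby("gp")["r2"].mean()
print(skill_hat.sort_values(ascending=False).head())
```

With roughly 100 deals per manager rather than 10 to 20 fund-level IRRs, the residual averages track the simulated skill parameters closely, which is the "order of magnitude more signal" point in concrete form.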

The resulting estimator is bias-corrected using a parametric bootstrap calibrated on historical exit distributions, and it generalises across vintages, geographies, and strategy mandates. For allocators, the output is a skill estimate with an explicit confidence interval rather than a point ranking.
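The bootstrap mechanics can be sketched for a single manager's deal residuals. This shows only the generic parametric-bootstrap recipe (fit a distribution, resample, bias-correct, take percentile bounds) on a simple mean estimator with a normal fit; the paper's version is calibrated on historical exit distributions rather than a normal, and all numbers here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical residual manager contributions for one GP's 16 exits
residuals = rng.normal(0.03, 0.25, 16)
theta_hat = residuals.mean()              # raw skill point estimate

# parametric bootstrap: resample from a fitted exit distribution
mu, sigma = residuals.mean(), residuals.std(ddof=1)
B = 10_000
boot = rng.normal(mu, sigma, (B, residuals.size)).mean(axis=1)

theta_bc = 2 * theta_hat - boot.mean()    # bias-corrected estimate
lo, hi = np.quantile(boot, [0.05, 0.95])  # 90% percentile interval

print(f"skill estimate: {theta_bc:.3f}  (90% CI: [{lo:.3f}, {hi:.3f}])")
```

The width of the interval, driven by the small number of exits, is exactly what a point ranking hides and what the framework reports explicitly to allocators.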

Download the full paper for the formal derivation, the calibration appendix, and the out-of-sample validation across 2004 to 2018 vintage cohorts.