Pythagorean expectation
Pythagorean expectation is a formula invented by Bill James to estimate how many games a baseball team "should" have won based on the number of runs it scored and allowed. Comparing a team's actual and Pythagorean winning percentages can be used to evaluate how lucky that team was (by examining the difference between the two winning percentages). The term is derived from the formula's resemblance to the Pythagorean theorem.

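The basic formula is:

$$\text{Win} = \frac{\text{runs scored}^2}{\text{runs scored}^2 + \text{runs allowed}^2} = \frac{1}{1 + (\text{runs allowed}/\text{runs scored})^2}$$

where Win is the winning ratio generated by the formula. The expected number of wins is the expected winning ratio multiplied by the number of games played.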
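The calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard form of the formula (runs scored squared, divided by the sum of the squares of runs scored and runs allowed); the function names and the season totals are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning ratio from runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def expected_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Expected wins: the expected winning ratio times games played."""
    return pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games


# Hypothetical season: 800 runs scored, 700 allowed, 162 games, 95 actual wins.
pct = pythagorean_win_pct(800, 700)
wins = expected_wins(800, 700, 162)
luck = 95 - wins  # positive: the team won more games than its runs suggest
print(f"ratio {pct:.3f}, expected wins {wins:.1f}, luck {luck:+.1f}")
```

The `exponent` parameter defaults to 2 but is left adjustable, since other exponents are often substituted.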

Empirical origin

Empirically, this formula correlates fairly well with how baseball teams actually perform, although an exponent of 1.81 is slightly more accurate. This correlation is one justification for using runs as a unit of measurement for player performance. Efforts have been made to find the ideal exponent for the formula, the most widely known being the Pythagenport formula developed by Clay Davenport of Baseball Prospectus, 1.5 log10((r + ra)/g) + 0.45, and the less well known but equally (if not more) effective Pythagenpat formula, ((r + ra)/g)^0.287, developed by David Smyth. Here r is runs scored, ra is runs allowed, and g is games played. Davenport expressed his support for the latter of the two, saying:

"After further review, I (Clay) have come to the conclusion that the so-called Smyth/Patriot method, aka Pythagenpat, is a better fit. In that, X = ((rs + ra)/g)^0.285, although there is some wiggle room for disagreement in the exponent. Anyway, that equation is simpler, more elegant, and gets the better answer over a wider range of runs scored than Pythagenport, including the mandatory value of 1 at 1 rpg."


These formulas are only necessary when dealing with extreme situations in which the average number of runs scored per game is either very high or very low. For most situations, simply squaring each variable yields accurate results.
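The two variable-exponent formulas can be compared with a short Python sketch. It assumes the logarithm in Pythagenport is base 10 and that both exponents depend only on total runs per game; the function names and sample totals are hypothetical.

```python
import math


def pythagenport_exponent(runs_scored, runs_allowed, games):
    """Davenport's exponent: 1.5 * log10(runs per game) + 0.45."""
    runs_per_game = (runs_scored + runs_allowed) / games
    return 1.5 * math.log10(runs_per_game) + 0.45


def pythagenpat_exponent(runs_scored, runs_allowed, games, power=0.287):
    """Smyth/Patriot exponent: (runs per game) ** power."""
    runs_per_game = (runs_scored + runs_allowed) / games
    return runs_per_game ** power


def win_pct(runs_scored, runs_allowed, exponent):
    """Pythagorean winning ratio with an arbitrary exponent."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical season: 800 runs scored, 700 allowed, 162 games.
for name, func in (("Pythagenport", pythagenport_exponent),
                   ("Pythagenpat", pythagenpat_exponent)):
    x = func(800, 700, 162)
    print(f"{name}: exponent {x:.3f}, winning ratio {win_pct(800, 700, x):.3f}")
```

At exactly one run per game the Pythagenpat exponent equals 1, the property Davenport singles out, while the Pythagenport exponent falls to 0.45.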

There are some systematic statistical deviations between actual winning percentage and expected winning percentage, which include bullpen quality and luck. In addition, the formula tends to regress toward the mean, as teams that win a lot of games tend to be underrepresented by the formula (meaning they "should" have won fewer games), and teams that lose a lot of games tend to be overrepresented (they "should" have won more).

"Second-order" and "third-order" wins

In its Adjusted Standings Report, Baseball Prospectus refers to different "orders" of wins for a team. The basic order of wins is simply the number of games the team has won. However, because a team's record may not reflect its true talent due to luck, different measures of a team's talent were developed.

First-order wins, based on pure run differential, are the number of expected wins generated by the Pythagenport formula (see above). To further filter out the distortions of luck, sabermetricians can also calculate a team's expected runs scored and allowed via a runs created-type equation (the most accurate at the team level being Base Runs). These formulas give the team's expected number of runs given its total singles, doubles, walks, etc., which helps to eliminate the luck factor of the order in which the team's hits and walks came within an inning.

By plugging these expected runs scored and allowed into the Pythagorean formula, one can generate second-order wins, the number of wins a team deserves based on the number of runs it should have scored and allowed given its component offensive and defensive statistics. Third-order wins are second-order wins that have been adjusted for strength of schedule (the quality of the opponents' pitching and hitting). Second- and third-order winning percentages have been shown to predict future actual team winning percentage better than both actual winning percentage and first-order winning percentage.
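The relationship between the orders of wins can be sketched as follows. This is only an illustration under stated assumptions: the estimated run totals are taken as given (as if produced by a run estimator such as Base Runs, which is not implemented here), a fixed exponent of 2 stands in for the Pythagenport exponent, and third-order wins are omitted because the schedule adjustment is not specified. All names and numbers are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning ratio from runs scored and allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def win_orders(actual_wins, games, runs_scored, runs_allowed,
               est_runs_scored, est_runs_allowed, exponent=2.0):
    """Actual, first-order and second-order wins for one team.

    est_runs_scored / est_runs_allowed are assumed to come from a
    separate run estimator applied to the team's component statistics.
    """
    return {
        "actual": actual_wins,
        "first_order": games * pythagorean_win_pct(
            runs_scored, runs_allowed, exponent),
        "second_order": games * pythagorean_win_pct(
            est_runs_scored, est_runs_allowed, exponent),
    }


# Hypothetical team: 90 wins in 162 games, 800 runs scored and 700 allowed,
# but component statistics that suggest only 780 scored and 720 allowed.
for order, wins in win_orders(90, 162, 800, 700, 780, 720).items():
    print(f"{order}: {wins:.1f}")
```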

Theoretical explanation

Initially the correlation between the formula and actual winning percentage was simply an experimental observation. In 2003, Hein Hundal provided an inexact derivation of the formula and showed that the Pythagorean exponent was approximately 2/(σ√π), where σ is the standard deviation of runs scored by all teams divided by the average number of runs scored. In 2006, Professor Steven J. Miller provided a statistical derivation of the formula under some assumptions about baseball games: if runs for each team follow a Weibull distribution and the runs scored and allowed per game are statistically independent, then the formula gives the probability of winning.
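The Weibull result can be checked numerically in a simplified setting. The sketch below is not Miller's full model: it assumes both teams' runs share one shape parameter, treats runs as continuous, and ignores any offset; the parameter values are hypothetical. Under those assumptions the probability that one independent Weibull draw exceeds the other has the Pythagorean form, with the scale parameters in place of runs and the shape parameter as the exponent. Because the shape is shared, the ratio of the scale parameters equals the ratio of mean runs.

```python
import random


def simulated_win_fraction(scale_scored, scale_allowed, shape,
                           trials=200_000, seed=0):
    """Fraction of simulated games in which runs scored exceed runs allowed."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # random.weibullvariate takes (scale, shape) in that order.
        scored = rng.weibullvariate(scale_scored, shape)
        allowed = rng.weibullvariate(scale_allowed, shape)
        if scored > allowed:
            wins += 1
    return wins / trials


def pythagorean_form(scale_scored, scale_allowed, shape):
    """Closed-form P(scored > allowed) for independent same-shape Weibulls."""
    a = scale_scored ** shape
    b = scale_allowed ** shape
    return a / (a + b)


# Hypothetical parameters: scales 5.0 and 4.5, shared shape 1.8.
print(simulated_win_fraction(5.0, 4.5, 1.8))
print(pythagorean_form(5.0, 4.5, 1.8))
```

The two printed values should agree to within simulation error, which shrinks as `trials` grows.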

Full application

In 2010, Martin Bernstein published a method for estimating a team's win percentage from only basic standard statistics. His formula estimates runs scored and allowed based on objective offensive and defensive stats, and then converts these into wins via James' Pythagorean expectation. The formula says that a team's win percentage equals the following:

All statistics used are team statistics.
Bernstein writes on Baseball Fever that this formula, unlike other sabermetric devices, is not a measure of raw ability but of actual performance. That is, while many statistics seek to remove all external factors affecting a player or team and thereby evaluate that individual's own talent alone, this formula aims to estimate what will actually happen over the course of a season.

Use in basketball

American sports executive Daryl Morey was the first to adapt James' Pythagorean expectation to professional basketball while a researcher at STATS, Inc. He found that using 13.91 for the exponents provided an acceptable model for predicting won-lost percentages:

$$\text{Win} = \frac{\text{points for}^{13.91}}{\text{points for}^{13.91} + \text{points against}^{13.91}}$$

Morey's "Modified Pythagorean Theorem" was first published in STATS Basketball Scoreboard, 1993-94.

Noted basketball analyst Dean Oliver also applied James' Pythagorean theory to professional basketball. The result was similar.

Another noted basketball statistician, John Hollinger, uses a similar Pythagorean formula except with 16.5 as the exponent.
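The effect of the two basketball exponents can be compared with a short Python sketch. It assumes the basketball version has the same form as the baseball formula, with points for and points against in place of runs; the function name and the season totals are hypothetical.

```python
def basketball_win_pct(points_for, points_against, exponent=13.91):
    """Pythagorean winning ratio for basketball.

    Uses the equivalent form 1 / (1 + (against / for) ** exponent),
    which avoids raising large point totals to a high power.
    """
    ratio = points_against / points_for
    return 1.0 / (1.0 + ratio ** exponent)


# Hypothetical 82-game season: 8600 points scored, 8200 allowed.
for label, exponent in (("Morey, 13.91", 13.91), ("Hollinger, 16.5", 16.5)):
    pct = basketball_win_pct(8600, 8200, exponent)
    print(f"{label}: ratio {pct:.3f}, about {82 * pct:.1f} wins")
```

A larger exponent pushes the estimate further from .500 for the same point differential, so the 16.5 version rates a team with a positive differential slightly higher than the 13.91 version does.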


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 