PECOTA
PECOTA, an acronym for Player Empirical Comparison and Optimization Test Algorithm, is a sabermetric system for forecasting Major League Baseball player performance. The word is a backronym based on the name of journeyman major league player Bill Pecota, who with a lifetime batting average of .249 is perhaps representative of the typical PECOTA entry. PECOTA was developed by Nate Silver in 2002-2003 and introduced to the public in the book Baseball Prospectus 2003. Baseball Prospectus (BP) has owned PECOTA since 2003; Silver managed PECOTA from 2003 to 2009 and was responsible for the PECOTA projections for the 2003 through 2009 baseball seasons. Beginning in spring 2009, BP assumed responsibility for producing the annual forecasts. The first baseball season for which Silver played no role in producing the PECOTA projections was 2010.

One of several widely publicized statistical systems for forecasting player performance, PECOTA is marketed by BP as a fantasy baseball product. Since 2003, annual PECOTA forecasts have been published both in the Baseball Prospectus annual books and, in more detailed form, on the subscription-based BaseballProspectus.com website. PECOTA also inspired some analogous projection systems for other professional sports: KUBIAK for the National Football League, SCHOENE for the National Basketball Association, and VUKOTA for the National Hockey League.

PECOTA forecasts a player's performance in all of the major categories used in typical fantasy baseball games; it also forecasts production in advanced sabermetric categories developed by Baseball Prospectus (e.g., VORP and EqA). In addition, PECOTA forecasts several summary diagnostics such as breakout rates, improve rates, and attrition rates, as well as the market values of the players. The logic and methodology underlying PECOTA have been described in several publications, but the detailed formulas are proprietary and have not been shared with the baseball research community.

Methodology

Silver described the inspiration for his approach as follows:
The basic idea behind PECOTA is really a fusion of two different things – [Bill] James's work on similarity scores and Gary Huckabay's work on Vlad, [Baseball Prospectus's] previous projection system, which tried to assign players to a number of different career paths. I think Gary used something like thirteen or fifteen separate career paths, and all that PECOTA is really doing is carrying that to the logical extreme, where there is essentially a separate career path for every player in major league history. The comparability scores are the mechanism by which it picks and chooses from among those career paths.

Comparable players

PECOTA relies on fitting a given player's past performance statistics to the performance of "comparable" Major League ballplayers by means of similarity scores. As is described in the Baseball Prospectus website's glossary:

PECOTA compares each player against a database of roughly 20,000 major league batter seasons since World War II. In addition, it also draws upon a database of roughly 15,000 translated minor league seasons (1997-2006) for players that spent most of their previous season in the minor leagues. . . . PECOTA considers four broad categories of attributes in determining a hitter's comparability:


1. Production metrics – such as batting average, isolated power, and unintentional walk rate for hitters, or strikeout rate and groundball rate for pitchers.


2. Usage metrics, including career length and plate appearances or innings pitched.


3. Phenotypic attributes, including handedness, height, weight, career length (for major leaguers), and minor league level (for prospects).


4. Fielding Position (for hitters) or starting/relief role (for pitchers). . . . In most cases, the database is large enough to provide a meaningfully large set of appropriate comparables. When it isn't, the program is designed to 'cheat' by expanding its tolerance for dissimilar players until a reasonable sample size is reached.


PECOTA uses nearest neighbor analysis to match the individual player with a set of other players who are most similar to him. Although drawing on the underlying concept of Bill James' similarity scores, PECOTA calculates these scores in a distinct way that leads to a very different set of "comparables" than James' method. Furthermore, Silver describes the following distinct feature:
The PECOTA similarity scores are based primarily on looking at a three-year window of a pitcher’s performance. Thus, we might look at what a pitcher did from ages 35-37, and compare that against the most similar age 35-37 performances, after adjusting for parks, league effects, and a whole host of other things. This is different from the similarity scores you might see at baseball-reference.com or in other places, which attempt to evaluate the totality of a player’s career up to a given age.

Once a set of "comparables" is determined for each player, his future performance forecast is based on the historical performance of his "comparables". For example, a 26-year-old's forecast performance in the coming season will be based on how the most comparable Major League 26-year-olds performed in their subsequent season.

Separate sets of predictions are developed for hitters and pitchers.
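
The comparables step can be illustrated with a minimal nearest-neighbor sketch. PECOTA's actual similarity formula is proprietary, so everything here is an assumption made for illustration: the feature names (avg_z, iso_z, bb_z), the use of plain Euclidean distance on pre-standardized features, the choice of k, and the invented sample values.

```python
import math


def find_comparables(target, candidates, k=3):
    """Return the k candidates closest to target in feature space.

    target: dict of already-standardized features (hypothetical names).
    candidates: list of dicts holding the same feature keys plus
    "name" and "next_season" (the outcome in the following season).
    """
    features = sorted(target)

    def distance(player):
        # Plain Euclidean distance; an assumption, not PECOTA's formula.
        return math.sqrt(sum((player[f] - target[f]) ** 2 for f in features))

    return sorted(candidates, key=distance)[:k]


def forecast_from_comparables(comparables):
    """Baseline forecast: mean of the comparables' next-season outcomes."""
    return sum(p["next_season"] for p in comparables) / len(comparables)


# Invented sample data: z-scored production metrics for one age group.
target = {"avg_z": 0.0, "iso_z": 0.0, "bb_z": 0.0}
pool = [
    {"name": "A", "avg_z": 0.1, "iso_z": 0.0, "bb_z": 0.0, "next_season": 0.270},
    {"name": "B", "avg_z": 0.0, "iso_z": 0.2, "bb_z": 0.0, "next_season": 0.280},
    {"name": "C", "avg_z": 0.3, "iso_z": 0.0, "bb_z": 0.0, "next_season": 0.260},
    {"name": "D", "avg_z": 2.0, "iso_z": 2.0, "bb_z": 2.0, "next_season": 0.320},
]
comps = find_comparables(target, pool, k=3)
print([p["name"] for p in comps])  # -> ['A', 'B', 'C']
print(forecast_from_comparables(comps))
```

A fuller treatment would weight closer comparables more heavily and restrict the pool to players of the same age, as the text describes; the sketch omits both for brevity.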

Peripheral statistics

PECOTA also relies heavily on peripheral statistics to forecast a given player's future performance. For example, drawing on the insights coming out of the use of defense-independent pitching statistics, PECOTA forecasts a pitcher's future performance in a given area by using information about his past performance in other areas. As baseball analyst and journalist Alan Schwarz writes, "Silver . . . designed a sophisticated variance algorithm that has examined every big-league pitcher's statistics since 1946 to determine which numbers best forecast effectiveness, specifically earned run average. His findings are counterintuitive to most fans. 'When you try to predict future E.R.A.'s with past E.R.A.'s, you're making a mistake,' Silver said. Silver found that the most predictive statistics, by a considerable margin, are a pitcher's strikeout rate and walk rate. Home runs allowed, lefty-righty breakdowns and other data tell less about a pitcher's future".
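
One simple way to ask which past statistic best predicts a future one is to rank candidates by the strength of their correlation with the next-season outcome. The sketch below does that. It is not Silver's "variance algorithm", which is unpublished; the function names are hypothetical and the numbers are invented solely to exercise the code, so they say nothing about real pitchers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_predictors(past_stats, next_era):
    """Rank past statistics by absolute correlation with next-season ERA."""
    scores = {name: pearson(vals, next_era) for name, vals in past_stats.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented values for four hypothetical pitchers.
past_stats = {
    "strikeouts_per_9": [5.0, 6.0, 7.0, 8.0],
    "past_era": [4.0, 3.0, 4.5, 3.2],
}
next_era = [5.0, 4.5, 4.0, 3.5]
for name, r in rank_predictors(past_stats, next_era):
    print(name, round(r, 3))
```

A real study would use many pitcher-seasons and guard against small samples; correlation is used here only because it is the simplest measure of predictive association.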

Probability distributions

Instead of focusing on making point estimates of a player's future performance (such as batting average, home runs, and strikeouts), PECOTA relies on the historical performance of the given player's "comparables" to produce a probability distribution of the given player's predicted performance during the next five years. Alan Schwarz has emphasized this feature of PECOTA: "What separates Pecota from the gaggle of projection systems that outsiders have developed over many decades is how it recognizes, even flaunts, the uncertainty of predicting a player's skills. Rather than generate one line of expected statistics, Pecota presents seven – some optimistic, some pessimistic – each with its own confidence level. The system greatly resembles the forecasting of hurricane paths: players can go in many directions, so preparing for just one is foolish". Silver has written,
This procedure requires us to become comfortable with probabilistic thinking. While a majority of players of a certain type may progress a certain way – say, peak early – there will always be exceptions. Moreover, the comparable players may not always perform in accordance with their true level of ability. They will sometimes appear to exceed it in any given season, and other times fall short, because of the sample size problems that we described earlier.


PECOTA accounts for these sorts of factors by creating not a single forecast point, as other systems do, but rather a range of possible outcomes that the player could expect to achieve at different levels of probability. Instead of telling you that it's going to rain, we tell you that there's an 80% chance of rain, because 80% of the time that these atmospheric conditions have emerged on Tuesday, it has rained on Wednesday.


Surely, this approach is more complicated than the standard method of applying an age adjustment based on the 'average' course of development of all players throughout history. However, it is also leaps and bounds more representative of reality, and more accurate to boot.
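
The idea of reporting a range of outcomes rather than a single point can be sketched by reading percentiles off the comparables' subsequent-season results. This is a minimal illustration under stated assumptions: the seven percentile levels are chosen here only to match the count of seven forecast lines mentioned above, the linear-interpolation rule is one of several common conventions, and the function names are hypothetical.

```python
def percentile(values, pct):
    """Linear-interpolation percentile of values, for pct in [0, 100]."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (pct / 100) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def forecast_distribution(comparable_outcomes,
                          levels=(10, 25, 40, 50, 60, 75, 90)):
    """Map each percentile level (an assumed set) to a forecast value."""
    return {level: percentile(comparable_outcomes, level) for level in levels}


# Invented next-season home run totals for a player's comparables.
outcomes = [0, 10, 20, 30, 40]
for level, value in forecast_distribution(outcomes).items():
    print(f"{level}th percentile: {value:.1f}")
```

The 50th-percentile line plays the role of the conventional point forecast, while the outer levels convey how widely similar players have diverged.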

Team effort

Although Silver was the creator of PECOTA, producing PECOTA forecasts was a collaborative undertaking: "I might be 'the PECOTA guy,' but it very much is a team effort," Silver has said of the BP staff. "We all do it. It's my baby, but it takes a village to run a PECOTA". For example, PECOTA draws on Clay Davenport's translations (the so-called Davenport Translations or DTs) of minor league and international baseball statistics to estimate the major league equivalent performance of each player. In this way, PECOTA is able to make projections for more than 1,600 players each year, including many players with little or no prior major league experience.

The 2009 preseason forecasts were the last ones for which Silver took primary responsibility. In March 2009, Silver announced that PECOTA's extremely complex and laborious set of database manipulations and calculations would be moving to a different platform. Baseball Prospectus had owned PECOTA since Silver sold it to them in 2003, with Silver stewarding and taking responsibility for the forecasts. Henceforth, PECOTA forecasts would be generated by the Baseball Prospectus team, initially with Clay Davenport in charge of the effort and later with Colin Wyers heading up both production and improvements in PECOTA. The production of future forecasts would also be more tightly integrated with the production of other Baseball Prospectus statistics.

Alternative forecasting systems

Most of the other popular forecasting systems do not use a "comparable players" approach. Instead, most rely on direct projections from a player's past performance to his future performance, typically by using as a baseline a weighted average of a player's performance in his previous three years. Like PECOTA, many of those systems also adjust the projections for aging, park effects, and regression toward the mean. Like PECOTA, they may also adjust for the competitive difficulty of each of the two major leagues. The systems differ from one another, however, in the types and intensities of age adjustments, regression-effect estimates, park adjustments, and league-difficulty adjustments that they make, as well as in whether they use similarity scores. PECOTA also makes projections for many more players than do other systems, because PECOTA relies on adjusted minor league statistics as well as major league statistics and tries to make projections for all of the players on major league expanded rosters (40 players per team) as well as other prospects.
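
The weighted-average baseline used by such systems, followed by regression toward the mean, can be sketched as follows. The weights (5, 4, 3), the reliability value, and the league mean are illustrative assumptions rather than the parameters of any particular system, and the function names are hypothetical.

```python
def weighted_baseline(rates, weights=(5, 4, 3)):
    """Weighted average of up to three seasons, most recent season first.

    The default weights are illustrative; real systems choose their own.
    """
    used = weights[:len(rates)]
    return sum(r * w for r, w in zip(rates, used)) / sum(used)


def regress_to_mean(estimate, league_mean, reliability):
    """Shrink an estimate toward the league mean.

    reliability is in [0, 1]: 1 keeps the estimate, 0 returns the mean.
    """
    return reliability * estimate + (1 - reliability) * league_mean


# Invented batting averages for the last three seasons, newest first.
baseline = weighted_baseline([0.300, 0.270, 0.240])
projection = regress_to_mean(baseline, league_mean=0.260, reliability=0.6)
print(round(baseline, 3), round(projection, 3))
```

Age and park adjustments would be applied as further multiplicative or additive steps on top of this baseline; they are left out because the text does not specify their form.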

Beginning in 2000, the Cleveland Indians developed a proprietary analytical database called DiamondView to evaluate scouting information gathered by the team; this system later incorporated player performance indicators and financial indicators, for purposes of evaluating and projecting the performance of all major league players. During 2008-2009, the Pittsburgh Pirates were in the process of developing MITT ("Managing, Information, Tools and Talent"), a proprietary database that integrates scouting reports, medical and contract information, and performance statistics and projections.

Updates and revisions

First introduced in 2003, PECOTA projections are produced each year and published both in the Baseball Prospectus annual monographs and on the BaseballProspectus.com website. PECOTA has undergone several improvements since 2003. The 2006 version introduced metrics for the market valuation of players based on the predicted performance levels. The 2007 version introduced adjustments for league effects, to account for differences in the competitive environment of the two major leagues. The 2008 update took into account differences in players' performance during the first and second halves of the previous season as well as platoon splits (how well a player performed against hitters or pitchers who were left- or right-handed). It also took account of baserunning. In 2009, Baseball Prospectus introduced in-season PECOTA projections, to update and supplement its beginning-of-season projections.

Accuracy

Although Baseball Prospectus promotes PECOTA commercially as "deadly accurate," all projection systems are subject to considerable uncertainty. A comparison found that PECOTA had outperformed several other forecasting systems for the 2006 season in predicting OPS. It performed nearly as well as the best of the other systems in predicting ERA. While PECOTA projections are made for well over 1,000 hitters each season, the evaluation of the system included only slightly over 100 players who had a minimum of 500 major league at-bats and had also been included in projections by the other systems. Nate Silver's own comparison of the performance of alternative projection systems for hitters in 2007 also showed that PECOTA led the field, though a couple of others were close.

Although designed primarily for predicting individual player performance, PECOTA has also been applied to predicting team performance. For this purpose, projected team depth charts are established with projected playing times for each team member, drawing on the expert advice of the Baseball Prospectus staff. The number of runs a team will score and allow during the coming season is estimated based on the playing times and PECOTA's predicted individual performance of each player, using a "Marginal Lineup Value" algorithm created by David Tate and further developed by Keith Woolner. A team's expected win total is based on applying an improved version of Bill James' Pythagorean Formula to the estimated number of runs scored and allowed by the roster of players under the given playing-time assumptions.
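
The final step, converting projected runs into wins, can be sketched with the classic form of the Pythagorean expectation. The text does not say which improved variant Baseball Prospectus applies, so the exponent is left as a parameter and defaults to James's original value of 2; the function names and run totals are illustrative.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    exponent=2.0 is the classic formula; refined variants use other
    exponents, which is why it is exposed as a parameter here.
    """
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected win total over a season of the given length."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Invented projection: a team expected to score 800 runs and allow 700.
print(round(expected_wins(800, 700), 1))
```

In a team forecast, the two run totals would themselves come from summing the player projections over the depth-chart playing times.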

PECOTA has been used in preseason forecasts of how many wins teams will attain and in mid-season simulations of the number of wins each team will attain and its odds of reaching the playoffs. In 2006, PECOTA's preseason forecasts compared favorably to other forecasting systems (including Las Vegas betting line odds) in predicting the number of wins teams would earn during the season. An independent evaluation by the website Vegas Watch showed that PECOTA had the lowest error in predicting Major League team wins in 2008 of all the best known forecasts, both those that were sabermetrically based and those that relied on individual expertise. In 2009, however, PECOTA lagged behind all the well-known forecasters.

A summary for the 2003 through 2007 seasons shows that PECOTA's average error between the predicted and actual team wins declined:
2003: 5.91 wins;
2004: 7.71 wins;
2005: 5.14 wins;
2006: 4.94 wins;
2007: 4.31 wins.
Silver conjectures that the improvement has come in part from taking defense into account in the forecasts beginning in 2005. In 2008 the average error was 8.5 wins.
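
An "average error" of this kind can be computed as below. The source does not state whether the figures are mean absolute errors or root-mean-square errors, so the mean absolute error used here is an assumption, and the win totals are invented for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual win totals."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Invented preseason forecasts and final win totals for three teams.
predicted_wins = [90, 80, 70]
actual_wins = [95, 78, 70]
print(round(mean_absolute_error(predicted_wins, actual_wins), 2))
```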
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 