Monday, September 8, 2014

When Should I Steal?

The Stolen Base
Some consider the stolen base a "lost art." Gone are the days of Vince Coleman's back-to-back-to-back 100+ stolen base seasons of Whitey-ball folklore. Teams are stealing at the lowest rates (per game) since the 1950s.
Aside from the 2011 outlier, stolen base rates have trended downward at a serious pace. Stolen bases still have their place in the game, especially as run environments keep shrinking. But at what point is the value added from a stolen base worth the risk of an out?

Run Expectancy
Tom Tango's handy-dandy run expectancy chart can give us this answer. His run expectancy matrix shows how run expectancy changes from one base-out state to another as events unfold. The basic guide that saberists abide by is that you need to steal successfully about twice as often as you get caught (roughly a 67% success rate) to break even in expected runs, but every situation is different. With runners on first and third and two outs, you would actually have to succeed at an almost 6:1 ratio to break even. This is because of three factors: the steal adds no value for the runner who is already on third, making an out takes the bat out of someone's hands, and making an out with someone already in scoring position is the most detrimental kind of out. Also, in any given situation, you are facing a battery with its own characteristics. Stealing a base off of Kyle Lohse and Yadier Molina was nearly impossible back in 2011. On the other hand, stealing a base off of John Lackey and Jarrod Saltalamacchia would have been a lot easier. Accounting for your own baserunner, the defense, league rates, and the base-out situation will lead to the most informed decision.
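
The break-even idea can be sketched in a few lines of Python. This is only an illustration: the function names are mine, and the three run expectancy values in the example are made-up placeholders (not numbers from Tango's chart), chosen so that the answer lands on the familiar 2:1 rule of thumb.

```python
def break_even_rate(re_start: float, re_success: float, re_caught: float) -> float:
    """Steal success rate at which an attempt neither gains nor loses expected runs.

    Solves p * re_success + (1 - p) * re_caught = re_start for p.
    """
    return (re_start - re_caught) / (re_success - re_caught)


def success_to_caught_ratio(p: float) -> float:
    """Turn a success rate into a 'steals per caught stealing' ratio."""
    return p / (1.0 - p)


# Hypothetical run expectancies (placeholders, NOT taken from any real chart):
# before the attempt, after a successful steal, and after being caught.
p = break_even_rate(re_start=1.0, re_success=1.3, re_caught=0.4)
ratio = success_to_caught_ratio(p)
print(f"break-even success rate: {p:.3f}, ratio: {ratio:.1f}:1")
```

Swapping in the three values for a real base-out state from a run expectancy chart gives the break-even rate for that situation.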

In the tool below, begin by picking your situation. The strings go: outs, first base, second base, third base, where "x" means no runner and a number means a runner occupies that base (e.g. 0x2x means no outs and a runner on second base). Then enter your baserunner's steal rate against an average opponent (Steamer's updated projection gives Kolten Wong a 21/24 chance of stealing a base). After that, enter your opponent's steal rate against, which depends on things like a lefty or righty pitcher and a strong-armed catcher. Then plug in the league average steal rate, and you should have an expected stolen base percentage for your situation along with the expected change in run expectancy (RE24).
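
For readers who want to see the moving parts, here is a rough Python sketch of the same inputs. It makes a big assumption: the post does not say how the spreadsheet blends the three rates, so the sketch uses the odds ratio method, a common sabermetric way of combining a player rate, an opponent rate, and a league rate. The function names and the run expectancy arguments are hypothetical.

```python
def parse_state(state: str) -> tuple[int, tuple[bool, bool, bool]]:
    """Split a string like '0x2x' into outs and (first, second, third) occupancy."""
    if len(state) != 4 or state[0] not in "012":
        raise ValueError(f"bad base-out state: {state!r}")
    outs = int(state[0])
    first, second, third = (ch != "x" for ch in state[1:])
    return outs, (first, second, third)


def odds(p: float) -> float:
    return p / (1.0 - p)


def expected_sb_rate(runner: float, battery: float, league: float) -> float:
    """Blend runner, battery, and league success rates.

    ASSUMPTION: odds ratio method; the original spreadsheet may do this differently.
    """
    o = odds(runner) * odds(battery) / odds(league)
    return o / (1.0 + o)


def expected_run_change(p: float, re_start: float, re_success: float, re_caught: float) -> float:
    """Expected change in run expectancy from attempting the steal."""
    return p * (re_success - re_start) + (1.0 - p) * (re_caught - re_start)


outs, bases = parse_state("0x2x")  # no outs, runner on second
# Kolten Wong's projected 21/24 against a battery that is exactly league average.
wong = expected_sb_rate(runner=21 / 24, battery=0.70, league=0.70)
print(outs, bases, round(wong, 3))
```

Against a league-average battery the blended rate is just the runner's own rate, and a tougher battery pulls it down.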


Additions
Some additions to this spreadsheet could account for the defensive positioning, the batter in the given situation, or a "winning run" selection box for the bottom of the 9th inning (this changes a manager's decisions because only the lead run matters and only one run matters).

Saturday, February 22, 2014

WAR Methodologies and Reasoning

WAR Basics

WAR approximates the number of wins that a player added to (or subtracted from) his team. Finding a certain player’s value is often very difficult and somewhat subjective. What I value in a player could be different from what you value in a player, and that value doesn’t necessarily have to be in the form of wins. You could call your statistic Value Added and the gist of it would still be the same: Which player was more valuable?

WAR is a blueprint. It doesn’t tell you what statistics you have to use or value. Many of the popular WAR methods follow the same blueprint: one or more offensive components, a defensive component, and a positional adjustment, each expressed as a value above or below league average. This gives you the value of the given player as compared to a league average. Baseball Reference calls this WAA (wins above average) and so will I. After this, you add in a replacement adjustment for each league and boom, there you have it, WAR. WAR inherently offers a lot of leeway. Offense + Defense + Position + Replacement = WAR. It doesn’t tell you how you have to quantify any of these numbers. If you think that batting average is the holy grail of offensive statistics, all you have to do is convert a given player’s batting average to runs and you have your offensive component. There is nothing inherent about WAR’s framework that tells you that you cannot do this.

If you are looking for a theme so far, it is this: WAR is wide open to different interpretations.

RE24 Basics

RE24, or Run Expectancy based on the 24 base/out states, is the difference in run expectancy between the start of a play and the end of a play. It is a descriptive statistic as opposed to a predictive statistic. A story-telling statistic is what some people call these types of numbers. It tells what happened. Say, for instance, Justin Upton walks to the plate with Andrelton Simmons standing on first base and no outs. The run expectancy of this plate appearance is 0.941 runs, which simply means that, on average, 0.941 runs are scored from this point through the end of the inning. If Justin hits a single and Andrelton stops at second base, the run expectancy is now 1.556 runs. To get Justin Upton’s added run expectancy, you take the run expectancy at the end of the play and subtract the run expectancy from the beginning of the play. End of play = 1.556, beginning of play = 0.941, so 1.556 – 0.941 = 0.615. Justin Upton’s RE24 for this plate appearance is 0.615 runs. Do this for every play of the season and you have a player’s annual RE24. (Justin Upton’s was 22.2 for the 2013 season, which placed him 28th in the NL among batters, in case you were wondering.)
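
Here is the Upton example as a small Python sketch. The two run expectancy values are the ones quoted above; the state strings (outs, then first, second, and third base, with "x" for an empty base), the `runs_scored` argument, and the function name are my own assumptions for illustration. The `runs_scored` term is there because any runs that cross the plate on the play are credited too; none scored on Upton's single, so it drops out.

```python
# Run expectancy by base-out state. Only the two states quoted in the post
# are filled in; a full table would have all 24.
RUN_EXPECTANCY = {
    "01xx": 0.941,  # no outs, runner on first
    "012x": 1.556,  # no outs, runners on first and second
}


def re24_for_play(start_state, end_state, runs_scored=0.0, table=RUN_EXPECTANCY):
    """Run expectancy added on one play.

    end_state is None when the play ends the inning (nothing left to expect).
    """
    end_re = 0.0 if end_state is None else table[end_state]
    return end_re - table[start_state] + runs_scored


upton_single = re24_for_play("01xx", "012x")
print(round(upton_single, 3))
```

A season total is just the sum of this value over every play a batter was involved in.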

RE24 is very good at telling you how much offensive value a player added or subtracted on a given play in terms of runs. Everyone can agree that, barring any outs, having both first and second base occupied will lead to more runs than having just first base occupied, no matter how it happened. RE24 does not care about the how. All it cares about is the input and output of a play. It combines the ability to create runs by yourself and the ability to create runs contextually. A double with a man on third base is more valuable than a double with no men on, but by the same token, failing to bring home the man on third is worse than grounding out with nobody on. It gives the batter his due reward for hitting well in context but also penalizes him accordingly.

Using RE24 as the Offensive Component to WAR

RE24 is a great story-telling statistic that tells you how a player fared in the situations he was provided, and it also comes in a very easy-to-use package. It is already stated as runs above or below the league average. RE24 is great for my interpretation of WAR. Rather than trying to deduce the intrinsic value of a player in a vacuum, I want to see how that player contributed to his team’s wins. I see WAR as a story-telling statistic. I do not need WAR to be predictive of what will happen the next year. When I want to compare how two different players played in a season, I do not need the statistic to try to be something no one statistic can be: a perfect representation of a player’s current skill. All I need to see is how he contributed to his team’s performance between two endpoints. All I need to know about Matt Carpenter’s 2013 season is what he contributed to the Cardinals’ overall performance. When evaluating Matt Carpenter’s 2013 season, I do not need to know how he should have performed or how he will perform in the future. My value in terms of seasonal WAR is purely descriptive. Matt Carpenter added 52.4 runs offensively (according to RE24). This does not mean he will add 52.4 runs offensively in 2014. It doesn’t even mean he will add 1 run offensively in 2014, but I don’t need WAR to be predictive. It is a story-telling statistic that approximates a certain player’s contributions to his team in terms of on-field production.

Introducing nWAR

nWAR is my WAR formula, which uses the basic, most popular framework: Offense + Defense + Position + Replacement = WAR. For the offensive production, I use RE24 with an additive park adjustment based on home field, plus the Ultimate Base Running statistic from Fangraphs (because it does not include stolen base attempts, which RE24 already accounts for). For the defensive component, I use an average of the Ultimate Zone Rating and the Defensive Runs Saved metrics. For my position values and the replacement values for each league, I use the methods outlined in Tom Tango’s “How to calculate WAR.”

To give you an example, we’ll use Justin Upton again. Let’s start with his RE24, which, as I said before, was 22.2. Adjust this for the slight hitter’s lean of Turner Field and we have an aRE24 (RE24 adjusted for the hitter’s home park) of 19.5. Add in his baserunning score of 5 runs above average and we have 24.5 offensive runs above average. Next, take the average of his UZR (Ultimate Zone Rating) and DRS (Defensive Runs Saved) ratings. I do this because both systems are good on their own, but they both have shortfalls. Taking the average seems to balance out these shortfalls. It is not perfect, but it is very good, and these are the best quantifiable defensive metrics we have available. Upton’s UZR was –9.6 and his DRS was –5, which averages out to –7.3 defensive runs above average. Not a very good year for Justin defensively. That leaves Justin with a combined 17.2 runs above average so far (19.5 + 5 + [–7.3] = 17.2). To convert run values to wins, you simply divide them by 10.5 using Tango’s method, so his wins above average so far is roughly 1.64 wins (that’s a rounded figure). Up next comes the position adjustment. Justin played 147 games as a corner outfielder and 2 as a designated hitter. Using Tom Tango’s positional adjustments, we get –0.5 wins. Next we have the replacement adjustment, which for the National League is 2 wins for every 162 games. Justin played 149 games, so his replacement adjustment is 1.8 wins. To put all of that together, we just find the sum: 1.64 + [–0.5] + 1.8 ≈ 3.0 wins above replacement (there is some rounding in there because I thought it would be more readable to keep it to one decimal place). And there we have it. Justin Upton’s 2013 nWAR is 3.0.
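
The Upton walkthrough can be written out as a short Python sketch. Every number comes from the example above; the class, field, and function names are hypothetical, and the park and position adjustments are taken as already-computed inputs, since their formulas are not spelled out here.

```python
from dataclasses import dataclass

RUNS_PER_WIN = 10.5              # Tango's runs-to-wins divisor, as used in the post
NL_REPLACEMENT_PER_162 = 2.0     # replacement wins per 162 games (NL figure from the post)


@dataclass
class PositionPlayerSeason:
    are24: float           # park-adjusted RE24 (runs)
    baserunning: float     # UBR (runs)
    uzr: float             # Ultimate Zone Rating (runs)
    drs: float             # Defensive Runs Saved (runs)
    position_wins: float   # positional adjustment, already in wins
    games: int


def nwar(s: PositionPlayerSeason, replacement_per_162: float = NL_REPLACEMENT_PER_162) -> float:
    offense = s.are24 + s.baserunning
    defense = (s.uzr + s.drs) / 2                     # simple average of the two systems
    wins_above_average = (offense + defense) / RUNS_PER_WIN
    replacement = replacement_per_162 * s.games / 162
    return wins_above_average + s.position_wins + replacement


upton_2013 = PositionPlayerSeason(are24=19.5, baserunning=5.0, uzr=-9.6, drs=-5.0,
                                  position_wins=-0.5, games=149)
print(round(nwar(upton_2013), 1))
```

Carrying the unrounded intermediate values gives about 2.98, which rounds to the same 3.0.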

For pitchers, I take the pitcher’s RE24 and adjust it for his home park. Then I divide his aRE24 by the number of innings pitched and multiply by nine. This number gives us the pitcher’s runs above average per nine innings. To turn this into expected runs allowed per nine innings, we subtract it from the average runs allowed per nine innings for his league (4 in the NL and 4.33 in the AL).

If you did not follow that, I’ll give you an example.

Cliff Lee pitched 222.2 innings in 2013 (that is, 222 and two-thirds) and had an RE24 of 24.6. A simple additive park factor adjustment of about 0.449 runs per inning gives us an aRE24 of 26.6. We then take this number and break it down to runs above average per nine innings (above average meaning statistically better, not literally more runs than average). So we have (aRE24 ∕ Inn) x 9 = (26.6 ∕ 222.67) x 9 = about 1.07 runs above average per nine innings. We then subtract the runs above average per nine innings from the average runs allowed per nine innings: 4 – 1.07 = 2.93. Cliff Lee’s expected runs allowed per nine innings based off of aRE24 is 2.93 runs. We’ll call this number xRA. We then use xRA and follow Tango’s method for calculating WAR for pitchers. We create his run environment, then calculate his winning percentage based off of his xRA and the average run environment. Add in a replacement adjustment similar to the one we used for the batters and we have our nWAR for pitchers.
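
The xRA step looks like this as a Python sketch. The function names are hypothetical, and the sketch stops at xRA: the run environment and winning percentage steps from Tango's method are not reproduced here.

```python
def innings_from_notation(ip: str) -> float:
    """Convert box-score innings like '222.2' (222 and two-thirds) to a decimal."""
    whole, _, thirds = ip.partition(".")
    thirds_count = int(thirds or 0)
    if thirds_count not in (0, 1, 2):
        raise ValueError(f"bad innings notation: {ip!r}")
    return int(whole) + thirds_count / 3


def expected_ra9(are24: float, innings: float, league_ra9: float) -> float:
    """xRA: league-average runs per nine minus the pitcher's runs above average per nine."""
    runs_above_average_per_nine = are24 / innings * 9
    return league_ra9 - runs_above_average_per_nine


lee_innings = innings_from_notation("222.2")
lee_xra = expected_ra9(are24=26.6, innings=lee_innings, league_ra9=4.0)  # NL average from the post
print(round(lee_innings, 2), round(lee_xra, 2))
```

Without rounding the intermediate step, the result lands within a hundredth of a run of the hand-rounded 2.93.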

Here are the Top 10 position players from each league according to nWAR:

| NL Player | nWAR | AL Player | nWAR |
| --- | --- | --- | --- |
| Paul Goldschmidt | 8.2 | Mike Trout | 10.3 |
| Andrew McCutchen | 7.7 | Josh Donaldson | 8.1 |
| Matt Carpenter | 7.4 | Chris Davis | 7.5 |
| Carlos Gomez | 7.4 | Robinson Cano | 7.1 |
| Freddie Freeman | 7.0 | Miguel Cabrera | 7.0 |
| Yadier Molina | 6.4 | Shane Victorino | 6.0 |
| Hunter Pence | 5.9 | Evan Longoria | 5.9 |
| Joey Votto | 5.9 | Jason Kipnis | 5.7 |
| Matt Holliday | 5.4 | Manny Machado | 5.6 |
| Shin-Soo Choo | 5.3 | Carlos Santana | 5.1 |

And here are the Top 5 pitchers from each league:

| NL Player | nWAR | AL Player | nWAR |
| --- | --- | --- | --- |
| Clayton Kershaw | 8.1 | Yu Darvish | 7.6 |
| Jose Fernandez | 6.6 | Max Scherzer | 7.5 |
| Cliff Lee | 6.0 | James Shields | 7.2 |
| Jhoulys Chacin | 5.9 | Chris Sale | 7.1 |
| Adam Wainwright | 5.6 | Anibal Sanchez | 6.8 |

Okay, well, there you have it. This is my take on WAR, including why and how I did it. Remember that WAR is simply a framework from which to start. I think this version of WAR accurately displays what I want it to tell me: Who added the most value to his respective team on the field?