Friday, August 11, 2006

Swing Analysis

Jeff Albert has posted guest articles on Andruw Jones and Alex Rodriguez over at Baseball Analysts comparing their swings in different seasons. I understand the appeal of this kind of analysis, but my guess is that it's dubious. First, I'm not even clear on how representative the swing captures are - do all of A-Rod's swings in 2006 look like the ones pictured? Are we seeing relatively extreme examples? What is the standard season-to-season variance in the appearance of a player's swing? Perhaps these are readily answered questions, or even questions that people who watch more baseball than I do could answer without much thought. But I don't know the answers, and I think those questions need answering before launching into a side-by-side comparison of the video.

Second, I have a major problem with any analysis that attempts to posit an explanation for statistical variance by looking back at what differences can be seen in how the player played. Now, if intensive scouting data is kept on a player, and you use that scouting data in concert with a) an understanding of the scouting data for all hitters and b) an understanding of the empirical relation of scouting data to performance results, then you can probably learn a lot. But if you notice a change in statistics after the fact and decide to get all Jake Gittes on a few minutes of video, I doubt your findings deserve much confidence.

Looking at their statistical records, it seems entirely reasonable to assume that the performance records in question are the result of variance, pure and simple. A-Rod is on the other side of 27 and is still an excellent hitter; he went from a 4- to 5-win hitter in Texas to a 3-win hitter in 2004 and 2006, with an outlying 2005. His strikeouts have increased a bit in each year in New York, which is to be expected at his age. His $HR came down when he left Texas, as one would expect, and has been the same in 2004 and 2006. The only big deal is that he hit an extra dozen home runs in 2005, right? Since 2005 was his career year, why would anyone expect him to repeat it at 30 in 2006? It's perfectly fair to say his true talent level is 40 HR per 600 AB, which he undershoots by a few in '04 and '06 and overshoots by 8 in 2005. It just seems like his 'slump' only amounts to not getting the extra HR every four weeks that he got last season.
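To put a rough number on how much a season's home run total can wander by chance alone, here is a minimal sketch. It assumes (my simplification, not anything measured) that every at-bat is an independent trial with the same home run probability, and it takes the 40 HR per 600 AB true talent level suggested above at face value. The function names are just illustrative.

```python
import math

# Assumptions for illustration only: a fixed 600 AB season and a constant,
# independent per-AB home run probability implied by 40 HR per 600 AB.
AT_BATS = 600
TRUE_HR = 40
P_HR = TRUE_HR / AT_BATS


def hr_standard_deviation(at_bats, p):
    """Standard deviation of a binomial home run total."""
    return math.sqrt(at_bats * p * (1 - p))


def prob_at_least(k, at_bats, p):
    """Exact binomial probability of hitting k or more home runs."""
    lower_tail = sum(
        math.comb(at_bats, i) * p**i * (1 - p) ** (at_bats - i)
        for i in range(k)
    )
    return 1.0 - lower_tail


sd = hr_standard_deviation(AT_BATS, P_HR)
z = 8 / sd  # the 2005 overshoot of 8 HR, in standard deviations
p_tail = prob_at_least(TRUE_HR + 8, AT_BATS, P_HR)

print(f"SD of season HR total: {sd:.2f}")
print(f"Overshoot of 8 HR in SD units: {z:.2f}")
print(f"Chance of {TRUE_HR + 8}+ HR from luck alone: {p_tail:.3f}")
```

Under those assumptions the standard deviation comes out to roughly six home runs, so an overshoot of 8 is a bit more than one standard deviation from the mean. Real seasons are messier than a binomial model (parks, pitchers, health), which if anything adds variance rather than removing it.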

Meanwhile, Jones' 2005 doesn't seem like much of a big deal either. It differed from his established performance levels in that he cut his K's down to a level he hadn't been at since '99-2000 and he hit 15 extra HR. But the difference in HR looks mostly like his doubles turning into homers for one season. From 2002-4, he hit 15.3 extra base hits per 100 batted balls, and has 15.8 in 2006. In 2005, he had 16.5, so we're not talking about a huge difference in power; the only difference is that he had a good year for getting them all the way into the seats. It also didn't hurt the breakout aura that his 2004 season was below his established levels. Now, is it possible that he played differently and his results reflect a different approach? Of course. Indeed, his hits per batted ball were down 25 points last season, so perhaps he was lofting balls more often, resulting in more hits in the stands but also more hits in OF gloves. But the explanation that the differences in the numbers are just statistical variance has as much or more merit than the guess that he made mechanical changes to get more air under the ball.
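The same back-of-the-envelope check works for the extra base hit rates. The sketch below treats each batted ball as an independent trial and asks how big the 2005 gap is relative to one season's sampling noise. The 15.3 and 16.5 figures are the ones quoted above; the batted-ball count of 450 is a placeholder I made up for illustration, not Jones' actual total.

```python
import math

# Rates per 100 batted balls, as quoted in the post.
BASELINE_RATE = 15.3   # 2002-4
RATE_2005 = 16.5

# Hypothetical season total of batted balls; NOT taken from any record.
ASSUMED_BATTED_BALLS = 450


def rate_standard_error(rate_per_100, batted_balls):
    """Binomial standard error of a per-100 rate over one season."""
    p = rate_per_100 / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / batted_balls)


se = rate_standard_error(BASELINE_RATE, ASSUMED_BATTED_BALLS)
gap = RATE_2005 - BASELINE_RATE

print(f"One-season standard error: {se:.2f} per 100 batted balls")
print(f"2005 gap over baseline:    {gap:.2f} per 100 batted balls")
print(f"Gap in standard errors:    {gap / se:.2f}")
```

With that assumed sample size, the 2005 gap is smaller than a single standard error, which is the sense in which it is not a huge difference in power. A different batted-ball count would change the exact figure but not the general picture unless the sample were several times larger.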

I do not mean to imply that we should just accept 'statistical variance' as an acceptable answer for why player x is struggling/doing well and so forth. But it should be the default explanation. A player who goes from 3 to 11 HR in successive seasons has not necessarily bulked up, and a player going from 35 to 50 has not necessarily made any changes. Moreover, I am willing to believe that a great deal of statistical variance *can* be explained by looking at changes in a hitter's mechanics (although clearly not all variance can be so explained). Heck, it's of course possible that, due to changes in his swing, A-Rod in 2005 was a true .350/.460/.680 hitter whose observed performance didn't live up to his talent, and that in 2004 and 2006 he's really a .270/.360/.460 hitter whose observed performance is better than his true talent. The point is just that taking the performance data and trying to clarify it by cherry-picking the scouting info that would explain the fluctuation in performance data is foolish. That may or may not be what Jeff Albert (or Don Mattingly, or Alex Rodriguez) does. What I am arguing is that the task should be to first come up with a useful and reliable method for cross-referencing scouting and performance data; since one will never have a perfect sample of either player performance or scouting info, picking and choosing just doesn't seem like a solid method of learning more about a player.


At 8:11 AM, Blogger Rob said...

Okay, I'm having a brain cramp getting from "I do not mean to imply that we should just accept 'statistical variance' as an acceptable answer for why player x is struggling/doing well and so forth." to "But it should be the default explanation." If it's the default, why is it not acceptable?

At 2:47 PM, Blogger Aaeamdar said...

Because an acceptable answer implies the end of the analysis. What Tom is rightly saying is that *absent other evidence* statistical variance is the most likely explanation for changes in performance. That does not mean, however, that just because statistical variance is a probable answer you stop looking. What you don't do, however, is puff up some observations that, absent performance differences, tell you nothing, and then try to bootstrap those observations into explaining the recorded performance difference.

In other words, if you had *no idea* how two players performed and you looked at side-by-side swings, could you reliably comment about their performance? If the answer is "No" (which it almost certainly is), then using that observation to try to explain known performance is false analysis.


