This page highlights some of my baseball analysis projects (see below). Since my academic work doesn't specifically relate to baseball, I'd like to share a bit more about the background that prepares me to conduct baseball research.

As is evident from elsewhere on this site, I am closing in on my Ph.D. in computer science at Princeton University (expected graduation early 2020). My early research work was recognized by the National Science Foundation with a prestigious Graduate Research Fellowship. My most recent work, in conjunction with AT&T Labs, has focused on optimizing the placement of IP and optical equipment in computer networks. I also have a background in mathematics, as I was a double major in computer science and math as an undergraduate at Williams College. While there, I took several upper-level statistics classes, in addition to more theoretical courses like Abstract Algebra and Number Theory. Even after I graduated, one of my Williams math professors twice invited me to come back to help with the summer math program for gifted high school students. I'm also still in regular contact with another of my Williams professors, Steven Miller, who has done some consulting work with MLB teams.

Baseball has long been my passion. I've been reading FanGraphs since 2009, and I'm a dedicated listener to several analytically-inclined baseball podcasts. I regularly spend hours poring over various leaderboards on FanGraphs, Baseball Reference, Baseball Prospectus, Baseball Savant, and The Baseball Gauge. During the season, I routinely watch (part or all of) several games each night. In the offseason, I rewatch random games from the past season while following along on the Baseball Savant Gamefeed and the FanGraphs win probability play log.

Even before I began studying baseball, I loved the game. I played Little League and then Babe Ruth, where I was the only girl in my town playing with the boys. I only quit after ninth grade, when I realized that my right-down-the-middle (almost) 50 mph fastball served as perfect batting practice for high school boys.

By the time it became eminently clear that my baseball playing abilities were, to put it nicely, lacking, I had discovered that I had a talent for distance running. I started running high school track during my last baseball season, earning a varsity letter on my first try. I went on to run competitively for my first two years at Williams College, where I was the 2011 NCAA Division III National Champion in both the 5000 and 10,000 meters and earned four total All-American awards.

My experience as a competitive athlete allows me to understand the perspective of players who are resistant to suggestions from the analytics department. I apply a logical, evidence-based approach to decisions in my daily life. However, I found that I raced best when I prevented myself from overthinking things; I actively worked to avoid obsessing over my lap splits or the impressive resumes of the other athletes. As such, I think I'm well-positioned to help establish a productive and trusting relationship between on-field personnel and the analytics department, a key challenge in today's game.


Are Analysts Affecting the Behavior They're Observing?

Published on the FanGraphs Community Blog

We observe a strong negative correlation between a pitcher's strikeout rate and the runs he allows: the more strikeouts, the more effective the pitcher. Further, strikeout rate seems to be a repeatable skill, as individual pitchers' strikeout rates remain fairly consistent over the years. In contrast, pitchers generally don't demonstrate a repeatable ability to generate soft contact, as measured by BABIP, from year to year. Therefore, we analysts encourage pitchers to try for strikeouts and denounce the previously accepted idea of pitching to contact. But might we be improperly ascribing causation to this correlation between strikeout rate and pitcher effectiveness?
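The "repeatable skill" claim above is usually checked by correlating each pitcher's rate in one season with his rate in the next. Here is a minimal Python sketch of that check. The function name `pearson` and every number in the lists are invented placeholders for illustration, not real pitcher data and not results from the post. Note that this only measures year-over-year consistency; it says nothing about causation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values: one entry per pitcher, same order in each list.
k_rate_year1 = [0.18, 0.22, 0.25, 0.30, 0.20]
k_rate_year2 = [0.19, 0.21, 0.26, 0.29, 0.21]
babip_year1 = [0.280, 0.310, 0.295, 0.300, 0.290]
babip_year2 = [0.305, 0.285, 0.300, 0.290, 0.298]

k_repeatability = pearson(k_rate_year1, k_rate_year2)
babip_repeatability = pearson(babip_year1, babip_year2)
```

With real data, one would load a full population of pitcher seasons (with some minimum innings threshold) in place of the placeholder lists and compare the two coefficients.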

Should We Panic Yet? Probability and Baseball Models

Here is a paper I wrote in 2011 about probability and baseball. This piece was originally intended to be a chapter in a textbook on probability my professor, Steven Miller, was working on. Though I don't believe it ever was included in the textbook, I think it serves as an example of my ability to explain complex topics in an easy-to-understand manner.

A Response to Baseball's "Hot Hand" is Real

The "hot hand" is the idea that an athlete can be "locked in" at times, and therefore we expect him to perform better than his seasonal average in his next attempt. Athletes and "old school" announcers in multiple sports have long asserted this phenomenon exists, while analytical studies have generally concluded that it's a myth. In August of 2017, Rob Arthur and Greg Matthews published a piece on FiveThirtyEight entitled Baseball's "Hot Hand" Is Real, claiming, as the title suggests, to have found mathematical evidence for this idea heretofore dismissed by sabermetricians. I intend no ill will toward Arthur and Matthews, and I take no issue with their mathematical techniques. However, I find the headline/framing of this article to be (likely unintentionally) misleading.

Most fundamentally, the term "hot hand" is generally used to refer to material differences in performance. If the effect is so small that it exists only in fastball velocity but is washed out by other factors in measurable results, this is not a hot hand. The article does a fine job convincing me that average fastball velocity is predictive of future average fastball velocity, but it does not find that strong/weak performance predicts future strong/weak performance. Arthur and Matthews resort to the theoretical argument that, in general, an "almost 4 mph difference in heat translates to a 1.03-run difference in projected runs allowed per nine." But why is this difference not measurable in practice? Their argument is: a faster fastball leads to (a) fewer runs per game and (b) a faster future fastball. Therefore, fewer future runs per game. But this is a falsifiable hypothesis; why not directly measure the runs-allowed effect for "hot" pitchers? Because, in fact, the data do not bear out that a faster fastball predicts better future results.
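As a rough sketch of what such a direct test could look like, the Python below labels each start "hot" or "cold" by whether the pitcher's average fastball velocity beat his season average, then computes runs allowed per nine innings in the following start for each group. This is illustrative only: it is not the method from the article, nor an analysis I have run. The function name, the tuple layout, and the sample starts are all hypothetical.

```python
from statistics import mean


def next_start_ra9_by_heat(starts):
    """Runs allowed per nine in the start AFTER a hot or cold start.

    `starts` is one pitcher's season in chronological order, each entry a
    tuple (avg_fastball_mph, runs_allowed, innings_pitched), with innings
    given as true fractions (6.333..., not the box-score "6.1").
    Returns {"hot": ra9_or_None, "cold": ra9_or_None}.
    """
    if not starts:
        return {"hot": None, "cold": None}
    # Assumption: "hot" means above the full-season average velocity. That
    # baseline uses future starts, so a real study would need a running
    # average or a prior-season baseline instead.
    season_velo = mean(velo for velo, _, _ in starts)
    totals = {"hot": [0, 0.0], "cold": [0, 0.0]}  # [runs, innings]
    for previous, following in zip(starts, starts[1:]):
        label = "hot" if previous[0] > season_velo else "cold"
        totals[label][0] += following[1]
        totals[label][1] += following[2]
    return {
        label: (9 * runs / innings if innings > 0 else None)
        for label, (runs, innings) in totals.items()
    }


# Hypothetical placeholder season, not real data.
sample_starts = [
    (93.0, 3, 6.0),
    (94.5, 2, 7.0),
    (92.0, 4, 5.0),
    (95.0, 1, 6.0),
    (93.5, 3, 6.0),
]
result = next_start_ra9_by_heat(sample_starts)
```

If the hot hand materially affected results, the "hot" figure should come out reliably lower than the "cold" one across a large sample of pitchers, which a handful of starts like the placeholder season above cannot show either way.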

I also object to the claims about the predictive nature of the model.

  • "It's likely that our method can also detect injuries. In particular, we found evidence that clusters of several slow pitches in a row are associated with a hurt pitcher."
  • "Our approach isn't just backwards-looking, either -- we can also predict whether a pitcher will be hot or cold in the future. Using just the first two months' worth of 2016 data, we tried to predict every pitcher's subsequent fastball velocity. Our model was able to predict how hard the next pitch would be better than a guess based on the pitcher's season-long average would be able to, suggesting that it's able to pick up on when a pitcher is hot or cold at any point in the season after June 1."

What does the "model" show that a simple graph of average fastball velocity by start doesn't reveal? We have long known that a drop in fastball velocity is a leading indicator of possible injury.

I believe there is a kernel of value to the research presented in this article. In particular, I'd like to see the antepenultimate paragraph developed further, as it seems to be getting at some measurable performance differences predicted by the model. But I think that many of the article's assertions are grander than the data truly support. I don't claim to have data refuting their conclusions, but I'm also not convinced that the hot hand does exist.