When we last checked in on the bottom of the table, there were 7 teams that we named legitimate relegation candidates. This weekend's good showing by Sunderland, who were on the fringe of the discussion anyway, means we're down to 6. It's still entirely possible that teams will fall down the table, but if we're speaking in realistic terms, there's a clear gap between the teams that are presumably safe and those that should be worried.
Next week's midweek fixtures include just one six-pointer, as Aston Villa host Newcastle. When the Toon hosted the Villans earlier this season, the result was a 1-1 draw. Meanwhile, Southampton will travel to Old Trafford, Reading will host Chelsea, Wigan go to Stoke, and Queens Park Rangers host Manchester City. It's a huge opportunity for both Villa and Newcastle to put some distance between themselves and the drop zone. Add in the revenge factor for those in the Toon Army who still remember the last day of the 2008-09 season, and suddenly we've got must-see TV.
There's some additional intrigue for the statistically inclined. You'll notice that Aston Villa have the worst goal differential by far. It's almost twice as bad as Newcastle's, and they're down by more than one goal a game. That's historically bad. Yet the two teams are separated by just one point.
So what gives? We can generally assume that teams with negative goal differentials will do worse than teams with positive differentials. Scoring goals is the point of the game, after all. Of course, goal differential doesn't tell the whole story. No metric ever will. Some teams thrive in close games and will occasionally get blown out. Tactics change when teams take leads or fall behind, and those late-game maneuverings can sometimes affect the margin of victory or defeat. It's the reason we write narratives after the game using words like "deserved" or "lucky." However, given a large enough sample size, goal differential will eventually become a reliable predictor of wins, losses, and draws.
What we're about to do here is based on a formula devised by Bill James, sabermetrician and Godfather of quantitative baseball analysis. James theorized that he could predict a team's W-L record by plugging their runs scored and runs allowed into an equation. What he came up with looked a lot like the Pythagorean theorem (the formula used to find the hypotenuse of a right triangle), so it became known as Pythagorean Expectation. The formula is this:
Win Percentage = Runs Scored^2 / (Runs Scored^2 + Runs Allowed^2)
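For anyone who'd rather see that as code, here is a minimal sketch in Python. The function name and the example team (800 runs scored, 700 allowed) are made up for illustration, and the result is a winning percentage between 0 and 1 rather than a raw win count, so you multiply by games played to get wins.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James's Pythagorean Expectation: share of games a team 'should' win."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical team, not taken from any real season.
pct = pythagorean_win_pct(800, 700)
print(f"Expected win percentage: {pct:.3f}")
print(f"Expected wins over 162 games: {pct * 162:.1f}")
```

A team that scores exactly as many runs as it allows comes out at 0.500, which is a handy sanity check.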
Believe it or not, the formula is actually a pretty accurate predictor. It would be nice to translate it directly to football, but alas, there are no draws or ties in baseball, and the introduction of a non-binary event to the equation complicates it significantly. The next part is extremely complex and frankly a bit over my head. If you'd like to read further, I suggest taking a look at Martin Eastwood's pena.lt/y blog. The long and short of it is that we have a slightly more complicated, yet more accurate, equation that ostensibly works for footy. Here it is:
Predicted Points = (Goals For^1.122777 / (Goals For^1.072388 + Goals Against^1.127248)) × 2.499973 × Number of Games
Skeptical? Me too. I'm also insanely curious (and I also believe that the guy who makes a living putting together mathematical models might just know what he's talking about). Let's take a look at the points expectation for both Newcastle and Aston Villa based on their goals for and against.
Newcastle United
Goals For: 28
Goals Against: 41
Points Expectation: 23.9
Actual Points: 21

Aston Villa
Goals For: 19
Goals Against: 44
Points Expectation: 16.6
Actual Points: 20
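If you want to check the arithmetic yourself, here is a rough Python sketch of the football formula quoted above, run against Newcastle (28 for, 41 against) and Aston Villa (19 for, 44 against). The exponents and the 2.499973 multiplier are copied from the formula as written. The number of games played is not stated in this post, so the 23 used here is my assumption; it does reproduce the figures listed above when rounded to one decimal place. The function and variable names are just illustrative.

```python
# Coefficients copied from the formula quoted in the post.
EXP_FOR_NUMERATOR = 1.122777
EXP_FOR_DENOMINATOR = 1.072388
EXP_AGAINST = 1.127248
POINTS_SCALE = 2.499973

# ASSUMPTION: games played per team is not given in the post.
GAMES_PLAYED = 23


def expected_points(goals_for, goals_against, games):
    """Pythagorean-style points expectation for football."""
    numerator = goals_for ** EXP_FOR_NUMERATOR
    denominator = goals_for ** EXP_FOR_DENOMINATOR + goals_against ** EXP_AGAINST
    return numerator / denominator * POINTS_SCALE * games


teams = {
    "Newcastle United": {"gf": 28, "ga": 41, "actual": 21},
    "Aston Villa": {"gf": 19, "ga": 44, "actual": 20},
}

for name, t in teams.items():
    ex_pts = expected_points(t["gf"], t["ga"], GAMES_PLAYED)
    # Positive difference = more points than the goals suggest ("lucky").
    diff = t["actual"] - ex_pts
    print(f"{name}: expected {ex_pts:.1f}, actual {t['actual']}, difference {diff:+.1f}")
```

The sign of the difference is the interesting part: negative means a team has fewer points than its goals suggest, positive means it has more.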
So Newcastle have been a bit unlucky (about three points below expectation), while Aston Villa have been lucky by a slightly larger margin (about three and a half points above expectation). Does it mean that Newcastle will win next Tuesday? Not at all. Even so, I feel better about their chances moving forward than I do about Aston Villa's. A goal deficit that large will eventually catch up with you.
Let's add a couple of columns to our table. ExPts is for Expected Points, or the number of points the team should have gained thus far based on goals for and against. The gap between ExPts and actual points is an expression of luck, for lack of a better term. PrPts stands for Projected Points. I've taken Expected Points, scaled it on a per-game basis, multiplied it by the number of games remaining, then added it to the number of actual points the team currently has. In other words, if the team continues to score and concede at its current rate, this is a reasonable guess for their final point total.
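The Projected Points calculation described above can be sketched in a few lines of Python. As before, this is only an illustration: the 23 games played is my assumption (the post doesn't state it), the 38-game season is the standard Premier League length, and the function names are invented for the example.

```python
SEASON_GAMES = 38   # standard Premier League season length
GAMES_PLAYED = 23   # ASSUMPTION: not stated in the post


def expected_points(goals_for, goals_against, games):
    """Points expectation using the coefficients quoted in the post."""
    numerator = goals_for ** 1.122777
    denominator = goals_for ** 1.072388 + goals_against ** 1.127248
    return numerator / denominator * 2.499973 * games


def projected_points(actual_points, goals_for, goals_against,
                     games_played=GAMES_PLAYED, season_games=SEASON_GAMES):
    """Actual points so far plus the expected-points pace over the remaining games."""
    ex_pts_per_game = expected_points(goals_for, goals_against, games_played) / games_played
    games_remaining = season_games - games_played
    return actual_points + ex_pts_per_game * games_remaining


print(f"Newcastle United PrPts: {projected_points(21, 28, 41):.1f}")
print(f"Aston Villa PrPts: {projected_points(20, 19, 44):.1f}")
```

Note that the projection keeps the points a team has already banked and only applies the expected pace to the games still to come, so past luck is not clawed back.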
The model projects Wigan Athletic, Aston Villa, and QPR to be the relegated teams. Not really a surprise, based on our hypothesis. The three teams with the worst goal differentials end up at the bottom. What do you think? Does this sort of analysis have merit? Does the final table give you more hope for Newcastle, or does it worry you more because they've underachieved?