It has been a long time since I last posted, so I apologize to anyone who came looking for new material only to leave the page disappointed. I have recently started a new job, which has reduced the time I can dedicate to my blog. Add to that the fact that I still try to bet a little on the weekends, and I need to stretch myself to maintain my previous posting frequency. This will probably remain the case in the coming months; however, I will make up for it by further improving the quality of my articles and by avoiding the temptation to post filler content when I have nothing to say. Enough about me; let's dive into the topic of this article, **the favourite-longshot bias**.

**The overround and the favourite-longshot bias**

Today I want to expand on the overround analysis from my last article. There I demonstrated the relatively well-known fact that overround in the betting markets has been declining in recent years, but I also offered a more detailed look at how it has developed across different markets, countries and leagues. The article was well received, which suggests that the topic interests the betting community and is perhaps not getting the coverage it deserves.

So today I dive deeper into overround, and more specifically into the famous favourite-longshot bias. The favourite-longshot bias refers to the not-so-intuitive fact that bookmakers tend to assign a disproportionately large share of the overround to longshots. As a result, the average price a bettor receives on a favourite is, relatively speaking, more favourable than the one on a longshot. There have been a few attempts to explain why that is the case, so I will not speculate on that here. But I have not come across much material trying to **quantify** the bias. The question of exactly how much the overround on longshots is overweighted remains largely unanswered. So that is what I am looking at today.

### The model

The base case, or the simplest assumption one could make, is that the margin is attributed equally to all outcomes. I will call this the **linear** case. So if you have a 5% margin, you simply multiply the offered odds by 1.05 to get the fair odds. The formula is:

$$O_f = O \times (1 + M)$$
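As a minimal sketch of the linear case, assuming the margin M is measured as the sum of the implied probabilities 1/O minus one, the code below computes M from a set of offered odds and scales every price by the same factor 1 + M. The odds are made-up numbers and the helper names `margin` and `linear_fair_odds` are hypothetical.

```python
def margin(offered_odds):
    """Overround M: sum of implied probabilities 1/O minus one (assumed definition)."""
    return sum(1.0 / o for o in offered_odds) - 1.0


def linear_fair_odds(offered_odds):
    """Linear case: O_f = O * (1 + M), the same factor applied to every outcome."""
    m = margin(offered_odds)
    return [o * (1.0 + m) for o in offered_odds]


# Hypothetical 1X2 prices: home favourite, draw, away longshot
odds = [1.50, 4.00, 6.50]
fair = linear_fair_odds(odds)
print(f"margin = {margin(odds):.4f}")
print([round(f, 3) for f in fair])
# The fair implied probabilities 1/O_f add up to one again
print(round(sum(1.0 / f for f in fair), 10))
```

Because every price is multiplied by the same factor, the implied probabilities are all divided by 1 + M, which is exactly what brings their sum back to one.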

The alternative is a formula proposed by Joseph Buchdahl on football-data.co.uk. In summary, it assigns the margin to the different outcomes not equally but in proportion to the size of their odds, or in inverse proportion to the probability of the outcome. I call it the **proportional** case. You can find the exact formula on football-data.co.uk. Unlike the linear formula, the proportional one takes the favourite-longshot bias into account, but we can still test whether it quantifies the bias correctly. The formula:

$$O_f = \frac{n \times O}{n - M \times O}$$

In the two formulas above, $O_f$ stands for the fair odds, $O$ for the offered odds, $M$ for the margin (aka overround) and $n$ for the number of outcomes (3 for a 3-way market).
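A minimal sketch of both formulas side by side, again assuming M is the sum of the implied probabilities minus one. The odds are hypothetical and so are the function names. The guard in `proportional_fair_odds` is my own addition: when M × O reaches n (very long odds combined with a large margin), the proportional formula has no valid result.

```python
def margin(offered_odds):
    """Overround M: sum of implied probabilities 1/O minus one (assumed definition)."""
    return sum(1.0 / o for o in offered_odds) - 1.0


def linear_fair_odds(offered_odds):
    """Linear case: O_f = O * (1 + M)."""
    m = margin(offered_odds)
    return [o * (1.0 + m) for o in offered_odds]


def proportional_fair_odds(offered_odds):
    """Proportional case: O_f = n*O / (n - M*O), with n the number of outcomes."""
    n = len(offered_odds)
    m = margin(offered_odds)
    fair = []
    for o in offered_odds:
        denom = n - m * o
        if denom <= 0:
            # Very long odds plus a large margin make the formula undefined
            raise ValueError(f"formula undefined for odds {o} at margin {m:.4f}")
        fair.append(n * o / denom)
    return fair


odds = [1.50, 4.00, 6.50]  # hypothetical 1X2 prices
for name, fn in [("linear", linear_fair_odds), ("proportional", proportional_fair_odds)]:
    fair = fn(odds)
    total = sum(1.0 / f for f in fair)
    print(f"{name:>12}: {[round(f, 3) for f in fair]}, sum(1/O_f) = {total:.6f}")
```

For these prices the proportional version gives the favourite a lower fair price than the linear one (about 1.555 vs 1.606) and the longshot a higher one (about 7.672 vs 6.958), i.e. it puts more of the margin on the longshot, while both versions return implied probabilities that sum to one.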

I am using the names linear and proportional for the sake of simplicity, although one might argue that they are not strictly accurate: the linear formula still adds the overround in proportion to the odds, while the proportional one gives the odds even more weight, resulting in a sort of "overproportional" attribution of overround. Anyway, I will stick to this convention for the rest of the article.

### The data

I test the two formulas on the odds data provided by football-data.co.uk. I have gathered results for the few soft books with the longest records (Interwetten, William Hill, Ladbrokes and bet365) and for Pinnacle.

## Results

First, I simulate a betting strategy using the fair odds implied by each of the two formulas. If we could bet level stakes at the margin-free fair odds rather than at the odds the bookmakers offer, we would expect roughly zero profit in the long run.
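The zero-profit premise can be checked with a small simulation. This is a sketch on synthetic data, not the football-data.co.uk sample: outcomes are drawn with known probabilities, each one is backed with a 1-unit stake at exactly 1/p, and the average profit per bet should hover around zero. The function names are hypothetical.

```python
import random


def average_level_stake_profit(bets):
    """Mean profit per 1-unit stake; each bet is (decimal_odds, won)."""
    total = sum((odds - 1.0) if won else -1.0 for odds, won in bets)
    return total / len(bets)


def simulate_fair_bets(n_bets, seed=1):
    """Hypothetical check: back each outcome at odds 1/p, p being its true probability."""
    rng = random.Random(seed)
    bets = []
    for _ in range(n_bets):
        p = rng.uniform(0.05, 0.95)   # true probability of the outcome
        won = rng.random() < p        # simulated result
        bets.append((1.0 / p, won))   # truly margin-free price
    return bets


print(f"{average_level_stake_profit(simulate_fair_bets(200_000)):+.4f}")
```

With real data the same profit function would be fed the fair odds from the linear or the proportional formula; any systematic deviation from zero then points to margin that the formula attributed to the wrong outcomes.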

For each bookmaker, I (somewhat arbitrarily) divided the odds into 10 ranges, sized so that each range contains approximately the same number of bets. The first range holds the lowest odds, the second the next lowest, and so on. I then averaged the returns of the strategy within each range for both formulas and plotted them. Ideally, the lines should be flat and close to 0. The results are below:

It is safe to say that the graphs confirm the favourite-longshot bias. For most, if not all, of the bookmakers, the linear formula gives fair odds that are too high in the short (low) odds range and too low in the long odds range. The proportional formula seems to do a better job.
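The bracketing behind those graphs can be sketched as follows. This is an illustration under my own assumptions (bets sorted by odds, split into equally sized groups, 1-unit level stakes), not the exact code used for the graphs, and `returns_by_bracket` is a hypothetical name.

```python
def returns_by_bracket(bets, n_brackets=10):
    """Sort bets by odds, split them into brackets of (almost) equal size and
    return (lowest_odds, highest_odds, mean_profit) for each bracket.
    Each bet is (fair_odds, won), backed with a 1-unit level stake."""
    ordered = sorted(bets, key=lambda b: b[0])
    size, extra = divmod(len(ordered), n_brackets)
    results, start = [], 0
    for i in range(n_brackets):
        # The first `extra` brackets take one additional bet each
        end = start + size + (1 if i < extra else 0)
        chunk = ordered[start:end]
        if chunk:
            profit = sum((o - 1.0) if won else -1.0 for o, won in chunk)
            results.append((chunk[0][0], chunk[-1][0], profit / len(chunk)))
        start = end
    return results


# Hypothetical bets: odds 1.1 to 3.0, every second one a winner
bets = [(1.0 + i / 10, i % 2 == 0) for i in range(1, 21)]
for lo, hi, mean_profit in returns_by_bracket(bets, n_brackets=4):
    print(f"{lo:.2f}-{hi:.2f}: {mean_profit:+.3f}")
```

Equal-count brackets keep the sampling noise roughly comparable across the odds spectrum, which is why the long-odds end is not dominated by a handful of lucky or unlucky results.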

**Quantifying the margin attribution**

The graphs give an interesting insight but are inconclusive for smaller datasets and lower margins, like the Pinnacle set. In particular, they suggest that the favourite-longshot bias is not present in Pinnacle odds to the same extent as in the soft books, but they are not enough to support a strong conclusion. So I started looking for a more rigorous way to measure how the margin is actually distributed, in order to draw more precise conclusions.

**Scoring rules**

There are a number of so-called scoring rules often applied for this purpose. The most common ones include RMSE (root mean squared error), MAE (mean absolute error) and LE (logarithmic error), but I did not find any of them particularly useful. The problem I encountered is that they do well at penalising wrong predictions in the low/high probability range (like 5% and 95%) but are essentially useless for probabilities close to 50%. I don't want to bore you with the details, so just consider this: if you assign a 50% probability to each of a series of 100 outcomes, none of these scores changes whether you get 100 wins, 100 losses or 50 wins and 50 losses, even though your prediction was obviously good only in the third case.
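The 50% example is easy to reproduce. In the sketch below, LE is taken to mean the average negative log-likelihood (log loss), which is my assumption about the exact definition; the function name `scores` is hypothetical. Every scenario prints the same three numbers (MAE 0.5, RMSE 0.5, LE = ln 2 ≈ 0.693), because each individual error is 0.5 in absolute terms no matter which way the outcome goes.

```python
import math


def scores(probs, outcomes):
    """MAE, RMSE and logarithmic error (log loss) for binary outcomes (1 = happened)."""
    errors = [y - p for p, y in zip(probs, outcomes)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    log_err = -sum(math.log(p) if y else math.log(1.0 - p)
                   for p, y in zip(probs, outcomes)) / len(outcomes)
    return mae, rmse, log_err


probs = [0.5] * 100
for label, outcomes in [("100 wins", [1] * 100),
                        ("100 losses", [0] * 100),
                        ("50/50", [1, 0] * 50)]:
    mae, rmse, log_err = scores(probs, outcomes)
    print(f"{label:>10}: MAE={mae:.3f} RMSE={rmse:.3f} LE={log_err:.3f}")
```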

Still, I decided to report the results for MAE, as it is somewhat useful in detecting deviations for low/high probabilities. I decided against RMSE and LE because both overweight mistakes in the extremes (e.g. a lost bet at very short odds or a won bet at very long odds). Variance aside, such mistakes are no different for the bettor, who cares mostly about how much money he wins or loses on average, and MAE is the best estimate of that. However, MAE is just as incapable as the other two of detecting bad predictions around 50% assigned probability, so keep that in mind.

**Linear regression**

As I was a bit disappointed by the scoring rules, I kept searching and found an interesting suggestion: run a linear regression with the series of binary outcomes (1 = happened, 0 = did not happen) as the dependent variable and the probabilities implied by the fair odds as the independent variable. The correlation and R² are always low in this setup, because the dependent variable only ever takes one of the two extreme values. However, with accurate predictions one would expect a slope close to 1 and an intercept close to 0.
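A minimal sketch of that regression in plain Python, using the closed-form ordinary least squares estimates. The data are synthetic and perfectly calibrated by construction (each outcome is drawn with exactly the stated probability), so the fitted slope should come out near 1 and the intercept near 0. `calibration_regression` is a hypothetical name.

```python
import random


def calibration_regression(probs, outcomes):
    """OLS fit of outcome (1/0) on predicted probability; returns (slope, intercept)."""
    n = len(probs)
    mean_p = sum(probs) / n
    mean_y = sum(outcomes) / n
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(probs, outcomes))
    var = sum((p - mean_p) ** 2 for p in probs)
    slope = cov / var
    return slope, mean_y - slope * mean_p


# Synthetic, perfectly calibrated forecasts
rng = random.Random(7)
probs = [rng.uniform(0.05, 0.95) for _ in range(200_000)]
outcomes = [1 if rng.random() < p else 0 for p in probs]
slope, intercept = calibration_regression(probs, outcomes)
print(f"slope = {slope:.3f}, intercept = {intercept:+.3f}")
```

With real data, `probs` would be 1/O_f from one of the two formulas and `outcomes` the actual results; a slope above 1 with a negative intercept means the formula's probabilities are too compressed towards the middle.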

You can see the summarized results for the linear regressions and the MAEs (the lower, the better) below:

For the linear margin distribution, we consistently get a slope higher than 1 and a negative intercept. So to arrive at the correct probabilities, we need to take the "fair probabilities" derived from the linear model, scale all of them up proportionally and then deduct a fixed amount from each result. Let me give you an example.

Let's say our initial probabilities for two outcomes of an event, calculated with the linear formula, are 80% (fair odds 1.25) and 20% (fair odds 5). We have a slope of 1.10 and an intercept of -0.05. To arrive at the correct probabilities, we first multiply both by 1.1 (i.e. increase them by 10%) to get 88% and 22%, then deduct 5 percentage points from both, arriving at final probabilities of 83% and 17%. Now, these are not strictly the correct probabilities either, since, depending on the coefficients, they won't necessarily add up to one. But in the context of the model, those numbers do a better job of explaining the result of the event than the initial ones, so we can reasonably say they are closer to the true probabilities.
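The same arithmetic as a short sketch; `adjust` is a hypothetical helper and the second set of coefficients is made up purely to show the sum drifting away from one. Since the input probabilities sum to one, the adjusted values sum to slope + n × intercept (n being the number of outcomes), which equals one only when that expression happens to be 1, as it does for 1.10 and -0.05 with two outcomes.

```python
def adjust(probs, slope, intercept):
    """Apply the fitted line: adjusted = slope * p + intercept for each outcome."""
    return [slope * p + intercept for p in probs]


linear_probs = [0.80, 0.20]  # fair probabilities from the linear formula
adjusted = adjust(linear_probs, slope=1.10, intercept=-0.05)
print([round(p, 4) for p in adjusted], round(sum(adjusted), 4))

# Hypothetical coefficients: the adjusted values no longer sum to one
other = adjust(linear_probs, slope=1.20, intercept=-0.05)
print([round(p, 4) for p in other], round(sum(other), 4))
```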

In other words, we underestimated the probability of the 80% outcome (short odds) and overestimated it for the 20% outcome (long odds). Why did that happen? Most probably we assigned too large a margin to the short odds, so the fair short odds we arrived at were too high and the corresponding fair probability was too low, and the other way around for the long odds.

So, for a slope higher than 1 and an intercept lower than 0, the bookmaker assigned a higher portion of the margin to the long odds and a lower one to the short odds than the model accounted for. The reverse holds for a slope lower than 1 and an intercept higher than 0 (as we see for some bookies with the proportional model).

**Conclusion**

The table again shows that, on average, the proportional formula accounts for margin attribution much better than the linear one. However, we also see that different bookmakers assign margins differently and there is no one-size-fits-all solution. For example, Interwetten assigns an even lower share of the margin to the short odds and an even higher one to the long odds than the proportional formula would suggest. On the other hand, William Hill seems to be using something between the linear and the proportional formula.

What does that mean in practice? First, if you bet randomly at all books (which I hope you don't), at Interwetten you will get much better results betting on short odds than on long ones. At William Hill the difference will be smaller, but at all of them short odds will give you a better result.

Second, and more importantly, if we are to use just one formula to estimate the fair odds across all bookmakers, we can hardly find a better candidate than the proportional formula suggested by Joseph Buchdahl at football-data.co.uk. This matters for sharp bettors and for services like Trademate that try to estimate, from the odds on offer, the fair odds a bookmaker has calculated. For them the proportional formula is a great choice and can be used as is or fine-tuned a bit to fit the bookmaker in question more accurately. In any case, the proportional formula beats the linear one by a mile, and I would prefer it for any bookmaker out there.

So that's it from me on the favourite-longshot bias and bookmaker overround. I see that some of you like these articles a lot, while others seem a bit bored by them, so I plan to follow up with some lighter stuff: interesting stats from Trademate, some arbing wisdom and perhaps a few exotic betting markets. In any case, if there is anything in particular you would like to read more about, just let me know. Good luck to everyone, and until next time!

**Rebel Betting Promotion**

On an unrelated note, Rebel Betting have let me know that they have started a major sale, offering large discounts on their long-term offers. It's a big one-time payment but great value per month, and if you are planning to arb long-term, I think you can hardly find a better deal. If you are going for the 6-month option, just make sure that the leagues you are targeting play during the summer, since it is usually a quiet season for arbers. But in any case, I don't think you can go wrong with the 32-month option.

Also, a big thank you to whoever purchased the software from my banner. If you need any support or advice just drop me an e-mail at admin@churchofbetting.com and I will do my best to help you.

The best work on this subject has been done by Hyun Song Shin, and the following is his first paper on it:

http://www.math.ku.dk/~rolf/teaching/thesis/Shin91.pdf

This helps you understand why the favourite-longshot bias is generated by bookmakers and works so well for them.

This, in turn, has been used in interesting studies such as that by Erik Strumbelj:

http://lkm.fri.uni-lj.si/uploads/eriks/Strumbelj_WorkingPaper2013.pdf

How does the Buchdahl model compare to Shin’s?

Thanks for sharing the papers. They seem to offer a comprehensive analysis of the theory behind margin setting. Right now I don't have the time to read them carefully, but I'll do so when I have the chance. However, from a first read and without having run the calculations, the formula from Shin's paper looks very similar to the one proposed by Buchdahl in terms of the nature of the relationship between odds and probabilities (what Shin refers to as the "square-root rule").

Back to your question: the models are similar but certainly not the same. Shin's model leads to slightly lower odds for the favourites. It's a lengthy topic, so I plan to write a separate article comparing the models you have quoted with Buchdahl's.

Hi!

Great that you posted about our sale. Very honest and informative! I like it.

So all you people out there who consider signing up – now is a good time. The sale only lasts until Thursday 18 May.

Also, we have some exciting news to share with you soon. I suggest you follow us on Facebook so you don't miss out on any updates (new bookies and new sports! And we hope to integrate more summer sports as well).

Cheers y'all!