Power Rating

An example of Power Rating

Power Rating is a feature that appears in the match report after a match has concluded. It expresses team strength as a single number. The actual algorithm is still secret, but some details have been revealed in a very long article (partially reproduced below) written by LA-Artod, one of the developers of the rating.

The easiest way is to express the strength of a team with a number, and then you compare teams by comparing those numbers. These numbers are ratings of the team strength, and by comparing these ratings you can create a ranking – an ordering of numbers from the strongest team to the weakest. [1]

Before Power Rating, rating analyzers already existed (such as Elo chess, Alltid Hattrick, HatStats, LoddarStats, VnukStats, PStats and HTEV), but this is the first one officially released by the Hattrick Team.

Announcements

Hattrick announces
A new way to measure your team 03/04/2020
It’s been long in coming, but now the first component of the new Hattrick ranking system is here. That is the Power Rating, which for the first time gives us a single measure for team strength - beyond just sector ratings and player stars.

To calculate your Power Rating, we look at all the factors that contribute to winning actual matches. Your formation, your sector ratings, team tactics and tactic levels, player specialties and how well suited those have been to the opponents you have been facing.

The Power Rating value is an average of your last 5 competitive matches. Since individual matches can always give unexpected results, the outcomes of those matches do not matter when calculating the Power Rating. Instead, what we want to have is a value that shows “objective” strength, so that you can compare your “punching power” for future matches to other teams in your series, or to other teams in the wider Hattrick world. Or just to be able to track your own progress as you build your team.

You can find your current Power Rating value on the new Team Rankings page, along with how your rating compares to other teams in your league and globally.

What’s more, we are also using the Power Rating in the tournament system. When setting up a new Single match, the “Recommended matchups” list will be based on teams that have a similar Power Rating to your own. Don’t forget that Single matches are free to use for everyone, until June 1, so there is no excuse for not trying this one out!

We also have a new Open tournament type, where you can request opponents that are similar to your own, based on this rating.

Finally, for Supporters, we have added new Supporter Statistics which shows you Power Rating per league and per division.

Have fun, and stay tuned for more ranking news in the coming weeks!

Hattrick announces
Power rating update 14/05/2020
We’ve done some tweaks and additions to the new Power rating system, based on your feedback:
  1. We’ve made the Power rating data available to CHPP developers, so that it can be included in their apps.
  2. There is a new Top 15 list in the Regional details page
  3. The formula was changed, and now reflects the Long Shot tactic more accurately
  4. The rating will be calculated on a larger sample, namely the best 7 out of your last 14 competitive matches.

This update is already live and used in the current rankings. We hope you like the changes, which we think will make the Power Rating an even better measure of team strength in Hattrick.
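The two aggregation rules described in the announcements can be sketched as follows. This is only an illustration: the per-match values, the function names, and the choice to average the seven kept matches are assumptions, since the announcements do not say how the best 7 are combined and the real formula is not public.

```python
from statistics import mean


def power_rating_v1(match_values):
    """Original rule: average of the last 5 competitive matches.

    match_values is a hypothetical list of per-match ratings,
    ordered from oldest to newest.
    """
    return mean(match_values[-5:])


def power_rating_v2(match_values, window=14, keep=7):
    """Updated rule: best 7 out of the last 14 competitive matches.

    Averaging the kept values is an assumption; the announcement
    only says which matches enter the sample.
    """
    recent = match_values[-window:]
    best = sorted(recent, reverse=True)[:keep]
    return mean(best)


# Made-up per-match values 1..14, oldest first.
values = list(range(1, 15))
print(power_rating_v1(values))  # 12 (mean of 10..14)
print(power_rating_v2(values))  # 11 (mean of 8..14)
```

Under these assumptions, keeping only the best half of a larger window means that a single weak lineup, for example in a match the manager chose not to contest, no longer drags the value down.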

What can we learn from Power Ratings?

  1. Specialties matter, but not to the incredible extent that market prices for players with specialties suggest. They matter more in close matches, of course, but you have to consider whether the match would still be close with a better player without a specialty.
  2. There is a tactical aspect in Hattrick that matters. Whether it matters enough is up for discussion. But when you give up some defence or attack to boost midfield, or the opposite, it matters. There are matches where the success probability implied by Base Power Ratings was 0% and the actual success probability was 100%. Sure, it usually meant that the opponent made some catastrophic mistake, but it happens.
  3. Counter-attacks are strong but, as experienced users already know, results are more variable when playing them. If you are the favourite, randomness can only hurt you, so you want to minimize variability. Counter-attacks are therefore, in general, a tactic for underdogs.
  4. Surprisingly, or maybe not, long shots are a tactic type with high variability as well. This may explain why long shots are strong enough to force any team to adapt its style of play against them, but not powerful enough to dominate the game.
  5. Tactic types matter… up to a point. The other tactics are significantly less strong choices, just suited for occasional use.

Calculation method

There are three components in every match result. They are:

  1. How strong your players are in general = Base Power Rating measures raw line-up strength,
  2. What choices your team build allows you = Skill Power Rating measures user skill,
  3. How good you are as a manager at picking the choice that works best against your opponent. Add a dash of randomness, and you get the actual result.

Only the first two are considered in Power Rating; the third is not. The purpose is to determine how strong a team is, given that results are also affected by luck.

Base Power Rating

The first point is the hardest one, because assessing the strength of a player from their skills is problematic. In statistics, it is much easier to work with a result that can only be a win or a loss, but football also has draws, so Power Rating works with “success probabilities”, defined as the probability of winning plus one half of the probability of drawing.
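The definition of success probability can be written as a one-line function. The function name and the input checks are illustrative choices, not part of the article.

```python
def success_probability(p_win, p_draw):
    """Probability of winning plus one half of the probability of drawing."""
    if p_win < 0 or p_draw < 0 or p_win + p_draw > 1 + 1e-9:
        raise ValueError("p_win and p_draw must be probabilities summing to at most 1")
    return p_win + 0.5 * p_draw


# A team that wins 50% and draws 25% of the time:
print(success_probability(0.5, 0.25))   # 0.625
# Its opponent wins the remaining 25% and draws the same 25%:
print(success_probability(0.25, 0.25))  # 0.375
```

Because a draw is split evenly, the success probabilities of the two teams in a match always add up to 1, which is what lets a three-outcome result be treated like a win-or-loss quantity.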

The core of the Hattrick Power Rating is therefore a rating, called Base Power Rating, that summarizes the strength, in terms of success probabilities, of a specific lineup against a generic opponent. The properties are [2]:

  • Monotonicity: if any of your ratings improves and the rest stay equal, your Power Rating improves;
  • Interpretable differences: differences in Base Power Ratings should be interpretable as success probabilities;
  • Location invariance: the same difference in Power Ratings should mean the same success probability no matter what the absolute value of Base Power rating is.
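To make the last two properties concrete, here is a minimal sketch of a link function that turns a rating difference into a success probability. The logistic shape and the `scale` parameter are assumptions chosen for illustration only; the real Hattrick formula is secret and may look quite different.

```python
import math


def implied_success_probability(rating_a, rating_b, scale=100.0):
    """Hypothetical logistic link from a rating difference to a success probability.

    Only the difference rating_a - rating_b is used, so the result is
    location invariant by construction. The value of scale is made up.
    """
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))


# Equal ratings give an even match.
print(implied_success_probability(1000, 1000))  # 0.5

# The same 100-point gap gives the same probability at any rating level.
high = implied_success_probability(1100, 1000)
low = implied_success_probability(600, 500)
print(high == low)  # True
```

Any function that depends only on the rating difference and increases with it would satisfy these two properties equally well; the logistic curve is simply a familiar choice from Elo-style systems.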

Base Power Ratings measure team strength against a generic opponent. The ranking was derived by repeatedly simulating games between pairs of teams in which one team was strictly better than the other, using an iterative procedure calibrated so that a given difference in Power Ratings meant a certain success probability for the stronger team.

Special events are included. Calculating expected goals from special events is far from easy, though. The probabilities shift as special events take place, and this makes calculations very, very difficult. It was the hardest part of the whole project.

The algorithm also works with extreme teams. AOA teams, teams playing Counter-attacks (CA) or Long shots (LS), and asymmetrical teams facing opponents that hit their weak points fared more poorly than their sector ratings would suggest. That was not a shock, because it meant that there is a tactical aspect in Hattrick that matters, and quite a lot. It also meant, though, that there is a difference between the overall strength of a lineup and its strength against a specific opponent’s lineup. This difference could then become a measure of tactical ability.

Skill Power Rating

Base Power Ratings show you how strong your team would be if you played a certain lineup regardless of the opponent, and without the opponent adjusting his lineup to yours. This is not how one plays against long shots. And when tactics come into play, it’s not about team strength anymore – it’s about skill.

When you’re not considering your team in isolation, but comparing it to a specific opponent, the perspective changes. You don’t necessarily play your absolutely strongest lineup; you play the lineup that gives you the best chance of winning. The difference in Base Power Ratings shows what the success probability would be, on average, if both teams played their default lineup regardless of the opponent. But when you have two lineups to compare, you can calculate (preferably with Super Replays) the specific success probability for that match. If a team which, based on Base Power Ratings alone, would have had a 50% success probability (“implied probability”) instead has a 70% success probability (“actual probability”), then its manager played well. If it has a 30% actual probability when it would have had a 50% implied probability, then it played badly. This difference between actual and implied success probability is the so-called Skill Power Rating.
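The comparison between actual and implied probability can be sketched in a few lines. Everything here is illustrative: the function names are made up, the list of replay outcomes stands in for whatever repeated simulation (such as Super Replays) produces, and the official Skill Power Rating may scale or combine these differences in ways the article does not describe.

```python
def success_from_results(results):
    """Estimate a success probability from repeated simulations of one match.

    results is a sequence of 'W', 'D' or 'L' outcomes (a hypothetical format).
    A win counts 1, a draw 0.5 and a loss 0.
    """
    points = {"W": 1.0, "D": 0.5, "L": 0.0}
    results = list(results)
    if not results:
        raise ValueError("need at least one simulated result")
    return sum(points[r] for r in results) / len(results)


def skill_difference(actual, implied):
    """Actual minus implied success probability for a single match."""
    return actual - implied


# Ten simulated replays: 6 wins, 2 draws, 2 losses.
actual = success_from_results("WWWWWWDDLL")
print(actual)  # 0.7
# Implied probability of 50% from Base Power Ratings: about +0.2, the manager played well.
print(round(skill_difference(actual, 0.5), 2))
# Actual 30% against implied 50%: about -0.2, the manager played badly.
print(round(skill_difference(0.3, 0.5), 2))
```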

What about team strength proper, not just line-up strength? Well, of course team strength is the strength of its strongest lineup in terms of Base Power Ratings, while team width and flexibility will be reflected in Skill Power Ratings.

Footnotes

  1. ^  Not all rankings are based on ratings – the Cup ranking currently used for Cup seedings isn’t – but single ratings can easily be transformed into rankings.
  2. ^ A key property the Hattrick Power Rating does not have is transitivity: if lineup B has a higher Base Power Rating than lineup A, and lineup C has a higher Base Power Rating than lineup B, it doesn’t mean that lineup C will have higher success probabilities when playing against lineup A, even in the case that lineup B has higher success probabilities against lineup A and lineup C has higher success probabilities against lineup B. It means that, against a plurality of different opponents, lineup C will on average be stronger – win more often – than lineup B, and lineup B will on average be stronger than lineup A. This is because Hattrick has some “rock-paper-scissors” components, as we will see later – and this is addressed by other parts of the Power Ratings project.