Hattrick and the Server
This is an official Hattrick Editorial originally published on 2006-03-19 14:12:00.
We would like to take this opportunity to apologize for Hattrick's poor performance this season, and to explain what has happened and what is being done to improve capacity over time.
What happened on Saturday 18 March?
Around midday the main database server stopped responding and needed a restart, which caused delays on the site. After the same problem recurred twice more, we decided to move the entire main database to a spare server, as the crashes were most probably caused by a hardware failure. We are currently running on the spare server, which seems to be coping well despite being a weaker machine. We hope to move back to the main server later this week.
How does this connect with the general performance issues we have experienced this season?
On the face of it, not at all. The general performance crisis, which culminated two weeks ago, showed itself as a very high load on our main database. This turned out to have several causes: one was hardware-related and was sorted out that same weekend, and another had to do with software optimization, which we fixed last week. Combined, these two fixes cut the system load in half. What happened on Saturday was unrelated, even though it had similar consequences for users.
Why does this happen and what is being done to prevent it from happening in the future?
The recent issues have had various causes, and it is not possible to single out one specific culprit. There is no single bottleneck at the moment. We have to work on several fronts to sort it out: hardware, software, and our own organization.
We are currently doing a lot of work on optimizing the database, rewriting applications to make them run faster, and improving the server environment in various ways. We recently had a leading database specialist visit us to help boost performance, and he gave us quite a few ideas for improvements (as well as some kudos for running the largest public database he had ever seen). We are also creating better routines for scheduled maintenance, to avoid some of the issues we have had in the past.
Basically, though, all of the above is business as usual.
We are doing some other things as well. We are hiring more people to work on Hattrick, and one of the main objectives is to set up better monitoring of server maintenance. We already have some of this in place, of course, but it is an area where we want more focus and more accountability. This will not give immediate results, but it is necessary if we want to take Hattrick to the next level of reliability and speed.
We are also close to sealing a partnership with a major hardware supplier. This will give us access to hardware upgrades as well as to valuable technical know-how.
It is very hard to make any guarantees about when Hattrick will be running to everyone's satisfaction. Maybe that will never happen. But we will improve from where we are today.
We have always been racing to keep up with demand, and while we have been ahead of the curve several times, we have also fallen far behind on occasion. In many ways, being strapped for resources has been good for us; it has forced us to keep improving in smart ways. Of course, much of the extra capacity we have gained has been consumed by new users, who in turn have given us the resources to improve even further.
Since 2000, Hattrick's user base has grown more than a hundredfold, yet the site is more reliable and faster than it was back then. That is little comfort at times like this, but it does give some perspective and, I think, lends some credit to the idea that things will improve over time. So despite the understandable doom and gloom on a weekend like this, I think there is reason to be cautiously optimistic.