Monday, May 31, 2010

Graph Maintenance - Complete!

As noted in the previous entry, we recently updated the AggreTweet graphing technology to display more accurate information. Graphs will now be at most 3 minutes out of date. Since the graphs only plot points in 6-minute intervals (by design), this delay should not be noticeable to anyone viewing them.
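For the curious, here is a minimal sketch of what "plotting points in 6-minute intervals" means: each tweet's timestamp is floored to the start of its 6-minute window, and the tweets in each window are counted to produce one plot point. Only the 6-minute interval comes from this post. The function names (bucket_start, plot_points), the epoch-aligned windows, and the in-memory counting are illustrative assumptions, not AggreTweet's actual graphing engine.

```python
from collections import Counter
from datetime import datetime, timedelta

# The 6-minute interval comes from the post; everything else is a hypothetical sketch.
PLOT_INTERVAL = timedelta(minutes=6)
EPOCH = datetime(1970, 1, 1)


def bucket_start(ts: datetime) -> datetime:
    """Floor a (naive) timestamp to the start of its 6-minute plot interval."""
    whole_intervals = (ts - EPOCH) // PLOT_INTERVAL  # timedelta // timedelta -> int
    return EPOCH + whole_intervals * PLOT_INTERVAL


def plot_points(tweet_times: list[datetime]) -> list[tuple[datetime, int]]:
    """Count tweets per interval, returning (interval start, count) pairs in time order."""
    counts = Counter(bucket_start(t) for t in tweet_times)
    return sorted(counts.items())


if __name__ == "__main__":
    times = [
        datetime(2010, 5, 31, 12, 0),
        datetime(2010, 5, 31, 12, 5),
        datetime(2010, 5, 31, 12, 6),
        datetime(2010, 5, 31, 12, 11, 59),
    ]
    for start, count in plot_points(times):
        print(start.strftime("%H:%M"), count)
```

Run as-is, this prints "12:00 2" and "12:06 2". If the points are recomputed every 3 minutes (our reading of where the "at most 3 minutes out of date" figure comes from), a burst of tweets shows up in the graph within one refresh, which is half of a single plot interval.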

So in cases like the Halo: Reach BETA launch, when everyone is clamoring for a code via Twitter, you will see the spike in the graphs as soon as the next plot point is generated. This also means that if a given topic gets a high volume of tweets in rapid succession, it will rise to the Top-10 list much sooner (and more accurately) than before.

We'll be monitoring performance as always, and if we notice any other anomalies we'll address them as quickly as possible. Look for updates here!

Friday, May 28, 2010

AggreTweet Graph Maintenance - 5/28/2010

Notice 5/28/2010: The trend graphs on AggreTweet are not displaying accurately at present, affecting overall trends / rankings. We are working to rectify this.

So what does that notice atop Aggregame mean? In short, the overall rank-ordering of top games, as well as the graphs displayed on game channels, do not reflect the most current Twitter activity. This only affects the graphs; actual tweets are still flowing in real time, so that's good.

However, AggreTweet is most useful for tracking trends. It's no secret that reading the front page of AggreTweet is, for all intents and purposes, a waste of time. It's a real-time feed of every tweet for all major video games, so needless to say there's a lot of data constantly streaming in. Trying to read everything as it comes in would be like swimming against the current. Then again, if the stream of tweets were broken, it would be an even worse scenario.

Thankfully, we are still tracking all the data. That means once this problem is rectified (hopefully sometime over the weekend), the graphs will display properly again, and you'll be able to filter through that historical data (Past Day, Week, Month) to see how the weekend's trends really played out.

A Deeper Look into the Problem:

Given the volume of tweets and Twitter users AggreTweet tracks (over 20 million tweets stored, with hundreds of thousands of gamers contributing), our physical databases are enormous. Working with this volume of data takes time (and CPU cycles).

AggreTweet is architected to be extremely efficient at rapidly handling high volumes of tweets and, more importantly, lots of concurrent viewers watching the live feeds. So if there happen to be thousands of visitors chatting simultaneously in various Game Channels on AggreTweet, it can handle them without crashing the website under that many concurrent stream requests.

Given this primary charge, some trade-offs must be made. One of those is in how we store and index each tweet, and consequently how we plot tweets against one another in the graphs. Given the nature of this particular failure (a failure to scale with the total quantity of tweets, not with concurrency), fixing the issue permanently is a bit of a chore, and a waiting game given how much data we must refactor. Rather than put effort into a "band-aid" fix up front, we're going to spend all our time and resources fixing it the right way, once.

As such, the maintenance will take most of the weekend. We'll post an update when we have conclusive results. At least we know where the problem is and how to fix it. If we erased all historical data, we could solve the problem immediately, but that's not a very appealing option. So instead, we must arduously wade through 20 million tweets (and, more importantly, test our results to make sure this fix does not cause any other problems).

Apologies for the inconvenience. AggreTweet will remain online (as it is still technically functioning); you'll just have to assume that none of the rankings or graphs are displaying current data. Technically, the graphs are accurate (relatively speaking), but they are showing plot points from hours ago. So if you hover over the graph and it says that a topic had "30 tweets" 6 minutes ago, it really had 30 tweets at that moment 5 hours and 6 minutes ago, because the graphing engine is so backlogged by this performance issue.

In any event, we'll have it solved as quickly as possible. Stay tuned.