One of the biggest bottlenecks that plague an application is it’s connection to it’s database. Nothing will bog down a write-heavy application faster than having to wait for a bogged down database.
But what if you could remove all that? What if you could just shove your data into a blackhole and forget about it? What if you could just hand the data off and forget about it and go back to serving requests?
If you have an application that doesn’t rely on realtime access to the data there may be a solution!
MySQL has a lot of different storage engines (the storage engine layer of a database is responsible for storing and retrieving data to and from various media in differing schemas check out the mysql manual on storage engines) the biggest ones being MyISAM and InnoDB. However there are a variety of uses for the others. I found one for blackhole in a particularly tricky application we had at Izea.
The Blackhole storage engine is just how it sounds. You write data to it and it just throws it away. It goes to /dev/null. It’s not written. Every part of a MySQL query transaction takes place up to the point where the data is written to disk. It’s just flushed. Up until I started this project I had no idea what it was used for. Then I discovered that even though it didn’t store the data it did write to the binlog for replication (see the mysql manual section on replication for an explination on how all that works).
Replication is extremely efficient, it has to be. The basics of how it works is data is written to the database. Each transaction is recorded in a binary log. You can take that binary log and put it on another mysql server and using some tools “replay” the transactions back to create an EXACT duplicate of the database to that point. The binlog file is streamed from the “master” server to the “slave” server.
So I had an idea. Why not hook the application up to the blackhole and have replication stream that to the actual database in it’s “oh-so-much-more-efficient” manner. This would do a few things. The application would *NEVER* be waiting on the database as it isn’t writing to disk and isn’t performing indexes. The queries optimization stage that MySQL does in the background would already be done freeing up some cycles on the db to handle writing and indexing. It would keep the data synced as there would be real highpoints and lowpoints of activity not just one long line of noise from a database being constantly hammered. And none of the write requests would get dropped by application timeouts.
So a bit of background on the application. We have an application that is hit from a piece of javascript to record some things on a page. It is currently dealing with thousands of hits per second but needs to scale to much more than that. As it is right now the database is bogged down with IO issues. We have 15k rpm drives in them and it’s optimized all to hell but it’s so write heavy it’s still bogged down. And because the application is waiting for a response back from the db, a lot of requests are just dropped so we are missing good chunks of data.
The solution was to reengineer the application to a lighter weight framework and to put the blackhole server in between the application and the main database that the statistical information is performed on every night. Each night a script runs at midnight that pulls information out of the database and performs a bunch of analysis on it. Then records the results to another database. The raw data is then flushed. This was taxing on the server and further complicated the IO issue.
The new structure is roughly outlined below:
app server —-\
>Blackhole database —— (replication) ——> Slave database
app server—-/
In the next article I’ll go briefly into implementation and the results of the experiment.