Monday, April 6, 2009

Memory caching can be a saviour

At Synaq we are busy working on a fairly complex application. Essentially it's a frontend interface to a system that scans and processes customers' emails for spam, then records the results of the scans in a MySQL database. Without going into too much senseless detail, the backend processes a few million items per day, and suffice it to say that is one helluva database to search through when you need to extract useful data.

Because of the sheer quantity of data, we have had to use numerous techniques to keep the frontend at least reasonably responsive when it queries the database. Then one day I asked myself: "Does the interface really need to query the database so often for data that in essence hardly ever changes?" The interface does not make many alterations to the data it extracts, and a lot of that data is repeated on every page of a specific user's session. One security feature, for example, is that every user belongs to a specific Organisation (or Organisational Unit, to be technically correct), and every page load retrieves the list of Organisations the current user is allowed to see. This list is not likely to change often, and so we came up with an idea.

We use APC, a memory caching facility for PHP scripts, which also allows you to store your own values in memory explicitly from your code. Thankfully, symfony provides a class that manages this for us, the sfAPCCache class, which makes using the cache a doddle. Our problem? We need to ensure that the key under which we store the data is totally unique.
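For the unfamiliar, the sfAPCCache API boils down to a handful of calls. A minimal sketch, assuming the symfony framework is bootstrapped (so sfAPCCache is autoloadable) and the APC extension is enabled:

```php
<?php
// Minimal sketch of the sfAPCCache API. Requires symfony's autoloader
// and the APC PHP extension; the key names here are illustrative only.
$cache = new sfAPCCache();

// Store a value under a key with a one-hour lifetime.
$cache->set('greeting', 'hello', 3600);

// Check for, and retrieve, the cached value.
if ($cache->has('greeting'))
{
    echo $cache->get('greeting');
}

// Remove a single key, or everything matching a pattern.
$cache->remove('greeting');
$cache->removePattern('greeting**');
```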

The solution was to store the results of a database query for our OrganisationalUnits model class in the APC cache, using the Criteria object of the Propel query as the name of the stored item. It stands to reason that if the Criteria object for a specific query is unique, then the result will be unique; and if the same Criteria object is passed again, the database will return the same results as before. Why query the database a second time?

The APC cache, though, cannot take an object as a name, only a string. That is easily done with PHP's serialize() function, but the resulting string is excessively long (a few thousand characters sometimes), so we needed a way to shorten it while keeping it unique. So we take the MD5 hash of the serialized Criteria object. There we go. But out of sheer paranoia, and the need to be 110% sure that we won't, by some ridiculous stroke of bad luck, later create another Criteria object that against all the statistics of MD5 produces the same hash, we also take an SHA1 hash and concatenate the two. There! Now the chances of two Criteria objects producing the same name are so remote as to be nigh-on impossible.
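In isolation, the key construction looks something like this (a standalone sketch; the helper name buildCacheKey is our own invention for illustration, not part of symfony or Propel):

```php
<?php
// Build a cache key from an arbitrary value by serializing it and
// concatenating its MD5 and SHA1 hashes. A collision would require the
// same two inputs to collide under both hash functions at once.
function buildCacheKey($prefix, $value)
{
    $serialised = serialize($value);

    // md5() yields 32 hex chars, sha1() yields 40.
    return $prefix.md5($serialised).sha1($serialised);
}

// Example with a plain array standing in for a Propel Criteria object:
// 30-char prefix + 32-char MD5 + 40-char SHA1 = a 102-character key.
$key = buildCacheKey('organisational_units_doSelect_', array('id' => 42));
```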

But it doesn't end there. This doesn't help us if we don't have a way to actually add items to the cache, remove them, and so on. For this we go to our OrganisationalUnitsPeer class and override the doSelect() method, which receives all calls to run a query on the database, as such:


public static function doSelect(Criteria $criteria, $con = null)
{
    $data_cache = new sfAPCCache();

    // Build a unique cache key from the serialized Criteria object.
    $serialised = serialize($criteria);
    $md5_hash = md5($serialised);
    $sha1_hash = sha1($serialised);

    $complete_name = "organisational_units_doSelect_".$md5_hash.$sha1_hash;

    if ($data_cache->has($complete_name))
    {
        // Cache hit: return the stored result without touching the database.
        return unserialize($data_cache->get($complete_name));
    }
    else
    {
        // Cache miss: run the real query (passing the connection through),
        // then store the result for an hour.
        $query_result = parent::doSelect($criteria, $con);
        $data_cache->set($complete_name, serialize($query_result), 3600);

        return $query_result;
    }
}


Rather simple, I thought. We also wanted to be sure that if a user added, updated or removed an Organisation the cache would not serve an outdated listing, so we added the following to the OrganisationalUnits class (not the Peer):

public function save($con = null)
{
    // Invalidate every cached OrganisationalUnits query before saving,
    // so a stale listing is never served.
    $data_cache = new sfAPCCache();
    $data_cache->removePattern("organisational_units**");

    return parent::save($con);
}

public function delete($con = null)
{
    // Likewise, clear the cached queries before deleting.
    $data_cache = new sfAPCCache();
    $data_cache->removePattern("organisational_units**");

    return parent::delete($con);
}

Just doing this for the one set of data has increased our page load speeds dramatically and reduced the load on the server itself during intense performance testing. We hope to employ this further with other items that similarly load on every page and rarely, if ever, change.
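To roll this out to other model classes without copying the boilerplate into every Peer, the lookup could be pulled into a small shared helper. This is a sketch of that direction, not code from our application; the class name QueryCache and its API are our own invention:

```php
<?php
// Hypothetical helper that wraps any Peer-style query with the
// serialize-and-hash caching scheme described above. Requires symfony
// (sfAPCCache) and Propel (Criteria) to be loaded.
class QueryCache
{
    public static function fetch($prefix, Criteria $criteria, $callback, $lifetime = 3600)
    {
        $data_cache = new sfAPCCache();

        // Same key scheme as before: prefix + MD5 + SHA1 of the Criteria.
        $serialised = serialize($criteria);
        $key = $prefix.md5($serialised).sha1($serialised);

        if ($data_cache->has($key))
        {
            return unserialize($data_cache->get($key));
        }

        // Cache miss: delegate to the real query and store the result.
        $query_result = call_user_func($callback, $criteria);
        $data_cache->set($key, serialize($query_result), $lifetime);

        return $query_result;
    }
}
```

Each Peer's doSelect() would then reduce to a single QueryCache::fetch() call with its own key prefix, and the save()/delete() invalidation would stay per-model via removePattern().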

2 comments:

  1. Good article. I have a pair of suggestions:
    1. Use $con in save and delete. If you don't do that and save or delete is called in a transaction, the record will be saved / deleted in another transaction.
  2. I use a similar method, but I have a value in app.yml to configure IF I want the cache or not, so I can debug my program without the cache in order to locate problems.

    Regards

  2. I like your idea of the app.yml setting, and in fact we have started setting up our caching to allow for a config per database table/model class dynamically. Some data we do want cached, some not, and we may change our minds and turn caching on or off per model class as we see fit. This also makes it a lot easier to test the impact of caching certain model classes on the overall performance of the application.

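For readers wondering what the app.yml toggle discussed above might look like, here is a hedged sketch. The setting name is our own invention; symfony exposes app.yml values through sfConfig with an app_ prefix:

```php
<?php
// In app.yml (the key names below are hypothetical):
//
//   all:
//     cache:
//       organisational_units: true
//
// symfony flattens this to the sfConfig key "app_cache_organisational_units".

public static function doSelect(Criteria $criteria, $con = null)
{
    // When the toggle is off, bypass the cache entirely, which makes
    // debugging without the cache a one-line config change.
    if (!sfConfig::get('app_cache_organisational_units', false))
    {
        return parent::doSelect($criteria, $con);
    }

    // ... otherwise fall through to the cached lookup shown in the post.
}
```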