Wednesday, April 29, 2009

Don't forget your Bitwise Operators

Edit: After getting some comments about this post, I realised some people might want a little intro to what bitwise operators are. A great tutorial on them for PHP can be found here.

I have had discussions before with other PHP developers, and with developers in general, geeking out about ways to get things done in our respective languages. One thing I noted from these chats is that knowledge of bitwise operations, and of how they can be used to create cleaner, more efficient applications, seems to be lacking. So I thought I would take the opportunity to point out one way we are using bitwise operators to make our jobs a little easier here at Synaq while developing Pinpoint 2.

A little bit of history. Pinpoint 2 is our own development to replace the aging Pinpoint 1 interface, which is based on the widely used open source Mailwatch PHP application. Essentially it is a front-end interface for the mail security service we provide: scanning companies' mail on our servers for viruses, spam, etc., before forwarding the clean mail on to the clients' own networks. One thing that the old system (and of course the new one) needs to do is store classifications of mail. Some of the types are Low Scoring Spam (i.e. probably spam, but with a chance that it could be clean), High Scoring Spam (i.e. almost certainly spam, with a very slim chance that it is clean), Virus, Bad Content (e.g. the client blocks all mail with movie attachments), and so on. The old Pinpoint 1, based on Mailwatch, uses a database schema that stores a 1 or 0 flag in a separate column for each type. As a simplified example:
  • is_high_scoring: 0 or 1
  • is_low_scoring: 0 or 1
  • is_virus: 0 or 1
  • is_bad_content: 0 or 1
As you can see, this gets rather limiting: what if, for example, you wanted to add another classification type? You would then need to alter the table schema to accommodate another is_* column, which is really kludgy and not that easy to implement.

So for Pinpoint 2 we decided to reduce all those classification columns into one and assign each classification its own bit value, i.e. a distinct power of two. For example:
  • if clean: classification = 0
  • if low scoring: classification = classification + 1
  • if high scoring: classification = classification + 2
  • if virus: classification = classification + 4
  • if bad content: classification = classification + 8
  • if something else: classification = classification + 16
  • if another something else: classification = classification + 32
So if we had a mail that was classified as high scoring spam with a virus attached and, would you know it, the content is also bad, its classification value would be:
2 + 4 + 8 = 14
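
As a minimal sketch of that arithmetic in plain PHP (the constant names are hypothetical and not taken from the Pinpoint 2 code; only the values come from the list above), the flags can be combined with bitwise OR and tested one at a time with bitwise AND:

```php
<?php
// Hypothetical constant names; the values are the ones listed above.
const CLASS_CLEAN        = 0;
const CLASS_LOW_SCORING  = 1;
const CLASS_HIGH_SCORING = 2;
const CLASS_VIRUS        = 4;
const CLASS_BAD_CONTENT  = 8;

// Combine flags with bitwise OR. Unlike "+", OR cannot double-count
// a flag that is already set.
$classification = CLASS_HIGH_SCORING | CLASS_VIRUS | CLASS_BAD_CONTENT;

// Test a single flag with bitwise AND.
$is_virus       = ($classification & CLASS_VIRUS) !== 0;
$is_low_scoring = ($classification & CLASS_LOW_SCORING) !== 0;

echo $classification, "\n";  // 14
var_dump($is_virus);         // bool(true)
var_dump($is_low_scoring);   // bool(false)
```

Because each value is a distinct power of two, OR and addition give the same result here as long as no flag is added twice.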
So a value of 14 is stored in our classification column. If we now want to check the type in our interface, we do not have to access multiple columns and determine whether each contains a 1 or 0; instead we retrieve one value and apply our bitwise operators to it. For example, with Propel in symfony, if we wanted all messages that were viruses:

$mail_detail_c = new Criteria();
$mail_detail_c->add(MailDetailsPeer::CLASSIFICATION, 4 , Criteria::BINARY_AND);
$virus_mail_obj_array = MailDetailsPeer::doSelect($mail_detail_c);

We now have an array of results with all messages that are viruses. If we wanted all messages that were viruses AND high scoring spam:

$mail_detail_c = new Criteria();
// Virus bit (4) AND high scoring bit (2) must both be set
$classification_criterion = $mail_detail_c->getNewCriterion(MailDetailsPeer::CLASSIFICATION, 4, Criteria::BINARY_AND);
$classification_criterion->addAnd($mail_detail_c->getNewCriterion(MailDetailsPeer::CLASSIFICATION, 2, Criteria::BINARY_AND));
$mail_detail_c->add($classification_criterion);
$virus_spam_mail_obj_array = MailDetailsPeer::doSelect($mail_detail_c);
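
The same "both flags set" test can also be done in plain PHP on a classification value that has already been loaded. The sketch below is only an illustration: the helper names are hypothetical, and the flag values follow the list earlier in the post (2 = high scoring, 4 = virus). The idea is to OR the wanted flags into one mask and compare the AND result against that mask.

```php
<?php
// Hypothetical helpers; flag values follow the list above
// (2 = high scoring, 4 = virus).

// True only when every bit in $mask is set in $classification.
function has_all_flags($classification, $mask)
{
    return ($classification & $mask) === $mask;
}

// True when at least one bit in $mask is set in $classification.
function has_any_flag($classification, $mask)
{
    return ($classification & $mask) !== 0;
}

$virus_and_high_scoring = 4 | 2; // mask value 6

var_dump(has_all_flags(14, $virus_and_high_scoring)); // bool(true): 14 has bits 2, 4 and 8
var_dump(has_all_flags(12, $virus_and_high_scoring)); // bool(false): 12 has bits 4 and 8 only
var_dump(has_any_flag(12, $virus_and_high_scoring));  // bool(true): the virus bit is set
```

Comparing against the mask is what separates "all of these flags" from "any of these flags"; a plain non-zero check only answers the second question.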

You can see from all this that it is a lot easier to write dynamic queries using bitwise operators than it is to add new columns to a schema every time you add a new classification type.
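
To illustrate the point about new types, here is a small sketch of what adding one could look like on the PHP side, assuming the next unused power of two (64) is chosen. The constant names are hypothetical; no column is added, only a new bit is set or cleared on the existing value.

```php
<?php
// Hypothetical new classification type: the next unused power of two after 32.
const CLASS_NEW_TYPE = 64;
const CLASS_VIRUS    = 4;

$classification = 14;               // high scoring + virus + bad content

$classification |= CLASS_NEW_TYPE;  // set the new flag: 14 | 64 = 78
$classification &= ~CLASS_VIRUS;    // clear the virus flag: 78 & ~4 = 74

echo $classification, "\n";         // 74
```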
