bz2url – a very simple URL compressor for WordPress

It's been over a year since we started URL compression for in-house projects at Saturn, and it is in use for Asianet India and this blog. Recently a new project absolutely needed an integrated URL shortener, which finally gave shape to the initial version.

WordPress events are handled by static functions of the plugin class, and those functions are registered in the constructor. We settled on a two-character short URI, so a short URL is the domain plus three characters, including the leading slash. With upper case, lower case and digits there are 62 possible characters, which works out to 62^2 = 3,844 unique combinations. This should be more than enough compared with the number of URLs we estimated.

The table is defined with the two-character short URI as the primary key, and the real permalink is stored alongside it. To avoid duplicates, instead of making the url column unique, which would lead to complications as the number of records increases, we added a computed field, 'crc'. When a row is inserted, the CRC32 checksum of the URL is computed into the crc field, and the table is indexed on this field. Because the column is an int, index scans on it are very fast. With the help of this field we check whether a particular permalink already has a short code. If it does not, a new short code is created and validated to be unique before it is stored in the index table.
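The lookup described above can be sketched roughly as follows. This is only an illustration in Python, not the plugin's code (the plugin itself runs inside WordPress): the table name short_urls, the helper names open_store, url_crc, find_code and shorten, and the use of sqlite3 are all assumptions made for the example. The CRC is computed in application code here rather than by the database, random code generation is a guess at how new codes are picked, and uniqueness is enforced by letting the primary key reject a duplicate code instead of checking first.

```python
import secrets
import sqlite3
import string
import zlib

ALPHABET = string.ascii_letters + string.digits  # upper case, lower case, digits: 62 characters
CODE_LENGTH = 2  # two-character short URI, as in the post


def open_store(path=":memory:"):
    """Create the table: short code as primary key, URL alongside, CRC as an indexed integer."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS short_urls ("
        " code TEXT PRIMARY KEY,"
        " url TEXT NOT NULL,"
        " crc INTEGER NOT NULL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS idx_short_urls_crc ON short_urls (crc)")
    return db


def url_crc(url):
    """CRC-32 of the URL as an unsigned 32-bit integer."""
    return zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF


def find_code(db, url):
    """Search by the integer CRC, then compare the URL to rule out CRC collisions."""
    row = db.execute(
        "SELECT code FROM short_urls WHERE crc = ? AND url = ?",
        (url_crc(url), url),
    ).fetchone()
    return row[0] if row else None


def shorten(db, url, max_attempts=1000):
    """Return the existing code for url, or store and return a new unique one."""
    existing = find_code(db, url)
    if existing is not None:
        return existing
    for _ in range(max_attempts):
        code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
        try:
            with db:  # commits on success, rolls back on error
                db.execute(
                    "INSERT INTO short_urls (code, url, crc) VALUES (?, ?, ?)",
                    (code, url, url_crc(url)),
                )
            return code
        except sqlite3.IntegrityError:
            continue  # code already taken by another URL, try a different one
    raise RuntimeError("could not find a free short code")
```

Calling shorten twice with the same permalink returns the same code, because the second call finds the row through the crc index. Comparing the full URL as well as the CRC matters, since two different URLs can share a CRC32 value.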

Two theme functions are provided in the WordPress way, and they are expected to be used only inside the Loop. The functions are the_shortUrl and get_the_shortUrl. Both work as their names suggest and as WordPress conventions dictate, where a the_ function normally prints its value and its get_the_ counterpart returns it. No event hooks are added to generate the short URL in this version. Such things could be added, but we (or rather our design and integration team) are already familiar with theme hacking, and most of our WordPress portals use themes built from scratch. So any social buttons and the like will be wrapped in our own code, which invokes the short code generator.

The code is ported from, only that uses 4 characters, which results in 62^4 = 14,776,336 combinations.