Compressing PHP source files for embedded applications

Compressing PHP code is not a tough task. The goal here is not to obfuscate or encode the source for closed-source distribution, but to create compact files that can be used in embedded web applications, or distributed by pasting them inline into text or an email without needing an attachment. The size is drastically reduced. At Saturn we did this so that a whole application would fit on a very space-constrained flash disk. We were not much bothered about the processing cycles spent evaluating the PHP files, since that happens only once in a blue moon, when the embedded system restarts. If closing the source is what you need, I am pretty sure there are other solutions.

Since this was planned as a CLI script to be invoked from the Linux command line, looping over the results of the shell command find, we started off with a usage display and a check that enough parameters were passed in. If not, the script shows the usage and exits.


function tUsage() {
    die("usage: encode.php <src[.php]> <out[.php]>\n");
}

if ($argc !== 3) {
    tUsage();
}

We then continued on to capture the first parameter as the source and the second one as the target:


$src = $argv[1];
$tgt = $argv[2];

Make sure both file names carry the '.php' extension, appending it when it is missing:


$src = (substr($src, -4) !== '.php') ? $src . '.php' : $src;
$tgt = (substr($tgt, -4) !== '.php') ? $tgt . '.php' : $tgt;

We had to make sure that the source file exists. In our build kit (make) we were using find, so we would never encounter a missing file, but it could happen if the encoding script were invoked for a single source file with a wrong path:


if (!file_exists($src)) {
    echo "File $src should exist\n";
    tUsage();
}

Read the source we are going to parse, and tokenize it using the built-in function token_get_all:


$code = @file_get_contents($src);
$tokens = token_get_all($code);
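
To get a feel for what token_get_all returns, here is a minimal sketch that dumps the tokens of a tiny snippet. The sample string is made up for this illustration; the exact split between comment and whitespace tokens can differ between PHP versions, so no output is listed.

```php
<?php
// Sample input is an assumption made up for this illustration.
$sample = "<?php // greet\necho 'hi';";

foreach (token_get_all($sample) as $token) {
    if (is_array($token)) {
        // Array tokens: [0] => token id, [1] => source text, [2] => line number
        printf("%-28s %s\n", token_name($token[0]),
            json_encode($token[1], JSON_UNESCAPED_SLASHES));
    } else {
        // Single-character tokens such as ';' come back as plain strings
        printf("%-28s %s\n", '(single character)', json_encode($token));
    }
}
```

The two shapes (array versus plain string) are why the loop that follows has to check is_array before looking at the token id.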

Now let us walk through the tokens to decide what goes into the compressed file and what gets discarded. Since output buffering made it easy to capture the content, we used that, and the stripped code is finally collected back into a variable.


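$ClosTag = false;
ob_start();
foreach ($tokens as $token) {
    if (is_array($token)) {
        switch ($token[0]) {
            case T_OPEN_TAG:
                // Keep an open tag only after a close tag has been seen
                if ($ClosTag == TRUE) {
                    echo $token[1];
                }
                // intentional fall-through to break
            case T_COMMENT:
            case T_DOC_COMMENT:
                // Comments are discarded
                break;
            case T_CLOSE_TAG:
                $ClosTag = true;
                // intentional fall-through: the close tag itself is kept
            default:
                echo $token[1];
                break;
        }
    } else {
        // Single-character tokens arrive as plain strings
        echo $token;
    }
}
$code = ob_get_clean();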
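
For trying the filtering on a small sample, the same logic can be sketched as a function that builds the result in a string instead of using output buffering. The function name stripPhpSource and the sample input are hypothetical and not part of the original script.

```php
<?php
// Hypothetical wrapper around the same token filtering; it returns the
// stripped code instead of echoing it into an output buffer.
function stripPhpSource(string $code): string
{
    $out = '';
    $seenCloseTag = false;
    foreach (token_get_all($code) as $token) {
        if (!is_array($token)) {
            $out .= $token;              // single-character token
            continue;
        }
        switch ($token[0]) {
            case T_COMMENT:
            case T_DOC_COMMENT:
                break;                   // drop comments
            case T_OPEN_TAG:
                if ($seenCloseTag) {     // keep only re-opening tags
                    $out .= $token[1];
                }
                break;
            case T_CLOSE_TAG:
                $seenCloseTag = true;
                $out .= $token[1];
                break;
            default:
                $out .= $token[1];
        }
    }
    return $out;
}

// Made-up sample: one block comment and one trailing line comment.
$sample = "<?php\n/* banner */\n\$a = 1; // one\necho \$a;\n";
echo stripPhpSource($sample);
```

The leading open tag and both comments disappear, while the statements and the whitespace tokens around them are kept.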

Now write the compressed, base64-encoded value of the captured, stripped code, together with the proper decoding statements, into the target file.
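
The exact statements of this step are not shown in the post, so the following is only a sketch of one way it could look. It assumes gzdeflate/gzinflate as the compression pair (which needs the zlib extension) and an eval() wrapper as the decoding statement. The function name buildCompressedFile, the sample code, and the temporary target path are hypothetical.

```php
<?php
// Assumption: gzdeflate + base64_encode on the way in, and
// eval(gzinflate(base64_decode(...))) as the decoding wrapper.
function buildCompressedFile(string $stripped): string
{
    // The base64 alphabet contains no quotes, so the payload is safe
    // inside a single-quoted PHP string.
    $payload = base64_encode(gzdeflate($stripped, 9));
    return "<?php eval(gzinflate(base64_decode('" . $payload . "')));\n";
}

// Stand-in for the stripped source (no leading open tag), made up for this demo.
$code = "echo 'hello from flash';";
$tgt  = sys_get_temp_dir() . '/encoded_demo.php';

if (file_put_contents($tgt, buildCompressedFile($code)) === false) {
    die("Could not write $tgt\n");
}
```

Because base64 grows its input by about a third, very small files may not shrink at all; the gain comes from larger sources, where the compression outweighs that overhead. The decoding cost is paid each time the file is loaded, which matches the trade-off described at the start.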

For convenience, the full code is attached for download.