Migration from MySQL to MSSQL our solutions – Continued

In the previous post, I had specified that a code analysis would be presented later on, which is happening now. Though I am not permitted to abstract the whole database abstraction, which would otherwise divulge the core business logic of the system too, I do expect that the following would be enough to guide a MySQL PHP developer to port his application to MSSQL 2005. Mostly the compatablities are maintained. But for the database design, we had to let away some of the wonderful features from MySQL.

We started by a code auditing and reworked the system such that we did not use any group_concat through out the system. Also all timestamp and datetime fields were changed to varchar(19) since we were already feeding those fields with the php function date’s return value or ‘now()’. But there was quite a handful of areas where we were using STR_TO_DATE and with different formats. So this had to be handled in its own way. And it is here we started our regular expression war path.


$query = preg_replace(array("@\n@","@=@","@\s+@","@! =@"), array(' ', ' = ', ' ','!='), $query);

First we made sure that certain punctuations were cleaned up, such that future parsing would be easy. Then known udfs were replaced with the corresponding ‘dbo.’ identifications, NOW with getdate, identified concat strings to pick firstname + lastname replaced accordingly and then passed through the following function. I dont think the function needs any explanation. This function was used in both selections as well as insert, update situations.


    function handle_STR_TO_DATE($query){
       if(!eregi('STR_TO_DATE', $query)) return $query;
       preg_match_all("@STR_TO_DATE\((.+?),(.+?)\)@i",$query, $m);
          $rpl = array();
          foreach($m[0] as $k => $v){
            $format = strtolower(str_replace(array("'",'%'),'', $m[2][$k]));
            $sep = (strpos($format, '/'))?'/':'-';
            $t = explode($sep, $format);
            $d = explode($sep, str_replace("'",'',$m[1][$k]));
            $rpl[] = "'" . date("Y-m-d", strtotime($d[(array_search('m', $t))] . '/' . $d[(array_search('d', $t))] . '/' . $d[(array_search('y', $t))])) . "'";
          }
       return str_replace($m[0], $rpl, $query);
    }
 

Then the query was parsed for selection of formatted date, alas this was real hell. MySQL had a very flexible method, and one cannot imagine the combinations of formatted datetime object from MySQL. MSSQL uses a very primitive set of predefined formats, the same old method of jailing the developers to a set of limited options, and freezing the freedom. Well the outcome was an overworked server, and you will see what I mean while going through the following piece of code.


        $m = array();
        if(preg_match_all("@date_format\((.+?),(.+?)\)@i",$query, $m)){
            $rpl = array();
            foreach($m[0] as $i => $v){
              switch(substr($m[2][$i],1,-1)){
                case '%d-%m-%Y':
                  $rpl[$i] = 'convert(varchar,cast('.$m[1][$i].' as datetime),105)';
                break;
                case '%d/%m/%y':
                  $rpl[$i] = 'convert(varchar,cast('.$m[1][$i].' as datetime),3)';
                break;
                case '%H:%i':
                  $rpl[$i] = 'left(convert(varchar,cast('.$m[1][$i].' as datetime),8),5)';
                break;
                case '%m/%d/%Y':
                  $rpl[$i] = 'convert(varchar,cast('.$m[1][$i].' as datetime),101)';
                break;
                case '%m/%d/%Y %h:%i:%s':
                  $rpl[$i] = "convert(varchar,cast(".$m[1][$i]." as datetime),101) + ' ' + convert(varchar,cast(".$m[1][$i]." as datetime),8)";
                break;
                case '%m/%d/%Y %h:%i:%s %p':
                  $rpl[$i] = "replace(replace(convert(varchar,cast(".$m[1][$i]." as datetime),131),':000AM',' AM'),':000PM',' PM')";
                break;
                case '%Y-%m-%d':
                  $rpl[$i] = 'replace(convert(varchar,cast('.$m[1][$i].' as datetime),111),'/','-')';
                break;
                case '%Y/%m/%d':
                  $rpl[$i] = 'convert(varchar,cast('.$m[1][$i].' as datetime),111)';
                break;
                default:
                  $rpl[$i] = 'convert(varchar,cast('.$m[1][$i].' as datetime),100)';
                break;   
                }
            }
            $query = str_replace($m[0], $rpl, $query);
        }
 

Okay that was a real magic, but the worst was yet to come. After a round of testing, with the help of a logger see my article on logging, it creeped into my mind that there was a lot more to be done. Phew a grep showed me that we were using about 150 times the MySQL version of ‘if’, which was greek in the MSSQL world. Again a set of digging into online documentations and help, which finally gave shape to the following snippet.


        if(preg_match_all("@ if\((.+?),(.+?),(.+?)\)@i",$query, $m)){
            $rpl = array();
            foreach($m[0] as $i => $v){
                $rpl[$i] = '(CASE WHEN '.$m[1][$i].' THEN '.$m[2][$i].' ELSE '.$m[3][$i]. ' END )';
            }
            $query = str_replace($m[0], $rpl, $query);
        }
 

Well my logs were showing quite a good improvement, and we could identify some of the other bugs, which was overlooked by previous testing team. Some misplaced group by clauses and odd order by were causing havoc in certain points which were identified. At this stage we again put the system through rigorous testing. Now we started to face a new problem. The server side pagination was not properly working. Again the whole round of digging, document references etc were made and we arrived at a solution which is not yet tested to the core, but believed to work.


        $m = array();
        if(preg_match_all("@select (.+?)limit ([0-9, ]+)@i",$query, $m)){
            $rpl = array();
            foreach($m[0] as $i => $v){
              if(!isset($m[2][$i])) continue;
                list($start, $rowsperpage) = explode(',', $m[2][$i]);
                if(empty($rowsperpage)){
                    $rowsperpage = $start;
                    $start = 0;
                }
                if($start == '0'){
                    $rpl[$i] = 'SELECT TOP '. $rowsperpage . ' ' . $m[1][$i];
                }else{
                   $fields = substr($m[1][$i], 0, stripos($m[1][$i],' from '));
                   preg_match("@order by (.+?) (asc|desc)@i", $m[1][$i], $jt);
                   $m[1][$i] = str_replace($jt[0],'',$m[1][$i]);
                   $rpl[$i] = 'SELECT '. $fields . ' FROM (SELECT row_number() OVER ('.$jt[0].') AS rownum,' . $m[1][$i] . ') AS A'.$i.' where A'.$i.'.rownum BETWEEN '. $start .' AND '. ($start + $rowsperpage);
                }
            }
            $query = str_replace($m[0], $rpl, $query);
        }
 

The system is being put to realtime testing, and if there is any more to add, I will do so in a later post.