PHP Parse XML from mysql

Today we were looking at migrating data from one database to another and thought that we’d try out Mysql command line XML output format. The format is fairly simple, so we decided we’d run with it instead of CSV. To parse the XML file, we use PHP, and soon discovered there’s no real nice code snippets out there on the net for parsing XML with PHP. Anyway, we cobbled the following code together to read mysql XML from stdin, and parse it into $rows.

Our original XML file looks something like this….

<?xml version=”1.0″?>

<resultset statement=”SELECT * FROM httpd_access_log_tmp WHERE pending=’Y’
“>
<row>
<field name=”ev_timestamp”>2008-06-30 02:45:10</field>
<field name=”site”>rackcorp.com</field>
<field name=”hostname”>XX.XXX.XX.XX</field>
<field name=”method”>GET</field>
<field name=”url”>/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</field>
<field name=”authuser”>-</field>
<field name=”agent”>Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv</field>
<field name=”referer”>http://www.mysql.com/XXXXXXX</field>
<field name=”status”>200</field>
<field name=”servedfrom”>XXXXXXXXX</field>
<field name=”timeused”>0</field>
<field name=”bytesbody”>9948</field>
<field name=”bytesin”>499</field>
<field name=”bytesout”>10227</field>
<field name=”httpprotocol”>HTTP/1.1</field>
<field name=”httphost”>XXXXXXXXXXXXX</field>
<field name=”pending”>Y</field>
</row>

<row>
etc…

So now the code to parse it:

if ( ! ( $fp = fopen(“php://stdin”, “r”) ) ) {
die (“Couldn’t read STDIN.”);
}

$rows = array();
$columnName = ”;
$rowcount = 0;

if (!($xml_parser = xml_parser_create())) die(“Couldn’t create parser.”);
function startElementHandler ($parser,$name,$attrib){
global $rowcount;
global $rows;
global $columnName;
switch ($name) {
case “FIELD”:
$columnName = $attrib[“NAME”];
break;
default:
$columnName=””;
}
}

function endElementHandler ($parser,$name){
global $rowcount;
global $rows;
global $columnName;

$columnName=”; // Finished with data

if($name==”ROW”) {$rowcount++;}
}

function characterDataHandler ($parser, $data) {
global $rowcount;
global $rows;
global $columnName;

if ($columnName == “”) {return;}

$rows[$rowcount][$columnName] = $data;

}
xml_set_element_handler($xml_parser,”startElementHandler”,”endElementHandler”);
xml_set_character_data_handler( $xml_parser, “characterDataHandler”);

while( $data = fread($fp, 4096)){
if(!xml_parse($xml_parser, $data, feof($fp))) {
break;
}
}
xml_parser_free($xml_parser);

var_dump($rows);

$rows now looks a little something like this:

array(2) {
[0]=>
array(17) {
[“ev_timestamp”]=>
string(19) “2008-06-30 02:45:10”
[“site”]=>
string(15) “rackcorp.com”
[“hostname”]=>
string(12) “XX.XXX.XX.XX”
[“method”]=>
string(3) “GET”
[“url”]=>
string(61) “/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX”
[“authuser”]=>
string(1) “-”
[“agent”]=>
string(50) “Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv”
[“referer”]=>
string(46) “http://www.mysql.com/XXXXXXX”
[“status”]=>
string(3) “200”
[“servedfrom”]=>
string(12) “XXXXXXXXX”
[“timeused”]=>
string(1) “0”
[“bytesbody”]=>
string(4) “9948”
[“bytesin”]=>
string(3) “499”
[“bytesout”]=>
string(5) “10227”
[“httpprotocol”]=>
string(8) “HTTP/1.1”
[“httphost”]=>
string(17) “XXXXXXXXXXXXXXXXXXXXXXXXX”
[“pending”]=>
string(1) “Y”
}
[1]=> etc….

That should do it. Then do with $rows what you want. We end up throwing it back into another few databases under a webhosting accesslogs warehousing scheme.

[del.icio.us] [Digg] [StumbleUpon] [Technorati] [Windows Live]