PHP Parse XML from mysql

Today we were looking at migrating data from one database to another and thought that we’d try out Mysql command line XML output format. The format is fairly simple, so we decided we’d run with it instead of CSV. To parse the XML file, we use PHP, and soon discovered there’s no real nice code snippets out there on the net for parsing XML with PHP. Anyway, we cobbled the following code together to read mysql XML from stdin, and parse it into $rows.

Our original XML file looks something like this….

<?xml version=”1.0″?>

<resultset statement=”SELECT * FROM httpd_access_log_tmp WHERE pending=’Y’
“>
<row>
<field name=”ev_timestamp”>2008-06-30 02:45:10</field>
<field name=”site”>rackcorp.com</field>
<field name=”hostname”>XX.XXX.XX.XX</field>
<field name=”method”>GET</field>
<field name=”url”>/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</field>
<field name=”authuser”>-</field>
<field name=”agent”>Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv</field>
<field name=”referer”>http://www.mysql.com/XXXXXXX</field>
<field name=”status”>200</field>
<field name=”servedfrom”>XXXXXXXXX</field>
<field name=”timeused”>0</field>
<field name=”bytesbody”>9948</field>
<field name=”bytesin”>499</field>
<field name=”bytesout”>10227</field>
<field name=”httpprotocol”>HTTP/1.1</field>
<field name=”httphost”>XXXXXXXXXXXXX</field>
<field name=”pending”>Y</field>
</row>

<row>
etc…

So now the code to parse it:

if ( ! ( $fp = fopen(“php://stdin”, “r”) ) ) {
die (“Couldn’t read STDIN.”);
}

$rows = array();
$columnName = ”;
$rowcount = 0;

if (!($xml_parser = xml_parser_create())) die(“Couldn’t create parser.”);
function startElementHandler ($parser,$name,$attrib){
global $rowcount;
global $rows;
global $columnName;
switch ($name) {
case “FIELD”:
$columnName = $attrib[“NAME”];
break;
default:
$columnName=””;
}
}

function endElementHandler ($parser,$name){
global $rowcount;
global $rows;
global $columnName;

$columnName=”; // Finished with data

if($name==”ROW”) {$rowcount++;}
}

function characterDataHandler ($parser, $data) {
global $rowcount;
global $rows;
global $columnName;

if ($columnName == “”) {return;}

$rows[$rowcount][$columnName] = $data;

}
xml_set_element_handler($xml_parser,”startElementHandler”,”endElementHandler”);
xml_set_character_data_handler( $xml_parser, “characterDataHandler”);

while( $data = fread($fp, 4096)){
if(!xml_parse($xml_parser, $data, feof($fp))) {
break;
}
}
xml_parser_free($xml_parser);

var_dump($rows);

$rows now looks a little something like this:

array(2) {
[0]=>
array(17) {
[“ev_timestamp”]=>
string(19) “2008-06-30 02:45:10”
[“site”]=>
string(15) “rackcorp.com”
[“hostname”]=>
string(12) “XX.XXX.XX.XX”
[“method”]=>
string(3) “GET”
[“url”]=>
string(61) “/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX”
[“authuser”]=>
string(1) “-”
[“agent”]=>
string(50) “Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv”
[“referer”]=>
string(46) “http://www.mysql.com/XXXXXXX”
[“status”]=>
string(3) “200”
[“servedfrom”]=>
string(12) “XXXXXXXXX”
[“timeused”]=>
string(1) “0”
[“bytesbody”]=>
string(4) “9948”
[“bytesin”]=>
string(3) “499”
[“bytesout”]=>
string(5) “10227”
[“httpprotocol”]=>
string(8) “HTTP/1.1”
[“httphost”]=>
string(17) “XXXXXXXXXXXXXXXXXXXXXXXXX”
[“pending”]=>
string(1) “Y”
}
[1]=> etc….

That should do it. Then do with $rows what you want. We end up throwing it back into another few databases under a webhosting accesslogs warehousing scheme.

[del.icio.us] [Digg] [StumbleUpon] [Technorati] [Windows Live]

csync2: Install and setup csync2 on CentOS 5

This blog details how to build and install csync2 form source, as well as configure it.

Step 1) Download and install required libraries

If you haven’t already done so, install graft – it’s great. Here’s a tutorial how to install graft:
http://blog.rackcorp.com/?p=16

Go to ftp://ftp.gnupg.org/gcrypt/libgpg-error/ and download the latest version of libgpg-error

cd /usr/local/PKG_BUILD
wget ftp://ftp.gnupg.org/gcrypt/libgpg-error/libgpg-error-1.6.tar.bz2
bzip2 -d libgpg-error-1.6.tar.bz2
tar -xvf libgpg-error-1.6.tar
cd libgpg-error-1.6
./configure –prefix=/usr/local/PACKAGES/libgpg-error-1.6
make
make install
graft -i /usr/local/PACKAGES/libgpg-error-1.6/

Go to http://www.gnupg.org/download/index.en.html, and download libgcrypt and install it:

cd /usr/local/PKG_BUILD
wget ftp://ftp.gnupg.org/gcrypt/libgcrypt/libgcrypt-1.4.1.tar.bz2
bzip2 -d libgcrypt-1.4.1.tar.bz2
cd libgcrypt-1.4.1
./configure –prefix=/usr/local/PACKAGES/libgcrypt-1.4.1
make
make install
graft -i /usr/local/PACKAGES/libgcrypt-1.4.1

Go to http://www.t2-project.org/packages/libtasn1.html, and download libtasn1

cd /usr/local/PKG_BUILD
wget ftp://ftp.gnutls.org/pub/gnutls/libtasn1/libtasn1-1.4.tar.gz
tar -xvzf libtasn1-1.4.tar.gz
cd libtasn1-1.4
./configure –prefix=/usr/local/PACKAGES/libtasn1-1.4
make
make install
graft -i /usr/local/PACKAGES/libtasn1-1.4

If you get a conflict, you can just remove the conflicting file and retry the graft:

rm -f /usr/local/share/info/dir
graft -i /usr/local/PACKAGES/libtasn1-1.4

Go to http://www.sqlite.org/download.html, and download the source tree file:

cd /usr/local/PKG_BUILD
wget http://www.sqlite.org/sqlite-3.5.9.tar.gz
tar -xvzf sqlite-3.5.9.tar.gz
cd sqlite-3.5.9
./configure –prefix=/usr/local/PACKAGES/sqlite-3.5.9
make
make install
graft -i /usr/local/PACKAGES/sqlite-3.5.9

Go to http://www.gnu.org/software/gnutls/releases/, and download the latest gnutls:

cd /usr/local/PKG_BUILD
wget http://www.gnu.org/software/gnutls/releases/gnutls-2.4.0.tar.bz2
bzip2 -d gnutls-2.4.0.tar.bz2
tar -xvf gnutls-2.4.0.tar
cd gnutls-2.4.0
./configure –prefix=/usr/local/PACKAGES/gnutls-2.4.0
make
make install
graft -i /usr/local/PACKAGES/gnutls-2.4.0

Once again, if you get a conflict:

rm -f /usr/local/share/info/dir
graft -i /usr/local/PACKAGES/gnutls-2.4.0

Go to http://librsync.sourceforge.net/ and download the latest librsync source:

cd /usr/local/PKG_BUILD
wget http://internode.dl.sourceforge.net/sourceforge/librsync/librsync-0.9.7.tar.gz
tar -xvzf librsync-0.9.7.tar.gz
cd librsync-0.9.7
./configure –prefix=/usr/local/PACKAGES/librsync-0.9.7
make
make install
graft -i /usr/local/PACKAGES/librsync-0.9.7

Step 2) Download and install csync2

Go to http://oss.linbit.com/csync2/ and download the latest csync2 source:

cd /usr/local/PKG_BUILD
wget http://oss.linbit.com/csync2/csync2-1.34.tar.gz
tar -xvzf csync2-1.34.tar.gz
cd csync2-1.34

Now I couldn’t get csync2 to locate libsqlite. Seems it doesn’t like the latest version (3) anyway, so we have to go back and download & install an older version:

cd /usr/local/PKG_BUILD
wget http://www.sqlite.org/sqlite-2.8.17.tar.gz
tar -xvzf sqlite-2.8.17.tar.gz
cd sqlite-2.8.17
./configure –prefix=/usr/local/PACKAGES/sqlite-2.8.17
make
make install
graft -i /usr/local/PACKAGES/sqlite-2.8.17

Now, back to csync2. Note that we also run ldconfig just to make sure all our libraries are findable:

ldconfig
cd /usr/local/PKG_BUILD/csync2-1.34
./configure –prefix=/usr/local/PACKAGES/csync2-1.34
make
make install
graft -i /usr/local/PACKAGES/csync2-1.34

And we’re done! csync is compiled and installed! On to step 3….

Step 3) Set up xinetd

By default my CentOS 5 server did not have xinetd installed, so let’s install it

yum install xinetd

Create the following file as /etc/xinetd.d/csync2:

service csync2
{
disable = no
protocol = tcp
socket_type = stream
wait = no
user = root
server = /usr/local/sbin/csync2
server_args = -i
}

The csync2 protocol isn’t a standard one so we need to add it:

echo “csync2 30865/tcp” >> /etc/services

Then let’s restart xinetd:

service xinetd restart

And xinetd is ready…..now we need to configure csync2….

Step 4) Configuring csync2

TO BE CONTINUED…

[del.icio.us] [Digg] [StumbleUpon] [Technorati] [Windows Live]

graft: Install and Configure package management

Graft is a great little tool for package management on UNIX systems such as Linux. The tool itself allows you to keep a package isolated in it’s own directory, and uses symlinks to combine them in with the rest of the operating system.

An example would be a new tool that has one binary and one shared library:

/usr/local/PACKAGES/somenewtool-1.0
/usr/local/PACKAGES/somenewtool-1.0/bin
/usr/local/PACKAGES/somenewtool-1.0/bin/newtool
/usr/local/PACKAGES/somenewtool-1.0/lib
/usr/local/PACKAGES/somenewtool-1.0/lib/libnewtool.so

Using the graft tool, we can install the binary and the library in with the rest of our OS using a simple command like:

graft -i /usr/local/PACKAGES/somenewtool-1.0

With the appropriate configuration, this would create these symlinks:

/usr/local/bin/newtool -> /usr/local/PACKAGES/somenewtool-1.0/bin/newtool
/usr/local/lib/libnewtool.so -> /usr/local/PACKAGES/somenewtool-1.0/lib/libnewtool.so

Removing these symlinks is also really simple like this:

graft -d /usr/local/PACKAGES/somenewtool-1.0

Now you can see the usefulness of this tool to remove one version, and install a new one. Anyway, let’s look at the install process for graft.

1) Setup environment

mkdir -p /usr/local/PACKAGES
mkdir -p /usr/local/PKG_BUILD
cd /usr/local/PKG_BUILD

2) Download and Install Graft

Graft can be found at: http://www.gormand.com.au/peters/tools/

wget http://www.gormand.com.au/peters/tools/graft/graft-2.4.tar.gz
tar -xvzf graft-2.4.tar.gz
cd graft-2.4
make -f Makefile.dist

edit the Makefile file and change these lines:
PACKAGEDIR = /usr/local/PACKAGES
TARGETDIR = /usr/local
PERL = /usr/bin/perl

Now let’s make it:

make
make install
/usr/local/PACKAGES/graft-2.4/bin/graft -i /usr/local/PACKAGES/graft-2.4

And that’s it! Graft is now installed (in fact we just used it to install itself!). Now to get the most out of graft, I recommend adding the line “/usr/local/lib” into your /etc/ld.so.conf file, and running “ldconfig”.

[del.icio.us] [Digg] [StumbleUpon] [Technorati] [Windows Live]

nginx error – upstream sent too big header

We recently decided to test nginx with our CDN.  It seems that lighttpd just isn’t up the the task of serving high-connection rate services.  It was grinding to a halt on our systems as they started processing 400 – 600 connections per second.  We ended up running multiple lighttpd’s on single servers to alleviate this problem, but in doing so we were losing all the benefits of lighttpd such as stat caching etc.

Enter nginx.  Nginx is another web server similar in nature to lighttpd.  We thought we’d give it a go with just one of our clients which processes in the vicinity of 800 connections per second.  Here is our commentary:
1) nginx seems very well designed and coded – I find the configuration absolutely brilliant and very self explanatory to a developer.  I think a non-developer might have a bit of a hard time with it however, but I’m I’m sure they could work it out.  One of the biggest hates in all the forums is how badly nginx is documented – it’s NOT true.  I found the core features to be very well documented (in English) albeit in various locations.

2) Speed – The improvement was immediately noticable.  This webserver software is indeed faster than lighttpd.  Anybody wishing to argue – go home.  It’s faster under both small loads and large loads.

3) High-volume of connections – Handles it smoothly.  Lighttpd really had problems here.  Nginx doesn’t seem to care.

We did get one error which didnt seem documented anywhere so we ended up going through the source code to track down the issue.  The error message we were getting was:

“upstream sent too big header while reading response header from upstream”

Which would cause 502 Bad Gateway errors.  Apparantly this is an expected error with the default fastcgi configuration as the buffer size is not big enough for processing our headers (not sure why considering I think the default is 4k?).  Anyway – here is our fastcgi paramaters that appear to be currently working:

fastcgi_connect_timeout 60;
fastcgi_send_timeout 180;
fastcgi_read_timeout 180;
fastcgi_buffer_size 128k;
fastcgi_buffers 4 256k;
fastcgi_busy_buffers_size 256k;
fastcgi_temp_file_write_size 256k;
fastcgi_intercept_errors on

Overall, we’re going to let nginx run for a few weeks and see if we run into any more little problems.  If we don’t, we’ll look into changing our entire CDN over to nginx as lighttpd just isn’t cutting it performance wise, and the bug support seems almost non existent these days.  Personally, I think lighttpd’s days are numbered and I’m expecting its distribution to start reducing sometime over the next 6 months….

[del.icio.us] [Digg] [StumbleUpon] [Technorati] [Windows Live]