nginx ncache performance and stability

BACKGROUND: We’ve been running nginx successfully for a long time now without issue.  We were really pleased with it, and migrated all of our CDN servers across to it several months ago.  While there were a few configuration defaults that didn’t play too nicely, we ironed out the problems within a few days and it’s been business as usual ever since, serving between 700TB and 1.8PB per month (the latter in August 2008!).

Now we have the problem that our proprietary systems that actually cache our customers’ sites just aren’t fast enough to handle the fast-changing content our customers are demanding.  So we’ve been weighing up a few options:

1) Deploy Varnish
2) Deploy Squid
3) Deploy ncache

We actually TRIED to deploy Varnish after seeing so many recommendations, but at the end of the day it couldn’t keep up.  It should be noted that our systems are all 32-bit, and I get the feeling Varnish would perform a lot better on 64-bit, but when you have over a hundred servers all running on 32-bit processors, upgrading them all in one hit just isn’t an option.  The problem with Varnish was that it would drop connections, seemingly because it had run out of memory (although we’re not 100% sure on this, as the debugging output wasn’t overly useful).

So… we tried… we failed.  NEXT

Our next attempt was to look into deploying Squid.  This one proved a bit complex to integrate into our existing CDN network because of Squid’s limitations.  We would have to write some custom code to make it work, so it has been made a last resort.

So, option 3 (the whole point of this blog entry) was to try out the ncache nginx module.  We installed the ncache 2.0 package along with nginx 0.6.32, set everything up, and things ran absolutely beautifully.  We manually tested the caches and they were working great: very fast, very efficient.
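For the record, ncache is built into the nginx binary at compile time like any third-party nginx module.  Roughly how ours went together (a sketch only – the source paths follow our own package conventions and aren’t an exact transcript):

cd /usr/local/PKG_BUILD/nginx-0.6.32
# --add-module is nginx's standard way of compiling in a third-party module;
# the ncache source path here is hypothetical
./configure --prefix=/usr/local/PACKAGES/nginx-0.6.32 --add-module=/usr/local/PKG_BUILD/ncache-2.0
make
make install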

We were extremely happy until the next day, when one CDN server started reporting high load issues.  On analysing the server, it turned out nginx was trying to consume several gigabytes of memory – ouch.  So we killed it off and restarted it, and it ran fine again.  Maybe it was just a temporary glitch?

Nope – over the following week, several servers experienced identical issues: nginx consumes all available memory until it simply stops being able to serve files and requires a restart.  Looks like a case of either:

A) ncache 2.0 has a serious memory leak
B) ncache 2.0 doesn’t handle high volumes of cached data very well.
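Either way, until it’s fixed, something like this cron-driven watchdog would at least catch the runaway memory before a box stops serving (a rough sketch only – the 2GB threshold and the init script path are our own assumptions):

#!/bin/sh
# Hypothetical watchdog: restart nginx if its processes exceed ~2GB RSS in total
TOTAL_KB=$(ps -C nginx -o rss= | awk '{sum += $1} END {print sum + 0}')
if [ "$TOTAL_KB" -gt 2097152 ]; then
	logger "nginx watchdog: total RSS ${TOTAL_KB}KB, restarting nginx"
	/etc/init.d/nginx restart
fi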

We’ve tried to work through the issues to make sure they’re not configuration problems, but no luck.  At the end of the day it’s going to be cheaper and easier to make some mods to Squid and deploy that instead.

So for anybody wondering about ncache performance and stability, I’d simply say that it’s a great product, but not really production-ready at this stage if you’re doing high volumes of data.


ping: sendmsg: Operation not permitted

We recently found several of our CDN servers suddenly experiencing 10-20% network packet loss – OUCH!  The loss was not constant, but happened more frequently at some times of the day than others.  No other servers on the same networks were affected – only the CDN boxes.

One of the symptoms we soon discovered was that we’d get errors on the server itself when trying to ping out:

ping: sendmsg: Operation not permitted

Aha!  This gave us a great start, in that it was the kernel itself rejecting the packets rather than some weird network anomaly.  So we checked route caches and all kinds of things… nothing really gave the problem away, until we checked our centralised syslog server and saw thousands of these messages:

ip_conntrack: table full, dropping packet

Okay, “dropping packet” – that would make for a good explanation of things.  Sure enough, the connection tracking table for netfilter was full.  (We’d already upped this to 128000 on all of our CDN boxes, but apparently that wasn’t enough!)  So we upped the limit even higher on all servers:

echo 200000 > /proc/sys/net/ipv4/ip_conntrack_max

And straight away all CDNs started working again.  Don’t forget to make the change persistent with an entry in your /etc/sysctl.conf:

net.ipv4.netfilter.ip_conntrack_max = 200000
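To keep an eye on how close you’re running to that ceiling, the kernel exposes the live connection count alongside the maximum (the exact /proc paths can vary a little between kernel versions):

cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count
cat /proc/sys/net/ipv4/ip_conntrack_max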

It turns out the problem was a new cachecentric.com customer who was really heavy on the thumbnails – their site serves about 150,000 thumbnails per second!  This was the immediate cause of all our problems – I guess it was always going to happen eventually, so it’s lucky we caught it so quickly.  I hope this info helps someone else out.


PHP: Parse XML from MySQL

Today we were looking at migrating data from one database to another and thought we’d try out MySQL’s command-line XML output format. The format is fairly simple, so we decided to run with it instead of CSV. To parse the XML file we use PHP, and we soon discovered there aren’t many nice code snippets out on the net for parsing XML with PHP. Anyway, we cobbled the following code together to read MySQL XML from stdin and parse it into $rows.
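For reference, the XML comes straight out of the mysql command-line client via its --xml flag, so the whole job can be piped through in one go (the script name parsexml.php and the credentials are placeholders):

mysql --xml -u someuser -p -e "SELECT * FROM httpd_access_log_tmp WHERE pending='Y'" somedb | php parsexml.php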

Our original XML file looks something like this:

<?xml version="1.0"?>

<resultset statement="SELECT * FROM httpd_access_log_tmp WHERE pending='Y'
">
<row>
	<field name="ev_timestamp">2008-06-30 02:45:10</field>
	<field name="site">rackcorp.com</field>
	<field name="hostname">XX.XXX.XX.XX</field>
	<field name="method">GET</field>
	<field name="url">/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</field>
	<field name="authuser">-</field>
	<field name="agent">Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv</field>
	<field name="referer">http://www.mysql.com/XXXXXXX</field>
	<field name="status">200</field>
	<field name="servedfrom">XXXXXXXXX</field>
	<field name="timeused">0</field>
	<field name="bytesbody">9948</field>
	<field name="bytesin">499</field>
	<field name="bytesout">10227</field>
	<field name="httpprotocol">HTTP/1.1</field>
	<field name="httphost">XXXXXXXXXXXXX</field>
	<field name="pending">Y</field>
</row>

<row>
etc…

So now the code to parse it:

if (!($fp = fopen("php://stdin", "r"))) {
	die("Couldn't read STDIN.");
}

$rows = array();
$columnName = '';
$rowcount = 0;

if (!($xml_parser = xml_parser_create())) die("Couldn't create parser.");

function startElementHandler($parser, $name, $attrib) {
	global $columnName;
	switch ($name) {
	case "FIELD":
		// Remember which column the current <field> element belongs to
		$columnName = $attrib["NAME"];
		break;
	default:
		$columnName = "";
	}
}

function endElementHandler($parser, $name) {
	global $rowcount;
	global $columnName;

	$columnName = ''; // Finished with this field's data

	if ($name == "ROW") { $rowcount++; }
}

function characterDataHandler($parser, $data) {
	global $rowcount;
	global $rows;
	global $columnName;

	if ($columnName == "") { return; }

	// The parser may deliver a field's text in several chunks,
	// so append rather than overwrite.
	if (!isset($rows[$rowcount][$columnName])) {
		$rows[$rowcount][$columnName] = '';
	}
	$rows[$rowcount][$columnName] .= $data;
}

xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");
xml_set_character_data_handler($xml_parser, "characterDataHandler");

while ($data = fread($fp, 4096)) {
	if (!xml_parse($xml_parser, $data, feof($fp))) {
		die("XML parse error: " . xml_error_string(xml_get_error_code($xml_parser)));
	}
}
xml_parser_free($xml_parser);

var_dump($rows);

$rows now looks a little something like this:

array(2) {
  [0]=>
  array(17) {
    ["ev_timestamp"]=>
    string(19) "2008-06-30 02:45:10"
    ["site"]=>
    string(15) "rackcorp.com"
    ["hostname"]=>
    string(12) "XX.XXX.XX.XX"
    ["method"]=>
    string(3) "GET"
    ["url"]=>
    string(61) "/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    ["authuser"]=>
    string(1) "-"
    ["agent"]=>
    string(50) "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv"
    ["referer"]=>
    string(46) "http://www.mysql.com/XXXXXXX"
    ["status"]=>
    string(3) "200"
    ["servedfrom"]=>
    string(12) "XXXXXXXXX"
    ["timeused"]=>
    string(1) "0"
    ["bytesbody"]=>
    string(4) "9948"
    ["bytesin"]=>
    string(3) "499"
    ["bytesout"]=>
    string(5) "10227"
    ["httpprotocol"]=>
    string(8) "HTTP/1.1"
    ["httphost"]=>
    string(17) "XXXXXXXXXXXXXXXXXXXXXXXXX"
    ["pending"]=>
    string(1) "Y"
  }
  [1]=> etc….

That should do it. Then do with $rows what you want. We end up throwing it back into a few other databases as part of a webhosting access-log warehousing scheme.
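For what it’s worth, the warehousing step is little more than looping over $rows and inserting (a rough sketch using the mysql_* API of the day – the connection details and the target table name are placeholders):

$db = mysql_connect('localhost', 'loguser', 'secret') or die(mysql_error());
mysql_select_db('logwarehouse', $db) or die(mysql_error());

foreach ($rows as $row) {
	// Column names come straight from the XML field names; values get escaped
	$cols = implode(',', array_keys($row));
	$vals = implode("','", array_map('mysql_real_escape_string', array_values($row)));
	mysql_query("INSERT INTO httpd_access_log ($cols) VALUES ('$vals')", $db)
		or die(mysql_error());
}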


csync2: Install and setup csync2 on CentOS 5

This post details how to build and install csync2 from source, as well as how to configure it.

Step 1) Download and install required libraries

If you haven’t already done so, install graft – it’s great. Here’s a tutorial on how to install graft:
http://blog.rackcorp.com/?p=16

Go to ftp://ftp.gnupg.org/gcrypt/libgpg-error/ and download the latest version of libgpg-error

cd /usr/local/PKG_BUILD
wget ftp://ftp.gnupg.org/gcrypt/libgpg-error/libgpg-error-1.6.tar.bz2
bzip2 -d libgpg-error-1.6.tar.bz2
tar -xvf libgpg-error-1.6.tar
cd libgpg-error-1.6
./configure --prefix=/usr/local/PACKAGES/libgpg-error-1.6
make
make install
graft -i /usr/local/PACKAGES/libgpg-error-1.6/

Go to http://www.gnupg.org/download/index.en.html, and download libgcrypt and install it:

cd /usr/local/PKG_BUILD
wget ftp://ftp.gnupg.org/gcrypt/libgcrypt/libgcrypt-1.4.1.tar.bz2
bzip2 -d libgcrypt-1.4.1.tar.bz2
tar -xvf libgcrypt-1.4.1.tar
cd libgcrypt-1.4.1
./configure --prefix=/usr/local/PACKAGES/libgcrypt-1.4.1
make
make install
graft -i /usr/local/PACKAGES/libgcrypt-1.4.1

Go to http://www.t2-project.org/packages/libtasn1.html, and download libtasn1

cd /usr/local/PKG_BUILD
wget ftp://ftp.gnutls.org/pub/gnutls/libtasn1/libtasn1-1.4.tar.gz
tar -xvzf libtasn1-1.4.tar.gz
cd libtasn1-1.4
./configure --prefix=/usr/local/PACKAGES/libtasn1-1.4
make
make install
graft -i /usr/local/PACKAGES/libtasn1-1.4

If you get a conflict, you can just remove the conflicting file and retry the graft:

rm -f /usr/local/share/info/dir
graft -i /usr/local/PACKAGES/libtasn1-1.4

Go to http://www.sqlite.org/download.html, and download the source tree file:

cd /usr/local/PKG_BUILD
wget http://www.sqlite.org/sqlite-3.5.9.tar.gz
tar -xvzf sqlite-3.5.9.tar.gz
cd sqlite-3.5.9
./configure --prefix=/usr/local/PACKAGES/sqlite-3.5.9
make
make install
graft -i /usr/local/PACKAGES/sqlite-3.5.9

Go to http://www.gnu.org/software/gnutls/releases/, and download the latest gnutls:

cd /usr/local/PKG_BUILD
wget http://www.gnu.org/software/gnutls/releases/gnutls-2.4.0.tar.bz2
bzip2 -d gnutls-2.4.0.tar.bz2
tar -xvf gnutls-2.4.0.tar
cd gnutls-2.4.0
./configure --prefix=/usr/local/PACKAGES/gnutls-2.4.0
make
make install
graft -i /usr/local/PACKAGES/gnutls-2.4.0

Once again, if you get a conflict:

rm -f /usr/local/share/info/dir
graft -i /usr/local/PACKAGES/gnutls-2.4.0

Go to http://librsync.sourceforge.net/ and download the latest librsync source:

cd /usr/local/PKG_BUILD
wget http://internode.dl.sourceforge.net/sourceforge/librsync/librsync-0.9.7.tar.gz
tar -xvzf librsync-0.9.7.tar.gz
cd librsync-0.9.7
./configure --prefix=/usr/local/PACKAGES/librsync-0.9.7
make
make install
graft -i /usr/local/PACKAGES/librsync-0.9.7

Step 2) Download and install csync2

Go to http://oss.linbit.com/csync2/ and download the latest csync2 source:

cd /usr/local/PKG_BUILD
wget http://oss.linbit.com/csync2/csync2-1.34.tar.gz
tar -xvzf csync2-1.34.tar.gz
cd csync2-1.34

Now, I couldn’t get csync2 to locate libsqlite. It seems it doesn’t like the latest version (3) anyway, so we have to go back and download and install an older version:

cd /usr/local/PKG_BUILD
wget http://www.sqlite.org/sqlite-2.8.17.tar.gz
tar -xvzf sqlite-2.8.17.tar.gz
cd sqlite-2.8.17
./configure --prefix=/usr/local/PACKAGES/sqlite-2.8.17
make
make install
graft -i /usr/local/PACKAGES/sqlite-2.8.17

Now, back to csync2. Note that we also run ldconfig just to make sure all our libraries are findable:

ldconfig
cd /usr/local/PKG_BUILD/csync2-1.34
./configure --prefix=/usr/local/PACKAGES/csync2-1.34
make
make install
graft -i /usr/local/PACKAGES/csync2-1.34

And we’re done! csync2 is compiled and installed! On to step 3…

Step 3) Set up xinetd

By default my CentOS 5 server did not have xinetd installed, so let’s install it:

yum install xinetd

Create the following file as /etc/xinetd.d/csync2:

service csync2
{
	disable     = no
	protocol    = tcp
	socket_type = stream
	wait        = no
	user        = root
	server      = /usr/local/sbin/csync2
	server_args = -i
}

The csync2 service isn’t in the standard /etc/services list, so we need to add it:

echo "csync2 30865/tcp" >> /etc/services

Then let’s restart xinetd:

service xinetd restart
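A quick sanity check that xinetd is now listening on the csync2 port:

netstat -ltnp | grep 30865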

And xinetd is ready. Now we need to configure csync2…

Step 4) Configuring csync2

TO BE CONTINUED…


graft: Install and Configure package management

Graft is a great little tool for package management on UNIX systems such as Linux. It allows you to keep each package isolated in its own directory, and uses symlinks to blend the packages in with the rest of the operating system.

An example would be a new tool that has one binary and one shared library:

/usr/local/PACKAGES/somenewtool-1.0
/usr/local/PACKAGES/somenewtool-1.0/bin
/usr/local/PACKAGES/somenewtool-1.0/bin/newtool
/usr/local/PACKAGES/somenewtool-1.0/lib
/usr/local/PACKAGES/somenewtool-1.0/lib/libnewtool.so

Using the graft tool, we can install the binary and the library alongside the rest of our OS with a simple command like:

graft -i /usr/local/PACKAGES/somenewtool-1.0

With the appropriate configuration, this would create these symlinks:

/usr/local/bin/newtool -> /usr/local/PACKAGES/somenewtool-1.0/bin/newtool
/usr/local/lib/libnewtool.so -> /usr/local/PACKAGES/somenewtool-1.0/lib/libnewtool.so

Removing these symlinks is just as simple:

graft -d /usr/local/PACKAGES/somenewtool-1.0

Now you can see the usefulness of this tool for removing one version of a package and installing another.
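For example, upgrading our hypothetical tool from 1.0 to 1.1 is just a matter of ungrafting the old version and grafting in the new:

graft -d /usr/local/PACKAGES/somenewtool-1.0
graft -i /usr/local/PACKAGES/somenewtool-1.1

Anyway, let’s look at the install process for graft.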

1) Setup environment

mkdir -p /usr/local/PACKAGES
mkdir -p /usr/local/PKG_BUILD
cd /usr/local/PKG_BUILD

2) Download and Install Graft

Graft can be found at: http://www.gormand.com.au/peters/tools/

wget http://www.gormand.com.au/peters/tools/graft/graft-2.4.tar.gz
tar -xvzf graft-2.4.tar.gz
cd graft-2.4
make -f Makefile.dist

Edit the Makefile and change these lines:
PACKAGEDIR = /usr/local/PACKAGES
TARGETDIR = /usr/local
PERL = /usr/bin/perl

Now let’s make it:

make
make install
/usr/local/PACKAGES/graft-2.4/bin/graft -i /usr/local/PACKAGES/graft-2.4

And that’s it! Graft is now installed (in fact we just used it to install itself!). Now, to get the most out of graft, I recommend adding the line “/usr/local/lib” to your /etc/ld.so.conf file and running ldconfig:
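echo "/usr/local/lib" >> /etc/ld.so.conf
ldconfig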
