nginx error – upstream sent too big header

We recently decided to test nginx with our CDN.  It seems that lighttpd just isn’t up to the task of serving high-connection-rate services.  It was grinding to a halt on our systems as they started processing 400 to 600 connections per second.  We ended up running multiple lighttpd instances on single servers to alleviate this problem, but in doing so we were losing all the benefits of lighttpd, such as stat caching.

Enter nginx.  Nginx is another web server similar in nature to lighttpd.  We thought we’d give it a go with just one of our clients which processes in the vicinity of 800 connections per second.  Here is our commentary:
1) nginx seems very well designed and coded – I find the configuration absolutely brilliant and very self-explanatory to a developer.  I think a non-developer might have a bit of a hard time with it, but I’m sure they could work it out.  One of the biggest complaints in all the forums is how badly nginx is documented – it’s NOT true.  I found the core features to be very well documented (in English), albeit in various locations.

2) Speed – The improvement was immediately noticeable.  This webserver software is indeed faster than lighttpd.  Anybody wishing to argue – go home.  It’s faster under both small loads and large loads.

3) High-volume of connections – Handles it smoothly.  Lighttpd really had problems here.  Nginx doesn’t seem to care.

We did get one error which didn’t seem to be documented anywhere, so we ended up going through the source code to track down the issue.  The error message we were getting was:

“upstream sent too big header while reading response header from upstream”

This would cause 502 Bad Gateway errors.  Apparently this is an expected error with the default fastcgi configuration, as the buffer used for reading the response header (fastcgi_buffer_size) is not big enough for our headers.  We’re not sure why ours are so large, considering the default is one memory page (4k or 8k depending on the platform).  Anyway, here are our fastcgi parameters that currently appear to be working:

fastcgi_connect_timeout 60;
fastcgi_send_timeout 180;
fastcgi_read_timeout 180;
fastcgi_buffer_size 128k;
fastcgi_buffers 4 256k;
fastcgi_busy_buffers_size 256k;
fastcgi_temp_file_write_size 256k;
fastcgi_intercept_errors on;
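
For readers wondering where such directives live, here is a minimal sketch of a location block with more modest buffer sizes. It is an illustration under stated assumptions, not the configuration described above: the listen port, root path, location pattern, backend address and the 16k/32k values are all hypothetical, and the right sizes depend on how large your upstream's response headers actually are.

```nginx
# Hypothetical server block: the listen port, root path, location pattern
# and FastCGI backend address are assumptions for illustration only.
server {
    listen 80;
    root /var/www/example;

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;

        # Buffer for the first part of the upstream response, which holds
        # the response header. "upstream sent too big header" is logged
        # when the header does not fit in this buffer.
        fastcgi_buffer_size 16k;

        # Number and size of the buffers used for the rest of the response.
        fastcgi_buffers 8 16k;

        # How much of the buffer space may be busy sending to the client
        # while the response is still being read from the upstream.
        fastcgi_busy_buffers_size 32k;
    }
}
```

Running `nginx -t` after editing checks the configuration for syntax errors before you reload.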

Overall, we’re going to let nginx run for a few weeks and see if we run into any more little problems.  If we don’t, we’ll look into changing our entire CDN over to nginx, as lighttpd just isn’t cutting it performance-wise, and the bug support seems almost non-existent these days.  Personally, I think lighttpd’s days are numbered and I’m expecting its distribution to start declining sometime over the next 6 months.


Power outage on uk-rs-vhost*, uk-rs-cs*

As reported, on Wednesday, 30/04/08 at 21:36 AEST, the primary datacentre used by RackCorp for all UK equipment suffered a power issue resulting in power being cut to all equipment for a few minutes. The following is the accepted publication of cause by the datacentre:

The nature of the UPS failure was unique according to the manufacturer. There are over 3000 systems of the same model installed worldwide, and not one of these has experienced a similar failure. The failure was extremely unusual and unexpected, and the UPS manufacturer is having an engineer from Switzerland over to examine the failed system in more detail. The manufacturer has not yet determined the exact cause of the fault and as such we are not in a position to update you on what action we will be taking.
When the UPS failed a number of capacitors and other electrical components were overloaded and burnt out, this in turn triggered our fire alarm and all employees were evacuated from the building as part of our standard procedure to ensure their safety. [....]

Their publication goes on to acknowledge that the fire alarm combined with a number of software issues caused further problems with other services operated by them. We’ve had some queries from clients, and can confirm that the ONLY item above that affected any RackCorp client was the power outage itself. Servers were offline for a few minutes, and soon powered back on. Due to the sub-5 minute outage interval, offsite switchover did NOT occur on any RackCorp UK hosted services.

On the up side, we can proudly say that EVERY UK-RS server managed by us booted up flawlessly on the day, without incident. All equipment was checked over the following 26 hours by our staff, and no detectable damage is understood to have occurred to any equipment in use by us or our customers. This result can be attributed to our procedural approach to server management, and many of you have already complimented us on that outcome, so on behalf of the staff who were working that night, we certainly thank you for your kind feedback.

– NetOps


RackCorp Industry Blog

Hey Everyone,

Welcome to our new RackCorp branding, and just as importantly, our new RackCorp blog!

For anybody who is new to RackCorp, I might just give you a quick history of how RackCorp came to be. RackCorp is both owned and operated by Network Synergy Corporation Pty. Ltd, an Australian company that has been around since 2003 in its present form, and has a history dating back to 1996 in terms of acquired staff, customers, services, equipment, contracts, and most importantly – experience!

Network Synergy Corporation (NSCorp) has never been a one-size-fits-all company, nor have we ever been a follower in terms of technology. After acquiring 2 web services companies at the end of 2007, we found ourselves maintaining 9 websites and 18 service portals, either on behalf of our own services or services that we had acquired over the years. The maintenance burden for this many platforms (many of which were acquired in high-maintenance states) was so high that innovation had been at an all-time low throughout 2007. We sat down with management in January this year and made a big decision to invest heavily in combining all of our service portals into one – and if you haven’t already worked it out, this is where RackCorp.com came from!

In later blogs I hope to show off some of the great technology that we’ve built in-house over the years, and hopefully I’ll finally be able to stop leaving customers bemused by always saying “Yeah we can do that, in fact we could have set you up on that years ago”. In doing so I really do hope that our customers can finally get that little something more out of RackCorp that wasn’t available before.

So that’s it for our first blog. A bit of a tease I know, but we’ll keep new blogs coming thick and fast over the coming weeks. There really are months of tech stuff to show off and talk about here at RackCorp, so do check back with our blog regularly as I’m sure there will be plenty of things to interest everybody!


– RackCorp
