Okay guys, so as many of you know, we offer both Apache and Nginx servers here as part of our standard shared hosting packages. In our experience there is no better web server than Nginx for reliable performance in a high-traffic environment. One thing that I frequently go through with the new staff here is nginx location / rewrite rules, because they can be a bit confusing.
The best way to think of things is that as a request comes in, Nginx will scan through the configuration to find a “location” line that matches the request. There are TWO modes that nginx uses to scan through the configuration file: literal string matching and regular expression checks. Nginx first checks ALL literal string location entries and remembers the longest one that matches, regardless of where it appears in the configuration file. It then scans through the regular expression location entries in the order that they occur in the configuration file and uses the first one that matches; if none match, it falls back to the literal string match it remembered. So be aware: for regular expression locations, ordering DOES matter.
Now there are a few ways of interrupting that flow:
location = /images { }(Note: does not work for regular expressions)
The “=” is the important character here. This matches a request for “/images” ONLY. This also halts the location scanning as soon as such an exact match is met.
location ^~ /images {}(Note: does not work for regular expressions)
The “^~” results in a case sensitive match for the beginning of a request. This means /images, /images/logo.gif, etc. will all be matched. If this turns out to be the longest literal string match, nginx uses it straight away and skips the regular expression locations entirely.
location ~ /images {}
location ~* /images {}(case insensitive version)
The “~” makes this a regular expression match: case sensitive for “~”, case insensitive for “~*”. Because the pattern here is not anchored, it matches /images anywhere in the request; write ^/images if you only want to match the beginning of a request. If what you want is the previous rule without the early halt, leave the modifier off altogether (location /images {}): that is a plain literal string match, and nginx will still go on to check the regular expression locations afterwards.
That’s IT! Yes it really is that simple. Now there are a few other variations, such as named locations, but don’t worry about those for now.
Now the “=”, “^~” and no-modifier examples above are literal string examples. If you replace /images with a regular expression (using “~” or “~*”) then suddenly you have altered the order of the rules (remember ALL literal strings get checked first, and THEN regular expressions, regardless of the order you have them in your configuration).
An example of a regular expression match is:
location ~ \.(gif|jpg|jpeg)$ { }
This will match any request that ends in .gif, .jpg, or .jpeg.
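To make the matching order concrete, here is a rough Python sketch of how nginx chooses a location. It is a simplified model for illustration only, not nginx's actual code: the Location class, the select_location function and the CONFIG list are all made-up names, and named locations are left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Location:
    modifier: str  # "" (plain prefix), "=", "^~", "~" or "~*"
    pattern: str


def select_location(uri: str, locations: List[Location]) -> Optional[Location]:
    """Pick the location a request would land in (simplified model)."""
    # 1. An exact "=" match wins immediately.
    for loc in locations:
        if loc.modifier == "=" and uri == loc.pattern:
            return loc

    # 2. Among literal string locations ("" and "^~"), remember the longest match.
    best = None
    for loc in locations:
        if loc.modifier in ("", "^~") and uri.startswith(loc.pattern):
            if best is None or len(loc.pattern) > len(best.pattern):
                best = loc

    # 3. If that longest match is marked "^~", regular expressions are skipped.
    if best is not None and best.modifier == "^~":
        return best

    # 4. Otherwise regular expressions are tried in configuration order,
    #    and the first one that matches wins.
    for loc in locations:
        if loc.modifier in ("~", "~*"):
            flags = re.IGNORECASE if loc.modifier == "~*" else 0
            if re.search(loc.pattern, uri, flags):
                return loc

    # 5. No regular expression matched: fall back to the remembered literal match.
    return best


# Hypothetical configuration, listed in "file order".
CONFIG = [
    Location("=", "/images"),
    Location("^~", "/images/"),
    Location("", "/"),
    Location("~", r"\.(gif|jpg|jpeg)$"),
]

if __name__ == "__main__":
    for uri in ("/images", "/images/logo.gif", "/photos/cat.jpg", "/index.html"):
        chosen = select_location(uri, CONFIG)
        print(uri, "->", chosen.modifier or "(prefix)", chosen.pattern)
```

With this configuration, /images/logo.gif stays in the “^~” location even though the image regular expression would also match it, while /photos/cat.jpg is picked up by the regular expression because its longest literal match (/) does not halt the search.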
So now that we’ve discussed the foundations of the location rules, we can move into rewrites. There are TWO kinds of rewrites: URL redirects (HTTP 301 / HTTP 302), and internal rewrites (which mangle the request before it is processed).
URL Redirects are the simplest to understand:
location /admin {
rewrite ^/admin/(.*)$ http://admin.example.com/$1 permanent;
}
This example will redirect any request matching the location rule (see earlier) as an HTTP 301 permanent redirection to http://admin.example.com/. For example, http://www.example.com/admin/index.html now gets HTTP redirected to http://admin.example.com/index.html. Note the regular expression and the $1 replacement in the URL. If you want the redirect to be an HTTP 302 (temporary redirection), just change the word “permanent” to “redirect”.
Internal rewrites are a little more complicated:
location /admin {
rewrite ^/admin/(.*)$ /$1 break;
}
The key word here is “break”. This causes the rewrite processing to stop, and the rewritten URL is then handled inside the current location. If this word were “last”, nginx would instead go back to scanning location entries as per our discussions earlier, but now with the rewritten URL.
I hope that clears up nginx configuration. The documentation is really good over at the nginx wiki (http://wiki.nginx.org/NginxModules). I think this was the only part that sometimes confuses some of us here. Let us know if you think I missed anything, otherwise I hope to put up some of our nginx rewrites for some of the more popular forums/blogs in the weeks to come!
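The four rewrite flags discussed above can be compared side by side with a small Python sketch. This is only a model of the outcomes, under the assumption that a single rewrite rule is in play; apply_rewrite, RewriteResult and the action names (“redirect”, “rescan”, “stay”) are invented for the illustration and are not nginx terms.

```python
import re
from typing import NamedTuple, Optional


class RewriteResult(NamedTuple):
    action: str            # "redirect", "rescan", "stay" or "no_match"
    uri: str               # rewritten URI, or the redirect target
    status: Optional[int]  # 301 / 302 for redirects, otherwise None


def apply_rewrite(uri: str, regex: str, replacement: str, flag: str) -> RewriteResult:
    """Model the outcome of: rewrite <regex> <replacement> <flag>;"""
    match = re.search(regex, uri)
    if match is None:
        return RewriteResult("no_match", uri, None)

    # nginx writes captures as $1, $2, ...; Python's re module uses \g<1>, \g<2>, ...
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    new_uri = match.expand(template)

    if flag == "permanent":
        return RewriteResult("redirect", new_uri, 301)
    if flag == "redirect":
        return RewriteResult("redirect", new_uri, 302)
    if flag == "last":
        # Location scanning starts again with the rewritten URI.
        return RewriteResult("rescan", new_uri, None)
    if flag == "break":
        # Rewrite processing stops; the request stays in the current location.
        return RewriteResult("stay", new_uri, None)
    raise ValueError("unknown flag: " + flag)


if __name__ == "__main__":
    print(apply_rewrite("/admin/index.html", r"^/admin/(.*)$",
                        "http://admin.example.com/$1", "permanent"))
    print(apply_rewrite("/admin/users/list", r"^/admin/(.*)$", "/$1", "break"))
    print(apply_rewrite("/admin/users/list", r"^/admin/(.*)$", "/$1", "last"))
```

The rewritten URI is the same for “break” and “last”; the only difference is whether location scanning is run again on it.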
We have now added a new option in the ongoing fight against unwanted spam. As of early this morning, all RackCorp mail servers in Australia, US, and Canada have been updated to RackCorpMailServices-1.14. In doing so, we have now included a new option in our online portal to help manage spam.
You can find the option here when managing accounts (and similarly for managing aliases):

With this option, you can now effectively defer ALL inbound email that matches the realtime blacklists. Up until now, you were only able to greylist (defer for 10 minutes) any inbound email matching these blacklists. By permanently deferring the email, you ensure that you do NOT receive any email that is coming from a blacklisted source, AND that the sender will eventually receive notification that you did not receive that email (explaining that it is because they are blacklisted).
It’s not all good though. The downside to doing this is that if someone IS blacklisted and is sending you something urgent, then they might not find out that you never received it for several days. Exactly how long until they do find out varies between 4 hours and 10 days, and is dependent on the sender’s ISP / mail infrastructure (not ours!).
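The difference between greylisting and permanently deferring can be sketched in a few lines of Python. This is an illustration of the idea only, not how RackCorpMailServices is implemented: the function name, the mode names and the choice of SMTP reply code 451 (a temporary failure) are all assumptions made for the example.

```python
GREYLIST_DELAY_SECONDS = 10 * 60  # the 10-minute greylist window mentioned above


def smtp_reply_code(blacklisted: bool, mode: str, seconds_since_first_attempt: float) -> int:
    """Return 250 to accept the message, or 451 to ask the sender to retry later."""
    if not blacklisted:
        return 250
    if mode == "greylist":
        # Temporary failure until the greylist window has passed, then accept.
        if seconds_since_first_attempt < GREYLIST_DELAY_SECONDS:
            return 451
        return 250
    if mode == "defer":
        # Always a temporary failure. The sending server keeps retrying until
        # its own queue lifetime runs out, then bounces the message to the sender.
        return 451
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    print(smtp_reply_code(True, "greylist", 60))        # still inside the window
    print(smtp_reply_code(True, "greylist", 15 * 60))   # window has passed
    print(smtp_reply_code(True, "defer", 3 * 24 * 3600))
```

Because the deferring side never sends a permanent rejection, how long the sender waits before being told is decided entirely by the retry settings on the sending side, which is why the delay varies so much.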
When do we recommend using this option? If you’re receiving so much spam that you’re finding it hard to do business, then activate this option – it’ll help a lot.
I’ve been itching to tackle this subject for so long, but time is hard to find these days! This isn’t purely a marketing blog: RackCorp offers international services in LOTS of countries (20+ now), and quite often it’s not cost-effective for our customers to have a fully decked out presence in some locations, so we too have to choose our providers carefully.
DATACENTRE LOCATION
- If you’re serving speed-critical videos, files, game services, or telephony solutions, then you should try to choose someone who has equipment close to your customers.
- If your customers will be uploading / downloading LOTS of data, try to utilise peering networks / centres that your customers may be connected to as much as possible, as it will save your customers money.
- If you’ve got a small budget, and your service is not speed critical then consider going with equipment in the US or UK. It may not be the fastest to your customer’s locations (unless they are in the US or UK!), but you’ll find it gives the best return for the money.
MAINTENANCE
- Does the provider perform regular maintenance on their equipment?
- Does the provider replace hardware regularly?
- What versions of firmware/software is your provider running, and have they been superseded?
- When was the last time the provider ran without mains power for a test?
- Does the provider notify customers of software updates in advance, and do they have alternatives if your system is unable to upgrade?
REDUNDANCY
Okay, so things go wrong. Hardware fails, things screw up. It happens. Now what!
- Does the provider have at least N+1 hardware on standby – and what’s the turnaround time in getting +1 operational?
- Does the provider have network redundancy that will result in no service degradation even if a primary link fails?
- Does the provider have the systems in place to automatically detect failures and respond to them?
CONTACT
Your site goes down – you don’t know why. It might be your provider’s fault, it might be your fault. This is where many people might panic….but you shouldn’t if you have addressed the following:
- Does your provider actively respond to outages, or do you have to notify them first?
- Do you have a phone number for your provider? Do they answer or provide voicemail services to which they respond in a reasonable timeframe?
- Does your provider have a “support ticket” system where issues can be tracked, or is it all verbal / email based? Support tickets are a requisite when dealing with anything more than a few hundred customers.
- Does your supplier communicate with you so that you understand what is going on? They need to speak on your level, else there is a risk of miscommunication.
- How many staff does your provider have? Can they survive at a critical moment without key persons? (Murphy’s law applies to hosting in some extreme ways….)
TROUBLESHOOTING
There’s a problem. Your customers are complaining, but you don’t know what it could be. This is where you need help!
- Does the provider publish issues, large and small?
- Does the provider accept blame for issues related to them, or do they try to conceal things?
- Does the provider have a technical team able to troubleshoot hardware, network, and software issues?
WHAT CAN YOU DO?
So now you should just go and take the above list of questions and give them to your prospective service provider to fill in the blanks. WRONG!
Most large providers will at best send you a services overview PDF, or at worst stick your request in their trash can. There are just too many ‘shopping’ customers in this industry who demand way too much for what they’re willing to pay. So what you NEED to do is browse their website and answer as many questions as you can FIRST. If you find you still have questions, then sure, email a few of them to get clarification.
BUT HOW CAN I TRUST THEIR WEBSITE?
It’s amazing how many lies are throughout the hosting industry. Some are hidden, some are blatant. Some are ‘industry expected’, some are astonishing. So let’s make a checklist of things you can check yourself:
So that’s it. Not really how I wanted to put all this information, but it’s a start. Now here comes the marketing piece for RackCorp.
We don’t get praise much here at RackCorp – because customers tend not to notice even the most disastrous events that we live through. I see so many hosting companies have a whinge that it’s not their fault when a datacenter loses power, or when their network provider accidentally stops announcing their routes. That’s part of this business – it’s about how you prepare for the worst and deal with it that makes you a good provider for critical services.
BACKGROUND: We’ve been running nginx successfully for a long long time now without issue. We were really pleased with it, and migrated all of our CDN servers across to using it several months ago. While there were a few little configuration defaults that didn’t play too nicely, we ironed out the problems within a few days and it’s business as usual, serving between 700TB and 1.8PB (In August 2008!) per month!
Now we have the problem that our proprietary systems that actually cache our customers’ sites just aren’t fast enough to handle the fast-changing content that our customers are demanding. So we’ve been weighing up a few options:
1) Deploy Varnish
2) Deploy Squid
3) Deploy ncache
We actually TRIED to deploy varnish after seeing so many recommendations, but at the end of the day it couldn’t keep up. It really should be noted that our systems are all 32bit, and I get the feeling varnish would perform a lot better on 64bit, but when you have over a hundred servers all running on 32bit processors…..upgrading them all in one hit just isn’t an option. The problem with varnish is that it would drop connections, seemingly because it had run out of memory (although we’re not 100% on this as the debugging wasn’t overly useful).
So…..we tried……we failed. NEXT
Our next attempt was to look into deploying Squid. This one proved a bit complex to integrate into our existing CDN network because of squid’s limitations. We would have to write a bit of custom code to make this one work, so it has been made a last resort.
So, option 3 (which is the whole point of this blog entry), was to try out the ncache nginx module. So we installed the ncache 2.0 package along with nginx 0.6.32. We set them all up and things ran absolutely beautifully. We manually tested caches and it was working great, very fast, very efficient, and well, it was great!
We were extremely happy until the next day when one CDN started reporting high load issues. Analysing the server, it seemed nginx was trying to consume several gigabytes of memory. Ouch. So we killed it off and restarted it, and it ran fine again. Maybe it was just a temporary glitch?
Nope – over the following week, we had several servers experience identical issues – that is, where nginx consumes all available memory until such a point that it simply stops being able to serve files and requires a restart. Looks like a case of either:
A) ncache 2.0 has a serious memory leak
B) ncache 2.0 doesn’t handle high volumes of cached data very well.
We’ve tried to work through the issues to make sure they’re not configuration issues, but no luck. At the end of the day it’s going to be cheaper and easier to make some mods to squid and deploy that instead.
So for anybody wondering about ncache performance and stability, I’d simply say that it’s a great product, but not really production-ready at this stage if you’re doing high volumes of data.
We recently found several of our CDN servers suddenly experiencing 10-20% network packet loss – OUCH!. The loss would not be constant, but would happen more frequently at some times of the day than others. No other servers on the same networks were being affected – only the CDN boxes.
One of the symptoms we soon discovered was that we’d get errors on the local server when trying to ping out:
ping: sendmsg: Operation not permitted
Ahah! This gave us a great start in that it’s the kernel itself that is rejecting the packets rather than some weird network anomaly. So we checked route caches and all kinds of things…..nothing really gave the problem away, until we checked our centralised syslog server and saw thousands of these messages:
ip_conntrack: table full, dropping packet
Okay, “dropping packet”: that would make for a good explanation of things. Sure enough, the connection tracking table for netfilter was full. (We’d already upped the limit to 128000 on all of our CDN boxes, but apparently this wasn’t enough!) So we upped the limit even higher on all servers:
echo 200000 > /proc/sys/net/ipv4/ip_conntrack_max
And straight away all the CDNs started working again. Don’t forget to add an entry to your /etc/sysctl.conf:
net.ipv4.netfilter.ip_conntrack_max = 200000
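To spot a filling connection tracking table before the kernel starts dropping packets, something like the following Python sketch could be run from cron or a monitoring agent. It is not what we run in production, just an illustration: the 80% warning threshold is an arbitrary assumption, and the script tries both the older ip_conntrack /proc names used in this post and the nf_conntrack names found on newer kernels.

```python
from pathlib import Path
from typing import Optional, Tuple

# (directory, file prefix) pairs to try: the old ip_conntrack naming first,
# then the nf_conntrack naming used by newer kernels.
CANDIDATES = [
    ("/proc/sys/net/ipv4/netfilter", "ip_conntrack"),
    ("/proc/sys/net/netfilter", "nf_conntrack"),
]

WARN_PERCENT = 80.0  # hypothetical threshold, pick whatever suits your traffic


def usage_percent(count: int, maximum: int) -> float:
    """How full the connection tracking table is, as a percentage."""
    return 100.0 * count / maximum


def is_near_limit(count: int, maximum: int, warn_percent: float = WARN_PERCENT) -> bool:
    return usage_percent(count, maximum) >= warn_percent


def read_conntrack() -> Optional[Tuple[int, int]]:
    """Return (current entries, maximum entries), or None if unavailable."""
    for directory, prefix in CANDIDATES:
        count_file = Path(directory) / (prefix + "_count")
        max_file = Path(directory) / (prefix + "_max")
        try:
            return int(count_file.read_text()), int(max_file.read_text())
        except (OSError, ValueError):
            continue  # not present or not readable on this kernel, try the next
    return None


if __name__ == "__main__":
    values = read_conntrack()
    if values is None:
        print("connection tracking counters not found on this system")
    else:
        count, maximum = values
        state = "WARNING" if is_near_limit(count, maximum) else "ok"
        print("%s: %d of %d entries (%.1f%%)" % (state, count, maximum,
                                                  usage_percent(count, maximum)))
```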
It turns out the problem was a new cachecentric.com customer who was really heavy on the thumbnails – their site serves about 150,000 thumbnails per second!!! This was the immediate cause of all our problems – I guess it was always going to happen eventually, so lucky we caught it so quickly. I hope this info helps someone else out.