Category: System Administration

2048-bit SSL certificate generation with OpenSSL

openssl req -nodes -newkey rsa:2048 -keyout www.examplewebsite.com.key -out www.examplewebsite.com.csr

You'll be prompted for a series of questions:
Country Name (2 letter code) [GB]:AU
State or Province Name (full name) [Berkshire]:New South Wales
Locality Name (eg, city) [Newbury]:Sydney
Organization Name (eg, company) [My Company Ltd]:Example Company
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:www.examplewebsite.com
Email Address []:

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:

You then upload the contents of the www.examplewebsite.com.csr file to your SSL certificate provider (the certificate authority) and they'll send you back your signed certificate file. You can then plug the key and certificate into your favorite webserver/application server.
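
Before sending the CSR off, it's worth a quick sanity check of what's actually in it (assuming the filename from the command above):

openssl req -noout -text -in www.examplewebsite.com.csr

The output should show a 2048-bit RSA key and the Common Name you entered during the prompts.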


sysctl.conf and other settings for high performance webservers

 

There are a couple of key settings on CentOS servers that significantly help high performance web servers, and that we put in by default across all of our managed machines (a consolidated sysctl.conf sketch follows the list):

  • net.ipv4.tcp_syncookies = 1
    While it’s more commonly thought of as a way to prevent denial of service attacks from taking down websites, some people don’t realise that, to the kernel, a heavy traffic site is not much different from one that is under a constant denial of service!
  • net.ipv4.netfilter.ip_conntrack_max = 300000
    Netfilter under Linux does a great job, but it can sometimes be artificially restricted by OS limits that try to stop any one kind of traffic from taking up too many system resources.  This is one of those settings that I feel is often set too low.  We ramp it up to 300,000, which means netfilter can track up to 300,000 “sessions” (such as an HTTP connection) at one time.  If you’ve got 10,000 people on your website at once, you’ll definitely want to adjust this one!
  • net.ipv4.tcp_max_syn_backlog = 10240
    An application such as Nginx is very capable of serving as many TCP connections as the operating system and hardware can handle.  That said, there will be a backlog of TCP connections in a pending state before the user-space application such as Nginx gets to call accept().  The key here is to make sure the backlog of unaccepted TCP connections never exceeds the above number, otherwise those connection packets are effectively lost and some clients will experience delays, if not a complete outage.  We find 10240 is a high enough number for this on modern servers.
  • net.core.netdev_max_backlog = 4000
    This one is important, particularly for servers that operate past 100 Mbit/s.  It governs how many incoming packets can be queued while the kernel works through the interface’s packet queue.  At gigabit speeds on busy servers, seeing the queue exceed the default of 1000 is pretty common.  We usually put this up to 4,000 for web servers.
  • kernel.panic = 10
    While unrelated to performance, there’s nothing worse on a busy web server than a kernel panic.  It isn’t common, but when you push a server to its limits you can come across kernel panics more often than you might otherwise, and this setting (automatically reboot 10 seconds after a panic) just helps reduce downtime on production servers.
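
For reference, this is roughly what the block we drop into /etc/sysctl.conf looks like (a sketch only – the values above are our defaults, tune them to your own traffic), followed by the command to apply it without a reboot:

# /etc/sysctl.conf – high performance web server defaults (sketch)
net.ipv4.tcp_syncookies = 1
net.ipv4.netfilter.ip_conntrack_max = 300000
net.ipv4.tcp_max_syn_backlog = 10240
net.core.netdev_max_backlog = 4000
kernel.panic = 10

# apply the file immediately
/sbin/sysctl -p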

We also change the TCP congestion control algorithm by adding the following to rc.local:

  • /sbin/modprobe tcp_htcp
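
Depending on the kernel, loading the module may not actually switch the algorithm in use; if it doesn't, you can select it explicitly and check what's active – a quick sketch:

/sbin/sysctl -w net.ipv4.tcp_congestion_control=htcp    # select H-TCP
cat /proc/sys/net/ipv4/tcp_congestion_control           # confirm what is in use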

You will also want to increase the send queue on your interface by adding the following to your rc.local (you’ll want to change eth0 to your interface name):

  • /sbin/ifconfig eth0 txqueuelen 10000
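
To confirm the queue length change actually took effect (using eth0 as in the example above):

/sbin/ifconfig eth0 | grep -i txqueuelen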

There’s a lot of commentary online about changing TCP memory buffers and sizes.  Personally I haven’t found them to make much difference on a suitably spec’d server.  One day I might get around to having a look at how these affect performance, but for now, the above settings are known to achieve gigabit HTTP serving speeds on our webservers, so that’s good enough for me!

 


Nginx location and rewrite configuration made easy

Okay guys, so as many of you know, we offer both Apache and Nginx servers here as part of our standard shared hosting packages. There is no better web server than Nginx for reliable performance in a high-traffic environment. One thing that I frequently go through with the new staff here is nginx location / rewrite rules, because they can be a bit confusing.

The best way to think of things is that as a request comes in, Nginx will scan through the configuration to find a “location” line that matches the request. There are TWO kinds of location entries that nginx considers: literal string (prefix) matches and regular expression matches. Nginx first checks ALL the literal string location entries (the most specific prefix match wins, regardless of order), and then scans through ALL the regular expression location entries in the order that they occur in the configuration file – the first regular expression that matches wins. So be aware – for regular expression locations, ordering DOES matter.

Now there are a few ways of interrupting that flow:

location = /images { } (Note: does not work for regular expressions)
The “=” is the important character here. This matches a request for “/images” ONLY. This also halts the location scanning as soon as such an exact match is met.

location ^~ /images {} (Note: does not work for regular expressions)
The “^~” results in a case sensitive prefix match against the beginning of a request. This means /images, /images/logo.gif, etc will all be matched. When this prefix matches, the location scanning halts and the regular expression locations are skipped.

location ~ /images {}
location ~* /images {} (case insensitive version)
The “~” and “~*” turn the location into a case sensitive (or case insensitive) regular expression match. These are the entries checked in the second pass described above: nginx works through them in configuration order and uses the first one that matches – but only after all the literal string locations have been checked.

That’s IT! Yes it really is that simple. Now there’s a few variations for case-insensitive matches or named-locations, but don’t worry about those for now.

The first two forms above take literal strings; with “~” and “~*” the /images part is treated as a regular expression, and that changes when the rule gets checked (remember ALL literal strings get checked first, and THEN regular expressions – regardless of the order you have them in your configuration).

An example of a regular expression match is:

location ~ \.(gif|jpg|jpeg)$ { }
This will match any request that ends in .gif, .jpg, or .jpeg.
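
To tie the “=”, “^~” and regular expression forms together, here is a hypothetical (illustrative only) server block showing which location wins for a few sample requests:

location = / {
    # only an exact request for "/" lands here
}
location ^~ /images {
    # /images/logo.gif stops here - the regular expressions below are never checked
}
location ~* \.(gif|jpg|jpeg)$ {
    # /photos/cat.JPG lands here (case insensitive), because no "^~" prefix rule claimed it
}
location / {
    # everything else, e.g. /about.html
}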

So now that we’ve discussed the foundations of the location rules, we can move on to rewrites. There are TWO kinds of rewrites – URL redirects (HTTP 301/HTTP 302), and internal rewrites (which mangle the request before it is processed).

URL Redirects are the simplest to understand:

location /admin {
    rewrite ^/admin/(.*)$ http://admin.example.com/$1 permanent;
}

This example will redirect any request matching the location rule (see earlier) as an HTTP 301 permanent redirection to http://admin.example.com/. e.g. http://www.example.com/admin/index.html now gets HTTP redirected to http://admin.example.com/index.html. Note the regular expression and the $1 replacement in the URL. If you want the redirect to be an HTTP 302 (temporary redirection), just change the word “permanent” to “redirect”.
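
You can check the redirect from the command line. Using the example hostnames above (substitute your own), something like this should show the 301 and the Location header, roughly:

curl -I http://www.example.com/admin/index.html

# HTTP/1.1 301 Moved Permanently
# Location: http://admin.example.com/index.html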

Internal rewrites are a little more complicated:

location /admin {
    rewrite ^/admin/(.*)$ /$1 break;
}

The key word here is “break”. This causes the rewrite processing to stop, and the request is then served from this location block with the rewritten URI. If this word were “last”, nginx would instead go back to scanning location entries as per our discussion earlier – but now with the rewritten URL.
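
As an illustrative sketch (the /internal path is hypothetical), here is how “last” differs – the rewritten URI is matched against the locations again and can land in a different block:

location /admin {
    rewrite ^/admin/(.*)$ /internal/$1 last;
    # "last": nginx restarts location matching with the rewritten /internal/... URI
}

location /internal {
    # the rewritten request ends up being handled here
}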

I hope that clears up nginx configuration. The documentation over at the nginx wiki (http://wiki.nginx.org/NginxModules) is really good. I think this was the only part that sometimes confuses some of us here. Let us know if you think I missed anything, otherwise I hope to put up some of our nginx rewrites for some of the more popular forums/blogs in the weeks to come!


Choosing a “Critical Services” provider – checklist

I’ve been itching to tackle this subject for so long, but time is hard to find these days!  This isn’t purely a marketing post – RackCorp offers international services in LOTS of countries (20+ now), and quite often it’s not cost effective for our customers to have a fully decked out presence in some locations, so we too have to choose our providers carefully.

DATACENTRE LOCATION
– If you’re serving speed-critical videos, files, game services, or telephony solutions, then you should try to choose someone who has equipment close to your customers.
– If your customers will be uploading / downloading LOTS of data, try to utilise peering networks / centres that your customers may be connected to as much as possible, as it will save your customers money.
– If you’ve got a small budget and your service is not speed critical, then consider going with equipment in the US or UK.  It may not be the fastest to your customers’ locations (unless they are in the US or UK!), but you’ll find it gives the best return for the money.

MAINTENANCE
– Does the provider perform regular maintenance on their equipment?
– Does the provider replace hardware regularly?
– What versions of firmware/software is your provider running – have they been superseded?
– When was the last time the provider ran without mains power for a test?
– Does the provider notify customers of software updates in advance, and do they have alternatives if your system is unable to upgrade?

REDUNDANCY
Okay, so things go wrong.  Hardware fails, things screw up.  It happens.  Now what!
– Does the provider have at least N+1 hardware on standby – and what’s the turnaround time in getting +1 operational?
– Does the provider have network redundancy that will result in no service degradation even if a primary link fails?
– Does the provider have the systems in place to automatically detect failures and respond to them?

CONTACT
Your site goes down – you don’t know why.  It might be your provider’s fault, it might be your fault.  This is where many people might panic….but you shouldn’t if you have addressed the following:
– Does your provider actively respond to outages, or do you have to notify them first?
– Do you have a phone number for your provider?  Do they answer or provide voicemail services to which they respond in a reasonable timeframe?
– Does your provider have a “support ticket” system where issues can be tracked, or is it all verbal / email based?  Support tickets are a requisite when dealing with anything more than a few hundred customers.
– Does your supplier communicate with you so that you understand what is going on?  They need to speak on your level, or else there is a risk of miscommunication.
– How many staff does your provider have?  Can they survive a critical moment without key persons?  (Murphy’s law applies to hosting in some extreme ways….)

TROUBLESHOOTING
There’s a problem.  Your customers are complaining, but you don’t know what it could be.  This is where you need help!
– Does the provider publish issues, large and small?
– Does the provider accept blame for issues related to them, or do they try to conceal things?
– Does the provider have a technical team able to troubleshoot hardware, network, and software issues?

WHAT CAN YOU DO?
So now you should just go and take the above list of questions and give them to your prospective service provider to fill in the blanks.  WRONG!
Most large providers will at best send you a services overview PDF, or at worst stick your request in their trash can.  There are just too many ‘shopping’ customers in this industry who demand way too much for what they’re willing to pay.  So what you NEED to do is browse their website and answer as many questions as you can FIRST.  Then if you find you still have questions, sure, email a few to get clarification.

BUT HOW CAN I TRUST THEIR WEBSITE?
It’s amazing how many lies are throughout the hosting industry.  Some are hidden, some are blatant.  Some are ‘industry expected’, some are astonishing.  So let’s make a checklist of things you can check yourself:

  1. Do you see the term “UNLIMITED” used on their website?  Is your use of that service governed by anything such as bandwidth restrictions?  (If you’ve got a 10Mbit connection with unlimited traffic, then chances are you’re not going to do much over 3TB of data a month.)  If you’re being offered unlimited disk space and you think you’re actually going to use more than average, then look elsewhere.
  2. Fair-use policies. I like to think of these as “This is what we’ll offer you, but don’t expect us to actually provide it” policies.  If you’re expecting to use anything more than an average ‘service’ would use – then look elsewhere.
  3. SLAs. Does the provider state what happens if they fail to meet their 99.9999999% SLA?  No?  Look elsewhere, because chances are they don’t know what happens either.  Does the provider offer more than a 99.99% SLA?  If so, look elsewhere – it’s obvious their marketing team hasn’t spoken to their finance / legal team, or their SLAs are ultimately meaningless to you as a customer.
  4. Backups. What is the company’s backup policy?  How frequently do they back up?  Do they charge to provide you with access to your backups?
  5. Head over to a DNS checking service such as intodns and enter your provider’s domain name (or run the dig check sketched after this list).  Some things to check:
    – “NS records from your nameservers” section should show at LEAST 2 nameservers.  The IP addresses that show up should NOT be very close to each other (i.e. X.X.X.1 and X.X.X.2).  Preferably one or more of those X’s will be different.  This indicates the provider has their own nameservers on redundant networks.
    – “Glue for NS records” section should indicate good things.  While this won’t break anything, it does indicate a provider’s ability to keep their systems running at their best performance.
    – “MX Records” section should have at least 2 mail servers listed there – once again, look for somewhat different IPs that aren’t close together, as per before.
  6. ADVANCED LOOKUP: Software version check – fire up telnet, enter their website as the hostname, and specify port 80 for the port.  Once it is connected, type:
    GET / HTTP/1.1
    Host: www.rackcorp.com  (where www.rackcorp.com is their website hostname – then press Enter twice)
    You should get a bunch of information at the top of the response which may include Apache / IIS / lighttpd version numbers, PHP version numbers, or other versions.  Look these up on the net to see just how old they are – you might be surprised at the number of hosting companies running on software 5 or 6 years out of date.  If they don’t maintain their own website, then they certainly won’t maintain yours.  (If you’d rather not use telnet, there’s a curl equivalent sketched after this list.)
    I should point out here that less information is better information from a security perspective.  Many audits will frown upon servers that give out version information, so if you don’t get any versions, or don’t recognise anything, then it’s probably a GOOD thing.
  7. Google for their name. Do you find more bad reviews than good reviews?  Just remember that complainers are usually a lot louder than praisers, and even the most well run company can NOT satisfy everyone.  Remember that some (many!) hosting companies are into the dodgy practice of posting fake reviews about themselves.  Don’t believe any review unless you can see a customer URL alongside it – and if it is there, check it still exists and isn’t “under maintenance” or simply non-existent.
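
For items 5 and 6, here’s a rough command-line sketch of the same checks (the hostname is just the example used above – substitute the provider you’re evaluating):

# item 5: list the provider's nameservers, then look up the address of each one returned
dig NS rackcorp.com +short
# dig A <each nameserver returned> +short

# item 6: grab just the response headers instead of typing into telnet
curl -sI http://www.rackcorp.com/ | grep -iE 'server|x-powered-by'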

So that’s it.  Not really how I wanted to put all this information, but it’s a start.  Now here comes the marketing piece for RackCorp 🙂

  • RackCorp has multiple DNS servers in multiple countries including the US, UK, Germany, Canada, and Australia.  We try to localise these where possible so domains from those countries primarily use nameservers in those countries.  Our DNS services have never had a complete failure EVER (or even come close).
  • RackCorp has multiple mail servers running in HOT-HOT redundancy mode in multiple datacenters in multiple countries.  This means if a whole country goes offline (for whatever reason), our customers will STILL be able to access POP/IMAP/SMTP/Webmail services without even realising.  Our email services have NEVER had an outage for more than a few minutes – we have NEVER lost a single customer email due to an outage.
  • RackCorp server monitoring is closely tied in with our DNS system and is configured to automatically change announcements depending upon service availability / performance.  This lets us AUTOMATICALLY switch between webservers, mail servers, CDN networks, and even more depending upon whether those services are available.
  • RackCorp focuses on critical website hosting in multiple countries.  We employ geo-serving technology to protect against localised DDoS attacks, and to better speed up systems.
  • In 2008, our primary datacentre for US-based services (including DNS, email and our own website) was the H1 datacentre with The Planet.  An explosion occurred at the datacentre, rendering it completely offline.  While most of our competitors crossed their fingers and hoped for the datacentre to come back up swiftly, our services, and hundreds of our customers’ services, were back up and running within 5-15 minutes from alternative locations.  The datacentre remained offline for 3 days due to the incident, with many end-customers of our competitors left offline because their suppliers had no offsite redundancy, offsite backups, email redundancy, or anything of the sort.

We don’t get praise much here at RackCorp – because customers tend not to notice even the most disastrous events that we live through.  I see so many hosting companies have a whinge that it’s not their fault when a datacenter loses power, or when their network provider accidentally stops announcing their routes.  That’s part of this business – it’s about how you prepare for the worst and deal with it that makes you a good provider for critical services.


ping: sendmsg: Operation not permitted

We recently found several of our CDN servers suddenly experiencing 10-20% network packet loss – OUCH!  The loss would not be constant, but would happen more frequently at some times of the day than others.  No other servers on the same networks were being affected – only the CDN boxes.

One of the symptoms we soon discovered was that we’d get errors on the local server when trying to ping out:

ping: sendmsg: Operation not permitted

Aha!  This gave us a great start, in that it’s the kernel itself that is rejecting the packets rather than some weird network anomaly.  So we checked route caches and all kinds of things… nothing really gave the problem away, until we checked our centralised syslog server and saw thousands of these messages:

ip_conntrack: table full, dropping packet

Okay, “dropping packet” – that would make for a good explanation of things.  Sure enough, the connection tracking table for netfilter was full (we’d already upped this to 128000 for all of our CDN boxes – but apparently this wasn’t enough!), so we upped the limit even higher on all servers:

echo 200000 > /proc/sys/net/ipv4/ip_conntrack_max

And straight away all the CDNs started working again.  Don’t forget to make the change persistent by adding an entry to your /etc/sysctl.conf:

net.ipv4.netfilter.ip_conntrack_max = 200000
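
If you want to see how close you are to the limit before packets start getting dropped, you can watch the current entry count.  The sketch below assumes a kernel using the old ip_conntrack module (as in the log message above); newer nf_conntrack kernels use nf_conntrack_* names instead:

# current number of tracked connections vs the configured limit
cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count
cat /proc/sys/net/ipv4/ip_conntrack_max

# or count the tracked entries directly
wc -l /proc/net/ip_conntrack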

It turns out the problem was a new cachecentric.com customer who was really heavy on the thumbnails – their site serves about 150,000 thumbnails per second!!!  This was the immediate cause of all our problems – I guess it was always going to happen eventually, so it’s lucky we caught it so quickly.  I hope this info helps someone else out.
