Category: General Help

Choosing a “Critical Services” provider – checklist

I’ve been itching to tackle this subject for so long, but time is hard to find these days!  This isn’t purely a marketing blog here, RackCorp offers international services in LOTS of countries(20+ now), and quite often it’s not cost beneficial to our customers to have a fully decked out presence in some locations, so we too have to choose our providers carefully.

DATACENTRE LOCATION
– If you’re serving speed-critical videos, files, game services, or telephony solutions, then you should try to choose someone who has equipment close to your customers.
– If your customers will be uploading / downloading LOTS of data, try utilise peering networks / centres that your customers may be connected to as much as possible as it will save your customers money.
– If you’ve got a small budget, and your service is not speed critical then consider going with equipment in the US or UK.  It may not be the fastest to your customer’s locations (unless they are in the US or UK!), but you’ll find it gives the best return for the money.

MAINTENANCE
– Does the provider perform regular maintenance on their equipment?
– Does the provider replace hardware regularly?
– What versions of firmware/software is your provider running – is it surpassed?
– When was the last time the provider ran without mains power for a test?
– Does the provider notify customers of software updates in advance, and do they have alternatives if your system is unable to upgrade?

REDUNDANCY
Okay, so things go wrong.  Hardware fails, things screw up.  It happens.  Now what!
– Does the provider have at least N+1 hardware on standby – and what’s the turnaround time in getting +1 operational?
– Does the provider have network redundancy that will result in no service degradation even if a primary link fails?
– Does the provider have the systems in place to automatically detect failures and respond to them?

CONTACT
Your site goes down – you don’t know why.  It might be your provider’s fault, it might be your fault.  This is where many people might panic….but you shouldn’t if you have addressed the following:
– Does your provider actively respond to outages, or do you have to notify them first?
– Do you have a phone number for your provider?  Do they answer or provide voicemail services to which they respond in a reasonable timeframe?
– Does your provider have a “support ticket” system where issues can be tracked, or is it all verbal / email based?  Support tickets are a requisite when dealing with anything more than a few hundred customers.
– Does your supplier communicate with you so that you understand what is going on.  They need to speak on your level else there is a risk of miscommunication.
– How many staff does your provider have?  Can they survive at a critical moment without key persons (Murphy’s law applies to hosting in some extreme ways….)

TROUBLESHOOTING
There’s a problem.  Your customers are complaining, but you don’t know what it could be.  This is where you need help!
– Does the provider publish issues, large and small?
– Does the provider accept blame for issues related to them, or do they try to conceal things?
– Does the provider have a technical team able to troubleshoot hardware, network, and software issues?

WHAT CAN YOU DO?
So now you should just go and take the above list of questions and give them to your prospective service provider to fill in the blanks.  WRONG!
Most large providers will at best send you a services overview PDF, or at worst stick your request in their trash can.  There’s just too many ‘shopping’ customers in this industry who demand way too much for what they’re willing to pay.  So what you NEED to do is browse their website and answer as many questions as you can FIRST.  Then if you find you still have questions, then sure, email a few questions to get clarification.

BUT HOW CAN I TRUST THEIR WEBSITE?
It’s amazing how many lies are throughout the hosting industry.  Some are hidden, some are blatant.  Some are ‘industry expected’, some are astonishing.  So let’s make a checklist of things you can check yourself:

  1. Do you see the term “UNLIMITED” used on their website? Is your use of that service governed by anything such as bandwidth restrictions (if you’ve got a 10Mbit connection with unlimited traffic, then chances are you’re not going to do much over 3TB of data a month).  If you’re being offered unlimited disk space and you think you’re actually going to use more than average, then look elsewhere.
  2. Fair-use policies. I like to think of these as “This is what we’ll offer you, but don’t expect us to actually provide it” policies.  If you’re expecting to use anything more than an average ‘service’ would use – then look elsewhere.
  3. SLAs. Does the provider state what happens if they fail to meet their 99.9999999% SLA?  No?  Look elsewhere because chances are they don’t know what happens either.  Does the provider offer more than 99.99% SLA?  If so, look elsewhere – it’s obvious their marketing team hasn’t spoken to their finance / legal team, or that their SLA’s are ultimately meaningless to you as a customer.
  4. Backups. What is the company’s back up policy.  How frequently do they back up.  Do they charge to provide you with access to your backup?
  5. Head over to a DNS checking service such as intodns and enter in your provider’s domain name.  Some things to check:
    – “NS records from your nameservers” section should show at LEAST 2 nameservers.  The IP addresses that show up should NOT be very close to each other (i.e. X.X.X.1 and X.X.X.2).  Preferably one or more of those X’s will be different.  This indicates the provider has their own nameservers on redundant networks.
    – “Glue for NS records” section should indicate good things.  While this won’t break anything, it does indicate a provider’s ability to keep their systems running at their best performance.
    – “MX Records” section should have at least 2 mail servers listed there – once again, look for them to be somewhat different IP’s not close together as per before.
  6. ADVANCED LOOKUP: Software version check – fire up telnet and enter their website in as the hostname, and specify port 80 for the port.  Once it is connected, type:
    GET / HTTP/1.1
    Host: www.rackcorp.com  (where www.rackcorp.com is their website URL – followed by two enters)
    You should get a bunch of information up the top of the page which may include Apache / IIS / lighttpd version numbers, PHP version numbers, or other versions.  Use these to look up on the net to see just how old these versions are – you might be surprised at the number of hosting companies running on software 5 or 6 years out of date.  If they don’t maintain their own website, then they certainly won’t maintain yours.
    I should point out here, that less information is better information from a security perspective.  Many audits will frown upon servers that give you version information, so if you don’t get any versions, or don’t recognise anything then it’s probably a GOOD thing.
  7. Google for their name. Do you find more bad reviews than good reviews?  Just remember than complainers are usually a lot louder than praisers, and even the most well run company can NOT satisfy everyone.  Remember than some (many!) hosting companies are into the dodgy practice of posting fake reviews about themselves.  Don’t believe any review unless you can see a customer URL alongside it – and if it is there, check it still exists and isn’t “under maintenance” or simply non existent.

So that’s it.  Not really how I wanted to put all this information, but it’s a start.  Now here’s comes the marketing piece for RackCorp 🙂

  • RackCorp has multiple DNS servers in multiple countries including US, UK, Germany, Canada, and Australia.  We try to localise these where possible so domains from those countries primarily use nameservers in those countries.  Our DNS services have never had a complete failure EVER (or even come close)
  • RackCorp has multiple mail servers running in HOT-HOT redundancy mode in multiple datacenters in multiple countries.  This means if a whole country goes offline (for whatever reason), our customers will STILL be able to access POP/IMAP/SMTP/Webmail services without even realising.  Our email services have NEVER had an outage for more than a few minutes – we have NEVER lost a single customer email due to an outage.
  • RackCorp server monitoring is closely tied in with our DNS system and is configured to automatically change announcements depending upon service availability / performance.  This lets us AUTOMATICALLY switch between webservers, mail servers, CDN networks, and even more depending upon whether those services are available.
  • RackCorp focuses on critical website hosting in multiple countries.  We employ geo-serving technology to protect against localised DDoS attacks, and to better speed up systems.
  • In 2008, our pimary datacentre for US-based services (including DNS, email and our own website) was the H1 datacentre with The Planet.  An explosion occurred at the datacentre rendering it completely offline.  While most of our competitors crossed their fingers and hoped for the datacenter to come back up swiftly, our services, and hundrds of our customer services were back up and running within 5-15 minutes from alternative locations.  The datacenter remained offline for 3 days due to the incident, with many end-customers of our competitors left offline because suppliers had no offsite redundancy, offsite backups, email redundancy, or anything of the such.

We don’t get praise much here at RackCorp – because customers tend not to notice even the most disastrous events that we live through.  I see so many hosting companies have a whinge that it’s not their fault when a datacenter loses power, or when their network provider accidentally stops announcing their routes.  That’s part of this business – it’s about how you prepare for the worst and deal with it that makes you a good provider for critical services.

ping: sendmsg: Operation not permitted

We recently found several of our CDN servers suddenly experiencing 10-20% network packet loss – OUCH!.  The loss would not be constant, but would happen more frequently at some times of the day than others.  No other servers on the same networks were being affected – only the CDN boxes.

One of the syptoms we soon discovered was we’d get errors on the local server when trying to ping out:

ping: sendmsg: Operation not permitted

Ahah!  This gave us a great start in that it’s the kernel itself that is rejecting the packets rather than some wierd network anomaly.  So we checked route caches and all kinds of things…..nothing really gave the problem away, until we checked our centralised syslog server and saw thousands of these messages:

ip_conntrack: table full, dropping packet

Okay, “dropping packet” – that would make for a good explanation of things.  Anyway sure enough, the connection tracking table for netfilter was full (We’d already upped these to 128000 for all of our CDN boxes! – but apparantly this wasn’t enough!)  So we upped the limit even higher on all servers:

echo 200000 > /proc/sys/net/ipv4/ip_conntrack_max

And straight away all CDN’s started working again.  Don’t forget to add an entry to your /etc/sysctl.conf:

net.ipv4.netfilter.ip_conntrack_max = 200000

It turns out the problem was a new cachecentric.com customer who was really heavy on the thumbnails – their site serves about 150,000 thumbnails per second!!!  This was the immediate cause of all our problems – I guess it was always going to happen eventually, so lucky we caught it so quickly.  I hope this info helps someone else out.

csync2: Install and setup csync2 on CentOS 5

This blog details how to build and install csync2 form source, as well as configure it.

Step 1) Download and install required libraries

If you haven’t already done so, install graft – it’s great. Here’s a tutorial how to install graft:
http://blog.rackcorp.com/?p=16

Go to ftp://ftp.gnupg.org/gcrypt/libgpg-error/ and download the latest version of libgpg-error

cd /usr/local/PKG_BUILD
wget ftp://ftp.gnupg.org/gcrypt/libgpg-error/libgpg-error-1.6.tar.bz2
bzip2 -d libgpg-error-1.6.tar.bz2
tar -xvf libgpg-error-1.6.tar
cd libgpg-error-1.6
./configure –prefix=/usr/local/PACKAGES/libgpg-error-1.6
make
make install
graft -i /usr/local/PACKAGES/libgpg-error-1.6/

Go to http://www.gnupg.org/download/index.en.html, and download libgcrypt and install it:

cd /usr/local/PKG_BUILD
wget ftp://ftp.gnupg.org/gcrypt/libgcrypt/libgcrypt-1.4.1.tar.bz2
bzip2 -d libgcrypt-1.4.1.tar.bz2
cd libgcrypt-1.4.1
./configure –prefix=/usr/local/PACKAGES/libgcrypt-1.4.1
make
make install
graft -i /usr/local/PACKAGES/libgcrypt-1.4.1

Go to http://www.t2-project.org/packages/libtasn1.html, and download libtasn1

cd /usr/local/PKG_BUILD
wget ftp://ftp.gnutls.org/pub/gnutls/libtasn1/libtasn1-1.4.tar.gz
tar -xvzf libtasn1-1.4.tar.gz
cd libtasn1-1.4
./configure –prefix=/usr/local/PACKAGES/libtasn1-1.4
make
make install
graft -i /usr/local/PACKAGES/libtasn1-1.4

If you get a conflict, you can just remove the conflicting file and retry the graft:

rm -f /usr/local/share/info/dir
graft -i /usr/local/PACKAGES/libtasn1-1.4

Go to http://www.sqlite.org/download.html, and download the source tree file:

cd /usr/local/PKG_BUILD
wget http://www.sqlite.org/sqlite-3.5.9.tar.gz
tar -xvzf sqlite-3.5.9.tar.gz
cd sqlite-3.5.9
./configure –prefix=/usr/local/PACKAGES/sqlite-3.5.9
make
make install
graft -i /usr/local/PACKAGES/sqlite-3.5.9

Go to http://www.gnu.org/software/gnutls/releases/, and download the latest gnutls:

cd /usr/local/PKG_BUILD
wget http://www.gnu.org/software/gnutls/releases/gnutls-2.4.0.tar.bz2
bzip2 -d gnutls-2.4.0.tar.bz2
tar -xvf gnutls-2.4.0.tar
cd gnutls-2.4.0
./configure –prefix=/usr/local/PACKAGES/gnutls-2.4.0
make
make install
graft -i /usr/local/PACKAGES/gnutls-2.4.0

Once again, if you get a conflict:

rm -f /usr/local/share/info/dir
graft -i /usr/local/PACKAGES/gnutls-2.4.0

Go to http://librsync.sourceforge.net/ and download the latest librsync source:

cd /usr/local/PKG_BUILD
wget http://internode.dl.sourceforge.net/sourceforge/librsync/librsync-0.9.7.tar.gz
tar -xvzf librsync-0.9.7.tar.gz
cd librsync-0.9.7
./configure –prefix=/usr/local/PACKAGES/librsync-0.9.7
make
make install
graft -i /usr/local/PACKAGES/librsync-0.9.7

Step 2) Download and install csync2

Go to http://oss.linbit.com/csync2/ and download the latest csync2 source:

cd /usr/local/PKG_BUILD
wget http://oss.linbit.com/csync2/csync2-1.34.tar.gz
tar -xvzf csync2-1.34.tar.gz
cd csync2-1.34

Now I couldn’t get csync2 to locate libsqlite. Seems it doesn’t like the latest version (3) anyway, so we have to go back and download & install an older version:

cd /usr/local/PKG_BUILD
wget http://www.sqlite.org/sqlite-2.8.17.tar.gz
tar -xvzf sqlite-2.8.17.tar.gz
cd sqlite-2.8.17
./configure –prefix=/usr/local/PACKAGES/sqlite-2.8.17
make
make install
graft -i /usr/local/PACKAGES/sqlite-2.8.17

Now, back to csync2. Note that we also run ldconfig just to make sure all our libraries are findable:

ldconfig
cd /usr/local/PKG_BUILD/csync2-1.34
./configure –prefix=/usr/local/PACKAGES/csync2-1.34
make
make install
graft -i /usr/local/PACKAGES/csync2-1.34

And we’re done! csync is compiled and installed! On to step 3….

Step 3) Set up xinetd

By default my CentOS 5 server did not have xinetd installed, so let’s install it

yum install xinetd

Create the following file as /etc/xinetd.d/csync2:

service csync2
{
disable = no
protocol = tcp
socket_type = stream
wait = no
user = root
server = /usr/local/sbin/csync2
server_args = -i
}

The csync2 protocol isn’t a standard one so we need to add it:

echo “csync2 30865/tcp” >> /etc/services

Then let’s restart xinetd:

service xinetd restart

And xinetd is ready…..now we need to configure csync2….

Step 4) Configuring csync2

TO BE CONTINUED…

graft: Install and Configure package management

Graft is a great little tool for package management on UNIX systems such as Linux. The tool itself allows you to keep a package isolated in it’s own directory, and uses symlinks to combine them in with the rest of the operating system.

An example would be a new tool that has one binary and one shared library:

/usr/local/PACKAGES/somenewtool-1.0
/usr/local/PACKAGES/somenewtool-1.0/bin
/usr/local/PACKAGES/somenewtool-1.0/bin/newtool
/usr/local/PACKAGES/somenewtool-1.0/lib
/usr/local/PACKAGES/somenewtool-1.0/lib/libnewtool.so

Using the graft tool, we can install the binary and the library in with the rest of our OS using a simple command like:

graft -i /usr/local/PACKAGES/somenewtool-1.0

With the appropriate configuration, this would create these symlinks:

/usr/local/bin/newtool -> /usr/local/PACKAGES/somenewtool-1.0/bin/newtool
/usr/local/lib/libnewtool.so -> /usr/local/PACKAGES/somenewtool-1.0/lib/libnewtool.so

Removing these symlinks is also really simple like this:

graft -d /usr/local/PACKAGES/somenewtool-1.0

Now you can see the usefulness of this tool to remove one version, and install a new one. Anyway, let’s look at the install process for graft.

1) Setup environment

mkdir -p /usr/local/PACKAGES
mkdir -p /usr/local/PKG_BUILD
cd /usr/local/PKG_BUILD

2) Download and Install Graft

Graft can be found at: http://www.gormand.com.au/peters/tools/

wget http://www.gormand.com.au/peters/tools/graft/graft-2.4.tar.gz
tar -xvzf graft-2.4.tar.gz
cd graft-2.4
make -f Makefile.dist

edit the Makefile file and change these lines:
PACKAGEDIR = /usr/local/PACKAGES
TARGETDIR = /usr/local
PERL = /usr/bin/perl

Now let’s make it:

make
make install
/usr/local/PACKAGES/graft-2.4/bin/graft -i /usr/local/PACKAGES/graft-2.4

And that’s it! Graft is now installed (in fact we just used it to install itself!). Now to get the most out of graft, I recommend adding the line “/usr/local/lib” into your /etc/ld.so.conf file, and running “ldconfig”.

Apache 2 + FastCgi + PHP 5 compile / build

Building the three of these is fairly easy, but a bit tricky if you’re used to how it worked in previous versions of Apache or PHP.  Please note that we install everything into prefix trees and use a tool called graft to manage the versions.  If you simply install your versions into default locations, you can probably leave out anything to do with prefix=, and top_dir.  We also download all install archives into /usr/local/PKG_BUILD.

Anyway to get started, the first trick is to build Apache 2 first by itself.  Here is an example:

cd /usr/local/PKG_BUILD
tar -xvzf httpd-2.2.8.tar.gz
cd httpd-2.2.8
./configure –prefix=/usr/local/PACKAGES/httpd-2.2.8 –enable-file-cache –enable-cache –enable-deflate –enable-mime-magic –enable-expires –enable-headers –enable-version –enable-ssl –enable-http –enable-cgi –enable-rewrite –enable-so –with-ssl=/usr/local/PACKAGES/openssl-0.9.8
make
make install

This will install Apache into /usr/local/PACKAGES/httpd-2.2.8.  If you want to make some modules shared (such as how we’re going to build mod_fastcgi shortly) then you can do so by adding things like: –enable-rewrite=shared.

The first part of installing fastcgi is to download fcgi from http://www.fastcgi.com.  To install this:

cd /usr/local/PKG_BUILD
tar -xvzf fcgi-2.4.0.tar.gz
cd fcgi-2.4.0
./configure –prefix=/usr/local/PACKAGES/fcgi-2.4.0
make
make install

The next step is to download mod_fastcgi (this is a DIFFERENT package to the fcgi one).  You can also download this from http://www.fastcgi.com/

cd /usr/local/PKG_BUILD
tar -xvzf mod_fastcgi-2.4.6.tar.gz
cd mod_fastcgi-2.4.6
cp Makefile.AP2 Makefile
make top_dir=/usr/local/PACKAGES/httpd-2.2.8/
make install top_dir=/usr/local/PACKAGES/httpd-2.2.8/

This will actually find your httpd source and compile mod_fastcgi for you, as well as install it with the Apache 2 modules! (Easy huh?).

The final step is to download and install PHP 5.  You can obtain PHP from http://www.php.net.  After downloading:

cd /usr/local/PKG_BUILD
tar -xvzf php-5.2.4.tar.gz
cd php-5.2.4

Now the next part will no doubt differ depending upon what features you want to compile PHP with.  That’s left up to you and google to work out how to compile the other modules.  I’ll show what we compile some of our systems with, but the only ones that are really applicable to this blog are the –enable-force-cgi-redirect –enable-fastcgi ones.

./configure  –prefix=/usr/local/PACKAGES/php-5.2.4-norm –with-openssl=/usr/local/PACKAGES/openssl-0.9.8d –with-zlib –enable-force-cgi-redirect –enable-fastcgi –with-mysql=/usr/local/PACKAGES/mysql-5.0.41 –enable-static –with-config-file-path=/etc/php.ini –with-imap –with-mcrypt –with-curl=/usr/local/PACKAGES/curl-7.16.1/ –with-kerberos –with-imap-ssl
make
make install

And that’s it for the compiling stage.  I’ll put up a blog at a later date on how to configure Apache 2 to work with FastCgi.  Stay tuned.