Category: Service Issues

CVE-2015-0235 “Ghost” Linux glibc vulnerability

For those that missed it, CVE-2015-0235 (aka Linux Ghost) was announced today. It details a glibc library bug that is still present in many Linux distributions. glibc is used by many applications, including webservers, mail servers, PHP applications, etc.

The specific bug is in the gethostbyname() and gethostbyname2() functions (hence the “Ghost” name!), so only applications that call these are potentially vulnerable. Even then, there is limited scope for exploitation, but a PoC has already been developed for the Exim mail server, so it certainly is possible (given the right conditions). Luckily, these two functions are obsolete (gethostbyname() in particular is IPv4-only), so many newer applications and services have stopped using them in favour of the protocol-independent getaddrinfo(). As has been seen, however, some popular ones such as Exim still use the older functions.
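To make the difference between the two lookup styles concrete, here is a minimal Python sketch. Python's socket module exposes functions with the same names as the C ones; whether they end up in the affected glibc code depends on how the interpreter was built, so treat this purely as an illustration of the interface difference. The helper names resolve_ipv4_only and resolve_all are made up for this example.

```python
import socket


def resolve_ipv4_only(host):
    """Legacy-style lookup: returns exactly one IPv4 address as a string."""
    return socket.gethostbyname(host)


def resolve_all(host, port):
    """Protocol-independent lookup: returns every address, IPv4 or IPv6."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_all("localhost", 80) on a dual-stack machine will typically return both an IPv4 and an IPv6 loopback address, whereas resolve_ipv4_only("localhost") can only ever return the IPv4 one.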

The bug itself has been around since 2000, and was inadvertently patched upstream in 2013 (the fix shipped in glibc 2.18, released that August) without anyone realising the security implications. Unfortunately, since the security issue was not recognised at the time, many Linux distributions didn’t back-port the patch into their stable releases. That gap is what has come to light today.
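As a rough first check, you can compare a glibc version number against the upstream range. The sketch below assumes the range given in the public advisory (first vulnerable release 2.2, fixed upstream in 2.18); the function names are made up for this example. A distribution that has back-ported the fix usually keeps the old version number, so a "vulnerable" answer here only means you should check your vendor's advisory, not that the host is definitely exposed.

```python
import platform
import re

FIRST_VULNERABLE = (2, 2)   # assumption: glibc 2.2, per the upstream advisory
FIRST_FIXED = (2, 18)       # assumption: fix shipped upstream in glibc 2.18


def parse_version(text):
    """Turn '2.17' or '2.17-55.el6' into (2, 17)."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised glibc version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def upstream_version_in_ghost_range(version_text):
    """True if the upstream version number falls in the affected range."""
    return FIRST_VULNERABLE <= parse_version(version_text) < FIRST_FIXED


def local_glibc_version():
    """Version of the glibc the running interpreter is linked against, or None."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None
```

On a Linux host, upstream_version_in_ghost_range(local_glibc_version()) gives the quick answer, provided local_glibc_version() did not return None.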

Accordingly, we have now taken the following actions:

  • All standard-level webservers globally and chroot environments were patched and restarted between 6:30AM and 7:15AM this morning.
  • All mail servers were patched and restarted between 6:30AM and 7:15AM this morning.

We will be taking the following actions that may result in a few minutes of downtime for some sites tonight:

  • All protected-level webservers globally and chroot environments will be patched and restarted overnight at varying times (a critical maintenance alert will have been received by all affected customers). As these are all behind load balancers, this shouldn’t have any end-user effect.
  • RackCorp monitoring services will be restarted throughout the day. This may result in some performance graphs being slightly skewed at times.

In addition:

  • VM Hosts will have no noticeable impact.
  • Load Balancer services will have no noticeable impact.
  • RackCorp API services will have no noticeable impact (however some unrelated database maintenance is scheduled for tonight that may result in queries taking a few seconds longer than usual).
  • Content delivery services are unaffected.
  • Network services are unaffected.

In terms of customer patch cycles, we are treating this as a critical bug for some customers, and moderate (normal patch cycle) for others, depending upon their attack surface. All affected customers will have received an email accordingly. If you are unsure of the impact on your specific services, please raise a support ticket.

    Additional useful resources:
    Ars Technica Writeup on Linux Ghost
    gethostbyname() vs getaddrinfo() by Erratasec


    Choosing a “Critical Services” provider – checklist

    I’ve been itching to tackle this subject for so long, but time is hard to find these days!  This isn’t purely a marketing post: RackCorp offers international services in LOTS of countries (20+ now), and quite often it’s not cost-effective for our customers to have a fully decked out presence in some locations, so we too have to choose our providers carefully.

    DATACENTRE LOCATION
    – If you’re serving speed-critical videos, files, game services, or telephony solutions, then you should try to choose someone who has equipment close to your customers.
    – If your customers will be uploading / downloading LOTS of data, try to utilise peering networks / centres that your customers may be connected to as much as possible, as it will save your customers money.
    – If you’ve got a small budget, and your service is not speed critical, then consider going with equipment in the US or UK.  It may not be the fastest to your customers’ locations (unless they are in the US or UK!), but you’ll find it gives the best return for the money.

    MAINTENANCE
    – Does the provider perform regular maintenance on their equipment?
    – Does the provider replace hardware regularly?
    – What versions of firmware/software is your provider running – have they been superseded?
    – When was the last time the provider ran without mains power for a test?
    – Does the provider notify customers of software updates in advance, and do they have alternatives if your system is unable to upgrade?

    REDUNDANCY
    Okay, so things go wrong.  Hardware fails, things screw up.  It happens.  Now what!
    – Does the provider have at least N+1 hardware on standby – and what’s the turnaround time in getting +1 operational?
    – Does the provider have network redundancy that will result in no service degradation even if a primary link fails?
    – Does the provider have the systems in place to automatically detect failures and respond to them?

    CONTACT
    Your site goes down – you don’t know why.  It might be your provider’s fault, it might be your fault.  This is where many people might panic….but you shouldn’t if you have addressed the following:
    – Does your provider actively respond to outages, or do you have to notify them first?
    – Do you have a phone number for your provider?  Do they answer or provide voicemail services to which they respond in a reasonable timeframe?
    – Does your provider have a “support ticket” system where issues can be tracked, or is it all verbal / email based?  Support tickets are a requisite when dealing with anything more than a few hundred customers.
    – Does your supplier communicate with you so that you understand what is going on?  They need to speak on your level, or else there is a risk of miscommunication.
    – How many staff does your provider have?  Can they survive at a critical moment without key persons?  (Murphy’s law applies to hosting in some extreme ways….)

    TROUBLESHOOTING
    There’s a problem.  Your customers are complaining, but you don’t know what it could be.  This is where you need help!
    – Does the provider publish issues, large and small?
    – Does the provider accept blame for issues related to them, or do they try to conceal things?
    – Does the provider have a technical team able to troubleshoot hardware, network, and software issues?

    WHAT CAN YOU DO?
    So now you should just go and take the above list of questions and give them to your prospective service provider to fill in the blanks.  WRONG!
    Most large providers will at best send you a services overview PDF, or at worst stick your request in their trash can.  There are just too many ‘shopping’ customers in this industry who demand way too much for what they’re willing to pay.  So what you NEED to do is browse their website and answer as many questions as you can FIRST.  If you find you still have questions, then sure, email a few of them to get clarification.

    BUT HOW CAN I TRUST THEIR WEBSITE?
    It’s amazing how many lies there are throughout the hosting industry.  Some are hidden, some are blatant.  Some are ‘industry expected’, some are astonishing.  So let’s make a checklist of things you can check yourself:

    1. Do you see the term “UNLIMITED” used on their website? Is your use of that service limited by something else, such as bandwidth restrictions?  (If you’ve got a 10Mbit connection with unlimited traffic, then chances are you’re not going to move much more than 3TB of data a month.)  If you’re being offered unlimited disk space and you think you’re actually going to use more than average, then look elsewhere.
    2. Fair-use policies. I like to think of these as “This is what we’ll offer you, but don’t expect us to actually provide it” policies.  If you’re expecting to use anything more than an average ‘service’ would use – then look elsewhere.
    3. SLAs. Does the provider state what happens if they fail to meet their 99.9999999% SLA?  No?  Look elsewhere, because chances are they don’t know what happens either.  Does the provider offer more than a 99.99% SLA?  If so, look elsewhere – it’s obvious their marketing team hasn’t spoken to their finance / legal team, or that their SLAs are ultimately meaningless to you as a customer.
    4. Backups. What is the company’s backup policy?  How frequently do they back up?  Do they charge to provide you with access to your backups?
    5. Head over to a DNS checking service such as intodns and enter in your provider’s domain name.  Some things to check:
      – “NS records from your nameservers” section should show at LEAST 2 nameservers.  The IP addresses that show up should NOT be very close to each other (i.e. X.X.X.1 and X.X.X.2).  Preferably one or more of those X’s will be different.  This indicates the provider has their own nameservers on redundant networks.
      – “Glue for NS records” section should report no problems.  While missing glue won’t break anything, clean results here do indicate a provider’s ability to keep their systems running at their best performance.
      – “MX Records” section should have at least 2 mail servers listed there – once again, look for them to have somewhat different IPs, not close together, as per before.
    6. ADVANCED LOOKUP: Software version check – fire up telnet and enter their website in as the hostname, and specify port 80 for the port.  Once it is connected, type:
      GET / HTTP/1.1
      Host: www.rackcorp.com  (where www.rackcorp.com is their website URL – followed by two enters)
      You should get a bunch of information at the top of the response which may include Apache / IIS / lighttpd version numbers, PHP version numbers, or other versions.  Look these up on the net to see just how old these versions are – you might be surprised at the number of hosting companies running on software 5 or 6 years out of date.  If they don’t maintain their own website, then they certainly won’t maintain yours.
      I should point out here, that less information is better information from a security perspective.  Many audits will frown upon servers that give you version information, so if you don’t get any versions, or don’t recognise anything then it’s probably a GOOD thing.
    7. Google for their name. Do you find more bad reviews than good reviews?  Just remember that complainers are usually a lot louder than praisers, and even the most well run company can NOT satisfy everyone.  Remember too that some (many!) hosting companies are into the dodgy practice of posting fake reviews about themselves.  Don’t believe any review unless you can see a customer URL alongside it – and if it is there, check it still exists and isn’t “under maintenance” or simply non-existent.
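A few of the checks above boil down to simple arithmetic or string handling, so here is a small Python sketch of items 1, 3, 5 and 6. All function names are made up for this example, the "close together" test for item 5 is simplified to "same /24", and the header list for item 6 only covers the two most common version-revealing headers.

```python
import ipaddress


def max_monthly_transfer_tb(link_mbit_per_s, days=30):
    """Item 1: most data a fully saturated link can move in a month, in TB."""
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8
    return bytes_per_second * days * 86_400 / 1e12


def allowed_downtime_minutes(sla_percent, days=30):
    """Item 3: downtime per month that an availability SLA still permits."""
    return (1 - sla_percent / 100) * days * 24 * 60


def share_a_slash_24(ip_a, ip_b):
    """Item 5: True if two IPv4 addresses differ only in the last octet."""
    network = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in network


def version_headers(raw_response):
    """Item 6: pick version-revealing headers out of a raw HTTP response."""
    head = raw_response.split("\r\n\r\n", 1)[0]
    found = {}
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        key = name.strip().lower()
        if key in ("server", "x-powered-by"):
            found[key] = value.strip()
    return found
```

For example, max_monthly_transfer_tb(10) comes out at roughly 3.24, which is where the 3TB figure in item 1 comes from, and allowed_downtime_minutes(99.99) is a little over four minutes a month. An empty result from version_headers() is the "less information is better" case described in item 6.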

    So that’s it.  Not really how I wanted to put all this information, but it’s a start.  Now here comes the marketing piece for RackCorp 🙂

    • RackCorp has multiple DNS servers in multiple countries including US, UK, Germany, Canada, and Australia.  We try to localise these where possible so domains from those countries primarily use nameservers in those countries.  Our DNS services have never had a complete failure EVER (or even come close).
    • RackCorp has multiple mail servers running in HOT-HOT redundancy mode in multiple datacenters in multiple countries.  This means if a whole country goes offline (for whatever reason), our customers will STILL be able to access POP/IMAP/SMTP/Webmail services without even realising.  Our email services have NEVER had an outage for more than a few minutes – we have NEVER lost a single customer email due to an outage.
    • RackCorp server monitoring is closely tied in with our DNS system and is configured to automatically change announcements depending upon service availability / performance.  This lets us AUTOMATICALLY switch between webservers, mail servers, CDN networks, and even more depending upon whether those services are available.
    • RackCorp focuses on critical website hosting in multiple countries.  We employ geo-serving technology to protect against localised DDoS attacks, and to better speed up systems.
    • In 2008, our primary datacentre for US-based services (including DNS, email and our own website) was the H1 datacentre with The Planet.  An explosion occurred at the datacentre, rendering it completely offline.  While most of our competitors crossed their fingers and hoped for the datacentre to come back up swiftly, our services, and hundreds of our customers’ services, were back up and running within 5-15 minutes from alternative locations.  The datacentre remained offline for 3 days due to the incident, with many end-customers of our competitors left offline because suppliers had no offsite redundancy, offsite backups, email redundancy, or anything of the sort.
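The general idea of monitoring-driven DNS switching mentioned above can be sketched in a few lines of Python. This is a hypothetical illustration only, not RackCorp's actual implementation: the function name, the (priority, address) data layout and the health dictionary are all assumptions made for the example.

```python
def records_to_announce(endpoints, is_healthy):
    """Return the addresses of the most-preferred group with a healthy member.

    endpoints:  list of (priority, address); a lower number is preferred.
    is_healthy: dict mapping address -> bool from the latest monitoring run.
                Addresses missing from the dict are treated as unhealthy.
    """
    for priority in sorted({prio for prio, _ in endpoints}):
        healthy = [addr for prio, addr in endpoints
                   if prio == priority and is_healthy.get(addr, False)]
        if healthy:
            return healthy
    # Nothing is healthy: the caller decides whether to keep the old records.
    return []
```

With two primary servers and one offsite standby, the standby address is only announced once both primaries have failed their checks, and the primaries are announced again as soon as either one recovers.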

    We don’t get praise much here at RackCorp – because customers tend not to notice even the most disastrous events that we live through.  I see so many hosting companies have a whinge that it’s not their fault when a datacenter loses power, or when their network provider accidentally stops announcing their routes.  That’s part of this business – it’s about how you prepare for the worst and deal with it that makes you a good provider for critical services.


    Power outage on uk-rs-vhost*, uk-rs-cs*

    As reported, on Wednesday, 30/04/08 at 21:36 AEST, the primary datacentre used by RackCorp for all UK equipment suffered a power issue resulting in power being cut to all equipment for a few minutes. The following is the accepted publication of cause by the datacentre:

    The nature of the UPS failure was unique according to the manufacturer. There are over 3000 systems of the same model installed worldwide, and not one of these has experienced a similar failure. The failure was extremely unusual and unexpected, and the UPS manufacturer is having an engineer from Switzerland over to examine the failed system in more detail. The manufacturer has not yet determined the exact cause of the fault and as such we are not in a position to update you on what action we will be taking.
    When the UPS failed a number of capacitors and other electrical components were overloaded and burnt out, this in turn triggered our fire alarm and all employees were evacuated from the building as part of our standard procedure to ensure their safety. [....]

    Their publication goes on to acknowledge that the fire alarm combined with a number of software issues caused further problems with other services operated by them. We’ve had some queries from clients, and can confirm that the ONLY item above that affected any RackCorp client was the power outage itself. Servers were offline for a few minutes, and soon powered back on. Due to the sub-5 minute outage interval, offsite switchover did NOT occur on any RackCorp UK hosted services.

    On the up side, we can proudly say that EVERY UK-RS server managed by us booted up flawlessly on the day, without incident. All equipment was checked over the following 26hrs by our staff, and no detectable damage is understood to have occurred to any equipment in use by us or our customers. This result can be attributed to our procedural approach to server management, and many of you have already complimented us on that outcome – so on behalf of the staff who were working that night, we certainly thank you for your kind feedback.

    – NetOps
