ping: sendmsg: Operation not permitted

We recently found several of our CDN servers suddenly experiencing 10-20% packet loss. Ouch! The loss was not constant; it happened more often at some times of the day than others. No other servers on the same networks were affected, only the CDN boxes.

One of the symptoms we soon discovered was that pinging out from an affected server produced an error locally:

ping: sendmsg: Operation not permitted

Aha! This gave us a great start: it meant the kernel itself was rejecting the packets, rather than some weird network anomaly. So we checked route caches and all kinds of things... nothing really gave the problem away, until we checked our centralised syslog server and saw thousands of these messages:

ip_conntrack: table full, dropping packet

Okay, “dropping packet” would make for a good explanation of things. Sure enough, the connection tracking table for netfilter was full. We’d already upped the limit to 128000 on all of our CDN boxes, but apparently this wasn’t enough. So we raised the limit even higher on all servers:

echo 200000 > /proc/sys/net/ipv4/ip_conntrack_max

And straight away all the CDN servers started working again. Don’t forget to add an entry to your /etc/sysctl.conf so the change survives a reboot:

net.ipv4.netfilter.ip_conntrack_max = 200000
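If you want to spot a filling table before packets start dropping, something like the following could be run from cron or a monitoring agent. This is only a rough sketch, not what we actually deployed: the function names and the 80% default threshold are made up for illustration, and the /proc paths are passed in as arguments because they vary between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: integer percentage of the conntrack table in use.
conntrack_usage_pct() {
    count=$1
    max=$2
    if [ "$max" -le 0 ]; then
        echo "invalid max: $max" >&2
        return 1
    fi
    echo $(( count * 100 / max ))
}

# Hypothetical helper: read the current count and the limit from the two
# files given, and warn when usage reaches the threshold (default 80,
# an arbitrary illustrative value).
check_conntrack() {
    count_file=$1
    max_file=$2
    threshold=${3:-80}
    [ -r "$count_file" ] && [ -r "$max_file" ] || {
        echo "conntrack files not readable" >&2
        return 2
    }
    pct=$(conntrack_usage_pct "$(cat "$count_file")" "$(cat "$max_file")") || return 1
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: conntrack table ${pct}% full"
        return 1
    fi
    echo "OK: conntrack table ${pct}% full"
}
```

On our kernels the limit lives in /proc/sys/net/ipv4/ip_conntrack_max; the matching count file is assumed here to be /proc/sys/net/ipv4/netfilter/ip_conntrack_count (newer kernels use nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter/), so check what exists on your box first. A call would then look like: check_conntrack /proc/sys/net/ipv4/netfilter/ip_conntrack_count /proc/sys/net/ipv4/ip_conntrack_max 80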

It turns out the problem was a new customer who was really heavy on the thumbnails: their site serves about 150,000 thumbnails per second! This was the immediate cause of all our problems. I guess it was always going to happen eventually, so we were lucky to catch it so quickly. I hope this info helps someone else out.
