Today the American Registry for Internet Numbers (ARIN) granted a request from TotalChoice Hosting for additional IPv4 address space. This new pool of IP addresses will be made available to our clients within the next 72 hours, as we first need to get the block advertised to the Internet and routed within our network.
With the pool of available IP addresses dwindling at a very fast rate, it has become increasingly difficult for providers to obtain new allocations directly from ARIN. As of this posting, 2.69 /8s in aggregate remain available for allocation from ARIN.
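To put that figure in perspective, here is a small sketch of the arithmetic. A /8 holds 2^24 addresses, so 2.69 /8s is about 45 million addresses. The function name is only illustrative, and the 10.0.0.0/8 network is used purely to read off the size of a /8.

```python
import ipaddress

# Any /8 has the same size; 10.0.0.0/8 is used here only to look it up.
ADDRESSES_PER_SLASH_8 = ipaddress.ip_network("10.0.0.0/8").num_addresses  # 2**24


def addresses_in_slash_8s(count: float) -> int:
    """Approximate number of IPv4 addresses contained in `count` /8 blocks."""
    return round(count * ADDRESSES_PER_SLASH_8)


remaining = addresses_in_slash_8s(2.69)
print(f"{remaining:,} addresses remaining")
```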
We are very pleased to have accomplished this.
For the last 3 days we have been tracking a local root vulnerability in the Linux kernel, the core element of all Linux operating systems. This vulnerability is unprecedented in scope, affecting Linux versions going back as far as 8 years, which prompted extra consideration in how we handle it.
Here at TCH we operate a network that is dominated by Linux, and to say we took this matter very seriously would be an understatement. After evaluating the threat this vulnerability poses to our network, dedicated servers, and shared/reseller clients, we decided that waiting any longer for an upstream update was not reasonable. Upstream updates were originally estimated for Saturday 1900 GMT, but this fell through, prompting us to take action. Beyond the lack of a reliable upstream update, this vulnerability is being actively exploited in the wild, with attack code publicly available on many security and underground web sites.
At this moment, we are rolling out to all Linux servers on our network an updated kernel version that will close this vulnerability while maintaining version compatibility with future upstream software updates. Retaining version compatibility will allow our dedicated clients, as well as our own support team, to resume normal update practices with tools such as ‘yum’ or ‘apt-get’ without having to worry about version conflicts with our in-house kernel update.
Please do not be alarmed if you experience a temporary outage on dedicated, shared or reseller servers. We thank everyone for understanding the urgency of this matter. If you have any questions or comments, please feel free to submit a help desk ticket at https://www.tchhelp.com.
UPDATE: Aug 18, 2009
We will be conducting reboots again this evening to push out a revised version of last night's kernel that corrects issues with the r1backup agent, local firewall services and the network driver on certain servers. In addition, this new kernel revision is binary compatible with CentOS/RHEL 4 kernels, since it was built from the same kernel source tree as the standard kernels.
At 3:34 PM EST, we began to see intermittent packet loss to some of our network due to an inbound Denial of Service attack. We then experienced a core router crash while working to mitigate the attack. This resulted in a widespread network outage until 3:46 PM EST, when we were able to switch over to our redundant core router. We continued to experience intermittent outages until 4:03 PM EST, at which time all services were returned to normal.
We are sorry for any inconvenience this causes you.
April 24, 2009 4:05 PM
TotalChoice Hosting is currently experiencing a widespread network outage. We have network engineering personnel on site investigating this issue.
At this time we do not have an estimated time for when normal network conditions will return.
We will keep this site updated as this unplanned outage moves along.
We are very sorry for this issue and are working to correct the networking issues as quickly as possible.
Thank you for your understanding during this outage.
April 24, 2009 4:24 PM
All services have now been returned to normal. We had a core router crash due to a failed supervisor card, and the router did not switch over smoothly. The router was brought back online, switched to the backup supervisor card, and then a replacement card was installed.
We apologize for the inconvenience and thank you for your patience.
Yup, you guessed it – we are going to talk about backups.
Here at TCH we take backups very seriously, and I do not say that lightly: there is no single more important aspect of our management regime than our backup infrastructure. I am going to explain a bit about the lengths we go to in order to protect the data you host with TCH.
The first layer of protection we use is RAID 1 mirroring on all our shared, reseller and operations servers (help desk, DNS servers, etc.). This allows the servers to maintain an identical copy of the system, so that in the event of a disk failure the server can continue operating with no adverse effects. There is a catch, though: software support for RAID cards under Linux, and even under Windows, is severely lacking when it comes to failure notification, which means that when a disk fails there is no industry standard method for alerting someone that there is a problem. At TCH, we have developed an in-house software solution that works with our two preferred RAID hardware vendors (AMCC 3ware and Areca). When a disk fails in a RAID array, it captures information on the failure and then sends e-mail alerts to management BlackBerry pagers and to our help desk, ensuring that problems are identified and maintenance is scheduled immediately.
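For readers curious what such a monitor might look like, here is a minimal sketch of the capture-then-alert idea. It is not our in-house tool. The three-column status format, the host name and the e-mail addresses are all made up for illustration, and real controller utilities print different output.

```python
from email.message import EmailMessage

# Hypothetical status text, one disk per line: "<array> <port> <state>".
# This layout is an assumption, not the output of any real RAID utility.
SAMPLE_STATUS = """\
u0 p0 OK
u0 p1 DEGRADED
"""


def find_failed_disks(status_text: str) -> list:
    """Return (array, port, state) for every disk whose state is not OK."""
    failed = []
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unrecognized lines
        array, port, state = parts
        if state.upper() != "OK":
            failed.append((array, port, state))
    return failed


def build_alert(hostname: str, failed: list, recipients: list) -> EmailMessage:
    """Build (but do not send) an e-mail describing the failed disks."""
    msg = EmailMessage()
    msg["Subject"] = f"RAID alert on {hostname}: {len(failed)} disk(s) not OK"
    msg["From"] = f"raid-monitor@{hostname}"
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(f"array {a}, port {p}: {s}" for a, p, s in failed))
    return msg


failed = find_failed_disks(SAMPLE_STATUS)
if failed:
    alert = build_alert("server1.example.com", failed, ["helpdesk@example.com"])
    print(alert)
```

In a real deployment the message would be handed to a mail server, for example with `smtplib.SMTP(...).send_message(alert)`, and the check would run on a schedule.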
RAID, however reliable it may be, is still not impervious to data loss, which brings us to our next level of protection. All our servers are set up with a spare hard disk and configured to take weekly backups of all user data and server configurations. Although these backups have proven to be extremely reliable, they are not always ideal, as they can be up to a week old by the time they are needed. We treat these backups strictly as a first-line restore point: in the event of a failure, they let us restore accounts in a form that is compatible with the cPanel interface, so that accounts function properly and consistently, and we then close any data gaps using our CDP solution below.
Finally, we have our gigabit network enabled continuous data protection (CDP), which runs on absolutely every server that retains client and mission critical data. This is a solution we maintain on network attached storage (NAS) devices that we have built in-house; they contain hardware RAID across 16 hard disks, with redundant power supplies and a capacity of between 6 and 13 TB. The CDP software runs at a low level on each server with minimal load impact, as it does not read the file system but rather the disk itself, in a raw block-by-block fashion. This allows CDP to identify differences on the disk quickly and back up only those areas of the disk that have changed since the last backup run (incremental backups). These backups are captured every 12 hours, every day, 365 days a year, and saved to the NAS devices as “snapshots”. The snapshots make it possible to recover data as it was 12 hours ago, 5 days ago, or anywhere in between. We save copies of the data in every state from every backup run; we do not overwrite backups! We can further leverage this solution because, once you pancake all the snapshots together, it amounts to a backup of the server's entire hard disk, so in the event of a catastrophic failure we can take the CDP backup image and restore an entire hard disk in a single swift action. You can also use this solution from inside cPanel with the R1Soft Backup feature, which allows you to restore any data you require from the CDP backup images, just as we would, without having to request support. That said, we are always more than happy to help you with any data recovery needs you may have, so do not hesitate to ask.
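The incremental snapshot idea can be illustrated with a toy sketch. This is not how R1Soft or our NAS devices actually work: it treats a byte string as the "disk", uses a tiny made-up block size, detects changes by comparing SHA-256 hashes (a stand-in for real change tracking), and assumes the disk never shrinks. Each backup run stores only the blocks that changed, and a restore layers ("pancakes") the snapshots on top of each other.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose, for illustration only


def split_blocks(disk: bytes) -> list:
    """Cut the raw disk image into fixed-size blocks."""
    return [disk[i:i + BLOCK_SIZE] for i in range(0, len(disk), BLOCK_SIZE)]


def take_snapshot(disk: bytes, previous_hashes: dict) -> tuple:
    """Return (changed_blocks, current_hashes) for one backup run."""
    changed, hashes = {}, {}
    for index, block in enumerate(split_blocks(disk)):
        digest = hashlib.sha256(block).hexdigest()
        hashes[index] = digest
        if previous_hashes.get(index) != digest:
            changed[index] = block  # store only blocks that differ
    return changed, hashes


def restore(snapshots: list, upto: int) -> bytes:
    """Layer snapshots 0..upto to rebuild the disk as of that run."""
    blocks = {}
    for snapshot in snapshots[:upto + 1]:
        blocks.update(snapshot)  # later runs overwrite earlier blocks
    return b"".join(blocks[i] for i in sorted(blocks))


# Three backup runs over a disk that changes a little each time.
snapshots, hashes = [], {}
for disk_state in (b"AAAABBBBCCCC", b"AAAAXXXXCCCC", b"AAAAXXXXCCCY"):
    changed, hashes = take_snapshot(disk_state, hashes)
    snapshots.append(changed)

print([sorted(s) for s in snapshots])  # block indexes stored by each run
print(restore(snapshots, 1))           # the disk as of the second run
```

The first run stores every block, while later runs store only what changed. Because no snapshot is ever overwritten, any earlier state can be rebuilt by stopping the layering at that run.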
I hope you have enjoyed learning a bit more about how we protect the data you host with TCH, and that you understand there is never any substitute for a well planned and executed backup solution. Before I sign off, let me remind you to take the time to consider the data you store at home or at work and ask: do you have a backup solution? If not, consider storing some of your more important data on your TCH account, so that in the event of a failure you can rest assured knowing that TCH has you covered.