Over the course of my sysadmin career I’ve managed server setups designed to handle very large volumes of traffic, including one site that periodically needed to serve over 10,000 concurrent users. To meet those demands they were built on complex hosting - load balancers, auto-scaling cloud infrastructure and the like. And while those sites genuinely need all that, what many people chasing scale don’t realise is that they can get there for far less.
This is because the vast majority of sites only need a simple caching proxy in front of them to handle large traffic volumes - but that can be a hard point to get across. I ran into exactly this during a recent email conversation with a guy asking for server advice. He runs several blogs and forums that get a lot of traffic, and despite my assurances that he didn’t need to spend more than $20-$30 on a caching VPS (or even less if he went for a CDN like Cloudflare or Amazon’s CloudFront), he simply didn’t believe me.
So I ran a simple experiment. I installed Nginx on a cheap-and-nasty budget VPS of mine and configured it as a caching reverse proxy in front of one of my sites (he didn’t want me to use one of his sites, unfortunately). The VPS had 512MB of RAM and could, at best, use just under 1GHz of a single CPU core on a host that I’m sure is oversold to hell and back. All in all, not a fast VM :) The site itself was fairly standard - mostly text, with some pictures and other static resources embedded here and there. In other words, like the vast majority of sites on the net.
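The Nginx side of a setup like this only takes a handful of directives. The below is a minimal sketch rather than my exact config - the cache path, zone name, domain and backend address are all placeholders:

```nginx
# Cache storage: 10MB of cache keys in memory, up to 256MB of cached
# responses on disk. Sizes here are illustrative.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=site_cache:10m
                 max_size=256m inactive=60m;

server {
    listen 80;
    server_name example.com;            # placeholder domain

    location / {
        proxy_pass http://203.0.113.10; # placeholder backend address
        proxy_set_header Host $host;

        proxy_cache site_cache;
        proxy_cache_valid 200 301 10m;  # cache successful responses for 10 minutes
        # Keep serving stale copies if the backend is down or slow
        proxy_cache_use_stale error timeout updating;
    }
}
```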
The test itself was done via Siege running from four other VPSes. As I was only trying to show that even a weak VPS like that could handle a lot of traffic, I started off with 100 concurrent connections (25 per VPS) and sieged the site for 10 minutes. I then reran the test with 100 more concurrent connections for another 10 minutes, and kept incrementing until I started seeing connection failures. This was a pure benchmarking test, so there were no pauses between requests (i.e. I wasn’t trying to simulate real internet browsing). I started seeing lots of connection errors at the 700 mark, so I scaled back to 600 and besieged the site one last time. The final stats are below:
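For anyone wanting to reproduce something similar, each load-generating VPS ran a command along these lines (the URL is a placeholder and the exact invocation is illustrative, not copied from my session):

```shell
# 25 concurrent simulated users per VPS (x4 VPSes = 100 total),
# benchmark mode (no delay between requests), run for 10 minutes.
siege --concurrent=25 --benchmark --time=10M http://example.com/
```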
Target VPS CPU Usage: 0.48 (highest point)
Transaction rate: 1156.17 trans/sec
Throughput: 11.56 MB/sec
Transactions: 691410 hits
Availability: 100.00 %
Elapsed time: 598.02 secs
Longest transaction: 10.38
Shortest transaction: 0.21
Average transaction: 0.64
I put the four most relevant stats first. They show that the VPS was handling 1,156 transactions a second (!!) from 600 concurrent connections while using less than half its CPU power. The Throughput figure also explains why the errors happened at higher concurrency - 11.56 MB/sec is just under 100 megabits per second of network traffic! This meant that when I added more concurrent connections, the network link was completely saturated.
What does this mean? The network was the bottleneck, not the VPS. Hell, the VPS could have handled a lot more traffic (probably even double the amount) if there had been more bandwidth.
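The arithmetic checks out against the Siege report above. A quick awk sanity check, with the numbers copied straight from the stats:

```shell
awk 'BEGIN {
  # Transactions / Elapsed time reproduces the reported transaction rate
  printf "rate: %.2f trans/sec\n", 691410 / 598.02   # prints 1156.17
  # Throughput in MB/sec x 8 = megabits/sec on the wire
  printf "wire: %.2f Mbit/sec\n", 11.56 * 8          # prints 92.48
}'
```

92.48 Mbit/sec is effectively a full pipe on a 100 Mbit port, which is exactly why piling on more connections produced errors instead of more throughput.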
Needless to say, the guy found these stats rather impressive, but he was still a bit sceptical. In the end I convinced him to try out Cloudflare and take it from there. Yes, it seems strange that something so cheap could be so effective, but as the above test shows, caching can have an amazing cost/benefit ratio. Seriously, you’d be silly not to give it a try, especially when Cloudflare has a free package :)