Static files? Not on my server! (Technical Debt Part 3/3)
During the last month, I’ve been working on paying off some of the technical debt that has been collecting interest over the course of KitchenPC’s development. Most of these fixes have been stability changes, or just things that, once fixed, remove roadblocks to a smoother development process. I’d like to go over these in a three-part blog series that, like the Hitchhiker’s Guide to the Galaxy trilogy, will consist of five parts.
Part 3/3: Static files? Not on my server!
After implementing my new session management system, one of the main problems was that I was stuck with really big cookies: somewhere over 700 bites, err, bytes. That’s a lot of data to send with every request, including requests for static script files, images, CSS files, fonts, and other web resources that have no need for the session data stored in the cookie. After I posted the last blog entry, I ended up writing my own custom serialization code for the session data, which got the cookie size down from 700+ bytes to 77 bytes (depending on the length of your user name), so that’s a huge win. However, I still wanted a setup that wouldn’t send cookies over the wire with every request for static content. For this reason (and others), I decided to move all my static files over to a CDN.
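The post doesn’t show the actual serialization format, so here is only a sketch of the idea, assuming a hypothetical session made of a numeric user ID, an expiry timestamp, and a user name (the field names and values are made up). Packing fixed-width integers into raw bytes drops the field names, quotes, and separators that a general-purpose format carries, leaving the user name as the only variable-length part:

```python
import base64
import json
import struct

# Hypothetical session fields; the real KitchenPC session layout is not given in the post.
SESSION = {"user_id": 48213, "expires": 1356998400, "user_name": "chef_mike"}

# Big-endian: 4-byte user ID, 4-byte expiry (Unix time), 1-byte name length.
HEADER = struct.Struct(">IIB")


def serialize_compact(session: dict) -> str:
    """Pack the session into a fixed binary header plus the UTF-8 user name."""
    name = session["user_name"].encode("utf-8")
    if len(name) > 255:
        raise ValueError("user name too long for a one-byte length prefix")
    raw = HEADER.pack(session["user_id"], session["expires"], len(name)) + name
    # URL-safe base64 only uses characters that are legal in a cookie value.
    return base64.urlsafe_b64encode(raw).decode("ascii")


def deserialize_compact(token: str) -> dict:
    """Reverse serialize_compact: decode base64, unpack the header, read the name."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    user_id, expires, name_len = HEADER.unpack_from(raw)
    start = HEADER.size
    name = raw[start:start + name_len].decode("utf-8")
    return {"user_id": user_id, "expires": expires, "user_name": name}


def serialize_json(session: dict) -> str:
    """A general-purpose baseline for comparison."""
    return base64.urlsafe_b64encode(json.dumps(session).encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    print(len(serialize_json(SESSION)), "characters as base64 JSON")
    print(len(serialize_compact(SESSION)), "characters packed")
```

With this made-up session, the packed value is 24 characters versus 92 for the base64-encoded JSON; a real session with more fields would grow accordingly.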
A CDN, according to Wikipedia, is a “large distributed system of servers deployed in multiple data centers in the Internet.” More specifically, it’s a network of servers that stores static files for you so that you don’t have to host them on your own web servers, which are busy with dynamic content, database queries, figuring out what you can do with 47 ounces of asparagus, etc. This network is usually spread across dozens of locations, and that geographical diversity has two huge advantages. First, if one node goes down due to a wind storm, bird strike, or someone tripping over a power cord (they told me they were going to tape down those cords!), your content can be served from elsewhere. Second, requests can automatically be routed to nearby data centers, which results in faster load times. In other words, someone in Hong Kong visiting KitchenPC will download graphics and script files from servers near Hong Kong, and someone in Seattle will download those same files from servers near Seattle. You get the idea.
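Both advantages can be pictured with a toy model: pick the closest edge node that is still up. This is only an illustration, assuming straight great-circle distance and a made-up list of edge locations; real CDNs typically steer requests with DNS or anycast and measured latency rather than a distance formula.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    lat: float
    lon: float
    healthy: bool = True


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    radius = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def pick_edge(nodes: list[EdgeNode], lat: float, lon: float) -> EdgeNode:
    """Return the closest node that is still up (proximity plus fail-over)."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        raise RuntimeError("no healthy edge nodes available")
    return min(candidates, key=lambda n: haversine_km(lat, lon, n.lat, n.lon))


# Hypothetical edge locations, for illustration only.
NODES = [
    EdgeNode("hong-kong", 22.32, 114.17),
    EdgeNode("tokyo", 35.68, 139.69),
    EdgeNode("seattle", 47.61, -122.33),
    EdgeNode("frankfurt", 50.11, 8.68),
]
```

A visitor in Hong Kong gets the Hong Kong node; if that node is marked unhealthy (the tripped-over power cord), the same visitor falls back to Tokyo, the next closest node in this list.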
Finding the Right CDN For You
There are tons of different CDN companies out there. However, not all of them deal directly with end users. For example, EdgeCast is a huge provider, but their minimum package is a thousand gigabytes of data for around $300 a month, which is way more than a small website needs. For this reason, there are many smaller companies that buy up bandwidth at wholesale prices from the big guys and resell it in smaller chunks to people like me. The advantage of going through one of these companies is that there’s no minimum purchase; your monthly bill can literally be 88 cents.
Rackspace CloudFiles
The first CDN I checked out was Rackspace, whose solution is called CloudFiles. It’s built on the Akamai CDN, and it was attractive because I already host my web servers through Rackspace, and having fewer services billing me each month is always a good thing. I’ve also already been using CloudFiles to store recipe images: during the beta, if you uploaded either a profile picture or a recipe image, that file went straight to CloudFiles and was served from there. However, when I looked into using CloudFiles to host my site’s static content, I was hugely disappointed.
First, they don’t support origin-pulls. An origin-pull, in CDN vernacular, is the ability to have the CDN download a file from your server the first time it’s requested, then cache it from that point on. In other words, if someone requests http://cdn.kitchenpc.com/images/logo.png, and that particular node in the CDN doesn’t yet have that file, it will download the file from its origin, which is http://www.kitchenpc.com/images/logo.png. This greatly simplifies things, as you don’t need to push out all your content ahead of time, and push it out again whenever things change. Using Rackspace, I’d have to set up a new publishing process that copies all my files out to CloudFiles using their proprietary API (they can’t even be bothered to support something standard, such as FTP), which means I’d have to either copy everything (very slow) or keep track of exactly which files changed since I last published. No thanks.
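The origin-pull idea fits in a few lines. This is a simplified sketch of a single edge node, assuming the fetch function is injected (a real node would issue an HTTP GET) and ignoring cache headers, expiry, and query strings, all of which real CDNs handle:

```python
from typing import Callable, Dict
from urllib.parse import urlsplit

ORIGIN = "http://www.kitchenpc.com"  # the origin from the example above


class OriginPullCache:
    """One edge node that fetches a file from the origin only on a cache miss."""

    def __init__(self, origin: str, fetch: Callable[[str], bytes]):
        self.origin = origin.rstrip("/")
        self.fetch = fetch               # injected so it can be faked in examples
        self.store: Dict[str, bytes] = {}
        self.origin_requests = 0

    def get(self, cdn_url: str) -> bytes:
        # Map the CDN URL onto the origin by keeping the path:
        # http://cdn.kitchenpc.com/images/logo.png -> /images/logo.png
        path = urlsplit(cdn_url).path
        if path not in self.store:       # miss: pull from the origin once
            self.origin_requests += 1
            self.store[path] = self.fetch(self.origin + path)
        return self.store[path]          # hit: served straight from the edge


if __name__ == "__main__":
    fake_origin = {"http://www.kitchenpc.com/images/logo.png": b"<png bytes>"}
    edge = OriginPullCache(ORIGIN, fetch=lambda url: fake_origin[url])
    edge.get("http://cdn.kitchenpc.com/images/logo.png")  # miss, pulls from origin
    edge.get("http://cdn.kitchenpc.com/images/logo.png")  # hit, no origin traffic
    print("origin requests:", edge.origin_requests)
```

Nothing is pushed ahead of time: the first request for a path populates the cache, and every later request for it is served without touching the web server.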
With a CDN that supports origin-pulls, I can simply deploy my changes to my web server (which I already have scripts for) and then make a single API call that invalidates the CDN cache, causing the CDN to pull the modified files from my server again. For me, this is a huge win.
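On CloudFront, that single call is CreateInvalidation. The sketch below only builds the InvalidationBatch structure the API expects; it assumes wildcard paths covering the three static folders (the post doesn’t say whether the whole cache or only changed paths get invalidated), and the distribution ID in the comment is a placeholder:

```python
import time
import uuid


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure used by CloudFront's CreateInvalidation."""
    items = []
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"invalidation paths must start with '/': {path!r}")
        if path not in items:  # drop duplicates, keep order
            items.append(path)
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # CallerReference must be unique per request; timestamp plus random suffix.
        "CallerReference": caller_reference
        or f"deploy-{int(time.time())}-{uuid.uuid4().hex[:8]}",
    }


if __name__ == "__main__":
    batch = build_invalidation_batch(["/scripts/*", "/images/*", "/styles/*"])
    print(batch)
    # With AWS credentials configured, the batch could be sent with boto3
    # (the distribution ID below is a placeholder):
    #   import boto3
    #   boto3.client("cloudfront").create_invalidation(
    #       DistributionId="EXAMPLEDISTID", InvalidationBatch=batch)
```

Running this right after the existing deploy script finishes is what turns "copy everything or track every change" into one call.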
CloudFiles also doesn’t support folder hierarchies. I find this completely ridiculous, especially since I have a “RecipeImages” folder with 60,000 files in it. Their web interface basically chokes on it, and graphical front-ends to CloudFiles, such as Cyberduck, are also too slow to be usable. I have various folders on my web server, such as /scripts, /styles, and /images, to organize my content. I’m absolutely not about to redo all my HTML to put every file in the same directory, especially since I use plugins, such as TinyMCE, that require static resources to be laid out in specific locations.
CloudFiles does let you hack around this limitation, since it allows the forward-slash character in an object name, but then you’d have to write a publishing mechanism that flattens your folder hierarchy, comes up with all the right object names, and deploys the files using their proprietary REST API. I had time for none of this. Hopefully someday Rackspace will catch up with the rest of the world and provide a real CDN solution, but for now, they were out of the running.
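The flattening half of that workaround is simple enough to sketch: walk the site’s folders and turn each relative path into an object name with forward slashes. The upload step is deliberately left out, since this sketch doesn’t assume anything about the CloudFiles API, and the sample file names in demo() are hypothetical:

```python
import os
import tempfile
from pathlib import Path


def object_names(root: str) -> dict[str, Path]:
    """Map each file under root to a flat object name that preserves its folders.

    A flat store with '/' allowed in object names can hold scripts/app.js as
    one object literally named "scripts/app.js".
    """
    base = Path(root)
    mapping: dict[str, Path] = {}
    for dirpath, _dirnames, filenames in os.walk(base):
        for filename in filenames:
            full = Path(dirpath) / filename
            # as_posix() yields '/' separators even when run on Windows.
            mapping[full.relative_to(base).as_posix()] = full
    return dict(sorted(mapping.items()))


def demo() -> list[str]:
    """Build a small sample tree (hypothetical file names) and list its object names."""
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("scripts/app.js", "images/icons/logo.png", "styles/site.css"):
            target = Path(tmp) / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("placeholder")
        return list(object_names(tmp))


if __name__ == "__main__":
    for name in demo():
        print(name)
```

Even with names in hand, you would still need change tracking and an uploader on top of this, which is exactly the publishing machinery an origin-pull CDN makes unnecessary.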
GoGrid
Next, I decided to give GoGrid a try. They came recommended to me and ranked quite high in a lot of the performance comparisons I looked at. I mentioned earlier that EdgeCast has various resellers; GoGrid is one of them.
GoGrid sells not only CDN services but also complete cloud hosting (as well as dedicated hosting). Unfortunately, you have to sign up for their cloud hosting account before you can provision a CDN account, which must be done by emailing customer support and waiting an hour. Signing up for this account proved difficult. For starters, my credit card was declined with a vague error. After calling the bank, I was told the error was on GoGrid’s side. Luckily, their customer support was incredibly helpful and stayed in chat with me for quite some time. I was eventually told to try a debit card, and that I could switch over to a credit card once the account was set up. At that point, however, their servers started misbehaving. I would start typing in my billing information, and after about 10 seconds, the page would refresh and tell me my session had expired. I tried on two different computers and three different web browsers, all with the same issue. The customer support rep had no idea what the problem was. Eventually, I connected to a remote server on Rackspace’s network, and for some reason that computer worked. It took me about two hours just to sign up for GoGrid. The customer service rep was great, though, and even gave me a $100 credit on my account.
Once I had the account set up, it didn’t take me long to figure out how to set up the CDN. Their interface is not the greatest, and changes (such as defining a CNAME) take a while to propagate, but it’s definitely usable. However, I quickly found a huge limitation in their design.
Basically, a CNAME (such as cdn.kitchenpc.com) can only point to a single root origin, such as http://www.kitchenpc.com/images. I didn’t want to have to define cdn-images.kitchenpc.com, cdn-scripts.kitchenpc.com, etc. I could point cdn.kitchenpc.com to http://www.kitchenpc.com directly, but then any resource on my server could be pulled through the CDN, including dynamic content such as the home page. This probably doesn’t matter that much. In fact, it doesn’t matter at all. However, I’m just OCD about how my servers are set up, and I didn’t like it. I did try emailing customer support, but they had no idea what I was trying to do. I figured it would take more time to explain it to them than it was worth, so I decided I’d had enough of GoGrid.
Amazon CloudFront
The last CDN I tried was Amazon CloudFront. Amazon is, of course, a huge player in the startup world, hosting countless sites. They have a huge CDN spanning the globe, and very competitive prices. One nice thing about Amazon is that I already do a lot of business with them, so they had all my billing information on file. Setting up a CloudFront account took about 30 seconds.
Their web interface is also far better and very easy to use. Defining a CNAME was a snap, every change I made was instant, and everything just worked on the first try. Amazon has the same limitation: a CNAME can only point to a single root resource on the origin. However, Amazon provides a feature called Behaviors, which lets you define exactly which URLs are allowed and which are not. Using this feature, I was able to define behaviors that allow only requests to /scripts/*, /images/* and /styles/*. Problem solved.
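The effect of those three behaviors is an allow-list of path patterns. The sketch below models only that effect with shell-style wildcards; it is not CloudFront’s configuration format, and how the remaining (default) paths were made to be rejected isn’t described in the post:

```python
from fnmatch import fnmatchcase

# Path patterns mirroring the behaviors described above.
ALLOWED_PATTERNS = ("/scripts/*", "/images/*", "/styles/*")


def is_allowed(path: str, patterns=ALLOWED_PATTERNS) -> bool:
    """Return True if a request path matches one of the allowed patterns.

    fnmatchcase keeps matching case-sensitive, and its '*' also matches '/',
    so nested files such as /scripts/tinymce/tiny_mce.js are covered.
    """
    return any(fnmatchcase(path, pattern) for pattern in patterns)


if __name__ == "__main__":
    for p in ("/images/logo.png", "/scripts/tinymce/tiny_mce.js", "/", "/default.aspx"):
        print(p, "->", "serve via CDN" if is_allowed(p) else "reject")
```

Dynamic pages such as the home page fall outside every pattern, which is the whole point: the CDN can only ever proxy the static folders.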
Amazon does have a bit of a downside. Their pricing is quite convoluted compared to GoGrid’s, which is just a flat rate per gigabyte. Amazon charges different prices depending on where you want data cached, how many times you call their APIs, how many times you invalidate the cache (though your first 1,000 invalidations are free), etc. I concluded that although their pricing is not as straightforward, I just wasn’t dealing with enough data to care. I expect my usage to be under $5/month.
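The difference between the two pricing styles is easiest to see as two formulas. In this sketch every rate, region name, and usage figure is a made-up placeholder, not Amazon’s or GoGrid’s actual pricing; only the 1,000 free invalidations come from the post:

```python
from dataclasses import dataclass

FREE_INVALIDATION_PATHS = 1000  # per the post: the first 1,000 invalidations are free


@dataclass
class Usage:
    gb_by_region: dict       # hypothetical regions, e.g. {"us": 3.0, "asia": 1.0}
    requests: int
    invalidation_paths: int


def flat_rate_bill(usage: Usage, per_gb: float) -> float:
    """GoGrid-style: one price per gigabyte, wherever it was served."""
    return sum(usage.gb_by_region.values()) * per_gb


def itemized_bill(usage: Usage, per_gb_by_region: dict, per_10k_requests: float,
                  per_invalidation_path: float) -> float:
    """CloudFront-style: per-region transfer, request fees, and paid invalidations."""
    transfer = sum(gb * per_gb_by_region[region]
                   for region, gb in usage.gb_by_region.items())
    requests = usage.requests / 10_000 * per_10k_requests
    billable_paths = max(0, usage.invalidation_paths - FREE_INVALIDATION_PATHS)
    return transfer + requests + billable_paths * per_invalidation_path


if __name__ == "__main__":
    month = Usage({"us": 3.0, "asia": 1.0}, requests=20_000, invalidation_paths=12)
    print(flat_rate_bill(month, per_gb=0.25))
    print(itemized_bill(month, {"us": 0.25, "asia": 0.5}, 0.5, 0.25))
```

The itemized version has more moving parts, but at small-site volumes every term stays tiny, which is why the extra complexity wasn’t worth worrying about.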
So…
In the end, I went with Amazon CloudFront and am quite happy with it. Modifying my site’s HTML was also quite easy. Since I use XML templates for all my pages, I was able to add some code to the pre-processor to rewrite HTML such as:
<img cdn.src="/images/logo.png" />
to:
<img src="http://cdn.kitchenpc.com/images/logo.png" />
The CDN URL prefix itself is defined in the web.config file. This lets me run my site locally on my dev box without hitting the CDN, and use the CDN in the production environment. I then had to change a bunch of HTML, but this was pretty easy with Visual Studio’s “Find and Replace” tools. Within about an hour, I was completely up and running on Amazon CloudFront, with the site running quite nicely.
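The pre-processor step described here lives in KitchenPC’s .NET template code, which the post doesn’t show. As a rough Python sketch of the same rewrite, assuming a regex pass over the markup and a prefix passed in as a parameter (in the real site it comes from web.config):

```python
import re

# Matches cdn.src="..." or cdn.src='...' and captures the quote and the path.
CDN_ATTR = re.compile(r"""\bcdn\.src=(["'])(.*?)\1""")


def rewrite_cdn_src(html: str, cdn_prefix: str) -> str:
    """Turn cdn.src="/path" into src="<prefix>/path".

    An empty prefix (e.g. on a dev box) leaves a plain site-relative src,
    so pages load static files from the local server instead of the CDN.
    """
    prefix = cdn_prefix.rstrip("/")

    def replace(match: re.Match) -> str:
        quote, path = match.group(1), match.group(2)
        return f"src={quote}{prefix}{path}{quote}"

    return CDN_ATTR.sub(replace, html)


if __name__ == "__main__":
    tag = '<img cdn.src="/images/logo.png" />'
    print(rewrite_cdn_src(tag, "http://cdn.kitchenpc.com"))  # production
    print(rewrite_cdn_src(tag, ""))                          # local dev box
```

A regex over markup is fragile; since the templates are already XML, an XML-aware pass that rewrites the attribute node would be the sturdier choice.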
For startups, I’d definitely recommend using a CDN. It’s very cheap (the bandwidth is probably cheaper than whatever you’re paying your hosting company), provides fail-over and redundancy, takes stress off your web servers, and speeds up requests from other countries. Also, since the CDN uses a different host name, browsers won’t send your site’s cookies with every request for a static file (as long as the session cookie is scoped to your www host rather than your whole domain), which can speed things up as well. Win-win.
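Moving static files to cdn.kitchenpc.com only keeps cookies off those requests if the session cookie isn’t scoped to the parent domain. This simplified sketch of the browser’s domain check (following RFC 6265’s domain-matching rules, and ignoring path, Secure, and public-suffix handling) shows the difference between a host-only cookie and one set with Domain=kitchenpc.com:

```python
import ipaddress


def _is_ip(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def cookie_sent_to(request_host: str, cookie_domain: str, host_only: bool) -> bool:
    """Would a browser attach this cookie to a request for request_host?

    host_only is True when the server omitted the Domain attribute; then
    cookie_domain is just the host that set the cookie.
    """
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")  # a leading dot in Domain is ignored
    if host_only:
        return host == domain
    # Domain cookies also go to subdomains, but never by suffix-matching an IP.
    return host == domain or (host.endswith("." + domain) and not _is_ip(host))


if __name__ == "__main__":
    print(cookie_sent_to("cdn.kitchenpc.com", "www.kitchenpc.com", host_only=True))
    print(cookie_sent_to("cdn.kitchenpc.com", ".kitchenpc.com", host_only=False))
```

In other words, a cookie set without a Domain attribute by www.kitchenpc.com stays off CDN requests, while Domain=.kitchenpc.com would follow every request to cdn.kitchenpc.com too.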