Optimizing the Application Loading Process (Technical Debt Part 1/3)
During the last month, I’ve been working on paying off some of the technical debt that has been collecting interest over the course of KitchenPC’s development. Most of this work has been stability changes, or just things that, when fixed, remove roadblocks to a smoother development process. I’d like to go over these in a three-part blog series that, like the Hitchhiker’s Guide to the Galaxy trilogy, will consist of five parts.
Part 1/3: Optimizing the Application Loading Process
KitchenPC is implemented as a web application that runs within an IIS worker process. Everything from “What Can I Make” to recipe searches to NLP queries is run within this context, and nothing much is handled out of proc. There is one exception, email queuing, which runs as a Windows service. However, pretty much everything else happens under IIS. For this reason, there is a rather lengthy load time when the app initially boots up, as it has to load every ingredient and recipe from the database, compile the data, build in-memory graphs, build search trees, and initialize other data on the heap which makes the user experience lightning fast later on.
During the beta, this was pretty fast since I only had about 10,000 recipes and no NLP dictionaries. App startup time was about 2-3 seconds. However, I now possess a collection of recipes nearly six times larger, plus a ton of grammatical data to make natural language queries possible. The application start time, even on a fast computer, is now somewhere around 10-15 seconds.
This presented two problems. First, I’m impatient. Working long hours on KitchenPC requires many, many rebuilds. Each time I fired up the application in Visual Studio, I had to twiddle my thumbs while all the data was loaded into memory. Often, I was changing things that had nothing to do with this data, and didn’t even need to use these features. Second, there were many bugs where, if requests came in while the app was loading, there’d be random dictionary collisions and other problems that would cause exceptions to be thrown, and sometimes cause the app initialization to fail. This was mostly due to poor code that didn’t lock shared objects and so wasn’t thread-safe. As site traffic picked up, this created a lot of friction when deploying new changes into production. My goal was to be able to deploy changes quickly, and not have a single HTTP request error out.
Multi-threaded Application Loading
My solution to this problem was to make the Application_Start code as simple as possible, and simply spawn a new thread to handle the heavy lifting. So now, the application start looks something like this:
public class Global : System.Web.HttpApplication
{
    void Application_Start(object sender, EventArgs e)
    {
        Thread thread = new Thread(LoadData);
        thread.Start();
    }

    private void LoadData()
    {
        // I can haz data
    }
}
The first time a user goes to the site, the LoadData method is called on a new thread. The user will immediately see the home page, even though KitchenPC is actually not fully initialized. Of course, this means that certain functionality on the site will not be available for about 10-15 seconds, which could be a problem if the user immediately uses something like “Ingredients to Exclude” (which relies on NLP), selects a recipe within the search results (which relies on recipe aggregation code), or clicks on “What Can I Make?” (which relies on the modeling engine). All of these features depend on data that’s loaded into static memory on initialization.
To solve this, I put write locks on this data as it’s being loaded. The LoadData() method, cute as the lolcatz reference is, actually looks more like this:
private static ReaderWriterLockSlim _cacheLock = new ReaderWriterLockSlim();

public static void LoadData()
{
    _cacheLock.EnterWriteLock();
    try
    {
        // Load data from the database
    }
    finally
    {
        _cacheLock.ExitWriteLock();
    }
}
Then, any code that needs to reference any of this data enters a read lock:
public static string ReadData(Guid key)
{
    _cacheLock.EnterReadLock();
    try
    {
        // Lookup key in data and return value
    }
    finally
    {
        _cacheLock.ExitReadLock();
    }
}
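To see the pieces working together, here is a minimal console sketch of the same pattern. It is not KitchenPC's actual code: the names RecipeCache, LoadStarted, and SampleKey are made up for illustration, and a Thread.Sleep stands in for the slow database load.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class RecipeCache
{
    private static readonly ReaderWriterLockSlim _cacheLock = new ReaderWriterLockSlim();
    private static Dictionary<Guid, string> _data = new Dictionary<Guid, string>();

    // Hypothetical helper, demo only: set once the loader holds the write lock.
    public static readonly ManualResetEventSlim LoadStarted = new ManualResetEventSlim(false);

    // Hypothetical key so the demo has something to look up.
    public static readonly Guid SampleKey = Guid.NewGuid();

    public static void LoadData()
    {
        _cacheLock.EnterWriteLock();
        try
        {
            LoadStarted.Set();
            Thread.Sleep(500); // stand-in for the slow database load
            _data = new Dictionary<Guid, string> { { SampleKey, "Pancakes" } };
        }
        finally
        {
            _cacheLock.ExitWriteLock();
        }
    }

    public static string ReadData(Guid key)
    {
        // Blocks here for as long as a loader holds the write lock.
        _cacheLock.EnterReadLock();
        try
        {
            string value;
            return _data.TryGetValue(key, out value) ? value : null;
        }
        finally
        {
            _cacheLock.ExitReadLock();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Same idea as Application_Start: hand the heavy lifting to another thread.
        var loader = new Thread(RecipeCache.LoadData) { IsBackground = true };
        loader.Start();

        // Wait until the loader actually owns the write lock, then read.
        RecipeCache.LoadStarted.Wait();
        Console.WriteLine(RecipeCache.ReadData(RecipeCache.SampleKey)); // prints "Pancakes"
    }
}
```

One thing the sketch makes explicit: a reader that arrives before the loader thread has taken the write lock will not block, and will simply see an empty cache. The LoadStarted event is only there to make the demo deterministic; in a real app you would want to decide how such early requests should behave.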
From the user’s point of view, if they use one of these features while the app is still loading, they get a pause of ten seconds or so until the initialization code finishes and the data is fully available. Remember, this is only a problem for the first few seconds after the IIS app pool is reset (such as when new binaries are pushed to the server or a configuration change is made), which usually happens at night when traffic is low.
Another advantage of this approach is that I can reload data without restarting the entire app. There are also administrative web services that can call LoadData() again, re-acquiring that write lock, to refresh the data in memory. This is done when we change NLP data, push new recipes into the site, or change certain ingredient metadata. Basically, certain site features can be temporarily halted or taken offline while data is being refreshed.
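If the pause during a refresh ever becomes a problem, one possible variation (not what KitchenPC does, just a sketch) is to build the fresh data before taking the write lock and hold the lock only for the swap. The ReloadableCache class and the loader delegate below are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ReloadableCache
{
    private static readonly ReaderWriterLockSlim _cacheLock = new ReaderWriterLockSlim();
    private static Dictionary<string, string> _data = new Dictionary<string, string>();

    // loader is a hypothetical delegate that does the slow work (e.g. a database query).
    public static void Reload(Func<Dictionary<string, string>> loader)
    {
        // Slow part runs with no lock held, so readers keep using the old data.
        Dictionary<string, string> fresh = loader();

        _cacheLock.EnterWriteLock();
        try
        {
            _data = fresh; // readers are only blocked for this assignment
        }
        finally
        {
            _cacheLock.ExitWriteLock();
        }
    }

    public static string Read(string key)
    {
        _cacheLock.EnterReadLock();
        try
        {
            string value;
            return _data.TryGetValue(key, out value) ? value : null;
        }
        finally
        {
            _cacheLock.ExitReadLock();
        }
    }
}
```

The trade-off is that readers see stale data until the swap happens, and on the very first load they would see an empty cache instead of waiting, so this fits refreshes better than initial startup.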
This makes minor site config changes and site updates much smoother, and everything is now done in a much more thread-safe manner without weird random exceptions cropping up all over the place.
Not all components support this!
One gotcha with this design is that not all ASP.NET components support being run outside the request thread. For example, anything that dereferences HttpContext.Current without checking it will throw a NullReferenceException when run on a newly spawned thread, because HttpContext.Current is tied to the thread that is handling the HTTP request rather than being a true global. If you’re no longer running on the thread that started the HTTP request, this value will all of a sudden be null.
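The effect is easy to reproduce without ASP.NET at all. The sketch below is only an analogy, using a [ThreadStatic] field to mimic a per-thread ambient value; AmbientContext and ThreadScopeDemo are made-up names, and this is not a claim about how HttpContext is implemented internally.

```csharp
using System;
using System.Threading;

public static class AmbientContext
{
    // Each thread gets its own copy of this field; new threads start with null.
    [ThreadStatic]
    private static string _current;

    public static string Current
    {
        get { return _current; }
        set { _current = value; }
    }
}

public static class ThreadScopeDemo
{
    // Reads AmbientContext.Current from a freshly spawned thread.
    public static string ReadOnNewThread()
    {
        string seen = "unset";
        var worker = new Thread(() => { seen = AmbientContext.Current ?? "null"; });
        worker.Start();
        worker.Join();
        return seen;
    }
}
```

If the calling thread sets AmbientContext.Current = "request-42" and then calls ThreadScopeDemo.ReadOnNewThread(), the result is "null", while AmbientContext.Current on the calling thread still returns "request-42". Code that assumes the ambient value is always there breaks in exactly the same way on a background loader thread.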
Castle ActiveRecord, which is the ORM that KitchenPC is built on top of, uses the HttpContext as a key to figure out which database connection and transaction to use.
Luckily, ActiveRecord ships with a class called HybridWebThreadScopeInfo which will solve this problem. This IWebThreadScopeInfo implementation will detect if HttpContext.Current is null, in which case it will initialize a new database context just as if it were a new request.
If you take a look at the code, you’ll see:
HttpContext current = HttpContext.Current;
if (current == null)
{
    if (stack == null)
    {
        stack = new Stack();
    }
    return stack;
}
The normal WebThreadScopeInfo class, by contrast, will throw an exception if current is null.
You can easily configure this in web.config with one line:
<activerecord isWeb="true" threadinfotype="Castle.ActiveRecord.Framework.Scopes.HybridWebThreadScopeInfo, Castle.ActiveRecord.Web">
That’s all there is to it!
Hope you’ve enjoyed this little technical insight into the inner workings of KitchenPC, and hopefully some of these techniques will help you come up with a better design for your app. Stay tuned for part 2, which will be equally technical and nerdy!