Google, meet KitchenPC

One of the things I’ve noticed about Google is that it can format certain types of results depending on what type of content the page is displaying.  If you search for “spinach quiche”, you’ll get several results that Google recognizes as recipes, and it displays them nicely.  You’ll see a picture of the dish, the “star” rating, how many reviews the recipe has, and even the total preparation time.  That’s pretty sweet!  However, Google apparently doesn’t like KitchenPC too much, as my results were displayed like any other page.  In other words, Google didn’t treat my content as recipe content.  Boo!

Results from AllRecipes.com look great!

It’s been on my list to get to the bottom of this and figure out exactly what sites like AllRecipes are doing that I’m not.  After all, I have all this data available, so why not display it on search result listings?  Recently, I got the perfect excuse to dig a bit deeper: Google announced new features to make searching for recipes even more powerful.  Now, users can filter search results down to only recipe content, exclude recipes based on cook time or calories, and even check “Yes/No” boxes based on which ingredients they have to find that perfect recipe.  So I decided to spend the evening researching what Google calls “Rich Snippets.”

Google will display a “rich snippet” for your page if it finds certain types of markup embedded in your HTML.  Google recognizes several standards, namely microdata, RDFa and microformats.  These technologies all work in basically the same way: they embed extra markup that has no effect on how the browser renders the page but is recognized by any parser looking for this data.

Google supports any of these formats for recognizing recipe data on a website and parsing out its various properties.  In fact, there’s an excellent tutorial on exactly how to do this with the three major formats here.  I looked briefly at the different options and eventually chose the hRecipe microformat, since that’s what AllRecipes was using and it seemed to be the most widely adopted standard.  I also see AllRecipes results displayed nicely in Bing, so I know Bing supports this format as well.

Modifying the HTML was fairly straightforward.  You surround information with span tags of a certain class to indicate what it is.  In certain circumstances, you want to display information one way (such as four star images in a row to indicate a rating) but provide the data to be parsed in another way (such as 4.0).  You can do that with an empty span tag that has the correct data in its title attribute.
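As a rough illustration of the empty-span trick, here’s a small C# helper that renders a time span for a recipe page: the visible text is human friendly, while the title attribute of an empty “value-title” span carries the machine-readable ISO 8601 duration. The class names follow the hRecipe and microformats value-class conventions as I understand them (e.g. “preptime”); the helper itself is hypothetical and not KitchenPC’s actual code, so verify its output with Google’s testing tool.

```csharp
using System;
using System.Net;
using System.Xml;

public static class HRecipeMarkup
{
    // Renders an hRecipe time property such as "preptime". The empty
    // "value-title" span carries the machine-readable value (an ISO 8601
    // duration like PT1H30M) in its title attribute; browsers only show
    // the human-friendly label that follows it.
    public static string Duration(string cssClass, TimeSpan time)
    {
        // XmlConvert formats a TimeSpan as an xsd:duration, e.g. "PT1H30M".
        string iso = XmlConvert.ToString(time);
        string label = time.TotalMinutes >= 60
            ? $"{(int)time.TotalHours} hr {time.Minutes} min"
            : $"{time.Minutes} min";
        return $"<span class=\"{WebUtility.HtmlEncode(cssClass)}\">" +
               $"<span class=\"value-title\" title=\"{iso}\"></span>" +
               $"{WebUtility.HtmlEncode(label)}</span>";
    }
}
```

For example, `HRecipeMarkup.Duration("preptime", TimeSpan.FromMinutes(90))` shows “1 hr 30 min” to the reader while a parser sees PT1H30M. A rating would use the same pattern, with “4.0” in the title attribute and the star images as the visible content.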

How KPC results will look according to Google's testing tool

Google, of course, provides a Rich Snippets Testing Tool to preview your content and make sure everything gets parsed correctly.  Rather than modifying a bunch of my code, I saved a recipe to a static file called test.htm on my web server so I could edit it in Notepad until I got everything working.  I then migrated the changes back into the source code once everything was displaying the way I wanted.

Though it will probably take a few weeks for Google to update its index, hopefully KitchenPC results will now show up when people use Google’s new recipe search tools.  That is, if I’m not drowned out by the millions of AllRecipes results that usually bubble to the top of the first page.  Sigh.

A fistful of tomatoes

One of the innovative aspects of KitchenPC is its ability to convert and aggregate ingredients from one form into another.  For example, chopped tomatoes and whole tomatoes can be tallied up and added to a shopping list expressed in weight.  Since a grocery store sells tomatoes by weight, you probably want that on your shopping list and not “5 cups chopped tomatoes.”  This “form conversion engine” is essential not only to accurate and meaningful shopping lists, but to the meal planner as well.  If I say I have 3 tomatoes that I want to use up, the modeler can consider recipes that use tomatoes chopped, whole, or by weight.  Without this collection of ingredient metadata, KitchenPC would be relegated to being your average recipe database with wannabe shopping and planning tools that don’t really work.

The first version of the form conversion engine was extremely basic, and provided just enough functionality to prove the initial concept of a recipe website that innovated around this sort of ability.  However, the engine lacked certain functionality.  Primarily, it could only represent conversion ratios through weight.  For example, a form (1 cup of chopped tomatoes) would have a weight, always expressed in grams.  Since tomatoes are sold by weight, we could calculate how many grams of tomatoes you’d have to buy to end up with x cups once chopped.  This worked for units as well: “1 slice of cheddar cheese” weighs about 28 grams, so if you needed 4 slices of cheese, we could add this ingredient to your shopping list expressed in weight.
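To make the weight-only design concrete, here’s a minimal sketch of tallying different forms of an ingredient into grams for a shopping list. Only the 28 grams per slice of cheddar comes from the example above; the tomato weights are placeholders, and the class and method names are hypothetical rather than KitchenPC’s real code.

```csharp
using System;
using System.Collections.Generic;

public static class WeightTally
{
    // Hypothetical form table: grams per one unit of (ingredient, form).
    // Only the cheddar figure comes from the post; the tomato numbers are placeholders.
    static readonly Dictionary<(string Ingredient, string Form), double> GramsPerUnit =
        new Dictionary<(string, string), double>
        {
            [("cheddar cheese", "slice")] = 28,
            [("tomato", "whole")] = 120,       // placeholder
            [("tomato", "cup chopped")] = 180  // placeholder
        };

    // Sums every usage of each ingredient into a single weight in grams,
    // which is what a store that sells by weight wants on the shopping list.
    // An unknown (ingredient, form) pair throws KeyNotFoundException.
    public static Dictionary<string, double> Tally(
        IEnumerable<(string Ingredient, string Form, double Amount)> usages)
    {
        var totals = new Dictionary<string, double>();
        foreach (var u in usages)
        {
            double grams = u.Amount * GramsPerUnit[(u.Ingredient, u.Form)];
            totals.TryGetValue(u.Ingredient, out double soFar);
            totals[u.Ingredient] = soFar + grams;
        }
        return totals;
    }
}
```

With these numbers, 4 slices of cheddar come out to 4 × 28 = 112 grams, and 2 whole tomatoes plus 1 cup chopped are combined into a single 420-gram tomato line.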

However, this design didn’t support the less common conversions, such as items sold as whole units or in liquid form.  I could not represent “two scoops of vanilla ice cream equals one cup.”  If an ingredient were used by weight but sold by volume (I have no idea what this would be!), that sort of conversion path could not be represented either.

Earlier this week, I took some time to improve the conversion engine to allow these sorts of conversions.  The database can now store conversion coefficients in any unit (weight, volume or whole unit) and convert to any other unit, as long as a conversion path can be found.  This allows me to add some new units such as “squirts of Hershey’s Syrup” and “splashes of soy sauce.”  Though the database doesn’t yet make much use of this far more powerful conversion engine, you can expect some new unit types for the more popular ingredients in the near future.
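One plausible way to “convert to any other unit, as long as a conversion path can be found” is a breadth-first search over a graph whose nodes are units and whose edges are conversion coefficients, with one graph per ingredient. This sketch assumes each coefficient is stored once and its reciprocal is used for the reverse direction. The two-scoops-per-cup ratio comes from the ice cream example; the grams-per-cup figure and all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class ConversionGraph
{
    // unit -> list of (neighbor unit, multiply-by factor)
    readonly Dictionary<string, List<(string To, double Factor)>> edges =
        new Dictionary<string, List<(string To, double Factor)>>();

    // Records "1 <from> = factor <to>" along with the reverse conversion.
    public void Add(string from, string to, double factor)
    {
        EdgesOf(from).Add((to, factor));
        EdgesOf(to).Add((from, 1.0 / factor));
    }

    List<(string To, double Factor)> EdgesOf(string unit)
    {
        if (!edges.TryGetValue(unit, out var list))
            edges[unit] = list = new List<(string To, double Factor)>();
        return list;
    }

    // Breadth-first search for any path between the two units, multiplying
    // the factors along the way. Returns null when no path exists.
    public double? Convert(double amount, string from, string to)
    {
        var factorTo = new Dictionary<string, double> { [from] = 1.0 };
        var queue = new Queue<string>();
        queue.Enqueue(from);
        while (queue.Count > 0)
        {
            string unit = queue.Dequeue();
            if (unit == to) return amount * factorTo[unit];
            if (!edges.TryGetValue(unit, out var next)) continue;
            foreach (var (neighbor, factor) in next)
            {
                if (factorTo.ContainsKey(neighbor)) continue; // already reached
                factorTo[neighbor] = factorTo[unit] * factor;
                queue.Enqueue(neighbor);
            }
        }
        return null;
    }
}
```

For vanilla ice cream, `Add("scoop", "cup", 0.5)` and `Add("cup", "gram", 140)` let `Convert(2, "scoop", "gram")` follow scoop → cup → gram and return 140. A unit with no path, such as “pint” here, returns null instead of guessing.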

Going through this code (which was among the oldest code in the KitchenPC source depot) also gave me the opportunity to do a lot more testing on the less common conversion paths.  I noticed that converting from volume to whole units (such as 1 cup chopped onions = x whole onions) simply never worked; it just happens that no ingredient currently in the database uses this conversion path.  I wrote several new unit tests that now offer complete code coverage for every conversion type (no matter how silly) by mocking up fake ingredients and forms to convert.

If I did my job right, you won’t notice any change at all in KitchenPC.  Your shopping list will be as accurate as it’s always been, and digging up meal plans based on what’s in your pantry should be as smooth as ever.  Keep an eye out for new units being added to existing ingredients, and let me know if you notice any problems or have any feedback on how I can improve the existing database.

Friday night coding

This evening I decided to stay home and hack around with some KitchenPC code, as I’ve neglected to really do much in the technical realm lately.  What better way to get back into a coding groove than to implement a few features that have been requested by real-life users?  I had time to address three of these issues, and figured I’d write a bit about each before going to bed.

Calendar Scroll Back

A problem with the calendar, raised by at least two people, is that you could only scroll forward in time, not backward.  Apparently, people want to scroll back through their history to see what they’ve made in the past.  Sometimes people just want to jog their memory of what they’ve eaten recently, or perhaps send a recently cooked recipe to a friend.  Other people might forget they had something planned even though they purchased ingredients for it, and need to scroll back to see.  Whatever the reason, it was a great example of a bad assumption on my part during the initial design.  I had even gone out of my way to add explicit code preventing people from scrolling back earlier than the current date, and to toggle the visibility of the scroll buttons based on the current date range.  The lesson learned: don’t limit your users with arbitrary boundaries that have no technical or business justification.

Total Result Count

See? KitchenPC has plenty of chicken recipes!

One of the comments from the UserTesting.com video was that my site doesn’t contain enough recipes.  While this is probably a valid opinion, I believe the impression the tester got was skewed by my search result limitations.  I limit the results of any query to 100 recipes so as not to generate too much HTML or serialize too much data across the wire.  While the ideal solution is to implement paging, this didn’t seem necessary for an MVP, as most users never bother looking past the first page of a search engine’s results.  I figured it was easier for users to narrow down their search criteria than to dig through 100 recipes to find what they’re looking for.  The drawback of this approach was exemplified when the user commented that my site apparently only has 100 chicken recipes, when in fact I have easily fifteen times that amount.  The fix was to show the actual number of matches in the database, while making it clear to the user that only the first 100 are being displayed.

For those who care, I ran across a great PostgreSQL feature called window functions: adding COUNT(*) OVER() to the SELECT list returns the total number of matching rows alongside every row, even when the query uses a LIMIT clause, without having to return a second result set.  This made it incredibly easy for me to get at this data and include it in the UI.
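Here’s roughly what that query shape looks like, embedded in C# the way search code might issue it. The table, column and parameter names are hypothetical. COUNT(*) OVER () is standard PostgreSQL window-function syntax, and because window functions are evaluated before LIMIT is applied, every returned row carries the count of all rows that matched the WHERE clause. The helper then builds the “only the first 100 are shown” notice from in-memory rows, so it runs without a database.

```csharp
using System;
using System.Collections.Generic;

public static class RecipeSearch
{
    // Hypothetical schema. COUNT(*) OVER () counts every row that passed WHERE,
    // even though LIMIT returns at most 100 of them.
    public const string Sql = @"
        SELECT r.recipe_id, r.title, COUNT(*) OVER () AS total_count
        FROM recipes r
        WHERE r.title ILIKE '%' || @query || '%'
        ORDER BY r.rating DESC
        LIMIT 100;";

    public class Row
    {
        public int RecipeId;
        public string Title;
        public long TotalCount; // COUNT(*) is a bigint; same value on every row
    }

    // Every row repeats the same total, so read it from the first one.
    // With zero rows there is nothing to read, which simply means zero matches.
    public static string ResultNotice(IReadOnlyList<Row> rows)
    {
        long total = rows.Count > 0 ? rows[0].TotalCount : 0;
        return total > rows.Count
            ? $"Showing the first {rows.Count} of {total} recipes"
            : $"Found {total} recipes";
    }
}
```

One subtlety of this approach: if a query returns no rows, the total is lost along with them. That’s harmless here because no rows means no matches, but it would matter for paging with an OFFSET past the end.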

Sortable Results Page

Another comment on the UserTesting.com video was about the seemingly arbitrary order in which recipes were displayed.  In my defense, the results were sorted by rating; it just happened that nothing on that page had been rated by any user yet.  There was no UI indication that any sort order was applied to the results, and the user became frustrated scrolling through the recipes trying to find what she was looking for.  Well, not only did I fix this by adding sort direction arrows to the page, but I also added the ability for the user to toggle both the sort column and the sort direction by clicking on the column header.

Chicken recipes ordered by descending cook time

Allowing users to sort the results lets them easily locate the recipes they’re interested in.  It also provides the secondary benefit of giving users access to recipes that might not otherwise have been displayed.  For example, if the user searched for “chicken” and got all 1,500 results, they would only see the 100 highest-rated chicken recipes.  If they then sorted the results by prep time, they would see the 100 recipes with the lowest prep times regardless of rating; in other words, I sort the total result set and not just the recipes displayed on the page.  True, I have to hit the database again to re-sort, but I figure it’s a pretty good solution while I have a relatively small amount of data in the database.
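Since re-sorting hits the database again, the column the user clicks ends up in an ORDER BY clause. Here’s a small sketch of how that might work, assuming hypothetical column keys and SQL column names. Clicking the same header again flips the direction, and the key is mapped through a whitelist so user input is never pasted into SQL directly.

```csharp
using System;
using System.Collections.Generic;

public class SortState
{
    // Whitelist of sortable columns: UI key -> SQL column (names are hypothetical).
    static readonly Dictionary<string, string> Columns = new Dictionary<string, string>
    {
        ["rating"] = "r.rating",
        ["preptime"] = "r.prep_time",
        ["cooktime"] = "r.cook_time",
        ["title"] = "r.title"
    };

    // Default matches the original behavior: best-rated first.
    public string Column { get; private set; } = "rating";
    public bool Descending { get; private set; } = true;

    // Clicking the current column flips the direction; clicking a different
    // column switches to it, starting in ascending order.
    public void Click(string column)
    {
        if (!Columns.ContainsKey(column))
            throw new ArgumentException("Unknown sort column: " + column);
        if (column == Column) Descending = !Descending;
        else { Column = column; Descending = false; }
    }

    // Safe to splice into SQL because both parts come from fixed strings.
    public string OrderBy() =>
        $"ORDER BY {Columns[Column]} {(Descending ? "DESC" : "ASC")}";
}
```

Because the ORDER BY runs in the database before the LIMIT, sorting by prep time surfaces the quickest recipes from all matches, not just the 100 already on the page.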

So, not too bad for a Friday night, eh?  I’m planning a second round of improvements this weekend, which will hopefully pave the way for some major feature redesigns coming up in the next few months.  Stay tuned!

Rise of the TwitterBot

Last night I found myself in the mood for a little late-night coding, and decided to tackle a project I’ve had in mind for a while now: setting up a Twitter feed that KitchenPC would automatically post new recipes to.  I’ve seen some other sites do this, and figured that even if no one follows it, it’ll at least be an excuse to learn Twitter’s API and do something new.

I first decided to create a new Twitter account exclusively for this purpose.  This feed would contain only KitchenPC recipe links, so as not to pollute the existing @KitchenPC Twitter account, which advertises company news and updates.  This new feed could potentially tweet a hundred or more new recipes a day, so one would have to be somewhat crazy to follow it in the first place.  The last thing I’d want is for people to unfollow @KitchenPC because it became too “spammy.”  With that idea in mind, I created @KPCRecipes and started my research.

The obvious design for my TwitterBot was to add code to the KitchenPC Queue, which is a Windows Service that sits around all day emailing people when stuff happens.  If you’ve ever received an email from KitchenPC (notifications, password resets, etc.), that was the queue in action.  It basically watches for files dropped into a certain directory and processes them one at a time.  For example, if someone posts a new recipe on KitchenPC, an XML file is created on the file system.  The KPC Queue will then open the file, deserialize the data within it and see that it contains information about a new recipe post.  It then loads the recipe data from the database, along with any users who subscribe to that event, and emails them.  This way, the website itself is not blocked while it emails potentially dozens of people.  In the future, I could also scale this out by adding more KPC Queue instances or creating a read-only copy of the database for queue processing.  This queue design also provides fault tolerance.  If there’s an error sending emails or the SMTP server is down, the file is skipped and tried again later.  Even if the queue process crashes, files will just start piling up and will be processed when the queue comes back online; no emails would be lost.
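The post doesn’t show the queue code, so here’s a minimal sketch of the drop-folder pattern it describes. It assumes XML event files and a caller-supplied handler, and the class and method names are made up. A real Windows Service would call this on a timer or from a FileSystemWatcher event.

```csharp
using System;
using System.IO;
using System.Linq;

public static class DropFolderQueue
{
    // Makes one pass over the drop folder, handling files oldest first.
    // A file is deleted only after its handler succeeds; if the handler returns
    // false or throws (say, the SMTP server is down), the file stays put and is
    // retried on the next pass, so a crash never loses an event.
    public static int ProcessPending(string folder, Func<string, bool> handler)
    {
        int processed = 0;
        var files = new DirectoryInfo(folder).GetFiles("*.xml")
            .OrderBy(f => f.CreationTimeUtc)
            .ThenBy(f => f.Name);
        foreach (FileInfo file in files)
        {
            bool ok;
            try { ok = handler(file.FullName); }
            catch (Exception) { ok = false; }
            if (ok) { file.Delete(); processed++; }
        }
        return processed;
    }
}
```

One consequence of deleting only after success is at-least-once delivery: if the service dies after sending the emails but before deleting the file, that event will be handled again on restart, so handlers should tolerate the occasional duplicate.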

The KPC Queue provides the perfect place for the TwitterBot code to live as well; whenever a “recipe post” event is processed, I can call up the TwitterBot and handle the tweeting out of band, so as not to slow down the website at all.  Perfect!

Turns out, there is absolutely no good information online about how to create a TwitterBot, an automated program that posts Twitter status updates to a hard-coded user account.  Twitter’s API documentation is decent; however, most of it assumes you’re writing a web-based application where the user logs on to their Twitter account and grants your application access.  It also briefly covers how to write a desktop application where the user can grant your application rights to their Twitter account.  Every blog and tutorial I found on the Twitter API was also geared toward one of these scenarios.

Those of you even slightly familiar with Twitter’s APIs will know that Twitter is built around OAuth, a protocol that lets a user grant an application access to their account without handing over their username and password, which users might not want your app to have.  The problem I found with OAuth is that tokens are exchanged via various HTTP redirects, which are usually wrapped in a web-based interface.  While it’s true that Twitter used to allow credentials to be passed using HTTP basic auth, they blocked that ability last year and forced all apps to use OAuth.  I did not want my TwitterBot to pop up a web browser for me to log on to my KPCRecipes Twitter account; I wanted the account credentials to be hard coded into the program and require zero user interaction.  After all, the queue runs as an NT Service and doesn’t even require a user to be logged on to the system.  Digging up information on this design was rather difficult.  Sounds like the perfect opportunity to write a how-to blog!

Step 1 – Register your application on Twitter

The first thing you’ll need to do to get Twitter’s API to even talk to you is register your application on Twitter.  It doesn’t matter that I’ll be the only one in the world using this application, it still needs to be registered with Twitter.  This gives you an application key which you can use when calling APIs.  You can register your application by going to http://twitter.com/apps/new and filling out the form.  The form is fairly easy to follow, and I chose “Client” for my Application Type for reasons I’ll get into below.  When you submit the form, you’ll get a Consumer Key and a Consumer Secret.  This information is specific to your application and will never change, unless you go in and reset it.  You’ll need to hard code this information into your app, and from this point on I’ll refer to these values as CONSUMER_KEY and CONSUMER_SECRET.

Step 2 – Dig up an API wrapper for your language of choice

There are several Twitter libraries written in .NET; I checked out TweetSharp and Twitterizer and decided I liked Twitterizer a bit better.  One thing I liked about Twitterizer is that you do pretty much everything by calling static utility methods, so everything is very stateless.  Once you download Twitterizer, just copy the two DLLs (Newtonsoft.Json.dll and Twitterizer2.dll) into your project directory and reference them.

The Twitterizer “getting started” tutorial not only provided a great outline with code samples to get started, it also provided a very simple explanation of how OAuth works.

Step 3 – Authorize your application using your Twitter account

OK, so now we have a Twitter account (in my case, KPCRecipes) and the application we registered in step 1.  We want this application to be able to post status updates to our Twitter account, so the Twitter account has to authorize the application to do so.  This has to be done through the web.  I suppose you could automate it (it’s just HTTP traffic, after all), but luckily it’s something you only need to do once.  I wrote a simple little program to do this for me, which looks like this:

using System;
using Twitterizer;

// Ask Twitter for a temporary request token. "oob" (out of band) means there is
// no callback URL, so Twitter shows the user a PIN instead of redirecting.
OAuthTokenResponse authorizationTokens = OAuthUtility.GetRequestToken(CONSUMER_KEY, CONSUMER_SECRET, "oob");

// Send the account owner to Twitter to approve the app, then read back the PIN
string url = String.Format("http://twitter.com/oauth/authorize?oauth_token={0}", authorizationTokens.Token);
Console.WriteLine("Go to:\n\n{0}\n\nLogon as @KPCRecipes and enter the pin number below:\n\n", url);
string pin = Console.ReadLine();

// Exchange the request token plus the PIN for long-lived access tokens
OAuthTokenResponse accessTokens = OAuthUtility.GetAccessToken(CONSUMER_KEY, CONSUMER_SECRET, authorizationTokens.Token, pin);

Console.WriteLine("Here are your access tokens:\n\nScreenName: {0}\nToken: {1}\nTokenSecret: {2}\nUserId: {3}\n\n", accessTokens.ScreenName, accessTokens.Token, accessTokens.TokenSecret, accessTokens.UserId.ToString());

Let’s walk through this code.  First, we use GetRequestToken to ask Twitter for a request token for our app, using the app’s CONSUMER_KEY and CONSUMER_SECRET.  The third parameter is a callback URL, which Twitter will redirect to after the user approves the app.  Normally, Twitter would redirect to that callback URL and pass along a verifier that the app then exchanges for the access tokens we seek.  In our case, there’s no web page to redirect to, so we pass in the string “oob” (“out of band”) to instruct Twitter to display that verifier to the user as a PIN code instead.  Yes, kinda weird, but just go with it.

Next, we spit out a URL for the user to go to and approve our app.  We could spawn a web browser here, but I was lazy and decided to just write out the URL for the user to copy and paste into their browser.  When you go to this URL, you’ll see the name of your app and are asked if you want to allow it to have access to your Twitter account (be sure you’re logged in using the correct Twitter account!)  Once you accept it, you’ll see a PIN code.  Paste that PIN code into the app, and this value will now be stored in the string pin.

So, at this point in the code, my Twitter app has been granted access to my KPCRecipes Twitter account and can post status updates programmatically.

However, I still need the access tokens to programmatically access that Twitter account in the future.  These access tokens are sort of like a username and password, but can only be used by this one app and only to access this one Twitter account.  Anyone else who tried to use them would need to know my app’s CONSUMER_KEY and CONSUMER_SECRET.

The next part of the code calls GetAccessToken, passing in the request token we got originally along with the PIN that Twitter gave us.  This call returns all the information we need to access the KPCRecipes account using our Twitter app.  The good news is that this information doesn’t expire (unless the account revokes the app’s access), so I can just hard code it into the KPC Queue.

The two pieces of information you’ll need are accessTokens.Token and accessTokens.TokenSecret.  We will call these ACCESS_TOKEN and ACCESS_SECRET.  I added these two values to the queue’s .config file so I can refer to them at runtime whenever I need them, and it’s of course much less hacky than actually hard coding them into the compiled app.
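Here’s a sketch of reading those two values at startup. The appSettings key names are my own invention, not from the original. ConfigurationManager.AppSettings is a NameValueCollection, so the helper accepts one, which also lets it be tested without a .config file. Failing fast on a missing key is better than discovering it when the first tweet goes out.

```csharp
using System;
using System.Collections.Specialized;

public class TwitterBotSettings
{
    public string AccessToken, AccessSecret;

    // In the service, pass ConfigurationManager.AppSettings (System.Configuration).
    // The key names below are hypothetical.
    public static TwitterBotSettings Load(NameValueCollection appSettings)
    {
        // Returns the setting or throws immediately if it is missing or empty.
        string Require(string key) =>
            string.IsNullOrEmpty(appSettings[key])
                ? throw new InvalidOperationException("Missing appSetting: " + key)
                : appSettings[key];

        return new TwitterBotSettings
        {
            AccessToken = Require("TwitterAccessToken"),
            AccessSecret = Require("TwitterAccessSecret")
        };
    }
}
```

The matching .config entries would be two `<add key="..." value="..." />` lines under `<appSettings>`.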

Step 4 – Go Tweet something!

So, now we can throw this program away.  It’s no longer needed unless for some reason you have to re-approve the app, or there’s a zombie apocalypse.  We can now use all this information to programmatically update our Twitter status.  The first thing to do is build an OAuth tokens object, which we send to Twitter to tell it which app we are and which Twitter account we want to access.  You can do that with the following code:

OAuthTokens tokens = new OAuthTokens();

tokens.ConsumerKey = CONSUMER_KEY;
tokens.ConsumerSecret = CONSUMER_SECRET;
tokens.AccessToken = ACCESS_TOKEN;
tokens.AccessTokenSecret = ACCESS_SECRET;

This is pretty straightforward.  An OAuth token has the CONSUMER_KEY and CONSUMER_SECRET to tell Twitter which app we are, as well as the ACCESS_TOKEN and ACCESS_SECRET to tell it which user we want to authenticate as.  Using this tokens object, you can now call any Twitter API you want!  Let’s call the TwitterStatus.Update method:

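// Update returns a generic TwitterResponse<TwitterStatus>
TwitterResponse<TwitterStatus> tweetResponse = TwitterStatus.Update(tokens, "Yummy Cake Recipe :: http://www.kitchenpc.com/Recipes/YummyCake.html");

// Result is a RequestResult value, e.g. Success
Console.WriteLine("Result: {0}", tweetResponse.Result.ToString());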

All we do here is call the static Update() method of TwitterStatus, pass in our tokens, and pass in a string with our new status.  The resulting TwitterResponse object will contain information about whether the post was successful or contain any error information.  You can now go to your Twitter timeline and see the update immediately.  Easy enough!

In Summary

To create a TwitterBot, you first create a Twitter account and register a Twitter application.  You then allow that application access to your account, and store the access tokens within your TwitterBot’s configuration.  This is the exact technique the KitchenPC TwitterBot uses and the best way I’ve found to do the job.

More Cool Things

I wasn’t done there!  Since recipe names and permalinks can make for rather long tweets (perhaps over the 140-character limit), I decided to use bit.ly to shorten the URLs.  Twitter now automatically shortens URLs you post via its web interface, but URLs posted via the API did not get shortened when I tried (though perhaps they are if the message is too long?)  Either way, I thought it would be fun to tie into bit.ly’s API and have them shortened automatically.  This was pretty darned easy to do.

First, you’ll need a bit.ly account.  You can get one by going to http://bit.ly/a/sign_up and filling out the form.  Next, go to http://bit.ly/a/your_api_key to get your API key.  Store this key in your .config file as well!  I’ll refer to this key as BITLY_APIKEY.

You can access bit.ly’s URL shortening service using simple REST calls.  There might be some nifty API wrapper somewhere, but I decided to just use the WebRequest class to talk to bit.ly directly.

using System.IO;
using System.Net;
using System.Web; // HttpUtility; requires a reference to System.Web.dll

string recipeLink = "http://www.kitchenpc.com/Recipes/YummyCake.html";
string url = String.Format("http://api.bit.ly/v3/shorten?login=kitchenpc&apiKey={0}&longUrl={1}&format=txt",
    BITLY_APIKEY, HttpUtility.UrlEncode(recipeLink));

string shortUrl;
WebRequest webRequest = WebRequest.Create(url);
// Dispose the response and reader so the underlying connection is released
using (WebResponse webResponse = webRequest.GetResponse())
using (StreamReader reader = new StreamReader(webResponse.GetResponseStream()))
{
    // Trim any trailing whitespace/newline around the bare URL
    shortUrl = reader.ReadToEnd().Trim();
}

This code creates an HTTP GET request to api.bit.ly and reads the response into a string called shortUrl.  The URL contains a login (the username you use to log in to bit.ly), your bit.ly API key, the long URL you want to shorten (safely escaped, or bit.ly will error out), and the format you want the result in (you can also get it as JSON or XML if you’d like).

The response will simply contain something like http://bit.ly/x12345.

This code can be combined with the Twitter code to first shorten URLs so they nicely fit within a Twitter status update.  It’s possible that the Twitter APIs might provide this functionality using their own URL shortener, but if they do, this doesn’t appear to be exposed through Twitterizer.
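One way to combine the two: shorten the link first, then trim the recipe name so the whole status fits. This sketch assumes the “Name :: link” format from the earlier example and the 140-character limit. The method name is hypothetical, and it counts .NET chars, which only approximates how Twitter counted characters.

```csharp
using System;

public static class TweetComposer
{
    public const int MaxLength = 140;
    const string Separator = " :: ";

    // Builds "Recipe Name :: http://bit.ly/x12345", truncating the name with
    // "..." when needed so the link always survives intact.
    public static string Compose(string recipeName, string shortUrl)
    {
        int room = MaxLength - Separator.Length - shortUrl.Length;
        if (room <= 3)
            throw new ArgumentException("URL too long to fit in a tweet.");
        string name = recipeName.Length <= room
            ? recipeName
            : recipeName.Substring(0, room - 3).TrimEnd() + "...";
        return name + Separator + shortUrl;
    }
}
```

The result can be passed straight to TwitterStatus.Update in place of the hard-coded status string.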

So there you have it

My TwitterBot has been online for about 24 hours now and has been happily posting new recipes to the @KPCRecipes Twitter feed.  What started out as an excuse to learn something new has actually been met with some success.  I’ve done almost no advertising (just a quick post on @KitchenPC and the KitchenPC Facebook page) but have already gotten a handful of followers, including a couple of restaurants!  Apparently, there are people who love recipe feeds, and this provides yet another way for users to interact with my site.  I’ll get traffic from people clicking on the links and through retweets, and it gives my site a new face.  Another advantage of using bit.ly to shorten URLs is that bit.ly provides data and graphs about clicks on each of these URLs, so I can see how popular these tweets actually are.

Thanks for reading!

Show me the content!

As I mentioned in my last post (which was so long you’re probably still reading it), a key to both SEO and keeping users coming back for more is having lots of content that people are interested in.  After talking one-on-one with some friends who were excited about the concept of the site yet didn’t actually use it, I confirmed that this was the single item needing the most immediate attention.

KitchenPC will most likely iterate and become one of many things depending on the protracted and painful process of market validation, but one thing is certain: as long as the mission of KitchenPC is to be a recipe website, having lots of recipes is kinda important.  I’m also placing a huge bet that a fully normalized, relational representation of recipes is not only something that hasn’t been successfully done before, but something that, if done correctly, can unlock features and revenue models that aren’t possible with traditional recipe websites.

I’ve determined three possible ways to get content on one’s site.  I’ll discuss the pros and cons of each way, and what I’ve learned from trying.

Crowd Sourcing – Users will just enter content, right?

Yeah, right.  Crowd sourcing is the technique that led to the success of Internet megaliths such as IMDB and Wikipedia; however, it is a strategy that, in my opinion, only helps maintain momentum once you have it.  In other words, only once you have a massive amount of content that could draw a multi-million eyeball user base can crowd sourcing be used to achieve consistent, reliable and up-to-date data.  There have been many studies on the “Wiki” phenomenon, the hypothesis that when you allow a crowd to maintain a large set of data that anyone can modify, the “good” data will win out against the “bad” data.  I’m applying this strategy to my own website, allowing anyone to edit any recipe.  However, when a website is in its infancy, crowd sourcing is most likely not a viable approach for generating initial user content.  People don’t just magically appear to do work for you, although they do love to be part of a larger cause where their work will be seen by millions.  If KitchenPC takes off, I think the “quality” of my recipes will improve due to crowd-sourced efforts, but from what I’ve seen so far, almost no one has decided to enter a recipe voluntarily on the website.  It could be that they don’t because it’s a complete pain in the ass right now, but that’s a subject for another post.

Just pay people!

This is a great approach if you have deep pockets or significant funding.  I’m sure if I had a $50,000 angel investment, I could hire a team of skilled workers to transcribe recipes by the thousands, maintain the ingredient database, and build a large database of awesome recipes.  However, I have a pretty limited budget, as I’m bootstrapping this company from my savings account.

My initial approach was a very naive one.  I used the website getacoder.com to post a contract for a single data entry person to type in 10,000 recipes.  As those familiar with the site know, the majority of bidders are from developing countries and will work extremely cheaply (by U.S. standards).  However, on this particular site I’ve found most bidders just copy and paste the same canned responses into their bids, don’t really investigate the project in detail, and underbid just to get their foot in the door.  You’re lucky to get someone to even start on the project, let alone finish it.  I learned pretty quickly that I had to break the project up into smaller chunks of work, such as 1,000 recipes.  Unfortunately, the bids for 1,000 recipes are not 1/10th of the price of 10,000 recipes.  They’re actually about the same.  I hired about five people to enter 1,000 recipes each, and three of them never started or responded to any emails once I approved their bids.  One of them entered around 290 and then quit.  The other was up to around 480 after three months of work.  She would disappear for days at a time and then get back to work.  Right now, I haven’t heard from her in weeks, despite sending several emails.

Another frustrating thing about getacoder.com is you have to put the money into an escrow account immediately, and when the coder fails to deliver, getting your money back can take weeks or months.  I’ve found this company to be somewhat shady and unpleasant to deal with, and several people consider the company a giant fraud.

Recently, I’ve moved my outsourcing efforts to vWorker.com which is so far proving to be much better.  Their site is faster and more responsive with a nicer UI and better tools.  The feature that really sold me is the ability to accept multiple bidders on a single project.  I posted a project to enter 1,000 recipes, and accepted my favorite 20 bidders, which “spawned” 20 new projects automatically that I can manage and escrow separately.  Since my goal is to get 10,000 recipes on the site within the near future, I decided hiring 20 workers would be a good initial number to try.  This requires thousands of dollars in escrowed cash, but I fully expect to get most of that money back as most workers will not finish this task.

Out of the 20 workers, five quit within the first few days.  vWorker allows workers to quit within 24 hours without suffering a bad review or having to go through mediation.  Four workers I’ve escrowed money for have not, after over a week, responded, done any work, or even created accounts on KitchenPC.  Two workers have created an account but have entered zero recipes.  I believe one of those two has been trying to enter recipes, but insists on pasting in ingredients, so all of his recipes end up in the “manual approval” queue and are not published on the site.  Nine of the twenty are indeed entering recipes, which is far better than I had predicted.  The top worker has already entered 203, which is fantastic!  The other workers have entered 88, 69, 40, 37, 21, 17, 7 and 7.  Luckily, about five of these workers are doing great work, entering accurate recipes, and even submitting pictures for their recipes.  I’m fairly confident that at least five of these workers will finish their 1,000-recipe requirement, and hopefully some will agree to enter an additional 1,000.  I’ve found most of these people really want to do good work, and will learn if you spend the time to correct their mistakes and coach them.  However, doing this has been my full-time job for the last several days!

I’ve been trying a few motivational techniques as well to inspire these workers to do great work.  First, I email the entire group daily and post the current scores.  This allows people to see where they place in the group, and they get excited if they’re near the top.  If one worker has only posted a couple recipes and someone else has posted dozens, they feel embarrassed and will work extra hard that night.

vWorker also allows me to offer “instant bonuses”, which are amounts of cash immediately placed in the worker’s account.  During the first few days, I selected one worker who was doing a particularly amazing job, gave him an extra $50 as a bonus, and congratulated him in the daily score email so that others would learn of his accomplishment and the reward.

A couple of days ago, I decided to hold a contest to see who could enter the best “paella” recipe (to appease my friend who complained about the lack of this Spanish dish on my site), offering the winner a $10 bonus.  Over 20 paella recipes have already been submitted, most with pictures and very carefully entered methods.  I plan to hold a few more of these “contests” so that I can fill in recipes for missing tags and round out the database where I see various cuisines underrepresented.

Automated Data Entry

The third and final approach to content generation is to automatically import content from another source, or to become a portal to existing content on the Internet while adding value through aggregation.  There are tons of recipe sites that already do this, simply consolidating recipes from other sources on the Internet.

I’ve worked thus far under the assumption that automated data entry is impossible.  Importing recipes that simply have to be human-readable would be incredibly easy, but KitchenPC has to index raw metadata, allowing it to understand the inner workings of the recipe and how it relates to ingredients the user has to shop for, various pantry amounts, and other recipes that use similar ingredients.  I believe fully automated parsing of recipes is technically possible, but doing it accurately is not something I’ve been able to accomplish yet, or probably will be able to for years.

For these reasons, I’ve previously dismissed this idea as a viable approach to initial content generation.  However, after months of frustration getting recipes manually entered and still only having barely over a thousand recipes, I’ve decided that data entry automation is worth checking into more.  I’ve decided a compromise could be made, and maybe I can partially automate the importing of recipes while isolating out just the part that computers can’t do.

The first thing I did was obtain a collection of about 8,000 recipes in XML format.  I chose this set of recipes since it was already partially normalized.  The amounts and units were in different XML tags and could be extracted easily.  However, the ingredient name and form are still highly variable, such as “packed brown sugar” or “fillet of salmon” or “your favorite fruit”.  The core of the problem would be mapping the ingredient descriptions to valid KitchenPC ingredient entities.

I created a list of the distinct ingredients across the 8,000 recipes, ignoring differences in spacing and case, which resulted in about 12,000 unique ingredients across the database.  I uploaded this data to Amazon Mechanical Turk, a website that allows you to create small, repetitive jobs for humans to do.  This allowed me to get hundreds of people to map these ingredients to a list of “known KitchenPC ingredients” and get somewhat accurate results.
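The post doesn't describe the XML schema, so the sketch below assumes a hypothetical layout where each `<ingredient>` element has `<amount>`, `<unit>` and `<name>` children. It shows the two steps described above: pulling out the name text, then collapsing case and whitespace differences so each distinct description only gets sent to Mechanical Turk once.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe XML; the real collection's tag names are not known.
SAMPLE = """<recipes>
  <recipe title="Glazed Salmon">
    <ingredient><amount>2</amount><unit>lb</unit><name>Fillet of  Salmon</name></ingredient>
    <ingredient><amount>1/4</amount><unit>cup</unit><name>packed brown sugar</name></ingredient>
  </recipe>
  <recipe title="Brown Sugar Cookies">
    <ingredient><amount>1</amount><unit>cup</unit><name>Packed Brown Sugar</name></ingredient>
  </recipe>
</recipes>"""


def normalize(name: str) -> str:
    """Lowercase and collapse runs of whitespace ('ignoring spacing and case')."""
    return " ".join(name.lower().split())


def distinct_ingredients(xml_text: str) -> dict:
    """Map each normalized ingredient description to the number of times it appears."""
    root = ET.fromstring(xml_text)
    counts = {}
    for ing in root.iter("ingredient"):
        name = normalize(ing.findtext("name", default=""))
        if name:
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    for name, n in sorted(distinct_ingredients(SAMPLE).items()):
        print(f"{n}x {name}")
```

Keeping the counts around is handy later: the most frequent descriptions are the ones worth mapping by hand first.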

Mechanical Turk Real-Time Progress Screen

I decided to pay 3 cents per match, and each match took the average worker about 40 seconds.  Hundreds of people worked on this problem overnight, and the set was fully matched in around 15 hours.  The results, however, were incredibly disappointing.
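The incentives are easier to see with the numbers worked out.  This uses only the figures given above; Amazon's own commission is deliberately left out, so the payout is what workers receive, not the total bill.

```python
# Figures from the post: ~12,000 distinct ingredients, $0.03 per match,
# ~40 seconds per match. Amazon's commission is not modeled here.
MATCHES = 12_000
PAY_PER_MATCH = 0.03      # dollars per match
SECONDS_PER_MATCH = 40    # average time per match

worker_payout = MATCHES * PAY_PER_MATCH                  # dollars paid to workers
hourly_rate = PAY_PER_MATCH * 3600 / SECONDS_PER_MATCH   # effective dollars per hour
worker_hours = MATCHES * SECONDS_PER_MATCH / 3600        # total human effort

print(f"payout ${worker_payout:.2f}, ${hourly_rate:.2f}/hour, {worker_hours:.1f} worker-hours")
```

At roughly $2.70 an hour for careful work, the only way to earn more is to answer faster, which is exactly what happened next.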

The root of the problem is that when workers are paid per match, the incentive is to maximize volume, not data quality.  While there were thousands of perfect matches, the dataset was littered with randomized answers.  For example, “chicken breasts” would get matched to “Baby Ruth Candy Bar”.  Several hundred ingredients got matched to Cinnamon Toast Crunch cereal.  We found two workers who matched over 1,000 items each to the first item on the list, which leads me to believe that people have written automated scripts that accept these jobs and submit random answers.  Even those who tried were often lazy, matching things like “white rice” to “rice vinegar” because they’d just pick the first match containing the main word.

I cringed at the thought of weeding through 12,000 results by hand to remove the bogus entries, and was about to chalk it up to experience and just pay everyone anyway ($400 out the window!)  The thought of paying all these idiots that just wanted to game the system for a quick buck really made me ill, but the idea of bulk rejecting every answer would be unfair to the majority of workers, who did put in valid answers.  There had to be a way to somehow clean up this data set using statistics and assumptions about human behavior; I really wish I had Steven Levitt on speed-dial, he’d have loved this type of problem!

Luckily, a friend of mine offered to lend her Excel experience to devise a way to sniff out the bad answers using some creative data pivoting and grouping.  Mainly, she looked for ingredients that got “picked” more often than usual, and for the users who submitted the most work.  Going through about 600 users was considerably easier than going through 12,000 matches.  We could look at one worker at a time, see all of their answers at once, and decide within seconds whether they were mostly bogus or the worker was actually trying.  For each user, we either bulk rejected or bulk approved all of their answers.  My friend spent around 4 or 5 hours working on this spreadsheet and emailed the results back to me with the bad answers rejected.  I definitely owe her dinner for this one!
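The same two signals (lots of work, and answers piling up on one target) can be computed directly instead of eyeballed in pivot tables.  A minimal sketch, assuming each answer is a (worker, source ingredient, chosen ingredient) tuple; the thresholds are illustrative guesses rather than anything we actually used, and flagged workers would still deserve a quick human look before a bulk reject.

```python
from collections import Counter, defaultdict

# Illustrative thresholds only; not the values used in the real cleanup.
MIN_ANSWERS = 50       # only auto-flag workers who submitted a lot of HITs
MAX_TOP_SHARE = 0.25   # more than 25% of answers to one target looks like spam


def worker_profiles(answers):
    """Per worker: (total answers, most-picked target, share of answers going to it)."""
    by_worker = defaultdict(Counter)
    for worker, _source, target in answers:
        by_worker[worker][target] += 1
    profiles = {}
    for worker, targets in by_worker.items():
        total = sum(targets.values())
        top_target, top_count = targets.most_common(1)[0]
        profiles[worker] = (total, top_target, top_count / total)
    return profiles


def suspicious_workers(answers):
    """Workers whose volume and answer concentration suggest bulk or scripted submissions."""
    return {
        worker
        for worker, (total, _top, share) in worker_profiles(answers).items()
        if total >= MIN_ANSWERS and share > MAX_TOP_SHARE
    }
```

This only catches the "everything goes to item #1" style of cheating; lazy-but-honest mistakes like "white rice" to "rice vinegar" still need a human eye.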

I also have to hand it to Mechanical Turk for a pretty awesome user interface.  I could download the results as a CSV file, open it with Excel and put x’s in the approve or reject column, and upload the file back to Amazon to process.

The problem with bulk approving or rejecting by user is that it’s unfair to workers who did a bad job at matching but did manage to get a few right answers.  Since we either paid for all of a worker’s answers or none of them, this was the compromise we had to make.  It did result in several nasty emails from these workers wondering why all of their results were rejected.
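Once the per-worker decision is made, filling in the results file is easy to script rather than typing x's by hand.  A minimal sketch, assuming the downloaded CSV has `WorkerId`, `Approve` and `Reject` columns and that an "x" marks the decision, as described earlier; the other column names in the test data are hypothetical.

```python
import csv
import io


def mark_results(csv_text, rejected_workers, reject_mark="x"):
    """Fill Approve/Reject for every row, deciding all-or-nothing per worker.

    Assumes 'WorkerId', 'Approve' and 'Reject' columns exist in the results file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["WorkerId"] in rejected_workers:
            row["Approve"], row["Reject"] = "", reject_mark
        else:
            row["Approve"], row["Reject"] = "x", ""
        writer.writerow(row)
    return out.getvalue()
```

Because the decision key is the worker, not the individual answer, this bakes in exactly the all-or-nothing trade-off described above.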

I’ve learned some valuable lessons experimenting with Mechanical Turk.  First, while it’s a good tool, assume around 25% of the results you get from it are going to be completely bogus.  One way to work around this is to assign each HIT to two workers, and only accept the HIT if both random parties agree on the answer.  This, of course, means you’ll pay twice as much for the results.  Also, since KitchenPC matching is somewhat open to personal opinion (if a recipe called for apples, I have about five different varieties, so I instructed workers to just pick their favorite or the most common one), this could severely limit the chances of consensus.  Another approach is to “pre-screen” workers by only letting approved workers work on your HITs.  One way to pre-screen workers is to issue a test they must pass before they can work.  The test might contain a few “tough” matches, or simply test their culinary knowledge in general.  However, this would limit the pool of workers who could work on each batch, slowing down the delivery of the results.  I had about 600 people work on this set at once and it still took 15 hours.  Plus, scammers could still game the system by passing your test and then bulk submitting bogus answers anyway.
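The two-worker consensus idea can be sketched in a few lines.  The input format is hypothetical (a flat list of HIT id, worker id, answer); an answer is accepted only when at least two workers were assigned and all of them agreed, and everything else goes into a pile for review.

```python
from collections import defaultdict


def consensus(assignments):
    """Accept a HIT's answer only when every assigned worker gave the same answer.

    assignments: iterable of (hit_id, worker_id, answer).
    Returns (accepted, disputed): accepted maps hit_id -> agreed answer,
    disputed is the set of hit_ids that need another look.
    """
    answers = defaultdict(list)
    for hit_id, _worker, answer in assignments:
        answers[hit_id].append(answer)
    accepted, disputed = {}, set()
    for hit_id, given in answers.items():
        if len(given) >= 2 and len(set(given)) == 1:
            accepted[hit_id] = given[0]
        else:
            disputed.add(hit_id)
    return accepted, disputed
```

The apple problem shows up here directly: two honest workers picking different varieties land in `disputed`, so open-ended matches cost double and may still need a human.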

Any way you look at it, the results generated by Mechanical Turk cannot be trusted as an accurate mapping to bulk import thousands of recipes.

However, I believe the results I got from Mechanical Turk (around 9,000 approved mappings) will still be good for building a tool that makes importing recipes easier.  My idea is to import each recipe one at a time, but use the Turk data to select a “default mapping” for each ingredient.  This would save me from having to search a list to map every ingredient in every recipe, as I would only have to glance at the default choice and make sure it’s right.  If it’s not right, I would change it, and that choice would become the new default mapping.  Another approach would be to go through the top 1,000 or so most common ingredients in the set and map them by hand, and then bulk import any recipe that uses only these “blessed” ingredients.  Using this technique, I believe I could import a few hundred recipes per day, which is better than nothing.
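A minimal sketch of both ideas.  The class and function names are hypothetical: the approved Turk answers seed each default by majority vote, a reviewer's correction overwrites the default, and a recipe qualifies for bulk import only when every ingredient is in the hand-mapped "blessed" set.

```python
from collections import Counter, defaultdict


class DefaultMappings:
    """Hypothetical helper: default KitchenPC ingredient per raw description."""

    def __init__(self, turk_answers):
        # turk_answers: iterable of (raw_description, kitchenpc_ingredient).
        # The most frequently chosen ingredient becomes the default.
        votes = defaultdict(Counter)
        for raw, ingredient in turk_answers:
            votes[raw][ingredient] += 1
        self.defaults = {raw: c.most_common(1)[0][0] for raw, c in votes.items()}

    def suggest(self, raw):
        """Default to show the reviewer, or None if the Turk data never covered it."""
        return self.defaults.get(raw)

    def confirm(self, raw, chosen):
        """Record the reviewed choice; a correction becomes the new default."""
        self.defaults[raw] = chosen


def bulk_importable(recipe_ingredients, blessed):
    """True when every ingredient in the recipe is in the hand-mapped 'blessed' set."""
    return all(raw in blessed for raw in recipe_ingredients)
```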

In Closing…

I’ve learned that no one solution can be applied to these sorts of problems.  I think my quest to pull content into the website will be accomplished by a combination of many techniques in parallel, each with its own strengths and weaknesses.  I’m also not hugely worried about data quality, as I’m taking the approach that quantity is actually better than quality in this particular case.  Quality can be improved over time through editing of each recipe, and the recipes that are the most accurate and complete will “bubble” to the top of search results with higher ratings.

AllRecipes has several hundred thousand recipes, and most likely they have their fair share of total crap recipes too.  However, you never see them because they’re buried down on page 47 of your search results.

Hopefully, my experiences will help someone who’s also looking to generate initial content on their site.  Cheers!

The first few days…

Well, I thought I’d write a quick post about the first few days of KitchenPC Beta.  I’m exhausted, cranky, stressed out, on edge, sleep deprived, craving real food, and “paranoid” (I’ll explain in a bit) all at the same time.

First off, the good news.  Everyone who’s used KitchenPC loves it and has had tons of good things to say.  Admittedly, they are mostly my friends, and prying “honest” feedback out of a good friend is like getting a straight answer from a politician.  With that said, I think the initial impression people have of the site is great.  Whether it will start to show any signs of exponential growth remains to be seen.

Now a list of all the @!#$ that has gone wrong so far.

Bugs Preventing User Signups

Within a few hours of launching the site, I began to see errors in the log about crashes in the CreateUser method, which creates a new user account based on a Facebook logon.  The exception was a simple “Null reference exception”, and of course the .NET stack trace won’t tell you anything useful, like which object was null.

I started adding more and more logging to track down the issue, making sure I was checking every possible variable for a null value.  The most annoying thing was that this worked for some people (like me!) and not for others, so something about certain Facebook accounts caused the crash.  After adding the Facebook email to the trace logs, I noticed a friend of mine was attempting to create an account.  Luckily, he was on Facebook at the time, so I sent him an instant message and asked if he could be my guinea pig as I tracked down the issue.  After a bit more debugging and logging, I noticed the crash happened right after it tried to import the account’s “current location” data, which I was using as the default setting for the “Location” profile data on KitchenPC.  Turns out, I had missed a single period in code that said “if(user.location.current_city…”, and user.location was of course null.  My brain kept seeing “location” and “current_city” as one property name.  Ugh.  Checking for that fixed the problem, and I was ready to go.  I’m not sure exactly what causes this, but apparently applications can read this information on some Facebook accounts and not on others.  Yay for Facebook’s convoluted security model.
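The fix amounts to checking each level of the property path instead of assuming the parent object exists.  The original is .NET code; here is the same guard sketched in Python, with a dictionary shape assumed purely for illustration (it is not the exact Facebook API payload).

```python
from typing import Optional


def default_location(fb_user: dict) -> Optional[str]:
    """Return the user's current city, or None if Facebook didn't provide one.

    The dictionary shape mirrors the 'user.location.current_city' path from the
    post and is assumed for illustration only.
    """
    location = fb_user.get("location")   # absent or None when the app can't read it
    if not location:
        return None
    return location.get("current_city")  # the nested field can be missing too
```

Returning None lets account creation carry on with an empty "Location" setting rather than failing the whole signup.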

Meal Planner Engine Bugs

This doozy of a bug totally sucked all the life out of my Saturday night, however it was somewhat of a “fun” bug from a hacker perspective.  I had a few people complain about the meal planner not working for them.  It would either take too long and they’d give up, or the page load would just time out.  Either way, it was working fine for me when I tried.  I finally found a repro case that would cause the meal planner to lock up every time.  All I had to do was demand 7 recipes that will make use of 3 beets.

Now, I use bitmasks to store the allowed tags and the tags a particular ingredient can link to.  That way, I can just evaluate (AllowedTags & IngredientYouHave.LinkedTags), and if that’s nonzero, at least one recipe has that ingredient in that tag set.  This is an incredibly fast way to start narrowing down the recipes we can consider, and also to detect whether a query is impossible.
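A minimal sketch of the bitmask check using Python's `enum.IntFlag`.  The tag names and bit positions are invented for illustration, and each ingredient's linked tags are assumed to be the union of the tags of every recipe that uses it.

```python
from enum import IntFlag


class Tag(IntFlag):
    """Hypothetical tag set; KitchenPC's real tags and bit positions aren't given."""
    BREAKFAST = 1 << 0
    LUNCH = 1 << 1
    DINNER = 1 << 2
    DESSERT = 1 << 3
    VEGETARIAN = 1 << 4


def has_candidates(allowed_tags: Tag, linked_tags: Tag) -> bool:
    """True if at least one recipe using the ingredient carries an allowed tag."""
    return (allowed_tags & linked_tags) != 0


# Linked tags for an ingredient: union of the tags of every recipe using it.
beet_links = Tag.DINNER | Tag.VEGETARIAN
```

Note that a nonzero result proves at least one matching recipe exists, not how many, which is exactly the gap the next bug fell into.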

At first, I thought my bitmasks were off by one.  When the planner begins on a query, it checks these bitmasks to see whether there are any recipes in the database that meet your criteria, and if not, it throws an ImpossibleQueryException, which translates into a polite “No recipes found” error for the user.  I watched under the debugger as it tried to find a matching recipe but kept failing, and figured the query was impossible and my code wasn’t detecting that up front.  Soon, I noticed the query was indeed possible; however, there was only 1 matching “beet” recipe in the entire database, and it kept picking that one.  When trying to find a second beet recipe, it would just come back to the same one again, since the algorithm discards a recipe that’s already in the set and keeps looking for a new one.  With no new one to find, this caused an infinite loop.

So basically, if there are zero recipes that match your criteria, I handle that case up front.  If there are 3 recipes and you need 7, crash boom.  I was able to come up with a decent mechanism to catch this behavior for now, but I think the ideal solution will be to return the “partial” set to the user and say “Sorry, this was the best I could do.”
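A minimal sketch of the "return the partial set" idea.  This is not the actual meal planner, which also weighs ingredients and tags; it only illustrates the selection step: draw distinct recipes without replacement so the loop always terminates, and report whether the request was fully satisfied.

```python
import random


def plan_meals(candidates, needed, rng=random):
    """Pick up to `needed` distinct recipes from `candidates`.

    Drawing without replacement can't loop forever when fewer than `needed`
    distinct recipes exist; the boolean says whether the request was met.
    """
    pool = list(dict.fromkeys(candidates))   # de-duplicate, keep order
    picks = rng.sample(pool, min(needed, len(pool)))
    return picks, len(picks) == needed
```

When the flag comes back False, the UI can show the partial plan with a "this was the best I could do" message instead of hanging.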

Server Problems

The site was down for a couple of hours this afternoon due to an Apache issue.  I use Ubuntu Server as my front-end load balancer, running Apache and mod_proxy.  I have exceptions for the /scripts and /images directories, so the Apache server hosts all the static files and any dynamic page requests are routed to one of two Windows servers.  This is cool because I’m prepared for a big traffic spike, and I can also take one server out of rotation by commenting it out of a conf file, then mess with it or upgrade it.
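Apache handles this in configuration rather than code, so the following is only a toy Python model of the routing rules described above, not how mod_proxy works internally.  Requests under the static prefixes stay on the Ubuntu box and everything else rotates across the Windows servers.  The server names are hypothetical, and round-robin is an assumption, since the post doesn't say which balancing method is used.

```python
from itertools import cycle

# Served directly by Apache on the front end (the "exceptions" above).
STATIC_PREFIXES = ("/scripts/", "/images/")


class FrontEnd:
    """Toy model of the front end's routing rules."""

    def __init__(self, backends):
        # Taking a server out of rotation = leaving it out of this list,
        # the equivalent of commenting it out of the conf file.
        self._next = cycle(backends)

    def route(self, path: str) -> str:
        if path.startswith(STATIC_PREFIXES):
            return "apache"
        return next(self._next)
```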

For now, I also run PostgreSQL on this Unix server.  That won’t work out in the long term, but right now my database is only a couple of megs and Postgres requires around 10 megs of RAM to run, so it just wasn’t worth paying for another server instance for a tiny little database.  When the time comes, provisioning a new database server will be easy; I just set up Postgres, migrate the DB contents over (or use streaming replication to set up a warm standby), and then point the web servers at the new database box.  No DNS changes or anything.

However, due to a bug in the Postgres 9.0 installer (fixed in 9.0.2, thanks guys), there was a dependency on libuuid 1.6, which the installer didn’t install and which of course doesn’t ship with Ubuntu 10.04, so I had to build it myself.  Somehow during this process, libuuid.so.1 got linked to 1.6, which Postgres was quite happy about.  Apache, not so much.  However, Apache didn’t show any signs of discontent for two days, at which point it decided to crash and not restart.  Even more fun, it would attempt to restart, crash, then leave the process hung in memory.  I had something like 20 Apache processes going.

For this issue, I had to go bug my friend Brian (a Unix guru) to ssh in to the box and help figure that one out.  I was not in a good mood, as the production server was totally offline the entire time.  Ugh!  Brian finally worked it out and I totally owe him a beer next time I’m in Vegas.

IIS Being... well... IIS.

I’ve also been running into some quite annoying problems with IIS deciding it’s bored and no longer answering requests.  It’s as if IIS goes “Ooo look at the squirrel” and ignores the socket.  When this happens, the site just doesn’t pick up and eventually the browser times out.  I was tailing the Apache logs to make sure mod_proxy was indeed forwarding the requests to IIS, which it was.  However, IIS’s logs said nothing at all about the requests, as if they never happened.  Apache reported that the socket was closed unexpectedly, or some such thing.  This seems to happen after a few hours or maybe a day of uptime, and then I have to reset IIS to get it working again.

The workaround I’ve found for now was to write a jMeter (I love this program) script that loads the Login page every 60 seconds, which seems to keep IIS nice and busy and responding fast.  This thing’s been running for days now, and the problem doesn’t occur as long as it’s running.  However, I still need to get to the bottom of this nonsense.  I do sometimes miss Microsoft, where I could just track down the guy who wrote the thing that doesn’t work and get him to help out.
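The same keep-alive trick is simple enough to sketch in Python instead of jMeter.  This is a swapped-in tool, not the actual jMeter script, and any URL you pass it is up to you.  The HTTP call, sleep and log functions are parameters so the loop can be exercised without a network.

```python
import time
import urllib.request


def fetch(url: str, timeout: float = 30) -> int:
    """GET the page and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def keep_warm(url, interval=60, iterations=None, get=fetch, sleep=time.sleep, log=print):
    """Request `url` every `interval` seconds; iterations=None runs forever."""
    count = 0
    while iterations is None or count < iterations:
        started = time.monotonic()
        try:
            status = get(url)
            log(f"{url} -> {status} in {time.monotonic() - started:.2f}s")
        except OSError as exc:   # URLError, timeouts and refused connections
            log(f"{url} failed: {exc}")
        count += 1
        if iterations is None or count < iterations:
            sleep(interval)
```

Like the jMeter version, this treats the symptom; logging response times at least shows when IIS starts to drift before it stops answering.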

Advertising

I don’t really know what defines “success” for a product launch, but I also seem to be off to a bit of a slow start as far as user signups go.  I’m just now seeing a few signups from people I don’t know, which is great because it means people are telling their friends, or word is getting around on Facebook through “So and so likes KitchenPC (Website)” posts.  However, I’m still at only around 70 user accounts total, and the number doesn’t seem to be picking up very quickly.

This morning I emailed around 200 people from the survey, however these efforts might have been thwarted due to both the weekend and the hours of downtime due to libuuid problems.

I plan on putting a lot of focus into getting the word out this week; hopefully I won’t be slowed down by any more annoying server problems.

There’s this whole other level of stress that accompanies a launch that I never saw coming.  I thought that launching was a milestone, a point which represents a finality; where stress was relieved, not accrued.  It just seems that everything matters more, there’s this frantic rush to fix serious bugs or get servers back online, since people are now counting on the site being up and people who I don’t know are using it.  Every time I click on the link, I get paranoid that I’ll see a server error, or the recipe modeler will get stuck in another infinite loop, or Apache will be dead again or IIS will be looking at more woodland creatures.   I guess this is just life, hopefully I can start getting some traction going and start focusing on how to actually improve the product, rather than just treading water to keep the thing online and working.

Next, I’ll be blogging about my next priorities to move the business forward.  Stay tuned!

All systems ready for launch…

A couple nights ago, I did a dry-run of a single server deployment of KitchenPC onto the Rackspace cloud.  The main purpose was to figure out what kinds of issues I’d run into during the actual deployment, but I also took this opportunity to test the performance of Rackspace’s “CloudServers” product and see how the site “felt” on this platform.

Getting the site up and running on a Windows Server 2008 R2 64-bit instance was pretty easy, although I ran into some hassles with the new version of IIS and figuring out where all the settings were.  The site felt amazingly fast; any page would just “appear” instantly.  The meal planner code was also quite fast, at least the same speed as on the Dell PowerEdge server that KitchenPC called home during the alpha.

I spent a bit of time writing some scripted tests using jMeter.  These tests were pretty basic.  I ran jMeter on a separate machine on the Rackspace cloud, so as not to degrade the web server’s performance.  I decided to run the tests within the cloud (as opposed to from a remote location) since the goal is to test the actual server performance, not Rackspace’s network capabilities (I already assume they have obscene amounts of network bandwidth).

A single test would perform an HTTP GET and download the Login page, then POST the Login page back creating a new user account with a random email and password.  It would then follow the 302 redirect to the homepage, thus creating a user context.  At this point, it would issue AJAX calls to do the following:

  • Load five random recipes
  • Load the first and second pages of the news feed (similar to the Facebook wall)
  • Perform a search query with some hard coded parameters

Next, it would issue HTTP GET commands for the following:

  • Cookbook.html (A version of the search results page that returns everything in the user’s cookbook)
  • MealPlan.html (the page that allows the user to enter meal planner criteria)
  • Calendar.html (The user’s calendar)
  • Profile.html (The user’s account settings page)
  • Home.html (Back home again, figured it’d be nice to load this twice since it will be so common)
  • Search.html (Advanced recipe search page)
  • Pantry.html (The user’s pantry)

It would then add a recipe to the shopping list, add a recipe to the cookbook, and manipulate the calendar a bit.  Next, it would create a new recipe with a couple of basic ingredients and a random title.  Lastly, it would update the user’s account by changing a single setting.

I figure this is a decent simulation of a single user messing around with the site for a bit, and at least touches the main features.  I decided not to test the meal planner, because I already know that’s not gonna hold up too well under heavy server loads :)

I decided to simulate 100 users simultaneously running the above script, each running said steps 5 times in a row.  This would last several minutes, which would allow me to watch the CPU usage for both IIS and Postgres.
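jMeter's thread groups do this for real, but the shape of the test is easy to sketch as a rough Python stand-in (not the actual jMeter test plan).  Each step is any callable taking a user id; in a real run each would issue one of the HTTP requests listed above, which are omitted here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_user(user_id, iterations, steps):
    """One virtual user: run every step in order, `iterations` times, timing each."""
    timings, errors = [], 0
    for _ in range(iterations):
        for step in steps:
            started = time.perf_counter()
            try:
                step(user_id)
            except Exception:   # count any failure as an error, keep going
                errors += 1
            timings.append(time.perf_counter() - started)
    return timings, errors


def load_test(steps, users=100, iterations=5):
    """Run `users` virtual users concurrently; returns (requests, errors, mean seconds)."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(lambda u: run_user(u, iterations, steps), range(users)))
    timings = [t for ts, _ in results for t in ts]
    errors = sum(e for _, e in results)
    return len(timings), errors, mean(timings)
```

Threads are fine here because the steps spend their time waiting on the network, not on the CPU.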

Also, while the test was running, I logged on with my own web browser from my own Internet connection to see how the site “felt”.

Perf Testing Results

The results were extremely satisfying.  Out of the nearly 14,000 HTTP requests, there were zero errors and the average response time was around 2.3 seconds.  The total server CPU usage was pegged at 100% the entire time (which is what you’d want, as every server resource should be used to respond to requests as fast as possible) and IIS took about 60-70% of the CPU while Postgres idled along at around 10-20%.  IIS used around 110 megs of RAM, and Postgres never allocated more than 10 megs (I assume it just had my entire DB cached in memory the whole time.)

I poked around the site remotely from my home computer, and the site still felt nice and quick, even with 100 simulated users doing all sorts of stuff.  Now, I’m not really sure I could come up with an accurate estimate of how many “user accounts” a single server instance could handle, but if 1 out of 1,000 users were logged in at any one time, then 100 concurrent users means a single instance should be able to handle 100,000 user accounts.

Oh, one more thing to point out.  I did not test loading static content such as image files as they’ll be loaded from a CDN and also usually cached after the user logs on anyway.

Not counting bandwidth, running a Windows 2008 server instance with a gig of RAM costs about $60/month from Rackspace.

After all is said, I don’t see any major show stopping issues in terms of performance.  Well, unless 100 people want to use the meal planner all at once.  I do have some ideas on how to improve that scenario though.

Next, on to release!

Wheels go round and round

One of the major UI hurdles I’ve been trying to overcome (and, by far, the number one complaint I’ve received from alpha testers and recipe enterers) has been the inefficiency of entering recipe ingredients.  The ability to enter custom ingredients and forms was a major stepping stone toward overcoming said hurdle, but the fact remains that the cumbersome UI to select an ingredient, choose a form, type in a number, then select an available unit for that ingredient is just too clunky to deal with.  One of my data entry people up and quit the other day because she just wasn’t getting paid enough to deal with this nonsense.  I don’t blame her.  In terms of product feedback, her quitting was probably more valuable than the recipes she would have entered, as it finally convinced me this was a problem I had to deal with.  I took this as a message: “Mike, you can’t even pay people to use your product!”

While talking with a friend of mine over Skype and doing some screen sharing, I took notes as I watched her struggle with the recipe entry process as a first-time user.  Though finding an ingredient is super easy (I wrote a very slick auto-complete drop-down for this), it became clear that KitchenPC needs an easier way to express ingredient amounts.  Dealing with dropdowns, multiple text boxes, etc. is just a deal-breaker.  After giving it some thought, I decided I was approaching this UI paradigm completely backwards.  Once an ingredient is selected, the first step is to choose a form.  For example, when working with cheddar cheese, the form dropdown contains “By Weight”, “By Slice”, “shredded” and “diced”.  Once the user selects a form, the unit dropdown is populated with units acceptable to that form.  If the user chooses “By Weight”, weight units such as ounces and pounds appear.  If the user selects a volumetric form such as “shredded” or “diced”, units such as cup and tablespoon appear.  Since “Slice” is a unit itself, the user can only enter a numeric value when this form is selected, and the unit drop-down is disabled.

This seems reasonable at first, but it has a fundamental flaw: the interface flow is completely contrary (in fact, entirely inverted) to the standard human thought process.  A user will typically think of an ingredient usage using a common mental template, in this case an amount, a unit, and possibly a form that makes sense for that unit.  My existing user interface asks them to navigate this thought process backwards, starting at the end.  This results in the user wrangling the UI into representing their desired usage.  The user knows the amount, but they have to select a form before they are even allowed to enter an amount.  They know their desired unit, but now they have to dig through all the forms to find one that permits that unit type.  Since form names can be less than entirely descriptive, in some cases it might not be clear which unit types a certain form allows.  Not to mention, doing all this requires the use of a form dropdown, a numeric input box, and a unit dropdown.  Novice users will navigate this UI using the mouse, which means:

  1. Grab mouse, click the forms dropdown, select a form.
  2. Click on Amount text box.  Move hands back to keyboard.
  3. Type in an amount.
  4. Grab mouse, click on the unit dropdown, select a unit.

A more advanced user is aware of the tab button to navigate web pages.  However, most are not aware of the keyboard shortcuts when dealing with a dropdown list.  Firefox is, of course, superior to IE in this respect as the up and down keys will intuitively open the list.  However, most users will still be tabbing, mousing, and typing all over the place.

If the unit type they want is not available for the form they “guessed”, the user gets frustrated and yells at the screen.  If they haven’t given up by that point, they might mess around with the forms (a concept that really doesn’t make any sense to the average user in the first place) trying to find one that allows their desired unit.

This is crap.

A user interface that mimics how humans actually think would start with the amount.  Regardless of the unit or form, I always need an amount, right?  So why not start with this.  The second thought would be the unit type.  Once an amount and unit type have been determined, the user interface should be smart enough to ask for a form if and only if the provided information is ambiguous across multiple form types (rare for most ingredients).  For example, “4 slices” and “2 pounds” are perfectly acceptable usages for cheese.  When expressed this way, the form can be determined automatically.  If the user says “slices”, they mean the “By Slice” form.  If they say “pounds”, they mean the “By Weight” form.  However, if the user enters “4 cups”, this expression is ambiguous between the “shredded” and “diced” form types.  Only in this case should the user be bothered with the form of the ingredient, and only if and when the information is needed.
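The form-inference rule can be sketched with a small lookup table.  The cheddar cheese forms come from the example in this post, but the data layout and unit names are assumptions, since KitchenPC's actual ingredient model isn't shown.  A unit that matches exactly one form resolves silently, several matches mean the user gets asked, and no match falls through to the custom-form path.

```python
# Hypothetical data for cheddar cheese: each form lists the unit types it accepts.
CHEDDAR_FORMS = {
    "By Weight": {"ounce", "pound"},
    "By Slice": {"slice"},
    "shredded": {"cup", "tablespoon"},
    "diced": {"cup", "tablespoon"},
}


def forms_for_unit(forms, unit):
    """All forms that accept the given unit, in declaration order."""
    return [name for name, units in forms.items() if unit in units]


def resolve_form(forms, unit):
    """Return (form, candidates): form is set only when exactly one form fits."""
    candidates = forms_for_unit(forms, unit)
    return (candidates[0] if len(candidates) == 1 else None), candidates
```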

The paradigm can be thought of as an inverted tree, slowly narrowing down choices until an acceptable ingredient usage is obtained.  The user is only pestered when more information is needed.  I developed a set of rules to follow when designing this new UI:

  1. Don’t bother the user, just let them type.  Don’t change anything automatically or confine them to only being able to type certain things.
  2. It should be possible to enter an ingredient usage in a single textbox.  Users should not have to tab around.
  3. Help should be provided passively, to “steer” the user along as they type.

The third rule is one I really like, especially this concept of steering the user into describing a valid ingredient usage through natural language.  I thought of a steering wheel, and decided the UI should have a rounded “spinny” feel, like a wheel.  As the user types, a wheel of choices should pop up and rotate as the user nears a valid choice.  Unlike a dropdown, the wheel can describe a set of choices for a certain phrase, such as the unit name or form name.  Imagine something like this:

  1. User types in “1 1/2” and presses space to type in a unit.
  2. A wheel of units available (across all forms) appears.
  3. When the user types “tabl” the wheel spins around to “tablespoons” but does not auto-complete for them (rule #1)
  4. The user can optionally press ENTER and the rest of the word is filled out for them.
  5. At this point, “1 1/2 Tablespoons” is entered, however this measurement is ambiguous.  Thus, a new wheel pops up with forms that allow volumetric measurements.
  6. The user sees the wheel with “diced” and “shredded” and is allowed to type one or select one with the up/down keys.
  7. When a form is selected, the usage can now be parsed in KitchenPC “meta” language and stored in the database, thus a nice green checkmark is displayed to indicate KitchenPC likes their entry.
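Steps 3 and 4 above boil down to a prefix lookup that highlights a choice without touching the textbox, plus a separate action for ENTER.  A minimal sketch with a hypothetical, alphabetically ordered unit list; the real wheel's ordering and matching rules aren't described here.

```python
# Hypothetical unit list; the real wheel shows units available across all forms.
UNITS = ["cup", "ounce", "pinch", "pound", "slice", "tablespoon", "teaspoon"]


def spin_wheel(typed, choices):
    """Return the choice the wheel should rotate to, or None.

    Nothing is written into the textbox here (rule #1); the caller only
    highlights the returned choice.
    """
    typed = typed.strip().lower()
    if not typed:
        return None
    return next((c for c in choices if c.startswith(typed)), None)


def accept(text, highlighted):
    """What ENTER does: replace the partially typed last word with the highlighted choice."""
    head, _, _partial = text.rpartition(" ")
    return f"{head} {highlighted}".strip()
```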

I like this UI because it’s a single textbox and allows the user to express any ingredient usage across any form without taking their hands off the keyboard.  In fact, many times the amount can just be pasted into the textbox, especially since I can parse aliases of unit types (such as “1 oz”, “2 c”, “3 lbs”, etc) and map them to the correct unit type internally.  It also fits in with my existing custom form ability.  If what the user types cannot be parsed, the amount will be used as a custom form and be queued for manual processing.
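Parsing pasted amounts like "1 oz" or "3 lbs" comes down to reading a mixed number and then looking the unit up in an alias table.  A minimal sketch using Python's `fractions.Fraction` so "1 1/2" is stored exactly; the alias table is a small hypothetical sample, and this sketch always expects a unit after the amount.

```python
from fractions import Fraction

# Hypothetical alias table: what users type -> canonical unit name.
UNIT_ALIASES = {
    "oz": "ounce", "ounce": "ounce", "ounces": "ounce",
    "lb": "pound", "lbs": "pound", "pound": "pound", "pounds": "pound",
    "c": "cup", "cup": "cup", "cups": "cup",
    "tbsp": "tablespoon", "tablespoon": "tablespoon", "tablespoons": "tablespoon",
    "tsp": "teaspoon", "teaspoon": "teaspoon", "teaspoons": "teaspoon",
    "slice": "slice", "slices": "slice",
}


def parse_usage(text):
    """Parse '<amount> <unit>' such as '1 1/2 tbsp' or '3 lbs'.

    Returns (Fraction amount, canonical unit), or None when the text can't be
    parsed, in which case it would go to the custom-form review queue.
    """
    tokens = text.lower().split()
    amount, count = Fraction(0), 0
    # Up to two numeric tokens, so "1 1/2" becomes 3/2.
    while count < min(2, len(tokens)):
        try:
            amount += Fraction(tokens[count])
        except (ValueError, ZeroDivisionError):
            break
        count += 1
    if count == 0 or count == len(tokens):
        return None   # no amount, or no unit after it
    unit = UNIT_ALIASES.get(" ".join(tokens[count:]).rstrip("."))
    return (amount, unit) if unit else None
```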

They say a picture is worth 1,000 words, but I’ve decided to go a step further and attach a video of tonight’s prototype.  I’m hoping this technology can be integrated into the main KitchenPC UI fairly easily and make it into the initial public beta.  I think it’s a huge step forward in usability, and perhaps will lessen the number of my users who yell curse words at the screen.  At the very least, it’s a start.

Up all night… again.


As Saturday night becomes early Sunday morning, I find myself frantically tying up loose ends in order to prepare for the “UI integration” phase of KitchenPC, which marks the final coding phase before the private beta release.  I spent the evening tracking down resolutions to a few of those elusive bugs that I’ve been putting off for far too long now.  The ones of interest are:

HTML special codes were being double-escaped

This seems like it should have been easy to fix.  For HTML purification, I was using an open-source project called AntiSamy.  When I started, this project was the only one I could find that offered a viable white-list approach to purifying HTML and preventing XSS attacks.  There were a few major features missing (like validation of inline CSS styles), but I was able to set up TinyMCE so that the HTML it produced didn’t rely on those features.  However, for some reason certain HTML character codes would get escaped, regardless of what the allow-list said.  HTML text that included &nbsp; would be allowed, but anything else, such as &rarr;, would get turned into the super annoying &amp;rarr;.  WTH?  After putting this off long enough, I decided to track down the issue, only to find that the entire project is now pretty much abandoned.  Before I could dig up the source and try to fix this myself, I ran across the recently updated Microsoft AntiXSS Library.  This library, previously part of the Exchange Server SDK (which would actually reference Exchange Server DLLs that I don’t have!), has since been released as an open-source, free-to-use library for sanitizing HTML and HTML fragments.  It took all of about 5 minutes to switch my code over, and it works absolutely beautifully!  The one problem was that I had to go through several dozen existing recipes and clean them up by hand.
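To make the symptom concrete, the following Python snippet shows how escaping text that already contains entity references turns `&rarr;` into `&amp;rarr;`. It is an illustration only and is unrelated to the internals of AntiSamy or AntiXSS. It also shows one generic way to avoid the problem for plain-text content: decode existing entities first, then escape exactly once. The helper name `escape_text_once` is hypothetical.

```python
import html

fragment = "Preheat oven &rarr; 350&deg;F"

# Escaping a fragment that already contains entity references escapes the
# "&" a second time: "&rarr;" becomes "&amp;rarr;", which the browser then
# shows literally instead of rendering an arrow.
double_escaped = html.escape(fragment)  # "Preheat oven &amp;rarr; 350&amp;deg;F"


def escape_text_once(text):
    """Decode existing entities, then escape the plain text exactly once.

    Characters such as the arrow come back as literal Unicode characters,
    which is fine on a UTF-8 page; "<", ">" and "&" are still escaped.
    """
    return html.escape(html.unescape(text), quote=False)
```

This only applies to text nodes; sanitizing whole HTML fragments (keeping allowed tags, stripping the rest) is the job of a white-list sanitizer.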

Archive support for recipes

Now that anyone can edit any recipe, I got scared and decided I should probably archive recipe versions.  Luckily, this was a pretty easy feature to implement.  It required a new table with a recipe id and a text field.  Every time a recipe is updated, I serialize the old version of the recipe as text and stick it in the table.  This will eventually lead to an ability to see old versions of the recipe, highlight differences, and roll back to previous versions similar to most Wiki software out there.  For now, it’s just a safeguard so that I will have some way (albeit hacky) to restore recipes that get corrupted due to accidental user interference or malicious play.  This feature makes me feel a bit better about allowing users to edit any recipe they wish.
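The archive described above (one row per old version, holding a recipe id and a serialized copy) can be sketched with SQLite. The table and column names, the JSON serialization and the function names are assumptions; the post only specifies a recipe id and a text field. Archiving and updating happen in one transaction so a failed update doesn’t leave a stray archive row behind.

```python
import json
import sqlite3


def create_schema(conn):
    # Hypothetical schema: recipes are stored as JSON text for simplicity.
    conn.execute("CREATE TABLE recipes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE recipe_archive ("
        " archive_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " recipe_id INTEGER NOT NULL,"
        " old_version TEXT NOT NULL)"
    )


def update_recipe(conn, recipe_id, new_recipe):
    """Copy the current version into the archive, then overwrite it."""
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute(
            "SELECT body FROM recipes WHERE id = ?", (recipe_id,)
        ).fetchone()
        if row is None:
            raise KeyError(recipe_id)
        conn.execute(
            "INSERT INTO recipe_archive (recipe_id, old_version) VALUES (?, ?)",
            (recipe_id, row[0]),
        )
        conn.execute(
            "UPDATE recipes SET body = ? WHERE id = ?",
            (json.dumps(new_recipe), recipe_id),
        )


def history(conn, recipe_id):
    """Old versions of a recipe, oldest first."""
    rows = conn.execute(
        "SELECT old_version FROM recipe_archive WHERE recipe_id = ? ORDER BY archive_id",
        (recipe_id,),
    )
    return [json.loads(r[0]) for r in rows]
```

Keeping versions ordered by an auto-incrementing id is what a later “view old versions” or “roll back” feature would need.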

Fixed invalid conversion data in database

This one has been annoying me for quite some time.  Sometimes when the user created a shopping list from a recipe, the web service would throw an exception because the form the ingredient was used in could not be converted into the form it’s purchased in.  There was really no good way to enforce this integrity through database schema design, so these errors had to be tracked down manually.  Fortunately, I found a good way to find them all at once, and keep the database clean.  For recipe modeling, I keep certain recipe and ingredient “relationships” in memory in a giant graph.  This graph stores ingredients in their purchased form, but reads every recipe usage up front when the site app loads.  I’ve modified this code to log an error if it runs into any conversion problems.  The code it calls is basically the same code path as adding a recipe to the shopping list.  Furthermore, I’ve been doing performance tests on a database with 10,000 “randomized” recipes.  These tests have exposed invalid conversion data on ingredients that were not even being used on the production DB.  I’ve managed to track these all down and fix them, so there shouldn’t be any such annoying bugs for beta.
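A simplified version of this load-time check might look like the sketch below. It is hypothetical: instead of the real graph and shopping-list conversion code, it uses a made-up table of which measurement kinds each ingredient’s purchased form supports, and it logs every recipe usage that can’t be converted.

```python
import logging

log = logging.getLogger("kitchenpc.graph")  # hypothetical logger name

# Hypothetical data: measurement kinds each ingredient's purchased form
# can be converted from, and the kind of each unit.
PURCHASE_CONVERSIONS = {
    "goat cheese": {"weight", "count"},
    "milk": {"volume"},
}
UNIT_KIND = {"cups": "volume", "grams": "weight", "slices": "count"}


def validate_usages(usages):
    """Check every (recipe_id, ingredient, unit) usage at startup.

    Logs an error for each usage that can't be converted into the purchased
    form, and returns the offending usages so they can be fixed in bulk.
    """
    bad = []
    for recipe_id, ingredient, unit in usages:
        kinds = PURCHASE_CONVERSIONS.get(ingredient)
        kind = UNIT_KIND.get(unit)
        if kinds is None or kind not in kinds:
            log.error("recipe %s: cannot convert %s of %r to its purchased form",
                      recipe_id, unit, ingredient)
            bad.append((recipe_id, ingredient, unit))
    return bad
```

Running the check over every usage once, rather than waiting for a shopping-list request to fail, is what turns scattered runtime exceptions into a single list of data to clean up.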

Misc bugs that don’t depend on the new UI

I’ve also fixed a few smaller bugs, made some KitchenPC Mobile improvements, and added a few finishing touches here and there.  The bugs I have left (and there are only 6 of them at this point) will all have to wait until the new UI integration.  They are all marked as “Minor” bugs.

KitchenPC Modeling Engine is done!

This is also huge news.  The modeler is now code-complete and running lightning fast.  Generating a model with 10,000 recipes in the database takes about 2 seconds, and the results are excellent.  The overall algorithm design was pretty basic (I had that working in one night), but the hard part turned out to be defining what “success” really means.  Computers will do exactly what you tell them, so it really helps if you know what you want them to do.  A lot of variables come into play here, and each one assumes some knowledge of what the user really wants out of this feature.  For example, would a user prefer to use more of their pantry items if this also meant they had to buy more ingredients at the store?  Or are users incredibly cheap and want to buy as few new things as possible, even if they won’t get complete usage out of all their pantry items?  Someday, I believe the meal planner will have an “Advanced” section to allow the user to control these variables, but for beta I have to make a few assumptions about what the user really wants.  I’m really glad I didn’t end up contracting this work out to someone who doesn’t know or care about my users.
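The post doesn’t describe the modeler’s algorithm, but the trade-off it raises (more pantry usage versus fewer purchases) can be illustrated with a weighted score. The `Plan` fields, the `score` function and its weights are hypothetical; they only show how the same candidate plans rank differently depending on which assumption about the user is baked in, which is the kind of knob an “Advanced” section could expose.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A candidate meal plan (hypothetical fields)."""
    name: str
    pantry_items_used: int   # pantry items the plan would use up
    new_items_to_buy: int    # ingredients the user would have to buy


def score(plan, pantry_weight=1.0, purchase_weight=1.0):
    """Higher is better: reward pantry usage, penalize new purchases."""
    return (pantry_weight * plan.pantry_items_used
            - purchase_weight * plan.new_items_to_buy)


def best_plan(plans, **weights):
    return max(plans, key=lambda p: score(p, **weights))


plans = [Plan("use the pantry", 6, 5), Plan("buy little", 3, 1)]
print(best_plan(plans).name)                     # buy little
print(best_plan(plans, pantry_weight=2.0).name)  # use the pantry
```

With equal weights the frugal plan wins (1 versus 2), but doubling the pantry weight flips the result (7 versus 5).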

Waiting on the UI

On Friday, my Polish guys surprised me by finishing a day early on the final modifications to the prototypes (which are basically Photoshop mockups of each page).  I’ve signed off on all these changes, and “cutting” (the process of slicing up the bitmap images into HTML) will begin Monday.  I’m hoping the designers will be able to provide me with HTML as each page is completed, which will allow me to work in parallel with them.  I considered putting up a few “teaser” screen shots of the finalized KitchenPC UI, but decided to unleash the final design in all its glory when it goes online.

Final steps

I expect the UI integration to take a week or two, and there will no doubt be a few last-minute bugs to address.  However, the next big step is going to be deciding on a place for KitchenPC to live.  I’m still going back and forth between Amazon EC2 and Rackspace CloudServers, but in reality if I don’t like my decision, I can always move later (it’s just a beta).

I’d love to say the hard part is over, and it kind of is; getting a product you’re proud of up on the Internet is a huge step, and I really think people are going to enjoy what I’ve created.  I think it will be by far the best meal planning tool available on the Internet (sorry, Monki boy), and the meal modeling features are just something that hasn’t been done before.  I can’t think of anything I should have done differently or any major regrets with the overall design of the product.  It’s simply going to rock!  I’m more proud of this than any product I worked on at Microsoft during my 12-year career there.

After KitchenPC goes live online, I’ll begin shifting my focus to the business side of things.  This is foreign territory to me, but hugely exciting.  The first step, of course, is to actually find 1,000 beta users.  Rather than just picking 1,000 people off the streets of the Internet, I’d like to do something a bit more intelligent and target those 1,000 invites at people who will really use the product and deliver valuable feedback.  Invites will of course go out to the 280+ people who took the survey, and I’ll probably give anyone who asks a few invite codes, but I’ll be spending some time over the next week or so figuring out the best approach for finding this beta group.

During the beta, I’ll be looking at usage metrics, studying how the server performs under certain loads, and addressing any performance problems that would stand in the way of a wider deployment.  These numbers will also give me an idea of the costs associated with each user, which is a must for any business plan.  Luckily, my site isn’t hugely data intensive, so I expect PostgreSQL to quietly yawn as users explore the site.  This will also be my chance to see if I made any hugely wrong assumptions about the product, and consider the need for any major redesign.

I’ll also be trying to get a bit of media attention, perhaps from the likes of TechCrunch or whatnot.  Issuing a formal press release is probably a good idea, though timing is questionable on that one.  This blog will start to take a turn as well.  I’ll be talking more about future business plans for KitchenPC (such as my visions for revenue generation), and detailing my adventures trying to get the word out.  Stay tuned!

Power vs Usability: Part II


As a follow-up to my recent post titled “Power vs Usability: The Never Ending War“, I thought I’d write a bit about the changes that I’ve implemented recently to address the issues discussed.  It’s fairly easy to agree that the “Do Nothing Approach” just won’t cut it.  There will of course be missing ingredients, and there will of course be users inconvenienced by this fact.  I can’t expect these users to happily report these problems, as they’re more likely to just close their browsers.  In the Dot-Com-Startup world, you really have one chance to wow your customer and keep them coming back for more, and a user who types in half a recipe and then can’t find “plum marmelade” will not be wowed (except for “wow, this sucks”) and will definitely not be coming back for more.

The “Oh, Did We Mess Up?” approach is not all that much better.  As mentioned, it doesn’t actually solve the user’s problem, it simply presents a bug-reporting form that turns users into testers, and assumes they have some sort of active interest in making my site better.  It still presents the deal-breaker scenario where a user enters half a recipe and cannot save it.  It’s this scenario that I’ve determined is the critical issue to resolve.

Redesigning my database to allow recipes to reference non-existent ingredients or ingredient forms was also a design change that I found highly discouraging.  It’s a lot of code and a lot of design changes that target a hopefully seldom-used scenario, and these changes ripple through most of the product (shopping lists, meal planning, displaying data throughout the site, etc.)  The approach I found best was one that was isolated to recipe entry and had little or no changes anywhere else within the product, which meant no database changes.

Over the past three days, I’ve implemented a design that I feel acknowledges the root pain point without warranting a major design overhaul for the entire site.  This solution allows for recipe entry using custom ingredients or custom forms, but stores these recipes in “limbo” until the database can be updated to represent them correctly.  The user can enter their recipe as they see fit; however, the recipe must be manually approved on the backend before it will actually exist in the database.  The process goes something like this:

  1. User enters a recipe requiring “plum marmelade”, which does not exist in the database.
  2. When typing in this ingredient, KitchenPC will see no exact matches exist and display an option at the bottom of the dropdown titled “Custom: plum marmelade”.
  3. The user selects this, and the rest of the ingredient input switches to a freeform mode, where the user can type in any amount and any unit.
  4. When the user is finished with their recipe entry, they click Save.
  5. On the backend, KitchenPC detects the custom ingredient usage and aborts the normal saving code path.  Instead, the Recipe object is serialized and added to the “PendingRecipes” table.  A message is displayed to the user to indicate this.
  6. Later on, this “pending recipe” will be processed through human intervention.  Either “plum marmelade” will be added to the ingredients table and the recipe will be linked to the newly created ingredient, or if the ingredient indeed existed and the user was in error, the recipe will be corrected.  After the links are fixed up, the recipe will be published as normal under the user’s context.
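The save path in steps 4 and 5 can be sketched as follows. This is a hypothetical Python model, not KitchenPC’s code: a usage with no ingredient id or no form id stands for a custom ingredient or custom form, and an in-memory list stands in for the `PendingRecipes` table. Because the whole recipe is serialized, an edited recipe keeps its existing id in the queue, which also covers edits to existing recipes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Usage:
    ingredient: str
    amount: str
    ingredient_id: Optional[int] = None  # None: "Custom: ..." ingredient
    form_id: Optional[int] = None        # None: custom form


@dataclass
class Recipe:
    title: str
    usages: List[Usage] = field(default_factory=list)
    recipe_id: Optional[int] = None      # set when editing an existing recipe


def needs_review(recipe):
    """True if any usage relies on a custom ingredient or custom form."""
    return any(u.ingredient_id is None or u.form_id is None for u in recipe.usages)


def save_recipe(recipe, recipes, pending):
    """Publish normally, or divert the serialized recipe to the pending queue."""
    if needs_review(recipe):
        pending.append(json.dumps(asdict(recipe)))  # asdict recurses into usages
        return "pending"
    recipes.append(recipe)
    return "published"
```

The approval step from step 6 would then deserialize the queued recipe, fix up the ingredient and form ids, and call the normal save path.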

The mechanism also allows the modification of an existing recipe to use a custom ingredient in the same way.  The altered recipe will be serialized with its existing recipe id, and added to the pending recipe queue.

In the case of a missing form of an existing ingredient (for example, say “goat cheese” was missing the “by slice” form), the user can now choose a “Custom” option in the forms dropdown and type any amount they’d like.  When this happens, the recipe once again goes through the pending recipe queue before being published.

This overall approach has a few advantages.  First and foremost, the user is not prevented from entering their recipe.  The fact that the recipe will not appear right away simply acts as a deterrent for users to use custom ingredients or forms where it’s not actually necessary.  In fact, the UI for using custom ingredients was purposely made harder to use.  There is no shortcut key for selecting a custom ingredient; the user must tab to it or select it with the mouse.  This is in contrast to existing ingredients, which have numbered shortcut keys.
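The dropdown behavior described here (numbered shortcut keys for existing ingredients, plus a deliberately shortcut-less “Custom:” entry when nothing matches exactly) might be built like this. Substring matching, the limit of nine shortcut keys and the function name are assumptions; the post doesn’t say how matches are found.

```python
def build_dropdown(typed, ingredients, max_shortcuts=9):
    """Return (label, shortcut_key) pairs for the ingredient dropdown.

    Existing ingredients get numbered shortcut keys "1".."9"; the "Custom:"
    entry is appended only when there is no exact match, and it never gets
    a shortcut key, so selecting it takes a deliberate tab or mouse click.
    """
    query = typed.strip().lower()
    if not query:
        return []
    matches = [name for name in ingredients if query in name.lower()]
    options = [(name, str(i + 1) if i < max_shortcuts else None)
               for i, name in enumerate(matches)]
    if not any(name.lower() == query for name in matches):
        options.append((f"Custom: {typed.strip()}", None))
    return options
```

For example, `build_dropdown("jam", ["plum", "plum jam", "apple"])` returns `[("plum jam", "1"), ("Custom: jam", None)]`.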

The second advantage is that this defines a workflow for reporting missing ingredients and improving the database.  The user doesn’t have to click on a special button to “report a missing ingredient”; they simply use the ingredient in their recipe and the database will be improved as a result.  There may come a time when custom ingredients and forms are a thing of the past and the database has pretty much any consumable food product known to man, but for now, this provides a viable approach for dealing with missing data.

I don’t yet have any backend tools for easily dealing with the pending recipes, but I imagine this is something I’ll be able to get to before beta.  This might be a web-based “admin” tool that only I have access to, or even a simple command line tool that can be run on any machine with access to the database.

I’m pretty happy with the solution and am glad I took the time to really deal with this problem from the user’s point of view.  I think the site will be overall better for it, and it just goes to show that it doesn’t matter how perfect your architecture is; if you annoy the user, they’ll be going elsewhere.
