In February, IBM placed its newest supercomputer, Watson, up against the two winningest Jeopardy! contestants in a man-versus-machine showdown. Watson easily bested its rivals, demonstrating a very real future for natural language interaction between humans and computers. Traditionally, computers have only been truly competent while processing explicit, concise instructions devoid of the ambiguities and “common-sense” references that litter human languages. This limitation in technology is really the basis for all difficulties interacting with machines. It’s the reason we yell at computers when they do what we say and not what we mean. It’s the reason your mother calls you up to ask why “her Internet” is not working. It’s the subject of much comedy as well, as we can all relate to these experiences. Then again, it’s also the reason why we software engineers get paid the big bucks. If anyone could just open up Visual Studio and have a nice chat with the computer about their new idea, perhaps I’d have to find a different line of work.
KitchenPC is all about data organization. Specifically, the organization and categorization of recipes. Until KitchenPC is able to really understand a recipe, it cannot pivot this data into interesting results for the user. The problem is, recipes are written in human languages and cannot be easily parsed by a computer. In other words, recipes have to be converted from “human-ese” to a disciplined and precise format that KitchenPC can make sense of. Up until now, I’ve been using people to do this translation. Hired people, users, friends, myself; all going through a painstakingly slow process to read a recipe and enter the exact same information back into KitchenPC, while clarifying any part of the recipe that KitchenPC may not understand. Over the last couple of weeks, I’ve been trying out a new approach that may change all this: teaching KitchenPC how to understand the raw recipes themselves. Just like Watson understanding Jeopardy! questions, this requires a deep understanding of natural language, grammar, and common sense. It requires a lot of insight into how the human brain breaks down instructions and fills in assumptions based on previous knowledge it has acquired.
Luckily, I can limit my domain to the culinary arts. KitchenPC does not need to know how to assemble IKEA furniture (I fear no computer would be able to understand that!) – and its vocabulary doesn’t need to encompass the entire English language, only that which is useful for baking a cake. KitchenPC must be able to break down references to common ingredients and make the same assumptions about their use as a typical chef. In this article, I’d like to share a few of the challenges I ran across while building Chef Watson.
My Test Data
Building a parser is a bit like teaching a child; it starts out with the ability to learn, but only very basic knowledge. At first, the parser says “Why?” a lot and you have to teach it. However, you first must give your parser a world to explore. I decided to download several thousand recipes from various websites and build a database of about 2,600 distinct ingredient usages as a set of test cases. Each time the parser wasn’t able to understand one, it would stop and ask for clarification. I would have to figure out exactly what it didn’t understand and add the correct words or phrases to its vocabulary. This was by far the most painstaking part of the process. Along the way, I developed some specialized tools to expedite this part of the process.
Basic Ingredient Grammar
Luckily, most ingredient usages can be expressed with a finite set of grammatical templates. Often this is something like “an amount”, then “a unit” followed by a name of an ingredient. For example, “5 cups brown sugar”. However, this ingredient could also be expressed as “5 cups of brown sugar” or “brown sugar: 5 cups.” The first step for my parser was to build a template engine that allowed me to define the different ways an ingredient could be expressed, without worrying about the vocabulary for each individual token quite yet. This was rather like building a regular expression parser, where I could test if a phrase was a match for a given grammatical template. Rather than rip apart the input trying to decipher what’s what, you can now just loop through your templates and test for a match. If there’s no match, move on to the next template.
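To make the template loop concrete, here is a minimal sketch in Python. It is not KitchenPC's actual code: the VOCAB fragments, the three TEMPLATES, and the function names are all assumptions made up for illustration, and the vocabulary is reduced to a single ingredient.

```python
import re

# Hypothetical mini-vocabulary: one regex fragment per kind of token.
VOCAB = {
    "amount": r"\d+(?:\s+\d+/\d+)?",
    "unit": r"cups?",
    "ingredient": r"brown sugar",
}

# Grammatical templates, written independently of the vocabulary.
TEMPLATES = [
    "{amount} {unit} {ingredient}",
    "{amount} {unit} of {ingredient}",
    "{ingredient}: {amount} {unit}",
]


def compile_template(template):
    """Expand each {slot} into a named capture group built from VOCAB."""
    pattern = re.sub(
        r"\{(\w+)\}",
        lambda m: "(?P<%s>%s)" % (m.group(1), VOCAB[m.group(1)]),
        template,
    )
    # Allow an optional trailing period, as in "brown sugar: 5 cups."
    return re.compile("^" + pattern + r"\.?$", re.IGNORECASE)


COMPILED = [compile_template(t) for t in TEMPLATES]


def match_usage(text):
    """Return the captured slots from the first matching template, else None."""
    for regex in COMPILED:
        found = regex.match(text.strip())
        if found:
            return found.groupdict()
    return None
```

With this sketch, all three spellings of the brown sugar example produce the same amount, unit, and ingredient slots, while an input that fits no template simply returns None.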
Like I said, a good parser needs to create an abstraction between grammar and vocabulary. Each component of this grammar needs to support any number of synonyms that increase the matching power of that template. For example, a match for amount could be satisfied by 1, 1 1/2, one, or the indefinite article “a.” The unit cups might be cup, or the abbreviation “c.” Pounds could be expressed as pound, lb, or lbs.
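As a rough illustration of that vocabulary layer, the sketch below normalizes amount and unit tokens. The AMOUNT_WORDS and UNIT_SYNONYMS tables and both function names are assumptions for the example, not KitchenPC's real dictionaries.

```python
from fractions import Fraction

# Hypothetical synonym tables; a real parser would hold far more entries.
AMOUNT_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}
UNIT_SYNONYMS = {
    "cup": "cup", "cups": "cup", "c": "cup", "c.": "cup",
    "pound": "pound", "pounds": "pound", "lb": "pound", "lbs": "pound",
}


def parse_amount(text):
    """Turn "1", "1 1/2", "one" or "a" into a Fraction, or None if unknown."""
    text = text.strip().lower()
    if text in AMOUNT_WORDS:
        return Fraction(AMOUNT_WORDS[text])
    parts = text.split()
    if not 1 <= len(parts) <= 2:
        return None
    # A second part is only valid as the fraction of a mixed number.
    if len(parts) == 2 and "/" not in parts[1]:
        return None
    try:
        return sum(Fraction(part) for part in parts)
    except (ValueError, ZeroDivisionError):
        return None


def normalize_unit(text):
    """Map any known spelling of a unit to its canonical name, else None."""
    return UNIT_SYNONYMS.get(text.strip().lower())
```

Using exact fractions rather than floats keeps “1 1/2” from turning into a rounded decimal before any conversion happens.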
An ingredient might also be clarified with adjectives, such as “5 cups packed brown sugar” or “3 cups chopped carrots”. The parser needs to know how the different words pair up. “Packed” brown sugar has a higher density than unpacked brown sugar, thus five cups is effectively more in terms of weight. KitchenPC already has a massive database of ingredients and forms that it uses to do conversions and build shopping lists, or search for matching recipes based on available amounts. In my case, it was just a matter of parsing out the various phrases in the input and matching them to a database of known possibilities. If the amount was expressed in weight, I would look for a “weight” form for the ingredient (8oz brown sugar.) If the amount was expressed in volume, I’d look for a volumetric form (1 cup brown sugar.) So far, so good.
The solution I found to work the best was a two-phase matching approach. First, try to understand all the words you come across. Make sure they’re all part of your vocabulary, and they match at least one grammatical template. If you run into extra words that aren’t in any of your dictionaries, error out. Once you have this, then try to construct a valid ingredient usage by assembling the match data. “Purple walrus” does not match any known template, so there’s no point in attempting to create an ingredient usage based on that input. “5 milk” does match a valid template (the amount ingredient template), but since the ingredient “milk” does not have a default “whole unit” measurement, the parsed input cannot be constructed into a valid ingredient usage. Worry about each problem separately; build a data structure that describes the input you’ve collected, then construct a validated result based on that input.
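The two phases might be separated along these lines. This is only a sketch under stated assumptions: the UNIT_TYPES and INGREDIENT_UNIT_TYPES tables, the MatchData structure, and the ParseError class are hypothetical, and the recognizer handles just the “amount [unit] ingredient” shape.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical data: unit types, and which types each ingredient supports.
UNIT_TYPES = {"cup": "volume", "cups": "volume", "oz": "weight"}
INGREDIENT_UNIT_TYPES = {
    "brown sugar": {"volume", "weight"},
    "milk": {"volume"},
    "bananas": {"whole"},
}


class ParseError(Exception):
    def __init__(self, phase, message):
        super().__init__(message)
        self.phase = phase  # 1 = recognition failed, 2 = assembly failed


@dataclass
class MatchData:
    amount: int
    unit: Optional[str]
    ingredient: str


def recognize(text):
    """Phase 1: every word must be in the vocabulary and fit the template."""
    words = text.lower().split()
    if not words or not words[0].isdigit():
        raise ParseError(1, "no template matches %r" % text)
    amount, rest = int(words[0]), words[1:]
    unit = None
    if rest and rest[0] in UNIT_TYPES:
        unit, rest = rest[0], rest[1:]
    ingredient = " ".join(rest)
    if ingredient not in INGREDIENT_UNIT_TYPES:
        raise ParseError(1, "unknown words: %r" % ingredient)
    return MatchData(amount, unit, ingredient)


def assemble(match):
    """Phase 2: check that the recognized pieces form a valid usage."""
    needed = UNIT_TYPES[match.unit] if match.unit else "whole"
    if needed not in INGREDIENT_UNIT_TYPES[match.ingredient]:
        raise ParseError(2, "%s has no %s form" % (match.ingredient, needed))
    return (match.ingredient, match.amount, match.unit or "whole")


def parse(text):
    return assemble(recognize(text))
```

In this sketch “purple walrus” fails in phase 1 and “5 milk” fails in phase 2, so the error itself says which kind of problem needs fixing.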
You say tomato, I say red tomato
Unfortunately, my database only has a single name for each ingredient. Recipes might call for “flour” when they really mean “all-purpose flour.” They may call for “glace cherries,” which I would call “candied cherries.” This required me to build a rather large synonym database that maps the common names of each ingredient to its database entry, and that can also set defaults for generic ingredients (for example, “milk” would be mapped to “2% milk”).
This also came in handy for parsing out random adjectives. For example, “3 ripe bananas” should be parsed as “bananas: 3” with a prep note of “ripe.” For this reason, ingredient synonyms can optionally contain a prep note to use when that alias is parsed. This came in handy for ingredient synonyms such as “boiling water” or “room temperature butter.” You want to link these to their root ingredient, but you don’t want to lose the qualifying adjectives for the reader.
Then there’s the issue of plural versus singular (1 cherry or 5 cherries). I first considered stemming the input, but it was tough to find a suitable NLP dictionary of stems that worked well for me, and suffix-stripping algorithms caused all sorts of chaos in my tests. This might be a good approach for a more general natural language parser, but since I’m confined to a well-known set of vocabulary, I just decided to handle plurals and singulars as regular ingredient synonyms. It took a bit more time to build, but I trust the results a lot more.
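A synonym table of this kind, carrying optional prep notes and treating singular forms as ordinary aliases, could be sketched as follows. The CANONICAL and SYNONYMS contents and the resolve_ingredient name are assumptions for illustration; the entries are taken from the examples above.

```python
# Hypothetical canonical ingredient names, as stored in the database.
CANONICAL = {
    "all-purpose flour", "candied cherries", "2% milk",
    "bananas", "cherries", "water", "butter",
}

# Alias -> (canonical name, optional prep note to attach).
SYNONYMS = {
    "flour": ("all-purpose flour", None),
    "glace cherries": ("candied cherries", None),
    "milk": ("2% milk", None),
    "ripe bananas": ("bananas", "ripe"),
    "boiling water": ("water", "boiling"),
    "room temperature butter": ("butter", "room temperature"),
    # Singular forms are handled as plain synonyms rather than by stemming.
    "banana": ("bananas", None),
    "cherry": ("cherries", None),
}


def resolve_ingredient(name):
    """Return (canonical name, prep note), or None if the name is unknown."""
    key = " ".join(name.lower().split())
    if key in CANONICAL:
        return (key, None)
    return SYNONYMS.get(key)
```

Every alias has to be entered by hand, which matches the trade-off described above: more work up front, but no surprises from a stemmer.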
A clove of lettuce and a head of garlic?
Some ingredients can be expressed in units applicable to only those ingredients. A mapping of custom units had to be built so the parser could understand these types of units. My approach was to build a list of all known units (heads, slices, cloves, etc.) and, once a unit was parsed, relate it to an ingredient form. If no relation was found (such as a clove of cheese), the parser would error out.
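One way to picture the custom unit mapping is a lookup keyed on the ingredient and unit together. The tables and the form names below are hypothetical; only the head, clove, and slice examples come from the text.

```python
# Hypothetical list of known custom units, with plural spellings.
UNIT_ALIASES = {
    "head": "head", "heads": "head",
    "clove": "clove", "cloves": "clove",
    "slice": "slice", "slices": "slice",
}

# Hypothetical relation: (ingredient, unit) -> ingredient form.
CUSTOM_UNIT_FORMS = {
    ("lettuce", "head"): "head of lettuce",
    ("garlic", "head"): "head of garlic",
    ("garlic", "clove"): "garlic clove",
    ("cheese", "slice"): "cheese slice",
}


def form_for_custom_unit(ingredient, unit_word):
    """Relate a parsed custom unit to an ingredient form, or raise ValueError."""
    unit = UNIT_ALIASES.get(unit_word.lower())
    if unit is None:
        raise ValueError("unknown unit: %r" % unit_word)
    form = CUSTOM_UNIT_FORMS.get((ingredient.lower(), unit))
    if form is None:
        raise ValueError("no %s form for %s" % (unit, ingredient))
    return form
```

Because the key includes the ingredient, “a clove of garlic” resolves while “a clove of cheese” raises an error, even though “clove” is a perfectly good unit on its own.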
Preparing your ingredients
Many ingredient usages have a preparation step that is actually not relevant to the ingredient or form expressed. For example, “3 carrots, sliced” would mean to take 3 whole carrots, then slice them. The fact that you slice them doesn’t alter the measurement or form the way that “3 cups sliced carrots” would. However, the word “sliced” must be preserved as a preparation instruction (which KitchenPC calls a “prep note.”)
I took the approach that an adjective that occurred after the ingredient would be interpreted as a prep note and not a form. For example, “3 cups of cherries, pitted” would call for 3 cups of whole cherries measured, then taken out of the measuring cup and pitted. Whereas “3 cups of pitted cherries” would want you to pit a bunch of cherries until you filled up 3 cups.
I decided against treating anything after a “comma” as a prep note, since this could yield false positives which could completely change the meaning of the ingredient. I value accuracy over sheer parsing percentage, so I’d rather drop the match than parse it incorrectly. For this reason, I created a dictionary of approved prep notes. I decided to allow any ingredient form to also be a prep note. For example, “shredded” is a form of cheese, thus I would allow the prep format “8oz cheese, shredded” (which of course yields the result “cheese: 8oz (shredded)” and has nothing at all to do with the “shredded” form of cheese.)
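The position rule and the approved prep note dictionary can be sketched together. Everything here is an illustrative assumption: the FORMS table, the two extra prep notes, and the function name are invented, and the input is taken to be the part of the usage that follows the amount and unit.

```python
# Hypothetical forms known for each ingredient.
FORMS = {
    "cherries": {"pitted"},
    "carrots": {"sliced", "chopped"},
    "cheese": {"shredded"},
}

# Approved prep notes: a small hand-picked list plus every known form name.
EXTRA_PREP_NOTES = {"peeled", "divided"}
APPROVED_PREP_NOTES = EXTRA_PREP_NOTES.union(*FORMS.values())


def split_form_and_prep(phrase):
    """Return (ingredient, form, prep_note), or None to drop the match."""
    head, _, tail = phrase.lower().partition(",")
    head, prep_note = head.strip(), tail.strip() or None
    if prep_note is not None and prep_note not in APPROVED_PREP_NOTES:
        return None  # unknown text after the comma: refuse to guess

    if head in FORMS:
        return (head, None, prep_note)

    # Otherwise the first word must be a known form of the remaining words.
    first, _, rest = head.partition(" ")
    if rest in FORMS and first in FORMS[rest]:
        return (rest, first, prep_note)
    return None
```

So “pitted cherries” yields a form, “cherries, pitted” yields a prep note, and anything after the comma that isn’t on the approved list causes the whole match to be dropped.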
Now for the weird stuff
So, this all works pretty well if everyone decides to use proper formatting for all their ingredients, and only uses forms with their logical units of measurement. You’d probably run into very few problems parsing professional cook books with these methods, and I was able to get a parsing accuracy of over 90% with this alone. But we can do better, right?
My parser will first attempt to generate a match with the rules above, however if no match is found, I look at the match data again and try to make a few assumptions based on common sense. I call this code path “anomalous parsing.”
The first anomaly this “sub-parser” can handle is called “prep to form fall-through.” This basically allows a prep note to clarify which form it refers to, and works only with volumetric usages if no default volumetric pairing is known. Confused? Okay, let me provide some examples.
The usage “3 cups apples” is invalid, since whole apples don’t have a default volumetric form. You’d have to say “3 cups chopped apples” or “3 cups sliced apples” for it to know what you were talking about. However, if the usage was expressed as “3 cups apples, chopped” then most humans would understand that as “3 cups of chopped apples”, though it would be considered extremely sloppy. However, if you were to say “3 cups grapes, chopped” then “grapes” does in fact have a default volumetric form (you can fill up a measuring cup with grapes very easily.) Thus, this usage would be parsed as “3 cups of whole grapes” with a prep note of “chopped.” Prep to form fall-through only comes in handy when the prep note perfectly matches a known form and the only other option would be to error out.
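The fall-through rule for volumetric usages can be written as a short decision function. This is a sketch, not the real implementation; the two lookup tables and the function name are assumptions, populated only with the apples and grapes examples.

```python
# Hypothetical data: default volumetric form per ingredient (None = no default).
DEFAULT_VOLUME_FORM = {"grapes": "whole", "apples": None}

# Hypothetical data: forms that can be measured by volume.
VOLUME_FORMS = {
    "apples": {"chopped", "sliced"},
    "grapes": {"whole", "chopped"},
}


def resolve_volume_usage(ingredient, form=None, prep_note=None):
    """Return (form, prep_note) for a volume usage, or None if it is invalid."""
    if form is not None:
        # An explicit form wins, provided it is a known volumetric form.
        if form in VOLUME_FORMS.get(ingredient, ()):
            return (form, prep_note)
        return None
    default = DEFAULT_VOLUME_FORM.get(ingredient)
    if default is not None:
        # A default exists, so the prep note stays a prep note.
        return (default, prep_note)
    if prep_note in VOLUME_FORMS.get(ingredient, ()):
        # Fall-through: the prep note is promoted to the form.
        return (prep_note, None)
    return None
```

Note that the promotion happens only in the last branch, after every ordinary interpretation has already failed.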
The second anomaly parser handles mismatched unit types. I call this “auto-form conversion.” A parsed form is tightly coupled with a set of units that the form can be expressed in. For example, “shredded cheese” is assumed to be expressed in volume, such as “5 cups shredded cheese.” However, what if I said, “8oz shredded cheese?” A normal person born on this planet would know what I was talking about: take 8oz of cheese, then shred it. An ounce of cheese is an ounce of cheese no matter what you do to it, thus “shredded” is actually a prep note in this case. If a valid assumption can be made about a form, even if the unit type is incompatible, we can convert this usage into another form.
There would be two possibilities to “correct” this usage. The first would be to convert “8oz shredded cheese” into cups, and get roughly 2 cups. However, this would result in a bunch of awkward fractional amounts all over the place and really confuse a lot of people. The other approach is to reinterpret the usage as “weight” and demote the word shredded into a prep note. I took the latter approach, and parse “8oz shredded cheese” as “cheese: 8oz (shredded).”
Auto-form conversions are also applicable to whole units. For example, “3 mashed bananas” would be interpreted as “3 whole bananas” with a prep note of “mashed.”
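Auto-form conversion amounts to demoting the form to a prep note when the unit type doesn’t fit. The sketch below assumes two hypothetical tables (which unit type each form expects, and which unit types an ingredient accepts with no form at all); the names are illustrative only.

```python
# Hypothetical data: the unit type each form is normally measured in.
FORM_UNIT_TYPE = {
    ("cheese", "shredded"): "volume",
    ("bananas", "mashed"): "volume",
}

# Hypothetical data: unit types an ingredient accepts without naming a form.
FORMLESS_UNIT_TYPES = {
    "cheese": {"weight"},
    "bananas": {"whole", "weight"},
}


def auto_form_convert(ingredient, form, unit_type):
    """Return (unit_type, form, prep_note), or None if no safe reading exists."""
    expected = FORM_UNIT_TYPE.get((ingredient, form))
    if expected is None:
        return None  # unknown form for this ingredient
    if expected == unit_type:
        return (unit_type, form, None)  # ordinary case, nothing to convert
    if unit_type in FORMLESS_UNIT_TYPES.get(ingredient, set()):
        # Keep the amount as written and demote the form to a prep note.
        return (unit_type, None, form)
    return None
```

The amount is never converted in this sketch, which mirrors the decision above to keep “8oz” as written rather than translate it into cups.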
If both the regular parser and anomalous parser fail, I return an error indicating exactly where it failed and what aspects could not be parsed.
But wait, it gets even more ridiculous!
With my parser so far, I was able to get around 97% match accuracy with my sample data, which was fantastic! However, I noticed a lot of the same ingredients were throwing the parser for a loop. These ingredients had something in common with each other, and posed a huge challenge to overcome.
Some ingredients are actually a preparation or modification of another existing ingredient, from which further forms may be derived. Huh? Take for example, “3 cups of finely ground graham cracker crumbs.” Yes, that’s a real use-case from my test data. In this case, the ingredient is “graham cracker crumbs”, however graham cracker crumbs are something that can be derived from whole graham crackers, by violently smashing them with your fist. There are several examples of these “prepared ingredients” that I came across, mostly in the “crumbs” variety. Egg yolks and egg whites come to mind, as do chocolate squares (a square broken off from a whole chocolate bar.)
My first approach to solve this problem failed miserably. I attempted to make “crumbs” a synonym for the “crushed” form of graham crackers, and then added a template to handle the form immediately following the ingredient. This blew up in my face by yielding a mess of false positives, and it also broke when a further form was specified before the ingredient (such as “finely ground”).
I thought long and hard about how to interpret these anomalies. One way would be to enter these in as real ingredients, so that they could have actual forms and be parsed as such, and then modify the ingredient aggregation code to convert from form to sub-ingredient to real ingredient. This would be a huge architectural change that would affect the entire site. Not worth it.
Luckily, there are very few of these anomalies. In fact, so few that I came up with a hacky approach that allows me to handle these one-off cases without polluting the rest of the code. I basically handle these things in the grammar template layer. I’m able to store anomalous usages in the database and transcribe them to “what the user actually meant.” Thus, in the database exists a row for “graham cracker crumbs” that links to “graham crackers” and the form “crushed.” When the parser runs into the phrase “graham cracker crumbs”, it rearranges the input so that it gets treated as “crushed graham crackers” instead. After this interception, the normal parser can take over from there. In other words, “3 cups graham cracker crumbs” will be seen by the usage assembler as “3 cups crushed graham crackers”, just as if the user typed that in, and yield a result of “graham crackers (crushed): 3 cups.”
The important thing to note here is I include the entire phrase, and not just the word “crumbs.” This allows me to further link entire phrases such as “finely ground graham cracker crumbs” to a finely ground form type. I’m not too worried about a huge number of combinations as these are very rare and will be used as a last resort. One way to think about this feature is an automatic “find and replace” for whatever the user entered.
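Seen as a “find and replace” step, the interception could look like the sketch below. The REWRITES table stands in for the database rows described above, and both its contents and the function name are assumptions. Trying the longest phrase first is one simple way to make the full “finely ground” phrase win over the shorter one.

```python
# Hypothetical stand-in for the database rows: anomalous phrase -> rewritten phrase.
REWRITES = {
    "graham cracker crumbs": "crushed graham crackers",
    "finely ground graham cracker crumbs": "finely ground graham crackers",
}


def rewrite_anomalies(text):
    """Replace a known anomalous phrase before normal parsing takes over."""
    lowered = text.lower()
    # Longest phrases first, so the most specific row is the one applied.
    for phrase in sorted(REWRITES, key=len, reverse=True):
        if phrase in lowered:
            return lowered.replace(phrase, REWRITES[phrase], 1)
    return lowered
```

The rewritten string is then handed to the regular parser unchanged, so no other layer needs to know these special cases exist.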
The benefit of this approach is I can define any number of these one-off cases and basically correct any arbitrary weirdness on the fly to something that makes sense. So long as KitchenPC is internally able to represent the concept of that usage, I can now instruct the parser to handle anything. Pretty cool, huh?
One thing I really like about this parser is it errs on the side of caution. If it doesn’t understand exactly what the user said, the match is dropped. The last thing I would want is to import recipes incorrectly and have a bunch of flaky recipes. It also acts as a “filter” of sorts to weed out crap recipes; if you don’t care enough to write recipes correctly using clear and concise ingredient usages, I really don’t want your recipe on my site. There are enough recipes out there on the Internet to scrape that I can ignore the ones with “some sort of meat” or “35 purple M&Ms.”
Right now, I have a success rate of just over 98% and every single one of those matches has been validated. I figure for any given recipe, my chance of being able to import that recipe is 0.98^n, where n is the number of ingredients in the recipe. If we say there’s an average of 10 ingredients per recipe (that’s just a wild guess), that puts me at around 80% or so.
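The arithmetic behind that estimate is easy to check. The snippet below assumes each ingredient parses independently with the same success rate, which is a simplification; the function name is made up for the example.

```python
def recipe_import_chance(per_ingredient_rate, ingredient_count):
    """Chance that every ingredient in a recipe parses, assuming independence."""
    return per_ingredient_rate ** ingredient_count


# 98% per ingredient over 10 ingredients comes out at roughly 0.817.
print(round(recipe_import_chance(0.98, 10), 3))
```

The same function shows how quickly the odds fall for longer recipes, since every extra ingredient multiplies in another 0.98.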
This would make for a mighty fine recipe scraper which could comb the Internet for hundreds of thousands of recipes to import, making my previously hired work-force completely and utterly obsolete. This pleases me to no end. I’m hoping to implement a crawler that will make use of this parsing engine in the coming weeks.
The parser will also lend itself to many UI improvements. This might entail a different “version” of the parser that would allow for a bit more ambiguity. For example, it might accept anything after a comma to be a prep note, or make more guesses about how forms could be converted. This parser would only assist a user, rather than completely validating input. Elements of the UI could be filled out automatically with the information the parser is able to grasp, and the rest could be filled in or corrected by a user. This would make features such as pasting in several ingredients at a time possible, as well as bulk adding items to your shopping list, pantry, or meal planner. This parser will be vetted through the crawler, but eventually exposed through several extremely useful usability improvements on the site.
I also hope to show off the parser a bit more, perhaps with some video showing the various test harnesses I’ve created for the parser or perhaps an interactive demo where my readers can play “stump the parser.” Stay tuned!