How would you use an accurate Natural Language Understanding API?
Maluuba is a company that specializes in natural language understanding. If a user says "can you recommend a good sushi place," we'll provide nearby sushi restaurants. If a user says "how do I get to New York," we'll provide directions to New York. Some of our engineers have been working on this problem for two years, and as a result we have one of the most sophisticated NLU engines in the industry for short natural queries.
We’ve been using this technology to build a very accurate personal assistant. However, we suspect that this technology could be much more useful if we opened it up to the community, allowing developers to use it in their own projects. In an effort to get this process started, I’m going to outline the different functionalities of our technology, and pose a question to the community: What would you find most useful? What would you like to see from us to add the most value?
Maluuba's natural language engine was built to deeply understand what a user's query means in order to give them exact results. The first step is taking a query like "how do I make bread pudding" and determining what "domain" it falls into. We currently support 18 different domains of understanding, including:
- Business Finder (including restaurants and local businesses)
- General Knowledge queries (facts, figures, science, history, etc.)
- Event Search
- Movies and theatres
- General search
- Calls (both individuals and businesses)
- Contacts (queries about contacts on the user’s phone)
- Help (questions about the app)
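To make the routing step concrete, here is a toy illustration of domain classification using keyword matching. This is purely illustrative and is not Maluuba's actual approach (which uses a trained statistical model); the domain names and keywords are assumptions for the sketch.

```python
# Toy keyword-based domain router -- an illustration only, not Maluuba's
# actual classifier. Domain names and keyword lists are invented here.
DOMAIN_KEYWORDS = {
    "business_finder": ["restaurant", "sushi", "nearby"],
    "calendar": ["meeting", "schedule", "appointment"],
    "search": ["how do i", "what is", "recipe"],
}

def classify_domain(query: str) -> str:
    """Return the first domain whose keywords appear in the query."""
    q = query.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            return domain
    return "general_search"  # fallback when nothing matches

print(classify_domain("how do I make bread pudding"))  # search
```

A real classifier has to cope with paraphrase and ambiguity ("book a table" vs. "book a flight"), which is exactly why keyword matching breaks down and statistical models are used instead.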
Once we've determined which domain a user's query falls into, we look at the various intents, or actions, related to that domain and determine which action best matches what the user asked for. Using "how do I make bread pudding" as an example, the query first gets classified into the search domain. Within the search domain, there are seven actions that can be taken:
- Search Google
- Search Bing
- Search recipes (Epicurious)
- Search images
- Search YouTube
- Search eBay
- Search Amazon
In this case, "how do I make bread pudding" would classify into the action "search recipes". Finally, at this point, we can do a deep parse for important entities. In the context of "search recipes", there is only one important entity: "food"; a complicated action like flight booking could have as many as 16. The parse of "how do I make bread pudding" will extract "bread pudding" as the desired food, and at this point all of this information could be returned to the consumer of the API.
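Putting the three steps together, a consumer of the API would receive the domain, the action, and the extracted entities for a query. The structure below is a hypothetical sketch of that output — the field names are illustrative assumptions, not Maluuba's documented response schema.

```python
# Hypothetical shape of the engine's output for the recipe example.
# Field names are illustrative assumptions, not a documented schema.
result = {
    "userRequest": "how do I make bread pudding",
    "domain": "search",
    "action": "search_recipes",
    "entities": {"food": "bread pudding"},
}

# An API consumer could dispatch on the action and use the entities directly:
if result["action"] == "search_recipes":
    food = result["entities"]["food"]
    print("Searching recipes for: " + food)
```

The point of the deep parse is that the consumer never has to re-tokenize the original query: the entities arrive pre-extracted and keyed by role.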
There is an additional layer of technology: a step we like to refer to as normalization. Let’s say a user says “set up a meeting with George next Tuesday”. The NLU engine will determine that the sentence is a calendar_create_event action, and parse out “George” as an attendee and “next Tuesday” as a date. However, the text “next Tuesday” is not in an ideal machine-readable format. At this stage we use a date normalizer to convert “next Tuesday” into a UNIX time stamp.
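As a minimal sketch of what a date normalizer does, the function below converts a phrase like "next Tuesday" into a UNIX timestamp, assuming "next Tuesday" means the next occurrence of Tuesday strictly after the reference date. Maluuba's normalizer is far more general; this handles only that one pattern, and the function name and interpretation are assumptions.

```python
# A minimal date-normalization sketch: "next <weekday>" -> UNIX timestamp.
# Assumes "next Tuesday" means the next Tuesday strictly after `today`.
from datetime import datetime, timedelta

def normalize_next_weekday(weekday: int, today: datetime) -> int:
    """Return a UNIX timestamp (midnight, local time) for the next
    occurrence of `weekday` (Monday=0 ... Sunday=6) after `today`."""
    days_ahead = (weekday - today.weekday() + 7) % 7
    if days_ahead == 0:
        days_ahead = 7  # "next Tuesday" said on a Tuesday -> a week later
    target = (today + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int(target.timestamp())

# e.g. on Wednesday 2013-04-03, "next Tuesday" normalizes to 2013-04-09
ts = normalize_next_weekday(1, datetime(2013, 4, 3))
```

Handing downstream code a timestamp rather than the raw phrase is what makes the parse machine-actionable: a calendar API can consume the number directly.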
The following is a partial example of the JSON ultimately returned for the query "remind me in the afternoon to wash the dog" (only two of the returned fields are shown here):

```json
"userRequest": "remind me in the afternoon to wash the dog",
"message": "wash the dog",
```
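On the consumer side, handling a response like this is a one-liner of JSON parsing. The snippet below uses only the two fields shown in the walkthrough; a full response would carry more (the action, normalized times, and so on), which are omitted here because they are not shown above.

```python
import json

# Parse the (partial) response from the walkthrough. Only the two fields
# shown in the post are used; a real response would include more.
response_text = """
{
    "userRequest": "remind me in the afternoon to wash the dog",
    "message": "wash the dog"
}
"""

response = json.loads(response_text)
print(response["message"])  # wash the dog
```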
That’s a high-level walkthrough of our engine’s interface. We support 18 different top-level domains, around 50 actions, 4 normalizers, and many different entities.
Now we want to know: how would you use this engine? What parts of it seem most useful? High-level domain classification, action determination, the deep parse, the normalizers, or the entire system as a single unified package? We're willing to open up the entire system, or subsets of it, and we'd love to hear ideas on how people are thinking of using it.
If you are interested in learning more, or you'd like to provide your feedback, please contact us at firstname.lastname@example.org.