Interpreting the intent behind spoken or written language is a very complex process that humans learn through experience in different social settings. This is one reason why literature is studied in universities around the world and academics earn doctoral degrees distilling the various nuances of the works of great literary figures.
3Pillar recently teamed up with one of our customers that promises disruption in the way businesses understand competitor strategies and respond to them. The customer has invested years of research and thought leadership in dissecting discourse from a wide variety of industries. The outcome of this research is a model for classifying discourse as a particular strategy that you may be battling or playing. The model further suggests your counter strategies or assistive strategies.
The key challenge in building an automated decision system based on this model is that the computational linguistics involved with intent analysis are not well developed or understood today. The customer asked 3Pillar to build a beta version of the product to understand the possibilities and the challenges. When the 3Pillar team analyzed the model, it realized that many components of the model could be realized with Natural Language Processing (NLP). NLP in and of itself is not the solution to intent analysis, but starting from an ontology or a model of human interaction, NLP should be the next stop.
We buzzed around armed with our Python REPLs (this is not a word yet, but it should be!) and promptly imported the venerable NLTK library and its corpora. The most useful components of the library were as follows:
Tokenization – Tokenizing text found on the web is a notoriously difficult activity. There is an entire ecosystem busy with scraping web pages with automated and manual processes. NLTK made it simple to tokenize content in two steps – first to sentences and next to words, and we did not need to bother ourselves with various sentence and word delimiters.
Pronunciation – The CMU Pronouncing Dictionary contains about 150,000 words from North American English and their pronunciations. Phonetic analysis of words in a sentence allowed us to find patterns with alliteration and rhyming words.
Part of Speech Tagging – We used the Average Perceptron tagger with the UPENN tagset to understand the structure of sentences. Understanding if a sentence began with a verb, if it had cardinal numbers or if it had superlatives were essential for the decision system.
Our choice of Python ultimately enabled us to deliver the decision system prototype complete with an API built with the Flask microframework and an automated installation procedure.
One more successful product and a happy customer. Interested in finding out more? Contact us to hear the rest of the story!
This blog post originally appeared on the main 3Pillar Global website here.