The Benefit of NLP and Text Mining, a Valuable Tool During a Crisis

April 22, 2020


In the day to day operation, businesses have many KPIs to help them track and understand how the business performs in different areas. People are used to seeing these well-established KPIs, understand how to interpret them and how they relate to how the business is doing. However, when something unexpected happens such as a pandemic or some other major crises,  the predefined KPIs might fail to pick up changes in the business that no one ever though to monitor under normal conditions.

Of course, the analysts can spend time slicing and dicing these measures to gain additional insights of what is happening to the business. However, it might sometimes be hard to have the imagination to understand all the ways the crisis can impact the business. In addition, most analysts are not close to the day to day interaction with customers so it takes time for the anecdotal information from call operators, customer reps , salespeople etc. to reach the analysts that can then perform additional ad-hoc analysis.

This is where the analysis of unstructured text mining and NLP can help in capturing new trends and new problems that existing measures might miss.

Examples of text that can be used for these purposes include but are not limited to:

·      Emails from customers

·      Notes taken by call reps

·      Texts from chats between customers and the help desk

·      Texts from public and social forums (Facebook, Twitter etc.)

Starting with the unstructured text without predefined topics, NLP can pick up on new and unexpected trends or problems that might be missed from the existing structured transaction data. For example, the COVID-19 crises create a lot of anxieties that could impact consumers behavior in unpredicted ways.  For example, a drug prescribed to treat an unrelated condition is believed to have effect in treating COVID-19, which may trigger the anxiety of patients with this condition as they may be worried about possible low supply. As a result, the calls to get this prescription filled early or to just check on the availability might increase significantly.  

While the increased call volume would likely be detected by the data captured on a regular basis, the reason for the increase might not be easily apparent, especially if the call does not result in an order as could be the case if they call before they would be eligible for a refill.  However, if the text from the calls or the operator notes are captured,  methods using NLP for unsupervised topic clustering of text data can be deployed to identify and track new emerging problems.  In this case a topic related to questions about this specific drug could be detected.   These methods are referred to as unsupervised learning methods since they classify similar topics without relying on a predetermined dependent variable.  But this does not mean that one can just deploy the tools out of the box and hope for good results. It usually takes a fair amount of “supervision” of the unsupervised learning techniques to get the best results. Specifically, a lot of work goes into the preprocessing of the text to get it ready for applying any classification technique.  

A major difficulty in NLP tasks is working in a very high dimensionality. If every unique word in a body of text is considered to be a dimension for a model, this will likely entail hundreds of thousands of dimensions. Standardizing the text reduces the dimensionality by trimming redundancies, which then aid the model’s efficacy and computational efficiency. Doing this standardization is a major part of the NLP process and how it is done can greatly impact the quality of the results. Not only dean the standardization improves the results it also reduces the resources required to perform the analysis as it reduces the dimensionality of the problem.

Text standardization that needs to occur before any clustering of topics typically comprise of two parts.

First, general standardization that applies to all text mining projects such as checking for common variations in spelling, identify synonyms, and word stemming  (standardize different forms pf the same word, e.g., car , car’s , cars, cars’ all get con verted to  car)

Second, domain standardization that is project specific. In the above example it would be beneficial to simplify and standardize classification of calls with the aid of medical dictionaries, list of drugs and commonly used terminology.

Incorporating all the domain specific knowledge as well as fine tuning the preprocessing to the text being analyzed takes a significant amount of effort.  Hence, it is important prepare to have an established NLP process in place to enable the business to take full advantage of it during regular times as well as during times of crisis to identify emerging issues.


RELATED POSTS