Advanced techniques used to perform text analysis

June 11, 2020 at 7:00 AM

As mentioned in my previous blog, “There are some ‘BASIC’ methods in text analysis and then there are some ‘ADVANCED’ ones!” This blog focuses on the ‘Advanced’ techniques.

Advanced techniques:

These techniques are the more useful ones in the text analysis that businesses perform today, and they are what you should look for in a text analytics vendor before signing a contract with one.

Text classification

Text classification is the ability to classify or tag texts under different categories using models trained on labelled examples. It is a form of supervised learning, so the set of possible classes is known and defined in advance and does not change.

The most common classification tasks are topic classification, sentiment analysis, intent classification and language detection.
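To make the supervised part concrete, here is a minimal sketch of a multinomial Naive Bayes classifier written from scratch in Python. It is only one of many possible models, not necessarily what any vendor uses, and the training texts and labels are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the log likelihood of each known token under class c.
            score = math.log(self.class_counts[c] / n_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical labelled feedback for a clothing retailer.
train_texts = [
    "the delivery was late", "my parcel never arrived", "delivery took two weeks",
    "the shirt size is too small", "size chart is wrong",
    "the fabric feels cheap", "poor fabric quality",
]
train_labels = ["Delivery", "Delivery", "Delivery", "Size", "Size", "Quality", "Quality"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("my delivery is late"))  # → Delivery
```

Because the classes come only from the labels seen in `fit`, the model can never output a category it was not trained on, which is the "known in advance" property described above.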

1. Topic classification

There are heaps of unstructured texts, and one way to organise them is to put each text under its relevant topic. The ability to classify the topic being talked about in a text is called topic classification.

For example, suppose you are an online clothing retail brand and you receive 100 pieces of feedback per day. These 100 pieces of feedback will not talk about 100 unique things; most of them will talk about a few similar topics or concerns, while just a few may be of a random kind. Refer to the table below.

The ‘Uncategorised’ feedback is feedback that is relatively non-repetitive, or that is repetitive but not yet significant enough to form a separate category of its own.
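A minimal sketch of topic tagging with an ‘Uncategorised’ fallback, assuming simple keyword rules stand in for a trained topic model. The topic names, keyword lists and the `min_hits` threshold are hypothetical.

```python
import re

# Hypothetical topic -> keyword rules for an online clothing retailer.
TOPIC_KEYWORDS = {
    "Delivery": {"delivery", "shipping", "courier", "late", "arrived"},
    "Size & Fit": {"size", "fit", "tight", "loose", "small", "large"},
    "Returns & Refunds": {"return", "refund", "exchange"},
    "Quality": {"fabric", "quality", "stitching", "torn", "faded"},
}


def tag_topic(text, min_hits=1):
    """Return the topic with the most keyword hits, or 'Uncategorised' below min_hits."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {topic: sum(w in kws for w in words) for topic, kws in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else "Uncategorised"


feedback = [
    "The delivery was two days late",
    "Size M is too tight around the shoulders",
    "I want a refund for this dress",
    "Love the colours on your new website",
]
for text in feedback:
    print(f"{tag_topic(text):<18} | {text}")
```

Raising `min_hits` pushes more borderline feedback into ‘Uncategorised’ until a topic becomes significant enough to deserve its own category.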

In some cases, a topic may also contain sub-topics.

Take the example of ‘Facebook’. It has multiple topics such as Stories, Messenger, News Feed, Marketplace, etc. Although the topics differ, each one may share the same set of text categories, such as UI/UX, Bugs and Content, which describe those aspects with respect to that topic only. Topic analysis helps you identify which topic is being mentioned.

This may require multi-label classification, or running each text through two models: one for the main topic and another for the sub-topic.
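The two-model option can be sketched as a pipeline: one step picks the main topic and a second, shared step picks the category within it. Keyword rules again stand in for trained models here, and every topic, category and keyword below is a hypothetical example.

```python
import re

# Stage 1 (hypothetical): main product topic.
MAIN_TOPICS = {
    "Stories": {"story", "stories"},
    "Messenger": {"messenger", "chat", "message", "messages"},
    "Marketplace": {"marketplace", "seller", "listing"},
}
# Stage 2 (hypothetical): category shared by every topic.
SUB_CATEGORIES = {
    "UI/UX": {"button", "layout", "confusing", "design"},
    "Bugs": {"crash", "crashes", "bug", "freezes", "error"},
    "Content": {"spam", "fake", "offensive", "ads"},
}


def best_label(words, rules, fallback):
    # Pick the label with the most keyword hits; use the fallback if nothing matches.
    scores = {label: sum(w in kws for w in words) for label, kws in rules.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else fallback


def classify_two_stage(text):
    words = re.findall(r"[a-z]+", text.lower())
    topic = best_label(words, MAIN_TOPICS, "Other")          # model 1: main topic
    category = best_label(words, SUB_CATEGORIES, "General")  # model 2: category within it
    return topic, category


print(classify_two_stage("Messenger crashes every time I open a chat"))
```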

2. Sentiment analysis

Even if we know how many pieces of feedback fall under each text category, it is more useful to know how many of those customers are happy, sad or indifferent. Sentiment analysis determines the sentiment of the writer as positive, negative or neutral, so you know the sentiment composition behind every category or piece of feedback.

Using the earlier example of the clothing retail brand, check out the graph below and you’ll see how USEFUL sentiment analysis is as an insight.

You can also track changes in the sentiment composition over time. Without this composition graph, you won’t know how your customers actually ‘Feel’ about you. It’s like being blindfolded in front of a beautiful sight.
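A minimal sketch of computing the sentiment composition per category, assuming a tiny hand-made word lexicon in place of a real sentiment model. The lexicon and feedback are hypothetical; grouping by week or month instead of (or as well as) category would give the over-time view.

```python
import re
from collections import Counter, defaultdict

# Tiny hypothetical lexicon; real systems use trained models or much larger lexicons.
POSITIVE = {"love", "great", "perfect", "happy", "comfortable", "fast"}
NEGATIVE = {"late", "bad", "poor", "tight", "torn", "disappointed", "never"}


def sentiment(text):
    # Net count of positive minus negative words decides the label.
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_composition(feedback):
    """feedback: list of (category, text) pairs -> {category: Counter of sentiment labels}."""
    composition = defaultdict(Counter)
    for category, text in feedback:
        composition[category][sentiment(text)] += 1
    return composition


feedback = [
    ("Delivery", "Parcel arrived late again"),
    ("Delivery", "Super fast delivery, love it"),
    ("Size & Fit", "The jeans are too tight"),
    ("Size & Fit", "Ordered a size M"),
]
for category, counts in sentiment_composition(feedback).items():
    print(category, dict(counts))
```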

3. Intent detection

Intent detection is the ability to determine ‘What the writer intends to do’ from the text. Certain words or phrases may point towards a particular intent. This is particularly useful for Sales and Marketing teams, because they are the ones converting leads into customers.

Let’s say an end-user writes, “My subscription to Netflix is ending in 3 days, how do I renew it?” This sentence shows the intent of ‘Renewal’, so it is useful for the Sales team, who can guide the user through the renewal procedure.
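Intent detection can be sketched with phrase patterns that point towards each intent. The intent names and regular expressions below are hypothetical; a production system would more likely use a trained classifier.

```python
import re

# Hypothetical intent -> regex patterns.
INTENT_PATTERNS = {
    "Renewal": [r"\brenew(al|ing)?\b", r"\bsubscription\b.*\b(ending|expir\w*)\b"],
    "Cancellation": [r"\bcancel\w*\b", r"\bunsubscribe\b"],
    "Purchase": [r"\b(buy|purchase|upgrade)\b", r"\bhow much\b"],
}


def detect_intents(text):
    """Return every intent with at least one matching pattern, in dict order."""
    text = text.lower()
    return [intent for intent, patterns in INTENT_PATTERNS.items()
            if any(re.search(p, text) for p in patterns)]


print(detect_intents("My subscription to Netflix is ending in 3 days, how do I renew it?"))
# → ['Renewal']
```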

4. Language detection

Self-explanatory: it determines the language in which the text is written.
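A rough sketch of language detection by counting common function words (stopwords) from each language. The small stopword sets are hypothetical samples; real detectors typically use character n-gram statistics over far more languages.

```python
import re

# Small hypothetical stopword samples per language.
STOPWORDS = {
    "English": {"the", "and", "is", "to", "of", "it", "my", "not"},
    "French": {"le", "la", "et", "est", "les", "des", "pas", "je"},
    "Spanish": {"el", "la", "y", "es", "los", "las", "no", "que"},
    "German": {"der", "die", "und", "ist", "nicht", "das", "ich", "zu"},
}


def detect_language(text):
    # The language whose stopwords appear most often wins; no hits means unknown.
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"


print(detect_language("Le colis est en retard et je ne suis pas content"))
```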

5. Emotion analysis

Emotion analysis is the ability to determine the emotion of the writer from the text. Research has shown that emotion is among the leading indicators of loyalty. Customers who are frustrated, confused or angry are unlikely to spend more money with a business, whereas those who are pleased, delighted or happy with an interaction are likely to recommend the organisation to their peers.

Businesses that want to design experiences that encourage a particular response must analyse emotions to do so successfully. If they find a gap between the desired emotion and the actual one, this analysis helps identify opportunities for business improvement and more customer-centric offerings across the different touch-points of a customer’s journey.
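Emotion analysis goes a level deeper than positive/negative sentiment. Here is a minimal lexicon-based sketch that maps words to the emotions named above; the lexicon is hypothetical and far smaller than anything usable in practice.

```python
import re
from collections import Counter

# Hypothetical word -> emotion lexicon built from the emotions named above.
EMOTION_LEXICON = {
    "frustrated": "frustration", "frustrating": "frustration", "annoying": "frustration",
    "confused": "confusion", "confusing": "confusion", "unclear": "confusion",
    "angry": "anger", "furious": "anger", "unacceptable": "anger",
    "pleased": "joy", "delighted": "joy", "happy": "joy", "love": "joy",
}


def detect_emotions(text):
    """Return detected emotions, most frequent first (ties keep first-seen order)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    return [emotion for emotion, _ in counts.most_common()]


print(detect_emotions("I am confused and frustrated, the checkout is so confusing"))
# → ['confusion', 'frustration']
```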

6. Text extraction

Text extraction takes different forms, such as keyword extraction and entity recognition.

6.a. Keyword extraction

Keyword extraction automatically pulls the most important words and expressions out of a text. This helps summarise the content of a text and recognise the main topics being discussed. It can also be used to generate word clouds.

For example, Amazon extracts keywords from the product reviews on its website.

Below is a snapshot of the keywords extracted for a skincare cream on Amazon. A quick glance at them gives you a summarised version of the product’s reviews and their main topics of discussion.

These keywords come from the reviews of a Nivea cream.
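A minimal sketch of keyword extraction over a set of reviews, ranking words by how many reviews mention them after removing stopwords. This is not Amazon's method; the reviews and stopword list are made up, and real systems also score multi-word phrases.

```python
import re
from collections import Counter

# Minimal hypothetical stopword list.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "this", "my", "i", "for", "to",
             "of", "very", "but", "on", "in", "with", "was", "so"}


def extract_keywords(reviews, top_k=3):
    """Rank words by document frequency (number of reviews that mention them)."""
    doc_freq = Counter()
    for review in reviews:
        words = set(re.findall(r"[a-z]+", review.lower())) - STOPWORDS
        doc_freq.update(words)
    # Highest frequency first; alphabetical order breaks ties deterministically.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


reviews = [
    "Very moisturising cream, my skin feels soft",
    "Thick cream but moisturising and long lasting",
    "Great moisturiser for dry skin in winter",
    "The cream is greasy on oily skin",
]
print(extract_keywords(reviews))  # → ['cream', 'skin', 'moisturising']
```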

6.b. Entity recognition

This is popularly known as ‘Named entity recognition’ (NER). An entity here means a person, place, product, company, etc.; all of these entities have a name. It’s very common to find such nouns in a text (called a ‘document’ in text analysis terminology), and collectively they are called named entities. Entity recognition can also pick out values such as numbers, pin codes, addresses, emails, percentages, amounts, telephone numbers, links and so on.

All of these entities can provide interesting and important information about a piece of text and improve the overall analysis of a document. For example, you could extract mentions of competitors to learn what your users say about them.

To extract these entities from a piece of text, we need to first identify named entities and values, and then extract them as part of our text analysis.
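A minimal sketch of the identify-then-extract step, assuming a hand-made gazetteer of competitor names and simplified regular expressions for values. The competitor list and patterns are hypothetical; a trained NER model would be needed to find names it has not been told about.

```python
import re

# Hypothetical competitor gazetteer (a tuple keeps output order stable).
COMPETITORS = ("Zara", "H&M", "Uniqlo")

# Simplified value patterns; real-world formats (e.g. phone numbers) vary widely.
VALUE_PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "URL": r"https?://\S+",
    "PERCENT": r"\d+(?:\.\d+)?%",
    "AMOUNT": r"[$£€₹]\s?\d+(?:\.\d{2})?",
    "PIN_CODE": r"\b\d{6}\b",
}


def extract_entities(text):
    found = []
    # 1) Named entities from the gazetteer, matched as whole words.
    for name in COMPETITORS:
        if re.search(rf"(?<!\w){re.escape(name)}(?!\w)", text):
            found.append(("COMPETITOR", name))
    # 2) Values matched by pattern.
    for label, pattern in VALUE_PATTERNS.items():
        found.extend((label, m) for m in re.findall(pattern, text))
    return found


sample = ("Zara refunded me $25.00 in 2 days. Write to help@shop.com or see "
          "https://shop.com/returns (PIN 560001). Uniqlo gave 30% off!")
for entity in extract_entities(sample):
    print(entity)
```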

7. Word Sense Disambiguation aka Concept extraction

A single word can have multiple meanings, so a machine must be trained well enough to identify which meaning is intended from the context of the sentence. This feature helps identify the correct meaning.
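One classic way to pick a word's meaning from context is the simplified Lesk algorithm: choose the sense whose dictionary definition shares the most words with the sentence. The two-sense inventory for “bank” below is hypothetical; real systems draw senses from a lexical database such as WordNet.

```python
import re

# Hypothetical mini sense inventory: word -> {sense: gloss}.
SENSES = {
    "bank": {
        "bank (finance)": "a financial institution where customers keep money in an account and take out loans",
        "bank (river)": "sloping land beside a river or lake",
    }
}
STOPWORDS = {"the", "a", "an", "of", "to", "and", "or", "i", "my", "at",
             "on", "in", "that", "where", "is"}


def content_words(text):
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def simplified_lesk(word, sentence):
    """Pick the sense whose gloss overlaps most with the sentence context."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "I deposited money into my bank account"))
```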

8. Clustering (aka Topic modeling)

Texts that are similar to each other are grouped together into one cluster. This produces multiple clusters, each holding texts of a similar kind, and each cluster talks about something different.

Clustering algorithms are generally less accurate than classification algorithms, but they are faster to implement because you do not need to train a model on labelled examples (this is known as unsupervised machine learning).
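A minimal sketch of unsupervised grouping using single-pass (leader) clustering with Jaccard word overlap. No labels or training are needed, only a similarity threshold, which here is a hypothetical value; topic-modelling methods such as LDA are more common in practice.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    # Share of words the two sets have in common.
    return len(a & b) / len(a | b) if a | b else 0.0


def leader_cluster(texts, threshold=0.3):
    """Single pass: join the most similar existing cluster, or start a new one."""
    clusters = []  # each cluster: {"leader": token set of its first text, "members": [...]}
    for text in texts:
        toks = tokens(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(toks, cluster["leader"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(text)
        else:
            clusters.append({"leader": toks, "members": [text]})
    return [c["members"] for c in clusters]


texts = [
    "delivery was late",
    "late delivery again",
    "the dress fabric is thin",
    "fabric of the dress is thin and cheap",
    "delivery was very late",
]
for members in leader_cluster(texts):
    print(members)
```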

9. Others include:

9a. Theme discovery

“Uncover trends before they’re trending”. Theme discovery uses machine learning algorithms to automatically identify newly forming trends or issues in incoming texts. This helps highlight unexpected problems and prevent blind spots.
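As a very simple stand-in for the machine-learning approach described above, the sketch below flags terms whose mention count has jumped between two time windows. The thresholds, stopwords and sample texts are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "my", "i", "and", "to", "was", "it", "not", "of", "on"}


def term_counts(texts):
    # Count how many texts mention each term (each text counts once per term).
    counts = Counter()
    for t in texts:
        counts.update(set(re.findall(r"[a-z]+", t.lower())) - STOPWORDS)
    return counts


def emerging_terms(previous, current, min_count=3, min_growth=2.0):
    """Terms mentioned in >= min_count current texts whose count grew >= min_growth times."""
    before, now = term_counts(previous), term_counts(current)
    flagged = []
    for term, n in now.items():
        growth = n / (before[term] + 1)  # +1 so brand-new terms don't divide by zero
        if n >= min_count and growth >= min_growth:
            flagged.append((growth, term))
    # Fastest-growing first; alphabetical order breaks ties deterministically.
    return [term for growth, term in sorted(flagged, key=lambda x: (-x[0], x[1]))]


last_week = ["delivery was late", "love the dress", "size chart is wrong", "delivery was late again"]
this_week = ["app crashes at checkout", "checkout page crashes", "cannot pay, checkout crashes",
             "delivery was late", "love the dress"]
print(emerging_terms(last_week, this_week))  # → ['checkout', 'crashes']
```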

9b. Urgency detection

An irritated customer is high on temper and low on patience, so interactions with them call for relatively urgent attention. Patient customers can still wait, but this customer may churn if left unattended beyond their tolerance time. The words such customers use to express urgency vary, for example “very quick”, “ASAP”, “faster”, “right away”, etc.

That’s why an urgency detection model classifies each text as one of two kinds: ‘Urgent’ or ‘Non-urgent’.
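A minimal sketch of the binary ‘Urgent’/‘Non-urgent’ decision, assuming a list of urgency cues that includes the example phrases above. The cue list is hypothetical; a trained classifier would learn such cues from labelled texts.

```python
import re

# Hypothetical urgency cues, including the example phrases from the text.
URGENCY_CUES = [
    r"\basap\b", r"\bright away\b", r"\bvery quick(ly)?\b", r"\bfaster\b",
    r"\burgent(ly)?\b", r"\bimmediately\b", r"!{2,}",
]


def classify_urgency(text):
    # Any matching cue marks the text as urgent.
    text = text.lower()
    return "Urgent" if any(re.search(cue, text) for cue in URGENCY_CUES) else "Non-urgent"


print(classify_urgency("Please fix my order ASAP"))
```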

However, in my personal opinion, urgency detection is done right when you track how frequently an issue or trend is mentioned in the incoming texts.

That is when you should be alerted with a notification. Dropthought does exactly that; find out more about its ‘Advanced Downstream Triggers’.
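The frequency-based view can be sketched as a sliding-window counter that raises an alert once an issue is mentioned a set number of times within a period. This is not Dropthought's implementation; the class name, threshold and window are hypothetical, and timestamps are assumed to arrive in order.

```python
from collections import deque


class MentionAlert:
    """Alert when an issue is mentioned at least `threshold` times within `window` seconds."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.timestamps = {}  # issue -> deque of mention times (assumed non-decreasing)

    def record(self, issue, ts):
        times = self.timestamps.setdefault(issue, deque())
        times.append(ts)
        while times and times[0] <= ts - self.window:
            times.popleft()  # drop mentions that fell out of the window
        return len(times) >= self.threshold  # True means "send a notification"


alert = MentionAlert(threshold=3, window=3600)
for ts in (0, 600, 1200, 5000):
    print(ts, alert.record("checkout crash", ts))
```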

9c. Prioritising tickets

Some text analytics vendors let you display support tickets in priority order, based on the results of their text analysis combined with the prioritisation conditions you specify.
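A minimal sketch of prioritisation that combines text-analysis output (sentiment and urgency) with a user-specified rule (VIP customers), using Python's `heapq` as a priority queue. The ticket fields, weights and VIP rule are all hypothetical.

```python
import heapq

# Hypothetical weights for the text-analysis signals.
SENTIMENT_WEIGHT = {"negative": 2, "neutral": 1, "positive": 0}


def priority(ticket, vip_customers):
    """Higher is more important: text-analysis signals plus a user-specified rule."""
    score = SENTIMENT_WEIGHT[ticket["sentiment"]]
    if ticket["urgent"]:
        score += 3
    if ticket["customer"] in vip_customers:  # hypothetical business rule set by the user
        score += 2
    return score


def prioritise(tickets, vip_customers=frozenset()):
    heap = []
    for seq, t in enumerate(tickets):
        # heapq is a min-heap, so negate the score; seq keeps arrival order for ties.
        heapq.heappush(heap, (-priority(t, vip_customers), seq, t["id"]))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


tickets = [
    {"id": "T1", "customer": "ana", "sentiment": "positive", "urgent": False},
    {"id": "T2", "customer": "raj", "sentiment": "negative", "urgent": True},
    {"id": "T3", "customer": "lee", "sentiment": "neutral", "urgent": False},
    {"id": "T4", "customer": "ana", "sentiment": "negative", "urgent": False},
]
print(prioritise(tickets, vip_customers={"ana"}))  # → ['T2', 'T4', 'T1', 'T3']
```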

9d. Summarisation

“Automatic text summarisation is the task of producing a concise and fluent summary while preserving key information content and overall meaning” (Text Summarisation Techniques: A Brief Survey, 2017)

Automatic text summarisation methods are greatly needed to cope with the ever-growing amount of text data available online, both to help discover relevant information and to consume it faster.
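A minimal sketch of extractive summarisation, one of the simplest approaches: score each sentence by how frequent its words are across the whole text and keep the top-scoring sentences in their original order. The stopword list and sample text are hypothetical; abstractive summarisers instead generate new sentences with neural models.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "it",
             "for", "on", "was", "with", "this"}


def content_tokens(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarise(text, n_sentences=1):
    """Extractive summary: keep the sentences whose words are most frequent in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_tokens(text))

    def score(sentence):
        toks = content_tokens(sentence)
        return sum(freq[w] for w in toks) / (len(toks) or 1)  # average, so length isn't rewarded

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in top)  # keep original sentence order


text = ("Customers love the new checkout. The checkout is fast and the checkout page is clear. "
        "Some customers reported slow delivery. Weather was nice.")
print(summarise(text))  # → The checkout is fast and the checkout page is clear.
```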

9e. Opinion mining

Opinion mining is primarily the extraction of opinions, suggestions and advice about the entity under discussion, for the benefit of its readers.
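A minimal sketch of pulling suggestions and advice out of a review by looking for cue phrases sentence by sentence. The cue list and review are hypothetical; opinion-mining systems usually combine such cues with trained models.

```python
import re

# Hypothetical cue phrases that often signal suggestions or advice.
SUGGESTION_CUES = [
    r"\bshould\b", r"\bi (?:would )?recommend\b", r"\bwould be (?:better|nice|great) if\b",
    r"\bplease add\b", r"\bi wish\b", r"\bmake sure\b",
]


def mine_suggestions(review):
    """Split a review into sentences and keep those containing a suggestion cue."""
    sentences = re.split(r"(?<=[.!?])\s+", review.strip())
    return [s for s in sentences if any(re.search(c, s.lower()) for c in SUGGESTION_CUES)]


review = ("Lovely fabric and colour. You should order one size up. "
          "It would be better if the sleeves were longer. Delivery took a week.")
print(mine_suggestions(review))
```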

That was a long list! However, the techniques listed under ‘Others’ are used relatively less frequently, and they are mostly offered as unique services by different text analytics vendors.

Written by: Aishwarya Prasad

At: Medium