Named Entity Recognition (NER) — NLP Basics — part 6 of 10

Shariq Hameed
6 min read · Mar 24, 2024


Welcome to part 6 of our series on the basics of NLP.

In this part, we will cover Named Entity Recognition (NER), what it is, how it’s done and why it’s important.

What is NER?

Named Entity Recognition (NER) is an NLP technique that identifies named entities in text and classifies them into categories like person, place, organization or date.

For example, let’s consider the following sentence:

“Dr. Amelia Jones, a data scientist at Google AI, will be presenting her research on sentiment analysis in tweets at the upcoming NLP conference in Berlin, Germany on July 15th.”

This sentence includes:

  • Person: Dr. Amelia Jones
  • Title: Data Scientist
  • Organization: Google AI
  • Location: Berlin, Germany
  • Date: July 15th
  • Event: NLP conference
  • Area of Study: Sentiment analysis

The process of identifying these named entities is called NER.

Now that we have covered what NER is, let’s talk about the “how”.

NER Techniques:

There are several approaches to NER, but two main categories are:

1. Rule-based NER:

This method recognizes entities based on a predefined set of rules and dictionaries.

a. Pattern-based rules:

These rules deal with the morphological structure of words.

e.g: If a word ends with the suffix ‘-field’ or ‘-land’, it is probably a location, as these suffixes are commonly associated with city/town names.
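A minimal sketch of such a suffix rule (the function name and suffix list are illustrative, not from any library):

```python
import re

# Illustrative rule: words ending in '-field' or '-land' are guessed to be locations.
LOCATION_SUFFIXES = re.compile(r"\w+(field|land)$", re.IGNORECASE)

def guess_location(word):
    # Returns True if the word matches the suffix pattern.
    return bool(LOCATION_SUFFIXES.match(word))

guess_location("Springfield")  # True
guess_location("Berlin")       # False
```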

b. Context-based rules:

These rules deal with the surrounding words of a particular word to classify it.

e.g: If “Dr.”, “Mr.”, “Mrs.” or “Miss” appears before a word, it is probably the name of a person.
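The title rule above could be sketched like this (the title set and tag names are illustrative):

```python
TITLES = {"Dr.", "Mr.", "Mrs.", "Miss"}

def tag_persons(tokens):
    # If the previous token is a title, tag the current token as a person name.
    tags = ["O"] * len(tokens)
    for i in range(1, len(tokens)):
        if tokens[i - 1] in TITLES:
            tags[i] = "PERSON"
    return tags

tag_persons(["Dr.", "Amelia", "visited", "Berlin"])
# ['O', 'PERSON', 'O', 'O']
```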

c. Dictionary approach:

Here’s a sample dictionary that can be used to classify words into different entities.

{
"Countries": ["Pakistan", "India", "China", "Russia", ...],
"Cities": ["Karak", "Peshawar", "Islamabad", ...],
"Names": ["Haziq", "Shariq", "Wasiq", "Faiq", ...]
}

We check every token/word against this dictionary to assign possible entity names to it.
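That lookup can be sketched as follows (the dictionary is the toy example above; a real gazetteer would be far larger):

```python
GAZETTEER = {
    "Countries": ["Pakistan", "India", "China", "Russia"],
    "Cities": ["Karak", "Peshawar", "Islamabad"],
    "Names": ["Haziq", "Shariq", "Wasiq", "Faiq"],
}

# Invert the dictionary once so each token lookup is O(1).
WORD_TO_ENTITY = {word: entity for entity, words in GAZETTEER.items() for word in words}

def dictionary_ner(tokens):
    # Tag each token with its entity category, or "O" if it is not in the dictionary.
    return [(tok, WORD_TO_ENTITY.get(tok, "O")) for tok in tokens]

dictionary_ner(["Shariq", "lives", "in", "Peshawar"])
# [('Shariq', 'Names'), ('lives', 'O'), ('in', 'O'), ('Peshawar', 'Cities')]
```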

Disadvantages:

Rule-based NER can produce a lot of false positives.

For example, the word “battlefield” will be identified as a location by the suffix rule discussed above.

Dictionary-based NER becomes very hard to maintain, as the lists of potential city and person names can grow huge.

Some words can also represent two different entities. For example, “Apple” is a fruit as well as an organization; rule-based NER suffers in such cases.

2. Machine Learning-based NER:

This approach trains machine learning models on large datasets of labelled text.

There are many ML-based approaches to NER, but these two are the most widely used:

a. Conditional Random Fields (CRFs):

In a linear-chain CRF, y_i (the tag for the current word) depends only on y_{i-1} (the tag for the previous word).

We give a sequence of words as input and the model computes the probability of a sequence of tags.

For example, for the sequence “Ali is an Engineer”, the goal of the model would be to maximize the probability of “Person O O Profession”.

Here, O means not a named entity.

Mathematically, the probability of an output tag sequence Y given an input word sequence X is:

P(Y | X) = (1 / Z(X)) · exp( Σ_j Σ_k w_jk · f_jk(y_{j-1}, y_j, X, j) )

Where,

  • Z(X) is the normalization term (partition function) ensuring the distribution sums to 1.
  • f_jk are feature functions that capture information about the words and their context.
  • w_jk are the weights associated with each feature function.
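To make the formula concrete, here is a toy brute-force scorer for the “Ali is an Engineer” example. The features and weights below are hand-picked for illustration, not learned, and a real CRF would use dynamic programming instead of enumerating every tag sequence:

```python
import math
from itertools import product

TAGS = ["Person", "Profession", "O"]

def score_features(prev_tag, tag, word):
    # Hand-crafted toy features: each returns a weighted contribution.
    w = 0.0
    if word[0].isupper() and tag == "Person":
        w += 2.0                      # capitalized words look like names
    if word.lower() == "engineer" and tag == "Profession":
        w += 3.0                      # known profession word
    if tag == "O" and not word[0].isupper():
        w += 1.0                      # lowercase words are usually non-entities
    if prev_tag == "Person" and tag == "Person":
        w -= 1.0                      # discourage two Person tags in a row
    return w

def sequence_score(words, tags):
    prev, total = "START", 0.0
    for word, tag in zip(words, tags):
        total += score_features(prev, tag, word)
        prev = tag
    return total

def crf_probability(words, tags):
    # Z(X): sum of exp(score) over every possible tag sequence.
    z = sum(math.exp(sequence_score(words, seq))
            for seq in product(TAGS, repeat=len(words)))
    return math.exp(sequence_score(words, tags)) / z

words = ["Ali", "is", "an", "Engineer"]
best = max(product(TAGS, repeat=4), key=lambda seq: crf_probability(words, seq))
# best == ('Person', 'O', 'O', 'Profession')
```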

The detailed mathematical intuition behind the process is beyond the scope of this article. If you are interested, I found this article quite helpful: click here.

b. Bi-LSTM combined with CRF:

A limitation of a CRF with simple, local features is that it can be blind to words that come after the target word.

For example, consider “I went to Harvard University.”

In this sentence, such a CRF may tag “Harvard” as a person name because it cannot see the “University” that comes after it.

Hence, we connect a bi-directional LSTM between the inputs and the CRF to make our system more context-aware.

There are two LSTM networks: one reads the input sequence in the forward direction, and the other reads it in the backward direction.

The output of this Bi-LSTM network is then fed to CRF.

(Figure: Bi-LSTM + CRF architecture. Source: https://domino.ai/)

Now, that we know “how” it’s done, let’s actually do it.

NER with nltk:

Here’s a simple function that takes in a piece of text and returns the named entities it contains as a dictionary mapping each entity to its label.

import nltk

def ner_nltk(text):
    entities = {}
    for sent in nltk.sent_tokenize(text):
        word_tokens = nltk.word_tokenize(sent)
        for chunk in nltk.ne_chunk(nltk.pos_tag(word_tokens)):
            # Named-entity chunks are subtrees with a label; plain tokens are not.
            if hasattr(chunk, 'label'):
                # Join the chunk's tokens so multi-word entities stay intact.
                name = " ".join(token for token, pos in chunk.leaves())
                entities[name] = chunk.label()
    return entities

text = "Tesla opened a new office in Tokyo today"
ner_nltk(text)
{'Tesla': 'GPE', 'Tokyo': 'GPE'}

NER with spaCy:

Here’s a simple function that takes in a sentence and returns each token’s IOB tag and entity type as a list.

import spacy

def ner_spacy(sentence):
    # For repeated calls, load the model once outside the function instead.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(sentence)
    named_entities = [token.ent_iob_ + "-" + token.ent_type_ for token in doc]
    return named_entities

ner_spacy("Tesla opened a new office in Tokyo today")
['B-ORG', 'O-', 'O-', 'O-', 'O-', 'O-', 'B-GPE', 'B-DATE']

You can also visualize this result with spaCy’s displacy (note that render needs the Doc object, not the list of tags):

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Tesla opened a new office in Tokyo today")
displacy.render(doc, style="ent")

IOB Format:

spaCy returns its output in IOB format which stands for inside-outside-beginning.

Each token is tagged with an I-, O- or B- prefix.

B- means beginning of a named entity. e.g. in New York, “New” will be tagged with B-GPE.

I- means inside a named entity. e.g. in New York, “York” will be tagged with I-GPE.

O- means outside. In other words, it is a tag used for non-entities.
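A small helper makes the format tangible: it groups token-level IOB tags back into entity spans (the function name is illustrative; it is not part of spaCy):

```python
def iob_to_entities(tokens, tags):
    # Collect (entity_text, entity_type) spans from token-level IOB tags.
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # close any span already in progress
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:  # an O tag ends the current span
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities

iob_to_entities(["New", "York", "is", "big"], ["B-GPE", "I-GPE", "O", "O"])
# [('New York', 'GPE')]
```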

Finally, let’s touch upon how NER can be useful:

Applications of NER:

NER plays a crucial role in various Natural Language Processing (NLP) tasks. Here are a few key applications:

1. Information Extraction:

It would be very inefficient for search engines to scan the full text of every article for the keywords in a query, right?

Hence, we run NER models on all the articles ahead of time and store the relevant entities mentioned, such as locations, names or organizations.

When someone submits a query, we compare the keywords in the query with the entities extracted from the articles (by NER) to serve results.
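The idea can be sketched as a tiny inverted index. The article IDs and entity sets below are made up; in practice the entities would come from an NER model:

```python
# Hypothetical pre-extracted entities per article (as an NER model might output).
article_entities = {
    "article_1": {"Google", "Berlin"},
    "article_2": {"Tesla", "Tokyo"},
    "article_3": {"Berlin", "Tesla"},
}

# Build an inverted index: entity -> set of articles mentioning it.
index = {}
for article, ents in article_entities.items():
    for ent in ents:
        index.setdefault(ent, set()).add(article)

def search(query_entities):
    # Return every article matching any entity in the query.
    results = set()
    for ent in query_entities:
        results |= index.get(ent, set())
    return sorted(results)

search({"Berlin"})
# ['article_1', 'article_3']
```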

2. Smart Resume Scans:

NER can identify entities such as names, locations, email addresses, phone numbers, skills and certifications from resumes.

You can also implement a basic shortlisting with NER by:

  1. Extracting relevant skills from resumes.
  2. Extracting required skills from job descriptions.
  3. Comparing them to score the candidate.
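The scoring step above can be as simple as a set overlap. A minimal sketch, assuming the skills have already been extracted by NER:

```python
def skill_match_score(resume_skills, required_skills):
    # Fraction of required skills found on the resume (case-insensitive).
    resume = {s.lower() for s in resume_skills}
    required = {s.lower() for s in required_skills}
    if not required:
        return 0.0
    return len(resume & required) / len(required)

skill_match_score(["Python", "NLP", "SQL"], ["python", "nlp", "docker", "sql"])
# 0.75
```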

3. Content Classification:

Classifying content (articles, videos) into different categories makes it easier for people to discover.

NER can help with that by extracting relevant entities such as names, organizations or places discussed in the articles.

This can help in managing content into hierarchies which in turn improves discoverability.

Conclusion:

In this article, we discussed Named Entity Recognition in great detail. We discussed what it is, how it is carried out and where we can use it.

You can find the code here.

Stay tuned, see you next time :-)
