POS (Parts of Speech) Tagging — NLP basics — Part 5 of 10
I think we all remember elementary grammar in school where we learned to identify different parts of speech in sentences.
Parts of speech, also known as word classes, are categories that words are grouped into based on their grammatical properties and functions within a sentence.
For example, the words “play” and “run” both serve the same function: they describe actions.
Hence, they are grouped into the same category, or part of speech: Verb.
Classification of words into different POS depends on context.
For example, “playing” can be a verb (He is playing) and also a noun (I love playing).
Following are 8 parts of speech in the English language:
1. Noun
2. Pronoun
3. Verb
4. Adjective
5. Adverb
6. Preposition
7. Conjunction, and
8. Interjection
What is POS tagging:
Parts of Speech (POS) tagging is the process of assigning each word in a text a grammatical category such as noun, verb, adjective, or preposition.
To keep it simple, think of a POS tagging model as a black box: you input a sentence and get back the same sentence with a POS tag attached to each word.
For example, consider the following sentence:
“A quick brown fox jumps over a lazy dog”
This is the same sentence with POS tags:
A: DT (Determiner)
quick: JJ (Adjective)
brown: JJ (Adjective)
fox: NN (Noun)
jumps: VBZ (Verb, 3rd person singular present)
over: IN (Preposition)
a: DT (Determiner)
lazy: JJ (Adjective)
dog: NN (Noun)
Okay, now that we have covered what POS tagging is, let’s dive into why we need it.
This brings us to the applications of POS tagging.
Applications of POS Tagging
POS tagging finds its applications in a variety of NLP tasks.
Here are 3 applications of POS tagging in NLP:
1. Machine Translation:
Machine translation is the task of translating from one language to another.
Let’s see how POS tagging can help.
Consider the example,
“Left of the room” and “I left the room”.
While translating these two sentences, the word “left” is ambiguous.
POS tagging the two sentences tells us that “left” is an adjective in the first sentence and a verb in the second.
This helps in machine translation and is known as word disambiguation.
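As a toy illustration of this idea (not a real tagger), a single context rule can resolve this particular ambiguity: if “left” directly follows a pronoun such as “I”, it is being used as a verb; otherwise treat it as an adjective. The pronoun list and the rule below are illustrative assumptions:

```python
# Toy disambiguation of the word "left" (illustration only, not a
# general tagger): a pronoun right before "left" suggests a verb use.
PRONOUNS = {"i", "you", "he", "she", "we", "they"}

def tag_left(sentence):
    words = sentence.lower().replace(".", "").split()
    tags = []
    for i, word in enumerate(words):
        if word == "left":
            # Verb reading if preceded by a pronoun ("I left ..."),
            # adjective reading otherwise ("Left of ...").
            if i > 0 and words[i - 1] in PRONOUNS:
                tags.append((word, "VERB"))
            else:
                tags.append((word, "ADJ"))
        else:
            tags.append((word, None))  # other words are not our concern here
    return tags

print(tag_left("Left of the room"))  # "left" comes out as ADJ
print(tag_left("I left the room"))   # "left" comes out as VERB
```

A real machine-translation system would of course rely on a trained tagger rather than a hand-written rule, but the principle is the same: the tag decides which translation of “left” to use.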
2. Named Entity Recognition:
Named Entity Recognition (NER) is the process of identifying entities such as people, organizations and locations in text.
POS tagging can help with that.
For example, if a word is tagged as a proper noun, it is likely the name of a person, place, or organization.
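A rough sketch of this idea, assuming we already have Penn Treebank style tags (NNP/NNPS for proper nouns): candidate named entities can be pulled out simply by filtering on the tag. The tagged tokens below are made up for illustration:

```python
# Extract candidate named entities from POS-tagged tokens by keeping
# proper nouns (Penn Treebank tags NNP / NNPS). Sketch only: real NER
# systems use far more evidence than the POS tag alone.
def candidate_entities(tagged_tokens):
    return [word for word, tag in tagged_tokens if tag in ("NNP", "NNPS")]

tagged = [("Alice", "NNP"), ("works", "VBZ"), ("at", "IN"),
          ("Google", "NNP"), ("in", "IN"), ("London", "NNP")]
print(candidate_entities(tagged))  # ['Alice', 'Google', 'London']
```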
3. Question Answering:
Systems that answer questions can use POS tags to identify the type of information being asked for in a question.
For example, a question beginning with “where” is likely looking for a place as the answer.
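This mapping from question words to answer types can be sketched in a few lines. The answer-type table below is a hypothetical simplification (a real QA system would use the full parse, not just the first word):

```python
# Map a question's leading wh-word to a coarse expected answer type.
# Sketch only: assumes the first word alone signals the answer type.
ANSWER_TYPE = {"where": "LOCATION", "who": "PERSON",
               "when": "DATE", "what": "THING"}

def expected_answer_type(question):
    first = question.lower().split()[0]
    return ANSWER_TYPE.get(first, "UNKNOWN")

print(expected_answer_type("Where is the Eiffel Tower?"))  # LOCATION
print(expected_answer_type("Who wrote Hamlet?"))           # PERSON
```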
Okay, so we now know what POS tagging is and why it’s important. The question now is: how do we actually generate these POS tags?
This brings us to the techniques of POS tagging.
Techniques of POS tagging:
Following are three main techniques used for POS tagging:
1. Rule-based Approach:
This approach relies on a pre-defined set of rules and a dictionary that contains words and their likely POS tags.
When a rule-based system receives a sentence, it does two things:
- Looks up every word in its dictionary.
- Applies predefined rules.
A rule might be:
“If a word ends in ‘ing’ and is used as the subject, it is likely a noun!”
Consider the following example.
“Swimming is awesome!”
Let’s look at the word “swimming”:
- Looking up “swimming” in the dictionary, the system finds it can be either a noun or a verb.
- Applying the rule above, it concludes the word is a noun in this context.
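The two steps above can be sketched in a few lines of Python. The tiny lexicon and the single rule here are illustrative assumptions, not a real rule-based system:

```python
# Minimal rule-based tagger sketch: a hand-made dictionary of possible
# tags per word, plus one rule for resolving "-ing" ambiguity.
LEXICON = {
    "swimming": ["NOUN", "VERB"],
    "is": ["VERB"],
    "awesome": ["ADJ"],
}

def rule_based_tag(sentence):
    words = sentence.lower().strip("!.?").split()
    tags = []
    for i, word in enumerate(words):
        candidates = LEXICON.get(word, ["NOUN"])  # naive default for unknowns
        if len(candidates) > 1 and word.endswith("ing") and i == 0:
            # Rule: an "-ing" word used as the subject is likely a noun.
            tags.append((word, "NOUN"))
        else:
            tags.append((word, candidates[0]))
    return tags

print(rule_based_tag("Swimming is awesome!"))
# [('swimming', 'NOUN'), ('is', 'VERB'), ('awesome', 'ADJ')]
```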
2. Stochastic POS Tagging:
This approach relies on training a probabilistic model on pre-tagged text data.
The model learns the probabilities of different POS tag sequences based on the surrounding words and other features.
During inference, the model assigns the most probable POS tag to each word based on the learned probabilities.
The stochastic approach is generally more accurate, but the downside is that it is less interpretable, since it relies on complex learned patterns.
Hidden Markov Models (HMMs) are stochastic models commonly used for POS tagging.
Going into the exact details of how HMMs assign POS tags to words is beyond the scope of this article and merits a separate article.
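Leaving full HMM details for that separate article, the core idea of learning tag probabilities from pre-tagged data can be sketched with a toy unigram model: count how often each word carries each tag in the training data, then assign the most frequent tag at inference time. The three-sentence corpus below is made up for illustration, and real stochastic taggers also model tag sequences, which this sketch ignores:

```python
from collections import Counter, defaultdict

# Toy "stochastic" tagger: learn tag counts per word from a tiny
# pre-tagged corpus, then assign each word its most frequent tag.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("fox", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("run", "NN"), ("helps", "VBZ")],
]

counts = defaultdict(Counter)
for sentence in corpus:
    for word, tag in sentence:
        counts[word][tag] += 1

def most_likely_tag(word):
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NN"  # naive fallback for unseen words

print([(w, most_likely_tag(w)) for w in ["the", "dog", "runs"]])
# [('the', 'DT'), ('dog', 'NN'), ('runs', 'VBZ')]
```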
3. Transformation Based Tagging:
Transformation Based Tagging (TBT) is also known as Brill tagging.
It works in three simple steps:
- Assign initial POS tags.
- Apply transformation rules.
- Iteratively refine tags.
Let’s consider an example.
“The cooking is delightful.”
1. Assign initial POS tags:
The first step is to assign an initial POS tag to every word in the sentence.
“The (DT) cooking (VB) is (VB) delightful (JJ).”
2. Apply Transformation rules:
Consider a rule that states,
“If a verb in its ‘ing’ form follows ‘the’, change its tag to noun.”
So, the updated POS tags are:
“The (DT) cooking (NN) is (VB) delightful (JJ).”
3. Iteratively refine tags:
In this example a single round of rule application was enough, but in general the process runs many rounds, iteratively refining the tags until no rule changes anything.
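The three steps above can be sketched as code. The initial-tag dictionary and the single transformation rule are illustrative assumptions; the real Brill tagger learns many such rules automatically from a tagged corpus:

```python
# Transformation-based tagging sketch: start from naive initial tags,
# then apply one transformation rule.
INITIAL = {"the": "DT", "cooking": "VB", "is": "VB", "delightful": "JJ"}

def initial_tags(words):
    return [(w, INITIAL.get(w, "NN")) for w in words]

def apply_rule(tags):
    # Rule: a verb in its "-ing" form right after "the" becomes a noun.
    out = list(tags)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == "VB" and word.endswith("ing") and out[i - 1][0] == "the":
            out[i] = (word, "NN")
    return out

words = "the cooking is delightful".split()
print(apply_rule(initial_tags(words)))
# [('the', 'DT'), ('cooking', 'NN'), ('is', 'VB'), ('delightful', 'JJ')]
```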
Enough with the what, why, and how. Let’s get our hands dirty with some code.
POS tagging with NLTK:
The following code implements a function that takes in a sentence and returns POS tags for every word.
import nltk
from nltk.tokenize import word_tokenize

# Required resources (download once):
# nltk.download('punkt')
# nltk.download('averaged_perceptron_tagger')

def pos_tagger(sentence):
    words = word_tokenize(sentence)  # split the sentence into word tokens
    return nltk.pos_tag(words)       # tag each token

sentence = "Cooking is delightful"
pos_tagger(sentence)
[('Cooking', 'NN'), ('is', 'VBZ'), ('delightful', 'JJ')]
All the magic is done by the pos_tag() function under the hood.
It uses a PerceptronTagger by default for POS tagging. This falls under the category of stochastic POS tagging.
It’s trained on a large corpus of pre-tagged text data.
During inference, it considers features of the word and its context to predict the most likely part of speech tag.
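The kinds of features involved can be sketched: for each word, the tagger looks at clues such as its suffix and its neighbors. Below is a hypothetical feature extractor, not NLTK’s actual feature set:

```python
# Hypothetical sketch of the kind of features a perceptron tagger
# considers when predicting a word's tag (not NLTK's actual features).
def features(words, i):
    word = words[i]
    return {
        "word": word.lower(),
        "suffix3": word[-3:],  # e.g. "ing", "ful"
        "prev_word": words[i - 1].lower() if i > 0 else "<START>",
        "is_capitalized": word[0].isupper(),
    }

words = ["Cooking", "is", "delightful"]
print(features(words, 0))
# {'word': 'cooking', 'suffix3': 'ing', 'prev_word': '<START>', 'is_capitalized': True}
```

A trained weight vector over features like these is what lets the tagger score each candidate tag and pick the most likely one.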
POS Tagging with Spacy:
The following code implements a function that takes in a sentence and returns the sentence with POS tags.
import spacy

# Load the English model once; reloading it on every call would be slow.
# (Install the model first: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

def pos_tagger(sentence):
    doc = nlp(sentence)  # tokenize and tag in one pipeline call
    return [(token.text, token.pos_) for token in doc]

sentence = "Cooking is delightful"
pos_tagger(sentence)
[('Cooking', 'NOUN'), ('is', 'AUX'), ('delightful', 'ADJ')]
We first import spaCy and load the English language model "en_core_web_sm".
We then process the sentence with spaCy’s nlp pipeline, which tokenizes it, assigns a POS tag to each token, and returns the (word, tag) pairs in the format shown above.
Conclusion:
In this fifth article of our series on NLP basics, we covered POS tagging: what it is, why we need it, the main techniques used to perform it, and implementations in NLTK and spaCy.
You can find the notebook here.
I hope you liked this article. If you did, consider following for more.