Understanding Natural Language Processing (NLP): The Bridge Between Human Language and Artificial Intelligence


📗 Chapter 2: Core NLP Tasks and Techniques

Essential Capabilities That Power Modern Natural Language Understanding


🧠 Introduction

In Chapter 1, we laid the foundation for Natural Language Processing (NLP) by discussing the structure of language and preprocessing. Now, it’s time to dive into the core tasks and techniques that make NLP systems functional, intelligent, and truly interactive.

This chapter covers the essential NLP tasks: Part-of-Speech (POS) tagging, Named Entity Recognition (NER), text classification, sentiment analysis, summarization, and topic modeling. Each task comes with a short code example, and the chapter closes with tool comparisons.


📘 Section 1: Part-of-Speech (POS) Tagging

🔍 What is POS Tagging?

Part-of-Speech tagging assigns each word its grammatical role (such as noun, verb, adjective, or adverb) based on its context in the sentence.

| Word | POS Tag |
| --- | --- |
| The | Determiner |
| quick | Adjective |
| fox | Noun |
| jumps | Verb |
| over | Preposition |
| lazy | Adjective |
| dog | Noun |


🧪 Code: POS Tagging with spaCy

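```python
import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("The quick brown fox jumps over the lazy dog")
for token in doc:
    # token.pos_ is the coarse-grained part-of-speech tag (e.g. DET, ADJ, NOUN)
    print(token.text, token.pos_)
```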
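
To make the "based on its context" part of the definition concrete, here is a minimal sketch of a rule-based tagger in plain Python. The lexicon, the tag names, and the two context rules are all hypothetical and chosen only for illustration; statistical taggers such as spaCy's learn these preferences from annotated data rather than from hand-written rules.

```python
# Toy lexicon: each word maps to the tags it can take (hypothetical, for illustration only).
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "i": ["PRON"],
    "to": ["PART"],
    "want": ["VERB"],
    "read": ["VERB"],
    "book": ["NOUN", "VERB"],  # ambiguous: "a book" vs "to book"
    "flight": ["NOUN"],
}


def tag(sentence):
    """Return (word, tag) pairs, using the previous tag to resolve ambiguity."""
    tagged = []
    prev_tag = None
    for word in sentence.lower().split():
        candidates = LEXICON.get(word, ["NOUN"])  # unknown words default to NOUN
        if len(candidates) == 1:
            choice = candidates[0]
        elif prev_tag == "DET" and "NOUN" in candidates:
            choice = "NOUN"  # after a determiner, prefer the noun reading
        elif prev_tag == "PART" and "VERB" in candidates:
            choice = "VERB"  # after infinitival "to", prefer the verb reading
        else:
            choice = candidates[0]
        tagged.append((word, choice))
        prev_tag = choice
    return tagged


print(tag("I want to book a flight"))
print(tag("I read the book"))
```

The word "book" receives a different tag in each sentence, decided purely by the tag of the word before it.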


📘 Section 2: Named Entity Recognition (NER)

🧠 What is NER?

NER identifies and classifies entities in text into predefined categories like:

  • PERSON
  • ORGANIZATION
  • LOCATION
  • DATE
  • MONEY

| Phrase | Entity Type |
| --- | --- |
| Elon Musk | PERSON |
| SpaceX | ORG |
| $100 million | MONEY |
| April 2023 | DATE |


🧪 Code: NER with spaCy

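```python
import spacy

nlp = spacy.load("en_core_web_sm")  # same model as in the POS tagging example

doc = nlp("Apple Inc. was founded by Steve Jobs in Cupertino in 1976.")
for ent in doc.ents:
    # ent.label_ is the entity type, e.g. ORG, PERSON, GPE, DATE
    print(ent.text, ent.label_)
```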
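
For entity types with a regular surface form, such as MONEY and DATE, a pattern-based approach can serve as a rough baseline. The sketch below uses only the standard `re` module; the two patterns and the `find_entities` helper are hypothetical, cover only a narrow set of formats, and run on a made-up sentence. They are not how spaCy's statistical NER works.

```python
import re

# Hypothetical patterns for two entity types; real NER models learn these from data.
PATTERNS = {
    "MONEY": re.compile(r"\$\d+(?:\.\d+)?(?:\s(?:thousand|million|billion))?"),
    "DATE": re.compile(
        r"(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December)\s\d{4}"
    ),
}


def find_entities(text):
    """Return (entity_text, label) pairs sorted by position in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    found.sort()
    return [(span, label) for _, span, label in found]


print(find_entities("SpaceX raised $100 million in April 2023."))
```

Patterns like these cannot recognize names such as "Elon Musk" or "SpaceX", which is where trained models become necessary.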


📘 Section 3: Text Classification

📄 What is Text Classification?

Text classification assigns a category label to a piece of text.

| Input | Output Category |
| --- | --- |
| “This movie is amazing!” | Positive sentiment |
| “I want to cancel my subscription” | Complaint |
| “Buy 1 Get 1 Free!” | Promotion |

Common classification tasks:

  • Spam detection
  • Topic categorization
  • Emotion detection
  • Intent classification (chatbots)

🧪 Code: Text Classification with scikit-learn

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["I love this product", "Worst experience ever", "Absolutely fantastic", "I want a refund"]
labels = ["positive", "negative", "positive", "negative"]

# Turn each text into a vector of word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

# Words never seen in training (such as "horrible") are ignored at prediction
# time, so this toy model is tested on a sentence that reuses known words.
test = vectorizer.transform(["Worst experience, I want a refund"])
print(model.predict(test))  # Output: ['negative']
```
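
To show what a multinomial Naive Bayes classifier computes, here is a from-scratch sketch on the same four training texts, using only the standard library. The tokenizer is assumed to approximate CountVectorizer's defaults (lowercasing, words of two or more characters), and add-one smoothing is assumed; the helper names are hypothetical and this is not scikit-learn's implementation.

```python
import math
import re
from collections import Counter, defaultdict

texts = ["I love this product", "Worst experience ever", "Absolutely fantastic", "I want a refund"]
labels = ["positive", "negative", "positive", "negative"]


def tokenize(text):
    # Lowercase and keep words of two or more characters.
    return re.findall(r"\b\w\w+\b", text.lower())


# Count words per class and documents per class.
word_counts = defaultdict(Counter)
class_counts = Counter(labels)
for text, label in zip(texts, labels):
    word_counts[label].update(tokenize(text))

vocabulary = {word for counts in word_counts.values() for word in counts}


def log_scores(text, alpha=1.0):
    """Return log prior plus smoothed log likelihoods for each class."""
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / len(labels))
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            smoothed = (word_counts[label][word] + alpha) / (total + alpha * len(vocabulary))
            score += math.log(smoothed)
        scores[label] = score
    return scores


def predict(text):
    scores = log_scores(text)
    return max(scores, key=scores.get)


print(predict("Worst experience, I want a refund"))
print(predict("This is horrible"))
```

The second call is a useful caution: with so little training data, the only word the model knows in "This is horrible" is "this", which it saw in a positive example, so the sketch labels the sentence positive. A realistic classifier needs far more labeled text.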


📘 Section 4: Sentiment Analysis

Sentiment analysis determines whether the sentiment behind a piece of text is positive, negative, or neutral.

| Sentence | Sentiment |
| --- | --- |
| "I absolutely loved the experience!" | Positive |
| "It was just okay, nothing special." | Neutral |
| "I hate the UI design." | Negative |


🧪 Code: Sentiment Analysis with TextBlob

```python
from textblob import TextBlob

text = TextBlob("I am so happy with the customer service!")
print("Polarity:", text.sentiment.polarity)  # Range: -1 to +1
```
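
One simple way to arrive at a polarity score between -1 and +1 is to average per-word scores from a sentiment lexicon. The sketch below assumes a tiny hand-made lexicon and a symmetric threshold of 0.1 for the neutral band; both are hypothetical, and real lexicon-based analyzers use much larger word lists plus rules for negation and intensifiers.

```python
import re

# Hypothetical mini-lexicon mapping words to a polarity in [-1, 1].
LEXICON = {
    "happy": 0.8,
    "loved": 0.7,
    "fantastic": 0.9,
    "hate": -0.8,
    "worst": -1.0,
    "horrible": -1.0,
}


def polarity(text):
    """Average the polarity of known words; 0.0 when none are found."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def sentiment(text, threshold=0.1):
    """Map a polarity score to a coarse label using a symmetric threshold."""
    score = polarity(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for sentence in [
    "I absolutely loved the experience!",
    "It was just okay, nothing special.",
    "I hate the UI design.",
]:
    print(sentence, "->", sentiment(sentence))
```

Sentences with no lexicon words fall back to neutral, which is also the main weakness of this approach: anything outside the word list is invisible to it.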


📘 Section 5: Text Summarization

Summarization condenses a long article or document into a shorter version, preserving key information.

Types:

  • Extractive: Selects key sentences (e.g., TextRank)
  • Abstractive: Rewrites content in a new form (e.g., BART, T5)

🧪 Code: Extractive Summarization with Gensim

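```python
# Requires gensim < 4.0: the gensim.summarization module was removed in Gensim 4.0.
from gensim.summarization import summarize

# TextRank ranks sentences against each other, so it needs a multi-sentence
# document; a two-sentence sample like this one may produce an empty summary.
text = """
Natural language processing (NLP) is a field of artificial intelligence that enables computers to understand, interpret, and generate human language. It has applications in chatbots, translation, speech recognition, and more.
"""
print(summarize(text, ratio=0.5))  # ratio = proportion of sentences to keep
```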
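
The idea behind extractive summarization can also be sketched without any library. The example below scores each sentence by the average frequency of its non-stopword terms and keeps the top-scoring ones. This is a simpler word-frequency heuristic, not TextRank; the stopword list and the `summarize_by_frequency` helper are hypothetical.

```python
import re
from collections import Counter

# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that", "has", "more"}


def content_words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize_by_frequency(text, max_sentences=1):
    """Pick the highest-scoring sentences and return them in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


text = (
    "Natural language processing (NLP) is a field of artificial intelligence that "
    "enables computers to understand, interpret, and generate human language. "
    "It has applications in chatbots, translation, speech recognition, and more."
)
summary = summarize_by_frequency(text, max_sentences=1)
print(summary)
```

Because it only selects existing sentences, the output is always a verbatim subset of the input, which is the defining property of the extractive approach.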


📘 Section 6: Topic Modeling

Topic modeling uncovers hidden themes or topics in large collections of text using unsupervised learning.

🔍 Common Methods:

  • Latent Dirichlet Allocation (LDA)
  • Non-negative Matrix Factorization (NMF)

🧪 Code: Topic Modeling with LDA (Gensim)

```python
from gensim import corpora, models

docs = ["NLP is fun and exciting", "Machine learning is a subset of AI", "NLP includes machine translation"]

# Tokenize, then map each document to a bag-of-words vector of (token_id, count) pairs
tokens = [doc.lower().split() for doc in docs]
dictionary = corpora.Dictionary(tokens)
corpus = [dictionary.doc2bow(text) for text in tokens]

# random_state fixes the seed so repeated runs give the same topics
lda_model = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=42)
for idx, topic in lda_model.print_topics(-1):
    print(f"Topic {idx}: {topic}")
```
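
The bag-of-words corpus that LDA consumes is easy to inspect by building it by hand. The sketch below reproduces the general shape of that step in plain Python on the same three documents: each document becomes a list of `(token_id, count)` pairs. The ids are assigned in order of first appearance, which is an assumption of this sketch; Gensim's own id assignment may differ.

```python
from collections import Counter

docs = ["NLP is fun and exciting", "Machine learning is a subset of AI", "NLP includes machine translation"]
tokens = [doc.lower().split() for doc in docs]

# Assign each distinct token an integer id in order of first appearance
# (an assumption for this sketch; Gensim's own id assignment may differ).
token2id = {}
for doc_tokens in tokens:
    for token in doc_tokens:
        token2id.setdefault(token, len(token2id))


def to_bow(doc_tokens):
    """Convert a token list into sorted (token_id, count) pairs."""
    counts = Counter(token2id[t] for t in doc_tokens)
    return sorted(counts.items())


corpus = [to_bow(doc_tokens) for doc_tokens in tokens]
for doc, bow in zip(docs, corpus):
    print(doc, "->", bow)
```

Word order is discarded in this representation, so topic models see only which words co-occur in a document and how often.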


📘 Section 7: POS Tagging vs NER vs Text Classification – Quick Comparison

| Task | Input | Output | Example Use |
| --- | --- | --- | --- |
| POS Tagging | Sentence | Word + grammatical tag | Syntax analysis |
| NER | Sentence | Word + entity label | Info extraction |
| Text Classification | Text/document | Label (intent, topic, sentiment) | Email filters |


📘 Section 8: Tools and Libraries Overview

| Library | Supported Tasks | Highlights |
| --- | --- | --- |
| spaCy | POS, NER, dependency parsing | Fast, production-ready |
| NLTK | Linguistic tools, corpora | Academic and educational use |
| TextBlob | Sentiment, translation, POS | Beginner-friendly |
| scikit-learn | Classification, vectorization | ML pipelines |
| Gensim | Topic modeling, summarization (versions before 4.0) | LDA, TF-IDF, Word2Vec |
| Hugging Face | Transformers for any NLP task | Pretrained BERT, GPT, T5 models |


Chapter Summary Table


| Technique | Core Function | Tools |
| --- | --- | --- |
| POS Tagging | Word role detection | spaCy, NLTK |
| NER | Entity recognition | spaCy, Hugging Face |
| Classification | Text labeling | scikit-learn, fastText |
| Sentiment Analysis | Polarity detection | TextBlob, VADER |
| Summarization | Text compression | Gensim, T5 |
| Topic Modeling | Theme discovery | Gensim (LDA) |


FAQs


1. What is Natural Language Processing (NLP)?

Answer: NLP is a field of artificial intelligence that enables computers to understand, interpret, generate, and respond to human language in a meaningful way.

2. How is NLP different from traditional programming?

Answer: Traditional programming involves structured inputs, while NLP deals with unstructured, ambiguous, and context-rich human language that requires probabilistic models and machine learning.

3. What are some everyday applications of NLP?

Answer: NLP is used in chatbots, voice assistants (like Siri, Alexa), machine translation (Google Translate), spam detection, sentiment analysis, and auto-correct features.

4. What is the difference between NLU and NLG?

Answer:

  • NLU (Natural Language Understanding): Interprets and extracts meaning from language.
  • NLG (Natural Language Generation): Generates human-like language from structured data or other machine-readable input.

5. Which programming languages are best for working with NLP?

Answer: Python is the most popular choice because of its extensive libraries, such as NLTK, spaCy, Hugging Face Transformers, TextBlob, and TensorFlow.

6. What are some challenges in NLP?

Answer: Key challenges include understanding sarcasm, ambiguity, handling different languages or dialects, recognizing context, and avoiding model bias.

7. What is a language model?

Answer: A language model is an AI system trained to predict and generate human-like language. Examples include GPT, BERT, and T5. Language models form the core of many NLP applications.

8. How does NLP handle multiple languages?

Answer: Multilingual models like mBERT and XLM-RoBERTa are trained on multiple languages and can perform tasks like translation, classification, and question-answering across them.

9. Is NLP only for text-based applications?

Answer: No. NLP also works with speech through technologies like speech-to-text (ASR) and text-to-speech (TTS), enabling audio-based applications like virtual assistants.

10. Can I use NLP without being a data scientist?

Answer: Yes! Many low-code/no-code tools (like MonkeyLearn, Google Cloud NLP API, and Hugging Face AutoTrain, formerly AutoNLP) let non-experts build NLP solutions using pre-trained models and easy interfaces.