Build a Machine Learning Model to Separate Facts from Fabrication
🧠 Introduction
The internet has transformed how we consume news, but it's
also given rise to a dangerous byproduct — fake news. From political
misinformation to health-related hoaxes, fake news has the power to influence
public opinion and cause real-world harm.
This capstone project teaches you how to use Natural
Language Processing (NLP) to build a model that detects fake news articles
using text classification techniques.
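In this chapter, you'll learn how to load and label the dataset, clean the text, extract TF-IDF features, train Naive Bayes and SVM classifiers, and evaluate the results.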
By the end, you’ll have a fake news classifier ready to
deploy or demonstrate in your data science portfolio.
📦 Step 1: Understanding the Problem
📊 Step 2: Load the Dataset
We’ll use the Fake and Real News Dataset from Kaggle.
📥 Dataset: Fake and Real News Dataset
python
import pandas as pd

# Load the two Kaggle CSV files
df_fake = pd.read_csv("Fake.csv")
df_real = pd.read_csv("True.csv")

# Add labels: 0 = fake, 1 = real
df_fake['label'] = 0
df_real['label'] = 1

# Combine both sets, shuffle the rows, and reset the index
df = pd.concat([df_fake, df_real]).sample(frac=1).reset_index(drop=True)
df.head()
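Because `sample(frac=1)` reshuffles differently on every run, and because a skewed class ratio would make accuracy misleading, it can help to fix the shuffle seed and check the label counts before moving on. The sketch below uses two tiny stand-in frames in place of `Fake.csv` and `True.csv` (the rows are invented for illustration), and `random_state=42` is an assumed choice rather than part of the original code.

```python
import pandas as pd

# Stand-in frames; the real project reads Fake.csv and True.csv instead.
df_fake = pd.DataFrame({"title": ["Hoax A", "Hoax B"],
                        "text": ["body a", "body b"]})
df_real = pd.DataFrame({"title": ["Report C", "Report D", "Report E"],
                        "text": ["body c", "body d", "body e"]})

df_fake["label"] = 0  # 0 = fake
df_real["label"] = 1  # 1 = real

# A fixed random_state makes the shuffle reproducible across runs.
df = (pd.concat([df_fake, df_real])
        .sample(frac=1, random_state=42)
        .reset_index(drop=True))

# Class balance: a heavily skewed ratio would make raw accuracy misleading.
label_counts = df["label"].value_counts().to_dict()
print(label_counts)

# Rows with missing text would break string cleaning later, so count them now.
missing_text = int(df["text"].isna().sum())
print("rows with missing text:", missing_text)
```

If the missing-text count is nonzero on the real data, one option is `df['text'].fillna('')` before cleaning.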
🧹 Step 3: Text Cleaning & Preprocessing
python
import re
import string

def clean_text(text):
    """Lowercase the text and strip links, bracketed spans, punctuation, and tokens containing digits."""
    text = text.lower()
    text = re.sub(r'https?://\S+', '', text)  # remove links
    text = re.sub(r'\[.*?\]', '', text)  # remove bracketed text
    text = re.sub(f"[{re.escape(string.punctuation)}]", '', text)  # remove punctuation
    text = re.sub(r'\w*\d\w*', '', text)  # remove words with numbers
    return text

# Combine title and body, then clean the combined text
df['text'] = df['title'] + " " + df['text']
df['text'] = df['text'].apply(clean_text)
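To see what each cleaning rule does, it can help to run the function on a single made-up headline. The sketch below repeats `clean_text` so it runs on its own; the sample string and the `normalize_whitespace` helper are illustrative additions, not part of the original pipeline. The removals leave double spaces behind, which the default `TfidfVectorizer` tokenizer ignores, so the extra helper is optional.

```python
import re
import string

def clean_text(text):
    """Same cleaning steps as in Step 3, repeated so this example is self-contained."""
    text = text.lower()
    text = re.sub(r'https?://\S+', '', text)  # remove links
    text = re.sub(r'\[.*?\]', '', text)  # remove bracketed text
    text = re.sub(f"[{re.escape(string.punctuation)}]", '', text)  # remove punctuation
    text = re.sub(r'\w*\d\w*', '', text)  # remove words with numbers
    return text

def normalize_whitespace(text):
    """Hypothetical helper: collapse the runs of spaces left behind by the removals."""
    return " ".join(text.split())

# Invented sample headline, for illustration only.
sample = "BREAKING [video]: Visit https://example.com NOW, 100% true!"

cleaned = clean_text(sample)
print(repr(cleaned))                        # still contains doubled spaces
print(repr(normalize_whitespace(cleaned)))  # 'breaking visit now true'
```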
📚 Step 4: Feature Extraction with TF-IDF
python
from sklearn.feature_extraction.text import TfidfVectorizer

# Drop English stop words and any term that appears in more than 70% of documents
vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
X = vectorizer.fit_transform(df['text'])
y = df['label']
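The effect of `max_df=0.7` is easiest to see on a tiny corpus. In the sketch below (three invented sentences, not drawn from the dataset), "budget" occurs in every document, so its document frequency of 1.0 exceeds the threshold and it is dropped from the vocabulary, while "senate" (2 of 3 documents) is kept.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented mini-corpus, for illustration only.
docs = [
    "senate approves budget",
    "aliens reject budget",
    "senate delays budget vote",
]

toy_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
X_toy = toy_vectorizer.fit_transform(docs)

vocab = sorted(toy_vectorizer.vocabulary_)
print(vocab)        # "budget" is absent: it appears in 3 of 3 documents
print(X_toy.shape)  # (number of documents, number of kept terms)

# Rows are L2-normalized by default, so the squared weights in each row sum to 1.
row_norms = np.asarray(X_toy.multiply(X_toy).sum(axis=1)).ravel()
print(row_norms)
```

On the full dataset the same rule removes words so common across articles that they carry little signal for either class.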
🔁 Step 5: Train-Test Split
python
from sklearn.model_selection import train_test_split

# Hold out 20% of the articles for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
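One caveat with this flow: the TF-IDF vectorizer in Step 4 was fitted on every article before the split, so its vocabulary and IDF weights have already seen the test documents. A minimal sketch of an alternative ordering is shown below, assuming you split the raw text first and fit the vectorizer on the training portion only. The eight texts are invented stand-ins, and `stratify=labels` is an added assumption that keeps the fake/real ratio the same in both splits.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Invented stand-in data: 0 = fake, 1 = real.
texts = [
    "shocking miracle cure doctors hate",
    "aliens secretly control the election",
    "celebrity clone spotted at secret base",
    "miracle pill melts fat overnight",
    "senate passes annual budget",
    "central bank holds rates steady",
    "city council approves transit plan",
    "court upholds ruling on water rights",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Split the raw text first; stratify keeps the class ratio equal in both parts.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels)

# Learn the vocabulary and IDF weights from training text only.
split_vectorizer = TfidfVectorizer()
X_train = split_vectorizer.fit_transform(train_texts)
X_test = split_vectorizer.transform(test_texts)  # transform only, no refitting

print(X_train.shape, X_test.shape)
```

Words that occur only in the test articles are simply ignored by `transform`, which mirrors what happens when the model meets unseen articles after deployment.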
🤖 Step 6: Build Classification Models
1. Naive Bayes
python
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Train Multinomial Naive Bayes on the TF-IDF features
nb_model = MultinomialNB()
nb_model.fit(X_train, y_train)

# Score on the held-out test set
y_pred_nb = nb_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred_nb))
2. Support Vector Machine (SVM)
python
from sklearn.svm import LinearSVC

# Train a linear SVM on the same features
svm_model = LinearSVC()
svm_model.fit(X_train, y_train)

# Score on the held-out test set
y_pred_svm = svm_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred_svm))
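A single train-test split gives one accuracy number per model, which can shift with the random seed. One way to compare the two classifiers more robustly is k-fold cross-validation with the vectorizer and model wrapped in a pipeline, so TF-IDF is refitted inside each fold. The sketch below is an illustration under assumptions: the twelve texts are invented, and `cv=3` is chosen only because the toy set is tiny.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented stand-in data: 0 = fake, 1 = real.
texts = [
    "shocking miracle cure doctors hate",
    "aliens secretly control the election",
    "celebrity clone spotted at secret base",
    "miracle pill melts fat overnight",
    "secret cure hidden by shocking conspiracy",
    "aliens spotted melting the moon",
    "senate passes annual budget",
    "central bank holds rates steady",
    "city council approves transit plan",
    "court upholds ruling on water rights",
    "senate committee reviews transit budget",
    "bank reports steady annual growth",
]
labels = [0] * 6 + [1] * 6

candidates = {
    "Naive Bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "Linear SVM": make_pipeline(TfidfVectorizer(), LinearSVC()),
}

cv_scores = {}
for name, pipeline in candidates.items():
    # For classifiers, an integer cv uses stratified folds.
    scores = cross_val_score(pipeline, texts, labels, cv=3, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.2f} over {len(scores)} folds")
```

On the real dataset, passing the cleaned article text and labels in place of the toy lists would give a per-fold accuracy for each model.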
📈 Step 7: Model Evaluation
python
import seaborn as sns
import matplotlib.pyplot as plt

# Rows are true labels, columns are predicted labels (0 = Fake, 1 = Real)
cm = confusion_matrix(y_test, y_pred_svm)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Fake', 'Real'], yticklabels=['Fake', 'Real'])
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
📋 Classification Report
python
print(classification_report(y_test, y_pred_svm, target_names=['Fake', 'Real']))
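The precision and recall figures in the report come straight from the four cells of the confusion matrix. As a worked example, the sketch below uses ten invented predictions (not results from the actual model) and recomputes both metrics by hand for each class, using the tutorial's convention of 0 = Fake and 1 = Real.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Invented labels for illustration: 0 = Fake, 1 = Real.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# For binary labels [0, 1], ravel() yields TN, FP, FN, TP with class 1 as "positive".
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision_real = tp / (tp + fp)  # of articles predicted Real, how many were Real
recall_real = tp / (tp + fn)     # of truly Real articles, how many were found
precision_fake = tn / (tn + fn)  # the same two metrics with Fake as the positive class
recall_fake = tn / (tn + fp)

print(f"Real: precision={precision_real:.2f}, recall={recall_real:.2f}")
print(f"Fake: precision={precision_fake:.2f}, recall={recall_fake:.2f}")

# Cross-check against scikit-learn's own implementations.
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```

For a fake news detector, recall on the Fake class is often the number to watch, since it measures how many fabricated articles slip through as Real.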
🔍 Step 8: Feature Importance (Optional for SVM)
python
import numpy as np

feature_names = vectorizer.get_feature_names_out()
coefs = svm_model.coef_.flatten()

# Positive weights push toward label 1 (Real), negative weights toward label 0 (Fake)
top_positive_coefficients = np.argsort(coefs)[-20:]
top_negative_coefficients = np.argsort(coefs)[:20]

top_words = [feature_names[i] for i in top_positive_coefficients]
top_fake_words = [feature_names[i] for i in top_negative_coefficients]

print("Top Real-indicative words:", top_words)
print("Top Fake-indicative words:", top_fake_words)
🌐 Step 9: Model Deployment Ideas
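Whatever front end is chosen, a common first step is to save the fitted vectorizer and classifier together so a serving script can load them without retraining. The following is a minimal sketch under assumptions: it uses `joblib` (installed alongside scikit-learn), a toy corpus of invented headlines, a hypothetical file name, and a hypothetical `predict_label` helper.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented stand-in data: 0 = fake, 1 = real.
texts = [
    "shocking miracle cure doctors hate",
    "aliens secretly control the election",
    "miracle pill melts fat overnight",
    "senate passes annual budget",
    "central bank holds rates steady",
    "city council approves transit plan",
]
labels = [0, 0, 0, 1, 1, 1]

# Bundling vectorizer and model means one file holds everything needed to predict.
pipeline = make_pipeline(TfidfVectorizer(), LinearSVC())
pipeline.fit(texts, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "fake_news_pipeline.joblib"  # hypothetical file name
    joblib.dump(pipeline, model_path)
    loaded = joblib.load(model_path)  # what a serving script would do at startup

LABEL_NAMES = {0: "Fake", 1: "Real"}

def predict_label(model, headline):
    """Hypothetical helper: classify one raw string and return 'Fake' or 'Real'."""
    return LABEL_NAMES[int(model.predict([headline])[0])]

example = "senate passes annual budget"
print(predict_label(loaded, example))
```

In a real service, the same `clean_text` function from Step 3 would need to run on incoming text before prediction so that inputs match what the model saw in training.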
📋 Summary Table
Step | Tool / Library | Outcome |
Data Loading | Pandas | Combined labeled dataset |
Preprocessing | Regex, string ops | Cleaned input for NLP |
Feature Extraction | TfidfVectorizer | Sparse matrix of word weights |
Models | Naive Bayes, SVM | Trained classifier |
Evaluation | Scikit-learn metrics | Accuracy, precision, recall |
Answer: A data science capstone project is a comprehensive, end-to-end project that showcases your ability to solve real-world problems using data. It’s crucial because it demonstrates your technical skills, creativity, and business understanding — especially important for job interviews and portfolio building.
Answer: Choose based on your interests, career goals, available data, and skill level. Make sure it aligns with the kind of job you want (e.g., business analytics, machine learning, NLP), and that the data is accessible and relevant.
Answer: Yes! These projects can be approached at a beginner level with basic models (like logistic regression or Naive Bayes) and expanded over time with advanced techniques.
Answer: A typical capstone project can take anywhere from 2–6 weeks, depending on the depth. Budget time for data cleaning, analysis, modeling, visualization, and presentation.
Answer: Common tools include Python, Pandas, NumPy, Scikit-learn, Matplotlib/Seaborn, Streamlit (for deployment), and Jupyter Notebooks. For advanced projects, consider TensorFlow, PyTorch, XGBoost, and Prophet.
Answer: Definitely! Hosting your project via a Streamlit app, Flask API, or on platforms like Heroku, Hugging Face, or GitHub Pages shows professionalism and adds massive value to your resume.
Answer: Yes. Platforms like Kaggle, UCI Machine Learning Repository, and Google Dataset Search are great sources. Just ensure the data is cleanable and suitable for your problem statement.
Answer: Focus on real-world impact, explain your process clearly, include visualizations, host a demo, and document everything in a clean GitHub repository with a well-written README.md.
Answer: Yes, collaboration mirrors real-world work. Just be clear about who did what, and try to showcase your individual contributions during interviews or portfolio reviews.
Answer: For a capstone, focus on one well-executed project. It should go deep — from data collection and EDA to modeling and presentation. You can complement it with smaller side projects, but depth > breadth for capstones.