Master the Art of Data Exploration with Central Tendency, Variability & Visualization
🧠 Introduction
Before we build models, make predictions, or test
hypotheses, we must understand our data. Descriptive statistics give us
the tools to do just that.
Descriptive statistics are the first step in any data
analysis pipeline — used to summarize, simplify, and visualize the key features
of a dataset.
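Whether you're dealing with a spreadsheet of survey
responses or a massive machine-generated dataset, descriptive statistics help
answer questions like: where is the data centered, how spread out is it, and
what shape does its distribution take?
In this chapter, we’ll explore measures of central tendency, measures of
dispersion, frequency distributions, common visualizations, and distribution
shape.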
📘 Section 1: What Are Descriptive Statistics?
Descriptive statistics refers to methods for summarizing
raw data into meaningful information — either numerically or graphically.
Two Primary Goals:
📊 Section 2: Measures of Central Tendency
These are values that represent the “center” or “average” of
a dataset.
1. Mean (Arithmetic Average)
python
import pandas as pd

df = pd.DataFrame({'Marks': [50, 60, 70, 80, 90]})
mean_val = df['Marks'].mean()
print("Mean:", mean_val)
Value | Description
Mean | Sum of values / Number of values
Pros | Easy to compute and understand
Cons | Sensitive to extreme values (outliers)
2. Median
The middle value when data is sorted.
python
median_val = df['Marks'].median()
print("Median:", median_val)
Scenario | Best Measure
Data with outliers | Median
Symmetric distribution | Mean or Median
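To make the table above concrete, here is a minimal sketch of how a single extreme value moves the mean but not the median. The second series and its outlier value of 900 are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Original marks from the chapter's example
marks = pd.Series([50, 60, 70, 80, 90])
# Hypothetical variant: the last value is replaced by an extreme outlier
marks_with_outlier = pd.Series([50, 60, 70, 80, 900])

mean_clean = marks.mean()
median_clean = marks.median()
mean_outlier = marks_with_outlier.mean()
median_outlier = marks_with_outlier.median()

print("Without outlier: mean =", mean_clean, "median =", median_clean)
print("With outlier:    mean =", mean_outlier, "median =", median_outlier)
```

The mean jumps from 70 to 232 while the median stays at 70, which is why the median is usually the safer summary when outliers are present.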
3. Mode
The most frequently occurring value.
python
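# mode() returns every value tied for most frequent, sorted ascending;
# [0] picks the smallest. Here all marks are unique, so this prints 50.
mode_val = df['Marks'].mode()[0]
print("Mode:", mode_val)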
Type | Example
Unimodal | One clear mode
Bimodal | Two high peaks
Multimodal | Several peaks
🎯 Section 3: Measures of Dispersion
These help us understand how spread out the data is
around the center.
1. Range
python
range_val = df['Marks'].max() - df['Marks'].min()
print("Range:", range_val)
Simple but highly sensitive to outliers.
2. Variance and Standard Deviation
python
# pandas uses the sample formulas by default (ddof=1, divide by n - 1)
variance = df['Marks'].var()
std_dev = df['Marks'].std()
print("Variance:", variance)
print("Standard Deviation:", std_dev)
Feature | Variance | Standard Deviation
Units | Squared | Same as original data
Interpretation | Less intuitive | More intuitive
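As a rough illustration of what `var()` and `std()` compute, the sketch below works the variance out by hand for the same marks and compares the sample version (divide by n - 1, the pandas default) with the population version (divide by n, requested with `ddof=0`). The variable names are illustrative only.

```python
import math
import pandas as pd

marks = pd.Series([50, 60, 70, 80, 90])
n = len(marks)

# Squared deviations from the mean: 400, 100, 0, 100, 400
squared_dev = (marks - marks.mean()) ** 2

sample_var = squared_dev.sum() / (n - 1)   # matches marks.var()
population_var = squared_dev.sum() / n     # matches marks.var(ddof=0)
sample_std = math.sqrt(sample_var)         # matches marks.std()

print("Sample variance:", sample_var)
print("Population variance:", population_var)
print("Sample standard deviation:", sample_std)
```

The standard deviation is simply the square root of the variance, which is what brings it back to the original units of the data.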
3. Interquartile Range (IQR)
python
Q1 = df['Marks'].quantile(0.25)
Q3 = df['Marks'].quantile(0.75)
IQR = Q3 - Q1
print("IQR:", IQR)
Quartile | Meaning
Q1 | 25th percentile
Q3 | 75th percentile
IQR | Range of the middle 50%
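One common use of the IQR is flagging potential outliers. The sketch below applies the widely used 1.5 × IQR convention, which is an assumption here rather than something this chapter prescribes, to a hypothetical series containing one extreme value.

```python
import pandas as pd

# Hypothetical marks with one extreme value
marks = pd.Series([50, 60, 70, 80, 900])

q1 = marks.quantile(0.25)
q3 = marks.quantile(0.75)
iqr = q3 - q1

# Conventional fences: values beyond 1.5 * IQR from the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = marks[(marks < lower_fence) | (marks > upper_fence)]
print("IQR:", iqr)
print("Fences:", lower_fence, "to", upper_fence)
print("Flagged values:", outliers.tolist())
```

Unlike the range, the IQR itself is unaffected by the value 900, because it depends only on the middle half of the data.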
📊 Section 4: Frequency Distributions
A frequency distribution is a summary of how often
each value (or range) occurs.
python
df['Marks'].value_counts().sort_index()
Example Table: Frequency Table of Scores
Marks Range | Frequency
50–60 | 2
61–70 | 4
71–80 | 3
81–90 | 1
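A grouped table like the one above can be sketched with `pd.cut`, which assigns each value to a range before counting. The ten scores below are hypothetical, chosen so the counts match the example table; the bin edges and labels are likewise assumptions.

```python
import pandas as pd

# Hypothetical scores chosen to reproduce the example frequencies
scores = pd.Series([52, 58, 61, 65, 68, 70, 72, 75, 80, 88])

# Bin edges are right-inclusive by default: (49, 60], (60, 70], (70, 80], (80, 90]
bins = [49, 60, 70, 80, 90]
labels = ["50–60", "61–70", "71–80", "81–90"]

ranges = pd.cut(scores, bins=bins, labels=labels)
freq_table = ranges.value_counts().sort_index()
print(freq_table)
```

Calling `sort_index()` keeps the ranges in their natural order instead of ordering them by frequency.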
📈 Section 5: Visualizing Data
1. Histogram
python
import matplotlib.pyplot as plt

df['Marks'].hist(bins=5)
plt.title("Histogram of Marks")
plt.show()
Shows frequency of value ranges.
2. Box Plot
python
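import seaborn as sns
import matplotlib.pyplot as plt

# Pass the column as x explicitly so the box is drawn horizontally
sns.boxplot(x=df['Marks'])
plt.title("Boxplot of Marks")
plt.show()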
3. Bar Chart & Pie Chart (Categorical Data)
python
df_cat = pd.DataFrame({'Gender': ['M', 'F', 'M', 'F', 'M']})
# Count each category, then draw one bar per category
df_cat['Gender'].value_counts().plot(kind='bar')
plt.show()
🧠 Section 6: Data Shape and Distribution
Understanding distribution shape helps you choose the right
statistical methods.
Shape | Characteristics
Normal | Bell-shaped, symmetric, mean ≈ median
Skewed Left | Tail on the left, mean < median
Skewed Right | Tail on the right, mean > median
python
sns.histplot(df['Marks'], kde=True)
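Shape can also be checked numerically. The sketch below compares the mean and median and calls `Series.skew()` on two small hypothetical datasets, one with a long right tail and its mirror image with a long left tail. The mean-versus-median comparison is a rule of thumb and will not hold for every dataset.

```python
import pandas as pd

# Hypothetical data: a long right tail, and its mirror image
right_skewed = pd.Series([1, 2, 2, 3, 3, 3, 4, 20])
left_skewed = 21 - right_skewed

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name,
          "mean =", data.mean(),
          "median =", data.median(),
          "skew =", round(data.skew(), 2))
```

A positive skew value goes with a right tail and a mean above the median; a negative value goes with a left tail and a mean below the median.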
📋 Section 7: Summary Table – Descriptive Statistics Techniques
Technique | Purpose | Python Code Example
Mean | Average value | df['col'].mean()
Median | Middle value | df['col'].median()
Mode | Most frequent value | df['col'].mode()[0]
Standard Deviation | Spread around the mean | df['col'].std()
IQR | Middle 50% range | df['col'].quantile(0.75) - df['col'].quantile(0.25)
Histogram | Frequency visualization | df['col'].hist()
Boxplot | Summary of spread and outliers | sns.boxplot(x=df['col'])
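Several of the numeric techniques in the table can be obtained in one call with pandas' `describe()`. The sketch below runs it on the chapter's example marks; it is a convenience shown for illustration, not a replacement for the individual methods.

```python
import pandas as pd

df = pd.DataFrame({'Marks': [50, 60, 70, 80, 90]})

# describe() reports count, mean, std, min, quartiles (25%, 50%, 75%) and max
summary = df['Marks'].describe()
print(summary)

# The IQR can be read straight from the quartile rows
iqr_from_summary = summary['75%'] - summary['25%']
print("IQR:", iqr_from_summary)
```

The `50%` row is the median, and `std` is the sample standard deviation, the same value returned by `df['Marks'].std()`.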
Answer: Descriptive statistics summarize and describe the features of a dataset (like averages and charts), while inferential statistics use a sample to draw conclusions or make predictions about a larger population.
Answer: Yes, typically. Descriptive stats help explore and understand the data, and inferential stats help make decisions or predictions based on that data.
Answer: Absolutely. Descriptive statistics can be used on either a full population or a sample — they simply describe the data you have.
Answer: It’s often impractical, costly, or impossible to collect data on an entire population. Inferential statistics allow us to make reasonable estimates or test hypotheses using smaller samples.
Answer: Common examples include the mean, median, mode, range, standard deviation, histograms, and pie charts — all of which describe the shape and spread of the data.
Answer: These include confidence intervals, hypothesis testing (e.g., t-tests, chi-square tests), ANOVA, and regression analysis.
Answer: A confidence interval is an inferential statistic because it estimates a population parameter based on a sample.
Answer: P-values are part of inferential statistics. They are used in hypothesis testing to assess the evidence against a null hypothesis.
Answer: Once you've summarized your data and understand its structure, you'll move to inferential statistics if your goal is to generalize, compare groups, or test relationships beyond your dataset.
Answer: Yes — while charts are often associated with descriptive stats, inferential techniques can also be visualized (e.g., confidence interval plots, regression lines, distribution curves from hypothesis tests).