Mastering Data Visualization with Matplotlib in Python


Chapter 5: Plotting with Pandas and Matplotlib

🔹 1. Introduction

Pandas is a powerful library for data manipulation and analysis, widely used for working with structured data. It integrates seamlessly with Matplotlib, allowing you to create high-quality visualizations directly from Pandas DataFrames. In this chapter, we will explore how to use Matplotlib to visualize data that is stored in Pandas DataFrames. You will learn how to plot data from CSV files, handle time-series data, and customize your plots.

By the end of this chapter, you will be able to:

  • Integrate Matplotlib with Pandas for easy plotting
  • Create line plots, bar charts, histograms, and other types of visualizations directly from DataFrames
  • Handle time-series data and plot it efficiently
  • Customize your plots by modifying labels, titles, and other elements
  • Utilize Pandas for cleaning and processing data before visualization
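The last objective, cleaning data before plotting it, can be sketched as follows. This is a minimal illustration, not a prescribed workflow: the CSV content, the `csv_text` variable, and the `clean` DataFrame are hypothetical, and the text is read from memory so the example needs no file on disk.

```python
import io

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical CSV content; in practice pd.read_csv() would get a file path.
csv_text = """Month,Sales
Jan,100
Feb,120
Mar,
Apr,unknown
May,200
"""

df = pd.read_csv(io.StringIO(csv_text))

# Turn non-numeric entries into NaN, then drop rows without a sales figure.
df["Sales"] = pd.to_numeric(df["Sales"], errors="coerce")
clean = df.dropna(subset=["Sales"])

# Plot only the rows that survived cleaning.
ax = clean.plot(x="Month", y="Sales", kind="line", title="Monthly Sales")
ax.set_ylabel("Sales")
# plt.show() displays the figure in an interactive session.
```

Only the Jan, Feb, and May rows remain after cleaning, so the line is drawn through three points.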

🔹 2. Plotting Data from Pandas DataFrames

Every DataFrame and Series has a built-in plot() method that uses Matplotlib as its default plotting backend. This lets you plot data quickly without creating figures or axes yourself: Pandas passes the selected columns to Matplotlib and draws them on a Matplotlib Axes, which plot() returns.

Plotting a Line Plot

Let's begin by loading a simple dataset and plotting it using Pandas:

import pandas as pd
import matplotlib.pyplot as plt

# Sample DataFrame
data = {'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'],
        'Sales': [100, 120, 150, 170, 200]}

df = pd.DataFrame(data)

# Plot a line plot
df.plot(x='Month', y='Sales', kind='line')
plt.title('Monthly Sales')
plt.ylabel('Sales')
plt.xlabel('Month')
plt.show()

In this example:

  • df.plot() is used to plot the data. We specify the x and y columns.
  • kind='line' specifies that we want to create a line plot.
  • Matplotlib handles the actual plotting behind the scenes.
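Because Matplotlib does the drawing, the object that df.plot() returns is a regular Matplotlib Axes. The sketch below sets the same title and labels through that object instead of through the plt functions; the variable name `ax` is just a common convention.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Month": ["Jan", "Feb", "Mar", "Apr", "May"],
                   "Sales": [100, 120, 150, 170, 200]})

# df.plot() returns the Matplotlib Axes it drew on.
ax = df.plot(x="Month", y="Sales", kind="line")

# Same labels as before, set on the Axes object rather than via plt.*
ax.set_title("Monthly Sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
# plt.show() displays the figure in an interactive session.
```

Working with the Axes directly is helpful once a script creates more than one figure, because it is always explicit which plot is being modified.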

🔹 3. Creating Bar Charts

Bar charts are useful for comparing categorical data. You can easily create bar charts from Pandas DataFrames.

Creating a Bar Chart

# Create a bar chart
df.plot(x='Month', y='Sales', kind='bar', color='green')
plt.title('Sales by Month')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.show()

In this case:

  • kind='bar' creates a vertical bar chart.
  • We specify the column for the x and y axes, which are Month and Sales respectively.
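The y argument can also take a list of columns, in which case Pandas draws one group of bars per category. The following sketch assumes a hypothetical Profit column alongside Sales; rot=0 keeps the month labels horizontal.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Month": ["Jan", "Feb", "Mar", "Apr", "May"],
                   "Sales": [100, 120, 150, 170, 200],
                   "Profit": [50, 60, 80, 90, 120]})  # hypothetical second column

# One bar per column for every month, drawn side by side.
ax = df.plot(x="Month", y=["Sales", "Profit"], kind="bar", rot=0)
ax.set_title("Sales and Profit by Month")
ax.set_ylabel("Amount")
# plt.show() displays the figure in an interactive session.
```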

Horizontal Bar Chart

You can also create horizontal bar charts by changing the kind to 'barh':

df.plot(x='Month', y='Sales', kind='barh', color='blue')
plt.title('Sales by Month (Horizontal)')
plt.xlabel('Sales')
plt.ylabel('Month')
plt.show()


🔹 4. Plotting Histograms

A histogram is used to visualize the distribution of a single variable. Matplotlib and Pandas make it easy to create histograms.

Creating a Histogram

# Sample data for histogram
data = {'Age': [22, 25, 30, 35, 40, 45, 50, 55, 60, 65]}
df = pd.DataFrame(data)

# Create a histogram
df['Age'].plot(kind='hist', bins=10, color='purple')
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

Here:

  • kind='hist' creates a histogram.
  • bins=10 specifies the number of bins to divide the data into.
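The bins argument also accepts a list of bin edges instead of a count, which gives control over where each bar starts and ends. The sketch below groups the same ages by decade and uses np.histogram to print the counts behind the bars; the `edges` list is an illustrative choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Age": [22, 25, 30, 35, 40, 45, 50, 55, 60, 65]})

# Explicit bin edges: one bin per decade from 20 to 70.
edges = [20, 30, 40, 50, 60, 70]
ax = df["Age"].plot(kind="hist", bins=edges, edgecolor="black")
ax.set_xlabel("Age")

# np.histogram applies the same binning, so it reports the bar heights.
counts, _ = np.histogram(df["Age"], bins=edges)
print(counts)
```

With these ten ages, each decade bin holds two values, so the printed counts are [2 2 2 2 2].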

🔹 5. Time-Series Data and Plotting

Time-series data is a sequence of observations indexed by time, often collected at regular intervals such as daily or monthly. Pandas stores the timestamps as datetime values, and Matplotlib uses them to draw a properly spaced date axis. In this section, we'll explore how to work with date-time values and plot them using Matplotlib.

Plotting Time-Series Data

Let’s start by creating a simple time-series plot:

# Create time-series data
# freq='MS' generates month-start dates. The older month-end alias 'M'
# is deprecated in recent Pandas versions in favor of 'ME'.
data = {'Date': pd.date_range(start='2021-01-01', periods=5, freq='MS'),
        'Sales': [100, 120, 150, 170, 200]}

df = pd.DataFrame(data)

# Plot the time-series data
df.plot(x='Date', y='Sales', kind='line', color='orange')
plt.title('Sales Over Time')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.show()

  • pd.date_range() generates a sequence of dates.
  • df.plot(x='Date', y='Sales', kind='line') plots the sales over time with dates on the x-axis.
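Once the dates are in place, moving the Date column into the index unlocks Pandas' time-aware operations. As a small sketch, the code below adds a two-month rolling average next to the raw sales; the `ts` variable and the Rolling column name are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# freq="MS" produces month-start dates.
df = pd.DataFrame({"Date": pd.date_range(start="2021-01-01", periods=5, freq="MS"),
                   "Sales": [100, 120, 150, 170, 200]})

# Use the dates as the index so the x-axis is taken from it automatically.
ts = df.set_index("Date")

# Two-month rolling mean; the first value is NaN because it has no predecessor.
ts["Rolling"] = ts["Sales"].rolling(window=2).mean()

ax = ts.plot(y=["Sales", "Rolling"], title="Sales and 2-Month Rolling Mean")
ax.set_ylabel("Sales")
# plt.show() displays the figure in an interactive session.
```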

Time-Series with Custom Date Formatting

If you want to customize the date format on the x-axis, you can use Matplotlib’s DateFormatter:

import matplotlib.dates as mdates

# Plot the time-series data.
# x_compat=True tells Pandas to use Matplotlib's date units; without it,
# Pandas may apply its own time-series axis and DateFormatter labels can be wrong.
df.plot(x='Date', y='Sales', kind='line', color='purple', x_compat=True)
plt.title('Sales Over Time')
plt.xlabel('Date')
plt.ylabel('Sales')

# Format the x-axis dates
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
plt.gcf().autofmt_xdate()  # Rotate date labels for better visibility
plt.show()

This formats the dates to display as Month Year (e.g., Jan 2021).
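An alternative that sidesteps Pandas' own date handling is to pass the columns to Matplotlib directly. The sketch below does that and also adds a MonthLocator so there is one tick per month; treat the locator choice as an assumption that suits monthly data, not a general rule.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df = pd.DataFrame({"Date": pd.date_range(start="2021-01-01", periods=5, freq="MS"),
                   "Sales": [100, 120, 150, 170, 200]})

fig, ax = plt.subplots()
ax.plot(df["Date"], df["Sales"], color="purple")

# One tick at the start of each month, labeled like "Jan 2021".
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap

ax.set_title("Sales Over Time")
ax.set_ylabel("Sales")
# plt.show() displays the figure in an interactive session.
```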


🔹 6. Customizing Plots with Matplotlib

Pandas applies sensible defaults, and df.plot() accepts many styling arguments itself. Because the result is an ordinary Matplotlib plot, you can also refine it further using Matplotlib directly.

Adding Titles, Labels, and Legends

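# Recreate the monthly DataFrame (df was reassigned in the previous sections)
df = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'],
                   'Sales': [100, 120, 150, 170, 200]})

df.plot(x='Month', y='Sales', kind='line')
plt.title('Monthly Sales')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.legend(['Sales'])
plt.grid(True)  # Add gridlines
plt.show()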

In this example:

  • plt.title() adds a title to the plot.
  • plt.xlabel() and plt.ylabel() label the axes.
  • plt.legend() adds a legend to the plot.
  • plt.grid(True) adds gridlines to the plot.

Changing Line Styles and Colors

df.plot(x='Month', y='Sales', kind='line', linestyle='-', color='red', linewidth=2)
plt.title('Sales Trend')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.show()

Here, we specify:

  • linestyle='-' for a solid line.
  • color='red' for the line color.
  • linewidth=2 for the line thickness.
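Other values work the same way. As a quick sketch, the variant below switches to a dashed line and adds a circular marker at each data point; both are standard Matplotlib line properties that df.plot() passes through.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Month": ["Jan", "Feb", "Mar", "Apr", "May"],
                   "Sales": [100, 120, 150, 170, 200]})

# Dashed red line with a circle drawn at every month.
ax = df.plot(x="Month", y="Sales", kind="line",
             linestyle="--", marker="o", color="red", linewidth=2)
ax.set_title("Sales Trend")
# plt.show() displays the figure in an interactive session.
```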

🔹 7. Plotting with Multiple DataFrames

When working with multiple datasets, you might want to plot data from different Pandas DataFrames on the same figure. Here’s how you can do that:

# First DataFrame
data1 = {'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'], 'Sales': [100, 120, 150, 170, 200]}
df1 = pd.DataFrame(data1)

# Second DataFrame
data2 = {'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'], 'Profit': [50, 60, 80, 90, 120]}
df2 = pd.DataFrame(data2)

# Plot both DataFrames
plt.plot(df1['Month'], df1['Sales'], label='Sales')
plt.plot(df2['Month'], df2['Profit'], label='Profit')

# Add labels and legend
plt.title('Sales and Profit Over Time')
plt.xlabel('Month')
plt.ylabel('Amount')
plt.legend()

plt.show()

This plots Sales and Profit from two different DataFrames on the same chart.
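The same result can be reached with the Pandas plotting interface by creating one Axes and passing it to each df.plot() call through the ax argument. This is a sketch of that alternative, not a replacement for the approach above.

```python
import pandas as pd
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May"]
df1 = pd.DataFrame({"Month": months, "Sales": [100, 120, 150, 170, 200]})
df2 = pd.DataFrame({"Month": months, "Profit": [50, 60, 80, 90, 120]})

# Create the Axes once, then let both DataFrames draw on it.
fig, ax = plt.subplots()
df1.plot(x="Month", y="Sales", ax=ax)
df2.plot(x="Month", y="Profit", ax=ax)

ax.set_title("Sales and Profit Over Time")
ax.set_xlabel("Month")
ax.set_ylabel("Amount")
# plt.show() displays the figure in an interactive session.
```

If both DataFrames share a key column, merging them first (for example with df1.merge(df2, on="Month")) and plotting the merged frame is another option.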


🔹 8. Summary Table

| Operation | Function/Method | Description |
|---|---|---|
| Plotting a Line Plot | df.plot(kind='line') | Create a basic line plot from a DataFrame |
| Plotting a Bar Chart | df.plot(kind='bar') | Create a bar chart from a DataFrame |
| Plotting a Histogram | df['column'].plot(kind='hist') | Create a histogram from a DataFrame column |
| Plotting Time-Series Data | df.plot(x='Date', y='Sales') | Create a time-series plot |
| Adding Legends | plt.legend() | Add a legend to the plot |
| Customizing Plot Appearance | plt.title(), plt.xlabel() | Customize plot titles and axis labels |
| Customizing Plot Style | plt.plot(..., linestyle='--') | Change the line style and color |
| Plotting with Multiple DataFrames | plt.plot(df1['Month'], df1['Sales'], label='Sales') | Plot multiple DataFrames on the same chart |




FAQs


1. What is Matplotlib in Python?

Matplotlib is a powerful Python library used for creating static, animated, and interactive visualizations. It provides extensive control over plot design and is used by data scientists and analysts for visualizing data.

2. How do I install Matplotlib?

Matplotlib can be installed using the Python package manager pip:

pip install matplotlib

3. What are the most common plot types in Matplotlib?

Some of the most common plot types include line plots, bar charts, scatter plots, histograms, and pie charts.

4. How can I change the style of a plot in Matplotlib?

Matplotlib offers various customization options, including color, line style, markers, axis labels, titles, and more. You can use functions like plt.plot(), plt.title(), plt.xlabel(), and plt.ylabel() to modify the style.

5. How can I save a Matplotlib plot as an image?

You can save a Matplotlib plot as an image file using the savefig() method:

plt.savefig('plot.png')

6. What is the difference between plt.show() and plt.savefig()?

plt.show() displays the plot on the screen, while plt.savefig() saves the plot as an image file (e.g., PNG, JPEG, SVG, PDF).
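When a script needs both, a common pattern is to save first and show afterwards, since the figure may no longer be available once an interactive window has been closed. The sketch below writes to the system temp directory; the `out_path` location and the dpi value are illustrative assumptions.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [100, 120, 150])
ax.set_title("Sales")

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "plot.png")

# Save before showing. bbox_inches="tight" trims surplus white margins.
fig.savefig(out_path, dpi=150, bbox_inches="tight")
# plt.show() can follow here to display the same figure.
```

The file format is inferred from the extension, so changing the name to end in .svg or .pdf produces a vector file instead.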

7. Can Matplotlib be used for interactive plots?

Yes. With an interactive backend, Matplotlib figures support zooming and panning, and its event-handling API can respond to mouse actions such as hovering or clicking. For richer, browser-based interactivity, consider dedicated libraries such as Plotly or Bokeh.

8. How do I create a pie chart in Matplotlib?

Use the plt.pie() function to create pie charts:

sizes = [10, 20, 30, 40]
labels = ['A', 'B', 'C', 'D']
plt.pie(sizes, labels=labels)
plt.show()

9. Can I create 3D plots with Matplotlib?

Yes, Matplotlib supports 3D plotting through its mplot3d toolkit, whose Axes3D class is created by requesting projection='3d'. You can create 3D scatter plots, surface plots, and more.
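As a minimal sketch, a 3D scatter plot can be created by requesting a 3D projection when adding the subplot. The points here are random demo values generated with NumPy, not real data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random demo coordinates: three arrays of 50 values each.
rng = np.random.default_rng(0)
x, y, z = rng.random((3, 50))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # creates a 3D Axes
ax.scatter(x, y, z)

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
# plt.show() displays the figure in an interactive session.
```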

10. How do I change the figure size in Matplotlib?

You can change the figure size using plt.figure(figsize=(width, height)):

plt.figure(figsize=(10, 6))  # Set figure size to 10x6 inches
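When plotting from a DataFrame, note that df.plot() typically creates its own figure if no Axes is supplied, so a preceding plt.figure() call may not affect it. Two ways to size a Pandas plot are sketched below; the sizes themselves are arbitrary examples.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Month": ["Jan", "Feb", "Mar", "Apr", "May"],
                   "Sales": [100, 120, 150, 170, 200]})

# Option 1: create a sized figure and hand its Axes to Pandas.
fig, ax = plt.subplots(figsize=(10, 6))
df.plot(x="Month", y="Sales", ax=ax)

# Option 2: let Pandas create the figure at the requested size.
ax2 = df.plot(x="Month", y="Sales", figsize=(8, 4))
# plt.show() displays both figures in an interactive session.
```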