🔹 1. Introduction
Time Series Analysis is one of the most crucial
techniques in data analysis, especially when working with data that is
collected or recorded over time. This includes stock prices, temperature
measurements, sales data, or log files.
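Pandas provides powerful tools for working with time-based
data, including date parsing, date ranges and offsets, time-based
indexing, resampling, time zone handling, and shifting.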
In this chapter, we will explore how to load, clean, and
manipulate time series data using Pandas.
🔹 2. Working with Date and Time Data
✅ Parsing Dates
When working with CSV or Excel files containing time-based
data, you often need to convert string representations of dates into
actual datetime objects that can be manipulated.
Pandas provides the pd.to_datetime() function to convert a
column of strings into a datetime64 column (or into a DatetimeIndex
when it is given a list or an index).
import pandas as pd
# Sample data with date in string format
data = {'Date': ['2021-01-01', '2021-02-01', '2021-03-01'],
        'Value': [10, 20, 30]}
df = pd.DataFrame(data)
# Convert the 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])
print(df)
Output:
| | Date | Value |
| 0 | 2021-01-01 | 10 |
| 1 | 2021-02-01 | 20 |
| 2 | 2021-03-01 | 30 |
This conversion allows you to perform time-based indexing,
filtering, and arithmetic operations.
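Real files rarely contain only clean ISO dates. The sketch below assumes hypothetical day-first strings with one invalid entry (the `raw` values are made up for illustration) and shows two optional arguments of pd.to_datetime(): `format`, which removes day/month ambiguity, and `errors='coerce'`, which turns unparseable values into NaT instead of raising.

```python
import pandas as pd

# Hypothetical raw values: day-first strings plus one invalid entry
raw = pd.Series(['01/02/2021', '15/03/2021', 'not a date'])

# An explicit format avoids guessing between day-first and month-first;
# errors='coerce' replaces unparseable entries with NaT instead of raising
parsed = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')

print(parsed)
print(parsed.isna().sum())  # number of entries that could not be parsed
```

Counting the NaT values after parsing is a quick way to see how many rows need cleaning before any time-based operation.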
✅ Handling DatetimeIndex
If your DataFrame has a datetime column, you can set it as
the index, which enables date-based selection, slicing, and
resampling:
df.set_index('Date', inplace=True)
print(df)
Output:
| Date | Value |
| 2021-01-01 | 10 |
| 2021-02-01 | 20 |
| 2021-03-01 | 30 |
Now, the Date column becomes the index, allowing for
easier manipulation.
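As an illustration of what a DatetimeIndex makes possible, the sketch below rebuilds the same three-row frame (so it runs on its own) and reads date components straight from the index, then selects a whole month with a partial date string.

```python
import pandas as pd

# Same sample data as above, built directly with a datetime index
df = pd.DataFrame(
    {'Value': [10, 20, 30]},
    index=pd.to_datetime(['2021-01-01', '2021-02-01', '2021-03-01']),
)
df.index.name = 'Date'

# Date components are available directly on the index
print(df.index.month.tolist())
print(df.index.day_name().tolist())

# Partial string indexing: every row that falls in February 2021
feb = df.loc['2021-02']
print(feb)
```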
🔹 3. Date Offsets and Date Ranges
Pandas allows you to generate date ranges and work
with date offsets for custom date manipulations.
✅ Generating a Date Range
To generate a range of dates over a given period, use
pd.date_range():
date_range = pd.date_range(start='2021-01-01', periods=6, freq='M')
print(date_range)
Output:
DatetimeIndex(['2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30',
               '2021-05-31', '2021-06-30'],
              dtype='datetime64[ns]', freq='M')
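Here, start specifies the start date, periods is the number
of dates to generate, and freq='M' requests month-end dates, which is
why the first value is 2021-01-31 rather than 2021-01-01. In pandas 2.2
and later, the alias 'M' is deprecated in favor of 'ME'.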
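For comparison, here is a small sketch of pd.date_range() with other frequency aliases. The variable names are illustrative only; 'D', 'W', and 'MS' (month start) are standard pandas aliases.

```python
import pandas as pd

# Daily dates between two endpoints (both included)
daily = pd.date_range(start='2021-01-01', end='2021-01-05', freq='D')

# Weekly dates; plain 'W' anchors each date on a Sunday
weekly = pd.date_range(start='2021-01-01', periods=3, freq='W')

# Month-start dates, as opposed to the month-end dates shown above
month_starts = pd.date_range(start='2021-01-01', periods=3, freq='MS')

print(daily)
print(weekly)
print(month_starts)
```

Either `end` or `periods` can bound the range; together with `start` and `freq`, one of the two is enough.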
✅ Using Date Offsets
Pandas offers date offsets to shift dates by a
specific amount:
date = pd.to_datetime('2021-01-01')
print(date + pd.DateOffset(days=10))    # Adding 10 days
print(date + pd.DateOffset(months=2))   # Adding 2 months
Output:
2021-01-11 00:00:00
2021-03-01 00:00:00
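One detail worth seeing in code is how calendar-aware offsets behave at month boundaries. The sketch below (dates chosen purely for illustration) shows that adding one month to January 31 lands on the last valid day of February, and that offsets can be subtracted as well as added.

```python
import pandas as pd

end_of_jan = pd.Timestamp('2021-01-31')

# Month arithmetic clips to the last valid day of the target month
one_month_later = end_of_jan + pd.DateOffset(months=1)
print(one_month_later)   # 2021-02-28 00:00:00

# Offsets can also be subtracted
one_week_earlier = end_of_jan - pd.DateOffset(weeks=1)
print(one_week_earlier)  # 2021-01-24 00:00:00
```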
🔹 4. Time Series Indexing
Time series indexing enables you to access specific
time-based data, such as filtering records within a date range.
✅ Accessing Data by Date
If your DataFrame has a DatetimeIndex, you can easily access
rows by date:
# Select rows from 2021-02-01 through 2021-03-01 (both endpoints are included)
print(df.loc['2021-02-01':'2021-03-01'])
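The following sketch, built on a hypothetical ten-day frame, shows two common ways to filter on a DatetimeIndex: a label slice with .loc and a boolean mask derived from the index (here, keeping weekdays only).

```python
import pandas as pd

# Hypothetical daily data: values 1..10 for 2021-01-01 through 2021-01-10
df = pd.DataFrame(
    {'Value': range(1, 11)},
    index=pd.date_range('2021-01-01', periods=10, freq='D'),
)

# Label slice: both endpoints are included
window = df.loc['2021-01-03':'2021-01-05']
print(window)

# Boolean mask from the index: dayofweek is 0 (Monday) through 6 (Sunday)
weekdays = df[df.index.dayofweek < 5]
print(weekdays)
```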
✅ Resampling Time Series Data
You can resample your data to different frequencies
(e.g., daily to monthly, hourly to daily, etc.). This is useful when dealing
with data at different time granularities.
# Resample the data to monthly frequency
monthly_data = df.resample('M').sum()
print(monthly_data)
Output:
| Date | Value |
| 2021-01-31 | 10 |
| 2021-02-28 | 20 |
| 2021-03-31 | 30 |
Here, M stands for month-end frequency. You can also
use D for daily, W for weekly, and many other frequency strings.
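Because the sample data has only one row per month, the monthly sum above does not actually combine anything. The sketch below uses hypothetical daily data covering two full weeks to show resampling that really aggregates, with two aggregations computed at once.

```python
import pandas as pd

# Hypothetical daily data: 14 days starting on Monday 2021-01-04
daily = pd.DataFrame(
    {'Value': range(1, 15)},
    index=pd.date_range('2021-01-04', periods=14, freq='D'),
)

# 'W' groups the rows into weeks ending on Sunday; each week gets a sum and a mean
weekly = daily['Value'].resample('W').agg(['sum', 'mean'])
print(weekly)
```

Each output row is labeled with the Sunday that closes its week.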
🔹 5. Handling Time Zones
Pandas makes it easy to work with time zones. Datetimes without
a time zone must first be localized with tz_localize(); after that, you can
convert them to another time zone with the tz_convert() method.
✅ Converting Time Zones
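# 'Date' is the index at this point, so localize the index itself
# (for a regular datetime column, use df['Date'].dt.tz_localize('UTC') instead)
df.index = df.index.tz_localize('UTC')
# Convert to another time zone (e.g., 'US/Eastern')
df.index = df.index.tz_convert('US/Eastern')
print(df)
Output:
| Date | Value |
| 2020-12-31 19:00:00-05:00 | 10 |
| 2021-01-31 19:00:00-05:00 | 20 |
| 2021-02-28 19:00:00-05:00 | 30 |
Midnight UTC falls at 19:00 on the previous day in US/Eastern
(UTC-05:00 in winter), so each timestamp moves back by one calendar day.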
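To make the localize-then-convert order concrete, here is a self-contained sketch on a datetime column rather than an index. The two noon timestamps are hypothetical and were picked so that one falls in winter and one in summer, which shows the UTC offset changing with daylight saving time.

```python
import pandas as pd

# Naive (time-zone-free) timestamps, assumed here to represent UTC
naive = pd.Series(pd.to_datetime(['2021-01-01 12:00', '2021-07-01 12:00']))

# Step 1: attach a time zone to the naive values
utc = naive.dt.tz_localize('UTC')

# Step 2: convert the same instants to another time zone
eastern = utc.dt.tz_convert('US/Eastern')
print(eastern)
# 0   2021-01-01 07:00:00-05:00
# 1   2021-07-01 08:00:00-04:00
```

tz_convert() changes only how the instant is displayed; the underlying moment in time stays the same.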
🔹 6. Shifting and Lagging Data
Another essential operation on time series data is shifting:
moving values forward or backward in time so that current values can be
compared with past ones.
✅ Example of Shifting Data
df['Prev_Value'] = df['Value'].shift(1)  # Shift by one time step
print(df)
Output:
| Date | Value | Prev_Value |
| 2021-01-01 | 10 | NaN |
| 2021-02-01 | 20 | 10.0 |
| 2021-03-01 | 30 | 20.0 |
Here, the shift() function creates a new column with
previous values, which is useful for computing differences or growth
rates.
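The differences and growth rates mentioned above can be sketched as follows. The column names `Change` and `Growth` are illustrative; the frame is rebuilt so the example runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    {'Value': [10, 20, 30]},
    index=pd.to_datetime(['2021-01-01', '2021-02-01', '2021-03-01']),
)

prev = df['Value'].shift(1)

# Absolute change from the previous period
df['Change'] = df['Value'] - prev

# Relative growth from the previous period (1.0 means +100%)
df['Growth'] = df['Value'] / prev - 1

print(df)
```

The first row is NaN in both new columns because it has no previous value. Pandas also offers diff() and pct_change() as shortcuts for these two calculations.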
🔹 7. Summary Table
| Operation | Function/Method | Description |
| Convert string to datetime | pd.to_datetime() | Converts a string or column to datetime object |
| Generate date range | pd.date_range() | Create a range of dates |
| Add or subtract time | pd.DateOffset() | Add or subtract a time period from dates |
| Resample time series | df.resample() | Change the frequency of time series data |
| Time zone localization | dt.tz_localize() | Localize datetime to a specific time zone |
| Time zone conversion | dt.tz_convert() | Convert datetime between time zones |
| Shift or lag data | df.shift() | Shift values forward or backward by one unit |
| Calculate rolling window | df.rolling() | Apply a rolling function (e.g., mean, sum) |
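The table lists df.rolling(), which has no example in this chapter. A minimal sketch, assuming a hypothetical five-day series and a window of three observations:

```python
import pandas as pd

# Hypothetical daily series
s = pd.Series(
    [10, 20, 30, 40, 50],
    index=pd.date_range('2021-01-01', periods=5, freq='D'),
)

# Mean over the current value and the two before it;
# the first two results are NaN because the window is not yet full
rolling_mean = s.rolling(window=3).mean()
print(rolling_mean)
```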
Pandas is a Python library for data manipulation and analysis, providing powerful data structures like DataFrames and Series.
While NumPy is great for numerical operations, Pandas is designed for working with structured data, including heterogeneous data types (strings, dates, integers, etc.) in a tabular format.
A DataFrame is a two-dimensional data structure in Pandas, similar to a table or spreadsheet, with rows and columns. It’s the core structure for working with data in Pandas.
A Series is a one-dimensional data structure that can hold any data type (integers, strings, etc.), similar to a single column in a DataFrame.
You can load data using functions like pd.read_csv() for CSV files, pd.read_excel() for Excel files, and pd.read_sql() for SQL databases.
Pandas can handle missing data: it provides functions like fillna() to fill missing values, dropna() to remove rows/columns with missing data, and isna() to identify missing values.
You can filter data using conditions. For example: df[df['Age'] > 30] filters rows where the 'Age' column is greater than 30.
Pandas can also group and aggregate data: use the groupby() function to group data by one or more columns and perform aggregations like mean(), sum(), or count().
Pandas integrates well with Matplotlib and provides a plot() function to create basic visualizations like line charts, bar charts, and histograms.
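The missing-data, filtering, and grouping operations described above can be sketched together on a small made-up table. The `people` frame and its `City` and `Age` columns are hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing age
people = pd.DataFrame({
    'City': ['Oslo', 'Oslo', 'Rome', 'Rome'],
    'Age': [25, 40, 35, None],
})

# Identify and remove missing values
missing_count = people['Age'].isna().sum()
cleaned = people.dropna(subset=['Age'])

# Filter rows with a condition
over_30 = cleaned[cleaned['Age'] > 30]

# Group by a column and aggregate
mean_age = cleaned.groupby('City')['Age'].mean()

print(missing_count)
print(over_30)
print(mean_age)
```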