What is exploratory data analysis (EDA): Methods, use cases, and best practices in the age of AI
TL;DR: Heatmaps are a great way to visualize the relationship between multiple numeric variables. When conducting exploratory data analysis, this can provide some clues as to which fields may be correlated. Seaborn heatmaps provide a quick way to get started. If you’re starting with complete, clean, numeric data in a Pandas DataFrame, creating your first Seaborn heatmap is as simple as:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(8, 6))
sns.heatmap(your_dataframe, cmap='YlGnBu')
plt.title('Your title')
plt.show()
Data visualization is a critical tool for turning raw information into actionable insights. When it comes to exploring the relationships and distributions of data, few tools are as versatile and easy to use as Python’s Seaborn library. One of Seaborn’s most popular visualization techniques is the heatmap—a color-coded representation of numerical data that can quickly reveal patterns, correlations, and clusters. In this blog post, we’ll dive into everything you need to know about Seaborn heatmaps, from basic usage to advanced customization. We’ll also look at how to prepare your data for a heatmap, discuss when to use one, and consider possible alternatives if a heatmap isn’t the right fit for your specific use case.
By the end of this article, you’ll be able to confidently create your own Seaborn heatmaps, tweak their appearance to suit your needs, and decide if they’re the best form of data visualization for your problem.
A heatmap is a two-dimensional data visualization technique where individual values in a matrix are represented by colors. The concept is fairly straightforward: each cell in the matrix corresponds to a pair of categories or numeric values, and the cell’s color indicates the magnitude or intensity of a numerical measure. You can think of heatmaps as a more visual, color-coded version of a table of numbers.
Heatmaps are highly popular in fields like data analytics, bioinformatics, finance, and even sports analytics. They can highlight complex relationships or patterns in data, making them particularly useful for correlation analysis, cluster analysis, or simply gaining a quick overview of data distributions. They’re a perfect tool for multi-variate analysis when conducting exploratory data analysis.
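To make the correlation use case concrete, here is a minimal sketch. The column names (ad_spend, site_visits, returns) and the numbers are synthetic and invented purely for illustration; they are not part of the Superdope dataset used later in this post.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical numeric fields, generated with a fixed seed for repeatability
rng = np.random.default_rng(42)
ad_spend = rng.normal(1000, 200, 200)
metrics = pd.DataFrame({
    'ad_spend': ad_spend,
    'site_visits': ad_spend * 3 + rng.normal(0, 300, 200),  # built to track ad_spend
    'returns': rng.normal(50, 10, 200),                      # independent noise
})

# Pairwise Pearson correlations: a square, symmetric matrix
corr = metrics.corr()

plt.figure(figsize=(6, 5))
# Pin the color scale to [-1, 1] so that zero correlation sits at the neutral midpoint
sns.heatmap(corr, cmap='coolwarm', vmin=-1, vmax=1, annot=True, fmt='.2f')
plt.title('Correlation matrix (synthetic data)')
plt.tight_layout()
plt.show()
```

Pinning vmin and vmax is a design choice rather than a requirement: without it, Seaborn scales the colors to the range of values present, which can make weak correlations look stronger than they are.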
Heatmaps shine in the following scenarios:
However, heatmaps may not be ideal when your primary goal is to display exact values. While you can annotate a heatmap to show exact numbers, the main power of a heatmap lies in visually revealing patterns—making it less about precise values and more about comparisons.
One of the main reasons for Seaborn’s popularity is its simplicity and elegance in creating statistical graphics. Heatmaps are no exception. In just a few lines of code, you can generate an insightful visualization.
Before starting, make sure you have the necessary libraries installed:
pip install seaborn matplotlib pandas
Let’s begin by creating a basic heatmap. We’ll assume you’re analyzing fictional sales data from Superdope, a fictitious company that sells fashion apparel. Suppose we have monthly sales data across three regions (North, South, East) and three product categories (T-Shirts, Hoodies, Jackets). First, let’s generate some synthetic data:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Sample data for Superdope
data = {
'Region': ['North', 'North', 'North', 'South', 'South', 'South', 'East', 'East', 'East'],
'Product': ['T-Shirt', 'Hoodie', 'Jacket', 'T-Shirt', 'Hoodie', 'Jacket', 'T-Shirt', 'Hoodie', 'Jacket'],
'Jan': [100, 50, 30, 80, 70, 25, 60, 55, 20],
'Feb': [110, 45, 35, 85, 60, 30, 65, 58, 25],
'Mar': [120, 48, 28, 75, 65, 35, 70, 52, 22],
}
df = pd.DataFrame(data)
Now that we have some data stored in a Pandas DataFrame called df, we’re ready to get started.
# Let's pivot this data so that rows represent Regions, columns represent Products, and the values represent average sales across the months.
df['AverageSales'] = df[['Jan', 'Feb', 'Mar']].mean(axis=1)
pivot_df = df.pivot(index='Region', columns='Product', values='AverageSales')
And finally, let’s plot the data using a heatmap:
plt.figure(figsize=(8, 6))
sns.heatmap(pivot_df, cmap='YlGnBu')
plt.title('Average Sales by Region and Product')
plt.show()
Explanation:
This basic heatmap is a great starting point. Each cell’s color corresponds to the magnitude of the average sales, providing an immediate visual for comparing T-Shirt, Hoodie, and Jacket sales across the North, South, and East regions.
Advanced heatmap customization
While the default Seaborn heatmap is serviceable, you often need to customize it for clarity and branding or to highlight specific information. Here are a few popular parameters and techniques:
Let’s illustrate some of these customizations:
plt.figure(figsize=(8, 6))
sns.heatmap(
pivot_df,
cmap='coolwarm', # A different color map
annot=True, # Show numbers in each cell
fmt=".1f", # Format the annotation numbers
linewidths=0.5, # Lines between cells
linecolor='black', # Color of the cell lines
cbar_kws={'shrink': 0.8} # Shrink the color bar a bit
)
plt.title('Average Sales by Region and Product - Customized')
plt.xticks(rotation=45, ha='right') # Rotate x-axis labels for clarity
plt.yticks(rotation=0) # Keep y-axis labels horizontal
plt.tight_layout()
plt.show()
In the example above:
Advanced color maps or sns.color_palette can also be used if you need a specific gradient or brand color scheme. For example, if Superdope has brand guidelines, you might want to use a palette that matches the company's colors.
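As a rough sketch of the brand-color idea, the snippet below builds a single-hue colormap with sns.light_palette. The hex value is a made-up placeholder, not an actual Superdope brand color, and the table holds the rounded average sales from the earlier example so the snippet runs on its own.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical brand color; swap in the hex code from your own style guide
brand_color = '#6C2BD9'

# Build a colormap that fades from near-white to the brand color
brand_cmap = sns.light_palette(brand_color, as_cmap=True)

# Rounded average sales from the Superdope example above
sales = pd.DataFrame(
    {
        'Hoodie': [55.0, 47.7, 65.0],
        'Jacket': [22.3, 31.0, 30.0],
        'T-Shirt': [65.0, 110.0, 80.0],
    },
    index=['East', 'North', 'South'],
)

plt.figure(figsize=(8, 6))
ax = sns.heatmap(sales, cmap=brand_cmap, annot=True, fmt='.1f')
ax.set_title('Average Sales by Region and Product - Brand Palette')
plt.tight_layout()
plt.show()
```

A light-to-dark single-hue map like this suits values that only go in one direction, such as sales. For data centered on zero, such as correlations, a diverging map is usually the safer choice.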
While Seaborn’s heatmap is widely used, there are a few close “cousins” in the Python ecosystem:
Note on personal preferences: I like to work with Plotly for two reasons. The first is that it renders dynamically, so when you’re embedding your chart in a report, it naturally scales up and down as the browser window size changes. This makes for a much better builder and viewer experience especially when using a platform like Fabi.ai to build reports. The other reason is that it’s interactive. Although this interactivity isn’t necessarily the most valuable for heatmaps, it does make charts more engaging and more useful for the viewer of the report.
In most cases, sticking with Seaborn is straightforward, especially for quick correlation matrices or aggregated summary tables. But if you need more interactive features or specialized styling, these other libraries may be worth a look.
Preparing your data for a heatmap
Before jumping into plotting a heatmap, it’s crucial to ensure your data is in a suitable format and scale:
Example of data preparation with pivot tables
Below is an extended example using pivot_table. Suppose we have daily sales data across multiple months, and we want to see total sales for each region and month, summed across all product categories:
import numpy as np
# Extended daily data
data_extended = {
'Date': pd.date_range('2023-01-01', periods=90, freq='D'),
'Region': np.random.choice(['North', 'South', 'East'], 90),
'Product': np.random.choice(['T-Shirt', 'Hoodie', 'Jacket'], 90),
'Sales': np.random.randint(10, 200, 90)
}
df_extended = pd.DataFrame(data_extended)
# Add a 'Month' column
df_extended['Month'] = df_extended['Date'].dt.month_name()
# Create a pivot table of total sales with Regions as rows and Months as columns (summed across all products)
pivot_extended = pd.pivot_table(
df_extended,
values='Sales',
index='Region',
columns='Month',
aggfunc='sum'
).fillna(0)
print(pivot_extended)
In this snippet:
This pivot table can then be directly passed into sns.heatmap(pivot_extended) for visualization.
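One detail worth watching: pivot_table sorts column labels alphabetically, so month names come out as February, January, March. The sketch below rebuilds a comparable table with a fixed random seed (an assumption added here so the result is repeatable) and reindexes the columns into calendar order before plotting. The names daily, pivot, and month_order are illustrative.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic daily sales, seeded so the numbers are repeatable
rng = np.random.default_rng(0)
daily = pd.DataFrame({
    'Date': pd.date_range('2023-01-01', periods=90, freq='D'),
    'Region': rng.choice(['North', 'South', 'East'], 90),
    'Sales': rng.integers(10, 200, 90),
})
daily['Month'] = daily['Date'].dt.month_name()

# Total sales with Regions as rows and Months as columns
pivot = pd.pivot_table(
    daily, values='Sales', index='Region', columns='Month', aggfunc='sum'
).fillna(0)

# Restore calendar order; the pivot returns columns sorted alphabetically
month_order = ['January', 'February', 'March']
pivot = pivot.reindex(columns=month_order)

plt.figure(figsize=(8, 4))
sns.heatmap(pivot, cmap='YlGnBu', annot=True, fmt='.0f')
plt.title('Total Sales by Region and Month')
plt.tight_layout()
plt.show()
```

For a full year of data, the same reindex works with a list of all twelve month names.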
While heatmaps are incredibly useful, they’re not always the best choice. Sometimes a different chart type or approach might better communicate your data’s story. Here are some alternatives and why you might consider them:
Choosing the right chart type often hinges on the question you’re trying to answer and the nature of your data. Heatmaps excel at displaying aggregated data in a matrix format, particularly correlations or multi-dimensional relationships. If your primary need is to see detailed distributions, identify outliers, or track changes over time, other chart types may be more appropriate.
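For instance, when the question is how sales change over time, a plain line chart may read more easily than a heatmap. The sketch below uses the T-Shirt figures from the Superdope example and pandas' built-in plotting, which draws with Matplotlib. It is one possible alternative, not a recommendation for every case.

```python
import pandas as pd
import matplotlib.pyplot as plt

# T-Shirt sales by region, taken from the Superdope sample data above
tshirts = pd.DataFrame({
    'Region': ['North', 'South', 'East'],
    'Jan': [100, 80, 60],
    'Feb': [110, 85, 65],
    'Mar': [120, 75, 70],
})

# Transpose so that months become the x-axis and each region becomes a line
monthly = tshirts.set_index('Region').T

ax = monthly.plot(marker='o', figsize=(7, 4))
ax.set_xlabel('Month')
ax.set_ylabel('T-Shirt sales')
ax.set_title('T-Shirt Sales Over Time by Region')
plt.tight_layout()
plt.show()
```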
Heatmaps are a powerful tool in a data scientist’s or analyst’s arsenal. Seaborn makes them especially easy to implement, even if you’re just starting out. Here’s a quick summary of how you might integrate heatmaps into your workflow for Superdope:
Seaborn heatmaps provide a straightforward, visually compelling way to explore and present data. Whether you’re investigating correlations between numeric variables, comparing aggregated sales across product categories and regions, or analyzing time-series data, a heatmap can offer immediate, at-a-glance insights. By leveraging the power of Seaborn’s simple API and Pandas’ data manipulation capabilities, you can quickly create heatmaps tailored to your specific datasets and questions.
We explored basic heatmap creation, discussed advanced customization options, and briefly touched on potential alternatives—both to Seaborn (like Plotly or Matplotlib) and to heatmaps themselves (like bar charts or scatter plots). The key is to choose the visualization that best represents your data and addresses your analytical questions.
For a company like Superdope, analyzing how product sales vary by region or month can make a substantial difference in how you allocate resources, run promotions, or adjust product lines. In your own projects, consider how a heatmap might highlight hidden patterns or relationships that aren’t readily apparent in raw data or traditional tables. With Seaborn’s versatility, getting started is as simple as importing the library and calling a few lines of code—so give it a try and see how your data storytelling transforms.