Seaborn heatmaps: Everything you need to get started

TL;DR: Heatmaps are a great way to visualize the relationship between multiple numeric variables. When conducting exploratory data analysis, this can provide some clues as to which fields may be correlated. Seaborn heatmaps provide a quick way to get started. If you’re starting with complete, clean, numeric data in a Pandas DataFrame, creating your first Seaborn heatmap is as simple as:

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(8, 6))
sns.heatmap(your_dataframe, cmap='YlGnBu')
plt.title('Your title')
plt.show()

Introduction

Data visualization is a critical tool for turning raw information into actionable insights. When it comes to exploring the relationships and distributions of data, few tools are as versatile and easy to use as Python’s Seaborn library. One of Seaborn’s most popular visualization techniques is the heatmap—a color-coded representation of numerical data that can quickly reveal patterns, correlations, and clusters. In this blog post, we’ll dive into everything you need to know about Seaborn heatmaps, from basic usage to advanced customization. We’ll also look at how to prepare your data for a heatmap, discuss when to use one, and consider possible alternatives if a heatmap isn’t the right fit for your specific use case.

Seaborn heatmap

By the end of this article, you’ll be able to confidently create your own Seaborn heatmaps, tweak their appearance to suit your needs, and decide if they’re the best form of data visualization for your problem.

What are heatmaps

A heatmap is a two-dimensional data visualization technique where individual values in a matrix are represented by colors. The concept is fairly straightforward: each cell in the matrix corresponds to a pair of categories or numeric values, and the cell’s color indicates the magnitude or intensity of a numerical measure. You can think of heatmaps as a more visual, color-coded version of a table of numbers.
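
To make the value-to-color idea concrete, here is a minimal sketch of the mapping a heatmap performs for each cell. The 2x2 matrix is made up for illustration, and the linear scaling shown is an assumption about the simplest case rather than a description of every option a plotting library offers.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# A small illustrative matrix (values are made up for this sketch)
values = np.array([
    [10.0, 40.0],
    [70.0, 100.0],
])

# Scale values linearly so the minimum maps to 0 and the maximum to 1
norm = Normalize(vmin=values.min(), vmax=values.max())

# Look up an RGBA color for each scaled value in the 'YlGnBu' colormap
cmap = plt.get_cmap('YlGnBu')
colors = cmap(norm(values))  # shape (2, 2, 4): one RGBA tuple per cell

print(colors.shape)
print(colors[0, 0])  # light color for the smallest value
print(colors[1, 1])  # dark color for the largest value
```

With a sequential colormap like 'YlGnBu', small values land on the light end and large values on the dark end, which is why magnitude is readable at a glance.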

Heatmaps are highly popular in fields like data analytics, bioinformatics, finance, and even sports analytics. They can highlight complex relationships or patterns in data, making them particularly useful for correlation analysis, cluster analysis, or simply gaining a quick overview of data distributions. They’re a perfect tool for multi-variate analysis when conducting exploratory data analysis.

When to use them

Heatmaps shine in the following scenarios:

  1. Correlation analysis: If you want to quickly understand how multiple numeric variables relate to each other, a correlation matrix visualized via a heatmap can reveal which variables move together (positive correlation), move inversely (negative correlation), or show no clear relationship.
  2. Categorical vs. numerical data: Heatmaps can also display how a numerical metric changes across different categories. For example, comparing monthly sales across various store locations and product types.
  3. High-dimensional data: Heatmaps allow you to see a big-picture view of large datasets. When you have a matrix of data points, using color scales can help you quickly spot outliers or clusters.
  4. Time-series analysis: You can visualize data across time (e.g., months, weeks, hours) on one axis and different categories or variables on the other axis. For instance, analyzing website traffic or sales volume by hour of day and day of week.

However, heatmaps may not be ideal when your primary goal is to display exact values. While you can annotate a heatmap to show exact numbers, the main power of a heatmap lies in visually revealing patterns—making it less about precise values and more about comparisons.
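
The correlation-analysis scenario above can be sketched as follows. The data is synthetic and the column names (price, units_sold, ad_spend) are hypothetical; units_sold is deliberately constructed to fall as price rises, while ad_spend is generated independently.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data, seeded so the sketch is reproducible
rng = np.random.default_rng(42)
n = 200
price = rng.normal(50, 10, n)
units_sold = 500 - 4 * price + rng.normal(0, 10, n)  # built to move inversely with price
ad_spend = rng.normal(1000, 200, n)                  # generated independently of the others

metrics = pd.DataFrame({
    'price': price,
    'units_sold': units_sold,
    'ad_spend': ad_spend,
})

# Pairwise Pearson correlations between the numeric columns
corr = metrics.corr()

fig, ax = plt.subplots(figsize=(6, 5))
# Fix the color range to [-1, 1] so that zero correlation sits mid-scale
sns.heatmap(corr, cmap='coolwarm', vmin=-1, vmax=1, annot=True, fmt='.2f', ax=ax)
ax.set_title('Correlation matrix (synthetic data)')
fig.tight_layout()
```

Calling plt.show() afterwards displays the figure. Pinning vmin and vmax to the full correlation range keeps the colors comparable from one dataset to the next.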

Creating a Seaborn heatmap

One of the main reasons for Seaborn’s popularity is its simplicity and elegance in creating statistical graphics. Heatmaps are no exception. In just a few lines of code, you can generate an insightful visualization.

Before starting, make sure you have the necessary libraries installed:

pip install seaborn matplotlib pandas

Basic heatmaps

Let’s begin by creating a basic heatmap. We’ll use synthetic sales data for Superdope, a fictitious company that sells fashion apparel. Suppose we have monthly sales figures for January through March across three regions (North, South, East) and three product categories (T-Shirts, Hoodies, Jackets).

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data for Superdope
data = {
    'Region': ['North', 'North', 'North', 'South', 'South', 'South', 'East', 'East', 'East'],
    'Product': ['T-Shirt', 'Hoodie', 'Jacket', 'T-Shirt', 'Hoodie', 'Jacket', 'T-Shirt', 'Hoodie', 'Jacket'],
    'Jan': [100, 50, 30, 80, 70, 25, 60, 55, 20],
    'Feb': [110, 45, 35, 85, 60, 30, 65, 58, 25],
    'Mar': [120, 48, 28, 75, 65, 35, 70, 52, 22],
}

df = pd.DataFrame(data)

Now that we have some data stored in a Pandas DataFrame called df, we’re ready to get started.

# Let's pivot this data so that rows represent Regions, columns represent Products, and the values represent average sales across the months.

df['AverageSales'] = df[['Jan', 'Feb', 'Mar']].mean(axis=1)
pivot_df = df.pivot(index='Region', columns='Product', values='AverageSales')

And finally, let’s plot the data using a heatmap:

plt.figure(figsize=(8, 6))
sns.heatmap(pivot_df, cmap='YlGnBu')
plt.title('Average Sales by Region and Product')
plt.show()
Basic Seaborn heatmap with no annotations

Explanation:

  • We create a DataFrame that includes the region, product, and monthly sales data.
  • We calculate an AverageSales column by averaging the sales from January, February, and March.
  • We call df.pivot() to reshape the data into pivot_df: each row is a region, each column is a product, and the cell values are the average sales for those combinations.
  • We call sns.heatmap() on pivot_df with a chosen color map (e.g., 'YlGnBu').
  • Finally, we display the heatmap with plt.show().

This basic heatmap is a great starting point. Each cell’s color corresponds to the magnitude of the average sales, providing an immediate visual for comparing T-Shirt, Hoodie, and Jacket sales across the North, South, and East regions.

Advanced heatmap customization

While the default Seaborn heatmap is serviceable, you often need to customize it for clarity and branding or to highlight specific information. Here are a few popular parameters and techniques:

  1. Annotations: Displaying numerical values inside each cell can be helpful, especially when precise values are important.
  2. Color bars: By default, Seaborn will display a color bar on the side, but you can remove it or position it differently if needed.
  3. Custom color maps: Seaborn supports a range of color maps, and you can also create your own.
  4. Axis label rotation: If you have longer category names, you might want to rotate them for better readability.
  5. Value normalization: If your data spans vastly different scales, normalizing or scaling your data might lead to a more meaningful color range.

Let’s illustrate some of these customizations:

plt.figure(figsize=(8, 6))
sns.heatmap(
    pivot_df, 
    cmap='coolwarm',         # A different color map
    annot=True,              # Show numbers in each cell
    fmt=".1f",               # Format the annotation numbers
    linewidths=0.5,          # Lines between cells
    linecolor='black',       # Color of the cell lines
    cbar_kws={'shrink': 0.8} # Shrink the color bar a bit
)

plt.title('Average Sales by Region and Product - Customized')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for clarity
plt.yticks(rotation=0)               # Keep y-axis labels horizontal
plt.tight_layout()
plt.show()

In the example above:

  • annot=True: Adds the actual numerical values in each cell.
  • fmt=".1f": Formats these numbers to one decimal place.
  • linewidths and linecolor: Adds thin lines between cells, improving readability.
  • cbar_kws={'shrink': 0.8}: Adjusts the size of the color bar.
  • plt.xticks() and plt.yticks(): Helps ensure that labels are readable.
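
The color bar options mentioned earlier can be sketched in the same way. The matrix below hard-codes the rounded averages from the Superdope example so the snippet runs on its own; the label text is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Rounded average sales from the Superdope example, laid out as a region x product matrix
pivot_df = pd.DataFrame(
    [[55.0, 22.3, 65.0],
     [47.7, 31.0, 110.0],
     [65.0, 30.0, 80.0]],
    index=['East', 'North', 'South'],
    columns=['Hoodie', 'Jacket', 'T-Shirt'],
)

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(12, 5))

# Left: drop the color bar and let the annotations carry the values
sns.heatmap(pivot_df, cmap='YlGnBu', annot=True, fmt='.1f', cbar=False, ax=ax_left)
ax_left.set_title('No color bar')

# Right: keep the color bar, but make it horizontal and give it a label
sns.heatmap(
    pivot_df,
    cmap='YlGnBu',
    cbar_kws={'orientation': 'horizontal', 'label': 'Average sales'},
    ax=ax_right,
)
ax_right.set_title('Horizontal, labeled color bar')

fig.tight_layout()
```

The entries in cbar_kws are forwarded to Matplotlib's colorbar call, so other colorbar keyword arguments should work here too.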

Advanced color maps or sns.color_palette can also be used if you need a specific gradient or brand color scheme. For example, if Superdope has brand guidelines, you might want to use a palette that matches the company's colors.
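
As a rough sketch of the brand-palette idea, one option is to build a colormap from a short list of hex colors with Matplotlib and pass it to sns.heatmap. The hex values and the colormap name below are hypothetical stand-ins, not actual Superdope brand colors.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap, to_rgba

# Hypothetical brand colors, ordered from low values to high values
brand_colors = ['#FFF4E6', '#FF8C42', '#5F0F40']

# Interpolate smoothly between the listed colors
brand_cmap = LinearSegmentedColormap.from_list('superdope_brand', brand_colors)

# Rounded average sales from the Superdope example
pivot_df = pd.DataFrame(
    [[55.0, 22.3, 65.0],
     [47.7, 31.0, 110.0],
     [65.0, 30.0, 80.0]],
    index=['East', 'North', 'South'],
    columns=['Hoodie', 'Jacket', 'T-Shirt'],
)

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(pivot_df, cmap=brand_cmap, annot=True, fmt='.1f', ax=ax)
ax.set_title('Average Sales by Region and Product (custom palette)')
fig.tight_layout()
```

When choosing brand colors, a palette that runs from light to dark usually keeps magnitude readable.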

Seaborn heatmap with YlGnBu colormap

Alternatives to Seaborn heatmaps

While Seaborn’s heatmap is widely used, there are a few close “cousins” in the Python ecosystem:

  1. Matplotlib: Since Seaborn is built on top of Matplotlib, you can create heatmaps directly with Matplotlib’s imshow or matshow functions. However, you’ll often need more code to achieve the same level of style that Seaborn provides out of the box.
  2. Plotly: Plotly offers interactive heatmaps, which can be zoomed, hovered over, and even integrated into web apps easily. For dynamic dashboards, Plotly is a strong choice.
  3. Altair: Altair is a declarative statistical visualization library. While it’s more commonly used for bar charts, line charts, and scatter plots, it can also generate heatmaps if you prefer a highly declarative style.
  4. Bokeh: Like Plotly, Bokeh is great for interactive visualizations. If you need a server-based or web-based app, Bokeh’s interactive heatmaps can provide an engaging user experience.
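
To give a feel for the extra code the Matplotlib route involves, here is a sketch of an annotated heatmap built with imshow alone. It hard-codes the rounded Superdope averages so it is self-contained; tick labels, color bar, and annotations all have to be added by hand.

```python
import numpy as np
import matplotlib.pyplot as plt

regions = ['East', 'North', 'South']
products = ['Hoodie', 'Jacket', 'T-Shirt']

# Rounded average sales, rows = regions, columns = products
avg_sales = np.array([
    [55.0, 22.3, 65.0],
    [47.7, 31.0, 110.0],
    [65.0, 30.0, 80.0],
])

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(avg_sales, cmap='YlGnBu')

# Tick positions and labels are set manually
ax.set_xticks(range(len(products)))
ax.set_xticklabels(products)
ax.set_yticks(range(len(regions)))
ax.set_yticklabels(regions)

# The color bar is a separate call
fig.colorbar(im, ax=ax, label='Average sales')

# Annotations are written cell by cell (x is the column, y is the row)
for i in range(len(regions)):
    for j in range(len(products)):
        ax.text(j, i, f'{avg_sales[i, j]:.1f}', ha='center', va='center')

ax.set_title('Average Sales by Region and Product (Matplotlib only)')
fig.tight_layout()
```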

Note on personal preferences: I like to work with Plotly for two reasons. The first is that it renders dynamically, so when you’re embedding your chart in a report, it naturally scales up and down as the browser window size changes. This makes for a much better builder and viewer experience, especially when using a platform like Fabi.ai to build reports. The other reason is that it’s interactive. Although this interactivity isn’t necessarily the most valuable for heatmaps, it does make charts more engaging and more useful for the viewer of the report.

In most cases, sticking with Seaborn is straightforward, especially for quick correlation matrices or aggregated summary tables. But if you need more interactive features or specialized styling, these other libraries may be worth a look.

Preparing your data for a heatmap

Before jumping into plotting a heatmap, it’s crucial to ensure your data is in a suitable format and scale:

  1. Identify your axes: Decide what rows and columns will represent in your final matrix. This might be different product categories, time periods, or geographic regions.
  2. Aggregation or summary statistics: Often, raw data is too granular. You might need to compute a summary statistic like mean, sum, or count. In our Superdope example, we used average sales per region-product combination.
  3. Pivot or reshape your data:
    • If you’re dealing with correlation matrices, you can simply call df.corr() on a DataFrame of numeric columns.
    • If you need to create a custom matrix, pivot() or pivot_table() from Pandas will help structure your data so that each row-column combination is a single numeric value.
  4. Handling missing values: Missing data can distort your heatmap. Decide whether to remove rows/columns with missing values or use an imputation strategy (filling them with zeros, means, etc.).
  5. Scaling: If certain columns have values that range vastly higher than others, you might want to standardize or normalize your data. This ensures one very large column doesn’t overshadow smaller columns in the color map.
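
The missing-value and scaling steps above can be sketched with Pandas alone. The records are hypothetical, with the South/Jacket combination left out on purpose, and min-max scaling per column is just one of several reasonable choices.

```python
import pandas as pd

# Hypothetical long-format records; South/Jacket is deliberately absent
records = pd.DataFrame({
    'Region':  ['North', 'North', 'North', 'South', 'South', 'East', 'East', 'East'],
    'Product': ['T-Shirt', 'Hoodie', 'Jacket', 'T-Shirt', 'Hoodie', 'T-Shirt', 'Hoodie', 'Jacket'],
    'Units':   [110, 48, 31, 80, 65, 65, 55, 22],
})

# Reshape into a region x product matrix; the absent combination becomes NaN
matrix = records.pivot_table(index='Region', columns='Product', values='Units', aggfunc='sum')
n_missing = int(matrix.isna().sum().sum())
print(f'Missing cells: {n_missing}')

# One option: treat a missing combination as zero units sold
filled = matrix.fillna(0)

# Min-max scale each column to [0, 1] so every product is compared on its own range
scaled = (filled - filled.min()) / (filled.max() - filled.min())
print(scaled.round(2))
```

Whether zero is the right fill value depends on what a missing cell means in your data. If it means "unknown" rather than "no sales", leaving the NaN in place may be more honest, since Seaborn leaves those cells blank instead of coloring them.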

Example of data preparation with pivot tables

Below is an extended example using pivot_table. Suppose we have daily sales data across multiple months, and we want to see total sales for each region and month:

import numpy as np

# Extended daily data
data_extended = {
    'Date': pd.date_range('2023-01-01', periods=90, freq='D'), 
    'Region': np.random.choice(['North', 'South', 'East'], 90),
    'Product': np.random.choice(['T-Shirt', 'Hoodie', 'Jacket'], 90),
    'Sales': np.random.randint(10, 200, 90)
}

df_extended = pd.DataFrame(data_extended)

# Add a 'Month' column
df_extended['Month'] = df_extended['Date'].dt.month_name()

# Create a pivot table with the sum of sales for each Region (rows) and Month (columns)
pivot_extended = pd.pivot_table(
    df_extended, 
    values='Sales', 
    index='Region', 
    columns='Month', 
    aggfunc='sum'
).fillna(0)

print(pivot_extended)

In this snippet:

  • We generate daily data for 90 days, ensuring we cover multiple months.
  • np.random.choice and np.random.randint create random data for demonstration.
  • We extract the month name from each date to use as columns in our pivot table.
  • pivot_table creates a summary of total sales for each region-month combination.
  • Finally, we fill missing values with zeros (in case some region has no sales in a particular month).

This pivot table can then be directly passed into sns.heatmap(pivot_extended) for visualization.
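
One detail worth checking before plotting: pivot_table sorts column labels alphabetically, so month names come out as February, January, March. The sketch below rebuilds a similar table (using a seeded NumPy generator, which is an addition for reproducibility) and reorders the columns into calendar order before plotting.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Seeded random data so the sketch gives the same table on every run
rng = np.random.default_rng(0)
df_extended = pd.DataFrame({
    'Date': pd.date_range('2023-01-01', periods=90, freq='D'),
    'Region': rng.choice(['North', 'South', 'East'], size=90),
    'Sales': rng.integers(10, 200, size=90),
})
df_extended['Month'] = df_extended['Date'].dt.month_name()

# Total sales for each Region (rows) and Month (columns)
pivot_extended = pd.pivot_table(
    df_extended,
    values='Sales',
    index='Region',
    columns='Month',
    aggfunc='sum',
).fillna(0)

# Put the month columns in calendar order instead of alphabetical order
month_order = ['January', 'February', 'March']
pivot_extended = pivot_extended[month_order]

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(pivot_extended, cmap='YlGnBu', annot=True, fmt='.0f', ax=ax)
ax.set_title('Total Sales by Region and Month')
fig.tight_layout()
```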

Alternatives to heatmaps

While heatmaps are incredibly useful, they’re not always the best choice. Sometimes a different chart type or approach might better communicate your data’s story. Here are some alternatives and why you might consider them:

  1. Bar charts: If you only have a few categories (e.g., product types) to compare, a grouped or stacked bar chart could be clearer. Bar charts are excellent for highlighting differences in discrete categories.
  2. Line charts: If your data is heavily time-based (e.g., daily or monthly sales trends), line charts might do a better job of showing how values change over time. You can still color-code lines to represent different regions or product categories.
  3. Scatter plots: When you’re exploring relationships between two variables (and possibly categorizing by a third), scatter plots with color or size encoding might be more informative than a heatmap.
  4. Box plots or violin plots: If you want to see the distribution of sales, profit, or any other metric (beyond just an average or sum), box plots or violin plots can show you the data’s median, quartiles, and outliers.
  5. Radar or spider charts: For multi-dimensional categorical comparisons (e.g., comparing multiple metrics across a few categories), radar charts can offer an alternative visual format.

Choosing the right chart type often hinges on the question you’re trying to answer and the nature of your data. Heatmaps excel at displaying aggregated data in a matrix format, particularly correlations or multi-dimensional relationships. If your primary need is to see detailed distributions, identify outliers, or track changes over time, other chart types may be more appropriate.
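
For comparison, here is a sketch of the first alternative: the same region-by-product matrix drawn as a grouped bar chart with Pandas' built-in plotting. The values are the rounded Superdope averages, hard-coded so the snippet stands alone.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rounded average sales from the Superdope example
pivot_df = pd.DataFrame(
    [[55.0, 22.3, 65.0],
     [47.7, 31.0, 110.0],
     [65.0, 30.0, 80.0]],
    index=['East', 'North', 'South'],
    columns=['Hoodie', 'Jacket', 'T-Shirt'],
)

# One group of bars per region, one bar per product within each group
ax = pivot_df.plot(kind='bar', figsize=(8, 6), rot=0)
ax.set_xlabel('Region')
ax.set_ylabel('Average sales')
ax.set_title('Average Sales by Region and Product')
ax.legend(title='Product')
plt.tight_layout()
```

With only nine values, the bar chart makes exact differences easier to read. The heatmap starts to pay off as the number of rows and columns grows.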

Putting it all together

Heatmaps are a powerful tool in a data scientist’s or analyst’s arsenal. Seaborn makes them especially easy to implement, even if you’re just starting out. Here’s a quick summary of how you might integrate heatmaps into your workflow for Superdope:

  1. Data collection: Gather sales data from multiple regions, product types, or time periods.
  2. Data preparation: Clean up missing values, ensure columns are labeled properly, and create relevant summary statistics or correlation matrices.
  3. Pivot or reshape: Use Pandas to create a matrix where rows and columns represent relevant categories or time periods, and the cell values represent the numeric data you’re interested in visualizing (e.g., total or average sales).
  4. Initial heatmap: Use sns.heatmap to get a quick overview. Check if the color scaling makes sense and if any data stands out.
  5. Customization: Add annotations, customize the color palette, rotate labels, and resize the figure. This step helps ensure clarity and makes the chart look professional.
  6. Interpretation: Look for patterns—do certain products sell better in certain months? Are there any surprisingly low or high sales figures? Use these insights to drive business decisions, marketing strategies, or inventory planning.
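
The interpretation step can also be backed by a few lines of Pandas, so that what the colors suggest is confirmed by the numbers. This sketch reuses the Superdope sample data from earlier; the variable names beyond df and pivot_df are illustrative.

```python
import pandas as pd

# Sample data for Superdope (same figures as the basic example)
df = pd.DataFrame({
    'Region': ['North', 'North', 'North', 'South', 'South', 'South', 'East', 'East', 'East'],
    'Product': ['T-Shirt', 'Hoodie', 'Jacket', 'T-Shirt', 'Hoodie', 'Jacket', 'T-Shirt', 'Hoodie', 'Jacket'],
    'Jan': [100, 50, 30, 80, 70, 25, 60, 55, 20],
    'Feb': [110, 45, 35, 85, 60, 30, 65, 58, 25],
    'Mar': [120, 48, 28, 75, 65, 35, 70, 52, 22],
})
df['AverageSales'] = df[['Jan', 'Feb', 'Mar']].mean(axis=1)
pivot_df = df.pivot(index='Region', columns='Product', values='AverageSales')

# Flatten the matrix into a Series indexed by (Product, Region)
cells = pivot_df.unstack()

# Strongest and weakest region-product combinations
best_product, best_region = cells.idxmax()
worst_product, worst_region = cells.idxmin()
print(f'Highest: {best_product} in {best_region} ({cells.max():.1f})')
print(f'Lowest: {worst_product} in {worst_region} ({cells.min():.1f})')

# Best-selling product within each region
best_product_by_region = pivot_df.idxmax(axis=1)
print(best_product_by_region)
```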

Conclusion

Seaborn heatmaps provide a straightforward, visually compelling way to explore and present data. Whether you’re investigating correlations between numeric variables, comparing aggregated sales across product categories and regions, or analyzing time-series data, a heatmap can offer immediate, at-a-glance insights. By leveraging the power of Seaborn’s simple API and Pandas’ data manipulation capabilities, you can quickly create heatmaps tailored to your specific datasets and questions.

We explored basic heatmap creation, discussed advanced customization options, and briefly touched on potential alternatives—both to Seaborn (like Plotly or Matplotlib) and to heatmaps themselves (like bar charts or scatter plots). The key is to choose the visualization that best represents your data and addresses your analytical questions.

For a company like Superdope, analyzing how product sales vary by region or month can make a substantial difference in how you allocate resources, run promotions, or adjust product lines. In your own projects, consider how a heatmap might highlight hidden patterns or relationships that aren’t readily apparent in raw data or traditional tables. With Seaborn’s versatility, getting started is as simple as importing the library and calling a few lines of code—so give it a try and see how your data storytelling transforms.
