4 practical ways to use SQL and Python together for modern data analysis
TL;DR: To quickly create a Sankey diagram, use pen and paper to map out your “nodes” and “links”, add those to a spreadsheet with “To”, “From” and “Amount” columns and ask AI (OpenAI, Anthropic or Fabi.ai) to create the diagram for you.
Sankey diagrams are a powerful data visualization for telling a compelling and engaging story about your data. These flow diagrams use edges to visualize the flow of data between different stages, showing you at a glance how resources, energy, or costs move through a system. When used correctly, Sankey diagrams can provide deep insights that other charts and graphs might miss. They’re a powerful data visualization tool, whether you’re simply doing some data exploration or putting together a report or presentation.
Sankey diagrams are quite complex and don’t come readily available in basic business intelligence (BI) or spreadsheet solutions. Creating a Sankey diagram is best done with Python, and AI, in a notebook-like or Python data app environment.
In this post, we’ll show you how to quickly create your own Sankey diagram, with little to no coding! If you prefer to follow along with a video, check out our short tutorial:
What is a Sankey Diagram?
A Sankey diagram is a specific type of flow diagram that represents the flow of data, resources, or material between entities. The width of each arrow (also called a “link”) is proportional to the flow rate, making it easy to spot the dominant contributors in your dataset.
These diagrams are commonly used to visualize energy usage, material flows, financial data, or any process where it's crucial to understand how resources are distributed. In business contexts, Sankey diagrams are invaluable for tracking expenses, customer journeys, and more.
How to Create a Sankey Diagram using Python and AI
Step 1: Map out your “To” and “From” amounts
If you’re simply doing some data exploration or building a dashboards with a few nodes and links, your best bet to getting started is to map out the “To” and “From” amount with pen and paper. Ultimately we’re trying to build a spreadsheet that has three columns: “To”, “From” and “Amount”. Doing this programmatically or in your head can be challenging and it’s easy to get mixed up.
Here’s a very simple illustration of how you might sketch this out.
If you’re dealing with lots of nodes and links, this approach isn’t sustainable. Depending on the format of your data, you may need to write a short SQL or Python script to do this automatically. AI should be great at handling this task as well.
Step 2: Prepare Your Data in a spreadsheet
Once you’ve completed the step above, you should have your data stored in three simple columns. To make this data available to the AI, save that as a CSV or Excel file.
Note that most AI solutions only work with CSV, but if you’re doing this in Fabi.ai, Excel works as well.
In our example we’re showing the flow of product and services revenue to see which products contribute the most to our revenue. The first rows of our file will look like this:
Step 3: Generate the Chart with Python and AI
Now that your data is ready, it's time to generate the Sankey diagram. We’ll be using Fabi.ai to do the heavy lifting, so no need to worry about writing code. You can also do this in other AI solutions that support data visualization, such as OpenAI or Anthropic. However, these AI solutions don’t provide an effective way to share your data visualization as a dashboard or report with your coworkers, so you would likely simply have to take a screenshot.
Whichever solution you choose, follow these steps:
- Upload your data: Simply upload your CSV or Excel file to Fabi.ai.
- Ask the AI to create a Sankey diagram: Type in a simple command like, "Generate a Sankey diagram," and the AI will handle the rest.
- Add to Smartbook: Once generated, you can easily add the chart to your Smartbook for future analysis or reporting.
And that's it! Fabi.ai’s AI and Python integration makes this process incredibly simple and fast, so you can focus on analyzing your data instead of writing complex code.
And of course, if you want to publish your Sankey diagram as an interactive chart for your peers, you can quickly publish it directly from the Fabi.ai Smartbook.
Bonus: For the DIYers
If you're comfortable with Python and want to generate a Sankey diagram on your own, here’s how to do that step-by-step.
First, import your CSV file as a pandas DataFrame:
import pandas as pd
import plotly.graph_objects as go
# Load DataFrame from a CSV file
file_path = "sankey_data.csv" # Replace with your file path
df = pd.read_csv(file_path)
# Check if the necessary columns exist
required_columns = {"From", "To", "Amount"}
if not required_columns.issubset(df.columns):
raise ValueError(f"The file must contain the following columns: {required_columns}")
Prep the data to format it for the Plotly Sankey API:
# Prepare data for Sankey diagram
# Create a list of unique labels from 'From' and 'To'
labels = list(pd.concat([df["From"], df["To"]]).unique())
# Map labels to indices
label_to_index = {label: idx for idx, label in enumerate(labels)}
# Create source and target lists based on the DataFrame
sources = df["From"].map(label_to_index).tolist()
targets = df["To"].map(label_to_index).tolist()
values = df["Amount"].tolist()
Generate the Sankey diagram using Plotly:
# Create Sankey diagram
fig = go.Figure(go.Sankey(
node=dict(
pad=15, # Space between nodes
thickness=20, # Node thickness
line=dict(color="black", width=0.5),
label=labels # Node labels
),
link=dict(
source=sources, # Source indices
target=targets, # Target indices
value=values # Values (amounts)
)
))
fig.update_layout(title_text="Sankey Diagram", font_size=10)
fig.show()
Creating your first Sankey diagram made easy
Sankey diagrams are a great way to communicate the flow of resources or data within your organization whether you’re simply doing some exploratory data analysis or building a report or presentation. To generate your first chart, sketch out the diagram on pen and paper, save it in a spreadsheet and use AI and Python to generate the chart for you.
At Fabi.ai, our mission is to make data visualization, ad hoc analysis and exploratory data analysis incredibly simple for data practitioners of all levels. With an all-in-one platform that combines SQL, Python, AI and no-code, building these sorts of reports is simple and collaborative! You can get started for free in less than 2 minutes.