TL;DR: When sharing a notebook, consider whether your peers are technical and want to see the code or just want to see the report, and whether you want to offer a static or an interactive report. If you simply want to share the entire notebook, code included, cloud-hosted notebook solutions like Google Colab, Snowflake notebooks or Databricks notebooks have that functionality built in. If you’re using local Jupyter notebooks, you can share your notebook via GitHub, NBViewer, Voila, JupyterHub or Binder, depending on your requirements. If you want to share an interactive report that doesn’t necessarily follow the linear order of a notebook, you will need a Python data app library such as Streamlit or Plotly Dash.
Jupyter notebooks (and close cousins such as Google Colab, Snowflake notebooks or Databricks notebooks) are incredibly powerful tools for ad hoc and exploratory data analysis. Data analysis is rarely linear and typically builds up to a conclusion step by step, which is exactly what the cell structure of notebooks is suited for. This layout can largely be credited for their success and adoption since their release.
However, anyone who has used a Jupyter notebook to reach an interesting conclusion has invariably asked themselves: How can I share this with my coworkers?
Notebooks embedded in cloud data platforms have some basic sharing functionality built in, but any notebook developed locally takes some work to share with your peers. In this article we explore the different ways to share a data analysis done in a notebook, along with the pros, cons and considerations of each. Depending on how technically savvy the stakeholders you’re sharing your analysis with are, you may also want to consider using a Python data app.
And if all of this feels too overwhelming, we’ve provided a bite-sized recap at the end with a decision tree to help guide you.
Why share a Jupyter or Google Colab notebook?
As a data analyst, data scientist or really any other data practitioner, your analysis is only as good as the impact that it has on the business. And in order to have an impact, you need to be able to share and communicate your findings with your peers and stakeholders.
Perhaps you’re a data scientist predicting energy usage on the grid and you need to share your forecast with your engineers in the field or your clients. Or perhaps you’re a data analyst who was asked by the marketing team to determine the leading indicators of order returns, and you need to present your findings in a team meeting. In either situation, sharing and distributing your insights is as important as the process of uncovering the insight.
Making notebooks interactive and non-linear
We’re going to dive into specific solutions for sharing your notebooks. However, at the very outset, there are two important questions to ask yourself:
- Does my audience need to interact with the data? Interactive reports can reduce back-and-forth follow-ups and empower stakeholders to explore the data themselves.
- Should my report follow the linear order of my notebook? Traditional notebooks are structured linearly, but stakeholders often benefit from seeing key conclusions upfront with supporting details later.
The right solution for sharing your data notebook depends on whether you need a static, read-only report or an interactive report or data app that your stakeholders can engage with. If you’re unsure and you’re doing your analysis in a business context, there’s a good chance that you will want your reports to be interactive and non-linear to some degree. After producing an analysis, you will very likely get follow-up requests, and letting your stakeholders explore the data and the report themselves can cut down on that back-and-forth. This reduces your workload while also increasing your stakeholders’ confidence in the data and the analysis.
Interactive reports do come with some added complexity, which we touch on in the following section.
5 ways to share Jupyter or Google Colab notebooks
1. Google Colab and other hosted notebooks
We can’t really talk about sharing Jupyter notebooks and making them collaborative without talking about Google Colab (no pun intended). Google Colab is a cloud-hosted notebook solution that was designed specifically for collaboration. There is a free tier to get started, and unless you’re specifically attached to Jupyter notebooks, Colab may be the quickest way for you to start sharing your data analysis.
Certain features do come at a cost, but you can get relatively far for free. The main advantage of using Colab is that it’s already hosted, so you can easily share a link with your coworkers and you can create interactive reports. It also comes with the benefit of access control. As with Google Sheets or Google Docs, you can pick and choose specifically who in your organization can access certain notebooks and what permissions they have on those notebooks.
Some cloud data platforms, such as Snowflake and Databricks, also offer their own notebooks, which mirror much of the Google Colab functionality. However, in all cases, these notebooks follow a linear format: your notebook is shared in the same top-to-bottom order as your analysis. This may not be ideal for communicating your insights. You may, for example, want to include a summary at the top with the concluding chart and then your supporting data below it. We touch on alternative options when we discuss Python data apps further down.
2. GitHub
Let’s get back to traditional Jupyter notebooks. The most basic form of sharing is using GitHub. If you commit your notebook to GitHub, it will automatically detect the format and provide a “Preview” option. This preview effectively gives you a read-only version of your notebook. As with any other file on GitHub, you can share this preview with your coworkers. Of course, they need access to the GitHub repository, which inherently limits this option to a more technical audience.
In other words, your coworkers get a static, non-interactive version of your notebook.
3. NBViewer
Think of NBViewer as the GitHub notebook “Preview” functionality we touched on above, just a bit more robust. NBViewer still only provides a static, read-only view of a notebook, code included, but in theory it is better suited to larger notebooks.
If you’re going to use the web-hosted version of NBViewer, your GitHub repo has to be public, which is generally not an option in a company setting. If you need to keep your code and analysis private, you will need to self-host NBViewer. That said, now that GitHub previews notebooks natively, NBViewer has lost some of its relevance. It’s also worth noting that NBViewer’s caching system can make it hard to iterate quickly. By default it refreshes every 10 minutes, and you can invalidate the cache with a URL hack, but this can be a bit temperamental.
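If your repository is public, turning a GitHub notebook link into an NBViewer link is mostly a matter of rewriting the URL. The sketch below assumes NBViewer’s `https://nbviewer.org/github/<user>/<repo>/blob/<ref>/<path>` URL scheme. It also assumes that appending `?flush_cache=true` is the cache-busting “URL hack” mentioned above. The helper name `github_to_nbviewer` and the example repository are hypothetical.

```python
from urllib.parse import urlparse


def github_to_nbviewer(github_url: str, flush_cache: bool = False) -> str:
    """Rewrite a GitHub notebook URL into the matching NBViewer URL.

    Hypothetical helper. Expects URLs shaped like
    https://github.com/<user>/<repo>/blob/<ref>/<path>.ipynb
    """
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com":
        raise ValueError("expected a github.com URL")

    # parts == [user, repo, "blob", ref, *path_segments]
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob" or not parts[-1].endswith(".ipynb"):
        raise ValueError("expected .../<user>/<repo>/blob/<ref>/<path>.ipynb")

    url = "https://nbviewer.org/github/" + "/".join(parts)
    if flush_cache:
        # Assumption: NBViewer treats this query parameter as a request to bypass its cache
        url += "?flush_cache=true"
    return url
```

For example, `github_to_nbviewer("https://github.com/example-org/analysis/blob/main/notebooks/returns.ipynb")` would produce `https://nbviewer.org/github/example-org/analysis/blob/main/notebooks/returns.ipynb`. The web-hosted NBViewer can only fetch that link if the repository is public.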
4. Voila
Voila is an open source Python framework designed specifically to share notebooks as interactive, code-free reports with your stakeholders. You may want to think of Voila as closer to a Python data app framework than a Jupyter extension. To support interactivity, Voila relies on ipywidgets, which can be combined with plotting libraries such as Plotly and Altair.
Building these reactive reports does take more work and may require you to learn how to use callback functions properly. For example, the code to create a simple chart with a dropdown looks like this:
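```python
import ipywidgets as widgets
import pandas as pd
import plotly.graph_objects as go
from IPython.display import display
from ipywidgets import VBox

# Placeholder data for illustration; replace with your own DataFrame
df = pd.DataFrame({
    'Year': [2020, 2021, 2022, 2023],
    'Sales': [100, 120, 150, 170],
    'Profit': [20, 25, 35, 40],
})

# A FigureWidget (unlike a plain go.Figure) can be placed inside an
# ipywidgets layout such as VBox and updated in place
fig = go.FigureWidget()

# Function to update the plot
def update_plot(column):
    # batch_update sends all property changes to the front end at once
    with fig.batch_update():
        fig.data[0].y = df[column]
        fig.data[0].name = column
        fig.layout.yaxis.title = column
        fig.layout.title = f"{column} Over Years"

# Set up the initial trace
fig.add_trace(go.Scatter(
    x=df['Year'],
    y=df['Sales'],
    mode='lines+markers',
    name='Sales'
))
fig.update_layout(
    title="Sales Over Years",
    xaxis_title="Year",
    yaxis_title="Sales",
    template="plotly_white"
)

# Dropdown widget
dropdown = widgets.Dropdown(
    options=['Sales', 'Profit'],
    value='Sales',
    description='Metric:',
)

# Attach a callback to the dropdown
def dropdown_change(change):
    # change['new'] holds the newly selected option
    update_plot(change['new'])

dropdown.observe(dropdown_change, names='value')

# Display the dropdown and the plot
display(VBox([dropdown, fig]))
```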
The “widgets” module creates the dropdown in the user interface. The “observe” method listens for changes to the dropdown’s value. The dropdown_change callback then passes the new value to update_plot, which redraws the chart.
5. JupyterHub & Binder
Finally, we can’t talk about sharing Jupyter notebooks without talking about JupyterHub and Binder. Let’s start with JupyterHub, which was designed specifically to serve Jupyter notebooks to multiple users. You can host a JupyterHub service locally or deploy it to the cloud and share URLs of your notebooks with your coworkers. Because you’re running a service, you can create interactive reports and dashboards the same way you can with Voila (discussed above). The key difference is that Voila hides the code from your end users, while JupyterHub spins up an individual Jupyter server for each user, in which they can see and run the code.
JupyterHub can be quite powerful if you’re trying to share notebooks with technical stakeholders, but it can be challenging to set up and maintain, and doing so requires significant technical expertise and cloud infrastructure know-how.
Binder is different from JupyterHub, but uses it behind the scenes. The key difference is that Binder uses Docker to build the environment for you, which makes it easy to share a notebook without worrying about dependency management or environment setup.
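The public Binder service, mybinder.org, builds the environment from files in your GitHub repository, such as a `requirements.txt`. Because of that, sharing usually comes down to sending a launch link. The sketch below assumes Binder’s `https://mybinder.org/v2/gh/<user>/<repo>/<ref>` URL scheme. It uses `labpath` to open a notebook in JupyterLab, or `urlpath=voila/render/...` to render it with Voila; the Voila option only works if Voila is listed in the repository’s dependencies. The helper `binder_url` and the example repository are hypothetical. Keep in mind that mybinder.org can only build public repositories.

```python
from typing import Optional
from urllib.parse import quote


def binder_url(user: str, repo: str, ref: str = "main",
               notebook: Optional[str] = None, voila: bool = False) -> str:
    """Build a mybinder.org launch link for a public GitHub repo (hypothetical helper)."""
    base = f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}"
    if notebook is None:
        # No notebook given: Binder opens its default interface
        return base
    if voila:
        # Assumes voila is installed in the repo's environment; renders the
        # notebook as a code-free report instead of opening it in JupyterLab
        return f"{base}?urlpath={quote('voila/render/' + notebook, safe='')}"
    # Open the notebook directly in JupyterLab
    return f"{base}?labpath={quote(notebook, safe='')}"
```

For example, `binder_url("example-org", "analysis", notebook="report.ipynb", voila=True)` yields a link that builds the repository’s environment and then shows the notebook as a Voila report. That combination gives you an interactive, code-free report without running your own server.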
Alternatives to notebooks
After seeing your options for sharing data science notebooks, you may be wondering what else is available. Specifically, if you are looking for a way to share your work done in Python with non-technical stakeholders in a visually appealing and interactive way, Python data apps are your answer. Python data apps are frameworks that let you do all your work in Python and build a user interface without any front-end coding. The simplest and most popular one to get started with is Streamlit, but you also have Gradio, Taipy or, for more advanced apps, Plotly Dash.
To get a sense of what you can build with a Python data app, check out Streamlit’s GitHub repository.
The challenge with Python data apps is that you do need to spin up a service and host them yourself. There are some solutions like Streamlit Community Cloud, but these are meant to share data apps openly with the community, not privately and securely in the enterprise. We have a step-by-step guide on how to host a Streamlit app on an AWS EC2 instance which should give you a sense of what it would take to deploy and share an app.
Note: Fabi.ai was built to make data analysis and the distribution of those insights incredibly simple, secure and scalable. We offer the quickest and easiest way to get started, while also offering some of the familiar Jupyter notebook user interface. You can try Fabi.ai out for free and get started in 2 minutes.
Recap of your options
Do all these different options make you feel dizzy and overwhelmed? Fret not: here’s a recap of your options and a simple framework to help you decide.
To summarize, when it comes time to share your data analysis from a notebook, ask yourself the following questions:
- Does my report need to be interactive?
- Should my report follow the linear order of my notebook or will my analysis be more impactful if I can configure the layout and narrative?
- Are my stakeholders technical or should I make sure the code is mostly hidden?
If your report is static and can just follow the same order as your notebook, GitHub and NBViewer should work well for your use case. If you need your report to be interactive and the order of your report is linear, JupyterHub, Binder or Voila may be better options. If you want to build fully customizable reports and dashboards in Python, you should consider Python data apps.
Here’s a decision tree to help you navigate these questions:
[Figure: decision tree for choosing how to share a notebook]
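The recap above can also be expressed as a small Python function. This is a minimal sketch of one reading of the three questions and the guidance that follows them, not a reproduction of the decision tree figure. The function name `suggest_sharing_options` and its parameters are hypothetical.

```python
from typing import List


def suggest_sharing_options(interactive: bool, linear: bool, hide_code: bool) -> List[str]:
    """Map the recap questions to the options discussed in this article (hypothetical helper)."""
    if not linear:
        # A custom layout and narrative calls for a Python data app
        return ["Streamlit", "Gradio", "Taipy", "Plotly Dash"]
    if not interactive:
        # Static, linear report; both options display the code alongside outputs
        return ["GitHub", "NBViewer"]
    if hide_code:
        # Voila renders the notebook as a code-free interactive report
        return ["Voila"]
    # Interactive and linear, for an audience comfortable seeing the code
    return ["JupyterHub", "Binder", "Voila"]
```

For example, `suggest_sharing_options(interactive=True, linear=True, hide_code=True)` returns `["Voila"]`. A static report that follows the notebook’s order returns `["GitHub", "NBViewer"]`.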
A very important consideration if you choose to go the self-hosted route: as with any self-managed and self-hosted application, make sure you’ve carefully thought through security, privacy and scalability. The minute you expose a URL to the internet, you’re putting yourself at risk.
Fabi.ai is a fully hosted, enterprise-grade Python data application platform that makes it incredibly simple to build and share insights in minutes with the power of AI. You can get started for free in less than 5 minutes.