A new frontier: Data exploration in the age of AI

TL;DR: Modern AI tools are revolutionizing data exploration, turning what used to be hours or days of analysis into minutes of interactive questioning and visualization. While the fundamentals of data analytics remain important, new AI-powered platforms are making data exploration more accessible to non-technical users while supercharging the productivity of data experts through features like code generation and automated visualization.

When I started out as a data scientist working with data teams across various organizations, one of the very first things I learned was data exploration. Almost as a mantra, I was constantly told: understand the question, understand the context, understand the data. Before diving into analytics, you must understand your data; it's a critical first step in any data science project. Yet the way I approach this exploration work has completely changed.

I remember when exploratory data analysis would take hours, sometimes even days. I would pare down the raw data so it could fit in Excel and slice and dice it from there, or I would fire up a Jupyter notebook and use Python to dig into the structured data. And if I wanted to share the visualizations from my exploration, I was out of luck beyond screenshots. Nowadays, teams can explore data in a matter of minutes, in fully collaborative environments. These changes are boosting productivity for data analysts and experts, bringing data analytics to the masses, and fostering better collaboration across the enterprise.

In this deep-dive article, we'll break down how new data exploration techniques can increase efficiency and bring valuable insights to the enterprise. We'll also talk about where we see AI evolving and how to leverage these tools to tackle advanced data visualization.

The basics still matter (but they're getting supercharged)

Before we dive immediately into new data exploration tools and AI, it's worth noting that the basics of data analytics haven't changed. Even if you leverage AI, you still need to know:

  • Which statistical techniques to use and which ones matter
  • How to use data discovery methods for univariate analysis, in particular building histograms to study distributions and patterns
  • How to use bivariate and multivariate analyses to find correlations and relationships between two or more fields
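To make these fundamentals concrete, here is a minimal sketch of a univariate check (a histogram) and a bivariate check (a correlation) in pandas. The dataset and column names are synthetic stand-ins, not a real export:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Synthetic order data standing in for a real table (column names are made up)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "basket_size": rng.normal(50, 12, 500),
    "num_items": rng.poisson(4, 500),
})

# Univariate: a histogram to study the distribution of a single field
ax = df["basket_size"].plot.hist(bins=30, title="Basket size distribution")
ax.figure.savefig("basket_hist.png")

# Bivariate: a quick correlation between two fields
corr = df["basket_size"].corr(df["num_items"])
print(f"basket_size vs num_items correlation: {corr:.2f}")
```

The same two checks scale from a five-minute sanity pass to a full profiling run; the point is that you should know what these operations mean even when an AI writes them for you.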

We dove deep into these fundamentals of statistical analysis with examples in a past article. No matter how experienced a data analyst you are, sifting through reams of data to make sense of it can be challenging. You can't skip data preparation and data cleaning, as they are crucial steps. In the rest of this article, we're going to explore how AI can accelerate your data discovery, along with some practical tips to get started.

Real talk: How actual people use AI in their data workflows

Before we get into the nitty-gritty, here are some concrete examples of data science use cases that we've seen with our customers.

Exploring client data

Agencies working with clients often deal with data exploration questions. When you first engage a new client or start a project for them, you usually have some high-level questions and a dataset to analyze. One of the first steps is understanding what the data contains and what questions it can answer.

You can get a sense of how to approach the problem by pulling a sample of the data, transforming it, and asking the AI some questions about it, using the client's questions as part of the prompt. A process that could take a few hours with spreadsheets, pivot tables, and charts can be done in a few minutes. This lets data analysts focus on generating insights.

Digging into campaign performance

Campaign data is relatively structured, but it can be sliced a million different ways. If you're trying to find gems in the data, you can ask AI a few campaign performance questions at almost no cost in time or money, and the answers can help you decide which campaigns to focus on.

AI can help you go beyond restrictive interfaces like Google Analytics. You can ask complex questions like: "Which ad groups launched in the last 90 days have seen an increase in both CPL and conversion rates?"
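Under the hood, a question like that boils down to a filter-pivot-compare. Here is a hedged sketch of what the generated code might look like in pandas; the column names (`ad_group`, `launch_date`, `period`, `cpl`, `conv_rate`), the fixed `as_of` date, and the toy values are all illustrative assumptions, not a real analytics export schema:

```python
import pandas as pd

as_of = pd.Timestamp("2024-06-01")  # fixed reference date for "last 90 days"

# Toy campaign data: one "prev" and one "curr" row per ad group (values made up)
df = pd.DataFrame({
    "ad_group": ["A", "A", "B", "B", "C", "C"],
    "launch_date": pd.to_datetime(
        ["2024-05-01", "2024-05-01", "2024-05-15",
         "2024-05-15", "2024-01-01", "2024-01-01"]
    ),
    "period": ["prev", "curr"] * 3,
    "cpl": [10.0, 12.0, 8.0, 7.5, 9.0, 11.0],
    "conv_rate": [0.02, 0.03, 0.05, 0.04, 0.01, 0.02],
})

# Keep only ad groups launched in the last 90 days
recent = df[df["launch_date"] >= as_of - pd.Timedelta(days=90)]

# Pivot so each ad group has prev/curr columns, then compare the two periods
wide = recent.pivot(index="ad_group", columns="period", values=["cpl", "conv_rate"])
rising = wide[
    (wide[("cpl", "curr")] > wide[("cpl", "prev")])
    & (wide[("conv_rate", "curr")] > wide[("conv_rate", "prev")])
]
print(list(rising.index))
```

Writing this by hand is not hard, but it is exactly the kind of boilerplate an AI can produce in seconds so you can spend your time judging whether the answer makes sense.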

Uncovering customer and user behavior

Understanding how users are behaving can be notoriously difficult. There isn't usually one single event that drives value and indicates that a user is likely to convert, for example. Equipped with event and product usage data, you can ask the AI, "Which combination of events might be a leading indicator that a user will convert?"

The response may not be complete. But, it can help you find the right direction and explore new ideas. This is where data discovery and visualization really shine, helping analysts spot patterns they might have missed.

How to make AI work for you in the real world

The real-world use cases above show that traditional tools are still useful, but AI provides a new way to engage with your data. In a matter of seconds, you can start a conversation with the AI and progressively go deeper and deeper without touching a line of code. And when code is involved, AI can quickly generate scripts for you, drastically lowering the cost of asking questions. This reduces the risk of asking the wrong questions and gives you far more freedom to explore avenues you normally wouldn't have considered.

So, you're ready to leverage AI to explore your data in new ways; where do you even start? Let's talk about the tools you have at your disposal, their limits, and common mistakes to avoid when using AI for exploratory data analysis.

Getting started: Tools to jumpstart your work

Let's start with the most obvious AI tools for exploratory data analysis: ChatGPT or Claude. Many other great large language model (LLM) providers exist, but if this is your first time using AI for data science, these are a safe bet.

You can create a free account with either provider and start by uploading your file. To keep things simple, we generally recommend starting off with a CSV and a smaller dataset. Once you've uploaded your file, here are a few prompts to get you started:

  • What types of questions can I ask about this data?
  • Which fields seem to be reliably populated and useful?
  • Provide a summary of the most important fields

In the example response, the AI calls out that "sentiment" consistently takes one of three values and that "rating" always falls between 1 and 10. Without doing any univariate analysis or writing any code, I immediately get a sense that those fields are reliable. This kind of quick data quality check is invaluable.
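It's worth verifying claims like these yourself before trusting them. A minimal sketch in pandas, using a tiny synthetic dataset whose columns mirror the example above:

```python
import pandas as pd

# Tiny synthetic dataset mirroring the example (values are made up)
df = pd.DataFrame({
    "sentiment": ["positive", "negative", "neutral", "positive", "neutral"],
    "rating": [7, 3, 5, 9, 6],
})

# Check the AI's claims: three sentiment values, ratings within 1-10
n_sentiments = df["sentiment"].nunique()
rating_in_range = df["rating"].between(1, 10).all()
print(f"distinct sentiments: {n_sentiments}, ratings in 1-10: {rating_in_range}")
```

Two lines of verification against the raw data is cheap insurance against a hallucinated summary.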

The AI's answer to these questions can drastically speed up your analysis and get you started in the right direction. Just this simple step could save you hours of exploration. Better yet, this step may shine a light on patterns and correlations in your data that you wouldn't have even thought to look for.

Note: There are some risks and pitfalls when using AI for this type of data exploration work, specifically around data privacy, large datasets, and hallucination. We cover these in more detail when we discuss the limits of AI below.

But what if your data doesn't fit in a small file, or you want to ask more advanced questions that do need code?

In those cases, you'll want to consider an enterprise-grade AI solution for data exploration. Doing so provides privacy guarantees, support for larger datasets, and more specific tools for data discovery. Here are a few of the features that make this type of platform well suited for exploratory data analysis:

  • Large datasets: Data analysis platforms can exceed the limits of CSV and Excel files. You can easily analyze millions of rows and pull data from your enterprise sources using real-time database connectors.
  • Privacy: Each tool has its own data privacy policy, and we encourage you to read it. Fabi.ai does not integrate with LLM providers that cannot guarantee they will not use shared data for training.
  • Code generation: AI data analysis platforms focus on code generation, which makes the generated code easy to inspect and edit and lowers the chance of hallucinations.
  • Collaboration: These platforms facilitate teamwork. You can easily work with peers and test your findings.

Here are a few examples of the types of things you can ask AI to do:

Plot a histogram of basket size by platform

Here, the AI created a Plotly histogram showing the basket size distributions for each platform. This kind of data visualization helps data analysts quickly understand patterns in their data.

Create a correlation matrix of numeric variables

A correlation matrix can take some time to code, but it is extremely handy for understanding how fields may relate to one another. This is a simple task for AI to tackle, and it is particularly useful for data manipulation and analytics.
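In pandas this reduces to a single `corr()` call once the numeric columns are in a DataFrame. A minimal sketch with synthetic, deliberately correlated data:

```python
import numpy as np
import pandas as pd

# Synthetic numeric fields; basket_size is built to correlate with num_items
rng = np.random.default_rng(1)
items = rng.poisson(5, 200)
df = pd.DataFrame({
    "num_items": items,
    "basket_size": items * 9.5 + rng.normal(0, 5, 200),  # correlated on purpose
    "discount": rng.uniform(0, 0.3, 200),                # roughly independent
})

# Pairwise Pearson correlations of all numeric columns
corr = df.corr(numeric_only=True)
print(corr.round(2))
```

The `numeric_only=True` flag keeps the call from tripping on string columns when you run it against a wider, messier table.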

Create a hexbin of basket size vs number of items

Hexbin plots are a less conventional form of multivariate analysis but are really useful in practice. They give you a sense of cluster density, which gets lost in their more commonly used cousin, the scatter plot. Traditional data exploration workflows rarely produce this type of chart, but AI unlocks a new set of options.
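A hexbin is a one-liner in matplotlib. Here's a sketch mirroring the "basket size vs. number of items" example, again with synthetic data:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Synthetic data: basket size grows roughly linearly with item count
rng = np.random.default_rng(2)
num_items = rng.poisson(6, 2000)
basket_size = num_items * 8 + rng.normal(0, 6, 2000)

# Hexagonal binning: color encodes the number of points in each cell
fig, ax = plt.subplots()
hb = ax.hexbin(num_items, basket_size, gridsize=25, cmap="viridis")
fig.colorbar(hb, label="count")
ax.set_xlabel("number of items")
ax.set_ylabel("basket size")
fig.savefig("hexbin.png")
```

With thousands of overlapping points, a scatter plot saturates into one blob; the hexbin's color scale is what recovers the density information.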

In the examples above, the interpretation is left up to the data practitioner, but the AI generated the code behind these plots in a matter of seconds. This paired analysis with AI will let you quickly explore your data. It will also help you visualize it from different angles and explore it much faster.

Tip: Take the time to clean up and prep the data ahead of time if you already know what to focus on. For example, with a table of 100 fields, if you know which 20 are the most interesting, reducing your dataset to those 20 can really help the AI.
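The pruning step in the tip above is a one-liner. A small sketch, where the 100-column table and the 20-field shortlist are both hypothetical:

```python
import numpy as np
import pandas as pd

# A wide synthetic table standing in for a real 100-field export
wide = pd.DataFrame(np.zeros((5, 100)), columns=[f"field_{i}" for i in range(100)])

# Keep only the fields you already know matter (hypothetical shortlist)
keep = [f"field_{i}" for i in range(20)]
focused = wide[keep]

# Hand the AI the focused sample instead of the full table
focused.to_csv("focused_sample.csv", index=False)
print(focused.shape)
```

Fewer irrelevant columns means less for the model to misinterpret and a smaller upload, which also helps with file-size limits.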

Finally, for more technical data practitioners, AI IDE plugins or AI-native IDEs can supercharge your workflow. This is especially true when paired with Jupyter Notebook plugins. Here are a few for you to consider:

  • GitHub Copilot: Copilot is a generic AI code generator that works with any coding language and can be set up in most widely used IDEs.
  • Cursor: Cursor is a VSCode fork with built-in AI. Like GitHub Copilot, it works with any coding language, and you can bring your own LLM key.
  • Windsurf: Like Cursor, Windsurf is a new agentic IDE with built-in AI.

IDEs with built-in AI are really powerful, but they are not designed specifically for data analysis. That can be a strength if your data work may turn into development work, but it also means they lack the purpose-built features that make exploratory data analysis a breeze.

As with all things data, there's no one-size-fits-all solution, and there may not even be a single tool that works for you. Learning to embrace various tools and make them work in tandem can be a huge unlock if done properly. For example, with third-party data that isn't in our warehouse, I scrub it for anonymity and explore it in ChatGPT; if it seems useful, I upload it to Fabi.ai for further analysis. If the data I'm exploring is already in our data warehouse, I just start in Fabi.ai directly.

The limits of AI: When to grab the wheel back and common mistakes

AI is a boon to data exploration. But, as we touched on above, there are some limits and risks. Let's review these:

  • Hallucinations: When you use AI to analyze your data and summarize it, rather than generating code for you to interpret, the AI can easily hallucinate. It may make up facts or data that isn't there in an attempt to answer the question.
  • Incorrect assumptions about your data model: You likely know more about your data than the AI does. If you have three "amount" fields, the AI may have no good way to make an informed decision about which one to use. You may just happen to know that the "v3" amount field is the latest and greatest, and the other two fields shouldn't be used. You should either reduce the scope of the data or provide this information to the AI.
  • Large datasets: If you're working with large datasets, you won't simply be able to upload your data to an AI provider and start asking for insights. ChatGPT has a 512 MB limit, which can be quite restrictive. In that case, consider using a data analysis platform that can break down the process into two distinct steps: Code generation to produce summarized data followed by interpretation of the summary. 
  • Jumping right to the final question: A common mistake is to give AI some data and immediately ask for conclusions. It's best to think of AI as an assistant that needs to work through a problem step by step, just like a human would. So rather than asking for conclusive insights, ask the AI the questions that you would be asking yourself in order.
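The two-step pattern for large datasets described above (generate code that summarizes first, then interpret only the summary) can be sketched in a few lines. The dataset and column names here are synthetic assumptions:

```python
import numpy as np
import pandas as pd

# A "large" synthetic table: one million rows, too big to upload raw to an LLM
rng = np.random.default_rng(3)
big = pd.DataFrame({
    "platform": rng.choice(["web", "ios", "android"], 1_000_000),
    "basket_size": rng.normal(45, 10, 1_000_000),
})

# Step 1: code-generated aggregation shrinks a million rows to three
summary = big.groupby("platform")["basket_size"].agg(["count", "mean", "std"]).round(2)

# Step 2: only this tiny summary would be sent to the LLM for interpretation
print(summary.to_string())
```

The model never sees the raw rows, so the upload limit stops mattering and there is far less raw material for it to hallucinate over.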

Where this is all heading (and why it's exciting)

The future of AI for data analytics is exciting. With better models and data analysis tools, exploratory data analysis is getting easier by the day. This is particularly true with the rise of AI agents. In the section above, we mentioned the common mistake of asking the AI for conclusions while skipping intermediate steps. AI agents are getting better at finding the right steps to take and can now handle an entire task from a single question.

An AI agent is not the same as a one-shot answer: the agent first makes a plan, then uses its tools to complete each task in that plan. So if the question "Which Superdope platform has the highest basket size?" requires first pulling the data, then creating a histogram, then analyzing the output, an AI agent can work through each step on its own. A one-shot answer, in contrast, would look for a histogram to analyze and report that it can't find one.

The future with AI data agents is exciting. It means every data engineer and analyst will have an assistant that can do complex analyses with little to no supervision. This means less technical users can explore data on their own, and technical data experts can now move much faster.

Tip: As AI agents continue to improve, they will abstract away more and more of the heavy lifting. If you want to stay ahead of the curve and understand how the AI is working under the hood, work with tools that are closer to the AI or that show you the steps. Data practitioners who understand the mechanics will have a leg up on those who just trust the AI. We can think of this with a car analogy: as cars evolved, fewer drivers understood how an engine works, so when something broke, they relied more and more on experts, even for small fixes. A driver who understood the mechanics of the engine could likely troubleshoot the issue and avoid a hefty repair bill.

That said, a future where AI agents run off with raw data and answer complex, strategic business questions entirely on their own is still far off. Turning raw, messy enterprise data into valuable insights takes several steps, and the AI struggles because some business assumptions live only in the heads of the people running the business. Until that context is recorded and made available to the AI, human intervention will remain critical.

Note: Fabi.ai is the leading AI data agent designed specifically for data analysis and reporting. If you would like to take it for a spin, we invite you to try it out. You can get started for free in less than two minutes.

Taking your data game to the next level

If you take one thing from this article, let it be this: AI can transform your data exploration. It can supercharge you and your data teams. But you'll need to supervise it closely.

If you're new to using AI to explore your data, start simple: save your data as a CSV, upload it to ChatGPT or Claude, and ask some simple questions about your data's fields before gradually moving to more complex ones. If you're an advanced data practitioner, look for analysis platforms with AI agents built specifically for your use case. Curate your data as much as you can up front, then ask the AI incremental questions as you work through your exploration. Remember: even with the best AI agent, break your problem down into logical steps.

The future of AI in data science is promising. If we keep advancing at this pace, data exploration will look a lot different a year from now. 
