How we built Analyst Agent: Engineering a new kind of data agent

TL;DR: Building Analyst Agent required us to move beyond simple RAG to a true agentic approach with custom tooling, while solving the complex engineering challenge of maintaining isolated Python kernels that stay in sync with user-specific data and state.

Last week we launched Analyst Agent, which provides an entirely new and better way to conduct data analysis and enable self-service analytics in the enterprise.

Here's a quick recap of what Analyst Agent is, and why it's different from anything else in the market:

  • Dual-mode agent architecture: Analyst Agent operates in both builder mode (for data practitioners) and self-service analytics mode (for business users). The agent navigates complex schemas and generates analytical code in builder mode, while in self-service mode it maintains the same underlying capabilities but hides code execution behind a natural language interface.
  • Python-native analysis engine: Unlike standard text-to-SQL AI solutions, Analyst Agent can analyze data using full Python capabilities. This technical foundation enables custom visualizations, advanced statistical analysis, machine learning implementations, and arbitrary script generation beyond what SQL alone can accomplish.

If you’ve tried Analyst Agent, you know it’s like magic. ✨ If you haven’t tried it out yet, you can use it for free in just a few minutes. We’ve worked hard to make it "just work."

Here's a short video showing Analyst Agent in action: 


Now that Analyst Agent is in the hands of our users, we wanted to spend some time shining a light on the engineering work that went into building the feature, as well as share some of the lessons we've learned along the way and where we plan to go from here. At a high level, we're going to dig into two big technical challenges:

  1. Agentic AI: How we built an AI agent designed specifically for data analysis
  2. Enterprise and collaborative functionality: How we built the agent so that every user in an organization can interact with their own agent, even within shared reports

Let's dive in!

System architecture overview

We designed Analyst Agent to balance AI capabilities, infrastructure requirements, and user experience. Each component plays a part in delivering a smooth data analysis experience without compromising security or performance.

Key components of the system architecture

The architecture of Analyst Agent consists of three critical components:

  1. AI component: The agent uses a planning-based approach. It makes a plan, chooses the next action, and uses the tools we've created. We have advanced past basic RAG (Retrieval Augmented Generation), but we still support RAG features in some of our tools.
  2. Infrastructure component: Security and isolation were paramount in our design. The agent can't arbitrarily run code or change production databases. Instead, everything runs in a containerized environment. All code written by the AI runs in a dedicated kernel. When a human, AI, or report runs code, they use the same kernel environment. This means they work with the same datasets, including variables and objects.
  3. User experience component: We had to solve several challenges related to the user experience: How do we handle cold kernel starts? What happens when code takes a long time to run? How do we gracefully handle errors during code execution?

The team designed the backend agent and the UX layer in tandem to create a seamless experience for users. However, the infrastructure and agent components remain distinct. The system isolates and manages the kernel environment, where the agent, report, and human user all "live," spinning it up or down as needed.

Building the agent system

When creating an AI agent for data analysis, we had to make smart choices. We focused on frameworks, data handling, context management, and function implementation. These key elements shape how well the agent understands what users want. They also affect its ability to perform complex tasks.

Agent architecture decisions

The foundation of our agent system lies in the frameworks and architectural patterns we selected. These choices influenced how our agent handles information, makes decisions, and engages with data and users.

We decided to use LangGraph as our agent framework. LangGraph offered several advantages:

  • The open source community has widely adopted it 
  • Major companies like Uber have implemented it
  • It doesn't bind us to any particular LLM provider
  • It supports token streaming rather than just streaming steps (critical for UX to reduce the feeling of latency)

We looked at other options, like LlamaIndex Workflows and Microsoft AutoGen. However, LlamaIndex's async workflows were harder to debug and slowed our system down. Anthropic's MCP also looked promising but was released after we built our system.

We still use RAG, but we've moved beyond relying solely on it. When a user asks a question, we fetch the right context to create prompts. The AI can also build its own context with special tools.

Our "schema function" gets details about the database schema. Our "conversation history tool" fetches info on past user questions. The schema function itself uses RAG and a vector database to retrieve relevant context. Since developers implement these as tools, the AI can call them as needed to complete its task. Here’s an example of what this implementation looks like: 

    @tool(response_format="content_and_artifact")
    def dry_run(
        code_type: str, code: str, data_source_name: Optional[str] = None
    ) -> tuple[str, CellExecutionResult]:
        """
        Do a dry run of the SQL or Python code.
        The code is checked for syntax errors, missing libraries, data
        schema/semantics errors, and runtime errors.
        :param code_type: The type of code to run. It can be either 'sql' or 'python'
        :param code: the SQL query or Python code to run
        :param data_source_name: Optional. The name of the data source to run the
            SQL query against. It should be None if the code type is python, or if
            querying DataFrames or uploaded files.
        :return: is_success=True if there's no error; otherwise, is_success=False
            with an optional error_message.
        """
        execution_result: CellExecutionResult = _do_dry_run(
            code_type=code_type, code=code, data_source_name=data_source_name
        )
        content_for_llm = execution_result.prep_json_for_llm()

        return content_for_llm, execution_result

Handling data sources and preparation

Connecting to and preparing data for analysis is a critical capability for any data agent. We focused on flexible connections and smart preparation to help us manage the variety of enterprise data environments.

Our system connects to data in two primary ways:

  1. DuckDB for handling file uploads stored in the kernel
  2. Database/data warehouse drivers for remote database connections

Database schema understanding

We crawl database schemas on a regular basis and store the metadata in a vector database. We use PostgreSQL as our vector database for this purpose.
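The retrieval pattern can be sketched in a few lines. This is a toy illustration, not our production pipeline: the bag-of-words "embedding" stands in for a real embedding model, and a plain dict stands in for the PostgreSQL vector store; the table names and columns are made up.

```python
import re
from math import sqrt
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" -- a stand-in for a real embedding
    # model whose vectors would be stored in PostgreSQL.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# One metadata "document" per crawled table
schema_docs = {
    "orders": "orders table: order_id, customer_id, total, created_at",
    "campaigns": "campaigns table: campaign_id, channel, spend, start_date",
}
index = {name: embed(doc) for name, doc in schema_docs.items()}

def retrieve(question: str, k: int = 1) -> list[str]:
    # Rank crawled tables by similarity to the user's question
    q = embed(question)
    ranked = sorted(index, key=lambda n: cosine(q, index[n]), reverse=True)
    return ranked[:k]

print(retrieve("total spend per campaign channel"))  # → ['campaigns']
```

The crawler refreshes these documents on a schedule, so the index stays close to the live schema without querying the database at question time.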

Data preparation for analysis

We deal with two types of data:

  1. The actual data being analyzed (e.g. SQL query results stored as DataFrames)
  2. Metadata used for analysis (e.g. database schema information)

The user generally prepares the data for analysis, ensuring they upload or query the right data. The AI can generate SQL queries and manipulate already loaded data using SQL or Python. For metadata, we vectorize everything to enable semantically-relevant and quick retrieval.

Managing agent context and scope

It's crucial for an AI agent to keep the right context. This helps it give relevant and accurate answers. We used different techniques to help our agent keep the right context and manage resources well.

Context window management

We use a combination of RAG and context trimming. We save the key parts of the chat: the first message, the system message, and the last message. The AI can also pull details from the conversation history using a special tool when needed.
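The trimming idea above can be sketched simply. This is a simplified illustration: real trimming also counts tokens rather than messages, and the role/text tuples stand in for LangChain message objects.

```python
def trim_messages(messages, max_msgs=6):
    """Keep the system prompt and first user message, plus the most
    recent turns; drop the middle of a long conversation."""
    if len(messages) <= max_msgs:
        return list(messages)
    head = messages[:2]                      # system + first user message
    tail = messages[-(max_msgs - len(head)):]  # most recent turns
    return head + tail

convo = [("system", "You are Analyst Agent."),
         ("user", "Load the campaign data.")]
convo += [("user", f"follow-up {i}") for i in range(20)]

trimmed = trim_messages(convo)
print(len(trimmed), trimmed[0][0])  # → 6 system
```

Anything trimmed away isn't lost: the conversation history tool can fetch older turns on demand.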

Memory techniques

Our primary memory management technique is message trimming. We've identified conversation vectorization as a future improvement area.

Handling large datasets

For large metadata, we use RAG and context window trimming. For the actual data, we don't pass the entire dataset to the LLM unless the AI has reduced it to a much smaller dataset using code it generated.
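One way to picture this: instead of serializing a whole dataset into the prompt, pass the LLM a schema-plus-sample summary. A minimal sketch (the function name and summary shape are illustrative, not our exact format):

```python
def summarize_for_llm(rows, max_rows=5):
    """Give the LLM the shape of the data, not the data itself; any
    heavy reduction happens in code the agent generates and executes."""
    cols = list(rows[0].keys()) if rows else []
    return {
        "columns": cols,
        "row_count": len(rows),
        "sample": rows[:max_rows],  # small head for grounding
    }

rows = [{"campaign": f"c{i}", "clicks": i * 10} for i in range(1000)]
summary = summarize_for_llm(rows)
print(summary["row_count"], len(summary["sample"]))  # → 1000 5
```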

Implementation of function calling

LangGraph provides the foundation for our function calling implementation. We encountered several challenges:

  • When functions have too many arguments, the AI may ignore some
  • Having too many tools causes the AI to sometimes select the wrong tool
  • Long-running tools can make the AI feel slow
  • LangGraph uses a loosely-defined dict for state, so it doesn't enforce strong type binding. This makes maintenance harder.

To address these challenges, we ended up using a combination of LangGraph and our own functions, as shown in the code snippet below: 

class ReActAgent:
    ...
    def _build_graph(self):
        graph_builder = StateGraph(ReActState)

        # chatbot node
        def chatbot(state: ReActState):
            ...
            trimmed_messages = self._trim_messages(state[MESSAGES])
            return {
                MESSAGES: [self.llm_with_tools.invoke(trimmed_messages)],
                N_STEPS: n_steps,
            }

        # tools node
        tools_node = ToolNode(tools=self.tools)

        CHAT_BOT_NODE_NAME = "chatbot"
        TOOLS_NODE_NAME = "tools"
        graph_builder.add_node(CHAT_BOT_NODE_NAME, chatbot)
        graph_builder.add_node(TOOLS_NODE_NAME, tools_node)

        # this function determines the next node to route to
        def select_next_node(state: ReActState): ...

        graph_builder.add_conditional_edges(CHAT_BOT_NODE_NAME, select_next_node)
        graph_builder.add_edge(TOOLS_NODE_NAME, CHAT_BOT_NODE_NAME)
        graph_builder.add_edge(START, CHAT_BOT_NODE_NAME)
        graph_builder.add_edge(CHAT_BOT_NODE_NAME, END)
        graph = graph_builder.compile()
        return graph

Data processing layer

Handling data with precision is at the core of Analyst Agent's capabilities. To ensure this, we had to focus on managing many data sources and keeping data quality high so the system could handle large datasets for enterprises while still performing well.

Handling multiple data sources

We created a flexible way to connect to different data sources. This helps users manage information flow easily. Our kernel environment allows for effective data storage and manipulation. We also added smart state management to keep data fresh without adding extra burdens. Our solution balances performance with the need for accurate, up-to-date information in a highly dynamic analytical environment.

Technical architecture for data connections

Fabi.ai manages the kernel to store uploaded files or SQL query results as Python DataFrames. Since they're DataFrames, we can merge them directly in memory using either Python or DuckDB if the user wants to join datasets using SQL.
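A quick sketch of the in-memory join, using pandas (the DataFrame names and columns are invented for illustration; in the product, DuckDB could run the equivalent SQL over the same DataFrames):

```python
import pandas as pd

# Two DataFrames as they might exist in the kernel: one from a file
# upload, one from a SQL query result.
uploads = pd.DataFrame({"campaign_id": [1, 2], "name": ["Spring", "Fall"]})
query_result = pd.DataFrame({"campaign_id": [1, 2], "spend": [500, 900]})

# Join directly in memory. The DuckDB equivalent would be roughly:
#   SELECT * FROM uploads JOIN query_result USING (campaign_id)
joined = uploads.merge(query_result, on="campaign_id")
print(joined.to_dict("records"))
```

Because both paths operate on the same in-kernel DataFrames, users can mix SQL and Python freely without moving data around.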

Data integration challenges

Our biggest challenge isn't data integration but state management and caching. We face two key issues:

  1. We don't want to unnecessarily re-query the database to refresh data
  2. Python's flexibility means builders or the AI can override variable or object states

To handle this, we set up a dependency tracking system. It reruns upstream and downstream code blocks based on caching and staleness criteria. For example, we don't eagerly refresh a SQL query from an external data source unless we detect an update to the query itself.
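The core idea can be sketched as a small dependency graph: editing a cell marks it and everything downstream stale, and only stale cells are re-executed. This is a toy model of the pattern, not our actual tracker (which also handles caching and external-source staleness):

```python
class CellGraph:
    """Minimal dependency tracking between code cells."""
    def __init__(self):
        self.deps = {}      # cell -> set of upstream cells
        self.stale = set()

    def add(self, cell, upstream=()):
        self.deps[cell] = set(upstream)
        self.stale.add(cell)            # new cells haven't run yet

    def mark_edited(self, cell):
        self.stale.add(cell)
        for c, ups in self.deps.items():
            if cell in ups:
                self.mark_edited(c)     # propagate staleness downstream

    def cells_to_run(self):
        return self.stale

g = CellGraph()
g.add("sql_query")
g.add("clean_df", upstream=["sql_query"])
g.add("chart", upstream=["clean_df"])
g.stale.clear()               # everything has run once
g.mark_edited("sql_query")    # editing the query invalidates dependents
print(sorted(g.cells_to_run()))  # → ['chart', 'clean_df', 'sql_query']
```

An untouched sibling cell would stay out of the stale set, which is exactly how we avoid needlessly re-querying the database.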

Data preparation and cleaning

Ensuring data is properly prepared for analysis is essential for accurate results. We balance automation and flexibility to meet the needs of enterprise data environments.

Preprocessing techniques

At the metadata level, we vectorize schema information and other metadata such as sample queries and documents. The builder prepares the data for analysis on the fly.

Data quality approach

We've built our platform to work with messy enterprise data rather than trying to enforce perfect data quality. This is why the AI always shows the code and the builder can override or guide the AI.

The AI layer

The intelligence of our system relies on thoughtful implementation of AI capabilities. Every choice in this layer matters. From choosing a model to crafting prompts, it shapes how well Analyst Agent understands user questions. This, in turn, shapes the accuracy and helpfulness of its responses.

LLM evaluation approach

We've made a deliberate choice to be LLM-agnostic. We believe LLM providers are in an arms race, and our ability to switch between models and providers is a major benefit for our users. Currently, we default to Claude 3.7, but customers can choose their models or even use a private Fabi.ai-hosted LLM.

This approach brought challenges. We needed to avoid technical choices that tied us to one provider while making sure prompts could be general.

Benchmarking method

We rely primarily on manual testing and observation of behavior in real-world use cases. Despite the existence of evaluation frameworks, we've found manual testing to be most effective so far. Implementing an automated evaluation framework is going to be a big area of investment in the future.

Selection criteria

Our key criteria are:

  • Accuracy
  • Speed
  • Compatibility with our agentic framework (proper function/tool calling support)

Some models like DeepSeek don't yet support tool calling, and others are too slow for practical use.

Prompt engineering approach

Crafting effective prompts is crucial to guiding the AI toward generating useful and accurate outputs. Our approach balances providing context with allowing the agent flexibility to solve problems.

Prompt structure

Our prompt structure consists of:

  • System instruction: Standard information about Fabi.ai that defines the agent's goals
  • Smartbook/SmartReport context: Relevant details for the prompt (code snippets, metadata, DataFrames, etc.).
  • Human input: The user's prompt
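Assembling those three sections is straightforward; here is a hedged sketch of the idea (the function name, section labels, and wording are illustrative, not our production template):

```python
def build_prompt(system_instruction, smartbook_context, human_input):
    """Compose the three prompt sections: system instruction,
    Smartbook/SmartReport context, and the user's question."""
    context_block = "\n".join(f"- {c}" for c in smartbook_context)
    return (
        f"{system_instruction}\n\n"
        f"Context:\n{context_block}\n\n"
        f"User question:\n{human_input}"
    )

prompt = build_prompt(
    "You are Analyst Agent, a data analysis assistant.",
    ["DataFrame campaign_data: campaign_id, spend, clicks",
     "Snippet: campaign_data = run_sql('SELECT * FROM campaigns')"],
    "Which campaign had the best cost per click?",
)
print(prompt.splitlines()[0])
```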

Optimization for data analysis

Our optimization process involved significant trial and error. We aim to provide as little and as specific context as possible, moving most context into tools. However, the AI doesn't always select the right tools, so finding the right balance is crucial.

In our Smartbooks, we use the same agentic framework. We added AI auto-focus and artifact tagging so users can send specific code and data to the AI. As a result, the prompt size is smaller, and users get clearer information.

Training and fine-tuning decisions

We've deliberately chosen not to fine-tune models. Fine-tuning tends to lock you into a specific model, and we don't believe it provides much advantage over a proper RAG and agentic system. We want to make it simple to upgrade as new models emerge, and fine-tuning conflicts with this goal.

How we designed the AI agent

The design of Analyst Agent emerged from our understanding of user needs and technical possibilities. We designed unique experiences for various user types, both data experts and data consumers, while keeping a consistent technical base throughout.

Data analyst agent for data practitioners

From the beginning, Fabi.ai has taken an AI-first approach to data analysis. From this experience, we learned key lessons about how enterprise users interact with AI for data analysis:

  • Most questions are ambiguous and not simple data pulls
  • The AI needs to handle unexpected situations
  • AI must be fast and responsive
  • Exploratory analysis evolves progressively

Our first implementation was a "simple" LLM call with embedded context retrieved using RAG. We would find the right context for the user's questions and use vectorized metadata, such as data warehouse schema info, and then ask the AI to create code.

This approach had several limitations:

  • The generated code often had issues with variable states and bugs
  • Vectorizing conversation context added latency, and semantic search was imperfect
  • The AI could only use pre-installed packages
  • RAG would sometimes retrieve too little or too much context

The AI generated code reasonably well, but often with issues, and it wasn't flexible enough to handle roadblocks on its own. We began considering an agent-based approach, and when AI agents gained prominence in 2024, it confirmed our thinking. We rebuilt our AI architecture from scratch as a truly agentic system.

We wanted the AI to function like a real analyst, capable of:

  • Designing a high-level plan for approaching the problem
  • Executing that plan
  • Testing the outcome
  • Iterating or creating a new plan as needed

To execute plans, we gave the AI agent various tools it could invoke when needed, for example:

  • Dry run code
  • Pip install Python packages
  • Retrieve historical conversations
  • Retrieve Smartbook or data warehouse schema

Analyst Agent uses these tools to analyze complex data, and it can often find the right answers on its own.

Data analyst for business users

The AI agent described above lives in Fabi.ai Smartbooks, a code-friendly space for users who know SQL and Python. We wanted to help business users too, but we found that giving them an AI agent that hides code and uses text-to-SQL has two main issues:

  1. It requires extremely clean data and a rigid semantic layer, which is difficult to maintain as business and data models evolve
  2. Data teams need confidence in what data the AI will use to answer questions, ensuring it won't fabricate answers for questions it lacks data for

Our solution was to let data practitioners deploy specialized agents on curated datasets. This gives Fabi.ai report builders a way to share Analyst Agents on tightly scoped datasets, knowing the AI will focus only on that data. The process works as follows:

  1. Build datasets in Smartbooks
  2. Configure AI agent artifacts
  3. Publish report with embedded AI agent

This lets report viewers explore data independently. It also cuts down on follow-up questions to data teams and helps provide quick answers in meetings. The business-user AI agent is like the one in Smartbooks, with two main differences: it doesn't query the data warehouse directly (text-to-Python only, no text-to-SQL), and it uses a hidden kernel for state management.

The system architecture

We needed more than just the agent: we also had to build a strong technical foundation that supports enterprise needs for security, scalability, and collaboration. That meant tackling hard problems in state management and user isolation, which led us to design Analyst Agent around three core ideas:

  1. Personal AI agents

Different users can access the same report and ask the AI their own questions. For example, a CMO and demand generation lead should both be able to open the same "Campaign dashboard" and view the latest data while asking their own questions. This is straightforward for basic text-to-SQL but becomes complex for advanced data exploration using Python.

  2. AI in sync with datasets

Analyst Agent stays synchronized with the report as users update tables and charts through filters or inputs. For example, if the "Campaign dashboard" shows a table of campaigns over the last 30 days with a filter option for 60 or 90 days, updating that filter should update the data available to the AI.

  3. Enterprise-grade architecture

The solution is enterprise-grade, both secure and scalable for enterprise-level data volumes. Low latency is critical for user experience, and we take security and privacy seriously.

Individually, these requirements are straightforward, but combining them was challenging. Many users asking the AI questions about the same report means many distinct states to manage, and Python requires a runtime environment to store those variable states. We can't have the AI generating different Python DataFrames with the same name in the same kernel.

Kernel orchestration

Our solution was a sophisticated Python kernel orchestration system:

We maintain a primary kernel, and when a user accesses a report, we pull a pre-warmed kernel and seed it with the report's state. For example, if there's a campaign_data DataFrame in the report, that same DataFrame will be available with the latest data.

When a user updates a DataFrame using a filter, only their kernel updates, and the AI gains access to the latest data. For report updates, we use a unique combination of UUID matching and force a kernel reset based on update timestamps.
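The per-user seeding logic can be sketched like this. It's a deliberately simplified model: real kernels are containerized processes, and plain dicts stand in for their state; the class and field names are invented for illustration.

```python
import time

class KernelPool:
    """Sketch: each user gets a kernel seeded from the report's state,
    reseeded whenever the report has been updated since seeding."""
    def __init__(self, report_state, report_updated_at):
        self.report_state = report_state
        self.report_updated_at = report_updated_at
        self.user_kernels = {}   # user_id -> (seeded_at, state copy)

    def kernel_for(self, user_id):
        entry = self.user_kernels.get(user_id)
        if entry is None or entry[0] < self.report_updated_at:
            # Cold start or stale kernel: seed fresh from the report
            self.user_kernels[user_id] = (time.time(), dict(self.report_state))
        return self.user_kernels[user_id][1]

pool = KernelPool({"campaign_data": [30, 60, 90]}, report_updated_at=0.0)
cmo_kernel = pool.kernel_for("cmo")
cmo_kernel["campaign_data"] = [30, 60]         # CMO applies a filter
demand_kernel = pool.kernel_for("demand_gen")  # unaffected by CMO's filter
print(demand_kernel["campaign_data"])  # → [30, 60, 90]
```

The timestamp comparison is the sketch-level analogue of our UUID-plus-update-timestamp reset logic: a kernel seeded before the last report update gets replaced.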

State management

When a user loads a report, we create a hidden kernel that's a copy of the report's kernel with all the same variable and object states. A major challenge arises when AI generates code that changes the state of objects and variables. We built a caching system that lets the AI run code as the last step in the dependency chain. Once execution finishes, the code is deleted and the cache updates the variable and object states. This prevents states from becoming out of sync with what's visible in the report.
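The key invariant, that AI-generated code never mutates the state visible in the report, can be shown with a toy copy-and-execute sketch. This is the idea, not the mechanics: the real system runs code in a separate kernel process with caching, not via `exec` over a copied dict.

```python
import copy

def run_ai_code(kernel_state, ai_code):
    """Run AI-generated code against a deep copy of the kernel state so
    the report's visible state is never mutated; return only the result."""
    sandbox = copy.deepcopy(kernel_state)
    exec(ai_code, sandbox)          # stand-in for kernel execution
    return sandbox.get("result")

state = {"campaign_data": [100, 250, 75]}
answer = run_ai_code(state, "result = max(campaign_data)")
print(answer)  # → 250
print(state)   # → {'campaign_data': [100, 250, 75]}  (untouched)
```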

Each Analyst Agent and report has its own kernel. This setup helps avoid state conflicts. We use our copying mechanism and kernel manager to keep everything consistent.

Lessons learned

Building Analyst Agent taught us important lessons about technical issues and strategic choices in creating AI-driven data tools.

Building an AI agent prototype was the easy part

We built an AI agent for data analysis very quickly. With today's open-source solutions, you could probably build your own version in hours or days. The real challenges came when we tried to integrate this into a system. This system needed to keep third-party data updated and it also had to be collaborative, secure, and scalable.

We used great tools like LlamaIndex and LangGraph for the AI agent. However, we didn’t find any ready-made solutions for the infrastructure part. We used Kubernetes for kernel management, but this didn't address the logic of when and how to seed, spin up, or shut down Python kernels or manage variable and object state within each kernel.

AI agents are the future

Though AI agents may seem overhyped, we're fully convinced of their value. They've unlocked workflows and user experiences that were impossible with one-shot AI or basic workflows. Letting the AI formulate and execute its own plans with predefined tools has exponentially increased the possible user interactions.

Building an LLM-agnostic agent was challenging. While providers like OpenAI offer tools, we were committed to avoiding vendor lock-in and wanted to make sure we worked with any LLM provider.

Take advantage of the LLM war

There's a race among LLM providers, with major breakthroughs like DeepSeek appearing regularly. Building LLM infrastructure is tough, but the competition benefits platforms and the customers who use these LLMs. We're in a golden age of LLMs with constant access to better, faster, cheaper models.

We've built this philosophy into our approach from the beginning. We believe customers should be able to choose their preferred LLM or use a privately hosted Fabi.ai LLM.

Technical decisions we'd reconsider

If we were starting over, there are two things we’d think about doing differently: 

  1. Consider using Anthropic's MCP. LangGraph, which is tied to LangChain, has limitations, but we're waiting for the MCP ecosystem to mature further.
  2. Think about using an evaluation framework from the start. It can be tough, but we suspect it’s worth it.

Looking forward

We’ve already started to see how Analyst Agent is changing how Fabi.ai customers engage with their data and how business teams use that data, but we're not stopping here. We're investing in two key areas moving forward:

1. Agents that can talk to each other

Now that data practitioners can build specialized AI agents for domain-specific questions, we're extending this to enable agent-to-agent communication. Picture a "Marketing analytics agent" linked to "Campaign," "Pipeline," and "Field events" agents. It can call on any of these to respond to marketing analytics questions.

This differs from monolithic AI in two key ways:

  1. It’s easier to build, test, and control each module in a componentized system. This setup also makes agent performance easier to understand: business users can trace responses to specific agents and reports.
  2. Not all data is in data warehouses. A lot of it is in slides, Slack, spreadsheets, and more. By creating agents to access these sources, we can open up new possibilities.
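The routing idea behind such a parent agent can be sketched in a few lines. This is purely illustrative: the agent names, keyword routing, and return strings are all invented, and a real implementation would delegate via the agentic framework rather than keyword matching.

```python
def marketing_agent(question, sub_agents):
    """Toy router: delegate to the sub-agent whose domain keywords
    appear in the question."""
    for keywords, agent in sub_agents:
        if any(k in question.lower() for k in keywords):
            return agent(question)
    return "No specialized agent matched; answering directly."

campaign_agent = lambda q: "campaign agent: analyzing campaign spend"
pipeline_agent = lambda q: "pipeline agent: analyzing pipeline stages"

sub_agents = [
    ({"campaign", "spend"}, campaign_agent),
    ({"pipeline", "deal"}, pipeline_agent),
]
print(marketing_agent("How did campaign spend trend last quarter?", sub_agents))
```

Because each sub-agent is scoped to its own report and dataset, a response can always be traced back to the agent that produced it.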

2. Agents that can do more than analysis

Analyst Agent is great at data analysis. However, we think the bar will keep rising, and AI agents will become more powerful as they handle complex business contexts. Future users should be able to ask Analyst Agent to schedule Slack updates or pull data from slides for summaries.

Look forward to better connections with your favorite systems. You’ll be able to manage and share data and insights more easily.

If you want to build your first data agent, you can get up and running in minutes for free. If you have questions about how we built Analyst Agent or want to talk about any of the technical details discussed in this post, I’d love to connect with you.

