Addressing data platform tech debt: LLM considerations
TL;DR: LLMs can act like highly skilled data analysts, provided they are given explicit rules and data conventions. Focus on data quality, pipeline flexibility, and clear naming conventions to use LLMs effectively, and you will free your data team for more strategic work.
Over the past few months we’ve spoken to over 100 data leaders and experts, and in roughly 40% of those conversations we hear some flavor of “Our data and definitions are a mess; we’re currently focused on a big project to clean up our data platform.” That is a perfectly valid goal. However, if you haven’t already considered the role that LLMs will play in your future tech stack, you’re likely a step behind. Companies that have no plans to fold LLMs into their data stack will invariably make slower, less data-informed decisions and progressively fall behind the competition.
LLMs will fit in at various levels of the tech stack, whether in automated data pipelines, data quality control, or the insights and analytics layer. At Fabi.ai we’re experts at the analytics layer, so we’ll present our learnings from that angle; that said, there are great solutions cropping up at every level of the data stack, and we recommend looking into them.
Before we talk about what you should be thinking about as you address tech debt, let’s cover what LLMs are and are not good at. The best way to think about an LLM at your analytics layer is as an extremely qualified data analyst. If that statement makes you laugh because you think there’s no way an LLM could handle what your data analysts do, consider this: the reason an analyst seems much better than an LLM at retrieving insights is that they’ve likely been with the company long enough to run into enough issues, and receive enough feedback, to internalize the data schema and the rules they need to consider. Given those same rules and considerations, an LLM will perform just as well. So the real question is: how can you formalize the rules and considerations that currently live only in the minds of your analysts? LLMs are not good at guessing definitions or meaning, and neither is any new data analyst hire.

If you can solve this, you’ll quickly 10x the productivity of your data analysts and free them up to spend time discussing business impact and strategy rather than just pumping out SQL queries. Companies that successfully adopt LLMs for insight retrieval will place their data teams in far more strategic positions than they occupy today.

It’s also worth noting that LLMs are not good at proactively searching for insights or connecting seemingly unrelated pieces of data. The creativity required for that kind of work is where we predict analysts and data scientists will continue to play a crucial role, only far more efficiently.
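To make "formalize the rules in your analysts' heads" concrete, here is a minimal sketch of what that can look like in practice: tribal knowledge written down as structured column-level conventions, then rendered into context for an LLM prompt. All table names, column names, and rules below are invented for illustration; this is not a real Fabi.ai API.

```python
# Hypothetical schema conventions, written down instead of living in
# an analyst's head. Every entry here is an invented example.
SCHEMA_RULES = {
    "orders.ts": "UTC timestamp of order creation; prefer the name 'created_at'.",
    "orders.status": "Enum: 1=pending, 2=shipped, 3=delivered, 4=cancelled.",
    "users.churned": "True only if no login in the last 90 days; marketing uses 60.",
}

def build_system_prompt(rules: dict) -> str:
    """Render column-level conventions into a system prompt for an analytics LLM."""
    lines = ["You are a data analyst. Follow these data conventions:"]
    for column, rule in sorted(rules.items()):
        lines.append(f"- {column}: {rule}")
    return "\n".join(lines)

print(build_system_prompt(SCHEMA_RULES))
```

Once the conventions exist in a machine-readable form like this, the same file can feed documentation, onboarding, and any LLM-powered tool, so the knowledge no longer walks out the door with a departing analyst.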
So what should you be thinking about as you consider a data architecture revamp?
- Take naming conventions and standards seriously. “ts” should be “timestamp”. The values in an enum-type field should be explicitly spelled out. This is good practice in general and will help regardless of how you use LLMs. In fact, LLMs are better than humans at quickly taking in past examples and inferring meaning, but today a lack of discipline on this front is usually compensated for with documentation that quickly falls out of date, or with complex data models at the BI layer.
- Focus on the flexibility of your data and pipelines. Definitions, data, and requests will always be evolving. LLMs might surprise you; they can easily handle joins across ten or more tables. Even so, the more flexibility you give yourself in maintaining and updating wide, gold-standard tables, the better your insights will be. There is a multitude of incredible, modern ETL and data model management solutions that can help with this. Lean into them.
- Over-invest in data quality management and under-invest in BI. For most use cases, sophisticated BI has become a way to compensate for bad data and schemas. The more energy you put into ensuring data quality at each step of the way, the more effort you’ll save at the BI layer, and the more data-driven your organization will become.
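The naming and quality points above can be enforced mechanically. Below is a minimal sketch of a lightweight check that flags cryptic column abbreviations and out-of-range enum values before records reach the BI layer. The abbreviation list, the allowed status values, and the sample record are all invented for illustration, not a prescription.

```python
# Illustrative lists only; tailor these to your own schema.
CRYPTIC_NAMES = {"ts", "amt", "qty", "dt"}  # abbreviations worth spelling out
ALLOWED_STATUS = {"pending", "shipped", "delivered", "cancelled"}

def check_record(record: dict) -> list:
    """Return a list of data-quality issues found in one record."""
    issues = []
    for name in record:
        if name in CRYPTIC_NAMES:
            issues.append(f"column '{name}' uses a cryptic abbreviation")
    status = record.get("status")
    if status is not None and status not in ALLOWED_STATUS:
        issues.append(f"unexpected status value: {status!r}")
    return issues

record = {"ts": "2024-01-05T12:00:00Z", "status": "3", "total": 42.0}
for issue in check_record(record):
    print(issue)
```

Checks like this catch exactly the ambiguities that trip up both a new analyst and an LLM: a bare `"3"` in a status field means nothing without the enum mapping, and a column named `ts` invites guessing.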
If you’re interested in learning more about what we do at Fabi.ai or would like to connect to discuss the merits of LLMs as you contemplate a large data rearchitecture project, please reach out!