TL;DR: JSON is everywhere in modern data work, but it's a pain to analyze directly. Whether you're dealing with application data, API responses, or IoT data, you've got four main options to convert it to a more analysis-friendly format: Python's built-in libraries, Pandas for complex structures, DuckDB for direct querying, or your data warehouse's native JSON functions.
As a product manager - then and still to this day - I frequently find myself confronted with JSON (and yes, I’m deliberately using the word “confronted”). JSON (short for JavaScript Object Notation) is everywhere the minute you start working with application data, API responses or IoT data.
Two of the most common situations where I would run into it:
- Analyzing our product application data. For example, I might need to find a list of customers with a certain property, and the data lived in a NoSQL database with everything stored as JSON (in my case, MongoDB)
- Calling an application API and trying to make sense of the data it returned
But JSON can be quite impractical for any sort of ad hoc or exploratory data analysis (EDA). So unless you're Spock from Star Trek, you'll want to parse (or flatten) your JSON data before you start your analysis.
If you’re already familiar with JSON and just want to jump straight into the ways to work with it, skip ahead:
- Using Python’s Built-in Libraries
- Pandas Library
- Querying JSON Files with DuckDB
- Parsing JSON Fields in Databases
If you're in a rush and want the quickest way to convert a JSON file to tabular format with minimal code, check out our tutorial video.
What is JSON?
In the world of data, flexibility is key. JSON (JavaScript Object Notation) has become a popular format for data storage and exchange due to its simplicity and ease of use, especially in APIs and web services. However, when it comes to data analysis and reporting, CSV (Comma-Separated Values), DataFrames and other tabular formats are usually much more practical.
But how do you convert JSON data into a CSV file efficiently? Whether you’re dealing with a simple JSON object or a nested structure, this guide will walk you through several methods to perform this conversion. From using common tools and Python libraries to leveraging more advanced options like DuckDB, we’ll explore multiple approaches to fit different needs.
Why convert JSON to tabular format?
Before diving into the methods, let’s discuss why you might need to convert JSON to CSV:
- Compatibility: Many analytics tools, databases, and spreadsheet applications work more seamlessly with CSV files.
- Readability: CSV files are often easier for non-technical stakeholders to read and interpret.
- Data Processing: For large datasets, CSV files can be more efficient to process than JSON files, particularly in environments like Python or R.
Understanding JSON
JSON is a lightweight data-interchange format that's easy for both humans and machines to read and write. It is text-based and consists of key-value pairs, making it ideal for transmitting data in web applications. However, its flexible structure, including nested arrays and objects, can make it tricky to work with when you need flat, tabular data, which is where CSV files come in.
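To make that concrete, here’s a small made-up example of a nested JSON record (the field names are just placeholders):

```json
{
  "customer_id": "c-1042",
  "name": "Acme Corp",
  "plan": { "tier": "pro", "seats": 25 },
  "tags": ["beta", "priority"]
}
```

The nested `plan` object and the `tags` array are exactly the kinds of structures that don’t map cleanly onto rows and columns, which is why some flattening is usually needed.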
Converting JSON to tabular format (or CSV) with practical examples
1. Using Python’s built-in libraries
Python offers built-in libraries like json and csv to handle JSON and CSV files, respectively. This method is straightforward and ideal for smaller files or simple JSON structures.
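Here’s a minimal sketch of that approach. It assumes a hypothetical `data.json` containing a top-level array of flat objects; adjust the file names and keys to your data:

```python
import csv
import json

# Load the JSON file (assumed to be a top-level array of flat objects)
with open("data.json", "r", encoding="utf-8") as f:
    records = json.load(f)

# Use the keys of the first record as the CSV header
with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)
```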
This approach is great for quick tasks and gives you full control over the conversion process. The script above works well as long as the JSON is fairly structured and unnested.
2. Pandas library
For more complex or larger datasets, the Pandas library offers a powerful and flexible way to convert JSON to CSV.
Pandas is especially useful when dealing with nested JSON structures, as it can automatically handle flattening these into a tabular format.
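As a rough sketch (again using a hypothetical `data.json`), `pd.read_json` handles the flat case in two lines, while `pd.json_normalize` flattens nested objects into dot-separated columns:

```python
import json

import pandas as pd

# Flat JSON: read and write in two lines
df = pd.read_json("data.json")
df.to_csv("data.csv", index=False)

# Nested JSON: json_normalize flattens nested objects into
# dot-separated columns (e.g. plan.tier, plan.seats)
with open("data.json", "r", encoding="utf-8") as f:
    records = json.load(f)

pd.json_normalize(records).to_csv("data_flat.csv", index=False)
```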
Tip: if you’re dealing with a complex, nested JSON file, consider using an LLM such as Claude or ChatGPT to help you write the code to parse the data. Things can get quite complicated very fast depending on the data.
3. Querying JSON files with DuckDB
DuckDB is an emerging database management system that's known for its speed and efficiency in handling large datasets. One of its standout features is the ability to query JSON files directly without requiring any intermediate steps.
You can load a JSON file directly into DuckDB and convert it to CSV using a simple SQL query.
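For example, from the DuckDB CLI (a sketch; the file names are placeholders):

```sql
-- read_json_auto infers the schema directly from the file
COPY (SELECT * FROM read_json_auto('data.json'))
TO 'data.csv' (HEADER, DELIMITER ',');
```

If you’d rather stay in Python, the duckdb package lets you run the same query via `duckdb.sql(...)`.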
This approach is powerful for data analysts who are already familiar with SQL and want to avoid writing extensive code in Python.
Fabi.ai is directly integrated with DuckDB, so if you want to start exploring your JSON data for ad hoc or exploratory data analysis, you can simply upload your file and start querying. More on that here in our product documentation.
4. Parsing JSON fields in databases
If your JSON data is embedded within fields in a database like Snowflake or BigQuery, you can often query it using your data warehouse provider’s SQL syntax. You’ll want to check their documentation to understand what you can and can’t do.
For example, in BigQuery, you might use SQL to extract JSON data:
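Something like the following, where the `events` table and its `payload` JSON column are hypothetical stand-ins for your own schema:

```sql
SELECT
  JSON_VALUE(payload, '$.customer.name') AS customer_name,
  JSON_VALUE(payload, '$.plan.tier')     AS plan_tier
FROM my_project.my_dataset.events;
```

`JSON_VALUE` extracts a scalar value at a given JSON path; BigQuery also offers functions like `JSON_QUERY` for pulling out nested objects and arrays.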
Typically any data warehouse that supports embedding JSON into your schema will provide a way to query it. However, you may find it easier to use AI to help you write this query if this is not something you do on a regular basis.
Alternatively, you could simply query your Snowflake or BigQuery data, store the results as a DataFrame, then use Python to parse the data (more on how to do this quickly in Fabi.ai here).
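Here’s a sketch of that workflow using BigQuery’s Python client. The query, table name, and `payload` column are assumptions, credentials are assumed to be configured, and the JSON column is assumed to come back as strings:

```python
import json

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()  # assumes credentials are configured

# Pull the raw rows, including the JSON column, into a DataFrame
df = client.query(
    "SELECT event_id, payload FROM my_project.my_dataset.events"
).to_dataframe()

# Assuming payload arrives as JSON strings, flatten it into
# regular columns and join those back onto the original rows
flat = pd.json_normalize(df["payload"].apply(json.loads))
result = pd.concat([df.drop(columns=["payload"]), flat], axis=1)
result.to_csv("events_flat.csv", index=False)
```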
Your quickest way to convert a JSON file to tabular format: Python, DuckDB or your data warehouse provided SQL syntax
If you’re working with any sort of application data, API responses or IoT data, you’ve likely come across JSON and will continue to. However, querying and working with JSON data can be tedious, so if possible, your best bet is to convert it to CSV or another tabular format such as a Python DataFrame.
If you’re comfortable with Python, you can leverage the json library or built-in Pandas functions. To parse or flatten more complex JSON data, you may need to write some advanced scripts, or perhaps even leverage AI to help write the script for you. Alternatively, if you have access to DuckDB, you can easily read a JSON file and query it as if it were a table in a structured data warehouse. Finally, if your JSON data is embedded within your data warehouse schema, you can more than likely query it directly using your data warehouse provider’s SQL syntax, but it may require a bit of work to flatten it into a usable format.
If you’re currently wrangling a JSON file or want to see how AI can help you, you can get started with Fabi.ai for free in less than 2 minutes.