Why the IDE is not the future of SQL-based analytics

Why the IDE is not the future of SQL-based analytics

I’m just going to say it: the traditional IDE format is not great for writing queries for analytics work. I’ll start by explaining why, then tell you what you can do about it.

First, my explanation — there are two things that you really need to know when you’re writing or reading a query, and IDEs really fall short here:

  • 🗃 Data context: What are these tables and columns?
    Queries operate over tables and columns. Without understanding what these basic ingredients represent, you’ll be flying in the dark.
  • 📈 Business context: What is the purpose of this query?
    The product of SQL often isn’t the code itself — it’s the insights you gain from it, which generally live outside of your .sql files.

Let’s consider these more in-depth, then consider some solutions.

The IDE is built for code context, not data context 🗃

What’s the first thing you do when you start writing a query?select ... from ...

You write select and from, then look for tables and columns (and information about these tables and columns) to fill in the blanks.

Traditional IDEs, however, are structured to provide code context, not this flavor of data context. If you’ve ever used a modern IDE, you know you can jump to object definitions immediately. It’s fantastic. This makes it so much easier to get the kind of context you need for programming— the kind of context that’s defined in a separate file in your codebase. Along with a view of your file structure to orient you, the IDE enables you to manage codebases with hundreds of intertwined files.

But this is not what you need for SQL.

You don’t need jumps to understand your queries — you need deep documentation.
Don’t use a hammer when you need a shovel. ♠️

Unlike codebases, SQL files are [generally] standalone files with dependencies only on tables, meaning a lot of the benefit provided by reference-jumping in IDEs is greatly reduced. Your dependencies are only one level deep. On the other hand, the sort of table and column information you do need will often require depth. For instance, I’ll spend time looking for additional documentation about tables, who else used them, pipelines that reference them in Github to understand their provenance, etc. And access to this depth of information is, simply put, lacking in any of today’s SQL IDEs.

The IDE has no place for business context 📈

Let’s say you’ve figured out how to get sufficient data context to write a query. What do you do next? Typically, you’ll want to write up your work (at least if you want to have any hope of understanding the decisions you made or the insights garnered in the future).

The code is not the product. Your insights are. 💡
“But Robert, what about self-documenting code and comments?”, you might say.

In my experience, loose SQL files alone, however well-organized or well-commented, don’t quite suffice to keep track of your work. I always find myself either pushing them to git along with some markdown or keeping them in a separate note-taking tool. Why?

Because SQL is an interface between you and your data, not a traditional programming language.

In other words, SQL is almost always a means to an end, not the end itself. And consequently, this means that the code itself is never the whole story. The code, the motivation, the results — these are all necessary to understand what a query is for, and there’s precious little space in the IDE for context of this form.

Solutions

Alright. So I’ve told you that IDEs have no space for data context or business context, but I haven’t offered any solutions. In reality, there just aren’t many options out there.

  • Roll up data discovery + query IDE/SQL notebooks + doc app
    Data discovery tools fix the data context problem, and a separate doc app will address the business context problem. And this is what we recommended with our early customers, but there’s a serious problem with this: everything is disjoint, making it hard to manage and keep things in sync. Plus, I get a headache thinking about using 3 apps for a workflow that should only require one.
  • Whale
    If you’re looking for a barebones solution, check out whale, which enables you combine data context and business context using command-line data discovery and executable SQL within markdown files. Some open-source notebook-based solutions also exist, but we’ve generally found them clunky or underdeveloped (plain Jupyter notebooks + Knowledge-repo, Franchise, Querybook).

If you’re looking for a more elegant solution, we built something — a place for your data context, business context, and queries to live in beautiful harmony.

Notion, meet the SQL notebook, with deeply integrated data discovery.

We call it: Hyperquery — The Collaborative Workspace for Data Analysts

We’re building a SQL-first workspace that elegantly rolls the query editor into a delightful docs tool (think Notion), with easily accessible and deeply integrated data discovery. We’ve been working with a few early customers and given their reception, we’re confident this is the best way to write SQL. We guarantee you’ve never used anything like this. Sign up for the waitlist at hyperquery.ai — we’ll be launching our invite-only platform in the next few weeks.

Concluding remarks

The sort of contextual information we all search for when we write queries is not just a convenience. SQL queries and their constituent components need context to have ANY meaning. Without context, you’re not just wasting time, you’re being reckless! 😉

Keep this in mind the next time you write a query — get context, write docs, and you’ll have a much more sustainable workflow. And if you want a tool to make this a bit more manageable, sign up for Hyperquery here.

Tweet @imrobertyi / @hyperquery to say hi.👋
Follow us on LinkedIn. 🙂