You are an analytics machine. You take in coffee and output actionable insights. But you have one kryptonite: the infamous ad-hoc request.
“Hey can you quickly pull these numbers again…”
“Hey where was that analysis that you made…”
“Hey do you have the query for this?”
“Hey is this the table you used that one time 4 months ago?”
“Hey what does this query do again?”
This flavor of stakeholder management, we frequently lament, is an unfortunate part of the job. But I’d disagree.
When I was a data scientist at Airbnb, I compiled a list of the requests I received. Some were certainly greenfield and necessary: e.g. can you help me with a query, can you create a new metric for X, etc. But the majority felt avoidable: e.g. requests for work that I’d already done, a long tail of follow-up questions on work that I should’ve documented better, or requests for work that I knew others had already done. And the frustrating thing was that, in these cases, it shouldn’t have been necessary for me to be involved at all. I had inadvertently made myself the gatekeeper of my own work by forgetting to include context or the query used for an insight. In other cases, because work was not centralized, I was simply operating as a human store of institutional knowledge.
In this article, I’ll contend that this class of problems is largely solvable by changing how we share our work, and this can lead to a drastic cut in the number of ad-hoc requests you receive. In what follows, I’ll consider two culprits: bad practices and the tools that perpetuate them.
Our tools are inherently bad at analytics collaboration
Let’s start by discussing the shortcomings of the tools we use to share work with our stakeholders. Consider the following common complaints we as analysts often make:
- You complain that your dashboard isn’t being used (but it lacks context and it’s swimming in an ocean of other contextless dashboards).
- You complain that you shared your notebook with tons of documentation, but your stakeholders still have so many questions (but… you wrote your work in a Jupyter notebook rather than SQL, making your results inaccessible to non-technical folks).
- Your stakeholders are hounding you to refresh your work (but you neglected to provide the SQL query alongside your results, because it’s sitting in some tab in your IDE).
- You hiss at a stakeholder who’s asked you about “that analysis you did” for the 100th time (but you’re keeping your work in Google Docs at best, or as SQL queries dumped in Slack at worst, where visibility and discoverability are poor. Hell, you can’t even find it yourself).
The tools we love (dashboards, notebooks, IDEs, Google Docs) are inherently bad at collaboration in the way we need to collaborate around analytics, and this is why we built Hyperquery. Yes, stakeholders get answers to their questions when you hastily drop links to your work in Slack, but these existing modes of delivery tend to rob them of any hope of independence and agency, meaning every subsequent request relating to the work will have to go through you. If you are the only way for someone to find, understand, and modify your work, you’re going to get a lot of questions.
How to address these shortcomings: context, transparency, discoverability
Shortcomings in our tooling can be greatly mitigated by adopting stricter standards around how we deliver work. When faced with a question, our instinct as analysts is to tackle the question head-on and provide an answer as quickly as possible. But we need to provide more than just answers in the form in which we obtain them, or we risk being overrun by our inability to scale past our Slack-response velocity. Yes, we need to respond quickly and iterate closely alongside our stakeholders, but those conversations will go more smoothly if our work brings the right elements to the table:
- 📒 Context: explain the what, why, and how
Queries are useless without the context in which they were written. While it is often tempting to dump queries, dashboards, and data without business context, this will inevitably lead to follow-up questions, misinterpretation, and, generally, frustrated parties on both ends. Conversations always go more smoothly when context, assumptions, and conclusions are explicit.
- 👀 Transparency: share your query
Transparency is a necessary condition for having a conversation around an analysis. No one wants an answer to a question without visibility into how it was derived. Plots and data should never be delivered without the queries used to generate them. And work should be done in SQL to ensure frictionless, stateless reproducibility. Without transparency, stakeholders will have no ability to answer follow-up questions themselves, meaning these questions will trickle back to you, and worse, they’ll have less trust in your results, which means the impact of your work will be curtailed. SQL is easy to learn, but it is even easier to manipulate when you’re given a query to bootstrap your work with, and I have not come across a stakeholder who wasn’t eager to do the latter.
- 🔍 Discoverability: put your work somewhere others can find it
Work needs to be discoverable. The reason we get so many repeated requests is that nobody (including us!) can find our work after it’s delivered the first time. While this won’t directly help any single analysis, publishing work to a central place will reduce redundancies and open a channel for self-service in the future. The most painful part of analytics work isn’t the work itself. It’s doing the work again next week because you didn’t organize and document it the first time.
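To make the three requirements above a bit more concrete, here is a minimal Python sketch of what "delivering work" could look like if context, query, and a central home were treated as mandatory. Everything in it is hypothetical and for illustration only: `AnalysisRecord`, `AnalysisIndex`, the `signups` table, and the example text are assumptions of mine, not part of Hyperquery or any real library.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnalysisRecord:
    """One delivered analysis, bundled with what a reader needs to reuse it."""
    title: str
    context: str      # what was asked, why, and which assumptions were made
    query: str        # the exact SQL behind the result
    conclusion: str
    tags: Tuple[str, ...] = ()

    def render(self) -> str:
        """Plain-text write-up that keeps context, query, and conclusion together."""
        return "\n\n".join([
            self.title,
            "Context: " + self.context,
            "Query:\n" + self.query,
            "Conclusion: " + self.conclusion,
        ])


class AnalysisIndex:
    """A single, searchable home for delivered analyses (in memory, for the sketch)."""

    def __init__(self) -> None:
        self._records: List[AnalysisRecord] = []

    def publish(self, record: AnalysisRecord) -> None:
        # Refuse work that would force readers to come back to the author.
        if not record.context.strip() or not record.query.strip():
            raise ValueError("an analysis needs both context and a query")
        self._records.append(record)

    def search(self, keyword: str) -> List[AnalysisRecord]:
        """Case-insensitive match on title, context, or tags."""
        needle = keyword.lower()
        return [
            r for r in self._records
            if needle in r.title.lower()
            or needle in r.context.lower()
            or any(needle in t.lower() for t in r.tags)
        ]


# Hypothetical usage: the table and wording below are placeholders.
index = AnalysisIndex()
index.publish(AnalysisRecord(
    title="Weekly signups by channel",
    context="Asked by marketing to compare channels; test accounts excluded.",
    query="SELECT channel, COUNT(*) AS signups FROM signups GROUP BY channel",
    conclusion="Placeholder: state the finding and its caveats here.",
    tags=("signups", "marketing"),
))
matches = index.search("marketing")
```

The design choice worth noting is that `publish` rejects a record with no context or no query, so the standard is enforced at delivery time rather than left to memory. A real setup would store records somewhere shared and persistent, but the shape of the record is the point.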
Perhaps you’ve taken up the torch already, but I know I’ve personally dropped the ball on these standards more than once, often enough to warrant stating these requirements explicitly and imposing them as organization-wide standards. Context, transparency, and discoverability are the missing pillars of collaborative analytics. Without them, stakeholders (and other analysts) lack visibility into what you’ve done and why you did it. Without them, decapitated plots and SQL code dumps will inevitably run rampant in Slack, and you’ll be the one forced to keep it all together.
If we want to build more robust, collaboration-friendly, trustworthy analytics organizations, we need to work in a way that doesn’t hamstring our efforts. Let’s embrace better standards, work in a way that enables conversations with our stakeholders (not one-sided handoffs), and level up our profession.
Tweet @imrobertyi / @hyperquery to say hi.👋 Or follow us on LinkedIn. 🙂
If you’ve felt the pains of analytics collaboration, you’ll love Hyperquery. Hyperquery is a doc workspace where you can execute + visualize queries, giving you context, transparency, and discoverability out-of-the-box.