You’ve probably heard of the data modeling technique referred to as ‘One Big Table’ (OBT). The technique essentially revolves around preferring wide tables in order to make ingestion into BI tools easier for analysts. Less talked about is why analyst ease of use is important to data teams. Does the business care? I’m almost certain the answer to this question is a resounding “No”! The business tends to care about timely and accurate data products, not necessarily the developer experience.
Based on what I’ve seen professionally, data teams gravitate towards OBT because analysts generally do not enjoy writing SQL. If you are an analyst reading this and you DO enjoy writing SQL, congratulations! You’ll likely be a Data Engineer within a few years. But for the others, what tends to happen is they either (1) struggle with SQL, which leads to unnecessary errors, rework, and frustration, or (2) rely on the BI tool to perform their joins, which leads to performance issues. In both scenarios, the problem bubbles upstream until Data Engineering solves it with OBT. There’s nothing inherently wrong with this; however, I do think companies would benefit from having their Data Analysts upskill. Repetitive joins don’t need to be addressed with engineering time if the analysts are sufficiently capable.
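To make the trade-off concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The tables (customers, orders, obt_orders) and their columns are made up for illustration and don’t come from any real system. It shows the same question answered two ways: once with the join an analyst would write, and once against a pre-joined wide table.

```python
import sqlite3

# In-memory database with two hypothetical source tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# OBT approach: Data Engineering pre-joins the sources into one wide table.
conn.execute("""
    CREATE TABLE obt_orders AS
    SELECT o.order_id, o.amount, c.customer_id, c.region
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
""")

# Analyst-written approach: the join lives in the analyst's own query.
join_sql = """
    SELECT c.region, SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
"""

# The same question asked of the wide table needs no join at all.
obt_sql = """
    SELECT region, SUM(amount)
    FROM obt_orders
    GROUP BY region
    ORDER BY region
"""

join_result = conn.execute(join_sql).fetchall()
obt_result = conn.execute(obt_sql).fetchall()
print(join_result == obt_result)
```

Both queries return the same totals per region. The only difference is where the join logic lives: in the analyst’s query, or in the engineering job that builds the wide table.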
I think a side effect of abstracting away source system relationships is that the data product becomes a black box very quickly. When there is a data quality concern, Data Engineering becomes the only team that can address it, since all of the logic now lives in their OBT. If analysts wrote the queries themselves, triaging issues would be faster and involve fewer team members. We’re all about efficiency here 🙂 – Thanks for reading.
– DQC
Tag: philosophy
-
One Big Table (OBT) – Why?
-
The Definition of Data
What is data? This seems like a simplistic question until you really sit down to think about it. Is writing the number two on a piece of paper considered data? What about the exploding message that contains Inspector Gadget’s newest mission? Admittedly, these are a lot of questions, but that is the nature of data. Data often leads to more questions even as it provides more answers.
To me, data is defined as, “The attempt to describe reality in a format that is designed to be accessed more than once.” So, to revisit our initial questions:
Is writing the number two on a piece of paper considered data?
It probably won’t surprise you if I say, “it depends”. The piece of paper definitely meets the formatting requirement, but if writing the number two isn’t an attempt to describe reality, then it should not be considered data.
What about the exploding message that contains Inspector Gadget’s newest mission?
The mission is an attempt to describe reality by communicating what the author wants to happen. Additionally, even though the message is intended to be transient, both the author and the recipient accessed its content. So yes, the mission qualifies as data.
However, an unrecorded conversation does not meet both requirements. Information is shared, but it is not stored in an accessible format. You can argue that the human brain is an accessible format, but I’d counter with a friendly challenge – try to remember what you had for dinner on Tuesday a week ago.
Goodbye for now.
– DQC