The following question came up in a recent conversation:
“Why do you gravitate towards data-centric projects?”
I was surprised by my answer, which is why I’m sharing it. Essentially, I believe that data projects are less emotional than traditional software projects. The expected outputs can be clearly defined and are less subjective. On a data project, although there are several paths to success, they all look and feel quite similar. For a software project, success has a much wider range of variability. I’ll give an analogy – Data Engineering is like a grocery store, while Software Engineering is like a restaurant. Let me explain…
With traditional software, companies often pay six, seven, or even eight figures for projects or SaaS tools that support mission-critical business processes – think CRMs, financial systems, help desks, etc. These systems are typically customizable to fit company-specific preferences, much like a restaurant allows substitutions. And just like restaurants, user sentiment tends to be polarized. People love or hate systems like Salesforce – rarely anything in between. The same goes for McDonald’s. These industries require highly specific customer targeting. Grocery stores, on the other hand, cater more broadly and don’t rely as heavily on emotional appeal or fine-tuned experiences.
Another aspect worth considering is that Data Engineers typically work with data produced by software products. That means they’re often brought in after the fact – usually when querying data becomes slow, inconsistent, or expensive. To bring back the analogy: imagine someone who isn’t an experienced home chef but likes to eat good food. At first, they might rely heavily on restaurants or takeout. Eventually, health or financial concerns force a shift – maybe to frozen meals (think PF Chang’s in the grocery freezer), and eventually, to cooking from scratch with groceries.
Similarly, companies may start with SaaS dashboards or run ad-hoc SQL on siloed systems. But over time, data sprawl leads to confusion, inconsistencies, and frustration. That’s when data engineering becomes essential. Like cooking your own meals, it requires more effort up front, but it creates a healthier, more scalable data ecosystem in the long run.
Happy Querying
– DQC
Tag: Data Engineering
Software Engineering vs. Data Engineering – An Analogy
Cloud Data Migration – Part II (Failure)
Of course, things don’t always go as planned. To illustrate, I’ll be summarizing a failed database migration that started more than a year before I arrived on the scene. The technology stack was already selected and the mission was relatively clear, so what went wrong? Part I briefly referenced “Project Management” as a special set of skills crucial to the outcome of these types of initiatives. This is where the organization was lacking. The project lead must have the technical ability and executive authority to drive the project forward. If either of these components is lacking, the red flags are a-flapping.
Luckily for them, my involvement greatly enhanced the technical capacity of the team. Not that the existing staff weren’t capable – they just needed a little extra bandwidth. However, when it came to executive authority, we had almost none. The dedicated project manager didn’t fully grasp the nature of the work but also wouldn’t trust the team to make independent decisions. When every decision has to go through an executive (not necessarily C-suite, but someone with real project influence) who neither understands the work nor trusts the team, nothing moves. Thus was the fate of this project. We were able to get all of the infrastructure set up and configured, but the project stalled completely when it came down to migrating the business-facing reports.
My suggestion was to inventory all reporting assets, generate a time estimate per report to update connection strings and republish, then multiply the unit estimate by the number of reports. I wrote a script to handle the inventory process, estimated 1 hour per unit, and multiplied by 400 reports (the exact number escapes me). This means that 400 hours of work could be distributed across 4 developers, each committing to 10 reports per week. That translates into a 10-week project with clear goals and deliverables. Did this come to fruition? No. The individuals with executive authority decided not to execute, the project stalled indefinitely, and I left when it was clear nothing was happening any time soon.
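The estimate above is simple enough to sanity-check in a few lines. Here’s a minimal sketch of that back-of-the-envelope math, assuming the figures from the post (1 hour per report, 400 reports, 4 developers at 10 reports per week) – the variable names are my own, not from the original script:

```python
# Hypothetical migration estimate based on the figures in the post.
REPORTS = 400                 # total reporting assets found by the inventory script
HOURS_PER_REPORT = 1          # update connection strings + republish
DEVELOPERS = 4
REPORTS_PER_DEV_PER_WEEK = 10

total_hours = REPORTS * HOURS_PER_REPORT                    # total effort in hours
weekly_throughput = DEVELOPERS * REPORTS_PER_DEV_PER_WEEK   # reports completed per week
weeks = REPORTS / weekly_throughput                         # project duration

print(f"{total_hours} hours of work, ~{weeks:.0f} weeks with {DEVELOPERS} developers")
```

Nothing fancy – the point is that a unit estimate times an inventory count gives leadership a concrete, defensible timeline to approve or reject.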
The End.
So, what are the key takeaways from this?
1.) Skill is not enough sometimes. You also need authority to march forward.
2.) If you’re less technical than the team you’re leading, trust them or your success will be limited.
3.) The last 10% – 20% of the project is usually where teams get stuck.
4.) I’ll bail on a failing project when I have no authority to course correct. You should consider this if you’re able to see the red flags in real time.
5.) It’s possible to make a positive impact, even on a sinking ship. Don’t let one sour experience change your entire outlook.
Signing off…
– DQC
Impactful Datasets – Part II
The reality in the corporate world is that executives are responsible for seeing into the future. This assertion comes with no implicit mysticism. If the C-suite predicts the future correctly, shareholders, employees, and customers will generally all be positively impacted. A great example of this is Apple’s 2007 partnership with AT&T to launch the iPhone. Apple now enjoys a multi-trillion-dollar market cap, up from roughly $150B in 2007. If different choices were made by leadership along the way, the story would be wildly different.
As we move into the anticipated era of Generative AI, data will be increasingly relied upon to make better bets on infrastructure, staffing needs, R&D investments, and more. A quick web search of “data driven case studies” will show you several examples of Business Intelligence and data as the catalyst for true competitive advantages across various organizations. If a dataset can help executives make better decisions, I consider it impactful. I don’t think this will be changing anytime soon.
We can all agree that when it comes to data – garbage in / garbage out. Moving forward, ensuring data quality will be non-negotiable. If you’re an executive reading this, it’s not too late to start. Solely relying on your direct reports to interface with the data team can be risky. Vision is often lost in translation…
– DQC
Impactful Datasets – Part I
The topic of impactful datasets can generate many different opinions, especially concerning the definition of ‘impactful’. I consider an impactful dataset to be one that does one or more of the following:
1.) Saves significant amounts of time
2.) Significantly increases revenue or reduces expenses
3.) Facilitates improved executive decision making
Let’s start by unpacking the time-saving element. Far too many organizations have teams that spend hours upon hours manually updating spreadsheets. Usually this is due to the low barrier to entry – almost anyone can get started with Google Sheets or Excel. The problem is that these are tools meant for ad-hoc analysis but end up being leveraged as long-lived data repositories. Automating one of these processes can easily save between 2 and 10 hours per FTE per week. This means a department of 10 analysts could win back 100 working hours per week as a result of a carefully crafted dataset. That doesn’t mean 2.5 analysts should be terminated, but it does mean the team can focus on more strategic work with the extra time.
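The numbers above are easy to verify. A minimal sketch, assuming the post’s figures (10 analysts, savings at the top of the 2–10 hour range, and a standard 40-hour work week, which is my assumption):

```python
# Rough savings estimate for automating a manual spreadsheet process.
analysts = 10
hours_saved_per_analyst = 10   # top of the 2-10 hours/week range from the post

weekly_hours_reclaimed = analysts * hours_saved_per_analyst
fte_equivalent = weekly_hours_reclaimed / 40   # assumes a 40-hour work week

print(f"{weekly_hours_reclaimed} hours/week reclaimed ≈ {fte_equivalent} FTEs")
```

That 2.5-FTE figure is where the “redirect to strategic work” framing comes from – the capacity exists whether or not headcount changes.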
Let’s now discuss revenue-increasing and cost-saving datasets. The rise of LLMs has shown us that data in the right format can be extremely valuable. Quite literally, an industry has been formed on the fact that you can collect large amounts of data from the internet, feed it into machine learning algorithms, then store this condensed knowledge base as a dataset of model weights – an LLM. Developers and consumers alike are happy to pay for tokens, subscriptions, and other derived services, which is essentially paying for access to valuable datasets. The flip side of this phenomenon points to the cost-saving potential of the trend. Let’s face it, the current developer job market is stagnant because companies can produce a similar output with fewer AI-assisted developers. Even though long-term code quality is questionable, short-term savings are definitely being driven by LLMs.
We’ll cover executive decision making in the next one…
– DQC