The following question came up in a recent conversation:
“Why do you gravitate towards data-centric projects?”
I was surprised by my answer, which is why I’m sharing it. Essentially, I believe that data projects are less emotional than traditional software projects. The expected outputs can be clearly defined and are less subjective. On a data project, although there are several paths to success, they all look and feel quite similar. For a software project, success has a much wider range of variability. I’ll give an analogy – Data Engineering is like a grocery store, while Software Engineering is like a restaurant. Let me explain…
With traditional software, companies often pay six, seven, or even eight figures for projects or SaaS tools that support mission-critical business processes – think CRMs, financial systems, help desks, etc. These systems are typically customizable to fit company-specific preferences, much like a restaurant allows substitutions. And just like restaurants, user sentiment tends to be polarized. People love or hate systems like Salesforce – rarely anything in between. The same goes for McDonald’s. These industries require highly specific customer targeting. Grocery stores, on the other hand, cater more broadly and don’t rely as heavily on emotional appeal or fine-tuned experiences.
Another aspect worth considering is that Data Engineers typically work with data produced by software products. That means they’re often brought in after the fact – usually when querying data becomes slow, inconsistent, or expensive. To bring back the analogy: imagine someone who isn’t an experienced home chef but likes to eat good food. At first, they might rely heavily on restaurants or takeout. Eventually, health or financial concerns force a shift – maybe to frozen meals (think PF Chang’s in the grocery freezer), and eventually, to cooking from scratch with groceries.
Similarly, companies may start with SaaS dashboards or run ad-hoc SQL on siloed systems. But over time, data sprawl leads to confusion, inconsistencies, and frustration. That’s when data engineering becomes essential. Like cooking your own meals, it requires more effort up front, but it creates a healthier, more scalable data ecosystem in the long run.
Happy Querying
– DQC
Tag: Data Engineering
Software Engineering vs. Data Engineering – An Analogy
Cloud Data Migration – Part II (Failure)
Of course, things don’t always go as planned. To illustrate, I’ll be summarizing a failed database migration that started more than a year before I arrived on the scene. The technology stack was already selected and the mission was relatively clear, so what went wrong? Part I briefly referenced “Project Management” as a special set of skills crucial to the outcome of these types of initiatives. This is where the organization was lacking. The project lead must have the technical ability and executive authority to drive the project forward. If either of these components is lacking, the red flags are a-flapping.
Luckily for them, my involvement greatly enhanced the technical capacity of the team. Not that the existing staff weren’t capable – they just needed a little extra bandwidth. However, when it came to executive authority, we had almost none. The dedicated project manager didn’t fully grasp the nature of the work but also wouldn’t trust the team to make independent decisions. When every decision has to go through an executive (not necessarily C-suite, but someone with real project influence) who neither understands the work nor trusts the team, nothing moves. Thus was the fate of this project. We were able to get all of the infrastructure set up and configured, but the project stalled completely when it came down to migrating the business-facing reports.
My suggestion was to inventory all reporting assets, generate a time estimate per report to update connection strings and republish, then multiply the unit estimate by the number of reports. I wrote a script to handle the inventory process, estimated 1 hour per unit, and multiplied by 400 reports (the exact number escapes me). This means that 400 hours of work could be distributed across 4 developers, each committing to 10 reports per week. That translates into a 10-week project with clear goals and deliverables. Did this come to fruition? No. The individuals with executive authority decided not to execute, the project stalled indefinitely, and I left when it was clear nothing was happening any time soon.
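The estimate above is simple enough to sanity-check in a few lines. Here’s a minimal sketch of that back-of-the-envelope math, assuming the figures from the post (1 hour per report, 400 reports, 4 developers at 10 reports per week) – the variable names are my own, not from the original script:

```python
# Hypothetical migration estimate based on the figures in the post.
REPORTS = 400                 # total reporting assets found by the inventory script
HOURS_PER_REPORT = 1          # update connection strings + republish
DEVELOPERS = 4
REPORTS_PER_DEV_PER_WEEK = 10

total_hours = REPORTS * HOURS_PER_REPORT                    # total effort in hours
weekly_throughput = DEVELOPERS * REPORTS_PER_DEV_PER_WEEK   # reports completed per week
weeks = REPORTS / weekly_throughput                         # project duration

print(f"{total_hours} hours of work, ~{weeks:.0f} weeks with {DEVELOPERS} developers")
```

Nothing fancy – the point is that a unit estimate times an inventory count gives leadership a concrete, defensible timeline to approve or reject.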
The End.
So, what are the key takeaways from this?
1.) Skill is not enough sometimes. You also need authority to march forward.
2.) If you’re less technical than the team you’re leading, trust them or your success will be limited.
3.) The last 10% – 20% of the project is usually where teams get stuck.
4.) I’ll bail on a failing project when I have no authority to course correct. You should consider this if you’re able to see the red flags in real time.
5.) It’s possible to make a positive impact, even on a sinking ship. Don’t let one sour experience change your entire outlook.
Signing off…
– DQC
Impactful Datasets – Part II
The reality in the corporate world is that executives are responsible for seeing into the future. This assertion comes with no implicit mysticism. If the C-suite predicts the future correctly, shareholders, employees, and customers will generally all be positively impacted. A great example of this is Apple’s 2007 partnership with AT&T to launch the iPhone. Apple now enjoys a multi-trillion-dollar market cap, up from roughly $150B in 2007. If different choices were made by leadership along the way, the story would be wildly different.
As we move into the anticipated era of Generative AI, data will be increasingly relied upon to make better bets on infrastructure, staffing needs, R&D investments, and more. A quick web search of “data driven case studies” will show you several examples of Business Intelligence and data as the catalyst for true competitive advantages across various organizations. If a dataset can help executives make better decisions, I consider it impactful. I don’t think this will be changing anytime soon.
We can all agree that when it comes to data – garbage in / garbage out. Moving forward, ensuring data quality will be non-negotiable. If you’re an executive reading this, it’s not too late to start. Solely relying on your direct reports to interface with the data team can be risky. Vision is often lost in translation…
– DQC
Impactful Datasets – Part I
The topic of impactful datasets can generate many different opinions, especially concerning the definition of ‘impactful’. I consider an impactful dataset to be one that does one or more of the following:
1.) Saves significant amounts of time
2.) Significantly increases revenue or reduces expenses
3.) Facilitates improved executive decision making
Let’s start by unpacking the time-saving element. Far too many organizations have teams that spend hours upon hours manually updating spreadsheets. Usually this is due to the low barrier to entry – almost anyone can get started with Google Sheets or Excel. The problem is that these are tools meant for ad-hoc analysis but end up being leveraged as long-lived data repositories. Automating one of these processes can easily save between 2 and 10 hours per FTE per week. This means a department of 10 analysts could win back 100 working hours per week as a result of a carefully crafted dataset. That doesn’t mean 2.5 analysts should be terminated, but it does mean the team can focus on more strategic work with the extra time.
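The numbers above are easy to verify. A minimal sketch, assuming the post’s figures (10 analysts, savings at the top of the 2–10 hour range, and a standard 40-hour work week, which is my assumption):

```python
# Rough savings estimate for automating a manual spreadsheet process.
analysts = 10
hours_saved_per_analyst = 10   # top of the 2-10 hours/week range from the post

weekly_hours_reclaimed = analysts * hours_saved_per_analyst
fte_equivalent = weekly_hours_reclaimed / 40   # assumes a 40-hour work week

print(f"{weekly_hours_reclaimed} hours/week reclaimed ≈ {fte_equivalent} FTEs")
```

That 2.5-FTE figure is where the “redirect to strategic work” framing comes from – the capacity exists whether or not headcount changes.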
Let’s now discuss revenue-increasing and cost-saving datasets. The rise of LLMs has shown us that data in the right format can be extremely valuable. Quite literally, an industry has been formed on the fact that you can collect large amounts of data from the internet, feed it into machine learning algorithms, then store this condensed knowledge base as a dataset of model weights – an LLM. Developers and consumers alike are happy to pay for tokens, subscriptions, and other derived services, which is essentially paying for access to valuable datasets. The flip side of this phenomenon points to the cost-saving potential of the trend. Let’s face it, the current developer job market is stagnant because companies can produce a similar output with fewer AI-assisted developers. Even though long-term code quality is questionable, short-term savings are definitely being driven by LLMs.
We’ll cover executive decision making in the next one…
– DQC