The reality in the corporate world is that executives are responsible for seeing into the future. There’s no mysticism implied here. If the C-suite predicts the future correctly, shareholders, employees, and customers will generally all benefit. A great example of this is Apple’s 2007 partnership with AT&T to launch the iPhone. Apple now enjoys a multi-trillion-dollar market cap, up from roughly $150B in 2007. Had leadership made different choices along the way, the story would be wildly different.
As we move into the anticipated era of Generative AI, organizations will rely on data more and more to make better bets on infrastructure, staffing needs, R&D investments, and more. A quick web search for “data-driven case studies” will turn up several examples of Business Intelligence and data serving as the catalyst for genuine competitive advantage. If a dataset can help executives make better decisions, I consider it impactful. I don’t expect that to change anytime soon.
We can all agree that when it comes to data, it’s garbage in, garbage out. Moving forward, ensuring data quality will be non-negotiable. If you’re an executive reading this, it’s not too late to start. Relying solely on your direct reports to interface with the data team can be risky; vision is often lost in translation…
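Since “garbage in” is the whole problem, here’s a minimal sketch of the kind of automated checks that can catch bad records before they reach a report or an executive’s desk. It assumes records arrive as a list of Python dictionaries; the `check_quality` function, the `order_id` key, and the sample rows are all hypothetical, and real pipelines would check far more than missing fields and duplicate IDs.

```python
from collections import Counter


def check_quality(records, key, required):
    """Return a list of human-readable data quality issues.

    records:  list of dicts, one per row
    key:      field that should be unique (e.g. an ID column)
    required: fields that must be present and non-empty
    """
    issues = []
    # Flag required fields that are missing, None, or empty strings.
    for i, row in enumerate(records):
        for field in required:
            if row.get(field) in (None, ""):
                issues.append(f"row {i}: missing {field}")
    # Flag key values that appear more than once.
    counts = Counter(row.get(key) for row in records)
    for value, n in counts.items():
        if n > 1:
            issues.append(f"duplicate {key}: {value!r} appears {n} times")
    return issues


# Hypothetical sample data with one empty field, one null, and a duplicate ID.
orders = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "", "amount": 75.5},
    {"order_id": 2, "region": "West", "amount": None},
]

for issue in check_quality(orders, key="order_id", required=["region", "amount"]):
    print(issue)
```

Running checks like these on every load, rather than eyeballing a spreadsheet, is what turns data quality from a hope into a habit.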
– DQC
Author: Data Quality Consultants
-
Impactful Datasets – Part II
-
Impactful Datasets – Part I
The topic of impactful datasets can generate many different opinions, especially concerning the definition of ‘impactful’. I consider an impactful dataset to be one that does one or more of the following:
1.) Saves significant amounts of time
2.) Significantly increases revenue or reduces expenses
3.) Facilitates improved executive decision making
Let’s start by unpacking the time-saving element. Far too many organizations have teams that spend hours upon hours manually updating spreadsheets. Usually this is due to the low barrier to entry: almost anyone can get started with Google Sheets or Excel. The problem is that these tools are meant for ad-hoc analysis but end up being used as long-lived data repositories. Automating one of these processes can easily save 2-10 hours per FTE per week. At the upper end of that range, a department of 10 analysts could win back 100 working hours per week thanks to a carefully crafted dataset, roughly the capacity of 2.5 full-time analysts. That doesn’t mean 2.5 analysts should be let go, but it does mean the team can focus on more strategic work with the extra time.
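The arithmetic behind those numbers is simple, but spelling it out shows the full range rather than just the best case. This is a quick sketch that assumes a 40-hour work week (the figure implied by the 2.5-analyst estimate); the function names are made up for illustration.

```python
HOURS_PER_FTE_WEEK = 40  # assumption: a standard 40-hour work week


def hours_reclaimed(analysts, hours_saved_per_analyst):
    """Weekly hours won back across a team."""
    return analysts * hours_saved_per_analyst


def fte_equivalent(hours):
    """Express weekly hours as full-time equivalents."""
    return hours / HOURS_PER_FTE_WEEK


# Low and high ends of the 2-10 hours per FTE per week range, for 10 analysts.
for saved in (2, 10):
    total = hours_reclaimed(10, saved)
    print(f"{saved} h/analyst: {total} h/week = {fte_equivalent(total)} FTE")
```

Even the low end (20 hours, or half an FTE) is a meaningful amount of time to redirect toward strategic work.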
Let’s now discuss revenue-increasing and cost-saving datasets. The rise of LLMs has shown us that data in the right format can be extremely valuable. Quite literally, an industry has formed around the fact that you can collect large amounts of data from the internet, feed it into Machine Learning algorithms, and then store this condensed knowledge in what amounts to a dataset: an LLM. Developers and consumers alike are happy to pay for tokens, subscriptions, and other derived services, which essentially means paying for access to valuable datasets. The flip side of this phenomenon is its cost-saving potential. Let’s face it, the current developer job market is stagnant because companies can produce similar output with fewer AI-assisted developers. Even though long-term code quality is questionable, LLMs are definitely driving short-term savings.
We’ll cover executive decision making in the next one…
– DQC -
The Definition of Data
What is data? This seems like a simple question until you really sit down to think about it. Is writing the number two on a piece of paper considered data? What about the exploding message that contains Inspector Gadget’s newest mission? Admittedly, these questions are trickier than they look, but that is the nature of data. Data often raises new questions even as it answers others.
I define data as “the attempt to describe reality in a format that is designed to be accessed more than once.” So, to revisit our initial questions:
Is writing the number two on a piece of paper considered data?
It probably won’t surprise you if I say, “it depends”. The piece of paper definitely meets the formatting requirement, but if writing the number two isn’t an attempt to describe reality, then it should not be considered data.
What about the exploding message that contains Inspector Gadget’s newest mission?
The mission is an attempt to describe reality by communicating what the author wants to happen. Additionally, even though the message is intended to be transient, both the author and the recipient accessed its content, so it was accessed more than once. So yes, the mission qualifies as data.
However, an unrecorded conversation meets only the first of these two requirements. Information is shared, but it is not stored in an accessible format. You can argue that the human brain is an accessible format, but I’d counter with a friendly challenge: try to remember what you had for dinner on Tuesday of last week.
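Because the definition is really a two-part test where both parts must pass, it can be written down as a tiny predicate. This is a toy sketch; the `Candidate` type and the true/false flags are illustrative judgments about the three examples above (treating the number two as a doodle rather than a record of anything), not an objective classification.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    describes_reality: bool      # Is it an attempt to describe reality?
    accessible_repeatedly: bool  # Is it stored so it can be accessed more than once?


def is_data(c: Candidate) -> bool:
    # Both criteria must hold for something to count as data.
    return c.describes_reality and c.accessible_repeatedly


examples = [
    Candidate("Number two doodled on paper", False, True),
    Candidate("Inspector Gadget's mission message", True, True),
    Candidate("Unrecorded conversation", True, False),
]

for c in examples:
    print(f"{c.name}: {'data' if is_data(c) else 'not data'}")
```

Note that the conversation fails on only one criterion, which is enough to rule it out.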
Goodbye for now.
– DQC -
Thoughts on AI – v0
The elephant in the room is Artificial Intelligence (AI). Is it just media hype, or will AI actually change the world in ways that are difficult to fully grasp? I won’t pretend to know for sure, but I’ve definitely got an opinion to share.
First, the hype train is real, but there is substance to this current iteration of what we know as AI. AI can write content, AI can generate photos, AI can produce videos, and yes, AI can write code. But will all of this be enough to replace humans with robots? I think not. What’s happening now is that individuals in creative spaces are using these tools to be more productive and less bogged down in historically tedious tasks. This does spell trouble for newly minted professionals; however, they have an equal opportunity to adapt and try to level the playing field.
In the short term, many companies will try to streamline their operations by lowering headcount and introducing AI assistance. This will lead to fewer jobs across many industries, and we’re already seeing this trend in the marketplace. However, I believe that in the long term, the talent pools that have been shut out of corporate opportunities will find a way to pool their creativity and shift the paradigm. Think Bitcoin, but on a larger scale.
I encourage you to read this article by the creator of Open WebUI. I think we’ll see more of this mindset find its way into the mainstream rather than staying reserved for techies on the bleeding edge of LLM ecosystem development. Part of the reason for this blog is to make sure the human element continues to define our work. When you think about it, much of the data created nowadays serves as a proxy for how humans behave, think, and feel.
The ‘digit’ in digital comes from the Latin digitus, meaning finger: a literal reference to the human hand. Marinate on that.
– DQC -
Tools vs. Fundamentals
There’s constant chatter amongst data professionals about what tools to use to get the job done. I could spend all day rattling them off: Snowflake, Spark, Databricks, BigQuery, Kafka, dbt, SQLMesh, and the list goes on and on… But here’s the real question: when’s the last time you heard a discussion about getting back to the basics?
Personally, I think the fundamentals are what separate average performers from “10x” Data Engineers. It’s generally not the tooling that makes the big difference. I’ve seen more business problems solved with a relational database and SQL queries than with the latest and greatest of the modern data stack. Better tools can actually mask poor decision making, until they can’t any longer. That’s the pivotal moment when you’ll be forced to get back to the basics anyway.
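To ground the point, here’s a minimal sketch of the kind of everyday business question a plain relational database answers with a single SQL query and no new tooling. It uses Python’s built-in `sqlite3` module with an in-memory database so it runs anywhere; the `orders` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3


def revenue_by_region(conn):
    """Return (region, order_count, revenue) rows, highest revenue first."""
    query = """
        SELECT region, COUNT(*) AS order_count, SUM(amount) AS revenue
        FROM orders
        GROUP BY region
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


# In-memory database so the sketch is self-contained; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("East", 120.0), ("West", 75.5), ("East", 30.0), ("North", 210.0)],
)

rows = revenue_by_region(conn)
for region, order_count, revenue in rows:
    print(f"{region}: {order_count} orders, {revenue:.2f} revenue")
conn.close()
```

The same `GROUP BY` pattern works essentially unchanged on SQL Server, Postgres, or a cloud warehouse, which is exactly why the fundamentals travel further than any one tool.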
Do your stakeholders really care that you migrated to Databricks from a SQL Server database? Probably not. Adopting new tools can actually increase time to insights due to the resulting learning curve, which is largely unavoidable. So, while the data team is happily exploring new technologies and padding their resumés, the business isn’t making any real strides from a data perspective. I encourage my peers to focus on communication, effective requirements gathering, customer service, and actually understanding what the business needs so we can deliver and make an impact.
Even though we are usually in a support role, it’s amazing what a motivated data professional can accomplish when focusing on the fundamentals instead of shiny objects. Rant over. Thanks for reading.
– DQC