This article originally appeared on Solutions Review’s Insight Jam, an enterprise IT community enabling the human conversation on AI.
For as long as there have been computers, we’ve heard the warning “Garbage in – Garbage out!” whenever someone expected great outcomes from poor data. But with the advent of generative AI (GAI) applications, many were hopeful we had finally unshackled ourselves from poor-quality data undermining the benefits of technology deployments.
Surely, this amazing new technology that could learn to find answers from a single example, that could spot and summarize key issues in massive documents in seconds, could also weed through the mess that describes much, if not most, of our data storage. Like Mickey Mouse as the Sorcerer’s Apprentice, we would wave our GAI wand over our cauldron of data and out would come the insights and value long hidden in that dark brew of duplicated, mismatched, incomplete, and poorly OCR’d files. True magic! But is it?
A recent study by Gartner suggests that at least 30 percent of GAI projects will be abandoned after proof of concept by the end of 2025 due to poor data quality, inadequate risk controls, escalating costs or unclear business value. Oh my, that doesn’t sound like magic in the making.
So why is GAI still stymied by poor data quality, and what should we be doing about it?
“Garbage in – Garbage out!” manifests in three primary ways for GAI. We see it first as bias and hallucination in the output when we allow the GAI to wander untamed data sources in search of answers. Uncontrolled access yields unpredictable and likely undesirable outcomes. We see it again in the quality of the answers we get when our questions or prompts lack the direction or sophistication to deliver true value. Your answers are only as good as the questions you ask. Finally, we see it as incomplete or inaccurate responses when we point GAI toward in-house data sources that remain full of duplicates, poor-quality images, and incomplete or improperly associated files.
Anyone planning to undertake a GAI project needs to address all three of these data quality pitfalls if they expect to achieve basic objectives, much less the often-lofty expectations that flow from this new technology. The first pitfall, bias from untamed resource access, can be controlled by understanding the application you are using and how it queries the LLM behind your actions. Responsible vendors constrain what their applications search in order to get responsible outputs. Beware of simply using an unfettered ChatGPT interface.
The second pitfall, garbage from poor prompting, is avoided through education. Anyone operating in this new GAI environment will need to invest in their team’s skills. Basic courses are available to help users avoid mistakes and get better outcomes from their prompts. For important projects, you can employ a partner organization whose services teams have already been trained and have practiced creating effective prompts. A partner can work within your environment to help you optimize your GAI capabilities without burdening your teams.
Finally, there is the age-old need for data cleansing. While GAI may learn to find a particular topic faster than traditional machine learning AI, it will still struggle with poorly imaged documents, with telling duplicates apart, and with associating disparate files to produce complete and accurate answers. Most organizations have less-than-stellar data hygiene. A partner can help you clean up that data in cost-manageable chunks, prioritizing high-value or most frequently queried content. Over time, a minimal but steady investment in data clean-up and storage hygiene will pay dividends in more effective output from your upcoming GAI initiatives.
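For teams that want a feel for what a first clean-up pass looks like, here is a minimal sketch in Python, not a recommended tool. It assumes the documents sit in an ordinary folder tree, and the function names `file_digest` and `find_exact_duplicates` are invented for this illustration. It flags only byte-for-byte identical copies; near-duplicates, rescans, and poorly OCR’d versions of the same document need more sophisticated matching.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("shared_drive"))`, where `shared_drive` is a placeholder folder name, returns groups of identical files that a reviewer can then consolidate, starting with the most frequently queried content.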
We all want the magic, but as any magician will tell you, there is a trick to success behind the show-stopping finish. Investments in your skills, your questions, and your data quality are the required elements of GAI magic. Partners are available to help with these elements. Don’t wait to get your success pieces in place. Foundational investments now will yield dividends when key GAI initiatives take off. And remember, “Garbage in – Garbage out!” still rules.