Originally published on LinkedIn on Apr. 17, 2024
It’s an exciting time for many organizations. After struggling for years to navigate massive, siloed stores of information, they see a way forward with Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) systems, and generative AI. Finally, technology is available that can ingest the content from disparate sources, quickly understand it, and provide cogent answers to their users’ and employees’ far-reaching questions.
As they start implementing these solutions, however, organizations are running into an unexpected hurdle. LLMs often have the same problem as humans—they cannot find the right information. RAG systems help, but humans still need to train the system. People must sort good answers from bad to help technology return the right results. As teams investigate why the technology is returning suboptimal results, they find themselves forced to confront the massive backlog of data and content that they were trying to avoid.
In this article, I suggest that while technologies like LLMs, RAGs, and generative AI are extremely valuable to organizations trying to make hefty bodies of information easier to access and understand, they are only half the equation. To get the best return on investment (ROI), organizations must also invest in their content development teams.
Now is the right time for generative AI solutions. Study after study shows that workers are struggling with the amount of content they need to navigate, either because of the sheer volume of information, because it is so spread out, or because it’s in too many applications. For example:
A well-designed generative AI solution, powered by an LLM (and often a RAG system), can help with all these issues. Users can drastically reduce the need to ping-pong between multiple sources and open disparate documents to find answers. Instead, they can go to a single portal to get answers to their wide-ranging questions and explore the nuances of the subject through a conversational interface.
As companies evaluate AI solutions, they are frequently surprised that they must evaluate the quality of their information as well. Although it may seem obvious, many teams might not fully grasp the relationship between their content quality and AI results. For instance, LLMs cannot surface information that is not there. And if the underlying corpus has a glut of low-quality content, it will take considerable work to make sure that the LLM doesn’t expose it to users.
A good deal of AI research seems to center on the idea that the valuable information is already there, and we only need to find it. For instance, in their 2023 paper, Hugo Touvron et al. tell us “Quality is all you need,” and then describe how they achieved this with Llama 2. The process involves evaluating millions of supervised fine-tuning (SFT) examples and narrowing them down to a much smaller set of higher-quality examples to achieve better results. While impressive, this process takes more effort than some companies want to take on.
This reality is reflected in ClearML’s 2023 report analyzing the costs of implementing generative AI solutions in enterprise environments. In its survey of 1,000 executives across AI, machine learning, engineering, IT, and data science, respondents expected just 13% of their budgets to go toward data preparation. (No category was included for generating missing data, and content isn’t addressed at all.)
In their 2024 paper, Li et al. advance this work with an Instruction Following Difficulty (IFD) metric. Developers can use the metric to automatically measure the quality of content (reducing the need for human verification) and to reduce the amount of content that needs to be included in an LLM. The idea here is twofold: AI can automatically determine content quality, and then, using that determination, it can drastically narrow down how much content is included in the LLM.
This model is probably much closer to what many companies are hoping for—an algorithm that can both find the needle in the haystack and set it aside for future efforts. However, it still assumes the relevant content is there, and we simply need the right algorithm to find it.
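To make the idea concrete, here is a minimal sketch of how an IFD-style score might be computed with an open-source causal language model. It assumes the Hugging Face transformers library and uses "gpt2" as a placeholder model; the sample data is my own illustration, and the calculation is an approximation of the concept rather than a reimplementation of Li et al.’s exact metric.

```python
# IFD-style scoring sketch: compare how well the model predicts an answer
# with and without its instruction. A score near 1.0 means the instruction
# adds little; lower scores mean the instruction strongly guides the answer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def answer_loss(prefix: str, answer: str) -> float:
    """Average cross-entropy of the answer tokens, optionally conditioned on a prefix."""
    answer_ids = tokenizer(answer, return_tensors="pt").input_ids
    if prefix:
        prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
        input_ids = torch.cat([prefix_ids, answer_ids], dim=1)
        answer_start = prefix_ids.shape[1]
    else:
        input_ids = answer_ids
        answer_start = 0
    labels = input_ids.clone()
    labels[:, :answer_start] = -100  # exclude the prefix tokens from the loss
    with torch.no_grad():
        return model(input_ids, labels=labels).loss.item()

def ifd_score(instruction: str, answer: str) -> float:
    """Ratio of the conditioned answer loss to the unconditioned answer loss."""
    return answer_loss(instruction, answer) / answer_loss("", answer)

# Illustrative sample; in practice this would run over an entire SFT dataset.
sample = {
    "instruction": "Summarize the refund policy in one sentence.",
    "output": "Customers may return unused items within 30 days for a full refund.",
}
print(f"IFD-style score: {ifd_score(sample['instruction'], sample['output']):.2f}")
```

Run over a whole dataset, scores like this let a team rank samples and keep only the slice that is most useful for fine-tuning, which is the narrowing-down step described above.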
In his 2018 book, Infonomics, Douglas B. Laney suggests that although most organizations create a huge amount of information, few actually value it. Executives do not prioritize content because information isn’t an explicit line item on their balance sheets. They cannot see the ROI.
Ironically, many of these organizations are overwhelmed with content specifically because they undervalue it. They have lax standards for who is allowed to create and post assets, and a dearth of people to review those assets for accuracy and keep them up to date once created. The result is a high volume of low-quality, poorly maintained information.
This type of thinking often creates a culture where content teams are underfunded, and content leaders are forced to make detrimental tradeoffs. They employ mitigation strategies that solve short-term problems but create long-term issues, as described below:
Tradeoffs are not unique to content teams, and in well-run environments, content leaders use blended techniques to mitigate the risks. Outside stakeholders, however, might be surprised by the results; it’s entirely possible they had not thought through the downstream implications for RAG corpora. With training, AI can overcome some of these issues, but not all.
While some teams might be discouraged by this situation, others will see it as an opportunity. KPMG reports that 80% of C-suite and business leaders believe generative AI is important to maintaining a competitive advantage and gaining market share. The same study shows that leaders are looking to improve their ROI (as opposed to simply experimenting with the technology). Savvy content leaders should use this moment to directly associate their teams’ work with business-critical initiatives.
As content leaders look for these opportunities, they should specifically try to attach themselves to RAG-based projects. AI teams often improve the accuracy of off-the-shelf LLM solutions by cross-checking the answers against a corpus of organization-specific content and data (“the RAG”). These solutions then power conversational chatbots for use by customers and employees. Content teams are natural members of these projects since their work constitutes much of the RAG baseline.
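For readers less familiar with the pattern, the sketch below shows the basic shape of that cross-checking step: retrieve the most relevant organization-specific passages, then hand them to the LLM as grounding for its answer. The sample documents, the TF-IDF retrieval, and the prompt wording are illustrative assumptions on my part, not a description of any particular product.

```python
# Minimal RAG retrieval sketch. TF-IDF keeps the example self-contained;
# production systems typically use dense embeddings and a vector database.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Organization-specific content: the corpus the content team creates and maintains.
documents = [
    "Employees accrue 1.5 vacation days per month of service.",
    "Expense reports must be submitted within 30 days of purchase.",
    "The VPN client is required for all remote connections.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the passages most similar to the question."""
    question_vector = vectorizer.transform([question])
    scores = cosine_similarity(question_vector, doc_vectors)[0]
    best = scores.argsort()[::-1][:top_k]
    return [documents[i] for i in best]

def build_prompt(question: str) -> str:
    """Assemble the grounded prompt that would be sent to the LLM."""
    context = "\n".join(retrieve(question))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("How many vacation days do I earn each month?"))
```

Notice that the answer can only be as good as the retrieved passages; if they are outdated, missing, or contradictory, the chatbot’s response will be too.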
As part of this process, development teams will probably scrutinize the content more deeply than they ever have before. RAG developers and test teams are likely to find outdated pages, inaccurate information, missing content, or conflicting information on different parts of the site. In addition to working with the appropriate stakeholders to fix these issues, content development teams should also use this opportunity to track their overall impact on the project and time spent supporting it.
As previously noted, the effort to improve the underlying content in the RAG might surprise those who are funding the project. The following metrics can help content leaders assess the impact of their teams’ efforts and potentially secure ongoing funding to maintain the quality of the generative AI results.
The technology behind LLMs, RAGs, and generative AI is changing quickly, and many content professionals fear these changes will make them redundant. For those who have worked in underfunded departments, it’s an easy conclusion to draw. I would suggest, however, that content professionals will remain relevant for the foreseeable future.
In corporate environments, LLMs, RAGs, and generative AI have the potential to radically improve employee efficiency and customer satisfaction. Users will be able to easily access the information they need and engage in conversations that help them understand it. But these solutions aren’t built on technology alone. They also require a solid foundation of useful content, created and managed in large part by content professionals.