Improving the Performance of AI Models with Better Prompts and Better Data
In the constantly evolving generative AI sector, companies compete to build ever more relevant and consistent models. After all, there’s little point in having a theoretically high-performing model that requires endless re-tuning to achieve a satisfactory output.
Our extensive experimentation with generative AI has resulted in a host of very different client use cases and proofs of concept. Each project extends the potential and flexibility of our AI models. However, build processes have not been without their challenges. Despite their expansive creativity, generative AI models sometimes struggle to remain relevant and consistent in a business setting.
Take, for instance, the poor performance of AI decision-making tools during the COVID-19 pandemic, as reported in Harvard Business Review: “The ultimate impact of such incomplete and poor-quality data was that it resulted in poor predictions, making the AI decision tools unreliable and untrustworthy.”
Or consider the example of corporate profiling within investment banking. This task demands a specific, standardized approach where key financial and growth metrics must always be present.
The challenge in this particular use case lies in ensuring that AI, when tasked with generating a company profile, adheres to these standards. This attention to detail provides the client with the confidence that the resulting profiles can be consistently compared and assessed.
Which brings us to the billion-dollar question: how do you enhance the relevance and consistency of model outputs, given the huge range of tasks generative AI is expected to fulfil?
Data and Prompts: A Two-Pronged Approach
The pursuit of increased relevance and consistency in model outputs has led us to develop two extensive technical resources: our proprietary data corpus and our prompt library.
We are in the initial stages of exploring LLM fine-tuning, which we’ll come to later in this piece. Our current focus is ensuring our AI models pull from high-quality, relevant data and employ standardized prompts to maintain output quality and relevance.
Let’s examine both those elements in more detail.
Proprietary Data Corpus
Controlling the data corpus is paramount. It ensures that the answers generated by AI models stay relevant to the specific industry, field, or function. Take, for example, use cases like providing competitive intelligence to professional service firms, something we’ve mastered. For this, we use our proprietary Insightsfirst platform.
Insightsfirst tracks business competitors. It’s a rich source of high-quality data that we’ve vectorized, converting it into embeddings so that semantically relevant content can be retrieved quickly. We’ve designed this data source to be far more reliable than drawing on the vast, unvetted data available on the web.
By ensuring that the data corpus is consistent, complete, thorough, specific, accurate and accessible, the process of insight retrieval becomes significantly easier and more productive.
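Insightsfirst’s internals are proprietary, but the retrieval pattern described above can be sketched in miniature. The toy Python below substitutes bag-of-words term-frequency vectors for real learned embeddings; `VectorCorpus`, `embed`, and the sample documents are illustrative assumptions, not production code.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector.
    A real system would use a learned embedding model instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorCorpus:
    """A minimal in-memory vector store over a curated document set."""
    def __init__(self, documents):
        # Vectorize once at ingestion so queries only embed the query text.
        self.docs = [(doc, embed(doc)) for doc in documents]

    def retrieve(self, query, k=1):
        """Return the k documents most similar to the query."""
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]

# Hypothetical curated snippets standing in for a vetted corpus.
corpus = VectorCorpus([
    "Acme Corp reported 12% revenue growth in fiscal 2023.",
    "Competitor pricing moved downward across the cloud services sector.",
    "New data privacy regulation takes effect in the EU next quarter.",
])
print(corpus.retrieve("revenue growth metrics for Acme", k=1))
```

Because the corpus is curated and vectorized up front, every answer is grounded in vetted material rather than whatever happens to rank highly on the open web.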
Domain and Use Case Specific Prompt Library
In addition to collating a controlled data corpus, our domain experts have generated prompt libraries. These libraries are based on standardized use cases to ensure output consistency.
Consider the investment bank use case again. Such entities often create pitch decks or proposals to win new business. Typical questions they are expected to answer when writing such proposals might include “What does your company do?” or “What are some case studies you’ve worked on?”
In other words, the process can readily be templated. There’s little value in giving a generative AI a vague prompt like “create a case study for my company” and then spelling out, piece by piece, what information to include or exclude. That approach is cumbersome and hit-and-miss.
Instead, Evalueserve has established a library of prompt templates such as sector analyses, company profiles, or use case analyses, which can be drawn upon when creating a specific document. This ensures a consistent and targeted approach, while delivering results in a timely manner.
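To illustrate the idea, a prompt library can be as simple as named templates with required slots that are validated before a prompt ever reaches the model. Everything below, including the template names, wording, and the `build_prompt` helper, is a hypothetical sketch rather than Evalueserve’s actual library.

```python
# Hypothetical prompt library: standardized templates keyed by use case.
PROMPT_LIBRARY = {
    "company_profile": (
        "Produce a company profile for {company}.\n"
        "Always include these sections: overview, key financial metrics "
        "({metrics}), growth outlook, and recent developments.\n"
        "Use a neutral, analytical tone suitable for investment banking."
    ),
    "sector_analysis": (
        "Write an analysis of the {sector} sector in {region}, covering "
        "market size, key players, and principal risks."
    ),
}

def build_prompt(template_name, **slots):
    """Render a template, failing fast if a required slot is missing."""
    template = PROMPT_LIBRARY[template_name]
    try:
        return template.format(**slots)
    except KeyError as missing:
        raise ValueError(
            f"Missing slot {missing} for template '{template_name}'"
        )

prompt = build_prompt(
    "company_profile",
    company="Acme Corp",
    metrics="revenue, EBITDA margin, YoY growth",
)
print(prompt.splitlines()[0])
```

Validating slots up front means a malformed request fails immediately with a clear error, instead of producing a profile that silently omits the mandatory financial metrics.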
By leveraging a controlled data corpus and specific prompt libraries, we ensure that the generated outputs from our AI models consistently align with both industry standards and client expectations.
LLM Fine-Tuning: The Road Ahead
This two-pronged approach works well, and our clients are happy with it. However, we won’t rest on our laurels: research into improving LLMs to deliver ever more useful insights and reliable information is ongoing.
LLM fine-tuning represents a new frontier in the quest for enhanced AI output relevance and consistency. At Evalueserve, we are developing what we term Domain-Specific Language Models (DLMs). These are pre-tuned LLMs in which our domain expertise has been encoded to deliver specialized, high-performing outputs.
We can also create enterprise-level LLMs fine-tuned to each client’s unique needs. Whether the client is an investment bank, an environmental, social, and governance (ESG) firm, or a professional service firm seeking competitive intelligence, we can tailor our models to each use case.
At Evalueserve, refining the performance and reliability of generative AI models is an ongoing commitment.
By focusing on our proprietary data corpus and prompt libraries, and by exploring the potential of LLM fine-tuning, we are taking significant strides towards ensuring the relevance and consistency of AI-generated outputs across a wide range of business settings.
Of course, we’ll continue to update you on our discoveries, inventions, and improvements. It’s an exciting time, within an expanding field, and we’re committed to being at the forefront of innovation.