Generative AI in Banking: Takeaways From Our Executive Banking Event

For a new technology, generative AI is breaking into Wall Street boardrooms at an impressive speed. Fueled by substantial advances in natural language processing, AI is becoming more accessible and, frankly, more comprehensible to business users than ever before.

Banking workflows, however, are still notoriously tricky to change, with high complexity, strict terms of confidentiality, and regulation all working against rapid adoption.

To explore how generative AI is making an impact on the ground, we gathered with a group of banking executives in New York City. Shoutout to Eleven Madison Park for hosting us. Fine dining pairs quite well with fine-tuning and the possibilities and frustrations that come with it.

We were fortunate to be joined by Michael Schrage from MIT and our Chief Technology Officer, Rigvi Chevala. They led the conversation around the business impact of generative AI, what it takes to succeed in banking, and high-potential use cases that are being explored. Here are some of the highlights.

On the pillars of a successful generative AI strategy

A successful generative AI strategy rests on three pillars: the large language model (LLM), the underlying data corpus, and the prompts. Each industry has its own terminology, so getting all three pillars to work together is key for any use case.

1. LLMs and DLMs – “Domain Language Models”

“You have a generic model, that’s what everyone’s been hearing about. But there are large language models that can be put behind the firewall completely disconnected from the cloud,” said Rigvi when explaining the first pillar. Recent advances in the field include a technique called low-rank adaptation (LoRA), which draws on matrix theory: instead of retraining every weight in a model, it trains a small set of low-rank matrices added on top of the frozen original. This makes it much cheaper to adapt open models and run them behind a firewall, reducing the reliance on large providers like OpenAI. Because the original weights are kept, each adapted version builds on what the earlier model already learned rather than starting over.
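
To make the low-rank idea concrete, here is a minimal sketch in plain Python. It is an illustration of the general technique, not the setup of any particular model or vendor. In the usual formulation, the pretrained weight matrix W stays frozen and only two small matrices, A and B, are trained. The function names, the tiny 3x3 example, and the 4096-wide layer with rank 8 are all assumptions chosen for illustration.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def adapted_forward(W, A, B, x):
    """Frozen weights W plus the low-rank update B @ A, applied to x."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + u for b, u in zip(base, update)]

def param_counts(d_out, d_in, rank):
    """Return (parameters in the full matrix, parameters in the adapter)."""
    return d_out * d_in, rank * d_in + d_out * rank

# Tiny worked example: a 3x3 frozen matrix with a rank-1 adapter.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[1.0, 1.0, 1.0]]        # shape (rank, d_in)
B = [[0.5], [0.0], [0.0]]    # shape (d_out, rank)
y = adapted_forward(W, A, B, [1.0, 2.0, 3.0])

# Assumed layer size and rank, for a sense of scale only.
full, adapter = param_counts(4096, 4096, 8)
print(y, full, adapter, full // adapter)
```

With the assumed 4096 by 4096 layer and rank 8, the adapter holds 65,536 values against 16,777,216 in the full matrix, so only about 1/256 of the weights need to be trained and stored per adaptation.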

2. Data

The second pillar is the data corpus you’re working with, be it public, semi-curated, curated, or proprietary data, which is essential for ensuring the use case works in the given context. Rather than analyzing your own institution, use data from a comparable bank or financial service to test your prompts. The outcome could stimulate new ideas, pinpoint irrelevant use cases, or identify areas needing fine-tuning. Start with experimentation in a proprietary or open-source domain, then create an internal competition to determine which use case would have the greatest impact and offer the most learning opportunities.

3. Prompts

The third pillar is the quality of the prompts you send to the system; they must be domain-specific and clearly phrased to avoid garbage output. OpenAI charges per token, and tokens are essentially small chunks of text such as words or word fragments. This pricing model, combined with the compute power required to host an LLM, contributes to the overall cost of using AI. Optimizing prompts is therefore crucial to effective cost management. While prompt length may not significantly affect costs for lower-volume tasks like outbound emails, it becomes crucial for high-volume tasks like classification.
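
A rough back-of-the-envelope calculation shows why volume matters more than any single prompt. The sketch below is illustrative only: the per-token prices, request volumes, and token counts are hypothetical numbers picked for the example, not actual provider rates, and the function name is our own.

```python
def monthly_cost(requests, prompt_tokens, output_tokens,
                 price_in_per_1k, price_out_per_1k):
    """Estimated monthly spend in dollars for a given request volume."""
    per_request = (prompt_tokens * price_in_per_1k
                   + output_tokens * price_out_per_1k) / 1000
    return requests * per_request

# Hypothetical prices in dollars per 1,000 tokens; check current provider rates.
PRICE_IN, PRICE_OUT = 0.01, 0.03

# Low volume: 500 outbound emails with long prompts and long outputs.
emails = monthly_cost(500, 800, 300, PRICE_IN, PRICE_OUT)

# High volume: 2 million classifications, before and after trimming the prompt.
classify_long = monthly_cost(2_000_000, 800, 5, PRICE_IN, PRICE_OUT)
classify_short = monthly_cost(2_000_000, 200, 5, PRICE_IN, PRICE_OUT)

print(round(emails, 2), round(classify_long), round(classify_short))
```

Under these assumed numbers the email task costs roughly $8.50 a month, so prompt length barely matters there. For the classification task, cutting the prompt from 800 to 200 tokens drops the estimate from roughly $16,300 to roughly $4,300.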

On proven use cases

1. Research

At Evalueserve, we have deployed a Research Bot for a leading global consulting firm. We leverage the vast amounts of data available, from paid databases to internal enterprise data, scanning through over 200,000 data sources. Our product, Insightsfirst, delivers insights from this data to our clients. Now, the research bot, deployed on top of Insightsfirst, accesses this rich data corpus, allowing us to conduct thorough research and build comprehensive company profiles.

We use domain-specific prompts to produce summaries. These prompts can be adjusted as needed to achieve the desired outcome, all automated to the point where a simple command can generate key takeaways for a company. There’s quite a bit of rapid research that bankers do, and this can absolutely apply to that.

2. Document Creation

The same consulting client has an extensive number of case studies compiled over the last several years. We are leveraging this data to automatically generate an initial draft of a new case study, tailored to the requirements of prospective clients. This same logic is being used to auto-generate RFP responses for another client. Although we haven’t yet fine-tuned an LLM, this is on our roadmap. We don’t want to rely on GPT-4 out of the box and receive generic outputs; instead, we aim to fine-tune a model to deliver industry-specific results.

3. Business Intelligence – Cross-Dashboard Analysis

We’re working on a unique use case with a client that is using multiple dashboards in their research of diseases. The goal isn’t merely language research but actual interaction with data dashboards. We deployed a virtual assistant that can translate natural language into SQL commands, allowing the system to refresh and manipulate data based on user queries. For example, a user can ask about the spread of a disease in a specific region, and the system will adjust the data to display that information. The system can then redraw graphs based on follow-up questions, such as the percentage of patients affected in that area. This approach changes how Business Intelligence (BI) needs are met and allows for cross-dashboard analysis.
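
The natural-language-to-SQL flow can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the deployed assistant: the table schema, the sample rows, and the prompt wording are invented, and fake_llm is a stand-in that returns a canned query where a real model call would go. The read-only check is deliberately crude and only shows where a guardrail would sit.

```python
import sqlite3

# Hypothetical dashboard table used only for this example.
SCHEMA = "cases(region TEXT, disease TEXT, patients INTEGER, population INTEGER)"

def build_prompt(question):
    """Compose the instruction sent to the model (wording is illustrative)."""
    return (
        "Translate the question into one SQLite SELECT statement.\n"
        f"Schema: {SCHEMA}\n"
        f"Question: {question}\nSQL:"
    )

def fake_llm(prompt):
    """Stand-in for a real model call; always returns the same canned query."""
    return ("SELECT region, 100.0 * patients / population AS pct "
            "FROM cases WHERE region = 'North' AND disease = 'flu'")

def is_read_only(sql):
    """Accept a single SELECT statement and nothing else (a crude check)."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped

def answer(conn, question):
    """Turn a question into SQL, check it, and run it against the database."""
    sql = fake_llm(build_prompt(question))
    if not is_read_only(sql):
        raise ValueError("refusing to run non-SELECT SQL")
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {SCHEMA}")
conn.executemany("INSERT INTO cases VALUES (?, ?, ?, ?)",
                 [("North", "flu", 50, 1000), ("South", "flu", 20, 1000)])
rows = answer(conn, "What percentage of patients in the North have flu?")
print(rows)
```

Passing the schema in the prompt and validating the generated SQL before execution are design choices for the sketch; a production system would likely also use a read-only database role rather than rely on string checks.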

On experimentation in banking

When it comes to testing new use cases, Michael suggests that you “begin with the experimentation either in a proprietary or open-source domain, and then really begin to think and create sort of an internal competition as to what kind of use cases would A) have the biggest impact and B) you learn the most from.” Below are some of the use cases we have been experimenting with at Evalueserve:

1. Library/Business Information Services

Bankers receive vast amounts of information from the library team. We are exploring the possibility of developing a user interface that would allow them to interact with this data more efficiently. Rather than manually sifting through all the information, they could simply ask questions and receive answers through the interface. This approach to information consumption could result in significant savings and can be applied to both structured and unstructured data.

2. Credit Reviews

Another area we are evaluating is credit reviews, which typically comprise two components: data extraction and analysis. First, financial data is extracted from quarterly and annual reports to fill financial models. Significant savings are already possible using technology for this extraction process. However, the challenge lies in generating precise analyses or summaries based on this data, as the results may vary depending on specific company situations. Currently, we see some success when specific prompts are used, but the absence of these prompts often leads to unsatisfactory results. The challenge, therefore, is developing a system that is not reliant on specific prompts.

3. Pitchbooks

Pitchbooks are not static, but much of their content consists of elements that are common across different tickers and storyboards. Auto-GPT can be used to instruct the LLM to retrieve a specific dataset and present it in a predetermined format, eliminating the need for manual intervention. There are two aspects to consider: the generation of content using prompts and LLMs, and the automation of processes. The latter doesn’t necessarily require generative AI, but rather involves stitching together the output with a predefined template. The missing piece was the availability of content, which we now have.

Internally at Evalueserve, we are exploring the idea of building a common component and an action library on top of it.
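
The “stitching” step needs no generative AI at all, which a short sketch can show. The example below uses Python’s standard string.Template; the slide layout, field names, ticker, and figures are hypothetical placeholders, and the summary string stands in for text that an LLM step would produce.

```python
from string import Template

# Hypothetical fixed layout for one pitchbook page.
SLIDE = Template(
    "Company overview: $ticker\n"
    "Summary: $summary\n"
    "Revenue (USD m): $revenue\n"
)

def build_slide(ticker, summary, revenue):
    """Fill the fixed layout with generated text and retrieved figures."""
    return SLIDE.substitute(
        ticker=ticker,
        summary=summary,
        revenue=f"{revenue:,.0f}",  # thousands separator, no decimals
    )

page = build_slide("ACME", "Placeholder text from the LLM step.", 1250)
print(page)
```

Keeping the template separate from content generation means the same layout can be reused across tickers, with only the inputs changing.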

4. Risk Management

We have attempted to employ generative AI (GAI) in enhanced due diligence processes, such as checking for potential bad actors in various languages. Although the technology has been somewhat successful, providing correct results 60-80% of the time and speeding up the analyst’s work, we often revert to manual processes for improved accuracy.

However, we are discovering instances where GAI is more beneficial in Risk, such as in documenting models and providing explanations, challenging and understanding the strengths and weaknesses of models. Whether the AI provides excellent or poor responses, these insights reveal opportunities for domain expertise and training. It is noteworthy how the same prompt yields different answers based on the training and enforcement, shedding light on variances between how risk is sold and managed.

On limitations and mitigation of generative AI in banking

Analysts routinely produce comps models, pitchbooks, credit reviews, and financial models. Evalueserve’s Corporate & Investment Banking team has experimented with off-the-shelf GPT technology on these work products and found promising results in some areas. The technology works well when it comes to searching for information from a large number of documents, yielding up to 30% savings in time. It also excelled in data extraction, pulling financial numbers from annual reports and collecting ESG data. However, success was limited when it came to analyzing information and producing a final, accurate, and comprehensive work product. The technology can provide a starting point, but it falls short when it comes to matching the narrative created by an analyst who absorbs a lot of information and presents data to fit a particular storyline.

When it comes to bias in AI responses, it’s important to note that AI will likely provide the most probable answer based on the data it has been trained on. This is where prompt engineering comes in: we must push the AI for less probable, but potentially more insightful, responses. Using smaller, open-source models allows us to control where to put our thumb on the scale and train the AI based on our unique corpus and risk appetite.

One must also consider what constitutes a good answer. This involves getting an answer and having teams of people review its validity, which is where domain expertise becomes essential. It is also why Michael advises you to consider partnerships, to avoid circular reasoning or self-affirming bias. “You don’t want to be the snake eating its own tail, or, you know, what’s the phrase, drinking your own bathwater. We want to have a different form of peer review or rival review or counterpart review. It may well be that you use the answers imagining that you were some other financial institution to train your answers. This is where we want to have a similar answer, this is where we want to differentiate ourselves as advisors, as analysts.”

While generative AI holds tremendous promise for the banking industry, it’s important to recognize its limitations. At Evalueserve, we’re focused on what we call the DLM, or Domain Language Model, which combines cutting-edge technology with our deep industry expertise. This approach has been the foundation of our intelligent solutions, and as we integrate generative AI into our offerings, we remain committed to this model. By leveraging our industry knowledge and embracing the power of generative AI, we strive to deliver innovative solutions that meet the unique challenges and opportunities in the banking sector.

We’re eager to continue this discussion and will be holding virtual roundtables and webinars in the coming months to delve deeper into banking-specific topics. Let us know below if you’d be interested in joining a future event, and what topics you want to learn more about.

Allison Cornett
Susan Xie
Associate Director, Marketing