From Pilot Purgatory to Production: What AI Leaders Are Really Thinking
Evalueserve brought together senior executives from Life Sciences, Healthcare, Logistics, and Manufacturing for an evening of candid conversation about what it actually takes to scale AI inside large, complex organizations.
Introduction
On a lovely May evening in Chicago's Fulton Market District, Evalueserve hosted a gathering that felt a little different from the average AI conference. No vendor theatrics, no scripted demos, just twenty-plus senior leaders from some of the world's most recognizable companies, sharing what's actually working, what's blocking them, and what keeps them up at night when it comes to enterprise AI.
The guest list spanned a wide range of functions and seniority levels: AI strategy leads, commercial insights managers, directors of market access and data science, chief marketing officers, clinical ecosystem leads, and technology executives, all representing organizations at the forefront of AI adoption across Life Sciences, Healthcare, Logistics, and Manufacturing.
The dinner, moderated by Evalueserve's Gabe Keeler, featured Clara Buenker, Google Cloud AI GTM Lead, as the evening's keynote voice. Together they guided the room through a wide-ranging conversation that moved from cultural roadblocks and security fears to multimodal use cases and the future of the workforce.
The "Pilot Trap" is real, and leaders know it
The evening opened with a telling ice-breaker: what's one task you'd hand off to an AI agent tomorrow, at home? Answers ranged from planning family vacations and enhancing family photos to pitting competing LLMs head-to-head just to see which performs best and mapping out full landscaping projects. The room was clearly full of active, curious AI users.
But Buenker was quick to draw the contrast. Trusting an AI to plan a long weekend is a very different proposition from trusting it to manage a clinical trial pipeline or a global logistics network. That gap, between consumer convenience and enterprise accountability, framed much of the evening's discussion.
The dominant enterprise challenge, she argued, isn't building AI. It's escaping what she called "Pilot Purgatory": organizations sitting on hundreds of proofs-of-concept that never make it into production workflows where real ROI is captured.
"The conversation has completely shifted from 'How smart is the model?' to 'How do we manage it?'"
CLARA BUENKER, GOOGLE CLOUD AI GTM LEAD
The risks that prove most damaging are increasingly not the ones institutions prepared for, but the ones cascading quietly behind them.
The institutions navigating this most effectively are not necessarily those with the most sophisticated models but those that have accepted the new reality. Volatility is no longer a temporary deviation from stability, but the environment itself. Building for that reality, rather than waiting for conditions to normalise, is where the work begins.
"Adoption cohorts, small groups that use AI together and discuss what works and what doesn't, help build the muscle. But any solution also needs to be tested not only at the technical level, but at the people level, to ensure it actually gets adopted."
GABE KEELER, EVALUESERVE
Shadow AI and cultural risk aversion: the unspoken tension
Before the dinner, several attendees submitted candid reflections on their challenges. One, from a senior leader in pharma, landed squarely at the center of a broader conversation:
"My biggest roadblock is cultural. My company is very risk-averse... AI adoption is not incentivized. The only AI tool available for all employees is Copilot at the moment, with all others blocked by VPN."
DIRECTOR, INSIGHTS AND STRATEGY, PHARMA
This resonated across the table. When asked for a show of hands on whether their teams were likely using unvetted consumer AI tools, the response was telling. The phenomenon, often called shadow AI, in which employees quietly paste proprietary data into public chatbots because approved tooling isn't keeping pace, was widely recognized.
Buenker's advice for AI champions in risk-averse organizations: stop framing the ask as "we want a new tool" and start framing it as "we need a governed platform." Build a Center of Excellence that partners with IT rather than circumventing it. Target the tasks that everyone agrees are painful and administrative. And prove value first on synthetic or low-sensitivity data before asking permission to touch regulated information.
FROM THE FLOOR
Several attendees noted that AI usage has already begun appearing in employee goals and performance frameworks at their companies, but tracking it qualitatively remains an open challenge. How do you measure whether someone is using AI well, not just whether they're using it at all?
A separate concern surfaced around quality control: as organizations push pilots into production, many are doing so without adequate QC processes in place. The rush to ship is outpacing the rigor needed to ensure outputs are accurate, safe, and fit for purpose.
Security and trust: Google's acquisition of Wiz enters the conversation
Not every attendee arrived as a Google Cloud convert. A thread of skepticism ran through parts of the room, with some companies remaining cautious about trusting any major cloud provider's AI solutions with sensitive enterprise data.
Buenker addressed this directly, pointing to Google's acquisition of Wiz as a meaningful signal of its commitment to enterprise-grade security and governance. The message: this isn't a cloud AI product layered on top of consumer infrastructure. It's a deliberate build-out of the security, compliance, and data governance architecture that regulated industries actually require.
The core commitment she returned to repeatedly: customer data is never used to train public models. Combined with proactive data loss prevention and granular role-based access controls, the platform is designed to bring IT and legal stakeholders into the conversation, not alarm them.
Where Gemini's multimodal edge actually shows up
A senior marketing leader in pharma put it plainly: they were more familiar with Claude and ChatGPT than Gemini and wanted to understand what was actually possible. Buenker's answer focused on native multimodality, the ability to process text, images, audio, and video simultaneously, as the differentiator that matters most in the industries represented at the table.
In pharma, R&D teams are using Gemini to analyze visual chemical structures alongside scientific literature, a task that previously required toggling between entirely separate tools and workflows. In logistics, warehouse operators are cross-referencing live camera feeds with shipping documents in real time. In manufacturing, a supervisor can point the AI at a camera feed and ask it to diagnose a stalled robotic arm by cross-referencing the machine's manual.
The pattern across all of these: multimodal agents aimed directly at the organization's most expensive operational bottleneck.
AI won't replace your experts, but their roles will look different
The question is not whether agentic AI can analyse, synthesise, recognise patterns, and form views. It absolutely can, and it will continue to expand its capabilities. The question is who is directing it, and toward what end. Without human experience and intention, AI optimises without purpose, producing volume without insight, analysis without judgment, and answers to questions nobody asked in the first place. That leaves the financial services industry at risk of drowning in AI slop.
This is reshaping workforce strategy across the industry. Domain expertise has become the most valuable asset a financial services professional can bring, not because agentic AI cannot crunch the numbers or surface the patterns, but because without someone who understands the field deeply enough to direct it, question it, and know what a good answer looks like, the output is just noise at scale. Eventually, your investment memos, equities reports, and credit analyses will all start looking the same.
A telling example of the shift in workforce strategy comes from Bridgewater. The firm has fundamentally changed its hiring criteria, actively seeking out history majors, philosophers, and musicians alongside traditional engineering recruits.
The reasoning is that people who think laterally, ask unconventional questions, and can direct and prompt AI are generating better outcomes than those who follow purely technical paths. As Mangesh Patnaik, Senior Vice President at Evalueserve, put it, "We're now looking for people who think beyond the obvious, those who can prompt AI in new ways and approach problems from unexpected angles, not just follow traditional technical paths."
The PE Research Director echoed this. The critical skill in an AI-enabled organisation is not simply producing AI outputs but interrogating them, asking why a conclusion was reached, what sources were used, and where the assumptions might be wrong. That interrogation is only meaningful when it is grounded in experience. Domain expertise is what allows a professional to tell the difference between a genuine insight and a confident answer built on flawed foundations.
The institutions getting their workforce strategy right are thinking carefully about where in the workflow agents and agentic AI tools create the most value, and deploying them there with precision. The combination of human direction and AI execution, applied with discipline and domain depth, is what will define the most competitive financial services organisations of the next decade.
"When everyone has the exact same AI baseline, your differentiators become your proprietary data, your cultural velocity, and your taste."
CLARA BUENKER
The questions the room is still wrestling with
The evening closed not with tidy answers but with the questions that are actually keeping leaders busy. How do you maintain agents as regulations, data sources, and governance requirements evolve? How do you get AI genuinely embedded in a workflow rather than bolted onto individual users? How do you track qualitative AI adoption in a meaningful way? And perhaps most urgently: as pilots move into production, who is responsible for quality control, and what does rigorous QC even look like for AI-generated outputs?
These are exactly the questions worth gathering around a table for. Evalueserve looks forward to continuing the conversation.