In a meeting a few months ago, a senior compliance executive at a Fortune 50 company, which is an Evalueserve client, posed this challenge to us: "Can you help us predict when a law or regulation will pass in any country that we operate in, which could affect our business there?" This system, he elaborated, also needed to be a single repository of global regulatory developments relevant to the company, with the ability to filter by country and to trace historical articles, providing a layer of accountability.
In attempting to solve this business problem, we turned to artificial intelligence (AI) and machine learning (ML) techniques, which have made particularly impressive strides over the last decade. Today, AI and ML techniques that leverage large volumes of structured and unstructured data promise a wide range of business insights, from predicting market and customer behavior to predicting the impending failure of a machine. The big challenge, however, is obtaining large volumes of good-quality data with which to train any type of model. Most leaders in the implementation of advanced AI and ML algorithms, such as Google, Facebook, Netflix, and Amazon, own large databases of historical records. In the case of our client, however, there wasn’t any kind of dataset to work with, nor was there a centralized system with historically classified records that we could use to train a model.
The way this Fortune 50 company, like several other Fortune 500 companies we work with, has approached the enormous task of monitoring the global regulatory environment is still heavily dependent on human analysts and, consequently, very effort-intensive: companies maintain large, global teams of analysts who regularly wade by hand through hundreds of government websites with non-intuitive user interfaces, news articles, blogs, and other sources. An analyst then compiles and summarizes this material and routes it to a supervisor in charge of a particular area (health, security, finance, compliance, etc.) within a country or region, who makes a judgment call on whether an item should be escalated for follow-up.
In the case of our client, this effort-intensive process involves 150 analysts worldwide, who spend a good part of their working hours skimming through hundreds of articles every day, often duplicating each other’s work while still risking missing an important update. Over time, this kind of rote work erodes their motivation and often leads to high attrition on the team. Thus, solving this problem, even partially, promised several business benefits for our client: managing regulatory risk, informing business strategy, and boosting the morale of analysts on the regulatory monitoring team.
Evalueserve's initial solution was to create a large database of accurately classified articles, which could then be used to study the feasibility of predicting regulatory changes. We approached this goal in three steps:
- Build a large data set of classified documents (with at least 85% accuracy). For the dozens of categories chosen by the client (based on WTO ICS codes), we created a training data set of classified documents, evenly distributed across the various categories. For this, we reached out to the company's analysts to help us classify several thousand relevant documents that we had gathered automatically. Along the way, we also used a variety of creative techniques to address the problem of sparse categories and create a robust training data set.
- Develop code to pull relevant data from the chosen sources, including regulation updates, congressional session transcripts, blogs, news articles, public consultation summaries, etc. The documents came in a variety of formats: HTML, PDF, Word, and even Excel. For the dozen or so countries that were most important to the client, this led to nearly a thousand relevant articles being pulled into the system automatically every day.
- Build a web-based system that doubles as a search engine and classification tool. This system lets users easily navigate the (automatically classified) articles and manually correct the roughly 15% of articles that are expected to be misclassified. These corrections lead to a higher quality classification engine, and consequently a higher quality data set that feeds an ever-improving machine learning algorithm.
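The correction loop described in the third step can be illustrated with a minimal Python sketch. It is not Evalueserve's implementation: the source does not name the classifier, so a tiny naive Bayes model stands in for it, and every name here (`Article`, `NaiveBayes`, `Repository`, the category labels, and the sample texts) is hypothetical and chosen only for illustration. The sketch shows articles being classified automatically on ingest, analyst corrections being recorded in an audit trail and fed back as training data, and results being filtered by country.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (stand-in classifier)."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of documents
        self.vocab = set()

    def train(self, text: str, category: str) -> None:
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def predict(self, text: str) -> Optional[str]:
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best, best_score = category, score
        return best


@dataclass
class Article:
    article_id: str
    country: str
    text: str
    predicted: Optional[str] = None   # label assigned by the model
    corrected: Optional[str] = None   # label set by an analyst, if any
    history: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    @property
    def label(self) -> Optional[str]:
        """An analyst's correction takes precedence over the model's label."""
        return self.corrected or self.predicted


class Repository:
    """Single store of articles with classification, correction, and search."""

    def __init__(self, model: NaiveBayes) -> None:
        self.model = model
        self.articles = {}

    def ingest(self, article: Article) -> None:
        article.predicted = self.model.predict(article.text)
        article.history.append(("model", article.predicted))
        self.articles[article.article_id] = article

    def correct(self, article_id: str, analyst: str, category: str) -> None:
        article = self.articles[article_id]
        article.corrected = category
        article.history.append((analyst, category))  # traceability
        self.model.train(article.text, category)     # feed the fix back

    def search(self, country=None, category=None) -> List[Article]:
        return [a for a in self.articles.values()
                if (country is None or a.country == country)
                and (category is None or a.label == category)]
```

In this sketch, each call to `correct` both preserves who changed a label and adds one more labeled example to the model, which is one simple way to realize the "ever-improving" loop; a production system would more likely retrain in batches and evaluate accuracy on a held-out set before deploying a new model.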
Business Outcome for the Client
As a tool for predicting regulatory changes, this system is still a work in progress: it does not yet offer the accuracy or scale that our client could rely on exclusively. However, even in partially solving the client's problem (so far), Evalueserve's analytical approach to dealing with large volumes of data, our team’s creativity in solving some challenging problems along the way, and our Mind+Machine™ philosophy have enabled our client to save tens of thousands of analyst research hours, while creating a valuable data asset and improving quality, accountability, and traceability.
We believe that the methodology outlined in this Evalueserve solution, even in its current (somewhat rudimentary) form, could be employed to provide custom-designed solutions to the regulatory monitoring challenges of other companies in a variety of industry verticals.