Data Extraction Tool Improves Efficiency for Leading Investment Bank

Summary

While producing daily, weekly, and monthly economic research reports for a top investment bank, Evalueserve’s Emerging Markets (EM) team was performing a range of repetitive, time-intensive manual tasks. Evalueserve’s experts realized that these tasks could easily be automated and built an in-house Data Extractor tool that saves approximately 100 hours of processing time per year. The tool has been positively received by the client, as the testimonial below shows.

“This is a great contribution. Unfortunately, we spend a lot of time uploading data from sources that publish information in inconvenient formats. Not only do we save time, but we also use data we didn’t previously have access to.”

The Challenge

Downloading and processing public information is a routine part of economic research, so automating these processes becomes a necessity, and different tools are used to extract and process the information. Generally speaking, data is easiest to work with in formats like Excel or CSV files. However, when data is published directly in the body of a webpage, it presents challenges such as:

  • The data is not directly compatible with Excel and other tools.
  • Copying and pasting the data is not always possible or straightforward.
  • Copying data manually is time-consuming and usually prone to human error.

To address this, Evalueserve developed a reliable and efficient tool to automate data extraction from websites.

Our Solution

After a thorough investigation, the team used Python with the Selenium library to extract data from websites automatically.

Selenium is a tool that automates web browsers, navigating them according to commands set in the code. These commands locate elements on the page using XPath, a simple query language for selecting elements from HTML documents.
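
As a rough illustration of how an XPath query selects elements, here is a minimal sketch using Python's standard-library `xml.etree` (which supports a subset of XPath) rather than a live browser; the page snippet and class names are invented for the example:

```python
import xml.etree.ElementTree as ET

# Invented snippet standing in for a page that publishes figures
# directly in the page body rather than as a downloadable file.
HTML = """
<table>
  <tr><td class="label">GDP growth</td><td class="value">2.4</td></tr>
  <tr><td class="label">Inflation</td><td class="value">7.1</td></tr>
</table>
"""

root = ET.fromstring(HTML)
# XPath-style query: every <td> whose class attribute is "value".
values = [td.text for td in root.findall(".//td[@class='value']")]
print(values)  # ['2.4', '7.1']
```

In a real Selenium script the same kind of expression would be passed to the browser, e.g. `driver.find_elements(By.XPATH, "//td[@class='value']")`.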

With Selenium and XPath, the script can use search bars, click buttons, and open dropdown menus, and it can retrieve data from any element on a website exactly as a user would manually.

A Python script is then written that uses these commands to navigate a given webpage and save the extracted data in an Excel file for further processing.
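
A minimal sketch of such a script might look like the following. The Selenium calls are shown in comments because they need a live browser; the URL, XPath, and data are placeholders, and the output is written as a CSV file (which Excel opens directly) to keep the sketch self-contained:

```python
import csv

def save_rows(rows, path):
    """Write scraped (indicator, value) pairs to a CSV file Excel can open."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["indicator", "value"])
        writer.writerows(rows)

# In the real script the rows would come from Selenium, roughly:
#   from selenium import webdriver
#   from selenium.webdriver.common.by import By
#   driver = webdriver.Chrome()
#   driver.get("https://example.org/stats")          # placeholder URL
#   cells = driver.find_elements(By.XPATH, "//td[@class='value']")
# Invented data stands in for the scraped values here.
rows = [("GDP growth", "2.4"), ("Inflation", "7.1")]
save_rows(rows, "extract.csv")
```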

Once the script is ready, a run usually takes less than a minute.

Business Impact

Developing the application in the tool takes longer than running the process manually once; however, once it is set up, the job is completed in around three minutes, compared with more than an hour and a half manually, an efficiency gain of roughly 97%.

The tool also nearly eliminates human error, making the entire process robust and reliable. Automation likewise frees the team to take on more requests and value-added tasks that would not have been possible with manual work alone.

These automations can be replicated by other teams for their specific requirements, and as a standard practice, the developers of this tool share their learnings with other team members and help them replicate the application.

Limitations

Although the tool nearly eliminates human error, a quality check of the output file is still advised to ensure the data is properly arranged, particularly when a source’s reporting structure has changed, as that could lead to unidentified variables.

The tool also cannot be set up as a single generic script: HTML structures differ from website to website, so each new source requires its own XPath commands and a dedicated script.

Authors

Development of the Data Extractor tool has been led by Alberto Iturra, with exceptional contributions from fellow Emerging Markets team members Pablo de la Barra and Sergio Lillo.

Alberto Iturra

Senior Research Lead

Pablo de la Barra

Senior Analyst

Talk to One of Our Experts

Get in touch today to find out how Evalueserve can help you improve your processes, making you better, faster, and more efficient.
