How to Deal with Large Data Sets for Accurate Sales Forecasts?
For an investment analyst in the agrochemicals sector, meteorological data is an excellent resource for sales forecasting. But how can such a large data set be put to use?
The client wanted to leverage freely available meteorological data to improve predictions of pesticide sales. However, they lacked both the human resources and the programming and analytics expertise to gather, process and analyze such a large data set.
The client is a sell-side analyst covering the agrochemicals industry at a US-based investment bank. The analyst recognized that freely available weather data could make pesticide sales forecasts more accurate. However, the analytics would entail collecting weather data for over 150 locations covering the past 10 years, which amounted to ~39 million data points. The analyst also recognized that even if they could afford to put their associates’ time into gathering and formatting the data, they did not have the coding and analytics expertise to work with such a vast data pool.
We provided an end-to-end service from data collection to insight generation. Our team automated the collection and cleansing of relevant data from the source (the US National Oceanic and Atmospheric Administration website) and built a user interface that enabled proper handling of the huge data set. Then our analysts developed a regression model for sales forecasts based on a successfully tested hypothesis linking weather conditions, crop growth and pesticide sales.
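As a rough illustration of the automated collection step, the sketch below pulls every text file from one directory of an FTP server using Python's standard `ftplib`. The function name, the directory layout and the `.txt` suffix are assumptions made for the example; the project's actual crawler is not described in more detail than above.

```python
from ftplib import FTP  # used by the caller to open the connection
from pathlib import Path


def download_text_files(ftp, remote_dir, local_dir, suffix=".txt"):
    """Download every file ending in `suffix` from remote_dir into local_dir.

    `ftp` is a connected, logged-in ftplib.FTP instance (or any object
    offering the same cwd / nlst / retrbinary methods).
    """
    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)

    saved = []
    for name in ftp.nlst():
        if not name.endswith(suffix):
            continue  # skip anything that is not a text file
        destination = target / Path(name).name
        with open(destination, "wb") as handle:
            # retrbinary streams the file in chunks to handle.write
            ftp.retrbinary(f"RETR {name}", handle.write)
        saved.append(destination)
    return saved
```

To use it, one would open a connection with `FTP(host)`, call `login()`, and pass the connection together with a remote directory and a local folder. The host and directory would have to be taken from the data provider's own documentation.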
“Evalueserve’s expertise enabled the client to harness an alternative data source that was previously completely untapped, and to make sales forecasts in a scientific way.”
Our team created a database of accurately classified data points by:
• Automating data collection with a dedicated web crawler that downloads the text files from the website’s FTP server, together with code that extracts the relevant data fields from the raw data
• Building a stable SQL database with a familiar Microsoft Excel interface to ensure proper handling of the existing ~39 million data points and the planned monthly updates of ~0.4 million data points
• Developing code to cleanse the data, a process that included filling in missing locations and deriving data for varying report frequencies
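The extraction, storage and cleansing steps can be sketched in a few lines. Everything specific here is an assumption made for illustration: the comma-separated record layout, the `readings` table, SQLite standing in for the unspecified SQL database, daily averages as the common reporting frequency, and linear interpolation as the gap-filling rule.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical raw lines: station, ISO timestamp, parameter, value.
RAW_LINES = [
    "ST001,2020-05-01T00:00,temperature,10.0",
    "ST001,2020-05-01T12:00,temperature,14.0",  # sub-daily reports
    "ST001,2020-05-03T00:00,temperature,16.0",  # 2020-05-02 is missing
    "malformed line",
]


def parse_line(line):
    """Return (station, day, parameter, value), or None if the line is malformed."""
    parts = line.strip().split(",")
    if len(parts) != 4:
        return None
    station, timestamp, parameter, raw_value = parts
    try:
        return station, timestamp[:10], parameter, float(raw_value)
    except ValueError:
        return None


def load(conn, lines):
    """Insert all well-formed lines into the readings table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(station TEXT, day TEXT, parameter TEXT, value REAL)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def daily_series(conn, station, parameter):
    """Average all readings per day, then linearly interpolate missing days."""
    cursor = conn.execute(
        "SELECT day, AVG(value) FROM readings "
        "WHERE station = ? AND parameter = ? GROUP BY day ORDER BY day",
        (station, parameter),
    )
    known = [(date.fromisoformat(day), value) for day, value in cursor]
    series = {}
    for (d0, v0), (d1, v1) in zip(known, known[1:]):
        gap = (d1 - d0).days
        for step in range(gap):
            series[d0 + timedelta(days=step)] = v0 + (v1 - v0) * step / gap
    if known:
        series[known[-1][0]] = known[-1][1]
    return series


conn = sqlite3.connect(":memory:")
loaded = load(conn, RAW_LINES)
temps = daily_series(conn, "ST001", "temperature")
for day, value in sorted(temps.items()):
    print(day, round(value, 2))
```

With the sample lines, the two readings on 1 May average to 12.0, and the missing 2 May is interpolated to 14.0 from its neighbours. A production pipeline would need rules suited to each parameter; for example, precipitation is usually summed rather than averaged.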
Once the team was satisfied that the database was stable and accurately captured the desired parameters, they moved on to analysis. Our analysts developed a regression model for the sales forecast, based on a hypothesis about the relationship between specific weather conditions, crop growth patterns and pesticide sales. The hypothesis was tested successfully and the model was put into use.
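The variables and form of the actual model are not given here, so the following is only a minimal sketch of the general technique: an ordinary least squares fit with a single predictor. The `weather_index` values and the sales figures are synthetic, generated from a known linear rule so that the fit can be checked; they are not project data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic, illustrative data only: a hypothetical weather index per season,
# with sales generated as 50 + 3 * index plus a small fixed wobble.
weather_index = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
wobble = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0]
sales = [50 + 3 * x + w for x, w in zip(weather_index, wobble)]

intercept, slope = fit_line(weather_index, sales)
fit_quality = r_squared(weather_index, sales, intercept, slope)
forecast = intercept + slope * 22.0  # predicted sales for a new index value
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={fit_quality:.3f}")
```

Testing a hypothesis of this kind typically means checking that the fitted slope is clearly different from zero and that the model predicts well on periods held out from the fit. A real model would likely use several weather variables at once.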
The client previously used a traditional approach for sales prediction based on management guidance, historical trends and sector expertise. Evalueserve’s expertise enabled the client to harness an alternative data source that was previously completely untapped, and to make sales forecasts in a scientific way. The analyst is satisfied with the accuracy of the sales forecasts coming out of the regression model.
Data about the weather is freely available through sources like the United States’ National Oceanic and Atmospheric Administration (NOAA) website. Information about solar radiation, temperature and precipitation could prove extremely useful in the prediction of demand for agrochemicals. However, for most analysts, such data remain an untapped resource due to the time-intensive and demanding analytics needed to transform them into insights.
Successfully harnessed the potential of a previously untapped and completely free source of information
Produced accurate sales forecasts for pesticides
Proved the value of customized analytics of alternative data