Migrating and generalising a forecasting framework from Hadoop to Azure Databricks

10 minute read
Client: NDA
Industry: Retail

Our client, a leading UK retailer, aimed to extend its ML Forecasting Framework to other subsidiaries within its group network. The objective was to transition the extensive platform from Hadoop to a dedicated Azure Databricks environment, including all library dependencies and datasets. 

However, the client lacked the in-house expertise and resources needed to refactor the framework for the new environment. VirtusLab, having previously developed the forecasting framework and being a Databricks partner, emerged as the ideal choice to assist with the migration. Upon completion, the revamped framework led to the launch of five distinct projects within the client’s network.


The challenge

The forecasting framework had become widely used within the client’s organization. Building on this success, the retailer decided to roll the platform out to other organizations within the group. The migration aimed to enhance support for day-to-day forecasting and research activities related to mobile products, including phones, subscriptions, and accessories.

 

(Diagram 1: Migrating the forecasting framework to Azure Databricks)

The scalable framework itself was a large project, written in PySpark on a legacy Hadoop platform. It encompassed:

  1. A PySpark project with tens of data processing pipelines.
  2. A set of machine learning models featuring custom configuration and hyperparameter tuning code, running in parallel through Spark User Defined Functions (UDFs).
  3. Jenkins CI/CD pipelines implemented in Groovy for automation.
  4. Management of the Python dependency environment using Conda.
  5. Jupyter notebooks provided for research purposes, leveraging the forecasting framework code.
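To illustrate the parallel-training pattern mentioned in point 2: with Spark's grouped pandas UDFs, each product series is handed to an ordinary pandas function, and Spark fans those calls out across the cluster. The sketch below is a hypothetical, minimal stand-in (the `series_id` and `sales` column names and the trivial "mean of recent sales" model are assumptions for illustration, not the client's actual code).

```python
import pandas as pd

def train_and_forecast(pdf: pd.DataFrame) -> pd.DataFrame:
    """Train a model for one product series and emit its forecast.
    A real pipeline would fit a tuned ML model here; the mean of the
    last four observations is a placeholder."""
    forecast = pdf["sales"].tail(4).mean()
    return pd.DataFrame({"series_id": [pdf["series_id"].iloc[0]],
                         "forecast": [forecast]})

# On a Spark cluster, the same function runs in parallel per series:
# result = spark_df.groupBy("series_id").applyInPandas(
#     train_and_forecast, schema="series_id string, forecast double")
```

Because the training logic is plain pandas, it can also be unit-tested locally with `history.groupby("series_id", group_keys=False).apply(train_and_forecast)`, without a Spark session.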

Our client aimed to migrate this large-scale project to a completely new Azure Databricks environment. This endeavor demanded that we:

  1. Revamp the framework for use in a different environment, moving from Hadoop to Azure Databricks.
  2. Make the framework domain and company agnostic.
  3. Move selected data pipelines and all libraries to Azure Databricks.
  4. Enable merging central Hadoop cluster datasets with new custom datasets from external sources.
  5. Migrate Jenkins CI/CD pipelines to Azure DevOps.
  6. Empower Data Scientists to perform research using notebooks in the new environment, accessing the code as a library.
  7. Ensure strong security measures to protect access to code, data, and artifacts in the new environment.

Relying on trust as the foundation of our partnership, our client reached out to VirtusLab for assistance.

The solution

VirtusLab refactored the framework to suit new organisations and added the option to integrate additional data sources, in two major steps: refactoring the code for migration, and preparing the infrastructure for seamless framework execution.

 

(Diagram 2: The migration solution in two steps)

Code preparation

We enhanced the versatility of the forecasting framework by removing domain-dependent code elements, such as column and table names, as well as domain-specific configuration settings. This included:

  • Removing scheduler-dependent code
  • Revamping cluster definitions to make them applicable in any environment

This made it adaptable for implementation in various organizations. We also extracted Hadoop-specific components from the code, facilitating its execution in Azure Databricks and other environments. For instance, we extracted the Oozie workflow generation used for Hadoop deployments.
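One common way to achieve this kind of decoupling is to move all domain-specific names behind a configuration object, so pipeline code never mentions a concrete table or column. The snippet below is a hypothetical sketch of that idea (the `DomainConfig` class, its fields, and the example names are illustrative assumptions, not the client's actual configuration schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DomainConfig:
    """Per-organisation mapping of domain-specific names."""
    sales_table: str
    product_col: str
    date_col: str

def build_query(cfg: DomainConfig) -> str:
    """Pipeline code references only the config, never concrete names."""
    return (f"SELECT {cfg.product_col}, {cfg.date_col}, sales "
            f"FROM {cfg.sales_table}")

# Each subsidiary supplies its own config to reuse the same pipeline:
retail_cfg = DomainConfig("retail.sales_daily", "sku", "sale_date")
```

With this pattern, onboarding a new organisation means writing a new config, not forking the pipeline code.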

Infrastructure preparation

We helped to set up the infrastructure on Azure Databricks to enable the smooth execution of the framework. This involved:

  • Creating new Databricks Dev and Prod compute clusters with preinstalled environments
  • Automating updates for the forecasting framework and Python dependencies through Conda
  • Migrating all CI/CD pipelines to Azure DevOps while hosting the framework as Azure Artifacts
  • Integrating the preinstalled Forecasting Framework into Azure Data Factory for new projects
  • Enabling the use of the framework for research via notebooks
  • Implementing regular data exports from the Hadoop cluster to the new Azure Cloud
  • Employing Azure Key Vault for secure secrets management
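On Databricks, Azure Key Vault is typically surfaced through a Key Vault-backed secret scope, read via `dbutils.secrets.get`. The sketch below shows that pattern with a local fallback so the same code runs outside Databricks; the scope and key names, the helper itself, and the environment-variable fallback convention are illustrative assumptions.

```python
import os

def get_secret(scope: str, key: str) -> str:
    """Read a secret from a Key Vault-backed Databricks secret scope;
    outside Databricks, fall back to an environment variable so the
    code stays runnable in local tests."""
    try:
        # dbutils is injected by the Databricks runtime
        return dbutils.secrets.get(scope=scope, key=key)  # noqa: F821
    except NameError:
        env_name = f"{scope}_{key}".upper().replace("-", "_")
        return os.environ[env_name]

# e.g. get_secret("forecast", "storage-key") reads the scoped secret on
# Databricks, or FORECAST_STORAGE_KEY locally.
```

Keeping secret access behind one helper also makes it easy to audit which credentials the pipelines depend on.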

The results

VirtusLab deployed the generalized forecasting framework in both Hadoop and Azure Databricks. Within six months of its restructuring and rollout to the new Azure Databricks environment, our client’s subsidiaries were using the framework. They also gained:

  1. Customized Forecast Generation – Regularly generated forecasts utilizing bespoke ML and statistical models tailored for the new domains, incorporating their individual models.
  2. Migration of Engineering Best Practices – Successfully transitioned all best practices such as CI/CD, tests, code reviews, and the creation of separate DEV and PROD environments to the new ecosystems and teams.
  3. Immediate Project Implementation – Promptly implemented five distinct new projects in the updated environment using the migrated framework.

The tech-stack

Cloud environment: Data Factory, Blob Storage, Artifacts, Key Vault, Azure DevOps, Databricks

Languages: Python (with Conda for dependency management)

Frameworks: Spark

