How we moved our in-house big data ETL platform to the cloud. Part I

Truth and nothing but the truth on the hurdles of cloud migration from our ETL & BI expert.

12 min read
Business intelligence
Client stories
ETL/ELT
Insurance

Do you have an on-prem data management process that you’d like to move to the cloud someday soon? If so, this article is for you. Here, we’ll talk about one of our largest data projects – an in-house big data ETL platform for a Fortune 500 insurer. The platform was built to facilitate financial decision making and reconciliation, among other things. Initially built on-prem, it is now partly running in the cloud.

Hi, I’m Ivan Sokolov, ETL and BI developer at Symfa. Now, let’s jump right to the project and how we moved it to the cloud (and why we did it in the first place).

Table of Contents

  • What an Insurance Platform is and how I ended up on the project
  • September - December, 2022. Here my story begins.
  • February 2023. Why did we decide to move the platform to the cloud?
  • Spring 2023. PoC development.
  • April - June 2023. We’re honing our cloud skills to shift our platform to the cloud
  • July 2023 hit the cloud vendor hard (pardon my gloat)
  • September again! PoC ready, let’s build a team, finally.
  • How is the operation of the platform on-prem different from its cloud state?
  • How does the cloud part of an insurance platform work?
  • Where does all this data go next?
  • Conclusion: How does it feel to be a part of this project?

What an Insurance Platform is and how I ended up on the project

For those unfamiliar with this story, I’ll give a really short intro to the Insurance Platform.

Why the client turned to us:

This project started with a pretty common problem – the client had data coming in from all over Europe, all in different formats and even languages. The system they were using to handle it was old and unsafe, so it was time for an upgrade. They used to process the data using Excel files to figure out if the company was making money or losing it.

The task before us:

We needed to convert the terabytes of data coming in daily into two formats: premiums (how many insurance policies were sold) and claims (how much money the company paid out against those policies) through a reliable automated process that ensures data stability and change traceability.

The project result:

The previously manual and time-consuming process is now fully automated. The non-GUI system collects files from the designated folders every hour – which is enough to keep the users happy and the server running smoothly – and produces output files with stabilized, clean data for analysts and managers to use for reporting, reconciliation, and even accounting.

You can find more details on how the system works here (article by Andrew, Delivery Manager on the project) and here (Alesia, Lead BA, adds some cool details to the story).

Ah, here’s me and my part.

September - December, 2022. Here my story begins.

I used to work with this client before on a slightly different type of task (here’s what I did for them previously). So, onboarding was no issue.

I joined the project in autumn 2022. Back then, the client was completely focused on loading data into their on-prem International Data Warehouse. I was working on the final database that collects data after automated verification, clearance, and mapping. It’s a huge ETL process, I must admit, and we implemented it fully on the client’s on-prem infrastructure.

By December 2022, we were more or less finished with it, and the platform was getting ready to go live.

February 2023. Why did we decide to move the platform to the cloud?

February 2023, another vendor – a global award-winning cloud infrastructure provider – appears and claims that the platform could run faster, better, and more flexibly in the cloud. Any vendor knows how it goes in the cloud – if you need to ramp up capacity, you can blow it up like bubble gum. Once you no longer need that many resources, you shrink the bubble back to suit your needs (which can’t be done that easily on a local server). However, cloud transition has its price, and many businesses opt for local servers if they work with stable heavy loads (like our platform does). Surprisingly for us, the third-party vendor approached our client with a few numbers that somehow stuck with the client, and the cloud transition began.

More often than not, a whole new cloud cost optimization initiative starts only after the migration. Alternatively, the company implements a strategy that was thought out well in advance (DevOps-first infrastructure, spot instances, performance optimization for short-lived spikes in demand, etc.) during the transition itself. Figuring out how to optimize your cloud so that it doesn’t eat up your yearly budget in a few months is actually what modern DevOps services are all about.

Spring 2023. PoC development.

So, it’s Spring 2023. The cloud vendor’s task was to build a PoC – a Proof of Concept showing how the platform would perform in the cloud. To help them do so, we assisted with the docs, knowledge transfer, and the cloud infrastructure setup.

  • Our team handled a portion of the DWH development.
  • We created comprehensive project documentation and data migration scripts.
  • Data mapping, presentations, and demonstration recordings were completed to facilitate a deeper understanding of the DWH.
  • Together with the vendor, we held regular sessions to share knowledge and skills, enabling a seamless transition to the cloud.

The cloud part of the platform was planned around Azure’s ETL services – Data Factory, Data Lake Storage, and Azure Synapse Analytics.

The vendor’s team studied the processes we had built for the client and soon rolled out a demo on the cloud infrastructure. For demo purposes, one of the client’s major contractors was chosen, and the PoC seemed to work fine. After that the vendor left to do the rest of the development, and we started our training for the cloud jobs.

April - June 2023. We’re honing our cloud skills to shift our platform to the cloud

Starting from April, a huge cloud learning initiative began for the team so that we could move our brainchild to the cloud.

It is a common thing for us – developers – to learn new things. Our colleagues from a different project with the same client did some upskilling, too. For a dashboard development project, my colleague learnt to work with a no-code framework, Bubble. So, mastering a new framework is kind of a routine for us.

Our client has their entire stack running on the Microsoft ecosystem. It makes sense, Microsoft are cool guys. Naturally, it was the AZ-900 (Azure Fundamentals) certification we needed to make the Insurance Platform work in Azure. AZ-900 covers general cloud principles – how the cloud is implemented in Azure, its main components, and so on.

Azure certificates

The learning process didn’t end there. The client organized a practical course for us to dive even deeper into the cloud. It was like a super practical crash project – or, better put, an in-house bootcamp – where we, together with the client’s in-house team, learnt to build pipelines from scratch locally on our own computers. Within this in-house bootcamp, I also did this course and highly recommend it.

July 2023 hit the cloud vendor hard (pardon my gloat)

In July, after the final knowledge transfer session with Symfa, the cloud vendor came back with the PoC. Previously, they had only tested their solution on one major contractor, while the platform had been processing 300 businesses on-prem by then. It was in July, after three months of PoC development, that our client realized the cloud vendor had misunderstood some fundamental aspects of how insurance data works.

They took some more time and came back in August after urgently remodeling the PoC. This time the solution matched the business logic that actually exists in our client’s company.

September again! PoC ready, let’s build a team, finally.

In September, the client decided that we should work as one big hybrid team – the client, the cloud vendor, and Symfa. While the cloud vendor was preparing the cloud infrastructure for all of us, we were finalizing the data prepping mechanisms in the on-prem part, studying during working hours and in our free time, running tests, and continuing with the setup of the on-prem infrastructure.

Me, I worked on the database configuration, ran testing processes to make sure the data was properly uploaded to our home servers, did verification, reconciliation, re-uploading – a seemingly endless stream of support and testing activities.

On September 9, the so-called go-live happened on the on-prem part, and this is where the on-prem story ends for me. I stopped supporting the data warehouse on our local servers, and part of the team – me included – switched to the cloud.

How is the operation of the platform on-prem different from its cloud state?

On-prem infrastructure


The cloud part replicates the on-prem part that we developed for the client earlier. It contains dev, UAT, preprod, and production environments. The four environments ensure optimum data quality without spending too much time and resources on double-checking the data. Although, to be completely honest, we do practice data double-checking, as we’re testing the data continuously (a simplified example of such a check is sketched after the list):

  1. unit tests,
  2. integration tests in the dev environment,
  3. testing done by the team leads in the UAT and preproduction environments.
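
To give a flavor of what such a check might look like, here is a minimal sketch in Python. The column names and rules are illustrative assumptions on my part, not the production test suite:

```python
# A simplified, illustrative data quality check – not the production test suite.
import pandas as pd


def check_premiums(df: pd.DataFrame) -> list:
    """Return a list of data quality violations found in a premiums extract."""
    issues = []
    if df["policy_id"].duplicated().any():
        issues.append("duplicate policy_id values")
    if (df["premium_amount"] < 0).any():
        issues.append("negative premium amounts")
    if df["currency"].isna().any():
        issues.append("missing currency codes")
    return issues


# Tiny in-memory sample to show how the check behaves
sample = pd.DataFrame({
    "policy_id": ["P-001", "P-002", "P-002"],
    "premium_amount": [1200.0, -50.0, 300.0],
    "currency": ["EUR", None, "GBP"],
})
print(check_premiums(sample))
# ['duplicate policy_id values', 'negative premium amounts', 'missing currency codes']
```

Checks in this spirit run in dev first, and anything that slips through gets caught by the team leads in UAT and preproduction.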

How does the cloud part of an insurance platform work?

Data Lake is where we store our data – a space where data is organized into layers: bronze, silver, and gold.

Schematic view of data organization in the client’s Data Lake


Data Factory helps us manage the data that lives in this lake.

We use Data Factory to take data from the on-prem PreProcessor and move it through the cloud-based landing zone and the bronze, silver, and gold layers:

[PreProcessor] → Data Factory/Synapse → Landing Zone → Data Factory/Synapse → Bronze → Data Factory/Synapse → Silver → Data Factory/SQL Stored Procedures → DWH
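
To make one hop of that flow a bit more tangible, here’s a minimal sketch of what a bronze-to-silver step might look like in a Synapse Spark notebook. The storage account, paths, and column names are made up for illustration – the actual pipelines are orchestrated in Data Factory and look different:

```python
# Minimal sketch of a bronze -> silver hop, as it might run in a Synapse Spark notebook.
# Storage account, paths, and columns are illustrative placeholders, not the real ones.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # already provided in a Synapse notebook

lake = "abfss://datalake@examplestorage.dfs.core.windows.net"  # hypothetical account

# Read raw claim files landed in the bronze layer
bronze_claims = spark.read.parquet(f"{lake}/bronze/claims/")

# Basic cleansing and standardization on the way to silver:
# normalize currency codes, cast amounts, drop invalid rows and duplicates
silver_claims = (
    bronze_claims
    .withColumn("currency", F.upper(F.col("currency")))
    .withColumn("claim_amount", F.col("claim_amount").cast("decimal(18,2)"))
    .filter(F.col("claim_amount").isNotNull())
    .dropDuplicates(["claim_id"])
)

# Write the cleaned data to the silver layer, partitioned by reporting period
(
    silver_claims
    .write
    .mode("overwrite")
    .partitionBy("reporting_period")
    .parquet(f"{lake}/silver/claims/")
)
```

The same pattern repeats layer by layer: each hop takes data from one zone, applies a well-defined set of transformations, and writes it to the next.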

Architecture example from Microsoft

Where does all this data go next?

To the client’s cloud warehouse (an SQL pool, as we call it). But even the client’s cloud DWH is not a terminal point, because it’s the basis for various things like reports, reconciliations, and accounting (more on that in my next article).
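
For the curious, here’s a rough sketch of how a downstream consumer might pull figures out of that SQL pool. The server, database, table, and column names are placeholders I made up, not the client’s actual objects:

```python
# Illustrative only: querying the dedicated SQL pool from Python.
# Connection details and object names are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=example-synapse.sql.azuresynapse.net;"
    "DATABASE=example_dwh;"
    "UID=report_reader;PWD=...;Encrypt=yes;"
)

# Aggregate premiums vs. claims per business unit for one reporting month
query = """
SELECT business_unit,
       SUM(premium_amount) AS total_premiums,
       SUM(claim_amount)   AS total_claims
FROM dbo.fact_financials
WHERE reporting_period = ?
GROUP BY business_unit;
"""

with conn.cursor() as cursor:
    for row in cursor.execute(query, "2023-09"):
        print(row.business_unit, row.total_premiums, row.total_claims)
```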

Conclusion: How does it feel to be a part of this project?

The responsibility is huge, but it’s also so exciting. We are at the core of making difficult financial decisions.

Was the cloud migration justified? 

In terms of speed – yes. In terms of resources – yet to be seen. Wait for the next part of the story for the details.

What about the partnership with the globally recognized Azure development vendor?

The cloud vendor we’re working with outweighs Symfa in terms of numbers and experience. See for yourself:

  • 1,200 Azure migrations completed annually
  • 750 Azure experts, Microsoft MVPs, and Solution Architects​
  • 140 Azure environments managed daily​
  • Microsoft Partner of the Year 2022 & 2023. 

Honestly, I sometimes feel like a kid working with them – they’re implementing things in a manner I’ve never seen before. I absorb every bit of knowledge from them that my brain capacity allows me to process. But even for such giants, flops are nothing new.

That makes for at least two reasons to give my next article a look.

That’s it for today. 

Thanks for reading, I’ll be delighted to meet you again in the Symfa blog.

Follow Ivan on LinkedIn for more BI & ETL backstage stories.

Subscribe to Symfa’s LinkedIn and X to be the first to know about our new articles.

Credits

Ivan Sokolov

Business Intelligence Developer

Ivan is an aspiring young leader of the corporate BI universe. He's constantly challenging himself with new approaches to data processing and experimenting with tools. Ivan is residing in Georgia with his wife and two kids.

