Transferring ETL processes to the cloud. Why bother?
There’s a big difference between working with data on a local server and in the cloud. Take, for example, a file containing 300,000 lines. With the classic on-prem model, uploading it to the destination server takes roughly 3-4 hours.
In the cloud, the same job gets done much faster. This is the cloud’s competitive edge – it is far more flexible than on-premise servers. When a request comes in to load data-heavy documents, the cloud can simply spin up additional capacity on demand and release it once the work is done, something an on-premise model can’t do. There’s no need to keep a powerful server running around the clock and, more importantly, to pay for it.
That is why our insurance customer is partially moving its ETL processes to the cloud. Some workflows – such as data pre-processing and the first and second processing phases – stay on local servers: they deal with interim data, which is quite heavy and not yet suitable for extracting valuable information. But once the data reaches the final processing stage and is ready for uploading to the destination server, those workflows run in the cloud.
Here’s a quick recap of the tools we apply for data processing and visualization.
Now off to the hottest part – real-life cases of migrating ETL and reporting to the cloud.
Use case 1: How we helped the client fine-tune the report generation pipeline in the cloud
The insurance industry is basically sitting on piles of personal data, and not using it would be an unforgivable act in the age of data technologies. That’s why one of our biggest teams works on granular report generation – revenue from premiums, expenses for claims, best-performing geographies, and so on.
How the client made the best of cloud BI & ETL tools: Until recently, we had used only on-premise tools on this project – SSIS and SSRS (read one of our previous blog posts to learn the difference between them) plus Power BI, which can run both on a local server and in the cloud. But given the client’s decision in favor of partial cloud migration, they needed to establish the report generation pipeline in the cloud ecosystem.
As part of this project (and the one described below), we’ve been assisting the client’s third-party vendor – the one in charge of the PoC and cloud infrastructure preparation – with the system’s migration to the cloud. The cloud version of the solution relies on Azure ETL services – Data Factory and Data Lake. Data Factory lets you configure the data generation logic visually, while Data Lake serves as the storage layer. For visualization, the cloud version of Power BI is used.
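To make the setup a bit more concrete, here’s a minimal sketch of the storage side of such a pipeline: once a processed extract is ready, it can be landed in Azure Data Lake Storage with the Python SDK, where Data Factory and Power BI pick it up downstream. This is not the client’s actual pipeline – the account URL, file system, and path names are made-up placeholders.

```python
# Minimal sketch: land a processed extract in Azure Data Lake Storage Gen2.
# The account, file system, and path names below are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://exampleinsurancedl.dfs.core.windows.net"  # hypothetical storage account
FILE_SYSTEM = "curated"                                          # container for final-stage data
TARGET_PATH = "claims/2024/10/claims_extract.parquet"            # illustrative target path


def upload_extract(local_file: str) -> None:
    """Upload a locally prepared extract to the Data Lake landing zone."""
    service = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
    file_system = service.get_file_system_client(FILE_SYSTEM)
    file_client = file_system.get_file_client(TARGET_PATH)
    with open(local_file, "rb") as data:
        # overwrite=True keeps reruns idempotent for the same reporting period
        file_client.upload_data(data, overwrite=True)


if __name__ == "__main__":
    upload_extract("claims_extract.parquet")
```

In the actual architecture, a Data Factory copy activity would typically perform this step; the SDK call just shows where the final-stage data ends up before Power BI queries it.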
Read on for more details on our contribution.
Use case 2: How we assisted with the renovation and cloud migration of the financial reporting storage system
This project covers several insurance business lines, such as warranty servicing, P&C, M&C, ATE insurance, and Accident & Health insurance. Every month, authorized agents send data to the client’s European offices in disparate formats.
How the client made the best of cloud BI & ETL tools: To ensure smooth data flow and provide the company’s leadership with data-driven, insightful reports, the client needed to update their existing financial reporting storage systems.
When we started working on the project, the system lived on the client’s local servers, and all ETL jobs were done with the Centerprise solution.
First, the data processing flow was unified for all data sources, so the on-prem ETL process ran in three stages.
However, after properly evaluating the advantages the cloud provides, the customer initiated a gradual move away from on-premise servers. During the transition period, we supported the client’s third-party vendor with the migration.
Our team worked on the DWH elaboration and prepared documentation, data migration scripts with exhaustive comments, data mappings, presentations, and demo recordings, and held regular strategic sessions with the vendor’s team to share our knowledge and expertise for a smooth cloud migration. The cloud-based solution now uses Azure Data Factory and Azure Synapse for ETL jobs, Azure Data Lake for storage, and Power BI for visualizations. Our effort allowed the third-party team to adopt our practices and system knowledge in the cloud and build a PoC highly appreciated by the customer.
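To give a flavour of the ETL leg in that stack, below is a minimal sketch of bulk-loading a curated Data Lake file into an Azure Synapse dedicated SQL pool with the T-SQL COPY INTO statement, driven from Python via pyodbc. The server, database, credentials, table, and storage URL are invented for the example and don’t reflect the client’s environment.

```python
# Minimal sketch: load a curated Data Lake file into a Synapse dedicated SQL pool.
# Server, database, credentials, table, and storage URL below are hypothetical.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example-synapse.sql.azuresynapse.net;"
    "DATABASE=reporting_dwh;"
    "UID=etl_user;PWD=<secret>"
)

# COPY INTO is the standard Synapse T-SQL statement for bulk-loading files from Data Lake.
COPY_SQL = """
COPY INTO dbo.monthly_premiums
FROM 'https://exampleinsurancedl.dfs.core.windows.net/curated/premiums/2024/10/'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
)
"""


def load_monthly_premiums() -> None:
    """Run the bulk load; Power BI reports then query dbo.monthly_premiums."""
    with pyodbc.connect(CONN_STR, autocommit=True) as connection:
        connection.execute(COPY_SQL)


if __name__ == "__main__":
    load_monthly_premiums()
```

In practice, a load like this would be orchestrated by a Data Factory or Synapse pipeline on a schedule rather than run by hand; the script only illustrates the shape of the step.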
Closing remarks
Data is paramount. I hope these two blog posts gave you a rough idea of how data-driven decisions are made in insurance. For many, though, the question remains: should I establish my ETL processes on a local server or use cloud infrastructure? As usual, there is no silver bullet. Before deciding in favor of either option, you need to assess the data volumes you operate with, the sensitivity of the data, how important frequent ETL updates are to your company, and more. The same client I’ve been talking about in these two blog posts is now expanding their own on-prem data center to serve specific needs that the cloud cannot fully meet, or that are simply impractical to meet in the cloud given the volume of data the client generates.
The bottom line is: each situation is unique, but the tools are not. As you can see, the tech stack we use for our insurance client isn’t extravagant or something only a few companies can afford. The trick is consistency and commitment from the vendor you entrust your data to.
Sure thing, we do use breakthrough tech alongside good old workhorses like SSIS. Are you looking for your ideal combination of cutting-edge and everyday technologies? Stay tuned – we’ll cover exactly that in our upcoming articles.
Follow Ivan on LinkedIn for more BI & ETL backstage stories.