Transferring ETL processes to the cloud. Why bother?
There’s a big difference between working with data on a local server and in the cloud. Say we have a file containing 300,000 lines. With the classic on-prem model, uploading it to the destination server takes roughly 3-4 hours.
In the cloud, the same job gets done much faster. This is the cloud’s competitive edge: it is far more flexible than on-premise servers. When a data-heavy load comes in, the cloud can provision additional capacity on demand and release it once the work is done. So there’s no need to keep a powerful server running around the clock and, more importantly, no need to pay for one.
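To make the elasticity point concrete, here’s a toy Python sketch (our illustration, not the client’s code): the same 300,000-row load is split into batches, and the only thing that changes is how many workers are available to push them.

```python
# Toy illustration of scaling out a load job. load_chunk is a stand-in
# for pushing one batch of rows to the destination server.
from concurrent.futures import ThreadPoolExecutor

def load_chunk(rows):
    return len(rows)  # pretend this uploads the batch and reports its size

def load_file(rows, workers):
    # split the file into 10,000-row batches and load them concurrently
    chunks = [rows[i:i + 10_000] for i in range(0, len(rows), 10_000)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(load_chunk, chunks))

rows = list(range(300_000))
print(load_file(rows, workers=2))   # a fixed, modest on-prem box
print(load_file(rows, workers=32))  # cloud: scale out for the job, then release
```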
That is why our insurance customer is partially moving its ETL processes to the cloud. Some workflows, such as data pre-processing and the first and second processing phases, stay on local servers: they handle interim data, which is quite heavy and not yet suitable for extracting valuable information. But once the data reaches the final processing stage and is ready for upload to the destination server, those workflows run in the cloud.
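Here’s a minimal sketch of that hybrid handoff, assuming Azure Blob Storage as the cloud destination and the azure-storage-blob package; the container and file names are illustrative, not the client’s actual setup.

```python
from azure.storage.blob import BlobServiceClient

def ship_final_output(final_path: str, conn_str: str) -> None:
    # Only the report-ready file leaves the local server; the heavy
    # interim data produced by the earlier stages stays on-prem.
    service = BlobServiceClient.from_connection_string(conn_str)
    blob = service.get_blob_client(container="curated", blob="claims_final.csv")
    with open(final_path, "rb") as src:
        blob.upload_blob(src, overwrite=True)

# The on-prem stages (pre-processing, first and second processing phases)
# run locally and produce final_path before this call:
# ship_final_output("out/claims_final.csv", "<storage-connection-string>")
```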
Here’s a quick recap of the tools we apply for data processing and visualization: SSIS and SSRS on local servers, Azure Data Factory and Azure Data Lake in the cloud, and Power BI, which works in both environments.
Now on to the hottest part: a real-life case of migrating ETL and reporting to the cloud.
Use Case: How we helped renovate a financial reporting storage system and migrate it to the cloud
The insurance industry is sitting on piles of personal data, and not using it would be unforgivable in the age of data technologies. That’s why one of our biggest teams works on generating granular reports: revenue from premiums, expenses for claims, best-performing geographies, and so on.
This project covers several insurance business lines, such as warranty servicing, P&C, M&C, ATE, and Accident & Health insurance. Every month, authorized agents send data to the client’s European offices in disparate formats.
When we started working on the project, it lived entirely on the client’s local servers, and all ETL jobs were done with the Centerprise solution.
First, we unified the data processing flow for all data sources. The resulting on-prem ETL process had three stages: data pre-processing; interim processing (the first and second phases); and final processing with upload to the destination server.
However, after properly evaluating the advantages the cloud provides, the customer initiated a gradual move away from on-premise servers. During the transition period, we supported the client’s third-party vendor with the migration.
How the client made the most of cloud BI & ETL tools: Until recently, this project relied on on-premise tools only – SSIS and SSRS (read one of our previous blog posts to learn the difference between them) – plus Power BI, which can be used both on a local server and in the cloud. But with the client’s decision in favor of partial cloud migration, the report generation pipeline had to be established in the cloud ecosystem.
As part of this project, we’ve been assisting the client’s third-party vendor, who was in charge of the PoC and cloud infrastructure preparation, with the system’s migration to the cloud. The cloud version of the solution uses Azure ETL services: Data Factory and Data Lake. Data Factory lets you visually configure the data generation logic, while Data Lake serves as data storage. For visualization, the cloud version of Power BI is used.
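For a taste of how such a pipeline is driven programmatically, here’s a hedged sketch based on the azure-mgmt-datafactory SDK; the resource group, factory, and pipeline names are placeholders, not the client’s actual resources.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Authenticate and point the client at the subscription (placeholder id)
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Kick off a pipeline run, then check its status
run = client.pipelines.create_run(
    resource_group_name="rg-reporting",
    factory_name="adf-insurance",
    pipeline_name="monthly_premiums_load",
)
status = client.pipeline_runs.get("rg-reporting", "adf-insurance", run.run_id)
print(status.status)  # e.g. "InProgress" or "Succeeded"
```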
Our team worked on the DWH design, prepared documentation, thoroughly commented data migration scripts, data mappings, presentations, and demo recordings, and held regular strategy sessions with the vendor’s team to share our knowledge and expertise for a smooth cloud migration. This allowed the third-party team to apply our practices and system knowledge in the cloud and build a PoC that the customer highly appreciated.
Closing remarks
Data is paramount. I hope these two blog posts gave you a rough idea of how data-driven decisions are made in insurance. For many, one question remains: should I run my ETL processes on a local server or use cloud infrastructure? As usual, there is no silver bullet. Before choosing either option, assess the data volumes you operate with, the sensitivity of your data, how important frequent ETL updates are to your company, and more. The very client I’ve been describing in these two posts is now expanding their own on-prem data center to serve specific needs that the cloud cannot fully meet, or that are simply impractical to meet in the cloud given the volume of data the client generates.
The bottom line: each situation is unique, but the tools are not. As you can see, the tech stack we use for our insurance client isn’t extravagant or something only a few companies can afford. The trick is consistency and commitment from the vendor you entrust your data to.
Sure thing, we do use breakthrough tech alongside good old workhorses like SSIS. Looking for your ideal combination of cutting-edge and time-tested tools? Stay tuned: we’ll cover exactly that in our upcoming articles.
Follow Ivan on LinkedIn for more BI & ETL backstage stories.