How to automate big data mapping and stabilization. Insurance Case

A multinational giant entrusts us with a turn-key delivery of a mission critical data platform

As I said, first there were simple Excel files. The client wanted us to process them, map them correctly, and load the results neatly into the database.

"Develop a process for us," they said. "How do you imagine this can be mapped?"

That is, we define the unifiable fields and map the data using the tool of the client’s choice. Then the file is uploaded to the data storage, and from there analysts and managers take stabilized data for reporting and analytics. They now use the clean and reliable data to see:

Policies sold
Best performing locations and geographies
Best distribution channels
Best performing agents
The most/least profitable accounts
Over which period, etc.
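
To make the idea of "unifiable fields" more concrete, here is a minimal sketch in plain Python. It is not the client's mapping tool: the alias table, the field names and the helper functions are all hypothetical and only illustrate how differently named source columns can be folded into one unified schema.

```python
# Hypothetical alias table: each unified field lists the source headers that map to it.
FIELD_ALIASES = {
    "policy_number": {"policy no", "policy number", "policy #"},
    "agent": {"agent", "agent name", "broker"},
    "premium": {"premium", "gross premium"},
}


def normalize(header: str) -> str:
    """Lower-case a header and collapse surrounding/repeated whitespace."""
    return " ".join(header.strip().lower().split())


def build_column_map(headers):
    """Return ({source header: unified field}, [headers with no known mapping])."""
    lookup = {
        alias: field
        for field, aliases in FIELD_ALIASES.items()
        for alias in aliases
    }
    mapped, unmapped = {}, []
    for header in headers:
        field = lookup.get(normalize(header))
        if field is None:
            unmapped.append(header)
        else:
            mapped[header] = field
    return mapped, unmapped


def map_row(row: dict, column_map: dict) -> dict:
    """Rename the mapped columns of one row and drop everything else."""
    return {column_map[key]: value for key, value in row.items() if key in column_map}
```

Returning the unmapped headers separately, instead of silently dropping them, gives a natural hook for reporting an unexpected column back to whoever uploaded the file.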

Thus, the project issues to be solved and the expected value from Symfa’s involvement can be summarized as follows:

Symfa’s Involvement

In time, when we got comfortable with the client’s data, we started experimenting for deeper insights. You see, cases abound. The same mapping logic doesn’t apply equally well to all the processes. So, whenever we saw discrepancies, we introduced new validation processes or extra rules. The logic gradually grew more complicated and now provides data for five of the client’s business lines.
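
One way to keep such a growing rule set manageable is to store validation rules per business line, so a new discrepancy becomes one more entry rather than a change to the core logic. The sketch below is an illustration under that assumption; the business line names ("motor", "property"), the fields and the rules themselves are invented for the example.

```python
from typing import Callable, Dict, List, Optional

# A rule inspects one row and returns an error message, or None if the row passes.
Rule = Callable[[dict], Optional[str]]


def require(field: str) -> Rule:
    def check(row: dict) -> Optional[str]:
        return None if row.get(field) not in (None, "") else f"missing {field}"
    return check


def non_negative(field: str) -> Rule:
    def check(row: dict) -> Optional[str]:
        value = row.get(field)
        if value in (None, ""):
            return None  # presence is checked by a separate rule
        try:
            return None if float(value) >= 0 else f"{field} is negative"
        except (TypeError, ValueError):
            return f"{field} is not a number"
    return check


# Hypothetical rule sets; adding a business line means adding a key here.
RULES: Dict[str, List[Rule]] = {
    "motor": [require("policy_number"), non_negative("premium")],
    "property": [require("policy_number"), require("location")],
}


def validate(row: dict, business_line: str) -> List[str]:
    """Run every rule registered for the business line and collect the messages."""
    return [msg for rule in RULES.get(business_line, []) if (msg := rule(row))]
```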

How does the solution work?

This is a non-GUI system: all the processing happens under the hood, hidden from the business users. Automatic data input happens at a set interval; in our case, the system collects files once every hour. This interval is comfortable for the users and keeps the load on the server reasonable.
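
The hourly collection can be pictured as a simple polling loop over the designated folder. This is a sketch only: the file patterns, the function names and the in-process loop are assumptions, and a production setup would more likely hand the scheduling to cron or an orchestration tool.

```python
import time
from pathlib import Path

POLL_INTERVAL_SECONDS = 60 * 60  # once every hour, as in the article


def collect_pending(inbox: Path, patterns=("*.xlsx", "*.xls", "*.csv")):
    """Return the files waiting in the designated folder, oldest first."""
    files = {
        path
        for pattern in patterns
        for path in inbox.glob(pattern)
        if path.is_file()
    }
    # Sort by modification time, then by name so that ties are deterministic.
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))


def run_forever(inbox: Path, handle, interval=POLL_INTERVAL_SECONDS, sleep=time.sleep):
    """Poll the inbox at a fixed interval and pass each waiting file to `handle`."""
    while True:
        for path in collect_pending(inbox):
            handle(path)
        sleep(interval)
```

Here `handle` stands for whatever processes a single file; it is expected to move or archive the file so that the next poll does not pick it up again.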

Once a user gets a file from a contractor, they put it in a designated folder and wait for an email notification about the processing result: success, failure, error, etc.
The system collects the files from the configured directories and knows where the corresponding output files go.
If the file input succeeds, the process ends here for the underwriter. The system does the rest, and the analysts soon get their clean data in the output folder in the warehouse.
If not, the user receives an email notification describing the error.
If the user can resolve the error themselves (rename the file, fill in the missing fields, etc.), they make those basic corrections and the cycle starts again.

If not, the support team steps in to figure things out.
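
The cycle above boils down to three outcomes: the file goes through, the user can fix it, or support has to step in. A minimal sketch of that routing follows. The `Status` values, the `UserFixableError` exception and `run_cycle` are hypothetical names introduced for this example, not parts of the actual system.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    USER_FIXABLE = "user_fixable"    # e.g. wrong file name, missing fields
    NEEDS_SUPPORT = "needs_support"  # anything the uploader cannot correct


class UserFixableError(Exception):
    """Raised by a processor for problems the uploader can correct."""


@dataclass
class Outcome:
    file_name: str
    status: Status
    detail: str = ""


def run_cycle(file_name: str, process) -> Outcome:
    """Run one processing attempt and decide who has to act next."""
    try:
        process(file_name)
    except UserFixableError as exc:
        return Outcome(file_name, Status.USER_FIXABLE, str(exc))
    except Exception as exc:  # unexpected failures are escalated to support
        return Outcome(file_name, Status.NEEDS_SUPPORT, f"{type(exc).__name__}: {exc}")
    return Outcome(file_name, Status.SUCCESS)
```

Keeping the "who acts next" decision in one place means the notification text and the support escalation can both be driven by the same `Outcome`.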

General View of the Solution (1)

Special achievement: The system now knows how to talk to the user through the notification system

Together with the client, we made a notification system that sends email notifications on the file processing status – success, failure, error, file not found, etc.

At first, the system was built with developers in mind. Later it became clear that business units would start working with it too, and today it’s mainly business folks who use it daily. To make the human-machine interaction smooth, we needed user-friendly notifications: carefully thought-out messages clear enough for a non-IT person to deal with a basic set of exceptions – delete the file, rename the file, etc. As a result, people with no software development background can operate the system and handle errors or failures to bring the processing back on track.
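
As an illustration of what "clear enough for a non-IT person" can mean in practice, the sketch below pairs every status with a plain-language next step and builds the email with Python's standard `email.message.EmailMessage`. The status keys, the wording, the sender address and `build_notification` are assumptions made for the example.

```python
from email.message import EmailMessage

# Hypothetical catalogue: status -> (subject prefix, plain-language next step).
HINTS = {
    "success": ("Processed", "No action is needed."),
    "file_not_found": (
        "File not found",
        "Check that the file is in the designated folder and upload it again.",
    ),
    "bad_name": (
        "File name not recognised",
        "Rename the file to the agreed pattern and put it back in the folder.",
    ),
    "missing_fields": (
        "Missing fields",
        "Fill in the listed fields and upload the file again.",
    ),
}
FALLBACK = ("Processing failed", "Please forward this message to the support team.")


def build_notification(recipient, file_name, status, detail=""):
    """Compose a status email that tells the reader what to do next."""
    subject_prefix, next_step = HINTS.get(status, FALLBACK)
    msg = EmailMessage()
    msg["To"] = recipient
    msg["From"] = "data-platform@example.com"  # placeholder sender address
    msg["Subject"] = f"{subject_prefix}: {file_name}"
    lines = [f"File: {file_name}", f"Status: {subject_prefix}"]
    if detail:
        lines.append(f"Details: {detail}")
    lines.append(f"What to do: {next_step}")
    msg.set_content("\n".join(lines))
    return msg
```

The resulting message can be handed to `smtplib.SMTP(...).send_message(msg)`; the point of the sketch is only that each status carries an instruction, not just an error code.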

Insurance Platform Works

How do I see the future growth of the project?

The process can evolve further than that.

No two data sets or files are identical. A user can make an unforeseen change or a simple mistake and thereby create an exception case. When the system picks up such a file, it will either fail with an error or load incorrectly and pollute the clean data.

Right now, we’re working on automated exception handling to make the system fully human-independent. The longer we work, the more cases we cover, which moves the process closer to 100% automation.

Data Jobs

We’re finished with the skeleton now and are moving on to the data jobs – the muscles of the project. By now, we’ve introduced five data jobs (one of which monitors processing timeouts for larger files, where processing can take 15+ hours).
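
A timeout-monitoring job of this kind can be reduced to comparing start times against a limit. The sketch below assumes the running jobs are available as a mapping from job name to start time; that structure, the job names in the usage note and `find_timed_out` are hypothetical, and only the 15-hour figure comes from the project.

```python
from datetime import datetime, timedelta


def find_timed_out(running_jobs, now, limit=timedelta(hours=15)):
    """Return the names of jobs that have been running longer than `limit`.

    `running_jobs` maps a job name to the datetime at which it started.
    """
    return sorted(
        name for name, started in running_jobs.items() if now - started > limit
    )
```

For example, with `now` set to noon, a job that started at 20:00 the previous evening (16 hours ago) is reported, while one that started at 09:00 the same day is not.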

What Is a Data Job

Probably, in a year, one dedicated IT specialist monitoring 15 jobs will replace a whole development team. This is the plan for now.

See a rough scheme below for one of the data jobs.

Implementation view for one of the jobs

Scaling

The data we process is huge, and the process can scale further. Everything comes down to computing power, so the server limitations are our only limitations.
New business lines are added as the project grows. We started with two and now have five, and I see no reason why it cannot be 10 in a couple of months.

What’s the secret ingredient of the project's success?

Great teamwork

We’ve built great relationships within the team. You may have heard about such cases – a QA approaches the dev with a bug only to get “send it to my backlog, I'll give it a look tomorrow, maybe”. This isn’t the case for us. My goal as a DM was to build a team where we go through our ups and downs together and stay in sync no matter what. I’m proud of the job we’ve done.

The client trusts us

A few months after the project started, the Vice President of the client texts me:
– Andrey, how many more people do you need for this project? I trust you, you know.

– Well, – I say. – Four more QAs.

– No problem, – he says.

This short conversation brought us our QA talents.

A couple more months pass and three more BAs join us in the same way. A simple question from the VP and my short answer – this is all it takes for resource allocation after all we’ve been through during two years of partnership.

I’m not so naive as to conclude that our commitment alone won us this enterprise insurance software project. Surely the VPs and IT directors consulted with their local tech leads on the solutions we designed. Our two-year technical record was impeccable before we were entrusted with this massive data platform processing dozens of terabytes of data per day.

Eventually, the client canceled the weekly and monthly Symfa team reporting because of our great performance. We have daily meetups, though. The data flow is stable, blockers never take long to resolve, and the client sees the results with their own eyes, so no one needs extra paperwork.

Look for more project details in the next article on the topic

In the next article about this project, Alesya Sumkina, BA Lead, will share:

How the project is kept in order (and thus ultimate data cleanliness is ensured)
Why documentation on big data projects is a must and
Why the client sends their tech leads to consult with Alesya.

You can find more insights on how the data is processed and the tooling we use on similar projects in the story by Ivan Sokolov, ETL & BI Developer.

