They call me the keeper of clean code and the destroyer of redundancies.
When I joined the project, I ran a project audit to make it clear to both the team and the customer how I saw my contribution to the system's growth.
The legacy system posed a good deal of challenges. To name just a few:
- Immense calculation procedures, hardcoded values, and duplicated code in the database.
- Redundant Azure resources and services that either went unused or that we were unaware of. Some of them added to the system's complexity, too.
- Duplicated code, hardcoded values, and multiple implementations of the same functionality in the backend.
- Duplicated backend models and an overly complex application build process in the frontend.
First things first, I made clean code my priority so the system would be easy to manage and maintain. Some redundant complexity also had to be cut away to continue the shift toward a simpler system state that my colleague had started on the project.
What I discovered and had to deal with first will sound familiar to anyone engaged in digital transformation projects for enterprise clients. When many people have been involved in the development process over the years, you'll find more than a few curious things in the code.
Architectural improvements to boost the application performance
Wherever huge volumes of data are involved, architecture is king. An extra 5 milliseconds added somewhere along the data processing pipeline doesn't stay 5 milliseconds: paid per record, it multiplies into hours when large datasets come into play.
The application was excessively decoupled into microservices with separate code bases that still shared a single database. We unified the once-scattered logic wherever it was feasible. The resulting system runs faster and is much easier to test, maintain, and extend with new functionality.
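To put rough numbers on that claim, here's a back-of-the-envelope calculation. The 5 ms figure and the row counts are illustrative assumptions, not measurements from this project:

```python
# Illustrative only: how a small fixed per-record overhead scales
# with dataset size. The 5 ms figure is an assumption, not a measurement.
PER_RECORD_OVERHEAD_S = 0.005  # 5 milliseconds per record

for rows in (1_000, 1_000_000, 100_000_000):
    extra_hours = rows * PER_RECORD_OVERHEAD_S / 3600
    print(f"{rows:>11,} rows -> {extra_hours:,.2f} extra hours")
```

At a million rows, that hypothetical 5 ms overhead already costs close to an hour and a half; at a hundred million rows, it's measured in days.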
As the system grew, we introduced end-to-end and performance testing to gain a detailed overview of how the software operates.
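A minimal sketch of what such a performance check can look like, in a pytest-style test. Both `run_report_pipeline` (a stand-in for a real pipeline entry point) and the 2-second budget are hypothetical:

```python
import time

def run_report_pipeline(rows):
    # Stand-in for the real pipeline entry point; assumption for illustration.
    return [r * 2 for r in rows]

def test_pipeline_under_budget():
    # Assert both correctness and a performance budget in one test.
    start = time.perf_counter()
    result = run_report_pipeline(range(100_000))
    elapsed = time.perf_counter() - start
    assert len(result) == 100_000
    assert elapsed < 2.0  # hypothetical budget in seconds
```

Running such tests in CI catches performance regressions before they reach a 10-hour production load.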
Assembling an ETL jigsaw puzzle without a cover picture (= documentation)
ETL was a whole new challenge due to multiple data sources, varied data processing approaches, the sheer data volume in question, and inconsistent documentation.
As a vendor, I’m unable to change the way primary data is collected from the data sources outside my project. But I can modify data processing here, in my own area of responsibility. Right now we’re working on the ETL pipelines to make them more efficient.
Here’s an example of what we do. A client’s partner delivers a huge amount of data once a month. Loading it into our internal reporting system takes about 10 hours, and those are very stressful 10 hours, to be honest. A network issue, a database connectivity issue (there are endless opportunities for a failure to happen in real life!) and poof… The data is a mess, hours of work and resource consumption lost to a glitch.
In this case you need a process intelligent enough to catch the error and resume from the point where it occurred, leaving all already-stored data in place and adding only the missing data. Such system behavior requires structural changes, the ones we’re working on right now, so that even after a crash or exception the process can recover automatically, without affecting the quality of the data already received.
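The resume-from-checkpoint idea can be sketched in a few lines. This is a simplification, not the project's actual implementation: it assumes the load is split into ordered batches, and the file name and function names are hypothetical. In production, the checkpoint write would need to be transactional with the batch insert:

```python
import json
import os

CHECKPOINT_FILE = "load_checkpoint.json"  # hypothetical checkpoint location

def load_checkpoint():
    """Return the id of the last successfully committed batch, or -1."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["last_batch"]
    return -1

def save_checkpoint(batch_id):
    """Persist progress after each committed batch."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_batch": batch_id}, f)

def load_batches(batches, insert_batch):
    """Idempotent load: skip batches already committed, resume after a crash."""
    start = load_checkpoint() + 1
    for batch_id in range(start, len(batches)):
        insert_batch(batches[batch_id])  # if this raises, the checkpoint keeps prior progress
        save_checkpoint(batch_id)
```

After a crash mid-load, calling `load_batches` again skips the batches that were already committed instead of redoing the whole 10-hour run.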
When I joined the project, there were still a lot of blank spots in the documentation. Given the amount of data we work with and the multiple ETL pipelines, documenting isn’t just something that makes our lives on the project easier. It’s a bare necessity for keeping data processing stable.
You need nothing extravagant. Azure DevOps has built-in capabilities for documenting important project data, though it is more of a technical tool. The key is to ensure that everyone on the project has access to it and, just as important, to keep things simple. In the project wiki, we keep and update the project information divided into several sections: General Overview, Infrastructure, Backend, and Frontend.
Currently, the architecture of the solution has both an on-prem and a cloud part. The process starts on local physical servers, where the dev team takes the data stored in a SQL database. Using an SSIS package, the team transfers the data to a SQL database in the Azure cloud. The BI application is deployed in Azure App Service. As you can see, the data makes quite a journey. For businesses that don’t operate with such huge amounts of data, there’s no urgent need to introduce structural change. That wasn’t our case (we’re talking about an application for an enterprise insurance client), so we’re working hard to make the data flow smooth, uninterrupted, and intelligent.
Another missing piece that really made a difference for us and helped us see the bigger picture on the project was the project audit and documentation. Missing or incomplete documentation is a common issue across software engineering, not only enterprise software development. After all, documentation is rarely at the top of the priority list: we hop from task to task, and documentation sinks to the bottom of the pending items. But that’s not the winner's attitude.
If your story sounds similar, start with an honest project overview and the docs. It may not seem like the most innovative move, but it will open your eyes to the truth: what you have, how you use it, and what can be improved.