Evolution of Test Automation for a Monolith

Joost van Wollingen · Published in The Protean Tester · 5 min read · Jan 2, 2024


The monolith. Dreaded for its sluggishness, tangledness and license costs. Striking fear into the hearts of developers, who tame it through incantations long lost to the void of attrition. The call for new features and profits overshadows the long-standing desire to replace it. Its expensive maintenance is delegated to consulting shamans, whose native development rituals make little sense to modern man.

A menacing monolith. Generated via https://hotpot.ai/art-generator

Deciphering its glyphs, marks and forgotten functionalities, we have discovered ways to tolerate its dull presence in our microservices landscape, where heralds sound the progress of technocracy.

As we continuously push forward, the monolith slowly withers in its corner, remembering the attention it would revel in during the Platform’s early days. Slowly smothered, deconstructed and derelict, it becomes an empty husk, which no longer commands but is instructed. As its decommissioning lurks on the horizon, we review a few test automation patterns applied throughout the years.

Out with the Old, in with the New

Initially, the testing team of external consultants was tasked with training us in the Ways of the Monolith. SoapUI was the main tool, with code written partially in a foreign language, long, long, long if-else trees, and string concatenation all over the place. The tests focused on the Monolith, mostly ignoring the New Platform. Clearly, this would not do.

With Kotlin being the main language at our company, and the ambition to involve developers more in test automation, it was an easy choice for our new tests. We felt a low entry barrier for contributions by developers was important to create a context where ownership of test automation is shared across test and development teams. With Kotlin's interop with Java, we'd have access to the full array of test libraries from the Java ecosystem and still get to enjoy the benefits Kotlin brings in terms of conciseness and syntax.
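
As a taste of what that looks like, here is a minimal sketch of a Kotlin test calling a plain Java library directly. The JUnit 5 + REST Assured combination, the endpoint and the payload are illustrative, not our exact stack:

    import io.restassured.RestAssured.given
    import org.hamcrest.Matchers.notNullValue
    import org.junit.jupiter.api.Test

    class OrderApiTest {

        @Test
        fun `creating an order returns its id`() {
            // REST Assured is a plain Java library, called seamlessly from Kotlin;
            // only `when` needs backticks because it is a Kotlin keyword
            given()
                .contentType("application/json")
                .body("""{"customerId": "42"}""")
            .`when`()
                .post("/orders")
            .then()
                .statusCode(201)
                .body("id", notNullValue())
        }
    }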

Manual Operation Required

Some of the old tests depended on batch jobs being run… manually. After a job completed, you could start the next part of the ‘automated’ test. Working with the team managing the Monolith, we figured out how to trigger the batch jobs from code, speeding up the tests and removing the manual element completely. A nice reminder not to settle for mediocre testability in your applications and test automation.
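
The exact mechanism will differ per system, but the idea looks something like the sketch below, which assumes the Monolith exposes an HTTP trigger for its jobs. It might just as well be JMX, a stored procedure or a CLI call; the URLs and status values here are purely illustrative:

    import java.net.URI
    import java.net.http.HttpClient
    import java.net.http.HttpRequest
    import java.net.http.HttpResponse

    // Hypothetical batch job trigger; endpoints and statuses are illustrative.
    class BatchJobRunner(private val baseUrl: String) {
        private val client = HttpClient.newHttpClient()

        fun runJob(jobName: String) {
            val request = HttpRequest.newBuilder()
                .uri(URI.create("$baseUrl/batch/$jobName/run"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build()
            val response = client.send(request, HttpResponse.BodyHandlers.ofString())
            check(response.statusCode() == 200) { "Could not start batch job $jobName" }
        }

        // Poll until the job reports completion, so the test can continue
        // instead of waiting for a human to press a button.
        fun awaitCompletion(jobName: String, timeoutMillis: Long = 120_000) {
            val deadline = System.currentTimeMillis() + timeoutMillis
            while (System.currentTimeMillis() < deadline) {
                val request = HttpRequest.newBuilder()
                    .uri(URI.create("$baseUrl/batch/$jobName/status"))
                    .GET()
                    .build()
                val status = client.send(request, HttpResponse.BodyHandlers.ofString()).body()
                if (status == "COMPLETED") return
                Thread.sleep(2_000)
            }
            error("Batch job $jobName did not complete within ${timeoutMillis}ms")
        }
    }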

Atomic Tests Blew up the System

Atomic tests (of the non-exploding kind) can help you pinpoint faster and more accurately what the problem is with your SUT. In our case we set out enthusiastically to increase coverage and write more tests. This worked great for the New Platform side of things, whizzing along quickly, creating all the entities we needed for our tests. But the Monolith choked. Queues would fill up, database transactions would deadlock, test verifications would time out and application server instances would disappear as soon as we started increasing the amount of data created by tests.
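
For illustration: an atomic test creates everything it needs itself and verifies exactly one operation. The in-memory PlatformClient below is a stand-in for a real test client, so this sketch runs as-is:

    import org.junit.jupiter.api.Assertions.assertEquals
    import org.junit.jupiter.api.Test
    import java.util.UUID

    // In-memory stand-in for a real Platform test client (illustrative only).
    class PlatformClient {
        private val customers = mutableMapOf<String, String>()
        fun createCustomer(): String =
            UUID.randomUUID().toString().also { customers[it] = "NEW" }
        fun activate(id: String) { customers[id] = "ACTIVE" }
        fun deactivate(id: String) { customers[id] = "INACTIVE" }
        fun statusOf(id: String): String = customers.getValue(id)
    }

    class AtomicCustomerTests {
        private val platform = PlatformClient()

        @Test
        fun `a new customer can be activated`() {
            val id = platform.createCustomer() // fresh entity, owned by this test
            platform.activate(id)
            assertEquals("ACTIVE", platform.statusOf(id))
        }

        @Test
        fun `a new customer can be deactivated`() {
            val id = platform.createCustomer() // no state shared between tests
            platform.deactivate(id)
            assertEquals("INACTIVE", platform.statusOf(id))
        }
    }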

Anti-Pattern: Non-Atomic Tests

We needed a way to get around the unreliability of the Monolith. Knowing full well that we were simply trading our current problems for a new set of problems, we decided to drop atomic tests in favour of non-atomic tests. Instead of creating fresh test data for every operation we wanted to test, we’d create test data once and re-use it for a chain of operations. This way we could alleviate the bottleneck, still get (some) test results and keep moving forward. There are a few issues with this approach:

  • A failed test means that you won’t have results for any of the tests that would’ve run after it.
  • Failing tests mean that you have to investigate not one but many operations to figure out what is wrong, which takes more time and increases the load on the team, or the Test Engineer on Duty.
  • Separate parts of the test really need to be separate. If you need data in a specific state for a specific operation, you may have to order the tests or create a new test altogether after all, which then increases the pressure on the bottleneck again.

Having identified all of this upfront, we still decided to move ahead, with the intent to transpose the test set back to atomic tests as soon as possible.
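
A sketch of what that chained style looks like with JUnit 5’s ordered tests, re-using the illustrative PlatformClient from the atomic sketch above. Note how every step silently depends on the previous ones:

    import org.junit.jupiter.api.MethodOrderer.OrderAnnotation
    import org.junit.jupiter.api.Order
    import org.junit.jupiter.api.Test
    import org.junit.jupiter.api.TestInstance
    import org.junit.jupiter.api.TestMethodOrder

    // One entity is created once and re-used by an ordered chain of tests.
    // If an early step fails, every later result is lost or suspect.
    @TestMethodOrder(OrderAnnotation::class)
    @TestInstance(TestInstance.Lifecycle.PER_CLASS) // share state across tests
    class NonAtomicCustomerChain {
        private val platform = PlatformClient()
        private lateinit var customerId: String

        @Test @Order(1)
        fun `create the customer once`() {
            customerId = platform.createCustomer() // the only data-creation step
        }

        @Test @Order(2)
        fun `activate the shared customer`() {
            platform.activate(customerId) // depends on step 1
        }

        @Test @Order(3)
        fun `deactivate the shared customer`() {
            platform.deactivate(customerId) // depends on steps 1 and 2
        }
    }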

Comparing Notes

During the strangling of the Monolith there will be a period when both the Monolith and the Platform operate together. While functionalities are being extracted to new applications, (part of) the functionality lives on in the Monolith, and production still depends on the monolithic part functioning. In our case, copies of entities were kept in both the New Platform and the Monolith. One strategy we applied here was to ‘compare notes’ for each test: an entity created in the New Platform must look the same in the Monolith. Any difference should be investigated, as our migration strategy dictated consistency across both.
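
A sketch of that comparison, with hypothetical client interfaces and a trimmed-down entity; the real check covers every field your migration strategy requires to stay consistent:

    import org.junit.jupiter.api.Assertions.assertEquals

    // Trimmed-down entity and client abstraction (both illustrative).
    data class CustomerSnapshot(val id: String, val name: String, val status: String)

    interface EntityReader {
        fun fetchCustomer(id: String): CustomerSnapshot
    }

    // 'Comparing notes': fetch the entity from both systems and demand equality.
    fun assertConsistentAcrossSystems(
        platform: EntityReader,
        monolith: EntityReader,
        id: String,
    ) {
        val inPlatform = platform.fetchCustomer(id)
        val inMonolith = monolith.fetchCustomer(id)
        // Any difference fails the test and triggers an investigation
        assertEquals(inPlatform, inMonolith, "Entity $id diverged between systems")
    }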

Unlocking Faster Feedback

As we edge closer to decommissioning the Monolith, fewer and fewer features depend on it. However, by keeping it in scope for our automated tests we suffer a penalty: slow feedback. Test runs including the Monolith take about 30 minutes, and still suffer from unreliable results. Again, there is a trade-off to be made.

As the New Platform is now the main driver of most major processes, feedback on the New Platform has become more important than feedback on the Monolith. Our test set did not make that distinction, and any test run would verify both the New Platform and the Monolith.

To get faster feedback, we made our tests configurable. An environment variable now determines whether the Monolith-related parts of our non-atomic tests are skipped. This reduced the test lead time to ~5 minutes. Tests with the Monolith included can still be run on demand. A next step is to return to atomic tests and have even more manageable tests.
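
With JUnit 5 there are two standard ways to wire up such a switch; the variable name below is illustrative:

    import org.junit.jupiter.api.Assumptions.assumeTrue
    import org.junit.jupiter.api.Test
    import org.junit.jupiter.api.condition.EnabledIfEnvironmentVariable

    // Option 1: skip a Monolith-only test class unless explicitly enabled.
    @EnabledIfEnvironmentVariable(named = "INCLUDE_MONOLITH", matches = "true")
    class MonolithVerificationTests {
        @Test
        fun `entity is replicated to the Monolith`() { /* ... */ }
    }

    // Option 2: abort only the Monolith tail of a mixed test. When the
    // variable is unset, the test is reported as aborted (skipped), not failed.
    class MixedTests {
        @Test
        fun `order flows through the Platform`() {
            // Platform assertions always run ...
            assumeTrue(System.getenv("INCLUDE_MONOLITH") == "true") {
                "Skipping Monolith verification for fast feedback"
            }
            // ... Monolith assertions only when the variable is set
        }
    }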

It ain’t pretty but it works

Of course, some of the things described above come with a certain risk. Risk of spending too much time inspecting flaky tests. Risk of the Monolith failing while we consciously aren’t running its tests. The reality is that we don’t always get to work with perfect systems. And that’s ok. Just don’t apply any of this blindly. Always weigh your options in relation to your goals and risk appetite. The only reason this worked in my context is because of:

  • A great team of test engineers, with a culture of meticulously checking failing tests.
  • A shift-left test approach, where a lot of the tests are executed early on.
