Software architecture may seem an abstract concept far removed from your company’s bottom line. But this week, I was reminded how much trouble you can get into if your most important applications are built on shaky foundations.
The following customer story illustrates the severe financial and reputational consequences of neglecting software architecture. It provides inspiration for managing and monitoring your software assets as you would any other business-critical asset.
A big bang in a digital twin
The software we were investigating was an industrial system: the kind of system where, if something goes wrong in the software, machines go out of control, resulting in a very big bang. Lives are literally at stake. Of course, the customer was aware of that, so they had set up an excellent testing program. All software had been thoroughly tested. They even used a digital twin for simulation.
A devil’s dilemma: Should we go live?
Based on the test results, they didn’t dare go live with a new software release. The system was not stable enough and gave unexpected results. The delay of the new release was not without consequences. There were contractual obligations and deadlines to meet.
So, management had two unenviable options:
- Go live with risks, including the possibility that things might explode. Or,
- Admit to the customer that things are out of control and seek a postponement, which would raise difficult questions: “How long will the delay be? How do we know that the software will be stable come the new delivery date?”
The customer wanted to know what was going on and asked us to investigate the system. The software had a long history, over which it had been considerably expanded and adapted.
What was the designed software architecture?
The software system consisted of several modules that exchanged messages with each other. That is a typical and fit-for-purpose design. Such a design should include a messaging protocol that encompasses:
- Types and content of the messages
- Handshaking aspects like: “Hello, I sent you a message – did you get it?” and in reply, “Yes, thank you. Well received.”
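To make those two ingredients concrete, here is a minimal sketch in Python. It is an illustration only, not the customer's protocol: the names `MessageKind`, `Message`, `acknowledge`, and `is_ack_for`, and the module names "controller" and "valve", are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
import itertools


class MessageKind(Enum):
    """The agreed message types (hypothetical examples)."""
    COMMAND = "command"
    STATUS = "status"
    ACK = "ack"


# Simple in-process counter so every message gets a unique id.
_next_id = itertools.count(1)


@dataclass(frozen=True)
class Message:
    """The agreed content of every message, whatever its kind."""
    kind: MessageKind
    sender: str
    receiver: str
    content: dict
    message_id: int = field(default_factory=lambda: next(_next_id))


def acknowledge(received: Message) -> Message:
    """Build the 'Yes, thank you. Well received.' reply."""
    return Message(
        kind=MessageKind.ACK,
        sender=received.receiver,
        receiver=received.sender,
        content={"ack_for": received.message_id},
    )


def is_ack_for(reply: Message, original: Message) -> bool:
    """Check that a reply acknowledges exactly this original message."""
    return (
        reply.kind is MessageKind.ACK
        and reply.content.get("ack_for") == original.message_id
    )


# "Hello, I sent you a message. Did you get it?"
request = Message(MessageKind.COMMAND, "controller", "valve", {"action": "open"})
reply = acknowledge(request)
print(is_ack_for(reply, request))
```

The point of the sketch is that the message types, the message content, and the handshake are written down once, in one place, so every module can be checked against the same definition.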
The technical reality
So, what was the problem in this case? We analyzed the system and found that about 50 percent of the code dealt with that message protocol: composing messages, interpreting messages, and so forth. That left at most half of the code for business logic; the rest was just messaging. That’s a lot, and it is costly, because more code means more work. But the really annoying thing was that each and every software module had its own message handler.
A tech-savvy person, perhaps an architect, would immediately say, “Yeah, that’s a poorly designed system.” And that’s right, in part. All the software modules had their own message handling code because the system had evolved over time rather than being designed.
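For contrast, a designed system would typically keep message handling in one shared component that every module reuses. The sketch below shows that idea in Python under stated assumptions: the `MessageBus` class, its JSON envelope, and the "valve" module are hypothetical and do not describe the customer's system or any specific library.

```python
import json
from typing import Callable, Dict

# A module's business logic only ever sees already-decoded content.
Handler = Callable[[dict], None]


class MessageBus:
    """The single place where messages are encoded, decoded, and dispatched."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, module_name: str, handler: Handler) -> None:
        """A module plugs in its business logic; it writes no parsing code."""
        self._handlers[module_name] = handler

    @staticmethod
    def encode(receiver: str, content: dict) -> bytes:
        """One wire format for all modules (JSON is an assumption here)."""
        return json.dumps({"receiver": receiver, "content": content}).encode("utf-8")

    def deliver(self, raw: bytes) -> None:
        """Decode once, centrally, and hand the content to the right module."""
        envelope = json.loads(raw.decode("utf-8"))
        receiver = envelope["receiver"]
        if receiver not in self._handlers:
            raise KeyError(f"no module registered as {receiver!r}")
        self._handlers[receiver](envelope["content"])


bus = MessageBus()
received = []  # stands in for the valve module's business logic
bus.register("valve", received.append)
bus.deliver(MessageBus.encode("valve", {"action": "open"}))
print(received)
```

With a shared component like this, the composing and interpreting code exists once and is tested once, instead of once per module.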
The cause of the problems
The individual software modules were produced at different times—some years apart—and had been developed by different teams or suppliers. Years ago, when the system was small, the message protocol was not specified in sufficient detail. At the time, that was not a problem because the team was small, and everyone knew how the system worked.
As the system grew, modules were successively added, and the teams that delivered those new modules had to meet the requirement: “make sure you can communicate with the other modules.” Over time, this resulted in many message handlers in one system.
Many good, small decisions sometimes lead to a big mess. Something can come into existence without being designed. And because the software team was good and experienced, that problem remained unseen for a long time.
Or rather, the problem was not hidden; the team was constantly faced with the choice: “Are we going to solve this properly and spend two months on it, or are we going to use the time to build new functionality, which the customer is waiting for?”
Often the short term wins out over the long term, until the long term is suddenly today and the system no longer works. Too many test issues and too little time.
How tech issues become business issues
The result of all this was that the system had eventually become untestable. The individual software modules appeared to be of good quality, but it was impossible to determine from the interplay of the modules whether the system as a whole operated correctly.
It looked rather like a team of top professional football players who couldn’t see or understand each other. And then, as management, you suddenly find yourself with that horrid dilemma of going live irresponsibly or asking for a postponement.
Lessons learned
We learn from this that it is crucial to design the software architecture well, and to evaluate periodically whether the architecture is still appropriate and whether it has been implemented in accordance with the design.
That is perhaps even more important if you have a very experienced software team that always manages to fix things. Then it takes longer before the underlying problem becomes apparent. In the case of this customer, the software engineers knew about the problems, but those problems did not receive sufficient attention from management. The short term often wins.
And the Euros
There are numerous financial consequences to this story:
- Firstly, the financial damage of contractual penalties because of software errors. How ironic that the hourly rate for lawyers is much higher than for software architects.
- Secondly, going live with a system late is often a disaster for your reputation.
But there are also significant costs that could have been avoided over the years:
- Where typically one functional tester is needed to test the work of two software engineers, in this case, testers outnumbered software engineers six to four. Moreover, the engineers wasted a lot more time than was necessary testing their own code.
- The company could have saved hundreds of thousands of Euros in testing costs if the software architecture had been more closely monitored.
And we haven’t even mentioned the cost of fixing the problems that were found. These, too, are costs that could have been avoided. If all these costs had not been incurred, the system would have gone live faster, and operating profit would have been much higher.
In conclusion
Although senior managers typically regard software architecture as technical detail and someone else’s concern, it is ultimately the nervous system of your company. If it goes wrong, it is potentially disastrous and not an easy fix. Just keep an eye on it. Or let us do that for you. We’re experts in software quality. It’s what we do.