23 April 2007

Workarounds

This morning we are fighting a bug in the Microsoft MSI installer that deletes part of the bundled JRE from our app after an upgrade, as described here. There is a workaround listed on the Microsoft Knowledge Base site for this bug.

A workaround is a "temporary" solution to a bug, created in lieu of fixing the bug and redistributing the software. This happens when a manager sees the cost of fixing the problem as higher than the cost of creating the workaround. The problem, as all software engineers know, is that there is an under-appreciated hidden cost - the quotes around the word "temporary". It's not temporary. It will always be there and will always suck, because once the software works, it's too risky to change it.

My last government client was a great case study in how repeated use of workarounds can box you into a corner. We used the full Oracle J2EE stack, which is pretty buggy, especially if you're in the off-in-the-weeds use cases that the Oracle QA department didn't get around to testing. For every bug, there was another workaround from the on-site consultant. Management didn't see a problem with this, because a workaround appears to offer the same value as a fix, except for that pesky "temporary" bit, and hopefully that will be someone else's problem. In the end, workarounds on top of workarounds left the system fragile and unstable.

In contrast, look at the Mac OS X Save As... dialog. Apple recently posted on their propaganda blog about the keyboard shortcuts you can use in the dialog. It occurred to me that this tip applies to all applications, so users can trust that once they learn this behaviour, they'll be able to apply it whenever they encounter this dialog. That consistency is an attribute of quality software, and it's possible because Mac software (generally) uses the system-provided functionality instead of reimplementing it.

If this were a Microsoft OS and a problem were found in the Save As... dialog, there would be a workaround rather than a fix. Each programmer would just make their own dialog box that looks the same, but has the extra flag or modified data structure or whatever avoids the bug. Each of those developers can then introduce differences, like being lazy and leaving out the keyboard shortcuts. Now the user can't trust that anything is consistent, so they avoid learning faster ways to use the "computer" as a whole.

To be fair, it's not realistic to expect workarounds to disappear. As a lead, sometimes you find yourself in a situation where a proper fix pushes out the schedule, and you weigh the costs and benefits, including how bad the workaround will be if it lives forever, and what implications it could have on the design. And you feel dirty.

Some people feel so dirty that they won't go along with it. There's a spectrum of developer reactions, from indignation that leads to an indefinite delay of the release, to an irresponsible lack of concern. But even if they're forced to give up their ideals and do some messy hack, ultimately, developers know that quality software has zero workarounds, period.

So, I think perhaps the best metric for how bad your software is, is how many workarounds it contains. Repeatedly leaving bugs in the software is a bad practice, and not many managers really understand that. There's a good reason for programmers to get out the resume paper when they see a culture of accepting workarounds on their project.
