7 Jan
When it all goes Horribly Wrong
The bug reports are coming in and it's serious. Your software ships in a few days or, worse, it's already out there. The CEO has pawned his Jag and the only programmers who understand the code have either died of malnutrition or exposure, or gone back to live with their families in the Amazon Basin. You, the Engineering Manager, have placed your neck and certain more delicate parts of your body on the chopping block. What do you do?
- Make sure it can't get any worse. If you've got some people working on the problem, ask them to produce the best solution they currently can and make sure it's backed up and reproducible.
- Understand that you have a commercial problem much more than an engineering one. At this stage of the game it's going to be all about making compromises between short-term, mid-term and long-term revenue. You may fool some people into buying now even though you're in a mess, but you risk them never touching you again with a barge-pole later.
- Try to establish whether you actually know everything that is wrong with your product. Ideally you would have an extensive test-harness which would run reasonably quickly and which you could add to if necessary. If you haven't, you're going to have to accept the risk that any changes you make could introduce bugs somewhere else which you may not spot. It's probably a good idea to start building a test-harness now as a mid-term solution if you have the resources.
- Evaluate all known bugs for their commercial sensitivity and risk. I'm assuming none of these are trivial bugs - in fact, they're mainly "What on Earth" bugs (see here) - so fixing them isn't necessarily possible in the time scale (in fact, you've no idea how to fix them, so you've no idea how long a fix would take).
- Tweak. This is especially dangerous if you've failed on the third point above (you have no test-harness), but it may be your only option. Ask your engineers to use their intuition, given what they know about the code, to suggest little changes which may magically make the situation better (i.e. improve the bug ratings that you made in the fourth point). Since you are, largely speaking, poking in the dark, make sure you can undo any changes which turn out to be disastrous.
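The bug evaluation above doesn't need anything fancy. Here is a minimal sketch of what the ratings might look like, assuming you score each bug from 1 to 5 for commercial sensitivity and for risk and simply multiply the two. The scales, the multiplication and the example bugs are all my own illustrative assumptions, so pick whatever scoring your commercial people will actually argue about.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    # Hypothetical 1-5 scales, chosen purely for illustration.
    commercial_sensitivity: int  # 1 = nobody will notice, 5 = customers walk away
    risk: int                    # 1 = cosmetic, 5 = crash or data loss

    @property
    def rating(self) -> int:
        # Multiplying the two scores is an assumption, not a rule.
        return self.commercial_sensitivity * self.risk


def triage(bugs):
    """Return the bugs ordered from most to least alarming."""
    return sorted(bugs, key=lambda bug: bug.rating, reverse=True)


bugs = [
    Bug("Typo on the about screen", 1, 1),
    Bug("Invoices occasionally total to zero", 5, 5),
    Bug("Crash when importing very old files", 2, 4),
]

for bug in triage(bugs):
    print(f"{bug.rating:>2}  {bug.title}")
```

Re-score the list after every tweak. If the numbers at the top aren't coming down, the tweak probably wasn't worth keeping.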
Start trying to improve your understanding of the code by:
- Making sure all your engineers have notebooks and asking them to write down the things they discover. In particular, engineers do not like to digress when they spot something whilst working on something else. With a little notebook by their side, they can write it down and come back to it later.
- Trying to discover where the boundaries of your code are - i.e. sections whose interface is clearly defined and which can neither cause side effects in other areas nor suffer side effects from them. Work on the smallest and simplest first.
- Tidying up your use of spaces, tabs and new-lines. Use white space as a means to make the code as easy to understand as possible.
- If you have a safe means of doing so, renaming variables, functions, and so on to more descriptive names, even if these are really long (like "check_date_field_in_record_and_update_next_appointment_date_if_in_the_past"). If you can say it, totally accurately, in just a few words, then great, but accuracy now is more important than brevity. This will hopefully have a knock-on effect that may start to highlight where mistaken assumptions have crept into the code.
- Reviewing comments. Get rid of any that don't make sense. Get rid of clutter that isn't affecting the functionality of the code. Use comment patterns (like rows of stars (*)) to highlight particular parts of your code (like all functions). As with white space, use comments to make it as easy as possible to see what is going on.
- Simplify (dangerous without the test-harness from the third point above). Get rid of repetition using functional and object abstraction as safely as possible. Try to reduce the size of your problem without making it more complicated.
- Fix the bloody thing. Or throw it away.
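If you have no test-harness at all, one cheap way to start is to record what the code does today and compare against that recording after every tweak, rename or simplification. The following is only a sketch under that assumption: `legacy_next_appointment`, `CASES` and the JSON file are hypothetical stand-ins for whatever you have actually inherited.

```python
import json
from pathlib import Path


def legacy_next_appointment(days_overdue):
    """Hypothetical stand-in for inherited code nobody fully understands."""
    if days_overdue > 0:
        return 7
    return 0 - days_overdue


# Inputs worth pinning down; add to this list as bugs are discovered.
CASES = [-30, -1, 0, 1, 30]


def snapshot(func, cases):
    """Run func over every case and collect the results, keyed by input."""
    return {str(case): func(case) for case in cases}


def record(func, cases, path):
    """Save today's behaviour, right or wrong, as the baseline."""
    Path(path).write_text(json.dumps(snapshot(func, cases), indent=2))


def differences(func, cases, path):
    """Return {input: (recorded, current)} for every case that has changed."""
    recorded = json.loads(Path(path).read_text())
    current = snapshot(func, cases)
    return {
        key: (recorded[key], current[key])
        for key in recorded
        if recorded[key] != current[key]
    }
```

Call `record(...)` once before anyone touches anything, then `differences(...)` after each change. An empty result only tells you that nothing changed for the cases you recorded, not that the code is correct, but it is a good deal better than poking in the dark.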
Richard

