Tracking difficult bugs, the programmer / customer relationship
Archive - Originally posted on "The Horse's Mouth" - 2009-03-20 13:21:24 - Graham Ellis
If you've been programming for a while, you have probably come across one or two of those code bugs that are really really really hard to reproduce, track down and fix. And yet - when you do track them down / fix them from live (production) code, your users / customers just shrug their shoulders and you get comments like "if it only took you a minute or two to fix when you found what it was, why on earth did it take you so long to find what it was in the first place?"
I've had one of these 'incidents' running over the last few days.
The support pledge for people who are in favour of, and would use and benefit from, an improved TransWilts train service (www.transwilts.org.uk) is backed up by a page that includes a list of the signatories so far.
When I wrote it, it worked.
When I tested it, it worked.
When I revisited it, it worked.
When others visited it, it worked.
Then someone reported it had failed - a big gap between the two columns of names, and the right-most column so squished that the names folded - fault.
I tested it again - visited the page and ... it worked.
Correspondence - "which browser are you using?" and "what's your screen resolution?" - but the problem went away - even for the reporter, it worked.
I tried it the next day to make sure, and it worked.
Another failure report came in - fault.
And deciding that there had to be *something* in these reports I visited the page and (to be honest, to my surprise) I got fault using exactly the same browser that had previously been working. But at least I now had evidence on my screen, and I was able to capture the HTML of that evidence and see just WHY it was happening.
Here's the explanation:
Around 10% of our signups ask that their name be withheld rather than appearing on our page. That's for people in the railway industry, for example, who want to show their support but can't have their names out on our site because of their jobs. As each name is added to the display, our code asks "am I exactly halfway through the names I have to display?" and if the answer is 'yes', it starts a new column - thus giving two columns.
However, if there happens to be a "name withheld" exactly halfway through, the answer to that question will be 'yes' not once ... but twice (the program loop before AND the program loop after the name to be hidden), so the new column code is run twice and we end up with an extra blank column; we also end up with columns with a total width of 150% of the width of the table, which browsers may get confused about.
When you think about it, the explanation makes logical sense when looked at against the evidence: more people sign up as the campaign runs, so the problem appears and disappears as a certain Mr M..... (name withheld!) lands exactly halfway through or slides forwards or back a couple of places.
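The double "yes" can be reproduced in a cut-down loop. This is a hypothetical reconstruction written for illustration, not the code from the pledge page: the function name, the [name, withheld] data layout, the $shown counter and the "[new column]" marker are all assumptions. It only shows how a halfway test that runs on every pass, combined with a counter that does not advance for a withheld name, can match twice.

```php
<?php
// Hypothetical sketch - NOT the original page code.
// A text marker stands in for the real table-cell markup.
const COLUMN_BREAK = "[new column]";

// Each signup is [name, withheld]; withheld names are skipped in the display.
function render_without_flag(array $signups): string {
    // Count the names that will actually be displayed
    $visible = 0;
    foreach ($signups as [$name, $withheld]) {
        if (!$withheld) {
            $visible++;
        }
    }
    $halfway = intdiv($visible, 2);

    $shown = 0;   // names displayed so far (assumed counter)
    $out = "";
    foreach ($signups as [$name, $withheld]) {
        // The test runs on every pass, including the pass for a withheld name
        if ($shown == $halfway) {
            $out .= COLUMN_BREAK;
        }
        if ($withheld) {
            continue;   // $shown does not move, so the test matches again next pass
        }
        $out .= $name . "<br>";
        $shown++;
    }
    return $out;
}

$ok  = [["Ann", false], ["Bob", false], ["Cat", false], ["Dan", false]];
$bad = [["Ann", false], ["Bob", false], ["Mr M", true], ["Cat", false], ["Dan", false]];

echo substr_count(render_without_flag($ok), COLUMN_BREAK), "\n";   // prints 1
echo substr_count(render_without_flag($bad), COLUMN_BREAK), "\n";  // prints 2
```

Move the withheld entry one place earlier or later in $bad and the count drops back to 1, which fits the way the fault came and went as people signed up.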
The solution was to replace:
if (($nsf + $anonc - 1) == $halfway)
    $names .= "<td><td width=50% valign=top class=nonitaltext>";
by:
if ($split == 0 and ($nsf + $anonc - 1) == $halfway) {
    $names .= "<td><td width=50% valign=top class=nonitaltext>";
    $split = 1;
}
which rather brutally ensures only a single new column by adding a flag variable (which, yes, I have initialised before the start of the loop).
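The effect of the flag can be checked in a small, self-contained loop. Again this is only a sketch under assumptions: the function name, the [name, withheld] data layout, the $shown counter and the "[new column]" marker are invented for the example, and only $split and $halfway echo names from the snippet above. The real page may well count names differently.

```php
<?php
// Hypothetical sketch - NOT the original page code.
// A text marker stands in for the real table-cell markup.
const COLUMN_BREAK = "[new column]";

// Each signup is [name, withheld]; withheld names are skipped in the display.
function render_with_flag(array $signups): string {
    // Count the names that will actually be displayed
    $visible = 0;
    foreach ($signups as [$name, $withheld]) {
        if (!$withheld) {
            $visible++;
        }
    }
    $halfway = intdiv($visible, 2);

    $shown = 0;   // names displayed so far (assumed counter)
    $split = 0;   // flag, initialised before the start of the loop
    $out = "";
    foreach ($signups as [$name, $withheld]) {
        // The flag lets the new-column code run at most once
        if ($split == 0 and $shown == $halfway) {
            $out .= COLUMN_BREAK;
            $split = 1;
        }
        if ($withheld) {
            continue;   // the halfway test may match again, but the flag blocks it
        }
        $out .= $name . "<br>";
        $shown++;
    }
    return $out;
}

// A withheld name sitting exactly halfway through the list
$signups = [["Ann", false], ["Bob", false], ["Mr M", true], ["Cat", false], ["Dan", false]];

echo substr_count(render_with_flag($signups), COLUMN_BREAK), "\n";  // prints 1
```

The flag treats the symptom rather than the counting itself, which is why it is "brutal": however many times the halfway test matches, only the first match opens a column.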
The solution doesn't answer the question "if the problem was so small, why did it take you so long?", but I hope that this article as a whole goes some way to providing that answer.
As a footnote, I want to add further comment:
It is very easy - far TOO easy - for the programmer in these circumstances to assume that the problem is caused by user error - indeed, I incorrectly did so at first, and I apologise. Strong consideration must be given to user error, but not to the extent of it becoming an assumption.
It is hard on the user experiencing the problems too; should he or should he not keep reporting them - especially if they are intermittent - to a programmer who's dubious as to their validity?
And IN THIS PARTICULAR CASE, I want to thank L** who has been the primary reporter for his positive approach and good nature while we have been sorting it. L** - you're quite exceptional and I'm proud to have you as a friend as well as a colleague in campaign.