Backups, Codebase, Strategy and more - dealing with forum incidents

Archive - Originally posted on "The Horse's Mouth" - 2013-03-03 23:55:57 - Graham Ellis

I've spent much of this evening dealing with an "incident" on our First Great Western Coffee Shop forum, in which a member ran a script on his own computer, from his own forum account, which trashed all of his posts. Well - not exactly all of his posts, as the script was "a bit broken", but still enough to leave significant gaps in previously-excellent resource threads. Those threads discussed and explained various railway issues, from track safety, to the holding of connecting trains when incoming services are late, through to the unique identifiers which are applied to each train (headcodes / 4 character codes used in signal boxes / something called RSIDs). The incident has wasted a lot of my time - but it's also provided a useful real-life test and check of our moderation and admin systems, for which I'm grateful. Here are some lessons / comments that you might like to consider if you're running such a forum - or indeed any user-contributed resource - or indeed any website with scripts.

1. Backups. THESE ARE IMPORTANT. Our server takes a regular copy of the data that changes frequently (i.e. user contributions). In addition to a download of the web site and its data on a cycle, we take intermediate copies that stay on the server. Not quite a transaction log that we can roll back, but we've still got a series of very recent snapshots! And it's important that the backups can be restored, and / or data selectively extracted from them as necessary. Not only do backups have to be taken, but they also have to be checked for fitness for purpose - and that purpose is to be able to put things right using them in the event of an incident!

2. Code. I had to selectively untrash around 150 posts, and at my first experiment I ended up with a few <br /> structures and similar displayed on the forum. And that was just a single post. Keen volunteers were available to hand-correct the remaining issues, but the design section of my learning to program training courses reminds delegates that it's usually cheaper to develop code well than to have to do heavy maintenance or other coding later, and that other coding later is in turn much quicker and cheaper than having to do a lot of work on the underlying data. And so it proved - all 150 posts were untrashed in just under an hour via code / a batch process - including writing the code - whereas it would have taken all night to go at the data by hand.

3. Strategy. We have an overview / strategy for the site, and systems in place so we know where we're headed, and so we can work out the logical and obvious way forward when an unexpected incident occurs. Much better than having to start thinking "what do we do now?" too often.

4. Team. The incident highlighted the strength of the support team (admin, moderation and regular members too). Without the support of all of these, the job could be much harder (if you get a bunch of mavericks rather than a lone wolf ...). And keep the team - the WHOLE team - informed as you work on the issues. A banner or post admitting problems may look bad to new, casual users, but it's only temporary, helps reassure regulars, and avoids too many duplicated reports of the problem - like that blue and white "Police aware" tape you sometimes see on a vehicle at the roadside.

5. Actions. As I've seen before in these circumstances, you can expect to get an email that seeks to explain the activities and the reasons behind them. Actions, though, speak louder than words, and the victims of the actions are not only the one or two people who have upset the member into triggering the script, but also all the other contributors to the damaged threads, and the people who read them. Half a dozen people - perhaps - had done something, or made some little comment, that helped spark the incident. But there were 726 unique visitors to the site during that one day alone (true visitors / Google Analytics numbers), all of whom are potential victims.

6. Patterns. Look for patterns in what's happening and look for the benign steady state. If multiple people have issues with the same post, then look to that post as the cause of problems. But if a single person has problems with lots of posts and threads, by numerous people and some going back years, then look to that person as the common element when you decide on how to handle the situation.
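
The batch approach from point 2 can be sketched roughly as below. This is not the code that was actually run on the forum. It is a minimal illustration which assumes a hypothetical posts table (id, author_id, body) that exists both in the live database and in a recent snapshot, and it uses SQLite only so that the example is self-contained. The names restore_posts and make_db are invented for the sketch. It copies the stored body across verbatim rather than re-encoding it, which might avoid stray markup such as a displayed <br />, though the real cause of that problem is not known here.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database with the hypothetical posts table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT)"
    )
    db.executemany("INSERT INTO posts VALUES (?, ?, ?)", rows)
    db.commit()
    return db


def restore_posts(live, snapshot, author_id):
    """Copy one author's post bodies from the snapshot back into the live data.

    Only posts whose live body differs from the snapshot are updated, so a
    second run changes nothing. Returns the ids of the posts restored.
    """
    restored = []
    rows = snapshot.execute(
        "SELECT id, body FROM posts WHERE author_id = ? ORDER BY id",
        (author_id,),
    ).fetchall()
    with live:  # single transaction: commit everything, or roll back on error
        for post_id, good_body in rows:
            current = live.execute(
                "SELECT body FROM posts WHERE id = ? AND author_id = ?",
                (post_id, author_id),
            ).fetchone()
            if current is None or current[0] == good_body:
                continue  # post no longer exists, or is already intact
            live.execute(
                "UPDATE posts SET body = ? WHERE id = ?", (good_body, post_id)
            )
            restored.append(post_id)
    return restored


if __name__ == "__main__":
    snapshot = make_db([(1, 7, "Headcodes explained"), (2, 7, "Held connections")])
    live = make_db([(1, 7, "."), (2, 7, "Held connections")])
    print(restore_posts(live, snapshot, author_id=7))
```

Note that a post which its author had legitimately edited after the snapshot was taken would also be put back to the snapshot version, so a list of the affected ids is worth reviewing before anything like this is run for real.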

The time to look through and consider the list above is when you don't have ongoing issues. So it's possibly a bad time for me to be writing and posting this. However, the above are generalities that have stood us in good stead, and should also stand others in good stead. You can't prevent every incident from happening, but you can ensure that the waves calm quickly and you're back to a smooth sea again.
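
As one example of something that can be set up during a calm spell, the "fitness for purpose" check on backups from point 1 might be scripted along these lines. Again this is only a sketch under assumptions: it supposes each snapshot is a SQLite file containing a posts table, and the function name check_snapshot and its thresholds are made up for illustration. A real forum database would need the equivalent checks for its own backup format.

```python
import sqlite3
import time
from pathlib import Path


def check_snapshot(path, max_age_hours=24, min_posts=1):
    """Return a list of problems with one snapshot file (empty list = usable)."""
    path = Path(path)
    if not path.exists():
        return ["snapshot file is missing"]

    problems = []

    # Is the snapshot recent enough to be worth restoring from?
    age_hours = (time.time() - path.stat().st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"snapshot is {age_hours:.1f} hours old")

    # Can the data actually be read back, and is there a sensible amount of it?
    try:
        db = sqlite3.connect(path)
        try:
            (count,) = db.execute("SELECT COUNT(*) FROM posts").fetchone()
        finally:
            db.close()
    except sqlite3.Error as exc:
        problems.append(f"cannot read posts table: {exc}")
    else:
        if count < min_posts:
            problems.append(f"only {count} posts in snapshot")

    return problems
```

Run on a schedule against the newest snapshot, an empty result means the file is recent, readable and not suspiciously small. It does not prove that a full restore would succeed, which still needs an occasional manual trial.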