Keep the libwww-perl bots out

If you look through your server logs you’ll probably notice more than a few requests like these:

GET //wp-pass.php?_wp_http_referer= … “libwww-perl/5.805”
GET /2004/02/18/smoking-ban-is-on-the-way/trackback/ … “libwww-perl/5.805”
GET /2004/02/18/irish-car-tax-list/trackback/ … “libwww-perl/5.805”
GET /tag/php//tags.php?BBCodeFile= … “libwww-perl/5.579”
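
Requests like the ones above can be tallied per user agent with a few lines of Python. This is only a sketch: it assumes the log is in Apache’s "combined" format (user agent as the last quoted field), and the sample lines, IP addresses and paths are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, where the user agent
# is the last double-quoted field on each line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Tally requests per user agent string."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; in practice pass open("access_log") instead.
SAMPLE_LOG = [
    '192.0.2.1 - - [18/Feb/2007:10:00:00 +0000] "GET /example.php HTTP/1.1" 404 512 "-" "libwww-perl/5.579"',
    '192.0.2.2 - - [18/Feb/2007:10:00:01 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux i686) Firefox/2.0"',
    '192.0.2.1 - - [18/Feb/2007:10:00:02 +0000] "GET /other.php HTTP/1.1" 404 512 "-" "libwww-perl/5.805"',
]

if __name__ == "__main__":
    # Print the busiest user agents first.
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```

If libwww-perl turns up near the top of that list, the rule below is probably worth adding.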

If you do find them (grep libwww-perl access_log) then add the following code to your .htaccess file. On a WordPress site this file should already be there if you’re using fancy permalinks.

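# Send 403 Forbidden to any request whose user agent contains "libwww-perl"
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} libwww-perl.*
# The substitution must be a plain hyphen (-), meaning "leave the URL alone"
RewriteRule .* - [F,L]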

Change “RewriteBase /” to suit your own base directory.
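
To get a feel for which user agents the RewriteCond pattern catches, here is a rough Python approximation. It assumes Python’s re module behaves like Apache’s regex engine for a pattern this simple; the helper name and the sample user agent strings are made up.

```python
import re

# Same pattern as the RewriteCond. mod_rewrite patterns are unanchored,
# so re.search (not re.match) is the closer analogue.
BLOCK_PATTERN = re.compile(r"libwww-perl.*")


def would_be_blocked(user_agent):
    """Return True if the pattern matches anywhere in the user agent."""
    return BLOCK_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "libwww-perl/5.805",
        "MyFeedReader/1.0 libwww-perl/5.805",   # hypothetical script name
        "Mozilla/5.0 (X11; Linux i686) Firefox/2.0",
        "LIBWWW-PERL/5.805",
    ]
    for agent in samples:
        print(would_be_blocked(agent), agent)
```

Two things follow from this. The pattern matches anywhere in the string, so a script that puts its own name in front of libwww-perl is blocked as well. The match is also case-sensitive unless you add the [NC] flag to the RewriteCond.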

There are other bad guys out there. This page has a long list of rewrite rules to keep out all sorts of bots! I haven’t looked through them myself so YMMV if you try them.

Blocking these requests in .htaccess has the added benefit of reducing load on your server. WordPress sites are dynamically generated. This is great under normal circumstances but when you get a flood of requests it can place an unnecessary load on your site. WP-Cache helps a lot but these rules will stop the bots dead at the front door, before any PHP runs!

PS. ‘Course, if you depend on a libwww-perl application then don’t add this rule or you may give yourself a headache trying to figure out why things stopped working!
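
One way to check that the rule is active, and to see what a libwww-perl script you depend on would run into, is to request a page with that user agent and look at the status code. The sketch below uses only Python’s standard library; the function name is made up, and the URL in the usage comment is a placeholder for your own site.

```python
import urllib.error
import urllib.request


def status_for_agent(url, user_agent, opener=urllib.request.urlopen):
    """Fetch url with the given User-Agent and return the HTTP status code.

    The opener parameter exists only so the function can be exercised
    without touching the network.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions; report their code.
        return err.code


# Usage (placeholder URL, replace with your own site):
#   status_for_agent("http://example.com/", "libwww-perl/5.805")
#   status_for_agent("http://example.com/", "Mozilla/5.0")
```

With the rule in place the first call should come back as 403, while the second should be whatever your site normally returns.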

40 thoughts on “Keep the libwww-perl bots out”

  1. Wow, blocking WordPress is a bit drastic, but I can understand why you did it. It might be better to block by IP or IP range rather than UA in that case?

  2. It was a drastic decision that I had to put in place because there were literally hundreds of them per hour, all from 2.1 alpha. It lasted about 1 week and then suddenly stopped, so I removed the WordPress entry.

    Since the 15th of Sept there have been 7031 attempts to spam my old blog.

    3209 – No useragent
    1749 – TrackBack/1.02
    1338 – Opera/9.0

    The rest are b2evolution, IE6.0 etc.

  3. The page you linked includes code banning IP addresses from Turkey. That’s offensive man, not all of us are bad – but yes, I admit that we’ve got the most idiotic hacker wannabes and those who use the internet to curse, fart and try to get laid. Does that mean all of us must be kept out of websites?

  4. I’d say banning libwww-perl user agents is a bit drastic too – there’s plenty of decent LWP-based scripts doing useful things.

    Granted, they can easily change the user-agent to something more descriptive of what they are and leave out the libwww-perl – but so can any attacker.

    I’m surprised that anyone writing attack tools using LWP doesn’t just mimic a common user-agent like IE or Firefox rather than leaving it on the default!

  5. Baris – that’s why I said, “I haven’t looked through them myself so YMMV if you try them.” As you can see yourself since you left a comment here, I haven’t adopted his rules. Thank you for the warning!

    Robert – woah, nasty stuff. Glad to see you’re using Akismet now!

  6. David – I know! I couldn’t believe it when I spotted them ages ago, but they’re still at it well over a year after I first noticed them.

    TBH – if someone is using a script to interrogate my site I’d hope they have the courtesy to tell me about it first. So far no one has, so I don’t worry about blocking them.

    Seeing a 403 on those requests is *so* satisfying too!

  7. I use the redirection plugin, and to block this sort of stuff I just use a REGEX and redirect them all through an ALEXA redirect to my home page.

    Most of them won’t follow the redirect, but it does block them.


  8. I echo David’s comment. Not all libwww-perl requests are malicious. Could you at least change the title of your post so it doesn’t paint us genuine Perl programmers in such a bad light? Perl has enough of an image problem already without this.

  9. Walter – changed “bad guys” to “bots” in the title, but the URL can’t change. Sorry if I upset you, I definitely don’t want to make out all Perl coders are bad!

  10. I have to admit that I’m a bit disappointed in this post as a whole. To suggest blocking libwww-perl completely is sort of like saying we should close down all roads because wrecks frequently happen on them.

    That said, if the idea is to reduce spam to one’s blog, wouldn’t it be wise to at least set up some exclusions and block libwww-perl for your comment and postback forms only? Blocking it everywhere prevents some news readers/aggregators from getting to your XML/Atom/RSS feeds.

  11. Thanks for the advice and the link Donncha. I have added the whole list from that page, although I’ve been through it and commented out the Turkey IPs as that seemed a bit extreme 🙂

    I can always uncomment any of them if I ever get any hassle from them in the future.

  12. AJ – the libwww scripts I see hitting my site hit all sorts of URLs, not just comment and trackback URLs. Even non-existent URLs, looking for an exploit.

    I think a huge majority of them are not trying to spam, they’re trying to break into my site. I’m not worried that they’ll succeed because I try to keep up to date but it’s a useful way of stopping them before they hit any php code.

    Dan – exploits, almost all exploits as I said above.
    I actually added a check for Opera 9 to my comment form but I dump all the comments stopped in a file. Check it out here: (for the time being, will be removed!)

  13. Aaron, I use Redirection, too. Can you explain how we add the regex code?

    Go to the Redirection control panel.
    Scroll to the bottom where it says “Add Redirection”:
    Add your regex as the source URL, e.g.: (.*php\?(page|j|o|r|file|sub|.*?)=(http|ftp).*)
    Add the file you want to redirect to in the “Target URL”
    Select the checkbox that says “Regex”
    Click add Regex.

    You can also use REGEX when deleting post categories:

    Source URL: “/my/original/category/(.*)”
    Target URL: “/some/othercategory/$1”

  14. Hello Donncha,
    My blog was seriously messed up by a libwww bot last week so I have also implemented similar .htaccess rules.
    Two of my favorite 403 – Forbidden alternatives are:
    402 – Payment required 😉
    and a 301 – Permanent Redirect to their own hostname / remote IP with their maliciously crafted exploit URL untouched!
    I am currently testing a few additional regexps to block more malicious bots by detecting remote inclusion attempt strings/patterns in the requested URLs but it needs more testing before publishing it.

  15. It’s certainly interesting to see WordPress used as a user-agent. Is that simply an attempt to cover tracks, or simply the sloppy way some spambots are configured?

  16. Donncha,

    Perhaps in your case that works. I’ve not been hit so terribly by libwww-perl on my site. But being a PHP and Perl programmer myself, I regularly write apps for people who are looking to incorporate blog RSS/XML/ATOM feeds into their headlines or sites. Of course of late I’m doing most of that work in PHP, but I’m still doing some sites that prefer to crawl feeds in perl and database them. For my clients, I always set the agent name and it refers back to their site so a webmaster could easily tell that they had been crawled by that particular site, but my point remains that libwww-perl is frequently used to retrieve XML/RSS/ATOM feeds. If you exclude it across the board, you are stopping out some sites from reading your feed.

  17. I’ve been noticing these on my site for some time now. Not knowing what they were, but seeing that they were causing a lot of 404’s by hitting non-existent URLs, I 403 banned them with a plugin quite some time ago.

  18. Harsh. Too harsh. I’m a Perl programmer, and I use LWP::UserAgent all the time. I tend not to touch the $ua->agent, or I append contact info or a URL to it, so people know what I’m doing.

    In particular, I wrote a Jabber bot that periodically goes out and fetches RSS feeds (via LWP::UserAgent), then retransmits those feeds over Jabber to its subscribers.

    Seems to me you’re curing the symptoms — it’ll probably cure the disease, but you might kill a few patients along the way.

    @Platinax: The WordPress user agent is actually used for trackbacks and whatnot, IIRC.

  19. Hi,
    Great post and very useful.

    Will blocking libwww-perl have any impact on general users using IE or Firefox who visit a web site? Also, what impact does blocking libwww-perl have on search bots (Google, Yahoo), or does it only affect people trying to launch attacks with libwww-perl?

  20. Why is everyone getting defensive about being a perl programmer? EVERYTHING that hits your site with the user agent “libwww” is CRAP.. that is the bottom line.. Oh, and blocking all of Turkey sounds good too 🙂

  21. @olly No, it won’t. As for the bots, their user agents look very much like a regular user’s, except that they add something like GoogleBot-2.0 (for example) inside the () where it says MSIE-6 compatible; win32 and stuff like that…

    that’s my .20

  22. Yeah great!!!

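    use LWP::UserAgent;
    use HTTP::Request;
    my $url = shift or die "usage: $0 URL\n";   # assumption: $url was not defined in the original snippet
    my $ua = LWP::UserAgent->new;
    $ua->agent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b5) Gecko/2008043010 Fedora/3.0-0.60.beta5.fc9 Firefox/3.0b5");
    my $req = HTTP::Request->new(GET => $url);
    my $res = $ua->request($req);
    if ($res->is_success) {
        # the request got through with the spoofed Firefox user agent
    } else {
        # the request failed or was refused
    }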

    “I hate wwwlib-perl.”

    And Firefox? 😀

  23. All the tips and comments are great, but obviously directed to or between PROGRAMMERS who understand all the shortcuts and abbreviations so foreign to the regular schmuck like me.

    All the other ways to block spam are just not helpful to those of us who just want to put up a site and then go do other stuff. Like WORK.

    I don’t mind doing a little pseudo-programming, but at least someone could tell us where to find these mystery files.

    I’m still trying to find this MOD_REWRITE file.
