More ways to stop spammers and unwanted traffic

Comment spammers, trackback spam, stupid bots and AVG linkscanner eating into your bandwidth and server resources? Here’s how to put a dent in their activities with a few mod_rewrite rules.

I hate those blogs that send me fake trackbacks and pingbacks. Unfortunately it’s impossible to stop but this morning I figured out a way of stopping some of them.

Look through the log files of your web server for the string ‘ “-” “-“‘. Lots of requests there aren’t there? I found 914 requests yesterday. Those are requests without a USER_AGENT or HTTP_REFERER and almost all of them are suspicious because they weren’t followed by requests for images, stylesheets. or Javascript files. Unfortunately the WordPress cron server also falls into this category so you need to filter out requests from your own server’s IP address.

This morning I checked up on a spam trackback that came in. This one came from

URL: /xmlrpc.php
HTTP_RAW_POST_DATA: <?xml version=”1.0″?>
<value><string>http://7wins. eu/cbprod/detail_10347/cure+your+tight+foreskin.html</string></value>

I looked through my log files for that IP address and discovered the following: – – [03/Jul/2008:06:40:01 +0000] “GET /2005/02/18/10-more-ways-to-make-money-with-your-digital-cameras/ HTTP/1.0” 200 36151 “-” “-” – – [03/Jul/2008:07:04:18 +0000] “GET /2007/06/07/im-not-the-only-one-to-love-the-alfa-147/ HTTP/1.0” 200 44967 “-” “-” – – [03/Jul/2008:08:09:40 +0000] “GET /2005/03/01/i-am-bored-sites-for-when-youre-bored/all-comments/ HTTP/1.0” 200 410423 “-” “-” – – [03/Jul/2008:08:09:44 +0000] “POST /xmlrpc.php HTTP/1.0” 200 249 “-” “XML-RPC for PHP 2.2.1” – – [03/Jul/2008:09:00:09 +0000] “GET /2007/10/28/what-time-is-it-wordpress/ HTTP/1.0” 200 63332 “-” “-“

So, the spammer grabs “/2005/03/01/i-am-bored-sites-for-when-youre-bored/all-comments/” at 8:09am and 4 seconds later sends a trackback spam to the same blog post. Annoying isn’t it?

The following mod_rewrite rules will kill those fake GET requests dead.

# stop requests with no UA or referrer
RewriteCond %{HTTP_REFERER} ^$
Rewritecond %{HTTP_USER_AGENT} ^$
RewriteCond %{REMOTE_ADDR} !^64\.22\.71\.36$
RewriteRule ^(.*) – [F]

Replace “64\.22\.71\.36” with the IP address of your own server. If you don’t know what it is, look through your logs for requests for wp-cron.php, run ifconfig from the command line, or check with your hosting company.
Here are a few of the requests already stopped this morning: – – [03/Jul/2008:09:59:59 +0000] “GET /2005/04/02/photo-matt-a-response-to-the-noise/ HTTP/1.1” 403 248 “-” “-” – – [03/Jul/2008:10:00:11 +0000] “GET /2006/12/14/bupa-to-leave-irish-market/ HTTP/1.1” 403 240 “-” “-” – – [03/Jul/2008:10:03:18 +0000] “GET /2008/05/23/youre-looking-so-silly-wii-fit HTTP/1.1” 403 212 “-” “-” – – [03/Jul/2008:10:04:52 +0000] “GET /1998/03/22/for-the-next-month-o/ HTTP/1.1” 403 234 “-” “-” – – [03/Jul/2008:10:06:06 +0000] “GET /2006/10/01/killing-off-php/ HTTP/1.1” 403 229 “-” “-” – – [03/Jul/2008:10:07:54 +0000] “GET /2005/08/12/thunderbird-feeds-and-messages-duplicates/ HTTP/1.1” 403 255 “-” “-“

Some spam bots are stupid. They don’t know where your wp-comments-post.php is. That’s the file your comment form feeds when a comment is made. If your blog is installed in the root, “/”, of your domain you can add this one line to stop the 404 requests generated:

RewriteRule ^(.*)/wp-comments-post.php – [F,L]

Trackbacks and pingbacks almost always come from sane looking user agents. They usually have the blog or forum software name to identify them. Look for “/trackback/” POSTs in your logs. Notice how 99% of them have browser names in them? Here’s how to stop them, and this has been documented for a long time:

RewriteCond %{HTTP_USER_AGENT} ^.*(Opera|Mozilla|MSIE).*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^(.*)/trackback/ – [F,L]

I’ve been using that chunk of code for ages. It works exceptionally well. This was prompted by a deluge of 40,000 spam trackbacks this site received in one day a few months ago.

If you use my Cookies for Comments plugin. Check your browser for the cookie it leaves and use the following code to block almost all of your comment spam:

RewriteCond %{HTTP_COOKIE} !^.*put_cookie_value_here.*$
RewriteRule ^wp-comments-post.php – [F,L]

That will block the spammers even before they hit any PHP script. Your server will breeze through the worst spam attempts. It blocked 2308 comment spam attempts yesterday. Unfortunately it also stops the occasional human visitor leaving a comment but I think it’s worth it.

Do something different. That’s what you have to do. Place a hurdle before the spammers and they’ll fall. On that note, I shouldn’t really be blogging all this, but almost all these ideas can be found elsewhere already and the spammers still haven’t adapted.

Unwanted traffic? What’s that? Surely all visitors are good? Nope, unfortunately not. Robert alerted me to the fact that AVG anti-virus now includes an AJAX powered browser plugin called “Linkscanner” that scans all the links on search engine result pages for viruses and malicious code. Unfortunately that generates a huge number of requests for pages that are never even seen by the visitor. I counted over 7,000 hits yesterday.

Thankfully Padraig Brady has a solution. I hope he doesn’t mind if I reprint his mod_rewrite rules here (unfortunately WordPress changes the ” character so you’ll have to change them back, or grab the code from Padraig’s page.)

#Here we assume certain MSIE 6.0 agents are from linkscanner
#redirect these requests back to avg in the hope they’ll see their silliness
Rewritecond %{HTTP_USER_AGENT} “.*MSIE 6.0; Windows NT 5.1; SV1.$” [OR]
Rewritecond %{HTTP_USER_AGENT} “.*MSIE 6.0; Windows NT 5.1;1813.$”
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP:Accept-Encoding} ^$
RewriteRule ^.* [R=307,L]

Keep the libwww-perl bots out

If you look through your server logs you’ll probably notice more than a few requests like these:

GET //wp-pass.php?_wp_http_referer= … “libwww-perl/5.805”
GET /2004/02/18/smoking-ban-is-on-the-way/trackback/ … “libwww-perl/5.805”
GET /2004/02/18/irish-car-tax-list/trackback/ … “libwww-perl/5.805”
GET /tag/php//tags.php?BBCodeFile= … “libwww-perl/5.579”

If you do find them (grep libwww-perl access_log) then add the following code to your .htaccess file. On a WordPress site this file should already be there if you’re using fancy permalinks.

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} libwww-perl.*
RewriteRule .* – [F,L]

Change “RewriteBase /” to suit your own base directory.

There are other bad guys out there. This page has a long list of rewrite rules to keep out all sorts of bots! I haven’t looked through them myself so YMMV if you try them.

This has the added benefit of reducing load on your server. WordPress sites are dynamically generated. This is great under normal circumstances but when you get a flood of requests it can place an unnecessary load on your site. WP-Cache helps a lot but these rules will stop them dead at the front door!

PS. ‘Course, if you depend on a libwww-perl application then don’t add this rule or you may give yourself a headache trying to figure out why things stopped working!