More ways to stop spammers and unwanted traffic

Comment spammers, trackback spam, stupid bots and AVG linkscanner eating into your bandwidth and server resources? Here’s how to put a dent in their activities with a few mod_rewrite rules.

I hate those blogs that send me fake trackbacks and pingbacks. Unfortunately it’s impossible to stop but this morning I figured out a way of stopping some of them.

Look through the log files of your web server for the string ‘ “-” “-“‘. Lots of requests there aren’t there? I found 914 requests yesterday. Those are requests without a USER_AGENT or HTTP_REFERER and almost all of them are suspicious because they weren’t followed by requests for images, stylesheets. or Javascript files. Unfortunately the WordPress cron server also falls into this category so you need to filter out requests from your own server’s IP address.

This morning I checked up on a spam trackback that came in. This one came from 85.177.33.196:

URL: /xmlrpc.php
HTTP_RAW_POST_DATA: <?xml version=”1.0″?>
<methodCall>
<methodName>pingback.ping</methodName>
<params>
<param>
<value><string>http://7wins. eu/cbprod/detail_10347/cure+your+tight+foreskin.html</string></value>
</param>
<param>
<value><string>http://ocaoimh.ie/2005/03/01/i-am-bored-sites-for-when-youre-bored/all-comments/</string></value>
</param>
</params>
</methodCall>

I looked through my log files for that IP address and discovered the following:

85.177.33.196 – – [03/Jul/2008:06:40:01 +0000] “GET /2005/02/18/10-more-ways-to-make-money-with-your-digital-cameras/ HTTP/1.0” 200 36151 “-” “-”
85.177.33.196 – – [03/Jul/2008:07:04:18 +0000] “GET /2007/06/07/im-not-the-only-one-to-love-the-alfa-147/ HTTP/1.0” 200 44967 “-” “-”
85.177.33.196 – – [03/Jul/2008:08:09:40 +0000] “GET /2005/03/01/i-am-bored-sites-for-when-youre-bored/all-comments/ HTTP/1.0” 200 410423 “-” “-”
85.177.33.196 – – [03/Jul/2008:08:09:44 +0000] “POST /xmlrpc.php HTTP/1.0” 200 249 “-” “XML-RPC for PHP 2.2.1”
85.177.33.196 – – [03/Jul/2008:09:00:09 +0000] “GET /2007/10/28/what-time-is-it-wordpress/ HTTP/1.0” 200 63332 “-” “-“

So, the spammer grabs “/2005/03/01/i-am-bored-sites-for-when-youre-bored/all-comments/” at 8:09am and 4 seconds later sends a trackback spam to the same blog post. Annoying isn’t it?

The following mod_rewrite rules will kill those fake GET requests dead.

# stop requests with no UA or referrer
RewriteCond %{HTTP_REFERER} ^$
Rewritecond %{HTTP_USER_AGENT} ^$
RewriteCond %{REMOTE_ADDR} !^64\.22\.71\.36$
RewriteRule ^(.*) – [F]

Replace “64\.22\.71\.36” with the IP address of your own server. If you don’t know what it is, look through your logs for requests for wp-cron.php, run ifconfig from the command line, or check with your hosting company.
Here are a few of the requests already stopped this morning:

72.21.40.122 – – [03/Jul/2008:09:59:59 +0000] “GET /2005/04/02/photo-matt-a-response-to-the-noise/ HTTP/1.1” 403 248 “-” “-”
216.32.81.66 – – [03/Jul/2008:10:00:11 +0000] “GET /2006/12/14/bupa-to-leave-irish-market/ HTTP/1.1” 403 240 “-” “-”
66.228.208.166 – – [03/Jul/2008:10:03:18 +0000] “GET /2008/05/23/youre-looking-so-silly-wii-fit HTTP/1.1” 403 212 “-” “-”
216.32.81.74 – – [03/Jul/2008:10:04:52 +0000] “GET /1998/03/22/for-the-next-month-o/ HTTP/1.1” 403 234 “-” “-”
69.46.20.87 – – [03/Jul/2008:10:06:06 +0000] “GET /2006/10/01/killing-off-php/ HTTP/1.1” 403 229 “-” “-”
72.21.58.74 – – [03/Jul/2008:10:07:54 +0000] “GET /2005/08/12/thunderbird-feeds-and-messages-duplicates/ HTTP/1.1” 403 255 “-” “-“

Some spam bots are stupid. They don’t know where your wp-comments-post.php is. That’s the file your comment form feeds when a comment is made. If your blog is installed in the root, “/”, of your domain you can add this one line to stop the 404 requests generated:

RewriteRule ^(.*)/wp-comments-post.php – [F,L]

Trackbacks and pingbacks almost always come from sane looking user agents. They usually have the blog or forum software name to identify them. Look for “/trackback/” POSTs in your logs. Notice how 99% of them have browser names in them? Here’s how to stop them, and this has been documented for a long time:

RewriteCond %{HTTP_USER_AGENT} ^.*(Opera|Mozilla|MSIE).*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteCond %{REQUEST_METHOD} ^POST$
RewriteRule ^(.*)/trackback/ – [F,L]

I’ve been using that chunk of code for ages. It works exceptionally well. This was prompted by a deluge of 40,000 spam trackbacks this site received in one day a few months ago.

If you use my Cookies for Comments plugin. Check your browser for the cookie it leaves and use the following code to block almost all of your comment spam:

RewriteCond %{HTTP_COOKIE} !^.*put_cookie_value_here.*$
RewriteRule ^wp-comments-post.php – [F,L]

That will block the spammers even before they hit any PHP script. Your server will breeze through the worst spam attempts. It blocked 2308 comment spam attempts yesterday. Unfortunately it also stops the occasional human visitor leaving a comment but I think it’s worth it.

Do something different. That’s what you have to do. Place a hurdle before the spammers and they’ll fall. On that note, I shouldn’t really be blogging all this, but almost all these ideas can be found elsewhere already and the spammers still haven’t adapted.

Unwanted traffic? What’s that? Surely all visitors are good? Nope, unfortunately not. Robert alerted me to the fact that AVG anti-virus now includes an AJAX powered browser plugin called “Linkscanner” that scans all the links on search engine result pages for viruses and malicious code. Unfortunately that generates a huge number of requests for pages that are never even seen by the visitor. I counted over 7,000 hits yesterday.

Thankfully Padraig Brady has a solution. I hope he doesn’t mind if I reprint his mod_rewrite rules here (unfortunately WordPress changes the ” character so you’ll have to change them back, or grab the code from Padraig’s page.)

#Here we assume certain MSIE 6.0 agents are from linkscanner
#redirect these requests back to avg in the hope they’ll see their silliness
Rewritecond %{HTTP_USER_AGENT} “.*MSIE 6.0; Windows NT 5.1; SV1.$” [OR]
Rewritecond %{HTTP_USER_AGENT} “.*MSIE 6.0; Windows NT 5.1;1813.$”
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP:Accept-Encoding} ^$
RewriteRule ^.* http://www.avg.com/?LinkScannerSucks [R=307,L]

Comments

comments

67 Replies to “More ways to stop spammers and unwanted traffic”

  1. Thank you so much for this post, while offtopic it has shown me how blog pings (and such) work – leading me somewhere new and giving me something to research.

    This has been the new thing that I learned today.

    (of course the AVG element is now redundant as they should have now stopped doing this (as of yesterday AFAIK, but that’s still a worthwhile example)

  2. So, is this the ultimate list of ‘donncha’s rules for stopping spammers’ (skipping Cookies for Comments and AVG LinkScanner stuff):

    `
    # stop requests with no UA or referrer
    RewriteCond %{HTTP_REFERER} ^$
    Rewritecond %{HTTP_USER_AGENT} ^$
    RewriteCond %{REMOTE_ADDR} !^64\.22\.71\.36$
    RewriteRule ^(.*) – [F]

    # stop requests to wp-comments-post.php outside of /
    RewriteRule ^(.*)/wp-comments-post.php – [F,L]

    # bad user agents for Trackbacks
    RewriteCond %{HTTP_USER_AGENT} ^.*(Opera|Mozilla|MSIE).*$ [OR]
    RewriteCond %{HTTP_USER_AGENT} ^$
    RewriteCond %{REQUEST_METHOD} ^POST$
    RewriteRule ^(.*)/trackback/ – [F,L]
    `

  3. That’s interesting. Unfortunately I don’t think I have access to that info on my host, and I’d not know exactaly what I’d be doing anyway 😛

  4. I’ve really learned alot from this article.. (its a oldie) Thanks for doing all this research and highlighting a preferred way to combat spam is at the HTTP protocol level not the PHP application level.

    I re-found this article from you moving your url.. BTW.

    1. Thanks, I was trying to debug an obscure problem with wp-super-cache, static homepages and a permalink structure with the category and postname. After 20 hours of testing my server worked perfectly. Grrr.

      I’ve reverted everything back, unfortunately that means these posts will be seen as new again by WordPress Planet. 🙁

Leave a Reply