Earlier this month I noticed that one bot that likes to visit my website, “MJ12bot/v1.4.8”, seems to be particularly attracted to the “reply to comment” links generated by my blog. Bots see those as plain links, while the rest of us see a “Reply” button that uses JavaScript to reply to a comment.
To be honest, it’s pretty annoying to see a bot constantly fetching those URLs from my website. Earlier this month, it was on a roll and grabbing several dozen at a time. While my server can handle the traffic without any issues, who wants a bot trampling over their server?
I decided to stop them in two ways:
- Redirect them back to the post in a mod_rewrite rule.
- Block them in robots.txt and hopefully the bots will go away.
Coming up with a mod_rewrite rule was surprisingly hard, but after mentioning this on Mastodon I received a reply from Jos Klever, who figured out I needed the QSD flag. So, to spare you the hassle of researching it, here are the mod_rewrite rules that worked for me. They cause a 301 permanent redirect to the comment's anchor on the post. Add this to your .htaccess file.
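# Match any request whose query string contains replytocom=<comment ID>.
RewriteCond %{QUERY_STRING} replytocom=(.*)$
# Redirect to the post's #comment-<ID> anchor. NE keeps "#" unescaped, QSD drops the query string.
RewriteRule ^(.*)/ $1/#comment-%1 [NE,QSD,L,R=301]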
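If you want to see what those two lines do before touching a live site, here is a rough Python sketch that mimics them. It is not how Apache works internally, just the same two regular expressions applied by hand. The function name `redirect_target` and the example.com URL are made up for illustration, and the leading slash on the result assumes the usual WordPress `RewriteBase /`.

```python
import re
from urllib.parse import urlsplit

# Mirrors: RewriteCond %{QUERY_STRING} replytocom=(.*)$
QUERY_RE = re.compile(r"replytocom=(.*)$")
# Mirrors: RewriteRule ^(.*)/  (in .htaccess the path is matched without its leading slash)
PATH_RE = re.compile(r"^(.*)/")


def redirect_target(url):
    """Return the path the 301 would point to, or None if the rules do not apply.

    Hypothetical helper for illustration only; assumes RewriteBase is "/".
    """
    parts = urlsplit(url)
    cond = QUERY_RE.search(parts.query)
    rule = PATH_RE.search(parts.path.lstrip("/"))
    if cond is None or rule is None:
        return None
    # QSD: the original query string is discarded.
    # NE: the "#" is sent as-is instead of being escaped to %23.
    return "/" + rule.group(1) + "/#comment-" + cond.group(1)


print(redirect_target("https://example.com/2023/05/some-post/?replytocom=12345"))
print(redirect_target("https://example.com/2023/05/some-post/"))
```

The first call shows the redirect target for a reply link, and the second returns `None` because there is no `replytocom` in the query string, so the post itself is served as usual.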
Blocking requests like this in the robots.txt is much simpler. WordPress can generate the robots.txt file for you using the robots_txt filter. Add the following to a mu-plugin PHP script.
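function disallow_replycom_urls( $output, $public ) {
	// Rule paths start with "/"; the "*" wildcard covers any post path and is
	// honoured by most major crawlers, though not necessarily all of them.
	$output .= "Disallow: /*?replytocom=\n";
	return $output;
}
add_filter( 'robots_txt', 'disallow_replycom_urls', 10, 2 );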
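To get a feel for which URLs a Disallow pattern covers, here is a small Python sketch of wildcard matching roughly as RFC 9309 describes it: the pattern is compared from the start of the path, `*` stands for any run of characters, and a trailing `$` anchors the end. The helper name `rule_matches` and the sample paths are invented for this example, and a real crawler may behave differently, so treat it as a sanity check rather than a guarantee.

```python
import re


def rule_matches(pattern, path_and_query):
    """Check a robots.txt path pattern against a request path (hypothetical helper).

    "*" matches any sequence of characters; a trailing "$" anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" in place of each "*".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string, like a robots.txt rule.
    return re.match(regex, path_and_query) is not None


print(rule_matches("/*?replytocom=", "/2023/05/some-post/?replytocom=12345"))  # True
print(rule_matches("/*?replytocom=", "/2023/05/some-post/"))                   # False
print(rule_matches("?replytocom", "/2023/05/some-post/?replytocom=12345"))     # False
```

The last line shows why the leading slash matters: a pattern that does not begin with `/` (or `*`) never lines up with the start of a request path.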
I haven’t received many comments on my posts lately. However, I stumbled upon some interesting posts by clicking the RANDOM link above, which I decided to examine as part of my research. During my search, I delved deep into the blogosphere of the past, almost like being an archaeologist, because some links were no longer available, and I had to search for them on archive.org. I was also pleasantly surprised to find that a link to a GIF from 2005 was still alive!