Regular Expressions: Finding Email Addresses

Recently I fixed the Sendmail configuration on one of our boxes, and I’m now inundated with over 1700 bounced mails a day from old and expired email addresses. This is the second day of it, so I decided to fight back:

  1. Create a procmail script to redirect all the bounced mails into a file.
  2. Grab all the email addresses from that file and create SQL statements to disable sending mail to those users of our site.

With Google’s help, I came up with the following procmail recipe, and stuffed it into my .procmailrc:

:0:
* ^From: .*MAILER-DAEMON.*
* ^Subject:.*(Undeliverable|failure notice|Returned mail:|Delivery (Status )?Notification|Mail System Error|Delivery fail|Nondeliverable mail|Message status - undeliverable|Mail Delivery Problem|Notification d'état de la distribution).*
RETURNS

That should catch almost all the returns sent to my inbox.
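I used the following pipeline to extract the email addresses from the RETURNS file. It splits each line on the angle brackets around an address, then sorts the results and drops duplicates. It can probably be done better, but it works well enough.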

awk -F '<' '{print $2}' RETURNS | awk -F '>' '{print $1}' | sort | uniq
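Since the title promises a regular expression, here is a sketch of a regex-based alternative: instead of splitting on angle brackets, `grep -Eo` prints only the parts of each line that look like an address, so addresses without brackets (for example in `Final-Recipient:` lines) are caught too. The pattern is a deliberately loose approximation that I'm assuming is good enough for bounce headers, not a full RFC 5322 matcher, and `extract_addresses` is a hypothetical helper name.

```shell
#!/bin/sh
# extract_addresses FILE
# Print every unique email-like string found in FILE, one per line.
# The regex is a loose approximation (assumption): local part of
# letters, digits and ._%+- then @, then a dotted domain ending in
# at least two letters.
extract_addresses() {
    grep -Eo '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' "$1" | sort -u
}
```

Run it as `extract_addresses RETURNS > addresses.txt`. It will also pick up addresses that aren't bounced recipients, such as the MAILER-DAEMON sender and your own address, so the bogus-line cleanup is still needed afterwards.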

I then grepped out bogus lines, such as the ones SMTP servers add, opened the file in vi, and wrapped an SQL statement around each email address. I expect a lot less email in my inbox tomorrow.
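The vi step can also be scripted. This is a minimal sketch, assuming a hypothetical `users` table with `email` and `allow_mail` columns (substitute your own schema): `to_sql` reads one address per line, skips blank lines, and prints one UPDATE statement per address, doubling any single quote so an address like o'brien@example.org stays a valid SQL string literal.

```shell
#!/bin/sh
# to_sql: read addresses on stdin (one per line), print SQL on stdout.
# Hypothetical schema (assumption): table "users" with columns
# "email" and "allow_mail"; setting allow_mail = 0 disables sending.
to_sql() {
    # 1. Drop empty or whitespace-only lines.
    # 2. Double single quotes so each address is a safe SQL literal.
    # 3. Wrap the whole line (&) in an UPDATE statement.
    sed -e '/^[[:space:]]*$/d' \
        -e "s/'/''/g" \
        -e "s/.*/UPDATE users SET allow_mail = 0 WHERE email = '&';/"
}
```

For example, `to_sql < addresses.txt > disable.sql`, then review the file before feeding it to your database client.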

Here’s a handy online regex tester if you want to test a regular expression easily.

By Donncha

Donncha Ó Caoimh is a software developer at Automattic and WordPress plugin developer. He posts photos at In Photos and can also be found on Twitter.
