“Beware the SpamBot, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!”[1]
Oh, my children, it’s a dark and dangerous world for the innocent web form. With spam messages outnumbering legitimate e-mails and web comments by something like 20 to one,[2] an unprotected comment form doesn’t stand a chance of going unabused.
Automated spambots troll the web, filling out forms and spewing pablum about penny stocks, Viagra and the personal proclivities of Paris Hilton. Imagine my trepidation, as I worked to add visitor comments to Big Medium 2, at the prospect of sending my bouncing baby beta into this spam-infested wilderness.
But I’ve given the genie a vorpal blade to defend himself and, in the end, Big Medium has developed a shrewd nose for the scent of processed pork. I’m hopeful that Big Medium will hold up as quite the spambot slayer.
The app uses a broad arsenal of anti-spam techniques that together create a useful profile for website defense: Call it the seven habits of highly effective spambot hunters. I’m especially indebted to Ned Batchelder for much of this; Ned posted an excellent article about using hashes and honey pots to detect spambots. It’s a must-read for any web developer, and I’ve implemented several of its strategies.
Come, let’s follow our Great White Hunter on spam safari and discover the habits that have made Big Medium so alert to spam, shall we? (And please, forgive me as I continue to labor this hunting metaphor.)
Habit #1: Leave No Trace
I’ve spent some time observing the behavior of spambots in their natural habitat (it’s a dank place that smells of old beans). A frequent strategy appears to be to have a human visitor go to the site, submit a form, and then script automated hits against that form. Spambots are not terribly bright, but they’re relentless. They submit again and again to the same URL, lobbing the fields and values for which they’ve been programmed.
Big Medium counters this by covering its tracks, never using the same field names twice. Every time you visit the page, all of the field names change. The field names are MD5 hashes of the page’s slug name, its database creation date and a server secret. A semi-obfuscated timestamp is mashed with this field name, creating a 50-digit field name that changes every second.
If the correct combination of field names is not received, the form submission is discarded.
What’s more, these field names are tied to that timestamp I mentioned. Even if once-valid fields are received, the form submission is discarded if the timestamp is more than 12 hours old. For forms older than 30 minutes, you’re prompted to re-submit. So the spambots’ visit-me-once strategy of programming for field names no longer works, at least not outside of the first 30 minutes.
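For the code-minded, here is a minimal Python sketch of the idea, not Big Medium's actual code. The function names, the `|` separator, the XOR mask used to "semi-obfuscate" the timestamp and the resulting 42-character length are all my own assumptions for illustration. The sketch also folds the timestamp into the hash so that a forged timestamp invalidates the name, which is an assumption on top of the description above.

```python
import hashlib
import time

SERVER_SECRET = "change-me"      # hypothetical server secret
TS_MASK = 0x5A5A5A5A             # hypothetical mask for obfuscating the timestamp
RESUBMIT_AFTER = 30 * 60         # older than 30 minutes: ask the visitor to re-submit
DISCARD_AFTER = 12 * 60 * 60     # older than 12 hours: discard outright


def encode_timestamp(ts):
    """Lightly obfuscate a Unix timestamp as 10 hex digits."""
    return format(ts ^ TS_MASK, "010x")


def decode_timestamp(text):
    """Reverse encode_timestamp; raises ValueError on non-hex input."""
    return int(text, 16) ^ TS_MASK


def field_name(label, slug, created, ts, secret=SERVER_SECRET):
    """Build a field name: 32 hex digits of MD5 plus the encoded timestamp."""
    material = f"{label}|{slug}|{created}|{ts}|{secret}"
    digest = hashlib.md5(material.encode("utf-8")).hexdigest()
    return digest + encode_timestamp(ts)


def check_field(name, label, slug, created, now=None):
    """Classify a submitted field name as 'ok', 'resubmit' or 'discard'."""
    now = int(time.time()) if now is None else now
    if len(name) != 42:
        return "discard"
    try:
        ts = decode_timestamp(name[32:])
    except ValueError:
        return "discard"
    # Recompute the name from the claimed timestamp; a mismatch means forgery.
    if name != field_name(label, slug, created, ts):
        return "discard"
    age = now - ts
    if age < 0 or age > DISCARD_AFTER:
        return "discard"
    if age > RESUBMIT_AFTER:
        return "resubmit"
    return "ok"
```

Because the timestamp is part of the name, a name harvested on one visit goes stale on its own, and the server never has to remember which names it handed out.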
It’s not super-difficult to defeat this technique by analyzing how the form changes from moment to moment, but it does require some custom coding, and as I’ll explain later, the incentives for doing so aren’t very high.
Alas, there are other spambots that don’t follow this strategy at all. These ’bots actually visit the page live and fill out all of the fields on the page, paying no particular heed to the field names. They require a different countermeasure...
Habit #2: Set Traps
These form-filling spambots typically fill out all form fields following simple rules based on field name and field type. These rules are similar, I assume, to the rules used by many browsers’ auto-fill features to automatically enter your name, email address, phone, etc.
Because our field names are coded 50-digit strings, these spambots can no longer use field names to detect which field is which. They have to go on field type alone. This gives us an opportunity to set traps by adding dummy fields to the form. These fields are hidden via CSS from web site visitors, but the typical ’bot is not CSS-aware, so it doesn’t know that these dummy fields are not meant to be filled.
If a value is submitted in any of these honey-pot fields, Big Medium discards the submission.
To make things a bit more challenging, Big Medium also mixes up the order of these dummy fields with each page build, which means that spambots can’t rely on a fixed order of real fields and dummy fields.
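Here is a rough Python sketch of the trap-setting logic, under my own assumptions: the function names are hypothetical, and the form is modeled as a plain dictionary of submitted field names to values.

```python
import random


def build_field_order(real_fields, trap_fields, rng=random):
    """Return (name, is_trap) pairs in a freshly shuffled order for one page build."""
    fields = [(name, False) for name in real_fields]
    fields += [(name, True) for name in trap_fields]
    rng.shuffle(fields)
    return fields


def tripped_trap(form, trap_fields):
    """True if any honey-pot field came back with a value in it."""
    return any(form.get(name, "") != "" for name in trap_fields)
```

When rendering, the pairs flagged as traps would get the CSS that hides them; on submission, `tripped_trap` decides whether the whole thing goes in the bin.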
The main drawback here is a potential accessibility issue. For people using alternative browsers that do not support the CSS display:none rule, the hidden fields will not be hidden. Big Medium marks them as “Leave this field blank,” but of course, that’s not a great interface. Should our hapless visitor inadvertently enter a value in one of these fields, we’ve mistakenly bagged a real live human, not a spambot. Oops.
I’m a bit behind on my knowledge of screen readers and other alternative browsers, so I’m not sure whether or not this is actually a significant issue. Comments are welcome.
Habit #3: Embrace Indirection
Big Medium’s forms all have submission URLs that point at the content page itself. At the last moment, when the form is submitted, Big Medium swoops in via JavaScript and sets the form to point to the real submission URL. Likewise, the form’s submission method is changed from “GET” to “POST.”
Most spambots don’t grok JavaScript, which means that their submissions will go to the wrong page.
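As a sketch of what that decoy markup might look like, here is a small Python function that renders the form. The inline `onsubmit` handler is my assumption about how the last-moment swap could be wired up, and the function and parameter names are hypothetical; the real thing may well attach its script differently.

```python
import html
import json


def render_comment_form(page_url, real_submit_url, fields_html=""):
    """Render a form that points at the content page until JavaScript redirects it."""
    # json.dumps yields a safely quoted JavaScript string literal.
    handler = f"this.action={json.dumps(real_submit_url)};this.method={json.dumps('post')};"
    return (
        f'<form action="{html.escape(page_url)}" method="get" '
        f'onsubmit="{html.escape(handler)}">'
        f"{fields_html}"
        '<input type="submit" value="Post comment">'
        "</form>"
    )
```

A client that ignores the handler sends a GET to the content page, where the submission simply never reaches the comment processor.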
Here again, we have a potential accessibility issue, however. If a visitor has JavaScript disabled in their browser, they can’t submit the form. Big Medium adds a note to JavaScript-disabled pages saying that JavaScript is required to submit the form. That’s not ideal, but in the end, I think requiring JavaScript is an acceptable price of participation if it helps to keep the environment spam-free.
Habit #4: Find the Prey’s Lair
Many spambots never actually visit the page with the form on it. They just submit directly to the submission URL, using their programmed fields. So, another test is to check the source of the form submission. Call it the spambot’s lair.
Spambots typically fib about the page where they’re coming from (the “referrer” URL). These fake referrer URLs are sometimes the site’s homepage URL or some other value unrelated to the page where the form actually lives. Big Medium checks these referrer URLs and discards the form if it smells bad.
Trouble is, even legitimate browsers often provide unreliable referrers, sometimes providing none at all. If no referrer is provided, Big Medium lets the submission slide, so this isn’t a very strong test, but hey, every little bit helps.
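In Python terms, the sniff test might look something like this. It is a minimal sketch assuming that "smells bad" means the referrer's host and path differ from the page where the form lives; the function name and the exact comparison are my own choices.

```python
from urllib.parse import urlsplit


def referrer_ok(referrer, page_url):
    """Accept a missing referrer; otherwise require the form page's host and path."""
    if not referrer:
        return True  # many legitimate browsers send nothing, so let it slide
    ref, page = urlsplit(referrer), urlsplit(page_url)
    return (ref.netloc.lower(), ref.path) == (page.netloc.lower(), page.path)
```

Query strings and fragments are ignored here, so a visitor arriving at the page with `#comments` on the end is not penalized.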
Habit #5: Ask Good Questions
Every safari gentleman should be a master of conversation. Big Medium does its best by posing a challenge question in its hunt for spambots. This is a very simple CAPTCHA test to see if the form submitter is human. The challenge question and its expected response can be customized to any site managed by Big Medium. As of this writing, this blog’s challenge question is, “What color is fresh grass?” If spambots are custom-programmed to answer, “green,” I can just change the question, and the spammer will have to come back to the site to adjust.
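Checking the answer can be forgiving about capitalization and stray spaces. A tiny Python sketch, with a hypothetical function name and my own assumption about how lenient the comparison should be:

```python
def answer_matches(given, expected):
    """Compare a challenge answer, ignoring case and extra whitespace."""
    def normalize(text):
        return " ".join(text.casefold().split())
    return normalize(given) == normalize(expected)
```

So `answer_matches("  Green ", "green")` passes, while `answer_matches("blue", "green")` does not.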
Half the battle, after all, is just making it hard to spam. Increasing the cost and difficulty of maintaining spambots for an individual site means that the spammer is more likely to move on to easier prey.
The logical flip side is that you should reduce incentives for doing that maintenance. If there’s not much to be gained by doing custom coding for your site, then the spammer will likely move on. Which brings us to...
Habit #6: Don’t Look Tasty
When dodging the slavering jaws of the spambots, it’s best if your site looks like a lousy supper. That means reducing the rewards of getting the spam message onto the site.
Even if spammers make the necessary code changes to help their spambots navigate the obstacles I’ve described above, they may simply decide that it’s not worth the extra work and maintenance if the incentives are too low.
My assumption is that spammers are in the hunt for a good Google PageRank score; it’s about boosting their results in search engines. Big Medium automatically adds the rel="nofollow" attribute to all links in comments. That tells Google and other search engines not to give the linked page any ranking credit for the link, so links in Big Medium comments get no Google juice.
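Emitting such a link is a one-liner. A minimal Python sketch, where the function name is hypothetical and the escaping is my own addition for safety:

```python
import html


def comment_link(url, text):
    """Render a commenter-supplied link with rel="nofollow" and escaped content."""
    return (
        f'<a href="{html.escape(url, quote=True)}" rel="nofollow">'
        f"{html.escape(text)}</a>"
    )
```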
Admittedly the pace of spam has only increased since the rel="nofollow" attribute was introduced by Google and a group of blog software makers a couple of years ago. My optimistic hunch is that new apps that support this attribute from the get-go (read: Big Medium) won’t be as big a target. We’ll see.
Habit(?) #7: The Most Dangerous Prey of All: Man
The strategies listed above are primarily aimed at thwarting automated spambots. They won’t do much to slow down a real person determined to share their recommendations for ringtones and rock-bottom mortgage rates. So it’s important to provide some additional protection against human spammers, as well as the handful of clever spambots that might manage to evade all of the other traps and tests.
Analyzing the actual content of the message is the only answer. That’s where Akismet comes in.
Big Medium has built-in support for the Akismet online anti-spam service. The brainchild of WordPress phenom Matt Mullenweg, Akismet torture-tests comments with hundreds of tests and then tells Big Medium whether or not it’s spam. If a comment reeks of processed ham, Big Medium chucks it into the spam bin instead of posting it to the site. Spam comments are deleted automatically every 15 days.
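The routing around that verdict can be sketched in a few lines of Python. Everything here is an assumption for illustration: `is_spam` is a hypothetical callable standing in for the call to Akismet (no real Akismet API is shown), the lists stand in for real storage, and I read the 15-day rule as "purge anything that has sat in the spam bin for 15 days or more."

```python
from datetime import datetime, timedelta

SPAM_RETENTION = timedelta(days=15)  # assumed reading of the 15-day rule


def route_comment(comment, is_spam, published, spam_bin, now):
    """File a comment in the spam bin or publish it, based on the verdict."""
    if is_spam(comment):
        spam_bin.append((now, comment))  # remember when it was binned
    else:
        published.append(comment)


def purge_spam_bin(spam_bin, now):
    """Return the spam bin with entries of 15 days or older dropped."""
    return [(binned, c) for binned, c in spam_bin if now - binned < SPAM_RETENTION]
```

With moderation switched on, `published` would simply be a review queue instead of the live page.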
On top of all of that, of course, you can also choose to review all comments that actually make it through the gauntlet before posting them to the site. That lets you eyeball and approve all of the remaining messages personally.
Callooh! Callay! Cautious Optimism...
In the end, I’m cautiously optimistic that Big Medium’s new comment feature is backed by tight protection against spammers, but time will tell.
“And, hast thou slain the SpamBot?
Come to my arms, my beamish boy!
O frabjous day! Callooh! Callay!”
He chortled in his joy.
Tags: akismet, beta, bigmedium, cms, comments, programming, spam
Comments
8 comment(s) on this page (times are local Paris time).
Wow. You have really put some juice into this. I think I have come up with a new diagnosis for you: Impressive Compulsive Disorder
Keep up the great work
steve
I implemented something similar for my weblog. In two or three years of usage, I never got any comment spam. And that's without any JavaScript or Bayesian filter or timed expiration of valid input names. (Comments are disabled for now however so you can't check it out, sorry.)
Awesome! I've been eagerly awaiting the comments feature, and to know you've put this much thought behind it is great. Keep those updates coming :)
One-click editing anyone?
Just wanted to provide a quick thank you for the comment feature Josh. I was one of the people that originally asked about it. At the time I really didn't know there were bots that were clever enough to spam comments sections. I'm very grateful that you took so much care in constructing the comments, and keeping an eye out for the publishers.
Big Medium, my best software purchase EVER.
I have been reading up on several ways to prevent spambots from spamming web forms. I came up with a simple technique that doesn't use image validation but simple number validation. Each time a user enters my form, I generate a unique ID and a 5-7 digit number code. I save this unique ID to a database along with its associated number code. When the form is submitted, if the hidden field's unique ID is the same and the number code you typed is correct, it submits the info and deletes the record; otherwise it will assume spam and not submit the info. Again, it can be broken, but that comes down to how complex I make the display of the 5-7 digit code. Visit http://www.atksolutions.com/contact.php for an example. So far it has worked. Anthony http://www.atksolutions.com
You are a wealth of information. Merci beaucoup! Thanks for educating so many. Your time is very much appreciated. I wish I knew at your age what you know! lol Visit my site and share your comments.
Just have to see if your vorpal blade goes snickersnack on this comment.
Points for the excellent usage of snickersnack.