
What's Global Moxie?

Global Moxie specializes in mobile design strategy and user experience for a multiscreen world. We offer consulting services, training, and product-invention workshops to help creative organizations build tapworthy mobile apps and effective websites. We're based in Brooklyn, NY.

On Shelves

Books by Josh Clark

Tapworthy: Designing Great iPhone Apps

Best iPhone Apps: The Guide for Discriminating Downloaders

iWork ’09: The Missing Manual


Seven Habits of Highly Effective Spambot Hunters

Posted Mar 30, 2007 (updated Sep 11, 2008)

“Beware the SpamBot, my son!
  The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
  The frumious Bandersnatch!”[1]

Jabberwocky.

Oh, my children, it’s a dark and dangerous world for the innocent web form. With spam messages outnumbering legitimate e-mails and web comments by something like 20 to one,[2] an unprotected comment form doesn’t stand a chance of going unabused.

Automated spambots troll the web, filling out forms and spewing pablum about penny stocks, Viagra and the personal proclivities of Paris Hilton. Imagine my trepidation, as I worked to add visitor comments to Big Medium 2, at the prospect of sending my bouncing baby beta into this spam-infested wilderness.

But I’ve given the genie a vorpal blade to defend himself and, in the end, Big Medium has developed a shrewd nose for the scent of processed pork. I’m hopeful that Big Medium will hold up as quite the spambot slayer.

The app uses a broad arsenal of anti-spam techniques that together create a useful profile for website defense: Call it the seven habits of highly effective spambot hunters. I’m especially indebted to Ned Batchelder for much of this; Ned posted an excellent article about using hashes and honey pots to detect spambots. It’s a must-read for any web developer, and I’ve implemented several of its strategies.

Come, let’s follow our Great White Hunter on spam safari and discover the habits that have made Big Medium so alert to spam, shall we? (And please, forgive me as I continue to labor this hunting metaphor.)

Habit #1: Leave No Trace

I’ve spent some time observing the behavior of spambots in their natural habitat (it’s a dank place that smells of old beans). A frequent strategy appears to be to have a human visitor go to the site, submit a form, and then script automated hits against that form. Spambots are not terribly bright, but they’re relentless. They submit again and again to the same URL, lobbing the fields and values for which they’ve been programmed.

Big Medium counters this by covering its tracks, never using the same field names twice. Every time you visit the page, all of the field names change. The field names are MD5 hashes of the page’s slug name, its database creation date and a server secret. A semi-obfuscated timestamp is mashed with this field name, creating a 50-digit field name that changes every second.
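Here's a minimal Python sketch of that naming scheme; it is not Big Medium's actual code. It assumes the hash input is the slug, creation date and secret joined with a separator plus a per-field label (so each field gets its own name), and it stands in a simple XOR-and-hex encoding for the "semi-obfuscated timestamp." `SERVER_SECRET`, `TIMESTAMP_MASK` and the function names are hypothetical, and the resulting names come out at 40 characters rather than 50.

```python
import hashlib
import time

# Hypothetical server secret; in practice, load it from configuration.
SERVER_SECRET = "change-me"
# Hypothetical mask that lightly obscures the timestamp (obfuscation, not security).
TIMESTAMP_MASK = 0x5A5A5A5A


def obfuscate_timestamp(ts: int) -> str:
    """Encode a Unix timestamp as 8 hex digits after XORing it with the mask."""
    return format(ts ^ TIMESTAMP_MASK, "08x")


def deobfuscate_timestamp(token: str) -> int:
    """Reverse obfuscate_timestamp."""
    return int(token, 16) ^ TIMESTAMP_MASK


def field_name(slug: str, created: str, field: str, ts: int) -> str:
    """32 hex chars of MD5 (page + secret + field) followed by the masked timestamp."""
    material = f"{slug}|{created}|{SERVER_SECRET}|{field}".encode("utf-8")
    return hashlib.md5(material).hexdigest() + obfuscate_timestamp(ts)


def form_field_names(slug, created, fields, ts=None):
    """Map each logical field to the name used in this page build."""
    ts = int(time.time()) if ts is None else ts
    return {f: field_name(slug, created, f, ts) for f in fields}
```

Because the timestamp suffix changes every second, two page builds a second apart produce entirely different field names.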

If the correct combination of field names is not received, the form submission is discarded.

What’s more, these field names are tied to that timestamp I mentioned. Even if once-valid fields are received, the form submission is discarded if the timestamp is more than 12 hours old. For forms older than 30 minutes, you’re prompted to re-submit. So the spambots’ visit-me-once strategy of programming for field names no longer works, at least not outside of the first 30 minutes.
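On the receiving end, the checks might look something like this Python sketch, again hypothetical rather than Big Medium's code. It assumes field names built as a 32-character MD5 prefix plus an 8-hex-digit masked timestamp, using the same made-up secret and mask as the form builder. It discards submissions whose names don't match or are more than 12 hours old, and asks for a resubmission past 30 minutes.

```python
import hashlib
import time

SERVER_SECRET = "change-me"    # hypothetical; must match the form builder
TIMESTAMP_MASK = 0x5A5A5A5A    # hypothetical; must match the form builder
MAX_AGE = 12 * 60 * 60         # older than 12 hours: discard
RESUBMIT_AGE = 30 * 60         # older than 30 minutes: ask to resubmit


def expected_prefix(slug, created, field):
    """The MD5 part of a field name for this page and logical field."""
    material = f"{slug}|{created}|{SERVER_SECRET}|{field}".encode("utf-8")
    return hashlib.md5(material).hexdigest()


def check_submission(slug, created, fields, submitted, now=None):
    """Return ("ok" | "resubmit" | "discard", values keyed by logical field)."""
    now = int(time.time()) if now is None else now
    values, stamps = {}, set()
    for field in fields:
        prefix = expected_prefix(slug, created, field)
        matches = [k for k in submitted
                   if k.startswith(prefix) and len(k) == len(prefix) + 8]
        if len(matches) != 1:
            return "discard", {}          # wrong or missing field names
        name = matches[0]
        try:
            stamps.add(int(name[len(prefix):], 16) ^ TIMESTAMP_MASK)
        except ValueError:
            return "discard", {}          # timestamp suffix isn't hex
        values[field] = submitted[name]
    if len(stamps) != 1:
        return "discard", {}              # fields from different page builds
    age = now - stamps.pop()
    if age < 0 or age > MAX_AGE:
        return "discard", {}
    if age > RESUBMIT_AGE:
        return "resubmit", values
    return "ok", values
```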

It’s not super-difficult to defeat this technique by analyzing how the form changes from moment to moment, but it does require some custom coding, and as I’ll explain later, the incentives for doing so aren’t very high.

Alas, there are other spambots that don’t follow this strategy at all. These ’bots actually visit the page live and fill out all of the fields on the page, paying no particular heed to the field names. They require a different countermeasure...

Habit #2: Set Traps

These form-filling spambots typically fill out all form fields following simple rules based on field name and field type. These rules are similar, I assume, to the rules many browsers’ auto-fill features use to enter your name, email address, phone number, etc.

Because our field names are now coded 50-digit strings, these spambots can no longer use field names to detect which field is which. They have to go on field type alone. This gives us an opportunity to set traps by adding dummy fields to the form. These fields are hidden via CSS from website visitors, but the typical ’bot is not CSS-aware, so it doesn’t know that these dummy fields are not meant to be filled.

If a value is submitted in any of these honey-pot fields, Big Medium discards the submission.

To make things a bit more challenging, Big Medium also mixes up the order of these dummy fields with each page build, which means that spambots can’t rely on a fixed order of real fields and dummy fields.
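A rough Python sketch of the honey-pot approach, under a few assumptions: decoys get random 40-character hex names so they look no different from hashed real field names, each decoy lands at a random position while the real fields keep their visible order, and decoys are hidden with an inline `display:none`. The helper names are illustrative, not Big Medium's.

```python
import random
import secrets
from html import escape


def make_honeypot_names(count=3):
    """Random hex names, indistinguishable in form from hashed field names."""
    return [secrets.token_hex(20) for _ in range(count)]


def build_field_order(real_fields, honeypots, rng=None):
    """Insert each honeypot at a random position; real fields keep their order."""
    if rng is None:
        rng = random.SystemRandom()
    order = list(real_fields)
    for name in honeypots:
        order.insert(rng.randint(0, len(order)), name)
    return order


def render_field(name, label, is_honeypot=False):
    """Render one text input; honeypots are hidden and labeled as a fallback."""
    style = ""
    if is_honeypot:
        # Hidden by CSS; the label helps browsers that ignore display:none.
        label, style = "Leave this field blank", ' style="display:none"'
    return (f'<p{style}><label>{escape(label)} '
            f'<input type="text" name="{name}" value=""></label></p>')


def honeypot_tripped(submitted, honeypots):
    """True if any honeypot field arrived with a non-blank value."""
    return any(submitted.get(name, "").strip() for name in honeypots)
```

A fresh set of decoy names and positions per page build is what keeps a bot from learning "skip the third field."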

The main drawback here is a potential accessibility issue. For people using alternative browsers that do not support the CSS display:none rule, the hidden fields will not be hidden. Big Medium marks them as “Leave this field blank,” but of course, that’s not a great interface. Should our hapless visitor inadvertently enter a value in one of these fields, we’ve mistakenly bagged a real live human, not a spambot. Oops.

I’m a bit behind on my knowledge of screen readers and other alternative browsers, so I’m not sure whether this is actually a significant issue. Comments are welcome.

Habit #3: Embrace Indirection

Big Medium’s forms all have submission URLs that point at the content page itself. At the last moment, when the form is submitted, Big Medium swoops in via JavaScript and sets the form to point to the real submission URL. Likewise, the form’s submission method is changed from “GET” to “POST.”

Most spambots don’t grok JavaScript, which means that their submissions will go to the wrong page.

Here again, however, we have a potential accessibility issue. If a visitor has JavaScript disabled in their browser, they can’t submit the form. Big Medium adds a note to JavaScript-disabled pages saying that JavaScript is required to submit the form. That’s not ideal, but in the end, I think requiring JavaScript is an acceptable price of participation if it helps keep the environment spam-free.
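Here's a Python sketch of the indirection, covering both the markup and the server's decision. It assumes the decoy target is the page URL with GET, that an inline `onsubmit` handler swaps in the real handler URL and POST, and that the server processes only a POST to the real handler path. The URLs and function names are hypothetical.

```python
import json
from html import escape


def render_comment_form(page_url, handler_url, fields_html):
    """Form markup whose action and method point at a decoy until JavaScript runs."""
    # On submit, swap in the real handler and POST; clients that skip
    # JavaScript keep the decoy GET back to the page itself.
    swap = f"this.action={json.dumps(handler_url)};this.method='post';"
    return (
        f'<form action="{escape(page_url)}" method="get" onsubmit="{escape(swap)}">'
        f"{fields_html}</form>\n"
        "<noscript><p>JavaScript is required to submit this form.</p></noscript>"
    )


def should_process(method, path, handler_path):
    """Only a POST that reached the real handler counts as a real submission."""
    return method.upper() == "POST" and path == handler_path
```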

Habit #4: Find the Prey’s Lair

Many spambots never actually visit the page with the form on it. They just submit directly to the submission URL, using their programmed fields. So, another test is to check the source of the form submission. Call it the spambot’s lair.

Spambots typically fib about the page they’re coming from (the “referrer” URL). These fake referrer URLs are sometimes the site’s homepage URL or some other value unrelated to the page where the form actually lives. Big Medium checks these referrer URLs and discards the submission if it smells bad.

Trouble is, even legitimate browsers often provide unreliable referrers, sometimes providing none at all. If no referrer is provided, Big Medium lets the submission slide, so this isn’t a very strong test, but hey, every little bit helps.
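A sketch of such a lenient referrer test in Python. It assumes "smells bad" means the referrer's host and path don't match the page the form lives on (query strings and trailing slashes ignored); `referrer_ok` is a hypothetical name.

```python
from urllib.parse import urlsplit


def referrer_ok(referrer, form_page_url):
    """Lenient check: a missing referrer passes; otherwise it must be the form's page."""
    if not referrer:
        return True  # legitimate browsers often omit it, so let it slide
    ref, page = urlsplit(referrer), urlsplit(form_page_url)
    same_host = ref.netloc.lower() == page.netloc.lower()
    same_path = ref.path.rstrip("/") == page.path.rstrip("/")
    return same_host and same_path
```

A bot that claims to come from the homepage fails, while one that sends no referrer at all gets through, which is why this test is only one layer among several.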

Habit #5: Ask Good Questions

Every safari gentleman should be a master of conversation. Big Medium does its best by posing a challenge question in its hunt for spambots. This is a very simple CAPTCHA test to see if the form submitter is human. The challenge question and its expected response can be customized to any site managed by Big Medium. As of this writing, this blog’s challenge question is, “What color is fresh grass?” If spambots are custom-programmed to answer, “green,” I can just change the question, and the spammer will have to come back to the site to adjust.
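Checking the answer can be as simple as a normalized string comparison. This Python sketch assumes the question and a list of accepted answers live in per-site settings (so changing the question needs no code change) and that case, extra whitespace and trailing punctuation shouldn't count against a human. The setting and function names are hypothetical.

```python
import unicodedata

# Hypothetical per-site settings; editing these changes the challenge.
CHALLENGE_QUESTION = "What color is fresh grass?"
CHALLENGE_ANSWERS = ("green",)


def normalize(text):
    """Fold case and width, collapse whitespace, drop trailing '.' or '!'."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(text.split()).rstrip(".!")


def challenge_passed(response, answers=CHALLENGE_ANSWERS):
    """True if the visitor's response matches any accepted answer."""
    return normalize(response) in {normalize(a) for a in answers}
```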

Half the battle, after all, is just making it hard to spam. Increasing the cost and difficulty of maintaining spambots for an individual site means that the spammer is more likely to move on to easier prey.

The logical flip side is that you should reduce incentives for doing that maintenance. If there’s not much to be gained by doing custom coding for your site, then the spammer will likely move on. Which brings us to...

Habit #6: Don’t Look Tasty

When dodging the slavering jaws of the spambots, it’s best if your site looks like a lousy supper. That means reducing the rewards of getting the spam message onto the site.

Even if spammers make the necessary code changes to help their spambots navigate the obstacles I’ve described above, they may simply decide that it’s not worth the extra work and maintenance if the incentives are too low.

My assumption is that spammers are in the hunt for a good Google PageRank score; it’s about boosting their results in search engines. Big Medium automatically adds the rel="nofollow" attribute to all links in comments. That tells Google and other search engines not to pass ranking credit through the link, so posts in Big Medium comments get no Google juice.
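Adding the attribute amounts to a small rewrite of each comment's HTML after formatting. Here's a minimal sketch using Python's standard `html.parser`, assuming the comment HTML has already been sanitized elsewhere. It merges `nofollow` into any existing `rel` value and, as a side effect, drops HTML comments and doctype declarations. `add_nofollow` is a hypothetical helper, not Big Medium's code.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit HTML unchanged except that every <a> tag gets rel="nofollow"."""

    def __init__(self):
        # Keep entity references as-is instead of decoding them into text.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _attrs(self, tag, attrs):
        attrs = list(attrs)
        if tag == "a":
            rel, kept = [], []
            for key, value in attrs:
                if key == "rel":
                    rel.extend((value or "").split())
                else:
                    kept.append((key, value))
            if "nofollow" not in rel:
                rel.append("nofollow")
            attrs = kept + [("rel", " ".join(rel))]
        parts = [k if v is None else f'{k}="{escape(v, quote=True)}"' for k, v in attrs]
        return "".join(" " + p for p in parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(tag, attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(tag, attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(fragment):
    """Return the fragment with rel="nofollow" on every link."""
    parser = NofollowRewriter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```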

Admittedly, the pace of spam has only increased since the rel="nofollow" attribute was introduced by Google and a group of blog software makers a couple of years ago. My optimistic hunch is that new apps that support this attribute from the get-go (read: Big Medium) won’t be as big a target. We’ll see.

Habit(?) #7: The Most Dangerous Prey of All: Man

The strategies listed above are primarily aimed at thwarting automated spambots. They won’t do much to slow down a real person determined to share their recommendations for ringtones and rock-bottom mortgage rates. So it’s important to provide some additional protection against human spammers, as well as the handful of clever spambots that might manage to evade all of the other traps and tests.

Analyzing the actual content of the message is the only answer. That’s where Akismet comes in.

Big Medium has built-in support for the Akismet online anti-spam service. The brainchild of WordPress phenom Matt Mullenweg, Akismet torture-tests comments with hundreds of tests and then tells Big Medium whether or not it’s spam. If a comment reeks of processed ham, Big Medium chucks it into the spam bin instead of posting it to the site. Spam comments are deleted automatically every 15 days.
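For the curious, here's roughly what a call to Akismet's comment-check API looks like in Python; it's a sketch, not Big Medium's integration. It assumes the key-as-subdomain endpoint and the parameter names from Akismet's API documentation (worth double-checking against the current docs), and that Akismet answers with a plain `true` (spam) or `false`. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed comment-check endpoint, with the API key as a subdomain.
# Verify against Akismet's current API documentation before relying on it.
COMMENT_CHECK_URL = "https://{key}.rest.akismet.com/1.1/comment-check"


def build_comment_check(api_key, blog_url, user_ip, user_agent,
                        author, content, referrer=""):
    """Build (but do not send) a comment-check request."""
    params = {
        "blog": blog_url,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "referrer": referrer,
        "comment_type": "comment",
        "comment_author": author,
        "comment_content": content,
    }
    return Request(
        COMMENT_CHECK_URL.format(key=api_key),
        data=urlencode(params).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def is_spam(api_key, **comment):
    """Send the comment to Akismet; True means it belongs in the spam bin."""
    with urlopen(build_comment_check(api_key, **comment), timeout=10) as resp:
        verdict = resp.read().decode("utf-8").strip()
    if verdict not in ("true", "false"):
        # Anything else (for example an error for a bad key) is a failure;
        # holding the comment for moderation is safer than guessing.
        raise RuntimeError(f"Unexpected Akismet response: {verdict!r}")
    return verdict == "true"
```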

On top of all of that, of course, you can also choose to review all comments that actually make it through the gauntlet before posting them to the site. That lets you eyeball and approve all of the remaining messages personally.

Callooh! Callay! Cautious Optimism...

In the end, I’m cautiously optimistic that Big Medium’s new comment feature is backed by tight protection against spammers, but time will tell.

“And hast thou slain the SpamBot?
  Come to my arms, my beamish boy!
O frabjous day! Callooh! Callay!”
  He chortled in his joy.

1. Apologies to Lewis Carroll.

2. According to Akismet, they're identifying 95 percent of web comments as spam. My personal spam filter, the excellent SpamSieve, traps 83 percent of my e-mail as spam (over 200 per day!).



Comments

16 comment(s) on this page.

Steve Carter
Mar 30, 2007 2:17pm [ 1 ]

Wow. You have really put some juice into this. I think I have come up with a new diagnosis for you: Impressive Compulsive Disorder

Keep up the great work

steve

Mar 30, 2007 3:02pm [ 2 ]

I implemented something similar for my weblog. In two or three years of usage, I never got any comment spam. And that's without any JavaScript, Bayesian filter, or timed expiration of valid input names. (Comments are disabled for now, however, so you can't check it out, sorry.)

Tom
Apr 2, 2007 5:09am [ 3 ]

Awesome! I've been eagerly awaiting the comments feature, and to know you've put this much thought behind it is great. Keep those updates coming :)

One-click editing anyone?

Iwalk
Apr 2, 2007 12:04pm [ 4 ]

Just wanted to provide a quick thank you for the comment feature Josh. I was one of the people that originally asked about it. At the time I really didn't know there were bots that were clever enough to spam comments sections. I'm very grateful that you took so much care in constructing the comments, and keeping an eye out for the publishers.

Big Medium, my best software purchase EVER.

Jan 29, 2008 3:56pm [ 5 ]

I have been reading up on several ways to prevent spambots from spamming web forms. I came up with a simple technique that doesn't use image validation but simple number validation. Each time a user enters my form, I generate a unique ID and a 5-7 digit number code. I save this unique ID to a database along with its associated number code. When the form is submitted, if the hidden unique ID field matches and the number code you typed is correct, it submits the info and deletes the record; otherwise it assumes spam and doesn't submit the info. Again, it can be broken, but that comes down to how complex I make the display of the 5-7 digit code. Visit http://www.atksolutions.com/contact.php for an example. So far it has worked.

Anthony http://www.atksolutions.com

May 27, 2008 8:47am [ 6 ]

Just have to see if your vorpal blade goes snickersnack on this comment.

May 27, 2008 8:54am [ 7 ]

Points for the excellent usage of snickersnack.

Nov 2, 2008 10:57pm [ 8 ]

My website previously used FrontPage, but it was attacked ruthlessly by spambots. You absolutely need a captcha, but even that isn't 100% secure. FrontPage has pretty much died now, and Microsoft dropped it, though. I like your captcha as it asks a question, which means no stupid images. Even Google Mail has images that humans sometimes can't read. Exactly where can I obtain such a captcha script, and does it work with all forms? I need to be able to add custom fields. Thx.

Nov 3, 2008 5:15am [ 9 ]

Thanks for your note. This article details techniques that can be used to build your own form-handler, and I'm afraid that I don't have details about third-party scripts that implement these methods. A Google search for "simple captcha" or "question captcha" will likely turn up some options.

Nov 17, 2008 8:39pm [ 10 ]

I've tried a different solution that seems to be working pretty well so far: I took out all the input and form fields from my wp-comments-post.php file. In their place I'm using Flash to submit the comment data through AMFPHP. I'm pretty sure spamBots don't know how to work with Flash so I should be safe, for a while. :-)

I believe the only spam I'm getting now is trackbacks which are easy enough to capture.

As soon as I'm sure I've ironed out any bugs with this solution I'll post my code for others to use. An added bonus is that users can format the text with bold, italics, underline, color, links and bullets.

Question: I seem to have lost the email notifications when comments are submitted. I'll probably have to initiate that myself since I'm bypassing the normal WP comment submission. I'd appreciate it if you could point me to info about how that works.

harrisandreson
Apr 26, 2010 12:07am [ 11 ]

I like your captcha as it asks a question, which means no stupid images. Even Google Mail has images that humans sometimes can't read. Exactly where can I obtain such a captcha script, and does it work with all forms? I need to be able to add custom fields. cissp

Jul 10, 2010 1:30am [ 12 ]

I like your captcha as it asks a question, which means no stupid images. Even Google Mail has images that humans sometimes can't read. Exactly where can I obtain such a captcha script, and does it work with all forms?

Frank Leonhardt
Sep 12, 2010 11:23am [ 13 ]

I hate to be the one to break it to you, but you've got some comment spam in here (Jul 10, 2010 2:30am). Inevitable, I suppose, and an object lesson in how to break the system.

Thanks for the interesting detection techniques, some of which are certainly new to me. I'm not sure how effective some of them will be, especially the form-name rehashing. One problem is that a lot of the spam isn't generated by bots but by people paid to do it in third-world countries. In places where the monthly income is only a few dollars, you can get a lot of spam for your buck with no site-specific coding required.

You're also not mentioning IP blacklists as a technique. It's not the answer, but IME it works very well (having analysed the addresses being used).

Sep 12, 2010 11:36am [ 14 ]

Yep, most of the techniques described here won't prevent human spammers, as noted in Habit #7 above, although Akismet will at least block obviously spammy comments.

I'm leaving the spam comment you mention up here for the purposes of this discussion. That comment does little damage (it's actually fairly valid from a content perspective), and it doesn't give the spammer any Google juice (the link is nofollow). So even this crack in the system isn't a particularly vexing one.

The techniques described here, as suggested by the title, are aimed only at spambots. In the 3+ years that I've been using this system, spambots have indeed gotten no traction here. You're right that IP blacklists may help, but they won't do much for helping against a human spam network sourced, for example, via Amazon's Mechanical Turk, where every spammer would have a different IP address. (Akismet presumably includes IP blacklists as part of its spam protection.)

Thanks for your (unspammy!) comment.

