How I’ve Cut WordPress Comment Spam by 45%

For the last couple of months I’ve been saving the database entries for the comment spam that Akismet catches on my personal blog. I’m searching them for patterns that I hope to use in a future project.

WordPress accepts three types of comments: “regular” comments made by submitting the comment form on-site, “trackbacks,” and “pingbacks.” Trackbacks and pingbacks are similar in that they (supposedly) result from someone else’s blog linking to one of my blog posts. They differ in that, with a pingback, WordPress actually verifies that the remote site has indeed linked to my site. As a result, I get almost no pingback spam. The few spammy pingbacks have come from sites that do link to my site but seem to exist only to get people to click on ads.
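WordPress’s own pingback handling is written in PHP. The following is only a minimal Python sketch of the verification idea, assuming the source page’s HTML has already been fetched. The helper name `links_to` is hypothetical. It collects every `<a href>` on the source page, resolves relative links, and checks whether any of them points at the target post.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects the absolute href of every <a> tag in a document."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the source page's URL.
                    self.hrefs.append(urljoin(self.base_url, value))


def links_to(source_html: str, source_url: str, target_url: str) -> bool:
    """Return True if the source page contains a link to target_url.

    Hypothetical helper: real pingback handling also fetches the page,
    checks the HTTP response, and extracts an excerpt around the link.
    """
    collector = _LinkCollector(source_url)
    collector.feed(source_html)
    collector.close()
    # Treat "https://x/post" and "https://x/post/" as the same URL.
    target = target_url.rstrip("/")
    return any(href.rstrip("/") == target for href in collector.hrefs)
```

A trackback carries no such check: the remote site simply asserts that it linked to the post, which is why trackbacks are so much easier to spam.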

Of the tens of thousands of spam comments I’ve collected, about 45% are regular comments and 55% are trackbacks. My guess was that most regular comment spam came from robots that post to comment forms en masse. So I thought that if I forced commenters to preview their comments before submitting them, most of the regular comment spam would disappear. To test that, I added an option to require previews to my comments preview plugin (a feature others had already asked for), and I activated it on my blog.
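A required-preview gate can be sketched as follows. This is not the plugin’s actual code; it is a minimal Python illustration assuming a hypothetical per-site secret and hypothetical function names. When the preview page is rendered, the server embeds a token derived from the comment text in a hidden field, and a final submission is accepted only if it carries a token matching the text being submitted. A bot that posts straight to the comment handler never obtains a token.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical per-site secret; a real plugin would store this persistently.
SECRET_KEY = secrets.token_bytes(32)


def issue_preview_token(comment_text: str) -> str:
    """Called when rendering the preview page; put the result in a hidden field."""
    return hmac.new(SECRET_KEY, comment_text.encode("utf-8"), hashlib.sha256).hexdigest()


def accept_submission(comment_text: str, preview_token: Optional[str]) -> bool:
    """Accept a final submission only if it was previewed with the same text."""
    if not preview_token:
        # Posted directly to the form handler without a preview.
        return False
    expected = issue_preview_token(comment_text)
    # Constant-time comparison so the token can't be guessed via timing.
    return hmac.compare_digest(expected, preview_token)
```

Binding the token to the comment text means that a commenter who edits the comment after previewing must preview it again; a looser variant could bind the token to the post ID instead.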

My guess turned out to be right: requiring comment previews eliminated about 95% of the regular comment spam. However, the remaining 5% was still significant enough to be troublesome. Somehow spam robots were submitting a preview and then posting the comment again. Looking at my logs, I noticed that the bots would submit the preview from one IP address but the final comment from another, something I think no human would ever do. So I added a nonce field to the comment form that, before accepting a comment, checks that it was previewed and submitted from the same IP address. That cut the remaining regular comment spam to a tenth of its size, so together the two changes have removed about 99.5% of regular comment spam, or roughly 45% of all the spam I receive. A handful of spam comments still get through, and they appear to come from real browsers (running JavaScript and accepting cookies), so I’m curious whether they’re controlled by humans or are just zombie machines.
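One way to build an IP-bound nonce is an HMAC over the client’s IP address and an issue time. The following Python sketch is an illustration under stated assumptions, not necessarily how the plugin implements it: the secret, the one-hour lifetime, and the function names are all hypothetical. The nonce is generated when the preview is rendered and verified when the final comment arrives; a submission from a different IP produces a different signature and is rejected.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-site secret
MAX_AGE_SECONDS = 3600  # assumed lifetime of a preview nonce


def _sign(ip: str, issued_at: int) -> str:
    """HMAC over the IP address and issue time."""
    msg = f"{ip}|{issued_at}".encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def make_nonce(ip: str, now: Optional[int] = None) -> str:
    """Build the hidden-field value at preview time, bound to the previewer's IP."""
    issued_at = int(time.time()) if now is None else now
    return f"{issued_at}:{_sign(ip, issued_at)}"


def nonce_is_valid(nonce: str, ip: str, now: Optional[int] = None) -> bool:
    """At submit time, check the nonce was issued recently to this same IP."""
    current = int(time.time()) if now is None else now
    try:
        issued_str, signature = nonce.split(":", 1)
        issued_at = int(issued_str)
    except ValueError:
        return False  # malformed hidden field
    if not 0 <= current - issued_at <= MAX_AGE_SECONDS:
        return False  # expired, or timestamped in the future
    return hmac.compare_digest(_sign(ip, issued_at), signature)
```

The trade-off is that a legitimate commenter whose IP address changes between preview and submission (for example, behind a rotating proxy) would also be rejected and would have to preview again.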

To do next: find a way to reduce trackback spam by the same amount.
