Using Human Queue Worker to process comments

Some time ago, I released Human Queue Worker, a module that takes the concept of the Drupal Queue system, but where the processing of the items is done by human users rather than an automated process. I say 'takes the concept'; it in fact uses the Drupal Queue to create and claim queue items, but instead of declaring your queue with hook_cron_queue_info(), you declare it to Human Queue Worker as a queue that humans will be working on.

This was written for my current project and for a fairly specific need, and I didn't imagine many sites would be using it. However, it has an obvious and popular application: approving comments. I always figured it would be nice if someone wrote a little module to define a comment processing human queue.

Well, that someone is me, and the time is now. You see, I'm an idiot: when I set up this new blog site of mine, I totally forgot to set up a CAPTCHA, and then when I added Mollom, I didn't set it up properly. So this site has a few hundred spammy comments that I need to delete.

The problem is that comment management takes time. Unless there is some magical area of the core UI I've completely missed, I can either visit each node and delete the comments one by one, or use the comment admin form. There, I can mass-delete the ones with obviously spammy titles, but all the others will still need individual inspection.

The Human Queue UI simplifies this hugely. There's just one page for the queue. When you go to that page, you're presented with an item to process. In the case of comment approval, that's the comment itself, plus the parent node and parent comment to give you some context. To process the comment, click one of two buttons: 'Publish' or 'Delete'. The comment is dealt with, and the form reloads, with a brand new comment for you to process. Which means that the only clicking you do is the action buttons: Publish; Delete; Publish; Delete. (Though with the amount of spam on my site, it's probably Delete; Delete; Delete, like the Cybermen.)

I've not timed it, but I reckon I can probably go at quite a rate. And that's with just one of me: the core Queue system guarantees that only one worker can claim an item at any one time, and that applies to human workers too. So if another user were to work the queue as well, by going to the same page, they would be shown different comments to work on, and we'd get through the comments at twice the rate.
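To illustrate how time-limited claims keep two workers off the same item, here is a minimal Python sketch. It is not Drupal's code, nor Human Queue Worker's API: the class `LeaseQueue`, its method names, and the in-memory list are all made up for illustration, and the real queue keeps its items in the database.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    item_id: int
    data: str
    expire: float = 0.0  # 0 means unclaimed; otherwise the time the claim lapses


class LeaseQueue:
    """Toy in-memory queue with time-limited claims (hypothetical, not Drupal's API)."""

    def __init__(self) -> None:
        self._items: List[Item] = []
        self._next_id = 1

    def create_item(self, data: str) -> int:
        """Add an unclaimed item and return its id."""
        item = Item(self._next_id, data)
        self._items.append(item)
        self._next_id += 1
        return item.item_id

    def claim_item(self, lease_time: float = 3600.0,
                   now: Optional[float] = None) -> Optional[Item]:
        """Reserve the first item that is unclaimed or whose claim has lapsed."""
        now = time.time() if now is None else now
        for item in self._items:
            if item.expire <= now:
                item.expire = now + lease_time
                return item
        return None  # everything is currently claimed by someone else

    def delete_item(self, item: Item) -> None:
        """Remove an item once a worker has finished with it."""
        self._items.remove(item)
```

Two workers calling `claim_item()` one after the other get different items, and an item whose worker wandered off becomes claimable again once its lease runs out.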

Now I just need to find a compliant friend and make them into my worker drone. If you're interested, please don't post a comment!

Comments

I would suggest adding an option to mass-delete all content posted by the user whose comment you are deleting. I did something similar on a D6 site (though not with the fancy queue system you have). In my case, about 80-90% of the database was spam, but I found that huge numbers of spam posts were posted by the same users, so by deleting all of a user's posts at once I was able to dramatically speed up the cleanup process.

I found out that a lot of spammers use several names, several IP addresses, or both. So when mass-deleting, match on both the username (if available) and the IP address.
Personally, I use the HTTPBL module on all my sites, which blocks hard-core spammers from accessing them.

That is a nice idea, but you can already hire human 'drones':

Mechanical Turk

https://www.mturk.com/mturk/welcome

The only real problem is how much you trust them, so using agent systems with a percentage weighting can help with that.

Another thing is that, going one by one, you pay the Drupal bootstrap time for every item.

It is better, as in real Drupal batches, to pre-load a set of items, run through them via JS or other means, and submit the whole batch back.

Also how do you ensure that all items are processed and no conflicts arise? (a time out lock?)

Interesting topic, interesting solution.

Thanks!

In the case of my project, it absolutely had to be staff members who did the work, as it was sensitive data.

> Also how do you ensure that all items are processed and no conflicts arise? (a time out lock?)

Drupal's queue system does that for us already -- when you claim an item from the queue, it's reserved for a period of time, and can't be claimed by anyone else until that period expires. The form takes note of the expiry time, and the form validation handler checks it and fails validation if the claim has expired.
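As a rough sketch of that validation check, in Python rather than the module's actual PHP: the function names `build_form` and `validate_form` and the dictionary keys are hypothetical, chosen only to show the idea of recording the expiry time when the form is built and refusing the submission once that time has passed.

```python
import time
from typing import List, Optional


def build_form(item_id: int, claim_expire: float) -> dict:
    """Record the claim's expiry time in the form when it is built."""
    return {"item_id": item_id, "claim_expire": claim_expire}


def validate_form(form: dict, now: Optional[float] = None) -> List[str]:
    """Return a list of error messages; an empty list means the submission may proceed."""
    now = time.time() if now is None else now
    errors: List[str] = []
    if now >= form["claim_expire"]:
        # The lease has run out, so another worker may already hold this item.
        errors.append("Your claim on this item has expired; please reload the queue page.")
    return errors
```

A submission made inside the lease passes with no errors; one made at or after the expiry time is rejected, so a slow worker can't act on an item that may since have been handed to someone else.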

Hi!
I was using CAPTCHA and BOTCHA on sites and still getting spam comments and spam registrations. I installed the Spambot module, which uses a blacklist from www.stopforumspam.com to block submissions. This has cut spam submissions to almost 0%.

This helps to avoid the need to delete spam comments if they never get submitted.

God Bless!
Frederick

Maybe a good idea to integrate with Mechanical Turk?

I see lots of uses for this module, particularly trusting users to approve/enrich content imported from other sources, usually through the Feeds module.