Thursday, September 27, 2012

Academic Requesters: How to prevent retakes on your surveys without risking worker accounts

EDIT: Current information is on THIS POST. The information below is dated and might not work.



Preventing retakes on your survey is the most common issue that requesters have with mTurk. This guide explains an easy way to prevent retakes just by using the Amazon mTurk Command Line Tools (CLT) that work on Windows, OS X and GNU/Linux.

Note about Amazon's Block Method

A Google search will yield this forum post where an mTurk representative tells requesters to use the block worker function to prevent survey retakes. Unfortunately, Amazon's system is not perfect, so even if you do this a worker may still get the following form email:
Greetings from Mechanical Turk.
We regret to inform you that you were blocked from working on HITs by the following requester(s):
Example
Requesters typically block Workers who submit poor quality work.
Requesters rely on Mechanical Turk for high quality work results. In order to maintain quality, we continuously monitor service activity. Should additional Requesters block you in the future, we may suspend your account. Please ensure your work quality is at a high standard. We encourage you to read the HIT instructions and follow them carefully.
We realize that this block may be a onetime occurrence for you. Should you maintain high work quality with no further complaints for the next few months we will dismiss this event.
Regards,
The Mechanical Turk Team
Not only that, but a block applies to all of your HITs: if you want to give out multiple surveys, workers you blocked after one of them won't be able to take a different kind of survey from you.
IsaacM says this should never happen, but experience suggests otherwise: requesters who use this method still have these emails sent out to workers. (There are many posts on the mTurk worker forum to this effect.) If there is a "glitch" in mTurk, workers are going to email you pleading for an unblock, and you will get a bad reputation, which may make responses to future surveys come more slowly or not at all.

Preferred Method: Qualifications

Instead of potentially causing a lot of trouble, you can use mTurk's qualifications to keep workers out of surveys. Basically, it works like this: your survey has a qualification attached to it (called, say, "No retakes please!") that requires a value of 0 from a worker. A worker requests the qualification and it is auto-granted to them at a value of 0. Once they have taken the survey, you raise their value to 1, so they no longer qualify.
First, create a file that looks like this, and name it something like noretakes.properties:
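To make the logic concrete, here is a toy model of the gate in Python. It does not talk to mTurk at all; the function names and the in-memory dictionary are made up purely for illustration.

```python
# Toy model of the qualification gate. Nothing here calls mTurk;
# all names are hypothetical and exist only to illustrate the idea.
REQUIRED_VALUE = 0  # the HIT only admits workers whose score equals 0

scores = {}  # worker ID -> qualification score


def request_qualification(worker_id):
    """Auto-grant the qualification at 0 the first time a worker asks."""
    scores.setdefault(worker_id, 0)


def can_accept(worker_id):
    """A worker may accept the HIT only while their score is still 0."""
    return scores.get(worker_id) == REQUIRED_VALUE


def mark_completed(worker_id):
    """After the run, the requester raises the score so the gate closes."""
    scores[worker_id] += 1


request_qualification("A1EXAMPLE")
print(can_accept("A1EXAMPLE"))  # True: score is 0
mark_completed("A1EXAMPLE")
print(can_accept("A1EXAMPLE"))  # False: score is now 1
```

A worker who never requested the qualification has no score at all, so they cannot accept the HIT either until they request it.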
name:No retakes please!
description:Prevent retakes on my survey
keywords:prevent, retakes
autogranted:true
autograntedvalue:0
To make the qualification, execute this command with the mTurk Command Line Tools (note: In this post, I've used the Windows syntax for all command line examples. For OSX/Linux, prepend the characters ./ to the beginning of the command and append .sh to the end of the first word, e.g. createQualificationType.sh):
createQualificationType -properties noretakes.properties
You will get a QualTypeID printed to standard output as well as to a file called noretakes.success. You need this ID to change the values later.
Make sure to add this qualification to your hit.properties file if you are making a new HIT with the CLT or to the necessary qualifications in the Hosted Requester GUI. Remember that the value should be 0.
Once the first run of the survey is complete, you can now raise everyone's values.
Whether you used the getResults command line tool or the web UI to get your results file, you should still have the worker IDs of everyone who submitted work to your survey. Create a tab-delimited text file with the columns workerid and score. Spreadsheet programs such as Microsoft Excel and LibreOffice Calc can create these files: in Excel, save as "Text (Tab delimited)"; in Calc, save as .csv and choose tab (\t) as the field delimiter.
Your file should look something like this, with the symbol → representing a tab:
workerid→score
A1EXAMPLE→1
A2EXAMPLE→1
A3EXAMPLE→1
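If you would rather not build this file by hand in a spreadsheet, a short script can write it. This is only a sketch: the function name write_score_file is made up, and it assumes you have already pulled the worker IDs out of your results file into a list.

```python
import csv


def write_score_file(worker_ids, path, score=1):
    """Write a tab-delimited file with the columns workerid and score.

    worker_ids is assumed to be a list of IDs you extracted from your
    results file yourself. Returns the number of workers written.
    """
    seen = set()
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["workerid", "score"])
        for worker_id in worker_ids:
            worker_id = worker_id.strip()
            # Skip blank entries and workers already written.
            if worker_id and worker_id not in seen:
                seen.add(worker_id)
                writer.writerow([worker_id, score])
    return len(seen)


if __name__ == "__main__":
    count = write_score_file(
        ["A1EXAMPLE", "A2EXAMPLE", "A3EXAMPLE"], "noretakes.tsv"
    )
    print(count, "workers written")
```

Dropping duplicates is a design choice here: a worker who somehow appears twice in your results still only needs one row.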
Then, run the following command to update everyone's qual score (note: the -qualtypeid parameter takes the QualTypeID generated earlier and stored in noretakes.success):
updateQualificationScore -qualtypeid TPREVENTRETAKESQUALIDEXAMPLE -input noretakes.tsv
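If you script these steps, the Windows vs. OS X/Linux naming difference described earlier can be handled in one place. The helper below is a hypothetical sketch (clt_command is not part of the CLT); it only builds the argument list and does not run anything.

```python
import sys


def clt_command(tool, *args, platform=sys.platform):
    """Return the argument list for an mTurk CLT tool on a given platform.

    Hypothetical helper: it applies the naming rule from this post and
    does not check that the tool actually exists.
    """
    if platform.startswith("win"):
        executable = tool  # Windows: createQualificationType
    else:
        executable = "./" + tool + ".sh"  # OS X/Linux: ./createQualificationType.sh
    return [executable, *args]


if __name__ == "__main__":
    print(clt_command("createQualificationType",
                      "-properties", "noretakes.properties"))
    print(clt_command("updateQualificationScore",
                      "-qualtypeid", "TPREVENTRETAKESQUALIDEXAMPLE",
                      "-input", "noretakes.tsv"))
```

The resulting list could be handed to something like subprocess.run from inside the CLT's bin directory, but that part is left out here because it depends on how your copy of the tools is installed.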

Harder method: Internal lists

A harder method to prevent retakes is to ask for a worker's worker ID when they begin your survey and, if it's found on a list of previous participants, tell them to return the HIT. Doing that is far outside the scope of this article, though, and the qualification method above is better anyway: it prevents workers from even accepting a survey they cannot do, which leaves those HITs on the site longer for people who can do them!
I hope this was useful for you and I hope your survey gets lots of replies. :) Remember to pay workers fairly! (at least 12 cents per minute)

14 comments:

  1. I've recently outlined the "Harder method" at my own blog: http://www.tylerjohnburleigh.com/mturk/2013/01/simplescreening-using-workerids/

    It's not really all that hard. ;-)

    1. This is entirely insecure: you are loading the account IDs into each turker's browser and only obfuscating them. You must do everything server side or use an MD5 hash!

    2. While I agree (and posted a comment to help make this hashing happen), what are the worst case scenarios for making such a list of worker IDs public?

    3. I recently developed a more secure version that piggybacks on the Unique Turker service, while providing the same level of control in HIT design: http://www.tylerjohnburleigh.com/mturk/2013/05/simple-screening-v2/

      This one, as you and others have suggested, uses a server-side database for the storage of worker IDs.

  2. Great, straightforward outline of the method! Thanks!
    I implemented this yesterday.
    However, I noticed that the response rate/ hour dropped dramatically!! Could it be that most people are screening on "HITs I am qualified for", so they don't see the ones using auto qualification in their search results?

    1. It could depend on how much you are paying vs. time it takes to complete the hit. Yes, some workers do screen hits by only searching ones that are available to them.

    2. OMG, I always search the "HITs I am qualified for". Searching for HITs is very time consuming, and we have to narrow things down any way we can. But I also have to say that trying to remember if you've done a certain HIT before gets harder and harder each day, because even if I don't do a HIT, I've probably read the details before. It is a problem for both sides.

  3. Hi,

    I need help understanding the qualification method. I am an academic and plan on publishing a lot of surveys on MTurk, however, I do not want to use a method that causes inconvenience to workers. I am not allowed access to turkernation as I am currently not in the US.

    Can you please help me?

  4. Amazon only accepts requesters from the US.

  5. I used this auto-grant method earlier and it worked perfectly fine. But of late it does not seem to be working. Any idea what the problem could be? I tried finding solutions: it is not because of low payment, longer time, or a lack of worker population.

  6. I've tried creating the "no retakes" qualification and while everything seems to work fine in practice (I'm able to accept the HIT from another account after publishing a batch), no one has completed my HIT. I've run nearly the exact same HIT before and received an excellent response rate. Any thoughts on why the response rate has dropped to zero? Thank you in advance for any suggestions.

  7. It could be because of the default "masters" qualification that Amazon uses now. You may have published your HIT in the past and had it available to the entire workforce, but when you publish using "masters" qualification, you will have workers who are very picky about who they work for and how much they are paid.

  8. One thing that is not clear to me in this post: until I run the "update qualification score" method, is the worker allowed to pick up my task multiple times and do it? Or is the worker automatically blocked after the first time he or she picks up our task?

  9. I am a requester and I'm having a lot of problems with retakers. I pay around $5, and I'm getting ready to pull my survey. I need a legitimate step-by-step guide on how to stop that from happening. Please help.
