OpenAI is asking third-party contractors to upload real assignments and tasks from their current or previous workplaces so that it can use the data to evaluate the performance of its next-generation AI models, according to records from OpenAI and the training data company Handshake AI obtained by WIRED.
The project appears to be part of OpenAI's efforts to establish a human baseline for different tasks that can then be compared with AI models. In September, the company launched a new evaluation process to measure the performance of its AI models against human professionals across a variety of industries. OpenAI says this is a key indicator of its progress towards achieving AGI, or an AI system that outperforms humans at most economically valuable tasks.
"We've hired folks across occupations to help collect real-world tasks modeled off those you've done in your full-time jobs, so we can measure how well AI models perform on those tasks," reads one confidential document from OpenAI. "Take existing pieces of long-term or complex work (hours or days+) that you've done in your occupation and turn each into a task."
OpenAI is asking contractors to describe tasks they've done in their current job or in the past and to upload real examples of work they did, according to an OpenAI presentation about the project viewed by WIRED. Each of the examples should be "a concrete output (not a summary of the file, but the actual file), e.g., Word doc, PDF, Powerpoint, Excel, image, repo," the presentation notes. OpenAI says people can also share fabricated work examples created to demonstrate how they would realistically respond in specific scenarios.
OpenAI and Handshake AI declined to comment.
Real-world tasks have two components, according to the OpenAI presentation. There's the task request (what a person's manager or colleague told them to do) and the task deliverable (the actual work they produced in response to that request). The company emphasizes multiple times in instructions that the examples contractors share should reflect "real, on-the-job work" that the person has "actually done."
One example in the OpenAI presentation outlines a task from a "Senior Lifestyle Manager at a luxury concierge company for ultra-high-net-worth individuals." The goal is to "prepare a short, 2-page PDF draft of a 7-day yacht trip overview to the Bahamas for a family who will be traveling there for the first time." It includes additional details regarding the family's interests and what the itinerary should look like. The "experienced human deliverable" then shows what the contractor in this case would upload: a real Bahamas itinerary created for a client.
OpenAI instructs the contractors to delete corporate intellectual property and personally identifiable information from the work files they upload. Under a section labeled "Important reminders," OpenAI tells the workers to "remove or anonymize any: personal information, proprietary or confidential data, material nonpublic information (e.g., internal strategy, unreleased product details)."
One of the documents viewed by WIRED mentions a ChatGPT tool called "Superstar Scrubbing" that provides advice on how to delete confidential information.
Evan Brown, an intellectual property lawyer with Neal & McDevitt, tells WIRED that AI labs that receive confidential information from contractors at this scale could be subject to trade secret misappropriation claims. Contractors who offer documents from their previous workplaces to an AI company, even scrubbed, could be at risk of violating their previous employers' nondisclosure agreements or exposing trade secrets.
"The AI lab is putting a lot of trust in its contractors to decide what is and isn't confidential," says Brown. "If they do let something slip through, are the AI labs really taking the time to determine what is and isn't a trade secret? It seems to me that the AI lab is putting itself at great risk."
