How AI Is Trained: The People Behind Models (2026)

Jun 24, 2026

The behind-the-curtain guide to the human beings who actually train AI, and what it costs them.

Human raters preferred the outputs of a 1.3-billion-parameter model that people had taught over those of a model 100 times larger that they had not - OpenAI. That single result, from the research that became ChatGPT, is the most important fact in modern AI and the one people understand least. The intelligence in a large language model does not come only from scraping the internet and spending money on chips. It also comes from people: tens of thousands of them, writing answers, ranking responses, grading reasoning, designing tests, and breaking the model on purpose so it can be patched.

Most coverage of AI treats training as a purely technical event, a matter of compute and parameters. The reality is closer to a global labor operation. A Kenyan worker reading descriptions of violence for less than two dollars an hour, a physician in Boston paid a hundred and fifty dollars an hour to review a medical answer, a software engineer building a simulated workplace for an agent to practice in, and a researcher poached for a nine-figure package all sit on the same supply chain. They are all, in different ways, the people behind the models.

This guide explains how AI is actually trained by walking through the pipeline stage by stage and showing exactly where humans enter, who those humans are, what they are paid, where they work, and how the entire arrangement is changing as the industry runs low on easy data. It is written for a curious non-technical reader who wants the insider view, not the marketing version. The numbers and dates here are drawn from 2025 and 2026 reporting and primary research, because this corner of AI is moving faster than almost any other.

Written by Yuma Heymans (@yumahey), who built HeroHunt.ai and its autonomous AI Recruiter to source scarce specialists from over a billion online profiles. Finding qualified experts who are not on any job board, which is now the central bottleneck in training frontier models, is the exact problem he works on every day.

1. How AI Actually Learns From People
2. The Hidden Workforce: Who Trains the Models
3. From Cheap Labels to Frontier Data
4. Inside RLHF: How Human Preferences Become a Reward
5. RL Environments: The 2026 Frontier of Human Data
6. The Companies Behind the Curtain
7. The Pay Ladder: What AI Trainers Actually Earn
8. The Human Cost: Trauma, Precarity, and the Global South
9. Red Teams: The People Paid to Break Models
10. The Researchers and the Compensation War
11. The Geography and Scale of the AI Labor Force
12. The Future: Synthetic Data and the Expert Bottleneck

1. How AI Actually Learns From People

A modern AI model is built in stages, and humans shape every one of them, even the stage that looks fully automated. Understanding this sequence is the key to understanding who the people behind the models are, because each stage employs a different kind of person doing a different kind of work. If you only remember one thing from this guide, remember that the model's raw knowledge and its actual behavior come from two separate processes, and the behavior, the part you experience when you chat with it, is almost entirely human-taught.

The first stage is pretraining, where the model reads a vast slice of the internet and learns to predict the next word. GPT-3, the model that started the current era, was trained on roughly 300 billion tokens of text, around 60% of it a filtered version of the web crawl Common Crawl - Wikipedia. No human writes this data, but humans make the consequential decisions about it: which sources to include, what counts as quality, and what to throw away. GPT-3's team built a quality classifier to separate good text from junk and ran deduplication to stop the model memorizing repeated pages - NeurIPS. The single most uncomfortable piece of human labor in AI lives here too: someone has to look at the worst content on the internet so the model can be taught to avoid it.

It is worth pausing on this curation work, because it is the least visible stage and among the most consequential. Deciding what goes into pretraining is a chain of human judgment calls with enormous downstream effects: which languages are represented and which are left thin, which websites are deemed low quality and thrown out, how aggressively to strip toxic or copyrighted material, and how to balance code against prose against academic text. These choices are made by small teams whose names never appear anywhere, and they are increasingly fought over in court as publishers and authors challenge what was scraped. A model is, in a real sense, a compression of the decisions humans made about its diet, and those decisions determine what it knows and what it is blind to long before a single rater sees an output.
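For readers who want to see what "throwing away junk and repeats" can look like in practice, here is a toy sketch. It is not GPT-3's pipeline: the exact-match deduplication and the rule-of-thumb `looks_like_quality` check are hypothetical stand-ins for the fuzzy deduplication and learned quality classifier that real teams build, and the thresholds are invented for illustration.

```python
import hashlib

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return " ".join(text.lower().split())

def looks_like_quality(text: str, min_words: int = 5, min_alpha_ratio: float = 0.6) -> bool:
    """Hypothetical stand-in for a learned quality classifier (thresholds are assumptions)."""
    words = text.split()
    if len(words) < min_words:
        return False
    non_space = [c for c in text if not c.isspace()]
    alpha = sum(c.isalpha() for c in non_space)
    return alpha / len(non_space) >= min_alpha_ratio

def curate(documents):
    """Drop exact duplicates (after normalization), then drop low-quality pages."""
    seen = set()
    kept = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # a repeated page: skip it so it is not seen twice
        seen.add(digest)
        if looks_like_quality(doc):
            kept.append(doc)
    return kept

pages = [
    "The mitochondria is the organelle that produces most of a cell's energy.",
    "the  mitochondria is the organelle that produces most of a cell's energy.",
    "$$$ 1111 >>> 2222 ### 3333 !!! 4444 @@@ 5555",
    "Click here",
]
print(curate(pages))  # only the first page survives
```

Every number in that sketch (five words, a 60% letter ratio) is a human judgment call, which is the point: change a threshold and you change what the model gets to read.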

A pretrained model is knowledgeable but useless as an assistant. It will happily continue a prompt with more questions instead of answering. The fix is supervised fine-tuning, where humans write ideal answers to example prompts and the model learns to imitate them. In the original ChatGPT lineage, this was done by about 40 contractors hired through Upwork and Scale AI, who hand-wrote roughly 13,000 demonstration answers - OpenAI. This is the cleanest example of humans literally teaching the machine to talk: they show it what a good response looks like, thousands of times.

The third stage is the one that made assistants feel genuinely helpful: reinforcement learning from human feedback, or RLHF. Here, humans do not write answers; they compare them. The model generates several responses and a person ranks them from best to worst, and those rankings train a separate model to predict human taste. The two later stages, RLHF and the 2025 successor described below, are where the model's personality, helpfulness, and safety are forged, and they run almost entirely on human judgment.

The diagram below shows the full pipeline and the kind of person who shows up at each stage, from anonymous web text at the bottom to scarce domain experts at the top.

[Figure: How AI Is Trained, and Who Shows Up at Each Stage. The behavior you experience is almost entirely human-taught.]

The newest stage, added across 2025, is reinforcement learning from verifiable rewards, often shortened to RLVR. For tasks with a checkable answer, like math or code, the model practices millions of times and a computer, not a human, marks each attempt right or wrong. This is the engine behind the reasoning models that defined 2025, and as the next sections explain, it changes the human role without removing it. Andrej Karpathy, the researcher who has done the most to explain this shift to a general audience, frames RLVR as the major new training stage of 2025, sitting on top of the older pretraining, fine-tuning, and RLHF stages - Karpathy.
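To make "a computer, not a human, marks each attempt" concrete, here is a minimal sketch of an automatic checker for a math question. The `Answer:` convention and the function name `verifiable_reward` are assumptions chosen for illustration; real graders are more forgiving about formatting and far more careful about edge cases.

```python
def verifiable_reward(model_output: str, expected_answer: str) -> float:
    """Return 1.0 if the last non-empty line matches the known answer, else 0.0."""
    lines = [line.strip() for line in model_output.strip().splitlines() if line.strip()]
    if not lines:
        return 0.0  # an empty attempt earns nothing
    final = lines[-1]
    prefix = "answer:"  # assumed output convention, not a standard
    if final.lower().startswith(prefix):
        final = final[len(prefix):].strip()
    return 1.0 if final == expected_answer.strip() else 0.0

attempts = [
    "17 * 3 = 51\nAnswer: 51",
    "17 * 3 is about 50\nAnswer: 50",
    "",
]
rewards = [verifiable_reward(a, "51") for a in attempts]
print(rewards)  # [1.0, 0.0, 0.0]
```

No human rates any single attempt here, but a human still wrote the question, supplied the correct answer, and decided what counts as a match.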

The 2025 reasoning models show how fast this is moving and why the human role keeps shifting rather than disappearing. OpenAI's o1, released in late 2024, was the first mainstream model trained to "think" through a problem before answering, and it reached the 89th percentile on competitive-programming questions while exceeding human PhD-level accuracy on a hard science benchmark - OpenAI. Its open Chinese competitor, DeepSeek-R1, went further by showing reasoning could emerge from pure reinforcement learning, and it did so cheaply: the reinforcement stage reportedly cost about $294,000 of compute - Tech Startups. DeepSeek's team deliberately avoided a learned reward model, warning that it "may suffer from reward hacking," and relied on automatic correctness checks instead - arXiv. For a moment this looked like the end of the human labeler, but it really marked a change in which humans matter most.

The throughline of the entire pipeline is that automation handles scale while humans handle judgment, and the higher you climb, the scarcer and more expensive the humans become.

Because the full pipeline is hard to picture, the clearest walkthrough for a general audience is the lecture below, a 2025 deep dive by former OpenAI and Tesla researcher Andrej Karpathy that traces a model from raw web text through fine-tuning and human feedback to the reasoning stage.

[Video: Deep Dive into LLMs like ChatGPT (Andrej Karpathy, 2025)]

Karpathy's framing is useful precisely because he keeps returning to the human inputs at each stage, from the contractors who write demonstration answers to the raters whose comparisons become the reward signal. With the map in hand, the rest of this guide zooms in on the people at each layer.

2. The Hidden Workforce: Who Trains the Models

The people who train AI do not hold one job; they form a stratified workforce, and the strata barely talk to each other. At the bottom are commodity data workers doing high-volume, low-context tasks. At the top are credentialed specialists writing problems most humans could not solve. In between sits a large middle of generalist annotators and preference raters. The reason this matters is that public debate constantly confuses these groups: the two-dollar-an-hour content moderator and the hundred-and-fifty-dollar-an-hour physician are both "AI trainers," but they live in completely different economies, and the industry is rapidly shifting investment from the first toward the second.

The largest and most invisible group is the data-labeling and content-moderation workforce, concentrated in the Global South. These are the people who draw boxes around pedestrians for self-driving cars, transcribe audio, flag toxic text, and sort images. Their work made the term "ghost work" famous, coined by anthropologist Mary L. Gray and computer scientist Siddharth Suri to describe paid human labor that customers assume is done by an algorithm - Slate. For years this was the entire story of AI annotation, and it is still numerically the biggest part of the workforce, but it is no longer where the money or the growth is. Gray and Suri estimated that as many as 8% of Americans had done some form of this hidden online work at least once, a hint at how large and uncounted the true workforce is - Slate.

The journalist Josh Dzieza captured the texture of this work in a widely read 2023 feature, describing annotators in Nairobi doing repetitive tasks for low pay with almost no information about what they were building or why - The Verge. The defining feature of the bottom layer is not just low pay but deliberate opacity: workers are kept in the dark so the finished product can be presented as the work of a clever algorithm rather than a clever crowd. That opacity is exactly what makes a guide like this one necessary, because the labor is engineered to be invisible.

The middle layer is the preference rater, the person who powers RLHF by comparing model outputs. This work requires literacy, judgment, and patience rather than formal credentials, and it is done both by Global South contractors and by college-educated workers in richer countries on platforms like Scale AI's Outlier. Above them sit the domain experts: the doctors, lawyers, software engineers, bankers, and PhDs whom labs now pay premium rates to write hard questions, grade nuanced answers, and build specialized tests. This top tier barely existed as a category three years ago and is now the fastest-growing and most fought-over segment of the entire industry.

The gap between these floors is best understood through a day in the life. A bulk worker in Nairobi might spend a nine-hour shift reading hundreds of short passages and tagging each for violence or abuse, paid by the task, with no idea which company the work serves. A preference rater in a mid-sized US city might spend the same day comparing pairs of chatbot answers and explaining why one is more helpful, paid an hourly wage that clears rent but little more. A domain expert, a practicing radiologist or a securities lawyer, might log on for a few hours between cases to write three genuinely hard questions and grade the model's attempts, billing more per hour than many of the engineers who built the model. All three are training the same kind of system, and the only thing separating their pay by a factor of a hundred is how replaceable their judgment is.

There is a fourth group people rarely count as "AI trainers" at all: the red-teamers and safety evaluators who attack the model to find its dangerous failures, and the researchers and engineers who design the training process itself. Both are covered in their own sections later, because both are genuinely part of the answer to "who trained this model." The practical takeaway from this taxonomy is that when a headline says AI is built by underpaid workers or, conversely, by lavishly paid experts, both claims are true at once: they are describing different floors of the same building, and the elevator is moving up.

3. From Cheap Labels to Frontier Data

The defining shift of 2025 and 2026 is that AI labs stopped wanting cheap labels and started paying a fortune for hard ones. For most of the deep-learning era, the goal of annotation was volume: label a million images as cheaply as possible. The frontier labs have now largely solved the easy problems, so the marginal improvement in a model increasingly comes from data that does not exist anywhere on the internet and that only a qualified human can produce. This is why a market once defined by gig workers earning a few dollars an hour now mints billionaires, and it is the single most important context for everything else in this guide.

The driver is what researchers call the data wall. There is a finite amount of useful human text online, and the biggest models are starting to use it all up. Epoch AI estimates the effective stock of quality human-generated public text at roughly 300 trillion tokens, and projects that frontier models will have consumed it sometime between 2026 and 2032 - Epoch AI. The former OpenAI chief scientist Ilya Sutskever put it more bluntly at a late-2024 conference, declaring that "pre-training as we know it will unquestionably end" because "we've achieved peak data and there'll be no more. There's only one internet." - The Verge.

When you cannot get more data by scraping, you get it by hiring people to create it. The benchmark that captures this best is Humanity's Last Exam, a 2025 test built specifically to be hard for AI, authored by nearly 1,000 subject-matter experts across more than 500 institutions in 50 countries - arXiv. When it launched, the best frontier models scored only a few percent on it, which tells you both how hard expert-authored material is and why labs are willing to pay so much to commission more of it. The same dynamic shows up in training data, not just tests: the questions a model most needs to learn from are exactly the ones that require a specialist to write.

The premium on expertise is measurable, not just rhetorical. On a graduate-level science benchmark called GPQA, human experts holding or pursuing PhDs in the relevant field score around 65%, while intelligent non-experts with full access to the web manage only about 34% - DataCamp. That gap is precisely what a lab buys when it hires a credentialed contributor: the ability to produce and verify answers a motivated generalist simply cannot. Even the foundational math benchmark GSM8K, a set of about 8,500 grade-school problems, was written and cross-checked by human contractors recruited through a data vendor - GitHub. Behind almost every number used to measure AI progress sits a person who authored the question.

Industry insiders have started calling expert human data "the new oil," a scarce and extractable resource whose owners can name their price. The analogy is imperfect but instructive: unlike crude, the supply of expert judgment cannot be pumped faster by drilling harder. It has to be recruited one specialist at a time, which is why the companies that win this market are the ones best at finding and vetting rare people, not the ones with the cheapest labor. This single fact, that the binding input is now scarce human expertise, explains the valuations, the lawsuits, and the talent wars described throughout the rest of this guide.

This pivot has a name inside the industry: frontier data, or sometimes expert data. It reframes the worker from a cheap pair of eyes into a scarce source of judgment. A credentialed physician who can tell a plausible-sounding wrong medical answer from a correct one is doing something no crowd worker and, crucially, no current AI can reliably do. That scarcity is the whole economic story of the new data companies, and it explains why recruiting these specialists has itself become a bottleneck, a point the future section returns to. The shift from cheap labels to frontier data is, in one line, the shift from buying human time to buying human expertise.

4. Inside RLHF: How Human Preferences Become a Reward

RLHF is the most important training technique most people have never heard of, and it works by turning messy human opinions into a number a machine can optimize. The mechanism sounds almost too simple: show people pairs or small sets of model answers, ask which is better, and use those judgments to train a second AI, the reward model, whose only job is to predict what a human would prefer. The main model is then tuned to score well against that stand-in judge. This is how a system with no inherent sense of "good" learns to produce answers that feel helpful, polite, and on-topic.
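One common way to turn "a human preferred answer A over answer B" into a number is a pairwise, Bradley-Terry style loss, sketched below. The function names are illustrative, and the scores are made-up values standing in for what a reward model would output; training adjusts the reward model so that this loss shrinks across many human comparisons.

```python
import math

def preference_probability(score_chosen: float, score_rejected: float) -> float:
    """Probability the reward model assigns to the human's choice (a logistic of the score gap)."""
    return 1.0 / (1.0 + math.exp(-(score_chosen - score_rejected)))

def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Small when the reward model agrees with the human, large when it disagrees."""
    return -math.log(preference_probability(score_chosen, score_rejected))

# The reward model already scores the human-preferred answer higher: low loss.
print(round(pairwise_loss(2.0, 0.0), 3))
# The reward model has it backwards: high loss, so training pushes the scores apart.
print(round(pairwise_loss(0.0, 2.0), 3))
# The reward model cannot tell the answers apart: loss equals log(2).
print(round(pairwise_loss(1.0, 1.0), 3))
```

Only the gap between the two scores matters, which fits the idea that raters supply relative judgments rather than absolute grades.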

The idea predates ChatGPT by two years. It first worked at scale in a 2020 OpenAI project on text summarization, which collected more than 64,000 human comparisons to train a reward model, proving that fuzzy human preference could be captured as a number a machine optimizes - arXiv. InstructGPT then turned this into the modern recipe with a surprisingly small crew. About 40 contractors, hired through Upwork and Scale AI, wrote roughly 13,000 demonstration answers for the fine-tuning stage and produced the comparison data behind the reward model - arXiv. A counterintuitive detail: the reward model did not need to be enormous, and OpenAI used a 6-billion-parameter version because larger ones trained unstably and wasted compute. The lesson, repeated since, is that a relatively small amount of high-quality human judgment can redirect a vastly larger model.

A critical design choice is that humans rank rather than score. People are notoriously bad at assigning consistent numerical grades, so labs ask for comparisons instead. As the widely cited Hugging Face explainer puts it, direct scores from humans are "uncalibrated and noisy," whereas head-to-head rankings can be combined into a reliable relative order - Hugging Face. In the original InstructGPT work, labelers were shown between four and nine responses per prompt and asked to rank them, generating tens of thousands of comparisons from about 33,000 prompts - arXiv. Even with careful guidelines, the labelers only agreed with each other about 73% of the time, a number worth sitting with: the "ground truth" that shapes a model's personality is itself a noisy human consensus.
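The arithmetic behind "rank a handful of answers, get many comparisons" is worth seeing once. A ranking of K answers implies K × (K − 1) / 2 head-to-head pairs, so four answers yield 6 comparisons and nine yield 36. The sketch below assumes a strict ranking with no ties, which is a simplification; the labels are placeholders.

```python
from itertools import combinations
from math import comb

def ranking_to_pairs(ranked_responses):
    """Input is ordered best to worst; output is every (preferred, rejected) pair it implies."""
    return list(combinations(ranked_responses, 2))

ranking = ["B", "D", "A", "C"]  # a rater's order for four answers, best first
pairs = ranking_to_pairs(ranking)
print(pairs)
# [('B', 'D'), ('B', 'A'), ('B', 'C'), ('D', 'A'), ('D', 'C'), ('A', 'C')]

print(comb(4, 2), comb(9, 2))  # 6 36
```

This is how a single rater's few minutes of ranking becomes several training examples at once.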

What raters are actually asked to do is more subtle than picking the better answer. Labs hand them detailed rubrics covering helpfulness, honesty, harmlessness, formatting, and tone, and the quality of those written instructions matters as much as the raters themselves, because a vague guideline produces inconsistent data. This is also where some of a model's best-known quirks come from. If raters consistently reward answers that sound confident and agreeable, the model learns to be sycophantic, telling people what they want to hear rather than what is true. The personality of an AI assistant, in other words, is not an accident of the math; it is a faithful reflection of what a particular group of paid humans, following a particular set of instructions, decided was good.

Once the reward model exists, the main model is optimized against it using an algorithm such as PPO, with one essential safeguard. Left unchecked, the model learns to "reward-hack," producing weird or sycophantic text that fools the reward model without actually being better. To prevent this, training adds a leash called a KL penalty that punishes the model for drifting too far from its original fine-tuned self - Lil'Log. In plain terms, the reward model is a judge trained on human taste, and the KL penalty is a rule that says "improve, but do not become a stranger." The result preserves the model's coherence while nudging its behavior toward what raters liked.
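The "leash" can be pictured as simple subtraction: take the reward model's score and deduct a penalty proportional to how far the tuned model's word choices have drifted from the original's. The sketch below is a simplification under stated assumptions: it compares two small made-up probability distributions directly, whereas real training estimates the drift from sampled text, and the weight `beta` is an arbitrary illustrative value.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same options (assumes q has no zeros)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def penalized_reward(reward_model_score, tuned_probs, reference_probs, beta=0.1):
    """Reward the model actually optimizes: the judge's score minus the drift penalty."""
    return reward_model_score - beta * kl_divergence(tuned_probs, reference_probs)

reference = [0.5, 0.3, 0.2]    # original fine-tuned model's next-word probabilities
unchanged = [0.5, 0.3, 0.2]    # tuned model that has not moved
drifted = [0.9, 0.05, 0.05]    # tuned model that has collapsed onto one favorite word

print(penalized_reward(1.0, unchanged, reference))  # 1.0 (no drift, no penalty)
print(penalized_reward(1.0, drifted, reference))    # below 1.0: the same score is worth less
```

A larger `beta` keeps the model closer to its original self; a smaller one lets it chase the reward model harder, with more risk of reward hacking.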

The diagram below, from Hugging Face's widely used explainer, shows that final loop: the model is updated to score well against the human-trained reward model while the KL penalty tethers it to its original self.

[Figure: Diagram of the RLHF fine-tuning loop in which a policy model is optimized against a human-trained reward model with a KL penalty anchoring it to the original model. Source: Hugging Face, Illustrating Reinforcement Learning from Human Feedback (2022).]

The technique has two important descendants that change the human role. Anthropic's Constitutional AI replaced human labels for harmlessness with AI feedback guided by a written set of 16 principles, mixing roughly 135,000 human helpfulness comparisons with 183,000 AI-generated harmlessness comparisons - arXiv. Note the nuance: AI took over the harm labels, but humans still supplied the helpfulness preferences, so this reduced human labeling rather than eliminating it. The second descendant, Direct Preference Optimization, showed in 2023 that you can skip the separate reward model and reinforcement loop entirely and tune the model straight from preference pairs, matching or beating the older method - arXiv. DPO made preference training cheap and is now a default, but it does not change the fundamental dependency: somewhere upstream, a human still decided which answer was better.
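To show what "tune the model straight from preference pairs" means, here is a sketch of the DPO loss for a single pair, following the formula in the DPO paper. The inputs are log-probabilities that the model being tuned and the frozen reference model assign to the preferred and rejected answers; the numbers below are invented, and `beta=0.1` is just an illustrative setting.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Loss for one human preference pair; no separate reward model is involved."""
    chosen_shift = policy_logp_chosen - ref_logp_chosen        # how much more the tuned model likes the winner
    rejected_shift = policy_logp_rejected - ref_logp_rejected  # how much more it likes the loser
    margin = beta * (chosen_shift - rejected_shift)
    return math.log(1.0 + math.exp(-margin))  # same as -log(sigmoid(margin))

# Tuned model identical to the reference: nothing learned yet, loss is log(2).
print(round(dpo_loss(-12.0, -11.0, -12.0, -11.0), 4))
# Tuned model has shifted toward the human-preferred answer: loss falls.
print(round(dpo_loss(-10.0, -13.0, -12.0, -11.0), 4))
```

The human still appears in exactly one place: deciding which answer is labeled "chosen" and which is labeled "rejected".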

Anthropic later tested whether the public, rather than employees, could write the rules a model lives by. In a 2023 experiment it asked about 1,000 representative Americans to propose and vote on principles for an AI constitution using a deliberation platform, then trained a model on the result and compared it to its in-house version - Anthropic. The public-sourced constitution leaned more toward objectivity and accessibility. The experiment matters for this guide because it widens the definition of "the people behind the model" to include ordinary citizens deciding its values, not only paid raters comparing outputs. It also previews a likely future: as AI feedback handles more of the routine labeling, the scarce human contribution shifts toward defining what the model should value in the first place.

5. RL Environments: The 2026 Frontier of Human Data

The hottest category in AI training data right now is not labels at all; it is environments. As labs pivot from chatbots to agents that take actions, use tools, and complete multi-step tasks, they need places for those agents to practice. An RL environment is a simulated task with an automated scorekeeper, and Epoch AI defines it crisply as a set of actions plus a prompt plus a grader: "the model attempts the task, and a grader (typically automated, such as a unit test or an LLM judging against a rubric) assigns a score to its attempts" - Epoch AI. The plain-language version: instead of grading the model's homework, humans now build the gym and the scoreboard, then let the model train against them millions of times.
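Epoch's "actions plus a prompt plus a grader" definition maps almost directly onto a small data structure. The sketch below is a toy: the customer-support task, the action names, and `refund_grader` are all hypothetical, and a production environment would simulate real software rather than check a list of strings.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Environment:
    prompt: str                            # the task the agent is given
    allowed_actions: List[str]             # what the agent is able to do
    grader: Callable[[List[str]], float]   # automated scorekeeper

    def score(self, attempt: List[str]) -> float:
        """Reject attempts that use actions outside the environment, then grade."""
        if any(action not in self.allowed_actions for action in attempt):
            return 0.0
        return self.grader(attempt)

def refund_grader(attempt: List[str]) -> float:
    """Hypothetical unit-test-style check: a ticket must be opened before the refund is issued."""
    try:
        return 1.0 if attempt.index("open_ticket") < attempt.index("issue_refund") else 0.0
    except ValueError:
        return 0.0  # one of the required steps never happened

env = Environment(
    prompt="A customer was double-charged. Resolve it.",
    allowed_actions=["open_ticket", "issue_refund", "close_ticket"],
    grader=refund_grader,
)

print(env.score(["open_ticket", "issue_refund", "close_ticket"]))  # 1.0
print(env.score(["issue_refund"]))                                 # 0.0
print(env.score(["delete_database"]))                              # 0.0
```

Notice how much depends on the grader: if it only checked that `issue_refund` appeared somewhere, an agent could score full marks while skipping the ticket, which is the kind of loophole environment builders are paid to close.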

This shifts the human job from labeling to engineering. Building a faithful simulation of a piece of software, complete with realistic failure modes and a reliable way to check success, is skilled work, and it is priced accordingly. Per Epoch's analysis, replicating a single website as a training environment costs around $20,000, while reproducing something as complex as Slack runs about $300,000, and full contracts reach six or seven figures per quarter - Epoch AI. The startup Mechanize, founded by alumni of Epoch in 2025, reportedly offered software engineers $500,000 salaries purely to build these environments, betting on a few extremely high-quality ones rather than many cheap ones - TechCrunch.

These environments are increasingly built and traded on dedicated platforms, like the open marketplace pictured below, where contributors publish reusable training worlds for agents to practice in.

[Figure: Screenshot of Prime Intellect's Environments Hub interface, an open marketplace where contributors create and share reinforcement-learning environments for training AI agents. Source: Prime Intellect, Environments Hub launch (August 2025).]

The scale of demand is hard to overstate. According to reporting relayed by TechCrunch, Anthropic discussed spending more than $1 billion on RL environments in a single year - TechCrunch. A counter-movement has emerged to keep this resource from being monopolized: Prime Intellect launched an open Environments Hub in August 2025, which grew from a private beta to thousands of community-built environments and was used to train its own open reasoning model - Prime Intellect. The market now includes specialized startups, the big data vendors pivoting in, in-house lab teams, and even product companies licensing simulated versions of their own software for agents to train on.

The economics reward realism and exclusivity. Epoch reports that exclusive environment deals command a four-to-five-times price premium over shared ones, and that a finished environment gets used far more for training than for one-off testing - Epoch AI. The concrete examples are vivid: OpenAI reportedly bought hundreds of "UI gym" website replicas, at roughly $20,000 each, to teach its ChatGPT agent to click around real interfaces - SemiAnalysis. A startup called Veris AI emerged in 2025 with $8.5 million in seed funding to build high-fidelity replicas of enterprise software, so an agent can practice on a fake version of a company's systems before it ever touches the sensitive real ones - Veris AI. Each of these is a job that did not exist three years ago and that pays a skilled engineer accordingly.

The reason environments matter for a guide about people is subtle but important: they are the clearest sign yet that the human contribution to AI is moving up the skill ladder, not disappearing. The work is no longer "tell me if this label is right" but "design a realistic, hard-to-game test of competence in your professional domain." Even here, humans set the difficulty deliberately high, with labs targeting tasks where models initially pass only 2 to 3% of the time so there is room to improve - Epoch AI. The risk that keeps practitioners up at night is reward hacking, where a model finds a loophole in a poorly built environment and games the score, which is precisely why thoughtful human design is the scarce input. Not everyone is convinced this will work cleanly. Karpathy, who invests in the open environments effort, has hedged publicly that he is "bullish on environments and agentic interactions" but "bearish on reinforcement learning specifically," a reminder that even the people building this layer disagree about how far it goes - TechCrunch.

6. The Companies Behind the Curtain

Behind the labs sits an industry of data vendors most people have never heard of, and in 2025 it was upended by a single deal. In June 2025, Meta paid $14.3 billion for a 49% non-voting stake in Scale AI, valuing the company at more than $29 billion and pulling its founder Alexandr Wang into Meta to lead a new superintelligence lab - CNBC. Scale had been the dominant data provider, the company that helped recruit InstructGPT's labelers and supplied much of the industry. The deal broke its neutrality overnight: rival labs did not want their training data flowing through a company half-owned by a competitor.

The exodus was swift and is the reason the rest of this market matters. Google, reportedly Scale's largest customer with around $200 million in planned 2025 spending, moved to cut ties, and OpenAI confirmed it was phasing Scale out in favor of more specialized providers - TechCrunch. Within weeks Scale laid off about 200 employees, roughly 14% of its staff - Computerworld. The winners were the rivals. The clearest illustration of how much value sits in this layer is simply the valuations the surviving companies now command, shown below.

[Chart: AI Training-Data Company Valuations (2025)]

The biggest beneficiary of Scale's stumble was Surge AI, a company that bootstrapped to roughly $1.2 billion in 2024 revenue without ever taking outside money, quietly supplying expert RLHF data to OpenAI, Google, and Anthropic - Sacra. In mid-2025 it was reportedly in talks to raise about $1 billion at a valuation of at least $25 billion - Bloomberg. The breakout story, though, was Mercor, an expert-data marketplace founded by three Thiel Fellows in their early twenties, which raised a $350 million round at a $10 billion valuation in October 2025, up from a $2 billion valuation just eight months earlier - TechCrunch. Mercor's model is to match credentialed professionals to labs, and it reported paying out more than $1.5 million a day to its contractors.

Below the giants sits a deep bench of companies, each carving out a niche, and the pattern across all of them is the same move upmarket toward expertise. Turing raised $111 million at a $2.2 billion valuation supplying coding data - TechCrunch. The student-network company Handshake pivoted into a training-data marketplace in early 2025, and Invisible Technologies raised $100 million at a $2 billion-plus valuation - SiliconANGLE. The contrast tier tells its own story: Appen, the old guard of cheap labeling, saw its stock fall more than 40% when Google terminated an $82.8 million contract in early 2024, a stark sign that the commodity model was collapsing even as the expert model boomed - AIwire. The lesson for any buyer is that this is no longer one market; it is a cheap-labor market in decline and an expert-data market on fire, sharing an old name.

The long tail of the industry tells the same story from a dozen angles. Snorkel AI raised $100 million at a $1.3 billion valuation while repositioning from automated, programmatic labeling toward selling expert data as a service - Business Wire. Toloka, once a pure crowdsourcing platform, took a $72 million investment led by Jeff Bezos's family office and pivoted toward expert work - PYMNTS. Even profitable, low-profile veterans moved upmarket: iMerit, long known for high-volume labeling in India, launched a network of 4,000-plus vetted domain experts it calls Scholars - Endroid. And in early 2026 Handshake acquired the data-quality startup Cleanlab in an acqui-hire, a clear signal that the prize is now accuracy rather than volume - TechCrunch. Read together, every serious player is racing to the same high ground: verified human expertise.

| Company | 2025 valuation | Niche | Funding status |
| --- | --- | --- | --- |
| Scale AI | ~$29B | Broad data, RL environments | Meta took 49% stake |
| Surge AI | ~$25B (in talks) | Expert RLHF data | Bootstrapped to 2025 |
| Mercor | $10B | Expert marketplace | $350M Series C |
| Turing | $2.2B | Coding data | $111M Series E |
| Invisible | $2B+ | Custom training data | $100M round |
| Snorkel AI | $1.3B | Programmatic labeling | $100M Series D |

What the table cannot capture is how fast these numbers move. Mercor's valuation quintupled in eight months, Surge went from no outside funding to a reported $25 billion target in a single year, and Scale lost its biggest customers within weeks of a deal that made it far richer. For a buyer, the practical implication is that vendor selection is now a moving target, and the right question is less "who is biggest" than "who has the experts I need and whose incentives are not compromised by who owns them." For a worker, the same volatility means the platform paying the best rate this quarter may restructure the next, which is why portable, verifiable credentials matter more than loyalty to any single marketplace.

7. The Pay Ladder: What AI Trainers Actually Earn

Pay in AI training spans two orders of magnitude, and where you sit on the ladder depends almost entirely on how hard you are to replace. At the bottom, the people who labeled toxic content for early ChatGPT through the outsourcer Sama in Kenya took home between $1.32 and $2 an hour after tax, according to the TIME investigation that first exposed the practice - TIME. Crowdsourcing platforms are not much better at the floor: a large academic study of Amazon Mechanical Turk found a median wage of around $2 an hour, with only about 4% of workers clearing the US minimum wage - arXiv. The research platform Prolific at least sets a hard floor, requiring researchers to pay participants at least $8 an hour - Prolific.

The top of the ladder is a different universe. Mercor advertises an average AI-trainer rate of about $81 an hour, rising past $200 an hour for senior domain experts - TIME. The specifics are striking: the company pays primary-care physicians $130 to $170 an hour to review datasets and lawyers $110 to $130 an hour to craft and grade legal questions - TIME. Its roster reportedly includes alumni of Goldman Sachs, McKinsey, and Mount Sinai hospital. The chart below shows the full spread, and the gap between the floor and the ceiling is the entire thesis of the frontier-data shift.

The AI-Training Pay Ladder (approx. USD per hour)

The middle of the ladder is where most of the friction and litigation lives. Generalist annotation on Scale AI's Outlier platform was advertised at up to $40 an hour, but a wage lawsuit alleged the effective pay worked out closer to $15 an hour because workers were paid for only about half the hours they put in - TechCrunch. Mercor's own published rate guide spells out the gradient explicitly: roughly $12 to $25 an hour for entry-level labeling, $25 to $53 for mid-level domain RLHF, and $75 to over $200 for credentialed experts - Mercor. The same task category, "AI training," contains both a near-minimum-wage job and a job that annualizes to several hundred thousand dollars. The full ladder, with who sits on each rung, looks like this:

Tier	Typical work	Approx. pay	Who does it
Bulk	Moderation, image labels, transcription	$1-2/hr	Global South gig workers
Crowd	Surveys, simple ranking	$8-15/hr	Online crowd platforms
Generalist	RLHF preference rating	$15-40/hr	College-educated contractors
Domain RLHF	Specialized judgment	$25-53/hr	Subject practitioners
Expert	Hard problems, model evals	$75-200+/hr	PhDs, doctors, lawyers, engineers
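
The claim that the top rung annualizes to several hundred thousand dollars can be checked with simple arithmetic. The sketch below is illustrative only: it uses the hourly ranges from the table and assumes a full-time year of 2,080 paid hours (40 hours a week for 52 weeks). That assumption is not stated in the sources, and most contract work will fall short of it, so the results are best read as upper bounds.

```python
# Annualize the hourly ranges from the pay-ladder table.
# ASSUMPTION: 2,080 paid hours per year (40 h x 52 weeks). Real contract
# work is usually part-time and irregular, so treat these as upper bounds.
HOURS_PER_YEAR = 40 * 52

# (low, high) in USD per hour, copied from the table; "200+" is treated as 200.
PAY_LADDER = {
    "Bulk": (1, 2),
    "Crowd": (8, 15),
    "Generalist": (15, 40),
    "Domain RLHF": (25, 53),
    "Expert": (75, 200),
}


def annualize(hourly_usd: float, hours: int = HOURS_PER_YEAR) -> float:
    """Return yearly pay for an hourly rate and a number of paid hours."""
    return hourly_usd * hours


def annualized_ladder() -> dict[str, tuple[float, float]]:
    """Map each tier to its (low, high) annualized pay in USD."""
    return {
        tier: (annualize(low), annualize(high))
        for tier, (low, high) in PAY_LADDER.items()
    }


if __name__ == "__main__":
    for tier, (low, high) in annualized_ladder().items():
        print(f"{tier:<12} ${low:>9,.0f} - ${high:>9,.0f} per year")
```

Under that full-time assumption, $200 an hour works out to $416,000 a year, while the $1 to $2 bulk tier stays under $5,000.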

At the very bottom, conditions can be worse than even these averages suggest. Scale AI's Remotasks platform has been reported to pay some labelers around one US cent for tasks that take minutes, and Kenya's data-labeler association weighed legal action after the platform abruptly cut workers off in 2024 - Canadian Affairs. The middle tier's disputes reached regulators directly: the US Department of Labor opened an investigation into Scale AI over whether it underpaid data-labeling contractors - TechCrunch. These are not edge cases but structural features of a market that grew faster than its labor protections.

The vendor economics behind these rates are mostly opaque, but one historical case is documented cleanly. In the Sama arrangement, OpenAI's contract paid the outsourcer about $12.50 an hour per worker, between six and nine times what the workers themselves took home - TIME. For the new expert marketplaces the margins are not disclosed, but the structure is similar: the buyer pays a blended rate, the worker receives a share, and the platform keeps a recruiting and management take, often estimated at roughly a third. That take is the whole business. A marketplace that can find a board-certified physician willing to grade AI outputs, verify the credential, handle the contracting and payments, and guarantee quality is selling something a lab cannot easily build in-house, and it charges accordingly. The reason these companies carry software-like valuations on what is fundamentally a labor business is that they have turned the hard problem of sourcing scarce experts into a repeatable service, which is also why their revenue figures can be slippery: a headline number often reflects gross marketplace volume before the contractor's share is removed, so a billion-dollar claim can overstate true net revenue by a wide margin. The practical implication for anyone entering this field is that credentials are the only reliable escape from the floor, which is exactly why the industry's center of gravity is migrating toward people who have them.
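
The Sama multiple and the gross-versus-net distinction above both come down to simple division and multiplication, sketched here. The contract rate and take-home range are the figures TIME reported. The one-third take rate is the rough estimate mentioned above, and the $1 billion gross volume is a purely hypothetical input, not a figure for any real company.

```python
# Worked arithmetic for the vendor economics described above.

# Reported by TIME: what OpenAI's contract paid Sama per worker-hour,
# and the range workers took home after tax (USD per hour).
CONTRACT_RATE = 12.50
TAKE_HOME_LOW, TAKE_HOME_HIGH = 1.32, 2.00


def markup_multiple(billed_rate: float, worker_rate: float) -> float:
    """How many times the worker's pay the buyer was billed."""
    return billed_rate / worker_rate


def net_revenue(gross_volume: float, take_rate: float) -> float:
    """Platform revenue left after contractors receive their share."""
    return gross_volume * take_rate


if __name__ == "__main__":
    low = markup_multiple(CONTRACT_RATE, TAKE_HOME_HIGH)  # best-paid worker
    high = markup_multiple(CONTRACT_RATE, TAKE_HOME_LOW)  # worst-paid worker
    print(f"Buyer paid {low:.2f}x to {high:.2f}x worker take-home")

    # ASSUMPTIONS: hypothetical $1B gross volume, one-third take rate.
    gross = 1_000_000_000
    print(f"Net revenue on ${gross:,} gross: ${net_revenue(gross, 1 / 3):,.0f}")
```

The multiples come out at about 6.25 and 9.47, which is the six-to-nine range quoted above. On the hypothetical inputs, a $1 billion headline shrinks to roughly a third of that once the contractors' share is removed.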

8. The Human Cost: Trauma, Precarity, and the Global South

The bottom of the AI labor pyramid carries real and documented human harm, and no honest guide to the people behind the models can skip it. The foundational account is TIME's January 2023 investigation, which revealed that OpenAI had signed contracts worth about $200,000 with Sama to have Kenyan workers label graphic text, including descriptions of child sexual abuse, murder, suicide, and torture, so that ChatGPT could learn to filter such content - TIME. Workers described reading 150 to 250 passages of this material per nine-hour shift. Sama wound the OpenAI work down early, in February 2022, roughly eight months ahead of schedule, and the two sides agreed the full contract value would not be paid.

The psychological toll is now a matter of court record, and the litigation is still live in 2026. In Kenya, more than 140 former Facebook content moderators were diagnosed with PTSD, depression, or anxiety by the head of mental-health services at Kenyatta National Hospital, with the diagnoses filed in court in December 2024 - CNN. Of 144 moderators assessed, 81% were classed as suffering severe PTSD. The cases trace back to a world-first 2022 lawsuit by former moderator Daniel Motaung, who earned about $2.20 an hour, and in September 2024 Kenya's Court of Appeal ruled that 185 moderators could take their case to trial there, rejecting Meta's argument that Kenyan courts lacked jurisdiction - Foxglove. As recently as February 2026, a Kenyan judge postponed long-awaited rulings in the two landmark cases, with settlement talks reported to have collapsed - Business & Human Rights Resource Centre.

These conditions have produced the first organized labor movement in the sector. In May 2023, more than 150 African AI and social-media workers voted to form the African Content Moderators Union, described as the first of its kind - TIME. That effort scaled up in April 2025 into the Global Trade Union Alliance of Content Moderators, uniting moderator unions across nine countries from Kenya and Ghana to the Philippines and Colombia - UNI Global Union.

Launch of the Global Trade Union Alliance of Content Moderators, uniting data and moderation workers from nine countries to bargain with Big Tech — Source: UNI Global Union, Nairobi, April 2025.

The harms are not confined to the Global South either: in January 2025, six US contractors sued Scale AI and its Outlier platform, alleging the work labeling violent content caused them PTSD, and the company responded that it maintains safeguards including advance warnings and the ability to opt out - The Register.

The deeper pattern, documented by journalist Josh Dzieza in his influential feature "AI Is a Lot of Work" and by Karen Hao in her 2025 book "Empire of AI," is one of deliberate invisibility - The Verge. Workers in Kenya, Venezuela, Colombia, and the Philippines were paid roughly one to two dollars an hour to make AI systems safer and smarter, while being told almost nothing about what they were building or for whom - Rest of World. The wages have not improved much with time: a late-2025 analysis from the London School of Economics found some East African content work paying as little as the equivalent of $1 a day in ten-hour shifts - LSE. Regulation is beginning to respond, with advocates pointing to the EU's Digital Services Act and AI Act as potential levers for minimum standards and mental-health support. The uncomfortable truth this section establishes is that the cleanliness of a chatbot's output has historically been paid for, in part, by the mental health of people the user will never see.

9. Red Teams: The People Paid to Break Models

A distinct and growing group of people train AI by attacking it, and their findings feed directly back into how the model behaves. Red-teamers probe a model for dangerous outputs, jailbreaks, and misuse potential before and after release, and what they discover becomes training data for the next round of safety tuning. This is a genuine training input, not just quality assurance: the people who successfully break the model are, in effect, teaching it where its guardrails need to be. For its GPT-5 launch, OpenAI ran more than 5,000 hours of red-teaming with over 400 external testers and experts, prioritizing topics like violent-attack planning, jailbreaks, and bioweaponization - OpenAI.

This work increasingly demands rare domain expertise rather than general hacking skill. To assess whether GPT-5 could help plan violent attacks, OpenAI assembled a specialized team of 25 red-teamers with backgrounds in defense, intelligence, and law enforcement - OpenAI. Anthropic runs a small Frontier Red Team that, notably, reports to its policy division rather than the team building the models, a structural choice meant to remove any incentive to soften alarming findings - Anthropic. When external experts found that Claude Opus 4 showed meaningful uplift on dangerous biological tasks, Anthropic activated a stricter safety tier for the model in May 2025 - Anthropic.

Alongside the in-house and contracted experts is a crowd of independent jailbreak hunters, recruited through bug bounties. Anthropic ran a public challenge in February 2025 that drew 339 researchers and paid out $55,000 to teams who could defeat its safeguards on dangerous-knowledge questions - HackerOne, and followed it with an invitation-only program offering up to $25,000 for a single universal jailbreak of an unreleased model - Anthropic. These findings are not merely patched; they harden the model. Anthropic's Constitutional Classifiers, built partly from red-team discoveries, cut the jailbreak success rate from 86% to 4.4% - Anthropic.

These programs increasingly run on formal safety frameworks that spell out when a capability is too dangerous to ship, and human evaluators are the ones who decide whether a threshold has been crossed. OpenAI's Preparedness Framework tracks frontier risks at two levels, "High" and "Critical," with a high rating blocking deployment until safeguards are in place and a critical rating halting further development - OpenAI. Google DeepMind's parallel Frontier Safety Framework defines "critical capability levels" across cyber, biology, and AI self-improvement - Google DeepMind. The most striking example of expert involvement is national-security grade: Anthropic partnered with the US National Nuclear Security Administration to red-team its models for nuclear and radiological knowledge inside a classified environment - Anthropic.

Governments have now joined the red team. The UK's AI Security Institute, with roughly 100 staff drawn from intelligence and academia, conducts pre-deployment testing of frontier models, and in 2026 the US Center for AI Standards and Innovation signed evaluation agreements with five frontier labs including Google DeepMind, Microsoft, and xAI - CNBC. These are not box-ticking exercises. In testing reported in 2026, the UK institute's evaluators coaxed a frontier ChatGPT model into providing hacking tips within about six hours, and found that recent models could run multi-step corporate network attacks faster than skilled humans - New York Times. The significance for this guide is that "training" now formally includes adversarial human testing as a stage with its own institutions and budgets. The people who try hardest to make the model misbehave are, paradoxically, among the most important people teaching it how to behave.

10. The Researchers and the Compensation War

The most visible and best-paid people behind the models are the researchers and engineers who design the training itself, and in 2025 they became the prize in an open talent war. The trigger was Meta. After its Scale AI deal, Meta launched Meta Superintelligence Labs in mid-2025 and went on an aggressive hiring spree, with CEO Mark Zuckerberg personally recruiting and, according to OpenAI's Sam Altman, offering signing bonuses as high as $100 million - CNBC. The figures escalated from there. Meta reportedly poached Apple's foundation-models lead Ruoming Pang with a package worth more than $200 million over several years - Bloomberg.

The single most eye-watering number came in October 2025, when the Wall Street Journal reported that Zuckerberg had offered Thinking Machines Lab co-founder Andrew Tulloch a package worth up to $1.5 billion over at least six years. Meta dismissed that description of the offer as "inaccurate and ridiculous," and Tulloch ultimately joined the company - TechCrunch. These are not cash salaries but multi-year equity packages loaded with retention vesting, designed to make leaving expensive. Still, the scale is real, and it reflects a belief that a handful of researchers can be worth more than enormous compute budgets.

The raid was specific and very public. Meta hired a cluster of OpenAI researchers who had worked on models like GPT-4 and the o1 reasoning system, and installed former OpenAI scientist Shengjia Zhao as the new lab's chief scientist - VentureBeat. It reportedly agreed to pay about $250 million over four years to a 24-year-old researcher named Matt Deitke, after he initially turned down roughly half that and then met Zuckerberg in person - Wikipedia. OpenAI did not stand still: according to reporting, it responded by handing out more equity, accelerating vesting schedules, and offering retention bonuses to keep its top people from walking. The bidding war is the clearest possible evidence that, at the very top, the people who design training are treated as the single scarcest input in the entire industry.

Money, it turned out, did not buy loyalty evenly. SignalFire's 2025 talent report found sharply different retention across the major labs, a useful proxy for where researchers actually want to work, shown below. Anthropic and Google DeepMind retained staff far better than OpenAI and Meta, and engineers were leaving OpenAI for Anthropic at roughly eight times the reverse rate - SignalFire.

Two-Year Researcher Retention by Lab (2025)

Notably, not every leader chose to compete on price. Anthropic CEO Dario Amodei publicly declined to match Meta's offers, arguing that doing so would compromise the company's principles of fairness in compensation - TechRepublic. The war also had losers among the winners: in October 2025, even as it hoarded elite talent, Meta laid off about 600 people in its broader AI division, sparing only its top new-hire unit - CNBC. The takeaway is that the people who architect training sit at the very top of the value chain, commanding sums that would have been unthinkable in any other field, while the structure of the work below them churns. Talent, not just compute, is now treated as the binding constraint on building better models.

11. The Geography and Scale of the AI Labor Force

The people who train AI are spread across the planet in a pattern that mirrors global inequality, and the totals are larger than most readers expect. There is no clean count of dedicated AI annotators, but the best proxy is the World Bank's estimate of between 154 million and 435 million online gig workers worldwide, a group that includes the data workers who feed AI - World Bank. The same research found that low- and middle-income countries account for a large and fast-growing share of this work, with gig job postings in Sub-Saharan Africa growing 130% in a single period against 14% in North America, alongside a persistent gender gap in which women on one major platform earned just 68% of what men did. The work, in other words, is flowing toward the Global South and carrying old inequalities with it.

The geography is concentrated and specific. Scale AI's labeling subsidiary Remotasks at one point ran a workforce reported at around 240,000 across Kenya, the Philippines, and Venezuela, before abruptly exiting several countries in early 2024 - Contrary Research. Venezuela became a notable hub precisely because its currency collapse made local workers willing to label data for as little as $0.90 to $2 an hour - Wikipedia. A trio of "impact sourcing" firms built businesses on this geography: Sama in East Africa, iMerit with around 5,000 workers mostly in India, and CloudFactory with more than 7,000 across Nepal and Kenya - CloudFactory. India alone has an estimated 70,000 annotators today, a number NASSCOM projects could approach 1 million by 2030 - NASSCOM.

The conditions in these hubs have been measured and found wanting. The Oxford Internet Institute's Fairwork project scores data-work platforms out of ten on fairness principles, and its 2023 cloudwork ratings were damning: no platform scored above 5 out of 10, and four, including Amazon Mechanical Turk, scored zero - Fairwork. Fairwork's own framing is worth quoting: behind every AI model "lies a complex supply chain of data work" performed by workers in "low-oversight environments" with conditions that are "often hidden, unregulated, and vulnerable to exploitation" - Fairwork.

The individual scores fill in the detail. An Oxford assessment found that Scale's Remotasks platform met only 1 of 10 fair-work criteria, while the impact-sourcing firm Sama, often held up as the responsible option, scored just 5 out of 10 - Oxford Internet Institute. The pattern across the league table is consistent: the platforms moving fastest and cheapest score worst. That is why the industry's reputational and legal exposure is concentrated at the bottom of the pay ladder, and why the move toward credentialed experts doubles as a way for buyers to reduce risk: a physician earning $150 an hour with flexible hours is in a fundamentally different labor relationship from a crowd worker earning $2 an hour to moderate traumatic content.

The market built on this labor is growing fast even as its shape changes. Estimates vary widely by methodology, but Grand View Research sizes the data-annotation tools market at around $1.02 billion in 2023, rising to $5.33 billion by 2030 at a 26.5% compound annual growth rate - Grand View Research. The practical reading of all this geography and scale is twofold: the human foundation of AI is enormous and globally distributed, and it is simultaneously becoming a site of measurement, regulation, and organized pushback. The era when this labor could stay completely invisible is ending, which is itself a meaningful change in how the people behind the models are treated.
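
The growth figures quoted above can be sanity-checked with the standard compound-annual-growth formula. This is a minimal sketch, assuming the rate is compounded once a year over the seven years from 2023 to 2030. The forecaster's own base year and method are not given here, so a small mismatch with the published rate is to be expected.

```python
# Sanity-check a market forecast using compound annual growth.
# ASSUMPTION: annual compounding over 2023 -> 2030 (seven periods).


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, annual_rate: float, years: int) -> float:
    """Value after compounding `annual_rate` for `years` periods."""
    return start_value * (1 + annual_rate) ** years


if __name__ == "__main__":
    start, end, years = 1.02, 5.33, 7  # USD billions, figures quoted above
    print(f"Implied CAGR: {cagr(start, end, years):.1%}")
    print(f"$1.02B at 26.5% for 7 years: ${project(start, 0.265, years):.2f}B")
```

The implied rate lands within a fraction of a percentage point of the quoted 26.5%, so the two endpoints and the growth rate are mutually consistent.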

12. The Future: Synthetic Data and the Expert Bottleneck

The biggest open question in this entire field is whether AI will soon train itself and make the human workforce obsolete. The honest 2026 answer is no, but the job is changing profoundly. The case for replacement rests on synthetic data: models generating their own training material. The trend is real and large. Gartner predicted years ago that synthetic data would overtake real data in AI training, and reasoning models like DeepSeek-R1 already lean heavily on model-generated, heavily filtered data - Tech Monitor. The most striking demonstrations, like the 2025 "Absolute Zero" method, train reasoning with essentially zero external data by having the model propose and solve its own problems - arXiv.

There is a catch that keeps humans in the loop. Training a model repeatedly on its own output risks model collapse, a degradation documented in a 2024 Nature study where recursive AI-generated data caused models to lose diversity and quality over successive generations - Nature. The industry consensus is that synthetic data works only when it is carefully filtered and verified, and that verification requires human expertise, especially in domains where there is no automatic way to check correctness. RLVR can grade a math proof automatically, but it cannot tell you whether a legal argument is sound or a medical recommendation is safe. That is precisely where the expensive experts come in.
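
The dividing line between what can be graded automatically and what cannot is easy to see in code. The sketch below is a deliberately simplified stand-in, not any lab's actual reward function: it checks only a final numeric answer rather than a full proof, and the function names are hypothetical. When a task has no reference answer to check against, it returns nothing at all, which is the case that has to go to a human expert.

```python
# Toy "verifiable reward": automatic grading works only when there is
# something objective to check against. All names here are hypothetical.
from fractions import Fraction
from typing import Optional


def parse_number(text: str) -> Optional[Fraction]:
    """Parse '0.5' or '1/2' into an exact fraction; None if not numeric."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def verifiable_reward(model_answer: str, reference: Optional[str]) -> Optional[float]:
    """Return 1.0 or 0.0 when the answer is checkable, None otherwise."""
    if reference is None:
        # No automatic checker exists (for example a legal argument):
        # the item needs human expert judgment instead of a reward.
        return None
    expected = parse_number(reference)
    if expected is None:
        raise ValueError("reference answer must be numeric in this sketch")
    actual = parse_number(model_answer)
    return 1.0 if actual is not None and actual == expected else 0.0


if __name__ == "__main__":
    print(verifiable_reward("0.5", "1/2"))           # checkable, correct
    print(verifiable_reward("0.6", "1/2"))           # checkable, wrong
    print(verifiable_reward("The claim fails", None))  # not checkable
```

The design choice worth noticing is the third outcome. A binary reward is cheap and scalable, but the tasks that return no reward at all are exactly the ones where credentialed reviewers are still required.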

The industry's bet is that filtered, verified synthetic data can stretch the human supply rather than replace it. Microsoft researchers have pitched scalable synthetic data as a way to "break the data wall," treating model-generated examples as a tool rather than a hazard - Microsoft Research. Yet the human anchor stays visible even in the most automated pipelines: Scale AI has said roughly 25% of its contributors hold advanced degrees, a figure that would have been absurd in the box-drawing era - Foundation Capital. And the overall annotation market keeps expanding, from about $3.6 billion in 2025 toward $4.6 billion in 2026, even as the commodity end automates away - Business Research Insights.

This is the expert bottleneck, and it is reshaping the human role rather than ending it. Routine labeling is genuinely shrinking: in 2026, Sama announced it would lay off more than 1,100 workers in Kenya after Meta ended a contract, part of a broader contraction in commodity annotation - Business Daily Africa. At the same time, demand for scarce expertise is exploding, which is why a company like Mercor went from a roughly $75 million revenue run rate to a $10 billion valuation in under a year - TechBuzz. The emerging pattern is "agents in the loop": AI does the first-pass labeling at scale, and humans verify and correct the hard cases, with the human increasingly functioning as a senior reviewer rather than a line worker.

In practice this model is already concrete. A common 2026 pipeline pairs an automated AI judge that grades an entire dataset with a sampled human verification slice, routing the lowest-confidence cases to expert review rather than checking everything by hand. The human stops being the labeler and becomes the auditor of last resort, the one who adjudicates what a model and an automated grader could not settle between them. That is a more skilled job than the one it replaces, and it pays better, but there is also simply less of it per unit of data. This is the precise mechanism by which the headcount at the bottom can shrink while the pay and influence of the people at the top keep climbing, the two halves of the same transition this guide has traced from the start.
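
The routing step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any vendor's system: it assumes the automated judge reports a confidence score between 0 and 1 for each item, and the names `JudgedItem`, `route_for_review`, `expert_budget`, and `audit_size` are all hypothetical.

```python
# Sketch of "judge everything, send the hard cases to humans".
# ASSUMPTION: the AI judge emits a confidence score in [0, 1] per item.
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgedItem:
    item_id: str
    label: str
    confidence: float  # judge's self-reported confidence


def route_for_review(items, expert_budget, audit_size, seed=0):
    """Split judged items into (expert_review, random_audit, auto_accepted).

    The `expert_budget` least-confident items go to expert review. From the
    remainder, `audit_size` items are sampled at random as a human
    verification slice; everything else is accepted on the judge's label.
    """
    ranked = sorted(items, key=lambda item: item.confidence)
    expert_review = ranked[:expert_budget]
    remainder = ranked[expert_budget:]

    rng = random.Random(seed)  # seeded so the audit sample is reproducible
    random_audit = rng.sample(remainder, min(audit_size, len(remainder)))
    audited_ids = {item.item_id for item in random_audit}
    auto_accepted = [item for item in remainder if item.item_id not in audited_ids]
    return expert_review, random_audit, auto_accepted


if __name__ == "__main__":
    batch = [JudgedItem(str(i), "ok", i / 10) for i in range(10)]
    experts, audit, accepted = route_for_review(batch, expert_budget=3, audit_size=2)
    print("expert review:", [item.item_id for item in experts])
    print("random audit: ", sorted(item.item_id for item in audit))
    print("auto-accepted:", len(accepted))
```

Fixing the expert budget as a count rather than a percentage reflects the constraint the article describes: expert hours are the scarce input, so the pipeline is sized around how many cases the experts can actually review.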

That shift makes finding qualified specialists the new constraint, and it is quietly turning a data problem into a hiring problem. When the binding input is a board-certified physician or a senior litigator willing to spend a few hours a week grading AI outputs, the company that can locate and engage those people fastest wins. This is why sourcing has moved to the center of the industry, and why autonomous recruiting tools that can find specialists who are not on any job board, including platforms like HeroHunt.ai alongside the expert marketplaces themselves, have become part of the training supply chain. The people who train AI are no longer an afterthought to be hired cheaply; they are the scarce resource the whole enterprise now competes for.

The likely trajectory, then, is not a world without human trainers but a world with fewer, better-paid, more specialized ones, sitting atop an automated base that handles the volume. The cheap-labeling workforce that built the first generation of AI will keep contracting, with all the human dislocation that implies, while the expert layer keeps expanding and rising in value. If the last decade of AI was a story of scaling compute and scraping data, the next one is shaping up to be a story about people: which experts a lab can recruit, how it treats the workers at the bottom, and whether the judgment that only humans can supply can be sourced fast enough to keep the models improving. The machines are not training themselves. They still need us, just different ones of us, for different reasons, at very different prices.

Conclusion

The simplest way to read everything above is as a single arc: AI training is migrating from cheap human time to scarce human judgment, and that migration touches money, ethics, and geography all at once. The model you talk to was shaped by anonymous web text, then by contractors writing example answers, then by raters comparing outputs, then by experts building tests and breaking guardrails, and finally by researchers paid like star athletes to design the whole process. Every one of those layers is human, and the value, the controversy, and the growth are all flowing toward the top of the stack.

For anyone trying to act on this, the decision framework is clear. If you are building AI, your binding constraint is no longer compute or even data volume; it is access to credentialed experts and well-designed environments, so treat sourcing and fair treatment of contributors as core strategy rather than procurement overhead. If you are a worker, the lesson is that credentials and domain depth are the only durable escape from a labor floor that automation is steadily eroding. And if you are simply a user or a citizen, the takeaway is that the cleanliness, safety, and competence of AI rest on real people, some paid extraordinarily well and some paid extraordinarily little, whose existence the technology is designed to make you forget.

The volatility of this market is its defining feature. Valuations quintuple in months, vendors lose their biggest customers overnight, lawsuits drag across years, and the line between "human-trained" and "self-trained" keeps moving. What will not change is the underlying dependency. As long as models need judgment they cannot generate, taste they cannot acquire, and accountability they cannot provide, there will be people behind the models. The only real questions are how many, how skilled, how well treated, and how visible we are willing to let them be.

That visibility is itself becoming a competitive and ethical issue. The unions forming in Nairobi, the lawsuits winding through the courts, the regulators signing testing agreements, and the marketplaces competing to treat experts well are all, in different ways, pulling this workforce out of the shadows. The companies that get ahead of the shift, by paying fairly, protecting workers from the worst content, and sourcing genuine expertise rather than exploiting desperation, will not only look better; they will get better data, because judgment offered under decent conditions is worth more than judgment extracted under bad ones. The people behind the models were always there. The only thing changing is whether the rest of us are willing to see them.

This guide reflects the AI-training landscape as of June 2026. Valuations, pay rates, and training methods in this field change quickly, so verify current figures before relying on them for decisions.
