Human feedback displaces compute as AI's top constraint
A survey of more than 120 AI practitioners finds that the quality of human feedback ranks as the top development challenge, surpassing the cost constraints that have long dominated industry thinking.
May 6th 2026 · World
Research from human feedback platform Prolific has identified communication as the biggest bottleneck in AI development, surpassing traditional constraints such as cost and compute resources. A survey of more than 120 AI practitioners, including engineers, researchers, product managers and business leaders, found that the quality of human feedback used to train AI systems ranks as the top challenge, followed closely by the difficulty of measuring whether training is working. Only about one in six respondents cited cost as a significant difficulty, challenging years of industry assumptions about what constrains AI development.

The findings point to what Prolific calls an "instruction gap": the loss of signal between what engineers need and what expert contributors can deliver without proper context. Early AI systems relied on volume and simple labeling tasks in which humans were largely interchangeable. But as AI takes on complex tasks in healthcare, legal and financial contexts, the industry increasingly needs domain experts who can evaluate whether models reason correctly rather than simply produce plausible-sounding outputs. Nearly half of respondents cited designing evaluation methodologies and subject-matter-expert validation as primary needs, well beyond traditional data labeling.

The stakes are rising: nearly two-thirds of surveyed practitioners identified AI agents and autonomous systems as the primary growth area for 2026. Because these systems plan, decide and act independently, they demand a fundamentally different standard of reliability than previous AI applications.

The research suggests that companies succeeding in closing the instruction gap treat the management of human expertise as a first-class engineering problem, deserving the same operational rigor as model architecture and infrastructure. This involves ongoing calibration, in which contributors can flag ambiguous instructions and engineers tighten task design based on observed disagreements.