Is generative artificial intelligence really a bad worker?
It is not yet clear how much AI currently helps productivity at work. But one thing is certain: if it were a worker, it would be a terrible one. Left to its own devices, AI would wreak havoc, even on the simplest tasks.
This is the finding of the new Remote Labor Index study by Scale AI, carried out in collaboration with the Center for AI Safety (Cais), the first to systematically test the best artificial intelligence agents on office tasks. It turns out that Manus, Grok, Claude, ChatGPT and Gemini are still far from being able to replace human workers in the online freelance labour market.
Less than 3% of tasks completed
The test simulated a series of real assignments taken from platforms such as Upwork, ranging from graphic design and video editing to game development and administrative tasks.
The result is unequivocal: even the most advanced models successfully completed less than 3% of the tasks. They earned a total of $1,810 out of a potential $143,991.
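To put the earnings figure in proportion, the share of the available money that the agents actually earned can be worked out from the two dollar amounts quoted above. The short sketch below does only that arithmetic; the variable and function names are illustrative and do not come from the study, and the result is a share of payout value, which is a different measure from the share of tasks completed.

```python
# Figures quoted in the article (US dollars).
EARNED_USD = 1_810
POTENTIAL_USD = 143_991


def earnings_share_percent(earned: float, potential: float) -> float:
    """Return earned as a percentage of potential (hypothetical helper)."""
    if potential <= 0:
        raise ValueError("potential must be positive")
    return earned / potential * 100


share_pct = earnings_share_percent(EARNED_USD, POTENTIAL_USD)
print(f"Share of potential earnings captured: {share_pct:.2f}%")
```

On these figures the agents captured a little over 1% of the money on offer.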
According to Dan Hendrycks, director of Cais, the main limitation is not the ability to generate text or code, but the lack of long-term memory, the inability to learn from experience and the difficulty of handling complex, multi-step processes. These three points are the main weaknesses of generative AI and the most significant ways in which it differs from the human brain in cognitive work. The brain learns continuously, adapts, and has a flexibility that allows us to find our way through complex tasks made up of several disparate actions. 'Agents know how to respond, but they don't know how to work,' Hendrycks summarises.

