How People Actually Use GenAI at Work

Executive summary

Two landmark usage datasets—NBER's "How People Use ChatGPT" and Anthropic's "Which Economic Tasks are Performed with AI?"—converge on a simple truth: today's LLMs are overwhelmingly cognitive amplifiers used for information seeking, writing, reasoning, and programming, not physical or manual work. By mid-2025, around 10% of the world's adults had adopted ChatGPT, with especially fast growth in lower-income countries—evidence that usage is broadening beyond early adopters. In Anthropic's four-million-conversation corpus, software development and writing account for nearly half of usage, and about 57% of use looks like augmentation versus 43% automation—a useful baseline for task design.

Across the wider literature, measured productivity gains are real but heterogeneous (e.g., +14% for call-centre agents, with larger boosts for novices). Controlled experiments find substantial quality and speed improvements for professional writing and knowledge-work tasks, but with "jagged frontier" failure modes that demand workflow and guardrail design. Exposure studies suggest higher-income, cognitive roles are more affected (positively or negatively) than manual roles; estimates of material task exposure range from ~36% of occupations (Anthropic) to ~80% of the workforce (OpenAI/UPenn). Macro lenses (OECD, IMF, Stanford HAI) indicate accelerating business adoption and uneven distributional effects, raising policy questions on skills, competition, and inclusion.

The practical takeaway: treat LLMs as decision-support and drafting co-pilots, instrument workflows for augmentation first, and build measurement, QA, and escalation around the known jagged edges. For boards and policymakers: combine skills investments with assurance frameworks and open, replicable measurement of AI's effects on work.

1) The two big usage datasets—and why they matter

How People Use ChatGPT (NBER, Sept 2025). Using a privacy-preserving pipeline, the authors document adoption and classify messages by work vs non-work and high-level intents. Highlights: ~10% of the world's adult population had adopted by July 2025; the early male skew has narrowed; growth is faster in lower-income countries. The task mix is dominated by practical guidance, information seeking, and writing.

Which Economic Tasks are Performed with AI? (Anthropic, Feb 2025). Maps 4M Claude conversations to O*NET tasks and occupations. Finds concentration in software development and writing; ~36% of occupations use AI for at least a quarter of their tasks; 57% augmentation vs 43% automation.

Together, these studies give us both a societal adoption view (who uses AI and for what) and a task-economic view (which tasks/occupations are actually touched).

2) What people do with LLMs (today)

Across both datasets you see a consistent signal:

- Information triage and sense-making (asking, reading, interpreting)
- Writing and documentation (drafting, summarising, revising)
- Reasoning and problem solving (including code synthesis and debugging)

Manual, physical, and equipment-centred activities are near zero—unsurprising because these systems operate over symbols, language, and code. In the NBER taxonomy, “getting information,” “interpreting for others,” and “documenting” dominate; in the Anthropic skill distribution you see critical thinking, reading comprehension, writing, systems analysis, and programming at the top. 

3) Augmentation first, automation sometimes

Anthropic’s 57/43 augmentation/automation split is a handy planning prior. It implies most value comes when humans iterate with the model—learning, refining, and supervising—rather than handing off entire tasks. That fits with measured productivity evidence:

- Customer support (field evidence): +14% issues resolved per hour on average; +34% for novices; improvements in sentiment and retention.
- Professional writing (RCT): ChatGPT exposure reduces time and raises quality for mid-level writing tasks.
- Consulting tasks ("jagged frontier"): large average gains on tasks within the model's competence; sharp failures when tasks fall outside that frontier—hence the need for guardrails and escalation.
- Software engineering: multiple studies point to speed gains (e.g., GitHub Copilot reports faster completion and time-to-merge), though results vary and are context-dependent.

Design implication: instrument workflows for augmentation (draft → critique → revise), with automation only where you have tight specs, high confidence, and monitoring.
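The draft → critique → revise loop can be instrumented directly in code. A minimal sketch, in which `call_model` is a hypothetical stand-in for whatever LLM API you use (the prompts and round count are illustrative assumptions):

```python
# Augmentation-first workflow: draft, then alternate critique and revision,
# logging every step so the trail can feed audit and QA.

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call; swap in your provider's client."""
    return f"[model output for: {prompt[:40]}]"

def augmentation_loop(task: str, rounds: int = 2) -> dict:
    """Run draft -> (critique -> revise) x rounds, keeping a provenance trail."""
    trail = []
    draft = call_model(f"Draft: {task}")
    trail.append(("draft", draft))
    for _ in range(rounds):
        critique = call_model(f"Critique this draft for errors and gaps:\n{draft}")
        trail.append(("critique", critique))
        draft = call_model(
            f"Revise the draft to address this critique:\n{critique}\n\nDraft:\n{draft}"
        )
        trail.append(("revise", draft))
    # The trail doubles as the provenance/audit log recommended below.
    return {"final": draft, "trail": trail}
```

The human stays in the loop by reviewing the trail at each step; automation would mean skipping the critique rounds, which this design deliberately does not do by default.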

4) Who’s most exposed—and how?

Early exposure indices (OpenAI/UPenn’s GPTs are GPTs; IMF/OECD updates; World Bank/ILO variants) converge on a pattern: cognitive, higher-income roles see more tasks affected, while manual roles remain less exposed. Estimates vary (method and scope differ), but the signal is robust:

- OpenAI/UPenn (2023): ~80% of the U.S. workforce could have ≥10% of tasks affected; ~19% could have ≥50% affected; higher-income roles are more exposed.
- IMF (2024–2025): roughly 40–60% of jobs affected (higher in advanced economies); distributional concerns around inequality and firm concentration.
- OECD (2023–2024): changing skill demand towards analytical, writing, and problem-solving competencies; a need for adult-learning systems that keep pace.

5) Adoption is accelerating—so measurement must mature

The Stanford HAI AI Index 2025 documents a sharp rise in enterprise AI use and investment, especially in generative AI. That’s consistent with the usage growth seen in the NBER study and with firms’ rapid tooling roll-outs. The research frontier now needs longitudinal, multi-platform panels and open measurement to track task-level effects and substitution/augmentation balance as model capabilities shift. 

6) A practical playbook for firms (augmentation-first)

A. Choose the right task archetypes

- Drafting & rewriting: briefs, emails, reports, policy drafts, code comments.
- Synthesis & sense-making: literature sweeps, discovery, call notes, post-meeting summaries.
- Structured reasoning with templates: root-cause analysis, options appraisal, test-case generation.
- Constrained automation: repeatable transforms (formatting, unit tests, data extraction) with validation.

B. Design the guardrails (to manage the jagged frontier)

- Scope screens: is the task inside model competence? If not, break it down or route to experts.
- Deliberate prompting patterns: critique-then-improve, chain-of-checks, and counter-argument prompts.
- Human-in-the-loop QA: acceptance criteria, spot checks, and escalation for ambiguity or high stakes.
- Provenance & audit: log prompts and outputs, keep decision trails, and retain human authorship.
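The first guardrail, scope screening, can start as an explicit routing table. A hedged sketch in which the task categories and route names are illustrative assumptions, not a published rubric:

```python
# Illustrative scope screen: send a task to the model only when it appears to be
# inside model competence; otherwise decompose it or escalate to an expert.

IN_SCOPE = {"drafting", "summarising", "formatting", "test-generation"}
HIGH_STAKES = {"legal-advice", "medical-guidance", "financial-commitment"}

def route(task_type: str, stakes: str = "normal") -> str:
    """Return a routing decision for one task."""
    if task_type in HIGH_STAKES or stakes == "high":
        return "escalate-to-expert"           # humans handle high-stakes/ambiguous work
    if task_type in IN_SCOPE:
        return "model-with-human-qa"          # inside the frontier: augment, with QA
    return "decompose-or-route-to-expert"     # outside or unknown: break it down first
```

In practice the sets would come from your own pilot data rather than being hard-coded, but the point stands: the routing decision should be explicit and auditable, not left to individual users.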

C. Measure what matters

Time to complete, quality scores (rubrics), rework rates, customer sentiment; segment by novice vs expert to reveal heterogeneous gains. 
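A minimal sketch of such a measurement harness; the field names and sample records are assumptions for illustration:

```python
# Segment pilot metrics by cohort (novice vs expert) to surface the
# heterogeneous gains the field evidence predicts.
from statistics import mean

records = [
    {"cohort": "novice", "minutes": 32, "quality": 7.1, "rework": 1},
    {"cohort": "novice", "minutes": 28, "quality": 7.8, "rework": 0},
    {"cohort": "expert", "minutes": 18, "quality": 8.6, "rework": 0},
    {"cohort": "expert", "minutes": 21, "quality": 8.4, "rework": 1},
]

def segment(records: list, cohort: str) -> dict:
    """Average time and quality, plus rework rate, for one cohort."""
    rows = [r for r in records if r["cohort"] == cohort]
    return {
        "avg_minutes": mean(r["minutes"] for r in rows),
        "avg_quality": mean(r["quality"] for r in rows),
        "rework_rate": sum(r["rework"] for r in rows) / len(rows),
    }

for cohort in ("novice", "expert"):
    print(cohort, segment(records, cohort))
```

The same segmentation applies to any rubric-scored quality metric; the key is recording cohort alongside every observation so the novice/expert split can be computed after the fact.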

7) A policy and governance agenda (for boards, regulators, and educators)

- Skills & inclusion. Invest in reading, writing, critical thinking, and systems analysis, which the usage studies show are central to today's LLM complementarity; expand adult learning and on-the-job micro-credentials.
- Assurance & transparency. Encourage publication of usage taxonomies, task-level impact reports, and evaluation artefacts; align with emerging AI governance and labour-market guidance from the OECD and IMF.
- Competition & concentration. Monitor productivity gains alongside potential winner-take-most dynamics in models, tooling, and data centres; support interoperability and benchmarking across providers.
- Safety + effectiveness by design. Pair red-teaming and guardrails with effectiveness testing (does the tool really improve outcomes for this task and cohort?).

8) Where the evidence is thin (read this before you extrapolate)

- Single-platform datasets. The Anthropic study is Claude-only; generalisation requires cross-platform corroboration.
- Selection effects. Early enterprise adopters and self-selected power users are not the whole economy.
- Rapid capability shifts. New model families (and "agentic" behaviours) may alter the augmentation/automation balance year to year—hence the need for continuous measurement.

9) What leaders should do next (a 6-month plan)

- Map work to tasks. Use O*NET-style decompositions; label candidates for augment → automate.
- Pilot with novices first. Expect the largest gains and the clearest learning-curve compression.
- Build an evaluation harness. Time/quality metrics, domain rubrics, and human QA.
- Codify playbooks. Prompt patterns, checklists, escalation rules, and model-choice guidance.
- Upskill managers. Teach when to pair human and model and how to supervise for quality.
- Report transparently. Publish internal dashboards on impact, errors, and mitigations.
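The first step, mapping work to tasks, can start as a simple labelled decomposition. A sketch in which the occupation, tasks, and labels are all illustrative assumptions:

```python
# O*NET-style decomposition of one role, with augment/automate/human-only labels
# that feed the augmentation-first pipeline described above.

role = {
    "occupation": "customer-support agent",
    "tasks": [
        {"task": "draft reply to a routine ticket",  "label": "augment"},
        {"task": "summarise call notes",             "label": "augment"},
        {"task": "apply a standard refund template", "label": "automate"},
        {"task": "handle an escalated complaint",    "label": "human-only"},
    ],
}

def label_counts(role: dict) -> dict:
    """Count tasks per label to size the augment/automate backlog for a role."""
    counts: dict = {}
    for t in role["tasks"]:
        counts[t["label"]] = counts.get(t["label"], 0) + 1
    return counts
```

Repeated across roles, these counts give a first-pass estimate of where the 57/43 augmentation/automation prior does and does not hold inside your own organisation.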

References (selected, linked inline above)

- Usage & adoption: Chatterji et al., How People Use ChatGPT (NBER, 2025); Anthropic, Which Economic Tasks are Performed with AI? (2025).
- Productivity: Brynjolfsson et al., Generative AI at Work (QJE 2025; NBER WP 2023); Noy & Zhang (2023), RCT on professional writing; Dell'Acqua et al. (2023), BCG field experiment; GitHub Copilot studies and replications.
- Exposure & skills: OpenAI/UPenn, GPTs are GPTs (2023); OECD Employment Outlook 2023; OECD (2024) skill-demand analysis; IMF (2024–2025) labour-market notes; World Bank/ILO exposure updates.
- Macro & adoption trends: Stanford HAI AI Index 2025.

Closing thought

Taken together, the usage evidence says: LLMs are already mainstream cognitive infrastructure. The leaders who win won’t be those who chase full automation everywhere, but those who master where and how to pair humans with models—and prove it with transparent measurement and robust guardrails.
