We have developed a privacy-preserving dataset from millions of human-AI interactions across economic tasks. Our aggregated and anonymized dataset provides unique insights into how AI systems are being integrated into different types of work and extends the replication data from our recent paper (Handa, Tamkin, et al. 2025).
As we show in the appendix, analyzing the privacy-preserving aggregate data (produced using Clio) can yield results similar to those from classifying conversations directly. The aggregate dataset contains ~2,000 hierarchically clustered groups of AI interactions on Claude.ai (Free and Pro) with the following columns:
- Cluster hierarchy (3 levels, each containing summaries of at least several hundred unique human-AI interactions from at least several hundred unique organizations):
  - cluster_name_0, cluster_description_0: Base-level clusters (most granular), representing specific task patterns (e.g., "review business NDAs for contractors")
  - cluster_name_1, cluster_description_1: Mid-level clusters, grouping related base-level tasks (e.g., "check business contracts for typos and errors")
  - cluster_name_2, cluster_description_2: Top-level clusters (least granular), capturing broad categories of tasks (e.g., "draft, explain, and analyze legal documents and procedures")
- Usage metrics:
  - percent_records: percentage of total records in the cluster
  - percent_users: percentage of total users in the cluster
- Occupational mapping:
  - onet_task: O*NET task mapping for base cluster
  - onet_occupations: Comma-separated list of related O*NET occupations
  - onet_occupational_areas: Corresponding O*NET occupational areas
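As a rough illustration of how these columns fit together, the sketch below parses a few rows in the documented layout and rolls them up by top-level cluster and by occupation. It is a minimal sketch under stated assumptions, not an official loader: the rows in `SAMPLE_CSV` are invented placeholders, the helper names are hypothetical, the file is assumed to be a CSV with one row per base-level cluster, and the scale of the percentage columns (0 to 1 versus 0 to 100) is not specified here.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows in the documented column layout (NOT real data).
# Only a subset of the columns is included to keep the example short.
SAMPLE_CSV = '''cluster_name_0,cluster_name_1,cluster_name_2,percent_records,percent_users,onet_occupations
"review business NDAs for contractors","check business contracts for typos and errors","draft, explain, and analyze legal documents and procedures",0.5,0.4,"Occupation A, Occupation B"
placeholder base cluster,placeholder mid cluster,"draft, explain, and analyze legal documents and procedures",0.25,0.2,Occupation B
another base cluster,another mid cluster,placeholder top-level cluster,0.25,0.3,Occupation C
'''


def load_rows(text_stream):
    """Parse CSV rows and convert the two usage-metric columns to floats."""
    rows = []
    for row in csv.DictReader(text_stream):
        row["percent_records"] = float(row["percent_records"])
        row["percent_users"] = float(row["percent_users"])
        rows.append(row)
    return rows


def records_by_top_level(rows):
    """Sum percent_records over base clusters that share a top-level cluster.

    Assumes each record is counted in exactly one base cluster. percent_users
    is deliberately not summed: one user may appear in several clusters.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["cluster_name_2"]] += row["percent_records"]
    return dict(totals)


def clusters_by_occupation(rows):
    """Map each occupation to the base clusters that list it.

    Assumes a plain comma split is adequate; occupation titles that contain
    commas themselves would need a more careful parser.
    """
    by_occupation = defaultdict(list)
    for row in rows:
        for occupation in row["onet_occupations"].split(","):
            by_occupation[occupation.strip()].append(row["cluster_name_0"])
    return dict(by_occupation)


rows = load_rows(io.StringIO(SAMPLE_CSV))
top_level = records_by_top_level(rows)
occupations = clusters_by_occupation(rows)
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle. Whether the naive comma split and the summing of `percent_records` are appropriate should be checked against the released data.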
We are collecting feedback on this dataset and its potential research applications. We want to understand how it might advance research in economics, labor markets, and technological change, and to learn what the broader research community would like from the data format.