A Global Dataset Mapping the AI Innovation from Academic Research to Industrial Patents

Abstract In the rapidly evolving field of artificial intelligence (AI), mapping innovation patterns and understanding effective technology transfer from research to applications are essential for economic growth. However, existing data infrastructures suffer from fragmentation, incomplete coverage,...

Full description

Saved in:
Bibliographic Details
Main Authors: Haixing Gong, Hui Zou, Xingzhou Liang, Shiyuan Meng, Pinlong Cai, Xingcheng Xu, Jingjing Qu
Format: Article
Language:English
Published: Nature Portfolio 2025-07-01
Series:Scientific Data
Online Access:https://doi.org/10.1038/s41597-025-05518-3
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract In the rapidly evolving field of artificial intelligence (AI), mapping innovation patterns and understanding effective technology transfer from research to applications are essential for economic growth. However, existing data infrastructures suffer from fragmentation, incomplete coverage, and insufficient evaluative capacity. Here, we present DeepInnovationAI, a global dataset containing machine learning techniques. DeepPatentAI.csv: Contains 2,356,204 patent records with 8 field-specific attributes. DeepDiveAI.csv: Encompasses 3,511,929 academic publications with 13 metadata fields. These two datasets leverage large language models, multilingual text analysis and dual-layer BERT classifiers to accurately identify AI-related content, while utilizing hypergraph analysis to create robust innovation metrics. Additionally, DeepCosineAI.csv: By applying semantic vector proximity analysis, this file contains 3,511,929 most relevant paper-patent pairs, each described by 3 metadata fields, to facilitate the identification of potential knowledge flows. DeepInnovationAI enables researchers, policymakers, and industry leaders to anticipate trends and identify collaboration opportunities. With extensive temporal and geographical scope, it supports detailed analysis of technological development patterns and international competition dynamics, establishing a foundation for modeling AI innovation and technology transfer processes.
ISSN:2052-4463