Quick-MIMIC: A Multimodal Data Extraction Pipeline for MIMIC with Parallelization

Bibliographic Details
Main Authors: Yutao Dou, Wei Li, Yangtao Zheng, Xiaojun Yao, Huanxiang Liu, Albert Y. Zomaya, Shaoliang Peng
Format: Article
Language: English
Published: Tsinghua University Press 2024-12-01
Series: Big Data Mining and Analytics
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2024.9020024
Description
Summary: Medical big data with artificial intelligence are vital in advancing digital medicine. However, the opaque and non-standardised nature embedded in most medical data extraction is prone to batch effects and has become a significant obstacle to reproducing previous works. This paper aims to develop an easy-to-use time-series multimodal data extraction pipeline, Quick-MIMIC, for standardised data extraction from MIMIC datasets. Our method can fully integrate different data structures into a time-series table, including structured, semi-structured, and unstructured data. We also introduce two additional modules to Quick-MIMIC, a pipeline parallelization method and data analysis methods, for reducing the data extraction time and presenting the characteristics of the extracted data intuitively. The extensive experimental results show that our pipeline can efficiently extract the needed data from the MIMIC dataset and convert it into the correct format for further analytic tasks.
ISSN: 2096-0654