Linking The Cancer Imaging Archive and GenBank to the National Clinical Cohort Collaborative

Bibliographic Details
Main Authors: Ahmad Baghal, Joel Saltz, Tahsin Kurc, Prateek Prasanna, Samantha Baghal, Janos Hajagos, Erich Bremer, Shaymaa Al‐Shukri, Joshua L. Kennedy, Michael Rutherford, Tracy Nolan, Kirk Smith, Christopher G. Chute, Fred Prior
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Learning Health Systems
Online Access:https://doi.org/10.1002/lrh2.10457
Abstract

Objective: This project demonstrates the feasibility of connecting medical imaging data and features, as well as SARS‐CoV‐2 genome variants, with clinical data in the National Clinical Cohort Collaborative (N3C) repository to accelerate integrative research on the detection, diagnosis, and treatment of COVID‐19‐related morbidities. The N3C has curated a rich collection of aggregated, de‐identified electronic health record (EHR) data from over 18 million patients, including 7.5 million COVID‐positive patients, seen at hospitals across the United States. Medical imaging data and variant samples are important data modalities in the study of COVID‐19.

Materials and Methods: Imaging data and features are hosted on The Cancer Imaging Archive (TCIA), and sequenced variant samples are analyzed and stored in NIH GenBank. The University of Arkansas for Medical Sciences (UAMS) published the first COVID‐19 data set of 105 patients on TCIA and of 37 patients on GenBank. We developed a process to link imaging data and genomic variants with N3C EHR data through Privacy Preserving Record Linkage (PPRL), which uses de‐identified cryptographic hashes to match records associated with the same individual without using patient identifiers.

Results: The PPRL techniques were piloted using clinical and imaging data sets provided by UAMS. The software components and processes developed for the pilot executed as intended, and the linked data were returned and processed for visualization.

Conclusion: Linking clinical data sources at the patient level offers opportunities to gain insights that might otherwise remain unknown. The PPRL prototype and the pilot serve as a model for linking disparate and diverse data repositories to enhance clinical research.
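The abstract describes PPRL only as matching records through "de‐identified cryptographic hashes". The Python sketch below illustrates that general idea under stated assumptions: each site normalizes a few identifiers (hypothetically first name, last name, date of birth, and sex), computes a keyed hash (HMAC‑SHA256) with a secret shared among the sites, and only the resulting tokens are compared. The field choice, the key handling, the toy records, and the names `normalize`, `make_token`, and `link` are illustrative assumptions, not the tooling used in the UAMS pilot or by N3C.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret shared by the participating sites; a real deployment
# would manage it outside the code. Shown inline only for illustration.
SHARED_KEY = b"example-shared-secret"


def normalize(value: str) -> str:
    """Lower-case, strip accents, and keep only letters and digits so that
    formatting differences ("José " vs "JOSE") yield the same string."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(first: str, last: str, dob: str, sex: str,
               key: bytes = SHARED_KEY) -> str:
    """Return a 64-character hex HMAC-SHA256 token over normalized fields.
    Only this token, not the identifiers, would leave the site."""
    message = "|".join(normalize(v) for v in (first, last, dob, sex))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link(clinical: dict[str, str], imaging: dict[str, str]) -> list[tuple[str, str, str]]:
    """Exact-match join of two {token: local record ID} maps, returning
    (token, clinical_id, imaging_id) for every token present in both."""
    shared = sorted(clinical.keys() & imaging.keys())
    return [(t, clinical[t], imaging[t]) for t in shared]


# Toy, made-up records: each site tokenizes locally, then tokens are joined.
ehr = {
    make_token("José", "García", "1980-04-12", "M"): "ehr-001",
    make_token("Ann", "Lee", "1975-11-02", "F"): "ehr-002",
}
tcia = {make_token("JOSE", "Garcia ", "1980-04-12", "m"): "tcia-17"}

for token, ehr_id, tcia_id in link(ehr, tcia):
    print(ehr_id, tcia_id)  # ehr-001 tcia-17
```

A keyed hash is used here rather than a plain SHA‑256 digest because unkeyed hashes of names and birth dates can be reversed by a dictionary attack. Exact token matching also misses records with typos or name changes; production PPRL systems commonly generate several tokens from different field combinations or use approximate encodings such as Bloom filters.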
ISSN: 2379-6146
Affiliations:
Ahmad Baghal, Shaymaa Al‐Shukri, Michael Rutherford, Tracy Nolan, Kirk Smith, Fred Prior: Department of Biomedical Informatics, University of Arkansas for Medical Sciences, College of Medicine, Little Rock, Arkansas, USA
Joel Saltz, Tahsin Kurc, Prateek Prasanna, Janos Hajagos, Erich Bremer: Stony Brook University, The State University of New York, Biomedical Informatics, Stony Brook, New York, USA
Samantha Baghal: Department of Internal Medicine, The University of Tennessee Health Science Center, Memphis, Tennessee, USA
Joshua L. Kennedy: Department of Pediatrics and Internal Medicine, University of Arkansas for Medical Sciences, College of Medicine, Arkansas Children's Research Institute, Little Rock, Arkansas, USA
Christopher G. Chute: Johns Hopkins University, Baltimore, Maryland, USA
Subjects: GenBank; N3C; PPRL; radiomics; TCIA; viral variants