Linking The Cancer Imaging Archive and GenBank to the National Clinical Cohort Collaborative
Abstract. Objective: This project demonstrates the feasibility of connecting medical imaging data and features and SARS‐CoV‐2 genome variants with clinical data in the National Clinical Cohort Collaborative (N3C) repository to accelerate integrative research on the detection, diagnosis, and treatment of COV...
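Main Authors: | Ahmad Baghal, Joel Saltz, Tahsin Kurc, Prateek Prasanna, Samantha Baghal, Janos Hajagos, Erich Bremer, Shaymaa Al‐Shukri, Joshua L. Kennedy, Michael Rutherford, Tracy Nolan, Kirk Smith, Christopher G. Chute, Fred Prior |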
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2025-01-01 |
Series: | Learning Health Systems |
Subjects: | GenBank, N3C, PPRL, radiomics, TCIA, viral variants |
Online Access: | https://doi.org/10.1002/lrh2.10457 |
_version_ | 1841527654207979520 |
---|---|
author | Ahmad Baghal Joel Saltz Tahsin Kurc Prateek Prasanna Samantha Baghal Janos Hajagos Erich Bremer Shaymaa Al‐Shukri Joshua L. Kennedy Michael Rutherford Tracy Nolan Kirk Smith Christopher G. Chute Fred Prior |
author_facet | Ahmad Baghal Joel Saltz Tahsin Kurc Prateek Prasanna Samantha Baghal Janos Hajagos Erich Bremer Shaymaa Al‐Shukri Joshua L. Kennedy Michael Rutherford Tracy Nolan Kirk Smith Christopher G. Chute Fred Prior |
author_sort | Ahmad Baghal |
collection | DOAJ |
description | Objective: This project demonstrates the feasibility of connecting medical imaging data and features and SARS‐CoV‐2 genome variants with clinical data in the National Clinical Cohort Collaborative (N3C) repository to accelerate integrative research on the detection, diagnosis, and treatment of COVID‐19‐related morbidities. The N3C has curated a rich collection of aggregated and de‐identified electronic health record (EHR) data from over 18 million patients, including 7.5 million COVID‐positive patients, seen at hospitals across the United States. Medical imaging data and variant samples are important data modalities in the study of COVID‐19. Materials and Methods: Imaging data and features are hosted on The Cancer Imaging Archive (TCIA), and sequenced variant samples are analyzed and stored at the NIH GenBank. The University of Arkansas for Medical Sciences (UAMS) published the first COVID‐19 data set of 105 patients on TCIA and 37 patients on GenBank. We developed a process to link imaging data and genomic variants with N3C EHR data through Privacy Preserving Record Linkage (PPRL), which uses de‐identified cryptographic hashes to match records associated with the same individuals without using patient identifiers. Results: The PPRL techniques were piloted using clinical and imaging data sets provided by UAMS. The software components and processes developed executed properly, and linked data were returned and processed for visualization. Conclusion: Linking clinical data sources at the patient level provides opportunities to gain insights that might not otherwise be available. The PPRL prototype and the pilot serve as a model for linking disparate and diverse data repositories to enhance clinical research. |
format | Article |
id | doaj-art-073994b670a54db7b3926078d26b706e |
institution | Kabale University |
issn | 2379-6146 |
language | English |
publishDate | 2025-01-01 |
publisher | Wiley |
record_format | Article |
series | Learning Health Systems |
spelling | doaj-art-073994b670a54db7b3926078d26b706e; 2025-01-15T08:51:32Z; eng; Wiley; Learning Health Systems; 2379-6146; 2025-01-01; 9; 1; n/a; n/a; 10.1002/lrh2.10457; Linking The Cancer Imaging Archive and GenBank to the National Clinical Cohort Collaborative; Ahmad Baghal (0), Joel Saltz (1), Tahsin Kurc (2), Prateek Prasanna (3), Samantha Baghal (4), Janos Hajagos (5), Erich Bremer (6), Shaymaa Al‐Shukri (7), Joshua L. Kennedy (8), Michael Rutherford (9), Tracy Nolan (10), Kirk Smith (11), Christopher G. Chute (12), Fred Prior (13); (0, 7, 9, 10, 11, 13) Department of Biomedical Informatics, University of Arkansas for Medical Sciences, College of Medicine, Little Rock, Arkansas, USA; (1, 2, 3, 5, 6) Stony Brook University, The State University of New York, Biomedical Informatics, Stony Brook, New York, USA; (4) Department of Internal Medicine, The University of Tennessee Health Science Center, Memphis, Tennessee, USA; (8) Department of Pediatrics and Internal Medicine, University of Arkansas for Medical Sciences, College of Medicine, Arkansas Children's Research Institute, Little Rock, Arkansas, USA; (12) Johns Hopkins University, Baltimore, Maryland, USA; https://doi.org/10.1002/lrh2.10457; GenBank; N3C; PPRL; radiomics; TCIA; viral variants |
spellingShingle | Ahmad Baghal Joel Saltz Tahsin Kurc Prateek Prasanna Samantha Baghal Janos Hajagos Erich Bremer Shaymaa Al‐Shukri Joshua L. Kennedy Michael Rutherford Tracy Nolan Kirk Smith Christopher G. Chute Fred Prior Linking The Cancer Imaging Archive and GenBank to the National Clinical Cohort Collaborative Learning Health Systems GenBank N3C PPRL radiomics TCIA viral variants |
title | Linking The Cancer Imaging Archive and GenBank to the National Clinical Cohort Collaborative |
title_sort | linking the cancer imaging archive and genbank to the national clinical cohort collaborative |
topic | GenBank N3C PPRL radiomics TCIA viral variants |
url | https://doi.org/10.1002/lrh2.10457 |
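
The abstract describes Privacy Preserving Record Linkage (PPRL) as matching records for the same individual through de‐identified cryptographic hashes rather than patient identifiers. The following is a minimal sketch of that general idea only, not the authors' software or the N3C linkage pipeline. It assumes each site derives a keyed SHA-256 token (HMAC) from normalized name and date-of-birth fields using a shared secret, and that only the tokens and de-identified record IDs leave the site. The function names, the choice of fields, the secret, and the sample records are all hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Drop accents, punctuation, and case so formatting differences hash alike."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if ch.isalnum()).lower()


def make_token(secret: bytes, first: str, last: str, dob: str) -> str:
    """Derive a keyed hash token; the raw identifiers are never stored or shared."""
    message = "|".join(normalize(p) for p in (first, last, dob)).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def link(site_a: dict, site_b: dict) -> list:
    """Pair the de-identified record IDs whose tokens appear at both sites."""
    return [(site_a[t], site_b[t]) for t in sorted(site_a.keys() & site_b.keys())]


# Hypothetical shared secret and made-up records, for illustration only.
SECRET = b"demo-shared-secret"

clinical = {
    make_token(SECRET, "Ana", "Pérez", "1980-02-03"): "ehr-001",
    make_token(SECRET, "John", "Doe", "1975-11-30"): "ehr-002",
}
imaging = {
    make_token(SECRET, " ana", "PEREZ", "1980/02/03"): "img-17",
}

matches = link(clinical, imaging)
print(matches)
```

In this sketch the linking step sees only tokens and de-identified record IDs. A keyed hash is used instead of a plain hash so that a party without the secret cannot rebuild tokens by hashing guessed names and birth dates.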