Harmonization and integration of data from prospective cohort studies across the Region of the Americas

Bibliographic Details
Main Authors: Janeil Williams, Olga Tchuvatkina, Marshall K. Tulloch-Reid, Joette McKenzie, Novie Younger-Coleman, Ian Hambleton, Kimlin Ashing, Camille Ragin
Format: Article
Language: English
Published: Pan American Health Organization 2025-05-01
Series: Revista Panamericana de Salud Pública
Online Access: https://iris.paho.org/handle/10665.2/67007
Description
Summary: Objectives. To develop a generalizable extract, transform, and load (ETL) process and workflow for prospective harmonization of data from active cohort studies conducted in different geographic locations across the Region of the Americas. Methods. This study harmonized and merged data from two active prospective cohort studies, the Living in Full Health (LIFE) project in Jamaica and the Cancer Prevention Project of Philadelphia (CAP3) in the United States. The REDCap data collection platform was used to harmonize and pool baseline prospective cohort data collected from June 2019 to December 2024. Results. The merged data produced by this harmonization methodology showed good coverage of the mapped variables. Seventeen of the 23 questionnaire forms (74%) harmonized more than 50% of their variables. Statistical tests on the age-adjusted prevalence of health conditions demonstrated regional differences that could be used to investigate disease hypotheses in the Black Diaspora. Conclusion. This study developed a successful data harmonization process that can guide similar projects. Active data harmonization is a useful strategy that can reduce costs and leverage the resources required to conduct multi-site cohort studies, while fostering data sharing and collaborative research across the Region of the Americas.
ISSN: 1020-4989, 1680-5348
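
Illustration: The summary describes mapping variables from two cohorts onto a shared schema, pooling the records, and reporting how many variables were harmonized. The Python sketch below shows one way such a transform step and coverage figure could be expressed. It is not the authors' workflow: the mapping table, every field name (`age`, `participant_age`, `htn_dx`, and so on), and the functions `transform`, `pool`, and `coverage` are hypothetical, and the sample records are invented solely to exercise the code.

```python
# Hypothetical crosswalk: harmonized variable -> source field in each cohort.
# None means the cohort has no matching field. All names are invented.
MAPPING = {
    "age_years": {"LIFE": "age", "CAP3": "participant_age"},
    "sex": {"LIFE": "sex", "CAP3": "gender"},
    "hypertension": {"LIFE": "htn_dx", "CAP3": None},
}


def transform(record, cohort):
    """Rename one source record's fields to the harmonized schema."""
    out = {"cohort": cohort}
    for target, sources in MAPPING.items():
        field = sources.get(cohort)
        # Unmapped or absent fields become None rather than being dropped.
        out[target] = record.get(field) if field else None
    return out


def pool(extracts):
    """Transform every record from every cohort and stack them in one list."""
    return [
        transform(record, cohort)
        for cohort, records in extracts.items()
        for record in records
    ]


def coverage(mapping, cohorts):
    """Share of harmonized variables that have a source field in every cohort."""
    mapped = sum(
        all(sources.get(c) for c in cohorts) for sources in mapping.values()
    )
    return mapped / len(mapping)


# Invented example records, one per cohort.
extracts = {
    "LIFE": [{"age": 54, "sex": "F", "htn_dx": 1}],
    "CAP3": [{"participant_age": 61, "gender": "M"}],
}
pooled = pool(extracts)
share = coverage(MAPPING, ["LIFE", "CAP3"])
```

In this sketch a form would count as harmonizing more than 50% of its variables when `coverage` for that form's mapping exceeds 0.5. Keeping unmapped variables as `None` preserves a uniform record shape across cohorts, which simplifies pooled analysis.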