Harmonization and integration of data from prospective cohort studies across the Region of the Americas

Williams et al.

Objectives

To develop a generalizable extract, transform, and load (ETL) process and workflow for prospective harmonization of data from active cohort studies conducted in different geographic locations across the Region of the Americas.

Methods

This study harmonized and merged data from two active prospective cohort studies, the Living in Full Health (LIFE) project in Jamaica and the Cancer Prevention Project of Philadelphia (CAP3) in the United States. The REDCap data collection platform was used to harmonize and pool baseline prospective cohort data collected from June 2019 to December 2024.
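
The transform and pooling steps of such a workflow can be sketched as follows. This is a minimal illustration only, not the study's actual pipeline: the field names, value codes, and helper functions (`FIELD_MAPS`, `VALUE_MAPS`, `harmonize_record`, `pool`) are hypothetical, and the records are assumed to have already been exported from each study's data collection platform as plain dictionaries.

```python
# Hypothetical mapping from each study's source field names to shared
# (harmonized) variable names. None of these names come from LIFE or CAP3.
FIELD_MAPS = {
    "LIFE": {"age_yrs": "age", "sex_cd": "sex", "htn_dx": "hypertension"},
    "CAP3": {"age": "age", "gender": "sex", "hbp": "hypertension"},
}

# Hypothetical per-study recoding of categorical values into shared codes.
VALUE_MAPS = {
    "LIFE": {"sex": {"1": "male", "2": "female"}},
    "CAP3": {"sex": {"M": "male", "F": "female"}},
}


def harmonize_record(study, record):
    """Transform one source record into the shared schema."""
    out = {"study": study}
    for source_field, target in FIELD_MAPS[study].items():
        value = record.get(source_field)
        recode = VALUE_MAPS.get(study, {}).get(target)
        if recode is not None and value is not None:
            value = recode.get(str(value))  # unmapped codes become None
        out[target] = value
    return out


def pool(exports):
    """Harmonize every record from every study and stack them in one list."""
    return [
        harmonize_record(study, record)
        for study, records in exports.items()
        for record in records
    ]


# Made-up example records, one per study.
exports = {
    "LIFE": [{"age_yrs": 52, "sex_cd": 2, "htn_dx": 1}],
    "CAP3": [{"age": 47, "gender": "M", "hbp": 0}],
}
pooled = pool(exports)
```

Keeping the field and value mappings as data rather than code is one possible design choice: it lets a new site be added by supplying another mapping, without changing the transform logic.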

Results

The merged data showed good coverage of the mapped variables. For 17 of the 23 questionnaire forms (74%), more than 50% of the variables were harmonized. Statistical tests of the age-adjusted prevalence of health conditions showed regional differences that could be used to investigate disease hypotheses in the Black Diaspora.
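
The form-level coverage summary can be illustrated with a short calculation. This is a sketch under assumptions: the per-form counts in `example_forms` are invented for illustration (the abstract reports only the summary figure), and `coverage` and `forms_above` are hypothetical helper names. Only the 17-of-23 figure comes from the results above.

```python
def coverage(harmonized_count, total_count):
    """Fraction of a form's variables that were mapped to the shared schema."""
    if total_count == 0:
        return 0.0
    return harmonized_count / total_count


def forms_above(form_counts, threshold=0.5):
    """Count forms whose coverage is strictly greater than the threshold."""
    return sum(
        1
        for harmonized, total in form_counts.values()
        if coverage(harmonized, total) > threshold
    )


# Hypothetical forms: name -> (harmonized variables, total variables).
example_forms = {
    "demographics": (9, 10),
    "diet": (4, 12),
    "medical_history": (6, 12),  # exactly 50%, so not counted as "greater than"
}
n_above = forms_above(example_forms)

# Reported summary figure: 17 of 23 forms, rounded to a whole percent.
reported_percent = round(100 * 17 / 23)
```

Note that a form with exactly half of its variables harmonized does not meet a "greater than 50%" criterion, which is why the comparison is strict.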

Conclusion

This study developed a successful data harmonization process that can guide similar projects. Active data harmonization is a useful strategy that can reduce costs and make better use of the resources required to conduct multi-site cohort studies, while fostering data sharing and collaborative research across the Region of the Americas.

Article's language: English
Article type: Original research