Learning-augmented sketching offers improved performance for privacy preserving and secure GWAS

Bibliographic Details
Main Authors: Junyan Xu, Kaiyuan Zhu, Jieling Cai, Can Kockan, Natnatee Dokmai, Hyunghoon Cho, David P. Woodruff, S. Cenk Sahinalp
Format: Article
Language: English
Published: Elsevier 2025-03-01
Series: iScience
Online Access: http://www.sciencedirect.com/science/article/pii/S2589004225002718
Description
Summary: Trusted execution environments (TEEs), such as Intel SGX, enable secure, privacy-preserving computation but may be constrained in computational resources. To address this, methods such as SkSES use sketching to perform genome-wide association studies (GWAS) across distributed datasets while maintaining privacy. Here, we present a learning-augmented version of SkSES that identifies significant SNPs more accurately. Our method first conducts GWAS on a public training dataset to locally identify significant SNPs. These SNPs are assigned dedicated memory, which enables more precise selection of significant SNPs over the entire dataset while optimizing memory usage. Our method maintains the stringent privacy guarantees of SkSES, ensuring that sensitive genotype data remain undisclosed to other institutions and cloud providers. Experimental results on benchmark datasets show that the learning-augmented version achieves up to 40% higher accuracy than the original SkSES under identical memory constraints. This advance improves the scalability and effectiveness of collaborative GWAS in TEEs.
ISSN: 2589-0042
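The abstract does not describe the underlying data structures. As a minimal, hypothetical sketch of the general idea (not the authors' implementation, and not running inside an SGX enclave), assume that per-SNP statistics are aggregated in a Count Sketch, and that the SNPs found significant by GWAS on the public training data form a `predicted` set that receives dedicated exact counters, while every other SNP shares the sketch. The class `LearnedCountSketch`, its methods, the per-SNP value passed to `update`, and the `depth`/`width`/`seed` parameters are illustrative names introduced here.

```python
import hashlib
import statistics


class LearnedCountSketch:
    """Count Sketch plus exact counters for SNPs predicted to be significant.

    Hypothetical illustration: `predicted` stands in for the SNPs found
    significant by a GWAS on public training data.
    """

    def __init__(self, predicted, depth=5, width=64, seed=0):
        # Dedicated memory: one exact counter per predicted SNP.
        self.exact = {snp: 0 for snp in predicted}
        self.depth = depth  # keep odd so the median is a single row value
        self.width = width
        self.seed = seed
        # Shared Count Sketch table for all remaining SNPs.
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row, snp):
        # Deterministic per-row hash giving a bucket index and a +/-1 sign.
        digest = hashlib.sha256(f"{self.seed}:{row}:{snp}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % self.width
        sign = 1 if digest[8] & 1 else -1
        return bucket, sign

    def update(self, snp, delta):
        """Add `delta` to the running statistic for `snp`."""
        if snp in self.exact:
            self.exact[snp] += delta
            return
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, snp)
            self.table[row][bucket] += sign * delta

    def estimate(self, snp):
        """Exact value for predicted SNPs, median-of-rows estimate otherwise."""
        if snp in self.exact:
            return self.exact[snp]
        row_estimates = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, snp)
            row_estimates.append(sign * self.table[row][bucket])
        return statistics.median(row_estimates)

    def top_k(self, snps, k):
        """Return the k SNPs with the largest absolute estimated statistic."""
        ranked = sorted(snps, key=lambda snp: abs(self.estimate(snp)), reverse=True)
        return ranked[:k]


if __name__ == "__main__":
    sketch = LearnedCountSketch(predicted={"rs1"})
    sketch.update("rs1", 5)
    sketch.update("rs1", 2)
    sketch.update("rs2", 3)
    print(sketch.estimate("rs1"), sketch.estimate("rs2"))  # prints: 7 3
```

Because the dedicated counters consume part of a fixed memory budget, a like-for-like comparison with a plain Count Sketch would shrink `width` accordingly. In this scheme the gain comes from removing likely significant SNPs from the shared buckets, so their values are exact and they no longer add collision noise to the estimates of other SNPs.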