Posted by Andrew Carroll, Product Manager and Cory McLean, Software Engineer, Google Health
Each person’s genome, which collectively encodes the biochemical machinery they are born with, is composed of over 3 billion letters of DNA. However, only a small subset of the genome (~4-5 million positions) varies between two people. Nonetheless, each person’s unique genome interacts with the environment they experience to determine the majority of their health outcomes. A key method of understanding the relationship between genetic variants and traits is a genome-wide association study (GWAS), in which each genetic variant present in a cohort is individually examined for correlation with the trait of interest. GWAS results can be used to identify and prioritize potential therapeutic targets by identifying genes that are strongly associated with a disease of interest, and can also be used to build a polygenic risk score (PRS) to predict disease predisposition based on the combined influence of variants present in an individual. However, while accurate measurement of traits in an individual (called phenotyping) is essential to GWAS, it often requires painstaking expert curation and/or subjective judgment calls.
In “Large-scale machine learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology”, we demonstrate how using machine learning (ML) models to classify medical imaging data can be used to improve GWAS. We describe how models can be trained for phenotypes to generate trait predictions and how these predictions are used to identify novel genetic associations. We then show that the novel associations discovered improve PRS accuracy and, using glaucoma as an example, that the improvements for anatomical eye traits relate to human disease. We have released the model training code and detailed documentation for its use on our Genomics Research GitHub repository.
This article is purposely trimmed, please visit the source to read the full article.