A Genome-Wide Association Study of Mass Spectrometry Proteomics Using the Seer Proteograph Platform
bioRxiv – Karsten Suhre, et al.
Genome-wide association studies (GWAS) are used to identify genetic variations associated with diseases or other complex phenotypes. By comparing the genomes of a large number of individuals, GWAS can pinpoint genes or regions of the genome that influence phenotypes, providing valuable insights for clinical researchers. One particular type of GWAS is the search for genetic variants that affect protein levels. Such genetic variants are called protein quantitative trait loci, or pQTLs for short.
A pQTL is a specific region of the genome that is associated with a change in the abundance or expression of a particular protein. Most GWAS searching for pQTLs to date have used affinity proteomics platforms, but this approach has challenges. Protein-altering variants (PAVs) can affect how well an affinity reagent binds its target, leading to readouts that may not reflect true protein levels. Because PAVs are correlated with neighboring genetic variants, these altered readouts can produce associations that are misinterpreted as pQTLs. This issue is known as an epitope effect, and it can skew analyses and the conclusions drawn from them.
To address this issue, mass spectrometry (MS)-based proteomics may offer a more accurate alternative. A recent study from researchers at Weill Cornell Medicine-Qatar, Harvard Medical School and Brigham and Women’s Hospital, in collaboration with Seer and TruDiagnostic, Inc., used Seer’s Proteograph™ workflow to validate pQTLs and demonstrated the technology’s capability to differentiate between putative epitope effects and true protein expression QTLs.
Key Insights
- Previous pQTL studies utilizing affinity-based technologies have identified many associations between genome and protein expression, but the impact of epitope effects was largely unknown.
- Mass spectrometry sequences multiple regions of a protein and can identify protein-altering variants. Coupled with deep, unbiased proteomics using the Proteograph workflow, this method provides pQTL analysis at population scale.
- This study re-examined pQTLs previously identified by two leading affinity technologies in two large-scale studies: the UK Biobank study of more than 50,000 participants using Olink, and the Icelandic consortium study of more than 36,000 participants using SomaScan. The analysis was restricted to pQTLs involving proteins that were also identified by MS using the Proteograph workflow in a cohort of 1,565 individuals. Of the 200 such pQTLs with the strongest p-values (100 from the Olink study and 100 from the SomaScan study), up to 33% were not reproducible by MS. However, the 46 of these pQTLs that were found by both Olink and SomaScan were reproducible. The non-reproducible pQTLs could therefore represent epitope effects rather than changes in protein abundance or expression. The authors noted that the statistically strongest pQTLs from affinity-based technologies could be especially enriched for putative epitope effects, so the rate of epitope effects across all pQTLs identified by affinity-based technologies, and not just the 200 strongest whose proteins were also identified by MS, could be lower.
The Study Design
The researchers conducted a GWAS using the MS-based Proteograph platform for proteomics analysis of blood samples from a diverse cohort of 1,260 samples from the Tarkin study and a replication cohort of 325 samples from the QMDiab study. They applied a threshold to the analysis to identify MS-based pQTLs (MS-pQTLs). Next, they compared the resulting MS-pQTLs to pQTLs previously identified in the deCODE and UKBPPP studies, which were performed with the affinity-based technologies SomaScan™ and Olink™, respectively.
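As a rough illustration of the thresholding and comparison steps described above, the sketch below filters association results by a Bonferroni-adjusted cutoff and then flags which hits involve proteins already reported by an affinity-based study. The cutoff (the conventional genome-wide level of 5e-8 divided by the number of protein traits tested), the record fields, and the matching by protein name are all assumptions made for illustration. The article does not state the threshold, data format, or matching procedure the authors used.

```python
from dataclasses import dataclass

# Conventional GWAS significance level; an assumption, not the study's stated threshold.
GENOME_WIDE_ALPHA = 5e-8


@dataclass(frozen=True)
class Association:
    protein: str    # hypothetical protein identifier
    variant: str    # hypothetical variant identifier
    p_value: float  # association p-value for this protein/variant pair


def significance_threshold(n_traits: int, alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Bonferroni-adjust the genome-wide level for the number of protein traits tested."""
    if n_traits < 1:
        raise ValueError("n_traits must be at least 1")
    return alpha / n_traits


def call_ms_pqtls(associations, n_traits):
    """Keep only associations whose p-value falls below the adjusted cutoff."""
    cutoff = significance_threshold(n_traits)
    return [a for a in associations if a.p_value < cutoff]


def previously_reported(ms_pqtls, known_proteins):
    """Map each MS-pQTL to whether its protein appears in a prior study's pQTL list.

    Matching on protein name alone is a simplification; a real comparison
    would also need to match the genetic locus.
    """
    return {a: a.protein in known_proteins for a in ms_pqtls}
```

With 1,000 hypothetical protein traits, for example, the cutoff becomes 5e-11, so an association with p = 1e-12 would be retained and one with p = 1e-9 would not.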
The Results
This study represents the first large GWAS using MS-based proteomics, leveraging Seer’s Proteograph workflow and comparing the resulting pQTL identifications with those from affinity-based proteomics technologies. The researchers developed novel data analysis protocols to account for genetic variants within the analyzed peptides to reduce epitope effects.
- Discovery of New pQTLs: Of the 252 MS-pQTLs identified, 65 had been reported by both affinity platforms, 80 by only one of the two, and 107 had not been reported previously. This suggests a high rate of novel discovery when using deep, unbiased proteomics with the Proteograph workflow.
- Impact of Epitope Effects: Researchers examined previously reported pQTLs to assess reproducibility. Among proteins that were quantified in the Tarkin and QMDiab studies, taking the 100 strongest pQTLs from the deCODE study and the 100 strongest pQTLs from the UKBPPP study, they found that up to one-third of the combined 200 evaluated pQTLs were not reproducible by MS. However, the subset of those pQTLs that was identified by both affinity technologies was reproducible. Taking these two findings together, it is possible that the non-reproducible pQTLs represent misidentifications, potentially due to epitope effects.
- Importance of a Proteome Library: Researchers were able to improve the accuracy of MS-pQTLs in the study by accounting for PAVs in their analysis. The use of a proteome library accounting for PAVs was crucial as traditional methods could misidentify proteins, especially when relying on a limited library.
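The breakdown of the 252 MS-pQTLs can be checked with a short sketch. It assumes the three reported categories are disjoint, meaning the 80 refers to pQTLs reported by exactly one affinity platform, which is what makes the counts sum to 252. The `classify_overlap` helper and the toy identifiers are hypothetical and only illustrate how such a tally could be computed from sets of pQTL identifiers.

```python
from collections import Counter


def classify_overlap(ms_pqtls, platform_a, platform_b):
    """Tally MS-pQTLs by how many affinity platforms already reported them."""
    labels = ("novel", "one_platform", "both_platforms")
    counts = Counter()
    for pqtl in ms_pqtls:
        # Booleans add as integers: 0, 1, or 2 platforms reported this pQTL.
        n_platforms = (pqtl in platform_a) + (pqtl in platform_b)
        counts[labels[n_platforms]] += 1
    return counts


# Toy example with hypothetical identifiers.
toy = classify_overlap(
    ms_pqtls={"q1", "q2", "q3", "q4"},
    platform_a={"q1", "q2"},
    platform_b={"q2", "q3"},
)

# Counts reported in the article, read as disjoint categories (assumption).
reported = {"both_platforms": 65, "one_platform": 80, "novel": 107}
total = sum(reported.values())
novel_fraction = reported["novel"] / total
```

Under that reading, the three categories account for all 252 MS-pQTLs, and the previously unreported ones make up roughly 42% of the total.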
The researchers defined a new MS-peptide association score (MSPA), which shows the degree of replication of a pQTL signal in the MS proteomics data. In the figure above, among proteins that are quantified in the Tarkin study, each of the 100 most significant pQTLs is plotted according to its pQTL rank in the deCODE (left) and UKBPPP (right) studies. The green shaded area, with a high MSPA score, represents strong evidence of true pQTLs. The pQTLs in the red region exhibit little or no evidence in the MS data, suggesting potential epitope effects. Notably, pQTLs that were identified by both affinity-based technologies, which are likely to be true, do not fall within the red region, suggesting that the red region is potentially populated by “true negatives”. This highlights the power of MSPA as a tool for validating previous pQTL findings.
Together, these results underscore the study’s contribution to understanding the relationship between the genome and protein expression. They also suggest that many additional true pQTLs remain to be discovered using MS-based platforms.
DOI: 10.1101/2024.05.27.596028