Proteomics Challenges: From Proteoform Complexity to Technical Limitations

Thanks to decades of scientific and technological advancement, genomics has continued to evolve and has helped scientists make important biological discoveries, including the genetic basis of complex diseases, CRISPR gene editing technology, personalized medical treatments, and more. Proteomics, on the other hand, has long been considered challenging, complex, and cumbersome, with analysis workflows that lack the consistent, scalable parameters needed for large-scale studies with statistically significant results.

However, within the complexity of the proteome lies extraordinary potential for biological discoveries – discoveries that genomic analysis cannot achieve on its own.

The proteome is extremely complex and also extremely informative.
— Serafim Batzoglou, Chief Data Officer, Seer
Decoding & Navigating the Proteome, AWS Health Podcast, October 2023

Understanding the vast diversity and intricacy of the human proteome the same way we now understand the human genome will inevitably lead to remarkable advancements in medicine and biology. Even now, researchers are uncovering novel insights into human health and disease that could have gone undiscovered without proteomics research.

So why has the scientific community only recently shown increased interest in proteomics? Below, we dive deeper into the challenges of proteomics and highlight why the field is experiencing a paradigm shift in research and development.


Proteoform Variants Increase Complexity

One of the most challenging aspects of proteomics is the quantity and diversity of proteoforms.

The human genome contains approximately 20,000 genes, and the majority of those genes can produce 10-50 proteoforms.1 That means the human proteome may hold more than 1 million different proteoforms.2 This creates a range of technological and analytical considerations. These are compounded by the fact that proteoforms, as unique molecular variants of a protein, result from various biological modifications to the primary amino acid sequence, adding layers of complexity to proteomic data.
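As a rough back-of-the-envelope check on these figures, the sketch below multiplies the approximate gene count by the quoted range of 10-50 proteoforms per gene. It assumes, purely for simplicity, that every gene falls within that range (the text says only that the majority do), and the function name `proteoform_range` is made up for this illustration.

```python
def proteoform_range(genes, per_gene_low, per_gene_high):
    """Return the (low, high) proteoform totals for a simple per-gene range.

    Simplifying assumption: every gene produces between per_gene_low and
    per_gene_high proteoforms. Real genes vary far more widely than this.
    """
    return genes * per_gene_low, genes * per_gene_high


# Figures quoted in the text: ~20,000 genes, 10-50 proteoforms per gene.
low, high = proteoform_range(20_000, 10, 50)
print(f"{low:,} to {high:,} proteoforms")
```

Under this simplification the upper end of the range reaches 1 million, so any genes that produce more than 50 proteoforms would push the total past that figure.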

Cartoon depicting a scientist, labeled Traditional Biochemistry, using a fishing line for the terms RXR, BRCA, actin, ras, and Wnt in a lake. A steam shovel, labeled Proteomics, is meanwhile scooping up terms like Ub, Notch, Atm, myc, P53, and Pb in a net.

Credit: Joe Sutliff

These modifications, each of which can significantly affect a protein’s function, stability, and interactions with other molecules, include:

  • Mutations
  • Alternative splicing
  • Post-translational modifications (PTMs):
    • Phosphorylation
    • Acetylation
    • Glycosylation
    • Biologically relevant enzymatic cleavage

High Dynamic Ranges Make Low-Abundance Protein Detection Difficult

In certain samples, the dynamic range of protein concentrations can be extremely wide. A sample of human blood, for example, has a dynamic range spanning 10-12 orders of magnitude.3 This means that rare, novel disease-state biomarkers might be obscured by proteins 10 billion times more concentrated within the same sample.
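To make the link between "orders of magnitude" and "times more concentrated" concrete, here is a minimal sketch of the conversion in both directions. The helper names are hypothetical and the inputs are illustrative ratios, not measured concentrations.

```python
import math


def orders_of_magnitude(high_conc, low_conc):
    """Number of orders of magnitude separating two concentrations."""
    return math.log10(high_conc / low_conc)


def fold_difference(orders):
    """Fold difference in concentration implied by a number of orders of magnitude."""
    return 10 ** orders


# The range quoted in the text: 10 to 12 orders of magnitude.
for orders in (10, 12):
    print(f"{orders} orders of magnitude = {fold_difference(orders):,}-fold")
```

Ten orders of magnitude corresponds to the 10 billion-fold difference mentioned above; twelve orders of magnitude would be a trillion-fold difference.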

A chart showing that the 22 most abundant proteins make up 99 percent of total protein mass, while the remaining proteins account for approximately 1 percent.

And when it comes to identifying low-abundance proteoforms, targeted affinity- or antibody-based proteomic analysis methods have technical limitations:3

  • The high cost of antibody production
  • Difficulty in the generation of antibodies with high specificity
  • Limited availability of high-quality antibodies
  • Batch-to-batch variability of the antibodies
  • Relatively low stability

Lack of Standardization in Data Analysis, Visualization, and Workflows

Proteomics research can generate large amounts of data. Mass spectrometry-based proteomics, for example, often yields millions of peptide identifications.1 A data set may also include multiple ‘omics’ data types, which requires accurate calculation of the false discovery rate (FDR).
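The text does not say how the FDR is calculated. One widely used approach in mass spectrometry-based proteomics is the target-decoy strategy, in which spectra are also searched against deliberately false (decoy) sequences and the number of decoy hits is used to estimate how many accepted identifications are wrong. The sketch below illustrates that idea only; the function name, the score scale, and the toy matches are assumptions made for this example and do not come from any real data set or tool.

```python
def target_decoy_fdr(matches, score_threshold):
    """Estimate FDR as (decoy hits) / (target hits) at or above a score threshold.

    matches: iterable of (score, is_decoy) pairs, one per peptide match.
    This is the simple decoys/targets estimator; real pipelines often add
    corrections and report q-values instead.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= score_threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= score_threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so there is nothing to be wrong about
    return decoys / targets


# Hypothetical toy matches: (score, is_decoy)
toy_matches = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, False)]

print(target_decoy_fdr(toy_matches, 0.5))  # lenient threshold
print(target_decoy_fdr(toy_matches, 0.8))  # strict threshold
```

Raising the score threshold lowers the estimated FDR but also accepts fewer identifications, which is the trade-off an analyst tunes when setting a cutoff such as 1 percent.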

Without robust data analysis tools and technology, the analysis process can suffer from oversights, the incorrect identification of thousands of proteins that are not actually present, and skewed results and interpretations of large-scale studies. Large proteomic data sets can also cause delays and bottlenecks, as well as problems with data reproducibility and consistency.

The Mass Spectrometry Conundrum

Mass spectrometry is a cornerstone technique in proteomics and is widely considered to be the most comprehensive approach to understanding the full complexity of the proteome. However, it has also been considered too slow, laborious, and complicated, particularly for large-scale, high-throughput studies. Mass spec often requires the support of high-throughput technology and specialized analytical scientists to complete a study.

It is also important to note that targeted affinity-based assays interrogate epitopes to infer the presence of proteins, so they lack the ability to identify variants or modified proteins. Compared with LC-MS/MS, they are also limited in cross-species capabilities, by their ‘discrete’ target libraries, and in FDR identification.

Overcoming Proteomics Challenges to Push Scientific Discovery Forward

Advances in proteomics technology are helping turn the challenges once synonymous with the field into extraordinary opportunities.

Genomics and transcriptomics alone cannot explain cellular functions the way proteomics can, making it a critical piece of the puzzle for researchers looking to truly understand human biology and improve human health.

It is our mission at Seer to develop the technological breakthroughs researchers need to push scientific discovery forward, with proteomics leading the way.

Working with Seer has transformed our multi-omics biomarker discovery initiatives. Seer’s Proteograph is allowing us to generate unbiased, quantitative data for over 3000 pig plasma proteins, enabling us to find new biomarker signatures of slow-progressing diseases like CLN3 Batten Diseases in animal models.
Jon Brudvig, Ph.D., Assistant Professor, Pediatrics, Pediatrics and Rare Diseases Group, Sanford Research
publication pending
