Here are the proteomics and Proteograph™ questions we are asked most often, so take your time and look around. If your question isn’t answered here, we can help. Get in touch with us here.
Proteomics is the study and functional analysis of proteins. Proteins are the building blocks of living cells and are central molecules in the structure, function, and regulation of cells, tissues, and organs. As such, proteomics tells us “what is” rather than “what could be”, giving us a close look at phenotypes of interest and offering important insight into protein activity at the molecular level.
To put it simply, DNA is the blueprint of living organisms: its instructions are transcribed into RNA, which is translated into proteins, the functional building blocks that drive the molecular mechanisms in living cells.
A proteome is the complete set of proteins produced or modified by an organism or system. The human genome encodes over 20,000 genes; however, it is estimated that there may be more than one million distinct protein variants, called proteoforms, arising from mechanisms such as post-translational modifications (PTMs), allelic variation, and alternative splicing.
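As a rough illustration of how a modest number of genes can yield a far larger number of proteoforms, the Python sketch below counts the combinations possible for a single gene. It assumes, purely for illustration, that splicing, allelic variation, and PTMs combine independently and that each PTM site is simply modified or unmodified; the function name and the input numbers are hypothetical, not measured values.

```python
def count_proteoforms(isoforms: int, alleles: int, ptm_sites: int) -> int:
    """Upper bound on proteoforms from one gene, assuming independent sources.

    isoforms:  number of alternative-splicing isoforms (hypothetical input)
    alleles:   number of allelic coding variants (hypothetical input)
    ptm_sites: number of sites that are each either modified or unmodified
    """
    # Each binary PTM site doubles the number of possible combinations.
    return isoforms * alleles * 2 ** ptm_sites


# Hypothetical gene: 3 splice isoforms, 2 alleles, 4 modifiable sites.
print(count_proteoforms(3, 2, 4))  # 96
```

In practice not every combination occurs, so this is an upper bound; still, the multiplicative growth shows why proteoforms can far outnumber genes.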
Proteomics is used to study what is happening inside the body at a molecular level to investigate the mechanisms underpinning human biology, including health and disease.
Quantitative proteomics is the study of how much of a given protein is present in a sample and how that amount differs between samples, over time, or across diseases and other conditions. Beyond abundance, proteins can also change in where they are located in cells and tissues, which interactions they engage in, and how they are modified.
Structural proteomics is the study of three-dimensional protein structures, including how a protein’s sequence relates to its structure, modifications, and interactions. A tailored upfront sample preparation process is often needed to preserve this information for a downstream detector. Structural insights of this kind can lead to the discovery of new biomarkers, protein drug targets, and therapies.
Proteins can be studied with traditional biochemistry approaches, such as Edman degradation; with Liquid Chromatography (LC) coupled with Mass Spectrometry (MS); with antibody-based methods, such as western blots or ELISA; or with more recent approaches that use aptamers to detect proteins in a sample.
LC coupled with MS (LC-MS) is widely used to identify and quantify intact proteins (i.e., top-down) or peptides, which are fragments of proteins (i.e., bottom-up), based on their specific mass-to-charge ratios. LC separates the analytes (i.e., peptides or proteins) before they enter the MS, which reduces the complexity of the mixture analyzed at any one time and allows a deeper look into the molecular complexity of the sample.
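To make “mass-to-charge ratio” concrete, the following minimal Python sketch computes the monoisotopic mass of an unmodified peptide from standard residue masses and converts it to the m/z of its protonated ions, m/z = (M + z × 1.007276) / z. It assumes no modifications (for example, cysteines are not carbamidomethylated); the function names and the example peptide are illustrative only.

```python
# Monoisotopic residue masses in daltons (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once for the peptide's free N- and C-termini
PROTON = 1.007276   # mass added per positive charge


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a given positive charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


for z in (1, 2, 3):
    print(z, round(mz("PEPTIDE", z), 4))
```

The same peptide therefore appears at a different m/z for each charge state it carries.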
Unbiased proteomics, such as Mass Spectrometry-based proteomics, enables broad analysis of the entire proteome, including proteoforms arising from PTMs, allelic variation, and alternative splicing. Targeted proteomics, by contrast, typically uses an analyte-specific reagent (ASR), such as an antibody or aptamer, that binds unique protein epitopes and detects only the specific protein for which the reagent was designed.
Targeted technologies recognize a limited number of preselected epitopes, and a protein-altering variant may change the protein’s conformation or make some epitopes inaccessible to ASRs.
Proteogenomics is an area of research that combines proteomics and genomics to better understand the flow of biological and genetic information between DNA, RNA, and proteins.
Technological improvements that address current research challenges (e.g., depth and scale in complex samples such as plasma, or low-input materials such as single cells) will enable more comprehensive studies. As these challenges are addressed, integrating proteomics with other molecular phenotypes (i.e., multi-omics) will provide a systems-level view of human biology and a better understanding of human health and disease, supporting drug response analysis, drug discovery, patient stratification, and precision medicine.
Next generation sequencing (NGS), also called massively parallel sequencing, has enabled high-throughput, scalable, and fast generation of sequence information from DNA and RNA (i.e., sequences of four bases: A, C, G, and T in DNA, with U replacing T in RNA) to measure the human genome. While NGS has revealed important findings, proteomics must also be considered to gain a comprehensive understanding of human biology.
For example, when we compare transcriptomic data with proteomic data from the same biosample, we see a low to modest correlation. This demonstrates the unique information proteomics adds to the study of biology and shows that protein levels cannot be reliably extrapolated from transcriptomic data. Just as NGS enables comprehensive insights into the genome, next generation proteomics (NGP) technology enables comprehensive insights into the proteome through unbiased, deep coverage at scale, facilitating the understanding of complex human biology.
Proteomics can be used to investigate many aspects of biology, including disease biomarkers, protein-protein interactions, PTMs, and protein degradation and turnover. Each research focus requires its own sample preparation techniques and detection strategies. Comprehensive assessment of the proteome requires an unbiased, rapid, deep, and scalable technology.
Researchers focused on protein discovery need techniques that cast a wide net to catch as many proteins as possible (i.e., unbiased technology) across the full dynamic range (i.e., deep technology), so they can discover protein biomarkers and their differential abundance associated with a phenotype. This must be done across a cohort large enough to power discovery (i.e., scalable and rapid technology). Given the required breadth and depth of content and the need for large cohorts, an unbiased, rapid, deep, and scalable technology is essential for protein discovery.
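Whether a cohort is “large enough” can be estimated before a study begins. The Python sketch below uses the textbook normal-approximation formula for a two-sided, two-group comparison, n per group = 2 × (z(1 − α/2) + z(power))² / d², where d is the standardized effect size, with an optional Bonferroni adjustment for testing many proteins at once. These are simplifying assumptions for illustration, not a recommended study-design procedure; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size: float, alpha: float = 0.05,
                      power: float = 0.8, n_tests: int = 1) -> int:
    """Approximate samples per group for a two-sided, two-group comparison.

    effect_size: standardized difference (mean difference / SD), assumed known
    n_tests:     number of proteins tested; alpha is Bonferroni-adjusted
    Normal approximation: n = 2 * (z_(1 - alpha/2) + z_(power))^2 / d^2
    """
    adjusted_alpha = alpha / n_tests
    standard_normal = NormalDist()  # mean 0, SD 1
    z_alpha = standard_normal.inv_cdf(1 - adjusted_alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


print(samples_per_group(0.5))                # 63 (one protein tested)
print(samples_per_group(0.5, n_tests=1000))  # Bonferroni across 1,000 proteins
```

Correcting α for many simultaneous protein tests substantially raises the number of samples needed to detect the same effect.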
Liquid Chromatography (LC) is a technique used to separate a mixture of peptides, reducing peptide complexity and controlling the flow of content into the Mass Spectrometer (MS). One common LC strategy separates peptides by hydrophobicity (i.e., reverse phase chromatography). Spreading individual peptides out over time in this way keeps the complexity of the proteome from overwhelming the MS. The resolving power (i.e., capacity to separate peptides) of an LC system depends on multiple parameters, including flow rate, the particle size of the packing material in the LC column, column length, gradient length, and buffer composition. When more separation power is needed, ion mobility and protein or peptide fractionation can be used orthogonally to LC.
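As a crude illustration of reverse phase separation, the Python sketch below orders peptides by their GRAVY score (the mean Kyte-Doolittle hydropathy of their residues), listing more hydrophilic peptides first, since they tend to elute earlier. GRAVY is only a rough stand-in chosen for simplicity; real retention times depend on the column, gradient, and buffers described above, and dedicated retention-time predictors are used in practice. The example peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def approximate_elution_order(peptides: list[str]) -> list[str]:
    """Crude reverse phase ordering: more hydrophilic peptides listed first."""
    return sorted(peptides, key=gravy)


print(approximate_elution_order(["LLVVLLK", "PEPTIDE", "DDEESK"]))
# ['DDEESK', 'PEPTIDE', 'LLVVLLK']
```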
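Mass Spectrometry (MS) is an analytical tool used to detect and quantify compounds, such as peptides and proteins, in a sample based on their mass-to-charge ratio (m/z). A mass spectrometer consists of three components: an ion source, which converts analytes into gas-phase ions; a mass analyzer, which separates the ions by m/z; and a detector, which records the separated ions.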
The mass spectra produced by the MS are then analyzed to identify and quantify proteins. Modern Mass Spectrometers, which are often hybrid instruments, contain multiple components to sort and analyze ions and can typically fragment selected ions further. This fragmentation enables high-confidence identification of peptides that have similar nominal masses but different amino acid sequences.
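The value of fragmentation can be sketched in Python: two hypothetical peptides with the same amino acid composition, PEPTIDE and PETPIDE, have identical precursor masses, yet their singly charged b-ion ladders (N-terminal fragments) diverge. This assumes unmodified residues, monoisotopic masses, and only b and y ions; actual spectra contain other ion types and charge states, and the function names are illustrative.

```python
# Monoisotopic residue masses in daltons (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of the intact peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def b_ions(sequence: str) -> list[float]:
    """Singly charged b-ion m/z values (N-terminal fragments b1..b(n-1))."""
    ions, running = [], 0.0
    for aa in sequence[:-1]:
        running += RESIDUE_MASS[aa]
        ions.append(running + PROTON)
    return ions


def y_ions(sequence: str) -> list[float]:
    """Singly charged y-ion m/z values (C-terminal fragments y1..y(n-1))."""
    ions, running = [], WATER
    for aa in reversed(sequence[1:]):
        running += RESIDUE_MASS[aa]
        ions.append(running + PROTON)
    return ions


first, second = "PEPTIDE", "PETPIDE"  # same composition, different order
print(abs(precursor_mass(first) - precursor_mass(second)) < 1e-6)  # True
print([round(x, 3) for x in b_ions(first)])
print([round(x, 3) for x in b_ions(second)])
```

Note that leucine and isoleucine have identical residue masses, so this simple b/y view cannot tell them apart.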
Intact proteins can be investigated with Mass Spectrometers after purification or enrichment (i.e., top-down proteomics), but the far more common workflow (i.e., bottom-up proteomics) uses proteolytic cleavage (e.g., with trypsin) to break proteins into peptides, which are easier to handle in Liquid Chromatography and MS. While this approach can cope with much more complex samples and higher dynamic ranges, it requires in silico reassembly of proteins (or protein species) from the detected peptides.
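A common simplified rule for in silico trypsin digestion is to cleave after lysine (K) or arginine (R) unless the next residue is proline (P). The Python sketch below applies that rule, optionally allowing missed cleavages. The protein sequence and function name are hypothetical, and real search engines add further options (for example, peptide length and mass limits).

```python
import re


def tryptic_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """Digest a protein with the simplified trypsin rule.

    Cleaves after K or R unless followed by P; optionally joins up to
    `missed_cleavages` adjacent fragments to model incomplete digestion.
    """
    # Zero-width split: after K/R (lookbehind) and not before P (lookahead).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


print(tryptic_digest("MAGKPLLRVEDKAR"))
# ['MAGKPLLR', 'VEDK', 'AR']
```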
Mass Spectrometers can be operated in different modes, including data-dependent acquisition (DDA) mode, in which MS2 scans are triggered iteratively for the N most intense precursors in each MS1 spectrum (top-N) or for as many precursors as fit in a fixed cycle time (top speed); data-independent acquisition (DIA) mode, in which all precursors within consecutive windows along the m/z range are fragmented together (multiplexed MS2); and targeted mode, in which predefined peptides from proteins of interest, and their MS2 or MS3 fragments, are monitored.
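The two untargeted acquisition modes differ in how precursors are chosen for MS2. The Python sketch below shows both ideas in their simplest form: DDA picks the N most intense MS1 precursors, while DIA steps through fixed, consecutive isolation windows that together cover the m/z range. The window width, m/z range, peak list, and function names are hypothetical; real methods often use variable-width or overlapping windows, and DDA methods typically add dynamic exclusion.

```python
def dda_top_n(ms1_peaks: list[tuple[float, float]], n: int) -> list[float]:
    """DDA: m/z values of the n most intense precursors in one MS1 scan."""
    ranked = sorted(ms1_peaks, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _intensity in ranked[:n]]


def dia_windows(start_mz: float, end_mz: float,
                width: float) -> list[tuple[float, float]]:
    """DIA: consecutive, non-overlapping isolation windows covering the range."""
    if width <= 0:
        raise ValueError("width must be positive")
    windows = []
    low = start_mz
    while low < end_mz:
        high = min(low + width, end_mz)
        windows.append((low, high))
        low = high
    return windows


# Hypothetical MS1 peaks as (m/z, intensity) pairs.
peaks = [(400.5, 1e5), (512.3, 3e6), (650.1, 8e4), (720.9, 2e6)]
print(dda_top_n(peaks, 2))              # [512.3, 720.9]
print(len(dia_windows(400, 1000, 25)))  # 24
```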
Yes, there can be. Common upstream sample preparation workflows are complex and time-consuming and require technical expertise. Challenges include protein solubility, the dynamic range of the proteome (i.e., quantifying low-abundance proteins), proteome complexity, low protein content, data analysis, proteoform identification, and throughput.
An example of a complex sample is blood plasma, in which the 22 most abundant proteins account for approximately 99% of the total protein mass. The remaining 1% contains thousands of less abundant proteins, which have a significant impact on biology and health. The main challenge in proteomic research today is that many current approaches for studying proteins are limited in depth or throughput and therefore cannot access thousands of low-abundance proteins in studies large enough to detect biological signal. Techniques such as depletion and fractionation exist to reduce the signal from the most abundant proteins; however, these workflows can be costly, complex, and time-consuming.
Depletion is a method used in proteomics research to access low-abundance proteins in complex samples with a wide dynamic range. It reduces the complexity of biological samples, such as serum, plasma, or other biofluids, by removing the most abundant proteins, which enhances detection of lower-abundance proteins in both discovery and targeted proteomic analyses. This technique requires specific molecules (e.g., antibodies) tailored to bind predefined proteins. With peptide fractionation, scientists must trade off proteome depth against analysis throughput.
Bottom-up proteomics workflows enable the identification of tens of thousands of peptides per study, which are commonly reassembled into proteins in silico. In many cases, this provides direct information on the abundance and dynamics of the active molecule (the protein, including its modifications and alterations). The challenge is that, while some peptides are unique to a single protein, others are shared between multiple proteins or proteoforms. Proteoforms can arise from alternative splicing (i.e., protein isoforms), allelic variation (i.e., protein variants), post-translational modifications, or protein degradation. In these cases, protein-level aggregation can obscure biological signals, whereas peptide-level (i.e., peptide-centric) analyses enable high-resolution identification of proteoforms.
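The peptide-sharing problem can be sketched by inverting a protein-to-peptide map. The Python below labels each peptide as unique (it maps to one protein or proteoform) or shared (it maps to several). The isoform names, peptide sequences, and function name are hypothetical.

```python
from collections import defaultdict


def classify_peptides(protein_to_peptides: dict[str, set[str]]) -> dict[str, str]:
    """Label each peptide 'unique' (one protein/proteoform) or 'shared'."""
    peptide_to_proteins = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            peptide_to_proteins[peptide].add(protein)
    return {
        peptide: "unique" if len(proteins) == 1 else "shared"
        for peptide, proteins in peptide_to_proteins.items()
    }


# Hypothetical splice isoforms that differ by a single exon-derived peptide.
isoforms = {
    "GENE1-isoform1": {"AAGK", "VEDK", "LLSR"},
    "GENE1-isoform2": {"AAGK", "VEDK", "GGTR"},
}
labels = classify_peptides(isoforms)
print(sorted(p for p, label in labels.items() if label == "unique"))
# ['GGTR', 'LLSR']
```

Only the unique peptides distinguish the two isoforms here, which is why peptide-centric analysis can resolve proteoforms that protein-level aggregation would merge.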
Seer’s proprietary engineered nanoparticles (NPs) consist of a magnetic core and a surface that can be engineered to present unique physicochemical properties. When NPs are introduced into a biofluid, such as blood plasma, a selective and reproducible protein corona forms at the nano-bio interface, driven by a combination of protein-NP affinity, protein abundance, and protein-protein interactions. Through a process called the Vroman effect, proteins compete for binding sites on the NP surface, which results in the binding of high-affinity, low-abundance proteins. Before equilibrium is reached, the corona consists mainly of proteins that are in close proximity to the NP, commonly high-abundance proteins. As the system reaches equilibrium, high-abundance, low-affinity proteins are displaced by low-abundance, high-affinity proteins, so the corona samples proteins across the wide dynamic range. NPs can be tuned with different surface functionalizations to enhance and differentiate protein selectivity.
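The competition described above can be approximated, at equilibrium only, with a competitive Langmuir isotherm, in which each protein’s share of surface sites scales with its concentration divided by its dissociation constant (K_d; a lower K_d means higher affinity). This is a textbook simplification swapped in for illustration, not Seer’s model: it ignores the pre-equilibrium kinetics of the Vroman effect, multilayer coronas, and protein-protein interactions. All concentrations, K_d values, and function names below are hypothetical.

```python
import math


def competitive_langmuir(concentrations: dict[str, float],
                         kd: dict[str, float]) -> dict[str, float]:
    """Equilibrium fractional surface occupancy for competing adsorbates.

    theta_i = (c_i / Kd_i) / (1 + sum_j c_j / Kd_j)
    """
    weights = {p: concentrations[p] / kd[p] for p in concentrations}
    denominator = 1.0 + sum(weights.values())
    return {p: w / denominator for p, w in weights.items()}


def orders_of_magnitude(values) -> float:
    """Dynamic range expressed as log10(max / min)."""
    values = list(values)
    return math.log10(max(values) / min(values))


# Hypothetical proteins: concentrations and Kd values in molar units.
conc = {"abundant": 1e-3, "medium": 1e-6, "rare": 1e-9}
kd = {"abundant": 1e-4, "medium": 1e-7, "rare": 1e-9}

bound = competitive_langmuir(conc, kd)
print(round(orders_of_magnitude(conc.values()), 2))   # 6.0 (in solution)
print(round(orders_of_magnitude(bound.values()), 2))  # 1.0 (on the surface)
```

In this toy example, six orders of magnitude in solution shrink to one on the surface because the rarer proteins are assumed to bind more tightly.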
Coefficient of Variation (CV) is a statistic used to compare the extent of variation between data sets (i.e., reproducibility). CV measures the relative dispersion of data points around the mean: it is the ratio of the standard deviation to the mean, often expressed as a percentage. Factors that may contribute to variability include technical variability (e.g., operator, protein preparation method, instrument, ionization efficiency, or software choice) and biological variability (e.g., genetic background, disease state, age, or sex of the subject).
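The CV calculation itself is short. The Python sketch below uses the sample standard deviation (dividing by n − 1); the replicate intensities and the function name are hypothetical.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """CV = sample standard deviation / mean (multiply by 100 for percent)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


# Hypothetical intensities of one protein across three technical replicates.
replicates = [100.0, 110.0, 90.0]
print(f"CV = {coefficient_of_variation(replicates):.1%}")  # CV = 10.0%
```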
A nanoparticle (NP) is usually defined as a particle that is less than 500 nanometers in diameter. NPs have been used in a wide range of applications, including medicine (nanomedicine), biotechnology and pharmaceuticals, energy, electronics and communications, automobiles and machinery, chemistry and materials, and environmental testing.
When nanoparticles are placed in contact with a biological sample, a thin layer of intact proteins rapidly, selectively, and reproducibly adsorbs onto the nanoparticle surface, forming what is called a protein “corona”.
The composition and quantity of the corona proteins depend on the physicochemical properties of the NPs and the surface. Properties like nanoparticle size, shape, material, charge, porosity, as well as surface functional groups will impact corona composition.
When nanoparticles come into contact with proteins in a biofluid, a highly reproducible and robust protein corona forms. Its composition is determined by the nanoparticles’ tuned physicochemical properties, and it contains proteins spanning the wide dynamic range of the plasma proteome. The dynamic range is quantitatively compressed as a function of the proteins’ relative affinities and concentrations. When a panel of several diverse nanoparticles is used, the resulting protein coronas together provide broad and deep coverage of the plasma proteome.
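The cumulative effect of a nanoparticle panel can be illustrated as a running union of the proteins identified in each corona. The Python sketch below uses hypothetical nanoparticle names, protein identifiers, and function name; it shows only how overlapping coronas add up to broader coverage, not real panel data.

```python
def cumulative_coverage(panel: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Running count of distinct proteins as each nanoparticle is added."""
    seen: set[str] = set()
    coverage = []
    for np_name, proteins in panel.items():
        seen |= proteins  # union with this nanoparticle's corona proteins
        coverage.append((np_name, len(seen)))
    return coverage


# Hypothetical identifications from three hypothetical nanoparticles.
panel = {
    "NP-A": {"P01", "P02", "P03"},
    "NP-B": {"P02", "P04"},
    "NP-C": {"P01", "P05", "P06"},
}
print(cumulative_coverage(panel))
# [('NP-A', 3), ('NP-B', 4), ('NP-C', 6)]
```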
Seer’s engineered nanoparticles (NPs) consist of a magnetic core and a surface with unique physicochemical properties. When nanoparticles are introduced into a biofluid, such as blood plasma, a selective and reproducible protein corona is formed at the nano-bio interface, driven by a combination of protein-nanoparticle affinity, protein abundance, and protein-protein interactions. Panels of these proprietary engineered NPs have been designed to efficiently and robustly sample the physicochemical space of the entire proteome, compressing the dynamic range quantitatively to render biological information more accessible for any downstream detector.
The Proteograph Product Suite and workflow have four key components:
1. Proteograph Assay Kit: 5-nanoparticle panel and bottom-up proteomics consumables.
2. SP100 Instrument: automated workflow for plasma to Mass Spectrometry-ready peptides.
3. Mass Spectrometry (MS): compatible with most LCMS instruments. (The Proteograph Product Suite does not include an MS instrument.)
4. Proteograph Analysis Suite: designed for speed and reproducibility, enabling powerful biological insights.
The Proteograph workflow is compatible with all Mass Spectrometry (MS) platforms capable of bottom-up proteomics. Seer has commercial partnerships with market-leading MS vendors, including ThermoFisher Scientific, Bruker, and SCIEX. Need help deciding which Mass Spectrometer is best for your lab and proteomics research? We can help. Fill out this form and one of our proteomics experts will be in touch.
Yes. Seer has partnered with select leading service providers to help with the adoption of unbiased proteomics by making it easier for researchers to get access to the Proteograph Product Suite workflow and unbiased insights. Explore service providers.
The Proteograph Assay method can process up to 16 biosamples per run. If fewer than 16 samples are run, the empty sample tubes should be filled with DI water. Each sample is incubated separately with each of the five nanoparticles, resulting in 80 wells of peptides in a 96-well plate.
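The well arithmetic is 16 samples × 5 nanoparticles = 80 wells of a 96-well plate. The Python sketch below assigns each sample-nanoparticle pair to its own well in row-major order; this layout and the function name are hypothetical illustrations, not the SP100’s actual plate map.

```python
from itertools import product

ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)
# Row-major well order: A1..A12, B1..B12, ..., H1..H12 (96 wells).
WELLS = [f"{row}{col}" for row, col in product(ROWS, COLUMNS)]


def plate_layout(n_samples: int = 16,
                 n_nanoparticles: int = 5) -> dict[tuple[str, str], str]:
    """Assign each (sample, nanoparticle) pair to its own well, row-major."""
    pairs = list(product(range(1, n_samples + 1), range(1, n_nanoparticles + 1)))
    if len(pairs) > len(WELLS):
        raise ValueError("not enough wells for this many sample/NP pairs")
    return {(f"S{s}", f"NP{n}"): well for (s, n), well in zip(pairs, WELLS)}


layout = plate_layout()
print(len(layout), "wells used,", len(WELLS) - len(layout), "wells free")
# 80 wells used, 16 wells free
```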
Seer has qualified plasma and serum samples at this time. Additional sample types, such as CSF, urine, and synovial fluid, have also been tested. For more product information, contact us, and a Proteograph expert will be in touch.
The total runtime of the instrument is less than 7 hours.
Yes, two assays can be run back to back on the same day to process a total of 32 samples (16 samples per assay). Users need to be present for approximately 30-45 minutes at the beginning of each run for assay preparation and setup, and for approximately 30 minutes at the end of each assay to store the peptide collection plate properly, clean up the instrument deck, and perform the recommended peptide quantification immediately after the assay completes.
While some targeted assays measure up to 7,000 proteins, that number is the upper limit of the proteins they can detect in a sample. In contrast, the unbiased Proteograph workflow does not target a predefined list of proteins; instead, it offers access to an unlimited protein search space in a sample.
Additionally, because targeted methods must be specifically adapted for every protein (and every proteoform), it is very challenging to extend them to other organisms or to use them to differentiate proteoforms with small chemical differences or differences not exposed on the protein surface. It is also technologically infeasible for targeted methods to include all of the more than one million proteoforms in their target lists, so these methods cannot comprehensively measure protein isoforms or post-translational modifications (PTMs), both of which have been shown to have significant biological value. MS-based workflows, including the Proteograph, can detect protein isoforms and PTMs: the Proteograph has been shown to be species agnostic, and Mass Spectrometry (MS) can detect many peptides per protein, including modified peptides.
Several factors should be considered when choosing proteomics software. Post-acquisition QC tools are useful, including tools for inspecting chromatograms and spectra to examine the raw signal, troubleshoot issues, and tune settings, as well as tools that compare injections over time to assess LC-MS performance.
Tools for post-acquisition analysis, such as peptide and protein identification and quantification, and tools for gaining biological insights are also important. For both post-acquisition QC and analysis, proteomics data analysis software should be scalable, easy to use, and high-performing, and should enable automation and support data storage and management.
PAS 2.0 is an intuitive, scalable, cloud-powered proteomics informatics tool that enables researchers to process Proteograph LC-MS data quickly and efficiently. For an overview of its main features and benefits, we invite you to explore our interactive demo.
PAS 2.0 includes the new Proteogenomics workflow, which integrates Proteograph proteomics data with next generation sequencing genomic variant information, enabling quick and easy identification of variant peptides. PAS 2.0 also includes two new visualization tools for browsing and exploring proteogenomic results, along with several visualization and under-the-hood improvements that increase plot rendering speed and stability, especially for large-cohort analyses. For more information on PAS 2.0, see the release notes on our PAS product page.
Establishing and maintaining the computational infrastructure that is needed to effectively analyze proteomics data can be challenging for some labs. Cloud computing offers a flexible, scalable, and low-cost solution for proteomics data analysis, helping proteomics researchers conduct high-throughput proteomics studies at scale.
Proteogenomics is the incorporation of genomic information with proteomic data analysis to identify variant peptides not captured in canonical reference protein databases. The new scalable, high-resolution Proteogenomics workflow in PAS allows researchers to build a custom peptide database, perform variant peptide searches, and analyze the results in one day.
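The core idea of a variant peptide search can be sketched in Python: apply a single amino acid substitution to a reference protein, digest both versions in silico, and keep the peptides that only the variant produces. The protein, the E10K substitution, and the function names are hypothetical, and the digest uses the simplified trypsin rule (cleave after K or R, not before P) with no missed cleavages; it is not PAS’s implementation.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """Fully tryptic peptides: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def apply_missense(protein: str, position: int, ref: str, alt: str) -> str:
    """Substitute one residue; position is 1-based, as in HGVS p. notation."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return protein[:position - 1] + alt + protein[position:]


def variant_peptides(protein: str, position: int, ref: str, alt: str) -> list[str]:
    """Peptides present in the variant digest but absent from the reference."""
    reference = set(tryptic_digest(protein))
    variant = set(tryptic_digest(apply_missense(protein, position, ref, alt)))
    return sorted(variant - reference)


# Hypothetical protein and a hypothetical E10K substitution.
print(variant_peptides("MAGKPLLRVEDKAR", 10, "E", "K"))  # ['DK', 'VK']
```

Here the substitution creates a new trypsin cleavage site, so the reference peptide VEDK is replaced by two variant-specific peptides.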
The Proteograph Analysis Suite (PAS) includes pre-installed analysis protocols that use state-of-the-field search engines, including MaxQuant, EncyclopeDIA, DIA-NN, and MSFragger. Select a protocol based on the acquisition mode of your MS data: for data generated in DDA mode, use MaxQuant or MSFragger; for data generated in DIA mode, use EncyclopeDIA or DIA-NN. These protocols can also be customized to fit your analysis needs. For example, PAS includes a Seer-generated human plasma spectral library file for DIA analysis with DIA-NN, but users may upload their own spectral library instead.
BRUCE WILCOX, PhD
VP of Proteomics, PrognomiQ
The Proteograph enables us to generate large-scale unbiased proteomic data at unparalleled speed and depth of coverage. Our multiomics studies combine unbiased genomics, transcriptomics and metabolomics with extensive proteomics data from the Proteograph to provide an unprecedented systems biology view of every subject. This comprehensive biological data enables us to develop transformative products for early disease detection in cancer and other complex diseases.
MARK R. FLORY, PhD
Senior Scientist, Cancer Early Detection Advanced Research Center (CEDAR), Knight Cancer Institute, Oregon Health and Science University
The Seer Proteograph platform uniquely combines deep, discovery-mode proteomic profiling of plasma and serum with the ability to feasibly scale studies for analysis of large cohorts. The technology allows us to perform statistically powered proteomic biomarker discovery campaigns directly in clinically accessible, yet complex and historically challenging, liquid biopsy specimen types.
ALEX CAMPOS, PhD
Former Director, Proteomics Core, Sanford Burnham Prebys Medical Discovery Institute, NCI-Designated Cancer Center Core
I have been using the immunodepletion strategy for years to deal with the dynamic range of biofluid proteomes, but the throughput and reproducibility were never what we expected and needed. The Seer Proteograph technology is a game-changer. First and foremost, the reproducibility is impressive. Not only can we identify more proteins than any other technology we have tried, but we can also do it much faster.