In eukaryotes, alternative pre-mRNA splicing allows a single gene to encode multiple
protein isoforms, which function in many biological processes and can serve as
biomarkers or therapeutic targets for diseases. Although
protein isoforms in the human genome are well annotated, we speculate that some low-abundance
protein isoforms may still be under-annotated, because most genes have one primary coding product and alternative
isoforms tend to be expressed at low levels. A
peptide encoded jointly by a novel exon and an annotated exon that are separated by an intron is called a novel junction
peptide. In the absence of known transcripts and homologous
proteins, traditional proteogenomics based on whole-genome six-frame translation can identify neither novel junction
peptides nor novel
alternative splice sites. In this article, we first propose CJunction, a strategy and tool for identifying novel junction
peptides. We then integrate it into a proteogenomics workflow designed specifically for discovering novel
protein isoforms and apply this workflow to a deep-coverage HeLa mass spectrometry data set (ProteomeXchange identifier PXD004452). We identified and validated three novel
protein isoforms of two functionally important genes, NHSL1 (a member of the NHS gene family; NHS is the causative gene of
Nance-Horan syndrome) and EEF1B2 (a translation
elongation factor), supporting our hypothesis. These novel
protein isoforms differ substantially in sequence from the annotated products of their genes because of their novel N-termini, suggesting that they may have distinct functions.
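Why whole-genome six-frame translation misses a junction peptide can be illustrated with a toy example. The sketch below is not CJunction and makes no claim about how CJunction works; it assumes the standard genetic code, a made-up gene (the exon and intron sequences and the helper names `six_frame` and `found_by_six_frame` are hypothetical), and a canonical GT...AG intron. It checks whether a peptide occurs in any of the six reading frames of the genomic sequence, which is the condition for a six-frame database to contain it.

```python
# Toy illustration (not CJunction): a peptide spanning an exon-exon junction
# is absent from the six-frame translation of the genomic DNA.

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(dna: str) -> str:
    """Translate codon by codon, dropping any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame(genomic: str) -> list[str]:
    """Three forward frames followed by three reverse-strand frames."""
    frames = []
    for strand in (genomic, reverse_complement(genomic)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def found_by_six_frame(peptide: str, genomic: str) -> bool:
    return any(peptide in frame for frame in six_frame(genomic))


# Hypothetical gene: two exons separated by an intron with GT...AG boundaries.
EXON_1 = "ATGGCTTGGGAA"   # M A W E
INTRON = "GTAAGTTTTCAG"
EXON_2 = "TTCCACTGGTAA"   # F H W stop

genomic = EXON_1 + INTRON + EXON_2
spliced = EXON_1 + EXON_2

print(translate(spliced))                   # MAWEFHW*
print(found_by_six_frame("AWEFHW", genomic))  # False: spans the junction
print(found_by_six_frame("MAWE", genomic),
      found_by_six_frame("FHW", genomic))     # True True: each half is present
```

Each exon's half of the peptide is recovered from the genome, but the spliced peptide `AWEFHW` only exists once the intron is removed, which is why junction-spanning sequences have to be generated explicitly rather than read off a contiguous genomic translation.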