Alternative splicing (AS) is an important mechanism in the development of many
cancers, as novel or aberrant AS patterns play an important role as an independent onco-driver. In addition,
cancer-specific AS is potentially an effective target of personalized
cancer therapeutics. However, detecting AS events remains a challenging task, especially if these AS events are novel. This is exacerbated by the fact that existing transcriptome annotation databases are far from being comprehensive, especially with regard to
cancer-specific AS. Additionally, traditional sequencing technologies are severely limited by the short length of the generated reads, which rarely spans more than a single splice junction site. Given these challenges, transcriptomic long-read (LR) sequencing presents a promising potential for the detection and discovery of AS. We present Freddie, a computational annotation-independent
isoform discovery and detection tool. Freddie takes as input transcriptomic LR sequencing of a sample alongside its genomic split alignment and computes a set of
isoforms for the given sample. It then partitions the input reads into sets that can be processed independently and in parallel. For each partition, Freddie segments the genomic alignment of the reads into canonical exon segments. The goal of this segmentation is to be able to represent any potential
isoform as a subset of these canonical exons. This segmentation is formulated as an optimization problem and is solved with a dynamic programming algorithm. Then, Freddie reconstructs the
isoforms by jointly clustering and error-correcting the reads using the canonical segmentation as a succinct representation. The clustering and error-correcting step is formulated as an optimization problem-the Minimum Error Clustering into
Isoforms (MErCi) problem-and is solved using integer linear programming (ILP). We compare the performance of Freddie on simulated datasets with other
isoform detection tools with varying dependence on annotation databases. We show that Freddie outperforms the other tools in its accuracy, including those given the complete ground truth annotation. We also run Freddie on a transcriptomic LR dataset generated in-house from a
prostate cancer cell line with a matched short-read
RNA-seq dataset. Freddie results in
isoforms with a higher short-read cross-validation rate than the other tested tools. Freddie is open source and available at https://github.com/vpc-ccg/freddie/.