Bioinformatics

A Safety Framework for Flow Decomposition Problems via Integer Linear Programming

We propose the first method for computing all safe solutions for an NP-hard problem, *minimum flow decomposition*. We obtain our results by developing a *safety test* for paths based on a general Integer Linear Programming (ILP) formulation. We also provide implementations with practical optimizations aimed at reducing the total ILP time. Experimental results on the transcriptome datasets of Shao and Kingsford (TCBB, 2017) show that all safe paths for minimum flow decompositions correctly recover up to 90% of the full RNA transcripts, which is at least 25% more than previously known safe paths. Moreover, despite the NP-hardness of the problem, we can report all safe paths for 99.8% of the more than 27,000 non-trivial graphs in this dataset in only 1.5 hours.
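To make the notion of a safe path concrete, the sketch below assumes the usual definition from the safety literature: a path P is safe for minimum flow decomposition if, in every decomposition of the flow into the fewest possible weighted s-t paths, some path contains P as a contiguous subpath. It does not reproduce the ILP-based safety test described above; it swaps in exhaustive enumeration, which is only feasible for tiny graphs. The function names and the edge-to-flow dictionary representation are hypothetical choices for illustration.

```python
from itertools import combinations_with_replacement, product

def st_paths(adj, s, t):
    """Enumerate all s-t paths (tuples of nodes) in a DAG by DFS."""
    stack, out = [(s, (s,))], []
    while stack:
        v, path = stack.pop()
        if v == t:
            out.append(path)
            continue
        for w in adj.get(v, []):
            stack.append((w, path + (w,)))
    return out

def decompositions(flow, s, t, k):
    """Yield every way to write `flow` as k s-t paths with positive integer weights."""
    adj = {}
    for u, v in flow:
        adj.setdefault(u, []).append(v)
    paths = st_paths(adj, s, t)
    max_w = max(flow.values())
    for chosen in combinations_with_replacement(paths, k):
        for weights in product(range(1, max_w + 1), repeat=k):
            total = {e: 0 for e in flow}
            for p, w in zip(chosen, weights):
                for e in zip(p, p[1:]):  # edges along the path
                    total[e] += w
            if total == flow:
                yield list(zip(chosen, weights))

def min_decompositions(flow, s, t):
    """Return (k, all decompositions of size k) for the smallest feasible k."""
    for k in range(1, len(flow) + 1):
        ds = list(decompositions(flow, s, t, k))
        if ds:
            return k, ds
    raise ValueError("not a valid s-t flow")

def contains(path, sub):
    """True if `sub` occurs as a contiguous subpath of `path`."""
    m = len(sub)
    return any(path[i:i + m] == sub for i in range(len(path) - m + 1))

def is_safe(sub, flow, s, t):
    """`sub` is safe if every minimum decomposition has a path containing it."""
    _, ds = min_decompositions(flow, s, t)
    return all(any(contains(p, sub) for p, _ in d) for d in ds)

# Two consecutive "bubbles" with unit flow: the two minimum decompositions
# pair the branches differently, so only some subpaths are safe.
flow = {("s", "a"): 1, ("s", "b"): 1, ("a", "m"): 1, ("b", "m"): 1,
        ("m", "c"): 1, ("m", "d"): 1, ("c", "t"): 1, ("d", "t"): 1}
```

In this toy graph the minimum number of paths is 2, and the decompositions {s-a-m-c-t, s-b-m-d-t} and {s-a-m-d-t, s-b-m-c-t} are both optimal, so `("s", "a", "m")` is safe while `("a", "m", "c")` is not. The running time grows exponentially with graph size, which is why a scalable approach such as the ILP formulation is needed on real transcriptome graphs.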

Chaining for Accurate Alignment of Erroneous Long Reads to Acyclic Variation Graphs

We present a new algorithm to co-linearly chain a set of seeds in a string-labeled acyclic graph, together with the first efficient implementation of such a co-linear chaining algorithm, in a new aligner of long reads to variation graphs, GraphChainer. Compared to GraphAligner, GraphChainer aligns 12% to 17% more reads and 21% to 28% more total read length.
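For readers unfamiliar with co-linear chaining, the sketch below shows the classic quadratic dynamic program in the simpler string-to-string setting, not the graph version GraphChainer implements. It assumes each seed is an exact match given as a hypothetical `(ref_start, read_start, length)` triple, and that a chain must be strictly ordered and non-overlapping in both the reference and the read; it maximizes the total length of the chained seeds.

```python
from typing import List, Tuple

# Hypothetical seed representation: (ref_start, read_start, length).
Anchor = Tuple[int, int, int]

def chain(anchors: List[Anchor]) -> Tuple[int, List[Anchor]]:
    """Return (best total length, anchors of one optimal co-linear chain)."""
    if not anchors:
        return 0, []
    # Sorting by reference start guarantees every valid predecessor of a
    # seed appears before it, since a predecessor must end before it starts.
    a = sorted(anchors)
    n = len(a)
    best = [0] * n   # best[j]: max chain length ending with seed j
    prev = [-1] * n  # back-pointer to the previous seed in that chain
    for j in range(n):
        rj, qj, lj = a[j]
        best[j] = lj
        for i in range(j):
            ri, qi, li = a[i]
            # Seed i may precede seed j only if it ends before j starts
            # in both the reference and the read (co-linearity).
            if ri + li <= rj and qi + li <= qj and best[i] + lj > best[j]:
                best[j] = best[i] + lj
                prev[j] = i
    end = max(range(n), key=best.__getitem__)
    total, path = best[end], []
    while end != -1:
        path.append(a[end])
        end = prev[end]
    return total, path[::-1]

seeds = [(0, 0, 5), (10, 8, 4), (6, 20, 3), (20, 15, 6)]
```

On `seeds`, the seed `(6, 20, 3)` is out of order in the read relative to the others, so the best chain is `[(0, 0, 5), (10, 8, 4), (20, 15, 6)]` with total length 15. On an acyclic graph the "precedes" test is no longer a pair of integer comparisons but a reachability question, which is what makes the graph setting harder.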