Conference Publication

Practical Minimum Path Cover

We present the first publicly available high-performance implementation of state-of-the-art MPC algorithms, including the parameterized approaches. Our experiments on random DAGs show that parameterized algorithms are orders-of-magnitude faster on dense graphs. Additionally, we present new pre-processing heuristics based on transitive edge sparsification. We show that our heuristics improve MPC-solvers by orders-of-magnitude.

Sorting Finite Automata via Partition Refinement

We provide a $O(|\delta| \log |Q|)$-time algorithm computing a co-lex total preorder when the input is a Wheeler NFA, and an algorithm with the same time complexity computing the smallest-width co-lex partial order of any DFA.

Chaining of Maximal Exact Matches in Graphs

We show how to chain *maximal exact matches* (MEMs) between a query string $Q$ and a labeled directed acyclic graph (DAG) $G=(V,E)$ to solve the *longest common subsequence* (LCS) problem between $Q$ and $G$.

Width Helps and Hinders Splitting Flows

We show that, for acyclic graphs, considering the width of the graph yields advances in our understanding of its approximability. For the non-negative version, we show that a popular heuristic is a $O( \log |X|)$-approximation on graphs satisfying two properties related to the width (satisfied by e.g., series-parallel graphs), and strengthen its worst-case approximation ratio for sparse graphs. For the negative version, we give a $(\lceil \log \Vert X \Vert \rceil +1)$-approximation using a power-of-two approach, combined with parity fixing arguments and a decomposition of unitary flows ($\Vert X \Vert \leq 1$) into at most width paths.

Safety and Completeness in Flow Decompositions for RNA Assembly

We give the first local characterization of safe paths for flow decompositions in directed acyclic graphs (DAGs), leading to a practical algorithm for finding the complete set of safe paths. We additionally evaluated our algorithms against the trivial safe algorithms (unitigs, extended unitigs) and the popularly used heuristic (greedy-width) for flow decomposition on RNA transcripts datasets. We find that despite maintaining perfect precision the safe and complete algorithm reports significantly higher coverage as compared to trivial safe algorithms.

Sparsifying, Shrinking and Splicing for Minimum Path Cover in Parameterized Linear Time

We obtain two new MPC parameterized algorithms for DAGs running in time $O(k^2|V|\log(|V|) + |E|)$ and $O(k^3|V| + |E|)$. We also obtain a parallel algorithm running in $O(k^2|V| + |E|)$ parallel steps and using $O(\log(|V|))$ processors (in the PRAM model). We also obtain edge sparsification algorithms preserving the width of the DAG with the same running time as our MPC algorithms.

A linear-time parameterized algorithm for computing the width of a DAG

We describe an parameterized algorithm to compute the width $k$ of a DAG in time $O(k2^k|E| + k^24^k|V|)$.

Fast Indexes for Gapped Pattern Matching

We describe indexes for searching large data sets for variable-length-gapped (VLG) patterns. Our best approach provides search speeds several times faster than prior art across a broad range of patterns and texts.

Faster Repetition-Aware Compressed Suffix Trees Based on Block Trees

New Repetition-Aware Compressed Suffix Tree. Slightly larger than state-of-the-art, but outperforms them in time, often by orders of magnitude.