AI summaryⓘ
The authors focus on finding special types of subgraphs called snarls, ultrabubbles, and superbubbles in genetic variation graphs, which help analyze genomes. They developed the first linear-time algorithms to find all snarls and ultrabubbles, solving problems that were open since 2018. Their approach uses a graph decomposition technique called SPQR-trees and a new way to represent snarls compactly. They also show how to efficiently detect cycles in subgraphs, a key step for their algorithms, and prove that finding certain feedback arcs in these graphs is possible in linear time under specific conditions. Their work advances efficient graph algorithms in computational biology and extends the use of SPQR-trees beyond earlier applications.
superbubblessnarlsultrabubblesdirected graphsbidirected graphsSPQR-tree2-separatoracyclicityfeedback arcscomputational biology
Authors
Francisco Sena, Aleksandr Politov, Corentin Moumard, Massimo Cairo, Romeo Rizzi, Manuel Cáceres, Sebastian Schmidt, Juha Harviainen, Alexandru I. Tomescu
Abstract
A fundamental algorithmic problem in computational biology is to find all subgraphs of a given type (superbubbles, snarls, and ultrabubbles) in a directed or bidirected input graph. These correspond to regions of genetic variation and are useful in analyzing collections of genomes. We present the first linear-time algorithms for identifying all snarls and all ultrabubbles, resolving problems open since 2018. The algorithm for snarls relies on a new linear-size representation of all snarls with respect to the number of vertices in the graph. We employ the well-known SPQR-tree decomposition, which encodes all 2-separators of a biconnected graph. After several dynamic-programming-style traversals of this tree, we maintain key properties (such as acyclicity) that allow us to decide whether a given 2-separator defines a subgraph to be reported. A crucial ingredient for linear-time complexity is that acyclicity of linearly many subgraphs can be tested simultaneously via the problem of computing all arcs in a directed graph whose removal renders it acyclic (so-called feedback arcs). As such, we prove a fundamental result for bidirected graphs, that may be of independent interest: all feedback arcs can be computed in linear time for tipless bidirected graphs, while in general this is at least as hard as matrix multiplication, assuming the k-Clique Conjecture. Our results form a unified framework that also yields a completely different linear-time algorithm for finding all superbubbles. Although some of the results are technically involved, the underlying ideas are conceptually simple, and may extend to other bubble-like subgraphs. More broadly, our work contributes to the theoretical foundations of computational biology and advances a growing line of research that uses SPQR-tree decompositions as a general tool for designing efficient algorithms, beyond their traditional role in graph drawing.