Adrian Sampson

latest blogging: May 28, 2024 — more bloggings — subscribe

One Weird Trick for Efficient Pangenomic Variation Graphs (and File Formats for Free)

Last time, I introduced pangenomic variation graphs, the standard text file format that biologists use for them, and a hopelessly naïve reference data model we implemented for them. This time, we use a single principle, flattening, to build an efficient representation that is not only way faster than the naïve library but also competitive with an existing, optimized toolkit. Flattening also yields a memory-mapped file format “for free” that, in a shamelessly cherry-picked scenario, is more than a thousand times faster than the serialization-based alternative.
