Optimizing DNA Storage Efficiency via Joint Constrained and Error-Control Coding

Addressing the Problem

DNA-based storage technologies are attracting increasing attention following recent demonstrations that information can be viably recorded in macromolecules. Unlike many optical and magnetic storage technologies, DNA-based memories are non-volatile and have exceptionally long life cycles. Given the steady decrease in the cost of DNA synthesis and sequencing, these memories may soon become the most effective means of archiving information. Unfortunately, the handful of known DNA-based architectures lack the random access and rewriting capabilities needed to make DNA memories a disruptive technology.

Given the surge of Big Data platforms, new memory systems able to store terabytes or even petabytes of data within grams of easily transferable and hideable genetic material would have a tremendous impact on the storage, communication, and data analytics industries.

Research Goals

The goals of this project are to develop new DNA storage paradigms that enable large-scale, fast, and cost-efficient archival data storage and retrieval, as well as random data access with limited rewriting capabilities. The proposed storage architecture is centered around new data formatting and joint constrained and error-control coding methods, and it utilizes state-of-the-art DNA editing and next-generation sequencing techniques.

Current Activities

As a proof of concept, the research group has developed the first small-volume prototype of a rewritable DNA storage system based on new families of carefully designed DNA prefix-synchronized address code words. The addresses are used for specific context hybridization and rewriting via DNA synthesis. The raw error rate of the system is less than 0.1%, and simple lookup tables of sequence prefixes allowed for error-free information retrieval in the test phase.
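
To illustrate how a lookup table of sequence prefixes can resolve a sequenced read to its data block, here is a minimal Python sketch. It is not the group's actual code: the four 8-nucleotide addresses, the prefix length of 4, the function names, and the one-mismatch fallback are all hypothetical choices made for this example. The illustrative prefixes differ pairwise in every position, so a prefix with a single substitution error still maps to a unique address.

```python
# Hypothetical address sequences, chosen only for this illustration.
ADDRESSES = ["ACGTACCA", "TGCAGTTG", "GATCCGAT", "CTAGTCAG"]


def build_prefix_table(addresses, prefix_len):
    """Map the first prefix_len bases of each address to its block index."""
    table = {}
    for index, address in enumerate(addresses):
        prefix = address[:prefix_len]
        if prefix in table:
            raise ValueError(f"prefix {prefix!r} is shared by two addresses")
        table[prefix] = index
    return table


def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def locate_block(read, table, prefix_len, max_mismatches=1):
    """Return the block index for a read, or None if it cannot be resolved.

    An exact prefix match is tried first. Otherwise the read is accepted
    only if exactly one stored prefix lies within max_mismatches
    substitutions (an assumption of this sketch, not a documented rule).
    """
    prefix = read[:prefix_len]
    if len(prefix) < prefix_len:
        return None
    if prefix in table:
        return table[prefix]
    candidates = [
        index
        for stored, index in table.items()
        if hamming(prefix, stored) <= max_mismatches
    ]
    return candidates[0] if len(candidates) == 1 else None


table = build_prefix_table(ADDRESSES, prefix_len=4)
exact = locate_block("TGCAGTTGAAAC", table, prefix_len=4)      # exact prefix match
corrected = locate_block("ACTTACCAGG", table, prefix_len=4)    # one substitution in the prefix
unresolved = locate_block("AAAATTTT", table, prefix_len=4)     # too far from every prefix
```

In this sketch, reads whose prefix cannot be matched unambiguously are rejected rather than guessed, which is one simple way a low raw error rate can translate into error-free retrieval.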