What’s inside a computer? The more curious among us have taken apart a favorite gadget to find out. A quick peek reveals the integrated silicon chips of the motherboard, a power supply, some data ports, and the hard drive. We all cherish our computer’s CPU and RAM, which provide the speed and processing power. And what descent into madness would there be without internet connectivity through a router port or Wi-Fi card? But the most treasured part of the computer is the hard drive. It stores our homework, our photos, our programs, and our secrets.
Hard drives store information magnetically. Within the hard drive, there are layers of shiny, metallic plates called platters. These platters are divided into billions of sectors that can be individually magnetized by the read-write head on an actuator arm. This method of data storage mirrors the binary language that computers use to process data. If a section of the platter is magnetized, it is a 1; if the section is demagnetized, it is a 0. This 1 or 0 is called a bit; eight bits make one byte. The letter ‘A’ is stored as 01000001 in binary. We measure storage in bytes: 1000 bytes = 1 kilobyte, 1000 kilobytes = 1 megabyte, 1000 megabytes = 1 gigabyte. The amount of data being produced is growing exponentially, and computer storage is limited. We are now turning to cloud storage, which is a fancy way of saying off-site data storage that we can access anytime.
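To make the bits-and-bytes idea concrete, here is a minimal Python sketch that turns characters into their eight-bit patterns and spells out the decimal storage units. The helper name `text_to_bits` is made up for this illustration, and the sketch assumes plain ASCII text, where each character fits in one byte.

```python
# Turn a character into the eight bits (one byte) that represent it.
letter = "A"
bits = format(ord(letter), "08b")   # ord("A") is 65; "08b" pads to 8 binary digits
print(letter, "->", bits)           # A -> 01000001

# Decimal storage units: each step up is a factor of 1000.
KILOBYTE = 1000                     # bytes
MEGABYTE = 1000 * KILOBYTE
GIGABYTE = 1000 * MEGABYTE


def text_to_bits(text: str) -> str:
    """Return each byte of the text as a group of eight 0s and 1s."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


print(text_to_bits("DNA"))          # 01000100 01001110 01000001
print(GIGABYTE)                     # 1000000000
```

Three letters take three bytes, or 24 bits, so a gigabyte has room for about a billion characters of plain text.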
In the US, the largest data storage center (by square footage) is the Switch SuperNAP data center in Las Vegas, NV; the largest in the world is the Range International Information Group facility in Langfang, China. These data centers can store hundreds of petabytes of data, where 1 petabyte = 1000 terabytes and 1 terabyte = 1000 gigabytes. Even now, Switch has begun expanding the SuperNAP facility to accommodate future data storage needs.
More data for less
Storage needs will continue to rise, and the current solution is limited by the number of data centers available. An alternative to conventional magnetic storage may lie in biochemical storage. Instead of traditional silicon hardware, data can be stored on strands of DNA. This is not far off from what DNA already accomplishes. One cell in our body has the DNA instructions for an entire human being. The entire human genome, roughly 20,000 protein-coding genes, is stored on molecules that have a combined mass on the picogram scale.
DNA uses four molecules called nucleotides to store genetic data: adenine (A), thymine (T), guanine (G), and cytosine (C). They are paired (G-C and A-T) on complementary strands that form a double helix. To express a gene, the DNA is read in sets of three nucleotides, each called a codon. One codon corresponds to one amino acid, but some codons are redundant and code for the same amino acid. Three positions with four possible nucleotides each give 4^3 = 64 possible codons, yet because of this redundancy, those 64 codons code for only 20 amino acids (plus stop signals).
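The pairing rule and the codon count are easy to check with a few lines of Python. This is only an illustrative sketch: the names `complement_strand` and `all_codons` are invented here, and the complement is written in the same left-to-right order as the input, ignoring strand direction.

```python
from itertools import product

NUCLEOTIDES = "ATGC"
# Base-pairing rule: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the pairing partner of each nucleotide, position by position."""
    return "".join(COMPLEMENT[base] for base in strand)


def all_codons() -> list[str]:
    """List every possible three-nucleotide codon."""
    return ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]


codons = all_codons()
print(len(codons))                 # 64, i.e. 4 * 4 * 4
print(complement_strand("ATGC"))   # TACG
```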
This limitation applies only when a gene is expressed. If a DNA strand is synthesized but never translated into a protein, the data storage potential increases by orders of magnitude. Take an 8-bit sequence in binary: each of the eight positions holds a 1 or a 0, giving 2^8 = 256 combinations. For a DNA sequence of eight nucleotides, each position holds a G, C, A, or T, giving 4^8 = 65,536 potential combinations. Some sequence combinations are forbidden, though; long stretches of a single repeating nucleotide are unfavorable.
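The arithmetic can be verified by brute force. The sketch below counts every eight-position sequence for a two-letter and a four-letter alphabet, then counts how many DNA sequences survive a rule against repeats. The rule used here (no nucleotide may appear twice in a row) is an assumption chosen for illustration; real limits on repeats depend on the synthesis and sequencing technology.

```python
from itertools import product


def count_sequences(alphabet: str, length: int) -> int:
    """Number of sequences of the given length: len(alphabet) ** length."""
    return len(alphabet) ** length


def has_run(seq, max_run: int) -> bool:
    """True if any symbol appears more than max_run times in a row."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if prev == cur else 1
        if run > max_run:
            return True
    return False


def count_without_runs(alphabet: str, length: int, max_run: int) -> int:
    """Brute-force count of sequences with no run longer than max_run."""
    return sum(
        1
        for seq in product(alphabet, repeat=length)
        if not has_run(seq, max_run)
    )


print(count_sequences("01", 8))             # 256
print(count_sequences("ACGT", 8))           # 65536
print(count_without_runs("ACGT", 8, 1))     # 8748
```

Even with that strict rule, 8,748 eight-nucleotide sequences remain, which is still far more than the 256 patterns available in eight bits.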
How to read and write data to DNA
Nick Goldman and colleagues at the European Bioinformatics Institute successfully encoded and stored five files in DNA. These included a text file of Shakespeare’s sonnets, an audio clip of Martin Luther King’s ‘I have a dream’ speech, a copy of Watson and Crick’s paper on the structure of DNA, and a photo of the institute.
To do this, the binary code of each file was converted to DNA code. Long stretches of zeros or ones were converted to a five-nucleotide code. The team segmented the DNA code into workable fragments; each fragment carried an index code to mark its position and overlapped its neighbors in sequence to safeguard file integrity. The DNA fragments were then synthesized off-site and shipped to the team. The team sequenced the fragments, reconstructed the files, and compared them to the originals. The reconstructed files matched the originals with 100% accuracy.
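The general workflow (convert bytes to nucleotides, cut the result into indexed, overlapping fragments, then reassemble and decode) can be sketched in Python. This is a simplified illustration and not the team's actual scheme: it assumes a fixed six base-3 digits per byte, picks each nucleotide from the three that differ from the previous one so that no nucleotide ever repeats, and keeps the fragment index as a plain number rather than encoding it in DNA. All function names and the fragment sizes are made up for the example.

```python
NUCLEOTIDES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729, enough to cover all 256 byte values


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, digit = divmod(byte, 3)
            digits.append(digit)
        trits.extend(reversed(digits))
    return trits


def trits_to_dna(trits: list[int], previous: str = "A") -> str:
    """Each digit picks one of the three nucleotides unlike the previous one."""
    out = []
    for trit in trits:
        choices = [n for n in NUCLEOTIDES if n != previous]
        previous = choices[trit]
        out.append(previous)
    return "".join(out)


def dna_to_trits(dna: str, previous: str = "A") -> list[int]:
    """Invert trits_to_dna by replaying the same choice lists."""
    trits = []
    for base in dna:
        choices = [n for n in NUCLEOTIDES if n != previous]
        trits.append(choices.index(base))
        previous = base
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Invert bytes_to_trits: every six digits become one byte."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def split_fragments(dna: str, size: int = 20, step: int = 5):
    """Cut overlapping windows; fragment i starts at position i * step."""
    return [(i, dna[s:s + size]) for i, s in enumerate(range(0, len(dna), step))]


def reassemble(fragments, step: int = 5) -> str:
    """Sort by index, check that overlaps agree, then stitch together."""
    ordered = sorted(fragments)
    for (_, a), (_, b) in zip(ordered, ordered[1:]):
        overlap = a[step:]
        if b[:len(overlap)] != overlap:
            raise ValueError("overlapping fragments disagree")
    return "".join(frag[:step] for _, frag in ordered)


message = b"To be, or not to be"
dna = trits_to_dna(bytes_to_trits(message))
fragments = split_fragments(dna)
restored = trits_to_bytes(dna_to_trits(reassemble(fragments)))
print(restored == message)   # True
```

Because every nucleotide differs from the one before it, the encoded strand never contains a repeated letter, and because neighboring fragments share most of their sequence, a disagreement between them flags a damaged fragment.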
Long term storage
This process of digital conversion to DNA, DNA synthesis, DNA sequencing, and reconstruction does take time, but the capacity and durability exceed those of conventional hard drives. A hard disk has an average lifetime of about 10 years; DNA molecules are stable for hundreds of years. The platters in a hard drive can store ~10^13 bits per cm^3, whereas an equivalent volume of DNA can store ~10^19 bits per cm^3, a difference of six orders of magnitude. It is estimated that all of the world’s data could be stored on 1 kg (2.2 lbs.) of DNA.
Not content with just data storage, bioengineers at Sandia National Laboratories are working on ways to encrypt and store sensitive information. Their method combines an encryption cipher with DNA sequencing. Without the proper encryption key, the DNA sequence reads as nonsense, and the data is protected. Their first message of 180 characters was encoded into 550 base pairs. With current concerns about data breaches and broken encryption, this method could provide a secure way to store information for decades or even centuries.
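The idea of encrypting before writing to DNA can be illustrated with a toy Python example. It is not the Sandia method, whose cipher and encoding are not described here. As stand-ins, the sketch assumes a one-time pad (XOR with a random key as long as the message) and a simple mapping of two bits per nucleotide, so it will not reproduce the 180-character to 550-base-pair ratio. All names are invented for the illustration.

```python
import secrets

BASES = "ACGT"  # two bits per nucleotide: A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, two bits at a time."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(dna: str) -> bytes:
    """Invert bytes_to_dna: every four nucleotides become one byte."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        value = 0
        for base in dna[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """One-time pad: XOR each data byte with the matching key byte."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"Meet at the usual place"
key = secrets.token_bytes(len(message))        # random key, kept secret

encrypted_dna = bytes_to_dna(xor_bytes(message, key))
recovered = xor_bytes(dna_to_bytes(encrypted_dna), key)

print(encrypted_dna)           # looks like a random strand without the key
print(recovered == message)    # True
```

Anyone who sequences the strand without the key recovers only random-looking bytes; applying the same key again restores the original message.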
Top image: Hard Drive (Public Domain) & DNA (Public Domain)