Entry for: The Bioinformatics Peer Prize II
Is it possible to use music to identify the nature of DNA sequences? I published a proof-of-principle article about DNA sonification in BMC Bioinformatics. The work was featured in Nature blogs, BMC blogs, IFLScience and The Conversation, where the piece has received over 30,000 reads.
In some distant alt-universe I was the drummer in a Sydney pop band before becoming an ever-so-serious molecular biologist. I listen deeply to music and analyse its rhythms and notes. In the same way, algorithmic music made from a DNA sequence can tell the listener something useful about the underlying sequence.
The underlying principle is to process DNA sequence according to the central dogma of molecular biology, but instead of a cell making a protein string from DNA, in silico we make a string of musical notes.
The innovation of the DNA sonification tool is to create audio from three reading frames simultaneously, and to use start and stop codons (which mark where protein synthesis begins and ends) to start and stop the audio of each reading frame. The goal is for sonification to supplement the visual tools already available in genome browsers.
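To make "three reading frames simultaneously" concrete, here is a minimal sketch, assuming only the three forward frames are used and that each time step sounds one codon from every frame. The function names `reading_frames` and `concurrent_steps` are hypothetical and not taken from the actual tool.

```python
from __future__ import annotations


def reading_frames(seq: str) -> list[list[str]]:
    """Split a DNA sequence into codons for forward reading frames 0, 1 and 2."""
    seq = seq.upper()
    frames = []
    for offset in range(3):
        # Codons start at offset, offset + 3, ...; a trailing partial codon is dropped.
        frames.append([seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)])
    return frames


def concurrent_steps(seq: str) -> list[tuple[str | None, ...]]:
    """Line the three frames up in time: step t holds codon t of each frame."""
    frames = reading_frames(seq)
    length = max(len(f) for f in frames)
    # Shorter frames are padded with None (silence) at the end.
    return [tuple(f[t] if t < len(f) else None for f in frames) for t in range(length)]
```

For `ATGGCCTAA`, frame 0 reads `ATG GCC TAA`, frame 1 reads `TGG CCT` and frame 2 reads `GGC CTA`, so three codons (three notes) sound together at each step.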
After uploading a sequence, the user can select precisely how the musical transcription is done. Sequence data is converted to MIDI (Musical Instrument Digital Interface) format and played in the browser using a plugin. The simplest algorithm maps each of the four nucleotides to a single note, creating a four-note auditory landscape. Another maps dinucleotides, increasing the complexity to 16 notes across two instruments. The most informative mode maps each nucleotide triplet (codon) to one of 20 notes, just as the genetic code maps 64 codons to 20 amino acids, and outputs the audio across three instruments at once. The result is a series of three-note arpeggios, for example CGF-ADD-CFF-DFG-AFC-GCD.
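The three mapping modes can be sketched as lookup tables from sequence tokens to MIDI pitch numbers. This is a hypothetical illustration, not the tool's actual note assignment: it assumes non-overlapping tokens, places notes on consecutive C major scale degrees starting at C3 (MIDI 48), gives each amino acid (in alphabetical order) its own note in codon mode, treats stop codons and unknown characters as rests (`None`), and does not model the split across instruments. The genetic code table is the standard one.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of the C major scale


def diatonic_notes(count: int, base_midi: int = 48) -> list[int]:
    """Return `count` consecutive C major MIDI note numbers, starting at C3 (48)."""
    return [base_midi + 12 * (i // 7) + SCALE_STEPS[i % 7] for i in range(count)]


def note_table(mode: str) -> dict[str, int]:
    """Hypothetical token-to-pitch table for the 'mono', 'di' or 'codon' mode."""
    if mode == "mono":  # 4 nucleotides -> 4 notes
        tokens = list("ACGT")
    elif mode == "di":  # 16 dinucleotides -> 16 notes
        tokens = ["".join(p) for p in product("ACGT", repeat=2)]
    elif mode == "codon":  # 61 sense codons -> 20 notes, one per amino acid
        amino_acids = sorted(set(AMINO_ACIDS) - {"*"})
        aa_pitch = dict(zip(amino_acids, diatonic_notes(len(amino_acids))))
        return {codon: aa_pitch[aa] for codon, aa in CODON_TO_AA.items() if aa != "*"}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return dict(zip(tokens, diatonic_notes(len(tokens))))


def sonify(seq: str, mode: str) -> list[int | None]:
    """Read `seq` in non-overlapping tokens; stop codons and unknown bases become rests (None)."""
    k = {"mono": 1, "di": 2, "codon": 3}[mode]
    table = note_table(mode)
    seq = seq.upper()
    return [table.get(seq[i:i + k]) for i in range(0, len(seq) - k + 1, k)]
```

With this particular table, `sonify("ATGCTGTAA", "codon")` gives `[65, 64, None]`: methionine, leucine, then a rest for the stop codon. Because synonymous codons share an amino acid, they also share a note, which is what collapses 64 codons onto 20 pitches.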
The DNA sonification tools are available through a web page, where an input DNA sequence is processed in real time to produce an auditory display that plays directly in the browser.
The most informative sonification algorithm reads the DNA sequence as codons in three reading frames to produce three concurrent streams of audio in an auditory display. This approach is advantageous because a start or stop codon in any frame directly starts or stops the audio in that frame, leaving the other frames unaffected. With this approach, different kinds of DNA sequence, such as open reading frames and repetitive sequences, can be clearly distinguished from one another.
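The per-frame start/stop behaviour can be sketched as a small state machine that runs independently in each frame. This assumes a frame is silent until it reads ATG, sounds from that codon onward, and falls silent again at the next in-frame stop codon (TAA, TAG or TGA), with the stop itself not sounded. These rules, and the names `gated_frames` and `audible_fraction`, are illustrative assumptions rather than the tool's documented behaviour.

```python
from __future__ import annotations

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def gated_frames(seq: str) -> list[list[str | None]]:
    """For each forward frame, keep a codon only while that frame is 'open'.

    Assumed rule: silent until ATG, sounding from ATG onward, silent again
    at the next in-frame stop codon. Each frame keeps its own state, so a
    stop codon in one frame leaves the other two frames unaffected.
    """
    seq = seq.upper()
    result = []
    for offset in range(3):
        playing = False
        out: list[str | None] = []
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == START and not playing:
                playing = True
            if playing and codon in STOPS:
                playing = False  # the stop codon itself becomes a rest
            out.append(codon if playing else None)
        result.append(out)
    return result


def audible_fraction(frames: list[list[str | None]]) -> list[float]:
    """Fraction of codon steps in each frame that sound (0.0 for an empty frame)."""
    return [sum(c is not None for c in f) / len(f) if f else 0.0 for f in frames]
```

Under these rules an open reading frame (ATG followed by a long run of codons without an in-frame stop) keeps one frame sounding for a long stretch while the other frames drop in and out, which is the kind of contrast a listener can pick up.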
You can listen to example auditory displays in the YouTube playlist (taken from the sonification tools site).
All audio was produced by the sonification tool and converted to MP3 for inclusion in the video. Basic animation has been added to indicate which DNA codons are being played. Seven audio examples are provided. These include two artificial control sequences, two sequences containing natural repetitive sequences, and a comparison of a coding and a non-coding sequence. Lastly, just for fun, there are examples of DNA sequences that were processed simply to make generative music.
Normally, scientists rely heavily on visual inspection of DNA sequences to unlock their secrets. Sonification is not intended to replace visual inspection but to complement it, in the same way that colour can highlight the properties of a DNA sequence.
Outside the rigours of DNA research, there is strong interest in the wider community in understanding how DNA sequences determine our physical form and how the mutations we accumulate in our DNA over time affect our health. I hope that listening to audio derived from DNA may help people better understand how cell biology works.
5. Future ideas/collaborators needed to further research?
Currently I am thinking of applying new algorithms to detect and highlight features of multiple sequence alignments and intron-exon boundaries, which are commonly displayed in genome browsers. Unfortunately, my coding skills are in keeping with those of a musician turned molecular biologist (!) rather than those of a trained coder or computational biologist. So it's a work in progress.
I have made inroads into synchronising web animation with real-time audio streaming, and the technique seems potentially useful for DNA sequence analysis.
I also aim to recode the existing tool, improve the interface from an end user's perspective, and ideally play the audio in the browser without a client-side plugin.
From a teaching and engagement perspective, I think developing a parallel generative music tool also has merit.