Prepping a genomics 101 workshop as we speak, and I’m just going through some older material of mine. Will use this to explore the issue of correlations in big big data sets. Anyway, this piece, written back in 2006, was a fun exercise in this notion, although it was a little freaky how the correlations came out for the Bush-era folks. One of these days, I should do an update on this – might be interesting to see how those matches come up.
By DAVID NG
Every living thing on this planet adheres to a script, a biological language that is not unlike the ingredient lists on the back of your grocery store products. This script is DNA, composed of a limited alphabet of four building blocks (or letters, if you will): A, T, C and G.
Our human document is just over three-billion letters in length. To offer some perspective, E. coli has just over four-and-a-half million letters, a fly has about 150-million letters and rice has close to 400-million. In all—as of August, 2005—over 100,000,000,000 letters of code have been sequenced from a multitude of earthly delights and made publicly available for research within the life sciences.
DNA orchestrates the production of proteins—the molecules that are responsible for the architecture, mechanics, senses and defenses of each and every cell and tissue in an organism’s being. These proteins actually do the work of “living.”
And here’s where it gets interesting: Proteins are composed of strings of amino acids, pieced together as a direct result of DNA code. There are 20 different amino acids, each one denoted by a single letter. Since the amino-acid alphabet is only missing the letters B, J, O, U, X and Z, one can look for relevant words within the huge dataset of genomes, that is, within life’s code, and perhaps find wisdom for important decisions.
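For readers who want to check whether a given word can be spelled with amino acids at all, here is a minimal Python sketch. It assumes the standard 20 one-letter codes; the function name `missing_letters` is made up for this illustration and is not part of any bioinformatics tool.

```python
# The 20 standard one-letter amino-acid codes (every letter except B, J, O, U, X, Z).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def missing_letters(word):
    """Return, in alphabetical order, the letters of `word` that are not amino-acid codes."""
    return sorted({ch for ch in word.upper() if ch.isalpha() and ch not in AMINO_ACIDS})


for name in ["BUSH", "KERRY", "HILLARY"]:
    bad = missing_letters(name)
    if bad:
        print(name, "cannot be spelled exactly; no amino acid for:", ", ".join(bad))
    else:
        print(name, "can be spelled with amino acids")
```

A word with no missing letters can, in principle, turn up as an exact match in a protein sequence; a word with missing letters can only ever produce near matches.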
With this in mind, I decided to supercollide genetics and politics: more specifically, to contemplate specific words, built with strings of amino acids, and search all available genetic and protein sequence data for relevant matches.¹ And it is these matches or answers that are gleaned, as if from a Magic 8 Ball, to reflect and evaluate our leaders, our options and our future. Whether you buy into this brand of decision making or not, here is what you’ll discover when you search genetic code for amino-acid sequence strings such as “BUSH,” as well as other names from current events.
1. The query for “BUSH” receives no hits, primarily because it is deemed a “low complexity sequence.” This is compounded by the fact that the letters B and U do not exist as specific amino acids.
2. To be fair, I tried the string “GWBUSH.” Here, the closest match was the sequence “GWDASH.” It was interesting to note that 21 of the top 22 matches were derived from the genomes of “uncultured” organisms, ones that cannot be grown in any laboratory setting.
3. Next, I tried “GWBLISH,” under the pretense that when you squint, it looks like “GWBUSH.” In this case, the best sequence match referred to the Japanese strain of Oryza sativa (paddy rice), a food staple from a country that is justifiably sensitive to past actions of the United States.
4. Because none of the above results sounded particularly encouraging, I figured that a better indicator of Bush’s worth might come from querying the names of his top advisors. However, when the sequence strings “ROVE,” “RICE” and “ALITO” are queried, all are met with the “low complexity sequence” result. The top hit for “RUMSFELD” was Xylella fastidiosa, a grapevine-decimating pathogen infamous in the wine industry. Interestingly, the top two matches for “CHENEY” are Vibrio vulnificus, a bacterium in the same family as those that cause cholera, as well as Vibrio splendidus, a nasty intestinal pathogen known for inducing vomiting, diarrhea and abdominal pain.
5. Finally, in an effort to further demonstrate my impartiality, I begrudgingly entered “PRESIDENTBUSH.” In this case, the best non-hypothetical match—one that can actually be assigned a biological function—was from the genome of Entamoeba histolytica. The organism is a single-celled, parasitic protozoan known for infections that sometimes last for years, which may be accompanied by vague gastrointestinal distress or dysentery—complete with blood and mucus in the stool.
6. For good measure, I considered how 2005 could have been different politically, entering a couple of searches related to Senator John Kerry. A query for “KERRY” returned many perfect hits from a wide variety of organisms, and “PRESIDENTKERRY” returned a best non-hypothetical match to a gene in Zygosaccharomyces rouxii, an organism belonging to the kingdom Fungi.
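To give a feel for what a “closest match” means in the searches above, here is a toy Python sketch. It slides the query along a sequence and counts mismatched letters in each window. This is only an illustration of the idea, not how BLAST actually scores or searches, and the sequence `toy_protein` is invented for the example rather than taken from any genome.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_window(query, sequence):
    """Return (mismatches, offset, window) for the window of `sequence` nearest to `query`."""
    k = len(query)
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the query")
    # min() keeps the first offset with the fewest mismatches.
    best = min(range(len(sequence) - k + 1),
               key=lambda i: hamming(query, sequence[i:i + k]))
    window = sequence[best:best + k]
    return hamming(query, window), best, window


# Invented sequence for illustration only; it is not real genomic data.
toy_protein = "MKTLGWDASHRVE"
print(closest_window("GWBUSH", toy_protein))
```

Because B and U are not amino-acid letters, “GWBUSH” can never match perfectly; the best any window can do is differ at those two positions, which is exactly what “GWDASH” does.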
I’m left to make the following conclusions: Simply stated, Bush is of low complexity. Addressing Bush using his first and middle initials suggests that he will run away or act in an uncultured manner. Squint at Bush and he just might make fun of your slanting eyes, call you “sushi lover,” or make some other inappropriate comment. His closest advisors are, at best, too simple for the task or busy attacking wine, or, at worst, will make you suffer horribly. At the end of the day, if you don’t want the hypothetical, but instead want the truth: President Bush is akin to an extended period of significant discomfort in your gut.
In hindsight, it would appear that these queries reflect accurately on the past year, what with the general mismanagement of the Iraq war, the fallout from Hurricane Katrina, the administration’s rebuff of climate change, as well as the President’s awkward but accommodating tone with intelligent design.
But, what if John Kerry had been elected president? Well, he would have just been a “fun guy!”
Perhaps there is some merit to this method of divination after all?
As a postscript, and to look forward rather than back, it being a new year, I ran one last query. Given Hillary Clinton’s potential candidacy in the 2008 presidential race, I entered “HILLARY.”
Here, the top non-hypothetical hit corresponded to Burkholderia vietnamiensis strain G4, a bacterium known for its prowess in eliminating various hazardous environmental contaminants that are found in groundwater. Could this result possibly foreshadow an interesting campaign ahead? Is HILLARY someone who will bioremediate (literally “clean up”) the polluting mess left by the current administration?
No matter, I think it is best that I leave that act of interpretation to you, the reader, since my actual Magic 8 Ball suggests that I “better not tell you now.”
1. Anyone can do this with a common bioinformatics tool known as BLAST. Follow the link and click on the “search for short, nearly exact matches” under the PROTEIN subheading. In the new page, enter your query, and then hit the “BLAST” button.
Originally published January 11, 2006 at Seed Magazine.