Important questions that are addressed by DNA sequence alignment
Understanding the function of unknown or newly-discovered genes
Large sets of new DNA sequence data are generated every day, often including gene sequences from species that are being analyzed for the first time. As a result, genetic sequence databases (such as the massive GenBank) contain a large and growing number of gene sequences that code for proteins of unknown function. Non-coding sequences, which make up the overwhelming bulk of many genomes and are being added to databases just as rapidly, are generally even less well understood than gene sequences. For most of these unknown genetic elements, a complete understanding of their biological roles (if any) will require the hard work of experimental genetics and cell biology. But scientists can gain important clues about the potential functions of particular DNA sequences by comparing them to the sequences of elements with known functions.
Consider, for example, the genes that enable animals to smell. In vertebrates, the sense of smell is made possible by proteins called olfactory receptors. In mammals, the genes that encode these fascinating receptors constitute a very large family, with different species carrying from about 400 to more than 1200 of them. Very few of these proteins have ever been the subject of a laboratory experiment, but scientists know that these genes encode olfactory receptors because they share common sequence components that enable the encoded proteins to function as odor sensors. When the genome of the duck-billed platypus was decoded in 2008, scientists immediately identified a family of olfactory receptor genes, and analyzed their likely evolutionary history, without performing a single laboratory experiment. They used DNA sequence alignment, comparing the platypus DNA sequences to known olfactory receptor sequences.
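To give a flavor of how such a comparison works, here is a minimal Python sketch of a global alignment score (the classic Needleman-Wunsch dynamic programming method), used to ask which known sequence a new sequence most resembles. The scoring values, the function names, and the short sequences are all made up for illustration. Real analyses rely on dedicated tools and far longer sequences, and nothing here is meant to represent the method actually used for the platypus genome.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best end-to-end alignment score for sequences a and b.

    The match, mismatch, and gap values are illustrative assumptions.
    """
    # prev[j] holds the best score for aligning the processed part of a
    # against the first j letters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align the two letters
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


def best_match(query, known):
    """Name of the known sequence that aligns best with the query."""
    return max(known, key=lambda name: global_alignment_score(query, known[name]))


# Made-up toy sequences, not real genes.
known = {
    "receptor_like": "ATGGCCTTCAGGTAC",
    "unrelated": "TTTTGGGGAAAACCC",
}
query = "ATGGCCTACAGGTAC"

print(best_match(query, known))
```

A high score against a sequence of known function is only a clue, not proof, which is why the text above speaks of "potential functions."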
Identifying gene sequences that cause disease or susceptibility to disease
Genetic mutations are known to cause various pathological conditions (such as cystic fibrosis, which is caused by mutations in the CFTR gene). More commonly, genetic mutations and variations underlie enhanced susceptibility to diseases such as diabetes and cancer. Once scientists have some idea of which genes to look at, they can readily determine the nature of a genetic variation by comparing the sequence in question to the corresponding sequences from unaffected individuals. Sometimes, the nature of the mutation can shed light on previously unknown aspects of the disease or condition itself.
For example, some human families show higher rates of breast and ovarian cancer than the general population. Careful pedigree analysis and genetic mapping led to the discovery, in 1994 and 1995, of two different genes (named BRCA1 and BRCA2, respectively) that harbored mutations in these cancer-prone families. Mutations in the genes were detected by sequence alignment with sequences from unaffected families. The discovery of these mutations led to enhanced understanding of the mechanisms by which the human body normally wards off cancer.
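Once two sequences have been aligned, spotting the differences is straightforward. The following Python sketch lists every position at which a sample differs from a reference. It assumes the two sequences have already been aligned to the same length, with "-" marking a gap. The function name and the short sequences are hypothetical and are not taken from any real gene.

```python
def find_variants(reference, sample):
    """List (position, reference_base, sample_base) for each difference.

    Positions are counted from 1. Both sequences must already be aligned,
    so they have the same length; '-' stands for a gap.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base)
        in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Made-up toy sequences for illustration only.
reference = "ATGGATTTATCTGCT"
sample = "ATGGATTAATCTG-T"

for position, ref_base, sample_base in find_variants(reference, sample):
    print(position, ref_base, "->", sample_base)
```

In this toy case the sample carries one substitution and one deleted base relative to the reference.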
Detecting relatedness between different gene sequences that can indicate common ancestry
Evolutionary biologists who are mapping the tree of life can form and test hypotheses about common ancestry of organisms through careful analysis of the features of living animals in combination with detailed study of the ever-expanding fossil record. But these methods alone often leave important questions unanswered, and many evolutionary relationships remain incompletely understood. In the last few decades, however, biologists have acquired a new tool for the probing of evolutionary relationships: comparison of DNA sequences. As discussed earlier, organisms inherit their genetic information from their ancestors, and two organisms (or species) that share a common ancestor will have a shared genetic history, including shared genes. Over time, accumulated variation leads to significant genetic differences between distantly-related species, while the differences between more closely-related species are less dramatic. This means that comparison of gene sequences can provide scientists with new clues regarding the relatedness of various species, based not entirely on how the organisms appear, but on how recently they diverged from a common ancestor. (Similar reasoning can be applied to a simple human pedigree, where people in an extended family share common ancestors of differing remoteness.)
Skeleton of Ambulocetus natans, meaning "walking and swimming whale." Image courtesy of Hans Thewissen.
Consider the interesting question of how whales descended from land-based mammals. For many years, scientists hypothesized that whales had descended from a particular group of ancient and extinct animals called mesonychids, and the identity of whales' closest living relatives was a matter of significant controversy. While it was clear that whales had descended from land mammals, passing through some remarkable intermediate stages in which the ancestral animals were apparently "walking whales," it was far from clear how whales fit into the mammalian family tree until researchers were able to look carefully at whale genes, comparing them to the same genes from other mammals. The genetic findings supported one specific hypothesis, and subsequent work has led to a clearer view of the whale section of the mammalian family tree. The closest cousins of whales, it turns out, are most likely hippopotamuses.
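The reasoning that closely related species show fewer sequence differences can be turned into a simple calculation. The Python sketch below computes the fraction of aligned positions at which two sequences differ (often called the p-distance) and reports the pair of sequences with the smallest distance. The species labels and sequences are invented for illustration, and real studies of evolutionary relationships use many genes and more sophisticated statistical models than this.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned, gap-free positions at which a and b differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def closest_pair(sequences):
    """Names of the two sequences separated by the smallest p-distance."""
    return min(
        combinations(sorted(sequences), 2),
        key=lambda names: p_distance(sequences[names[0]], sequences[names[1]]),
    )


# Made-up aligned sequences of the "same gene" from three hypothetical species.
sequences = {
    "species_a": "ACGTACGTAC",
    "species_b": "ACGTACGTAT",
    "species_c": "ACGAACCTGT",
}

print(closest_pair(sequences))
```

Here species_a and species_b differ at only one of ten positions, so on this evidence alone they would be judged to share the most recent common ancestor.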
Identifying changes in gene sequences that cause or accompany evolution
Some of the most interesting questions in all of biology center on the mechanisms that underlie evolutionary change. How did snakes come to have so many more vertebrae than other vertebrates? How did whales come to have flippers in place of legs and feet? How did the corn plants that are cultivated on the Great Plains arise from much humbler beginnings? In each case, scientists have a fairly good idea of the nature of the common ancestor, and seek to understand the nature of the genetic changes that brought about these interesting evolutionary transitions.
The evolution of corn has provided a superb opportunity for biologists to examine such questions in detail. Modern corn is descended from a wild grass called teosinte, which still grows in Central America. The two plants are strikingly different in their architecture, because corn displays some peculiar traits that arose as the result of selection by ancient human farmers. Through years of careful genetic mapping, biologists identified regions of the corn genome containing genetic changes that could explain some of the unique traits found in corn but not in teosinte. Then, using various analysis techniques including sequence alignment, they described very specific genetic changes that account for several of the differences between the two plants. One interesting gene, called tga1, controls several aspects of development in corn and teosinte, and the change of a single amino acid in the encoded protein leads to corn-like features in teosinte. This difference was detected by sequence alignment.
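A single amino acid change of this kind can be found by translating two aligned coding sequences into proteins and comparing them position by position. The Python sketch below does this using the standard genetic code. The two short DNA sequences are made up for illustration and are not the real tga1 sequences, and the function names are hypothetical.

```python
# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acids.

    Any incomplete codon at the end is ignored; '*' marks a stop codon.
    """
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def amino_acid_changes(dna_a, dna_b):
    """List (position, amino_acid_a, amino_acid_b) where the proteins differ."""
    return [
        (position, x, y)
        for position, (x, y)
        in enumerate(zip(translate(dna_a), translate(dna_b)), start=1)
        if x != y
    ]


# Made-up toy coding sequences that differ by a single DNA letter.
ancestral = "ATGGCTAAAGGT"
derived = "ATGGCTAACGGT"

print(translate(ancestral), translate(derived))
print(amino_acid_changes(ancestral, derived))
```

In this toy example one changed DNA letter swaps a lysine (K) for an asparagine (N) at the third position of the protein, while every other amino acid stays the same.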
Further examples come from the study of our own species. Biologists studying human evolution are very interested in the functions of a protein called FOXP2. Mutations in the gene that encodes this protein lead to severe speech defects in humans, suggesting that the protein controls critical aspects of human speech and, perhaps, human cognition. The gene is present in vertebrates ranging from fish and reptiles to birds and mammals, and it tends to remain unchanged over evolutionary time. Interestingly, however, sequence alignment has revealed that the FOXP2 protein in humans differs by two amino acids from the version found in chimpanzees, our closest living relatives. Biologists are continuing to study this important protein, and have found that it has also changed appreciably in bats that find prey using sonar (echolocation). Ongoing research like this is moving closer to a fuller understanding of how, genetically, humans and other organisms have come to acquire their wonderfully distinct characteristics.