Genes and DNA: a brief introduction

How can two healthy parents give birth to a child with a genetic disorder such as cystic fibrosis? What is the nature of the information that directs a tiny, one-celled embryo to develop into a bat, a whale, or a human baby? How and why do individuals of the same species differ from each other, and how does this lead to evolutionary change? Such questions are addressed by the scientific study of inheritance, better known as genetics.

Mouse embryo, midway through development. Image courtesy of Wellcome Library, London.

Decades before scientists knew anything about genomes or the double helix, they had deduced that organisms inherited characteristics from their parents by way of packets of information they called genes. By the turn of the 20th century, it was clear that genes were carried on cellular structures called chromosomes. But the identity of the genetic material was not definitively established until the 1940s. Shortly thereafter, the discovery of the double-helix structure of DNA by James Watson and Francis Crick made history and ushered in the modern era of molecular genetics.

What are genes and what do they do? Basically, genes are packets of information, sets of instructions for the construction of proteins. Proteins are the machines of life, the diverse and complex molecules that perform the functions that characterize a living cell.

All organisms on Earth store and transmit this genetic information through the medium of DNA, a giant molecule with some fascinating properties that facilitate both the storage and copying of genetic instructions. DNA encodes information using a (nearly) universal code with a four-letter alphabet: A, T, C and G. The letters represent chemical structures, called bases, in the linear structure of DNA, and the pairing of A with T and G with C in the famed double helix makes possible the copying of DNA during cell division. In genes, the four-letter alphabet is organized into three-letter words, which are translated into the language of proteins, a language with a 20-letter alphabet used to form proteins consisting of hundreds to thousands of "letters" (representing amino acids). So, each three-letter "word" in the DNA code of a gene represents a single amino acid, a building block in the construction of a protein (a few words instead signal where the protein ends).
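The pairing rule and the three-letter code described above can be illustrated with a short Python sketch. It is a simplification for illustration only: it reads the three-letter words straight from a DNA strand, uses the standard version of the code with one-letter abbreviations for the 20 amino acids, and marks the "stop" words with an asterisk. The function names complement and translate are made up for this example.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# Each character is the one-letter abbreviation of an amino acid;
# "*" marks a three-letter word that signals "stop".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Base-pairing rule of the double helix: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the bases that pair with each base of the given strand."""
    return "".join(PAIRS[base] for base in strand)


def translate(dna):
    """Read DNA three letters at a time and return the amino acids.

    Illustrative assumption: reading starts at the first letter and
    ends at the first stop word (or when the letters run out).
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(complement("ATGC"))      # TACG
print(translate("ATGGCCTAA"))  # MA
```

In the second call, ATG is read as M (methionine), GCC as A (alanine), and TAA as a stop word, so the result is the two-letter "protein" MA. The table holds all 64 possible three-letter words, which is why several words can stand for the same amino acid.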

The total DNA complement in a given organism is its genome, which is made up of genes (about 22,000 in humans) that are typically embedded in vast regions of non-coding DNA (i.e., DNA that does not contain genes and therefore does not code for protein).


Sketch of the DNA double helix by Francis Crick. Image courtesy of Wellcome Library, London.

The deciphering of the genetic code, along with the development of techniques for the analysis of DNA bases, enabled scientists to begin to read DNA sequences from various genomes, and to start to address many of the basic questions of genetics by looking directly into the genetic code of life. Early efforts (in the late 1970s) entailed the analysis of tiny viral genomes and single well-known human genes, such as those for hemoglobin and insulin. But the scale and scope of DNA sequencing rapidly expanded, and the final decade of the 20th century ended with a working draft of the human genome sequence. Today, vast databases contain the complete genetic codes of hundreds of organisms, and entire genomes are added every few weeks.
