BIOLOGY

The determination of the complete nucleotide sequence of the human genome during the last decade has set in a new era of genomics. In the last section, the essentials of human genome sequencing and its consequences will also be discussed.

Let us begin our discussion by first understanding the structure of the most interesting molecule in the living system, that is, the DNA. In subsequent sections, we will understand why it is the most abundant genetic material, and what its relationship is with RNA.

6.1 THE DNA

DNA is a long polymer of deoxyribonucleotides. The length of DNA is usually defined as the number of nucleotides (or pairs of nucleotides, referred to as base pairs) present in it. This is also a characteristic of an organism. For example, a bacteriophage known as φ×174 has 5386 nucleotides, bacteriophage lambda has 48502 base pairs (bp), Escherichia coli has 4.6 × 10⁶ bp, and the haploid content of human DNA is 3.3 × 10⁹ bp. Let us discuss the structure of such a long polymer.

6.1.1 Structure of Polynucleotide Chain

Let us recapitulate the chemical structure of a polynucleotide chain (DNA or RNA). A nucleotide has three components: a nitrogenous base, a pentose sugar (ribose in the case of RNA, and deoxyribose for DNA), and a phosphate group. There are two types of nitrogenous bases: Purines (Adenine and Guanine), and Pyrimidines (Cytosine, Uracil and Thymine). Cytosine is common to both DNA and RNA, and Thymine is present in DNA. Uracil is present in RNA in place of Thymine. A nitrogenous base is linked to the pentose sugar through an N-glycosidic linkage to form a nucleoside, such as adenosine or deoxyadenosine, guanosine or deoxyguanosine, cytidine or deoxycytidine, and uridine or deoxythymidine. When a phosphate group is linked to the 5'-OH of a nucleoside through a phosphoester linkage, the corresponding nucleotide (or deoxynucleotide, depending upon the type of sugar present) is formed. Two nucleotides are linked through a 3'-5' phosphodiester linkage to form a dinucleotide.
More nucleotides can be joined in such a manner to form a polynucleotide chain. A polymer thus formed has at one end a free phosphate moiety at [...]

Figure 6.1 A Polynucleotide chain

Figure 6.2 Double stranded polynucleotide chain

[...] turn. Consequently, the distance between consecutive base pairs in a helix is approximately equal to 0.34 nm.
(v) The plane of one base pair stacks over the other in the double helix. This, in addition to H-bonds, confers stability on the helical structure (Figure 6.3).

Figure 6.3 DNA double helix

Compare the structure of purines and pyrimidines. Can you find out why the distance between two polynucleotide chains in DNA remains almost constant?

The proposition of a double helix structure for DNA and its simplicity in explaining the genetic implication became revolutionary. Very soon, Francis Crick proposed the Central dogma in molecular biology, which states that the genetic information flows from DNA → RNA → Protein.

Central dogma

MOLECULAR BASIS OF INHERITANCE

In some viruses the flow of information is in the reverse direction, that is, from RNA to DNA. Can you suggest a simple name for the process?

6.1.2 Packaging of DNA Helix

Taking the distance between two consecutive base pairs as 0.34 nm (0.34 × 10⁻⁹ m), if the length of the DNA double helix in a typical mammalian cell is calculated (simply by multiplying the total number of bp by the distance between two consecutive bp, that is, 6.6 × 10⁹ bp × 0.34 × 10⁻⁹ m/bp), it comes out to be approximately 2.2 metres, a length that is far greater than the dimension of a typical nucleus (approximately 10⁻⁶ m). How is such a long polymer packaged in a cell?

If the length of E. coli DNA is 1.36 mm, can you calculate the number of base pairs in E. coli?

Figure 6.4a Nucleosome

In prokaryotes, such as E. coli, though they do not have a defined nucleus, the DNA is not scattered throughout the cell. DNA (being negatively charged) is held with some proteins (that have positive charges) in a region termed the ‘nucleoid’.
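The length estimate worked out earlier in this section (number of base pairs multiplied by the 0.34 nm distance between consecutive base pairs) can be checked with a few lines of Python. This is only a minimal sketch: the function and variable names are illustrative assumptions, and the only inputs are the figures quoted in the text.

```python
# Distance between two consecutive base pairs, as quoted in the text (0.34 nm).
RISE_PER_BP_M = 0.34e-9  # metres per base pair


def dna_length_m(base_pairs: float) -> float:
    """Length of a double helix in metres: number of bp x distance per bp."""
    return base_pairs * RISE_PER_BP_M


def base_pairs_from_length(length_m: float) -> float:
    """Inverse calculation: number of bp in a helix of the given length."""
    return length_m / RISE_PER_BP_M


# Typical mammalian cell: 6.6 x 10^9 bp (figure from the text).
mammalian_length = dna_length_m(6.6e9)

# E. coli DNA of length 1.36 mm (figure from the exercise in the text).
ecoli_bp = base_pairs_from_length(1.36e-3)

print(f"Mammalian DNA length: {mammalian_length:.2f} m")
print(f"E. coli base pairs:   {ecoli_bp:.1e}")
```

The same two functions can be reused for any other genome size given in base pairs or any helix length given in metres.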
The DNA in the nucleoid is organised in large loops held by proteins.

In eukaryotes, this organisation is much more complex. There is a set of positively charged, basic proteins called histones. A protein acquires charge depending upon the abundance of amino acid residues with charged side chains. Histones are rich in the basic amino acid residues lysine and arginine. Both of these amino acid residues carry positive charges in their side chains. Histones are organised to form a unit of eight molecules called a histone octamer. The negatively charged DNA is wrapped around the positively charged histone octamer to form a structure called a nucleosome (Figure 6.4 a). A typical nucleosome contains 200 bp of DNA helix. Nucleosomes constitute the repeating unit of a structure in the nucleus called chromatin, the thread-like stained (coloured) bodies seen in the nucleus. The nucleosomes in chromatin are seen as a ‘beads-on-string’ structure when viewed under an electron microscope (EM) (Figure 6.4 b).

Figure 6.4b EM picture - ‘Beads-on-String’

Theoretically, how many such beads (nucleosomes) do you imagine are present in a mammalian cell?

The beads-on-string structure in chromatin is packaged to form chromatin fibers that are further coiled and condensed at the metaphase stage of cell division to form chromosomes. The packaging of chromatin at a higher level requires an additional set of proteins that collectively are referred to as [...]

Radioactive phages were allowed to attach to E. coli bacteria. Then, as the infection proceeded, the viral coats were removed from the bacteria by agitating them in a blender. The virus particles were separated from the bacteria by spinning them in a centrifuge.

Bacteria that were infected with viruses that had radioactive DNA were radioactive, indicating that DNA was the material that passed from the virus to the bacteria. Bacteria that were infected with viruses that had radioactive proteins were not radioactive.
This indicates that proteins did not enter the bacteria from the viruses. DNA is therefore the genetic material that is passed from virus to bacteria (Figure 6.5).

Figure 6.5 The Hershey-Chase experiment

6.2.2 Properties of Genetic Material (DNA versus RNA)

From the foregoing discussion, it is clear that the debate over proteins versus DNA as the genetic material was unequivocally resolved by the Hershey-Chase experiment. It became an established fact that it is DNA that acts as genetic material. However, it subsequently became clear that [...]

[...] genetic material, but DNA, being more stable, is preferred for storage of genetic information. For the transmission of genetic information, RNA is better.

6.3 RNA WORLD

From the foregoing discussion, an immediate question becomes evident: which is the first genetic material? It shall be discussed in detail in the chapter on chemical evolution, but briefly, we shall highlight some of the facts and points.

RNA was the first genetic material. There is now enough evidence to suggest that essential life processes (such as metabolism, translation, splicing, etc.) evolved around RNA. RNA used to act as a genetic material as well as a catalyst (there are some important biochemical reactions in living systems that are catalysed by RNA catalysts and not by protein enzymes). But RNA, being a catalyst, was reactive and hence unstable. Therefore, DNA has evolved from RNA with chemical modifications that make it more stable. DNA, being double stranded and having a complementary strand, further resists changes by evolving a process of repair.

6.4 REPLICATION

While proposing the double helical structure for DNA, Watson and Crick had immediately proposed a scheme for replication of DNA. Their original statement reads as follows:

“It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material” (Watson and Crick, 1953).
The scheme suggested that the two strands would separate and act as templates for the synthesis of new complementary strands. After the completion of replication, each DNA molecule would have one parental and one newly synthesised strand. This scheme was termed semiconservative DNA replication (Figure 6.6).

Figure 6.6 Watson-Crick model for semiconservative DNA replication

6.4.1 The Experimental Proof

It is now proven that DNA replicates semiconservatively. It was shown first in Escherichia coli and subsequently in higher organisms, such as plants and human cells. Matthew Meselson and Franklin Stahl performed the following experiment in 1958:

(i) They grew E. coli in a medium containing ¹⁵NH₄Cl (¹⁵N is the heavy isotope of nitrogen) as the only nitrogen source for many generations. The result was that ¹⁵N was incorporated into newly synthesised DNA (as well as other nitrogen containing compounds). This heavy DNA molecule could be distinguished from the normal DNA by centrifugation in a cesium chloride (CsCl) density gradient (please note that ¹⁵N is not a radioactive isotope, and it can be separated from ¹⁴N only based on densities).

(ii) Then they transferred the cells into a medium with normal ¹⁴NH₄Cl and took samples at various definite time intervals as the cells multiplied, and extracted the DNA that remained as double-stranded helices. The various samples were separated independently on CsCl gradients to measure the densities of DNA (Figure 6.7).

Can you recall what centrifugal force is, and think why a molecule with higher mass/density would sediment faster?

The results are shown in Figure 6.7.

Figure 6.7 Meselson and Stahl’s Experiment

(iii) Thus, the DNA that was extracted from the culture one generation after the transfer from ¹⁵N to ¹⁴N medium [that is, after 20 minutes; E. coli divides in 20 minutes] had a hybrid or intermediate density.
DNA extracted from the culture after another generation [that is, after 40 minutes, II generation] was [...]

[...] frameshift insertion or deletion mutations. Insertion or deletion of three bases, or a multiple of three, inserts or deletes one or more codons and hence one or more amino acids, and the reading frame remains unaltered from that point onwards.

6.6.2 tRNA – the Adapter Molecule

From the very beginning of the proposition of the code, it was clear to Francis Crick that there has to be a mechanism to read the code and also to link it to the amino acids, because amino acids have no structural specialities to read the code uniquely. He postulated the presence of an adapter molecule that would on one hand read the code and on the other hand would bind to specific amino acids. The tRNA, then called sRNA (soluble RNA), was known before the genetic code was postulated. However, its role as an adapter molecule was assigned much later.

tRNA has an anticodon loop that has bases complementary to the code, and it also has an amino acid acceptor end through which it binds to amino acids. tRNAs are specific for each amino acid (Figure 6.12). For initiation, there is another specific tRNA that is referred to as initiator tRNA. There are no tRNAs for stop codons. Figure 6.12 depicts the secondary structure of tRNA, which looks like a clover-leaf. In its actual structure, the tRNA is a compact molecule which looks like an inverted L.

Figure 6.12 tRNA - the adapter molecule

6.7 TRANSLATION

Translation refers to the process of polymerisation of amino acids to form a polypeptide (Figure 6.13). The order and sequence of amino acids are defined by the sequence of bases in the mRNA. The amino acids are joined by a bond which is known as a peptide bond. Formation of a peptide bond requires energy.
Therefore, in the first phase itself amino acids are activated in the presence of ATP and linked to their cognate tRNA, a process commonly called charging of tRNA or, to be more specific, aminoacylation of tRNA. If two such charged tRNAs are brought close enough, the formation of a peptide bond between them would be favoured energetically. The presence of a catalyst would enhance the rate of peptide bond formation.

The cellular factory responsible for synthesising proteins is the ribosome. The ribosome consists of structural RNAs and about 80 different proteins. In its inactive state, it exists as two subunits: a large subunit and a small subunit. When the small subunit encounters an mRNA, the process of translation of the mRNA to protein begins. There are two sites in the large subunit for subsequent amino acids to bind to and thus be close enough to each other for the formation of a peptide bond. The ribosome also acts as a catalyst (23S rRNA in bacteria is the enzyme, a ribozyme) for the formation of the peptide bond.

Figure 6.13 Translation

A translational unit in mRNA is the sequence of RNA that is flanked by the start codon (AUG) and the stop codon and codes for a polypeptide. An mRNA also has some additional sequences that are not translated and are referred to as untranslated regions (UTR). The UTRs are present at both the 5'-end (before the start codon) and the 3'-end (after the stop codon). They are required for an efficient translation process.

For initiation, the ribosome binds to the mRNA at the start codon (AUG) that is recognised only by the initiator tRNA. The ribosome then proceeds to the elongation phase of protein synthesis. During this stage, complexes composed of an amino acid linked to tRNA sequentially bind to the appropriate codon in mRNA by forming complementary base pairs with the tRNA anticodon. The ribosome moves from codon to codon along the mRNA.
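The idea of a translational unit (a start codon, a run of codons read three bases at a time, and a stop codon, with untranslated regions on either side) can be sketched in Python. This is an illustration under stated assumptions, not a model of the ribosome: the codon table is the standard genetic code written in a compact form, the one-letter amino acid symbols and the function name `translate` are choices made for this sketch, the example mRNA is made up, and taking the first AUG in the sequence as the start codon is a simplification.

```python
# Standard genetic code in compact form: codons are enumerated with the
# bases in the order U, C, A, G; '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Return the polypeptide (one-letter codes) coded by an mRNA.

    Assumes an uppercase sequence containing only A, U, G and C.
    Bases before the first AUG (5' UTR) and after the stop codon
    (3' UTR) are not translated.
    """
    start = mrna.find("AUG")  # simplification: first AUG is the start codon
    if start == -1:
        return ""  # no start codon, so nothing is translated
    polypeptide = []
    # Move from codon to codon, three bases at a time.
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":  # stop codon: no tRNA, translation terminates
            break
        polypeptide.append(residue)
    return "".join(polypeptide)


# Hypothetical mRNA: 5' UTR (GGC), AUG, UUU, GGC, stop (UAA), 3' UTR (AGC).
print(translate("GGCAUGUUUGGCUAAAGC"))  # MFG
```

Here M, F and G stand for methionine, phenylalanine and glycine. Shifting the reading start by one or two bases changes every codon that follows, which is why the reading frame matters.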
Amino acids are added one by one, translated into polypeptide sequences dictated by DNA and represented by mRNA. At the end, a release factor binds to the stop codon, terminating translation and releasing the complete polypeptide from the ribosome.

6.8 REGULATION OF GENE EXPRESSION

Regulation of gene expression is a very broad term, and regulation may occur at various levels. Considering that gene expression results in the formation of a polypeptide, it can be regulated at several levels. In eukaryotes, the regulation could be exerted at
(i) transcriptional level (formation of primary transcript),
(ii) processing level (regulation of splicing),
(iii) transport of mRNA from nucleus to the cytoplasm,
(iv) translational level.

[...]

Figure 6.14 The lac Operon

Lactose is the substrate for the enzyme beta-galactosidase and it regulates switching on and off of the operon. Hence, it is termed the inducer. In the absence of a preferred carbon source such as glucose, if lactose is provided in the growth medium of the bacteria, the lactose is transported into the cells through the action of permease (remember, a very low level of expression of the lac operon has to be present in the cell all the time, otherwise lactose cannot enter the cells). The lactose then induces the operon in the following manner.

The repressor of the operon is synthesised all the time (constitutively) from the i gene. The repressor protein binds to the operator region of the operon and prevents RNA polymerase from transcribing the operon. In the presence of an inducer, such as lactose or allolactose, the repressor is inactivated by interaction with the inducer. This allows RNA polymerase access to the promoter and transcription proceeds (Figure 6.14). Essentially, regulation of the lac operon can also be visualised as regulation of enzyme synthesis by its substrate.

Remember, glucose or galactose cannot act as inducers for the lac operon.
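The repressor logic described above can be written as a small truth-table style sketch in Python. It models only the negative regulation by the repressor and assumes glucose is not competing as a carbon source. The function names and the representation of sugars as a set of strings are assumptions made for illustration.

```python
# Sugars that the text names as inducers of the lac operon.
INDUCERS = {"lactose", "allolactose"}


def repressor_is_active(sugars_in_cell: set) -> bool:
    """The repressor (product of the i gene) is made all the time; it stays
    active unless an inducer interacts with it and inactivates it."""
    return not (sugars_in_cell & INDUCERS)


def operon_is_transcribed(sugars_in_cell: set) -> bool:
    """RNA polymerase transcribes the operon only when no active repressor
    is bound to the operator (negative regulation only)."""
    return not repressor_is_active(sugars_in_cell)


# Compare a few growth conditions; only an inducer switches the operon on.
for sugars in [set(), {"lactose"}, {"glucose"}, {"galactose"}]:
    state = "ON" if operon_is_transcribed(sugars) else "OFF"
    print(sorted(sugars), "->", state)
```

In this sketch the operon is ON only for the lactose case, which matches the statement that glucose and galactose cannot act as inducers.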
Can you think for how long the lac operon would be expressed in the presence of lactose?

Regulation of the lac operon by the repressor is referred to as negative regulation. The lac operon is under the control of positive regulation as well, but it is beyond the scope of discussion at this level.

[...] disorders that affect human beings. Besides providing clues to understanding human biology, learning about non-human organisms' DNA sequences can lead to an understanding of their natural capabilities that can be applied toward solving challenges in health care, agriculture, energy production, and environmental remediation. Many non-human model organisms, such as bacteria, yeast, Caenorhabditis elegans (a free living non-pathogenic nematode), Drosophila (the fruit fly), plants (rice and Arabidopsis), etc., have also been sequenced.

Methodologies: The methods involved two major approaches. One approach focused on identifying all the genes that are expressed as RNA (referred to as Expressed Sequence Tags (ESTs)). The other took the blind approach of simply sequencing the whole genome, containing all the coding and non-coding sequences, and later assigning functions to different regions in the sequence (a step referred to as Sequence Annotation). For sequencing, the total DNA from a cell is isolated and converted into random fragments of relatively smaller sizes (recall that DNA is a very long polymer, and there are technical limitations in sequencing very long pieces of DNA) and cloned in a suitable host using specialised vectors. The cloning resulted in amplification of each DNA fragment so that it could subsequently be sequenced with ease. The commonly used hosts were bacteria and yeast, and the vectors were called BAC (bacterial artificial chromosomes) and YAC (yeast artificial chromosomes).

The fragments were sequenced using automated DNA sequencers that worked on the principle of a method developed by Frederick Sanger.
(Remember, Sanger is also credited with developing the method for determination of amino acid sequences in proteins.) These sequences were then arranged based on some overlapping regions present in them. This required generation of overlapping fragments for sequencing. Alignment of these sequences was humanly not possible. Therefore, specialised computer based programs were developed (Figure 6.15). These sequences were subsequently annotated and were assigned to each chromosome.

Figure 6.15 A representative diagram of human genome project

The sequence of chromosome 1 was completed only in May 2006 (this was the last of the 24 human chromosomes, 22 autosomes and X and Y, to be
