|
|
How the Genome was won how to access these data | |
|
Describing the genes gene function and expression | ||
|
Human diversity and the draft sequence genetic mutations and disease | ||
Comparative genomics and evolution comparing different organisms | |
|
Gene expression when genes are turned on | ||
|
Fighting disease what can the draft gene therapy |
|
The complete sequence of the human genome, a feat of unparalleled cooperation among the many laboratories of an international consortium, backed by funding and technology on a grand scale, is now close to reality and will underpin human biology and medicine for the next century.
The most significant milestone to date is likely the production of the 'draft sequence'. This represents approximately 90% of the total human genome sequence: 15% in 'finished' form and 75% in draft. This poster shows that the draft sequence can be applied to many sequence-based investigations in lieu of the finished sequence. In that case, why finish? Finishing will be necessary to give more assurance that the existing sequence is correct, and to fill in the gaps to make a complete human genome. |
|
|
The companion paper in Science magazine: the sequence for approximately 90% of the human genome has been published in the June 2000 issue of Science magazine. |
|
The tutorials are available for printing in two formats: 1. as an HTML page 2. in PDF (portable document format) |
|
How the Genome was won
A clone-based approach to sequencing the genome means that overlapping stretches of DNA, whose approximate locations are known, are sequenced. Each base must be sequenced multiple times for accurate assembly of the final product. For the working draft, data were generated at half the normal level of redundancy so that the sequence could be produced twice as fast. As a consequence, some small gaps remain where the sequence was not sampled. Despite this, well over 95% of the bases of each clone are typically represented. Immediate uses include the ability to 'unsplice' cDNAs into their exons, for example to re-evaluate the EST database and recalculate the UniGene clusters. |
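The trade-off between redundancy and gaps described above can be illustrated with a rough calculation. The sketch below uses the idealised Poisson model of random shotgun sampling, in which the expected fraction of bases sampled at least once at redundancy c is 1 - e^(-c). The redundancy values (8-fold for "normal", 4-fold for the draft) are assumptions chosen for illustration and are not figures from this page; real clones deviate from the idealised model.

```python
import math


def expected_fraction_covered(redundancy: float) -> float:
    """Poisson estimate of the fraction of bases sampled at least once.

    Assumes reads land uniformly at random, which is an idealisation.
    """
    return 1.0 - math.exp(-redundancy)


if __name__ == "__main__":
    # Hypothetical redundancy levels, for illustration only.
    for label, redundancy in [("normal", 8.0), ("draft", 4.0)]:
        covered = expected_fraction_covered(redundancy)
        print(f"{label:>6}: {redundancy:.0f}-fold redundancy, "
              f"about {covered:.2%} of bases expected to be sampled")
```

Under these assumptions, halving the redundancy still leaves the expected coverage above 95%, which is in line with the statement above about each clone.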
|
Producing the draft
The draft of the human genome is the result of a joint venture between 25 international genome sequencing centers. Click here for a list of these sequencing centers. |
|
Table. Draft sequence properties
Properties of the draft sequence (another column showing data for finished sequence?) |
|
Figure. Pie chart
Pie chart showing proportions of chromosome landscape - repetitive, pseudogenes, heterochromatin, a-satellite etc. |
|
Tutorial
How to access the human genome data and look at a map. |
|
Describing the genes
Estimates of the number of genes in the human genome have ranged from 50,000 to 100,000. We are now confident that there are 89,453 genes. Of these, XXXX have previously been characterized, suggesting that there are XXXX genes that we have never come across before. The automated annotation of coding regions by GeneScan, followed by analysis of the predicted genes with domain and motif finders, will be vital for getting the best value out of the draft sequence. The ultimate goal will be an encyclopedia of all human genes, linked together by information on function and expression. Having the draft sequence in hand means that this process can begin. For example, there are XXX predicted proteins that contain a kinase domain. |
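As a toy illustration of what a motif finder does with a predicted protein, the sketch below scans a sequence for the glycine-rich loop pattern G-x-G-x-x-G, which is commonly found in the ATP-binding region of protein kinases. It is a stand-in for illustration only: real domain finders use profiles or hidden Markov models rather than a single regular expression, a match to this short pattern alone does not establish a kinase domain, and the protein sequence shown is invented.

```python
import re

# Glycine-rich loop pattern G-x-G-x-x-G written as a regular expression.
GLYCINE_LOOP = re.compile(r"G.G..G")


def find_motif(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in GLYCINE_LOOP.finditer(protein)]


if __name__ == "__main__":
    # Hypothetical predicted protein fragment, not a real database entry.
    predicted_protein = "MKLLGTGSFGRVMLVK"
    for position, match in find_motif(predicted_protein):
        print(f"motif {match} at residue {position}")
```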
|
Figure. Schematic showing annotation.
schematic of a piece of draft DNA and how it is annotated, including kinase domain. |
|
Tutorial. UniGene or LocusLink.
show links from draft sequence to a UniGene cluster or LocusLink |
|
Human diversity and the draft sequence
Many human phenotypes are inherited, and the majority cannot be assigned to a single inherited gene or mutation. Single nucleotide polymorphisms (SNPs) are sites in the genome where one base is frequently switched for another; in combination with other SNPs, they may give rise to a complex trait. If the genomes of two individuals are compared, about one base in 1000 is a SNP site. The draft sequence will provide a reference for mapping SNPs, such as the ... factor V (common in Caucasians, rare in other ethnic groups). On a grand scale, this will begin to show the dynamics of human population migration and reveal the details of our own evolution. |
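The comparison described above can be sketched in a few lines: given two sequences that are already aligned to the same length, report each position where the bases differ, and estimate how many differences the one-in-1000 figure would predict for a region of a given length. The two sequences below are invented, and the function names are hypothetical; a single-base difference between two individuals is only a candidate SNP site until its frequency in a population is known.

```python
def find_mismatches(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for each difference."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def expected_differences(region_length: int, bases_per_snp: int = 1000) -> float:
    """Expected number of differing sites, using the one-in-1000 figure from the text."""
    return region_length / bases_per_snp


if __name__ == "__main__":
    # Invented sequences for illustration only.
    reference = "ACGTTGCAAGCTTAGGCTAA"
    individual = "ACGTTGCAAGCTTAGACTAA"
    for position, ref_base, alt_base in find_mismatches(reference, individual):
        print(f"position {position}: {ref_base} -> {alt_base}")
    print(expected_differences(1_000_000), "differences expected per million bases")
```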
|
Figure. SNP alignment for a disease.
alignment (including a bit of the draft sequence) showing a snp for some disease |
|
Tutorial. BLAST.
This tutorial demonstrates how the Basic Local Alignment Search Tool (BLAST) can be used to compare a piece of draft sequence with sequences in dbSNP. |
|
Comparative genomics and evolution
The comparison of complete genomes from different organisms can not only reveal fascinating evolutionary connections, but also provide model systems for human disease. Now that we have the draft sequence to hand, comparative studies that include humans are much more meaningful. Of the XXX genes that are present in all organisms sequenced to date, XXX/all are found within the XXXX predicted protein sequences in humans. This suggests that XX% of human genes are involved in higher organizational functions. |
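The counts left as placeholders above come down to set operations over gene or protein families. The following sketch shows the shape of that calculation on made-up data: which families are shared by every other organism, how many of those also occur in human, and what fraction of human families appear in none of the others. The organism contents and family labels are placeholders and do not represent real results; the function name is hypothetical.

```python
def summarize_conservation(families_by_organism: dict[str, set[str]],
                           target: str = "human"):
    """Return (universal, universal_in_target, target_only) family sets."""
    others = [fams for name, fams in families_by_organism.items() if name != target]
    universal = set.intersection(*others)          # present in every other organism
    universal_in_target = universal & families_by_organism[target]
    target_only = families_by_organism[target] - set.union(*others)
    return universal, universal_in_target, target_only


if __name__ == "__main__":
    # Placeholder data for illustration only.
    toy_families = {
        "human": {"family_A", "family_B", "family_C", "family_D", "family_E"},
        "yeast": {"family_A", "family_B", "family_C"},
        "worm": {"family_A", "family_B", "family_D"},
        "fly": {"family_A", "family_B", "family_D", "family_F"},
    }
    universal, in_human, human_only = summarize_conservation(toy_families)
    share = len(human_only) / len(toy_families["human"])
    print(f"{len(in_human)} of {len(universal)} universal families found in human")
    print(f"{share:.0%} of human families found in no other organism")
```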
|
Figure 1. Expansion of a gene family
A superfamily of conserved BRCT domains |
|
Figure 2. Schematic comparing organisms
schematic of what percentage of who matches who |
|
Figure 3. Chart
chart of selected domain categories: total number of these & % genome of human, yeast, fly, worm, H influenzae ? suggest e.g.: kinases, zinc fingers, SH2 domains, lipases, P-loops, homeboxes, ubiquitin, ABC transporters. This covers a range of cellular functions. |
|
Tutorial: COGs
compare a piece of draft sequence to COGs |
|
Gene expression
Making use of the working draft, it is possible to obtain more accurate sequence corresponding to each mRNA, plus sequence for some additional genes that are not well represented at the mRNA level. One strategy for exploring how the genome functions is to determine how all of the genes are turned on and off in a coordinated way. Experimental systems that have been developed for monitoring the activities of thousands of genes are mostly based on mRNA sequence data, much of which is relatively inaccurate. Furthermore, the genomic context afforded by the working draft facilitates the study of DNA signals that may play a role in regulating gene expression. |
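One common way to summarise whether a gene is "turned on" or "turned off" between two conditions is the log2 ratio of its expression levels. The sketch below is a minimal illustration under stated assumptions: the gene names and expression values are invented, the pseudocount of 1 and the two-fold threshold are arbitrary choices, and real microarray analysis involves normalisation and statistical testing that are omitted here.

```python
import math


def log2_fold_change(disease: float, control: float, pseudocount: float = 1.0) -> float:
    """Log2 ratio of disease to control expression; the pseudocount avoids division by zero."""
    return math.log2((disease + pseudocount) / (control + pseudocount))


def classify(change: float, threshold: float = 1.0) -> str:
    """Label a gene as up, down, or unchanged using a two-fold (log2 = 1) cut-off."""
    if change >= threshold:
        return "up"
    if change <= -threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Invented expression levels as (disease, control) pairs; gene names are hypothetical.
    expression = {"gene_1": (7.0, 1.0), "gene_2": (1.0, 7.0), "gene_3": (3.0, 3.0)}
    for gene, (disease, control) in expression.items():
        change = log2_fold_change(disease, control)
        print(f"{gene}: log2 fold change {change:+.1f} ({classify(change)})")
```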
|
Figure. Microarray data
demonstration of, e.g., a clinical diagnostic application of microarray analysis, showing the difference between disease and non-disease states. |
|
Tutorial. PubMed.
PubMed search? |
|
Fighting disease
All of the above approaches will help find disease genes. SNPs: mapping of contributory factors for complex diseases and, for example, assessment of the risk of drug side effects for different phenotypes. Microarrays: gene expression profiles in disease and non-disease states, and alteration of gene expression with drug treatment. Comparative genomics: systems for modeling human disease, and help in defining potential drug targets among genes found in pathogenic bacteria or other organisms but not in humans. Knowing the true level of redundancy for some genes will assist in gene therapy approaches (e.g. diabetes). |
|
Tutorial. NCBI resources.
A tour of NCBI resources that shows how different types of information can be gathered about a particular disease, starting from or intersecting a draft record. |
|
Websites of interest
Human Genomes Guide at NCBI: the human genome at a glance
Human Genome Sequencing at NCBI: sequencing progress one chromosome at a time
Genomic Biology at NCBI: genomic-scale science, whole genomes and related resources
Entrez Genomes: assemblies of complete genomes for over 600 organisms
UniGene: non-redundant sets of sequences that represent genes
LocusLink: combines descriptive and sequence information of human genes
GeneMap'99: includes the locations of more than 30,000 human genes
NCI: the National Cancer Institute at NIH
NHGRI: the National Human Genome Research Institute at NIH
The Human Genome Project at NHGRI: the Human Genome Project at the National Human Genome Research Institute |
| Disclaimer | Privacy statement | Revised May 9, 2000 |