The announcement in March that the human genome map is finally complete is a massive step in the drive to prevent and cure genetic diseases. But to unravel the secrets hidden in those 3 billion units of DNA, a vast amount of computer power is required. At Columbia, a special computing facility contained within the Genome Center (funded by P&S and a large consortium of other academic and medical research institutions) is equipped with the most advanced tools to provide researchers with this necessary computing power.
Created in 1997, AMDeC (Academic Medicine Development Company) is a 39-institution consortium that seeks to enhance the competitiveness and quality of biomedical research in New York state. AMDeC creates facilities like the one at the Genome Center, which allow the consortium members to share resources and expertise.
AMDeC has allocated $2.5 million to expand the computational capacity at the Columbia Genome Center to form the AMDeC Bioinformatics Core Facility. The facility, which opened in 2001, is designed to carry out large-scale research projects with the capacity to directly access the more than 100 whole-genome and specialty databases routinely downloaded and annotated on local Genome Center servers. With a comprehensive array of hardware, software, and expert staff, the core facility enables Columbia researchers to perform sophisticated genomic studies to help identify gene family relationships, predict protein structure or function, or identify disease-related genes based upon evolutionary links to known genes in model organisms.
To speed such computationally demanding projects, the Genome Center has installed several specialized computing systems, the most recent of which is a Beowulf cluster. The cluster divides each project into many smaller segments and analyzes them simultaneously across its network of 90 processors, yielding much faster results. Clusters provide power comparable to some supercomputers at a fraction of the price.
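The divide-and-combine idea behind a cluster can be illustrated with a minimal Python sketch. It is not the Genome Center's software: the function names are hypothetical, the task (counting G and C bases) is chosen only for simplicity, and a thread pool on one machine stands in for the separate processors of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(sequence, n_segments):
    """Cut a sequence into at most n_segments contiguous pieces of near-equal size."""
    size = max(1, -(-len(sequence) // n_segments))  # ceiling division
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def count_gc(segment):
    """Work done on one segment: count the G and C bases it contains."""
    return sum(1 for base in segment if base in "GC")


def parallel_gc_count(sequence, n_workers=4):
    """Split the job, hand each segment to a worker, then combine the partial results."""
    segments = split_into_segments(sequence, n_workers)
    # The thread pool is a stand-in (assumption) for the nodes of a cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_gc, segments))
    return sum(partial_counts)
```

Because each segment is processed independently, adding workers shortens the wall-clock time on a real cluster; the final step only has to add up the partial answers.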
Aside from the hardware, the facility also has the commercial and academic software most commonly used by researchers for gene and genome analysis. The Genome Center staff makes sure it has the latest copies of the major publicly available DNA sequence databases as well. "We provide a scientific and technical staff that can assist with or collaborate on individual projects," says Dr. T. Conrad Gilliam, disease gene mapping expert and director of the Genome Center.
Researchers can use the facility to conduct one-time projects or to generate their own ongoing analysis pipeline. "We're a catalyst for Columbia researchers and AMDeC members to get their research done," says Dr. James J. Russo, large-scale DNA sequencing expert and senior research scientist at the Genome Center.
The facility's scientists have been traveling to AMDeC member sites to spread the word about the bioinformatics core facility, says Dr. Kenneth C. Smith, a senior programmer at the Genome Center and project manager of the facility. "We're picking up about one or two new large projects after each presentation," he says. "At least half of all active projects involve P&S researchers."
One current effort, led by P&S researcher Dr. Rudolph Leibel, co-director of the Naomi Berrie Diabetes Center and head of the Division of Molecular Genetics, entails a base-by-base scan of multiple chromosomal regions for single-nucleotide polymorphisms that predispose individual mouse strains to develop diabetes. This work, in turn, informs the search for human diabetes genes. Dr. Stuart Fischer, research scientist at the Genome Center and expert in genome technologies and software applications, has adapted powerful commercial software to design a program and database that automate the detection, management, and analysis of candidate variants in this international disease gene search.
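A base-by-base comparison of this kind can be sketched in a few lines of Python. This is an illustration only, not Dr. Fischer's program: the function name is hypothetical, and the sketch assumes the two sequences are already aligned and of equal length, which real data would first require an alignment step to achieve.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for every single-base difference.

    Positions are 1-based. Both sequences are assumed to be pre-aligned
    and of equal length (an assumption of this sketch).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Example: two strains that differ at positions 3 and 6.
differences = find_snps("ACGTAC", "ACCTAT")
```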
Dr. Russo and colleagues are sequencing the genome of Legionella pneumophila, the bacterium that causes Legionnaires' disease, a form of pneumonia. They are using the GeneMatcher2 hardware accelerator with its 9,216 parallel processing cells to perform ultra-fast dynamic programming algorithms in search of novel genes and gene classes. The group routinely compares all DNA sequences in the Legionella genome against a comprehensive set of specialty databases on the updated local servers to help deduce protein structure and function. This feat would quickly exhaust the capacity of conventional hardware.
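The article does not say which dynamic programming algorithms the accelerator runs. One widely used example for sequence comparison is Smith-Waterman local alignment, sketched below in Python under that assumption; the function name and the scoring values (match, mismatch, gap) are illustrative choices, not the facility's settings.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Each table cell holds the best score of an alignment ending at that
    pair of positions; only the previous row is kept to save memory.
    """
    previous_row = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current_row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current_row[j] = max(
                0,                                # start a fresh local alignment
                previous_row[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous_row[j] + gap,            # gap in b
                current_row[j - 1] + gap,         # gap in a
            )
            best = max(best, current_row[j])
        previous_row = current_row
    return best
```

The work grows with the product of the two sequence lengths, which is why comparing a whole genome against many databases is costly on conventional hardware and benefits from many cells computing table entries in parallel.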
Dr. Eric Kandel, University Professor at P&S, and Dr. Jingyue Ju, associate professor of chemical engineering and head of DNA sequencing and chemical biology in the Genome Center, recently finished a large-scale analysis of more than 30,000 Aplysia (sea slug) cDNAs in addition to expressed sequence tags (partial DNA sequences of expressed genes). Using one of the facility's machines, a 44-processor BlastMachine cluster, Genome Center researchers assisted graduate student John Edwards, a member of Dr. Ju's laboratory, in a base-by-base comparison of the Aplysia sequences with all publicly deposited sequences. The massive analysis was completed in less than two days and provided the first glimpse of whole-genome functional gene classification in this classic model organism.
Previous projects have included an epilepsy study, a collaboration between Dr. Ruth Ottman, professor of epidemiology at the Mailman School of Public Health, and scientists at the Genome Center that resulted in the identification of a novel epilepsy gene, the Leucine-rich Glioma Inactivated 1 (LGI1) gene. Dr. Pavel Morozov, research scientist at the Genome Center and expert in mathematics and evolutionary biology, conducted a detailed phylogenetic analysis that implicates LGI1 in the process of neuronal migration. If supported by experimentation, this finding promises to open new frontiers in the study of epilepsy. "The facility will be helpful to many investigators here and at other AMDeC member institutions," Dr. Russo says. "Each additional project gives us more expertise, which means we can help researchers do their genomic research more efficiently."
For more information see http://amdec-bioinfo.cu-genome.org