Cardiologist Mario Deng’s work is an example of how research and clinical practice can work together to ultimately benefit patients. Building on fundamental science of the kind Columbia’s world-renowned researchers are conducting in genomics, Dr. Deng’s translational research analyzed thousands of genes. The result was a blood test, based on 20 of those genes, that can detect whether people with heart transplants are rejecting their donor organs.
     The new diagnostic could replace — in some cases — the invasive and risk-ridden biopsies doctors have performed for the past 30 years to monitor heart rejection in individuals with transplants. With the test, one of the first validated applications of genomics, the study of all the genes in an organism, Dr. Deng has joined the impressive group of researchers — molecular biologists, computer scientists, statisticians, physicists, mathematicians, and clinicians — who are shaping innovative methods and technologies to harness the information associated with the 20,000 to 25,000 genes of the human genome and the way they self-assemble into the cell’s biological circuitry.

Genomic data need massive analysis
Ever since scientists sequenced the human genome in 2001, and even before, they have been generating terabytes (10¹² bytes) of data, collected into computerized databases, about the activity of thousands of genes and proteins active in the different cell types of humans and other organisms. Before the age of genomics, during the past 30 years or so, investigators typically studied single genes and their protein products, learning enormous amounts about their regulation and structure. Indeed, the single-gene approach led to significant progress in understanding basic biological processes and in deciphering Mendelian disorders, such as cystic fibrosis and sickle cell anemia, which follow unique inheritance patterns of single genes. But the majority of chronic conditions, including heart disease, cancer, diabetes, and mental illness, involve many genes, whose activity is affected by such environmental factors as toxins, obesity, and stress.
     Within the past five to 10 years, though, new technologies, such as microarrays and other high-throughput methods, as well as the sequencing of the genomes of many organisms, have allowed researchers to study the activity of many genes simultaneously in a very short time. The challenge now is figuring out the regulation and interaction of the genes, and the proteins for which they code, in normal cellular processes and in disease. Translating the genomic pathways into the next generation of diagnostics and treatments is going to take time and work, as methods of analysis have yet to be standardized and new technologies are still being invented based on work ongoing at Columbia. But the effort will be worth it. Genomics is expected to transform biology by providing a highly detailed elucidation of the nature of life and in the process help to explain disease in a way that will revolutionize and, someday, individualize the practice of medicine.
     As part of a nationwide effort to spur genomic approaches to basic and translational research, the federal government recently awarded more than $50 million to Columbia, capitalizing on the institution’s strengths in many scientific disciplines. The funded efforts range from the macroscopic to the sub-microscopic — from characterizing genomic circuitry in cells to describing the X-ray crystallographic structures of important proteins.
     With the aim of developing computational and scientific infrastructure, as well as the software and data management tools to leverage the vast amount of data from the Human Genome Project, the NIH in September 2005 gave the University an $18.5 million, five-year grant to create a National Center for the Multi-Scale Analysis of Genetic and Cellular Networks (MAGNet). MAGNet is under the leadership of Andrea Califano, Ph.D., professor of bioinformatics, and Barry Honig, Ph.D., Howard Hughes Medical Institute investigator and professor of biochemistry and molecular biophysics.
     To help determine the three-dimensional shapes of proteins, as part of the Protein Structure Initiative, the NIH in July 2005 also awarded Columbia $25 million in a five-year grant. Wayne Hendrickson, Ph.D., University Professor of Biochemistry and Molecular Biophysics, is spearheading this effort, which involves Columbia-based and outside centers. Protein structure is vital for understanding disease and new drug development.
     And in June 2005, the NIH awarded $9 million to James Rothman, Ph.D., director of the Judith P. Sulzberger, M.D., Columbia Genome Center and the Clyde and Helen Wu Professor of Chemical Biology in the Department of Physiology and Cellular Biophysics, to head up one of the centers in the agency’s Molecular Libraries Screening Centers Network at the medical center campus. The Columbia center is focusing on using large-scale methods to identify small molecules that permit the study of genes, cells, and biochemical pathways in health and disease, with the goal of finding new drug targets and treatments.

Genomic blood test vs. invasive biopsy
Companies also are supporting genomic research at Columbia. XDx Inc., a molecular diagnostics company in South San Francisco, has for the past four years sponsored a trial, led by Dr. Deng, of the genomics-based test for heart transplant rejection. The study, called CARGO (the Cardiac Allograft Rejection Gene Expression Observational Study), involved 600 patients in eight U.S. academic medical centers (co-PIs were at Drexel University, University of Maryland, UCLA, Cleveland Clinic, University of Pittsburgh, University of Florida, and Stanford University). Findings from CARGO, first e-published Dec. 19, 2005, in the American Journal of Transplantation, showed that AlloMap, the 20-gene test, could sensitively detect the absence of moderate/severe rejection in these patients.
     “The genomics revolution ushered in by the completion of the Human Genome Project has made possible what was only dreamed about before, namely, the ability to detect rejection of the transplanted heart without taking a biopsy,” says Dr. Deng, who is director of cardiac transplantation research and assistant professor of medicine.
     While the rejection rate is highest in the first year, approximately 3 percent to 5 percent of heart transplant patients experience moderate or severe rejection of their new organ after the first year. Heart muscle biopsies have been the most reliable method for detecting the rejection, until recently. To monitor rejection and guide immunosuppressive therapy, doctors perform biopsies on patients for the rest of their lives. Initially biopsies are performed weekly, then monthly, and then every three to six months throughout the patient’s lifetime.
     Although heart transplant patients understand they must have the multiple biopsies, the procedure is not pleasant. The biopsy involves a clinician threading a catheter through a vein, guiding the tube to the right ventricle, and removing a half-dozen samples. Clinicians then use microscopy to look for the presence of white blood cell infiltration in the heart muscle tissue to assess rejection. The AlloMap test, in contrast, analyzes gene expression in white blood cells from a simple and quick blood sample. Clinicians also assess the health of the patient at the time of the blood drawing.
     Dr. Deng and collaborators developed the test based on the hypothesis that immune cells would express different genes during rejection of foreign tissue and quiescence. Initially, they analyzed 7,000 genes chosen from the medical literature that would be involved in white blood cell activity. They studied the genes in 285 patient samples. From these genes, they selected 250 relevant candidate genes and compared their levels of expression in a total of 145 patients, who either experienced rejection or did not. They then quantified the expression of each of these genes, using what is called the real-time polymerase chain reaction, to ensure that expression levels were valid. With studies in 270 additional patients and further statistical analyses, they eventually reduced the number of genes in the test to 20, which includes housekeeping genes as controls. The process started in 2000, culminating with a commercial product in November 2004. “Since then, there have been even further advances in technology. Therefore, the process from multi-gene analysis to a product could move even faster today,” Dr. Deng says. Furthermore, Dr. Deng is now collaborating with MAGNet investigators to dissect the transcriptional pathways that harbor these genes, leading to an understanding of cardiac allograft rejection at the systems biology level.

Fostering translational research using high-throughput methods
In fact, easily analyzing tens of thousands of genes and performing tests on thousands of samples in a short amount of time is what Dr. Rothman, who took the helm of the genome center in March 2005, is now making possible. Dr. Rothman is a world-renowned cell biologist who has received the Lasker Award and Columbia’s Louisa Gross Horwitz Prize for his discovery of the mechanism of cell secretion. Impressed by the power of high-throughput technology to transform his research in cell biology as well as genomics, he came to Columbia in 2004 with the goal of making this a reality. “With these new developments, by the end of 2006 the Columbia Genome Center will be able to measure in a matter of one or a few days the effects of 100,000 small molecules or each of the genome’s 20,000 to 25,000 genes in thousands of cell cultures,” Dr. Rothman says. Many of these methods, borrowed from the pharmaceutical industry, rely on robotics, automated liquid handling, computerized microscopy, and image analysis as well as sophisticated data analysis tools. “The automated microscope at the Columbia Genome Center (perhaps the only one in academia) allows imaging of 50,000 different cell cultures in one day and monitors the pattern of light emission from hundreds of individual cells genetically engineered to report on the status of a biological pathway or process of interest.”
     Here is how Dr. Rothman envisions collaborations between the genome center and scientists at Columbia working: Let’s say a Columbia researcher has been studying in great detail a few important genes involved in pancreatic β-cells releasing insulin in response to sugar and has developed relevant cell culture assays of this process. By collaborating with the genome center, the researcher would now be able to identify all the genes involved in insulin secretion in response to sugar in the pancreas, Dr. Rothman says.
     Each of the 25,000 genes, Dr. Rothman says, could be tested in hundreds of micro-cell culture plates (which have 384 wells) using gene interference technology, such as siRNA, to measure the effect of the sequential absence of each of the genes. siRNAs, or small interfering RNAs, are short sequences of RNA that interfere with the expression of genes. Before the interference experiment could take place, though, researchers would have to create a measurable assay in which a protein of interest involved in the secretion pathway is marked with a fluorescent dye. Under normal circumstances, the tag would be localized near the cell membrane. But after each of the interfering siRNAs was added to the cell culture samples, automated microscopy would reveal the effect of the absence of each gene by where the tag was now situated inside the cell. “In the majority of cases, the absence of the gene would have no effect on the tagged protein, but in some cases the tagged protein might be stuck in the cytoplasm, showing the gene played some role in the secretion pathway,” Dr. Rothman says.
     Instead of employing interfering RNAs, scientists also could probe marker proteins with microscopy to assess the effects of each of 100,000 small chemicals added to a cell culture split up into thousands of samples. These small molecules could identify drug targets or be chemically modified to become drugs themselves.
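The scale of such a screen can be checked with simple arithmetic. The sketch below is only an illustration, not the genome center's actual plate layout: it assumes one gene or one small molecule per well, and the `replicates` and `control_wells` parameters are hypothetical additions that do not come from the article. The 384-well plate format and the counts of 25,000 genes and 100,000 small molecules are taken from the text.

```python
import math

WELLS_PER_PLATE = 384  # plate format named in the article


def plates_needed(n_conditions, replicates=1, control_wells=0):
    """Plates required when each condition fills one well per replicate.

    `replicates` and `control_wells` are hypothetical knobs for illustration;
    the article does not say how the plates are actually laid out.
    """
    usable_wells = WELLS_PER_PLATE - control_wells
    if usable_wells <= 0:
        raise ValueError("control wells leave no room for samples")
    return math.ceil(n_conditions * replicates / usable_wells)


# One siRNA per gene, one well per siRNA, no controls (an assumption).
genome_plates = plates_needed(25_000)

# One small molecule per well for a 100,000-compound library.
library_plates = plates_needed(100_000)

# Assumed design: triplicate wells and 32 control wells per plate.
triplicate_plates = plates_needed(25_000, replicates=3, control_wells=32)

print(genome_plates, library_plates, triplicate_plates)
```

Under these assumptions a single pass over the genome fits on a few dozen plates, and it takes replicates and controls to reach the "hundreds" of plates Dr. Rothman describes.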
     High-throughput methods, Dr. Rothman explains, also can be used to identify a previously unknown disease susceptibility gene on a particular human chromosome. “Although the human genome has been sequenced, the function of most genes has yet to be determined,” Dr. Rothman says. Some of the genes sequenced by the Human Genome Project and localized to the chromosomes have known functions. But other regions of DNA only have predicted genes, the functions of which remain unknown. “Pedigree analysis of families has been useful in finding many disease-susceptibility genes, but sometimes such genetic methods are not useful because families are too small and statistical methods don’t work well enough to pinpoint the actual disease-modifying gene as distinct from a region that may contain it. Also, multiple genes may be acting, further complicating matters.”
     Peter Antinozzi, Ph.D., an associate research scientist in Dr. Rothman’s laboratory, has developed a new method to accelerate the identification of disease susceptibility genes. In a study published in the Proceedings of the National Academy of Sciences in March 2006, Dr. Antinozzi was able to find two new candidate genes for Type 2 diabetes using this novel protocol.
     The procedure is based on integrating population genetics with gene knockdown techniques and cell-based functional screens. Dr. Antinozzi first selected a section of the human genome that the scientific literature showed was linked with Type 2 diabetes. The region he selected, on chromosome 18, contains 5 million base pairs of DNA and had not been further narrowed down to isolate a causative gene. The sequence for this region was analyzed and 10 genes were selected for further investigation. He then used siRNAs to interrupt the expression of each of the 10 genes and observed the effect on hormone secretion in pancreatic β-cells. He found that reduced expression of one of the genes, laminin α1, impaired insulin secretion. “Laminin α1 is a subunit of the laminin-1 complex. Although the function of laminin-1 has been addressed previously, there had been no prior association of this gene to diabetes risk,” Dr. Antinozzi says.
     In analyzing the literature he found four effectors of laminin-1. He again used siRNAs to reduce expression of these four genes and assayed their effect using the hormone secretion assay. One of these four genes had a strong effect on secretion: the laminin receptor 1, which also had never been linked to diabetes. “Now that these two new genes have been implicated in Type 2 diabetes, I expect our results will instigate population geneticists to focus resources on these two genes to identify the actual allelic variants that are associated to the disease,” Dr. Antinozzi says. “The great promise of identifying novel disease genes lies in their ability to serve as diagnosis tools and potential new targets for drugs.” Dr. Antinozzi is now working on methods to ramp up the research by looking at larger DNA regions and expanding the collection of cell-based assays to study other diseases. He expects this method will be widely applied to other diseases with a demonstrated genetic component.
     “High-throughput methods being developed at the genome center provide unprecedented opportunities at Columbia for researchers to analyze all the genes involved in the biological processes and diseases they are studying,” Dr. Rothman says. “For many years, the very important and seminal findings from basic scientists at Columbia were not easily translated into applications for health and medicine. We now feel the technologies and the expertise at the genome center provide a key nexus for Columbia scientists to transform research into results that will help alleviate human suffering and disease.”

Dissecting the genomic and proteomic circuitry of cells
Developing lists of genes and proteins involved in a normal or disease state is one aspect of genomic research, but actually understanding how all the genes and proteins interact requires very different types of analyses. Inventing methods to understand gene and protein networks is what systems and computational biologists, bioinformaticians, and other scientists, working at MAGNet and elsewhere at Columbia, do.

Genomics at Columbia

Funding in 2005 from the National Institutes of Health is supporting new work by Columbia researchers in these programs:

MAGNet, the National Center for Multi-Scale Analysis of Genetic and Cellular Networks ($18.5 million), part of the National Centers for Biomedical Computing, a network of seven centers created to begin developing the computational and scientific infrastructure as well as software and data management tools needed to leverage the vast core data generated in part by the Human Genome Project. MAGNet is directed by Andrea Califano and Barry Honig.

Molecular Libraries Screening Centers Network ($9 million), directed by James Rothman, as part of an $88.9 million NIH program to establish, at nine institutions, a collaborative research network that will use high-tech screening methods to identify small molecules that can be used as research tools. Small molecules have great potential to help scientists in their efforts to learn more about key biological processes involved in human health and disease.

Components of the Protein Structure Initiative ($25 million), a national effort to determine the three-dimensional shapes of a wide range of proteins. This structural information will help reveal the roles that proteins play in health and disease and will help point the way to designing new medicines. Columbia researchers are leading or participating in three of the 10 new research centers:

The New York Consortium on Membrane Protein Structure, led by Wayne Hendrickson. Other faculty: Burkhard Rost, Barry Honig, Lawrence Shapiro, Ming Zhou, John Hunt, Ann McDermott, and Filippo Mancia.

The New York Structural Genomics Research Consortium (led by SGX Pharmaceuticals Inc., a company co-founded by Barry Honig and Wayne Hendrickson). Other faculty: Lawrence Shapiro.

The Northeast Structural Genomics Consortium, led by Rutgers University’s Gaetano Montelione. Columbia faculty: Burkhard Rost, Barry Honig, Wayne Hendrickson, Peter Allen, Liang Tong, John Hunt, and Andrew Laine.

     “With approximately 20,000 to 25,000 genes in the human genome, there are trillions of potential two- and three-way interactions among genes and proteins inside the cell,” Dr. Califano explains. “Exploring each one in the laboratory would take a long time, even with current high-throughput methods. Instead, we are using computers and the new methods of systems biology to predict which proteins are interacting with each other and with DNA and how these interactions change in disease. Inferring these networks and circuits will have a major impact on how we understand basic cellular processes and how these are dysregulated in disease.” Systems biology is the holistic study of biological systems, using computational tools and high-throughput data to infer, model, and simulate the myriad interactions among DNA, RNA, and proteins in the cell.
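The "trillions" figure can be sanity-checked by counting combinations. The sketch below makes a simplifying assumption that is not in the article: it counts unordered sets of two or three distinct genes, with one product per gene, and ignores protein variants and the direction of an interaction. The function name `interaction_counts` is a hypothetical label used only for this illustration.

```python
from math import comb


def interaction_counts(n_genes):
    """Return (pairs, triples): the unordered 2- and 3-gene combinations."""
    return comb(n_genes, 2), comb(n_genes, 3)


# The article's lower and upper estimates of the human gene count.
for n_genes in (20_000, 25_000):
    pairs, triples = interaction_counts(n_genes)
    print(f"{n_genes:,} genes: {pairs:,} pairs, {triples:,} triples")
```

On this count there are a few hundred million possible pairs and between roughly 1.3 and 2.6 trillion possible triples, which is consistent with the quotation and shows why testing every combination at the bench is impractical.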
     MAGNet researchers are part of another Columbia center called C2B2, the Center for Computational Biology and Bioinformatics, co-directed by Drs. Honig and Califano. C2B2 is an interdepartmental center whose goal is to catalyze research at the interface between biology and the computational and physical sciences. C2B2 supports active research programs in areas such as computational biophysics and structural biology, the modeling of regulatory, signaling, and metabolic networks, pattern recognition, machine learning, and functional genomics. The centers bring together talent from Columbia’s Washington Heights and Morningside campuses and include faculty from biochemistry and molecular biophysics, biomedical informatics, biological sciences, chemistry, computer science, applied physics and applied mathematics, electrical engineering, and the Center for Computational Learning Systems.
     In the past, biologists often studied the cell by looking at each of its component parts, Dr. Califano explains. But such an approach is akin to trying to understand the operation of an automobile engine by looking only at a few selected parts, such as the fuel injectors, the rotor, and the spark plugs. Knowing how these parts work individually does not provide sufficient information to understand the operation of the entire engine, let alone be able to fix it when it malfunctions. Likewise, medicine has been limited in its ability to understand biological functions and treat disease at the molecular level because the integrated study of a large number of interrelated components of the cells, i.e., the systems biology approach, was too complex an endeavor to be easily undertaken. Now, using advanced computational tools and high-throughput biological data, including the massive amount of data generated by many genome projects, investigators can finally start to model complex cellular processes to make predictions that are starting to be validated in the lab with increasingly high success rates.
     Riccardo Dalla-Favera, M.D., the Percy and Joanne Uris Professor, professor of genetics and development, professor of pathology, director of the Institute for Cancer Genetics, and director of the Herbert Irving Comprehensive Cancer Center, collaborates with Dr. Califano. They use genomic approaches to understand cancer, particularly B cell lymphomas, Dr. Dalla-Favera’s area of expertise. In April 2005, Dr. Dalla-Favera, Dr. Califano, and others published a paper in Nature Genetics describing an algorithm to define the regulatory networks in human B cells, the first such study using systems biology methodology in human cells.
     The researchers analyzed mRNA expression patterns from 340 microarrays of different types of B cells, including normal cells, a variety of lymphomas, and experimentally manipulated cell lines. They used an algorithm called ARACNE to study which of the 20,000 genes in a B cell were transcriptionally activated or repressed by specific transcription factors, such as the MYC proto-oncogene. MYC is an important transcription factor, which if translocated may lead to various types of cancer, including some lymphomas. Their algorithm was able to identify 56 putative targets of MYC that are specific to B cells. Twelve of these, which had not been previously reported, were further investigated and 11 of them were validated in vivo as bona fide MYC targets using chromatin immunoprecipitation. This is an unprecedented validation rate for a network reverse engineering tool. A more recent version of the algorithm has led to the identification of more than 200 putative MYC targets in B cells and more than 100 post-translational modulators of MYC function. The method thus provides a model for studying transcriptional networks in the cell on a global scale rather than one gene at a time.
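The article does not describe how ARACNE works internally. In its published description, the algorithm scores gene pairs by the mutual information between their expression profiles and then uses the data processing inequality to discard the weakest link in each fully connected triplet, on the reasoning that it most likely reflects an indirect relationship. The toy sketch below illustrates only that pruning step. It is not the authors' implementation: the mutual information values are invented, the gene names other than MYC are placeholders, and the `threshold` and `tolerance` parameters are simplified stand-ins.

```python
from itertools import combinations


def pair(a, b):
    """Unordered gene pair used as a dictionary key."""
    return frozenset((a, b))


def dpi_prune(mi, threshold=0.0, tolerance=0.0):
    """Keep edges above `threshold`, then drop the weakest edge of each triangle.

    `mi` maps pair(a, b) to a mutual-information score (toy values here).
    All triangles are examined on the thresholded graph before anything is
    removed, so the result does not depend on the order of examination.
    """
    genes = sorted({g for edge in mi for g in edge})
    edges = {edge for edge, score in mi.items() if score > threshold}
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triangle = [pair(a, b), pair(a, c), pair(b, c)]
        if not all(edge in edges for edge in triangle):
            continue
        weakest = min(triangle, key=lambda edge: mi[edge])
        strongest_two = [edge for edge in triangle if edge != weakest]
        # Remove the weakest edge only if it is clearly below the other two.
        if mi[weakest] < min(mi[e] for e in strongest_two) * (1 - tolerance):
            to_remove.add(weakest)
    return edges - to_remove


# Invented scores: MYC regulates GENE_A, which in turn regulates GENE_B,
# so the MYC to GENE_B signal is indirect and weaker.
toy_mi = {
    pair("MYC", "GENE_A"): 0.8,
    pair("GENE_A", "GENE_B"): 0.6,
    pair("MYC", "GENE_B"): 0.3,
}
kept = dpi_prune(toy_mi)
print(sorted(tuple(sorted(edge)) for edge in kept))
```

In this toy case the indirect MYC to GENE_B edge is removed and the two direct edges survive, which is the behavior that lets a method of this kind propose candidate direct targets for follow-up experiments such as chromatin immunoprecipitation.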
     Dr. Dalla-Favera says the algorithm is important because it will allow researchers to elucidate important pathways in cells de novo, without knowing them in advance. He compares cellular networks to air traffic patterns in the United States. The genome represents all the airports. Gene expression profiles are the flights. Bioinformatics, computational biology, and systems biology provide the methods to characterize which flights are linked and how a delay affects the rest of the system. The MYC protein, he says, is a major airport hub. “If problems occur there, the circuitry of the cell is disturbed as it is in cancer and as it would be if Chicago’s O’Hare were snowed under,” Dr. Dalla-Favera explains. “While we know the major airports, scientists still don’t know the cellular traffic patterns but we are developing tools to understand them. Understanding key hubs in normal cell activity should provide drug targets and clues to when things go awry in disease.”

From the macroscopic to the sub-microscopic
Seeing the overall pattern of thousands of protein-DNA and protein-protein interactions is very important in understanding disease, but when it comes down to designing new drugs, scientists still need to know the actual structure of a protein and its binding properties. Although all the genes have been sequenced, only a small percentage of the structures of the corresponding proteins are known. The Protein Structure Initiative aims to determine the structures of all the proteins encoded by the genome. The number of proteins is greater than the number of genes because of variations in the way the genes get spliced and then read by the cell’s protein-making machinery.
     But proteins have motifs, such as certain twists and turns, that are reflected in the DNA sequence, and proteins with similar motifs make up families and often have similar functions. One mission of the Protein Structure Initiative at Columbia, under the leadership of Dr. Hendrickson, a world leader in X-ray crystallography of proteins, will be to determine the structures of a particular class of proteins, those that reside within the cell membrane. Relatively little structural information is available for membrane-bound proteins, because of technical difficulties. Dr. Hendrickson has developed a methodology called MAD, or multiple wavelength anomalous diffraction, which might help to overcome the technical difficulties. “One aspect of the Columbia initiative is to determine the X-ray structure of a good representative in the family of membrane-bound proteins,” Dr. Hendrickson says. “Having the structural information of the prototype would speed up the process of getting the structure of other members of the family.”
     “We hope that the Protein Structure Initiative will allow us to develop a new view of the relationships between protein sequence, protein structure, and protein function that will ultimately make the 3-D structures and functions of most proteins predictable from the protein sequence,” says Dr. Honig, who is developing methods to predict protein structures from sequence. “Researchers working in the different aspects of genomic science look at a problem from different scales. For example, some look at how molecules inside cells interact and others how amino acids inside proteins interact.”

The future of genomic medicine
The route from fundamental science to clinical application is demonstrated by the development of the blood test to assess tissue rejection. After researchers discovered the relevant genes by using microarray screening technologies and literature searches, a test was developed and clinical research validated the findings. The research continues, as Dr. Deng works with MAGNet’s Dr. Califano to study the organ recipient’s cellular networks using the tools of computational and systems biology.
     Before genomics becomes more widely used in the practice of medicine, significant basic research and clinical studies must be conducted. While many systems biology studies have been done in lower organisms, more studies need to be done with human genes. Interpretation of microarray results has yet to be standardized for either basic or clinical use. Computer modeling of different interactions of genes and proteins must be able to predict what happens if a new gene or protein is thrown into the algorithm. Understanding the dance a protein undergoes as it responds to a change in the environment is a goal that won’t be reached simply or quickly. Clinical trials must follow to test new genomic-based diagnostics and drugs. Nevertheless, the discipline of genomics offers great promise in filling the gaps between health and disease, in a way never before seen in the history of medicine.
