- Genomics proves successful in uncovering the complex nature of cancer
- HPC systems provide insight into the relationship between DNA sequences and chromosomal rearrangements
- Researchers identify cancer subtypes that respond differently to treatments
There is an enormous amount that we do not understand about the fundamental causes and behavior of cancer cells, but at some level, experts believe that cancer must relate to DNA and the genome.
Since the human genome consists of three billion base pairs, scientists use computing and scientific software to find connections in biological data. But genomics is more than simple pattern matching.
"When you move into multi-dimensional, structural, time-series, and population-level studies, the algorithms get a lot harder and they also tend to be more computationally intensive," says Matt Vaughn, director of Life Sciences Computing at the Texas Advanced Computing Center (TACC).
"This requires resources like those at TACC, which help large numbers of researchers explore the complexity of cancer genomes."
Fishing in big data ponds
A group led by Karen Vasquez, professor of pharmacology and toxicology at The University of Texas at Austin (UT Austin), has been working to find correlations between chromosomal rearrangements, one of the hallmarks of cancer genomes, and certain DNA sequences with the potential to fold into secondary structures.
These structures, including hairpin or cruciform shapes, triple or quadruple-stranded DNA, and other naturally-occurring, but alternative, forms, are collectively known as 'potential non-B DNA structures' or PONDS.
PONDS enable genes to replicate and generate proteins and are therefore essential for human life. But scientists also suspect they may be linked to mutations that can elevate cancer risk.
Using the Stampede and Lonestar supercomputers at TACC, Vasquez worked with researchers from the University of Texas MD Anderson Cancer Center and Cardiff University to test the hypothesis that PONDS might be found at, or near, rearrangement breakpoints — locations on a chromosome where DNA might get deleted, inverted, or swapped around.
By analyzing the distribution of PONDS-forming sequences within about 1,000 bases of approximately 20,000 translocation and more than 40,000 deletion breakpoints in cancer genomes, they found a significant association between PONDS-forming sequences and cancer.
"We found that short inverted repeats are indeed enriched at translocation breakpoints in human cancer genomes," says Vasquez.
The correlation recurred in different individuals and patient tumor samples. They concluded that PONDS-forming sequences represent an intrinsic risk factor for genomic rearrangements in cancer genomes.
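To make the idea of a short inverted repeat near a breakpoint concrete, here is a minimal Python sketch. It is not the pipeline the researchers ran on Stampede and Lonestar, which the article does not describe. The function names, the 6-base arm length, and the 4-base maximum spacer are all illustrative assumptions. The sketch simply scans a window around a breakpoint for a short sequence that is followed, after a small gap, by its own reverse complement, the pattern that can fold into a hairpin or cruciform.

```python
# Illustrative sketch only: names and default parameters are assumptions,
# not taken from the study described in the article.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_inverted_repeats(seq: str, arm_len: int = 6, max_spacer: int = 4):
    """Find arms followed, within max_spacer bases, by their reverse complement.

    Returns a list of (start_index, arm_sequence, spacer_length) tuples.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm_len + 1):
        arm = seq[i:i + arm_len]
        if "N" in arm:
            continue  # skip arms containing unknown bases
        target = reverse_complement(arm)
        for spacer in range(max_spacer + 1):
            j = i + arm_len + spacer
            # A slice running past the end is shorter than target, so it cannot match.
            if seq[j:j + arm_len] == target:
                hits.append((i, arm, spacer))
    return hits


def repeats_near_breakpoint(seq: str, breakpoint: int, flank: int = 1000, **kwargs):
    """Report inverted repeats within `flank` bases of a breakpoint.

    Start positions are given in the coordinates of the full sequence.
    """
    lo = max(0, breakpoint - flank)
    hi = min(len(seq), breakpoint + flank)
    window_hits = find_inverted_repeats(seq[lo:hi], **kwargs)
    return [(lo + start, arm, spacer) for start, arm, spacer in window_hits]
```

A real enrichment analysis would also need a background model, for example comparing hit counts near breakpoints against counts near randomly chosen control positions, before calling any association significant.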
"In many cases, translocations are what turn a normal cell into a cancer cell," says co-author Albino Bacolla, a research investigator in molecular and cellular oncology at MD Anderson.
"What we found in our study was that the sites of chromosome breaks are not random along the DNA double helix; instead, they occur preferentially at specific locations. Cruciform structures in the DNA, built by the short, inverted repeats, mark the spots for chromosome breaks, mutations, and potentially initiate cancer development."
Understanding the processes by which PONDS lead to chromosomal rearrangements, and how these rearrangements affect cancer, will be important for future diagnostic and treatment purposes.
Analyzing the genome in action
With the exception of mutations, the genome remains roughly fixed for a given cell line. On the other hand, the transcriptome — the set of all messenger RNA molecules in one cell or a population of cells — can vary with external conditions.
"TACC has been vital to our analysis of cancer genomics data, both for providing the necessary computational power and the security needed for handling sensitive patient genomic datasets." ~ Vishy Iyer
Messenger RNA (mRNA) molecules convey genetic information from DNA to the ribosome, where they specify which proteins the cell should make, a process known as gene expression. Understanding which genes are being expressed in a tumor helps classify tumors more precisely into subgroups so they can be properly treated.
Vishy Iyer, a professor of molecular biosciences at UT Austin, has developed a way to identify sections of DNA that correlate with variations in specific traits, as well as epigenetic, or non-DNA related, factors that impact gene expression levels.
He and his group use this approach on data from The Cancer Genome Atlas (TCGA) to study the effects of genetic variation and mutations on gene expression in tumors. TACC's Stampede supercomputer helps them mine petabytes of data from TCGA to identify genetic variants and subtle correlations that relate to various forms of cancer.
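One simple way to picture the search for variants that correlate with expression is to score each variant by how strongly its genotype tracks a gene's expression level across samples. The Python sketch below does this with a plain Pearson correlation. It is an assumption-laden illustration, not Iyer's method, which the article does not detail. The function names, the 0/1/2 allele-dosage encoding, and the input layout are hypothetical, and a real analysis would add covariates, multiple-testing correction, and far larger data.

```python
# Illustrative sketch only: the scoring scheme and all names are assumptions,
# not the approach used by the researchers in the article.
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal
    return sxy / sqrt(sxx * syy)


def rank_variants(genotypes, expression):
    """Rank variants by absolute correlation with one gene's expression.

    genotypes:  dict mapping a variant ID to per-sample allele dosages (0, 1, or 2)
    expression: per-sample expression values for the gene, in the same sample order
    Returns a list of (variant_id, correlation), strongest association first.
    """
    scores = {vid: pearson(dosages, expression) for vid, dosages in genotypes.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Ranking by absolute value keeps variants that lower expression alongside those that raise it, since either direction can be biologically meaningful.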
Iyer and a team of researchers from UT Austin and MD Anderson Cancer Center report on a genome-wide transcriptome analysis of the two types of cells that make up the prostate gland: prostatic basal and luminal epithelial populations.
"By analyzing gene expression programs, we found that the basal cells in the human prostate showed a strong signature associated with cancer stem cells, which are the tumor originating cells," Iyer says. "This knowledge can be helpful in the development of more targeted therapies that seek to eliminate cancer at its origin."
By identifying these subtle indicators, not just in DNA but in mRNA expression, the work will help improve patient diagnoses and guide the choice of treatment based on the specific cancers involved.
"Next-generation sequencing technology allows us to observe genomes and their activity in unprecedented detail," he says. "It's also making a lot of biomedical research increasingly computational, so it's great to have a resource like TACC available to us."
These projects were supported, in part, by grants from NIH, DOD, the Cancer Prevention and Research Institute of Texas, the MD Anderson Cancer Center Center for Cancer Epigenetics, the Center for Cancer Research, the Lymphoma Research Foundation, and the Marie Betzner Morrow Centennial Endowment.