The challenges of big data for cancer drug discovery

24/12/13 - by Joe Dunckley

The field of targeted personalised medicine is maturing, driven by the response to recent challenges -- including a new wave of big data.

One of the most important developments in cancer treatment in the past couple of decades has been the introduction of targeted treatments, enabled by a growing understanding of the gene mutations that transform healthy cells into cancers. Designed to attack the products of these rogue genes -- the protein drivers of cancer growth -- the targeted treatments that are now benefiting patients are often more effective than traditional chemotherapeutics and much less toxic, because they home in on the disease cells while sparing more of the healthy ones.

But defeating a disease like cancer was never going to be quite as simple as that. Fantastic fighters though our first generation of targeted treatments are, cancer is a formidable foe and there remain some major challenges for researchers. Professor Paul Workman and Dr Bissan Al-Lazikani from the Cancer Research UK Cancer Therapeutics Unit here at The Institute of Cancer Research, London, recently assessed these challenges in the leading journal Nature Reviews Drug Discovery and suggested some solutions to the big data challenge based on their own research.

The biggest challenge highlighted in their analysis is the sheer number and diversity of genes involved both within and between cancers. The large number of as yet undrugged cancer genes leads to difficulty in prioritising these for drug discovery. Furthermore, the diversity of cancer genes leads to the development of drug resistance: the treatment drives the cancer into remission, but over time the disease begins to return as the drug becomes ineffective against it. Target just one mutation and the chances are that it will not be carried by every cell in the tumour, so some will slip through the net. In addition, tumours can evolve during treatment such that the genes that are driving the cancer can change -- and so will require alternative treatments.

When the first targeted treatments were being developed, our understanding of the genetic heterogeneity both between and within tumours was still very patchy, and it was also difficult to determine which mutations carried by cancer cells were driving the disease and which were random harmless passengers picked up during its development. This incomplete picture of the biology of the tumour meant that our early targeted treatments were often opportunistic pursuits of the few insights into the working of the cancer cell that we did already possess.

Paul and Bissan argue that we are now in a good position to solve these problems, by building a much more complete picture not only of how cancer cells work, but also of which cancer genes should be targeted for maximum benefit. In the past decade, large-scale projects have produced mountains of data on the biology of cancer cells. This is especially true in genomics, where projects like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) are working towards sequencing the genomes of tens of thousands of tumours from dozens of different cancer types. The amount of data far outstrips what has been collected over the lifetime of the Hubble Space Telescope. This deluge of data is full of signs pointing to the next targets for drug discovery and to strategies for pre-empting and overcoming drug resistance. But it creates a new problem of its own: how to sift through all the data, find the needles in the haystack, and prioritise the best potential targets -- especially those that could be attacked in targeted combinations to tackle tumour heterogeneity and drug resistance.

Their answer to this challenge is CanSAR, the vast database developed by Bissan's team here at The Institute of Cancer Research (ICR). With a powerful search interface that taps into intelligent tools for cross-referencing results from across the different fields of cancer research, CanSAR can highlight the most promising genes to target.

Using CanSAR, Bissan, Paul and their colleagues have already drawn up a shortlist of candidates. To do this they first narrowed down the initial list of hundreds of known cancer genes to focus on those genes that are found to be mutated in cancers again and again, by several different big genome data studies like TCGA and ICGC, in large numbers of patients and in multiple types of cancer.
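The narrowing-down step described above can be pictured with a small sketch. This is not CanSAR's code or the team's actual method: the record layout, the function name `recurrent_genes`, the three thresholds and the placeholder gene names are all assumptions made purely for illustration. It simply keeps genes that are reported by several studies, in several cancer types, and in enough patients overall.

```python
from collections import defaultdict
from typing import NamedTuple


class MutationRecord(NamedTuple):
    """One hypothetical study-level observation of a mutated gene."""
    gene: str
    study: str
    cancer_type: str
    patients_mutated: int


def recurrent_genes(records, min_studies=2, min_cancer_types=2, min_patients=50):
    """Return genes seen in enough studies, cancer types and patients.

    The thresholds are illustrative assumptions, not published cut-offs.
    """
    studies = defaultdict(set)
    cancer_types = defaultdict(set)
    patients = defaultdict(int)

    # Collect, per gene, which studies and cancer types report it
    # and how many patients carry a mutation in total.
    for record in records:
        studies[record.gene].add(record.study)
        cancer_types[record.gene].add(record.cancer_type)
        patients[record.gene] += record.patients_mutated

    return sorted(
        gene
        for gene in studies
        if len(studies[gene]) >= min_studies
        and len(cancer_types[gene]) >= min_cancer_types
        and patients[gene] >= min_patients
    )


# Toy data with placeholder names; these are not real measurements.
TOY_RECORDS = [
    MutationRecord("GENE_A", "study_1", "breast", 40),
    MutationRecord("GENE_A", "study_2", "lung", 35),
    MutationRecord("GENE_B", "study_1", "breast", 90),
    MutationRecord("GENE_C", "study_1", "colon", 5),
    MutationRecord("GENE_C", "study_2", "lung", 4),
]

print(recurrent_genes(TOY_RECORDS))
```

In this toy data only GENE_A passes all three filters: GENE_B is common but appears in a single study, and GENE_C is seen in two studies but in very few patients.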

With the resulting shortlist of 58 genes, Paul and Bissan used CanSAR to combine and analyse all available information from genetics, cell biology, chemistry, structural biology and pharmacology studies. They drew on both past laboratory research and computer projections to assess “druggability” -- that is, how practical it would be to design a small-molecule drug acting on the protein product of each cancer driver gene. The key unique feature of the ICR team’s approach is that it provides a computer-based view that is multidimensional, multidisciplinary and comparative across all the data, rather than just focusing on any one gene and any one type of data, such as the genetics. This can give a much better picture of which genes to focus our drug discovery efforts on.

And by looking at the bigger picture of groups of genes linked to cancer, we can begin to see the patterns of how cancer genes conspire together in functional networks -- ones that not only create and support advanced cancer but also allow resistance to develop. Analysing networks rather than individual genes should help us to prioritise drug targets -- or existing drugs -- that will work best in the sort of clever drug combinations that will be needed to overcome resistance and defeat cancer. This network analysis feature and others have been enhanced in version 2 of CanSAR, recently released as a tool for the whole research community.
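As a rough illustration of what looking at networks rather than individual genes can mean, the sketch below groups genes into connected clusters from a list of pairwise interactions. It is a generic connected-components search, not the network analysis implemented in CanSAR; the function name `functional_groups`, the input format and the placeholder gene names are assumptions for the sake of the example.

```python
from collections import defaultdict, deque


def functional_groups(interactions):
    """Group genes into connected clusters of an undirected interaction graph.

    `interactions` is a list of (gene, gene) pairs. Each returned group is a
    sorted list of genes that are linked to one another, directly or not.
    """
    neighbours = defaultdict(set)
    for gene_a, gene_b in interactions:
        neighbours[gene_a].add(gene_b)
        neighbours[gene_b].add(gene_a)

    seen = set()
    groups = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first search collects every gene reachable from `start`.
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            gene = queue.popleft()
            group.append(gene)
            for linked in sorted(neighbours[gene]):
                if linked not in seen:
                    seen.add(linked)
                    queue.append(linked)
        groups.append(sorted(group))
    return groups


# Toy interactions with placeholder names; these are not real biology.
TOY_INTERACTIONS = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
]

print(functional_groups(TOY_INTERACTIONS))
```

Here GENE_A, GENE_B and GENE_C fall into one group and GENE_D and GENE_E into another. A candidate drug combination might then be chosen to hit more than one member of the same group, though how targets are really prioritised depends on far richer data than this sketch uses.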

Driven by the new big data, the discovery of targeted treatments is now entering a second phase of maturity. Intelligent tools like CanSAR light the way and help us map the paths ahead.