
Five ways we can harness the power of Big Data to transform cancer research and treatment


Professor Paul Workman, Chief Executive of The Institute of Cancer Research, London, sets out what needs to happen for Big Data to deliver on its promise to revolutionise cancer research and care.

Posted on 21 April, 2016 by Professor Paul Workman

Big Data cloud (image: Dr Bissan Al-Lazikani et al)

Image courtesy of Costas Mitsopoulos, Amanda C. Schierz, Paul Workman and Bissan Al-Lazikani (figure from the 2015 PLOS paper, Distinctive Behaviors of Druggable Proteins in Cellular Networks)

When I took over as Chief Executive of The Institute of Cancer Research, London in November 2014, I announced a three-point plan to overcome the greatest challenge in cancer research — cancer’s ability to adapt, evolve and become resistant to treatment.

The plan set out my intention for the ICR to focus our drug discovery and development research on tackling resistance: firstly by expanding the number of cancer genes we target; secondly by developing clever new combinations of smart, targeted therapies; and thirdly by discovering network drugs that hit cancer hard in multiple ways at once.

To pursue priorities such as these successfully, we need to be making best use of numerous new technologies. And perhaps most important of all, we need to be establishing better ways of bringing together very large volumes of scientific data from multiple sources.

I’ve written before about how many of the most challenging research problems will only be solved through multidisciplinary approaches and by investigating cancer as a whole system, one which is changing and evolving in space and time. When many different disciplines are generating data for a project, it’s critical to ensure that we can effectively integrate all the different forms of information.

The sheer scale of the data that we generate ourselves at the ICR — especially when combined with massive publicly available data sets — means that we must greatly expand our ability to store, process, visualise and exchange information.

To transform treatment and overcome evolution and resistance to drug therapy in the clinic, as well as to enhance precision radiotherapy, surgery and immunological treatments, we will need to bring together mind-boggling amounts of genome sequencing and other ‘omics’ data, along with pathology information, imaging data, patient notes and much more. This approach has a name, Big Data, and it has become a hot topic in medical research, just as it is having an impact everywhere in modern life today.

Earlier this year, I attended the Future of Healthcare Investors Forum, organised by the London Stock Exchange and MedCity, an organisation promoting life sciences across London and the south east. There was a strong focus on the importance of health data, digital health and diagnostics, as well as the new generation of therapies. I chaired an expert panel discussion on how genomics technology and data are transforming the healthcare industry.

A major theme that ran through the sessions was excitement about the new technologies that are becoming available, the decreasing costs and democratisation of genome sequencing, and the general importance and disruptive impact of Big Data. There was widespread agreement on the potential of these changes to transform the future of healthcare.

And London recently held its first Festival of Genomics, in which two of our senior researchers at the ICR, Dr Bissan Al-Lazikani and Professor Nazneen Rahman, presented on how we are harnessing the explosion of data in cancer research to benefit patients and why we need to be smarter in our use of genetic data.

I’ve been thinking about the major challenges and opportunities that we face if we want to ensure that the new technologies and information explosion have the greatest benefit for the lives of patients. Here I suggest five approaches we’re going to have to take to ensure that Big Data delivers on its promise to revolutionise cancer research and care.

1. We need to integrate non-clinical and clinical data much more effectively


The term Big Data gets us thinking of the sheer size of the information sets we need to analyse — moving from the megabyte level a few years ago to the familiar gigabytes and terabytes of today and accelerating rapidly to petabytes, exabytes and zettabytes. The total amount of data generated worldwide was estimated to reach four zettabytes in 2014.
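To make those scales concrete, here is a short worked conversion between the units. It is only a sketch: it assumes decimal (SI) prefixes, where each step up is a factor of 1,000, and the helper name `to_bytes` is purely illustrative.

```python
# Decimal (SI) storage units, smallest to largest; each step is a factor of 1,000.
UNITS = [
    "byte", "kilobyte", "megabyte", "gigabyte",
    "terabyte", "petabyte", "exabyte", "zettabyte",
]


def to_bytes(value, unit):
    """Convert a quantity in the named unit to bytes (illustrative helper)."""
    return value * 1000 ** UNITS.index(unit)


# The four-zettabyte worldwide estimate quoted above, expressed in petabytes.
four_zb_in_pb = to_bytes(4, "zettabyte") // to_bytes(1, "petabyte")
print(f"{four_zb_in_pb:,} petabytes")  # prints "4,000,000 petabytes"
```

In other words, an organisation working at the petabyte scale is still a factor of millions below the estimated worldwide total.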

Our research needs at the ICR alone are already into the petabyte range. But it’s not only the quantity of data which presents a challenge — it’s also its complexity and diversity.

Big Data can include information from many different areas of science and medicine, recorded in multiple formats. It covers everything from genomic sequences to complex medical images, and includes unstructured information written in free text as well as measurements taken across time.

By integrating and analysing the entire complexity of the data as a whole rather than individual data types — using Big Data analytics and machine learning — we have the potential to uncover new knowledge that can help shape the way we search for cancer treatments or implement new models of care.

At the ICR we have already demonstrated that such data integration can be achieved in the area of drug discovery by integrating biological, chemical and clinical research data in our canSAR knowledgebase. This contains billions of experimental data points from multiple disciplines.

CanSAR integrates and ‘translates’ data from these multiple disciplines into the same language, allowing us to analyse them together. This can be used, for example, to select the best targets for new drug discovery. Building on this success, we now want to extend this approach to other areas of cancer research and treatment.
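The general idea of translating records from different disciplines into one shared form can be sketched in a few lines. This is not how canSAR itself is implemented: the `Observation` schema, the field names and the sample values below are all invented placeholders, used only to illustrate the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A hypothetical common schema shared by every data source."""
    target: str    # shared identifier for the gene or protein
    source: str    # discipline the record came from
    measure: str   # what was measured
    value: float


def from_biology(rec):
    # Assumed biology record layout: {"gene": ..., "expr": ...}
    return Observation(rec["gene"].upper(), "biology", "expression", float(rec["expr"]))


def from_chemistry(rec):
    # Assumed chemistry record layout: {"protein_target": ..., "ic50_nM": ...}
    return Observation(rec["protein_target"].upper(), "chemistry", "potency_nM", float(rec["ic50_nM"]))


def integrate(biology, chemistry):
    """Translate both sources into Observations and group them by target."""
    merged = {}
    translated = [from_biology(r) for r in biology] + [from_chemistry(r) for r in chemistry]
    for obs in translated:
        merged.setdefault(obs.target, []).append(obs)
    return merged


# Placeholder records, not real experimental data.
biology = [{"gene": "gene_a", "expr": "2.5"}]
chemistry = [
    {"protein_target": "GENE_A", "ic50_nM": 12},
    {"protein_target": "GENE_B", "ic50_nM": 300},
]
combined = integrate(biology, chemistry)
```

Once every record uses the same identifiers and fields, evidence about a single target from different disciplines sits side by side and can be analysed together.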

I firmly believe that by using Big Data it will increasingly be possible to predict outcomes for cancer (and other) patients with greater accuracy; to increase the precision of drug, radiation and surgical treatment; and to adapt therapy in real time to defeat cancer evolution and drug resistance.

But integrating non-clinical research data with clinical data is a real challenge. Some types of data have common agreed standards, but many others do not. Clinical notes, for example, exist in many different formats, including hand-written comments, sometimes with incomplete or inconsistent annotations. Patient-Reported Outcome Measures comprise an increasingly valuable but complex dataset. And past clinical trials were generally not designed with Big Data analytics in mind, so it is often challenging to take results from these and analyse the information in new ways.

So how can we best integrate and analyse these together with genomic and imaging data?

We need to find ways of standardising the collection of data, and new ways to integrate and analyse it as a research community. We also need a cultural change in clinical trials and clinical practice that allows better collection and analysis of data to benefit all patients.

At the ICR, we are moving towards our vision of transforming cancer outcomes through research by developing our own Big Data platform. Called the Knowledge Hub, this project is led by Dr Al-Lazikani.

Our new initiative will standardise formats for storage of data across the organisation, so that it can be shared, understood and analysed by all our researchers. A key goal of the Knowledge Hub is to provide the Big Data analysis and prediction capability to adapt therapy to individual patients at a specific time, based on data collected from the patient and analysed against the knowledge-base in real time.

This kind of Big Data initiative is vital. It’s essential that we get this right now if we’re going to reap the benefits of Big Data for research, healthcare and the UK economy.

2. We need to invest in the infrastructure to support Big Data initiatives


If we’re to take full advantage of exciting new technologies to collate and share complex data, then we’re going to need investment in the infrastructure to support them.

Data collection is getting cheaper and faster, but we haven’t seen the same progress in new systems to bring all of these different types of data together, and to analyse the information effectively.

Researchers in this area, such as Dr Al-Lazikani, often define Big Data as working on a volume and intricacy of information that is beyond most current capabilities to handle and analyse.

So the cancer community as a whole needs to invest in infrastructure that can give us this capability. Current national activities such as the MRC eMedLab initiative and the European Bioinformatics Institute showcase some of the key data-sharing initiatives that support broader medical research.

At the ICR, we are investing in a digital programme to underpin our work in this area, from our infrastructure to our digital vision. We know that it is critical that we build more capacity, storage and transfer capability to help us to share data reciprocally across our research community both independently and in collaboration with the major national initiatives.

3. We need to invest in people with Big Data skills

Student at a computer (photo: iStock)


Another vital area of investment will be in ensuring we have people with the right skills to manage Big Data projects.

Big Data analytics is still an emerging field and there are a limited number of people who have the right combination of skills and experience. It can be hard for all research organisations, including the ICR, to recruit the highly skilled staff we need, particularly when we’re competing with big companies like Google and IBM for the same pool of talented people.

So we need to train more people with skills in Big Data analysis, and equally importantly we need to develop them in a different way. There is no point training up people in using specific technologies that could be redundant three or four years later. We must produce people with skills which will allow them to adapt to changing technologies and then become future leaders in the field.

The next generation of data scientists need to be trained in a combination of mathematics, statistics and computing science, allowing them to apply their skills to many different challenges in data mining and artificial intelligence.

Research organisations also need to be flexible and imaginative in their recruitment. At the ICR, we have hired people from the fields of nuclear physics and astrophysics to work on our Big Data projects, as they have strong backgrounds in computer science, maths and statistics and are experienced at dealing with data on a huge scale, even if they haven’t had prior experience of working in cancer research.

These people are keen to join us because they are excited about the challenges of cancer research and the potential to help transform the lives of patients. The power of the new computational and mathematical modelling approaches is demonstrated by the recent research from the ICR’s Dr Andrea Sottoriva — who originally trained in computer science and modelling, and worked in astroparticle physics as a scientific programmer on the first undersea neutrino telescope, as well as taking an internship at the CERN Large Hadron Collider. He later switched to biomedical research and now uses genomics, computational approaches and mathematical modelling to understand cancer as a complex system.

Andrea’s research is changing the way we think about the evolution of cancers and our ability to predict how individual cancers will behave and should be treated. This new perspective indicates that although cancer genomes are incredibly complex, the way tumours grow can be predicted using relatively simple rules. This could be very relevant to the way cancers are treated and is a great example of Big Data in practice.

4. We must ensure the NHS is able to embrace Big Data


Accessing and analysing Big Data will not only shape the way we do scientific and medical research, but could also have a major impact on the way care is delivered.

This will require the NHS and other healthcare systems around the world to embrace the potential of Big Data, so they can embed it in their models of patient care, and also be aware of the needs of research organisations when collecting clinical information.

The signs are positive. At the Future of Healthcare Investors Forum event, Life Sciences Minister George Freeman gave an upbeat speech outlining his vision for the future.

He wants the NHS to become a better adopter of technology with ‘every patient a research patient’ and ‘every hospital a research hospital’. It’s a message I strongly support.

Mr Freeman emphasised that better use of informatics, digitalisation and diagnostic biomarkers could modernise the NHS — not only leading to better patient outcomes, but also saving money. I welcome the Minister’s focus on the importance of innovation, technology and growth within the healthcare sector.

He also strongly advocated the need for academia, the NHS and industry to work together to deliver great science, improved outcomes and economic benefits rather than tending to work in silos as they do now.

It is exactly this kind of integrated and joined-up approach that we take at the ICR and in our partnership with The Royal Marsden NHS Foundation Trust.

Our drug discovery programme is closely integrated with creative clinical trials in the Drug Development Unit, allowing us to simultaneously develop new treatments and molecular biomarkers, and to use them together in personalised diagnosis and treatment. We are now working as a single team to expand our collaborations across e-health and to develop Big Data applications for clinical research. We are linking closely with other national initiatives and the National Disease Registry to ensure that we sow the seeds for comprehensive, joined-up cancer care in the future.

Analysis of large complex datasets will be critical in all this, allowing researchers to contribute information about a cancer and its response to treatment to help shape the care of individual patients.

5. We should see Big Data as an opportunity not a threat


Last year the House of Commons Science and Technology Select Committee carried out an inquiry into Big Data, detailing some of the opportunities and challenges in the development of Big Data technologies.

The report was broad in its scope, covering everything from transport to investment banking, but it contained a fascinating and revealing section on Big Data in healthcare.

The relevant section focused mainly on access to patient data, covering public attitudes, safeguards and, in particular, the project, and the concerns which had caused it to be delayed.

The report did say that the potential benefits of Big Data were particularly significant in healthcare, but its main focus seemed to be around controlling the negatives by responding to public perceptions of data sharing and ensuring use of data is properly regulated.

Of course, appropriate confidentiality and security are essential. It’s vital that there is public confidence in the safeguards that exist over data access. But it’s also critical that we don’t lose sight of the enormous benefits that Big Data can have for scientific and medical research.

Access to patient data is absolutely vital for the research conducted at organisations like the ICR.

In fact, the two agendas of data transparency and data protection do not need to be at odds with each other. Patient data must be stored safely and securely to ensure patient confidentiality, but we also need to make sure that such safeguards do not come at the cost of efficient access to patient data for research use. Patients are often passionate advocates for the use of their data towards improving therapy and survival, provided that the data are appropriately used. We need to work with patients to form these working practices.

It is important that we build trust in the way organisations like the ICR use patient data. To do so, more information needs to be provided both to the public and to clinicians about how information is stored and accessed, and we need more public debate.

Here at the ICR we have been involved in joint campaigns and information projects such as the Personal Data Saves Lives campaign and the patient records information available on the Wellcome Trust website. We are also interacting with Cancer Research UK to see how we can work together to build on the progress we have made.

I think a really important message is that we need to see Big Data initiatives as an opportunity, not as a threat. If we get this right, Big Data will transform our whole approach to scientific and medical research, and deliver huge benefits for patients.

