Wedge, D.C., Gundem, G., Mitchell, T., Woodcock, D.J., Martincorena, I., Ghori, M., Zamora, J., Butler, A., Whitaker, H., Kote-Jarai, Z., et al.
(2018). Sequencing of prostate cancers identifies new cancer genes, routes of progression and drug targets. Nat genet,
Prostate cancer represents a substantial clinical challenge because it is difficult to predict outcome and advanced disease is often fatal. We sequenced the whole genomes of 112 primary and metastatic prostate cancer samples. From joint analysis of these cancers with those from previous studies (930 cancers in total), we found evidence for 22 previously unidentified putative driver genes harboring coding mutations, as well as evidence for NEAT1 and FOXA1 acting as drivers through noncoding mutations. Through the temporal dissection of aberrations, we identified driver mutations specifically associated with steps in the progression of prostate cancer, establishing, for example, loss of CHD1 and BRCA2 as early events in cancer development of ETS fusion-negative cancers. Computational chemogenomic (canSAR) analysis of prostate cancer mutations identified 11 targets of approved drugs, 7 targets of investigational drugs, and 62 targets of compounds that may be active and should be considered candidates for future clinical trials.
Rosenstein, B.S., Rao, A., Moran, J.M., Spratt, D.E., Mendonca, M.S., Al-Lazikani, B., Mayo, C.S. & Speers, C.
(2018). Genomics, bio specimens, and other biological data: Current status and future directions. Med phys,
Antolin, A.A., Tym, J.E., Komianou, A., Collins, I., Workman, P. & Al-Lazikani, B.
(2018). Objective, Quantitative, Data-Driven Assessment of Chemical Probes. Cell chem biol,
Chemical probes are essential tools for understanding biological systems and for target validation, yet selecting probes for biomedical research is rarely based on objective assessment of all potential compounds. Here, we describe the Probe Miner: Chemical Probes Objective Assessment resource, capitalizing on the plethora of public medicinal chemistry data to empower quantitative, objective, data-driven evaluation of chemical probes. We assess >1.8 million compounds for their suitability as chemical tools against 2,220 human targets and dissect the biases and limitations encountered. Probe Miner represents a valuable resource to aid the identification of potential chemical probes, particularly when used alongside expert curation.
Cato, L., Neeb, A., Sharp, A., Buzón, V., Ficarro, S.B., Yang, L., Muhle-Goll, C., Kuznik, N.C., Riisnaes, R., Nava Rodrigues, D., et al.
(2017). Development of Bag-1L as a therapeutic target in androgen receptor-dependent prostate cancer. Elife,
Targeting the activation function-1 (AF-1) domain located in the N-terminus of the androgen receptor (AR) is an attractive therapeutic alternative to the current approaches to inhibit AR action in prostate cancer (PCa). Here we show that the AR AF-1 is bound by the cochaperone Bag-1L. Mutations in the AR interaction domain or loss of Bag-1L abrogate AR signaling and reduce PCa growth. Clinically, Bag-1L protein levels increase with progression to castration-resistant PCa (CRPC) and high levels of Bag-1L in primary PCa associate with a reduced clinical benefit from abiraterone when these tumors progress. Intriguingly, residues in Bag-1L important for its interaction with the AR AF-1 are within a potentially druggable pocket, implicating Bag-1L as a potential therapeutic target in PCa.
Martínez-Jiménez, F., Overington, J.P., Al-Lazikani, B. & Marti-Renom, M.A.
(2017). Rational design of non-resistant targeted cancer therapies. Sci rep,
Drug resistance is one of the major problems in targeted cancer therapy. A major cause of resistance is changes in the amino acids that form the drug-target binding site. Despite the numerous efforts made to individually understand and overcome these mutations, there is a lack of comprehensive analysis of the mutational landscape that can prospectively estimate drug-resistance mutations. Here we describe and computationally validate a framework that combines the cancer-specific likelihood with the resistance impact to enable the detection of single point mutations with the highest chance to be responsible for resistance to a particular targeted cancer therapy. Moreover, for these treatment-threatening mutations, the model proposes alternative therapies overcoming the resistance. We exemplified the applicability of the model using EGFR-gefitinib treatment for Lung Adenocarcinoma (LUAD) and Lung Squamous Cell Cancer (LSCC) and the ERK2-VTX11e treatment for melanoma and colorectal cancer. Our model correctly identified the known resistance mutations, including the classic EGFR-T790M and the ERK2-P58L/S/T mutations. Moreover, the model predicted new, previously undescribed mutations as potentially responsible for drug resistance. Finally, we provided a map of the predicted sensitivity of alternative ERK2 and EGFR inhibitors, with a particular highlight of two molecules with a low predicted resistance impact.
Coker, E.A., Mitsopoulos, C., Workman, P. & Al-Lazikani, B.
(2017). SiGNet: A signaling network data simulator to enable signaling network inference. Plos one,
Network models are widely used to describe complex signaling systems. Cellular wiring varies in different cellular contexts and numerous inference techniques have been developed to infer the structure of a network from experimental data of the network's behavior. To objectively identify which inference strategy is best suited to a specific network, a gold standard network and dataset are required. However, suitable datasets for benchmarking are difficult to find. Numerous tools exist that can simulate data for transcriptional networks, but these are of limited use for the study of signaling networks. Here, we describe SiGNet (Signal Generator for Networks): a Cytoscape app that simulates experimental data for a signaling network of known structure. SiGNet has been developed and tested against published experimental data, incorporating information on network architecture, and the directionality and strength of interactions to create biological data in silico. SiGNet is the first tool to simulate biological signaling data, enabling an accurate and systematic assessment of inference strategies. SiGNet can also be used to produce preliminary models of key biological pathways following perturbation.
Tym, J.E., Mitsopoulos, C., Coker, E.A., Razaz, P., Schierz, A.C., Antolin, A.A. & Al-Lazikani, B.
(2016). canSAR: an updated cancer research and drug discovery knowledgebase. Nucleic acids res,
canSAR (http://cansar.icr.ac.uk) is a publicly available, multidisciplinary, cancer-focused knowledgebase developed to support cancer translational research and drug discovery. canSAR integrates genomic, protein, pharmacological, drug and chemical data with structural biology, protein networks and druggability data. canSAR is widely used to rapidly access information and help interpret experimental data in a translational and drug discovery context. Here we describe major enhancements to canSAR including new data, improved search and browsing capabilities, new disease and cancer cell line summaries and new and enhanced batch analysis tools.
Al-Lazikani, B. & Workman, P.
(2016). Minimizing bias in target selection by exploiting multidisciplinary Big Data and the protein interactome. Future med chem,
Campbell, J., Ryan, C.J., Brough, R., Bajrami, I., Pemberton, H.N., Chong, I.Y., Costa-Cabral, S., Frankum, J., Gulati, A., Holme, H., et al.
(2016). Large-Scale Profiling of Kinase Dependencies in Cancer Cell Lines. Cell rep,
One approach to identifying cancer-specific vulnerabilities and therapeutic targets is to profile genetic dependencies in cancer cell lines. Here, we describe data from a series of siRNA screens that identify the kinase genetic dependencies in 117 cancer cell lines from ten cancer types. By integrating the siRNA screen data with molecular profiling data, including exome sequencing data, we show how vulnerabilities/genetic dependencies that are associated with mutations in specific cancer driver genes can be identified. By integrating additional data sets into this analysis, including protein-protein interaction data, we also demonstrate that the genetic dependencies associated with many cancer driver genes form dense connections on functional interaction networks. We demonstrate the utility of this resource by using it to predict the drug sensitivity of genetically or histologically defined subsets of tumor cell lines, including an increased sensitivity of osteosarcoma cell lines to FGFR inhibitors and SMAD4 mutant tumor cells to mitotic inhibitors.
Workman, P., Clarke, P.A. & Al-Lazikani, B.
(2016). Blocking the survival of the nastiest by HSP90 inhibition. Oncotarget,
It is now recognised that genetic, epigenetic and phenotypic heterogeneity within individual human cancers is responsible for therapeutic resistance - knowledge that is having a profound impact on current thinking and experimentation. There has been concern that molecularly targeted therapy is doomed to failure, with resistant clones emerging in response to the Darwinian selective pressure of any drug treatment. However, two studies have shown that the evolution of drug resistance can be restrained by co-administration of a pharmacologic inhibitor of the HSP90 molecular chaperone.
Antolin, A.A., Workman, P., Mestres, J. & Al-Lazikani, B.
(2016). Polypharmacology in Precision Oncology: Current Applications and Future Prospects. Curr pharm des,
Over the past decade, a more comprehensive, large-scale approach to studying cancer genetics and biology has revealed the challenges of tumor heterogeneity, adaption, evolution and drug resistance, while systems-based pharmacology and chemical biology strategies have uncovered a much more complex interaction between drugs and the human proteome than was previously anticipated. In this mini-review we assess the progress and potential of drug polypharmacology in biomarker-driven precision oncology. Polypharmacology not only provides great opportunities for drug repurposing to exploit off-target effects in a new single-target indication but through simultaneous blockade of multiple targets or pathways offers exciting opportunities to slow, overcome or even prevent inherent or adaptive drug resistance. We highlight the many challenges associated with exploiting known or desired polypharmacology in drug design and development, and assess computational and experimental methods to uncover unknown polypharmacology. A comprehensive understanding of the intricate links between polypharmacology, efficacy and safety is urgently needed if we are to tackle the enduring challenge of cancer drug resistance and to fully exploit polypharmacology for the ultimate benefit of cancer patients.
Pearl, L.H., Schierz, A.C., Ward, S.E., Al-Lazikani, B. & Pearl, F.M.
(2015). Therapeutic opportunities within the DNA damage response. Nat rev cancer,
The DNA damage response (DDR) is essential for maintaining the genomic integrity of the cell, and its disruption is one of the hallmarks of cancer. Classically, defects in the DDR have been exploited therapeutically in the treatment of cancer with radiation therapies or genotoxic chemotherapies. More recently, protein components of the DDR systems have been identified as promising avenues for targeted cancer therapeutics. Here, we present an in-depth analysis of the function, role in cancer and therapeutic potential of 450 expert-curated human DDR genes. We discuss the DDR drugs that have been approved by the US Food and Drug Administration (FDA) or that are under clinical investigation. We examine large-scale genomic and expression data for 15 cancers to identify deregulated components of the DDR, and we apply systematic computational analysis to identify DDR proteins that are amenable to modulation by small molecules, highlighting potential novel therapeutic targets.
Mitsopoulos, C., Schierz, A.C., Workman, P. & Al-Lazikani, B.
(2015). Distinctive Behaviors of Druggable Proteins in Cellular Networks. Plos comput biol,
The interaction environment of a protein in a cellular network is important in defining the role that the protein plays in the system as a whole, and thus its potential suitability as a drug target. Despite the importance of the network environment, it is neglected during target selection for drug discovery. Here, we present the first systematic, comprehensive computational analysis of topological, community and graphical network parameters of the human interactome and identify discriminatory network patterns that strongly distinguish drug targets from the interactome as a whole. Importantly, we identify striking differences in the network behavior of targets of cancer drugs versus targets from other therapeutic areas and explore how they may relate to successful drug combinations to overcome acquired resistance to cancer drugs. We develop, computationally validate and provide the first public domain predictive algorithm for identifying druggable neighborhoods based on network parameters. We also make available full predictions for 13,345 proteins to aid target selection for drug discovery. All target predictions are available through canSAR.icr.ac.uk. Underlying data and tools are available at https://cansar.icr.ac.uk/cansar/publications/druggable_network_neighbourhoods/.
Bulusu, K.C., Tym, J.E., Coker, E.A., Schierz, A.C. & Al-Lazikani, B.
(2014). canSAR: updated cancer research and drug discovery knowledgebase. Nucleic acids res,
canSAR (http://cansar.icr.ac.uk) is a public integrative cancer-focused knowledgebase for the support of cancer translational research and drug discovery. Through the integration of biological, pharmacological, chemical, structural biology and protein network data, it provides a single information portal to answer complex multidisciplinary questions including--among many others--what is known about a protein, in which cancers is it expressed or mutated, and what chemical tools and cell line models can be used to experimentally probe its activity? What is known about a drug, its cellular sensitivity profile and what proteins is it known to bind that may explain unusual bioactivity? Here we describe major enhancements to canSAR including new data, improved search and browsing capabilities and new target, cancer cell line, protein family and 3D structure summaries and tools.
Walters, Z.S., Villarejo-Balcells, B., Olmos, D., Buist, T.W., Missiaglia, E., Allen, R., Al-Lazikani, B., Garrett, M.D., Blagg, J., Shipley, J., et al.
(2014). JARID2 is a direct target of the PAX3-FOXO1 fusion protein and inhibits myogenic differentiation of rhabdomyosarcoma cells. Oncogene,
Rhabdomyosarcomas (RMS) are the most frequent soft-tissue sarcoma in children and characteristically show features of developing skeletal muscle. The alveolar subtype is frequently associated with a PAX3-FOXO1 fusion protein that is known to contribute to the undifferentiated myogenic phenotype of RMS cells. Histone methylation of lysine residues controls developmental processes in both normal and malignant cell contexts. Here we show that JARID2, which encodes a protein known to recruit various complexes with histone-methylating activity to their target genes, is significantly overexpressed in RMS with PAX3-FOXO1 compared with the fusion gene-negative RMS (t-test; P < 0.0001). Multivariate analyses showed that higher JARID2 levels are also associated with metastases at diagnosis, independent of fusion gene status and RMS subtype (n = 120; P = 0.039). JARID2 levels were altered by silencing or overexpressing PAX3-FOXO1 in RMS cell lines with and without the fusion gene, respectively. Consistent with this, we demonstrated that JARID2 is a direct transcriptional target of the PAX3-FOXO1 fusion protein. Silencing JARID2 resulted in reduced cell proliferation coupled with myogenic differentiation, including increased expression of Myogenin (MYOG) and Myosin Light Chain (MYL1) in RMS cell lines representative of both the alveolar and embryonal subtypes. Induced myogenic differentiation was associated with a decrease in JARID2 levels and this phenotype could be rescued by overexpressing JARID2. Furthermore, we showed that JARID2 binds to and alters the methylation status of histone H3 lysine 27 in the promoter regions of MYOG and MYL1 and that the interaction of JARID2 at these promoters is dependent on EED, a core component of the polycomb repressive complex 2 (PRC2). Therefore, JARID2 is a downstream effector of PAX3-FOXO1 that maintains an undifferentiated myogenic phenotype that is characteristic of RMS. JARID2 and other components of PRC2 may represent novel therapeutic targets for treating RMS patients.
Al-Lazikani, B. & Workman, P.
(2013). Unpicking the combination lock for mutant BRAF and RAS melanomas. Cancer discov,
Large-scale, unbiased combinatorial drug screening has been used to identify effective genotype-selective therapeutic combinations that show promising activity in preclinical models of mutant BRAF and RAS melanoma that are resistant to the clinical BRAF inhibitor vemurafenib.
Gonzalez de Castro, D., Clarke, P.A., Al-Lazikani, B. & Workman, P.
(2013). Personalized cancer medicine: molecular diagnostics, predictive biomarkers, and drug resistance. Clin pharmacol ther,
The progressive elucidation of the molecular pathogenesis of cancer has fueled the rational development of targeted drugs for patient populations stratified by genetic characteristics. Here we discuss general challenges relating to molecular diagnostics and describe predictive biomarkers for personalized cancer medicine. We also highlight resistance mechanisms for epidermal growth factor receptor (EGFR) kinase inhibitors in lung cancer. We envisage a future requiring the use of longitudinal genome sequencing and other omics technologies alongside combinatorial treatment to overcome cellular and molecular heterogeneity and prevent resistance caused by clonal evolution.
Box, C., Mendiola, M., Gowan, S., Box, G.M., Valenti, M., Brandon, A.D., Al-Lazikani, B., Rogers, S.J., Wilkins, A., Harrington, K.J., et al.
(2013). A novel serum protein signature associated with resistance to epidermal growth factor receptor tyrosine kinase inhibitors in head and neck squamous cell carcinoma. Eur j cancer,
BACKGROUND: Acquired resistance to tyrosine kinase inhibitors (TKIs) is becoming a major challenge in the treatment of many cancers. Epidermal growth factor receptor (EGFR) is overexpressed in squamous carcinomas, notably those of the head and neck (HNSCC), and can be targeted with several TKIs. We aimed to identify soluble proteins suitable for development as markers of EGFR TKI resistance in cancer patients to aid in early and minimally invasive assessment of therapeutic responses. METHODS: Resistant HNSCC cell lines were generated by exposure to an EGFR TKI, gefitinib, in vitro. Cell lines were characterised for their biological behaviour in vitro (using growth inhibition assays, flow cytometry, western blots, antibody arrays and/or immunoassays) and in vivo (using subcutaneous tumour xenografts). Sera from EGFR-treated and -untreated HNSCC patients were analysed by immunoassay. RESULTS: Two independent sublines of CAL 27 and a PJ34 subline with acquired resistance to EGFR TKIs (gefitinib, erlotinib and afatinib) were developed. Resistant cells grew as highly aggressive xenografts leading to reduced host survival rates compared with EGFR-TKI sensitive cells. This suggested a link between resistance in vitro and poor prognosis in vivo. A significant upregulation of proteins linked to tumour angiogenesis and invasion was identified in resistant cells. This 'resistance-associated protein signature' (RAPS) was detected in the sera of a small cohort of HNSCC patients and was associated with reduced survival. CONCLUSION: We have identified a protein signature associated with EGFR-TKI resistance that may also be linked to poor prognosis and warrants further investigation as a potential clinical biomarker.
Workman, P., Al-Lazikani, B. & Clarke, P.A.
(2013). Genome-based cancer therapeutics: targets, kinase drug resistance and future strategies for precision oncology. Curr opin pharmacol,
Extraordinary progress has been made in our detailed understanding of the genetic and epigenetic mechanisms responsible for oncogenesis and cancer progression. Empowered by next-generation sequencing, many new targets and pathways have been identified to exploit oncogene and non-oncogene addiction and synthetic lethality. Kinase inhibitors feature strongly in the druggable cancer genome and 19 have been approved in oncology. While survival gains are valuable, drug resistance has emerged as the major challenge. The clonal heterogeneity and evolution of cancers is an intrinsic problem, together with feedback loops, kinase switching and activation of alternative targets and pathways. The solution to drug resistance will require the use of rationally targeted combinational regimens. The application of adaptive treatment cycles based on ongoing multi-technology profiling will be the key to long-term therapeutic success.
Workman, P. & Al-Lazikani, B.
(2013). Drugging cancer genomes. Nat rev drug discov,
Gaulton, A., Bellis, L.J., Bento, A.P., Chambers, J., Davies, M., Hersey, A., Light, Y., McGlinchey, S., Michalovich, D., Al-Lazikani, B., et al.
(2012). ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic acids res,
ChEMBL is an Open Data database containing binding, functional and ADMET information for a large number of drug-like bioactive compounds. These data are manually abstracted from the primary published literature on a regular basis, then further curated and standardized to maximize their quality and utility across a wide range of chemical biology and drug-discovery research problems. Currently, the database contains 5.4 million bioactivity measurements for more than 1 million compounds and 5200 protein targets. Access is available through a web-based interface, data downloads and web services at: https://www.ebi.ac.uk/chembldb.
Halling-Brown, M.D., Bulusu, K.C., Patel, M., Tym, J.E. & Al-Lazikani, B.
(2012). canSAR: an integrated cancer public translational research and drug discovery resource. Nucleic acids res,
canSAR is a fully integrated cancer research and drug discovery resource developed to utilize the growing publicly available biological annotation, chemical screening, RNA interference screening, expression, amplification and 3D structural data. Scientists can, in a single place, rapidly identify biological annotation of a target, its structural characterization, expression levels and protein interaction data, as well as suitable cell lines for experiments, potential tool compounds and similarity to known drug targets. canSAR has, from the outset, been completely use-case driven which has dramatically influenced the design of the back-end and the functionality provided through the interfaces. The Web interface at http://cansar.icr.ac.uk provides flexible, multipoint entry into canSAR. This allows easy access to the multidisciplinary data within, including target and compound synopses, bioactivity views and expert tools for chemogenomic, expression and protein interaction network data.
Al-Lazikani, B., Banerji, U. & Workman, P.
(2012). Combinatorial drug therapy for cancer in the post-genomic era. Nat biotechnol,
Over the past decade, whole genome sequencing and other 'omics' technologies have defined pathogenic driver mutations to which tumor cells are addicted. Such addictions, synthetic lethalities and other tumor vulnerabilities have yielded novel targets for a new generation of cancer drugs to treat discrete, genetically defined patient subgroups. This personalized cancer medicine strategy could eventually replace the conventional one-size-fits-all cytotoxic chemotherapy approach. However, the extraordinary intratumor genetic heterogeneity in cancers revealed by deep sequencing explains why de novo and acquired resistance arise with molecularly targeted drugs and cytotoxic chemotherapy, limiting their utility. One solution to the enduring challenge of polygenic cancer drug resistance is rational combinatorial targeted therapy.
Workman, P., Clarke, P.A. & Al-Lazikani, B.
(2012). Personalized medicine: patient-predictive panel power. Cancer cell,
Two recent papers published in Nature demonstrate the power of systematic high-throughput pharmacologic profiling of very large, diverse, molecularly-characterized human cancer cell line panels to reveal linkages between genetic profile and targeted-drug sensitivity. Known oncogene addictions are confirmed while surprising complexities and biomarker relationships with clinical potential are revealed.
Orchard, S., Al-Lazikani, B., Bryant, S., Clark, D., Calder, E., Dix, I., Engkvist, O., Forster, M., Gaulton, A., Gilson, M., et al.
(2012). Shouldn't enantiomeric purity be included in the 'minimum information about a bioactive entity'? Response from the MIABE group. Nat rev drug discov,
Suwaki, N., Vanhecke, E., Atkins, K.M., Graf, M., Swabey, K., Huang, P., Schraml, P., Moch, H., Cassidy, A.M., Brewer, D., et al.
(2011). A HIF-regulated VHL-PTP1B-Src signaling axis identifies a therapeutic target in renal cell carcinoma. Sci transl med,
Metastatic renal cell carcinoma (RCC) is a molecularly heterogeneous disease that is intrinsically resistant to chemotherapy and radiotherapy. Although therapies targeted to the molecules vascular endothelial growth factor and mammalian target of rapamycin have shown clinical effectiveness, their effects are variable and short-lived, underscoring the need for improved treatment strategies for RCC. Here, we used quantitative phosphoproteomics and immunohistochemical profiling of 346 RCC specimens and determined that Src kinase signaling is elevated in RCC cells that retain wild-type von Hippel-Lindau (VHL) protein expression. RCC cell lines and xenografts with wild-type VHL exhibited sensitivity to the Src inhibitor dasatinib, in contrast to cell lines that lacked the VHL protein, which were resistant. Forced expression of hypoxia-inducible factor (HIF) in RCC cells with wild-type VHL diminished Src signaling output by repressing transcription of the Src activator protein tyrosine phosphatase 1B (PTP1B), conferring resistance to dasatinib. Our results suggest that a HIF-regulated VHL-PTP1B-Src signaling pathway determines the sensitivity of RCC to Src inhibitors and that stratification of RCC patients with antibody-based profiling may identify patients likely to respond to Src inhibitors in RCC clinical trials.
Bellis, L.J., Akhtar, R., Al-Lazikani, B., Atkinson, F., Bento, A.P., Chambers, J., Davies, M., Gaulton, A., Hersey, A., Ikeda, K., et al.
(2011). Collation and data-mining of literature bioactivity data for drug discovery. Biochem soc trans,
Translating the huge amount of genomic and biochemical data into new drugs is a costly and challenging task. Historically, there has been comparatively little focus on linking the biochemical and chemical worlds. To address this need, we have developed ChEMBL, an online resource of small-molecule SAR (structure-activity relationship) data, which can be used to support chemical biology, lead discovery and target selection in drug discovery. The database contains the abstracted structures, properties and biological activities for over 700000 distinct compounds and more than 3 million bioactivity records abstracted from over 40000 publications. Additional public domain resources can be readily integrated into the same data model (e.g. PubChem BioAssay data). The compounds in ChEMBL are largely extracted from the primary medicinal chemistry literature, and are therefore usually 'drug-like' or 'lead-like' small molecules with full experimental context. The data cover a significant fraction of the discovery of modern drugs, and are useful in a wide range of drug design and discovery tasks. In addition to the compound data, ChEMBL also contains information for over 8000 protein, cell line and whole-organism 'targets', with over 4000 of those being proteins linked to their underlying genes. The database is searchable both chemically, using an interactive compound sketch tool, protein sequences, family hierarchies, SMILES strings, compound research codes and key words, and biologically, using a variety of gene identifiers, protein sequence similarity and protein families. The information retrieved can then be readily filtered and downloaded into various formats. ChEMBL can be accessed online at https://www.ebi.ac.uk/chembldb.
Orchard, S., Al-Lazikani, B., Bryant, S., Clark, D., Calder, E., Dix, I., Engkvist, O., Forster, M., Gaulton, A., Gilson, M., et al.
(2011). Minimum information about a bioactive entity (MIABE). Nat rev drug discov,
Bioactive molecules such as drugs, pesticides and food additives are produced in large numbers by many commercial and academic groups around the world. Enormous quantities of data are generated on the biological properties and quality of these molecules. Access to such data - both on licensed and commercially available compounds, and also on those that fail during development - is crucial for understanding how improved molecules could be developed. For example, computational analysis of aggregated data on molecules that are investigated in drug discovery programmes has led to a greater understanding of the properties of successful drugs. However, the information required to perform these analyses is rarely published, and when it is made available it is often missing crucial data or is in a format that is inappropriate for efficient data-mining. Here, we propose a solution: the definition of reporting guidelines for bioactive entities - the Minimum Information About a Bioactive Entity (MIABE) - which has been developed by representatives of pharmaceutical companies, data resource providers and academic groups.
Abad-Zapatero, C., Perišić, O., Wass, J., Bento, A.P., Overington, J., Al-Lazikani, B. & Johnson, M.E.
(2010). Ligand efficiency indices for an effective mapping of chemico-biological space: the concept of an atlas-like representation. Drug discov today,
We propose a numerical framework that permits an effective atlas-like representation of chemico-biological space based on a series of Cartesian planes mapping the ligands with the corresponding targets connected by an affinity parameter (K(i) or related). The numerical framework is derived from the concept of ligand efficiency indices, which provide a natural coordinate system combining the potency toward the target (biological space) with the physicochemical properties of the ligand (chemical space). This framework facilitates navigation in the multidimensional drug discovery space using map-like representations based on pairs of combined variables related to the efficiency of the ligands per Dalton (molecular weight or number of non-hydrogen atoms) and per unit of polar surface area (or number of polar atoms).
Berriman, M., Haas, B.J., LoVerde, P.T., Wilson, R.A., Dillon, G.P., Cerqueira, G.C., Mashiyama, S.T., Al-Lazikani, B., Andrade, L.F., Ashton, P.D., et al.
(2009). The genome of the blood fluke Schistosoma mansoni. Nature,
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
(2009). Drugging the cancer genome. Drug discovery today,
Al-Lazikani, B., Hill, E.E. & Morea, V.
(2008). Protein structure prediction. Methods mol biol,
Protein structure prediction has matured over the past few years to the point that even fully automated methods can provide reasonably accurate three-dimensional models of protein structures. However, until now it has not been possible to develop programs able to perform as well as human experts, who are still capable of systematically producing better models than automated servers. Although the precise details of protein structure prediction procedures are different for virtually every protein, this chapter describes a generic procedure to obtain a three-dimensional protein model starting from the amino acid sequence. This procedure takes advantage both of programs and servers that have been shown to perform best in blind tests and of the current knowledge about evolutionary relationships between proteins, gained from detailed analyses of protein sequence, structure, and functional data.
Overington, J.P., Al-Lazikani, B. & Hopkins, A.L.
(2006). How many drug targets are there? Nat rev drug discov,
For the past decade, the number of molecular targets for approved drugs has been debated. Here, we reconcile apparently contradictory previous reports into a comprehensive survey, and propose a consensus number of current drug targets for all classes of approved therapeutic drugs. One striking feature is the relatively constant historical rate of target innovation (the rate at which drugs against new targets are launched); however, the rate of developing drugs against new families is significantly lower. The recent approval of drugs that target protein kinases highlights two additional trends: an emerging realization of the importance of polypharmacology, and also the power of a gene-family-led approach in generating novel and important therapies.
Marks, D.J., Harbord, M.W., MacAllister, R., Rahman, F.Z., Young, J., Al-Lazikani, B., Lees, W., Novelli, M., Bloom, S. & Segal, A.W., et al.
(2006). Defective acute inflammation in Crohn's disease: a clinical investigation. Lancet,
BACKGROUND: The cause of Crohn's disease has not been mechanistically proven. We tested the hypothesis that the disease is a form of immunodeficiency caused by impaired innate immunity. METHODS: We investigated inflammatory responses in patients and controls by quantifying neutrophil recruitment and cytokine production after acute trauma, interleukin 8 secretion by cultured monocyte-derived macrophages after exposure to inflammatory mediators, and local inflammatory and vascular changes in response to subcutaneous injection of heat-killed Escherichia coli. FINDINGS: In patients with Crohn's disease, trauma to rectum, ileum, or skin led to abnormally low neutrophil accumulation (differences from healthy individuals of 79%, n=8, p=0.0003; 57%, n=3, p=0.05; 50%, n=13, p<0.0001, respectively) and lower production of proinflammatory interleukin 8 (63%, n=7, p=0.003; 63%, n=3, p=0.05; 45%, n=8, p<0.0001) and interleukin 1beta (50%, n=8, p=0.0005). Interleukin 8 secretion by cultured macrophages was reduced after exposure to acute wound fluid (38%, n=50, p<0.0001), C5a (48%, n=41, p=0.0005), or tumour necrosis factor alpha (52%, n=27, p<0.0001). Local inflammatory reaction to inoculation with E coli was attenuated, as quantified by changes in bloodflow (ileal disease 50%, n=6, p=0.01; colonic disease 77%, n=6, p=0.0003). This response was mediated by nitric oxide in controls, was increased by sildenafil in patients, and was not related to CARD15 genotype. INTERPRETATION: In Crohn's disease, a constitutionally weak immune response predisposes to accumulation of intestinal contents that breach the mucosal barrier of the bowel wall, resulting in granuloma formation and chronic inflammation. Polymorphisms in CARD15 do not underlie this phenotype, but incapacitate the NOD2 pathway that can compensate for impairment of innate inflammation. Current treatment of secondary chronic inflammation might exaggerate the underlying lesion and promote chronic disease.
Freilich, S., Spriggs, R.V., George, R.A., Al-Lazikani, B., Swindells, M. & Thornton, J.M.
(2005). The complement of enzymatic sets in different species. J mol biol,
We present here a comprehensive analysis of the complement of enzymes in a large variety of species. As enzymes are a relatively conserved group there are several classification systems available that are common to all species and link a protein sequence to an enzymatic function. Enzymes are therefore an ideal functional group to study the relationship between sequence expansion, functional divergence and phenotypic changes. By using information retrieved from the well annotated SWISS-PROT database together with sequence information from a variety of fully sequenced genomes and information from the EC functional scheme we have aimed here to estimate the fraction of enzymes in genomes, to determine the extent of their functional redundancy in different domains of life and to identify functional innovations and lineage specific expansions in the metazoa lineage. We found that prokaryote and eukaryote species differ both in the fraction of enzymes in their genomes and in the pattern of expansion of their enzymatic sets. We observe an increase in functional redundancy accompanying an increase in species complexity. A quantitative assessment was performed in order to determine the degree of functional redundancy in different species. Finally, we report a massive expansion in the number of mammalian enzymes involved in signalling and degradation.
George, R.A., Spriggs, R.V., Bartlett, G.J., Gutteridge, A., MacArthur, M.W., Porter, C.T., Al-Lazikani, B., Thornton, J.M. & Swindells, M.B.
(2005). Effective function annotation through catalytic residue conservation. Proc natl acad sci u s a,
Because of the extreme impact of genome sequencing projects, protein sequences without accompanying experimental data now dominate public databases. Homology searches, by providing an opportunity to transfer functional information between related proteins, have become the de facto way to address this. Although a single, well annotated, close relationship will often facilitate sufficient annotation, this situation is not always the case, particularly if mutations are present in important functional residues. When only distant relationships are available, the transfer of function information is more tenuous, and the likelihood of encountering several well annotated proteins with different functions is increased. The consequence for a researcher is a range of candidate functions with little way of knowing which, if any, are correct. Here, we address the problem directly by introducing a computational approach to accurately identify and segregate related proteins into those with a functional similarity and those where function differs. This approach should find a wide range of applications, including the interpretation of genomics/proteomics data and the prioritization of targets for high-throughput structure determination. The method is generic, but here we concentrate on enzymes and apply high-quality catalytic site data. In addition to providing a series of comprehensive benchmarks to show the overall performance of our approach, we illustrate its utility with specific examples that include the correct identification of haptoglobin as a nonenzymatic relative of trypsin, discrimination of acid-d-amino acid ligases from a much larger ligase pool, and the successful annotation of BioH, a structural genomics target.
George, R.A., Spriggs, R.V., Thornton, J.M., Al-Lazikani, B. & Swindells, M.B.
(2004). SCOPEC: a database of protein catalytic domains. Bioinformatics,
Vol.20 Suppl 1,
MOTIVATION: Domains are the units of protein structure, function and evolution. It is therefore essential to utilize knowledge of domains when studying the evolution of function, or when assigning function to genome sequence data. For this purpose, we have developed a database of catalytic domains, SCOPEC, by combining structural domain information from SCOP, full-length sequence information from Swiss-Prot, and verified functional information from the Enzyme Classification (EC) database. Two major problems need to be overcome to create a database of domain-function relationships; (1) for sequences, EC numbers are typically assigned to whole sequences rather than the functional unit, and (2) The Protein Data Bank (PDB) structures elucidated from a larger multi-domain protein will often have EC annotation although the relevant catalytic domain may lie elsewhere. RESULTS: SCOPEC entries have high quality enzyme assignments; having passed both computational and manual checks. SCOPEC currently contains entries for 75% of all EC annotations in the PDB. Overall, EC number is fairly well conserved within a superfamily, even when the proteins are distantly related. Initial analysis is encouraging; suggesting that there is a 50:50 chance of conserved function in distant homologues first detected by a third iteration PSI-BLAST search. Therefore, we envisage that a knowledge-based approach to function assignment using the domain-EC relationships in SCOPEC will gain a marked improvement over this base line. AVAILABILITY: The SCOPEC database is a valuable resource in the analysis and prediction of protein structure and function. It can be obtained or queried at our website http://www.enzome.com.
Sheinerman, F.B., Al-Lazikani, B. & Honig, B.
(2003). Sequence, structure and energetic determinants of phosphopeptide selectivity of SH2 domains. J mol biol,
Here, we present an approach for the prediction of binding preferences of members of a large protein family for which structural information for a number of family members bound to a substrate is available. The approach involves a number of steps. First, an accurate multiple alignment of sequences of all members of a protein family is constructed on the basis of a multiple structural superposition of family members with known structure. Second, the methods of continuum electrostatics are used to characterize the energetic contribution of each residue in a protein to the binding of its substrate. Residues that make a significant contribution are mapped onto the protein sequence and are used to define a "binding site signature" for the complex being considered. Third, sequences whose structures have not been determined are checked to see if they have binding-site signatures similar to one of the known complexes. Predictions of binding affinity to a given substrate are based on similarities in binding-site signature. An important component of the approach is the introduction of a context-specific substitution matrix suitable for comparison of binding-site residues. The methods are applied to the prediction of phosphopeptide selectivity of SH2 domains. To this end, the energetic roles of all protein residues in 17 different complexes of SH2 domains with their cognate targets are analyzed. The total number of residues that make significant contributions to binding is found to vary from nine to 19 in different complexes. These energetically important residues are found to contribute to binding through a variety of mechanisms, involving both electrostatic and hydrophobic interactions. Binding-site signatures are found to involve residues in different positions in SH2 sequences, some of them as far as 9 Å away from a bound peptide. Surprisingly, similarities in the signatures of different domains do not correlate with whole-domain sequence identities unless the latter is greater than 50%. An extensive comparison with the optimal binding motifs determined by peptide library experiments, as well as other experimental data indicate that the similarity in binding preferences of different SH2 domains can be deduced on the basis of their binding-site signatures. The analysis provides a rationale for the empirically derived classification of SH2 domains described by Songyang & Cantley, in that proteins in the same group are found to have similar residues at positions important for binding. Confident predictions of binding preference can be made for about 85% of SH2 domain sequences found in SWISSPROT. The approach described in this work is quite general and can, in principle, be used to analyze binding preferences of members of large protein families for which structural information for a number of family members is available. It also offers a strategy for predicting cross-reactivity of compounds designed to bind to a particular target, for example in structure-based drug design.
Sheinerman, F.B., Al-Lazikani, B. & Honig, B.
(2001). Predicting phosphopeptide selectivity of SH2 domains. Biophys j,
Al-Lazikani, B., Jung, J., Xiang, Z. & Honig, B.
(2001). Protein structure prediction. Curr opin chem biol,
The prediction of protein structure, based primarily on sequence and structure homology, has become an increasingly important activity. Homology models have become more accurate and their range of applicability has increased. Progress has come, in part, from the flood of sequence and structure information that has appeared over the past few years, and also from improvements in analysis tools. These include profile methods for sequence searches, the use of three-dimensional structure information in sequence alignment and new homology modeling tools, specifically in the prediction of loop and side-chain conformations. There have also been important advances in understanding the physical chemical basis of protein stability and the corresponding use of physical chemical potential functions to identify correctly folded from incorrectly folded protein conformations.
Al-Lazikani, B., Sheinerman, F.B. & Honig, B.
(2001). Combining multiple structure and sequence alignments to improve sequence detection and alignment: application to the SH2 domains of Janus kinases. Proc natl acad sci u s a,
In this paper, an approach is described that combines multiple structure alignments and multiple sequence alignments to generate sequence profiles for protein families. First, multiple sequence alignments are generated from sequences that are closely related to each sequence of known three-dimensional structure. These alignments then are merged through a multiple structure alignment of family members of known structure. The merged alignment is used to generate a Hidden Markov Model for the family in question. The Hidden Markov Model can be used to search for new family members or to improve alignments for distantly related family members that already have been identified. Application of a profile generated for SH2 domains indicates that the Janus family of nonreceptor protein tyrosine kinases contains SH2 domains. This conclusion is strongly supported by the results of secondary structure-prediction programs, threading calculations, and the analysis of comparative models generated for these domains. One of the Janus kinases, human TYK2, has an SH2 domain that contains a histidine instead of the conserved arginine at the key phosphotyrosine-binding position, betaB5. Calculations of the pK(a) values of the betaB5 arginines in a number of SH2 domains and of the betaB5 histidine in a homology model of TYK2 suggest that this histidine is likely to be neutral around pH 7, thus indicating that it may have lost the ability to bind phosphotyrosine. If this indeed is the case, TYK2 may contain a domain with an SH2 fold that has a modified binding specificity.
Al-Lazikani, B., Lesk, A.M. & Chothia, C.
(2000). Canonical structures for the hypervariable regions of T cell alphabeta receptors. J mol biol,
T cell alphabeta receptors have binding sites for peptide-MHC complexes formed by six hypervariable regions. Analysis of the six atomic structures known for Valpha and for Vbeta domains shows that their first and second hypervariable regions have one of three or four different main-chain conformations (canonical structures). Six of these canonical structures have the same conformation in complexes with peptide-MHC complexes, the free receptor and/or in an isolated V domain. Thus, for at least the first and second hypervariable regions in the currently known structures, the conformation of the canonical structures is well defined in the free state and is conserved on formation of complexes with peptide-MHC. We identified the key residues that are mainly responsible for the conformation of each canonical structure. The first and second hypervariable regions of Valpha and Vbeta domains are encoded by the germline V segments. Humans have 37 functional Valpha segments and 47 Vbeta segments, and mice have 20 Vbeta segments. Inspection of the size of their hypervariable regions, and of sites that contain key residues, indicates that close to 70 % of Valpha segments and 90 % of Vbeta segments have hypervariable regions with a conformation of one of the known canonical structures. The alpha and beta V gene segments in both humans and mice have only a few combinations of different canonical structure in their first and second hypervariable regions. In human Vbeta domains, the number of different sequences with these canonical structure combinations is larger than in mice, whilst for Valpha domains it is probably smaller.
Al-Lazikani, B., Lesk, A.M. & Chothia, C.
(1997). Standard conformations for the canonical structures of immunoglobulins. J mol biol,
A comparative analysis of the main-chain conformation of the L1, L2, L3, H1 and H2 hypervariable regions in 17 immunoglobulin structures that have been accurately determined at high resolution is described. This involves 79 hypervariable regions in all. We also analysed a part of the H3 region in 12 of the 15 VH domains considered here. On the basis of the residues at key sites the 79 hypervariable regions can be assigned to one of 18 different canonical structures. We show that 71 of these hypervariable regions have a conformation that is very close to what can be defined as a "standard" conformation of each canonical structure. These standard conformations are described in detail. The other eight hypervariable regions have small deviations from the standard conformations that, in six cases, involve only the rotation of a single peptide group. Most H3 hypervariable regions have the same conformation in the part that is close to the framework and the details of this conformation are also described here..
Kinnersley, B., Sud, A., Coker, E.A., Tym, J.E., Di Micco, P., Al-Lazikani, B. & Houlston, R.S.
Leveraging human genetics to guide cancer drug development. Jco clinical cancer informatics,
Santos, R., Ursu, O., Gaulton, A., Bento, A.P., Donadi, R.S., Bologa, C.G., Karlsson, A., Al-Lazikani, B., Hersey, A., Oprea, T.I., et al.
A comprehensive map of molecular drug targets. Nat rev drug discov,
The success of mechanism-based drug discovery depends on the definition of the drug target. This definition becomes even more important as we try to link drug response to genetic variation, understand stratified clinical efficacy and safety, rationalize the differences between drugs in the same therapeutic class and predict drug utility in patient subgroups. However, drug targets are often poorly defined in the literature, both for launched drugs and for potential therapeutic agents in discovery and development. Here, we present an updated comprehensive map of molecular targets of approved drugs. We curate a total of 893 human and pathogen-derived biomolecules through which 1,578 US FDA-approved drugs act. These biomolecules include 667 human-genome-derived proteins targeted by drugs for human disease. Analysis of these drug targets indicates the continued dominance of privileged target families across disease areas, but also the growth of novel first-in-class mechanisms, particularly in oncology. We explore the relationships between bioactivity class and clinical success, as well as the presence of orthologues between human and animal models and between pathogen and human genomes. Through the collaboration of three independent teams, we highlight some of the ongoing challenges in accurately defining the targets of molecular therapeutics and present conventions for deconvoluting the complexities of molecular pharmacology and drug efficacy.
Yap, T.A., Smith, A.D., Ferraldeschi, R., Al-Lazikani, B., Workman, P. & de Bono, J.S.
Drug discovery in advanced prostate cancer: translating biology into therapy. Nat rev drug discov,
Castration-resistant prostate cancer (CRPC) is associated with a poor prognosis and poses considerable therapeutic challenges. Recent genetic and technological advances have provided insights into prostate cancer biology and have enabled the identification of novel drug targets and potent molecularly targeted therapeutics for this disease. In this article, we review recent advances in prostate cancer target identification for drug discovery and discuss their promise and associated challenges. We review the evolving therapeutic landscape of CRPC and discuss issues associated with precision medicine as well as challenges encountered with immunotherapy for this disease. Finally, we envision the future management of CRPC, highlighting the use of circulating biomarkers and modern clinical trial designs.
Patel, M.N., Halling-Brown, M.D., Tym, J.E., Workman, P. & Al-Lazikani, B.
Objective assessment of cancer genes for drug discovery. Nat rev drug discov,
Selecting the best targets is a key challenge for drug discovery, and achieving this effectively, efficiently and systematically is particularly important for prioritizing candidates from the sizeable lists of potential therapeutic targets that are now emerging from large-scale multi-omics initiatives, such as those in oncology. Here, we describe an objective, systematic, multifaceted computational assessment of biological and chemical space that can be applied to any human gene set to prioritize targets for therapeutic exploration. We use this approach to evaluate an exemplar set of 479 cancer-associated genes, reveal the tension between biological relevance and chemical tractability, and describe major gaps in available knowledge that could be addressed to aid objective decision-making. We also propose drug repurposing opportunities and identify potentially druggable cancer-associated proteins that have been poorly explored with regard to the discovery of small-molecule modulators, despite their biological relevance.
Agüero, F., Al-Lazikani, B., Aslett, M., Berriman, M., Buckner, F.S., Campbell, R.K., Carmona, S., Carruthers, I.M., Chan, A.W., Chen, F., et al.
Genomic-scale prioritization of drug targets: the TDR Targets database. Nat rev drug discov,
The increasing availability of genomic data for pathogens that cause tropical diseases has created new opportunities for drug discovery and development. However, if the potential of such data is to be fully exploited, the data must be effectively integrated and be easy to interrogate. Here, we discuss the development of the TDR Targets database (http://tdrtargets.org), which encompasses extensive genetic, biochemical and pharmacological data related to tropical disease pathogens, as well as computationally predicted druggability for potential targets and compound desirability information. By allowing the integration and weighting of this information, this database aims to facilitate the identification and prioritization of candidate drug targets for pathogens.