Software and Databases
Molecular and Population Genetics Team
Section of Cancer Genetics
In collaboration with groups at the Cancer Research UK London Research Institute and at the University of Edinburgh and MRC Human Genetics Unit, we have conducted a genome-wide association study (GWAS) of colorectal cancer. In Phase 1 of the study, we generated genotype data for 547,487 SNPs in 922 individuals with colorectal neoplasia and 927 controls, all ascertained through the Colorectal Tumour Gene Identification (CoRGI) consortium.
Thus far, the GWAS has identified several novel susceptibility loci for colorectal cancer. Please cite the reference below when using these data.
Tomlinson et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat Genet. 2008 Mar 30
To facilitate the identification of additional loci predisposing to colorectal cancer, the genotype counts and allelic test results from the genome-wide phase of this study are available to other groups for meta-analyses and further studies.
Download the Data (part 1 of 5) (9013 KB)
Download the Data (part 2 of 5) (9693 KB)
Download the Data (part 3 of 5) (8313 KB)
Download the Data (part 4 of 5) (8300 KB)
Download the Data (part 5 of 5) (5136 KB)
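The column layout of the download files is not described on this page. The following is therefore only a hedged sketch: it assumes that, for each SNP, you have genotype counts (AA, AB, BB) for cases and for controls, and it shows how a standard allelic (per-allele) 2x2 chi-square test, odds ratio and standard error of log(OR) can be recomputed from them, for example before pooling with another study. It is a generic textbook test, not necessarily the exact procedure used to produce the published allelic test results. The function name and the count layout are hypothetical.

```python
import math


def allelic_test(case_counts, control_counts):
    """Allelic (per-allele) 2x2 chi-square test and odds ratio.

    case_counts, control_counts: hypothetical layout (n_AA, n_AB, n_BB),
    where B is the allele whose effect is being tested.
    """
    # Each individual carries two alleles: count B and A alleles per group.
    a = 2 * case_counts[2] + case_counts[1]        # B alleles in cases
    b = 2 * case_counts[0] + case_counts[1]        # A alleles in cases
    c = 2 * control_counts[2] + control_counts[1]  # B alleles in controls
    d = 2 * control_counts[0] + control_counts[1]  # A alleles in controls
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table (no continuity correction), 1 df.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail p-value of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    # Woolf standard error of log(OR), as used in inverse-variance meta-analysis.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return chi2, p, odds_ratio, se_log_or


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not taken from the study data.
    chi2, p, odds_ratio, se = allelic_test((100, 100, 25), (150, 80, 10))
    print(f"chi2={chi2:.2f}  p={p:.2e}  OR={odds_ratio:.2f}  SE(logOR)={se:.3f}")
```

Returning the standard error of log(OR) alongside the test statistic is a deliberate choice: it is the quantity most fixed-effect meta-analysis methods need when combining per-SNP results across studies.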
Cancer Research UK provided principal funding for this study.
Predicted Impact of Coding SNPs (PICS) Database
Project Coordinators: M Rudd & RS Houlston
Using in silico computational tools, we have classified and catalogued the predicted impact on protein function of non-synonymous single nucleotide polymorphisms (nsSNPs) in genes relevant to the biology of cancer. These data supplement those published in: Matthew F. Rudd, Richard D. Williams, Emily L. Webb, Steffen Schmidt, Gabrielle S. Sellick, Richard S. Houlston. The PICS (Predicted Impact of Coding SNPs) database. Cancer Epidemiology, Biomarkers and Prevention (in press).
- Supplementary Table 1 details 9,537 validated bi-allelic nsSNPs, retrieved from NCBI dbSNP Build 123, each located within one of 21,506 annotated genes. The data are available in Excel (.xls) and plain text (.txt) formats.
- Supplementary Table 2 details the 7,080 genes curated on the basis of their potential biological relevance to cancer. The data are available in Excel (.xls) format.
- Supplementary Table 3 details the 3,009 nsSNPs located within one of the 7,080 candidate cancer genes that have a validated minor allele frequency (MAF) ≥ 0.01 in Caucasian populations. For each entry, the predicted impact on wild-type protein structure and function was computed using three freely available algorithms: the Grantham matrix¹, PolyPhen² and SIFT³. The data are available in Excel (.xls) and plain text (.txt) formats.
- Header descriptions for Tables 1 and 3 are given in the Readme (.doc) file.
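The plain-text version of Table 3 can be screened programmatically. The sketch below is hedged: it assumes a tab-delimited file with hypothetical column names (rs_id, gene, grantham, sift_score, polyphen); the real headers are defined in the Readme file and should be substituted. The SIFT cut-off of 0.05 follows the usual SIFT convention (scores below 0.05 are predicted deleterious), but the Grantham cut-off and the "at least two of three predictors" rule are illustrative choices, not the criteria used in the PICS paper.

```python
import csv
import io

SIFT_CUTOFF = 0.05      # SIFT convention: scores below 0.05 predicted deleterious
GRANTHAM_CUTOFF = 100   # hypothetical cut-off for a "radical" substitution
DAMAGING_POLYPHEN = {"possibly damaging", "probably damaging"}


def flag_count(row):
    """Count how many of the three predictors flag this nsSNP as potentially damaging."""
    flags = 0
    if float(row["grantham"]) > GRANTHAM_CUTOFF:
        flags += 1
    if float(row["sift_score"]) < SIFT_CUTOFF:
        flags += 1
    if row["polyphen"].strip().lower() in DAMAGING_POLYPHEN:
        flags += 1
    return flags


def select_snps(handle, min_flags=2):
    """Yield (rs_id, gene, flags) for rows flagged by at least min_flags predictors."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        n = flag_count(row)
        if n >= min_flags:
            yield row["rs_id"], row["gene"], n


if __name__ == "__main__":
    # Made-up rows with hypothetical headers, for illustration only.
    example = (
        "rs_id\tgene\tgrantham\tsift_score\tpolyphen\n"
        "rsEX1\tGENE_A\t150\t0.01\tprobably damaging\n"
        "rsEX2\tGENE_B\t20\t0.40\tbenign\n"
    )
    for rs_id, gene, n in select_snps(io.StringIO(example)):
        print(rs_id, gene, n)
```

To use it on the downloaded file, pass an open file handle (opened with `newline=""`, as the `csv` module recommends) in place of the `io.StringIO` example.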
Please click below to download:
Supplementary Table 1 (.txt) (8145 KB)
Supplementary Table 3 (.txt) (1509 KB)
Supplementary Table 1 (.xls) (13843 KB)
Supplementary Table 2 (.xls) (811 KB)
Supplementary Table 3 (.xls) (2915 KB)
For any comments or suggestions, please contact Matthew Rudd.
References:
1. Grantham matrix. Grantham R. Amino acid difference formula to help explain protein evolution. Science, 185: 862-864, 1974.
2. PolyPhen (Polymorphism Phenotyping). Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res, 30: 3894-3900, 2002.
3. SIFT (Sorting Intolerant From Tolerant). Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res, 11: 863-874, 2001.
Bioinformatics: SNPLINK
Project Coordinators: E Webb & RS Houlston
SNPLINK: Multipoint linkage analysis of densely distributed SNP data incorporating automated linkage disequilibrium removal.
We have developed the program SNPLINK to undertake full genome-wide linkage analysis in an automated fashion. Because linkage disequilibrium (LD) between densely spaced SNP markers can inflate linkage statistics, SNPLINK performs automated LD removal and then recalculates the linkage statistics.
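SNPLINK's own LD-removal procedure is described in its user guide. The sketch below illustrates only the general idea, using one common, generic approach: greedy pruning along the map on pairwise r² computed from genotype dosages (0/1/2). It is not SNPLINK's actual algorithm, and the threshold and window size are illustrative assumptions. Genotype (composite) r² is used here as a simple proxy for haplotype-based r².

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2.

    Individuals with a missing genotype (None) at either SNP are skipped.
    """
    pairs = [(x, y) for x, y in zip(g1, g2) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic in this sample: r^2 undefined, treated as 0
    return sxy * sxy / (sxx * syy)


def prune_ld(markers, threshold=0.1, window=5):
    """Greedy LD pruning along the genetic map.

    markers: list of (name, genotypes) sorted by map position.
    A marker is kept only if its r^2 with each of the last `window`
    kept markers is below `threshold`; otherwise it is dropped.
    """
    kept = []
    for name, geno in markers:
        recent = kept[-window:]
        if all(r_squared(geno, g) < threshold for _, g in recent):
            kept.append((name, geno))
    return [name for name, _ in kept]


if __name__ == "__main__":
    # Toy genotypes for six individuals (hypothetical data).
    markers = [
        ("snp1", [0, 1, 2, 0, 1, 2]),
        ("snp2", [0, 1, 2, 0, 1, 2]),  # perfect LD with snp1, so it is dropped
        ("snp3", [1, 0, 1, 1, 2, 1]),  # uncorrelated with snp1, so it is kept
    ]
    print(prune_ld(markers))
```

After pruning, linkage statistics would be recomputed on the reduced marker set, which is the "remove LD, then recalculate" pattern described above.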
The program consists of a set of Perl scripts and modules and has been extensively tested in a Unix environment. Installation is straightforward and is described in the file ‘Installation and user guide.doc’.
Please click below to download:
SNPLINK (128 KB)
The archive, created with WinZip 8.0, contains eight files:
- example1.in
- example2.in
- example-chr1.dat
- example-chr1.pre
- Installation and user guide.doc
- snplink.pl
- snplinkfunctions.pm
- snplinkfunctionspar.pm
SNPLINK carries out both parametric and nonparametric linkage analysis. The software package Merlin is required for both types of analysis, while the software package Allegro is required for parametric analysis. The R statistical software is required to produce the graphical outputs and Perl is required to run the scripts. All required software is freely available (see links below).
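Because SNPLINK calls external programs, it can help to confirm that they are installed before starting a run. The following is a minimal sketch in Python (not Perl, the language of SNPLINK itself) that checks the PATH. The executable names `merlin`, `allegro`, `R` and `perl` are assumptions about a typical installation and may need adjusting locally.

```python
import shutil

# Executable names are assumptions; adjust them to match the local installation.
REQUIRED = {
    "merlin": "parametric and nonparametric analysis",
    "allegro": "parametric analysis only",
    "R": "graphical output",
    "perl": "running the SNPLINK scripts",
}


def missing_dependencies(parametric=True, which=shutil.which):
    """Return the required executables that cannot be found on PATH.

    Allegro is only needed when parametric analysis is requested.
    """
    needed = [name for name in REQUIRED if parametric or name != "allegro"]
    return [name for name in needed if which(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies(parametric=True)
    if missing:
        for name in missing:
            print(f"missing: {name} (needed for {REQUIRED[name]})")
    else:
        print("all SNPLINK dependencies found on PATH")
```

The `which` parameter defaults to `shutil.which` but can be replaced, for example to test the check without the programs installed.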
- Merlin (Abecasis,G.R., Cherny,S.S., Cookson,W.O. and Cardon,L.R. (2002) Merlin – rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet., 30, 97-101).
- Allegro (Gudbjartsson,D.F., Jonasson,K., Frigge,M.L. and Kong,A. (2000) Allegro, a new computer program for multipoint linkage analysis. Nat. Genet., 25, 12-13).
- The R Project for Statistical Computing
- Download the Latest Version of Perl
For any comments or suggestions, please contact Emily Webb.
Reference: Webb et al. (2005) SNPLINK: multipoint linkage analysis of densely distributed SNP data incorporating automated linkage disequilibrium removal. Bioinformatics (in press).
Search for low-penetrance alleles for colorectal cancer through a scan of nsSNPs in 2,575 cases and 2,707 controls
Project Coordinators: E Webb & RS Houlston
SNP data file. Please click below to download:
Supplementary Table 1 (.xls) (157 KB)
For any comments or suggestions, please contact Emily Webb.