Tools - StatGen

UK Biobank and extreme temperatures

We are investigating the effect of extreme temperatures (both hot and cold) on various health outcomes. The HadUK-Grid gridded climate observations on a 1km grid over the UK (v1.3.0.ceda) and the UK Biobank data will be used. A quick summary of the climate observations are available at https://statgen.org/uk-weather/weather_summary_simple.html.

Acclimation Proteomic Study

We investigated plasma proteomic changes in response to acute heat exposure and subsequent heat acclimation. We also explored whether changes at the proteomic level correlate with the classic physiological changes of human heat acclimation. Such untargeted proteomic investigations have the potential to generate new hypotheses and accessible results sharing can help accelerate discoveries in this field. The web application is available at https://acclimation.statgen.org/.

ExPheWas

The ExPheWas browser showcases the results of a phenome-wide association study (PheWAS) using an approach that models the joint effect of variants at protein-coding genes. In the current data release, we tested the association between 26,616 protein coding or lincRNA genes and 1,746 phenotypes available in the UK Biobank. The ExPheWas browser is available at https://exphewas.ca/ and has been published in Nucleic Acids Research (doi: 10.1093/nar/gkac289).

PheWeb

We have multiple PheWeb instances publicly available. Those instances are hosted on a Compute Canada virtual machine.

forward

forward is a bioinformatics utility to execute, manage and explore phenomic studies. It is available via GitHub. It's documentation is available here. An issue tracker for this project (which can be used for feature requests) is also available.

genipe

genipe is an automatic pipeline to perform genome-wide imputation analysis on a genotyped dataset. It is available via GitHub. Full documentation is available, along with detailed installation instructions.

pyplink

pyplink is an open source python module that enables working with Plink binary files without having to first convert them to text files. It is available via GitHub. The README file contains a description of how to use this module.

pyGenClean

pyGenClean is a bioinformatics tool to facilitate and standardize the genetic data clean up pipeline with genotyping array data. In conjunction with a source batch-queuing system, the tool minimizes data manipulation errors, it accelerates the completion of the data clean up process and it provides informative graphics and metrics to guide decision making for statistical analysis. It is available via GitHub.

CNV Analysis Toolkit

This set of tools was created for the study of whole genome sequencing based algorithms for CNV genotyping in monozygotic twins families.

CNGen

We have created a script that converts integrated SNP and CNV calls generated from Birdsuite's Fawkes procedure into phased copy number genotypes (CN genotypes) using familial pedigree data. This software makes possible the use of CNPs and CNVs for genetic linkage with family data.

Chip2Spell

We have created a program that automatically generates the input files for Alohomora_m. The program, Chip2Spell, takes as input a genotype report and publicly available annotation files and creates the genotype file, map file and frequency file that will be used by Alohomora. The program is especially useful if the map and frequency files for a given platform are not stored in the Alohomora library, but it is also a quick way to convert the standard Affymetrix or Illumina genotype report format to the AB format requested by Alohomora.

Gene-environment interactions

We have created some code that performs a GWAS analysis of gene-environment interactions for imputed SNPs and non-imputed SNPs. Instead of using PLINK to perform the logistic regressions or the generalized linear model, we used the PLINK's R plug-in function to do it for the non-imputed SNPs.

MAX test

We have coded the MAX test of Zheng and Gastwirth (Statist. Med. 2006;25:3150) in some SAS MACROS. As well, we have created a wrapper for our MAX test code that allows the user to correct for multiple testing by using the maxT algorithm (Alg 4.1 of Westfall and Young, 1993).

Beyond SAS Genetics™

Paper highlighting the basic features of SAS/Genetics as we routinely apply them to the analysis of human genetic association studies, along with additional SAS/STAT procedures such as LOGISTIC and PHREG to conduct analyses commonly used in human genetic studies.

Testing for linkage disequilibrium using SAS/GENETICS and SAS/STAT

Testing for the presence of linkage disequilibrium (LD) and measuring its value is important in statistical genetics. LD deals with the correlation of genetic variation at two or more loci in the genome within a given population. PROC ALLELE in SAS/GENETICS provides a variety of pairwise LD measures that are related to the well-known Pearson correlation r. Different statistical tests of linkage disequilibrium are performed using PROC ALLELE. PROC HAPLOTYPE offers LD test statistics for multiple loci. In this paper, we clarify differences between LD measures obtained using PROC ALLELE and show how the HAPLO=OPTION of this procedure interacts with the linkage disequilibrium calculations and tests. Moreover, we compare PROC CORR and PROC ALLELE in terms of correlation coefficients of genotypic data.