pyGenClean is an informatics tool that facilitates and standardizes the genetic data clean-up pipeline for genotyping array data. In conjunction with a source batch-queuing system, the tool minimizes data manipulation errors, accelerates completion of the data clean-up process, and provides informative graphics and metrics to guide decision making for statistical analysis.
pyGenClean is open-source Python 2.7 software, freely available along with documentation and examples. It is a command-line tool that runs on both Linux and Windows operating systems.
If you use pyGenClean in any published work, please cite the published scientific paper describing the tool.
Lemieux Perreault LP, Provost S, Legault MA, Barhdadi A, Dubé MP: pyGenClean: efficient tool for genetic data clean up before association testing. Bioinformatics 2013, 29(13):1704-1705 [DOI:10.1093/bioinformatics/btt261]
- Source code: latest via GitHub
- Example of configuration file
- Reference file (ethnicity module)
- Test dataset