merge_cnvs: Merges adjacent variants (fragmentation)

merge_cnvs.main(args)

Script to merge adjacent CNVs that are separated by less than a given distance threshold.

This is useful for correcting call fragmentation, which occurs when a genotyping algorithm does not test the region between neighbouring calls and so fails to extend the CNV breakpoints across it. A single CNV can then be reported as several smaller, closely spaced calls.

For example, with a distance threshold of 5 kb, all CNVs separated by less than 5 kb are merged together.
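
The distance test in the example above can be sketched as follows. This is a minimal illustration, not the module's code: it assumes calls use half-open base-pair coordinates on the same chromosome, and that the distance is the number of bases between the end of one call and the start of the next. The helpers `gap_bp` and `should_merge` are hypothetical.

```python
def gap_bp(end_a: int, start_b: int) -> int:
    """Distance in bp between a CNV ending at end_a and the next starting at start_b.

    Assumes half-open coordinates, so abutting calls have a gap of 0.
    """
    return start_b - end_a


def should_merge(end_a: int, start_b: int, threshold_kb: float) -> bool:
    """Hypothetical helper: True if the gap is strictly below the threshold."""
    # The threshold is expressed in kilobases, so convert it to base pairs.
    return gap_bp(end_a, start_b) < threshold_kb * 1000


# With a 5 kb threshold, a 3 kb gap is merged but a 7 kb gap is not.
print(should_merge(1_000_000, 1_003_000, threshold_kb=5))  # True
print(should_merge(1_000_000, 1_007_000, threshold_kb=5))  # False
```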

The script writes BED files for visualising the merged calls in the UCSC Genome Browser, and a pickle file for downstream analysis of the merged set.

merge_cnvs.merge(cnvs, threshold=5, size_over=None)

Merges the CNVs in the list if they are separated by less than a threshold value.

Parameters:
  • cnvs (list) – The list of CNVs to merge.
  • threshold (float) – The distance threshold in kilobases. All CNVs separated by less than this value are merged.
  • size_over (float) – A minimum size a merged CNV must reach to be kept in the returned list, allowing easy size-based filtering. (optional)
Returns:

A list of merged CNVs.

Return type:

list
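
One way to implement this merge is a single pass over calls sorted by position. The sketch below is an assumption, not the module's implementation: `merge_sketch` mirrors the documented `merge(cnvs, threshold=5, size_over=None)` signature but works on a hypothetical `CNV` dataclass, only merges calls sharing sample, chromosome and CNV type, and treats `size_over` as kilobases (the documentation does not state its unit).

```python
from dataclasses import dataclass, replace
from itertools import groupby
from operator import attrgetter
from typing import List, Optional


@dataclass
class CNV:
    # Hypothetical record; the objects handled by merge_cnvs may differ.
    sample: str
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    cn_type: str  # e.g. "DEL" or "DUP"


group_key = attrgetter("sample", "chrom", "cn_type")


def merge_sketch(cnvs: List[CNV], threshold: float = 5,
                 size_over: Optional[float] = None) -> List[CNV]:
    """Merge CNVs of the same sample, chromosome and type lying < threshold kb apart."""
    max_gap = threshold * 1000  # kilobases -> base pairs
    merged: List[CNV] = []
    # Sorting makes each group contiguous and orders it by start position.
    ordered = sorted(cnvs, key=lambda c: (group_key(c), c.start))
    for _, group in groupby(ordered, key=group_key):
        current: Optional[CNV] = None
        for cnv in group:
            if current is not None and cnv.start - current.end < max_gap:
                # Close enough (or overlapping): extend the current call.
                current.end = max(current.end, cnv.end)
            else:
                if current is not None:
                    merged.append(current)
                current = replace(cnv)  # copy, so the input list is not mutated
        if current is not None:
            merged.append(current)
    if size_over is not None:
        # Assumed: size_over is in kilobases, like threshold.
        merged = [c for c in merged if c.end - c.start >= size_over * 1000]
    return merged


calls = [
    CNV("s1", "chr1", 1_000_000, 1_010_000, "DEL"),
    CNV("s1", "chr1", 1_012_000, 1_020_000, "DEL"),  # 2 kb gap: merged
    CNV("s1", "chr1", 1_100_000, 1_101_000, "DEL"),  # 80 kb gap: kept apart
]
for c in merge_sketch(calls, threshold=5):
    print(c.chrom, c.start, c.end)
# chr1 1000000 1020000
# chr1 1100000 1101000
```

Sorting once and then scanning each group keeps the cost at O(n log n), and copying each seed call avoids modifying the caller's CNV objects.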

The plot_distance_distribution.py script helps determine whether a given dataset needs merging to correct fragmentation.
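
As a rough illustration of this kind of check (it is an assumption that plot_distance_distribution.py works this way), the sketch below collects the gaps between consecutive calls on each chromosome of one sample and reports the share that a candidate threshold would merge. A large mass of very short gaps hints that calls may be fragmented. The input tuple format and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Call = Tuple[str, int, int]  # (chromosome, start, end), half-open bp; hypothetical


def inter_cnv_distances(calls: List[Call]) -> List[int]:
    """Gaps in bp between consecutive calls on each chromosome of one sample."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in calls:
        by_chrom[chrom].append((start, end))
    distances = []
    for intervals in by_chrom.values():
        intervals.sort()
        # Pair each call with the next one on the same chromosome.
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            distances.append(next_start - prev_end)
    return distances


def fraction_below(distances: List[int], threshold_kb: float) -> float:
    """Share of gaps that a threshold of threshold_kb kilobases would merge."""
    if not distances:
        return 0.0
    return sum(d < threshold_kb * 1000 for d in distances) / len(distances)


calls = [("chr1", 0, 10_000), ("chr1", 12_000, 20_000),
         ("chr1", 500_000, 510_000), ("chr2", 0, 5_000)]
print(inter_cnv_distances(calls))                     # [2000, 480000]
print(fraction_below(inter_cnv_distances(calls), 5))  # 0.5
```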

merge_cnvs.write_bed(cnvs, threshold, fam)

Writes the BED file for the merged samples.
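
For orientation, a BED line that the UCSC Genome Browser can display might be produced as below. This is a hypothetical sketch, not the module's `write_bed`: `write_bed_sketch`, the record tuple and the red/blue colour scheme are assumptions, and the `fam` argument is left out because its role is not documented here. BED uses a 0-based start and an exclusive end, and `itemRgb="On"` on the track line tells UCSC to use the colour column.

```python
import io
from typing import Iterable, Optional, TextIO, Tuple

# Hypothetical record: (sample, chrom, start, end, cn_type), 0-based half-open.
Record = Tuple[str, str, int, int, str]

# Assumed colour convention for the itemRgb column.
COLOURS = {"DEL": "255,0,0", "DUP": "0,0,255"}


def write_bed_sketch(records: Iterable[Record], threshold: float,
                     out: TextIO) -> None:
    """Write merged CNVs as BED9 custom tracks, one track per sample."""
    current_sample: Optional[str] = None
    for sample, chrom, start, end, cn_type in sorted(records):
        if sample != current_sample:
            # UCSC track line; itemRgb="On" enables per-feature colours.
            out.write(f'track name="{sample}" '
                      f'description="{sample} CNVs merged at {threshold} kb" '
                      'itemRgb="On"\n')
            current_sample = sample
        colour = COLOURS.get(cn_type, "0,0,0")
        # chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb
        fields = [chrom, start, end, cn_type, 0, ".", start, end, colour]
        out.write("\t".join(map(str, fields)) + "\n")


buf = io.StringIO()
write_bed_sketch([("s1", "chr1", 1_000_000, 1_020_000, "DEL")],
                 threshold=5, out=buf)
print(buf.getvalue(), end="")
```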

merge_cnvs.write_pickle(cnvs, threshold, fam)

Simple utility function to write the CNVs list as a pickle file.
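
Writing and reloading such a pickle uses the standard library `pickle` module. The sketch below is illustrative only: the function and file names are hypothetical, and plain tuples stand in for the module's CNV objects. Reloading real CNV objects requires their class to be importable under the same module path, and pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile
from typing import Any, List


def write_pickle_sketch(cnvs: List[Any], path: str) -> None:
    """Serialise the CNV list so it can be reloaded for analysis."""
    with open(path, "wb") as f:
        pickle.dump(cnvs, f, protocol=pickle.HIGHEST_PROTOCOL)


def read_pickle_sketch(path: str) -> List[Any]:
    """Reload a list written by write_pickle_sketch (trusted files only)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round trip through a temporary file; the file name is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "merged_cnvs_5kb.pkl")
    write_pickle_sketch([("s1", "chr1", 1_000_000, 1_020_000, "DEL")], path)
    print(read_pickle_sketch(path))
```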
