Script to merge adjacent CNVs that are separated by a distance below a given threshold.
Such a script is useful for fixing the problems arising from call fragmentation, a phenomenon that occurs when a genotyping algorithm does not test the region between calls to extend the breakpoints of the CNVs.
For example, given a distance threshold of 5 kb, all CNVs that are separated by less than 5 kb will be merged together.
This script generates BED files for easy visualisation of the resulting calls in the UCSC Genome Browser, and a pickle file for easy data analysis of the resulting set.
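The two output formats can be illustrated with a minimal sketch. It assumes each CNV is a plain `(chromosome, start, end)` tuple; the function name `write_outputs`, the track name, and the file names are hypothetical and are not taken from the script itself.

```python
import pickle
import tempfile
from pathlib import Path


def write_outputs(cnvs, bed_path, pickle_path):
    """Write CNVs to a BED file and to a pickle file (illustrative only)."""
    with open(bed_path, "w") as bed:
        # Optional track line, recognised by the UCSC Genome Browser for custom tracks.
        bed.write('track name="merged_cnvs" description="Merged CNV calls"\n')
        for chrom, start, end in cnvs:
            # Minimal BED record: chromosome, start, end, separated by tabs.
            bed.write(f"{chrom}\t{start}\t{end}\n")
    with open(pickle_path, "wb") as handle:
        # The pickle keeps the Python objects as-is for later analysis.
        pickle.dump(cnvs, handle)


# Usage with hypothetical data, written to a temporary directory.
out_dir = Path(tempfile.mkdtemp())
merged = [("chr1", 1000, 9000), ("chr2", 100, 200)]
write_outputs(merged, out_dir / "merged.bed", out_dir / "merged.pickle")

with open(out_dir / "merged.pickle", "rb") as handle:
    reloaded = pickle.load(handle)
```

The BED file can be uploaded as a custom track, while the pickle file is reloaded with `pickle.load` to continue the analysis in Python.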
Merges the CNVs in the list if they are separated by less than a threshold value.
Parameters: | |
---|---|
Returns: | A list of merged CNVs. |
Return type: | list |
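
The merging step described above can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes each CNV is a `(chromosome, start, end)` tuple, and the name `merge_adjacent_cnvs` and its `threshold` parameter are hypothetical. A real implementation would probably also check that merged calls belong to the same sample and have the same CNV type.

```python
from typing import List, Tuple

# Hypothetical representation of a CNV: (chromosome, start, end).
CNV = Tuple[str, int, int]


def merge_adjacent_cnvs(cnvs: List[CNV], threshold: int) -> List[CNV]:
    """Merge CNVs on the same chromosome separated by less than `threshold` bp."""
    merged: List[CNV] = []
    # Sorting groups the calls by chromosome and orders them by start position.
    for chrom, start, end in sorted(cnvs):
        if merged:
            last_chrom, last_start, last_end = merged[-1]
            # Distance between the end of the previous call and the start of this one.
            if chrom == last_chrom and start - last_end < threshold:
                # Extend the previous call instead of adding a new one.
                merged[-1] = (last_chrom, last_start, max(last_end, end))
                continue
        merged.append((chrom, start, end))
    return merged


calls = [
    ("chr1", 1000, 5000),
    ("chr1", 7000, 9000),    # 2 kb after the previous call: merged
    ("chr1", 20000, 25000),  # 11 kb after the previous call: kept separate
    ("chr2", 100, 200),      # different chromosome: never merged with chr1
]
merged_calls = merge_adjacent_cnvs(calls, threshold=5000)
# merged_calls == [("chr1", 1000, 9000), ("chr1", 20000, 25000), ("chr2", 100, 200)]
```

Sorting first means each call only has to be compared with the last merged call, so the list is processed in a single pass.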
The plot_distance_distribution.py script is a good tool for determining whether the calls in a given dataset need to be merged to avoid fragmentation.
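How plot_distance_distribution.py works is not described here. As a rough illustration of the underlying idea only, the sketch below computes the distances between consecutive CNVs on the same chromosome and the fraction that fall under a candidate threshold. The `(chromosome, start, end)` tuples and the name `gap_distances` are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation of a CNV: (chromosome, start, end).
CNV = Tuple[str, int, int]


def gap_distances(cnvs: List[CNV]) -> List[int]:
    """Return the distances between consecutive CNVs on the same chromosome."""
    ordered = sorted(cnvs)
    gaps = []
    for (chrom_a, _, end_a), (chrom_b, start_b, _) in zip(ordered, ordered[1:]):
        if chrom_a == chrom_b:
            # Overlapping calls are counted as a distance of zero.
            gaps.append(max(0, start_b - end_a))
    return gaps


calls = [
    ("chr1", 1000, 5000),
    ("chr1", 7000, 9000),
    ("chr1", 20000, 25000),
    ("chr2", 100, 200),
]
gaps = gap_distances(calls)
# Fraction of neighbouring calls closer than a candidate 5 kb threshold.
fraction_close = sum(gap < 5000 for gap in gaps) / len(gaps)
```

A large share of very short distances would suggest that many calls are fragments of the same event and that merging is worthwhile.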