cnv_struct: A data structure to handle CNVs

class cnv_struct.cnv(**kwargs)[source]

Build a CNV object.

Parameters:
  • chr (str) – The chromosome (e.g. 1, 13, X, Y)
  • start (int) – The start of the CNV.
  • end (int) – The end of the CNV.
  • size (int) – The size of the CNV (end - start).
  • pos (str) – The position of the CNV in genome coordinate format of the form chr#:start-end (e.g. chr3:13515-14539).
  • type (str) – The type of the CNV (“gain” or “loss”)
  • cn – The copy number: <2 for deletions, >2 for gains.
  • confidence – A confidence value (specific to a given algorithm).
  • algo (str) – The genotyping algorithm.
  • source (str) – Either “father”, “mother”, “twin1” or “twin2”.
  • doc (float) – The depth of coverage ratio if available.

Two initialization methods are available. One can either use the (chr, start, end) parameters to initialize the CNV object, or the more straightforward pos keyword argument. All objects are normalized to ensure integrity.

Note

The pos kwarg is never stored as an attribute, but it can be retrieved using the cnv_struct.cnv.get_pos() function.
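
As an illustration of the two equivalent ways of describing a region, here is a minimal sketch of converting between a position string and the (chr, start, end) triple. The helper names parse_pos and format_pos are hypothetical and are not part of cnv_struct; the accepted chromosome names are also an assumption.

```python
import re

# Hypothetical helpers for illustration only; they are not cnv_struct functions.
POS_RE = re.compile(r"^chr(?P<chr>[0-9A-Za-z]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_pos(pos):
    """Split a string such as 'chr3:13515-14539' into (chr, start, end)."""
    match = POS_RE.match(pos)
    if match is None:
        raise ValueError("Malformed position string: {!r}".format(pos))
    return match.group("chr"), int(match.group("start")), int(match.group("end"))


def format_pos(chrom, start, end):
    """Rebuild the 'chr#:start-end' string from its three components."""
    return "chr{}:{}-{}".format(chrom, start, end)


chrom, start, end = parse_pos("chr3:13515-14539")
size = end - start  # the size is defined as end - start in the parameter list
```

Because the string can always be rebuilt from the three components, it does not need to be stored separately.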

get_pos()[source]

Gets the position string from the chr, start and end attributes.

Returns: A position string of the form “chrXX:start-end”.
Return type: str

normalize_attributes(pos=None)[source]

Normalizes the different fields for consistency regardless of the method of initialization.

All missing attributes are set to None, and the type is inferred from the copy number, if available.

The CNV type is checked against expected values.

The chr, start and end attributes are parsed from pos if it is given.
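
The type inference and validation described above could look roughly like the following sketch. The function name infer_type is hypothetical, and the handling of a copy number of exactly 2 is an assumption, since the documentation only covers values below and above 2.

```python
VALID_TYPES = ("gain", "loss")


def infer_type(cn=None, cnv_type=None):
    """Hypothetical helper: return 'gain', 'loss' or None.

    An explicit type is validated against the expected values; otherwise
    the type is derived from the copy number when one is available.
    """
    if cnv_type is not None:
        if cnv_type not in VALID_TYPES:
            raise ValueError("Unexpected CNV type: {!r}".format(cnv_type))
        return cnv_type
    if cn is None:
        return None  # nothing to infer from
    if cn > 2:
        return "gain"
    if cn < 2:
        return "loss"
    return None  # assumption: cn == 2 is left untyped
```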

cnv_struct.family_intersection(cnvs1, cnvs2, ro)[source]

Takes two family dicts (sample -> chromosome -> cnv list) and returns a new dict corresponding to their intersection.

Parameters:
  • cnvs1 (dict) – CNVs family dict (e.g. from a given family/algorithm).
  • cnvs2 (dict) – CNVs family dict (e.g. from another algorithm, same family).
  • ro (float) – Reciprocal overlap threshold for identity.
Returns:

A family dict with the intersection of the two CNV lists.

Return type:

dict
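
To make the nested layout concrete, here is a minimal sketch of an intersection over two family dicts. It represents each CNV as a plain (start, end) tuple rather than a cnv object, assumes end > start, and keeps the CNV from the first dict when a match is found; all three choices, and the function names, are assumptions made for illustration and may differ from what family_intersection actually does.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap ratios for (start, end) tuples."""
    shared = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return min(shared / float(a[1] - a[0]), shared / float(b[1] - b[0]))


def intersect_families(fam1, fam2, ro):
    """Keep CNVs of fam1 that match a CNV of fam2 at reciprocal overlap >= ro."""
    result = {}
    for sample, chroms in fam1.items():
        result[sample] = {}
        for chrom, cnvs in chroms.items():
            others = fam2.get(sample, {}).get(chrom, [])
            result[sample][chrom] = [
                c for c in cnvs
                if any(reciprocal_overlap(c, o) >= ro for o in others)
            ]
    return result


# sample -> chromosome -> cnv list
algo_a = {"twin1": {"1": [(100, 200), (500, 600)]}}
algo_b = {"twin1": {"1": [(110, 210)]}}
common = intersect_families(algo_a, algo_b, ro=0.5)
```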

cnv_struct.family_union(cnvs1, cnvs2, ro)[source]

Takes two family dicts (sample -> chromosome -> cnv list) and returns a new dict corresponding to their union.

Parameters:
  • cnvs1 (dict) – CNVs family dict (e.g. from a given family/algorithm).
  • cnvs2 (dict) – CNVs family dict (e.g. from another algorithm, same family).
  • ro (float) – Reciprocal overlap threshold for identity.
Returns:

A family dict with the union of the two CNV lists.

Return type:

dict
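
A union over the same layout can be sketched as follows. Here CNVs are again plain (start, end) tuples with end > start, every CNV of the first dict is kept, and a CNV of the second dict is added only if it matches nothing in the first at the given threshold. Whether family_union keeps one of the matching CNVs or merges them is not stated, so this is an assumption, as are the function names.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap ratios for (start, end) tuples."""
    shared = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return min(shared / float(a[1] - a[0]), shared / float(b[1] - b[0]))


def union_families(fam1, fam2, ro):
    """All CNVs of fam1, plus the CNVs of fam2 without a match in fam1."""
    result = {}
    for sample in set(fam1) | set(fam2):
        chroms1 = fam1.get(sample, {})
        chroms2 = fam2.get(sample, {})
        result[sample] = {}
        for chrom in set(chroms1) | set(chroms2):
            first = chroms1.get(chrom, [])
            kept = list(first)
            for cand in chroms2.get(chrom, []):
                # Skip candidates already represented in the first list.
                if not any(reciprocal_overlap(cand, c) >= ro for c in first):
                    kept.append(cand)
            result[sample][chrom] = kept
    return result


algo_a = {"twin1": {"1": [(100, 200), (500, 600)]}}
algo_b = {"twin1": {"1": [(110, 210), (900, 1000)]}, "twin2": {"X": [(5, 50)]}}
merged = union_families(algo_a, algo_b, ro=0.5)
```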

cnv_struct.merge_cnvs(cnv1, cnv2)[source]

Merges two CNVs into a new object spanning the whole region.

Parameters:
  • cnv1 (cnv_struct.cnv) – The first CNV.
  • cnv2 (cnv_struct.cnv) – The second CNV.
Returns:

The merged CNV object.

Return type:

cnv_struct.cnv
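
"Spanning the whole region" can be read as taking the smallest start and the largest end of the two inputs. The sketch below shows that reading with a hypothetical Region namedtuple standing in for the cnv class; how merge_cnvs treats the remaining attributes (cn, confidence, algo and so on) is not documented here and is not modelled.

```python
from collections import namedtuple

# Hypothetical stand-in for a CNV, limited to its coordinates.
Region = namedtuple("Region", ["chr", "start", "end"])


def merge_regions(a, b):
    """Return a new region covering both inputs."""
    if a.chr != b.chr:
        # Assumption: merging across chromosomes is treated as an error.
        raise ValueError("Cannot merge regions on different chromosomes")
    return Region(a.chr, min(a.start, b.start), max(a.end, b.end))


merged = merge_regions(Region("3", 13515, 14539), Region("3", 14000, 15200))
```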

cnv_struct.overlap(cnv1, cnv2, threshold=None, global_ov=False)[source]

Checks the overlap between two CNVs.

Parameters:
Returns:

Returns either a tuple of overlap ratios for both CNVs or, if a threshold is provided, a bool that is True if the overlap is sufficiently high.
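
The two return modes can be illustrated with the sketch below. It works on plain coordinates for two regions assumed to be on the same chromosome with end > start, and it requires both ratios to reach the threshold (a reciprocal test), which is an assumption. The global_ov option is not modelled because its behaviour is not described here, and the function names are hypothetical.

```python
def overlap_ratios(start1, end1, start2, end2):
    """Fraction of each region covered by the shared interval."""
    shared = max(0, min(end1, end2) - max(start1, start2))
    return shared / float(end1 - start1), shared / float(end2 - start2)


def check_overlap(region1, region2, threshold=None):
    """Return the ratio tuple, or a bool when a threshold is given."""
    ratios = overlap_ratios(region1[0], region1[1], region2[0], region2[1])
    if threshold is None:
        return ratios
    # Assumption: both regions must be covered at least `threshold`.
    return ratios[0] >= threshold and ratios[1] >= threshold


ratios = check_overlap((0, 100), (50, 250))       # (0.5, 0.25)
is_same = check_overlap((0, 100), (50, 250), 0.5)  # False: second ratio is 0.25
```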

cnv_struct.ro_to_best_in_list(cnv, li, return_cnv=False)[source]

Takes a cnv and finds the reciprocal overlap with the best matching cnv in the provided list.

Parameters:
  • cnv (cnv) – A cnv (e.g. from a twin).
  • li (list) – A list of cnvs in which to find the best reciprocally overlapping cnv.
Returns: A 2-tuple of the respective overlaps for both cnvs.

Since there are rarely more than 1000 CNVs per chromosome per sample, it is not necessary to sort the CNV list. A simple brute-force approach is deemed acceptable.
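
The brute-force search amounts to a single linear pass over the list. The sketch below uses (start, end) tuples with end > start and ranks candidates by the smaller of their two overlap ratios; that ranking criterion and the function names are assumptions, not a description of the actual implementation.

```python
def overlap_ratios(a, b):
    """Overlap ratios of two (start, end) tuples, one ratio per region."""
    shared = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return shared / float(a[1] - a[0]), shared / float(b[1] - b[0])


def best_in_list(query, candidates):
    """Linear scan: return (ratios, candidate) for the best match."""
    best_ratios, best_candidate = (0.0, 0.0), None
    best_score = -1.0
    for candidate in candidates:
        ratios = overlap_ratios(query, candidate)
        score = min(ratios)  # assumption: rank by the weaker of the two ratios
        if score > best_score:
            best_score = score
            best_ratios, best_candidate = ratios, candidate
    return best_ratios, best_candidate


ratios, match = best_in_list((100, 200), [(0, 50), (150, 250), (110, 205)])
```

With at most about a thousand entries per list, each lookup costs on the order of a thousand comparisons, which is why sorting or indexing the list brings little benefit.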
