generate_cnv_graph: a data structure to handle different CNV signatures in a family

class generate_cnv_graph.ConnectedComponent(first_vertex, overlap_threshold=0.7)[source]

A class representing a graph-theoretic connected component.

Parameters:
  • first_vertex (either cnv_struct.cnv or generate_cnv_graph.Vertex) – The first vertex, given either as the first CNV in the connected component or as an existing generate_cnv_graph.Vertex object. Internally, every CNV added to a connected component is converted to a generate_cnv_graph.Vertex instance.
  • overlap_threshold (float (between 0 and 1)) – An overlap threshold that determines whether two CNVs are considered identical.

Internally, a ConnectedComponent object knows where the represented locus starts and ends, which chromosome it lies on, and the overlap threshold used. It stores its set of vertices as a list (the adj_list attribute).

add(cnv)[source]

Adds the given cnv to the adjacency list.

cnv_generator()[source]

Returns a generator (iterable) of cnv objects from the vertices of this connected component.

remove(vertex)[source]

Removes a vertex from this connected component.

suitable(cnv)[source]

Determines if the given cnv should be added to this connected component.
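The page does not state how overlap is measured or which rule suitable() applies. As a minimal sketch only, assuming half-open intervals, an overlap fraction defined as the shared length divided by the longer CNV, and a suitability rule of "same chromosome and at least one shared base", the idea could look like this. The names Cnv, overlap_fraction and is_suitable are hypothetical and are not part of generate_cnv_graph.

```python
from collections import namedtuple

# Hypothetical stand-in for a CNV record; the real cnv_struct.cnv may differ.
Cnv = namedtuple("Cnv", ["chrom", "start", "end"])


def overlap_fraction(a, b):
    """Shared length divided by the length of the longer CNV (assumed metric)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    longest = max(a.end - a.start, b.end - b.start)
    return shared / longest


def is_suitable(component_chrom, component_start, component_end, cnv):
    """Assumed rule: same chromosome and at least one base shared with the span."""
    return (
        cnv.chrom == component_chrom
        and cnv.start < component_end
        and cnv.end > component_start
    )
```

Under these assumptions, two CNVs would be treated as identical when overlap_fraction(a, b) is at least overlap_threshold (0.7 by default).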

class generate_cnv_graph.Edge(v1, v2, w)[source]

A simple Edge object that links two vertices together. Edges are weighted.

Edges should be created with this class and added using generate_cnv_graph.Vertex.add_edge().

class generate_cnv_graph.Vertex(cnv_obj)[source]

A container class for CNVs which allows Edge objects to link overlapping CNVs together.

add_edge(e)[source]

Add an edge between vertices.

Parameters: e (Edge) – The edge object to add to this vertex.

Concretely, this is used to link overlapping CNVs together.
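To illustrate how a weighted edge can tie two vertex containers together, here is a small sketch. SketchVertex, SketchEdge and link are hypothetical names, and registering the edge on both endpoints is an assumption; the real Vertex and Edge classes may store their data differently.

```python
class SketchEdge:
    """Hypothetical weighted edge between two vertices."""

    def __init__(self, v1, v2, w):
        self.v1, self.v2, self.w = v1, v2, w


class SketchVertex:
    """Hypothetical container wrapping one CNV object."""

    def __init__(self, cnv_obj):
        self.cnv = cnv_obj
        self.edges = []

    def add_edge(self, e):
        # Record the edge on this vertex only; the caller handles the other end.
        self.edges.append(e)

    def neighbours(self):
        # The neighbour is whichever endpoint of each edge is not this vertex.
        return [e.v2 if e.v1 is self else e.v1 for e in self.edges]


def link(v1, v2, w):
    """Create one edge and register it on both endpoints (assumed convention)."""
    e = SketchEdge(v1, v2, w)
    v1.add_edge(e)
    v2.add_edge(e)
    return e
```

In this sketch the weight w could carry, for example, the overlap between the two CNVs, although the page does not say what the real weight represents.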

generate_cnv_graph.check_profiles(graph)[source]

Counts the signatures and prints the matrix given a signature graph.

As described in generate_cnv_graph.main(), the signatures represent the status of every member of the family at a given locus.

A sample matrix could be:

Twin1  Twin2  Mother  Father  Count
+      +      -       0       52
0      -      -       0       105
-      -      -       -       21

This matrix says that at 52 loci, both twins had gains, the mother had a deletion and the father had no detected CNV. The same reasoning applies to the other two signatures.
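Counting signatures amounts to tallying identical tuples. The following sketch shows one way to do it with collections.Counter. The helper names count_signatures and format_matrix are hypothetical, and the ordering of the output rows (most frequent first) is an assumption, not the documented behaviour of check_profiles().

```python
from collections import Counter

# Column order used throughout this page: (twin1, twin2, mother, father).
ORDER = ("Twin1", "Twin2", "Mother", "Father")


def count_signatures(signatures):
    """Tally how many loci share each (twin1, twin2, mother, father) signature."""
    return Counter(tuple(s) for s in signatures)


def format_matrix(counts):
    """Render the tallies as text, most frequent signature first."""
    lines = [" ".join(ORDER + ("Count",))]
    for sig, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(" ".join(sig + (str(n),)))
    return "\n".join(lines)
```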

generate_cnv_graph.clean_ccs(ccs)[source]

Merges CNVs that have the same source (sample).

Parameters: ccs (list) – The graph as a list of Connected Components.

This is used so that connected components represent families with a single representation for every individual. Thus, we merge indirectly overlapping loci, meaning that if two CNVs from an individual are both overlapped by CNVs from another individual within a family, they will be merged.
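As a rough illustration of merging by source, the sketch below collapses all CNVs from the same sample within one component into a single interval spanning them. The Cnv record and merge_by_sample are hypothetical, the sketch assumes every CNV in the input already belongs to the same connected component (and chromosome), and it ignores the CNV type (gain or loss), which the real clean_ccs() may handle differently.

```python
from collections import namedtuple

# Hypothetical CNV record; the real cnv_struct.cnv may carry more fields.
Cnv = namedtuple("Cnv", ["sample", "chrom", "start", "end"])


def merge_by_sample(component_cnvs):
    """Keep one CNV per sample, spanning all of that sample's CNVs."""
    merged = {}
    for cnv in component_cnvs:
        previous = merged.get(cnv.sample)
        if previous is None:
            merged[cnv.sample] = cnv
        else:
            # Widen the stored interval so it covers both CNVs.
            merged[cnv.sample] = previous._replace(
                start=min(previous.start, cnv.start),
                end=max(previous.end, cnv.end),
            )
    return list(merged.values())
```

Here two twin1 CNVs that are both overlapped by a single maternal CNV end up as one twin1 entry, which matches the "indirectly overlapping" case described above.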

generate_cnv_graph.create_family_graph(cnvs, threshold)[source]

Creates the graph representing the CNVs from a given family as connected components.

This graph is defined as follows. The nodes represent CNVs and the edges represent overlap between CNVs. The complete graph is thus made of multiple connected components representing different loci.
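The graph definition can be sketched independently of the module's classes: build an adjacency map from pairwise overlaps, then collect components with a depth-first traversal. Everything named here (Cnv, overlaps, connected_components) is hypothetical. For brevity the sketch treats any shared base as an overlap and does not model the threshold argument, and it uses a quadratic pairwise comparison that the real implementation may well avoid.

```python
from collections import namedtuple

# Hypothetical CNV record with half-open coordinates [start, end).
Cnv = namedtuple("Cnv", ["sample", "chrom", "start", "end"])


def overlaps(a, b):
    """True when both CNVs are on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def connected_components(cnvs):
    """Group CNVs into components linked by direct or indirect overlap."""
    adjacency = {i: set() for i in range(len(cnvs))}
    for i in range(len(cnvs)):
        for j in range(i + 1, len(cnvs)):
            if overlaps(cnvs[i], cnvs[j]):
                adjacency[i].add(j)
                adjacency[j].add(i)

    seen = set()
    components = []
    for start in range(len(cnvs)):
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited CNV.
        stack = [start]
        seen.add(start)
        members = []
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append([cnvs[i] for i in sorted(members)])
    return components
```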

generate_cnv_graph.create_seek_index(pileups)[source]

Creates an index relating file seek positions (as returned by tell) to genomic positions.

This is used to quickly move around very large pileup files. Use it only on uncompressed files.
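The general technique is to record the byte offset reported by tell() before each line is read, keyed by the genomic position found on that line. The sketch below assumes tab-separated pileup lines whose first two columns are chromosome and position, opens the data in binary mode so that offsets are plain byte counts, and indexes every step-th line. The function build_seek_index and its step parameter are hypothetical and do not reflect the actual signature of create_seek_index().

```python
def build_seek_index(handle, step=1):
    """Map (chromosome, position) to the byte offset of that line.

    `handle` must be a seekable binary file object over an uncompressed file.
    """
    index = {}
    line_number = 0
    while True:
        offset = handle.tell()  # offset of the line about to be read
        line = handle.readline()
        if not line:
            break
        if line_number % step == 0:
            chrom, pos = line.split(b"\t", 2)[:2]
            index[(chrom.decode(), int(pos.decode()))] = offset
        line_number += 1
    return index
```

Once the index exists, handle.seek(index[("chr1", 12)]) followed by handle.readline() returns the line for that position without scanning the file from the start. Offsets of this kind are only meaningful on the raw file, which is consistent with the restriction to uncompressed input.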

generate_cnv_graph.get_coverage(cc_list, twin1_pileup, twin2_pileup, mother_pileup, father_pileup, window_size=1000)[source]

Computes the coverage inside and outside of every CNV locus represented by a connected component in the cc_list graph.

Parameters:
  • cc_list (list) – Graph represented by a list of its connected components.
  • twin1_pileup (str) – Path to the pileup for twin1.
  • twin2_pileup (str) – Path to the pileup for twin2.
  • mother_pileup (str) – Path to the pileup for mother.
  • father_pileup (str) – Path to the pileup for father.
  • window_size (int) – The size of genomic window around the region for coverage computation.

Concretely, this function adds the region_doc and cc_doc attributes to every connected component in the graph. The difference between those values can then be included in the printed matrices.
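The inside-versus-outside comparison can be sketched with a plain mapping from position to read depth. This is an illustration under assumptions: mean_depth and inside_vs_flank are hypothetical helpers, intervals are half-open, positions absent from the mapping count as depth 0, and "outside" is taken to mean the two flanking windows of window_size bases on either side of the locus. How the real function defines region_doc and cc_doc is not stated on this page.

```python
def mean_depth(depths, start, end):
    """Mean depth over [start, end); positions without data count as 0."""
    length = end - start
    if length <= 0:
        return 0.0
    return sum(depths.get(pos, 0) for pos in range(start, end)) / length


def inside_vs_flank(depths, start, end, window_size=1000):
    """Return (mean depth inside the locus, mean depth in both flanks)."""
    inside = mean_depth(depths, start, end)
    left_start = max(0, start - window_size)  # do not run past position 0
    flank_length = (start - left_start) + window_size
    flank_total = sum(depths.get(pos, 0) for pos in range(left_start, start))
    flank_total += sum(depths.get(pos, 0) for pos in range(end, end + window_size))
    outside = flank_total / flank_length if flank_length else 0.0
    return inside, outside
```

With such values, a gain would be expected to show higher depth inside than in the flanks and a loss lower depth, which is the kind of difference the printed matrices can report.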

generate_cnv_graph.main(args)[source]

Generates a graph structure representing the familial status for a given locus.

The signature matrix represents the status of every individual of a given family at a given locus. The status can be +, - or 0, representing a gain, a loss or a no call, respectively. Given a particular locus, the Mendelian inheritance of a variant can be quickly assessed by looking at the status of every individual in the family. This is why we generate matrices with the status symbol for every individual in the family and count the number of times each signature occurs. As an example, if both twins and the mother have a deletion and the father has no CNV called at the given region, the signature is (-, -, -, 0), since the arbitrary order for signatures is always (twin1, twin2, mother, father).
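Building one signature from per-sample calls is a small lookup in the fixed order. In the sketch below, the function name signature, the lowercase sample keys and the dictionary input format are assumptions made for illustration.

```python
# Fixed order used for every signature on this page.
ORDER = ("twin1", "twin2", "mother", "father")


def signature(status_by_sample):
    """Build a signature tuple; samples without a call get '0'."""
    return tuple(status_by_sample.get(sample, "0") for sample in ORDER)
```

For the example above, signature({"twin1": "-", "twin2": "-", "mother": "-"}) yields ("-", "-", "-", "0").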

The goal of such an analysis was to quickly assess the number of inherited CNVs and to detect any algorithm-specific bias.

A pileup file parsing utility is also integrated with this tool allowing the validation of the regions by comparing the coverage inside and outside of the CNV loci. Such an analysis had modest success.

generate_cnv_graph.merge_ccs(cc_list, cnv)[source]

Merges connected components by using their respective overlap to cnv.

generate_cnv_graph.normalize_pileups_starting_position(*pileups)[source]

Takes an arbitrary number of pileup files and advances each of them with readline until all of them start at the same position, namely the highest starting position among the pileups.
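A sketch of this alignment step follows. It assumes text-mode handles over tab-separated pileup lines with the position in the second column, a single chromosome, and positions in ascending order. The names position_of and normalize_start are hypothetical, and returning the current line of each file is a choice made for this sketch, since readline consumes the line it inspects.

```python
def position_of(line):
    """Extract the genomic position (second tab-separated column)."""
    return int(line.split("\t")[1])


def normalize_start(*handles):
    """Advance every handle to the highest first position among them.

    Returns (target_position, current_lines), where current_lines[i] is the
    first line of handles[i] at or after the target, or '' if it ran out.
    """
    lines = [handle.readline() for handle in handles]
    if not all(lines):
        raise ValueError("every pileup must contain at least one line")
    target = max(position_of(line) for line in lines)
    for i, handle in enumerate(handles):
        # Skip lines located before the common starting position.
        while lines[i] and position_of(lines[i]) < target:
            lines[i] = handle.readline()
    return target, lines
```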

generate_cnv_graph.seek_to(position, chromo, pileups, indexes, seek_profiling=False)[source]

Seeks to a given position for all pileups.
