A class representing a graph-theoretic connected component.
Internally, a ConnectedComponent object knows where the represented locus starts and ends, which chromosome it is on, and the overlap threshold used. It represents its set of vertices as a list (the adj_list attribute).
A simple Edge object that links two vertices together. Edges are weighted.
Edges should be generated with this class and added using generate_cnv_graph.add_edge().
A container class for CNVs which allows Edge objects to link overlapping CNVs together.
Counts the signatures and prints the matrix given a signature graph.
As described in the generate_cnv_graph.main() method, the signatures represent the status of every member of the family at a given locus.
A sample matrix could be:
Twin1 | Twin2 | Mother | Father | Count |
---|---|---|---|---|
+ | + | - | 0 | 52 |
0 | - | - | 0 | 105 |
- | - | - | - | 21 |
This says that at 52 loci, both twins had gains, the mother had a deletion and the father had no detected CNV. The same reasoning applies to the other two signatures.
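The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual code: the function names `count_signatures` and `format_matrix` are hypothetical, and signatures are assumed to be plain 4-tuples of status symbols in the (twin1, twin2, mother, father) order.

```python
from collections import Counter

# Column order follows the sample matrix in the text.
ORDER = ("Twin1", "Twin2", "Mother", "Father")


def count_signatures(signatures):
    """Count how many loci share each (twin1, twin2, mother, father) tuple."""
    return Counter(tuple(sig) for sig in signatures)


def format_matrix(counts):
    """Render the counts as pipe-separated rows, most frequent first."""
    lines = [" | ".join(ORDER + ("Count",))]
    for sig, n in counts.most_common():
        lines.append(" | ".join(sig + (str(n),)))
    return "\n".join(lines)


# Hypothetical input: one signature per locus.
signatures = [
    ("+", "+", "-", "0"),
    ("0", "-", "-", "0"),
    ("+", "+", "-", "0"),
]
counts = count_signatures(signatures)
print(format_matrix(counts))
```

Each row of the printed matrix is one distinct signature followed by the number of loci at which it was observed.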
Merges CNVs that have the same source (sample).
Parameters: | ccs (list) – The graph as a list of Connected Components. |
---|
This is used so that connected components represent families with a single representation for every individual. Thus, we merge indirectly overlapping loci, meaning that if two CNVs from an individual are both overlapped by CNVs from another individual within a family, they will be merged.
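As a rough sketch of the merging idea, assuming a connected component is simply a list of CNV records and that merging means spanning from the smallest start to the largest end (both assumptions, as are the `CNV` tuple and the `merge_same_sample` name):

```python
from collections import namedtuple

# Hypothetical stand-in for the module's CNV container.
CNV = namedtuple("CNV", "sample chrom start end")


def merge_same_sample(component):
    """Collapse a component to one CNV per sample.

    CNVs from the same sample are merged into a single record that
    spans from the smallest start to the largest end.
    """
    merged = {}
    for cnv in component:
        previous = merged.get(cnv.sample)
        if previous is None:
            merged[cnv.sample] = cnv
        else:
            merged[cnv.sample] = previous._replace(
                start=min(previous.start, cnv.start),
                end=max(previous.end, cnv.end),
            )
    return list(merged.values())


# Two twin1 CNVs are both overlapped by the same CNV from the mother,
# so they sit in the same component and get merged.
component = [
    CNV("twin1", "chr1", 100, 200),
    CNV("mother", "chr1", 150, 350),
    CNV("twin1", "chr1", 300, 400),
]
result = merge_same_sample(component)
```

Here `result` holds one record per individual, with the twin1 record covering both of its original CNVs.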
Creates the graph representing the CNVs from a given family as connected components.
This graph is defined as follows. The nodes represent CNVs and the edges represent overlap between CNVs. The complete graph is thus made of multiple connected components representing different loci.
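The definition above can be illustrated with a small self-contained sketch. It is not the module's implementation: the `CNV` tuple, the `overlap_fraction` measure (shared length divided by the shorter CNV's length) and the default threshold of 0.5 are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for the module's CNV container.
CNV = namedtuple("CNV", "sample chrom start end")


def overlap_fraction(a, b):
    """Shared length divided by the shorter CNV's length (assumed measure)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / min(a.end - a.start, b.end - b.start)


def connected_components(cnvs, threshold=0.5):
    """Link overlapping CNVs and return the resulting components."""
    # Nodes are CNV indices; an edge means the overlap meets the threshold.
    adj = {i: [] for i in range(len(cnvs))}
    for i in range(len(cnvs)):
        for j in range(i + 1, len(cnvs)):
            if overlap_fraction(cnvs[i], cnvs[j]) >= threshold:
                adj[i].append(j)
                adj[j].append(i)

    # Depth-first traversal collects each component once.
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(cnvs[node])
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


cnvs = [
    CNV("twin1", "chr1", 100, 200),
    CNV("mother", "chr1", 150, 250),
    CNV("father", "chr2", 100, 200),
]
components = connected_components(cnvs)
```

In this example the twin1 and mother CNVs overlap and form one component, while the father's CNV on another chromosome forms its own.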
Creates an index that maps genomic positions to seek positions (as returned by tell).
This is used to quickly move around very large pileup files. Use it only on unzipped files.
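A minimal sketch of such an index, assuming a tab-separated pileup file whose first, second and fourth columns are chromosome, position and depth. The function names `build_index` and `read_depth` are hypothetical, and indexing every line is a simplification: a real index for a very large file would more likely keep only a subset of positions.

```python
def build_index(path):
    """Map (chromosome, position) to the byte offset of its pileup line."""
    index = {}
    # Binary mode keeps tell() offsets exact and usable with seek().
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            fields = line.decode().split("\t")
            index[(fields[0], int(fields[1]))] = offset
    return index


def read_depth(path, index, chrom, pos):
    """Jump straight to one position and return its depth column."""
    with open(path, "rb") as handle:
        handle.seek(index[(chrom, pos)])
        fields = handle.readline().decode().rstrip("\n").split("\t")
    return int(fields[3])
```

The offsets are raw byte positions in the file, which is why this approach only makes sense on uncompressed input.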
Computes the coverage inside and outside of every CNV locus represented by a connected component in the cc_list graph.
Concretely, this script adds the region_doc and cc_doc attributes to every connected component in the graph. The difference between those values can then be included in the printed matrices.
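The comparison can be sketched as follows. This is only an illustration under stated assumptions: depths are given as a plain dictionary from position to read depth, "outside" is taken to mean two flanking windows of a fixed size, and the function names and the `flank` parameter are hypothetical. How the real region_doc and cc_doc values are computed is not described here, so the mapping of `outside` and `inside` onto them is a guess.

```python
def mean_depth(depths, start, end):
    """Mean depth over the half-open interval [start, end)."""
    values = [depths.get(pos, 0) for pos in range(start, end)]
    return sum(values) / len(values) if values else 0.0


def coverage_inside_outside(depths, start, end, flank=100):
    """Return (inside, outside) mean depth for a locus.

    'outside' averages the two flanking windows of `flank` bases
    on either side of the locus (an assumed definition).
    """
    inside = mean_depth(depths, start, end)
    left = [depths.get(p, 0) for p in range(max(0, start - flank), start)]
    right = [depths.get(p, 0) for p in range(end, end + flank)]
    flanks = left + right
    outside = sum(flanks) / len(flanks) if flanks else 0.0
    return inside, outside


# Hypothetical depths: 30x everywhere except a 15x stretch at 100..199,
# which is what a heterozygous deletion might look like.
depths = {pos: 30 for pos in range(0, 300)}
depths.update({pos: 15 for pos in range(100, 200)})
inside, outside = coverage_inside_outside(depths, 100, 200)
```

A marked difference between the two values is the kind of signal that could be reported alongside each signature.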
Generates a graph structure representing the familial status for a given locus.
The signature matrix represents the status of every individual of a given family at a given locus. The status can be +, - or 0, representing a gain, a loss or a no call, respectively. Given a particular locus, the Mendelian inheritance of a variant can be quickly assessed by looking at the status of every individual in the family. This is why we generate matrices with the status symbol for every individual in the family and count the number of times a signature occurs. As an example, if both twins and the mother have a deletion and the father has no CNV called at the given region, the signature would be (-, -, -, 0), as the arbitrary order for signatures is always (twin1, twin2, mother, father).
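Building a single signature can be sketched as below. The `signature` function and its input format (a dictionary from sample name to "gain" or "loss") are assumptions made for illustration; only the symbols and the (twin1, twin2, mother, father) order come from the description above.

```python
# Fixed order of family members in every signature.
ORDER = ("twin1", "twin2", "mother", "father")
SYMBOLS = {"gain": "+", "loss": "-"}


def signature(calls):
    """Build the status tuple for one locus.

    `calls` maps a sample name to 'gain' or 'loss'; samples with no
    call at the locus are reported as '0'.
    """
    return tuple(SYMBOLS.get(calls.get(sample), "0") for sample in ORDER)


# Both twins and the mother carry a deletion; the father has no call.
example = signature({"twin1": "loss", "twin2": "loss", "mother": "loss"})
```

With this input, `example` is the (-, -, -, 0) signature from the paragraph above.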
The goal of such an analysis was to quickly assess the number of inherited CNVs and to detect any algorithm-specific bias.
A pileup file parsing utility is also integrated with this tool, allowing validation of the regions by comparing the coverage inside and outside of the CNV loci. Such an analysis had modest success.
Merges connected components based on their respective overlap with a CNV.