parsers for the different calling algorithms

This package contains the family-based parsers for every supported algorithm. Given a base path, a parser walks through every sample and returns a dict representing the CNV calls for the family. The dict has the form sample -> chromosome -> cnv list, so all the CNVs on chromosome 3 of the father can be accessed with cnvs['father'][3].
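As a minimal illustration of this layout, the nested dict can be sketched with plain Python containers. The CNV records below are hypothetical (start, end, type) tuples standing in for the package's real CNV objects, and the integer chromosome keys simply follow the cnvs['father'][3] example; an actual parser may use different key types.

```python
# Hypothetical CNV records: (start, end, type) tuples stand in for the
# package's real CNV objects, whose fields are not shown in this page.
cnvs = {
    "father": {
        1: [(10_000, 25_000, "deletion")],
        3: [(500_000, 620_000, "duplication"), (900_000, 910_000, "deletion")],
    },
    "mother": {3: [(505_000, 618_000, "duplication")]},
    "twin1": {},
    "twin2": {},
}

# All CNVs called on chromosome 3 of the father.
father_chr3 = cnvs["father"][3]

# A sample may have no calls on a given chromosome, so .get() with a
# default avoids a KeyError in that case.
twin1_chr3 = cnvs["twin1"].get(3, [])
```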

This structure is inconvenient for analyses based on individual samples (no family context). Parsing individual samples is not yet implemented in this set of tools.

The base directory structure required for the parsers is of the form:

.
├── father
│   └── calls
│       └── <Method specific files>
├── mother
│   └── calls
│       └── <Method specific files>
├── twin1
│   └── calls
│       └── <Method specific files>
└── twin2
    └── calls
        └── <Method specific files>

A bash script can easily be written to convert the raw output of the different algorithms into this structure. Alternatively, the conversion can be done manually if the number of samples is low.
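The same conversion can also be sketched in Python rather than bash. The helper below is hypothetical and not part of this package: it assumes the caller already knows which raw output files belong to which sample, and it only copies them into the `<sample>/calls` directories shown above.

```python
import shutil
from pathlib import Path

# Sample directory names taken from the layout shown above.
SAMPLES = ("father", "mother", "twin1", "twin2")


def build_family_tree(raw_files, family_root):
    """Copy each sample's raw call files into <family_root>/<sample>/calls/.

    raw_files is a hypothetical mapping of sample name -> list of file paths.
    """
    family_root = Path(family_root)
    for sample in SAMPLES:
        calls_dir = family_root / sample / "calls"
        calls_dir.mkdir(parents=True, exist_ok=True)
        for src in raw_files.get(sample, []):
            shutil.copy(src, calls_dir / Path(src).name)
    return family_root
```

Every sample directory is created even when no files are listed for it, so the resulting tree always has the four expected branches.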

Note

Most scripts in this set of tools can take pickle files as input. When this is the case, the scripts expect either a simple list of CNVs or a dictionary structure like the one presented here. In other words, the hassle of conforming to this directory structure can be avoided by using the pickle interface implemented in most scripts from this toolkit.
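A pickle file holding such a dictionary could be produced along these lines. This is only a sketch: the CNV records are placeholder tuples rather than the package's CNV objects, and the file name is arbitrary.

```python
import pickle
import tempfile
from pathlib import Path

# Placeholder call set in the sample -> chromosome -> cnv list form.
cnvs = {
    "father": {3: [(500_000, 620_000, "duplication")]},
    "mother": {},
    "twin1": {},
    "twin2": {},
}

# Write the structure to a pickle file (arbitrary name, temporary directory).
pickle_path = Path(tempfile.mkdtemp()) / "family_cnvs.pickle"
with open(pickle_path, "wb") as handle:
    pickle.dump(cnvs, handle)

# Reading it back yields an equal dictionary.
with open(pickle_path, "rb") as handle:
    loaded = pickle.load(handle)
```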

class parsers.ParentParser(family_root)

Interface class for the parsers.

get_cnvs()

Interface method which should return a dictionary of CNVs conforming with the previously established convention.
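To show what implementing this interface might look like, the sketch below defines a stand-in ParentParser (so the example runs without the package) and a hypothetical subclass. The `calls.tsv` file name and its tab-separated chromosome, start, end columns are assumptions made for illustration; they do not correspond to any real algorithm's output.

```python
from collections import defaultdict
from pathlib import Path


class ParentParser:
    """Stand-in for parsers.ParentParser, reproduced only so the sketch runs."""

    def __init__(self, family_root):
        self.family_root = Path(family_root)

    def get_cnvs(self):
        raise NotImplementedError()


class TsvParser(ParentParser):
    """Hypothetical parser reading <sample>/calls/calls.tsv (chrom, start, end)."""

    SAMPLES = ("father", "mother", "twin1", "twin2")

    def get_cnvs(self):
        cnvs = {}
        for sample in self.SAMPLES:
            by_chrom = defaultdict(list)
            calls_file = self.family_root / sample / "calls" / "calls.tsv"
            if calls_file.is_file():
                for line in calls_file.read_text().splitlines():
                    if not line.strip():
                        continue  # skip blank lines
                    chrom, start, end = line.split("\t")[:3]
                    by_chrom[chrom].append((int(start), int(end)))
            # Samples without a calls file get an empty chromosome dict.
            cnvs[sample] = dict(by_chrom)
        return cnvs
```

Note that this sketch keeps chromosome names as strings because they are read from text; a real parser may normalise them differently.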

parsers.family_to_list(li)

Converts a sample -> chromosome -> cnv list dictionary structure to a simple list of CNVs for the whole family.

Parameters: li (dict) – A dict mapping samples to chromosomes to CNV lists.
Returns: A one-dimensional list of CNVs.
Return type: list

This is useful when the analysis does not require familial information, or when traversing a flat list is simpler than a double iteration over samples and chromosomes.
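The flattening itself amounts to a nested iteration. The function below is an illustrative equivalent written for this page, not the package's actual implementation, and the CNV records are placeholder tuples.

```python
def flatten_family(family):
    """Flatten a sample -> chromosome -> cnv list dict into one list."""
    return [
        cnv
        for chromosomes in family.values()
        for cnv_list in chromosomes.values()
        for cnv in cnv_list
    ]


# Placeholder call set with three CNVs spread over two samples.
family = {
    "father": {1: [(10, 20)], 3: [(30, 40)]},
    "mother": {3: [(35, 45)]},
    "twin1": {},
    "twin2": {},
}

all_cnvs = flatten_family(family)
```

The flat list drops the sample and chromosome keys, so any analysis that needs them should keep that information on the CNV records themselves.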

parsers.get_parser_for_algorithm(s)

Gets the appropriate Parser class for a given algorithm.

This is used internally to fetch the correct class when using the command-line tools. The --format argument is often used to identify the correct parser.

Parameters: s (str) – The name of the parser to fetch (e.g. erds, cnvnator).
Returns: The parser class.
Return type: type
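A name-to-class lookup of this kind is commonly built on a dictionary. The sketch below shows that pattern only; the class names, the registry and the error handling are all assumptions, and the real function may work differently.

```python
class ErdsParser:
    """Placeholder class standing in for an ERDS parser."""


class CnvnatorParser:
    """Placeholder class standing in for a CNVnator parser."""


# Hypothetical registry mapping algorithm names to parser classes.
_PARSERS = {
    "erds": ErdsParser,
    "cnvnator": CnvnatorParser,
}


def lookup_parser(name):
    """Return the parser class registered under name (case-insensitive)."""
    try:
        return _PARSERS[name.lower()]
    except KeyError:
        known = ", ".join(sorted(_PARSERS))
        raise ValueError(f"Unknown parser {name!r}; expected one of: {known}") from None


# The function returns the class itself, which the caller then instantiates.
parser_class = lookup_parser("erds")
```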
