IO
- faltwerk.io.is_gz_file(filepath)
Check whether a file is gzip-compressed. See https://stackoverflow.com/questions/3703276/how-to-tell-if-a-file-is-gzip-compressed
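A common way to make this check, sketched here under the assumption that only the file header is inspected, is to compare the first two bytes against the gzip magic number. The helper `looks_gzipped` and the `demo` function are hypothetical names for illustration; faltwerk's own implementation may differ.

```python
import gzip
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(filepath):
    """Return True if the file starts with the gzip magic number.

    Hypothetical helper, not the faltwerk function itself.
    """
    with open(filepath, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def demo():
    """Write one plain and one gzipped file, then test both."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = Path(tmp) / "plain.txt"
        packed = Path(tmp) / "packed.txt.gz"
        plain.write_text("ATOM\n")
        with gzip.open(packed, "wt") as handle:
            handle.write("ATOM\n")
        return looks_gzipped(plain), looks_gzipped(packed)
```

Checking the header rather than the file extension means a compressed file with a misleading name is still detected.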
- faltwerk.io.load_bfactor_column(fp)
Load annotation data stored in the bfactor column of a .pdb file
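To show where this data lives, the sketch below reads the B-factor (temperature factor) field directly from fixed-width ATOM records, where it occupies columns 61 to 66. It is a minimal illustration, not the faltwerk implementation: the helper names, the choice to keep one value per residue taken from the CA atom, and the toy records are all assumptions.

```python
def format_atom_line(serial, atom, resname, chain, resseq, bfactor):
    """Build a fixed-width ATOM record (toy data, coordinates set to 0)."""
    return (
        f"ATOM  {serial:5d} {atom:<4s} {resname:>3s} {chain:1s}{resseq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}"
    )


def bfactors_per_residue(lines, atom_name="CA"):
    """Map (chain, residue number) to the B-factor of the chosen atom.

    Assumption: one annotation value per residue, read from its CA atom.
    """
    values = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != atom_name:  # atom name: columns 13-16
            continue
        chain = line[21]                      # chain ID: column 22
        resseq = int(line[22:26])             # residue number: columns 23-26
        values[(chain, resseq)] = float(line[60:66])  # B-factor: columns 61-66
    return values


records = [
    format_atom_line(1, "N", "MET", "A", 1, 0.0),
    format_atom_line(2, "CA", "MET", "A", 1, 0.25),
    format_atom_line(3, "CA", "LYS", "A", 2, 1.5),
]
annotation = bfactors_per_residue(records)
```

Tools that write per-residue scores into a .pdb file usually reuse this field because viewers can colour a structure by it without any extra files.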
- faltwerk.io.load_conserved(fp, ref=None, metric=<function mean_pairwise_similarity>)
If no reference sequence name is provided, the first sequence is assumed to be the reference. The reference must be known because it can contain gaps in the MSA. Columns where the reference has a gap are omitted so that the conservation values can be mapped onto the protein structure, which contains no gaps and is assumed to be identical to the reference sequence.
Available functions for metric:
mean_pairwise_similarity
entropy
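The gap handling described above can be sketched as follows: score every alignment column, then keep only the columns where the reference has a residue, so the result has one value per position in the structure. This is an illustration under stated assumptions, not faltwerk's code: the MSA is a plain dict of aligned strings, the metric is a generic Shannon entropy that ignores gap characters, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy in bits of the non-gap characters in one column."""
    residues = [char for char in column if char != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    probs = [count / total for count in Counter(residues).values()]
    return sum(-p * log2(p) for p in probs)


def conservation_for_reference(msa, ref=None):
    """Score each column, skipping columns where the reference is a gap."""
    names = list(msa)
    ref = names[0] if ref is None else ref  # default: first sequence
    scores = []
    for i, ref_char in enumerate(msa[ref]):
        if ref_char == "-":
            continue  # no residue in the structure maps to this column
        column = [seq[i] for seq in msa.values()]
        scores.append(column_entropy(column))
    return scores


msa = {"ref": "MK-LV", "s2": "MKALV", "s3": "MR-LI"}
scores = conservation_for_reference(msa)
```

Here `scores` has four entries, one per residue of the ungapped reference `MKLV`. With entropy, lower values mean stronger conservation.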
- faltwerk.io.parse_hyphy(fp, method='meme', direction='positive', skip=[])
HyPhy returns large output files containing a great many values (granted, this makes the calculations very transparent). Parse them into a human-friendly format.
Optional arguments:
skip – use e.g. to mask gaps in an alignment; needs to be some form of binary iterable (e.g. [0, 0, 0, 1, 0, 0, 0, …])
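One way to build such a mask is to mark every column where the aligned reference sequence has a gap. The sketch below also applies the mask by dropping the flagged positions. That is only one plausible reading: how `parse_hyphy` uses `skip` internally is not documented here, and both helper names are hypothetical.

```python
def gap_mask(aligned_reference):
    """Return 1 for gap columns and 0 elsewhere."""
    return [1 if char == "-" else 0 for char in aligned_reference]


def apply_skip(values, skip):
    """Drop the per-site values flagged in the mask (assumed behaviour)."""
    if len(values) != len(skip):
        raise ValueError("values and skip must have the same length")
    return [value for value, flag in zip(values, skip) if not flag]


mask = gap_mask("MK-LV")
kept = apply_skip([0.1, 0.2, 0.3, 0.4, 0.5], mask)
```

After masking, the number of remaining values matches the length of the ungapped reference, which is what a mapping onto the structure needs.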
- faltwerk.io.read_pdb(fp: Union[str, Path], name: str = 'x', strict: bool = True, reindex: bool = False, reindex_start_position: int = 1) -> Bio.PDB.Structure
Return structure AND sequence
https://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ
The hierarchy of the PDB parser is this:
>>> from Bio.PDB import PDBParser
>>> p = PDBParser()
>>> structure = p.get_structure("X", "pdb1fat.ent")
>>> for model in structure:
...     for chain in model:
...         for residue in chain:
...             for atom in residue:
...                 print(atom)