faltwerk Python API basics

For more specialized operations, see also the functions related to geometry and statistics.

Fold: basic structure manipulation and annotation

class faltwerk.models.Fold(fp, quiet=True, annotate=True, strict=True, reindex: bool = False, reindex_start_position: int = 1)

Core object to manipulate single-component protein structures.

The Fold object is the basis for many protein-centric operations in the faltwerk library.

Basic usage:

>>> from faltwerk import Fold
>>> model = Fold('af2_prediction.pdb')

__init__(fp, quiet=True, annotate=True, strict=True, reindex: bool = False, reindex_start_position: int = 1)

Create a faltwerk.Fold object.

Optional arguments:

  • strict (default True) – check that the structure contains a single chain

  • annotate (default True) – add a track that annotates residue positions from the N- to the C-terminus

add_scores(fp)

Add ColabFold formatted quality scores.

align_to(ref, mode=0, minscore=0.5)

Align a structure to another one using foldseek. Returns the TM-score (values above 0.5 are considered good) and a copy of the query structure:

>>> qry = Fold(...)
>>> ref = Fold(...)
>>> tm_score, cp_qry = qry.align_to(ref)

Optional arguments (passed to foldseek, see https://github.com/steineggerlab/foldseek):

  • mode (default 0)

  • minscore (default 0.5)
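To give the 0.5 threshold some context, the sketch below evaluates the standard TM-score formula for one fixed superposition. It is not foldseek's implementation: the function name tm_score is hypothetical, the full score additionally maximises over superpositions, and the fallback of d0 = 0.5 Å for short targets is a simplification assumed here.

```python
def tm_score(distances, target_length):
    """TM-score of one superposition.

    distances: distance in Angstrom for each aligned residue pair
    target_length: number of residues in the reference structure
    """
    # Length-dependent distance scale d0; the short-target fallback
    # of 0.5 Angstrom is an assumption made for this sketch.
    if target_length > 21:
        d0 = 1.24 * (target_length - 15) ** (1 / 3) - 1.8
    else:
        d0 = 0.5
    # Each aligned pair contributes between 0 (far apart) and 1 (identical).
    total = sum(1 / (1 + (d / d0) ** 2) for d in distances)
    return total / target_length


perfect = tm_score([0.0] * 100, 100)   # every residue superimposed exactly
partial = tm_score([0.0] * 50, 100)    # only half of the residues aligned
distant = tm_score([10.0] * 100, 100)  # all pairs 10 Angstrom apart
```

Because the sum is divided by the reference length, unaligned residues lower the score just as poorly superimposed ones do.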

annotate_(key, values, check=True)

Annotate a structure with a track of some feature (solvent accessibility, selected sites, surface probability, …), with one value for each residue. In-place operation.

Optional arguments:

  • check (default True) – verify that the number of feature values equals the number of residues in the protein

>>> from faltwerk.io import load_bfactor_column
>>> features = load_bfactor_column('dMASIF.pdb')
>>> model = Fold(...)
>>> model.annotate_('surface', features)
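
A track needs exactly one value per residue. As a minimal sketch of how a list of selected sites could be turned into such a track, the helper sites_to_track below is hypothetical (not part of faltwerk) and assumes 1-based residue positions.

```python
def sites_to_track(sites, n_residues):
    """Turn 1-based site positions into a 0/1 track with one value per residue."""
    track = [0] * n_residues
    for pos in sites:
        if not 1 <= pos <= n_residues:
            raise ValueError(f"site {pos} is outside 1..{n_residues}")
        track[pos - 1] = 1  # convert the 1-based position to a 0-based index
    return track


sequence = "MKTAYIAKQR"  # hypothetical 10-residue protein
track = sites_to_track([2, 5, 9], len(sequence))

# The same kind of length comparison that check=True is described as doing.
lengths_match = len(track) == len(sequence)
```

A track built this way has the right length by construction, so it could be passed on as the values argument of annotate_.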

annotate_many_(many: dict)

Batch annotate features inplace.

>>> model.annotate_many_({'foo': arr1, 'bar': arr2})

rename_chains_(renames: dict) -> None

Rename chains in place (note the PyTorch-style “_” suffix on the function name). AF2, for example, produces multiple models; to load them, each must have a unique chain name.

>>> model = Fold(...)
>>> model.rename_chains_({'A': 'foo', 'B': 'bar'})  # inplace operation

Complex: basic structure complex manipulation and annotation

class faltwerk.models.Complex(fp, scores=None, reindex: bool = False, reindex_start_position: int = 1)

Core object to manipulate multi-component protein structures.

Many proteins interact in protein-protein complexes, and there is an increasing number of models such as AlphaFold v2 that can be used to predict such binding patterns. The Complex object collects methods to manipulate multi-component structures, selecting individual components, annotating binding sites, etc.

Basic usage:

>>> from faltwerk import Complex, Layout
>>> cx = Complex('af2_multichain_prediction.pdb', reindex=True)

Remove and select chains:

>>> cx - 'AB'  # rm chains, same as cx - ['A', 'B']
>>> cx * 'C'   # select
>>> cx.chains['C']
>>> cx[2]
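
The semantics of the removal and selection operators can be illustrated with a toy container. ChainSet below is a hypothetical stand-in, not faltwerk's Complex; returning a new object from each operation and supporting only single-character chain names are assumptions of this sketch.

```python
class ChainSet:
    """Toy multi-chain container mimicking the '-' and '*' operators."""

    def __init__(self, chains):
        self.chains = dict(chains)  # chain name -> payload (here: a sequence)

    def __sub__(self, names):
        # set('AB') and set(['A', 'B']) are both {'A', 'B'}
        drop = set(names)
        return ChainSet({k: v for k, v in self.chains.items() if k not in drop})

    def __mul__(self, names):
        keep = set(names)
        return ChainSet({k: v for k, v in self.chains.items() if k in keep})


cs = ChainSet({'A': 'MKT', 'B': 'AYI', 'C': 'AKQ'})
removed = cs - 'AB'   # chains A and B dropped
selected = cs * 'C'   # only chain C kept
```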

Visualise:

>>> Layout(cx).geom_ribbon().render()

__init__(fp, scores=None, reindex: bool = False, reindex_start_position: int = 1)

Create a faltwerk.Complex object.

Optional arguments:

  • scores (default None) – add quality scores in ColabFold format [json]

  • reindex (default False) – reindex residues (if not numbered 1:n)

  • reindex_start_position (default 1) – where to start residue index
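
What reindexing means can be sketched in a few lines. The function reindex below is hypothetical and assumes that residues are renumbered consecutively in their original order, closing any gaps; faltwerk's actual handling of gaps may differ.

```python
def reindex(residue_numbers, start=1):
    """Map original residue numbers to consecutive ones beginning at `start`."""
    return {old: new for new, old in enumerate(residue_numbers, start=start)}


original = [27, 28, 29, 31]              # numbering does not begin at 1 and has a gap
mapping = reindex(original)              # default: numbering starts at 1
shifted = reindex(original, start=100)   # numbering starts at 100 instead
```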

Binding: map likely ligand binding sites to protein structure

class faltwerk.models.Binding(fold, option='confident')

Object to predict ligand binding based on mapped Pfam domains, using the method first explored in “InteracDome”.

The basic idea there was to take all PDB structures with interacting ligands, mark the interface between the two on the (linear) sequence, and then map the (linear) sequence to the protein sequence underlying a protein structure, marking the likely interacting residues on this query structure. There can be multiple ligands for each Pfam domain.
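
The mapping step can be sketched as follows. This is a simplified illustration, not the code used by faltwerk or InteracDome: the function map_binding, the alignment dictionary and the frequencies are all made up for the example, and the alignment of domain positions (HMM match states) to residues is assumed to be given.

```python
def map_binding(seq_length, match_to_residue, binding_freq):
    """Project per-domain-position binding frequencies onto a protein sequence.

    match_to_residue: domain position (1-based) -> residue position (1-based)
    binding_freq: domain position -> binding frequency between 0 and 1
    """
    track = [0.0] * seq_length  # residues outside the domain stay at 0
    for state, residue in match_to_residue.items():
        track[residue - 1] = binding_freq.get(state, 0.0)
    return track


# Hypothetical 3-position domain aligned to an 8-residue protein;
# residue 5 is an insertion relative to the domain model.
alignment = {1: 3, 2: 4, 3: 6}
frequencies = {1: 0.0, 2: 0.9, 3: 0.4}
binding_track = map_binding(8, alignment, frequencies)
```

The result has one value per residue, so it has the shape expected of an annotation track.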

Because this approach was “trained” on Pfam v31, and all coordinates are based on this version, please only use v31 as reference.

There are three sets of contact data to make predictions from:

  • confident – “correspond to domain-ligand interactions that had nonredundant instances across three or more distinct PDB entries and achieved a cross-validated precision of at least 0.5. [Kobren and Singh] recommend using this collection to annotate potential ligand-binding positions in protein sequences”

  • representable / nonredundant – “correspond to domain-ligand interactions that had nonredundant instances across three or more distinct PDB structures. [Kobren and Singh] recommend using this collection to learn more about domain binding properties” (nonredundant is the redundancy-filtered version of representable)

Update: For an end-to-end approach to the same problem, see DiffDock – https://github.com/gcorso/DiffDock

Usage:

>>> from faltwerk import Fold, Binding, Layout
>>> model = Fold('faltwerk/data/zinc_finger/1mey.pdb')
>>> hmms = 'path/to/pfam_v31/Pfam-A.hmm'
>>> b = Binding(model, option='non-redundant')
>>> b.predict_binding_(hmms)  # inplace operation
>>> # Explore: b.domains, b.ligands
>>> model.annotate_('binding', b.get_binding('PF13912.5', 'ZN'))
>>> Layout(model).geom_surface('binding').render()
>>> # Marked in yellow are two symmetric zinc binding sites

__init__(fold, option='confident')

get_binding(domain, ligand)

Returns the binding frequencies for the protein sequence of the fold.

get_domain(domain)

Returns a vector of residue positions with 1 where the domain is present and 0 otherwise.

>>> for i in b.ids:
...     _ = b.get_domain(i)
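
The shape of such a vector can be illustrated with a small sketch. The helper domain_mask is hypothetical and assumes the domain is given by 1-based, inclusive start and end coordinates on the sequence.

```python
def domain_mask(seq_length, start, end):
    """Return 1 for residues inside [start, end] (1-based, inclusive), else 0."""
    return [1 if start <= i <= end else 0 for i in range(1, seq_length + 1)]


mask = domain_mask(10, 3, 6)  # domain covers residues 3 to 6
covered = sum(mask)           # number of residues inside the domain
```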