Encoding

Structures from the KLIFS database can be encoded as kissim fingerprints. The kinase fingerprint is based on the KLIFS pocket alignment, which defines 85 pocket residues for all kinase structures. This enables a residue-by-residue comparison without a computationally expensive alignment step. The pocket fingerprint consists of 85 concatenated residue fingerprints, each encoding a residue’s spatial and physicochemical properties (see Figure 1). The spatial properties describe the residue’s position relative to the kinase pocket centroid and to important kinase subpockets, i.e. the hinge region, the DFG region, and the front pocket. The physicochemical properties cover each residue’s size, pharmacophoric features, solvent exposure, and side chain orientation.

The ``kissim`` fingerprint composition.

Figure 1: The kissim fingerprint.

[1]:
from pathlib import Path
[2]:
# Load path to test data
from kissim.dataset.test import PATH as PATH_TEST_DATA

Set up remote and local KLIFS sessions using the opencadd.databases.klifs module.

[3]:
from opencadd.databases.klifs import setup_remote, setup_local

KLIFS_REMOTE = setup_remote()
KLIFS_LOCAL = setup_local(PATH_TEST_DATA / "KLIFS_download")

Encode one structure

[4]:
from kissim.encoding import Fingerprint
[5]:
# flake8-noqa-cell
Fingerprint.from_structure_klifs_id?
Signature: Fingerprint.from_structure_klifs_id(structure_klifs_id, klifs_session=None)
Docstring:
Calculate fingerprint for a KLIFS structure (by structure KLIFS ID).

Parameters
----------
structure_klifs_id : int
    Structure KLIFS ID.
klifs_session : opencadd.databases.klifs.session.Session or None
    Local or remote KLIFS session.
    If None (default), set up remote KLIFS session.

Returns
-------
kissim.encoding.Fingerprint
    Fingerprint.
File:      ~/Documents/GitHub/kissim/kissim/encoding/fingerprint.py
Type:      method

Generate fingerprint from remote KLIFS session

[6]:
fingerprint = Fingerprint.from_structure_klifs_id(109)
[7]:
fingerprint = Fingerprint.from_structure_klifs_id(109, KLIFS_REMOTE)

Generate fingerprint from local KLIFS session

[8]:
fingerprint = Fingerprint.from_structure_klifs_id(109, KLIFS_LOCAL)

Explore Fingerprint object

[9]:
fingerprint.physicochemical
[9]:
size hbd hba charge aromatic aliphatic sco exposure
residue.ix
1 2.0 1.0 1.0 0.0 1.0 0.0 2.0 3.0
2 2.0 1.0 0.0 1.0 0.0 0.0 3.0 3.0
3 2.0 0.0 0.0 0.0 0.0 1.0 2.0 1.0
4 1.0 0.0 0.0 0.0 0.0 0.0 NaN 1.0
5 1.0 0.0 0.0 0.0 0.0 0.0 NaN 3.0
... ... ... ... ... ... ... ... ...
81 2.0 0.0 2.0 -1.0 0.0 0.0 3.0 3.0
82 3.0 0.0 0.0 0.0 1.0 0.0 2.0 2.0
83 1.0 0.0 0.0 0.0 0.0 0.0 NaN 3.0
84 2.0 0.0 0.0 0.0 0.0 1.0 2.0 1.0
85 1.0 1.0 1.0 0.0 0.0 0.0 3.0 3.0

85 rows × 8 columns

[10]:
fingerprint.physicochemical.hist();
[Figure: histograms of the physicochemical features, one panel per column]
[11]:
fingerprint.distances
[11]:
hinge_region dfg_region front_pocket center
residue.ix
1 13.638825 18.151474 14.976771 17.175079
2 11.992615 15.466840 12.175196 14.700109
3 9.609095 14.682669 9.020685 12.176826
4 11.448428 15.794301 8.619237 12.807139
5 14.557948 17.282959 11.822055 15.756100
... ... ... ... ...
81 9.265642 7.738092 6.713789 4.406957
82 8.524595 6.593623 5.218254 4.892906
83 11.939774 6.238186 8.907651 8.120236
84 13.661263 9.324866 9.374503 10.433297
85 17.328062 12.035613 12.813744 13.699333

85 rows × 4 columns

[12]:
fingerprint.distances.hist();
[Figure: histograms of the subpocket distances, one panel per column]
[13]:
fingerprint.moments
[13]:
hinge_region dfg_region front_pocket center
moments
1 12.744562 13.951375 12.748791 12.093494
2 4.411861 4.842253 4.286650 3.403159
3 2.920387 3.377544 3.075991 1.621391

You can select which feature types you would like to extract from the fingerprint: bits for physicochemical, distances, and/or moments features.

physicochemical  distances  moments  fingerprint length
True             True       True     1032
True             True       False    1020
True             False      True     692
True             False      False    680
False            True       True     352
False            True       False    340
False            False      True     12
False            False      False    0
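The lengths listed above follow from the table shapes shown earlier: 85 residues with 8 physicochemical columns, 85 residues with 4 distance columns, and 3 moments for each of the 4 subpocket columns. The helper below is a hypothetical illustration of that arithmetic, not part of the kissim API.

```python
# Feature counts read off the tables in this tutorial (85 pocket residues).
N_RESIDUES = 85
N_PHYSICOCHEMICAL = 8  # size, hbd, hba, charge, aromatic, aliphatic, sco, exposure
N_SUBPOCKETS = 4       # hinge_region, dfg_region, front_pocket, center
N_MOMENTS = 3          # three moments per subpocket column


def fingerprint_length(physicochemical=True, distances=True, moments=True):
    """Hypothetical helper: number of values for a given feature selection."""
    length = 0
    if physicochemical:
        length += N_RESIDUES * N_PHYSICOCHEMICAL  # 85 * 8 = 680
    if distances:
        length += N_RESIDUES * N_SUBPOCKETS       # 85 * 4 = 340
    if moments:
        length += N_SUBPOCKETS * N_MOMENTS        # 4 * 3 = 12
    return length


print(fingerprint_length())               # 1032
print(fingerprint_length(moments=False))  # 1020
```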

[14]:
fingerprint_array = fingerprint.values_array(
    physicochemical=True, spatial_distances=True, spatial_moments=True
)
[15]:
import pandas as pd

pd.DataFrame({"residue.id": fingerprint.residue_ids, "residue.ix": fingerprint.residue_ixs})
[15]:
residue.id residue.ix
0 292 1
1 293 2
2 294 3
3 295 4
4 296 5
... ... ...
80 427 81
81 428 82
82 429 83
83 430 84
84 431 85

85 rows × 2 columns
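For looking up a residue by its pocket position, the two lists can be turned into a mapping. The sketch below uses plain Python lists holding only the first five pairs from the table above; with a real fingerprint the lists would come from `fingerprint.residue_ixs` and `fingerprint.residue_ids`, and residue IDs may be missing for gap positions.

```python
# First five entries copied from the table above; the remaining 80 are omitted.
residue_ixs = [1, 2, 3, 4, 5]
residue_ids = [292, 293, 294, 295, 296]

# Pocket position (residue.ix) -> residue number in the structure (residue.id).
ix_to_id = dict(zip(residue_ixs, residue_ids))
# Reverse lookup: residue number -> pocket position.
id_to_ix = {residue_id: ix for ix, residue_id in ix_to_id.items()}

print(ix_to_id[3])    # 294
print(id_to_ix[296])  # 5
```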

Normalize fingerprint

[16]:
from kissim.encoding import FingerprintNormalized

fingerprint_normalized = FingerprintNormalized.from_fingerprint(fingerprint)
[17]:
fingerprint_normalized.physicochemical.hist();
[Figure: histograms of the normalized physicochemical features, one panel per column]
[18]:
fingerprint_normalized.distances.hist();
[Figure: histograms of the normalized subpocket distances, one panel per column]
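To give a feel for what normalization does to a single value, here is a minimal sketch of min-max scaling with clipping. It is an assumption that `FingerprintNormalized` works along these lines; the function name and the range used in the example are hypothetical and chosen only to match the size categories (1.0 to 3.0) seen in the physicochemical table.

```python
import math


def min_max_normalize(value, minimum, maximum):
    """Scale value to [0, 1]; values outside [minimum, maximum] are clipped."""
    if value is None or math.isnan(value):
        return math.nan  # keep missing bits missing
    if value <= minimum:
        return 0.0
    if value >= maximum:
        return 1.0
    return (value - minimum) / (maximum - minimum)


# Assumed range for the size feature: categories run from 1.0 to 3.0.
print(min_max_normalize(2.0, 1.0, 3.0))  # 0.5
```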

Save/load fingerprints

[19]:
json_filepath = Path("fingerprint.json")
[20]:
fingerprint.to_json(json_filepath)
Fingerprint.from_json(json_filepath)
[20]:
<kissim.encoding.fingerprint.Fingerprint at 0x7fc84c6caeb0>
[21]:
json_filepath.unlink()

Encode multiple structures

[22]:
from kissim.encoding import FingerprintGenerator
[23]:
# flake8-noqa-cell
FingerprintGenerator.from_structure_klifs_ids?
Signature:
FingerprintGenerator.from_structure_klifs_ids(
    structure_klifs_ids,
    klifs_session=None,
    n_cores=1,
)
Docstring:
Calculate fingerprints for one or more KLIFS structures (by structure KLIFS IDs).

Parameters
----------
structure_klifs_id : int
    Input structure KLIFS ID (output fingerprints may contain less IDs because some
    structures could not be encoded).
klifs_session : opencadd.databases.klifs.session.Session
    Local or remote KLIFS session.
n_cores : int or None
    Number of cores to be used for fingerprint generation as defined by the user.

Returns
-------
kissim.encoding.fingerprint_generator
    Fingerprint generator object containing fingerprints.
File:      ~/Documents/GitHub/kissim/kissim/encoding/fingerprint_generator.py
Type:      method

Select structure KLIFS IDs

[24]:
structure_klifs_ids = [109, 118, 12347, 1641, 3833, 9122]

Generate fingerprints from remote KLIFS session

[25]:
fingerprint_generator = FingerprintGenerator.from_structure_klifs_ids(
    structure_klifs_ids=structure_klifs_ids, klifs_session=KLIFS_REMOTE, n_cores=2
)

Generate fingerprints from local KLIFS session

[26]:
fingerprint_generator = FingerprintGenerator.from_structure_klifs_ids(
    structure_klifs_ids=structure_klifs_ids, klifs_session=KLIFS_LOCAL, n_cores=2
)
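As the docstring notes, the generator may return fewer fingerprints than structure KLIFS IDs requested. A quick way to see which structures were dropped is to compare the requested IDs against the encoded ones. The sketch below uses a plain dict keyed by structure KLIFS ID as a stand-in for the generator's results; that layout, and the IDs themselves, are assumptions for illustration.

```python
# Hypothetical structure KLIFS IDs, not real results.
requested_ids = [1, 2, 3, 4]

# Stand-in for the encoded fingerprints, keyed by structure KLIFS ID.
# Here ID 3 is assumed to have failed during encoding.
encoded = {1: "fingerprint", 2: "fingerprint", 4: "fingerprint"}

# IDs that were requested but have no fingerprint.
missing_ids = [i for i in requested_ids if i not in encoded]
print(missing_ids)  # [3]
```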

Save/load fingerprints

[27]:
json_filepath = Path("fingerprints.json")
[28]:
fingerprint_generator.to_json(json_filepath)
FingerprintGenerator.from_json(json_filepath)
[28]:
<kissim.encoding.fingerprint_generator.FingerprintGenerator at 0x7fc8440a2eb0>
[29]:
json_filepath.unlink()

Calculate individual features

Load pocket

[30]:
from kissim.io import PocketBioPython, PocketDataFrame

pocket_bp = PocketBioPython.from_structure_klifs_id(12347)
pocket_df = PocketDataFrame.from_structure_klifs_id(12347)

SiteAlign features

Figure 2: The SiteAlign feature bits of the kissim fingerprint.

Select one of the SiteAlign features:

  • "hba": Hydrogen bond acceptor feature

  • "hbd": Hydrogen bond donor feature

  • "size": Size feature

  • "charge": Charge feature

  • "aliphatic": Aliphatic feature

  • "aromatic": Aromatic feature

[31]:
from kissim.encoding.features import SiteAlignFeature

feature_sitealign = SiteAlignFeature.from_pocket(pocket_bp, feature_name="hba")
[32]:
print("Number of bits: ", len(feature_sitealign.values))
print(*feature_sitealign.values)
Number of bits:  85
1.0 0.0 0.0 nan nan nan nan 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 2.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 2.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 nan nan nan
[33]:
feature_sitealign.details
[33]:
residue.id residue.name sitealign.category
residue.ix
1 461 GLN 1
2 462 ARG 0
3 463 ILE 0
4 <NA> None <NA>
5 <NA> None <NA>
... ... ... ...
81 594 ASP 2
82 595 PHE 0
83 <NA> None <NA>
84 <NA> None <NA>
85 <NA> None <NA>

85 rows × 3 columns
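The details table suggests that each SiteAlign bit is a per-residue-type category, with missing pocket residues encoded as NaN. The sketch below reproduces the first five hba bits from a small lookup that contains only the residue types visible in the table; the function name is hypothetical and the lookup is intentionally incomplete.

```python
import math

# hba categories for the residue types visible in the details table above;
# other residue types are deliberately left out of this sketch.
HBA_CATEGORY = {"GLN": 1, "ARG": 0, "ILE": 0, "ASP": 2, "PHE": 0}


def hba_bits(residue_names):
    """Map residue names to hba feature bits; missing residues become NaN."""
    return [
        float(HBA_CATEGORY[name]) if name is not None else math.nan
        for name in residue_names
    ]


# Residue names at residue.ix 1 to 5 (positions 4 and 5 are gaps).
bits = hba_bits(["GLN", "ARG", "ILE", None, None])
print(bits)  # [1.0, 0.0, 0.0, nan, nan]
```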

Side chain orientation

Figure 3: The side chain orientation feature bits of the kissim fingerprint.

[34]:
from kissim.encoding.features import SideChainOrientationFeature

feature_sco = SideChainOrientationFeature.from_pocket(pocket_bp)
[35]:
print("Number of bits: ", len(feature_sco.values))
print(*feature_sco.values)
Number of bits:  85
2.0 nan nan nan nan nan nan 3.0 nan 3.0 2.0 nan nan 3.0 1.0 3.0 nan nan nan 2.0 2.0 nan 3.0 nan 3.0 nan 2.0 nan nan 3.0 1.0 3.0 3.0 3.0 3.0 1.0 3.0 3.0 3.0 nan 3.0 3.0 2.0 3.0 1.0 2.0 3.0 1.0 nan nan 3.0 3.0 3.0 3.0 3.0 2.0 nan 2.0 1.0 2.0 1.0 2.0 3.0 2.0 3.0 2.0 3.0 1.0 3.0 2.0 3.0 2.0 nan 3.0 2.0 3.0 3.0 3.0 3.0 nan 3.0 2.0 nan nan nan
[36]:
feature_sco.details
[36]:
residue.id sco.category sco.angle ca.vector sc.vector pocket_center.vector
residue.ix
1 461 2.0 84.436343 <Vector 8.81, 16.81, 51.66> <Vector 5.63, 14.81, 52.26> <Vector 0.83, 21.62, 36.45>
2 462 NaN NaN <Vector 8.91, 14.91, 48.36> None <Vector 0.83, 21.62, 36.45>
3 463 NaN NaN <Vector 5.48, 13.77, 47.17> None <Vector 0.83, 21.62, 36.45>
4 <NA> NaN NaN None None <Vector 0.83, 21.62, 36.45>
5 <NA> NaN NaN None None <Vector 0.83, 21.62, 36.45>
... ... ... ... ... ... ...
81 594 3.0 92.784985 <Vector 1.83, 18.43, 34.14> <Vector 1.72, 16.82, 36.09> <Vector 0.83, 21.62, 36.45>
82 595 2.0 59.266540 <Vector 2.38, 19.71, 30.60> <Vector 0.31, 24.33, 31.38> <Vector 0.83, 21.62, 36.45>
83 <NA> NaN NaN None None <Vector 0.83, 21.62, 36.45>
84 <NA> NaN NaN None None <Vector 0.83, 21.62, 36.45>
85 <NA> NaN NaN None None <Vector 0.83, 21.62, 36.45>

85 rows × 6 columns
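The `sco.angle` values can be reproduced approximately from the three vectors in the table: it appears to be the angle at the CA atom between the direction to the side chain and the direction to the pocket center. The sketch below recomputes it for residue.ix 1 from the rounded coordinates shown above. The category cutoffs of 45 and 90 degrees are an assumption that is consistent with the rows shown, and both function names are hypothetical.

```python
import math


def vertex_angle(ca, sc, center):
    """Angle in degrees at CA between CA->side chain and CA->pocket center."""
    a = [s - c for s, c in zip(sc, ca)]
    b = [p - c for p, c in zip(center, ca)]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return math.degrees(math.acos(dot / norm))


def sco_category(angle):
    """Assumed binning: <=45 inwards (1), <=90 intermediate (2), else outwards (3)."""
    if angle <= 45.0:
        return 1.0
    if angle <= 90.0:
        return 2.0
    return 3.0


# Rounded vectors of residue.ix 1 from the details table above.
angle = vertex_angle(
    ca=(8.81, 16.81, 51.66), sc=(5.63, 14.81, 52.26), center=(0.83, 21.62, 36.45)
)
# Close to the reported sco.angle of 84.44 (inputs are rounded to two decimals).
print(round(angle, 1), sco_category(angle))
```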

Exposure

Figure 4: The solvent exposure feature bits of the kissim fingerprint.

[37]:
from kissim.encoding.features import SolventExposureFeature

feature_exposure = SolventExposureFeature.from_pocket(pocket_bp)
[38]:
print("Number of bits: ", len(feature_exposure.values))
print(*feature_exposure.values)
Number of bits:  85
3.0 3.0 1.0 nan nan nan nan 3.0 1.0 2.0 3.0 2.0 3.0 1.0 3.0 2.0 3.0 2.0 3.0 2.0 1.0 3.0 3.0 2.0 1.0 3.0 1.0 1.0 3.0 3.0 1.0 3.0 3.0 2.0 3.0 2.0 3.0 2.0 3.0 3.0 3.0 1.0 3.0 2.0 3.0 3.0 3.0 1.0 3.0 1.0 3.0 3.0 1.0 3.0 3.0 1.0 1.0 2.0 2.0 2.0 1.0 1.0 2.0 1.0 3.0 1.0 3.0 1.0 3.0 3.0 1.0 3.0 2.0 3.0 2.0 1.0 3.0 2.0 2.0 3.0 2.0 1.0 nan nan nan
[39]:
feature_exposure.details
[39]:
residue.id exposure.category exposure.ratio exposure.ratio_ca exposure.ratio_cb
residue.ix
1 461.0 3.0 0.875000 1.000000 0.875000
2 462.0 3.0 0.785714 0.500000 0.785714
3 463.0 1.0 0.312500 NaN 0.312500
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
... ... ... ... ... ...
81 594.0 2.0 0.518519 0.703704 0.518519
82 595.0 1.0 0.320000 NaN 0.320000
83 NaN NaN NaN NaN NaN
84 NaN NaN NaN NaN NaN
85 NaN NaN NaN NaN NaN

85 rows × 5 columns
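In the rows shown, `exposure.ratio` equals the CB-based ratio, and the categories rise with the ratio. The sketch below mirrors that pattern. The fallback to the CA-based ratio and the cutoffs of 0.45 and 0.55 are assumptions that happen to match the rows above; they are not taken from the kissim source, and the function names are hypothetical.

```python
import math


def exposure_ratio(ratio_ca, ratio_cb):
    """Prefer the CB-based ratio; fall back to the CA-based one if it is missing."""
    if ratio_cb is not None and not math.isnan(ratio_cb):
        return ratio_cb
    return ratio_ca


def exposure_category(ratio, low=0.45, high=0.55):
    """Assumed cutoffs: <=low -> 1 (low), <=high -> 2 (intermediate), else 3 (high)."""
    if ratio is None or math.isnan(ratio):
        return math.nan  # missing residue
    if ratio <= low:
        return 1.0
    if ratio <= high:
        return 2.0
    return 3.0


# Ratios of residue.ix 1, 3, and 81 from the details table above.
for ratio in (0.875, 0.3125, 0.518519):
    print(ratio, exposure_category(ratio))
```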

Subpocket distances

Figure 5: The spatial feature bits of the kissim fingerprint.

[40]:
from kissim.encoding.features import SubpocketsFeature

feature_subpockets = SubpocketsFeature.from_pocket(pocket_df)
[41]:
feature_subpockets.values
[41]:
{'hinge_region': [13.085004250208536, 4.6390448501167905, 2.3912455031818345],
 'dfg_region': [14.305938842969063, 4.88461568697127, 3.310695450716709],
 'front_pocket': [13.30385832908826, 4.084029450633254, 2.8948181398060013],
 'center': [12.235948428129538, 3.483684294025381, -1.4928813986271416]}
[42]:
feature_subpockets.details["distances"]
[42]:
residue.id hinge_region dfg_region front_pocket center
residue.ix
1 461 13.169870 18.206156 14.569412 17.831911
2 462 12.055552 15.556118 12.221359 15.876463
3 463 10.828905 15.211579 9.537408 14.079297
4 <NA> NaN NaN NaN NaN
5 <NA> NaN NaN NaN NaN
... ... ... ... ... ...
81 594 8.630239 6.924464 5.781793 4.057558
82 595 11.446905 6.909891 9.576426 6.348532
83 <NA> NaN NaN NaN NaN
84 <NA> NaN NaN NaN NaN
85 <NA> NaN NaN NaN NaN

85 rows × 5 columns
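Each distance appears to be the Euclidean distance between a residue's CA atom and a subpocket center. As a sanity check under that assumption, the sketch below recomputes the `center` distance of residue.ix 1 from the rounded `ca.vector` and `pocket_center.vector` printed in the side chain orientation details for the same structure.

```python
import math

# Rounded coordinates of residue.ix 1 taken from the side chain orientation
# details for this structure.
ca = (8.81, 16.81, 51.66)      # ca.vector
center = (0.83, 21.62, 36.45)  # pocket_center.vector

distance = math.dist(ca, center)
# About 17.84, close to the 17.83 listed in the center column above;
# the small difference comes from the rounded inputs.
print(round(distance, 2))
```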

[43]:
feature_subpockets.details["moments"]
[43]:
hinge_region dfg_region front_pocket center
moment
1 13.085004 14.305939 13.303858 12.235948
2 4.639045 4.884616 4.084029 3.483684
3 2.391246 3.310695 2.894818 -1.492881
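The three moments summarize each distance distribution. A minimal sketch follows, assuming they are the mean, the standard deviation, and the signed cube root of the third central moment, which would explain why the third moment can be negative, as for `center` above. Skipping NaN distances is also an assumption, and the function name is hypothetical.

```python
import math


def first_three_moments(values):
    """Mean, standard deviation, and cube root of the third central moment."""
    values = [v for v in values if not math.isnan(v)]  # assumption: gaps are skipped
    n = len(values)
    mean = sum(values) / n
    second = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    # Signed cube root, so a left-skewed distribution gives a negative value.
    cube_root = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, math.sqrt(second), cube_root


# Toy distances with one gap; not taken from a real structure.
m1, m2, m3 = first_three_moments([0.0, 0.0, 3.0, float("nan")])
print(m1, m2, m3)
```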