API - Quick start Python interface
Let’s take a look at the kissim quick start API to encode a set of structures (from the KLIFS database) and perform an all-against-all comparison.
[1]:
from kissim.api import encode, compare
[2]:
# Load path to test data
from kissim.dataset.test import PATH as PATH_TEST_DATA
Encode structures into fingerprints
The encode function is a quick start API to generate fingerprints in bulk from structure KLIFS IDs.
Input parameters are:
structure_klifs_ids: Structure KLIFS IDs.
fingerprints_filepath: (Optional) Path to the output JSON file containing the fingerprints.
local_klifs_download_path: (Optional) Path to a local KLIFS download; if not set, data is fetched from the remote KLIFS database.
n_cores: (Optional) Number of cores used to generate fingerprints.
The return object is of type FingerprintGenerator.
[3]:
# flake8-noqa-cell
encode?
Signature:
encode(
structure_klifs_ids,
fingerprints_filepath=None,
local_klifs_download_path=None,
n_cores=1,
)
Docstring:
Encode structures.
Parameters
----------
structure_klifs_ids : list of int
Structure KLIFS IDs.
fingerprints_filepath : str or pathlib.Path
Path to output json file. Default None.
local_klifs_download_path : str or None
If path to local KLIFS download is given, set up local KLIFS session.
If None is given, set up remote KLIFS session.
n_cores : int
Number of cores used to generate fingerprints.
Returns
-------
kissim.encoding.FingerprintGenerator
Fingerprints.
File: ~/Documents/GitHub/kissim/kissim/api/encode.py
Type: function
Run encode function
[4]:
structure_klifs_ids = [109, 118, 12347, 1641, 3833, 9122]
fingerprint_generator = encode(
structure_klifs_ids=structure_klifs_ids,
fingerprints_filepath=None,
n_cores=2,
local_klifs_download_path=PATH_TEST_DATA / "KLIFS_download",
)
Inspect output: FingerprintGenerator
[5]:
print(f"Number of structures (input): {len(structure_klifs_ids)}")
print(f"Number of fingerprints (output): {len(fingerprint_generator.data.keys())}")
fingerprint_generator
Number of structures (input): 6
Number of fingerprints (output): 6
[5]:
<kissim.encoding.fingerprint_generator.FingerprintGenerator at 0x7f04c1976910>
[6]:
fingerprint_generator.data
[6]:
{109: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18d0970>,
118: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18dd1c0>,
12347: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18d0b80>,
1641: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18d03a0>,
3833: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18d0430>,
9122: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18dd130>}
Find more information about the FingerprintGenerator object in the kissim documentation.
Compare fingerprints
The compare function is a quick start API to perform a pairwise all-against-all (bulk) comparison for a set of fingerprints.
Input parameters are:
fingerprint_generator: Fingerprints.
output_path: (Optional) Path to the output folder for the distances JSON files.
feature_weights: (Optional) Feature weights used to calculate the final fingerprint distance.
n_cores: (Optional) Number of cores used to generate distances.
The return objects are of type FeatureDistancesGenerator and FingerprintDistanceGenerator.
[7]:
# flake8-noqa-cell
compare?
Signature:
compare(
fingerprint_generator,
output_path=None,
feature_weights=None,
n_cores=1,
)
Docstring:
Compare fingerprints (pairwise).
Parameters
----------
fingerprint_generator : kissim.encoding.FingerprintGenerator
Fingerprints for KLIFS dataset.
output_path : str
Path to output folder.
feature_weights : None or list of float
Feature weights of the following form:
(i) None
Default feature weights: All features equally distributed to 1/15
(15 features in total).
(ii) By feature (list of 15 floats):
Features to be set in the following order: size, hbd, hba, charge, aromatic,
aliphatic, sco, exposure, distance_to_centroid, distance_to_hinge_region,
distance_to_dfg_region, distance_to_front_pocket, moment1, moment2, and moment3.
All floats must sum up to 1.0.
n_cores : int
Number of cores used to generate fingerprint distances.
Returns
-------
feature_distances_generator : kissim.comparison.FeatureDistancesGenerator
Feature distances for all pairwise fingerprints.
fingerprint_distance_generator : kissim.comparison.FingerprintDistanceGenerator
Fingerprint distance for all pairwise fingerprints.
File: ~/Documents/GitHub/kissim/kissim/api/compare.py
Type: function
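The docstring above states two rules for feature_weights: None means 15 equal weights of 1/15, and a custom list needs 15 floats that sum to 1.0. The following is a minimal sketch of those two rules in plain Python. The helper check_feature_weights is hypothetical and not part of kissim; it only illustrates how a weight list could be prepared before it is passed to compare.

```python
import math

# Feature order as listed in the compare docstring above.
FEATURE_NAMES = [
    "size", "hbd", "hba", "charge", "aromatic", "aliphatic", "sco", "exposure",
    "distance_to_centroid", "distance_to_hinge_region", "distance_to_dfg_region",
    "distance_to_front_pocket", "moment1", "moment2", "moment3",
]


def check_feature_weights(feature_weights=None):
    """Hypothetical helper (not a kissim function): return 15 valid weights."""
    n_features = len(FEATURE_NAMES)
    if feature_weights is None:
        # Default described in the docstring: all features weighted 1/15.
        return [1.0 / n_features] * n_features
    if len(feature_weights) != n_features:
        raise ValueError(f"Expected {n_features} weights, got {len(feature_weights)}.")
    if not math.isclose(sum(feature_weights), 1.0, abs_tol=1e-9):
        raise ValueError("Feature weights must sum up to 1.0.")
    return list(feature_weights)


default_weights = check_feature_weights(None)

# Example: put half of the total weight on the three moment features
# and spread the other half evenly over the remaining 12 features.
custom_weights = [0.5 / 12] * 12 + [0.5 / 3] * 3
checked_weights = check_feature_weights(custom_weights)
```

A list validated this way could then be passed as feature_weights=checked_weights in the compare call.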
Run compare function
[8]:
feature_distances_generator, fingerprint_distance_generator = compare(
fingerprint_generator=fingerprint_generator,
output_path=None,
feature_weights=None,
n_cores=2,
)
For the final fingerprint distances, please refer to the FingerprintDistanceGenerator object.
Inspect output: FingerprintDistanceGenerator
[9]:
print(f"Number of fingerprints (input): {len(fingerprint_generator.data)}")
print(f"Number of pairwise comparisons (output): {len(fingerprint_distance_generator.data)}")
fingerprint_distance_generator
Number of fingerprints (input): 6
Number of pairwise comparisons (output): 15
[9]:
<kissim.comparison.fingerprint_distance_generator.FingerprintDistanceGenerator at 0x7f04c1880310>
[10]:
fingerprint_distance_generator.data
[10]:
 | structure.1 | structure.2 | kinase.1 | kinase.2 | distance | bit_coverage |
---|---|---|---|---|---|---|
0 | 109 | 118 | ABL2 | ABL2 | 0.074214 | 0.992000 |
1 | 109 | 12347 | ABL2 | BRAF | 0.259053 | 0.919333 |
2 | 109 | 1641 | ABL2 | CHK1 | 0.253045 | 0.990667 |
3 | 109 | 3833 | ABL2 | AAK1 | 0.277368 | 0.990667 |
4 | 109 | 9122 | ABL2 | ADCK3 | 0.358882 | 0.990667 |
5 | 118 | 12347 | ABL2 | BRAF | 0.273133 | 0.918000 |
6 | 118 | 1641 | ABL2 | CHK1 | 0.246844 | 0.989333 |
7 | 118 | 3833 | ABL2 | AAK1 | 0.282949 | 0.990000 |
8 | 118 | 9122 | ABL2 | ADCK3 | 0.360833 | 0.989333 |
9 | 12347 | 1641 | BRAF | CHK1 | 0.303330 | 0.918000 |
10 | 12347 | 3833 | BRAF | AAK1 | 0.307277 | 0.918000 |
11 | 12347 | 9122 | BRAF | ADCK3 | 0.376875 | 0.918000 |
12 | 1641 | 3833 | CHK1 | AAK1 | 0.229590 | 0.989333 |
13 | 1641 | 9122 | CHK1 | ADCK3 | 0.347142 | 0.989333 |
14 | 3833 | 9122 | AAK1 | ADCK3 | 0.303542 | 0.990000 |
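The 15 rows above follow from the input size: 6 fingerprints give 6 · 5 / 2 = 15 unordered pairs. As a small sketch using only the standard library (this is an illustration of the counting, not kissim's internal pairing code), the same pairs can be enumerated with itertools.combinations:

```python
from itertools import combinations
from math import comb

structure_klifs_ids = [109, 118, 12347, 1641, 3833, 9122]

# All unordered pairs of distinct structures, in input order.
pairs = list(combinations(structure_klifs_ids, 2))
n_pairs = len(pairs)

print(f"Number of pairs: {n_pairs}")
print(f"First pair: {pairs[0]}, last pair: {pairs[-1]}")
```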
Get the structure distance matrix.
[11]:
fingerprint_distance_generator.structure_distance_matrix()
[11]:
structure.2 | 109 | 118 | 1641 | 3833 | 9122 | 12347 |
---|---|---|---|---|---|---|
structure.1 | ||||||
109 | 0.000000 | 0.074214 | 0.253045 | 0.277368 | 0.358882 | 0.259053 |
118 | 0.074214 | 0.000000 | 0.246844 | 0.282949 | 0.360833 | 0.273133 |
1641 | 0.253045 | 0.246844 | 0.000000 | 0.229590 | 0.347142 | 0.303330 |
3833 | 0.277368 | 0.282949 | 0.229590 | 0.000000 | 0.303542 | 0.307277 |
9122 | 0.358882 | 0.360833 | 0.347142 | 0.303542 | 0.000000 | 0.376875 |
12347 | 0.259053 | 0.273133 | 0.303330 | 0.307277 | 0.376875 | 0.000000 |
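The matrix above is the pairwise list rearranged: each distance appears twice (once per direction) and the diagonal is zero. The sketch below shows that rearrangement in plain Python for three of the structures, with distances copied from the pairwise table. It is an illustration only and does not claim to reproduce how structure_distance_matrix is implemented.

```python
# Distances copied from the pairwise table for structures 109, 118 and 1641.
pair_distances = {
    (109, 118): 0.074214,
    (109, 1641): 0.253045,
    (118, 1641): 0.246844,
}

# Collect all structure IDs, sorted numerically as in the matrix above.
ids = sorted({structure_id for pair in pair_distances for structure_id in pair})

# Start with zeros (diagonal), then mirror each distance across the diagonal.
matrix = {i: {j: 0.0 for j in ids} for i in ids}
for (i, j), distance in pair_distances.items():
    matrix[i][j] = distance
    matrix[j][i] = distance

for i in ids:
    print(i, [f"{matrix[i][j]:.6f}" for j in ids])
```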
Map structure pairs to kinase pairs. When several structure pairs belong to the same kinase pair, one value must represent them; in this example, the structure pair with the minimum distance is used.
[12]:
fingerprint_distance_generator.kinase_distance_matrix(by="minimum")
[12]:
kinase.2 | AAK1 | ABL2 | ADCK3 | BRAF | CHK1 |
---|---|---|---|---|---|
kinase.1 | |||||
AAK1 | 0.000000 | 0.277368 | 0.303542 | 0.307277 | 0.229590 |
ABL2 | 0.277368 | 0.000000 | 0.358882 | 0.259053 | 0.246844 |
ADCK3 | 0.303542 | 0.358882 | 0.000000 | 0.376875 | 0.347142 |
BRAF | 0.307277 | 0.259053 | 0.376875 | 0.000000 | 0.303330 |
CHK1 | 0.229590 | 0.246844 | 0.347142 | 0.303330 | 0.000000 |
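To make the by="minimum" mapping concrete, the sketch below groups the 15 pairwise distances from the table above by kinase pair and keeps the smallest value per pair. It is a plain-Python illustration under one assumption: same-kinase pairs (here ABL2 vs. ABL2) are skipped, because the matrix above shows 0 on the diagonal. It is not kissim's implementation.

```python
# (kinase.1, kinase.2, distance) copied from the pairwise table above.
rows = [
    ("ABL2", "ABL2", 0.074214),
    ("ABL2", "BRAF", 0.259053),
    ("ABL2", "CHK1", 0.253045),
    ("ABL2", "AAK1", 0.277368),
    ("ABL2", "ADCK3", 0.358882),
    ("ABL2", "BRAF", 0.273133),
    ("ABL2", "CHK1", 0.246844),
    ("ABL2", "AAK1", 0.282949),
    ("ABL2", "ADCK3", 0.360833),
    ("BRAF", "CHK1", 0.303330),
    ("BRAF", "AAK1", 0.307277),
    ("BRAF", "ADCK3", 0.376875),
    ("CHK1", "AAK1", 0.229590),
    ("CHK1", "ADCK3", 0.347142),
    ("AAK1", "ADCK3", 0.303542),
]

kinase_min = {}
for kinase1, kinase2, distance in rows:
    if kinase1 == kinase2:
        continue  # assumption: diagonal entries stay 0, as in the matrix above
    key = tuple(sorted((kinase1, kinase2)))  # unordered kinase pair
    kinase_min[key] = min(distance, kinase_min.get(key, float("inf")))

for (kinase1, kinase2), distance in sorted(kinase_min.items()):
    print(f"{kinase1} - {kinase2}: {distance:.6f}")
```

The two ABL2 structures (109 and 118) each contribute one distance per partner kinase, so for example ABL2 vs. CHK1 is represented by the smaller of 0.253045 and 0.246844.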
Inspect output: FeatureDistancesGenerator
[13]:
print(f"Number of fingerprints (input): {len(fingerprint_generator.data.keys())}")
print(f"Number of pairwise comparisons (output): {len(feature_distances_generator.data)}")
feature_distances_generator
Number of fingerprints (input): 6
Number of pairwise comparisons (output): 15
[13]:
<kissim.comparison.feature_distances_generator.FeatureDistancesGenerator at 0x7f0524591ee0>
[14]:
feature_distances_generator.data
[14]:
 | structure.1 | structure.2 | kinase.1 | kinase.2 | distance.1 | distance.2 | distance.3 | distance.4 | distance.5 | distance.6 | ... | bit_coverage.6 | bit_coverage.7 | bit_coverage.8 | bit_coverage.9 | bit_coverage.10 | bit_coverage.11 | bit_coverage.12 | bit_coverage.13 | bit_coverage.14 | bit_coverage.15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 109 | 118 | ABL2 | ABL2 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 1.00 | 0.88 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.0 | 1.0 | 1.0 |
1 | 109 | 12347 | ABL2 | BRAF | 0.410256 | 0.397436 | 0.333333 | 0.243590 | 0.141026 | 0.230769 | ... | 0.92 | 0.67 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 1.0 | 1.0 | 1.0 |
2 | 109 | 1641 | ABL2 | CHK1 | 0.388235 | 0.352941 | 0.364706 | 0.247059 | 0.141176 | 0.223529 | ... | 1.00 | 0.86 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.0 | 1.0 | 1.0 |
3 | 109 | 3833 | ABL2 | AAK1 | 0.505882 | 0.505882 | 0.411765 | 0.211765 | 0.082353 | 0.270588 | ... | 1.00 | 0.86 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.0 | 1.0 | 1.0 |
4 | 109 | 9122 | ABL2 | ADCK3 | 0.623529 | 0.470588 | 0.435294 | 0.258824 | 0.235294 | 0.305882 | ... | 1.00 | 0.88 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.0 | 1.0 | 1.0 |
5 | 118 | 12347 | ABL2 | BRAF | 0.410256 | 0.397436 | 0.333333 | 0.243590 | 0.141026 | 0.230769 | ... | 0.92 | 0.65 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 1.0 | 1.0 | 1.0 |
6 | 118 | 1641 | ABL2 | CHK1 | 0.388235 | 0.352941 | 0.364706 | 0.247059 | 0.141176 | 0.223529 | ... | 1.00 | 0.84 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.0 | 1.0 | 1.0 |
7 | 118 | 3833 | ABL2 | AAK1 | 0.505882 | 0.505882 | 0.411765 | 0.211765 | 0.082353 | 0.270588 | ... | 1.00 | 0.85 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.0 | 1.0 | 1.0 |
8 | 118 | 9122 | ABL2 | ADCK3 | 0.623529 | 0.470588 | 0.435294 | 0.258824 | 0.235294 | 0.305882 | ... | 1.00 | 0.86 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.0 | 1.0 | 1.0 |
9 | 12347 | 1641 | BRAF | CHK1 | 0.346154 | 0.423077 | 0.346154 | 0.243590 | 0.115385 | 0.217949 | ... | 0.92 | 0.65 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 1.0 | 1.0 | 1.0 |
10 | 12347 | 3833 | BRAF | AAK1 | 0.435897 | 0.474359 | 0.333333 | 0.230769 | 0.102564 | 0.230769 | ... | 0.92 | 0.65 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 1.0 | 1.0 | 1.0 |
11 | 12347 | 9122 | BRAF | ADCK3 | 0.576923 | 0.371795 | 0.397436 | 0.256410 | 0.269231 | 0.282051 | ... | 0.92 | 0.68 | 0.89 | 0.92 | 0.92 | 0.92 | 0.92 | 1.0 | 1.0 | 1.0 |
12 | 1641 | 3833 | CHK1 | AAK1 | 0.352941 | 0.411765 | 0.352941 | 0.200000 | 0.082353 | 0.235294 | ... | 1.00 | 0.84 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.0 | 1.0 | 1.0 |
13 | 1641 | 9122 | CHK1 | ADCK3 | 0.611765 | 0.494118 | 0.541176 | 0.317647 | 0.282353 | 0.317647 | ... | 1.00 | 0.86 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.0 | 1.0 | 1.0 |
14 | 3833 | 9122 | AAK1 | ADCK3 | 0.611765 | 0.482353 | 0.494118 | 0.282353 | 0.223529 | 0.341176 | ... | 1.00 | 0.87 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.0 | 1.0 | 1.0 |
15 rows × 34 columns
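The 34 columns are consistent with the layout visible in the truncated header: 4 identifier columns plus one distance and one bit coverage column for each of the 15 features. The following sketch rebuilds the column names under that assumption (the hidden middle columns are inferred from the visible naming pattern, not read from the DataFrame):

```python
id_columns = ["structure.1", "structure.2", "kinase.1", "kinase.2"]
n_features = 15  # number of features stated in the compare docstring

# One distance and one bit coverage column per feature, numbered from 1.
distance_columns = [f"distance.{i}" for i in range(1, n_features + 1)]
bit_coverage_columns = [f"bit_coverage.{i}" for i in range(1, n_features + 1)]

columns = id_columns + distance_columns + bit_coverage_columns
print(f"Number of columns: {len(columns)}")
```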