CLI - Command-line interface

Let’s take a look at the kissim command-line interface (CLI) to encode a set of structures (from the KLIFS database) and perform an all-against-all comparison. The CLI follows the same logic as the quick start Python interface described in “API - Quick start Python interface”.

kissim CLI
[1]:
from pathlib import Path

import pandas as pd
[2]:
# Path to this notebook
HERE = Path(_dh[-1])  # noqa: F821

Encode structures into fingerprints

[3]:
%%bash
kissim encode -h
# flake8-noqa-cell
usage: kissim encode [-h] -i INPUT [INPUT ...] -o OUTPUT [-l LOCAL]
                     [-c NCORES]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT [INPUT ...], --input INPUT [INPUT ...]
                        List of structure KLIFS IDs or path to txt file
                        containing structure KLIFS IDs.
  -o OUTPUT, --output OUTPUT
                        Path to output JSON file containing fingerprint data.
  -l LOCAL, --local LOCAL
                        Path to KLIFS download folder. If set local KLIFS data
                        is used, else remote KLIFS data.
  -c NCORES, --ncores NCORES
                        Number of cores. If 1 fingerprint generation in
                        sequence, else in parallel.
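
The flags listed in the help text can also be assembled programmatically, for instance when the list of structure KLIFS IDs is long. The following is a minimal sketch: `write_ids_file` and `build_encode_command` are hypothetical helpers (not part of kissim), and the one-ID-per-line layout of the txt file is an assumption, since the help text does not specify the file format.

```python
import tempfile
from pathlib import Path


def write_ids_file(structure_klifs_ids, path):
    """Write structure KLIFS IDs to a txt file (ASSUMED layout: one ID per line)."""
    path = Path(path)
    path.write_text("\n".join(str(i) for i in structure_klifs_ids) + "\n")
    return path


def build_encode_command(inputs, output, local=None, ncores=None):
    """Assemble a `kissim encode` argument list from the flags shown in the help text."""
    command = ["kissim", "encode", "-i", *[str(i) for i in inputs], "-o", str(output)]
    if local is not None:
        command += ["-l", str(local)]
    if ncores is not None:
        command += ["-c", str(ncores)]
    return command


# Structure KLIFS IDs passed directly on the command line
command = build_encode_command(
    [109, 118, 12347, 1641, 3833, 9122],
    "fingerprints.json",
    local="../../kissim/tests/data/KLIFS_download/",
    ncores=2,
)
print(" ".join(command))

# Alternative: structure KLIFS IDs stored in a txt file
with tempfile.TemporaryDirectory() as tmp:
    ids_file = write_ids_file([109, 118, 12347], Path(tmp) / "structure_klifs_ids.txt")
    ids_in_file = ids_file.read_text().splitlines()
    command_from_file = build_encode_command([ids_file.name], "fingerprints.json")
print(ids_in_file)
```

If kissim is installed, such a list could be handed to `subprocess.run(command, check=True)` instead of typing the command in a shell.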

kissim encode command

[4]:
%%bash
# Set path to local KLIFS data
PATH_KLIFS_DOWNLOAD="../../kissim/tests/data/KLIFS_download/"
kissim encode -i 109 118 12347 1641 3833 9122 -o fingerprints.json -l $PATH_KLIFS_DOWNLOAD -c 2
# flake8-noqa-cell
INFO:kissim.encoding.fingerprint_generator:GENERATE FINGERPRINTS
INFO:kissim.encoding.fingerprint_generator:Number of input structures: 6
INFO:kissim.encoding.fingerprint_generator:Fingerprint generation started at: 2021-11-16 14:21:15.937623
INFO:kissim.utils:Number of cores used: 2.
INFO:kissim.encoding.fingerprint_generator:109: Generate fingerprint...
INFO:kissim.encoding.fingerprint_generator:118: Generate fingerprint...
INFO:kissim.encoding.fingerprint_generator:12347: Generate fingerprint...
INFO:kissim.encoding.fingerprint_generator:1641: Generate fingerprint...
INFO:kissim.encoding.fingerprint_generator:3833: Generate fingerprint...
INFO:kissim.encoding.fingerprint_generator:9122: Generate fingerprint...
INFO:kissim.encoding.fingerprint_generator:Number of output fingerprints: 6
INFO:kissim.encoding.fingerprint_generator:Runtime: 0:00:06.288938
INFO:kissim.api.encode:Write fingerprints to file: fingerprints.json

This command generates two files:

  • Data: fingerprints.json

  • Logs (not under Windows): fingerprint.log

Inspect output: FingerprintGenerator

You can load the content of the fingerprints.json file as a FingerprintGenerator object.

[5]:
fingerprints_path = HERE / "fingerprints.json"
fingerprints_path
[5]:
PosixPath('/home/dominique/Documents/GitHub/kissim/docs/tutorials/fingerprints.json')
[6]:
from kissim.encoding import FingerprintGenerator

fingerprint_generator = FingerprintGenerator.from_json(fingerprints_path)
print(f"Number of fingerprints: {len(fingerprint_generator.data.keys())}")
fingerprint_generator
Number of fingerprints: 6
[6]:
<kissim.encoding.fingerprint_generator.FingerprintGenerator at 0x7f78c6d10ca0>

Compare fingerprints

[7]:
%%bash
kissim compare -h
# flake8-noqa-cell
usage: kissim compare [-h] -i INPUT -o OUTPUT
                      [-w WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS]
                      [-c NCORES]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to JSON file containing fingerprint data.
  -o OUTPUT, --output OUTPUT
                        Path to output folder where distance bzip-compressed
                        CSV files will be saved.
  -w WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS, --weights WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS WEIGHTS
                        Feature weights. Eeach feature must be set
                        individually, all weights must sum up to 1.0.
  -c NCORES, --ncores NCORES
                        Number of cores. If 1 comparison in sequence, else in
                        parallel.
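
The `-w` option expects 15 values that sum up to 1.0. The sketch below rescales arbitrary non-negative relative weights to satisfy that constraint and formats them for the command line. `normalize_weights` is a hypothetical helper, not part of kissim; the help text does not say which of the 15 positions corresponds to which feature, so the custom weights here are purely illustrative.

```python
import math

N_FEATURES = 15  # number of WEIGHTS values listed in the help text


def normalize_weights(relative_weights):
    """Rescale non-negative relative weights so that they sum to 1.0."""
    if len(relative_weights) != N_FEATURES:
        raise ValueError(f"Expected {N_FEATURES} weights, got {len(relative_weights)}.")
    if any(w < 0 for w in relative_weights):
        raise ValueError("Weights must be non-negative.")
    total = sum(relative_weights)
    if total == 0:
        raise ValueError("At least one weight must be greater than zero.")
    return [w / total for w in relative_weights]


# Equal weighting: every feature gets 1/15
equal_weights = normalize_weights([1] * N_FEATURES)

# Illustrative only: first 8 positions count twice as much as the last 7
custom_weights = normalize_weights([2] * 8 + [1] * 7)

# String that could follow `-w` on the command line
weights_argument = " ".join(str(w) for w in custom_weights)
print(weights_argument)
print(math.isclose(sum(custom_weights), 1.0))
```

Because of floating-point rounding the sum may differ from 1.0 in the last digit; how strictly the CLI checks the sum is not stated in the help text.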

kissim compare command

[8]:
%%bash
kissim compare -i fingerprints.json -o . -c 2
# flake8-noqa-cell
INFO:kissim.comparison.feature_distances_generator:GENERATE FEATURE DISTANCES
INFO:kissim.comparison.feature_distances_generator:Number of input fingerprints: 6
INFO:kissim.comparison.feature_distances_generator:Feature distances generation started at: 2021-11-16 14:21:25.566224
INFO:kissim.utils:Number of cores used: 2.
INFO:kissim.comparison.feature_distances_generator:Number of ouput feature distances: 15
INFO:kissim.comparison.feature_distances_generator:Runtime: 0:00:00.058451
INFO:kissim.comparison.fingerprint_distance_generator:GENERATE FINGERPRINT DISTANCES
INFO:kissim.comparison.fingerprint_distance_generator:Fingerprint distance generation started at: 2021-11-16 14:21:25.627493
INFO:kissim.comparison.fingerprint_distance_generator:Feature weights: [0.06666667 0.06666667 0.06666667 0.06666667 0.06666667 0.06666667
 0.06666667 0.06666667 0.06666667 0.06666667 0.06666667 0.06666667
 0.06666667 0.06666667 0.06666667]
Calculate pairwise fingerprint distance: 100%|██████████| 15/15 [00:00<00:00, 22614.87it/s]
Calculate pairwise fingerprint coverage: 100%|██████████| 15/15 [00:00<00:00, 25115.59it/s]
INFO:kissim.comparison.fingerprint_distance_generator:Number of output fingerprint distances: 15
INFO:kissim.comparison.fingerprint_distance_generator:Runtime: 0:00:00.004275
INFO:kissim.comparison.tree:Clustering (method: ward) and calculating branch distances
INFO:kissim.comparison.tree:Converting clustering to a Newick tree
INFO:kissim.comparison.tree:Writing resulting tree to fingerprint_distances_to_kinase_clusters.tree
INFO:kissim.comparison.tree:Writing resulting kinase annotation to kinase_annotation.csv

This command generates the following files:

  • Data - fingerprint distances: fingerprint_distances.csv.bz2

  • Data - feature distances: feature_distances.csv.bz2

  • Data - default kinase matrix based on minimum fingerprint pair distance per kinase pair: fingerprint_distances_to_kinase_matrix.csv

  • Data - default kinase tree based on kinase matrix clustering (hierarchical clustering using Ward’s method): fingerprint_distances_to_kinase_clusters.tree

  • Data - kinase annotation for the kinase tree: kinase_annotation.csv

  • Logs (not under Windows): distances.log
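
Before inspecting the results, it can be useful to check that the data files were actually written. The following sketch does this with `pathlib`; `missing_outputs` is a hypothetical helper, the file names are taken from the log output of the compare command, and the log file is left out because it is not written under Windows. To stay self-contained, the demo runs on a temporary folder with empty placeholder files rather than on real kissim output.

```python
import tempfile
from pathlib import Path

# Data files reported by `kissim compare` in this tutorial
EXPECTED_COMPARE_OUTPUTS = [
    "fingerprint_distances.csv.bz2",
    "feature_distances.csv.bz2",
    "fingerprint_distances_to_kinase_matrix.csv",
    "fingerprint_distances_to_kinase_clusters.tree",
    "kinase_annotation.csv",
]


def missing_outputs(folder, expected=EXPECTED_COMPARE_OUTPUTS):
    """Return the expected file names that do not exist in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


# Demo: only the first two files exist (as empty placeholders)
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_COMPARE_OUTPUTS[:2]:
        (Path(tmp) / name).touch()
    missing = missing_outputs(tmp)
print(missing)
```

In the notebook itself, the call would be `missing_outputs(HERE)`, which should return an empty list after a successful run.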

Inspect output: All-against-all fingerprint distances

You can load the content of the fingerprint_distances.csv.bz2 file as a FingerprintDistanceGenerator object.

[9]:
from kissim.comparison import FingerprintDistanceGenerator

fingerprint_distance_path = HERE / "fingerprint_distances.csv.bz2"
fingerprint_distance_generator = FingerprintDistanceGenerator.from_csv(fingerprint_distance_path)
print(f"Number of pairwise comparisons: {len(fingerprint_distance_generator.data)}")
fingerprint_distance_generator
Number of pairwise comparisons: 15
[9]:
<kissim.comparison.fingerprint_distance_generator.FingerprintDistanceGenerator at 0x7f78c01baa90>
[10]:
fingerprint_distance_generator.data
[10]:
    structure.1  structure.2  kinase.1  kinase.2  distance  bit_coverage
 0          109          118      ABL2      ABL2  0.074214      0.992000
 1          109        12347      ABL2      BRAF  0.256488      0.919333
 2          109         1641      ABL2      CHK1  0.251476      0.990667
 3          109         3833      ABL2      AAK1  0.278152      0.990667
 4          109         9122      ABL2     ADCK3  0.357313      0.990667
 5          118        12347      ABL2      BRAF  0.270569      0.918000
 6          118         1641      ABL2      CHK1  0.245275      0.989333
 7          118         3833      ABL2      AAK1  0.283733      0.990000
 8          118         9122      ABL2     ADCK3  0.359265      0.989333
 9        12347         1641      BRAF      CHK1  0.300766      0.918000
10        12347         3833      BRAF      AAK1  0.305568      0.918000
11        12347         9122      BRAF     ADCK3  0.375166      0.918000
12         1641         3833      CHK1      AAK1  0.227237      0.989333
13         1641         9122      CHK1     ADCK3  0.344004      0.989333
14         3833         9122      AAK1     ADCK3  0.301189      0.990000
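
The 15 rows are exactly the unordered pairs of the 6 input structures, i.e. n(n-1)/2 comparisons for n fingerprints. A short sketch with `itertools.combinations` reproduces the pairing and its order, using the structure KLIFS IDs from the encode command:

```python
from itertools import combinations

# Structure KLIFS IDs passed to `kissim encode` in this tutorial
structure_klifs_ids = [109, 118, 12347, 1641, 3833, 9122]

# All unordered pairs, in the same order as the rows of the distance table
pairs = list(combinations(structure_klifs_ids, 2))

n = len(structure_klifs_ids)
print(f"Number of pairwise comparisons: {len(pairs)}")
print(f"n * (n - 1) / 2 = {n * (n - 1) // 2}")
print(pairs[:5])
```

The quadratic growth is worth keeping in mind when scaling up: 1,000 structures already yield 499,500 pairs.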

Inspect output: All-against-all feature distances

You can load the content of the feature_distances.csv.bz2 file as FeatureDistancesGenerator object.

[11]:
from kissim.comparison import FeatureDistancesGenerator

feature_distances_path = HERE / "feature_distances.csv.bz2"
feature_distances_generator = FeatureDistancesGenerator.from_csv(feature_distances_path)
print(f"Number of pairwise comparisons: {len(feature_distances_generator.data)}")
feature_distances_generator
Number of pairwise comparisons: 15
[11]:
<kissim.comparison.feature_distances_generator.FeatureDistancesGenerator at 0x7f78c00c65b0>
[12]:
feature_distances_generator.data
[12]:
structure.1 structure.2 kinase.1 kinase.2 distance.1 distance.2 distance.3 distance.4 distance.5 distance.6 ... bit_coverage.6 bit_coverage.7 bit_coverage.8 bit_coverage.9 bit_coverage.10 bit_coverage.11 bit_coverage.12 bit_coverage.13 bit_coverage.14 bit_coverage.15
0 109 118 ABL2 ABL2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 1.00 0.88 1.00 1.00 1.00 1.00 1.00 1.0 1.0 1.0
1 109 12347 ABL2 BRAF 0.410256 0.397436 0.333333 0.205128 0.141026 0.230769 ... 0.92 0.67 0.92 0.92 0.92 0.92 0.92 1.0 1.0 1.0
2 109 1641 ABL2 CHK1 0.388235 0.352941 0.364706 0.223529 0.141176 0.223529 ... 1.00 0.86 1.00 1.00 1.00 1.00 1.00 1.0 1.0 1.0
3 109 3833 ABL2 AAK1 0.505882 0.505882 0.411765 0.223529 0.082353 0.270588 ... 1.00 0.86 1.00 1.00 1.00 1.00 1.00 1.0 1.0 1.0
4 109 9122 ABL2 ADCK3 0.623529 0.470588 0.435294 0.235294 0.235294 0.305882 ... 1.00 0.88 0.98 1.00 1.00 1.00 1.00 1.0 1.0 1.0
5 118 12347 ABL2 BRAF 0.410256 0.397436 0.333333 0.205128 0.141026 0.230769 ... 0.92 0.65 0.92 0.92 0.92 0.92 0.92 1.0 1.0 1.0
6 118 1641 ABL2 CHK1 0.388235 0.352941 0.364706 0.223529 0.141176 0.223529 ... 1.00 0.84 1.00 1.00 1.00 1.00 1.00 1.0 1.0 1.0
7 118 3833 ABL2 AAK1 0.505882 0.505882 0.411765 0.223529 0.082353 0.270588 ... 1.00 0.85 1.00 1.00 1.00 1.00 1.00 1.0 1.0 1.0
8 118 9122 ABL2 ADCK3 0.623529 0.470588 0.435294 0.235294 0.235294 0.305882 ... 1.00 0.86 0.98 1.00 1.00 1.00 1.00 1.0 1.0 1.0
9 12347 1641 BRAF CHK1 0.346154 0.423077 0.346154 0.205128 0.115385 0.217949 ... 0.92 0.65 0.92 0.92 0.92 0.92 0.92 1.0 1.0 1.0
10 12347 3833 BRAF AAK1 0.435897 0.474359 0.333333 0.205128 0.102564 0.230769 ... 0.92 0.65 0.92 0.92 0.92 0.92 0.92 1.0 1.0 1.0
11 12347 9122 BRAF ADCK3 0.576923 0.371795 0.397436 0.230769 0.269231 0.282051 ... 0.92 0.68 0.89 0.92 0.92 0.92 0.92 1.0 1.0 1.0
12 1641 3833 CHK1 AAK1 0.352941 0.411765 0.352941 0.164706 0.082353 0.235294 ... 1.00 0.84 1.00 1.00 1.00 1.00 1.00 1.0 1.0 1.0
13 1641 9122 CHK1 ADCK3 0.611765 0.494118 0.541176 0.270588 0.282353 0.317647 ... 1.00 0.86 0.98 1.00 1.00 1.00 1.00 1.0 1.0 1.0
14 3833 9122 AAK1 ADCK3 0.611765 0.482353 0.494118 0.247059 0.223529 0.341176 ... 1.00 0.87 0.98 1.00 1.00 1.00 1.00 1.0 1.0 1.0

15 rows × 34 columns
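
The `.csv.bz2` files are bzip-compressed CSV files, so they can in principle also be read without kissim. The sketch below uses only the standard library; `read_bz2_csv` is a hypothetical helper, and the demo writes its own tiny file first, because the exact column layout of the real kissim files (for example, whether an index column is included) is not shown in this tutorial.

```python
import bz2
import csv
import tempfile
from pathlib import Path


def read_bz2_csv(path):
    """Read a bzip2-compressed CSV file into a list of dicts (all values are strings)."""
    with bz2.open(path, mode="rt", newline="") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    # Write a small demo file (NOT real kissim output; columns are illustrative)
    demo_path = Path(tmp) / "demo_distances.csv.bz2"
    with bz2.open(demo_path, mode="wt", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["structure.1", "structure.2", "distance"])
        writer.writerow([109, 118, 0.074214])
        writer.writerow([109, 12347, 0.256488])

    rows = read_bz2_csv(demo_path)

print(rows[0])
print(float(rows[1]["distance"]))
```

With pandas, `pd.read_csv(path)` infers the bzip2 compression from the `.bz2` file extension, so no explicit decompression step is needed there.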


Inspect output: Kinase matrix

[13]:
kinase_matrix = pd.read_csv(HERE / "fingerprint_distances_to_kinase_matrix.csv", index_col=0)
kinase_matrix
[13]:
           AAK1      ABL2     ADCK3      BRAF      CHK1
AAK1   0.000000  0.278152  0.301189  0.305568  0.227237
ABL2   0.278152  0.000000  0.357313  0.256488  0.245275
ADCK3  0.301189  0.357313  0.000000  0.375166  0.344004
BRAF   0.305568  0.256488  0.375166  0.000000  0.300766
CHK1   0.227237  0.245275  0.344004  0.300766  0.000000
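
The kinase matrix is described as being based on the minimum fingerprint pair distance per kinase pair. The sketch below reproduces that mapping in plain Python from the fingerprint distances listed earlier in this tutorial. `kinase_matrix_from_pairs` is a hypothetical helper and not kissim's implementation; setting the diagonal to 0 is inferred from the matrix shown here (ABL2 vs. ABL2 is 0 although structures 109 and 118 have a distance of 0.074214).

```python
# (kinase.1, kinase.2, distance), copied from the fingerprint distance table
fingerprint_distances = [
    ("ABL2", "ABL2", 0.074214), ("ABL2", "BRAF", 0.256488), ("ABL2", "CHK1", 0.251476),
    ("ABL2", "AAK1", 0.278152), ("ABL2", "ADCK3", 0.357313), ("ABL2", "BRAF", 0.270569),
    ("ABL2", "CHK1", 0.245275), ("ABL2", "AAK1", 0.283733), ("ABL2", "ADCK3", 0.359265),
    ("BRAF", "CHK1", 0.300766), ("BRAF", "AAK1", 0.305568), ("BRAF", "ADCK3", 0.375166),
    ("CHK1", "AAK1", 0.227237), ("CHK1", "ADCK3", 0.344004), ("AAK1", "ADCK3", 0.301189),
]


def kinase_matrix_from_pairs(pairs):
    """Map structure pair distances to a symmetric kinase matrix (minimum per kinase pair)."""
    kinases = sorted({kinase for k1, k2, _ in pairs for kinase in (k1, k2)})
    matrix = {k1: {k2: None for k2 in kinases} for k1 in kinases}
    for k1, k2, distance in pairs:
        if k1 == k2:
            continue  # diagonal is set to 0 below (inferred from the output shown)
        current = matrix[k1][k2]
        best = distance if current is None else min(current, distance)
        matrix[k1][k2] = matrix[k2][k1] = best
    for kinase in kinases:
        matrix[kinase][kinase] = 0.0
    return matrix


kinase_matrix_sketch = kinase_matrix_from_pairs(fingerprint_distances)
for kinase, row in kinase_matrix_sketch.items():
    print(kinase, [row[other] for other in kinase_matrix_sketch])
```

For example, ABL2 vs. CHK1 has two structure pairs (0.251476 and 0.245275), and the smaller value is the one that appears in the matrix.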

Inspect output: Kinome tree and kinase annotation

[14]:
# Kinome tree
with open(HERE / "fingerprint_distances_to_kinase_clusters.tree", "r") as f:
    newick_string = f.read()
print(newick_string)
(((BRAF:0.256,ABL2:0.256)0.256:0.063,(CHK1:0.227,AAK1:0.227)0.227:0.092)0.269:0.064,ADCK3:0.384);
[15]:
# Kinase annotation
kinase_annotations = pd.read_csv(HERE / "kinase_annotation.csv", index_col=0, sep="\t")
kinase_annotations
[15]:
                   kinase.family  kinase.group
kinase.klifs_name
CHK1                       CAMKL          CAMK
BRAF                         RAF           TKL
ABL2                         Abl            TK
AAK1                         NAK         Other
ADCK3                       ABC1      Atypical

The Newick tree and kinase annotation files can be loaded into the external software FigTree to visualize the kissim-based kinome tree.
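
For a quick sanity check without external software, the leaf names and their branch lengths can be pulled out of the Newick string with a regular expression. This is a minimal sketch, not a full Newick parser: `leaf_branch_lengths` is a hypothetical helper that assumes unquoted leaf names and plain decimal branch lengths, as in the tree printed in this tutorial.

```python
import re

# Newick string as printed in this tutorial
newick_string = (
    "(((BRAF:0.256,ABL2:0.256)0.256:0.063,"
    "(CHK1:0.227,AAK1:0.227)0.227:0.092)0.269:0.064,ADCK3:0.384);"
)


def leaf_branch_lengths(newick):
    """Return {leaf name: branch length}.

    Leaves directly follow "(" or ","; internal node labels follow ")"
    and are therefore skipped by the pattern.
    """
    pattern = re.compile(r"[(,]([^():,;]+):([0-9.]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}


leaves = leaf_branch_lengths(newick_string)
print(sorted(leaves))
print(leaves["ADCK3"])
```

The leaf names should match the index of the kinase annotation table, since the annotation is joined to the tree by kinase name when visualizing.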

Delete output files

[16]:
# Remove all files generated in this notebook
for pattern in ["*.json", "*.csv", "*.csv.bz2", "*.tree", "*.log"]:
    for path in HERE.glob(pattern):
        path.unlink()