Note for: PPIMpred: a web server for high-throughput screening of small molecules targeting protein-protein interaction
Doi: 10.1098/rsos.160501
Overall concept:
PPIMpred
- Web server for high-throughput screening of small molecules
- Targeting protein-protein interactions (PPIs):
o MDM2-p53
o Bcl2-Bak
o cMyc-Max
- Machine-learning methods:
o Support vector machine (SVM) with 3 different kernels
§ Linear
§ Polynomial
§ Radial basis function (RBF)
o Naïve Bayes
o Random forest
- 5-fold cross-validation is used to validate the models
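As a rough illustration of the 5-fold cross-validation above, here is a minimal stdlib-only Python sketch that partitions compound indices into five folds and uses each fold once for testing while the other four train the model. The helper name `k_fold_indices` and the fixed shuffle seed are hypothetical choices, not taken from PPIMpred.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once, cut into k folds whose sizes differ by at
    most one, and each fold serves exactly once as the test set while the
    remaining k - 1 folds form the training set.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # hypothetical fixed seed for reproducibility
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Example: 250 compounds -> five folds of 50, trained on 200 each time.
for train, test in k_fold_indices(250):
    print(len(train), len(test))
```

Averaging a performance measure over the five (train, test) pairs gives the cross-validated estimate.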
Three PPI models --> used to screen a large dataset of 265,242 small chemicals from the NCI
Enables users to obtain structural + chemical similarities to known chemical modulators
Introduction:
- Small chemicals --> inhibit PPIs at the interfaces --> PPI modulators (PPIMs)
o A small number of PPI modulators have entered clinical trials
§ Nutlin-3a (MDM2/p53)
§ ABT-263 (Bcl2-Bak)
§ GX15-070 (Bcl2-Bak)
- A public database lists >17,000 non-redundant PPIMs
- Advantage of PPIMs: they bind to many types of protein interfaces, both orthosteric (active) sites and allosteric (non-active) sites
- Support vector machine (SVM) algorithm used for the machine learning
o 10 standard physico-chemical property descriptors --> build optimal models for known PPIs
o Predicted SVM scores of the training/testing datasets
§ Compared with IC50 values and docking scores
§ Small chemicals from NCI --> screened through the models --> candidate hits --> docking studies --> relationship between high vs random predicted SVM scores and AutoDock Vina scores
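The three SVM kernels named in the overview differ only in the kernel function K(x, z) plugged into the SVM decision score f(x) = Σ αᵢyᵢK(xᵢ, x) + b, where x would be the 10-descriptor vector of a compound. The sketch below writes out the textbook forms; the default `degree`, `gamma` and `coef0` values, the helper names, and the convention that PPIMs are the +1 class are assumptions, since these notes do not give PPIMpred's settings.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, z: Vector) -> float:
    # K(x, z) = x . z
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x: Vector, z: Vector, degree: int = 3,
                      gamma: float = 1.0, coef0: float = 1.0) -> float:
    # K(x, z) = (gamma * x . z + coef0) ** degree
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x: Vector, z: Vector, gamma: float = 0.5) -> float:
    # K(x, z) = exp(-gamma * ||x - z||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_score(x: Vector, support_vectors: Sequence[Vector],
              dual_coefs: Sequence[float], bias: float, kernel: Kernel) -> float:
    """Decision value f(x) = sum_i c_i * K(sv_i, x) + b, with c_i = alpha_i * y_i.

    Assumes PPIMs were labelled +1, so a positive score points to the PPIM class.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias
```

Passing `rbf_kernel` instead of `linear_kernel` changes the shape of the decision surface while the rest of the scoring stays the same.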
Method
Cross-validation data (5-fold cross-validation):
- TIMBAL (database of protein-protein interaction modulators)
- PubChem
- 80% of the total positive data
o Used for cross-validation
§ MDM2/p53: 250 small molecules
§ Bcl2-Bak: 735 small molecules
§ cMyc-Max: 15 small molecules
- Negative dataset
o XX randomly selected compounds from PubChem + PPIMs from the other positive sets
- Blind dataset
o Remaining 20% of the positive dataset
- Independent (large) dataset
o 216,103 structures were used --> large independent dataset
o Compounds with no XLogP3 value --> removed
- Comparative study
o 2P2I positive dataset --> 40 PPIMs
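A minimal sketch of the dataset preparation described above, assuming each compound is a dict with a hypothetical `xlogp3` key: positives are shuffled and split 80%/20% into cross-validation and blind sets, and compounds lacking an XLogP3 value are dropped. The seed and field names are illustrative only.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple

Compound = Dict[str, Any]


def split_positive(positives: Sequence[Compound], cv_fraction: float = 0.8,
                   seed: int = 0) -> Tuple[List[Compound], List[Compound]]:
    """Shuffle positives and cut them into a cross-validation part (80%) and a blind part (20%)."""
    shuffled = list(positives)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * cv_fraction)
    return shuffled[:cut], shuffled[cut:]


def drop_missing_xlogp3(compounds: Sequence[Compound]) -> List[Compound]:
    # Keep only compounds with an XLogP3 value, since it is one of the model's descriptors.
    return [c for c in compounds if c.get("xlogp3") is not None]
```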
Availability of the web server: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792518/
Machine-learning techniques
- Feature selection of molecular descriptors
o Physico-chemical properties computed for the positive and negative datasets
o 18 descriptors --> each chemical structure
o Reduced to 10 descriptors with P < 0.05 (significantly different between the positive and negative sets)
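The descriptor filter can be sketched as a two-sample t-test per descriptor, keeping those with P < 0.05. This stdlib-only version assumes Welch's (unequal-variance) t statistic and approximates the two-sided P-value with the standard normal distribution, which is reasonable for samples of hundreds of compounds but is not necessarily the exact test the authors ran.

```python
import math
from statistics import NormalDist, mean, variance
from typing import Dict, List, Sequence


def welch_t(pos: Sequence[float], neg: Sequence[float]) -> float:
    # Difference in means divided by the unpooled (Welch) standard error.
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if se == 0:
        return 0.0 if mean(pos) == mean(neg) else math.copysign(math.inf, mean(pos) - mean(neg))
    return (mean(pos) - mean(neg)) / se


def select_descriptors(pos: Dict[str, Sequence[float]], neg: Dict[str, Sequence[float]],
                       alpha: float = 0.05) -> List[str]:
    """Return descriptors whose positive/negative means differ with two-sided P < alpha.

    The P-value uses a normal approximation to the t distribution (an assumption).
    """
    selected = []
    for name in pos:
        t = welch_t(pos[name], neg[name])
        p = 2 * (1 - NormalDist().cdf(abs(t)))
        if p < alpha:
            selected.append(name)
    return selected
```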
Comparison with IC50 values
- Positive training set --> mapped to the ChEMBL database
- IC50 values obtained (not available for all compounds)
- IC50 --> converted to log scale --> compared with the SVM score
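A sketch of the IC50 comparison, assuming ChEMBL values are reported in nM: each IC50 is converted to one common log-scale form, pIC50 = -log10(IC50 in mol/L), and compared with the SVM score through a Pearson correlation. The choice of pIC50, the Pearson correlation and the helper names are assumptions; the notes only say the log-scale values were compared with SVM scores.

```python
import math
from typing import Sequence


def pic50_from_nanomolar(ic50_nm: float) -> float:
    # pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM).
    return 9.0 - math.log10(ic50_nm)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A more potent compound has a higher pIC50, so a positive correlation would mean that higher SVM scores go with stronger inhibitors.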
Performance measures
- 5-fold CV used for the three different types of support vector machine
- 4 folds used for training and 1 for testing --> rotated across the whole dataset
- Threshold-dependent parameters:
o Sensitivity
o Specificity
o Accuracy
o Precision (PPV)
o F1 score
- Threshold-independent parameters:
o Area under the ROC (receiver operating characteristic) curve (AUC)
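The threshold-dependent measures follow directly from a confusion matrix at a chosen SVM-score cutoff, and the AUC can be computed without any threshold as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties counted as one half). A minimal sketch with hypothetical function names; it does not guard against empty classes or zero denominators.

```python
from typing import Dict, Sequence


def threshold_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-dependent measures from a confusion matrix at one score cutoff."""
    sensitivity = tp / (tp + fn)                 # true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)                   # positive predictive value (PPV)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f1": f1}


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Threshold-independent AUC: P(random positive outscores random negative), ties = 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```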
Confidence measurement
- Predicted SVM scores
o Plotted as a histogram
- Unknown query prediction
o Used for validation
o Predicted SVM score
§ Larger area under the positive-score histogram
· Higher likelihood that the query is a positive PPIM
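The confidence idea above can be sketched by binning the predicted SVM scores of positive and negative training compounds into histograms and asking what share of the mass in the query's bin comes from positives. This is only one reading of the notes: the bin range, bin count, the 0.5 fallback for empty bins, and the helper names are all assumptions.

```python
from typing import List, Sequence


def bin_index(score: float, lo: float, hi: float, bins: int) -> int:
    # Map a score to a histogram bin; scores outside [lo, hi) go to the edge bins.
    width = (hi - lo) / bins
    return min(max(int((score - lo) / width), 0), bins - 1)


def histogram(scores: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    # Relative frequency of training scores in each bin (sums to 1).
    counts = [0] * bins
    for s in scores:
        counts[bin_index(s, lo, hi, bins)] += 1
    return [c / len(scores) for c in counts]


def positive_confidence(query: float, pos_scores: Sequence[float], neg_scores: Sequence[float],
                        lo: float = -3.0, hi: float = 3.0, bins: int = 12) -> float:
    """Share of the query bin's histogram mass that comes from positive (PPIM) scores."""
    i = bin_index(query, lo, hi, bins)
    pos_mass = histogram(pos_scores, lo, hi, bins)[i]
    neg_mass = histogram(neg_scores, lo, hi, bins)[i]
    total = pos_mass + neg_mass
    return 0.5 if total == 0 else pos_mass / total  # 0.5 = no evidence either way
```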
Results + Discussion
Feature selection
10 descriptors selected based on a t-test with P < 0.05:
1. Molecular weight (MW)
2. XLogP3
3. Hydrogen bond donor count
4. Rotatable bond count
5. Topological polar surface area
6. Heavy atom count
7. Complexity
8. Defined atom stereocenter count
9. Defined bond stereocenter count
10. Covalently bonded unit count
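To keep the feature order fixed between training and prediction, the 10 selected descriptors can be held in one record and flattened into a vector. The field names below are hypothetical labels for the descriptors listed above, not identifiers used by PPIMpred.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """The 10 selected descriptors, in the order listed above (field names are illustrative)."""
    molecular_weight: float
    xlogp3: float
    h_bond_donor_count: int
    rotatable_bond_count: int
    topological_polar_surface_area: float
    heavy_atom_count: int
    complexity: float
    defined_atom_stereocenter_count: int
    defined_bond_stereocenter_count: int
    covalently_bonded_unit_count: int

    def to_vector(self) -> List[float]:
        # Field order is fixed by the class definition, so every compound
        # yields its features in the same order for training and prediction.
        return [float(v) for v in astuple(self)]
```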
ML construction
Classification based on:
- Threshold-dependent measures
- Threshold-independent measures
Data --> supervised data (threshold-dependent/independent measures + feature selection) --> ML (1. SVM, 2. random forest, 3. naïve Bayes) --> cross-validation, k = 5 --> average performance calculated
Models from the three different SVMs --> tested on the larger database (NCI small-chemical dataset) --> top-hit candidates --> analyzed further with AutoDock Vina
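The screening step can be sketched as ranking NCI compounds by predicted SVM score, keeping those at or above a threshold, and passing the best few on to docking. The compound IDs, the default threshold of 0.0 and the `top_n` cutoff are hypothetical; on the web server the threshold is a user input.

```python
from typing import Dict, List, Tuple


def screen(scores: Dict[str, float], threshold: float = 0.0,
           top_n: int = 10) -> List[Tuple[str, float]]:
    """Return up to top_n (compound_id, score) pairs with score >= threshold, best first."""
    hits = [(cid, s) for cid, s in scores.items() if s >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:top_n]


# Hypothetical SVM scores for four NCI compounds.
nci_scores = {"NSC-A": 1.2, "NSC-B": -0.4, "NSC-C": 2.5, "NSC-D": 0.0}
print(screen(nci_scores, threshold=0.0, top_n=2))  # [('NSC-C', 2.5), ('NSC-A', 1.2)]
```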
Web server:
- Uses three different SVM-based models
- Two separate input pages
o Molecular search
§ Single-molecule search
· Molecular descriptor input
· Target selection
· Threshold value
§ Batch input
o Similarity search
§ Allows users to draw the desired chemical structure with the JME tool or paste a MOL file directly
- Output
o Molecular search
§ Prediction result
§ Tabular result
§ Graphical result
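These notes do not say how the similarity search scores structures, so the following is a hedged sketch that uses the Tanimoto coefficient, a common measure for binary fingerprints, to rank known modulators against a query. Fingerprint generation from a drawn structure or MOL file is assumed to happen elsewhere; fingerprints are modeled here simply as sets of "on" bit positions, and the modulator names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # positions of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    # |A ∩ B| / |A ∪ B|; two empty fingerprints are treated as dissimilar.
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def most_similar(query: Fingerprint, known: Dict[str, Fingerprint],
                 top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank known modulators by Tanimoto similarity to the query, highest first."""
    ranked = [(name, tanimoto(query, fp)) for name, fp in known.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```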