Note for: PPIMpred: a web server for high-throughput screening of small molecules targeting protein-protein interaction

DOI: 10.1098/rsos.160501

Overall concept:

PPIMpred

-          web server for high-throughput screening of small molecules

-          targeting protein-protein interaction

o   MDM2-p53

o   Bcl2-Bak

o   cMyc-Max

-          Method for machine learning

o   3 different kernels of support vector machine

§  Linear

§  Polynomial

§  Radial basis

o   Naïve Bayes

o   Random forest

-          5-fold cross-validation is used to validate the models

Models for the three protein-protein interactions -- > used to screen a large dataset of 265,242 small chemicals from NCI

Enables users to find structural + chemical similarities to known chemical modulators

 

Introduction:

-          Small chemicals -- > inhibit PPI at the interfaces -- > PPI modulators

o   Only a few PPI modulators have entered clinical trials

§  Nutlin-3a (Mdm2/P53)

§  ABT-263 (Bcl2-Bak)

§  GX15-070 (Bcl2-Bak)

-          Public database reveals >17,000 non-redundant PPIMs

-          Advantage of PPIMs – bind to many types of protein interfaces, orthosteric (active site) and allosteric sites (not active site)

-          Support vector machine algorithm – for the ML

o   10 standard physico-chemical property descriptors -- > build optimal models for known PPIs

o   Predicted SVM scores of training/testing datasets

§  Compare with IC50 values and docking scores

§  Small chemicals from NCI -- > screened through the models -- > candidates -- > docking studies -- > relate predicted SVM scores (high-scoring vs randomly chosen compounds) to AutoDock Vina scores

Method

Cross-validation data – 5-fold cross-validation

-          TIMBAL (database for protein-protein interaction modulators)

-          PubChem

-          80% of total positive

o   Used for cross-validation

§  Mdm2/P53 --  250 small molecules

§  Bcl2-Bak – 735 small molecules

§  cMyc-Max – 15 small molecules

-          Negative dataset

o   XX randomly selected compounds from PubChem + positive sets of the other PPIMs

-          Blind dataset

o   20% remaining positive dataset

-          Independent (large) dataset

o   216,103 structures were used -- > large independent dataset

o   Structures with no XLogP3 value -- > removed

-          Comparative study

o   2P2I positive dataset -- > 40 PPIMs
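The positive-set handling described above (20% held out as a blind set, the remaining 80% used for 5-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the random seed, and the placeholder molecule identifiers are all assumptions.

```python
import random


def split_positive_set(molecules, blind_fraction=0.2, seed=0):
    """Shuffle and split into (cross-validation set, blind set)."""
    shuffled = list(molecules)
    random.Random(seed).shuffle(shuffled)
    n_blind = round(len(shuffled) * blind_fraction)
    return shuffled[n_blind:], shuffled[:n_blind]


def k_fold_splits(items, k=5):
    """Yield (training, testing) lists; each fold is the test set exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        training = [m for j, fold in enumerate(folds) if j != i for m in fold]
        yield training, folds[i]


# Hypothetical identifiers standing in for one positive PPIM set.
positives = [f"mol_{i}" for i in range(100)]
cv_set, blind_set = split_positive_set(positives)
splits = list(k_fold_splits(cv_set, k=5))

for training, testing in splits:
    print(len(training), "training /", len(testing), "testing")
```

With 100 placeholder molecules, 80 go to cross-validation and 20 to the blind set; each of the five rounds trains on four folds and tests on the fifth.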

Availability of the web server: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792518/

Machine-learning techniques

-          Feature selection as molecular descriptor

o   Using physico-chemical properties -- > positive and negative datasets

o   18 descriptors -- > each chemical structure

o   Reduced to 10 descriptors with P-value < 0.05 (significantly different between the positive and negative datasets)
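The descriptor filtering above can be sketched as a per-descriptor two-sample test between the positive and negative sets. The notes only say a t-test with P < 0.05 was used. This sketch assumes Welch's t statistic and, to stay within the standard library, approximates the p-value with a normal distribution, which is only reasonable for samples in the hundreds. The descriptor names and values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_p_value(a, b):
    """Two-sided p-value for Welch's t statistic (normal approximation)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(t)))


def select_descriptors(positive, negative, alpha=0.05):
    """Keep descriptors whose values differ significantly between the sets.

    `positive` and `negative` map descriptor name -> list of values.
    """
    return [name for name in positive
            if welch_p_value(positive[name], negative[name]) < alpha]


# Toy values (hypothetical): MW differs between the sets, charge does not.
positive = {"MW": [520.0, 480.5, 610.2, 555.1, 498.7, 530.3],
            "charge": [0, 0, 1, 0, -1, 0]}
negative = {"MW": [310.4, 295.2, 350.8, 280.1, 330.0, 305.6],
            "charge": [0, 1, 0, 0, 0, -1]}

selected = select_descriptors(positive, negative)
print(selected)
```

For real data with small samples, an exact t-distribution p-value (for example from a statistics package) would be the safer choice.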

Comparison of IC50 value

-          Positive training set -- > map to ChEMBL database

-          Obtain IC50 values (not available for all compounds)

-          IC50 -- > convert to log scale -- > compare to SVM score
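The comparison above can be sketched as a log transform followed by a correlation. The notes do not say which log convention was used. This sketch assumes IC50 in nanomolar and the common pIC50 = -log10(IC50 in mol/L), so that larger values mean higher potency. The IC50 values and SVM scores are invented for illustration.

```python
from math import log10, sqrt


def pic50_from_nanomolar(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); 1 nM = 1e-9 M, hence the 9."""
    return 9.0 - log10(ic50_nm)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data: IC50 in nM and the SVM score for the same compounds.
ic50_nm = [10.0, 100.0, 1000.0, 10000.0]
svm_scores = [1.8, 1.1, 0.6, -0.2]

pic50 = [pic50_from_nanomolar(v) for v in ic50_nm]
r = pearson(pic50, svm_scores)
print(pic50, round(r, 3))
```

A positive correlation here would mean that compounds with higher SVM scores tend to be the more potent ones.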

Performance measures

-          5-fold CV used for the three SVM kernel types

-          4 folds are used for training and 1 for testing -- > repeated so that each fold serves as the test set once

-          Threshold-dependent parameters

o   Sensitivity

o   Specificity

o   Accuracy

o   Precision (PPV)

o   F1 score

-          Threshold-independent parameters

o   Area under the ROC (receiver operating characteristic) curve (AUC)
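The measures listed above follow from the confusion-matrix counts (threshold-dependent) and from the ranking of scores (threshold-independent). The sketch below uses the standard definitions. The labels, scores, and the threshold of 0.0 are illustrative assumptions, and it assumes both classes are present and at least one positive prediction is made.

```python
def threshold_metrics(labels, scores, threshold=0.0):
    """Threshold-dependent measures; label 1 = PPIM, 0 = non-PPIM."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical SVM scores for four known PPIMs and four non-PPIMs.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [1.2, 0.8, 0.3, -0.4, 0.5, -0.2, -0.9, -1.5]

metrics = threshold_metrics(labels, scores, threshold=0.0)
auc = roc_auc(labels, scores)
print(metrics, auc)
```

Changing the threshold shifts the first five measures, while the AUC stays the same because it depends only on how the scores are ordered.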

Confidence measurement

-          Predicted SVM scores

o   Plot in histogram

-          Unknown query to be predicted

o   Used for validation

o   Predicted SVM score

§  Higher area under the curve in the positive plot

·         Higher likelihood that the query is a positive PPIM

Results + Discussion

Feature selection

10 descriptors were selected based on a t-test with P < 0.05

1.       MW

2.       XLogP3

3.       Hydrogen bond donor count

4.       Rotatable bond count

5.       Topological polar surface area

6.       Heavy atom count

7.       Complexity

8.       Defined atom stereocenter count

9.       Defined bond stereocenter count

10.   Covalently bonded unit count

ML construction

Classification based on

-          Threshold-dependent measure

-          Threshold-independent measure

Data -- > supervised data (threshold-dependent/independent + feature selection) -- > ML (1. SVM, 2. Random forest, 3. Naïve Bayes) -- > cross-validation, k = 5 -- > calculate the average performance

 

Take the models from the three different SVMs -- > test on the larger database (NCI small-chemical dataset) -- > top-hit candidates -- > analyze further with AutoDock


Webserver:

-          Using three different SVM-based models

-          Two separate input pages

o   Molecular search

§  Single molecule search

·         Molecular descriptor input

·         Target selection

·         Threshold value

§  Batch input

o   Similarity search

§  Allows the user to draw the desired chemical structure using the JME tool or paste a MOL file directly

-          Output

o   Molecular search

§  Prediction result

§  Tabular result

§  Graphical result

 

 

