Note: mACPpred: A Support Vector Machine-Based Meta-Predictor for Identification of Anticancer Peptides

doi: 10.3390/ijms20081964


Gap;

  • Accurate prediction of anticancer peptides (ACPs) from peptide sequences remains an open problem in immunoinformatics

  • The performance of existing machine learning (ML) predictors still needs improvement

This study:

  • Presents a novel approach for accurate ACP prediction

    • Applies a two-step feature selection protocol

      • To 7 feature encodings derived from amino acid (AA) sequences, covering:

        • Composition-based features

        • Physicochemical properties

        • Profiles

      • Obtains the corresponding optimal feature-based model for each encoding

    • Builds predicted-probability feature vectors from those models

      • Used as input to a support vector machine (SVM) that makes the final prediction
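
The meta-predictor idea above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `probability_feature_vector`, the two toy encoders, and the stand-in base models are hypothetical and are not taken from the paper or the mACPpred server. In the real method each base model would be the optimal model trained on one of the 7 encodings, and the resulting vector would be passed to a trained SVM, which is not implemented here.

```python
from typing import Callable, Dict, List

Encoder = Callable[[str], List[float]]      # peptide sequence -> feature vector
BaseModel = Callable[[List[float]], float]  # feature vector -> P(ACP) in [0, 1]


def probability_feature_vector(
    seq: str,
    encoders: Dict[str, Encoder],
    base_models: Dict[str, BaseModel],
    order: List[str],
) -> List[float]:
    """Return one predicted probability per encoding, in a fixed order.

    This vector is what a second-stage classifier would take as input.
    """
    vector = []
    for name in order:
        features = encoders[name](seq)
        vector.append(base_models[name](features))
    return vector


# Toy stand-ins (assumptions for illustration only, not real ACP models).
encoders = {
    "lys_fraction": lambda s: [s.count("K") / len(s)],
    "length": lambda s: [len(s) / 50],
}
base_models = {
    "lys_fraction": lambda f: min(1.0, 2 * f[0]),
    "length": lambda f: 1.0 - f[0],
}

vec = probability_feature_vector(
    "KKLAK", encoders, base_models, ["lys_fraction", "length"]
)
print(vec)  # two probabilities, one per toy encoding
```

With 7 encodings the vector would have 7 entries, so the final classifier works in a much smaller space than the raw encodings.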


Intro;

ACPs -- short peptides, 5-50 AA in length

Their detailed anticancer mechanisms are unknown, but they are thought to involve membranolytic activity


Identifying novel ACPs experimentally is time-consuming, so computational methods are used to predict ACPs from their AA sequences before synthesis -- this saves cost


This study;

  • Benchmark dataset with the lowest redundancy

  • This dataset is used to extract features and build the model

7 features from AA sequences (these are molecular descriptors used for peptides/proteins)

  • amino acid composition (AAC)

  • dipeptide composition (DPC)

  • composition-transition-distribution (CTD)

  • quasi-sequence-order (QSO)

  • amino acid index (AAIF)

  • binary profile (NC5)

  • conjoint triad (CTF)
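
To make the two simplest encodings concrete, here is a minimal sketch of AAC (frequency of each of the 20 standard amino acids) and DPC (frequency of each of the 400 ordered dipeptides). The function names `aac` and `dpc` and the alphabetical residue order are my own choices, not the paper's, and non-standard residues are simply not counted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq: str) -> list:
    """Amino acid composition: 20 values, each count / sequence length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq: str) -> list:
    """Dipeptide composition: 400 values over overlapping residue pairs."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least 2 residues")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(d) / len(pairs) for d in DIPEPTIDES]


peptide = "KKLAK"
print(len(aac(peptide)), len(dpc(peptide)))
```

For "KKLAK", lysine accounts for 3 of 5 residues, and the four overlapping pairs (KK, KL, LA, AK) each contribute one quarter of the DPC mass.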

Prediction output -- ACP or non-ACP


See PyBioMed - https://pybiomed.readthedocs.io/en/latest/index.html -- it briefly explains the molecular features of proteins.

  • The 7 encodings are used to compute molecular features, which are then screened to pick the features likely to matter most for building the prediction model (using a machine learning approach)



Machine learning approaches they compared, picking the one with the best performance:

  • Support vector machine (SVM)

  • Random forest (RF)

  • K-nearest neighbors (KNN)

  • Logistic regression (LR)

10-fold cross-validation was used to evaluate the ML models -- SVM performed best
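
As a reminder of what 10-fold cross-validation does, here is a dependency-free sketch. Assumptions: the splits are stratified by class and shuffled with a fixed seed (the notes do not say whether the paper did this), and the threshold classifier is a toy stand-in rather than any of the four algorithms above. All function names are hypothetical.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k folds, balanced per class."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        if len(idx) < k:
            raise ValueError("each class needs at least k samples")
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal samples out round-robin
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Mean test accuracy over the k folds."""
    scores = []
    for train, test in stratified_kfold(y, k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


# Toy classifier: threshold halfway between the two class means.
def fit_threshold(X, y):
    pos = [x[0] for x, label in zip(X, y) if label == 1]
    neg = [x[0] for x, label in zip(X, y) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x[0] > threshold else 0


X = [[1.0]] * 20 + [[0.0]] * 20
y = [1] * 20 + [0] * 20
print(cross_val_accuracy(fit_threshold, predict_threshold, X, y))
```

Each sample is used for testing exactly once and for training in the other nine folds, which is why the averaged score is a fairer basis for comparing models than a single train/test split.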



























  • They then found that some features could be removed, because they only add noise.

  • There are actually many techniques that can serve as criteria for selecting features, but they chose a two-step feature selection protocol
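
These notes do not record what the two steps are. A common pattern, assumed here purely for illustration, is to rank the features first and then search forward along that ranking, keeping the subset that scores best. In the sketch below the ranking score (gap between class means) and the evaluator (leave-one-out nearest-centroid accuracy) are stand-ins chosen to avoid dependencies; they are not the paper's choices, and all names are hypothetical.

```python
from statistics import mean


def rank_features(X, y):
    """Step 1 (assumed): rank column indices by |mean(pos) - mean(neg)|."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def centroid_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on `cols`."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [mean([r[j] for r in rows]) for j in cols]
            dist[label] = sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def forward_select(X, y, ranked):
    """Step 2 (assumed): grow the ranked prefix, keep the best-scoring one."""
    best_cols, best_acc = [], -1.0
    for k in range(1, len(ranked) + 1):
        acc = centroid_accuracy(X, y, ranked[:k])
        if acc > best_acc:  # strict: ties keep the smaller subset
            best_cols, best_acc = ranked[:k], acc
    return best_cols, best_acc


# Column 0 separates the classes; column 1 is noise.
X = [[0.9, 0.5], [0.8, 0.1], [0.85, 0.9], [0.1, 0.4], [0.2, 0.8], [0.15, 0.2]]
y = [1, 1, 1, 0, 0, 0]
ranked = rank_features(X, y)
print(ranked, forward_select(X, y, ranked))
```

On the toy data the noisy column is ranked last and never added, which mirrors the point above that some features only contribute noise.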
































Web Server:

www.thegleelab.org/mACPpred


Downloading dataset:

http://thegleelab.org/mACPpred/ACPData.html


Novelty of this method;

  • Benchmark (training) dataset -- lowest redundancy

  • First study to employ CTF and QSO as features to predict ACPs

  • Most of the existing predictors either

    • Use a single feature encoding, or

    • Combine multiple feature encodings -- too complex for predicting ACPs (high feature dimension)
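
Since conjoint triad (CTF) is one of the two encodings new to ACP prediction here, a short sketch may help. It assumes the commonly used grouping of the 20 amino acids into 7 classes and counts every run of three consecutive classes, giving 7 x 7 x 7 = 343 values. Whether mACPpred uses exactly this grouping is an assumption on my part, and the plain frequency normalisation is a simplification (other normalisations exist). The name `ctf` is my own.

```python
from itertools import product

# Assumed 7-class grouping of residues (indices 0-6).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: str(i) for i, group in enumerate(CLASSES) for aa in group}
TRIADS = ["".join(t) for t in product("0123456", repeat=3)]  # 343 triads


def ctf(seq: str) -> list:
    """Conjoint triad frequencies: 343 values over overlapping class triplets."""
    # Residues outside the 20 standard amino acids are skipped.
    codes = [CLASS_OF[a] for a in seq.upper() if a in CLASS_OF]
    counts = dict.fromkeys(TRIADS, 0)
    for i in range(len(codes) - 2):
        counts["".join(codes[i:i + 3])] += 1
    total = max(1, len(codes) - 2)  # avoid division by zero for short input
    return [counts[t] / total for t in TRIADS]


vector = ctf("KKLAK")
print(len(vector))
```

For "KKLAK" the class string is 44104, so only the triads 441, 410 and 104 are non-zero. The fixed length of 343 regardless of peptide length is what makes the encoding usable as ML input.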

Room for improvement

  • Using other ML algorithms

    • Decision tree-based

    • Neural network-based algorithms

  • Incorporating “novel” features/computational approaches

  • Increasing the size of the training dataset -- this requires more experimental data


Features for peptides

  • Sequence-based features

















  • Physicochemical property-based features






