Note: mACPpred: A Support Vector Machine-Based Meta-Predictor for Identification of Anticancer Peptides
doi: 10.3390/ijms20081964
Gap;
Accurate prediction of ACPs (anticancer peptides) -- still a challenging problem in immunoinformatics
Performance of current ML-based predictors is not yet satisfactory
This study:
Presents a novel approach (mACPpred) to predict ACPs more accurately
Applies a two-step feature selection protocol
Covers 7 feature encodings derived from AA sequences
Composition-based
Physicochemical properties
Profiles
Obtains the optimal feature-based model for each encoding
Predicted probabilities from these models are used as feature vectors
The feature vectors are used as input to a support vector machine (SVM)
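The meta-predictor idea above (per-encoding models whose predicted probabilities become the input of a final SVM) can be sketched with scikit-learn. This is only a minimal illustration under assumptions, not the authors' pipeline: the data are synthetic, the three "encodings" are arbitrary column slices, and the names `encodings`, `meta_X`, and `meta_model` are made up for this note.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

# Synthetic stand-in for peptide data (assumption: real inputs would be
# descriptor matrices computed from ACP / non-ACP sequences).
X, y = make_classification(n_samples=200, n_features=30,
                           n_informative=10, random_state=0)

# Hypothetical split of the columns into three "feature encodings".
encodings = {"enc_a": X[:, :10], "enc_b": X[:, 10:20], "enc_c": X[:, 20:]}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Stage 1: one SVM per encoding. Out-of-fold probabilities are used so the
# meta-features for a sample never come from a model trained on that sample.
meta_columns = []
for name, X_enc in encodings.items():
    base_model = SVC(probability=True, random_state=0)
    proba = cross_val_predict(base_model, X_enc, y, cv=cv,
                              method="predict_proba")[:, 1]
    meta_columns.append(proba)

# One column of predicted probability per encoding.
meta_X = np.column_stack(meta_columns)

# Stage 2: the final SVM takes the probability vectors as its features.
meta_model = SVC(probability=True, random_state=0).fit(meta_X, y)
print(meta_X.shape)
```

Using out-of-fold probabilities in stage 1 is a design choice of this sketch to limit leakage into the final model; the notes do not say how the paper handled this.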
Intro;
ACPs are short peptides -- 5-50 AA
The detailed anticancer mechanisms are unknown but are thought to be related to membranolytic activity
Identifying novel ACPs experimentally is time-consuming, so computational methods are used to predict ACPs from the AA sequence before synthesis -- cost-saving
This study;
Benchmark dataset with the lowest redundancy
This dataset is used to extract features and build the model
7 features from AA sequences (molecular descriptors used for peptides/proteins)
amino acid composition (AAC),
composition-transition-distribution (CTD),
quasi-sequence-order (QSO),
amino acid index (AAIF),
binary profile (NC5),
conjoint triad (CTF)
Prediction output -- ACP or non-ACP
See PyBioMed - https://pybiomed.readthedocs.io/en/latest/index.html -- it briefly explains the molecular features of proteins.
The 7 feature encodings are used to compute molecular features, which are then screened to pick the ones likely to matter for building the prediction model (using a machine learning approach)
Machine learning algorithms compared -- pick the one that shows the best performance
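As a concrete example of a composition-based descriptor, amino acid composition (AAC) is simply the fraction of each of the 20 standard residues in a sequence. The sketch below uses plain Python rather than PyBioMed; the function name `amino_acid_composition` and the example peptide are made up for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every peptide maps to
# a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return a 20-dimensional list of residue frequencies.

    Frequencies are relative to the full sequence length, so they sum to 1
    only when the sequence contains standard residues exclusively.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]

# Arbitrary example peptide (hypothetical, for illustration only).
aac = amino_acid_composition("GLFDIVKKVV")
print(dict(zip(AMINO_ACIDS, aac)))
```

Whatever the peptide length (5-50 AA here), the output has a fixed 20 dimensions, which is what lets it feed a standard ML model.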
Support vector machine (SVM)
Random Forest
K-nearest neighbors (KNN)
Logistic regression
10-fold cross-validation was used to evaluate the ML models -- SVM performed best
They then found that some features could be removed because they only add noise.
There are many techniques that can serve as criteria for selecting features, but they chose a two-step feature selection protocol
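A comparison of the four algorithms under 10-fold cross-validation could look roughly like this in scikit-learn. It is a sketch under assumptions: the data are synthetic, the hyperparameters are library defaults rather than the paper's settings, and accuracy is used as the metric only for simplicity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary dataset standing in for ACP / non-ACP feature vectors.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Scale-sensitive models are wrapped with a scaler so that scaling is
# fitted inside each training fold only.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Mean accuracy over the 10 folds for each algorithm.
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, acc in sorted(mean_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

The ranking on synthetic data says nothing about which algorithm wins on real ACP data; the point is only the evaluation procedure.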
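These notes do not record what the two steps are, so the following is only one common way to do two-step feature selection, offered as an assumption: (1) rank features by an importance score, here from a random forest, then (2) run a forward search over the ranked list and keep the prefix that gives the best cross-validated score. Data and variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic data: 15 features, only some of which are informative.
X, y = make_classification(n_samples=200, n_features=15,
                           n_informative=5, random_state=0)

# Step 1 (assumed): rank features from most to least important.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]

# Step 2 (assumed): add features one at a time in ranked order and keep
# the subset size with the best 5-fold cross-validated accuracy.
best_score, best_k = -1.0, 0
for k in range(1, len(ranking) + 1):
    subset = ranking[:k]
    score = cross_val_score(SVC(), X[:, subset], y, cv=5).mean()
    if score > best_score:
        best_score, best_k = score, k

selected = ranking[:best_k]
print(f"kept {best_k} of {X.shape[1]} features, CV accuracy {best_score:.3f}")
```

Features past the best prefix are the ones treated as noise in this sketch. For an unbiased performance estimate, the ranking step would also need to run inside the cross-validation loop.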
Web Server:
Downloading dataset:
http://thegleelab.org/mACPpred/ACPData.html
Novelty of this method;
Benchmark or training dataset -- lowest redundancy
First study to employ CTF and QSO as features to predict ACPs
Most of the existing predictors
Use a single feature encoding
Or a combination of multiple feature encodings -- makes ACP prediction too complex (high feature dimension)
Room for improvement
Using other ML algorithms
Decision tree-based
Neural network-based algorithms
Incorporation of “novel” features/computational approaches
Increase the size of the training dataset -- requires experimental data to do this
Features for peptides
Sequence-based features
Physicochemical properties-based features
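To make the second category concrete, here is a small sketch of two physicochemical descriptors computed from a sequence: mean hydropathy on the Kyte-Doolittle scale and a crude net charge (K/R counted as +1, D/E as -1, ignoring pH, histidine, and termini). These two descriptors are picked only for illustration and are not taken from the paper; the function names are made up.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(sequence):
    """Average Kyte-Doolittle hydropathy over all residues."""
    sequence = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

def simple_net_charge(sequence):
    """Crude net charge: +1 per K/R, -1 per D/E (a deliberate simplification)."""
    sequence = sequence.upper()
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative

peptide = "GLFDIVKKVV"  # arbitrary example sequence
print(mean_hydropathy(peptide), simple_net_charge(peptide))
```

Unlike sequence-based features such as residue counts, these values summarize chemical behavior of the peptide as a whole.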