Anticancer Peptides Classification with High-Accuracy Feature Representation Using Long Short-Term Memory
- Authority: 15th Annual Undergraduate Research Conference on Applied Computing (URC 2024)
- Category: Conference Proceeding
Cancer presents a formidable challenge due to its complexity, variability, and multitude of causes. Despite extensive study, our understanding of cancer remains incomplete, which underscores the urgent need for comprehensive therapeutic strategies. Among the potential treatment avenues, anticancer peptides (ACPs) hold significant promise. However, identifying and synthesizing these peptides on a large scale remains difficult, so reliable prediction methods are needed. Existing methods for predicting ACPs often suffer from low accuracy and rely on features with limited resolution. To address this, we propose a novel classification approach based on long short-term memory (LSTM) networks and a new feature set that combines contemporary and novel extraction techniques. The contemporary features are binary profile features and k-mer sparse matrices of reduced amino acids. The novel features are derived from the Composition of the K-Spaced Side Chain Pairs (CKSSCP), the Composition of the K-Spaced Electrically Charged Side Chain Pairs (CKSECSCP), and a combination of [pk(CO2H)] + [pk(NH2)] + [pk(R)] + [Isoelectric point]. The combined feature set is used to train the LSTM model, and extensive experiments are conducted on benchmark datasets using k-fold cross-validation. The results show that our model surpasses other ACP classification methods in accuracy and Matthews correlation coefficient (MCC). Specifically, on the ACP740 dataset with 5-fold cross-validation, we achieve an MCC of 74.61%, outperforming the ACP-KSRC, ACP-MHCNN, and ACP-DA methods by 7.61, 2.61, and 10.61 percentage points, respectively. Similarly, on the ACP344 dataset with 5-fold cross-validation, we achieve an MCC of 84.46%, surpassing ACP-KSRC and the ZH-method by 3.46 and 6.46 percentage points, respectively.
With its superior classification performance, our proposed model could aid in identifying novel ACPs and contribute to a deeper understanding of their structural and chemical characteristics. The source code and datasets used in this study are publicly available on the author’s GitHub page: https://github.com/Nazer5130/ACP-HAFR.
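
To make the feature and metric definitions in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It illustrates a binary profile encoding, a k-spaced pair composition computed over side-chain groups (in the spirit of CKSSCP), and the MCC. The four-way grouping in `SIDE_CHAIN_GROUPS`, the function names, and the zero-padding scheme are all assumptions made for illustration; the grouping and encoding used in the study may differ.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed side-chain grouping, for illustration only.
SIDE_CHAIN_GROUPS = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "positive": "KRH",
    "negative": "DE",
}
GROUP_OF = {aa: g for g, members in SIDE_CHAIN_GROUPS.items() for aa in members}
PAIR_KEYS = [a + "|" + b for a, b in product(SIDE_CHAIN_GROUPS, repeat=2)]


def binary_profile(seq, length):
    """One-hot encode the first `length` residues, zero-padding short sequences."""
    vec = []
    for i in range(length):
        row = [0] * len(AMINO_ACIDS)
        if i < len(seq) and seq[i] in AMINO_ACIDS:
            row[AMINO_ACIDS.index(seq[i])] = 1
        vec.extend(row)
    return vec


def k_spaced_group_pairs(seq, k):
    """Frequency of each ordered group pair separated by exactly k residues."""
    counts = dict.fromkeys(PAIR_KEYS, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        a = GROUP_OF.get(seq[i])
        b = GROUP_OF.get(seq[i + k + 1])
        if a is None or b is None:
            continue  # skip non-standard residues
        counts[a + "|" + b] += 1
        total += 1
    return [counts[key] / total if total else 0.0 for key in PAIR_KEYS]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In a pipeline of this kind, the per-sequence vectors from functions such as `binary_profile` and `k_spaced_group_pairs` (for several values of `k`) would be concatenated before being passed to the classifier. `mcc` returns a value in [-1, 1]; the abstract reports it multiplied by 100.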