14 March 2025

　　CAMP first applied the following five steps of multi-source data curation and multi-level label construction (Fig. 1a; more details can be found in Methods and Supplementary Note 10): (1) extracting peptide–protein complex structures from the RCSB PDB21,22 and the known drug-target pairs from DrugBank23,24,25,26,27; (2) using the protein–ligand interaction predictor (PLIP)28 to recognize non-covalent interactions between the peptide and the protein in each PDB complex, and only keeping the peptide–protein pairs with non-covalent interactions as positive samples; (3) deriving binding residue labels of the peptide from PepBDB29, a structure database of peptide–protein complexes derived from the RCSB PDB21,22; (4) generating residue-level structural and physicochemical properties, intrinsic disorder tendencies of peptides and proteins, and protein evolutionary information based on the primary sequences of peptides and proteins; and (5) integrating multi-level labels, i.e., the binary interaction labels and peptide-binding residue labels of peptide–protein pairs, for the training process.

　　Figure 1b shows the overall network architecture of CAMP. Given the feature profiles of the input peptide–protein pair, CAMP exploits two multi-channel feature extractors to process them separately. Each extractor contains a numerical channel and three categorical channels. The numerical channel is used to extract the pre-defined dense features (i.e., the protein position-specific scoring matrix (PSSM) and the intrinsic disorder tendency of each residue in both protein and peptide sequences). Each categorical channel contains a self-learning word embedding layer30, which takes one of the categorical features of the input peptide or protein (i.e., the raw amino acids, secondary structures, polarity, and hydropathy properties). Here, we design such a multi-channel architecture because the input profiles contain multifaceted features of different scales, which may bring inconsistency if we only use a simple encoder. Next, CAMP exploits two convolutional neural network (CNN) modules that extract the hidden contextual features of peptides and proteins, respectively. In addition, CAMP adopts self-attention mechanisms to learn the long-range dependencies between residues and the contributions of individual residues of proteins and peptides to the final interaction prediction. After that, CAMP combines all the extracted features and uses three fully connected layers to predict whether there exists an interaction between a given peptide–protein pair. Furthermore, CAMP takes the output of the peptide CNN module with a sigmoid activation function for each position to predict whether each peptide residue binds to the partner protein. In our problem, the binary interaction prediction is our fundamental task and we aim to solve this problem by providing multi-level supervised information. Here, the extra binding residue labels can not only provide additional information to boost the performance of our main task, but also bring new insights into PepPIs by identifying the critical residues along the peptide.
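　　The two levels of supervision described above can be illustrated with a minimal sketch. It is not the CAMP implementation: the function names, the unweighted sum of the two loss terms, the `residue_weight` parameter, and the 0.5 decision threshold are all assumptions made for illustration. Only the per-position sigmoid output and the combination of a pair-level label with residue-level labels come from the text.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy of a single predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # guard against log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multi_level_loss(pair_logit, pair_label, residue_logits,
                     residue_labels=None, residue_weight=1.0):
    """Pair-level loss plus (optionally) the mean residue-level loss.

    residue_labels is None when a pair has no binding-residue annotation,
    in which case only the binary interaction term is used (an assumption).
    """
    loss = bce(sigmoid(pair_logit), pair_label)
    if residue_labels is not None:
        per_residue = [bce(sigmoid(z), y)
                       for z, y in zip(residue_logits, residue_labels)]
        loss += residue_weight * sum(per_residue) / len(per_residue)
    return loss


def predict_binding_residues(residue_logits, threshold=0.5):
    """Per-position sigmoid followed by a hypothetical decision threshold."""
    return [int(sigmoid(z) >= threshold) for z in residue_logits]
```

　　In this sketch a pair without residue annotations still contributes to the main task, which is one simple way to let the auxiliary labels help without restricting the training set to annotated complexes.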

　　The binary classification of pepPIs is the primary goal of CAMP. Here, we compared the classification performance of CAMP with that of other state-of-the-art baseline methods, including a similarity-based matrix factorization method called NRLMF5, a deep-learning-based model for PPI prediction called PIPR12, and a deep-learning-based model for CPI prediction called DeepDTA31. All the prediction methods were evaluated on a benchmark data set through cross-validation. The area under the receiver operating characteristics curve (AUC) and the area under the precision-recall curve (AUPR) were used to evaluate the performance of all models. In general, AUPR can provide a better metric to evaluate the prediction models on skewed data in a more informative way than AUC32. To help readers estimate the difficulty of our task, we also reported the performance of several machine-learning baseline methods in Supplementary Note 1.
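　　As a reminder of what the two metrics measure, the following sketch computes both from a list of labels and scores. It uses standard textbook definitions rather than the authors' evaluation code: AUC as the probability that a random positive outscores a random negative, and AUPR estimated as average precision. The toy labels and scores are invented.

```python
def roc_auc(labels, scores):
    """AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes at least one positive
    and one negative sample.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPR estimated as the mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall level
    return total / hits


# Toy example (invented): two positives among six samples.
labels = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores))            # 0.875
print(average_precision(labels, scores))  # 0.8333...
```

　　A single high-scoring negative lowers the precision-based score more than the AUC here, which is the kind of sensitivity that makes AUPR informative on skewed data.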

　　The human-curated data may contain “redundant” interaction pairs (e.g., one protein interacting with more than one similar peptide or vice versa), which could be easily predicted by the models. To avoid the trivial predictions caused by such cases, we followed the same strategy as in MONN33, and mainly used the cluster-based cross-validation settings for performance evaluation. In particular, based on similarity scores derived from Smith-Waterman alignment (https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library), we divided proteins and peptides into different clusters such that the entities from the same cluster did not appear in the training and testing sets at the same time (more details can be found in Supplementary Note 8). We evaluated the performance of CAMP and the baseline methods under three cluster-based cross-validation settings. More specifically, in the “novel protein setting”, no proteins from the same cluster appeared in both training and testing sets; in the “novel peptide setting”, no peptides from the same cluster appeared in both training and testing sets; and in the “novel pair setting”, neither proteins nor peptides from the same cluster appeared in training and testing sets at the same time. Figure 2 shows that CAMP consistently outperformed the state-of-the-art baseline methods, with an increase of up to 10% and 15% in terms of AUC and AUPR, respectively. In addition, we observed a slight decreasing trend of prediction performance for all methods with larger clustering thresholds, which generally corresponded to more difficult tasks. We also noticed that the model performance under the “novel peptide setting” seemed to be better than that in the other settings. This can be explained by the fact that the peptides in our benchmark set shared less similarity with each other than proteins, and thus the distributions of peptides in the training and testing sets did not change much after clustering based on similarities. Such test results suggested that CAMP can achieve better and more robust performance than the baseline methods under all cross-validation settings.
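　　The three cluster-based settings can be sketched as follows, assuming the cluster assignments have already been computed from the alignment scores. The function name, the `test_fraction` parameter, and the choice to discard pairs that mix a held-out cluster with a training cluster in the “novel pair setting” are assumptions for illustration, not details taken from the paper.

```python
import random


def cluster_split(pairs, protein_cluster, peptide_cluster, setting,
                  test_fraction=0.2, seed=0):
    """Split (peptide, protein) pairs so held-out clusters never reach training.

    pairs: list of (peptide_id, protein_id)
    protein_cluster / peptide_cluster: dict mapping an ID to its cluster ID
    setting: 'novel_protein', 'novel_peptide', or 'novel_pair'
    """
    if setting not in ("novel_protein", "novel_peptide", "novel_pair"):
        raise ValueError(f"unknown setting: {setting}")
    rng = random.Random(seed)

    def held_out(cluster_of):
        # Pick whole clusters (never individual entities) for the test side.
        clusters = sorted(set(cluster_of.values()))
        rng.shuffle(clusters)
        k = max(1, round(test_fraction * len(clusters)))
        return set(clusters[:k])

    test_prot = held_out(protein_cluster) if setting != "novel_peptide" else set()
    test_pep = held_out(peptide_cluster) if setting != "novel_protein" else set()

    train, test = [], []
    for pep, prot in pairs:
        pep_out = peptide_cluster[pep] in test_pep
        prot_out = protein_cluster[prot] in test_prot
        if setting == "novel_protein":
            (test if prot_out else train).append((pep, prot))
        elif setting == "novel_peptide":
            (test if pep_out else train).append((pep, prot))
        elif pep_out and prot_out:          # novel pair: both sides unseen
            test.append((pep, prot))
        elif not pep_out and not prot_out:  # novel pair: both sides seen
            train.append((pep, prot))
        # Mixed pairs are dropped in the novel pair setting (assumption).
    return train, test
```

　　Holding out entire clusters is what removes the “redundant” near-duplicates from the test set; a random split over pairs would leave them on both sides.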

　　Figure 2 also shows that the prediction results of CAMP varied considerably under certain cluster settings. To further investigate the potential factors that cause this phenomenon, we conducted additional analyses using a fivefold cross-validation procedure on the binary prediction task (in Supplementary Note 1). Our analysis result (Supplementary Fig. 1) indicated that the relatively large prediction errors under two clustering settings may result from certain protein families, domains, and organisms (e.g., histone and GPCR for the protein families, trypsin and kringle for the domains, and bovine for the protein organisms).

　　Furthermore, we conducted comprehensive ablation studies to demonstrate the importance of individual components of CAMP, including different groups of features and the self-attention modules in the network architecture (Supplementary Note 2). Our ablation studies (Supplementary Table 2 and Supplementary Fig. 2) demonstrated that the current model architecture and feature selection scheme were optimal for our prediction task.

　　So far, a number of computational methods have been developed for predicting the interacting sites on the protein surface in PepPI predictions14,34,35. These methods learn from 3D structure information of peptide–protein complexes and can pinpoint interacting sites on protein surfaces with relatively good accuracy. However, few models are specifically designed to characterize interacting sites on the peptides in PepPIs, which are also crucial for understanding the biological roles of peptides and designing efficacious peptide drugs. For pharmacologists, the choice of chemical modification heavily relies on the identification of essential peptide residues involved in binding activities1. Conventionally, pharmacologists would iteratively replace possible residues and conduct wet experiments for verification. Although these attempts could provide useful information for further drug design, e.g., changing particular non-binding residues or modifying groups on their side chains to improve stability and reduce toxicity1,2, these experimental approaches are generally expensive and time-consuming.

　　In CAMP, we designed a supervised prediction module to identify binding residues from a peptide sequence. We first constructed a set of qualified labels for peptide-binding residues using the interacting information derived from PepBDB29, which is a comprehensive structure database containing the known interacting peptide–protein complexes from the RCSB PDB21,22 and information about binding residues in peptides involved in hydrogen bonds and hydrophobic contacts. With the support from such supervised information, CAMP achieved an average AUC of 0.806 and Matthews Correlation Coefficient (MCC) (definitions can be found in Supplementary Note 9) of 0.514 on peptide-binding residue identification using a fivefold cross-validation procedure under the “random-split setting” (Fig. 3a, b). The cross-validation results under other settings can be found in Supplementary Note 3.
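　　For readers unfamiliar with the MCC, the sketch below computes it for per-residue binary predictions. It uses the standard confusion-matrix definition, which is assumed to match the one in Supplementary Note 9; the toy peptide labels are invented.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Standard MCC for binary labels (1 = binding residue, 0 = non-binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy eight-residue peptide (invented): one missed and one spurious residue.
true_sites = [1, 1, 1, 0, 0, 0, 0, 0]
pred_sites = [1, 1, 0, 1, 0, 0, 0, 0]
print(matthews_corrcoef(true_sites, pred_sites))  # 7/15, about 0.467
```

　　Unlike accuracy, the MCC stays near zero for a predictor that labels every residue as binding or every residue as non-binding, which matters because binding residues are usually a minority of the peptide.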

　　To further demonstrate the performance of CAMP in binding residue prediction, we also selected four representative cases (ranked ~1%, 35%, 50%, and 85% in terms of the average AUC scores of predicted peptide-binding residues, respectively) and compared the predicted residues with the true interacting ones. Figure 3c shows the first example, a complex of an HIV-1-specific cell entry inhibitor and HIV-1 GP41 trimeric core (PDB ID: 1FAV [https://doi.org/10.2210/pdb1FAV/pdb]). The peptide inhibitor has 33 amino acids and 12 of them are binding residues. CAMP identified all these binding residues without any false positives. Such a prediction was the most ideal case in our prediction task and we found that 30.2% of the binding residue identification was completely accurate like this case. Figure 3d shows the second example, a complex of HIV-1 gp120 envelope glycoprotein and the CD4 receptor (PDB ID: 4JZW [https://doi.org/10.2210/pdb4JZW/pdb]), which ranked around the top 35% in terms of the average AUC. The peptide has 28 amino acids and 13 of them are binding residues. Our predicted binding residues covered 11 true binding residues along the peptide sequence and missed two true binding residues. Figure 3e shows the third example, a complex of a peptide from histone deacetylase and the ankyrin repeat family A protein (PDB ID: 3V31 [https://doi.org/10.2210/pdb3V31/pdb]). This pair ranked around the median among our predictions in terms of AUC and 11/13 of the true binding residues were successfully identified by CAMP with one false positive. Figure 3f shows the last example, a complex of the T-lymphoma invasion and metastasis inducing protein and an eight-residue phosphorylated syndecan-1 peptide (PDB ID: 4GVC [https://doi.org/10.2210/pdb4GVC/pdb]), which ranked ~85% among our predictions with an average AUC of 0.571. All eight residues including one false positive were predicted as binding residues by CAMP. 
Overall, our test results demonstrated that CAMP yields accurate binding residue predictions and thus can provide reliable evidence for further understanding the interacting mechanisms of peptides with their partner proteins.

　　Glucagon-like peptide-1 receptor (GLP-1R) agonists play an important role in the treatment of type 2 diabetes mellitus36,37. We next investigated whether CAMP was able to correctly identify the interactions of Semaglutide, a known GLP-1R agonist (GLP-1RA), and its analogs with GLP-1R. In our benchmark data set, there are seven Semaglutide-analogous peptides that bind to GLP-1R. To avoid “easy prediction”, we removed those GLP-1RA peptide drugs from the training set that shared similar sequences (defined as peptide sequence similarities >40%) with Semaglutide (e.g., Liraglutide and Taspoglutide), and had interacting proteins similar to GLP-1R (i.e., with protein sequence similarities >40%). After removing these records as well as seven pairs of Semaglutide-analogous peptides and GLP-1R, we re-trained the CAMP model and combined the seven Semaglutide-analogous peptides with the remaining 3400 proteins to construct an independent test set which contained 23,800 candidate pairs. The test showed that CAMP was able to identify six of seven interacting pairs of Semaglutide-analogous peptides and GLP-1R with an AUC score of 0.831. For almost all the Semaglutide-analogous peptides, GLP-1R was ranked within the top 10% of all the candidate proteins (more details can also be found in Supplementary Table 3 and Supplementary Fig. 7). Such results further demonstrated the strong predictive power of CAMP.
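　　The size of the candidate set and the notion of a receptor being “ranked within the top 10%” can be made concrete with a short sketch. The peptide and protein identifiers and the scores are placeholders, and `rank_percentile` is a hypothetical helper rather than part of CAMP.

```python
def rank_percentile(scores_by_protein, target):
    """Percentile rank of `target` among candidates (lower is better).

    A value of 10.0 means the target sits at the boundary of the top 10%
    when candidates are sorted by descending predicted score.
    """
    target_score = scores_by_protein[target]
    better = sum(1 for s in scores_by_protein.values() if s > target_score)
    return 100.0 * (better + 1) / len(scores_by_protein)


# Seven peptides paired with 3400 proteins give 7 * 3400 = 23,800 candidates.
peptides = [f"analog_{i}" for i in range(7)]       # placeholder IDs
proteins = [f"protein_{j}" for j in range(3400)]   # placeholder IDs
candidate_pairs = [(pep, prot) for pep in peptides for prot in proteins]
print(len(candidate_pairs))  # 23800

# Invented scores for one peptide against ten candidate proteins.
toy_scores = {f"P{i}": i / 10 for i in range(10)}
print(rank_percentile(toy_scores, "P8"))  # 20.0 (second best of ten)
```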

　　We also examined the predicted binding residues of Semaglutide with its receptor (detailed results can be found in Supplementary Fig. 8 and Supplementary Note 4). CAMP correctly identified 11/12 of the true binding residues of Semaglutide with an average AUC of 0.917. Such a prediction result can provide useful insights for pharmacologists if they aim to improve the stability of the peptide drugs by replacing the non-binding residues with synthetic amino acids without changing the interacting interface of the binding complexes.

　　We conducted additional tests to further illustrate the generalizability of CAMP on binary interaction prediction and peptide-binding residue identification. In particular, we first evaluated CAMP on an additional independent data set derived from the PDB22,38 following the same strategy as in constructing our previous benchmark data set. This additional test set contained 379 PepPIs from 262 peptides and 246 proteins from the PDB complexes released from 1 October 2019 to 10 March 2020. The corresponding PDB IDs and UniProt IDs can be found in Supplementary Tables 13 and 16 in Supplementary data. We also randomly paired these peptides and proteins without known evidence of interactions in the test set to obtain negative samples.

　　To demonstrate the robust performance of CAMP on binary interaction prediction, we evaluated the performances of CAMP and the baseline models on several variations of the above test data set with different positive-negative ratios. Each model was first trained on the complete benchmark data set and then an ensemble version (i.e., average predictions from five models) was used to make predictions on the additional test datasets. Figure 4a and b show that CAMP achieved the best results under all scenarios, demonstrating that CAMP outperformed the baseline methods with a relatively robust performance. We also observed that the AUC of all methods increased slightly as the positive-negative ratio changed from 1:1 to 1:10. This was probably because the increased sample size brought more information for models to learn. In contrast, the AUPR of all methods decreased markedly as the proportion of negative samples increased. This was mainly because AUPR is generally more affected by the ratio of positive vs. negative samples32.
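　　The construction of test sets with different positive-negative ratios by random pairing can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the rejection-sampling loop, and the toy identifiers are hypothetical, and “no known evidence of interaction” is approximated by “not in the list of positives”.

```python
import random


def sample_negatives(positives, ratio, seed=0):
    """Randomly pair peptides and proteins that are not known to interact.

    positives: list of (peptide_id, protein_id) known interacting pairs
    ratio: number of negatives per positive (e.g., 10 for a 1:10 data set)
    """
    rng = random.Random(seed)
    known = set(positives)
    peptides = sorted({pep for pep, _ in positives})
    proteins = sorted({prot for _, prot in positives})
    n_negatives = ratio * len(positives)
    if n_negatives > len(peptides) * len(proteins) - len(known):
        raise ValueError("not enough unpaired combinations for this ratio")

    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(peptides), rng.choice(proteins))
        if pair not in known:  # reject known interacting pairs
            negatives.add(pair)
    return sorted(negatives)


# Toy positives (invented): three pairs, sampled at a 1:2 ratio.
positives = [("pep1", "protA"), ("pep2", "protB"), ("pep3", "protC")]
negatives = sample_negatives(positives, ratio=2)
print(len(negatives))  # 6
```

　　Randomly paired negatives are only presumed non-interacting, so some of them may be unlabeled true interactions; this caveat applies to any evaluation built this way.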

　　We also evaluated the prediction results of CAMP on the identification of peptide-binding residues. We obtained the annotated binding residues of peptide sequences from PepBDB29. In total, 208 PepPIs have such peptide-binding residue labels from the test data set. Figure 4c and d show that CAMP was able to maintain its prediction power on the above additional data set.

　　We additionally compared CAMP with other methods on several representative benchmark data sets (Supplementary Table 4) that were originally used to evaluate the performance of peptide docking and detecting “hotspots” at protein interface34,39,40,41,42. As shown in Supplementary Fig. 9, CAMP still outperformed the baseline methods on all these additional datasets in terms of both AUC and AUPR scores. These additional evaluation results further demonstrated the superior predictive power and generalizing ability of CAMP.

　　We further investigated the application potential of CAMP in three related tasks, i.e., peptide–PBD (protein binding domain) interaction prediction, binding affinity assessment, and virtual screening of peptides. For peptide–PBD interaction prediction, although we rarely found deep-learning-based methods for predicting PepPIs, there was a machine-learning approach, called HSM10, focusing on a closely related problem, i.e., predicting the interactions between peptides and globular PBDs. The PBD-containing proteins play essential roles in a variety of cell activities, e.g., multiprotein scaffold formation and enzyme activity regulation38,43,44. By incorporating biophysical knowledge as prior information into a machine-learning framework, HSM was reported to yield superior prediction performance on eight common PBD families with AUC scores ranging from 0.88 to 0.92. We compared CAMP with two reported models of HSM, i.e., HSM-ID (in which a separate model was trained for each of the eight PBD/enzyme families) and HSM-D (in which a single unified model was trained for all families), on predicting peptide–PBD interactions. In particular, we evaluated the performance of CAMP with the same data set and eightfold cross-validation setting as used in the HSM paper (see Supplementary Note 6 for more details).

　　Figure 5 shows that CAMP significantly outperformed both HSM-ID and HSM-D across all domain families except the PDZ family. We also noticed that HSM-ID and HSM-D had large prediction variations across different families. As explained in the HSM paper, this may be due to the skewed distribution of the data (i.e., the numbers of pairs from different families were imbalanced). For families with large amounts of data, like PDZ, the HSM models could learn quite well, but for families with relatively little data, like the phosphotyrosine binding family, the HSM models showed an obvious drop in performance. In contrast, the performance of CAMP was more robust and less influenced by the varying data sizes. Such results indicated that CAMP is also suitable for tackling the related peptide–PBD interaction prediction problem.

　　Next, we investigated whether CAMP can also be applied to assess the binding affinity of peptide–protein pairs. Here, we made a comparison between CAMP and several baseline methods, including random forest (a conventional machine-learning-based framework), DeepDTA (a deep-learning-based framework)31, and AutoDock CrankPep (a structure-based docking method)45, on an affinity data set derived from PDBbind v201946 (more details about data processing can be found in Supplementary Note 6). As shown in Supplementary Table 5, CAMP achieved higher performance than all the baseline methods with higher Pearson correlation coefficients and smaller prediction errors in terms of RMSE. Considering that CAMP was not particularly designed for affinity prediction and that the training data were limited in size, such a comparison result was satisfactory and further illustrated the great potential of CAMP in predicting binding affinities between peptides and proteins. We also investigated whether CAMP can be applied for virtual “alanine scanning”, as the experimental “alanine scanning” strategy is considered a “standard” in affinity assessment. Since there was no public data that can comprehensively cover the experimental “alanine scanning” affinities for all protein–peptide complex structures available from the RCSB PDB21,22, here we only chose two peptide–protein complexes (PDB IDs: 4TMP [https://doi.org/10.2210/pdb4TMP/pdb], 4N4H [https://doi.org/10.2210/pdb4N4H/pdb]) as case studies instead of performing a systematic evaluation (more details can be found in Supplementary Note 6). As shown in Supplementary Fig. 10, the Pearson correlation coefficients between the logarithms of experimental affinities and the prediction scores were 0.6284 and 0.5646, for the PDB complexes 4TMP and 4N4H, respectively, which indicated that CAMP can capture the variation tendency of binding affinities in the “alanine scanning” experiments to a certain degree. In a real application scenario, CAMP can be used to rank the virtual “alanine scanning” results to determine which residues are more important for the binding activities.
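　　A virtual “alanine scanning” run and its comparison with experimental affinities could look like the following sketch. The helper names are hypothetical, the peptide sequence is a toy example, and the variants would still need to be scored by a trained model, which is not shown. The Pearson coefficient uses its standard definition.

```python
import math


def alanine_variants(peptide):
    """Yield (position, original_residue, mutated_sequence) for each
    non-alanine position, replacing one residue at a time with alanine."""
    for i, residue in enumerate(peptide):
        if residue != "A":
            yield i, residue, peptide[:i] + "A" + peptide[i + 1:]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy peptide (invented); the existing alanine at position 1 is skipped.
for position, residue, variant in alanine_variants("GAKW"):
    print(position, residue, variant)
# 0 G AAKW
# 2 K GAAW
# 3 W GAKA
```

　　Given one model score per variant and the matching experimental affinities, `pearson([math.log10(a) for a in affinities], scores)` gives a correlation of the kind reported above; the base of the logarithm does not change the coefficient.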

　　Furthermore, we evaluated the capability of CAMP and various docking methods, including CABS-Dock47, MDockPeP48, AutoDock CrankPep v1.045, and GalaxyPepDock49, for virtual screening of peptides (Supplementary Note 6). We observed that CAMP achieved better performance than those structure-based docking methods (Supplementary Table 6). It was not surprising to observe such comparison results because these structure-based docking methods were originally designed for binding pose prediction rather than virtual screening. Considering the above fact, we believe that CAMP can provide a more suitable and powerful tool than those structure-based docking methods on the virtual screening of peptides.
