Prediction of PPI Affinity: The Current State of Machine Learning

Protein-protein and protein-peptide binding affinity is essential for understanding the nature of protein-protein interactions (PPIs) and for identifying targets of interest, so we pay close attention to it. Thanks to the tremendous efforts of pharmaceutical companies, academia and research centers, a large amount of information is now available in databases. However, affinities remain unknown in most cases, in part because the number of PPIs is enormous. Measuring all PPI affinities experimentally is practically impossible. We think one reasonable approach is to predict PPI affinity by machine learning.

There is an easy-to-understand perspective article in Frontiers in Bioinformatics by Yamaguchi’s group at Nagoya University.1) The article introduces the current state of machine learning-based methods for predicting protein-protein binding affinity, along with the available databases.

In particular, the authors focus on protein design, which differs markedly from general PPI binding affinity prediction in how much generality is required. In protein design (e.g. high-affinity antibody design), binding affinity is predicted for a limited class of proteins or peptides. The required datasets are therefore not the same, and the choice of data changes prediction accuracy significantly.

Still, the fundamentals of applying machine learning to PPI affinities are almost the same in terms of the prediction model. Of course, prediction accuracy rises more easily if the learning process is optimized for a particular class of proteins. But that kind of model could potentially work for general PPI affinity prediction if the training datasets are changed.

The article concisely summarizes the current machine learning methods for PPI affinity prediction. Learning approaches are classified as structure-based or sequence-based. The workflow is basically the same for both: 1) input of datasets, 2) training, 3) feature extraction, 4) model improvement and 5) testing to assess prediction accuracy. The major difference between the two approaches is, of course, the dataset. Structure-based methods need to consider topological features. Sequence-based methods require interpreting text-based information to infer the 3D interaction.
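To make the workflow above concrete, here is a minimal sequence-based sketch in Python. It is not a method from the article: the amino acid composition features, the k-nearest-neighbor predictor, and all names (`composition`, `featurize`, `knn_predict`, `DATA`) are our own illustrative assumptions, and the sequences and affinity labels are made-up placeholders rather than measurements.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Feature extraction: fraction of each standard amino acid in one sequence."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def featurize(seq_a, seq_b):
    """Describe a pair by concatenating both partners' composition vectors."""
    return composition(seq_a) + composition(seq_b)


def knn_predict(train, query, k=2):
    """Predict by averaging the labels of the k nearest training pairs."""
    nearest = sorted((math.dist(x, query), y) for x, y in train)[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Input: toy pairs with MADE-UP affinity labels (placeholders, not measurements).
DATA = [
    ("ACDKKLLE", "WYFGGHST", 6.0),
    ("ACDKRLLE", "WYFGAHST", 6.4),
    ("MNPQQRST", "VVIILLAA", 8.1),
    ("MNPQERST", "VVIILMAA", 7.9),
    ("GGGSSSGG", "DDEEKKRR", 5.2),
]

# Training: for k-NN this only means storing the featurized examples.
train = [(featurize(a, b), y) for a, b, y in DATA[:4]]

# Testing: predict the held-out pair and compare with its label.
seq_a, seq_b, label = DATA[4]
prediction = knn_predict(train, featurize(seq_a, seq_b))
print(f"predicted {prediction:.2f}, label {label:.2f}")
```

In this sketch, the model improvement step would correspond to tuning `k` or replacing the composition features with richer ones. A structure-based variant would keep the same skeleton and change only `featurize`.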

Ideally, these prediction models would be merged into one to obtain high accuracy. How to build a sound model from datasets of different types is still an open research question.

While writing this, we found an interesting web-based tool for PPI binding affinity prediction: PPI-Affinity.2) In a Journal of Proteome Research paper, authors at the University of Duisburg-Essen, Germany, demonstrated its prediction accuracy for CXCR4 (CXC chemokine receptor 4) and EPI-X4 (Endogenous Peptide Inhibitor of CXCR4) mutants. The predicted binding affinities generally agree well with the experimental data.
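Agreement between predicted and experimental affinities is commonly quantified with the root-mean-square error and the Pearson correlation coefficient. The sketch below shows both in plain Python. We do not claim these are the metrics used in the PPI-Affinity paper, and the function names `rmse` and `pearson_r` and the example values are our own placeholders.

```python
import math


def rmse(pred, obs):
    """Root-mean-square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def pearson_r(pred, obs):
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


# Placeholder values for illustration only, not data from any paper.
predicted = [6.1, 7.4, 5.0, 8.2]
experimental = [6.0, 7.9, 5.3, 8.0]

print(f"RMSE = {rmse(predicted, experimental):.3f}")
print(f"Pearson r = {pearson_r(predicted, experimental):.3f}")
```

A low RMSE indicates that absolute values match, while a high Pearson r indicates that the ranking trend is captured. For mutant screening, the latter is often what matters most.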

PPI-Affinity is currently available online.3) For users of novel prediction technologies in pharmaceutical companies, it is helpful when the developer provides a web-based service for an initial test or demonstration. Although AI-based machine learning technologies are increasingly being brought in-house, it takes time to prepare the UI that researchers need. We still need to assess PPI-Affinity’s reliability, but we would say it is a potential contribution to drug discovery.

We are continuously working to identify target PPIs with high accuracy. We always welcome technical discussions on strategies for using PPIs in drug discovery, as well as collaboration.

