Machine Learning for Oral Bioavailability Prediction: How About HobPre?

Machine learning is solving extremely difficult problems across a wide range of fields. In drug discovery and development, pharmacokinetics prediction has been a major target for building highly accurate learning models.

As a small-molecule drug discovery company, we are keenly interested in creating orally available drugs, which are easier for patients to take. Oral bioavailability prediction has been intensively investigated, and the rule of five is the best-known example.

Machine learning, however, has opened up the opportunity to predict parameters of high complexity. We are working on this high-priority issue internally and paying close attention to novel applications and services.

We think machine learning will enable high-confidence prediction of oral bioavailability in the near future.

HobPre is a novel human oral bioavailability (HOB) prediction tool available online.1) It provides predictions of representative ADMET parameters, including HOB. The only input required is the chemical structure as a SMILES string, and the output arrives in a few minutes. If its predictions are sufficiently accurate, you would want to use it right away, wouldn't you? Here we look at the accuracy and reliability of HobPre in a little more detail.

HobPre adopts the random forest (RF) method2) and uses bioavailability cutoff values of 50% and 20% for its prediction models. The RF in HobPre is a classification model: it classifies whether or not an input molecule has a bioavailability higher than the cutoff value.
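To make the role of the two cutoffs concrete, here is a minimal sketch of how measured bioavailability values could be turned into binary class labels before a classifier is trained. The function name, the example values, and the choice to count a value exactly at the cutoff as positive are all our own assumptions for illustration; they are not taken from HobPre.

```python
# Hypothetical helper; names and data are illustrative, not taken from HobPre.
def label_by_cutoff(bioavailability_percent, cutoff_percent):
    """Return 1 (positive) if bioavailability reaches the cutoff, else 0.

    Treating a value exactly equal to the cutoff as positive is an assumption.
    """
    return 1 if bioavailability_percent >= cutoff_percent else 0


# Made-up bioavailability values (percent) for five hypothetical molecules.
measured = [5.0, 18.0, 35.0, 50.0, 92.0]

labels_50 = [label_by_cutoff(f, 50.0) for f in measured]
labels_20 = [label_by_cutoff(f, 20.0) for f in measured]

print("50% cutoff:", labels_50)
print("20% cutoff:", labels_20)
```

The same molecule can be negative under the 50% cutoff and positive under the 20% cutoff, which is why the two cutoffs give two separate classification problems with different class balances.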

The predicted probability is also reported along with the predicted class.
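As a rough illustration of where such a probability can come from in a random forest, the sketch below takes the fraction of trees that vote for the positive class. This is a simplified assumption, not HobPre's actual code: the votes are made up, the function names are hypothetical, and some libraries (scikit-learn, for example) average per-tree class probabilities instead of counting hard votes.

```python
# Simplified sketch; the votes and function names are hypothetical.
def vote_probability(tree_votes):
    """Fraction of trees that voted for the positive class (1)."""
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    return sum(tree_votes) / len(tree_votes)


def classify(tree_votes, threshold=0.5):
    """Return (predicted label, positive-class probability)."""
    probability = vote_probability(tree_votes)
    label = 1 if probability >= threshold else 0
    return label, probability


# Made-up votes from ten hypothetical trees for a single molecule.
votes = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
label, probability = classify(votes)
print(label, probability)
```

A probability close to 0.5 means the trees disagree, so such a prediction deserves less trust than one close to 0 or 1.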

The details of HobPre are described in this paper.3) To allow comparison with other AI-based bioavailability predictors, the authors took the training datasets from a previous study,4) trained the model, and tested it with additional molecules available in ChEMBL and other literature.

With the training datasets shown in Table 1, the HobPre consensus model shows good prediction accuracy; for comparison, the previously reported accuracy on a similar dataset was 0.783.

Table 1. Training Datasets1)

Cutoff   No. of molecules   Positive   Negative   Accuracy of consensus model
50%      1157               536        621        0.793-0.823
20%      1142               859        283        0.815-0.910
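When reading these accuracies, it helps to know what a trivial model would score. The short calculation below uses only the class counts from Table 1 to compute the accuracy of always predicting the majority class. The function name is our own, and this baseline is computed on the training class counts, which may differ from the class balance of the sets on which the reported accuracies were measured.

```python
# Class counts copied from Table 1; the helper name is hypothetical.
datasets = {
    "50%": {"positive": 536, "negative": 621},
    "20%": {"positive": 859, "negative": 283},
}


def majority_baseline(positive, negative):
    """Accuracy obtained by always predicting the larger class."""
    total = positive + negative
    return max(positive, negative) / total


for cutoff, counts in datasets.items():
    print(f"{cutoff} cutoff: majority-class baseline = {majority_baseline(**counts):.3f}")
```

The baseline is about 0.537 for the 50% cutoff and about 0.752 for the 20% cutoff. The 20% dataset is clearly imbalanced, so its higher accuracy range does not by itself mean that the 20% model is the stronger of the two.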

The Shapley Additive Explanations (SHAP) algorithm was also used to evaluate the descriptors in the RF model.5) It revealed that the sum of all OH bonds (SsOH), the topological polar surface area of N and O atoms (TopoPSA(NO)), and the centered Moreau-Broto autocorrelation of lag 0 weighted by Gasteiger charge (ATSC0c) are the three most important input features, in that order.

Interestingly, SsOH has a considerably larger value (0.0239) than the second-ranked feature (0.0178). This implies a large contribution of free hydroxy groups to bioavailability.
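For readers unfamiliar with the idea behind SHAP, the sketch below computes exact Shapley values for a tiny toy model by averaging each feature's marginal contribution over every possible ordering of the features. Everything here is an assumption for illustration: the toy scoring function and its numbers are made up, the descriptor names are borrowed only as labels, and practical SHAP implementations use much faster algorithms for tree models than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {feature: 0.0 for feature in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for feature in order:
            before = value_fn(frozenset(present))
            present.add(feature)
            after = value_fn(frozenset(present))
            totals[feature] += after - before
    return {feature: total / len(orderings) for feature, total in totals.items()}


def toy_model(present):
    """Made-up score given which features are 'known' to the model."""
    score = 0.5
    if "SsOH" in present:
        score -= 0.20
    if "TopoPSA" in present:
        score -= 0.10
    if "SsOH" in present and "TopoPSA" in present:
        score -= 0.06  # interaction term, shared equally by both features
    if "ATSC0c" in present:
        score += 0.04
    return score


features = ["SsOH", "TopoPSA", "ATSC0c"]
values = shapley_values(toy_model, features)
for name in features:
    print(name, round(values[name], 4))
```

The contributions always add up to the difference between the score with all features and the score with none. A global importance ranking such as the one quoted above is typically obtained by averaging the absolute contribution of each feature over many molecules.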

At this time, the usefulness and reliability of HobPre are not yet certain, but AI-based technology like this should help us select leads and candidates for orally available drugs more accurately. Evaluating the contribution of input features is also great “learning” for us in figuring out how to design the molecules we need.

HobPre is not the only software for predicting human bioavailability. We need to test a wide variety of programs and, for our part, develop an optimal model for PepMetics®. We would welcome a discussion on bioavailability prediction, so please contact us if you are interested.

