The resurgence of machine learning (ML) is triggering paradigm shifts in a variety of fields. ML can in principle be applied to almost any problem, in the sense that an artificial neural network, given an appropriate amount of data and training, mimics the human way of thinking.
In synthetic chemistry, smart and reliable prediction of a reaction of interest has traditionally relied on chemists’ solid knowledge and experience. ML may open up the possibility of a robust and general way to predict the outcome of a reaction of interest, or even to suggest the best reaction conditions.
In fact, no general tool for ML-based reaction prediction exists yet. Development is in progress, however, and this paper1) summarizes the concepts and strategies in a short review. A paradigm shift in reaction design may not be far off.
The authors focus on improving chemical performance by using chemical knowledge in the field of organic synthesis. “Chemical performance” is an ambiguous term, and the authors define it as “one of the decisive factors for the success of a synthetic reaction”. Yield, enantiomeric excess, and regioselectivity are examples of chemical performance parameters.
The authors state, “Chemists make predictions of reaction performance based on their domain knowledge.” This domain knowledge includes general information about the reactants, such as reactivity and solubility, the molecular-level mechanism of the reaction, the rate-determining steps, and the selectivity-determining steps.
Predictions based on domain knowledge run into difficulty in many cases, and iterative PDCA (plan-do-check-act) cycles are needed to reach sufficient reaction performance. ML, by contrast, can provide robust predictions of reaction outcomes as data accumulate.
The paper's main topic is chemical knowledge-based embedding approaches. This approach is easy for synthetic chemists to understand because its defining feature is simply the introduction of chemical knowledge into an ML model.
Chemical knowledge-based embedding is a novel but promising approach, and it has been demonstrated for several reactions.2) In one of the most successful cases so far, a graph neural network is incorporated into mechanistically informed statistical models of reactivity performance. This makes it possible to avoid mechanical calculations and to raise the performance of ML-based prediction without losing chemical interpretability.
The design of the chemical knowledge-based descriptor and the design of the ML model are equally essential to this approach. Once the descriptor and the model are defined, feeding in a proper reaction knowledge dataset, training, and prediction can be performed readily. It is noteworthy that the dataset used to produce the reaction knowledge has a huge impact, and it sometimes requires human interpretation or summarization.
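To make the descriptor-then-model workflow concrete, here is a minimal sketch in Python. It is an illustration, not the method of the reviewed paper. The descriptor names (`sigma` for an electronic parameter, `steric` for a steric parameter), the helper functions `describe`, `fit` and `predict`, and the training data are all hypothetical. The yields are synthetic numbers generated from a made-up linear rule. A plain least-squares linear model stands in for the graph neural network mentioned above, simply to keep the example small and dependency-free.

```python
from typing import Dict, List, Sequence


def describe(reaction: Dict[str, float]) -> List[float]:
    """Chemical knowledge-based descriptor (hypothetical features).

    The leading 1.0 is the intercept term of the linear model.
    """
    return [1.0, reaction["sigma"], reaction["steric"]]


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(reactions: Sequence[Dict[str, float]], yields: Sequence[float]) -> List[float]:
    """Learn linear weights by solving the normal equations (X^T X) w = X^T y."""
    X = [describe(r) for r in reactions]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, yields)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], reaction: Dict[str, float]) -> float:
    """Predict the yield (%) of a reaction from its descriptor vector."""
    return sum(w * x for w, x in zip(weights, describe(reaction)))


# Synthetic training set: NOT experimental data. Yields follow the made-up
# rule yield = 60 + 20 * sigma - 5 * steric so the fit can be checked by eye.
training = [
    {"sigma": -0.27, "steric": 1.0},
    {"sigma": 0.00, "steric": 1.0},
    {"sigma": 0.23, "steric": 2.0},
    {"sigma": 0.54, "steric": 1.5},
    {"sigma": 0.78, "steric": 3.0},
]
yields = [60 + 20 * r["sigma"] - 5 * r["steric"] for r in training]

weights = fit(training, yields)
new_reaction = {"sigma": 0.10, "steric": 2.0}
print(f"predicted yield: {predict(weights, new_reaction):.1f} %")
```

Because the model is linear in the descriptors, each learned weight can be read directly as the contribution of one piece of chemical knowledge, which is one simple way to keep a predictor interpretable. With real data, the choice of descriptors and the quality of the dataset would dominate the result, as noted above.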
We think chemists need to generate the requisite chemical performance when they use a predictor. This requires chemists to supply chemical knowledge as input, which bridges the gap between chemists and computational scientists. It is worth knowing for chemists who are eager to use machine learning-based technologies.
The availability of graphics processing unit (GPU) acceleration has contributed enormously, and simple machine learning workflows can now be run without a high-performance server. ML is gradually becoming an everyday tool, even though knowledge and skill are still required to apply it. It could change the synthesis paradigm in the lab.
1) https://doi.org/10.1002/chem.202202834
2) https://doi.org/10.1021/acs.accounts.0c00745