Abstract
The aim of the study is to create machine learning models for in silico prediction of the permeability of compounds through the blood-brain barrier.
The dataset for training the models was compiled by manually searching the PubMed library (pubmed.ncbi.nlm.nih.gov) with the keywords «bbb penetration», «in silico bbb test», «Blood Brain Barrier Permeability», and «Blood-Brain-Barrier». Each compound was entered into the dataset as a Simplified Molecular Input Line Entry System (SMILES) string with a class label: 1 (penetrates) or 0 (does not penetrate). SMILES strings for the identified substances were retrieved from the PubChem service (pubchem.ncbi.nlm.nih.gov). For modeling, we used the binary classification methods of the PyCaret machine learning library (pycaret.org) with the Python 3.8 programming language (python.org) in the Miniconda package management environment (conda.io). The pipeline was programmed in Jupyter Notebook (jupyter.org). Features were generated from the SMILES strings using the RDKit package (rdkit.org).
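The dataset layout described above (one SMILES string and one binary label per compound) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' actual file format: the column names `smiles` and `label` and the helper `load_bbb_dataset` are hypothetical, the two rows only show the format and are not taken from the study's dataset, and RDKit feature generation is deliberately left out.

```python
import csv
import io
from collections import Counter


def load_bbb_dataset(handle):
    """Read (SMILES, label) rows; label 1 = penetrates, 0 = does not penetrate.

    Hypothetical helper: the column names "smiles" and "label" are assumptions.
    """
    rows = []
    for record in csv.DictReader(handle):
        smiles = record["smiles"].strip()
        label = int(record["label"])
        if not smiles:
            raise ValueError("empty SMILES string")
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label}")
        rows.append((smiles, label))
    return rows


# Illustrative rows only (ethanol and dopamine), not the study's data.
EXAMPLE_CSV = """smiles,label
CCO,1
NCCc1ccc(O)c(O)c1,0
"""

dataset = load_bbb_dataset(io.StringIO(EXAMPLE_CSV))
# Class balance is worth checking before training any binary classifier.
class_counts = Counter(label for _, label in dataset)
print(dataset)
print(class_counts)
```

Rejecting any label other than 0 or 1 at load time keeps manual data entry errors out of the training set.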
As a result, machine learning models were created for in silico prediction of the permeability of compounds across the blood-brain barrier. By the AUC criterion, the most promising models were the Random Forest Classifier, the Light Gradient Boosting Machine, and the Extra Trees Classifier. Using the developed models in the «ExpSys Nasalia» expert system makes it possible to inform the choice of excipients in the development of cerebroprotective nasal agents. In silico prediction will enable researchers, at the pharmaceutical development stage of new intranasal dosage forms, to select excipients more efficiently, for example by adding absorption enhancers to the formulation. The models are available on the web server of the «ExpSys Nasalia» expert system (nasalia.zsmu.zp.ua) in the calculations section.
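The AUC criterion used to rank the models can be illustrated with a short, self-contained sketch. It computes AUC as the probability that a randomly chosen penetrating compound receives a higher predicted score than a randomly chosen non-penetrating one, with ties counted as one half. The function name `roc_auc` and the labels and scores below are hypothetical and chosen only for illustration; they are not results from the study, and the study's own AUC values presumably came from its PyCaret pipeline rather than from code like this.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = penetrates) and model scores, for illustration only.
example_labels = [1, 1, 0, 0, 1]
example_scores = [0.9, 0.4, 0.35, 0.8, 0.7]
example_auc = roc_auc(example_labels, example_scores)
print(round(example_auc, 4))
```

The pairwise form is quadratic in the number of compounds, which is acceptable for a small manually curated dataset; a rank-based formulation gives the same value more efficiently on larger ones.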