The coronavirus disease 2019 (COVID-19) global pandemic resulted in millions of people becoming infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and close to seven million deaths worldwide. It is essential to further explore and design effective COVID-19 treatments that target the SARS-CoV-2 main protease, a major drug target. In this study, machine learning was applied to predict the SARS-CoV-2 main protease binding of Food and Drug Administration (FDA)-approved drugs to assist in the identification of potential repurposing candidates for COVID-19 treatment.
Ligands bound to the SARS-CoV-2 main protease in the Protein Data Bank and compounds experimentally tested in SARS-CoV-2 main protease binding assays in the literature were curated. These chemicals were divided into training (516 chemicals) and testing (360 chemicals) data sets. To identify SARS-CoV-2 main protease binders as potential candidates for repurposing to treat COVID-19, 1188 FDA-approved drugs were obtained from the Liver Toxicity Knowledge Base. A random forest algorithm was used to construct predictive models based on molecular descriptors calculated using Mold2 software. Model performance was evaluated using 100 iterations of fivefold cross-validation, which yielded a balanced accuracy of 78.8%. The random forest model constructed from the whole training data set was then used to predict SARS-CoV-2 main protease binding for the testing set and the FDA-approved drugs. Analysis of the model applicability domain and prediction confidence for the drugs predicted to be main protease binders identified 10 FDA-approved drugs as potential candidates for repurposing to treat COVID-19. Our results demonstrate that machine learning is an efficient method for drug repurposing and, thus, may accelerate drug development targeting SARS-CoV-2.
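
The evaluation protocol described above (repeated fivefold cross-validation scored by balanced accuracy) can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation. The `fit_predict` hook, the toy single-descriptor data, and the threshold classifier standing in for a random forest are all hypothetical. The `prediction_confidence` formula is one common definition, assumed here and not taken from the study.

```python
import random
from statistics import mean


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = binder)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def kfold_indices(n, k, rng):
    """Shuffle range(n) and deal the indices into k near-equal folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def repeated_cv(X, y, fit_predict, k=5, iterations=100, seed=0):
    """Return one balanced accuracy per iteration of k-fold cross-validation.

    fit_predict(X_train, y_train, X_test) -> list of predicted labels.
    It is a hypothetical hook where a random forest (or any other
    classifier) would be plugged in.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        y_pred = [None] * len(y)
        for fold in kfold_indices(len(y), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            preds = fit_predict(
                [X[i] for i in train],
                [y[i] for i in train],
                [X[i] for i in fold],
            )
            for i, p in zip(fold, preds):
                y_pred[i] = p
        # Score the pooled out-of-fold predictions once per iteration.
        scores.append(balanced_accuracy(y, y_pred))
    return scores


def prediction_confidence(p_binder):
    """Assumed definition: rescale a binder probability so 0.5 maps to 0
    (least confident) and 0 or 1 maps to 1 (most confident)."""
    return abs(p_binder - 0.5) / 0.5


def threshold_model(X_train, y_train, X_test):
    """Toy stand-in classifier: cut the first descriptor midway between
    the two class means observed in the training fold."""
    pos = [x[0] for x, t in zip(X_train, y_train) if t == 1]
    neg = [x[0] for x, t in zip(X_train, y_train) if t == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return [1 if x[0] > cut else 0 for x in X_test]


# Toy data: one made-up descriptor per chemical, 10 non-binders, 10 binders.
X = [[float(i)] for i in range(20)]
y = [0] * 10 + [1] * 10
scores = repeated_cv(X, y, threshold_model, k=5, iterations=10, seed=42)
print(f"mean balanced accuracy over {len(scores)} iterations: {mean(scores):.3f}")
```

Pooling the out-of-fold predictions before scoring means every chemical is predicted exactly once per iteration. It also avoids undefined sensitivity or specificity when a single fold happens to contain only one class.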