DRP-KGE: Drug Repositioning Prediction using Knowledge Graph Embedding

Drug repositioning (DR) refers to finding new therapeutic uses for existing drugs. Whereas de novo drug discovery is a lengthy, expensive, and risky process, drug repositioning reduces both cost and time. Current computational drug repositioning methods face two challenges: data representation and negative data sampling. An appropriate data representation is vital for improving the accuracy of computational approaches. Previous studies on the DR task have used various representations of drugs and diseases based on their characteristics. However, aggregating these features and bringing drugs and diseases into a unified latent space is a crucial step in prediction. The advantage of such a latent space is that similar concepts lie closer together. In addition, the number of unknown drug-disease associations, which are treated as negative data, is much larger than the number of known associations, which serve as positive data. This leads to an imbalanced dataset that degrades model performance and causes inaccurate predictions. To address these challenges, we propose the DRP-KGE framework, which applies a knowledge graph embedding method.

The advantage of a knowledge graph (KG) is its ability to model complex relationships between biological entities. We therefore construct a Drug-Disease KG (DDKG) whose nodes are drugs, diseases, and their features, and whose edges are the relationships between these entities. The entities and relationships are then embedded into a unified space by applying Word2Vec to the DDKG. Next, the concatenated embedding vectors of each drug-disease pair are fed to a logistic regression classifier to infer whether the drug and the disease are associated. Furthermore, unlike the typical approach that treats all unknown drug-disease associations as negative data, we select as negatives only the subset of unknown associations in which the disease occurs as an adverse reaction to the drug. As a result of this negative sampling technique, the dataset is more balanced and more accurate than one that treats all unknown associations as negative data. We evaluate the DRP-KGE framework with several criteria; it achieves AUC-ROC = 87.94% and AUC-PR = 86.99%, which is higher than previous works. We also check the performance of the framework in finding drugs for two skin-related diseases, contact dermatitis and atopic eczema. DRP-KGE predicted Beclomethasone for contact dermatitis, and Fluorometholone, Clocortolone, Fluocinonide, and Beclomethasone for atopic eczema; the efficacy of these drugs has been examined in other studies. Moreover, Fluorometholone for contact dermatitis is a new suggestion from DRP-KGE that should be investigated experimentally.
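The abstract says only that Word2Vec is applied to the DDKG; it does not say how the graph is turned into the token sequences that Word2Vec expects. One common way to do this, assumed here rather than taken from the paper, is to generate DeepWalk-style random walks that interleave entity and relation tokens, so that entities and relationships share one vocabulary and land in one embedding space. The sketch below builds a tiny DDKG from hypothetical (head, relation, tail) triples and emits such walks. The node names, relation labels, and the functions `build_ddkg` and `random_walks` are all illustrative, and edge direction is ignored for simplicity.

```python
import random
from collections import defaultdict

def build_ddkg(triples):
    # Undirected adjacency list; each edge keeps its relation label.
    # (Assumption: direction is ignored so every node can be walked from.)
    adj = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
        adj[tail].append((rel, head))
    return adj

def random_walks(adj, walks_per_node=2, walk_length=4, seed=0):
    # Each walk visits `walk_length` nodes and interleaves relation tokens,
    # e.g. [node, rel, node, rel, node, ...], so both become Word2Vec tokens.
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_length - 1):
                rel, node = rng.choice(adj[node])
                walk.extend([rel, node])
            walks.append(walk)
    return walks

# Hypothetical triples: two drugs, two diseases and one shared feature node.
triples = [
    ("drug:D1", "treats", "disease:X1"),
    ("drug:D2", "treats", "disease:X2"),
    ("drug:D1", "has_feature", "feature:F1"),
    ("drug:D2", "has_feature", "feature:F1"),
]
adj = build_ddkg(triples)
walks = random_walks(adj)
print(walks[0])
```

The resulting `walks` list could then serve as the sentence corpus for a Word2Vec implementation, for example gensim's `Word2Vec(sentences=walks, ...)`. The walk strategy, walk length, and Word2Vec hyperparameters actually used by DRP-KGE are not given in the abstract.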
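The negative sampling rule described in the abstract can be expressed as a filter over unknown drug-disease pairs. The minimal sketch below assumes two hypothetical inputs: a set of known treatment pairs (the positives) and a set of (drug, disease) pairs in which the disease is reported as an adverse reaction to the drug. Drug and disease IDs are placeholders, and the function names are illustrative. It contrasts the usual choice of all unknown pairs with the adverse-reaction subset.

```python
def all_unknown_pairs(drugs, diseases, known_treatments):
    # Typical approach: every pair without a known treatment link is a negative.
    return [(d, x) for d in drugs for x in diseases if (d, x) not in known_treatments]

def adverse_reaction_negatives(drugs, diseases, known_treatments, adverse_reactions):
    # Keep only unknown pairs where the disease is an adverse reaction to the drug.
    return [
        pair
        for pair in all_unknown_pairs(drugs, diseases, known_treatments)
        if pair in adverse_reactions
    ]

# Hypothetical toy data.
drugs = ["D1", "D2", "D3"]
diseases = ["X1", "X2", "X3"]
known_treatments = {("D1", "X1"), ("D2", "X2")}   # positives
adverse_reactions = {("D1", "X3"), ("D3", "X2")}  # drug -> disease it induces

naive = all_unknown_pairs(drugs, diseases, known_treatments)
selected = adverse_reaction_negatives(drugs, diseases, known_treatments, adverse_reactions)
print(len(known_treatments), len(naive), len(selected))
```

On this toy input, treating every unknown pair as negative yields 7 negatives against 2 positives, whereas the adverse-reaction rule keeps 2 negatives, which illustrates how the rule reduces class imbalance.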
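The classification step (concatenating the drug and disease vectors and scoring the pair with logistic regression) can be sketched in plain Python together with the two reported metrics. In this sketch the embeddings are hypothetical 2-dimensional vectors standing in for Word2Vec output. The classifier is fit with full-batch gradient descent, which is an assumption because the abstract does not name a solver. AUC-PR is computed as average precision, one common estimator of the area under the precision-recall curve; the abstract does not say which estimator was used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def pair_features(emb, drug, disease):
    # Feature vector for a pair = drug vector followed by disease vector.
    return list(emb[drug]) + list(emb[disease])

def train_logreg(X, y, lr=0.5, epochs=2000):
    # Full-batch gradient descent on the mean logistic (log) loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Predicted probability that the drug-disease pair is associated.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def auc_roc(y, scores):
    # Probability that a random positive outscores a random negative (ties = 0.5).
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((1.0 if p > q else (0.5 if p == q else 0.0)) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y, scores):
    # Mean of precision@k taken at the rank k of each positive.
    ranked = sorted(zip(scores, y), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / hits

# Hypothetical 2-d embeddings standing in for Word2Vec output.
emb = {
    "D1": [1.0, 0.0], "D2": [0.9, 0.1], "D3": [0.0, 1.0],
    "X1": [1.0, 0.1], "X2": [0.8, 0.0], "X3": [0.1, 1.0],
}
pairs = [("D1", "X1"), ("D2", "X2"), ("D1", "X3"), ("D3", "X2")]
labels = [1, 1, 0, 0]  # first two: known treatments; last two: selected negatives
X = [pair_features(emb, d, x) for d, x in pairs]
w, b = train_logreg(X, labels)
scores = [predict(w, b, x) for x in X]
print(auc_roc(labels, scores), average_precision(labels, scores))
```

For brevity the toy model is scored on its own training pairs; a real evaluation would score held-out pairs, for example with cross-validation. In practice, a library implementation such as scikit-learn's `LogisticRegression` together with `roc_auc_score` and `average_precision_score` would usually replace these hand-written functions.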


Keywords: knowledge graph, negative data definition, Word2Vec, logistic regression.


Zahra Ghorbanali, Fatemeh Zare-Mirakabad, Ali Masoudi-Nejad, Mohammad Akbari, Najmeh Salehi

Figure 1


  1. DRP-KGE-Data_and_Code.zip (465 KB).