TranDTA is the first method to apply transformers to extract features from protein sequences and to use transformer representations for drug–target binding affinity (DTBA) prediction. The model uses only 1D representations as input, and because it relies on pretrained input representations, it can be run without heavy resource requirements (memory, CPU, or GPU), which makes it practical for the drug development process. Experimental results show that TranDTA outperforms existing sequence-based methods in prediction performance on the KIBA dataset; it also performed on par with structure-based models, and in this experiment was slightly better. Given the success of transformers in NLP and the results of this study, we believe TranDTA is an effective approach for DTBA prediction.

Figure 1 shows the TranDTA architecture. In TranDTA, we use ProtAlbert [1] to transform protein sequences into feature vectors. These are concatenated with the molecular fingerprint vectors of the drugs and fed into 5 fully connected layers to predict the binding affinity value. Due to limited resources, we trained and tested on a sample of 1,512 interactions, using 20% of the data for testing and the rest for training. The sample of interactions was drawn by systematic random sampling, and the sample size was calculated with Cochran's formula (error rate = 0.025, Z = 1.96).
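The prediction head described above (protein feature vector concatenated with a drug fingerprint, then 5 fully connected layers) can be sketched in pure Python. This is a hypothetical toy illustration, not the authors' implementation: the layer widths, weight initialization, and ReLU activations are assumptions, and the toy embedding size stands in for the much larger ProtAlbert output dimension.

```python
import random

def linear(x, w, b):
    # Dense layer: y = W x + b, with W stored as a list of rows.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj
            for row, bj in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def make_layer(n_in, n_out, rng):
    # Small random weights, zero biases (illustrative initialization).
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def predict_affinity(protein_emb, drug_fp, layers):
    x = protein_emb + drug_fp            # concatenation step from the text
    for i, (w, b) in enumerate(layers):
        x = linear(x, w, b)
        if i < len(layers) - 1:          # no activation on the output layer
            x = relu(x)
    return x[0]                          # scalar binding-affinity value

rng = random.Random(0)
emb_dim, fp_dim = 8, 4                   # toy sizes, far smaller than real ones
sizes = [emb_dim + fp_dim, 16, 16, 8, 4, 1]   # 5 fully connected layers
layers = [make_layer(sizes[i], sizes[i + 1], rng) for i in range(5)]

protein_emb = [rng.random() for _ in range(emb_dim)]
drug_fp = [float(rng.randint(0, 1)) for _ in range(fp_dim)]
print(predict_affinity(protein_emb, drug_fp, layers))
```

In a real setting the protein vector would come from a frozen pretrained ProtAlbert model, which is what removes the heavy GPU requirement at training time.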
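The sample-size calculation can be reproduced from Cochran's formula, n0 = Z^2 p(1 - p) / e^2, with the stated Z = 1.96 and e = 0.025. The proportion p is not given in the text; p = 0.5 (the maximum-variance choice) is assumed here. This yields an initial n0 of about 1537; the reported 1,512 is plausibly the result of an additional finite-population correction, which the text does not specify.

```python
import math

# Cochran's formula with the parameters stated in the text
# (p = 0.5 is an assumption; the paper does not report it).
Z, e, p = 1.96, 0.025, 0.5
n0 = Z**2 * p * (1 - p) / e**2
print(math.ceil(n0))  # 1537, before any finite-population correction
```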
[1] A. Elnaggar et al., “ProtTrans: Towards cracking the language of life’s code through self-supervised deep learning and high performance computing,” bioRxiv, 2020, doi: 10.1101/2020.07.12.199554.