Publications

Genetic algorithm for dyad pattern finding in DNA sequences

In this paper a novel genetic algorithm is presented for the dyad motif finding problem. The genetic algorithm uses a multi-objective fitness function based on the sum of pairs, the number of matches, and the information content. The individuals required for the population pool in the genetic algorithm are optimized by Gibbs sampling method. Also, new crossover and mutation operators are designed. The algorithm is implemented and tested on the different types of real datasets. The results are compared with other well-known algorithms and the effectiveness of our algorithm is shown.

Journal Link

A CONSTANT TIME ALGORITHM FOR DNA ADD

We present a new molecular algorithm for adding two binary numbers with n bits. Without considering the generation of input, this algorithm can be performed in O(1) in a test tube using O(n) different types of DNA strands, and the output can be detected in O(n). The output strands, prior to read out operation, can serve as the input strands for another round of addition. The algorithm can be easily extended to any other logical operation, and even for adding two decimal numbers.

Journal Link

Finding Motifs based on suffix trie

Pattern discovery or motif finding is one of the most challenging problems in both molecular biology and computer science. In this paper we present an exact exhaustive method, for finding motifs of length ℓ in a set of t sequences of length n with a limited number of mutations d. The algorithm is based on the Depth First Search on a suffix trie with maximum nodes O(tn) and is performed in O(t 2n 2 ℓ 2 ) time complexity. The proposed algorithm is tested on yeast and human transcription factor binding site data sets and the obtained results are compared to the other well-known algorithms. The experimental results demonstrate that the proposed method is working analogous to them algorithms.

Journal Link

RNAComp: A new method for RNA secondary structure alignment

In this paper, a new algorithm for alignment of two RNA secondary structures without pseudoknots is presented. The algorithm is based on finding the longest common sub-structures between two RNA structures and special effort is devoted for aligning the beginning the end parts of the existing stems (base pairs) in the secondary structures of two RNAs. The results of structure alignment of different types of RNA by this algorithm are obtained, and the result of this algorithm show more consistency with the models in the evolution rather than the other existing structure alignment algorithms

Journal Link

New scoring scheme for finding motifs in DNA sequences

Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search (or predict) known binding sites in a new DNA sequence. For this reason, all subsequences of the given DNA sequence are scored based on an scoring function and the prediction is done by selecting the best score. By assuming no dependency between binding site base positions, most of the available tools for known binding site prediction are designed. Recently Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions which can be used in known binding site prediction based on the assumption of dependency or independency in binding site base positions.

Journal Link

Parallel generation of the biological trees 1

Several biological features are presented by different types of trees. Two types of such trees are considered in this paper. the first type is trees with n external nodes that each internal node have at least two children, and are used in neuro-science and called neuronal dendritic trees. The second type is trees with n internal nodes and m external nodes. This type of trees represent the secondary structure of RNA sequences, and called RNA trees. In this paper, we present two new parallel algorithms for generation of these two biological trees. Both algorithms are adoptive and cost-optimal and generate the trees in B-order. Computations run in an SM SIMD model.

Journal Link

Evolutionary Algorithm for RNA Secondary Structure Prediction Based on Simulated SHAPE Data

Background Non-coding RNAs perform a wide range of functions inside the living cells that are related to their structures. Several algorithms have been proposed to predict RNA secondary structure based on minimum free energy. Low prediction accuracy of these algorithms indicates that free energy alone is not sufficient to predict the functional secondary structure. Recently, the obtained information from the SHAPE experiment greatly improves the accuracy of RNA secondary structure prediction by adding this information to the thermodynamic free energy as pseudo-free energy. Method In this paper, a new method is proposed to predict RNA secondary structure based on both free energy and SHAPE pseudo- free energy.

Journal Link

Evaluating the accuracy of splice site prediction based on integrating Jensen-Shannon divergence and a polynomial equation of order 2

Advances in DNA sequencing technology have caused generation of the vast amount of new sequence data. It is essential to understand the functions, features, and structures of every newly sequenced data. Analyzing sequence data by different methods could provide important information about the sequence data. One of the essential tasks for genome annotation is gene prediction that can help to understand the features and determine functions of the genes. One of the key steps towards correct gene structure prediction is accurate splice site detection. There are vast numbers of splice site prediction methods, however, a few of them can be incorporated in gene prediction modules because of their complexity. In this paper, a novel model is presented to recognize unknown splice sites in a new genome without using any prior knowledge. Our model is defined based on integrating Jensen-Shannon divergence and a polynomial equation of order 2. Finally, the proposed model is evaluated on Yeast’s genome to predict splice sites. The experimental results suggest that the proposed method is an effective approach for splice site prediction.

Journal Link

WCOACH: protein -protein complex prediction in weighted PPT networks

Protein complexes are aggregates of protein molecules that play important roles in biological processes . Detecting protein complexes from protein-protein interaction (PPI) networks is one of the most challenging problems in computational biology, and many computational methods have been developed to solve this problem . Generally , these methods yield high false positive rates . In this article , a semantic similarity measure between proteins , based on Gene Ontology (GO) structure , i s applied to weigh PPI networks. Consequently, one of the well-known methods, COACH, has been improved to be compatible with weighted PPI networks for protein complex detection.

Journal Link

RNA secondary structure prediction based on SHAPE data in helix regions

RNA molecules play important and fundamental roles in biological processes. Frequently, the functional form of single-stranded RNA molecules requires a specific tertiary structure. Classically, RNA structure determination has mostly been accomplished by X-Ray crystallography or Nuclear Magnetic Resonance approaches. These experimental methods are time consuming and expensive. In the past two decades, some computational methods and algorithms have been developed for RNA secondary structure prediction. In these algorithms, minimum free energy is known as the best criterion. However, the results of algorithms show that minimum free energy is not a sufficient criterion to predict RNA secondary structure. These algorithms need some additional knowledge about the structure, which has to be added in the methods. Recently, the information obtained from some experimental data, called SHAPE, can greatly improve the consistency between the native and predicted RNA secondary structure.

Journal Link

Using tempreture effects to predict the interactions between two RNAs

Interaction of two RNA molecules is considered as an important factor that regulates gene expression post-transcriptional process. Most of the ncRNAs prevent the translation of their target mRNA(s) by forming stable bindings with them. Although several computational methods have been proposed to predict the interactions between two RNAs, none of them can produce reliable and accurate results. RESULTS: In this paper, a new approach entitled tempRNAs is presented to accurately predict interaction structure between two RNAs based on a gradual temperature decrease. For each specified temperature, our algorithm contains two main steps. First, the secondary structure of each RNA is determined with respect to the previous base pairs as constraints. Second, two RNAs are concatenated and then the interaction between them is calculated according to the previous base pairs. The secondary structures are determined based on minimum free energy model. The proposed algorithm is evaluated for a set of known interacting RNA pairs. The results show the higher accuracy of the proposed method in comparison to the other state-of-the-art algorithms, namely inRNAs and RactIP.

Journal Link

RNA-RNA interaction prediction using genetic algorithm

Background RNA-RNA interaction plays an important role in the regulation of gene expression and cell development. In this process, an RNA molecule prohibits the translation of another RNA molecule by establishing stable interactions with it. In the RNA-RNA interaction prediction problem, two RNA sequences are given as inputs and the goal is to find the optimal secondary structure of two RNAs and between them. Some different algorithms have been proposed to predict RNA-RNA interaction structure. However, most of them suffer from high computational time. Results In this paper, we introduce a novel genetic algorithm called GRNAs to predict the RNA-RNA interaction. The proposed algorithm is performed on some standard datasets with appropriate accuracy and lower time complexity in comparison to the other state-of-the-art algorithms.

Journal Link

Inverse RNA folding solution based on multi-objective genetic algorithm and gibbs sampling method

In living systems, RNAs play important biological functions. The functional form of an RNA frequently requires a specific tertiary structure. The scaffold for this structure is provided by secondary structural elements that are hydrogen bonds within the molecule. Here, we concentrate on the inverse RNA folding problem. In this problem, an RNA secondary structure is given as a target structure and the goal is to design an RNA sequence that its structure is the same (or very similar) to the given target structure. Different heuristic search methods have been proposed for this problem. One common feature among these methods is to use a folding algorithm to evaluate the accuracy of the designed RNA sequence during the generation process. The well known folding algorithms take O(n3) times where n is the length of the RNA sequence. In this paper, we introduce a new algorithm called GGI-Fold based on multi-objective genetic algorithm and Gibbs sampling method for the inverse RNA folding problem. Our algorithm generates a sequence where its structure is the same or very similar to the given target structure. The key feature of our method is that it never uses any folding algorithm to improve the quality of the generated sequences. We compare our algorithm with RNA-SSD for some biological test samples. In all test samples, our algorithm outperforms the RNA-SSD method for generating a sequence where its structure is more stable.

Journal Link

PreRkTAG: Prediction of RNA Knotted Structures Using Tree Adjoining Grammars

Background: RNA molecules play many important regulatory, catalytic and structural roles in the cell, and RNA secondary structure prediction with pseudoknots is one the most important problems in biology. An RNA pseudoknot is an element of the RNA sec ondary structure in which bases of a single-stranded loop pair with complementary bases outside the loop. Modeling these nested structures (pseudoknots) causes numerous com putational diffilties and so it has been generally neglected in RNA structure prediction algorithms. Objectives: In this study, we present a new heuristic algorithm for the Prediction of RNA Knotted structures using Tree Adjoining Grammars (named PreRKTAG). Materials and Methods: For a given RNA sequence, PreRKTAG uses a genetic algorithm on tree adjoining grammars to propose a structure with minimum thermodynamic energy.

Journal Link

EPWM: An Extended Position Weight Matrix for Motif Representation in Biological Sequences

In this paper, we design a new MR model based on information theory. This model can be used for the de novo motif finding and motif search problems. We extract some known motifs from JASPAR and TRANSFAC databases to search for common features among them. Based on these features, a new MR model is constructed called EPWM. A jackknife test is used to show the EPWM model is more successful than the other MR models for the motif search problem. The jackknife test with each MR model (the EPWM model and the other MR models) is implemented and performed on the JASPAR and TRANSFAC databases. To verify the efficiency of the EPWM model for the de novo motif finding problem, we implement the EPWM model and the other MR models in the Gibbs sampling method. Finally, the Gibbs sampling method is performed on the JASPAR and TRANSFAC databases. The results show that the EPWM model gives more accurate prediction than the other MR models for the motif search and de novo motif finding problems

Journal Link

A heuristic approach to RNA-RNA interaction prediction

RNA-RNA interaction is used in many biological processes such as gene expression regulation. In this process, an RNA molecule prohibits the translation of another RNA molecule by establishing stable interactions with it. In this regard, some algorithms have been formed to predict the structure of the interaction between two RNA molecules. One common pitfall in the most algorithms is their high computational time. In this paper, we introduce a novel algorithm called TIRNA to accurately predict the secondary structure between two RNA molecules based on minimum free energy (MFE). The algorithm is stand on a heuristic approach which employs some dot matrices for finding the secondary structure of each RNA and between two RNAs. The proposed algorithm has been performed on some standard datasets such as CopA-CopT, R1inv-R2inv, Tar-Tar*, DIS-DIS and IncRNA₅₄-RepZ in the Escherichia coli bacteria. The time and space complexity of the algorithm are 0(k² log k²) and 0(k²), respectively, where k indicates the sum of the length of two RNAs. The experimental results show the high validity and efficiency of the TIRNA.

Journal Link

Genetic algorithm for dyad pattern finding in DNA sequences

In this paper a novel genetic algorithm is presented for the dyad motif finding problem. The genetic algorithm uses a multi-objective fitness function based on the sum of pairs, the number of matches, and the information content. The individuals required for the population pool in the genetic algorithm are optimized by Gibbs sampling method. Also, new crossover and mutation operators are designed. The algorithm is implemented and tested on the different types of real datasets. The results are compared with other well-known algorithms and the effectiveness of our algorithm is shown.

Journal Link

A constant time algorithm for DNA add

We present a new molecular algorithm for adding two binary numbers with n bits. Without considering the generation of input, this algorithm can be performed in O(1) in a test tube using O(n) different types of DNA strands, and the output can be detected in O(n). The output strands, prior to read out operation, can serve as the input strands for another round of addition. The algorithm can be easily extended to any other logical operation, and even for adding two decimal numbers.

Journal Link

RNA design using simulated SHAPE data

It has long been established that in addition to being involved in protein translation, RNA plays essential roles in numerous other cellular processes, including gene regulation and DNA replication. Such roles are known to be dictated by higher-order structures of RNA molecules. It is therefore of prime importance to find an RNA sequence that can fold to acquire a particular function that is desirable for use in pharmaceuticals and basic research. The challenge of finding an RNA sequence for a given structure is known as the RNA design problem. Although there are several algorithms to solve this problem, they mainly consider hard constraints, such as minimum free energy, to evaluate the predicted sequences. Recently, SHAPE data has emerged as a new soft constraint for RNA secondary structure prediction.

Journal Link

Transcription factor binding sites identification on human genome using an artificial neural network

Transcription factor binding sites on human DNA are the target locations of specific proteins called transcription factors. Gene expression process begins when a transcription factor binds to its target location in the genome. Expensive experimental methods are used to identify a limited number of these binding sites, hence there is essential need for computational algorithms. In this paper, we train a back propagation neural network to identify SP1 factor binding sites on human chromosome1. Biological data have been extracted from NCBI database which includes a wide variety of genetic information of human and other species. In order to compare the performance of our trained neural network with other classification algorithms, we use Support Vector Machine, Discriminant Analysis and K-Nearest Neighbor algorithm to classify same data. Results show that our trained neural network outperforms other classification algorithms.

Journal Link

Evaluating the accuracy of protein design using native secondary sub-structures

Background According to structure-dependent function of proteins, two main challenging problems called Protein Structure Prediction (PSP) and Inverse Protein Folding (IPF) are investigated. In spite of IPF essential applications, it has not been investigated as much as PSP problem.In fact, the ultimate goal of IPF problem or protein design is to create proteins with enhanced properties or even novel functions. One of the major computational challenges in protein design is its large sequence space, namely searching through all plausible sequences is impossible. Inasmuch as, protein secondary structure represents an appropriate primary scaffold of the protein conformation, undoubtedly studying the Protein Secondary Structure Inverse Folding (PSSIF) problem is a quantum leap forward in protein design, as it can reduce the search space.In this paper, a novel genetic algorithm which uses native secondary sub- structures is proposed to solve PSSIF problem.

Journal Link Download GAPSSIF

Transcription Factor Binding Sites Prediction Based on Modified Nucleosomes

In computational methods, position weight matrices (PWMs) are commonly applied for transcription factor binding site (TFBS) prediction. Although these matrices are more accurate than simple consensus sequences to predict actual binding sites, they usually produce a large number of false positive (FP) predictions and so are impoverished sources of information. Several studies have employed additional sources of information such as sequence conservation or the vicinity to transcription start sites to distinguish true binding regions from random ones. Recently, the spatial distribution of modified nucleosomes has been shown to be associated with different promoter architectures. These aligned patterns can facilitate DNA accessibility for transcription factors. We hypothesize that using data from these aligned and periodic patterns can improve the performance of binding region prediction.

Journal Link

PSOMF: An algorithm for pattern discovery using PSO

The task of transcription factor binding sites discovery from the upstream region of gene, without any prior knowledge of what look likes, is very challenging. In this paper we propose an algorithm based on Particle Swarm Optimization (PSO) to identify motif instances in multiple biological sequences. The experimental results on yeast sac-choromyces Cerevisae transcription factor binding sites, demonstrate that the proposed method is working analogous to YMF, MEME and AlignACE algorithms

Journal Link