Spaced words for alignment-free sequence comparison and read mapping
‘Spaced words’ or ‘spaced seeds’ are frequently used in biological sequence analysis, e.g. in database searching. A ‘spaced word’ is a word that contains wildcard characters at certain positions specified by a pre-defined binary pattern of ‘match’ and ‘don’t-care’ positions. It has been shown that methods that rely on spaced words are often more accurate than approaches based on contiguous words. In 2014, we proposed to use spaced words in alignment-free sequence comparison, to estimate phylogenetic distances between genomic sequences. The results of ‘spaced words’ algorithms depend on the underlying pattern of ‘match’ and ‘don’t-care’ positions. We developed a program called ‘rasbhari’ to calculate suitable patterns for database searching, read mapping and alignment-free sequence comparison.
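As an illustration of the definition above, the following minimal Python sketch extracts spaced words from a sequence. It is not the authors' implementation and is unrelated to how rasbhari optimizes patterns. The function names, the use of '1' for match and '0' for don't-care positions, and the '*' wildcard symbol are all assumptions made for this example.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Yield every spaced word of `seq` under a binary `pattern`.

    Assumed convention: '1' marks a match position (character kept),
    '0' marks a don't-care position (character replaced by '*').
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of 0s and 1s")
    span = len(pattern)
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        yield "".join(c if p == "1" else "*" for c, p in zip(window, pattern))


def shared_spaced_words(seq_a, seq_b, pattern):
    """Count spaced-word occurrences common to both sequences (multiset intersection)."""
    counts_a = Counter(spaced_words(seq_a, pattern))
    counts_b = Counter(spaced_words(seq_b, pattern))
    return sum((counts_a & counts_b).values())


if __name__ == "__main__":
    a, b = "ACGTAC", "ATGTCC"
    print(list(spaced_words(a, "101")))
    # Contiguous words (pattern "111") versus spaced words (pattern "101")
    print(shared_spaced_words(a, b, "111"), shared_spaced_words(a, b, "101"))
```

In this toy pair the two sequences share no contiguous 3-mers, but they do share spaced words under the pattern 101, because the mismatches fall on the don't-care position.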
Using Deep Neural Networks to Reveal Cell Identity from Gene Expression Profiles
Understanding cell identity is an important task in many biomedical areas. Expression patterns of specific marker genes have been used to characterize a limited number of cell types, but for most cell types no exclusive markers are known.
In this talk we introduce a method based on deep neural networks to identify cell type based on gene expression profiles. We have used more than 1000 whole-genome transcription profiles to train and test our model, and reached more than 96% classification accuracy.
incaRNAfbinv: a web server for the fragment-based design of RNA sequences
In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there is considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to date that performs a fragment-based design of RNA sequences. In doing so it allows the design of sequences that do not necessarily fold exactly into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server called incaRNAfbinv implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality, and GC content. In addition to the design
RNA-SEQ: Read alignment and mapping
Development of novel sequencing technologies has provided a new method to reveal the presence and quantity of RNA in a biological sample for both mapping and quantifying transcriptomes. Understanding the transcriptome is essential for interpreting the functional elements of the genome and revealing the molecular constituents of cells and tissues, and also for understanding development and diseases.
Two approaches are used to assign raw sequence reads or to assemble the transcriptome:
De novo: This approach does not require a reference genome to reconstruct the transcriptome, and is typically used if the genome is unknown, incomplete, or substantially altered compared to the reference.
Genome guided: This approach relies on the same methods used for DNA alignment, with the additional complexity of aligning reads that cover non-continuous portions of the reference genome. These non-continuous reads are the result of sequencing spliced transcripts.
In this presentation, different algorithms for transcriptome assembly are reviewed.
Two ribosome recruitment sites direct multiple translation events within the HIV-1 Gag open reading frame
In the late phase of the HIV viral cycle, the unspliced genomic RNA is exported to the cytoplasm for the necessary translation of the Gag and Gag-Pol polyproteins. Three distinct translation initiation mechanisms ensuring Gag production have been described, with little rationale for their multiplicity. The Gag-IRES is unusual in that it is located within the Gag ORF and interacts directly with the 40S ribosomal subunit. Aiming to elucidate the specificity and the relevance of this interaction, we probed the HIV-1 Gag-IRES structure and developed an innovative integrative modelling strategy to take into account all the gathered information. We propose a novel Gag-IRES secondary structure strongly supported by all experimental data. We further demonstrate the presence of two regions within the Gag-IRES that independently and directly interact with the ribosome. Importantly, these binding sites are functionally relevant to Gag translation both in vitro and ex vivo. This work provides insight into the Gag-IRES molecular mechanism and gives compelling evidence for its physiological importance. It allows us to propose original hypotheses about the physiological role of the IRES and its conservation among primate lentiviruses.
Breast Cancer Drug (ICD-85)
Finding a novel drug is always a challenge. The major concern is the cost of preclinical and clinical trial studies, which are the most expensive and time-consuming parts of the process. Bioinformatics techniques allow many experiments to be performed by computer, so that a great deal of time and money can be saved. Bioinformatics is a powerful tool that accelerates drug development by providing insight into the structure of potential targets and the target-specific sites of signaling molecules or their downstream effectors. From a biological point of view, the bioinformatics process can proceed in four steps: (A) target identification, (B) target validation, (C) lead substance identification and (D) lead optimization. We will discuss our team’s experience in the discovery of ICD-85, biological peptides that suppress the growth of cancer cells. The primary steps included identification of the peptides through trial and error, elucidation of the mechanism of action, safety studies, biodistribution studies, exposure-time-related activity and, finally, phase 0 and phase 1 clinical trials in breast cancer patients. The challenges we faced during 12 years of work will also be presented. Finally, we will discuss how bioinformatics could be combined with the work of biologists to ease the process of drug discovery.
Drug Design for Alzheimer’s Disease
Alzheimer’s disease, the most common form of dementia, is a chronic neurodegenerative disorder characterized by progressive cognitive impairment in elderly people. According to the cholinergic hypothesis, memory loss in Alzheimer’s disease is due to decreased levels of the neurotransmitter acetylcholine (ACh), which plays a key role in memory and cognition in cholinergic synapses. Alzheimer’s disease is therefore characterized by low ACh levels in the hippocampus and cortex. Acetylcholinesterase (AChE), one of the most essential enzymes in the family of serine hydrolases, is responsible for the rapid breakdown of ACh to allow repeated signal transmission. Inhibition of AChE in Alzheimer’s disease treatment should increase the level of ACh in the synapses, providing a chance to induce a signal in the downstream nerve. In this study we present an approach for predicting the inhibitory activity of AChE inhibitors by combining docking studies with a structure-based quantitative structure–activity relationship (QSAR) model. Docking analysis revealed that hydrophobic interactions play important roles in the AChE-inhibitor complex. A structure-based QSAR model was also developed to represent the relationship between descriptors created from docking and the activities of the inhibitors. A least squares support vector regression model was constructed using the four most relevant docking descriptors and one molecular structure descriptor. The Q2 value of the model was found to be 0.790.
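For readers unfamiliar with the Q2 statistic, the sketch below shows one common way to compute it: Q2 = 1 - PRESS / SS_total, where PRESS is the sum of squared errors of cross-validated predictions. This is only an illustration under stated assumptions. The abstract does not say which cross-validation scheme was used, so leave-one-out is assumed here, and a one-descriptor least-squares line stands in for the least squares support vector regression model. All function names and the toy data are hypothetical.

```python
def q_squared(observed, cv_predicted):
    """Q2 = 1 - PRESS / SS_total, using the overall mean of the observed values."""
    if len(observed) != len(cv_predicted) or len(observed) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, cv_predicted))
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values must not all be identical")
    return 1 - press / ss_total


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in model, NOT the LS-SVR of the study."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(xs, ys):
    """Predict each activity from a model fitted on all other compounds."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    return predictions


if __name__ == "__main__":
    # Hypothetical descriptor values and activities, for illustration only.
    descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
    activity = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(q_squared(activity, leave_one_out_predictions(descriptor, activity)))
```

A value close to 1 indicates that held-out activities are predicted well. Some authors use the training-set mean of each fold in the denominator instead of the overall mean, which gives slightly different numbers.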
Protein engineering is an important tool for overcoming the limitations of natural enzymes as biocatalysts. In this regard computational tools are becoming increasingly important in order to create improved or novel enzymes.
Here we describe some strategies for rational protein engineering and summarize the computational tools available. Computational tools can be used to increase the stability, activity and affinity of proteins, as well as to design new peptides.
Alignment-free sequence comparison using maximal common substrings
Most methods for alignment-free sequence comparison are based on a fixed word length or on fixed binary patterns of ‘match’ and ‘don’t-care’ positions. The results of these methods therefore depend on the word length or underlying pattern. As an alternative, some approaches have been proposed that are based on the length of common subwords. Haubold et al. (2009) showed how phylogenetic distances can be estimated in a rigorous way based on the average length of common substrings. Generalizing this approach, we proposed to use the length of common substrings with $k$ mismatches in alignment-free sequence comparison. In a recent paper, we showed that the number of substitutions per position in DNA sequences can be accurately estimated from the length distribution of $k$-mismatch common substrings.
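The quantity underlying these approaches can be sketched with a naive Python example: for each position in one sequence, find the longest substring starting there that matches some substring of the other sequence with at most k mismatches, then average these lengths. This is a brute-force illustration only, assuming a simple "at most k mismatches" definition. It is not the algorithm used in the cited work, it does not convert lengths into a distance estimate, and the function names are hypothetical.

```python
def longest_k_mismatch_match(seq_a, i, seq_b, k):
    """Length of the longest substring of seq_a starting at position i that
    matches a substring of seq_b (at any start j) with at most k mismatches."""
    best = 0
    for j in range(len(seq_b)):
        mismatches = 0
        length = 0
        while i + length < len(seq_a) and j + length < len(seq_b):
            if seq_a[i + length] != seq_b[j + length]:
                mismatches += 1
                if mismatches > k:
                    break  # the (k+1)-th mismatch ends this extension
            length += 1
        best = max(best, length)
    return best


def average_k_mismatch_length(seq_a, seq_b, k=0):
    """Average, over all positions of seq_a, of the longest k-mismatch match in seq_b."""
    if not seq_a or not seq_b:
        raise ValueError("both sequences must be non-empty")
    lengths = [longest_k_mismatch_match(seq_a, i, seq_b, k)
               for i in range(len(seq_a))]
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    print(average_k_mismatch_length("ACGTACGT", "ACGTTCGT", k=0))
    print(average_k_mismatch_length("ACGTACGT", "ACGTTCGT", k=1))
```

With k = 0 this reduces to the exact common substrings considered by Haubold et al. Similar sequences give longer average lengths, which is what makes the statistic usable for distance estimation. The running time here is roughly cubic in the sequence length, so this sketch is suitable only for short examples.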
A Systems Approach to Modeling Cell-specific Metabolic Networks
Genome-scale metabolic networks have been widely used to model the metabolic capacities of a variety of cell types, ranging from microorganisms to plants and humans. More specifically, context-specific human metabolic networks have been used during the last decade to understand human physiology and pathology. In the present talk, by reviewing recent publications, I will explain how “omics” data empower the reconstruction and subsequent analysis of such networks. Furthermore, some of the basic computational challenges of the procedure will be discussed.
Comparison of Different Approaches for Identifying Subnetworks in Metabolic Networks
A metabolic network model provides a computational framework for studying the metabolism of a cell at the system level. The organization of metabolic networks has been investigated in different studies. One of the organizational aspects considered in these studies is the decomposition of a metabolic network. The decompositions produced by different methods are very different, and there is no comprehensive evaluation framework for comparing the results with each other. In this study, these methods are first reviewed and compared. They are then applied to six different metabolic network models, and the results are evaluated and compared based on two existing and two newly proposed criteria. The results show that no single method outperforms the others on all criteria, but the methods introduced by Guimera & Amaral and Verwoerd appear to do better among metabolite-based methods, and the method introduced by Sridharan et al. does better among reaction-based ones.