The rapid pace at which novel high-throughput technologies are being developed and large-scale biological data are being produced offers tremendous opportunities for enhancing our molecular understanding of biological systems. The purpose of the International Computational Biology Workshop at Amirkabir University is to bring together researchers and scholars from around the world who are interested in applying computational systems, algorithmic concepts and information technologies to challenging problems in bioinformatics research, with a particular focus on genomes, proteins and RNA.
The objectives of this educational/research course are:
| Time | 9 Dec | 10 Dec | 11 Dec |
|---|---|---|---|
| 9:00 - 9:30 | Opening Remarks | | |
| 9:30 - 10:15 | Systems Biology and Inference of Gene Regulatory Networks (Mehdi Sadeghi) | Assessing the Impact of Exact Reads On Reducing the Error Rate of Read Mapping | Multiomics Analysis of Pregnancy-Related Pathologies |
| 10:15 - 10:45 | Q&A | Q&A | Q&A |
| 10:45 - 11:00 | Break | Break | Break |
| 11:00 - 11:45 | Feasibility of Predicted Metabolic Engineering Strategies (Zahra Razaghi Moghadam Kashani) | Sequence Comparison without Alignment: The SpaM Approaches | Genomic Alterations and Cancer Initiation: An Experience with Papillary Thyroid Carcinoma |
| 11:45 - 12:15 | Q&A | Q&A | Q&A |
| 12:15 - 14:00 | Break | Break | Break |
| 14:00 - 14:45 | Characterizing SARS-CoV-2 Mutations in Iranian Related COVID-19 Patients (Najmeh Salehi) | R2DT: A New Method for Visualising RNA Secondary Structure in Standard Orientations | Introduction to the Structure and Pathogenesis of Helicobacter Pylori |
| 14:45 - 15:15 | Q&A | Q&A | Q&A |
| 15:15 - 15:30 | Break | Break | Break |
| 15:30 - 16:15 | Predicting Risk of Cardiovascular Disease Using Machine Learning Methods (Hesam Dashti) | Ensemble Multi-Label Learning For Protein Functional Classification | Supervised Learning of Gene-Regulatory Networks Based On Graph Distance Profiles of Transcriptomics Data |
| 16:15 - 16:45 | Q&A | Q&A | Q&A |
| 16:45 - 17:00 | Break | Break | Break |

17:00 - 17:45: Essential Gene Prediction (Fatemeh Zare-Mirakabad)
Systems biology is the systematic study of cells, organs and organisms, and especially of cellular processes such as molecular interactions and intercellular connections. Advances in laboratory and computational capabilities, together with methods for finding the main patterns in networks, have accelerated this field of science. A major part of the study of gene regulatory networks (GRNs) relies on networks inferred from gene expression measurements obtained with various techniques, such as microarrays, or, more precisely, from experiments that switch genes off or increase their expression and track the changes caused by these perturbations. Such analyses lead to the construction of a network of gene-gene interactions that can be viewed and interpreted as a graph. However, these graphs cannot accurately represent a network capable of predicting a living organism's behavior. Their main drawback is that they do not model GRNs as logic circuits that capture the logical relationships between genes. This is because the underlying techniques cannot provide the information needed to actually reconstruct the networks, so the resulting networks are highly ambiguous. The goal of this seminar is to address the ambiguities in network reconstruction and how to reduce them, as well as network reconstruction that takes the logical relationships between the components into account.
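To illustrate what a logic-circuit model adds beyond an interaction graph, the sketch below encodes a small Boolean network and iterates it synchronously. The three genes A, B and C and their update rules are invented for illustration; this is a generic Boolean-network example, not the reconstruction method presented in the seminar.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical three-gene network; a state is (A, B, C) with True = gene ON.
State = Tuple[bool, bool, bool]

# Invented update rules. A plain interaction graph would only record that
# A and B regulate C; the logic model also states how the inputs combine.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: s[0],               # A maintains its own state
    "B": lambda s: s[0] and not s[2],  # B is activated by A and repressed by C
    "C": lambda s: s[0] and not s[1],  # C is activated by A and repressed by B
}


def step(state: State) -> State:
    """Synchronous update: every gene is recomputed from the same current state."""
    return (RULES["A"](state), RULES["B"](state), RULES["C"](state))


def trajectory(start: State, max_steps: int = 20) -> List[State]:
    """Iterate until a state repeats (fixed point or cycle) or max_steps is reached."""
    states: List[State] = [start]
    for _ in range(max_steps):
        nxt = step(states[-1])
        states.append(nxt)
        if nxt in states[:-1]:
            break
    return states


print(trajectory((True, False, False)))
# [(True, False, False), (True, True, True), (True, False, False)]
```

Under these invented rules, the state (True, False, False) oscillates with period 2, whereas (False, False, False) is a fixed point. Changing a rule (for example, replacing an AND NOT with an OR) keeps the same wiring diagram but can change these dynamics, which is the information a plain graph does not carry.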
Large-scale metabolic models are widely used to design metabolic engineering strategies for diverse biotechnological applications.
However, existing computational approaches focus on altering reaction fluxes and often neglect the manipulations of gene expression needed to implement these strategies. Here we find that the association of genes with multiple reactions renders engineering strategies designed at the flux level infeasible, since they require contradictory manipulations of gene expression. Moreover, we find that none of the existing approaches for designing gene knock-out strategies ensures that the resulting design does not also require other gene alterations, such as up- or down-regulation, to match the desired flux distribution.
To address these issues, we propose a constraint-based approach that facilitates the design of feasible metabolic engineering strategies at the gene level and is readily applicable to large-scale metabolic networks.
We show that the proposed approach can identify feasible strategies to overproduce ethanol in Escherichia coli and lactate in Saccharomyces cerevisiae, whereas overproduction of TCA cycle intermediates is not feasible in five organisms used as cell factories under default growth conditions.
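The infeasibility described above arises when one gene is associated with several reactions whose desired flux changes point in different directions. The sketch below performs only that consistency check on a hypothetical gene-reaction mapping; it is not the constraint-based optimization proposed in the talk. It assumes each reaction's desired change is already summarized as "up", "down" or "ko" (knock-out) and, as a simplification, that changing a gene's expression moves all of its reactions in the same direction.

```python
from typing import Dict, List, Set

# Hypothetical gene-to-reaction associations: g1 and g3 each catalyze two reactions.
GENE_TO_REACTIONS: Dict[str, List[str]] = {
    "g1": ["R1", "R2"],
    "g2": ["R3"],
    "g3": ["R4", "R5"],
}

# Hypothetical flux-level design: the manipulation each reaction needs.
DESIRED: Dict[str, str] = {"R1": "up", "R2": "down", "R3": "ko", "R4": "up", "R5": "up"}


def conflicting_genes(gene_to_rxns: Dict[str, List[str]],
                      desired: Dict[str, str]) -> Dict[str, Set[str]]:
    """Return genes whose reactions demand more than one kind of manipulation.

    Simplifying assumption: a gene cannot be up-regulated, down-regulated and
    knocked out at the same time, so mixed demands make the design infeasible.
    Reactions missing from `desired` are treated as unconstrained.
    """
    conflicts: Dict[str, Set[str]] = {}
    for gene, rxns in gene_to_rxns.items():
        demands = {desired[r] for r in rxns if r in desired}
        if len(demands) > 1:
            conflicts[gene] = demands
    return conflicts


# Flags g1, because R1 must go up while R2 must go down.
print(conflicting_genes(GENE_TO_REACTIONS, DESIRED))
```

Real gene-protein-reaction rules also involve isozymes (OR) and enzyme complexes (AND), which this toy check ignores.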
A new human coronavirus, SARS-CoV-2, was first identified in December 2019 in patients in Wuhan, the capital of China's Hubei province. This new etiologic agent caused the outbreak of COVID-19, which quickly spread around the world. It is vital to sequence and analyze the full genome of SARS-CoV-2 throughout the world to track changes in the virus. SARS-CoV-2 has been mutating ever since it was first sequenced in January 2020.
Characterizing SARS-CoV-2 mutations has significant medical and biological implications for the diagnosis, prevention and treatment of COVID-19. To this end, full SARS-CoV-2 genome sequences from patients in Iran, and from patients in other countries with a travel history to Iran or contact with Iranian cases, were retrieved from the GISAID database.
All SARS-CoV-2 sequences were compared with the Wuhan reference genome NC_045512.2. The analysis revealed various common and rare nucleotide mutations, some of which are nonsynonymous, i.e. they change the encoded protein sequence. The S protein mutations were mapped onto the 3D structure of the S protein to investigate their effect on its binding to angiotensin-converting enzyme 2 (ACE2), the host cell receptor.
Characterizing SARS-CoV-2 mutations in Iranian related COVID-19 patients can be helpful to design specific diagnostic tests, trace the SARS-CoV-2 sequence changes in Iran, and explore therapeutic drugs and vaccines.
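As a rough illustration of the comparison step, the sketch below lists single-nucleotide differences between a sample and the reference in the common REF-position-ALT notation (for example, A23403G). It assumes both sequences are already aligned to the same length, skips ambiguous bases ("N") and gaps, and does not attempt the protein-level or structural analysis described above. The toy sequences are invented.

```python
from typing import List


def call_substitutions(reference: str, sample: str, skip: str = "N-") -> List[str]:
    """Return substitutions as REF<pos>ALT strings with 1-based positions.

    Assumes `reference` and `sample` are pre-aligned and of equal length.
    Positions are alignment coordinates, which equal reference coordinates
    only when the aligned reference contains no gaps.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    calls: List[str] = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        # Ignore positions where either sequence has an ambiguous base or a gap.
        if ref_base in skip or alt_base in skip:
            continue
        if ref_base != alt_base:
            calls.append(f"{ref_base}{pos}{alt_base}")
    return calls


# Toy example (invented sequences, not real SARS-CoV-2 data):
print(call_substitutions("ACGTACGTAC", "ACGTGCGNAC"))  # ['A5G']
```

Deciding whether a call is nonsynonymous would additionally require the reference gene coordinates and the genetic code, which this sketch leaves out.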
On average, cardiovascular disease (CVD) takes one life every 38 seconds, and despite significant progress, CVD remains a leading cause of mortality and morbidity worldwide.
To ensure early detection and treatment, risk prediction is a key goal of CVD prevention. The emerging field of machine learning for health has demonstrated success in analyzing complex interaction networks in biomedical data and in yielding key insights in areas such as multidimensional exposome data and arrhythmia prediction.
Epidemiological studies and large clinical trials provide the platform for utilizing machine learning methods in computational CVD research, and in this talk, we focus on the design and development of advanced computational methods for predicting CVD events.
Given the wealth of high-quality genome sequences now available, accurate and efficient reference-based assembly methods are strongly needed. Many algorithms have been designed for mapping reads onto a genome sequence with the aim of improving the accuracy of reconstructed genomes. One challenge in this problem arises when reads align to multiple locations because of repetitive regions in the genome. In this research, our goal is to decrease the error rate of rebuilt genomes by resolving multi-mapping reads. To this end, we reduce the search space for reads that align to the genome with mismatches, insertions or deletions, which decreases the probability of incorrect read mapping. We propose a pipeline divided into three steps, ExactMapping, InExactMapping and MergingContigs, in which exact and inexact reads are aligned in two separate phases. We test our pipeline on simulated and real data sets using several read mappers. The results show that two-step mapping of reads onto the contigs generated by a mapper such as Bowtie2, BWA or Yara is effective in reducing the error rate of the contigs. Our assessment suggests that reducing the error rate of read mapping can not only improve genomes reconstructed by reference-based assembly in a reasonable running time, but can also help improve genomes generated by de novo assembly. Indeed, by decreasing the number of multi-mapping reads, our pipeline produces genomes comparable to those of MMR, a dedicated multi-mapping read resolution tool. Consequently, we introduce EIM as a post-processing step for genomes reconstructed by read mappers.
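The idea of mapping exact reads before inexact ones can be sketched in a few lines. The toy code below (all function names are hypothetical; it is not the EIM implementation and does not call Bowtie2, BWA or Yara) first assigns reads that occur exactly once in the reference, then places the remaining reads by minimum Hamming distance, and reports reads whose best placement is still ambiguous (multi-mapping) instead of picking one arbitrarily. It handles mismatches only, not insertions or deletions, and does not model how the pipeline reduces the search space.

```python
from typing import Dict, List, Tuple


def exact_hits(reference: str, read: str) -> List[int]:
    """All 0-based start positions where `read` occurs exactly (overlaps allowed)."""
    hits, start = [], reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def best_inexact_hits(reference: str, read: str, max_mismatches: int) -> List[int]:
    """Positions with the fewest mismatches, provided that number is within the threshold."""
    scored = [(hamming(reference[i:i + len(read)], read), i)
              for i in range(len(reference) - len(read) + 1)]
    if not scored:
        return []
    best = min(d for d, _ in scored)
    return [i for d, i in scored if d == best] if best <= max_mismatches else []


def two_phase_map(reference: str, reads: List[str], max_mismatches: int = 1
                  ) -> Tuple[Dict[str, int], Dict[str, int], List[str]]:
    """Phase 1: uniquely exact reads. Phase 2: remaining reads by Hamming distance."""
    exact: Dict[str, int] = {}
    inexact: Dict[str, int] = {}
    unresolved: List[str] = []
    remaining: List[str] = []
    for read in reads:
        hits = exact_hits(reference, read)
        if len(hits) == 1:
            exact[read] = hits[0]
        else:
            remaining.append(read)  # no exact hit, or an exact multi-mapping read
    for read in remaining:
        hits = best_inexact_hits(reference, read, max_mismatches)
        if len(hits) == 1:
            inexact[read] = hits[0]
        else:
            unresolved.append(read)  # unmapped or still ambiguous
    return exact, inexact, unresolved


ref = "AAACCCGGGTTTACGACG"
print(two_phase_map(ref, ["CCCG", "ACG", "GGGA"]))
# ({'CCCG': 3}, {'GGGA': 6}, ['ACG'])
```

Here "ACG" occurs exactly at two positions, so it remains unresolved; a real pipeline would need additional evidence to place it.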
Sequence alignment is the basis of most methods for DNA and protein sequence analysis. However, for the large amounts of sequence data now available, alignment has become too slow. Therefore, fast alignment-free methods for sequence comparison have become widely used in recent years. Most of these approaches are based on the frequencies of words of a fixed length, or on word-matching statistics; others use the length of maximal word matches. While these methods are very fast, most of them calculate ad hoc measures of sequence similarity or dissimilarity that are hard to interpret. In my talk, I will describe a number of alignment-free methods that we developed in our group.
Our approaches are based on spaced word matches ('SpaM'), i.e. inexact word matches that are allowed to contain mismatches at certain pre-defined positions. Unlike most previous alignment-free approaches, our approaches can accurately estimate phylogenetic distances between DNA or protein sequences using a stochastic model of molecular evolution. The programs FSWM, Prot-SpaM, Multi-SpaM and Read-SpaM estimate distances based on so-called micro-alignments. Other methods are based on the number of spaced-word matches between two DNA sequences.
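To make "spaced word" concrete: a binary pattern such as 1101 marks match positions (1) and don't-care positions (0), and two sequences share a spaced-word match when they agree at every match position of the pattern. The sketch below counts such matches for an illustrative pattern. It only demonstrates the definition and is not the FSWM or Prot-SpaM distance estimator, which additionally evaluates the don't-care positions and applies a model of molecular evolution.

```python
from collections import Counter


def spaced_words(seq: str, pattern: str) -> Counter:
    """Count the spaced words of `seq` under `pattern` ('1' = match, '0' = don't care).

    Each spaced word keeps only the characters at the pattern's match positions.
    """
    match_positions = [k for k, p in enumerate(pattern) if p == "1"]
    words: Counter = Counter()
    for i in range(len(seq) - len(pattern) + 1):
        words["".join(seq[i + k] for k in match_positions)] += 1
    return words


def count_spaced_word_matches(a: str, b: str, pattern: str) -> int:
    """Number of position pairs (i, j) at which a and b share a spaced word."""
    wa, wb = spaced_words(a, pattern), spaced_words(b, pattern)
    return sum(wa[w] * wb[w] for w in wa.keys() & wb.keys())


print(count_spaced_word_matches("ACGTACGT", "ACCTACGA", "1101"))  # 3
print(count_spaced_word_matches("ACGTACGT", "ACCTACGA", "1111"))  # 1
```

With an all-ones pattern the count reduces to ordinary contiguous word (k-mer) matches. In this toy pair the spaced pattern tolerates the mismatch at the don't-care position and finds more matches.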
Non-coding RNAs (ncRNAs) are essential for all life, and the functions of many ncRNAs depend on their secondary (2D) and tertiary (3D) structure. Despite the proliferation of 2D visualisation software, there is a lack of methods for automatically generating 2D representations in consistent, reproducible and recognisable layouts, which makes such diagrams difficult to construct, compare and analyse. R2DT is a new comprehensive method for visualising a wide range of RNA structures in standardised layouts. R2DT is based on a library of 3,632 templates representing the majority of known structured RNAs, from small RNAs to the large subunit ribosomal RNA. R2DT has been applied to ncRNA sequences from the RNAcentral database and produced >14 million diagrams, creating the world's largest RNA 2D structure dataset. The talk will also give an overview of RNAcentral and Rfam, two complementary resources for ncRNA research.
Multi-label classification is a type of classification in which each instance can be associated with one or more classes. Multi-label problems are ubiquitous in the real world, for example in text and web categorization, functional genomics and proteomics, and scene classification. Multi-label classification methods can be grouped into two main categories: problem transformation and algorithm adaptation. Problem transformation methods transform the multi-label classification problem into one or more single-label classification problems; examples in this category are binary relevance, AdaBoost.MH and classifier chains. Algorithm adaptation methods extend specific learning algorithms to handle multi-label data directly; examples in this category are Bp-MU, ML-kNN and RankSVM.
In this presentation, we propose a new deep learning approach to multi-label classification. We use convolutional neural network (CNN) and long short-term memory (LSTM) networks in a classifier-chain manner. In this method, a few convolutional, pooling and inception layers are first used to extract features from the given data. The extracted features are then fed to a chain of LSTMs. Finally, the set of labels for the given data is obtained from the last LSTM.
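The difference between two of the problem-transformation strategies named above shows up in how they rewrite the training data. In binary relevance, each label becomes an independent binary problem; in a classifier chain, the binary problem for label k also receives labels 1 to k-1 as extra features (at training time the true labels, at prediction time the chain's own earlier predictions). The sketch below only builds the transformed datasets on invented toy data; any single-label classifier could then be trained on each one. It is not the CNN/LSTM model proposed in the talk.

```python
from typing import List, Sequence, Tuple

# One transformed dataset: (feature vector, binary target) pairs.
Dataset = List[Tuple[List[float], int]]


def binary_relevance(X: Sequence[List[float]], Y: Sequence[List[int]]) -> List[Dataset]:
    """One independent binary dataset per label column of Y."""
    n_labels = len(Y[0])
    return [[(list(x), y[k]) for x, y in zip(X, Y)] for k in range(n_labels)]


def classifier_chain(X: Sequence[List[float]], Y: Sequence[List[int]]) -> List[Dataset]:
    """Dataset k uses the features plus labels 0..k-1 (training-time ground truth)."""
    n_labels = len(Y[0])
    return [[(list(x) + [float(v) for v in y[:k]], y[k]) for x, y in zip(X, Y)]
            for k in range(n_labels)]


# Hypothetical toy data: 2 proteins, 2 features each, 3 function labels.
X = [[0.2, 1.5], [0.9, 0.1]]
Y = [[1, 0, 1], [0, 1, 1]]

print(binary_relevance(X, Y)[1])     # [([0.2, 1.5], 0), ([0.9, 0.1], 1)]
print(classifier_chain(X, Y)[2][0])  # ([0.2, 1.5, 1.0, 0.0], 1)
```

The chain lets later labels depend on earlier ones, at the cost of making the result sensitive to label order.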
Recent technological advances in science provide novel opportunities to unravel the complex biology of pregnancy. Physiological changes during pregnancy are highly dynamic and involve multiple interconnected biological systems. The simultaneous interrogation of these systems with suitable technologies can reveal otherwise unrecognized crosstalk. Understanding such crosstalk can point to important disease mechanisms such as immune programming by the microbiome, or specific interactions between proteins and cellular elements, and ultimately guide new diagnostic and therapeutic strategies.
An ongoing cohort study by the March of Dimes Prematurity Research Center at Stanford University exploits recent technological advances to examine the transcriptomic, immunological, microbiome and proteomic events associated with normal and pathological pregnancies. We performed a multiomics analysis of 51 samples from 17 pregnant women who delivered at term. The dataset sizes ranged from tens to tens of thousands of measurements, with different modularities and internal complexities. A modified elastic-net algorithm was used to measure the ability of each dataset to predict gestational age. Using stacked generalization, these datasets were then combined into a single integrative model. These algorithms were customized to account for the size and modularity of each technology. Results were cross-validated on previously unseen patients.
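Cross-validating on previously unseen patients means that all samples from a given woman must fall on the same side of every train/test split; otherwise repeated samples from one person leak information into the test fold. A minimal leave-one-patient-out splitter is sketched below on hypothetical sample IDs, assuming three samples per patient (51 / 17 = 3). The elastic-net and stacked-generalization models themselves are not reproduced.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_patient_out(patient_ids: Sequence[str]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), holding out one patient's samples at a time."""
    by_patient: Dict[str, List[int]] = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    for pid in sorted(by_patient):
        test = by_patient[pid]
        held_out = set(test)
        train = [i for i in range(len(patient_ids)) if i not in held_out]
        yield train, test


# Hypothetical layout: 17 patients with 3 samples each (51 samples in total).
patients = [f"P{p:02d}" for p in range(1, 18) for _ in range(3)]
folds = list(leave_one_patient_out(patients))
print(len(folds), len(folds[0][0]), folds[0][1])  # 17 48 [0, 1, 2]
```

Each fold trains on 48 samples from 16 patients and tests on the 3 samples of the remaining patient.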
Several decades of cancer research have confirmed that tumor lesions develop over a long incubation time. This provides a great opportunity to detect early precancerous lesions and to intervene during the initiation and progression of carcinogenesis.
There is still no agreed-upon view of how much mutations and copy number alterations each contribute to cancer initiation. Some recent studies demonstrated that the frequency of copy number alterations is inversely correlated with the frequency of mutations across tumors; therefore, cancers can be classified into mutation-dominant and copy number alteration-dominant groups. In contrast, other pan-cancer genomic investigations have demonstrated that prognostic biomarkers are predominantly found among copy number-altered genes.
In this presentation, I will talk about my recent project on comprehensive genomic investigation of cancer initiation in Papillary Thyroid Carcinoma.
Helicobacter pylori is a spiral-shaped bacterium that causes one of the most common infections in humans. The infection causes stomach inflammation and can lead to duodenal ulcer, gastric ulcer, gastric cancer (adenocarcinoma) and mucosa-associated lymphoid tissue (MALT) lymphoma. Without antimicrobial therapy, the infection persists for life, and treatment is increasingly hampered by antimicrobial resistance. After attaching to the gastric mucosa, H. pylori can damage host tissue through toxins such as vacuolating cytotoxin A (VacA) and cytotoxin-associated gene A (CagA).
Its pathogenesis is mediated by a complex interplay between bacterial virulence factors, host factors and environmental factors. The bacterium can be transmitted from person to person within families via the oral-fecal route. After entering the host stomach, H. pylori first survives the acidic environment by means of its urease enzyme, then moves toward the epithelial cells using flagella-mediated motility and attaches to host cells via its adhesins.
Finally, it damages host tissue through its toxins. H. pylori has a genome of ∼1600 genes, the majority of which have been functionally characterized. The discovery of H. pylori as the causative agent of gastritis earned Marshall and Warren the 2005 Nobel Prize in Physiology or Medicine.
This lecture aims to introduce this microorganism as a target for bioinformatics studies.
Characterisation of gene-regulatory network (GRN) interactions provides a stepping stone to understanding how genes affect cellular phenotypes.
Yet, despite advances in profiling technologies, GRN reconstruction from gene expression data remains a pressing problem in systems biology. Here, we devise a supervised learning approach, GRADIS, which utilises a support vector machine to reconstruct GRNs based on distance profiles obtained from a graph representation of transcriptomics data.
By employing data from Escherichia coli and Saccharomyces cerevisiae as well as synthetic networks from the DREAM4 and DREAM5 network inference challenges, we demonstrate that our GRADIS approach outperforms state-of-the-art supervised and unsupervised approaches.
This holds whether predictions about the target genes of individual transcription factors or about the entire network are considered. We employ experimentally verified GRNs from E. coli and S. cerevisiae to validate the predictions and obtain further insights into the performance of the proposed approach.
In addition, we apply our approach to data from the model plant Arabidopsis thaliana. Our GRADIS approach can also make use of other network-based representations of large-scale data, and can be readily extended to help characterise other cellular networks, including protein–protein and protein–metabolite interactions.
Essential genes encode the basic functions of a cell and are vital for the survival of an organism. Identifying essential genes and analysing their characteristics provides important biological information for explaining how genotype affects phenotype, identifying genes related to human diseases and discovering attractive drug targets for new antibiotics. With advances in high-throughput technologies and the production of large amounts of data, it is possible to discover essential genes at the system level by analysing sequence-based features of genes and topological features obtained from biological networks. For essential gene prediction based on computational methods, we need to know which gene features are important. To do this, we extract features from the protein-protein interaction network and from protein sequences. We then use these features to classify genes as essential or non-essential.
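Two topological features often derived from a protein-protein interaction network are node degree and the local clustering coefficient. The sketch below computes both from an edge list. The toy network and the choice of these two features are illustrative assumptions, not the feature set used in the talk, and the classification step is left out.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; self-interactions are ignored."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def topological_features(edges: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Degree and local clustering coefficient for every protein in the network."""
    adj = adjacency(edges)
    feats: Dict[str, Dict[str, float]] = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Count interactions among the node's neighbours (triangles through the node).
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        feats[node] = {"degree": k, "clustering": clustering}
    return feats


# Hypothetical toy network: P1, P2 and P3 form a triangle; P4 hangs off P1.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P1", "P4")]
features = topological_features(edges)
print(features["P1"])  # degree 3, clustering 1/3
```

Each protein's feature vector could then be combined with sequence-derived features and passed to any binary classifier.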
This event is online.
Please log in to the webinar with your username and password 30 minutes before the event starts:
Earn an Online Bioinformatics Certificate after attending the workshop.
Please let us know if you have any questions or concerns: firstname.lastname@example.org
To reach the support line of the Second International Computational Biology Workshop, you can call 0903-476-2326 from anywhere in the country between 8 a.m. and midnight.
A Telegram group has been set up to facilitate interaction between workshop participants and speakers: