##########################################################################################
#                                                                                        #
#   Software  : GAPSSIF (Protein Secondary Structure Inverse Folding)                    #
#   Release   : 1.0 (July 2016)                                                          #
#                                                                                        #
#                                                                                        #
#   Copyright : Computational Biology Research Center - CBRC                             #
#               Amirkabir University of Technology, Tehran, Iran                         #
#               http://bioinformatics.aut.ac.ir/                                         #
#                                                                                        #
##########################################################################################


GAPSSIF Version 1.0 - Project Documentation
===========================================


====================================================================================================
Operating System Compatibility and Requirements
====================================================================================================

GAPSSIF was implemented and evaluated on 64-bit Ubuntu Linux.

GAPSSIF uses a secondary structure prediction program named Reprof, which can easily be installed
from the Ubuntu Software Center. Its manual page is available at:
http://manpages.ubuntu.com/manpages/xenial/man1/reprof.1.html


====================================================================================================
Method Description
====================================================================================================

GAPSSIF is a memetic algorithm for solving the Protein Secondary Structure Inverse Folding (PSSIF)
problem. PSSIF is a simplified form of protein design: given an input secondary structure, design an
amino acid sequence that adopts that secondary structure.

GAPSSIF uses native secondary sub-structures to solve the PSSIF problem. This evolutionary
information guides the algorithm toward amino acid sequences that match the target secondary
structures. In addition, the designed sequences can fold into tertiary structures that closely
resemble their reference 3D structures.

GAPSSIF takes four steps to design an amino acid sequence for the input structure:

  1. Loading the fragment repository.
  2. Making a knowledge-based population from the fragment repository.
  3. Enriching the knowledge-based population.
  4. Searching through sequence space - this st

The algorithm terminates before 50 iterations if it finds a sequence whose secondary structure is
identical to the target structure.

GAPSSIF can run without its fragment repository, but the results are much better when this
evolutionary information is available.

GAPSSIF can be downloaded at http://bioinformatics.aut.ac.ir/GAPSSIF/


====================================================================================================
Project Folder
====================================================================================================

The project folder contains:

  - an x-executable (Linux executable) file named 'GAPSSIF'.
  - 306 .txt files containing amino acid fragments. These fragments were obtained by analyzing PDB
    secondary structure files. The name of each file indicates the secondary structure and the
    length of the fragments it contains.
  - a 'Doc' folder containing the software documentation.

'GAPSSIF' also works without its repository (the 306 .txt files), but its performance is then
significantly worse.


====================================================================================================
Installation and Samples
====================================================================================================

GAPSSIF is distributed as an x-executable (Linux executable) file and does not need any installation.

  -- Download GAPSSIF-Package.tar.gz from http://bioinformatics.aut.ac.ir/GAPSSIF/;
  -- Open a terminal and go to the download directory using the "cd" command;
  -- Type "tar -xvzf GAPSSIF-Package.tar.gz" and press Enter to extract the package;


====================================================================================================
Software Usage
====================================================================================================

GAPSSIF runs directly from the single 'GAPSSIF' x-executable file.

Usage : ./GAPSSIF input_file output_file

  ** input_file is mandatory and contains the target secondary structure.
  ** output_file is optional; if no name is given, the results are saved in 'report-input_file.txt'.

Example:

  -- Open a terminal;
  -- Go to the extracted package directory using the "cd" command;
  -- Change to the 'Doc' sub-folder;
  -- Type "../GAPSSIF sample-input.txt sample-output.txt";
  -- The following steps are reported as they run:

       1/4 Loading Repository ...
       2/4 Making Knoweledge-based Population ...
       3/4 Enriching Knoweledge-based Population ...
       4/4 Searching Through Sequence Space ...
       1/50 2/50 3/50 4/50 5/50 6/50 7/50 8/50 9/50 10/50 11/50 12/50 13/50 14/50 15/50 16/50 17/50 18/50 19/50 20/50 21/50 22/50 23/50 24/50 25/50 26/50 27/50 28/50 29/50 30/50 31/50 32/50 33/50 34/50 35/50 36/50 37/50 38/50 39/50 40/50 41/50 42/50 43/50 44/50 45/50 46/50 47/50 48/50 49/50 50/50
       Designing Process Finished.


====================================================================================================
Input File Format
====================================================================================================

The input file is a FASTA-format file that contains only a secondary structure string.
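
As an illustration only, an input file can be checked against the rules in this section before
running GAPSSIF with a small Python sketch. It assumes a single FASTA record whose structure string
may be wrapped over one or more lines. The function names are hypothetical and are not part of
GAPSSIF, and the regular expression only mimics the length-1 strand rule described below (an
isolated 'E' becomes 'C'); GAPSSIF's own handling may differ in detail.

```python
import re

# States accepted in the input structure: H (helix), E (beta strand), C (coil).
VALID_STATES = set("HEC")


def read_target_structure(text):
    """Hypothetical helper: parse one FASTA record holding a secondary structure.

    Returns (name, structure) or raises ValueError if the format is not respected.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    name = lines[0][1:].strip()
    # Join the remaining lines in case the structure string is wrapped.
    structure = "".join(lines[1:])
    if not structure:
        raise ValueError("no secondary structure string found")
    invalid = sorted(set(structure) - VALID_STATES)
    if invalid:
        raise ValueError("invalid characters: " + ", ".join(invalid))
    return name, structure


def strands_len1_to_coil(structure):
    """Hypothetical helper: turn every length-1 strand ('E' with no 'E' neighbour) into 'C'."""
    return re.sub(r"(?<!E)E(?!E)", "C", structure)


if __name__ == "__main__":
    sample = ">1GUT\nCCCCCCEEEEEEEEEEECCCEEEEEEEECCCCEEEEEEEHHHHHHHCCCCCCEEEEECCHHHCEEEC\n"
    name, ss = read_target_structure(sample)
    # The 1GUT example has no isolated 'E', so the rule leaves it unchanged.
    print(name, strands_len1_to_coil(ss) == ss)  # prints: 1GUT True
```

For a real file, something like read_target_structure(open("sample-input.txt").read()) would
return the record name and structure string, assuming the file follows the format shown here.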

Example:

  >1GUT
  CCCCCCEEEEEEEEEEECCCEEEEEEEECCCCEEEEEEEHHHHHHHCCCCCCEEEEECCHHHCEEEC

The input secondary structure may only contain:

  H --> Helix
  E --> Beta strand
  C --> Coil

No other characters are accepted in the input secondary structure. Note that beta strands of
length 1 are converted to coil.

'sample-input.txt' is an example input file.


====================================================================================================
Output File Description
====================================================================================================

For each input secondary structure, the output file contains the following fields:

  ** 'total time' : the total time in seconds from the start of the first step to the end of the
     last step.
  ** 'Target Structure' : the input secondary structure.
  ** 'Designed Sequence' : the best amino acid sequence designed after all steps of the algorithm.
  ** 'Designed Sequence Predicted structure by Reprof' : the secondary structure of the designed
     sequence as predicted by Reprof.
  ** 'Q3 percent of designed sequence' : the accuracy of the designed sequence as a Q3 value, i.e.
     the percentage of positions whose predicted state (H, E or C) matches the target structure.

'sample-output.txt' is an example output file.


====================================================================================================
Release Notes
====================================================================================================

Version 1.0 (2016)

  Authors     : M.Movahedi1, F.Zare-Mirakabad, S.Arab
  Description : First release of the software
  Comments    : All 306 .txt files are required for reliable results.

Feel free to contact us at: f.zare@aut.ac.ir

====================================================================================================
Enjoy it :) !