##########################################################################################
#                                                                                        #
#   Software   :  GAPSSIF   (Protein Secondary Structure Inverse Folding)                #
#   Release    :  1.0  (July 2016)                                                       #
#                                                                                        #
#                                                                                        #
#   Copyright  :  Computational Biology Research Center - CBRC                           #
#                 Amirkabir University of Technology, Tehran, Iran                       #
#                 http://bioinformatics.aut.ac.ir/                                       #
#                                                                                        #
##########################################################################################

		    GAPSSIF Version 1.0 - Project Documentation
                    ==========================================
                    
====================================================================================================
                            Operating system compatibility and Requirements
====================================================================================================

GAPSSIF was implemented and evaluated on 64-bit Ubuntu Linux.

GAPSSIF employs a secondary structure prediction algorithm named Reprof, which can be installed from
the Ubuntu Software Center. Its manual page is available at:

http://manpages.ubuntu.com/manpages/xenial/man1/reprof.1.html
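
Before running GAPSSIF, it can be useful to confirm that Reprof is installed. The following is a
minimal sketch, assuming the predictor is exposed as a command named "reprof" on the PATH (the name
used by the manual page above). How GAPSSIF itself locates Reprof is not described here, so treat
this only as a convenience check; the function name is hypothetical.

```python
import shutil


def reprof_available(command: str = "reprof") -> bool:
    """Return True if `command` resolves to an executable on the PATH.

    Assumption: Reprof is installed as a command called "reprof".
    """
    return shutil.which(command) is not None
```

Calling reprof_available() returns True when the command is found and False otherwise.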

====================================================================================================
					  Method Description
====================================================================================================

GAPSSIF is a memetic algorithm for solving the protein secondary structure inverse folding (PSSIF)
problem. PSSIF is a simplified form of protein design: the task is to design an appropriate
amino acid sequence for an input secondary structure.

GAPSSIF uses native secondary sub-structures to solve the PSSIF problem. In essence, evolutionary
information guides the algorithm toward amino acid sequences that match the target secondary
structures. Furthermore, the designed sequences can be folded into tertiary structures that are
similar to their reference 3D structures.

GAPSSIF takes four steps to design an appropriate amino acid sequence for the input structure:

1. Loading the fragment repository.
2. Making a knowledge-based population using the fragment repository.
3. Enriching the knowledge-based population.
4. Searching through sequence space - this step runs for up to 50 iterations.

The algorithm terminates before reaching 50 iterations if it finds a sequence whose predicted
secondary structure is identical to the target structure.
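
The stopping rule can be illustrated with a short sketch. This is NOT the GAPSSIF search itself:
the memetic operators are not described in this document, so a plain single-residue hill climb
stands in for them, and toy_predict is a made-up predictor used in place of Reprof. Only the
50-iteration cap and the early exit on a perfect structural match come from the text above.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def agreement(predicted: str, target: str) -> float:
    """Percentage of positions where the predicted structure equals the target."""
    matches = sum(p == t for p, t in zip(predicted, target))
    return 100.0 * matches / len(target)


def toy_predict(sequence: str) -> str:
    """Hypothetical stand-in for Reprof: A -> H, V -> E, anything else -> C."""
    return "".join("H" if aa == "A" else "E" if aa == "V" else "C" for aa in sequence)


def search(target, population, predict=toy_predict, max_iterations=50, seed=0):
    """Hill-climbing stand-in for the search step, with the early exit.

    Returns the best sequence found and the number of iterations used.
    """
    rng = random.Random(seed)

    def score(sequence):
        return agreement(predict(sequence), target)

    best = max(population, key=score)
    iterations_used = 0
    # Stop at the iteration cap, or earlier on a perfect structural match.
    while iterations_used < max_iterations and score(best) < 100.0:
        position = rng.randrange(len(best))
        candidate = best[:position] + rng.choice(AMINO_ACIDS) + best[position + 1:]
        if score(candidate) >= score(best):
            best = candidate
        iterations_used += 1
    return best, iterations_used
```

With the toy predictor, search("HHEEC", ["AAVVG"]) returns immediately with zero iterations used,
because the starting sequence already reproduces the target structure.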

GAPSSIF can run without its repository, but the results differ greatly from those obtained in the
presence of evolutionary information.

GAPSSIF can be downloaded at http://bioinformatics.aut.ac.ir/GAPSSIF/

====================================================================================================
					    Project Folder
====================================================================================================

The project folder contains:

- a Linux executable file named 'GAPSSIF'.

- 306 .txt files which contain amino acid fragments. These fragments were obtained from an analysis
of PDB secondary structure files. The name of each file indicates the secondary structure and
length of the fragments it contains.

- 'GAPSSIF' also works in the absence of its repository (the 306 .txt files), but it will not
perform well.

- a 'Doc' folder which contains the software documentation.
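
Because the repository matters for result quality, a quick check that all fragment files are
present may help. The sketch below assumes the 306 .txt files sit directly in the package folder
next to the executable and that no other .txt files are stored at that level; both function names
are hypothetical.

```python
from pathlib import Path

EXPECTED_FRAGMENT_FILES = 306  # count stated in this documentation


def count_fragment_files(package_dir) -> int:
    """Count .txt files directly inside package_dir (sub-folders such as 'Doc' are ignored)."""
    return sum(1 for path in Path(package_dir).glob("*.txt") if path.is_file())


def repository_complete(package_dir) -> bool:
    """Return True if the folder holds exactly the expected number of fragment files."""
    return count_fragment_files(package_dir) == EXPECTED_FRAGMENT_FILES
```

For example, repository_complete(".") can be called from the extracted package directory.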

====================================================================================================
				      Installation and Samples
====================================================================================================

GAPSSIF is distributed as a Linux executable file and does not need any installation.

-- Download GAPSSIF-Package.tar.gz from http://bioinformatics.aut.ac.ir/GAPSSIF/;

-- Open a terminal and go to the directory containing the downloaded package using the "cd" command;

-- Type "tar -xvzf GAPSSIF-Package.tar.gz" and press Enter to extract the package.

====================================================================================================
					  Software Usage
====================================================================================================

GAPSSIF is run using only the "GAPSSIF" executable file.

Usage : ./GAPSSIF input_file output_file

** input_file is mandatory and contains the target secondary structure.
** output_file is optional; if no name is given, results are saved in 'report-input_file.txt'.
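
When GAPSSIF is called from a script, the same command line can be assembled programmatically.
The following is a minimal sketch; build_command and run_gapssif are hypothetical helper names,
and the default executable path "./GAPSSIF" assumes the script is started from the package
directory.

```python
import subprocess


def build_command(input_file, output_file=None, executable="./GAPSSIF"):
    """Build the argument list: the executable, the input file, and optionally the output file."""
    command = [executable, input_file]
    if output_file is not None:
        command.append(output_file)
    return command


def run_gapssif(input_file, output_file=None, executable="./GAPSSIF"):
    """Run GAPSSIF and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_command(input_file, output_file, executable), check=True)
```

Leaving output_file as None passes only the input file, so GAPSSIF chooses its default report name.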

Example:

-- Open a terminal;

-- Go to the extracted package directory using the "cd" command;

-- Change directory to the 'Doc' sub-folder;

-- Type "../GAPSSIF sample-input.txt sample-output.txt";

-- The following progress messages will be printed step by step:

    1/4 Loading Repository ...
    2/4 Making Knoweledge-based Population ...
    3/4 Enriching Knoweledge-based Population ...
    4/4 Searching Through Sequence Space ...

    1/50  2/50  3/50  4/50  5/50  6/50  7/50  8/50  9/50  10/50  11/50  12/50  13/50  14/50  15/50
    16/50  17/50  18/50  19/50  20/50  21/50  22/50  23/50  24/50  25/50  26/50  27/50  28/50  29/50
    30/50  31/50  32/50  33/50  34/50  35/50  36/50  37/50  38/50  39/50  40/50  41/50  42/50  43/50
    44/50  45/50  46/50  47/50  48/50  49/50  50/50
    
    Designing Process Finished.    

====================================================================================================
					  Input File Format
====================================================================================================

The input file is a FASTA-format file which contains only a secondary structure string.

Example:

>1GUT
CCCCCCEEEEEEEEEEECCCEEEEEEEECCCCEEEEEEEHHHHHHHCCCCCCEEEEECCHHHCEEEC

The input secondary structure should contain only: H --> Helix , E --> Beta Strand , C --> Coil

No other characters are accepted in the input secondary structure.

Note that beta strands of length 1 are converted to Coil.

'sample-input.txt' is an example of an input file.
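
The format rules above can be checked before running GAPSSIF. The sketch below parses a FASTA
text, rejects characters other than H, E and C, and mirrors the conversion of isolated E positions
to C. All function names are hypothetical, and how GAPSSIF reports invalid input is not documented
here, so the ValueError is an assumption of this sketch.

```python
import re

ALLOWED = set("HEC")


def parse_records(fasta_text: str):
    """Return a list of (name, structure) pairs from FASTA-formatted text."""
    records = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif records:
            records[-1][1] += line
        else:
            raise ValueError("structure line found before any '>' header")
    return [(name, structure) for name, structure in records]


def validate(structure: str) -> None:
    """Raise ValueError if the structure contains anything other than H, E or C."""
    bad = sorted(set(structure) - ALLOWED)
    if bad:
        raise ValueError("unsupported characters: " + ", ".join(bad))


def single_strands_to_coil(structure: str) -> str:
    """Replace every E that has no neighbouring E with C (beta strands of length 1)."""
    return re.sub(r"(?<!E)E(?!E)", "C", structure)
```

For instance, single_strands_to_coil("CCECCHHEEC") gives "CCCCCHHEEC": the lone E becomes C while
the two-residue strand is kept.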

====================================================================================================
					  Output File Description
====================================================================================================

Output files are written in the following format for each input secondary structure:

** 'total time' : the total time in seconds from the start of the first step to the end of the last step.

** 'Target Structure' : the input secondary structure.

** 'Designed Sequence' : the best designed amino acid sequence after all steps of the algorithm.

** 'Designed Sequence Predicted structure by Reprof' : the secondary structure of the designed
    sequence as predicted by Reprof.

** 'Q3 percent of designed sequence' : the accuracy of the designed sequence, measured by the Q3 value
    (the percentage of positions at which the predicted structure matches the target structure).

'sample-output.txt' is an example of the output format.
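
The Q3 value can be recomputed from the 'Target Structure' and predicted structure fields. The
sketch below uses the standard three-state definition of Q3 (percentage of positions that agree);
whether GAPSSIF rounds or formats the value differently is not stated here, and the function name
is hypothetical.

```python
def q3_percent(predicted: str, target: str) -> float:
    """Return the percentage of positions where predicted and target structures agree."""
    if len(predicted) != len(target):
        raise ValueError("structures must have the same length")
    if not target:
        raise ValueError("structures must not be empty")
    matches = sum(p == t for p, t in zip(predicted, target))
    return 100.0 * matches / len(target)
```

For example, q3_percent("HHCC", "HHEC") is 75.0, since three of the four positions match.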

====================================================================================================
					    Release Notes
====================================================================================================

Version 1.0 (2016)

Authors     :  M. Movahedi, F. Zare-Mirakabad, S. Arab
Description :  First release of the software
Comments    :  All 306 .txt files are required for reliable results.

Feel free to contact us via: f.zare@aut.ac.ir
====================================================================================================

Enjoy it :) !