============================================================

Satsuma version 3.1.0 (June 2014)

Software for analysis of large genomic data sets

Satsuma copyright (c) Manfred Grabherr, Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden

FFTReal copyright (c) Laurent de Soras

============================================================


Licensing

Spines is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.


1. Contents

IMPORTANT: the executables provided with the package require the gcc 4.6.0 runtime libraries. For all other gcc versions, you need to cleanly re-compile all executables on your system via

> make clean
> make


2. Supported Platforms

Satsuma is supported only on 64-bit Linux and has been tested on the SUSE and Ubuntu distributions. Note: while not actively supported or tested, the code also compiles and runs on Mac OS X 10.4.11 (Intel) with gcc 4.0.1 when built with ‘make clean UNSUPPORTED=yes’ followed by ‘make UNSUPPORTED=yes’.

NOTE: the makefile system requires csh to be installed.


3. Modules

- Satsuma: high-sensitivity alignments through cross-correlation.

- SatsumaSynteny: Satsuma in a battleship-style search framework.


4. References and credits

For Satsuma and SatsumaSynteny, please reference:

Grabherr MG, Russell P, Meyer M, Mauceli E, Alfoldi J, Di Palma F, Lindblad-Toh K. Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics. 2010 May 1;26(9):1145-51. Epub 2010 Mar 5.


5. Satsuma

Satsuma aligns two fasta sequences exhaustively. For a small example, see the script ./test_Satsuma which runs on small sequences provided with the distribution for testing purposes.

Command line arguments (and defaults):


-q<string> : query fasta sequence

-t<string> : target fasta sequence

-o<string> : output directory

-l<int> : minimum alignment length (def=0)

-t_chunk<int> : target chunk size (def=4096)

-q_chunk<int> : query chunk size (def=4096)

-n<int> : number of blocks (def=1)

-lsf<bool> : submit jobs to LSF (def=0)

-nosubmit<bool> : do not run jobs (def=0)

-nowait<bool> : do not wait for jobs (def=0)

-chain_only<bool> : only chain the matches (def=0)

-refine_only<bool> : only refine the matches (def=0)

-min_prob<double> : minimum probability to keep match (def=0.99999)

-proteins<bool> : align in protein space (def=0)

-cutoff<double> : signal cutoff (def=1.8)

-same_only<bool> : only align sequences that have the same name. (def=0)

-self<bool> : ignore self-matches. (def=0)


Note that Satsuma calls other executables (HomologyByXCorr, MergeXCorrMatches), and thus has to be invoked by either supplying the full path of the executable, or “./Satsuma” (see test_Satsuma).
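
For scripted runs, the full-path requirement above can be handled by a small wrapper. The following Python sketch builds such a command line. It is only an illustration under stated assumptions: that each flag and its value are passed as separate arguments (compare with test_Satsuma before relying on this), and the helper name build_satsuma_command and the example paths are hypothetical. Boolean flags are deliberately not handled, since this document does not state how they are passed.

```python
import os


def build_satsuma_command(install_dir, query, target, outdir, **options):
    """Return an argv list that invokes Satsuma by its full path.

    Assumption: a flag and its value are separate arguments,
    e.g. "-q", "query.fasta". Verify against test_Satsuma.
    """
    # Satsuma must be called with its full path so it can find its helpers.
    exe = os.path.join(os.path.abspath(install_dir), "Satsuma")
    argv = [exe, "-q", query, "-t", target, "-o", outdir]
    for name, value in options.items():
        if isinstance(value, bool):
            # How <bool> flags are passed is not documented here.
            raise ValueError("boolean flags are not handled by this sketch")
        argv += ["-" + name, str(value)]
    return argv


if __name__ == "__main__":
    # Hypothetical installation directory and file names.
    cmd = build_satsuma_command("/opt/satsuma", "query.fasta",
                                "target.fasta", "out", l=100)
    print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the argument format has been confirmed.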

Notes:


6. SatsumaSynteny

SatsumaSynteny aligns two fasta sequences syntenically, using a battleship-style search. For a small example, see the script ./test_SatsumaSynteny, which runs on sequences provided with the distribution for testing purposes.

Command line arguments (and defaults):


-q<string> : query fasta sequence

-t<string> : target fasta sequence

-o<string> : output directory

-l<int> : minimum alignment length (def=0)

-t_chunk<int> : target chunk size (def=4096)

-q_chunk<int> : query chunk size (def=4096)

-t_chunk_seed<int> : target chunk size (seed) (def=8192)

-q_chunk_seed<int> : query chunk size (seed) (def=8192)

-n<int> : number of blocks (def=1)

-ni<int> : number of initial search blocks (def=-1)

-lsf<bool> : submit jobs to LSF (def=0)

-nosubmit<bool> : do not run jobs (def=0)

-nowait<bool> : do not wait for jobs (def=0)

-chain_only<bool> : only chain the matches (def=0)

-refine_only<bool> : only refine the matches (def=0)

-min_prob<double> : minimum probability to keep match (def=0.99999)

-proteins<bool> : align in protein space (def=0)

-cutoff<double> : signal cutoff (def=1.8)

-cutoff<double> : signal cutoff (seed) (def=3)

-m<int> : number of jobs per block (def=8)

-resume<string> : resumes w/ the output of a previous run (xcorr*data) (def=)

-seed<string> : loads seeds and runs from there (xcorr*data) (def=)

-pixel<int> : number of blocks per pixel (def=24)

-nofilter<bool> : do not pre-filter seeds (slower runtime) (def=0)

-dups<bool> : allow for duplications in the query sequence (def=0)


Note that SatsumaSynteny calls other executables (FilterGridSeeds, HomologyByXCorr, HomologyByXCorrSlave, MergeXCorrMatches), and thus has to be invoked by either supplying the full path of the executable, or “./SatsumaSynteny” (see test_SatsumaSynteny).

Notes:


Parameter choice, execution and data preparation


7. Output files

Alignment coordinates:

<outdir>/satsuma_summary.out: all alignment coordinates (Satsuma only)

<outdir>/satsuma_summary.refined.out: final coordinates (Satsuma and SatsumaSynteny)


Contents (one column per field, in this order):

Target sequence name (provided by fasta)

First target base

Last target base

Query sequence name (provided by fasta)

First query base

Last query base

Identity

Orientation



EXAMPLE:

chrX 5947 6164 chrX 9153 9360 0.626728 +

chrX 6270 6452 chrX 9472 9654 0.576923 +


Note: spaces in fasta names are permissible for alignment, but all spaces are replaced with “_” in the output files.
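
The eight-column layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the Satsuma distribution: the names Alignment, parse_summary_line and read_summary are hypothetical, and it assumes the columns are separated by whitespace (which works because spaces in sequence names are replaced with “_”).

```python
from collections import namedtuple

# One record per alignment, in the column order listed above.
Alignment = namedtuple(
    "Alignment",
    "target target_first target_last "
    "query query_first query_last identity orientation",
)


def parse_summary_line(line):
    """Parse one line of a satsuma_summary file into an Alignment."""
    fields = line.split()  # assumption: whitespace-separated columns
    if len(fields) != 8:
        raise ValueError("expected 8 columns, got %d" % len(fields))
    t, t1, t2, q, q1, q2, ident, orient = fields
    return Alignment(t, int(t1), int(t2), q, int(q1), int(q2),
                     float(ident), orient)


def read_summary(path):
    """Read all non-empty lines of a summary file."""
    with open(path) as handle:
        return [parse_summary_line(line) for line in handle if line.strip()]


if __name__ == "__main__":
    aln = parse_summary_line("chrX 5947 6164 chrX 9153 9360 0.626728 +")
    print(aln.target, aln.target_first, aln.target_last, aln.identity)
```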

Other output:

<outdir>/MergeXCorrMatches.out: readable alignments (Satsuma only)

<outdir>/MergeXCorrMatches.refined.out: final readable alignments (Satsuma and SatsumaSynteny)


8. Visualization

Use ./MicroSyntenyPlot -i <satsuma_summary.txt> to create a PostScript dot plot (color-coded by target chromosome).

Use ./ChromosomePaint to create a PostScript file in which chromosomes are painted by color.


Use ./BlockDisplaySatsuma to create a file that can be shown in the interactive multi-level synteny browser MizBee.