Multivariate Analysis of Transcript Splicing (MATS)
Xing Lab, University of California, Los Angeles
Convert Bowtie Output to SAM file:
A user can use Bowtie to map sequences to the splice junction database that comes with MATS.
We constructed a database of splice junctions in human genes using the Ensembl transcript annotations (release 57). The database includes all known splice junctions observed in Ensembl transcripts, as well as hypothetical splice junctions obtained by all possible pair-wise fusions of exons within genes. In total, the database contains about 3.5 million splice junctions. Each splice junction sequence is 84 bp long, with 42 bp from the 3’ end of the upstream exon and 42 bp from the 5’ end of the downstream exon. An exon-exon junction annotation with a user-defined junction length can be created from here.
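The junction construction described above can be sketched in Python as follows. This is only an illustration, not the code used to build the MATS database: the function names, the exon names (E1, E2, E3) and the toy sequences are hypothetical. It also assumes that exons are listed in 5' to 3' order within a gene, and that an exon shorter than the flank length contributes all of its bases.

```python
from itertools import combinations


def junction_sequence(upstream_exon, downstream_exon, flank=42):
    """Join the 3' end of the upstream exon to the 5' end of the downstream exon.

    Returns (sequence, bases_from_upstream, bases_from_downstream).
    Exons shorter than `flank` contribute their full length (an assumption).
    """
    if flank <= 0:
        raise ValueError("flank must be positive")
    left = upstream_exon[-flank:]    # last `flank` bases of the upstream exon
    right = downstream_exon[:flank]  # first `flank` bases of the downstream exon
    return left + right, len(left), len(right)


def all_pairwise_junctions(exons, flank=42):
    """Build every upstream/downstream exon pair within one gene.

    `exons` is a list of (name, sequence) tuples in 5' to 3' order.
    """
    junctions = {}
    for (up_name, up_seq), (down_name, down_seq) in combinations(exons, 2):
        junctions[(up_name, down_name)] = junction_sequence(up_seq, down_seq, flank)
    return junctions


# Toy gene with three exons; E2 is shorter than the 42 bp flank.
exons = [("E1", "A" * 50), ("E2", "C" * 30), ("E3", "G" * 60)]
junctions = all_pairwise_junctions(exons)
```

A gene with n exons yields n(n-1)/2 junctions under this scheme. The number of bases actually taken from each side is returned alongside the sequence, because a junction that involves a short exon is shorter than 84 bp.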
Once the sequences have been mapped to the splice junction database, run a command of the following form to generate SAM files.
python makeSamFromBowtieOut.py outputFromBowtie outFile.sam readLength junctionLength shortExonFileForJunction
For example:
test.ESRP.junction.bowtie.out is available here.
shortExonFileForJunction (junctions.Ensembl.r57.84nt.fasta)
records the number of base pairs taken from each flanking exon.
makeSamFromBowtieOut.py uses this information to compute the correct
coordinates for the mapped reads.
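To show why the per-exon flank lengths matter, here is a minimal sketch of how a read aligned to a junction sequence could be placed on the genome. It is not the actual logic of makeSamFromBowtieOut.py. The function name and all parameters are hypothetical, and the sketch assumes a forward-strand gene, an ungapped alignment, a 0-based offset into the junction sequence, and 1-based genomic exon coordinates.

```python
def junction_read_to_genome(offset, read_length, up_flank,
                            up_exon_end, down_exon_start):
    """Convert a junction-space alignment to a SAM-style position and CIGAR.

    offset          -- 0-based start of the read within the junction sequence
    read_length     -- length of the read in bases
    up_flank        -- bases of the junction sequence taken from the upstream exon
    up_exon_end     -- 1-based genomic coordinate of the upstream exon's last base
    down_exon_start -- 1-based genomic coordinate of the downstream exon's first base
    """
    left = up_flank - offset      # read bases that fall on the upstream exon
    right = read_length - left    # read bases that fall on the downstream exon
    if left <= 0 or right <= 0:
        raise ValueError("read does not span the junction")
    pos = up_exon_end - left + 1                 # 1-based leftmost genomic position
    intron = down_exon_start - up_exon_end - 1   # skipped bases between the exons
    cigar = "%dM%dN%dM" % (left, intron, right)
    return pos, cigar


# A 32 bp read starting at offset 30 of a junction with a 42 bp upstream flank.
pos, cigar = junction_read_to_genome(30, 32, 42, 1000, 2001)
```

In this example, 12 bases of the read lie on the upstream exon and 20 on the downstream exon, so the read starts at position 989 and skips the 1000 intronic bases. If the upstream exon had contributed fewer than 42 bases, using 42 for `up_flank` would shift both the position and the split point, which is why the flank lengths need to be known for each junction.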
These SAM files can then be converted to MATS input using another Python
program called “convertSamToMATSInput.py”.