Multivariate Analysis of Transcript Splicing (MATS)
Xing Lab, University of California, Los Angeles
Convert SAM file to input data file:
MATS includes a program, "convertSamToMATSInput.py", that converts SAM format output from an aligner such as TopHat into the MATS input file format. From the command line, run convertSamToMATSInput.py in the following format:
python convertSamToMATSInput.py genesAndExons sample_1.sam sample_2.sam <single-end or paired-end> sample_1_name sample_2_name readLength junctionLength outDir
For example:
Input Parameters:
genesAndExons: file describing genes and their exons from human Ensembl release 57 and UCSC Known Genes (hg19) or mouse Ensembl release 65 (mm9)
sample_1.sam and sample_2.sam: SAM output files from an aligner such as TopHat
single-end or paired-end: SE for single-end data, PE for paired-end data
sample_1_name: name for sample 1 (e.g., ESRP)
sample_2_name: name for sample 2 (e.g., EV)
readLength: original length of the RNA-Seq reads
junctionLength: length of the junction annotation used to map reads. If no specific junction annotation is used, use (readLength*2 - 1)
outDir: output directory
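As a rough sketch of how the parameters above fit together, the following Python helper assembles the argument list in the documented order and falls back to (readLength*2 - 1) when no junction length is given. The helper name build_convert_command and all file names in the example call are hypothetical placeholders, not part of MATS.

```python
def build_convert_command(genes_and_exons, sam_1, sam_2, read_type,
                          name_1, name_2, read_length, out_dir,
                          junction_length=None):
    """Return the argument list for convertSamToMATSInput.py.

    Hypothetical helper: it only mirrors the documented argument order.
    """
    if read_type not in ("SE", "PE"):
        raise ValueError("read_type must be 'SE' or 'PE'")
    if junction_length is None:
        # Documented default when no specific junction annotation is used.
        junction_length = read_length * 2 - 1
    return [
        "python", "convertSamToMATSInput.py",
        genes_and_exons, sam_1, sam_2, read_type,
        name_1, name_2,
        str(read_length), str(junction_length),
        out_dir,
    ]


if __name__ == "__main__":
    # Placeholder file names; substitute your own paths.
    cmd = build_convert_command(
        "genesAndExons.txt", "sample_1.sam", "sample_2.sam", "PE",
        "ESRP", "EV", read_length=50, out_dir="outDir",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run if you prefer to launch the conversion from a script rather than typing the command by hand.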
Output:
1) Exon skipping events file and input file for exon skipping events
The convertSamToMATSInput.py program creates an exon skipping events file (exonSkipping.txt) and a MATS input file for exon skipping events (MATS.input.exonSkipping.txt).
The exonSkipping.txt file is a tab-separated file with a header line and the following columns:
id: ID for an exon
gene_ID: gene ID
geneSymbol: associated gene symbol
eventType: type of splicing event ("ExonSkipping")
chr: chromosome
strand: 1 for positive, -1 for negative
cassetteExonStart: start of the cassette exon (0-based)
cassetteExonEnd: end of the cassette exon (1-based)
upsteamExonStart: start of the upstream exon (0-based)
upstreamExonEnd: end of the upstream exon (1-based)
downsteamExonStart: start of the downstream exon (0-based)
downstreamExonEnd: end of the downstream exon (1-based)
upstreamJunctionCount_sample1: Upstream junction count for first sample
downstreamJunctionCount_sample1: Downstream junction count for first sample
skippingJunctionCount_sample1: Skipping junction count for first sample
upstreamJunctionCount_sample2: Upstream junction count for second sample
downstreamJunctionCount_sample2: Downstream junction count for second sample
skippingJunctionCount_sample2: Skipping junction count for second sample
MATS.input.exonSkipping.txt is the MATS input file for exon skipping events
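The column layout of exonSkipping.txt can be read with Python's csv module, as in the minimal sketch below. Because starts are 0-based and ends are 1-based, an exon's length is simply end minus start. The function name read_exon_skipping and the example row are made up for illustration; the column names are copied verbatim from the list above, and the script assumes the coordinates are written as plain integers.

```python
import csv
import io

# Column names copied verbatim from the documented header.
EXON_SKIPPING_COLUMNS = [
    "id", "gene_ID", "geneSymbol", "eventType", "chr", "strand",
    "cassetteExonStart", "cassetteExonEnd",
    "upsteamExonStart", "upstreamExonEnd",
    "downsteamExonStart", "downstreamExonEnd",
    "upstreamJunctionCount_sample1", "downstreamJunctionCount_sample1",
    "skippingJunctionCount_sample1",
    "upstreamJunctionCount_sample2", "downstreamJunctionCount_sample2",
    "skippingJunctionCount_sample2",
]


def read_exon_skipping(handle):
    """Read tab-separated exon skipping events and add the cassette exon length."""
    events = []
    for row in csv.DictReader(handle, delimiter="\t"):
        start = int(row["cassetteExonStart"])  # 0-based start
        end = int(row["cassetteExonEnd"])      # 1-based end
        row["cassetteExonLength"] = end - start
        events.append(row)
    return events


# Made-up row for illustration only; real files come from convertSamToMATSInput.py.
_example_values = [
    "1", "GENE_A", "SYMBOL_A", "ExonSkipping", "chr1", "1",
    "1000", "1100", "500", "600", "2000", "2100",
    "10", "12", "3", "4", "5", "20",
]
EXAMPLE_TEXT = ("\t".join(EXON_SKIPPING_COLUMNS) + "\n"
                + "\t".join(_example_values) + "\n")

events = read_exon_skipping(io.StringIO(EXAMPLE_TEXT))
print(events[0]["geneSymbol"], events[0]["cassetteExonLength"])
```

To read a real file, replace the io.StringIO object with open("exonSkipping.txt", newline="").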
2) Alt 5/3 ss events file and input file for alt 5/3 ss events
The convertSamToMATSInput.py program creates an alternative 5'/3' splice site (alt 5/3 ss) events file (altSS.txt) and a MATS input file for alt 5/3 ss events (MATS.input.altSS.txt).
The altSS.txt file is a tab-separated file with a header line and the following columns:
id: ID for an exon
gene_ID: gene ID
geneSymbol: associated gene symbol
eventType: type of splicing event ("Alt3Prime" or "Alt5Prime")
chr: chromosome
strand: 1 for positive, -1 for negative
longExonStart: start of the long exon (0-based)
longExonEnd: end of the long exon (1-based)
shortExonStart: start of the short exon (0-based)
shortExonEnd: end of the short exon (1-based)
flankingExonStart: start of the flanking exon (0-based)
flankingExonEnd: end of the flanking exon (1-based)
inclusionJunctionCount_sample1: Inclusion junction count for first sample
skippingJunctionCount_sample1: Skipping junction count for first sample
inclusionJunctionCount_sample2: Inclusion junction count for second sample
skippingJunctionCount_sample2: Skipping junction count for second sample
MATS.input.altSS.txt is the MATS input file for alt 5/3 splice site events
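As a quick sanity check on altSS.txt, the sketch below tallies the number of events and the total junction counts for each eventType. This is only a summary of the raw columns, not the MATS statistical model. The function name summarize_alt_ss and the three example rows are made up for illustration, and the script assumes the junction counts are written as integers.

```python
import csv
import io
from collections import defaultdict

# Column names copied verbatim from the documented header.
ALT_SS_COLUMNS = [
    "id", "gene_ID", "geneSymbol", "eventType", "chr", "strand",
    "longExonStart", "longExonEnd",
    "shortExonStart", "shortExonEnd",
    "flankingExonStart", "flankingExonEnd",
    "inclusionJunctionCount_sample1", "skippingJunctionCount_sample1",
    "inclusionJunctionCount_sample2", "skippingJunctionCount_sample2",
]
COUNT_COLUMNS = ALT_SS_COLUMNS[-4:]


def summarize_alt_ss(handle):
    """Return {eventType: {"events": n, <count column>: total, ...}}."""
    totals = defaultdict(lambda: dict.fromkeys(["events"] + COUNT_COLUMNS, 0))
    for row in csv.DictReader(handle, delimiter="\t"):
        entry = totals[row["eventType"]]
        entry["events"] += 1
        for column in COUNT_COLUMNS:
            entry[column] += int(row[column])  # assumes integer counts
    return dict(totals)


# Made-up rows for illustration only.
_example_rows = [
    ["1", "GENE_A", "SYMBOL_A", "Alt3Prime", "chr1", "1",
     "1000", "1200", "1050", "1200", "500", "600", "8", "2", "3", "7"],
    ["2", "GENE_B", "SYMBOL_B", "Alt5Prime", "chr2", "1",
     "3000", "3150", "3000", "3100", "4000", "4100", "5", "5", "1", "9"],
    ["3", "GENE_C", "SYMBOL_C", "Alt3Prime", "chr3", "1",
     "700", "900", "760", "900", "100", "200", "4", "6", "2", "0"],
]
EXAMPLE_TEXT = "\n".join(
    "\t".join(fields) for fields in [ALT_SS_COLUMNS] + _example_rows
) + "\n"

summary = summarize_alt_ss(io.StringIO(EXAMPLE_TEXT))
for event_type, entry in sorted(summary.items()):
    print(event_type, entry)
```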
Test Run:
You can type the following from the command line for a test run. All necessary data files and a test script are included here.
Note:
convertSamToMATSInput.py currently only supports SAM alignments with the CIGAR match ('M') and reference skip ('N') operations. Support for insertions, deletions, and other operations will be added in the future.
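To see in advance which alignments fall within this restriction, a SAM file can be pre-screened with a short script like the sketch below. It is not part of MATS; the function names are hypothetical. It relies only on the SAM format itself: header lines start with "@", the CIGAR string is the sixth tab-separated field, and the defined CIGAR operations are M, I, D, N, S, H, P, = and X.

```python
import re

# One CIGAR element: a length followed by one of the operations defined by SAM.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    elements = _CIGAR_ELEMENT.findall(cigar)
    # Reject strings with characters the pattern did not consume.
    if "".join(length + op for length, op in elements) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return [(int(length), op) for length, op in elements]


def uses_only_match_and_skip(cigar):
    """True if the CIGAR contains only 'M' and 'N' operations."""
    if not cigar or cigar == "*":  # "*" means no CIGAR is available
        return False
    return all(op in "MN" for _, op in parse_cigar(cigar))


def supported_sam_lines(lines):
    """Yield alignment lines whose CIGAR uses only 'M' and 'N'."""
    for line in lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 5 and uses_only_match_and_skip(fields[5]):
            yield line


if __name__ == "__main__":
    for cigar in ["50M", "20M500N30M", "20M1I29M", "5S45M"]:
        print(cigar, uses_only_match_and_skip(cigar))
```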
Convert Bowtie Output to SAM file: