Multivariate Analysis of Transcript Splicing (MATS)
Xing Lab, University of California, Los Angeles
Install MATS:
cd
tar -xvzf MATS.2.0.0.tgz
cd ~/bowtieIndexes
tar -xvzf bowtieIndexes.tgz
Test MATS:
Run testRun.sh as shown below to check that MATS runs properly.
cd ~/MATS.2.0.0
./testRun.sh ~/bowtieIndexes/hg19
The output can be found in the out_test directory. The test run output should match the MATS output description.
Trim Fastq (Optional):
To trim the poor quality 3' end of reads, use the trimFastq.py script found in the bin directory.
python trimFastq.py input.fastq trimmed.fastq desired_length
cd ~/MATS.2.0.0/
python bin/trimFastq.py testData/231ESRP.250K.r1.fastq testData/trimmed.fastq 32
The above command trims the reads in 231ESRP.250K.r1.fastq to 32 bp by removing bases from the 3' end of each read, and saves the result to trimmed.fastq.
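The trimming step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of trimFastq.py: the function name trim_fastq_records is hypothetical, and the sketch assumes plain four-line FASTQ records (header, sequence, "+", quality) with reads written 5' to 3', so that trimming the 3' end means keeping the first N characters.

```python
def trim_fastq_records(lines, length):
    """Yield FASTQ lines with sequence and quality truncated to `length`.

    Hypothetical helper for illustration; assumes 4-line FASTQ records.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Within each 4-line record, line 2 is the sequence and line 4 is
        # the quality string; both are cut to the same length so they stay
        # in sync. Header and "+" lines pass through unchanged.
        if i % 4 in (1, 3):
            line = line[:length]
        yield line


if __name__ == "__main__":
    record = ["@read1\n", "ACGTACGTAC\n", "+\n", "IIIIIIIIII\n"]
    for out_line in trim_fastq_records(record, 4):
        print(out_line)
```

Reads that are already shorter than the requested length are left as they are in this sketch.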
Alternative Splicing Events:
MATS analyzes skipped exon (SE), alternative 5' splice site (A5SS), alternative 3' splice site (A3SS), mutually exclusive exons (MXE), and retained intron (RI) events. Possible alternative splicing events are identified from the RNA-Seq data and an annotation of transcripts in GTF format. Provided GTF files, such as Homo_sapiens.Ensembl.GRCh37.65.gtf, are found in the gtf directory.
Alternatively, you can download your own transcript annotation in GTF format. However, the first column (chromosome/contig name) in the GTF must match the sequence names in your bowtie index. Use bowtie-inspect (found in the bowtie directory) to display sequence names for the bowtie index.
bowtie-inspect --names your_bowtie_index
Only use a GTF whose chromosome/contig names (first column) match the names printed by the above command.
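The name check described above can be automated. The following is a minimal sketch, assuming the output of bowtie-inspect --names has been saved with one sequence name per line; the helper names gtf_seqnames and unmatched_seqnames are hypothetical and are not part of MATS or bowtie.

```python
def gtf_seqnames(gtf_lines):
    """Collect the distinct first-column (chromosome/contig) names of a GTF."""
    names = set()
    for line in gtf_lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        # GTF is tab-separated; the first field is the sequence name.
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_seqnames(gtf_lines, index_names):
    """Return GTF sequence names that are absent from the bowtie index names."""
    index_set = {name.strip() for name in index_names if name.strip()}
    return sorted(gtf_seqnames(gtf_lines) - index_set)


if __name__ == "__main__":
    gtf = [
        "#!genome-build example\n",
        'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
        '2\tsrc\texon\t300\t400\t.\t-\t.\tgene_id "g2"; transcript_id "t2";\n',
    ]
    index_names = ["chr1\n", "chr2\n"]
    # Any name listed here appears in the GTF but not in the index.
    print(unmatched_seqnames(gtf, index_names))
```

An empty result means that every sequence name in the GTF also appears in the index. A typical mismatch is "2" in the GTF versus "chr2" in the index.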
Using MATS:
The following is a detailed description of the options used with MATS.
Usage:
python run_MATS.py -s1 reads1_1[,reads1_2] -s2 reads2_1[,reads2_2] -gtf gtfFile -bi bowtieIndexBase -o outDir [options]*
Required Parameters:
-s1 reads1_1[,reads1_2] | FASTQ file(s) for sample_1. For paired-end data, the two files must be given as a comma-separated list.
-s2 reads2_1[,reads2_2] | FASTQ file(s) for sample_2. For paired-end data, the two files must be given as a comma-separated list.
-gtf gtfFile | An annotation of genes and transcripts in GTF format.
-bi bowtieIndexBase | The basename of the bowtie indexes (ebwt files). The basename is the part of the file name before the first period. For example, use hg19 for hg19.1.ebwt.
-o outDir | The output directory.
Optional:
-a <int> | The "anchor length" used in TopHat. At least this many nucleotides must be mapped to each end of a given junction. The default is 8.
-r1 <float> | The insert size of the sample_1 data. This applies only to paired-end data. The default is 15.
-r2 <float> | The insert size of the sample_2 data. This applies only to paired-end data. The default is 15.
-sd1 <float> | The standard deviation for r1. The default is 70.
-sd2 <float> | The standard deviation for r2. The default is 70.
-c <float> | The cutoff splicing difference used in the null hypothesis test for differential splicing. The default is 0.05, for a 5% difference. Valid: 0 ≤ cutoff < 1.
Example:
python run_MATS.py -s1 testData/231ESRP.250K.r1.fastq,testData/231ESRP.250K.r2.fastq -s2 testData/231EV.250K.r1.fastq,testData/231EV.250K.r2.fastq -gtf gtf/Homo_sapiens.Ensembl.GRCh37.65.gtf -bi ~/bowtieIndexes/hg19 -o out_test -a 8 -r1 72 -sd1 40 -r2 70 -sd2 48 -c 0.05
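When MATS is run for many sample pairs, it can help to assemble the command line in a script. The sketch below is only an illustration: build_mats_command is a hypothetical helper, not part of MATS. It uses only the flags documented above, and checks the documented valid range of the cutoff (0 ≤ cutoff < 1) before building the argument list.

```python
def build_mats_command(s1, s2, gtf, bowtie_index, out_dir, cutoff=0.05):
    """Build the run_MATS.py argument list from the documented flags.

    s1 and s2 are lists holding one FASTQ path (single-end) or two
    FASTQ paths (paired-end) for sample_1 and sample_2.
    """
    if not 0 <= cutoff < 1:
        raise ValueError("cutoff must satisfy 0 <= cutoff < 1")
    for files in (s1, s2):
        if len(files) not in (1, 2):
            raise ValueError("each sample takes one or two FASTQ files")
    return [
        "python", "run_MATS.py",
        "-s1", ",".join(s1),   # paired-end files are comma-separated
        "-s2", ",".join(s2),
        "-gtf", gtf,
        "-bi", bowtie_index,
        "-o", out_dir,
        "-c", str(cutoff),
    ]


if __name__ == "__main__":
    cmd = build_mats_command(
        ["testData/231ESRP.250K.r1.fastq", "testData/231ESRP.250K.r2.fastq"],
        ["testData/231EV.250K.r1.fastq", "testData/231EV.250K.r2.fastq"],
        "gtf/Homo_sapiens.Ensembl.GRCh37.65.gtf",
        "bowtieIndexes/hg19",
        "out_test",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run from the MATS directory. The other optional flags (-a, -r1, -r2, -sd1, -sd2) could be appended in the same way.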
Output:
All output files are written to the output directory given with -o (outDir).