Multivariate Analysis of Transcript Splicing (MATS)

Xing Lab, University of California, Los Angeles

Install MATS:

Install Python 2.6 or above and corresponding versions of NumPy and SciPy

Add the Python 2.6 directory to the $PATH environment variable

Add the NumPy and SciPy directories to the $PYTHONPATH environment variable

OPTIONAL: If using the Bayesian method in MATS P value calculation, then install JAGS (Just Another Gibbs Sampler)

OPTIONAL: Add the JAGS directory to the $PATH environment variable

OPTIONAL: If using createJunctionAnnotation.sh then install PYGR (Python Graph Database)

Unzip MATS1.2.0.tar.gz into a folder

Run MATS:

Run MATS from the folder where MATS was unzipped.

./MATS.sh [-d InputDataFile] [-o OutputFolder] [-c Cutoff_Splicing_Difference(default: 0.1)]
[-t Null_Hypothesis_Type(default: 1)] [-p Number_of_Processors(default: 1)] [-m Statistical_Method(default: B)] [-s Max_Simulation_Precision(default: 7)]

-c Cutoff_Splicing_Difference. The cutoff used in the null hypothesis test for differential splicing or switch-like difference.

-t Null_Hypothesis_Type. 1: null hypothesis for differential splicing (|InclusionLevel1-InclusionLevel2|<=cutoff). 2: null hypothesis for switch-like difference (outside of the region (InclusionLevel1<=cutoff and InclusionLevel2>=1-cutoff) or (InclusionLevel1>=1-cutoff and InclusionLevel2<=cutoff)).

-p Number_of_Processors. MATS can run multiple instances in parallel. This parameter specifies the maximum number of processors MATS will use.

-m Statistical_Method. B: MATS will use the Bayesian method. L: MATS will use the likelihood-ratio test, which is ~100x faster than the Bayesian method.

-s Max_Simulation_Precision. This parameter is disabled for the likelihood-ratio test. It sets the maximum precision that MATS can reach for the P values through simulations. For example, with -s 7 MATS can reach a precision of at most 10^-7.

Run the test dataset with a splicing difference cutoff of 10%, testing the null hypothesis of differential splicing |InclusionLevel1-InclusionLevel2|<=10%, using one processor, the Bayesian method and a maximum simulation precision of 10^-7 (-s 7):
./MATS.sh -d test_data.txt -o test -c 0.1 -t 1 -p 1 -m B -s 7
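
The two null hypothesis regions selected by -t can be written out explicitly. The following is a minimal sketch in Python that only restates the inequalities given above; the function name in_null_region is hypothetical and is not part of MATS, and the sketch does not reproduce the statistical test itself.

```python
def in_null_region(level1, level2, cutoff=0.1, hypothesis_type=1):
    """Return True if a pair of inclusion levels lies inside the null region.

    Hypothetical helper for illustration only; it restates the -t and -c
    definitions and does not perform any MATS calculation.
    """
    if hypothesis_type == 1:
        # Type 1 null: the inclusion levels differ by no more than the cutoff.
        return abs(level1 - level2) <= cutoff
    if hypothesis_type == 2:
        # Type 2 null: the pair is outside both switch-like corners.
        low_to_high = level1 <= cutoff and level2 >= 1 - cutoff
        high_to_low = level1 >= 1 - cutoff and level2 <= cutoff
        return not (low_to_high or high_to_low)
    raise ValueError("hypothesis_type must be 1 or 2")


# Inclusion levels 0.88 and 0.03 differ by more than 0.1,
# so they fall outside the type 1 null region.
print(in_null_region(0.88, 0.03, cutoff=0.1, hypothesis_type=1))
```

Note that the same pair is still inside the type 2 null region at a cutoff of 0.1, because 0.88 is below 1-cutoff=0.9 and therefore not in a switch-like corner.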

Input Data File Format:

Columns are TAB delimited. The first line of the input file must be a header line.

id	Inclusion Junction Count Sample1	Skipping Junction Count Sample1	Inclusion Junction Count Sample2	Skipping Junction Count Sample2
1	43	6	2	59
2	59	49	281	297
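
Before running MATS it can be useful to check that a file matches this layout. The following is a minimal sketch, assuming Python 3 and exactly five TAB-delimited columns with integer counts as in the example above; the function name read_mats_input is hypothetical and is not part of the MATS download.

```python
import csv
import io


def read_mats_input(handle):
    """Parse a MATS-style input file into (id, I1, S1, I2, S2) tuples.

    Hypothetical helper: assumes a header line followed by rows of one id
    and four integer junction counts, all TAB delimited.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # the first line is the header, not data
    if len(header) != 5:
        raise ValueError("expected 5 header columns, got %d" % len(header))
    rows = []
    for line_number, fields in enumerate(reader, start=2):
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            raise ValueError("line %d: expected 5 columns" % line_number)
        counts = tuple(int(value) for value in fields[1:])
        rows.append((fields[0],) + counts)
    return rows


# The example rows shown above, held in memory instead of a file.
EXAMPLE = (
    "id\tInclusion Junction Count Sample1\tSkipping Junction Count Sample1\t"
    "Inclusion Junction Count Sample2\tSkipping Junction Count Sample2\n"
    "1\t43\t6\t2\t59\n"
    "2\t59\t49\t281\t297\n"
)
rows = read_mats_input(io.StringIO(EXAMPLE))
print(rows)
```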

For example, when sample 1 is the ESRP1 over-expression sample and sample 2 is the EV sample, the first data row of the file corresponds to the figure below. The inclusion junction count in sample 1 is summarized as round((38+47)/2)=43. The skipping junction count in sample 1 is 6, as illustrated in the figure.
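
The rounding step needs care when reproduced in code. The following is a minimal sketch of the summary above; the function name is hypothetical, and rounding halves up is an assumption chosen because it reproduces the documented value of 43. In Python 3 the built-in round() rounds halves to the nearest even integer, so round(42.5) gives 42 rather than 43.

```python
import math


def summarize_inclusion_count(first_junction_count, second_junction_count):
    """Average the two inclusion junction counts, rounding halves up.

    Hypothetical helper; floor(x + 0.5) is used instead of round() so that
    the result does not depend on the Python version.
    """
    mean = (first_junction_count + second_junction_count) / 2.0
    return int(math.floor(mean + 0.5))


# The two inclusion junction counts from the example (38 and 47).
print(summarize_inclusion_count(38, 47))
```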

Interpret the MATS result:

The output file is a TAB delimited file named 'MATS_result.txt', which contains the following columns:

id	Inclusion Junction Count Sample1	Skipping Junction Count Sample1	Inclusion Junction Count Sample2	Skipping Junction Count Sample2	Inclusion Level Sample1	Inclusion Level Sample2	Posterior_Probability	Posterior_P_Value	FDR
1	43	6	2	59	0.88	0.03	1.0	0.0	0.0
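
The inclusion level columns in this example row are consistent with the ratio of the inclusion count to the sum of the inclusion and skipping counts. The following sketch assumes that formula; it is inferred from the example row only and is not confirmed here as the exact calculation used inside MATS. The function name is hypothetical.

```python
def inclusion_level(inclusion_count, skipping_count):
    """Estimate an inclusion level as I / (I + S).

    Assumed formula, inferred from the example output row above.
    """
    total = inclusion_count + skipping_count
    if total == 0:
        raise ValueError("no junction reads for this event")
    return inclusion_count / float(total)


# Counts from the example row: sample 1 (43, 6) and sample 2 (2, 59).
print(round(inclusion_level(43, 6), 2), round(inclusion_level(2, 59), 2))
```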

The last 3 columns of the output contain statistics for splicing difference.

Posterior_Probability: posterior probability that the splicing difference is no less than the cutoff (default: 10% splicing difference). A high posterior probability indicates a high probability of splicing difference.

Posterior_P_Value: posterior P value of the splicing difference. A small posterior P value indicates high statistical significance of the splicing difference.

FDR: false discovery rate of the splicing difference. A small FDR indicates a low expected proportion of false discoveries among the exons called significant.
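
A result file in this layout can be filtered with a few lines of Python. The following is a minimal sketch, assuming Python 3 and the column names shown in the header above; the function name significant_events and the FDR threshold of 0.05 are illustrative assumptions, not MATS defaults.

```python
import csv
import io


def significant_events(handle, max_fdr=0.05):
    """Return the ids of events whose FDR is at or below max_fdr.

    Hypothetical helper; reads a TAB-delimited table with at least
    the 'id' and 'FDR' columns of MATS_result.txt.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["id"] for row in reader if float(row["FDR"]) <= max_fdr]


# The header and example row shown above, held in memory instead of a file.
EXAMPLE_RESULT = (
    "id\tInclusion Junction Count Sample1\tSkipping Junction Count Sample1\t"
    "Inclusion Junction Count Sample2\tSkipping Junction Count Sample2\t"
    "Inclusion Level Sample1\tInclusion Level Sample2\t"
    "Posterior_Probability\tPosterior_P_Value\tFDR\n"
    "1\t43\t6\t2\t59\t0.88\t0.03\t1.0\t0.0\t0.0\n"
)
print(significant_events(io.StringIO(EXAMPLE_RESULT)))
```

To read a real result file, open it in text mode and pass the file object in place of the io.StringIO object.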

Convert SAM file to input data file:

MATS includes a program, "convertSamToMATSInput.py", that you can use to convert SAM format output from an aligner such as TopHat to the MATS input file format.

Here is a detailed guide about the converting tool for SAM to MATS input. The converting tool is included in the MATS download.

Convert Bowtie Output to SAM file:

We also support the Bowtie output of RNA-Seq reads mapped to annotated junctions.

Here is a detailed guide about the converting tool for Bowtie output to SAM. The converting tool is included in the MATS download.

ESRP Dataset

The FASTQ files for the ESRP dataset are available here: ESRP1 sample and EV sample.

The junction annotation is available for Ensembl release 57 and UCSC Known Genes (hg19) in FASTA format. For instructions on making a bowtie index for the junction annotation, consult the bowtie manual.

The Bowtie output of the FASTQ files can be used to generate the results of our manuscript.

Junction annotation with a user-defined junction length can also be created by following the instructions here.

Example MATS Pipeline

################################################
### Step 1. Convert bowtie output to SAM format
################################################
#
# If you use an aligner (other than bowtie) that creates SAM format, you can skip this step and go to Step 2.
# Since bowtie needs to align reads to the genome and to the junctions separately, the bowtie outputs must be combined before proceeding to Step 2.
#
# This example pipeline shows how to process 50bp reads with 84bp junction length.
# Since different junction lengths may be needed for different read lengths,
# a junction annotation file (fasta) with a user-defined junction length can be created using the createJunctionAnnotation.sh script that comes with MATS.
# For example, if a user has 32bp reads and wants to use 54bp junction length (27bp from each end of junction),
# run the following line
#
./createJunctionAnnotation.sh -d Ensembl -o Ensembl_54 -j 27 -p /Path/to/pygr/hg19/
#
# It will create junctions.Ensembl.54nt.fasta and bowtieIndex files (in *.ebwt format) in the output directory
# A user can use the resulting bowtieIndex for bowtie mapping and use fasta junction annotation file for the following process
#
# bowtie can align paired-end reads using the -1 and -2 options for pairs of fastq files
# for more information about bowtie mapping, consult the bowtie manual page
# http://bowtie-bio.sourceforge.net/manual.shtm
#
### Create a sam file from genome mapping of each sample.
# bowtie output can be either single-end or paired-end
python makeSamFromBowtieOut.py ESRP.hg19.bowtie.out ESRP.hg19.sam 50 84
python makeSamFromBowtieOut.py EV.hg19.bowtie.out EV.hg19.sam 50 84
 
### Create a sam file from junction mapping of each sample.
# junctions.Ensembl.r57.84nt.fasta is downloadable
python makeSamFromBowtieOut.py ESRP.junction.bowtie.out ESRP.junction.sam 50 84 junctions.Ensembl.r57.84nt.fasta
python makeSamFromBowtieOut.py EV.junction.bowtie.out EV.junction.sam 50 84 junctions.Ensembl.r57.84nt.fasta
#
# If you created an annotation file (fasta) with a user-defined junction length using the createJunctionAnnotation.sh script, use the following:
#
python makeSamFromBowtieOut.py ESRP.junction.bowtie.out ESRP.junction.sam 32 54 junctions.Ensembl.54nt.fasta
python makeSamFromBowtieOut.py EV.junction.bowtie.out EV.junction.sam 32 54 junctions.Ensembl.54nt.fasta
#
### combine genome and junction sam files
cat ESRP.hg19.sam ESRP.junction.sam > ESRP_SE.sam
cat EV.hg19.sam EV.junction.sam > EV_SE.sam
 
#############################################################
### Step 2. make MATS input file for AS events
#############################################################
#
# make MATS input files for exon skipping events and alternative 5/3 splice site events from the sam files
# sam files are from Step 1 (using bowtie only) or from an aligner that supports sam format, such as TopHat or SpliceMap
#
python convertSamToMATSInput.py genesAndExons.Ensembl.r57.txt ESRP_SE.sam EV_SE.sam SE ESRP EV 50 84 output
#
# It will generate the following files in the output directory.
# exonSkipping.txt and MATS.input.exonSkipping.txt for exon skipping events
# altSS.txt and MATS.input.altSS.txt for alternative 5/3 splice site events
 
######################
### Step 3.  run MATS
######################
 
# run MATS with MATS.input.exonSkipping.txt for exon skipping events
./MATS.sh -d MATS.input.exonSkipping.txt -o MATS_ES -c 0.1 -t 1 -p 1 -s 7
 
# run MATS with MATS.input.altSS.txt for alternative 5/3 splice site events
./MATS.sh -d MATS.input.altSS.txt -o MATS_ALT_SS -c 0.1 -t 1 -p 1 -s 7
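
When Step 3 is driven from a Python script, the MATS.sh command line can be assembled from the documented options. The following is a minimal sketch; the function name build_mats_command is hypothetical, only the flags described in the "Run MATS" section are used, and omitting -s for the likelihood-ratio test is a choice made here because that parameter is documented as disabled for it.

```python
def build_mats_command(input_file, output_folder, cutoff=0.1,
                       hypothesis_type=1, processors=1, method="B",
                       precision=7):
    """Build the argument list for one MATS.sh run.

    Hypothetical helper; it only assembles arguments and does not run MATS.
    """
    if method not in ("B", "L"):
        raise ValueError("method must be 'B' (Bayesian) or 'L' (likelihood-ratio)")
    if hypothesis_type not in (1, 2):
        raise ValueError("hypothesis_type must be 1 or 2")
    command = [
        "./MATS.sh",
        "-d", input_file,
        "-o", output_folder,
        "-c", str(cutoff),
        "-t", str(hypothesis_type),
        "-p", str(processors),
        "-m", method,
    ]
    if method == "B":
        # -s is documented as disabled for the likelihood-ratio test.
        command += ["-s", str(precision)]
    return command


# Reproduce the exon skipping run from Step 3 with the method given explicitly.
print(" ".join(build_mats_command("MATS.input.exonSkipping.txt", "MATS_ES")))
```

The returned list can be passed to subprocess.call from the folder where MATS was unzipped.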