Introducing the TAMrS pipeline!!!

Yay, the day has finally come to release TAMrS: (T)ranscriptome (A)ssembly (M)ade (r)eally (S)imple! TAMrS is a pipeline that produces transcriptome assemblies from a set of PE or SE reads and, importantly, does so with a single command. The goal of this software project is to allow non-bioinformaticians/non-programmers to use cutting-edge tools (e.g., Trimmomatic/FLASH/Trinity/eXpress) that are typically not part of GUI-enabled packages (e.g., Galaxy, Geneious, CLC Bio). The software is available for download at https://sourceforge.net/projects/tamrs/. It runs only on Linux, and large assembly projects will require a few hundred GB of RAM (a hardware requirement typical of large genomics projects).

In its most basic execution, the software will trim adapters, quality trim, assemble using Trinity, then quantitate expression using eXpress. Read merging is also optionally supported as part of the workflow, and merged reads will be trimmed and assembled automatically. How much read merging helps depends critically on how much your reads overlap, though I don’t think that merging will ever hurt.
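
As a rough guide to whether merging is likely to matter for your library, the expected overlap of a read pair is about twice the read length minus the fragment length, and it can never exceed the fragment itself. The sketch below is not part of TAMrS: `expected_overlap` is a hypothetical helper name, and the 150 nt reads and 250 nt fragment are purely illustrative numbers.

```shell
# Hypothetical helper (not part of TAMrS): estimate how many bases the two
# reads of a pair share, given the read length and the fragment length.
expected_overlap() {
    read_len=$1
    frag_len=$2
    ovl=$(( 2 * read_len - frag_len ))
    # A negative value means the two reads never reach each other.
    if [ "$ovl" -lt 0 ]; then ovl=0; fi
    # Both reads cover the whole fragment when it is shorter than a read.
    if [ "$ovl" -gt "$frag_len" ]; then ovl=$frag_len; fi
    echo "$ovl"
}

expected_overlap 150 250   # prints 50
```

With 150 nt reads, fragments longer than 300 nt give no overlap at all, so merging would have little to work with on such a library.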

To download and install the software, run the commands below (make check will execute some unit tests to make sure you have all the parts correctly installed). You’ll need git and curl to install; these should be part of a standard Linux distribution:

git clone git://git.code.sf.net/p/tamrs/code tamrs-code
cd tamrs-code
make
make check
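
If you are not sure whether the prerequisites are present, a quick check along these lines may save a failed install. This is only a sketch: `missing_tools` is a hypothetical helper name, not something shipped with TAMrS, and `make` is included simply because the install commands call it.

```shell
# Hypothetical helper (not part of TAMrS): print the name of every tool
# that is not on the PATH, and return non-zero if any are missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

if missing_tools git curl make; then
    echo "all install prerequisites found"
fi
```

Any names printed are the tools to install through your distribution's package manager before running make.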

To execute the program, run the command below (replace the READ1 and READ2 values with the paths to your reads). Full usage details are available in the USAGE file included in the download.

./tamrs.mk flash \
OVLMIN=20 \
OVLMAX=140 \
MINLEN=25 \
PHRED=33 \
MINK=1 \
MEM=2 \
TRIM=2 \
CPU=2 \
BCPU=2 \
RUN=run \
READ1=../sample_data/test1.fq \
READ2=../sample_data/test2.fq
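
Before launching a long run on your own data, it can be worth a quick sanity check that both read files exist and hold the same number of records. The helper below is a sketch under stated assumptions, not part of TAMrS: `check_pairs` is a hypothetical name, and it assumes uncompressed FASTQ files with exactly four lines per record.

```shell
# Hypothetical pre-flight check (not part of TAMrS). Assumes uncompressed
# FASTQ files with exactly four lines per record.
check_pairs() {
    r1=$1
    r2=$2
    for f in "$r1" "$r2"; do
        if [ ! -r "$f" ]; then
            echo "cannot read $f" >&2
            return 1
        fi
    done
    # tr strips the padding that some wc implementations add.
    n1=$(wc -l < "$r1" | tr -d ' ')
    n2=$(wc -l < "$r2" | tr -d ' ')
    if [ "$n1" -ne "$n2" ] || [ $(( n1 % 4 )) -ne 0 ]; then
        echo "read files look unpaired or truncated: $n1 vs $n2 lines" >&2
        return 1
    fi
    echo "$(( n1 / 4 )) read pairs"
}
```

Calling `check_pairs` with the same two paths you pass as READ1 and READ2 prints the number of read pairs, or a message on stderr and a non-zero exit status if something looks off.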

Parameters were selected based on my own experience assembling and quantitating vertebrate transcriptomes over the past several years. Some of this work is published (Trinity Paper, Trimming, Error Correction), and some is sitting on a hard drive waiting to be published. The protocol will work best with longer Illumina PE reads (e.g., 150 nt), but shorter Illumina reads, including single-end reads, will work. There is no support for PacBio/Ion Torrent/454 reads, simply because I have not had an opportunity to work with these types of reads. So, if you want to see support for these, and you have the means to produce these types of datasets, we should talk!

Oh, and why use this pipeline rather than somebody else’s, or your own? Good question. There are several really great pipelines out there (e.g., Eel Pond mRNAseq Protocol, and corresponding blog post), and while I have a lot of confidence in my pipeline, remember that assemblers and pipelines handle data differently, and one may work better than another on a given dataset. (Remember one of the conclusions from the Assemblathon 2 paper? The best assembler for genome A is not necessarily the best for genomes B and C.)

Lastly, did something not work as intended, do you have a suggestion for improvement, or did it work wonderfully? Please let me know in the comments, on Twitter, or via email.

Happy Assembling!