Transcriptome Assembly: A reviewer’s guide

I am writing this in the hope that it will remind people (reviewers) of the critical details required of published transcriptome studies. This guide covers the details up through the production of the final transcriptome assembly. I will write about testing for differential expression at a later date. The ‘living document’ available for editing is here: https://github.com/macmanes-lab/OpenScience/blob/master/reviewers_guide.md. Please suggest changes, additions, and subtractions over there.

You’ve just been invited to review a manuscript, ‘Assembly and Characterization of the X transcriptome (Genus species)’. Awesome, now here is what you need to think about, at least with regard to the methods.

Nucleic acid extraction: How did the authors do it? Trizol, a column, something else? Did the authors check the quality of the extracted nucleic acid?
Sequencing libraries: What kit was used? Be specific. Illumina is just one of the suppliers of these kits, and they sell several different varieties (stranded, unstranded, Ribo-depletion, etc.). How much RNA went into the prep?
Sequencing: What platform was used? Paired-end or single-end sequencing? How many raw reads were generated? Are these data publicly available?
- update from @BioMath: If you are sequencing multiple individuals, make sure that you tell the reader how many reads were generated for each individual (https://twitter.com/BioMath/status/530537239043125248)
Were adapters trimmed? How did the authors trim them? What software/settings were used?
Were low-quality nucleotides trimmed? If so, what threshold was used and why? What software/settings?
If data were normalized, how was this done, and with what settings and software? How many reads were discarded?
If data were error corrected, how was this done, and with what settings and software? How many errors were corrected (some software does not report this number)?
How was assembly done? What software/version? What settings were used and why? How many contigs were produced?
How was the raw assembly filtered? Did the authors filter out ‘garbage’ contigs using blast, gene expression, something else? Is the final assembly available on Dryad/Genbank or similar?
Is the code used to analyze the data available (perhaps on Github)? Failing to provide custom scripts to the reader/reviewer is not acceptable.
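
Several of the questions above ask for simple counts (raw reads per individual, contigs in the final assembly). As a rough illustration of how little code it takes to report them, here is a minimal Python sketch. It is not taken from any particular pipeline: the function names are hypothetical, it assumes FASTQ files with exactly four lines per record (no wrapped sequences), and it includes N50 only as one commonly reported contiguity summary, not as a measure of assembly quality.

```python
import gzip
from pathlib import Path


def open_maybe_gzip(path):
    """Open a plain-text or gzip-compressed file for reading as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming four lines per record."""
    with open_maybe_gzip(path) as handle:
        # Ignore blank lines (for example, a trailing newline at end of file).
        n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def contig_lengths(path):
    """Return the length of every contig in a FASTA file, in file order."""
    lengths = []
    current = None  # length of the contig being read, None before first header
    with open_maybe_gzip(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly
```

Calling `count_fastq_reads` once per individual's file gives the per-individual read counts, and `len(contig_lengths(...))` on the assembly FASTA gives the contig count that the methods section should state.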

Marcin Cieslik

I agree, often “methods matter” a lot. It would be great if you explained how the RNA extraction protocol and RNA quality influence (bias?) the results. I assumed, maybe incorrectly, that the library preparation protocol has a much higher impact on the result. P.S. I hope no one does transcriptome assembly on unstranded RNA.
- Matt MacManes
  
  Imagine if you were trying to assemble a transcriptome from RNA that was very degraded. For example, take the imaginary case where all transcripts were fractured into 2 pieces. You would successfully make the library, but none of your transcripts would contain the 5′ end. Poly A selection would recover that 3′ end, but the 5′ end would be left in the supernatant and therefore washed away.
  
  This is a toy example, but I do believe that as input quality decreases, so does your ability to accurately reconstruct full-length transcripts.
  
  The extraction protocol may not matter. I suppose I could imagine that there may be hidden biases that could be uncovered in the future by somebody doing a systematic analysis of extraction protocols. Moreover, reporting these details is good practice on the reproducibility side of things.
  - Marcin Cieslik
    
    Right. Protocols less affected by 5′/3′ bias (e.g. ribo-minus) would appear to be better suited for assembly.
    - Matt MacManes
      
      Well, Ribo-minus messes things up in different and probably worse ways: see http://genomebiology.com/2014/15/6/R86