I am writing this in the hope that it will remind people (reviewers) of the critical details required of published transcriptome studies. This guide covers the details through the production of the final transcriptome assembly. I will write about testing for differential expression at a later date. The ‘living document’ available for editing is here: https://github.com/macmanes-lab/OpenScience/blob/master/reviewers_guide.md. Please suggest changes, additions, and subtractions over there.
You’ve just been invited to review a manuscript, ‘Assembly and Characterization of the X transcriptome (Genus species)’. Awesome, now here is what you need to think about, at least with regard to the methods.
- Nucleic acid extraction: How did the authors do it? Trizol, a column, something else? Did the authors check its quality?
- Sequencing libraries: What kit was used? Be specific. Illumina is just one of the suppliers of these kits, and they sell several different varieties (stranded, unstranded, ribo-depletion, etc.). How much RNA went into the prep?
- Sequencing: What platform was used? Paired-end or single-end sequencing? How many raw reads were generated? Are these data publicly available?
- Update from @BioMath: If you are sequencing multiple individuals, make sure that you tell the reader how many reads were generated for each individual (https://twitter.com/BioMath/status/530537239043125248)
- Were adapters trimmed? How did the authors trim them? What software/settings were used?
- Were low-quality nucleotides trimmed? If so, what threshold was used, and why? What software/settings?
- If data were normalized, how was this done? What software/settings were used? How many reads were discarded?
- If data were error corrected, how was this done? What software/settings were used? How many errors were corrected (some software does not report this number)?
- How was assembly done? What software/version? What settings were used, and why? How many contigs were produced?
- How was the raw assembly filtered? Did the authors filter out ‘garbage’ contigs using BLAST, gene expression, something else? Is the final assembly available on Dryad/GenBank or similar?
- Is the code used to analyze the data available (perhaps on GitHub)? Failing to provide custom scripts to the reader/reviewer is not acceptable.
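
Several of the numbers asked for above (reads per individual, number of contigs) are cheap to produce, so there is little excuse for leaving them out. Here is a minimal sketch of how an author might generate them, using only the Python standard library. It is an illustration rather than anyone's published method. It assumes standard four-line FASTQ records and one `>` header line per contig in the FASTA file; the function names and sample labels are hypothetical.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain-text or gzip-compressed file for reading as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_fastq_reads(path):
    """Count reads in a FASTQ file.

    Assumption: every record is exactly four lines (no wrapped sequences).
    """
    with open_text(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def count_contigs(path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open_text(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def reads_per_sample(fastq_by_sample):
    """Map each sample label (hypothetical names) to its raw read count."""
    return {
        sample: count_fastq_reads(path)
        for sample, path in fastq_by_sample.items()
    }
```

Calling `reads_per_sample` with a dictionary of sample labels and FASTQ paths gives the per-individual table requested above, and `count_contigs` on the raw and filtered assemblies shows how many contigs the filtering step removed.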