Shannon Stream of Consciousness…

Posted in Bioinformatics, Publishing, software

A new transcriptome assembler, Shannon, has been released: http://biorxiv.org/content/early/2016/02/09/039230 and http://sreeramkannan.github.io/Shannon/. It looks quite interesting from an algorithmic standpoint and claims to be better than the existing suite of assemblers. Good deal, I'm all ears! I hope to provide a full review (with the lab) in a bit, but I have a few immediate suggestions after kicking the tires yesterday.

A list of software 'issues' and suggestions that I have wondered about:

  1. Help menu: List defaults along with options. E.g., what are the default values for `K`, `partition_size`, etc.?
  2. Why Quorum? The paper claims it is better, but how was this evaluated? Did you test other error correction software packages, esp. those designed for RNAseq? I test them here: http://biorxiv.org/content/early/2015/12/30/035642
  3. Allow people to feed in previously error corrected reads. I like (per manuscripts linked above) bfc and Rcorrector. I should be able to use those tools with Shannon. Maybe `--left_corr` and `--right_corr` to pass in corrected reads? EDIT – you can do this by passing in reads in fasta format.
  4. Checkpoints! Restart a failed run at the last known 'good' checkpoint. For instance, I had a run fail at 13 hours yesterday – surely I should not have to start from the beginning of the assembly process.
  5. Any tips for increasing speed? For instance, what if I set the partition size smaller/larger? EDIT – Right now Shannon is completely unusable for anything but the smallest datasets (IME). For me, it takes in excess of 120 hours for a 20M read assembly. WAY too long to be useful. The developers are working on this issue.
  6. How sensitive are results to kmer size? Should we be optimizing kmer length?
  7. Have you evaluated Shannon assemblies using `DETONATE`/`TransRate`/`BUSCO`? It would be informative to see how they compare to the Trinity assemblies. EDIT – with a very small test (1M reads) the numbers are not as good as Trinity, but let's test a larger dataset before we say too much here.
  8. Unless I missed it, how is Shannon licensed? Might I suggest some version of an MIT/BSD license? EDIT – the license is GPL v3.
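
Regarding item 3: since corrected reads can apparently be passed in as fasta, reads that come out of bfc or Rcorrector as FASTQ would need converting first. Here is a minimal sketch of that conversion in plain Python. It assumes strict 4-line FASTQ records (no wrapped sequences), and the function names and file names are my own, hypothetical, not anything from Shannon.

```python
from typing import Iterable, Iterator, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) pairs from strict 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        seq = next(it, None)
        plus = next(it, None)   # '+' separator line
        qual = next(it, None)   # quality string, dropped in FASTA
        if seq is None or plus is None or qual is None:
            raise ValueError(f"truncated record for read {header!r}")
        yield header[1:], seq.rstrip("\r\n")


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Convert FASTQ lines to FASTA lines, one record at a time."""
    for name, seq in fastq_records(lines):
        yield f">{name}\n{seq}\n"


def convert_file(fastq_path: str, fasta_path: str) -> None:
    """Stream a FASTQ file to a FASTA file without loading it all into memory."""
    with open(fastq_path) as src, open(fasta_path, "w") as dst:
        dst.writelines(fastq_to_fasta(src))
```

Usage would be something like `convert_file("left.corrected.fq", "left.corrected.fa")` for each read file (hypothetical file names), and then hand the resulting fasta files to the assembler.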