Shannon Stream of Consciousness…

Posted in Bioinformatics, Publishing, software

A new transcriptome assembler has been released: Shannon (http://biorxiv.org/content/early/2016/02/09/039230 and http://sreeramkannan.github.io/Shannon/). It looks quite interesting from an algorithmic standpoint and claims to be better than the existing suite of assemblers. Good deal, I’m all ears! I hope to provide a full review (with the lab) in a bit, but I have a few immediate suggestions after kicking the tires yesterday.

A list of software ‘issues’ and suggestions that I have wondered about:

  1. Help menu: list defaults along with options. E.g., what are the default values for `K`, `partition_size`, etc.?
  2. Why Quorum? The paper claims it works better, but how was this evaluated? Did you test other error-correction packages, especially those designed for RNA-seq? I test several of them here: http://biorxiv.org/content/early/2015/12/30/035642
  3. Allow people to feed in previously error-corrected reads. Per the manuscript linked above, I like bfc and Rcorrector, and I should be able to use those tools with Shannon. Maybe `--left_corr` and `--right_corr` options to pass in corrected reads? EDIT – you can do this by passing in the reads in FASTA format.
  4. Checkpoints! Restart a failed run at the last known ‘good’ checkpoint. For instance, I had a run fail after 13 hours yesterday; surely I should not have to restart the assembly from the beginning.
  5. Any tips for increasing speed? For instance, what happens if I make the partition size smaller or larger? EDIT – In my experience, Shannon is currently unusable for anything but the smallest datasets: it takes in excess of 120 hours for me to assemble 20M reads, which is WAY too long to be useful. The developers are working on this issue.
  6. How sensitive are results to k-mer size? Should we be optimizing k-mer length?
  7. Have you evaluated Shannon assemblies using `DETONATE`/`TransRate`/`BUSCO`? It would be informative to see how they compare to Trinity assemblies. EDIT – with a very small test (1M reads) the numbers are not as good as Trinity’s, but let’s test a larger dataset before we say too much here.
  8. Unless I missed it, how is Shannon licensed? Might I suggest some version of an MIT/BSD license? EDIT – the license is GPL v3.
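On point 1: since Shannon is a Python tool, showing defaults in the help menu could be nearly free if its options go through `argparse`. Here is a minimal sketch, assuming an `argparse`-based CLI. The program name and option list are hypothetical, and the default values (25, 500) are PLACEHOLDERS, not Shannon’s real defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # ArgumentDefaultsHelpFormatter appends "(default: ...)" to the help text
    # of every option, so users can see defaults without reading the source.
    parser = argparse.ArgumentParser(
        prog="assembler_sketch",  # hypothetical program name
        description="Hypothetical assembler CLI (illustration only).",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # PLACEHOLDER defaults: these are NOT Shannon's actual values.
    parser.add_argument("-K", type=int, default=25, help="k-mer length")
    parser.add_argument("--partition_size", type=int, default=500,
                        help="partition size")
    parser.add_argument("--left", help="left reads")
    parser.add_argument("--right", help="right reads")
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

Running it with `-h` prints lines like `-K K  k-mer length (default: 25)`, so the defaults stay in sync with the code automatically.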
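On point 3: per the EDIT, corrected reads can go in as FASTA, so FASTQ output from a corrector just needs its quality lines dropped. Here is a minimal converter sketch, assuming plain (non-wrapped) 4-line FASTQ records. It is not part of Shannon, and the function names are my own.

```python
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        # basic sanity checks; a truncated record fails the "+" check
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(qual) != len(seq):
            raise ValueError(f"quality/sequence length mismatch in {header!r}")
        yield header[1:].rstrip("\n"), seq


def fastq_to_fasta(src: TextIO, dst: TextIO) -> int:
    """Write each FASTQ record as a 2-line FASTA record; return the count."""
    count = 0
    for name, seq in read_fastq(src):
        dst.write(f">{name}\n{seq}\n")  # read names are kept unchanged
        count += 1
    return count
```

Run it once for the left file and once for the right file (e.g. `with open("left.cor.fq") as src, open("left.cor.fa", "w") as dst: fastq_to_fasta(src, dst)`, where the file names are hypothetical). Keeping the read names unchanged should preserve the left/right pairing order.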
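On point 4: stage-level checkpointing does not have to be elaborate. Here is a sketch of the general pattern, not Shannon’s code. I assume the pipeline can be split into named stages that each write their outputs to disk before returning. A small JSON file records which stages finished, and a restart skips them.

```python
import json
import os
from typing import Callable, List, Tuple

# (stage name, function that runs the stage and writes its outputs to disk)
Stage = Tuple[str, Callable[[], None]]


def run_with_checkpoints(stages: List[Stage], checkpoint_path: str) -> List[str]:
    """Run stages in order, skipping any recorded as completed earlier.

    Returns the names of the stages actually executed in this call.
    """
    done: List[str] = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)["completed"]

    executed: List[str] = []
    for name, func in stages:
        if name in done:
            continue  # finished in a previous run; skip it
        func()  # if this raises, the checkpoint still lists earlier stages
        done.append(name)
        executed.append(name)
        # write to a temp file and atomically swap it in, so a crash
        # mid-write cannot leave a corrupt checkpoint behind
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump({"completed": done}, fh)
        os.replace(tmp, checkpoint_path)
    return executed
```

With hypothetical stages such as error correction, k-mer counting, partitioning, and per-partition assembly, a crash at hour 13 would only rerun the stage that failed. The design choice is to mark a stage done only after it returns, so a half-finished stage is always redone rather than trusted.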
  • Titus Brown

    Shannon is licensed under GPL v3. I believe it says so under the ‘license’ link in the Shannon docs 😉 https://sreeramkannan.github.io/Shannon/

    • Matt MacManes

      Ya, I saw that afterwards. I was initially looking in the GitHub repo, which makes no mention of a license.

  • Peter Fields

    I recall from using Quorum as part of MaSuRCA that there was no need to do adapter trimming ahead of time, though I don’t actually know if it’s necessarily more effective than simply using Trimmomatic.

    • Matt MacManes

      Quorum is an error-correction tool, not a trimmer, so I assume you would still need to trim adapters.

      • Peter Fields

        Yes, that’s definitely true. As part of a MaSuRCA run, though, the Quorum step removes adapters, and Quorum can generally be used to remove them if the possible adapter sequences are treated as a form of contaminant. I just don’t know if that’s happening as part of Shannon.

  • Sreeram Kannan

    Thanks for the comments. Some more answers at http://sreeramkannan.github.io/Shannon/faq