CEGMA on EC2!!

Posted in Bioinformatics, software

CEGMA has been a really great tool for my lab for understanding genome assembly completeness and quality, so the many problems it has had recently have been frustrating. I'm never quite sure what exactly the issue is, but it revolves around geneid not working properly (https://gist.github.com/macmanes/9cb776429df90723e3a9 is an example). Anyway, to solve this […]

New Paper: On the optimal trimming of high-throughput mRNA sequence data

Posted in Bioinformatics

I’ve just finished work on a new paper, “On the optimal trimming of high-throughput mRNA sequence data”, which is available as a preprint on bioRxiv: http://biorxiv.org/content/early/2013/12/23/000422. The paper has been submitted to Frontiers in Bioinformatics and Computational Biology for potential inclusion in a special issue dealing with the Quality Control of NGS data (http://journal.frontiersin.org/Journal/10.3389/fgene.2014.00013/abstract). I began work on […]

sed and awk for genomics

Posted in Bioinformatics, Illumina

In my continuing quest to conquer FASTQ, FASTA, SAM & BAM files, I have accumulated several useful ‘tools’. Many of them are included in other software packages (e.g. SAMtools), but for some tasks, especially file management and conversion, no standard toolkit exists, and instead researchers script their own solutions. For me, sed and awk, along […]
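The post itself works with sed and awk, and the excerpt doesn't show the actual commands. As a rough illustration of the kind of “script your own” conversion task it describes, here is a minimal Python sketch of FASTQ-to-FASTA conversion. It is not the author’s method, and it assumes strict four-line FASTQ records (no wrapped sequence or quality lines). The function names `read_fastq` and `fastq_to_fasta` are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) tuples from a FASTQ stream.

    Assumes the simple four-line layout (@header, sequence, +, quality);
    wrapped multi-line records are NOT handled. Stops at end of file or
    at the first blank header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # EOF ("") or a blank line ends parsing
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        # Basic sanity checks so malformed input fails loudly
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def fastq_to_fasta(src: TextIO, dst: TextIO) -> int:
    """Write each FASTQ record as a two-line FASTA record; return the count."""
    count = 0
    for name, seq, _qual in read_fastq(src):  # quality scores are dropped
        dst.write(f">{name}\n{seq}\n")
        count += 1
    return count


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n"
    out = io.StringIO()
    n = fastq_to_fasta(io.StringIO(example), out)
    print(n)                      # 2
    print(out.getvalue(), end="")  # >read1, ACGT, >read2, GGCC on separate lines
```

The parser validates each record rather than blindly taking every fourth line, which is a deliberate trade-off: a one-liner is faster to type, but a truncated or wrapped file then silently produces garbage instead of an error.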