Bioinformatics

Nail in the quality trimming coffin!

Posted on October 6, 2014 | March 19, 2016 | macmanes | Posted in Bioinformatics, Illumina

I find myself continually frustrated**. Reading transcriptome assembly paper after paper, all of them quality trimming mindlessly at a Phred score of 20 or 30. Why do people do this??!?!?! I just don’t understand, given there is nary a shred of evidence to suggest its benefit, and a substantial amount of evidence to suggest the contrary […]

If CLC and Galaxy aren’t the answer, what is??

Posted on August 30, 2014 | March 19, 2016 | macmanes | 4 Comments | Posted in Bioinformatics, Illumina

Earlier in the day, I said this: “I don’t think Galaxy/CLC is the solution for biologists with genomic data.. These tools simply allow for sloppy and uninformed analyses.” (Matt MacManes (@PeroMHC), August 30, 2014) Now, this is something that I have struggled with for the last few years. Increasingly, people have been trying to […]

Thoughts about #ngs2014

Posted on August 24, 2014 | March 19, 2016 | macmanes | Posted in Bioinformatics

As many of you know, I recently spent a week at MSU’s Kellogg Biological Station. No, I haven’t betrayed my Maize and Blue; instead, I was there after being invited by Titus Brown to participate in the 5th installment of the ANGUS workshop. I have to admit feeling a fair amount of imposter syndrome going in – […]

CEGMA on EC2!!

Posted on July 22, 2014 | March 19, 2016 | macmanes | Posted in Bioinformatics, software

CEGMA has been, for my lab, a really great tool for understanding genome assembly completeness and quality, so the fact that it has had so many problems recently has been a big problem. I’m never quite sure what exactly the issue is, but it revolves around geneid not working properly; this gist (https://gist.github.com/macmanes/9cb776429df90723e3a9) is an example. Anyway, to solve this […]

Not having fun with Tophat

Posted on May 28, 2014 | March 19, 2016 | macmanes | 6 Comments | Posted in Bioinformatics, software

I was an early user of Tophat, way back before there were many good tools for quantitating expression from NGS reads. I was just learning how to process these types of data (e.g., learning a scripting language), so I thought that many of the problems I encountered were a result of my inadequacy — and […]

New Paper: On the optimal trimming of high-throughput mRNA sequence data

Posted on November 15, 2013 | March 19, 2016 | macmanes | 8 Comments | Posted in Bioinformatics

I’ve just finished work on a new paper, On the optimal trimming of high-throughput mRNA sequence data, which is available as a preprint on bioRxiv: http://biorxiv.org/content/early/2013/12/23/000422. The paper has been submitted to Frontiers in Bioinformatics and Computational Biology for potential inclusion in a special issue dealing with the Quality Control of NGS data (http://journal.frontiersin.org/Journal/10.3389/fgene.2014.00013/abstract). I began work on […]

sed and awk for genomics

Posted on June 5, 2013 | macmanes | Posted in Bioinformatics, Illumina

In my continuing quest to conquer fastQ, fastA, sam & bam files, I have accumulated several useful ‘tools’. Many of them are included in other software packages (e.g. SAMtools), but for some tasks, especially file management and conversion, no standard toolkit exists, and instead researchers script their own solution. For me, sed and awk, along […]

Improving transcriptome assembly through error correction of high-throughput sequence reads

Posted on April 4, 2013 | macmanes | Posted in Bioinformatics, Illumina

I am writing this blog post in support of a paper that I have just submitted to arXiv: Improving transcriptome assembly through error correction of high-throughput sequence reads. My goal is not to talk about the nuts and bolts of the paper so much as it is to ramble about its motivation and the writing process. […]