CEGMA has been, for my lab, a really great tool for understanding genome assembly completeness and quality, so the fact that it has had so many problems recently has been a big deal. I am never quite sure what exactly the issue is, but it revolves around geneid not working properly. This gist (https://gist.github.com/macmanes/9cb776429df90723e3a9) is an example.
Anyway, to solve this problem, I turned to the Amazon Elastic Compute Cloud to save the day, and so should you! (See also Shaun Jackman's solution in Homebrew: http://korflab.ucdavis.edu/Datasets/cegma/faq.html#link14.)
As of right now, there is a public AMI called CEGMA on the CLOUD (ami-18935a70, in the us-east region; I've made copies available in other regions as well) that is properly configured to run CEGMA. This means that you can sign on to EC2 (or get an account) and run CEGMA free of headaches. The current cost seems to be $0.42 (42 cents) per hour for the c3.2xlarge instance, which seems pretty reasonable to me.
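To get a feel for what a run might cost, here is a minimal Python sketch. It assumes the $0.42 figure is an on-demand price per instance-hour and that partial hours are billed as full hours; both are assumptions, so check the current EC2 pricing page before budgeting. The function name `estimate_cost` is just for illustration.

```python
import math

# Assumption: the $0.42 quoted above is the per-hour on-demand price
# for one c3.2xlarge instance. Check current EC2 pricing before relying on it.
HOURLY_RATE_USD = 0.42


def estimate_cost(runtime_hours, hourly_rate=HOURLY_RATE_USD):
    """Estimate the cost in USD of keeping one instance up for runtime_hours.

    Assumes partial hours are billed as whole hours, so the runtime
    is rounded up to the next full hour.
    """
    if runtime_hours < 0:
        raise ValueError("runtime_hours must be non-negative")
    billed_hours = math.ceil(runtime_hours)
    return round(billed_hours * hourly_rate, 2)


# Example: a CEGMA run that keeps the instance up for five and a half hours.
print(estimate_cost(5.5))
```

Under these assumptions, even an overnight run stays within a few dollars, as long as you remember to terminate the instance when CEGMA finishes.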
I have tested this out with the sample data as well as with some of my own, and it seems to work great. I would recommend using at least the c3.2xlarge instance. Using at least 8 threads for the tblastn steps seems useful, as is a good bit of storage. I assume that everybody can access it, and that they may have to switch region to us-east to find the public image. If you don't know how to do this, or it doesn't seem to work, let me know!
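If you prefer the command line to the EC2 web console, something like the following should work. This is only a sketch: it assumes you have the AWS CLI installed and configured, that "us-east" corresponds to the `us-east-1` region code, and that you already have an EC2 key pair (the name `my-key-pair` is a placeholder). The Python below just assembles the `aws ec2 run-instances` command so you can inspect it before running it yourself; it does not launch anything.

```python
import shlex


def build_launch_command(key_name,
                         ami_id="ami-18935a70",
                         instance_type="c3.2xlarge",
                         region="us-east-1"):
    """Build (but do not run) an AWS CLI command that launches one instance.

    Assumptions: region "us-east-1" stands in for the "us-east" region
    mentioned above, and key_name is an EC2 key pair you have already created.
    """
    return [
        "aws", "ec2", "run-instances",
        "--image-id", ami_id,
        "--instance-type", instance_type,
        "--region", region,
        "--key-name", key_name,
        "--count", "1",
    ]


if __name__ == "__main__":
    # "my-key-pair" is a hypothetical key pair name; substitute your own.
    print(shlex.join(build_launch_command("my-key-pair")))
```

Copy the printed command into a terminal to launch the instance. If you are using one of the copies in another region, both the region and the AMI ID will differ, so look up the ID in that region first.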
Please refer to the CEGMA homepage for details about how to properly run CEGMA, but know that all the components are up and running, so it should be smooth sailing ahead. Make sure to cite the CEGMA manuscript and this blog post, which may have a DOI affixed to it (if I can figure it out).
Let me know if there are issues.