samtools consensus seq generation

Posted on Posted in Uncategorized

Instead of trying to squeeze this explanation into 140 characters, here is a short blog about the issue. I’ve posted the question to the samtools list serve, but no response there.

I am generating consensus sequences from a series of BAM files using the standard samtools command:

samtools mpileup -AIuf my.fasta my.bam | bcftools view -cgI – | vcfutils.pl vcf2fq > my.fq

What happens with the genotype is 50/50 is that instead of calling one of the bases, it instead uses an ambiguity code- R, Y, M, etc.. This is problematic for me, as those polymorphism are interesting, and the downstream work (aligning, testing for selection) cannot handle them.

So:

  1. Is there a way to force samtools to call an nucleotide rather than an ambiguous base?
  2. Is there a better way to be generating these consensus sequences from BAM files?