Getting alignment stats out of BWA

Posted on Posted in Uncategorized

I am really loving bwa mem-it is a fast and accurate short read mapper.. One problem, unlike bowtie/bowtie2, there are no alignment stats printed by default at the end of the run.. This is a major annoyance, as I (and I suspect everyone else) uses these stats to infer something about the quality of the reference/reads/alignment process… So why they are not there is a mystery to me.  Now you might say, ‘Quit crying you big baby, just generate the stats yourself.’ I can, of course, do this, but this takes time, sometimes a significant amount of time, and I don’t really have time. It also occupies CPU cycles, and the output occupies HD space.. So, I tweeted:

I got a couple of helpful responses:

samstat… this is a cool program, and generates really nice looking output, though in .html format. I really just want basic stats for normal usage, so samstat seems like overkill. In addition, it is missing one key mapping stat that I really want- the number of reads that map as proper pairs.

Vince replied with this, and his solution seemed like a great suggestion.. I was initially wary of invoking samtools (especially view) as it adds a significant amount of time to the total runtime.. For a small test dataset, we’re talking about like 25% increase.. For something that already runs for several hours, this is a substantial amount of time.

Anyway, I got to thinking about this, and it seems that calling samtools may be the only thing to do- short of writing my own tool, which I don’t really have time for.. I played around with the script, adding threaded samtools view, and using the -u flag *for minimally compressed bam) and got the time down a little.. This script still adds time to the run, bot now like 10%.. so better than 25%.

bwa mem -t6 long read.1.fq read.2.fq \
| samtools view -@6 -Sub - \
| tee >(samtools flagstat - > stats.out) > aln.bam

  • Nice, didn’t know about the threaded view option!

    It would be fairly easy to add this functionality to bwa-mem, and that would eliminate almost all of the runtime cost.

    • Matt MacManes

      several of the samtools tools are now threaded.. sort being the other big one for me..