I am really loving bwa mem-it is a fast and accurate short read mapper.. One problem, unlike bowtie/bowtie2, there are no alignment stats printed by default at the end of the run.. This is a major annoyance, as I (and I suspect everyone else) uses these stats to infer something about the quality of the reference/reads/alignment process… So why they are not there is a mystery to me. Now you might say, ‘Quit crying you big baby, just generate the stats yourself.’ I can, of course, do this, but this takes time, sometimes a significant amount of time, and I don’t really have time. It also occupies CPU cycles, and the output occupies HD space.. So, I tweeted:
anybody have a little BAM stats program that can added to the end of a BWA run? e.g. bas mem | BAMstats – | out.bam @lh3lh3
— Matt MacManes (@PeroMHC) November 19, 2013
I got a couple of helpful responses:
@PeroMHC @lh3lh3 you can pipe to samstat, it writes out HTML report http://t.co/poQ36G1vz5 not sure if it takes BAM
— Richard Smith-Unna (@blahah404) November 19, 2013
samstat… this is a cool program, and generates really nice looking output, though in .html format. I really just want basic stats for normal usage, so samstat seems like overkill. In addition, it is missing one key mapping stat that I really want- the number of reads that map as proper pairs.
@blahah404 @PeroMHC @lh3lh3 Haha, maybe "bwa mem ref.fa seq.fq | samtools view -Sbh – | tee >(samtools flagstat – > stats.out) > aln.bam"
— Vince Buffalo (@vsbuffalo) November 19, 2013
Vince replied with this, and his solution seemed like a great suggestion.. I was initially wary of invoking samtools (especially view) as it adds a significant amount of time to the total runtime.. For a small test dataset, we’re talking about like 25% increase.. For something that already runs for several hours, this is a substantial amount of time.
Anyway, I got to thinking about this, and it seems that calling samtools may be the only thing to do- short of writing my own tool, which I don’t really have time for.. I played around with the script, adding threaded samtools view, and using the -u flag *for minimally compressed bam) and got the time down a little.. This script still adds time to the run, bot now like 10%.. so better than 25%.
bwa mem -t6 long read.1.fq read.2.fq \
| samtools view -@6 -Sub - \
| tee >(samtools flagstat - > stats.out) > aln.bam