help…

Posted in Bioinformatics

Lazy web query. Actually not that lazy, given that I did Google it. Maybe this seems like a silly problem that I should be able to figure out.

Anyway, I want to download a bunch of sequences from UniProt. I have a URL that works, http://www.uniprot.org/uniprot/?&format=fasta&limit=5&query=taxonomy:6656+AND+name:ABCC11 which in this case downloads up to five arthropod sequences for the gene ABCC11. I want to be able to do this programmatically for a bunch of queries using curl or wget, like this:

for gene in $(cat genes_list); do
    curl http://www.uniprot.org/uniprot/?&format=fasta&limit=5&query=taxonomy:6656+AND+name:$gene > gene.fasta;
done

But when I use curl/wget to download from the URL, I get only the XML-formatted webpage, not the actual FASTA. Why is this, and how can I get the FASTA?

  • Cheng H. Lee

    curl "http://www.uniprot.org/uniprot/?&format=fasta&limit=5&query=taxonomy:6656+AND+name:$gene"

    should get you what you need. Note: it’s important that you use double quotes around the URL to avoid confusing the shell (since “&” backgrounds a command) while still getting variable expansion for ‘$gene’.

    The field delimiter for the query part of URLs is “&”; having “&amp;” in your request means you’re asking the server for the “amp;” field (and in this case, multiple times), which is likely confusing it and causing it to fall back to its default behavior of returning XML.
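
Putting the reply's advice together, the loop from the question could be sketched roughly as below. This is a minimal sketch, not a tested recipe: it assumes the URL from the post still behaves as described, that genes_list holds one gene name per line, and that gene names need no URL encoding. The function names build_url and fetch_all are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the query URL from the post for one gene.
# The format string is single-quoted, so the shell never sees a bare "&".
build_url() {
    local gene="$1"
    printf 'http://www.uniprot.org/uniprot/?&format=fasta&limit=5&query=taxonomy:6656+AND+name:%s\n' "$gene"
}

# Hypothetical helper: read gene names (one per line) from the file in $1
# and save each result to its own <gene>.fasta file.
fetch_all() {
    local list="$1" gene
    while IFS= read -r gene; do
        # Skip blank lines in the list.
        [ -n "$gene" ] || continue
        # Double quotes keep the URL as a single argument to curl.
        curl -sS "$(build_url "$gene")" > "${gene}.fasta"
    done < "$list"
}
```

Calling fetch_all genes_list would then run one request per gene. Writing to "${gene}.fasta" rather than a fixed gene.fasta keeps each download from overwriting the previous one.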