February 24, 2022

I meant to write a short post explaining how to fetch genome data from NCBI the old way, but here's the gist, assuming you want every complete bacterial genome:

% wget
% mkdir bacteria_refseq
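# Keep column 20 (the download path) of every row whose column 12 is "Complete Genome"
% awk -F '\t' '{if($12=="Complete Genome") print $20}' assembly_summary.txt > assembly_summary_complete_genomes.txt
# Recursively fetch only the files matching *genomic.fna.gz from each listed directory
% for record in $(cat assembly_summary_complete_genomes.txt); \
    do wget -P bacteria_refseq -e robots=off -r --no-parent -A "*genomic.fna.gz" "$record"/; done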

Then wait, like me.
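
If you would rather check the filter before committing to the wait, here is a minimal dry-run sketch. It builds a tiny mock assembly_summary.txt, applies the same awk filter, and only echoes the wget commands instead of running them. The `make_row` helper, the example.org URLs, and the temporary directory are all hypothetical stand-ins for illustration, not part of NCBI's data.

```shell
#!/bin/sh
# Dry run on a mock summary file: nothing is downloaded.
workdir=$(mktemp -d)
summary="$workdir/assembly_summary.txt"
urls="$workdir/assembly_summary_complete_genomes.txt"

# make_row LEVEL PATH (hypothetical helper): print one 20-column,
# tab-separated row with LEVEL in column 12 and PATH in column 20.
# All other columns are filled with "x".
make_row() {
    awk -v level="$1" -v path="$2" 'BEGIN {
        for (i = 1; i <= 20; i++) {
            v = (i == 12) ? level : ((i == 20) ? path : "x")
            printf "%s%s", v, (i < 20 ? "\t" : "\n")
        }
    }'
}

# Mock data: two "Complete Genome" rows and one "Scaffold" row (made-up URLs).
{
    printf '# mock header line\n'
    make_row "Complete Genome" "https://example.org/GCF_000000001.1_demoA"
    make_row "Scaffold"        "https://example.org/GCF_000000002.1_demoB"
    make_row "Complete Genome" "https://example.org/GCF_000000003.1_demoC"
} > "$summary"

# Same filter as above: keep column 20 where column 12 is "Complete Genome".
awk -F '\t' '$12 == "Complete Genome" { print $20 }' "$summary" > "$urls"

# Read one URL per line and echo the wget command instead of running it.
while IFS= read -r record; do
    echo wget -P bacteria_refseq -e robots=off -r --no-parent -A "*genomic.fna.gz" "$record/"
done < "$urls"
```

Dropping the `echo` and pointing `summary` at the real file turns this back into the actual download. One thing worth knowing: the accept pattern `*genomic.fna.gz` is a plain glob, so it also matches names such as `*_cds_from_genomic.fna.gz` or `*_rna_from_genomic.fna.gz` when those sit in the same directory.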