###### PUBLISHED ON JUN 20, 2019 — BIOINFORMATICS, GENETICS

Here is a concise guide to downloading gnomAD using gsutil. I am typically working on a server, and (begrudgingly) use conda to manage software. The only real trick here is getting the conda environment set up; I could easily have called this “Using gsutil with conda,” because the gsutil utility itself is pretty easy to use. gsutil, which is needed to access the gnomAD data, is buried within the ‘google-cloud-sdk’ conda package, not to be confused with the numerous other ‘google-cloud-’ conda packages. (A conda search for gsutil will not, at the time of writing, find the correct google-cloud-sdk package.) The Google Cloud SDK version used here requires Python 2, so I think it makes sense to create a dedicated conda environment:

# Create a new conda env with gsutil AND crcmod -- which will allow for
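
As a minimal sketch, an environment like this could be created with something along the following lines. The conda-forge channel, the python=2.7 pin, and the helper name create_gsutil_env are assumptions for illustration, so adjust them to whatever your conda setup resolves.

```shell
# Create a dedicated conda environment containing gsutil and crcmod.
# Usage: create_gsutil_env <env-name>
# (create_gsutil_env is a hypothetical helper name, not part of conda.)
create_gsutil_env() {
  local env_name="$1"
  # Assumptions: packages come from conda-forge, and Python is pinned to 2.7
  # because the SDK version discussed in this post needs Python 2.
  # gsutil itself ships inside the google-cloud-sdk package.
  conda create --yes --name "$env_name" --channel conda-forge \
    python=2.7 google-cloud-sdk crcmod
}
```

Calling `create_gsutil_env GoogleCloud` would then build an environment with the name used in the rest of this post.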


You can now activate the new environment and use gsutil to download gnomAD data:

conda activate GoogleCloud


Before starting the download, make sure crcmod is correctly compiled. A compiled crcmod speeds up the download significantly (from days to hours, in my case).

gsutil version -l
#gsutil version: 4.38
#checksum: 58d3e78c61e7e0e80813a6ebc26085f6 (OK)
#boto version: 2.49.0
#python version: 2.7.15 | packaged by conda-forge | (default, Feb 28 2019, 04:00:11) [GCC 7.3.0]
#OS: Linux 3.10.0-693.5.2.el7.x86_64
#multiprocessing available: True
#using cloud sdk: True
#pass cloud sdk credentials to gsutil: True
#config path(s): No config found
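
If you would rather not scan that output by eye, the crcmod check can be scripted. This is a rough sketch that assumes gsutil reports a line of the form "compiled crcmod: True" in the output of `gsutil version -l`; the helper name crcmod_is_compiled is made up for this example.

```shell
# Succeed (exit 0) only if the text on stdin reports a compiled crcmod.
# (crcmod_is_compiled is a hypothetical helper name.)
crcmod_is_compiled() {
  grep -q '^compiled crcmod: True'
}

# Only query gsutil when it is on PATH, i.e. the conda env is active.
if command -v gsutil >/dev/null 2>&1; then
  if gsutil version -l | crcmod_is_compiled; then
    echo "crcmod is compiled"
  else
    echo "crcmod is NOT compiled; expect slow downloads" >&2
  fi
fi
```

Keeping the grep in its own function means the same check can be run on saved output, not just on a live gsutil call.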

Make sure “compiled crcmod” is True. We can now download data from the desired release. Here, I will show how to download from release 2.1.1:

mkdir -p gnomAD/r2.1.1

Note: not all releases have the same subdirectory structure. Use the -r flag with gsutil cp to copy a whole subdirectory. Also, these downloads take a while, so submit the job to the scheduler or use a screen session if you’re working on a server.
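
To make the recursive copy concrete, here is a hedged sketch of a small wrapper around gsutil cp. The bucket path is an assumption (the public gnomAD bucket as I understand it; confirm it against the gnomAD downloads page), and download_gnomad is a hypothetical helper name.

```shell
# Assumption: public bucket that hosts the gnomAD releases.
GNOMAD_BUCKET="gs://gcp-public-data--gnomad/release"

# Copy one subdirectory of a release into gnomAD/r<release>/.
# Usage: download_gnomad <release> <subdirectory>
# (download_gnomad is a hypothetical helper name.)
download_gnomad() {
  local release="$1" subdir="$2"
  local dest="gnomAD/r${release}"
  mkdir -p "$dest"
  # -m runs the transfers in parallel; -r copies the subdirectory recursively.
  gsutil -m cp -r "${GNOMAD_BUCKET}/${release}/${subdir}" "$dest/"
}
```

Inside a screen session (for example one started with `screen -S gnomad`), a call such as `download_gnomad 2.1.1 vcf/exomes` would start the copy. The `vcf/exomes` subdirectory is only an example; run `gsutil ls` on the release path first to see what that release actually contains.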