Downloading CMIP5 model output can be a laborious task. I’ve tried several repositories and multiple ways of accessing these data, and ultimately found the following methodology to be the most efficient and effective. It is particularly helpful when you are downloading a long temporal period for multiple models. Additionally, many regional climate change analyses need only a spatial subset of the data, yet on most repositories CMIP5 output is archived in yearly files covering the entire global domain. That is a pain when you need 30 years of data for a small region: you end up downloading a lot of data you will never use. This tutorial shows how to download only the variables, temporal domain, and spatial domain you actually need, minimizing download sizes.

This methodology assumes the user:

  1. is using Linux/Unix (in my case Fedora).
  2. has Climate Data Operators (CDO) available for use.

The first step is to set up a CERA account following the instructions on the DKRZ website. Disclaimer: you should only use these data for research/academic/non-commercial purposes. I received my account information relatively quickly. Once you have your account information, there are two ways to access the data. One is through an online GUI that is quite time-intensive. The other is to download Jblob, a Java command-line program designed to streamline data requests from the command line. Download Jblob following the straightforward instructions on the webpage.

After Jblob is downloaded, test that it is working properly by running it with the --version option:
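If the install worked, something like the following should print the version string (the exact invocation depends on where you placed the jblob wrapper script and whether it is on your PATH):

```shell
# Confirm Jblob is installed and executable by printing its version.
jblob --version
```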

Next, we need to gather the dataset names that tell the server which specific files we want. The easiest way to do this is to go to this IPCC website and find the model(s) you are interested in. For this example, let’s assume we want the NCAR CCSM4 historical run. Find the CCSM4 line on the webpage and click on “Historical”. This takes you to a page that looks like this. Under the Detailed Metadata heading, click on the WDCC metadata link. At this point we need to find the file name(s) for the variable(s) of interest. Let’s say we want specific humidity (hus). Go toward the bottom of the page and find the box labeled “Attached Entries”, then scroll down to the entry with “hus”. There may be multiple entries depending on the number of model runs available (e.g. r6i1p1). Click on “Show Details of Selected Entry”. Under the general information tab, the acronym field contains the dataset name; in this example it is “NRS4hiDADhus611v121031”. I kept a running list of all the dataset names I needed in a text file. If you also wanted the u- (ua) and v- (va) components of the wind for CCSM4, the only change to the above dataset name would be substituting ‘ua’ or ‘va’ for ‘hus’. Now, let’s utilize Jblob.
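As an illustration, a running list like mine might look as follows. The hus acronym is the one from the example above; the ua and va names are derived by the substitution rule just described, so verify them against the WDCC metadata pages before using them:

```shell
# datasets.txt - running list of CERA dataset acronyms for CCSM4 historical
NRS4hiDADhus611v121031   # specific humidity (hus), from the example above
NRS4hiDADua611v121031    # u-wind (ua), via the hus -> ua substitution
NRS4hiDADva611v121031    # v-wind (va), via the hus -> va substitution
```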

In Linux/Unix command line, insert the following:
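A sketch of the request is below. The dataset name is the CCSM4 hus acronym found above, and the option names are the ones described in the next paragraph; the date range, download folder, and credentials are placeholders you should replace with your own:

```shell
# Download daily CCSM4 historical specific humidity for a bounded time period.
jblob --dataset NRS4hiDADhus611v121031 \
      --dir ./downloads \
      --tmin 1979-01-01 --tmax 2005-12-31 \
      --username YOUR_CERA_USER --password YOUR_CERA_PASS
```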

This downloads the daily specific humidity data for the CCSM4 historical run (indicated by the --dataset option). The --dir option sets the local download folder. The --tmin and --tmax options let you define the temporal bounds of interest; use them unless you really want 100 years of data. The --username and --password options take your CERA account information. Next, I used the CDO operator sellonlatbox to extract my study domain, which will save you a TON of disk space. Additionally, you can group several of these requests into a shell script and download all the files you need for your study in one go (this saves a lot of time). There are several other Jblob and CDO commands that can further refine your data if needed. Hopefully this is enough to get you headed in the right direction. Cheers!
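As a sketch, batching the requests and then cropping each file with sellonlatbox might look like this. The dataset list, coordinates, file paths, and credentials are all hypothetical; sellonlatbox takes its box as lon1,lon2,lat1,lat2:

```shell
#!/bin/bash
# Hypothetical batch script: download each dataset, then crop to the study domain.
for ds in NRS4hiDADhus611v121031 NRS4hiDADua611v121031 NRS4hiDADva611v121031; do
    jblob --dataset "$ds" --dir ./downloads \
          --tmin 1979-01-01 --tmax 2005-12-31 \
          --username YOUR_CERA_USER --password YOUR_CERA_PASS
done

# Crop every downloaded NetCDF file to the study box (here 110W-100W, 35N-45N)
# to save disk space.
for f in ./downloads/*.nc; do
    cdo sellonlatbox,-110,-100,35,45 "$f" "${f%.nc}_subset.nc"
done
```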