Cdo{rb,py}

Cdo{rb,py} is a smart caller for the CDO binary that lets you use CDO from Python and Ruby as if it were a native library. It offers several features that make it more convenient than calling CDO from a plain shell:

  • automatic tempfile handling
  • conditional processing (i.e. only create output files that do not exist yet) as a configuration option
  • flexible multi-threading
  • direct data access via numpy/narray
  • write new operators out of old ones

Current release is 1.2.3. There is also a GitHub repository for easy code sharing, where the changelog is tracked.

Usage

Almost all features are covered by unit tests. These should be a good starting point to get an impression of how to use the package:

  • Python: source:trunk/cdo/contrib/python/test/test_cdo.py or at github
  • Ruby: source:trunk/cdo/contrib/ruby/test/test_cdo.rb or at github

Both bindings are tested with the Unix and the win32 versions of CDO. Please note that returning arrays by setting returnCdf is not tested on Windows due to the lack of the corresponding netCDF library. There are precompiled Windows versions of netCDF, but I will not spend time getting them to run.

Before doing anything else, the libraries have to be loaded in the usual way:

from cdo import *   # python version
cdo = Cdo()

require 'cdo'       # ruby version

In the Python version an object has to be created for internal reasons, whereas this is not necessary for Ruby. This may change in the future, but for now it is only a minor difference.

IO

Input and output files can be set with the keywords input and output:

    Cdo.infov(:input => ifile)      #ruby version
    cdo.showlevel(:input => ifile)
    cdo.infov(input=ifile)          #python version
    cdo.showlevel(input=ifile)
    Cdo.timmin(:input => ifile ,:output => ofile)   #ruby version
    cdo.timmin(input = ifile,    output =  ofile)   #python version

Options

Command-line options like '-f' or '-P' can be used via the options keyword:

    Cdo.timmin(:input => ifile ,:output => ofile,:options => '-f nc') #ruby version
    cdo.timmin(input = ifile,    output = ofile,  options = '-f nc')  #python version

Operator arguments have to be given as the first method argument:

    Cdo.remap(gridFile,    weightFile,:input => ifile,:output => ofile,:options => '-f nc') #ruby version
    cdo.remap(gridFile+","+weightFile, input =  ifile, output =  ofile, options = '-f nc')  #python version

or
    Cdo.seltimestep('1/10',:input => ifile,:output => ofile,:options => '-r -B F64') #ruby version
    cdo.seltimestep('1/10', input =  ifile, output =  ofile, options =  '-r -B F64') #python version
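
The keyword conventions above correspond to the parts of a plain CDO command line. The following is a minimal sketch of that mapping, not the module's actual implementation; the helper name build_command is hypothetical and only assembles the string that would be handed to the shell:

```python
def build_command(operator, *args, input="", output="", options="", cdo="cdo"):
    """Assemble a CDO command line from wrapper-style arguments (illustrative only)."""
    # operator arguments are appended to the operator name, separated by commas
    op = ",".join([operator] + [str(a) for a in args])
    # options precede the operator; input and output follow it
    parts = [cdo, options, "-" + op, input, output]
    # empty parts (e.g. an omitted output) are skipped
    return " ".join(p for p in parts if p)


print(build_command("remap", "grid.nc", "weights.nc",
                    input="in.nc", output="out.nc", options="-f nc"))
```

The keyword names input, output and options mirror the ones used by the module; everything else in the sketch is an assumption made for illustration.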

Operator Chains

To take real advantage of CDO's internal parallelism, you should work with operator chains as much as possible:

    Cdo.setname('random',:input => "-mul -random,r10x10 -enlarge,r10x10 -setyear,2000 -for,1,4",:output => ofile,:options => '-f nc') #ruby version
    cdo.setname('random', input =  "-mul -random,r10x10 -enlarge,r10x10 -setyear,2000 -for,1,4", output =  ofile, options =  '-f nc') #python version

Another good example taken from the Tutorial illustrates the different ways of chaining: While the chain

cdo sub -dayavg ifile2 -timavg ifile1 ofile
is represented by

Cdo.sub(:input => "-dayavg #{ifile2} -timavg #{ifile1}", :output => ofile)  #ruby
cdo.sub(input = "-dayavg " + ifile2 + " -timavg " +ifile1, output = ofile)  #python

the serial version would look like

Cdo.sub(:input => Cdo.dayavg(:input => ifile2) + " " + Cdo.timavg(:input => ifile1), :output => ofile)  #ruby
cdo.sub(input  =  cdo.dayavg(input  =  ifile2) + " " + cdo.timavg(input  =  ifile1), output  =  ofile)  #python

or using the join-method:
Cdo.sub(:input => [Cdo.dayavg(:input => ifile2),Cdo.timavg(:input => ifile1)].join(" "), :output => ofile)  #ruby
cdo.sub(input  =  " ".join([cdo.dayavg(input  =  ifile2),cdo.timavg(input  =  ifile1)]), output  =  ofile)  #python
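
When chains get longer, building the input string by hand becomes error-prone. A small helper can assemble it from pairs of operator and source. This is only a sketch; the function chain is hypothetical and not part of the cdo module:

```python
def chain(*steps):
    """Build a CDO input chain from (operator, source) pairs (hypothetical helper)."""
    # each step becomes "-operator source"; the pieces are joined by blanks
    return " ".join("-{} {}".format(op, src) for op, src in steps)


ifile1, ifile2 = "ifile1.nc", "ifile2.nc"
chained_input = chain(("dayavg", ifile2), ("timavg", ifile1))
print(chained_input)
# the string could then be passed on, e.g. cdo.sub(input=chained_input, output=ofile)
```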

Special Features

Tempfile handling

If the output stream is omitted, a temporary file is written and its name is the return value of the call:

    ofile = Cdo.timmin(:input => ifile ,:options => '-f nc')   #ruby version
    ofile = cdo.timmin(input  =  ifile,  options =  '-f nc')   #python version

Here, the output files are automatically removed when the script finishes. Manual cleanup is no longer necessary.
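
The general technique behind this kind of automatic cleanup can be sketched with the Python standard library. The class TempFileStore below is hypothetical and is not how the cdo module is implemented internally; it only shows one way to hand out temporary file names and remove the files when the interpreter exits:

```python
import atexit
import os
import tempfile


class TempFileStore:
    """Hands out temporary file names and deletes the files at interpreter exit."""

    def __init__(self):
        self.paths = []
        atexit.register(self.cleanup)   # run cleanup when the script finishes

    def new(self, suffix=".nc"):
        fd, path = tempfile.mkstemp(suffix=suffix)
        os.close(fd)                    # only the file name is needed here
        self.paths.append(path)
        return path

    def cleanup(self):
        # remove every file that was handed out and still exists
        for path in self.paths:
            if os.path.exists(path):
                os.remove(path)
        self.paths = []


store = TempFileStore()
ofile = store.new()
print(os.path.exists(ofile))
```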

Conditional Processing

When processing a large number of input files, as is the case in a running experiment, it can be very helpful to suppress the creation of intermediate output if these files already exist. This can speed up your post-processing. By default, output is created regardless of whether something is overwritten. Conditional processing can be used in two different ways:

  • global setting
    cdo.forceOutput = False   #python
    or
    Cdo.forceOutput = false   #ruby
    This switch changes the default behavior (example)
  • operator option
    cdo.stdatm("0,10,20",output = ofile, force =  False)  #python
    or
    Cdo.stdatm(0,10,20,:output => ofile,:force => false)  #ruby
    The use of this option allows you to control the output action very precisely without changing the default (example of a good place to use this feature)
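
The decision behind both settings can be summarized in a few lines. The function needs_processing below is a hypothetical illustration of the logic described above, not code taken from the module:

```python
import os
import tempfile


def needs_processing(output, force=True):
    """Decide whether an operator call should run (illustrative logic only)."""
    # force=True mirrors the default: always (re)create the output
    # force=False skips the call when the output file is already present
    return force or not os.path.exists(output)


with tempfile.NamedTemporaryFile(suffix=".nc") as existing:
    print(needs_processing(existing.name, force=False))   # file exists, call would be skipped
    print(needs_processing(existing.name, force=True))    # forced, call would run anyway
```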

Multi-threading

When things can be done in parallel, Python and Ruby offer a smart way to handle this without too much overhead. A Ruby example should illustrate how it can be done: Tutorial

require 'cdo'
require 'jobqueue'

iFile                 = ARGV[0].nil? ? 'ifs_oper_T1279_2011010100.grb' : ARGV[0]
targetGridFile        = ARGV[1].nil? ? 'cell_grid-r2b07.nc'            : ARGV[1] # grid file
targetGridweightsFile = ARGV[2].nil? ? 'cell_weight-r2b07.nc'          : ARGV[2] # pre-computed interpolation weights
nWorkers              = ARGV[3].nil? ? 8                               : ARGV[3].to_i # number of parallel threads

# let's work in debug mode
Cdo.debug = true

# create a queue with a predefined number of workers
jq = JobQueue.new(nWorkers)

# split the input file with respect to variable names, codes, levels, grids, timesteps, ...
splitTag = "ifs2icon_skel_split_" 
#Cdo.splitcode(:in => iFile, :out => splitTag,:options => '-f nc')
Cdo.splitname(:in => iFile, :out => splitTag,:options => '-f nc')

# collect files from the split
files = Dir.glob("#{splitTag}*.nc")

# remap variables in parallel
files.each {|file|
  jq.push {
    basename = file[0..-(File.extname(file).size+1)]
    Cdo.remap(targetGridFile,targetGridweightsFile,
              :in => file,
              :out => "remapped_#{basename}.nc")
  }
}
jq.run

# Merge all the results together
Cdo.merge(:in => Dir.glob("remapped_*.nc").join(" "),:out => 'mergedResults.nc')


In this case the parallelization is done per variable. The only lines that had to be added to let the code run on a user-defined number of threads (see line 7) are 2, 13, 25, 30 and 32. This approach uses a queue, which collects all jobs and is started with jq.run. A Python version of JobQueue should be easy to implement. Contributions would be appreciated!
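
As a starting point, a Python queue with the same push/run pattern could be sketched on top of concurrent.futures from the standard library. This is not a port of the Ruby jobqueue gem; the class and its interface are assumptions, and fake_remap merely stands in for a real call such as cdo.remap. Threads should be sufficient here under the assumption that each CDO call spends its time in an external process:

```python
from concurrent.futures import ThreadPoolExecutor


class JobQueue:
    """Collects jobs and runs them on a fixed number of worker threads."""

    def __init__(self, n_workers):
        self.n_workers = n_workers
        self.jobs = []

    def push(self, func, *args, **kwargs):
        # store the call; nothing is executed before run()
        self.jobs.append((func, args, kwargs))

    def run(self):
        # execute all queued jobs and return their results in push order
        with ThreadPoolExecutor(max_workers=self.n_workers) as pool:
            futures = [pool.submit(f, *a, **kw) for f, a, kw in self.jobs]
            results = [future.result() for future in futures]
        self.jobs = []
        return results


def fake_remap(name):
    # placeholder for a call such as cdo.remap(..., input=name, output=...)
    return "remapped_" + name


jq = JobQueue(4)
for name in ["t.nc", "u.nc", "v.nc"]:
    jq.push(fake_remap, name)
print(jq.run())
```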

Data access via numpy/narray

When working with netCDF, it is possible to access the data in three additional ways:

  1. a file handle: Using a file handle offers the flexibility to go through the whole file with all its information like variables, dimensions and attributes. To get such a handle from a cdo call, use the returnCdf keyword or the readCdf method:
    cdo.stdatm("0", options = "-f nc", returnCdf  =  True).variables["P"][:]  #python, access variable 'P' with scipy.io
    Cdo.stdatm(0, :options => "-f nc", :returnCdf => true).var("P").get       #ruby , access with ruby-netcdf
    or return the pure handle with
    cdo.readCdf(ifile)  #python
    Cdo.readCdf(ifile)  #ruby
    
  2. a numpy/narray object: If a certain variable should be read in, use returnArray instead of returnCdf:
    pressure = cdo.stdatm("0", options = "-f nc",  returnArray = 'P')  #python
    pressure = Cdo.stdatm(0, :options => "-f nc", :returnArray => 'P') #ruby
  3. a masked array: If the target variable has missing values, i.e. makes use of the FillValue, the returned structure reflects this. For Python a masked array is returned; the Ruby version uses a special version of NArray called NArrayMiss. As an example, let's mask out the ocean from the global topography:
    oro = cdo.setrtomiss(-10000,0, input =  cdo.topo( options =  '-f nc'), returnMaArray =  'topo')  #python
    oro = Cdo.setrtomiss(-10000,0,:input => Cdo.topo(:options => '-f nc'),:returnMaArray => 'topo')  #ruby

Have a look at the documentation of the underlying netCDF libraries to get an overview of their functionality.

Prerequisites

The Python module requires scipy.io (or pycdf as a fallback), whereas the Ruby module needs ruby-netcdf. These dependencies are not handled automatically by pip or gem, because they are optional. SciPy is available as a precompiled package for most Linux/Unix distributions. If this is not the case for yours, you can also install it with pip. The ruby-netcdf package is available as a gem:

  • Ruby:
    gem install ruby-netcdf
    or
    gem install ruby-netcdf --user-install
  • Python:
    pip install scipy
    or visit the homepage for help on manual installation

Use Cases: Plotting

Examples: Python

from cdo import *
cdo   = Cdo()                                                         # create the CDO caller
ifile = 'tsurf.nc'                                                    # input: surface temperature
cdo.fldsum(input=ifile)                                               # compute the timeseries of global sum, return a temporary filename
vals  = cdo.fldsum(input=ifile,returnCdf=True).variables['tsurf'][:]  # return the timeseries as numpy array
print(cdo.fldsum(input=ifile,returnCdf=True).variables)               # get a list of all variables in the file 

Basic plotting:

from cdo import *
import matplotlib.pyplot as plt
ifile = 'EH5_AMIP_1_TSURF_1991-1995.nc'
cdo   = Cdo()

# Compute the field mean value timeseries and return it as a numpy array
vals  = cdo.fldmean(input=ifile,returnCdf=True).variables['tsurf'][:] 

# make it 1D
tmean = vals.flatten()

# Plot the cumulative sum of the variation
plt.plot((tmean-tmean.mean()).cumsum())
plt.show()
produces: [figure: cumulative sum of the variation]   original: [figure: original time series]
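
To make clear what is being plotted: the expression (tmean-tmean.mean()).cumsum() subtracts the mean from every value and then accumulates the differences. A minimal pure-Python sketch of the same arithmetic, using made-up numbers instead of model output:

```python
from itertools import accumulate


def cumsum_of_anomalies(values):
    """Subtract the mean from each value and accumulate the differences."""
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))


# made-up sample values, mean is 3.0
print(cumsum_of_anomalies([1.0, 2.0, 3.0, 6.0]))   # prints [-2.0, -3.0, -3.0, 0.0]
```

Because the anomalies sum to zero, the last value of the series is always zero up to rounding errors.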

Examples: Ruby

require 'cdo'
ifile = 'tsurf.nc'                                                   # input: surface temperature
vals  = Cdo.fldsum(:in => ifile,:returnCdf=> true).var('tsurf').get  # return the global sum timeseries as narray
puts Cdo.fldsum(:in=> ifile,:returnCdf => true).var_names            # get a list of all variables in the file 

If you want some basic plotting, use the Ruby bindings of the GNU Scientific Library. You can install it like cdo. Here's a short example:

require 'cdo'
require 'gsl'
ifile="data/examples/EH5_AMIP_1_TSURF_1991-1995.nc" 
tmean = Cdo.fldmean(:in => ifile,:returnCdf => true).var('tsurf').get
tmean.to_gv.plot("w d title 'AMIP global mean surface temp'")

which shows: [figure: global mean surface temperature]

In this context the variable tmean is of type NArray, which is the Ruby counterpart of numpy. It has several methods of its own. To filter out the temporal behaviour of the above time series, you could subtract the mean value and display the cumulative sum by adding:

(tmean-tmean.mean)[0,0,0..-1].cumsum.to_gv.plot("w d title 'CUMSUM of global mean surface temp variation'")

which results in: [figure: cumulative sum of the global mean surface temperature variation]

Use Cases: Interpolation, Root finding, Data fitting, ...

Through the numpy/narray interface, both the Python and the Ruby version offer a huge amount of extra functionality via several third-party libraries.

Write your own operators

Future versions

Neither cdo module is tied to a particular CDO version. Instead, you can switch to whatever CDO version you have installed: use the setCdo method to select another CDO binary. When CDO is updated and new operators are available, they are usable in the Python and Ruby modules automatically without any update.
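
If several CDO installations exist on a machine, the binary to use can be picked before it is handed to setCdo. The following sketch relies only on shutil.which from the standard library; the function find_cdo and the path /opt/cdo-dev/bin/cdo are hypothetical:

```python
import shutil


def find_cdo(candidates=("cdo",)):
    """Return the first candidate that resolves to an executable, else None."""
    for candidate in candidates:
        path = shutil.which(candidate)
        if path is not None:
            return path
    return None


# hypothetical development build first, then whatever is on the PATH
binary = find_cdo(["/opt/cdo-dev/bin/cdo", "cdo"])
print(binary)
# with the module loaded, the result could be handed over via cdo.setCdo(binary)
```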

Installation

CDO can be easily accessed via Ruby and Python. For each of the two there is a dedicated package, which can be installed from public servers with their own package management systems: gem for Ruby and PyPI for Python. The interfaces of both packages are identical.

Ruby

Ruby's package system is called gem. The cdo module is located here. Its installation is rather easy: Just type

gem install cdo
and you'll get the latest version installed on your system. A system-wide installation usually requires root privileges, so it might be necessary to prepend sudo to the command. gem has a great built-in help:
gem help install
will show all you need for installation. If you do not have root access to your machine, choose another installation directory with the --install-dir option or use
gem install cdo --user-install
for an installation under $HOME/.gem.

Ruby 1.9.x comes with gem included, but some distros like Debian and its derivatives provide it as a separate package. You might have to look for a rubygems package.

Python

The cdo module can also be installed for python using pypi, the python package index. Cdo can be found here. If pip is installed on your system, just type

pip install cdo
For user installations, use
pip install --build=/tmp/pip_build --src=/tmp/src_build --user cdo
Please note: For upgrading with pip, you have to remove the temporary directories first. Otherwise the upgrade will not take place:
rm -rf /tmp/pip_build /tmp/src_build && pip install --build=/tmp/pip_build --src=/tmp/src_build --user cdo --upgrade

Without pip, you should download the tar file and run (possibly requiring root privileges)
python setup.py install
in the extracted directory.
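
To check afterwards whether the package is visible to the Python interpreter in use, the installed version can be queried with importlib.metadata (available in the standard library since Python 3.8). The helper installed_version is a hypothetical name used for this sketch:

```python
from importlib import metadata


def installed_version(package="cdo"):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())
```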

For ZMAW users

Depending on the target language, use module load python or module load ruby before you use the cdo bindings. For getting easy access to the data, the netcdf bindings for Python/Ruby have to be installed:

  • Python: pycdf is installed for the python-2.7 module on all lenny-64 workstations. Using it with cdo.py should work with and without returnCdf=True.
  • Ruby: ruby-netcdf is installed for the available ruby module. In case of problems it can be installed by every user locally:
    1. Create a directory structure (if you didn't do it for cdo.rb) for local gems with
      mkdir -p $HOME/.gem/ruby/1.9.1
    2. Call
      gem install -r ruby-netcdf --install-dir=$HOME/.gem/ruby/1.9.1 -- --with-netcdf-dir=/sw/lenny-x64/netcdf-4.1.3-gccsys

plotTest.png (26.4 KB) Ralf Mueller, 2012-01-20 10:06

plotTestCumsum.png (18.6 KB) Ralf Mueller, 2012-01-20 10:18

tmean_py.png (38.6 KB) Ralf Mueller, 2012-01-24 14:08

tsurfOrg.png (30.4 KB) Ralf Mueller, 2012-11-22 09:22

tsurfOrg.png (59.9 KB) Ralf Mueller, 2012-11-22 12:30

tmean_py.png (36.7 KB) Ralf Mueller, 2012-11-22 12:30