Performance of CDO v NCO: Subsetting a lat-lon box

Added by Matt Thompson over 10 years ago

All,

Recently a user had a question about the best way to subset data within a lat-lon box on some NC4 files. Obviously I thought of CDO (where it's simple), but he also mentioned NCO (where it's never simple). After scrolling around the NCO site for a while, I figured out you could do something a la sellonlatbox with NCO's ncea.

What surprised me, though, was how much faster NCO was at doing it than CDO:

(280) $ time ncea -d lat,25.0,50.0 -d lon,-127.0,-65.0 dR_MERRA-AA-r1.del_aer_Nv.200812clm.nc4 subset.ncea.nc4
0.081u 0.355s 0:00.44 97.7%    0+0k 0+43208io 0pf+0w
(281) $ time cdo sellonlatbox,-127,-65,25,50 dR_MERRA-AA-r1.del_aer_Nv.200812clm.nc4 subset.cdo.nc4
cdo sellonlatbox: Processed 224570880 values from 15 variables over 1 timestep ( 2.37s )
2.168u 0.214s 0:02.38 99.5%    0+0k 0+43416io 0pf+0w
(282) $ cdo diffn subset.ncea.nc4 subset.cdo.nc4
  0 of 1080 records differ
cdo diffn: Processed 11016000 values from 30 variables over 2 timesteps ( 0.32s )
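
For reference, the elapsed times printed by `time` above (0:00.44 for ncea, 0:02.38 for cdo) work out to roughly a fivefold difference. A quick check, using only those two figures; this is a single run of each command, so treat the ratio as indicative rather than a benchmark:

```python
# Elapsed wall-clock times copied from the `time` output above, in seconds.
ncea_elapsed = 0.44
cdo_elapsed = 2.38

# Ratio of CDO's elapsed time to NCO's for this one run.
speedup = cdo_elapsed / ncea_elapsed
print(f"ncea was about {speedup:.1f}x faster than cdo sellonlatbox")
```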

In most things I've done, CDO has won out on performance as well as ease of use (h5diff v cdo diffn, for example; CDO is much, much faster). So my first thought was that perhaps I hadn't compiled CDO against NetCDF the same way I did NCO, but both are compiled against the same NetCDF build. Or, a co-worker here often tells me how selecting the wrong "cache size" for NetCDF can cause a 10x slowdown, so maybe I did that?

Do you have any suggestions/ideas? Or is sellonlatbox perhaps just one area where NCO has the advantage? (I should probably add: "...for now.")

Thanks,
Matt


Replies (1)

RE: Performance of CDO v NCO: Subsetting a lat-lon box - Added by Uwe Schulzweida over 10 years ago

Hi Matt,

The smallest dataset CDO reads from disk is a horizontal slice of the array (one time step on one level). sellonlatbox then cuts the subset out of this slice in memory. I assume that the NCO operator ncea reads the subset directly from the netCDF file. That could make the processing faster in this case.
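
The difference between the two read strategies can be sketched with a toy model in plain Python. This is not code from CDO or NCO: the grid size, the box indices, and the function names `read_slice_then_crop` and `read_box_only` are all hypothetical, and the "file" is just a nested list. The sketch only counts how many values each strategy has to read to produce the same box.

```python
# Toy model: a "file" holding one horizontal slice (NLAT x NLON values).
# Grid size and box indices are made up for illustration.
NLAT, NLON = 180, 360
FILE = [[lat * NLON + lon for lon in range(NLON)] for lat in range(NLAT)]


def read_slice_then_crop(lat0, lat1, lon0, lon1):
    """Read the full slice, then cut the box out in memory."""
    full = [row[:] for row in FILE]  # every value in the slice is read
    values_read = sum(len(row) for row in full)
    box = [row[lon0:lon1] for row in full[lat0:lat1]]
    return box, values_read


def read_box_only(lat0, lat1, lon0, lon1):
    """Read only the requested box (a hyperslab-style read)."""
    box = [FILE[lat][lon0:lon1] for lat in range(lat0, lat1)]
    values_read = sum(len(row) for row in box)
    return box, values_read


box_a, read_a = read_slice_then_crop(115, 141, 53, 116)
box_b, read_b = read_box_only(115, 141, 53, 116)

assert box_a == box_b  # both strategies return the same subset
print(f"full slice then crop: {read_a} values read")
print(f"box only:             {read_b} values read")
```

The same pattern shows up in Matt's output: cdo sellonlatbox reports processing 224570880 values, while cdo diffn counts only 11016000 values, presumably across both subset files combined. Whether that read volume fully accounts for the timing gap is not something this sketch can show.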

Cheers,
Uwe
