
File size triples when using seldate argument

Added by Oliver Angelil about 7 years ago

The file "precip.nc" is attached. When I apply:

cdo -seldate,1948-01-01,2015-06-01 precip.nc new.nc

the file size goes from 13 MB to 32 MB, even though all I did was remove the last timestep (2015-07-01). My first guess was that the precision was being changed from 32-bit to 64-bit, but this is not the case. The problem does not occur when using a different select operator such as "selyear".

Oliver

precip.nc (13.4 MB)

Replies (5)

RE: File size triples when using seldate argument - Added by Uwe Schulzweida about 7 years ago

Hi Oliver,

Compressed NetCDF4 files will be decompressed with all CDO operators. The CDO option "-z zip" is needed to compress NetCDF4 data:

cdo -z zip -seldate,1948-01-01,2015-06-01 precip.nc new.nc

Cheers,
Uwe

RE: File size triples when using seldate argument - Added by Oliver Angelil about 7 years ago

Hi Uwe,

Thanks. Yes, I was wrong about selyear: the file is also decompressed when using that operator! Is there a way to determine whether a NetCDF file is compressed or not?

Oliver

RE: File size triples when using seldate argument - Added by Oliver Angelil about 7 years ago

Hi again,

I have now created two NetCDF files of my own, and "-z zip" fails to keep them compressed. I've attached two NetCDF4 files: "file.nc" (479 K) was created with the netCDF4 Python module's default settings, and "file_zlib_on.nc" (364 K) was created with the "zlib=True" argument.

The following commands:
cdo -z zip -seldate,1990-01-01,2017-01-05 file.nc new.nc
cdo -z zip -seldate,1990-01-01,2017-01-05 file_zlib_on.nc new.nc
both result in a 1.2 MB file.

And the following (without "-z zip"):
cdo -seldate,1990-01-01,2017-01-05 file.nc new.nc
cdo -seldate,1990-01-01,2017-01-05 file_zlib_on.nc new.nc
both result in a 1.1 MB file.

What am I missing to keep the file sizes around 300-500K?

Thanks,
Oliver

RE: File size triples when using seldate argument - Added by Uwe Schulzweida about 7 years ago

CDO's performance is optimized for large fields at each timestep. Therefore a dataset is processed time step by time step, and the maximum chunk size for compression is the size of the grid.
Your NetCDF file contains 2 variables with 7094 time steps, and the variables have grid sizes (numbers of columns) of 6 and 9. It seems that the chunk size used in your file is the whole field with all time steps, i.e. 6*7094 and 9*7094. CDO can only compress 6 or 9 values at a time, which often doesn't pay off because the overhead is larger than the savings from compression. That is why the compressed file is larger than the uncompressed one.
The uncompressed file is larger than the original because CDO always writes a time axis with an unlimited number of steps, whereas your NetCDF file uses a time axis with a fixed number of time steps. I don't know exactly why the file produced by CDO is larger, but I assume it has something to do with chunking and buffering in netCDF4/HDF5. The problem does not exist with NetCDF3 data:

cdo -f nc -seldate,1990-01-01,2017-01-05 file.nc new.nc
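The effect of tiny chunks can be illustrated with zlib directly: compressing a single 6-value float32 field (24 bytes) adds more overhead than it saves, while compressing the whole 6 x 7094 series in one block works well (a sketch with made-up bytes, not the actual precipitation data):

```python
import os
import zlib

one_step = os.urandom(24)        # one timestep of a 6-column float32 field
full_series = one_step * 7094    # all 7094 timesteps in a single block

small = zlib.compress(one_step)
large = zlib.compress(full_series)

# The per-timestep chunk grows: zlib's header/checksum overhead dominates.
print(f"one step: {len(one_step)} -> {len(small)} bytes")
# The single large block shrinks dramatically (the steps repeat exactly here).
print(f"full series: {len(full_series)} -> {len(large)} bytes")
```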

RE: File size triples when using seldate argument - Added by Oliver Angelil about 7 years ago

Thanks Uwe,

I've recreated the file as "NETCDF3_CLASSIC". Now the command is simply cdo -seldate,1990-01-01,2017-01-05 file.nc new.nc (no "-z zip" needed). I wouldn't know how to create a NetCDF4 file for which this works, besides the solution you suggested of converting it to NetCDF3 with CDO. However, the "precip.nc" file I first attached is NetCDF4 and the command with "-z zip" works on it, so there may be another way I could write "file.nc" as NetCDF4 so that it also works! Anyway, I'll stick with NetCDF3 for now. :)

Oliver
