Percentile calculations with timpctl

Added by Stefan Fronzek almost 14 years ago

Hello!

I am struggling with some unexpected results from calculating percentiles on daily data. I calculated the 99th percentile of 30 years of daily data. For testing, I counted the number of days exceeding the 99th percentile. My expectation was that the result should be exactly 1% of the total number of days, i.e. 10958 days * 1% = 109.58. However, the values spread around 110: some grid cells have fewer than 100 days (corresponding to 0.9125% of the total days), some have more than 115 (1.0495%); see the attached map. I checked that there are no missing values for the area that I plotted.
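
The expected count can be checked with a few lines of Python. This is only a sketch of the arithmetic; the helper name `expected_exceedances` is made up for illustration, and it assumes there are no tied values at the threshold.

```python
from datetime import date

# Period used in this post, inclusive on both ends.
start, end = date(1971, 1, 1), date(2000, 12, 31)
n_days = (end - start).days + 1

def expected_exceedances(n, percentile):
    """Expected number of values strictly above the percentile (hypothetical helper, assumes no ties)."""
    return n * (100 - percentile) / 100

for p in (90, 99):
    print(f"{p}th percentile: {expected_exceedances(n_days, p)} of {n_days} days expected above it")
```

A count near 110 days corresponds to the 99th percentile (1% of the days); for the 90th percentile the expected count would be roughly ten times larger.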

I wonder if anybody knows what the reason for this is. Is cdo only approximating percentiles rather than giving exact values?

This is how I calculated the 99th percentile of all daily values from 1971-01-01 to 2000-12-31:

cdo seldate,1971-01-01,2000-12-31 /cygdrive/d/data/DATA2/CRU/daily/version3.0/tg_0.25deg_reg_v3.0.nc tg_1971-2000.nc
cdo timmin tg_1971-2000.nc tgmin_1971-2000.nc
cdo timmax tg_1971-2000.nc tgmax_1971-2000.nc
cdo timpctl,99 tg_1971-2000.nc tgmin_1971-2000.nc tgmax_1971-2000.nc tg99_1971-2000.nc

I plotted the number of days exceeding the 99th percentile in GrADS with the following commands (tg99_1971-2000.ctl sets some GrADS-specific things for tg99_1971-2000.nc):

sdfopen /cygdrive/d/data/DATA2/CRU/daily/version3.0/tg_0.25deg_reg_v3.0.nc
open tg99_1971-2000.ctl
d sum(const(maskout(1,tg.1-tg.2),0,-u),time=1jan1971,time=30dec2000)

Best regards,
Stefan


Replies (1)

RE: Percentile calculations with timpctl - Added by Ralf Mueller over 13 years ago

Stefan Fronzek wrote:

Hello!

I am struggling with some unexpected results from calculating percentiles on daily data. I calculated the 99th percentile of 30 years of daily data. For testing, I counted the number of days exceeding the 99th percentile. My expectation was that the result should be exactly 1% of the total number of days, i.e. 10958 days * 1% = 109.58. However, the values spread around 110: some grid cells have fewer than 100 days (corresponding to 0.9125% of the total days), some have more than 115 (1.0495%)

I do not expect this. Just like the median, the percentile depends on the distribution of your values (which may vary from location to location). For testing, you may use artificial data for a single location.
I do not expect CDO to calculate values only approximately, but for percentiles a histogram with a certain bin size is used. This of course influences the percentile value (see the documentation).
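
A test with artificial data for a single location could look like the sketch below. It compares an exact (nearest-rank) percentile with one estimated from an equal-width histogram between the minimum and maximum, and counts the values above each threshold. This is not CDO's actual implementation: the function names, the Gaussian test series and the choice of returning the bin centre are assumptions made for illustration. The 101 bins follow the default described in the CDO documentation, which as far as I know can be changed with the environment variable CDO_PCTL_NBINS.

```python
import math
import random

def exact_percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def histogram_percentile(values, p, nbins=101):
    """Percentile estimated from an equal-width histogram between min and max.

    Simplified illustration only (not CDO's code): returns the centre of the
    first bin at which the cumulative count reaches p% of the data.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        counts[min(int((v - lo) / width), nbins - 1)] += 1
    target = p / 100 * len(values)
    cumulative = 0
    for i, c in enumerate(counts):
        cumulative += c
        if cumulative >= target:
            return lo + (i + 0.5) * width
    return hi

def count_exceedances(values, threshold):
    """Number of values strictly above the threshold."""
    return sum(v > threshold for v in values)

# Artificial daily series for one location (assumed Gaussian, arbitrary mean and spread).
random.seed(0)
n_days = 10958
series = [random.gauss(5.0, 8.0) for _ in range(n_days)]

exact_count = count_exceedances(series, exact_percentile(series, 99))
hist_count = count_exceedances(series, histogram_percentile(series, 99))
print("days above exact 99th percentile:    ", exact_count)
print("days above histogram 99th percentile:", hist_count)
```

With the exact threshold the count is fixed by the rank alone. The histogram threshold can sit anywhere within one bin width of it, so the count shifts by however many values fall in between, and that number depends on the local distribution of the data.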
