where clauses including timstamptz and intervals

Started by Niels Jespersenabout 5 years ago8 messagesgeneral
Jump to latest

Hello all

Are these two queries exactly eqivalent? The table is partitioned on r_time, which is a timestamptz. The explain plans are not exactly the same. The first wants to scan a partition more than the latter.

select f.xx from f
where f.r_time >= '2020-10-01 00:00:00+00'::timestamptz
and f.r_time < ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month');

select f.xx from f
where f.r_time >= '2020-10-01 00:00:00+00'::timestamptz
and f.r_time < ('2020-11-01 00:00:00+00'::timestamptz);

Regards Niels Jespersen

In reply to: Niels Jespersen (#1)
Re: where clauses including timstamptz and intervals

On Fri, Apr 09, 2021 at 07:24:54AM +0000, Niels Jespersen wrote:

Hello all

Are these two queries exactly eqivalent? The table is partitioned on
r_time, which is a timestamptz. The explain plans are not exactly the
same. The first wants to scan a partition more than the latter.

select f.xx from f
where f.r_time >= '2020-10-01 00:00:00+00'::timestamptz
and f.r_time < ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month');

select f.xx from f
where f.r_time >= '2020-10-01 00:00:00+00'::timestamptz
and f.r_time < ('2020-11-01 00:00:00+00'::timestamptz);

It depends on timezone.

For example, in my timezone:

$ select ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month'), '2020-11-01 00:00:00+00'::timestamptz;
?column? │ timestamptz
────────────────────────┼────────────────────────
2020-11-01 02:00:00+01 │ 2020-11-01 01:00:00+01
(1 row)

Please note that there is 1 hour difference.

The reason is that somewhere in there we change time due to daylight
savings.

If I'd set timezone to UTC, suddently it's the same:

$ set timezone=UTC;
SET

$ select ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month') = '2020-11-01 00:00:00+00'::timestamptz;
?column?
──────────
t
(1 row)

As usual - time is more complicated than one could expect.

Best regards,

depesz

#3Ron
ronljohnsonjr@gmail.com
In reply to: hubert depesz lubaczewski (#2)
Re: where clauses including timstamptz and intervals

On 4/9/21 5:24 AM, hubert depesz lubaczewski wrote:

On Fri, Apr 09, 2021 at 07:24:54AM +0000, Niels Jespersen wrote:

Hello all

Are these two queries exactly eqivalent? The table is partitioned on
r_time, which is a timestamptz. The explain plans are not exactly the
same. The first wants to scan a partition more than the latter.

select f.xx from f
where f.r_time >= '2020-10-01 00:00:00+00'::timestamptz
and f.r_time < ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month');

select f.xx from f
where f.r_time >= '2020-10-01 00:00:00+00'::timestamptz
and f.r_time < ('2020-11-01 00:00:00+00'::timestamptz);

It depends on timezone.

For example, in my timezone:

$ select ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month'), '2020-11-01 00:00:00+00'::timestamptz;
?column? │ timestamptz
────────────────────────┼────────────────────────
2020-11-01 02:00:00+01 │ 2020-11-01 01:00:00+01
(1 row)

Please note that there is 1 hour difference.

The reason is that somewhere in there we change time due to daylight
savings.

That looks like a flaw in how "month" is calculated.  Whether or not October
is 744 hours (31 days x 24 hours/day) or 743 hours (subtracting the "fall
back" hour), one month from 01-Oct-2020 is still 01-Nov-2020.

--
Angular momentum makes the world go 'round.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron (#3)
Re: where clauses including timstamptz and intervals

Ron <ronljohnsonjr@gmail.com> writes:

On 4/9/21 5:24 AM, hubert depesz lubaczewski wrote:

For example, in my timezone:

$ select ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month'), '2020-11-01 00:00:00+00'::timestamptz;
?column? │ timestamptz
────────────────────────┼────────────────────────
2020-11-01 02:00:00+01 │ 2020-11-01 01:00:00+01
(1 row)

Please note that there is 1 hour difference.
The reason is that somewhere in there we change time due to daylight
savings.

That looks like a flaw in how "month" is calculated.  Whether or not October
is 744 hours (31 days x 24 hours/day) or 743 hours (subtracting the "fall
back" hour), one month from 01-Oct-2020 is still 01-Nov-2020.

No, it's more likely because the calculation was done in some time
zone other than UTC, which (somewhat confusingly) is what the starting
value was specified in. Assuming Hubert meant EU rules, we have

regression=# set timezone = 'Europe/Paris';
SET
regression=# select '2020-10-01 00:00:00+00'::timestamptz;
timestamptz
------------------------
2020-10-01 02:00:00+02
(1 row)

regression=# select '2020-10-01 00:00:00+00'::timestamptz + interval '1 month';
?column?
------------------------
2020-11-01 02:00:00+01
(1 row)

which looks to me like the addition did exactly what it's supposed to,
ie same local time 1 month later.

regards, tom lane

#5Francisco Olarte
folarte@peoplecall.com
In reply to: Ron (#3)
Re: where clauses including timstamptz and intervals

Ron:

On Fri, Apr 9, 2021 at 5:53 PM Ron <ronljohnsonjr@gmail.com> wrote:

Are these two queries exactly eqivalent? The table is partitioned on
r_time, which is a timestamptz. The explain plans are not exactly the
same. The first wants to scan a partition more than the latter.

select f.xx from f
where f.r_time >= '2020-10-01 00:00:00+00'::timestamptz
and f.r_time < ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month');

select f.xx from f
where f.r_time >= '2020-10-01 00:00:00+00'::timestamptz
and f.r_time < ('2020-11-01 00:00:00+00'::timestamptz);

It depends on timezone.

....

That looks like a flaw in how "month" is calculated. Whether or not October
is 744 hours (31 days x 24 hours/day) or 743 hours (subtracting the "fall
back" hour), one month from 01-Oct-2020 is still 01-Nov-2020.

I've encountered these thing before. What is probably happening is
that your CLIENT TIMEZONE is DST aware.

When you add the interval 1st PG takes the tstz, and converts it to
internal. No problem here, it is a gmt one. The it parses the
interval, again no problem, intervals do not depend on DST.

But then you ask to add them, and timezone + interval is time zone
sensitive. And, counter intuitively to many, tstz does not store
timezone, it marks an instant in time, it just signals that
manipulation to the timezone must be TZ aware, so it does it in your
current timezone.

You probably have ( and if not, you should consider having ) your
partitions defined on GMT, so the time jump at the starty of the month
make it take one more ( are you in Britain/Portugal/Canary islands,
which are +0 in winter +1 in summer, IIRC ? )

If you want to scan exact partitions by specifying arithmetic limits,
set timezone to whichever tz your partitions are partitioned with, or
just name the partitions. If you just want to scan a date range (
thats's what I do a lot ), just trust pg to do the right thing.

Remember, specifying +00 in the input does not make the arithmetic
work in that tz, it works in your tz. I think you could achieve the
desired result by temporarily switching to timestamp without,
something like
('2020-10-01 00:00:00+00'::timestamptz at timezone 'GMT' + interval '1
month') at timezone 'GMT'
IIRC this converts to tstz to ts, then adds there, then converts back
to tstz, my results are:

cdrs=> show timezone;
TimeZone
----------
UTC
(1 row)

cdrs=> select ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month');
?column?
------------------------
2020-11-01 00:00:00+00
(1 row)

cdrs=> set timezone to 'Europe/Madrid';
SET

cdrs=> select ('2020-10-01 00:00:00+00'::timestamptz), ('2020-10-01
00:00:00+00'::timestamptz + interval '1 month');
timestamptz | ?column?
------------------------+------------------------
2020-10-01 02:00:00+02 | 2020-11-01 02:00:00+01
(1 row)

********* diferent UTC hours, same local. (02 but +02 and +01)

cdrs=> select ('2020-10-01 00:00:00+00'::timestamptz), ('2020-10-01
00:00:00+00'::timestamptz at time zone 'utc'+ interval '1 month') at
time zone 'utc';
timestamptz | timezone
------------------------+------------------------
2020-10-01 02:00:00+02 | 2020-11-01 01:00:00+01
(1 row)

********* Now same UTC hours, different local. ( 02 + 02, 01 + 01 )

If I had monthly partitions rolling at 00 utc, 1st one will nibble the
first hour of an extra one.

TSTZ is hard.

Francisco Olarte.

In reply to: Tom Lane (#4)
SV: where clauses including timstamptz and intervals

-----Oprindelig meddelelse-----
Fra: Tom Lane <tgl@sss.pgh.pa.us>
Emne: Re: where clauses including timstamptz and intervals

Ron <ronljohnsonjr@gmail.com> writes:

On 4/9/21 5:24 AM, hubert depesz lubaczewski wrote:

For example, in my timezone:

(1 row)

Please note that there is 1 hour difference.
The reason is that somewhere in there we change time due to daylight
savings.

That looks like a flaw in how "month" is calculated. Whether or not
October is 744 hours (31 days x 24 hours/day) or 743 hours
(subtracting the "fall back" hour), one month from 01-Oct-2020 is still 01-Nov-2020.

No, it's more likely because the calculation was done in some time zone other than UTC, which (somewhat confusingly) is what the starting value was specified in. Assuming Hubert meant EU rules, we have

regression=# set timezone = 'Europe/Paris'; SET regression=# select '2020-10-01 00:00:00+00'::timestamptz;
timestamptz
------------------------
2020-10-01 02:00:00+02
(1 row)

regression=# select '2020-10-01 00:00:00+00'::timestamptz + interval '1 month';
?column?
------------------------
2020-11-01 02:00:00+01
(1 row)

which looks to me like the addition did exactly what it's supposed to, ie same local time 1 month later.

regards, tom lane

I hope this sums it up.

select ('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' + interval '1 month') as c1,
'2020-11-01 00:00:00+00'::timestamptz at time zone 'utc' as c2,
('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' + interval '1 month') =
('2020-11-01 00:00:00+00'::timestamptz at time zone 'utc') as c3,
('2020-10-01 00:00:00+00'::timestamptz at time zone 'Europe/Brussels' + interval '1 month') as c4,
'2020-11-01 00:00:00+00'::timestamptz at time zone 'Europe/Brussels' as c5,
('2020-10-01 00:00:00+00'::timestamptz at time zone 'Europe/Brussels' + interval '1 month') =
('2020-11-01 00:00:00+00'::timestamptz at time zone 'Europe/Brussels') as c6;

returns

c1 | c2 | c3 | c4 | c5 | c6
---------------------+---------------------+----+---------------------+---------------------+----
2020-11-01 00:00:00 | 2020-11-01 00:00:00 | t | 2020-11-01 02:00:00 | 2020-11-01 01:00:00 | f
(1 row)

The answer to my original problem behind my original question which is how to query using utc timing in order to hit exactly the (utc created) partition, without changing the session timezone, would be like this:

select f.* from f
where f.r_time at time zone 'utc' >= '2020-10-01 00:00:00+00'::timestamptz at time zone 'utc'
and f.r_time at time zone 'utc' < ('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' + interval '1 month');

I think maybe it's better to just change the session timezone when this need arises.

Thank you all for feedback.

Regards Niels

#7Francisco Olarte
folarte@peoplecall.com
In reply to: Niels Jespersen (#6)
Re: where clauses including timstamptz and intervals

Niels:

On Fri, Apr 9, 2021 at 7:39 PM Niels Jespersen <NJN@dst.dk> wrote:

The answer to my original problem behind my original question which is how to query using utc timing in order to hit exactly the (utc created) partition, without changing the session timezone, would be like this:

You would have had much success had you stated your problem instead of
the seemingly academical question of queery equivalence, anyway:

select f.* from f
where f.r_time at time zone 'utc' >= '2020-10-01 00:00:00+00'::timestamptz at time zone 'utc'
and f.r_time at time zone 'utc' < ('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' + interval '1 month');

Although it is equivalent I would suggest to use:

f.r_time< ('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' +
interval '1 month') at time zone 'utc' ;

optimizer should treat both the same, but I'm no sure it can recognize
it, and it may not see your condition is of the type "field <
constant" to select index scans and similar things. Also, you can
refactor your calculations into an stable function for greater
clarity.

I think maybe it's better to just change the session timezone when this need arises.

Maybe. Also, I do not see the need to query exact partitions by tstz
range, but assuming you do it may be much easier to query the catalog
for the partition and then query the partition. ( In many years of
working with time-partitioned tables I have never got the need to
build a query to hit exactly X partitions from times ).

Francisco Olarte.

In reply to: Francisco Olarte (#7)
SV: where clauses including timstamptz and intervals

-----Oprindelig meddelelse-----
Fra: Francisco Olarte <folarte@peoplecall.com>
Sendt: 10. april 2021 09:31
Til: Niels Jespersen <NJN@dst.dk>

Although it is equivalent I would suggest to use:

f.r_time< ('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' + interval '1 month') at time zone 'utc' ;

optimizer should treat both the same, but I'm no sure it can recognize it, and it may not see your condition is of the type "field < constant" to select index scans and similar things. Also, you can refactor your calculations into an stable function for greater clarity.

The plannes seems to be able to figure this out nicely.

I think maybe it's better to just change the session timezone when this need arises.

Maybe. Also, I do not see the need to query exact partitions by tstz range, but assuming you do it may be much easier to query the catalog for the partition and then query the partition. ( In many years of working with time-partitioned tables I have never got the need to build a query to hit exactly X partitions from times ).

I need to hit an exact partition, because the query makes some QA on data coming from a file that fits that exact timeslot. File and partition must match exactly because the file may be raplaced later and the partition truncated.

Thank you for your insights.

Niels

Show quoted text

Francisco Olarte.