Strange behavior in generate_series(date, date, interval) with DST

Started by Sérgio Saquetimover 11 years ago6 messagesgeneral
Jump to latest
#1Sérgio Saquetim
sergiosaquetim@gmail.com

I've noticed a strange behavior in the generate_series functions.

I'm trying to get all days between a start and an end date including the
bounds. So naturally I've tried something like the query below
​.

The real query uses generate_series to join other tables and is much more
complicated, but for the sake of brevity, I think that this query is good
enough to show the problem.
:

postgres=# SELECT generate_series('2014-10-10'::DATE,
​​
'2014-10-15'::DATE, '1 DAY'::INTERVAL);
generate_series
------------------------
2014-10-10 00:00:00-03
2014-10-11 00:00:00-03
2014-10-12 00:00:00-03
2014-10-13 00:00:00-03
2014-10-14 00:00:00-03
2014-10-15 00:00:00-03
​​
(6 rows)

​Please note that the upper bound ​

'2014-10-15' is included in the resulting rows.

Now if I try this same query with slightly different dates I get:

​postgres=# SELECT generate_series('2014-10-15'::DATE, '2014-10-20'::DATE,
'1 DAY'::INTERVAL);
generate_series
------------------------
2014-10-15 00:00:00-03
2014-10-16 00:00:00-03
2014-10-17 00:00:00-03
2014-10-18 00:00:00-03
2014-10-19 01:00:00-02
(5 rows)

-----------------------------------------------
The upper bound is not included in the results!
-----------------------------------------------

Here, in Brazil our DST started on Oct 19. So if I had to guess I would say
that this strange behavior is due to the DST, but I'm having a hard time to
understand why this is happening!

Is this expected behavior?

I know that I can achieve the results I expect with the following query:

postgres=# WITH RECURSIVE days(d) AS (
SELECT '2014-10-15'::DATE
UNION ALL
SELECT d+1 FROM days WHERE d < '2014-10-20'::DATE
)
SELECT * FROM days;
d
------------
2014-10-15
2014-10-16
2014-10-17
2014-10-18
2014-10-19
2014-10-20
(6 rows)

But using that instead of generate_series, just feels wrong so I would like
to understand what's happening and if there is a way to overcome that,
before changing my queries.

Thank you!

Sérgio Saquetim

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Sérgio Saquetim (#1)
Re: Strange behavior in generate_series(date, date, interval) with DST

On 12/07/2014 12:11 PM, Sérgio Saquetim wrote:

I've noticed a strange behavior in the generate_series functions.

I'm trying to get all days between a start and an end date including the
bounds. So naturally I've tried something like the query below
​.

The real query uses generate_series to join other tables and is much
more complicated, but for the sake of brevity, I think that this query
is good enough to show the problem.
:

postgres=# SELECT generate_series('2014-10-10'::DATE,
​​
'2014-10-15'::DATE, '1 DAY'::INTERVAL);
generate_series
------------------------
2014-10-10 00:00:00-03
2014-10-11 00:00:00-03
2014-10-12 00:00:00-03
2014-10-13 00:00:00-03
2014-10-14 00:00:00-03
2014-10-15 00:00:00-03
​​
(6 rows)

​Please note that the upper bound ​

'2014-10-15' is included in the resulting rows.

Now if I try this same query with slightly different dates I get:

​postgres=# SELECT generate_series('2014-10-15'::DATE,
'2014-10-20'::DATE, '1 DAY'::INTERVAL);
generate_series
------------------------
2014-10-15 00:00:00-03
2014-10-16 00:00:00-03
2014-10-17 00:00:00-03
2014-10-18 00:00:00-03
2014-10-19 01:00:00-02
(5 rows)

-----------------------------------------------
The upper bound is not included in the results!
-----------------------------------------------

Here, in Brazil our DST started on Oct 19. So if I had to guess I would
say that this strange behavior is due to the DST, but I'm having a hard
time to understand why this is happening!

Is this expected behavior?

I know that I can achieve the results I expect with the following query:

postgres=# WITH RECURSIVE days(d) AS (
SELECT '2014-10-15'::DATE
UNION ALL
SELECT d+1 FROM days WHERE d < '2014-10-20'::DATE
)
SELECT * FROM days;
d
------------
2014-10-15
2014-10-16
2014-10-17
2014-10-18
2014-10-19
2014-10-20
(6 rows)

But using that instead of generate_series, just feels wrong so I would
like to understand what's happening and if there is a way to overcome
that, before changing my queries.

First generate_series is really looking for a timestamp on input and
returns timestamps:

http://www.postgresql.org/docs/9.3/interactive/functions-srf.html

generate_series(start, stop, step interval) timestamp or timestamp with
time zone setof timestamp or setof timestamp with time zone (same as
argument type) Generate a series of values, from start to stop with a
step size of step

So:

test=# set timezone='Brazil/East';
SET

test=# SELECT generate_series('2014-10-10'::TIMESTAMP,
'2014-10-20'::TIMESTAMP, '1 DAY'::INTERVAL);
generate_series
---------------------
2014-10-10 00:00:00
2014-10-11 00:00:00
2014-10-12 00:00:00
2014-10-13 00:00:00
2014-10-14 00:00:00
2014-10-15 00:00:00
2014-10-16 00:00:00
2014-10-17 00:00:00
2014-10-18 00:00:00
2014-10-19 00:00:00
2014-10-20 00:00:00

test=# SELECT generate_series('2014-10-10'::TIMESTAMP WITH TIME ZONE,
'2014-10-20'::TIMESTAMP WITH TIME ZONE, '1 DAY'::INTERVAL);
generate_series
------------------------
2014-10-10 00:00:00-03
2014-10-11 00:00:00-03
2014-10-12 00:00:00-03
2014-10-13 00:00:00-03
2014-10-14 00:00:00-03
2014-10-15 00:00:00-03
2014-10-16 00:00:00-03
2014-10-17 00:00:00-03
2014-10-18 00:00:00-03
2014-10-19 01:00:00-02

Though this part I do not understand:

test=# select '2014-10-19'::timestamp with time zone;
timestamptz
------------------------
2014-10-19 01:00:00-02
(1 row)

test=# select '2014-10-20'::timestamp with time zone;
timestamptz
------------------------
2014-10-20 00:00:00-02
(1 row)

I thought interval understood 23/25 'day' across DST/ST boundaries. I
will have to think more on this.

Thank you!

Sérgio Saquetim

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Sérgio Saquetim (#1)
Re: Strange behavior in generate_series(date, date, interval) with DST

On 12/07/2014 12:11 PM, Sérgio Saquetim wrote:

I've noticed a strange behavior in the generate_series functions.

I'm trying to get all days between a start and an end date including the
bounds. So naturally I've tried something like the query below
​.

The real query uses generate_series to join other tables and is much
more complicated, but for the sake of brevity, I think that this query
is good enough to show the problem.

But using that instead of generate_series, just feels wrong so I would
like to understand what's happening and if there is a way to overcome
that, before changing my queries.

To follow up, it looks to be a Midnight issue. I live on the US West
Coast so:

test=# show timezone;
TimeZone
------------
US/Pacific
(1 row)

Our Spring change happened March 9th at 2:00 AM:

test=# select '2014-03-09 01:00'::timestamp with time zone ;
timestamptz
------------------------
2014-03-09 01:00:00-08
(1 row)

test=# select '2014-03-09 02:00'::timestamp with time zone ;
timestamptz
------------------------
2014-03-09 03:00:00-07
(1 row)

When I do a similar generate_series:

test=# SELECT generate_series('2014-03-01'::DATE , '2014-03-10'::DATE ,
'1 DAY'::INTERVAL);
generate_series
------------------------
2014-03-01 00:00:00-08
2014-03-02 00:00:00-08
2014-03-03 00:00:00-08
2014-03-04 00:00:00-08
2014-03-05 00:00:00-08
2014-03-06 00:00:00-08
2014-03-07 00:00:00-08
2014-03-08 00:00:00-08
2014-03-09 00:00:00-08
2014-03-10 00:00:00-07
(10 rows)

it works.

So it seems there is some confusion which Midnight is being used for the
DATE to timestamp with time zone conversion.

Thank you!

Sérgio Saquetim

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Sérgio Saquetim
sergiosaquetim@gmail.com
In reply to: Adrian Klaver (#3)
Re: Strange behavior in generate_series(date, date, interval) with DST

You've nailed it, thank you!

Finally I'm understanding what's going on.

I wasn't paying attention to the fact that generate_series really expects
for timezone inputs. So when I was passing the upper bound
as '2014-10-20'::DATE, the value was being cast to 2014-10-20 00:00:00-02.

postgres=# SELECT '2014-10-20'::TIMESTAMPTZ;
timestamptz
------------------------
2014-10-20 00:00:00-02
(1 row)

But after the DST change the generate_series changes the hour in the
generated values as in 2014-10-20 01:00:00-02, which is bigger
than 2014-10-20 00:00:00-02 and because of that it's not returned.

Using a larger upper bound solved my problem.

postgres=# SELECT generate_series('2014-10-15 00:00:00'::TIMESTAMPTZ,
'2014-10-20 23:59:59'::TIMESTAMPTZ, '1 DAY'::INTERVAL);
generate_series
------------------------
2014-10-15 00:00:00-03
2014-10-16 00:00:00-03
2014-10-17 00:00:00-03
2014-10-18 00:00:00-03
2014-10-19 01:00:00-02
2014-10-20 01:00:00-02
(6 rows)

Thank you again!

Sérgio Saquetim

2014-12-07 20:04 GMT-02:00 Adrian Klaver <adrian.klaver@aklaver.com>:

Show quoted text

On 12/07/2014 12:11 PM, Sérgio Saquetim wrote:

I've noticed a strange behavior in the generate_series functions.

I'm trying to get all days between a start and an end date including the
bounds. So naturally I've tried something like the query below
​.

The real query uses generate_series to join other tables and is much
more complicated, but for the sake of brevity, I think that this query
is good enough to show the problem.

But using that instead of generate_series, just feels wrong so I would
like to understand what's happening and if there is a way to overcome
that, before changing my queries.

To follow up, it looks to be a Midnight issue. I live on the US West Coast
so:

test=# show timezone;
TimeZone
------------
US/Pacific
(1 row)

Our Spring change happened March 9th at 2:00 AM:

test=# select '2014-03-09 01:00'::timestamp with time zone ;
timestamptz
------------------------
2014-03-09 01:00:00-08
(1 row)

test=# select '2014-03-09 02:00'::timestamp with time zone ;
timestamptz
------------------------
2014-03-09 03:00:00-07
(1 row)

When I do a similar generate_series:

test=# SELECT generate_series('2014-03-01'::DATE , '2014-03-10'::DATE ,
'1 DAY'::INTERVAL);
generate_series
------------------------
2014-03-01 00:00:00-08
2014-03-02 00:00:00-08
2014-03-03 00:00:00-08
2014-03-04 00:00:00-08
2014-03-05 00:00:00-08
2014-03-06 00:00:00-08
2014-03-07 00:00:00-08
2014-03-08 00:00:00-08
2014-03-09 00:00:00-08
2014-03-10 00:00:00-07
(10 rows)

it works.

So it seems there is some confusion which Midnight is being used for the
DATE to timestamp with time zone conversion.

Thank you!

Sérgio Saquetim

--
Adrian Klaver
adrian.klaver@aklaver.com

#5Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Sérgio Saquetim (#4)
Re: Strange behavior in generate_series(date, date, interval) with DST

On Sun, Dec 07, 2014 at 08:25:48PM -0200, Sérgio Saquetim wrote:

I wasn't paying attention to the fact that generate_series really expects
for timezone inputs. So when I was passing the upper bound
as '2014-10-20'::DATE, the value was being cast to 2014-10-20 00:00:00-02.

postgres=# SELECT '2014-10-20'::TIMESTAMPTZ;
timestamptz
------------------------
2014-10-20 00:00:00-02
(1 row)

[…]

Using a larger upper bound solved my problem.

As a more general lesson, I'd suggest that when you're working with
dates your best bet is to do it with your time zone as UTC. If you
then want to format the output in the local time zone, you can do that
in the outer SELECT with AT TIME ZONE. This isn't because Postgres is
going to get this wrong, but because it's far too easy to confuse
yourself with those time changes. It makes debugging easier,
particularly because the time change only happens twice a year so
nobody _ever_ thinks of it when troubleshooting.

Best regards,

A

--
Andrew Sullivan
ajs@crankycanuck.ca

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Francisco Olarte
folarte@peoplecall.com
In reply to: Sérgio Saquetim (#1)
Re: Strange behavior in generate_series(date, date, interval) with DST

Hi Sérgio:

On Sun, Dec 7, 2014 at 9:11 PM, Sérgio Saquetim <sergiosaquetim@gmail.com>
wrote:

I've noticed a strange behavior in the generate_series functions.

I'm trying to get all days between a start and an end date including the
bounds. So naturally I've tried something like the query below
​.

As both your examples and previous responses highlight your problem is that
generate_series is not defined for dates, I'll avoid further comments on
this. As you are using dates and your working example is using dates I
would suggest rewriting your query around to avoid implicits cast problems.
Date difference is integer, generate_series is defined on them, so:

=> select generate_series(0,'2014-10-20'::date - '2014-10-10'::date);
generate_series
-----------------
0
1
2
3
4
5
6
7
8
9
10
(11 rows)

cdrs=> select '2014-10-10'::date+ generate_series(0,'2014-10-20'::date -
'2014-10-10'::date);
?column?
------------
2014-10-10
2014-10-11
2014-10-12
2014-10-13
2014-10-14
2014-10-15
2014-10-16
2014-10-17
2014-10-18
2014-10-19
2014-10-20
(11 rows)

Or even

=> select ini + generate_series(0, fin-ini) from (select '2014-10-10'::date
as ini, '2014-10-20'::date as fin) data ;
?column?
------------
2014-10-10
2014-10-11
2014-10-12
2014-10-13
2014-10-14
2014-10-15
2014-10-16
2014-10-17
2014-10-18
2014-10-19
2014-10-20
(11 rows)

Regards
Francisco Olarte.