between not propated into a simple equality join

Started by Benedikt Grundmannover 9 years ago4 messages

bgrundmann@janestreet.com

over 9 years ago

We just run into a very simple query that the planner does much worse on
than we thought it would (in production the table in question is ~ 100
GB). It surprised us given the planner is generally quite good, so I
thought I share our surprise

Setup:

postgres_prod@proddb_testing=# select version();[1]
version

────────────────────────────────────────────────────────────────────────────────────────────────────────────────
PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

Time: 69.246 ms

postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i
int);
CREATE TABLE
Time: 67.096 ms
postgres_prod@proddb_testing=# insert into toy_data3

(select current_date-(s.idx/1000), s.idx from generate_series(1,1000000)
as s(idx));
INSERT 0 1000000
Time: 1617.483 ms
postgres_prod@proddb_testing=# create index toy_data_date3 on
toy_data3(the_date);
CREATE INDEX
Time: 660.166 ms
postgres_prod@proddb_testing=# analyze toy_data3;
ANALYZE
Time: 294.984 ms

The bad behavior:

postgres_prod@proddb_testing=# explain analyze
select * from (
select td1.the_date, td1.i
from toy_data3 td1, toy_data3 td2 where td1.the_date = td2.the_date
and td1.i = td2.i
) foo
where the_date between current_date and current_date;
QUERY
PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Hash Join (cost=55.49..21980.50 rows=1 width=8) (actual
time=0.336..179.374 rows=999 loops=1)
Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
-> Seq Scan on toy_data3 td2 (cost=0.00..14425.00 rows=1000000
width=8) (actual time=0.007..72.510 rows=1000000 lo
-> Hash (cost=40.44..40.44 rows=1003 width=8) (actual
time=0.321..0.321 rows=999 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Index Scan using toy_data_date3 on toy_data3 td1
(cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
Index Cond: ((the_date >= ('now'::cstring)::date) AND
(the_date <= ('now'::cstring)::date))
Total runtime: 179.440 ms
(8 rows)

Time: 246.094 ms

Notice the red. Which is sad because one would like it to realize that it
could propagate the index constraint onto td2. That is on both sides of
the join do the green.

As it does correctly when one explicitly uses equality (bold below) (but of
course we sometimes have multiple day ranges in production and we only used
a single date range above to make it extra interesting for the planner to
NOT do a seqscan):

postgres_prod@proddb_testing=# explain analyze
select * from (
select td1.the_date, td1.i
from toy_data3 td1, toy_data3 td2 where td1.the_date = td2.the_date
and td1.i = td2.i ) foo
where *the_date = current_date*;
QUERY
PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Hash Join (cost=50.47..92.17 rows=1 width=8) (actual time=0.300..0.652
rows=999 loops=1)
Hash Cond: (td1.i = td2.i)
-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.00..37.93
rows=1003 width=8) (actual time=0.023..0.169
Index Cond: (the_date = ('now'::cstring)::date)
-> Hash (cost=37.93..37.93 rows=1003 width=8) (actual
time=0.270..0.270 rows=999 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Index Scan using toy_data_date3 on toy_data3 td2
(cost=0.00..37.93 rows=1003 width=8) (actual time=0.007.
Index Cond: (the_date = ('now'::cstring)::date)
Total runtime: 0.713 ms
(9 rows)

Time: 66.904 ms

Cheers,

Bene

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Benedikt Grundmann (#1)

Re: between not propated into a simple equality join

On Mon, May 9, 2016 at 8:53 AM, Benedikt Grundmann <
bgrundmann@janestreet.com> wrote:

We just run into a very simple query that the planner does much worse on
than we thought it would (in production the table in question is ~ 100
GB). It surprised us given the planner is generally quite good, so I
thought I share our surprise

Setup:

postgres_prod@proddb_testing=# select version();[1]
version

────────────────────────────────────────────────────────────────────────────────────────────────────────────────
PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

Time: 69.246 ms

postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i
int);
CREATE TABLE
Time: 67.096 ms
postgres_prod@proddb_testing=# insert into toy_data3

(select current_date-(s.idx/1000), s.idx from generate_series(1,1000000)
as s(idx));
INSERT 0 1000000
Time: 1617.483 ms
postgres_prod@proddb_testing=# create index toy_data_date3 on
toy_data3(the_date);
CREATE INDEX
Time: 660.166 ms
postgres_prod@proddb_testing=# analyze toy_data3;
ANALYZE
Time: 294.984 ms

The bad behavior:

postgres_prod@proddb_testing=# explain analyze
select * from (
select td1.the_date, td1.i
from toy_data3 td1, toy_data3 td2 where td1.the_date = td2.the_date
and td1.i = td2.i
) foo
where the_date between current_date and current_date;
QUERY
PLAN

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Hash Join (cost=55.49..21980.50 rows=1 width=8) (actual
time=0.336..179.374 rows=999 loops=1)
Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
-> Seq Scan on toy_data3 td2 (cost=0.00..14425.00 rows=1000000
width=8) (actual time=0.007..72.510 rows=1000000 lo
-> Hash (cost=40.44..40.44 rows=1003 width=8) (actual
time=0.321..0.321 rows=999 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Index Scan using toy_data_date3 on toy_data3 td1
(cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
Index Cond: ((the_date >= ('now'::cstring)::date) AND
(the_date <= ('now'::cstring)::date))
Total runtime: 179.440 ms
(8 rows)

Time: 246.094 ms

Notice the red. Which is sad because one would like it to realize that it
could propagate the index constraint onto td2. That is on both sides of
the join do the green.

FWIW

This is my plan result:
version
PostgreSQL 9.5.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 64-bit
All default settings

using "BETWEEN"

QUERY PLAN
Nested Loop (cost=0.86..48.91 rows=1 width=8) (actual time=0.042..168.512
rows=999 loops=1)
-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.43..8.46
rows=1 width=8) (actual time=0.022..1.388 rows=999 loops=1)
Index Cond: ((the_date >= ('now'::cstring)::date) AND (the_date <=
('now'::cstring)::date))
-> Index Scan using toy_data_date3 on toy_data3 td2 (cost=0.42..40.44
rows=1 width=8) (actual time=0.078..0.160 rows=1 loops=999)
Index Cond: (the_date = td1.the_date)
Filter: (td1.i = i)
Rows Removed by Filter: 998
Planning time: 0.353 ms
Execution time: 169.692 ms

using "="

QUERY PLAN
Hash Join (cost=49.89..90.46 rows=1 width=8) (actual time=2.320..5.652
rows=999 loops=1)
Hash Cond: (td1.i = td2.i)
-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.43..37.37
rows=967 width=8) (actual time=0.014..1.168 rows=999 loops=1)
Index Cond: (the_date = ('now'::cstring)::date)
-> Hash (cost=37.37..37.37 rows=967 width=8) (actual time=2.292..2.292
rows=999 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 48kB
-> Index Scan using toy_data_date3 on toy_data3 td2
(cost=0.43..37.37 rows=967 width=8) (actual time=0.008..1.183 rows=999
loops=1)
Index Cond: (the_date = ('now'::cstring)::date)
Planning time: 0.326 ms
Execution time: 6.673 ms

I was hoping to be able to say more but alas cannot find the words.

I'm surprised by the estimate of 1 rows for the td1 index scan in my 9.5
query - and also why the 9.2 query would choose to sequential scan hash
join in favor of what seems to be a superior index scan nested loop on a
fraction of the table.

The fact that the between doesn't get transitively applied to td2 through
the td1=td2 condition, not as much...though whether the limitation is due
to theory or implementation I do not know.

I do suspect that at least part of the issue is that the computation of
"the_date" makes the two columns highly correlated while the planner
assumes independence.

David J.

David Rowley

david.rowley@2ndquadrant.com

over 9 years ago

In reply to: David G. Johnston (#2)

Re: between not propated into a simple equality join

On 10 May 2016 at 16:34, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Mon, May 9, 2016 at 8:53 AM, Benedikt Grundmann
<bgrundmann@janestreet.com> wrote:

We just run into a very simple query that the planner does much worse on
than we thought it would (in production the table in question is ~ 100 GB).
It surprised us given the planner is generally quite good, so I thought I
share our surprise

Setup:

postgres_prod@proddb_testing=# select version();[1]
version

────────────────────────────────────────────────────────────────────────────────────────────────────────────────
PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

Time: 69.246 ms

postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i
int);
CREATE TABLE
Time: 67.096 ms
postgres_prod@proddb_testing=# insert into toy_data3
(select current_date-(s.idx/1000), s.idx from generate_series(1,1000000)
as s(idx));
INSERT 0 1000000
Time: 1617.483 ms
postgres_prod@proddb_testing=# create index toy_data_date3 on
toy_data3(the_date);
CREATE INDEX
Time: 660.166 ms
postgres_prod@proddb_testing=# analyze toy_data3;
ANALYZE
Time: 294.984 ms

The bad behavior:

postgres_prod@proddb_testing=# explain analyze
select * from (
select td1.the_date, td1.i
from toy_data3 td1, toy_data3 td2 where td1.the_date = td2.the_date
and td1.i = td2.i
) foo
where the_date between current_date and current_date;
QUERY
PLAN

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Hash Join (cost=55.49..21980.50 rows=1 width=8) (actual
time=0.336..179.374 rows=999 loops=1)
Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
-> Seq Scan on toy_data3 td2 (cost=0.00..14425.00 rows=1000000
width=8) (actual time=0.007..72.510 rows=1000000 lo
-> Hash (cost=40.44..40.44 rows=1003 width=8) (actual
time=0.321..0.321 rows=999 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Index Scan using toy_data_date3 on toy_data3 td1
(cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
Index Cond: ((the_date >= ('now'::cstring)::date) AND
(the_date <= ('now'::cstring)::date))
Total runtime: 179.440 ms
(8 rows)

Time: 246.094 ms

Notice the red. Which is sad because one would like it to realize that it
could propagate the index constraint onto td2. That is on both sides of the
join do the green.

FWIW

This is my plan result:
version
PostgreSQL 9.5.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 64-bit
All default settings

using "BETWEEN"

QUERY PLAN
Nested Loop (cost=0.86..48.91 rows=1 width=8) (actual time=0.042..168.512
rows=999 loops=1)
-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.43..8.46
rows=1 width=8) (actual time=0.022..1.388 rows=999 loops=1)
Index Cond: ((the_date >= ('now'::cstring)::date) AND (the_date <=
('now'::cstring)::date))
-> Index Scan using toy_data_date3 on toy_data3 td2 (cost=0.42..40.44
rows=1 width=8) (actual time=0.078..0.160 rows=1 loops=999)
Index Cond: (the_date = td1.the_date)
Filter: (td1.i = i)
Rows Removed by Filter: 998
Planning time: 0.353 ms
Execution time: 169.692 ms

using "="

QUERY PLAN
Hash Join (cost=49.89..90.46 rows=1 width=8) (actual time=2.320..5.652
rows=999 loops=1)
Hash Cond: (td1.i = td2.i)
-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.43..37.37
rows=967 width=8) (actual time=0.014..1.168 rows=999 loops=1)
Index Cond: (the_date = ('now'::cstring)::date)
-> Hash (cost=37.37..37.37 rows=967 width=8) (actual time=2.292..2.292
rows=999 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 48kB
-> Index Scan using toy_data_date3 on toy_data3 td2
(cost=0.43..37.37 rows=967 width=8) (actual time=0.008..1.183 rows=999
loops=1)
Index Cond: (the_date = ('now'::cstring)::date)
Planning time: 0.326 ms
Execution time: 6.673 ms

I was hoping to be able to say more but alas cannot find the words.

I'm surprised by the estimate of 1 rows for the td1 index scan in my 9.5
query - and also why the 9.2 query would choose to sequential scan hash join
in favor of what seems to be a superior index scan nested loop on a fraction
of the table.

The fact that the between doesn't get transitively applied to td2 through
the td1=td2 condition, not as much...though whether the limitation is due to
theory or implementation I do not know.

Quite simply the equivalence class mechanism which allows the the_date
qual to be pushed into td2 for when = is used does not work when
BETWEEN is used. This is because equivalence class only track quals
which prove equality. A while back did propose something I named
"Equivalence Class Filters" which improved this exact case [1]/messages/by-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services,
although it did need a bit more work so as not to push the qual for
parameterised nested loop plans, like the exact one you got when you
tried on 9.5. In this case the extra qual is redundant and there was
fear that it may slow down execution in some cases. I very much thing
it could be made to work. Tom did have some concerns, but I think I
have ideas which solves most or all of them.

In the meantime the workaround is just to include a BETWEEN clause for
both dates. However, this solution is often impractical if you're only
exposing one of the columns through a view and applying the WHERE
clause from outside of the view. The only way i could think to fix
that would be to add that column to the view and apply the same
BETWEEN clause on that column, which is pretty horrid :-(

[1]: /messages/by-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Benedikt Grundmann

bgrundmann@janestreet.com

over 9 years ago

In reply to: David Rowley (#3)

Re: between not propated into a simple equality join

On Tue, May 10, 2016 at 7:41 AM, David Rowley <david.rowley@2ndquadrant.com>
wrote:

On 10 May 2016 at 16:34, David G. Johnston <david.g.johnston@gmail.com>
wrote:

On Mon, May 9, 2016 at 8:53 AM, Benedikt Grundmann
<bgrundmann@janestreet.com> wrote:

We just run into a very simple query that the planner does much worse on
than we thought it would (in production the table in question is ~ 100

GB).

It surprised us given the planner is generally quite good, so I thought

I

share our surprise

Setup:

postgres_prod@proddb_testing=# select version();[1]
version

────────────────────────────────────────────────────────────────────────────────────────────────────────────────

PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

Time: 69.246 ms

postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i
int);
CREATE TABLE
Time: 67.096 ms
postgres_prod@proddb_testing=# insert into toy_data3
(select current_date-(s.idx/1000), s.idx from

generate_series(1,1000000)

as s(idx));
INSERT 0 1000000
Time: 1617.483 ms
postgres_prod@proddb_testing=# create index toy_data_date3 on
toy_data3(the_date);
CREATE INDEX
Time: 660.166 ms
postgres_prod@proddb_testing=# analyze toy_data3;
ANALYZE
Time: 294.984 ms

The bad behavior:

postgres_prod@proddb_testing=# explain analyze
select * from (
select td1.the_date, td1.i
from toy_data3 td1, toy_data3 td2 where td1.the_date = td2.the_date
and td1.i = td2.i
) foo
where the_date between current_date and current_date;
QUERY
PLAN

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Hash Join (cost=55.49..21980.50 rows=1 width=8) (actual
time=0.336..179.374 rows=999 loops=1)
Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
-> Seq Scan on toy_data3 td2 (cost=0.00..14425.00 rows=1000000
width=8) (actual time=0.007..72.510 rows=1000000 lo
-> Hash (cost=40.44..40.44 rows=1003 width=8) (actual
time=0.321..0.321 rows=999 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Index Scan using toy_data_date3 on toy_data3 td1
(cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
Index Cond: ((the_date >= ('now'::cstring)::date) AND
(the_date <= ('now'::cstring)::date))
Total runtime: 179.440 ms
(8 rows)

Time: 246.094 ms

Notice the red. Which is sad because one would like it to realize that

it

could propagate the index constraint onto td2. That is on both sides

of the

join do the green.

FWIW

This is my plan result:
version
PostgreSQL 9.5.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 64-bit
All default settings

using "BETWEEN"

QUERY PLAN
Nested Loop (cost=0.86..48.91 rows=1 width=8) (actual

time=0.042..168.512

rows=999 loops=1)
-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.43..8.46
rows=1 width=8) (actual time=0.022..1.388 rows=999 loops=1)
Index Cond: ((the_date >= ('now'::cstring)::date) AND (the_date

<=

('now'::cstring)::date))
-> Index Scan using toy_data_date3 on toy_data3 td2 (cost=0.42..40.44
rows=1 width=8) (actual time=0.078..0.160 rows=1 loops=999)
Index Cond: (the_date = td1.the_date)
Filter: (td1.i = i)
Rows Removed by Filter: 998
Planning time: 0.353 ms
Execution time: 169.692 ms

using "="

QUERY PLAN
Hash Join (cost=49.89..90.46 rows=1 width=8) (actual time=2.320..5.652
rows=999 loops=1)
Hash Cond: (td1.i = td2.i)
-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.43..37.37
rows=967 width=8) (actual time=0.014..1.168 rows=999 loops=1)
Index Cond: (the_date = ('now'::cstring)::date)
-> Hash (cost=37.37..37.37 rows=967 width=8) (actual

time=2.292..2.292

rows=999 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 48kB
-> Index Scan using toy_data_date3 on toy_data3 td2
(cost=0.43..37.37 rows=967 width=8) (actual time=0.008..1.183 rows=999
loops=1)
Index Cond: (the_date = ('now'::cstring)::date)
Planning time: 0.326 ms
Execution time: 6.673 ms

I was hoping to be able to say more but alas cannot find the words.

I'm surprised by the estimate of 1 rows for the td1 index scan in my 9.5
query - and also why the 9.2 query would choose to sequential scan hash

join

in favor of what seems to be a superior index scan nested loop on a

fraction

of the table.

The fact that the between doesn't get transitively applied to td2 through
the td1=td2 condition, not as much...though whether the limitation is

due to

theory or implementation I do not know.

Quite simply the equivalence class mechanism which allows the the_date
qual to be pushed into td2 for when = is used does not work when
BETWEEN is used. This is because equivalence class only track quals
which prove equality. A while back did propose something I named
"Equivalence Class Filters" which improved this exact case [1],
although it did need a bit more work so as not to push the qual for
parameterised nested loop plans, like the exact one you got when you
tried on 9.5. In this case the extra qual is redundant and there was
fear that it may slow down execution in some cases. I very much thing
it could be made to work. Tom did have some concerns, but I think I
have ideas which solves most or all of them.

In the meantime the workaround is just to include a BETWEEN clause for
both dates. However, this solution is often impractical if you're only
exposing one of the columns through a view and applying the WHERE
clause from outside of the view. The only way i could think to fix
that would be to add that column to the view and apply the same
BETWEEN clause on that column, which is pretty horrid :-(

Indeed that (using a view) is what we do in practice and how we encountered
the problem. For now we will remove the view and rewrite lots of queries
to explicitly do the join themselves so they can include the constraints as
necessary (sigh). For what it is worth I expect there to be many more views
like it in our database. Having one table of primary data and another one
of some optional less often used additional data and both tables dated + id
is a reasonably common pattern. So please take this as a +1 on the work
you started.

Thanks to everybody for the replies and the work on PostgreSQL in general.

Cheers,

Bene

Show quoted text

[1]
/messages/by-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services