Joining on a view containing a UNION ALL produces a suboptimal plan on 9.3.4

Started by Chris Hanksalmost 12 years ago5 messagesgeneral
Jump to latest
#1Chris Hanks
christopher.m.hanks@gmail.com

Hi everyone -

I have a slow query issue in an app I'm working on. I'm unfortunately
not at liberty to share the query/schema details, but I've put
together a very similar reproduction of the issue:

-----

CREATE TABLE a (id integer primary key, col integer);
CREATE TABLE b (id integer primary key, col integer);
CREATE TABLE c (id integer primary key, col integer);
CREATE TABLE d (id integer primary key, col integer);
CREATE TABLE e (id integer primary key, col integer);

INSERT INTO a (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(1, 100000, 2) i;
INSERT INTO b (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(1, 100000, 2) i;
INSERT INTO c (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(2, 100000, 2) i;
INSERT INTO d (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(2, 100000, 2) i;
INSERT INTO e (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(1, 100000, 1) i;

ANALYZE;

CREATE VIEW tables AS
SELECT a.*, b.col AS other_col
FROM a
LEFT JOIN b ON a.id = b.id
UNION ALL
SELECT c.*, d.col AS other_col
FROM c
LEFT JOIN d ON c.id = d.id;

EXPLAIN ANALYZE
SELECT *
FROM tables
WHERE id = 89; -- Index scans, as expected.

EXPLAIN ANALYZE
SELECT *
FROM e
JOIN tables ON e.col = tables.id
WHERE e.id = 568; -- Big merge joins, when simple index scans should
be possible?

-----

Would this be considered a deficiency in the optimizer? Is there a simple fix?

Thanks!
Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Igor Neyman
ineyman@perceptron.com
In reply to: Chris Hanks (#1)
Re: Joining on a view containing a UNION ALL produces a suboptimal plan on 9.3.4

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Chris Hanks
Sent: Thursday, July 10, 2014 5:02 AM
To: PostgreSQL General
Subject: [GENERAL] Joining on a view containing a UNION ALL produces a
suboptimal plan on 9.3.4

Hi everyone -

I have a slow query issue in an app I'm working on. I'm unfortunately not at
liberty to share the query/schema details, but I've put together a very similar
reproduction of the issue:

-----

CREATE TABLE a (id integer primary key, col integer); CREATE TABLE b (id
integer primary key, col integer); CREATE TABLE c (id integer primary key, col
integer); CREATE TABLE d (id integer primary key, col integer); CREATE TABLE
e (id integer primary key, col integer);

INSERT INTO a (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(1, 100000, 2) i; INSERT INTO b (id, col) SELECT i,
floor(random() * 100000) FROM generate_series(1, 100000, 2) i; INSERT INTO
c (id, col) SELECT i, floor(random() * 100000) FROM generate_series(2,
100000, 2) i; INSERT INTO d (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(2, 100000, 2) i; INSERT INTO e (id, col) SELECT i,
floor(random() * 100000) FROM generate_series(1, 100000, 1) i;

ANALYZE;

CREATE VIEW tables AS
SELECT a.*, b.col AS other_col
FROM a
LEFT JOIN b ON a.id = b.id
UNION ALL
SELECT c.*, d.col AS other_col
FROM c
LEFT JOIN d ON c.id = d.id;

EXPLAIN ANALYZE
SELECT *
FROM tables
WHERE id = 89; -- Index scans, as expected.

EXPLAIN ANALYZE
SELECT *
FROM e
JOIN tables ON e.col = tables.id
WHERE e.id = 568; -- Big merge joins, when simple index scans should be
possible?

-----

Would this be considered a deficiency in the optimizer? Is there a simple fix?

Thanks!
Chris

Chris,

" JOIN tables ON e.col = tables.id" - is this a typo?

Shouldn't it be " JOIN tables ON e.id = tables.id" ?

Or, you need it the way it is?

Regards,
Igor Neyman

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Chris Hanks
christopher.m.hanks@gmail.com
In reply to: Igor Neyman (#2)
Re: Joining on a view containing a UNION ALL produces a suboptimal plan on 9.3.4

I need it the way it is. It's a foreign key in the actual query.

Thanks!

On Thu, Jul 10, 2014 at 8:31 AM, Igor Neyman <ineyman@perceptron.com> wrote:

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Chris Hanks
Sent: Thursday, July 10, 2014 5:02 AM
To: PostgreSQL General
Subject: [GENERAL] Joining on a view containing a UNION ALL produces a
suboptimal plan on 9.3.4

Hi everyone -

I have a slow query issue in an app I'm working on. I'm unfortunately not at
liberty to share the query/schema details, but I've put together a very similar
reproduction of the issue:

-----

CREATE TABLE a (id integer primary key, col integer); CREATE TABLE b (id
integer primary key, col integer); CREATE TABLE c (id integer primary key, col
integer); CREATE TABLE d (id integer primary key, col integer); CREATE TABLE
e (id integer primary key, col integer);

INSERT INTO a (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(1, 100000, 2) i; INSERT INTO b (id, col) SELECT i,
floor(random() * 100000) FROM generate_series(1, 100000, 2) i; INSERT INTO
c (id, col) SELECT i, floor(random() * 100000) FROM generate_series(2,
100000, 2) i; INSERT INTO d (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(2, 100000, 2) i; INSERT INTO e (id, col) SELECT i,
floor(random() * 100000) FROM generate_series(1, 100000, 1) i;

ANALYZE;

CREATE VIEW tables AS
SELECT a.*, b.col AS other_col
FROM a
LEFT JOIN b ON a.id = b.id
UNION ALL
SELECT c.*, d.col AS other_col
FROM c
LEFT JOIN d ON c.id = d.id;

EXPLAIN ANALYZE
SELECT *
FROM tables
WHERE id = 89; -- Index scans, as expected.

EXPLAIN ANALYZE
SELECT *
FROM e
JOIN tables ON e.col = tables.id
WHERE e.id = 568; -- Big merge joins, when simple index scans should be
possible?

-----

Would this be considered a deficiency in the optimizer? Is there a simple fix?

Thanks!
Chris

Chris,

" JOIN tables ON e.col = tables.id" - is this a typo?

Shouldn't it be " JOIN tables ON e.id = tables.id" ?

Or, you need it the way it is?

Regards,
Igor Neyman

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Hanks (#1)
Re: Joining on a view containing a UNION ALL produces a suboptimal plan on 9.3.4

Chris Hanks <christopher.m.hanks@gmail.com> writes:

CREATE VIEW tables AS
SELECT a.*, b.col AS other_col
FROM a
LEFT JOIN b ON a.id = b.id
UNION ALL
SELECT c.*, d.col AS other_col
FROM c
LEFT JOIN d ON c.id = d.id;

EXPLAIN ANALYZE
SELECT *
FROM tables
WHERE id = 89; -- Index scans, as expected.

EXPLAIN ANALYZE
SELECT *
FROM e
JOIN tables ON e.col = tables.id
WHERE e.id = 568; -- Big merge joins, when simple index scans should
be possible?

Would this be considered a deficiency in the optimizer? Is there a simple fix?

Don't hold your breath. To arrive at the
union-on-the-inside-of-a-nestloop plan you're hoping for, the planner
would have to create a "parameterized path" for the UNION ALL structure.
But when you have joins in the arms of the UNION ALL, they are considered
to be independent subqueries, and we currently have a policy decision not
to try to generate parameterized paths for subqueries. It'd be quite
expensive and I think the planner is probably lacking some necessary
mechanisms anyway.

Given that e.id is unique, you could possibly fake it with something like

select * from tables where id = (select e.col from e where e.id = 568);

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Chris Hanks
christopher.m.hanks@gmail.com
In reply to: Tom Lane (#4)
Re: Joining on a view containing a UNION ALL produces a suboptimal plan on 9.3.4

That did the trick! Thanks, Tom!

On Thu, Jul 10, 2014 at 8:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chris Hanks <christopher.m.hanks@gmail.com> writes:

CREATE VIEW tables AS
SELECT a.*, b.col AS other_col
FROM a
LEFT JOIN b ON a.id = b.id
UNION ALL
SELECT c.*, d.col AS other_col
FROM c
LEFT JOIN d ON c.id = d.id;

EXPLAIN ANALYZE
SELECT *
FROM tables
WHERE id = 89; -- Index scans, as expected.

EXPLAIN ANALYZE
SELECT *
FROM e
JOIN tables ON e.col = tables.id
WHERE e.id = 568; -- Big merge joins, when simple index scans should
be possible?

Would this be considered a deficiency in the optimizer? Is there a simple fix?

Don't hold your breath. To arrive at the
union-on-the-inside-of-a-nestloop plan you're hoping for, the planner
would have to create a "parameterized path" for the UNION ALL structure.
But when you have joins in the arms of the UNION ALL, they are considered
to be independent subqueries, and we currently have a policy decision not
to try to generate parameterized paths for subqueries. It'd be quite
expensive and I think the planner is probably lacking some necessary
mechanisms anyway.

Given that e.id is unique, you could possibly fake it with something like

select * from tables where id = (select e.col from e where e.id = 568);

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general