Joining on a view containing a UNION ALL produces a suboptimal plan on 9.3.4
Hi everyone -
I have a slow query issue in an app I'm working on. I'm unfortunately
not at liberty to share the query/schema details, but I've put
together a very similar reproduction of the issue:
-----
CREATE TABLE a (id integer primary key, col integer);
CREATE TABLE b (id integer primary key, col integer);
CREATE TABLE c (id integer primary key, col integer);
CREATE TABLE d (id integer primary key, col integer);
CREATE TABLE e (id integer primary key, col integer);
INSERT INTO a (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(1, 100000, 2) i;
INSERT INTO b (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(1, 100000, 2) i;
INSERT INTO c (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(2, 100000, 2) i;
INSERT INTO d (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(2, 100000, 2) i;
INSERT INTO e (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(1, 100000, 1) i;
ANALYZE;
CREATE VIEW tables AS
SELECT a.*, b.col AS other_col
FROM a
LEFT JOIN b ON a.id = b.id
UNION ALL
SELECT c.*, d.col AS other_col
FROM c
LEFT JOIN d ON c.id = d.id;
EXPLAIN ANALYZE
SELECT *
FROM tables
WHERE id = 89; -- Index scans, as expected.
EXPLAIN ANALYZE
SELECT *
FROM e
JOIN tables ON e.col = tables.id
WHERE e.id = 568; -- Big merge joins, when simple index scans should
be possible?
-----
Would this be considered a deficiency in the optimizer? Is there a simple fix?
Thanks!
Chris
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Chris Hanks
Sent: Thursday, July 10, 2014 5:02 AM
To: PostgreSQL General
Subject: [GENERAL] Joining on a view containing a UNION ALL produces a
suboptimal plan on 9.3.4Hi everyone -
I have a slow query issue in an app I'm working on. I'm unfortunately not at
liberty to share the query/schema details, but I've put together a very similar
reproduction of the issue:-----
CREATE TABLE a (id integer primary key, col integer); CREATE TABLE b (id
integer primary key, col integer); CREATE TABLE c (id integer primary key, col
integer); CREATE TABLE d (id integer primary key, col integer); CREATE TABLE
e (id integer primary key, col integer);INSERT INTO a (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(1, 100000, 2) i; INSERT INTO b (id, col) SELECT i,
floor(random() * 100000) FROM generate_series(1, 100000, 2) i; INSERT INTO
c (id, col) SELECT i, floor(random() * 100000) FROM generate_series(2,
100000, 2) i; INSERT INTO d (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(2, 100000, 2) i; INSERT INTO e (id, col) SELECT i,
floor(random() * 100000) FROM generate_series(1, 100000, 1) i;ANALYZE;
CREATE VIEW tables AS
SELECT a.*, b.col AS other_col
FROM a
LEFT JOIN b ON a.id = b.id
UNION ALL
SELECT c.*, d.col AS other_col
FROM c
LEFT JOIN d ON c.id = d.id;EXPLAIN ANALYZE
SELECT *
FROM tables
WHERE id = 89; -- Index scans, as expected.EXPLAIN ANALYZE
SELECT *
FROM e
JOIN tables ON e.col = tables.id
WHERE e.id = 568; -- Big merge joins, when simple index scans should be
possible?-----
Would this be considered a deficiency in the optimizer? Is there a simple fix?
Thanks!
Chris
Chris,
" JOIN tables ON e.col = tables.id" - is this a typo?
Shouldn't it be " JOIN tables ON e.id = tables.id" ?
Or, you need it the way it is?
Regards,
Igor Neyman
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I need it the way it is. It's a foreign key in the actual query.
Thanks!
On Thu, Jul 10, 2014 at 8:31 AM, Igor Neyman <ineyman@perceptron.com> wrote:
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Chris Hanks
Sent: Thursday, July 10, 2014 5:02 AM
To: PostgreSQL General
Subject: [GENERAL] Joining on a view containing a UNION ALL produces a
suboptimal plan on 9.3.4Hi everyone -
I have a slow query issue in an app I'm working on. I'm unfortunately not at
liberty to share the query/schema details, but I've put together a very similar
reproduction of the issue:-----
CREATE TABLE a (id integer primary key, col integer); CREATE TABLE b (id
integer primary key, col integer); CREATE TABLE c (id integer primary key, col
integer); CREATE TABLE d (id integer primary key, col integer); CREATE TABLE
e (id integer primary key, col integer);INSERT INTO a (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(1, 100000, 2) i; INSERT INTO b (id, col) SELECT i,
floor(random() * 100000) FROM generate_series(1, 100000, 2) i; INSERT INTO
c (id, col) SELECT i, floor(random() * 100000) FROM generate_series(2,
100000, 2) i; INSERT INTO d (id, col) SELECT i, floor(random() * 100000) FROM
generate_series(2, 100000, 2) i; INSERT INTO e (id, col) SELECT i,
floor(random() * 100000) FROM generate_series(1, 100000, 1) i;ANALYZE;
CREATE VIEW tables AS
SELECT a.*, b.col AS other_col
FROM a
LEFT JOIN b ON a.id = b.id
UNION ALL
SELECT c.*, d.col AS other_col
FROM c
LEFT JOIN d ON c.id = d.id;EXPLAIN ANALYZE
SELECT *
FROM tables
WHERE id = 89; -- Index scans, as expected.EXPLAIN ANALYZE
SELECT *
FROM e
JOIN tables ON e.col = tables.id
WHERE e.id = 568; -- Big merge joins, when simple index scans should be
possible?-----
Would this be considered a deficiency in the optimizer? Is there a simple fix?
Thanks!
ChrisChris,
" JOIN tables ON e.col = tables.id" - is this a typo?
Shouldn't it be " JOIN tables ON e.id = tables.id" ?
Or, you need it the way it is?
Regards,
Igor Neyman
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Chris Hanks <christopher.m.hanks@gmail.com> writes:
CREATE VIEW tables AS
SELECT a.*, b.col AS other_col
FROM a
LEFT JOIN b ON a.id = b.id
UNION ALL
SELECT c.*, d.col AS other_col
FROM c
LEFT JOIN d ON c.id = d.id;
EXPLAIN ANALYZE
SELECT *
FROM tables
WHERE id = 89; -- Index scans, as expected.
EXPLAIN ANALYZE
SELECT *
FROM e
JOIN tables ON e.col = tables.id
WHERE e.id = 568; -- Big merge joins, when simple index scans should
be possible?
Would this be considered a deficiency in the optimizer? Is there a simple fix?
Don't hold your breath. To arrive at the
union-on-the-inside-of-a-nestloop plan you're hoping for, the planner
would have to create a "parameterized path" for the UNION ALL structure.
But when you have joins in the arms of the UNION ALL, they are considered
to be independent subqueries, and we currently have a policy decision not
to try to generate parameterized paths for subqueries. It'd be quite
expensive and I think the planner is probably lacking some necessary
mechanisms anyway.
Given that e.id is unique, you could possibly fake it with something like
select * from tables where id = (select e.col from e where e.id = 568);
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
That did the trick! Thanks, Tom!
On Thu, Jul 10, 2014 at 8:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Hanks <christopher.m.hanks@gmail.com> writes:
CREATE VIEW tables AS
SELECT a.*, b.col AS other_col
FROM a
LEFT JOIN b ON a.id = b.id
UNION ALL
SELECT c.*, d.col AS other_col
FROM c
LEFT JOIN d ON c.id = d.id;EXPLAIN ANALYZE
SELECT *
FROM tables
WHERE id = 89; -- Index scans, as expected.EXPLAIN ANALYZE
SELECT *
FROM e
JOIN tables ON e.col = tables.id
WHERE e.id = 568; -- Big merge joins, when simple index scans should
be possible?Would this be considered a deficiency in the optimizer? Is there a simple fix?
Don't hold your breath. To arrive at the
union-on-the-inside-of-a-nestloop plan you're hoping for, the planner
would have to create a "parameterized path" for the UNION ALL structure.
But when you have joins in the arms of the UNION ALL, they are considered
to be independent subqueries, and we currently have a policy decision not
to try to generate parameterized paths for subqueries. It'd be quite
expensive and I think the planner is probably lacking some necessary
mechanisms anyway.Given that e.id is unique, you could possibly fake it with something like
select * from tables where id = (select e.col from e where e.id = 568);
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general