BUG #15857: Parallel Hash Join makes join instead of exists

Started by PG Bug reporting formalmost 7 years ago10 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15857
Logged by: Vladimir Kriukov
Email address: krujkov@gmail.com
PostgreSQL version: 11.3
Operating system: CentOS 7
Description:

CREATE TABLE a(id int);
ALTER TABLE a ADD CONSTRAINT a_pkey PRIMARY KEY(id);
INSERT INTO a(id) SELECT generate_series(1, 1000000);
INSERT INTO a(id) SELECT generate_series(1000001, 10000000);
CREATE TABLE b(id int, base_id int);
ALTER TABLE b ADD CONSTRAINT b_pkey PRIMARY KEY(id);
INSERT INTO b (id) select generate_series(1, 1000000);
UPDATE b SET base_id = 1000000 - id;
CREATE TABLE c(id int, base_id int);
ALTER TABLE c ADD CONSTRAINT c_pkey PRIMARY KEY(id);
INSERT INTO c (id) SELECT generate_series(1, 1000000);
UPDATE c SET base_id = id / 10;

VACUUM ANALYZE;
SET random_page_cost = 1.1;
SET work_mem = '3276kB';
SET effective_cache_size = '90GB';

-- This gives an incorrect result of 999991, when 100000 is expected on
Postgres 11.3 and 12 beta 1.
SELECT COUNT (*)
FROM a
JOIN b
ON a.id=b.base_id
WHERE EXISTS (
SELECT 1
FROM c
WHERE c.base_id = a.id
);

-- Just for the reference, "bad" plan has this shape:
-- Finalize Aggregate (cost=63211.58..63211.59 rows=1 width=8)
-- -> Gather (cost=63211.36..63211.57 rows=2 width=8)
-- Workers Planned: 2
-- -> Partial Aggregate (cost=62211.36..62211.37 rows=1 width=8)
-- -> Nested Loop (cost=19853.44..62201.25 rows=4045
width=0)
-- -> Parallel Hash Join (cost=19853.00..42614.78
rows=40580 width=8)
-- Hash Cond: (b.base_id = c.base_id)
-- -> Parallel Seq Scan on b
(cost=0.00..13016.67 rows=416667 width=4)
-- -> Parallel Hash (cost=13016.67..13016.67
rows=416667 width=4)
-- -> Parallel Seq Scan on c
(cost=0.00..13016.67 rows=416667 width=4)
-- -> Index Only Scan using a_pkey on a
(cost=0.43..0.48 rows=1 width=4)
-- Index Cond: (id = b.base_id)

#2Thomas Munro
thomas.munro@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #15857: Parallel Hash Join makes join instead of exists

On Tue, Jun 18, 2019 at 6:40 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

-- This gives an incorrect result of 999991, when 100000 is expected on
Postgres 11.3 and 12 beta 1.

Reproduced here. Investigating.

--
Thomas Munro
https://enterprisedb.com

#3Pantelis Theodosiou
ypercube@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #15857: Parallel Hash Join makes join instead of exists

On Tue, Jun 18, 2019 at 7:40 AM PG Bug reporting form <
noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 15857
Logged by: Vladimir Kriukov
Email address: krujkov@gmail.com
PostgreSQL version: 11.3
Operating system: CentOS 7
Description:

CREATE TABLE a(id int);
ALTER TABLE a ADD CONSTRAINT a_pkey PRIMARY KEY(id);
INSERT INTO a(id) SELECT generate_series(1, 1000000);
INSERT INTO a(id) SELECT generate_series(1000001, 10000000);
CREATE TABLE b(id int, base_id int);
ALTER TABLE b ADD CONSTRAINT b_pkey PRIMARY KEY(id);
INSERT INTO b (id) select generate_series(1, 1000000);
UPDATE b SET base_id = 1000000 - id;
CREATE TABLE c(id int, base_id int);
ALTER TABLE c ADD CONSTRAINT c_pkey PRIMARY KEY(id);
INSERT INTO c (id) SELECT generate_series(1, 1000000);
UPDATE c SET base_id = id / 10;

VACUUM ANALYZE;
SET random_page_cost = 1.1;
SET work_mem = '3276kB';
SET effective_cache_size = '90GB';

-- This gives an incorrect result of 999991, when 100000 is expected on
Postgres 11.3 and 12 beta 1.
SELECT COUNT (*)
FROM a
JOIN b
ON a.id=b.base_id
WHERE EXISTS (
SELECT 1
FROM c
WHERE c.base_id = a.id
);

I think it is correct result. This:

UPDATE c SET base_id = id / 10;

would result in 9 rows (id from 1 to 9) to be updated with base_id = 0, as
it should with integer division. These 9 rows will not match the condition:

WHERE c.base_id = a.id

as there is no row in a with a.id = 0

Pantelis

#4Pantelis Theodosiou
ypercube@gmail.com
In reply to: Pantelis Theodosiou (#3)
Re: BUG #15857: Parallel Hash Join makes join instead of exists

On Tue, Jun 18, 2019 at 9:56 AM Pantelis Theodosiou <ypercube@gmail.com>
wrote:

...

I think it is correct result. This:

UPDATE c SET base_id = id / 10;

would result in 9 rows (id from 1 to 9) to be updated with base_id = 0, as
it should with integer division. These 9 rows will not match the condition:

WHERE c.base_id = a.id

as there is no row in a with a.id = 0

Please ignore the above, I didn't read carefully.

Show quoted text
#5Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#2)
Re: BUG #15857: Parallel Hash Join makes join instead of exists

On Tue, Jun 18, 2019 at 8:47 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Jun 18, 2019 at 6:40 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

-- This gives an incorrect result of 999991, when 100000 is expected on
Postgres 11.3 and 12 beta 1.

Reproduced here. Investigating.

First clue is that if you change "WHERE c.base_id = a.id" to "WHERE
c.base_id = b.base_id", you get Parallel Hash Semi Join instead of
Parallel Hash Join, but an otherwise identical plan with the same Hash
Cond, and the result changes to 100000 instead of 999991.

Second clue is that if you set enable_parallel_hash to off, you get a
Hash Semi Join for "WHERE c.base_id = b.base_id", but if you use
"WHERE c.base_id = a.id" you get a Hash Join over Hash of Unique of
Sort of c, instead of a Hash Semi Join.

That points to the problem: for JOIN_UNIQUE_INNER we plan a Parallel
Hash Join, but that's nonsense, there is no code to unique-ify the
partial inner side (because that's not possible). There may be
something better we can do here (like understanding that this should
really be a semi-join), but this works for me to prevent the bad plan:

diff --git a/src/backend/optimizer/path/joinpath.c
b/src/backend/optimizer/path/joinpath.c
index 501ad775cbe..e42c82c2bb4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -1869,7 +1869,8 @@ hash_inner_and_outer(PlannerInfo *root,
                         * Can we use a partial inner plan too, so
that we can build a
                         * shared hash table in parallel?
                         */
-                       if (innerrel->partial_pathlist != NIL &&
enable_parallel_hash)
+                       if (innerrel->partial_pathlist != NIL &&
+                               save_jointype != JOIN_UNIQUE_INNER &&
enable_parallel_hash)
                        {
                                cheapest_partial_inner =
                                        (Path *)
linitial(innerrel->partial_pathlist);

--
Thomas Munro
https://enterprisedb.com

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#5)
Re: BUG #15857: Parallel Hash Join makes join instead of exists

On Tue, Jun 18, 2019 at 9:24 PM Thomas Munro <thomas.munro@gmail.com> wrote:

That points to the problem: for JOIN_UNIQUE_INNER we plan a Parallel
Hash Join, but that's nonsense, there is no code to unique-ify the
partial inner side (because that's not possible). There may be
something better we can do here (like understanding that this should
really be a semi-join), but this works for me to prevent the bad plan:

Here's a tidier version with a comment and commit message.

--
Thomas Munro
https://enterprisedb.com

Attachments:

0001-Prevent-Parallel-Hash-Join-for-JOIN_UNIQUE_INNER.patchapplication/octet-stream; name=0001-Prevent-Parallel-Hash-Join-for-JOIN_UNIQUE_INNER.patchDownload+5-3
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#6)
Re: BUG #15857: Parallel Hash Join makes join instead of exists

Thomas Munro <thomas.munro@gmail.com> writes:

Here's a tidier version with a comment and commit message.

It seems a little weird to make the join-type test be the first one
in the if() condition: it doesn't seem like the most important thing
to test, and it doesn't agree with the way you've worded the comment.
Otherwise +1.

regards, tom lane

#8Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#7)
Re: BUG #15857: Parallel Hash Join makes join instead of exists

On Wed, Jun 19, 2019 at 1:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems a little weird to make the join-type test be the first one
in the if() condition: it doesn't seem like the most important thing
to test, and it doesn't agree with the way you've worded the comment.
Otherwise +1.

Thanks for the fast review. Updated. I'll default to waiting until
after I see the tags for the release that's in progress unless you
think it should be included.

--
Thomas Munro
https://enterprisedb.com

Attachments:

0001-Prevent-Parallel-Hash-Join-for-JOIN_UNIQUE_INNER-v2.patchapplication/octet-stream; name=0001-Prevent-Parallel-Hash-Join-for-JOIN_UNIQUE_INNER-v2.patchDownload+5-3
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#8)
Re: BUG #15857: Parallel Hash Join makes join instead of exists

Thomas Munro <thomas.munro@gmail.com> writes:

Thanks for the fast review. Updated. I'll default to waiting until
after I see the tags for the release that's in progress unless you
think it should be included.

OK to push as far as I'm concerned. I am not expecting to have to
re-wrap given the lack of packagers complaints so far, and if we
do re-wrap, I'd be OK with including this fix.

regards, tom lane

#10Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#9)
Re: BUG #15857: Parallel Hash Join makes join instead of exists

On Wed, Jun 19, 2019 at 2:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

OK to push as far as I'm concerned. I am not expecting to have to
re-wrap given the lack of packagers complaints so far, and if we
do re-wrap, I'd be OK with including this fix.

Done.

Vladimir, thanks for the report! Unfortunately the fix is not
included in the out-of-schedule 11.4 release that's about to be made
(unless something triggers a re-wrap), but you should expect it in the
next regular release[1]https://www.postgresql.org/developer/roadmap/. In the meantime you might be able to work
around the problem by using a different but equivalent join condition
so you get a semi-join, or by turning off enable_parallel_hash. Even
with the fix you'll probably get better performance if you can
convince it to use a semi-join.

[1]: https://www.postgresql.org/developer/roadmap/

--
Thomas Munro
https://enterprisedb.com