BUG #18522: Wrong results with Merge Right Anti Join, inconsistent with Merge Anti Join

Started by PG Bug reporting formalmost 2 years ago9 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18522
Logged by: Antti Lampinen
Email address: antti@lampinen.eu
PostgreSQL version: 16.3
Operating system: AWS RDS + Macos
Description:

The following two queries result in different query plans and different
results, even though there is only a dummy condition change between them.
The
latter results are correct, there are two rows that match the conditions.

The following was run both in AWS RDS and after restoring database from
plain
SQL dump to local Macos system.

First query.

EXPLAIN (ANALYZE, SETTINGS, VERBOSE )
SELECT id, market_id
FROM "order"
WHERE "order"."deleted_at" IS NULL AND
"order"."archived_at" IS NULL AND
NOT EXISTS (SELECT
FROM "matchmaking_request"
WHERE ("matchmaking_request"."order_id" = "order"."id"
AND
"matchmaking_request"."started_at" IS NOT NULL
AND
"matchmaking_request"."cancelled_at" IS NULL))
AND
"order"."market_id" IN (1);

QUERY PLAN
Merge Right Anti Join (cost=415.85..527.39 rows=350 width=8) (actual
time=6.015..6.018 rows=1 loops=1)
Output: "order".id, "order".market_id
Inner Unique: true
Merge Cond: (matchmaking_request.order_id = "order".id)
-> Index Scan using matchmaking_request_order_id_idx on
public.matchmaking_request (cost=0.28..834.07 rows=5840 width=4) (actual
time=0.032..2.089 rows=702 loops=1)
Output: <retracted>
Filter: ((matchmaking_request.started_at IS NOT NULL) AND
(matchmaking_request.cancelled_at IS NULL))
Rows Removed by Filter: 301
-> Sort (cost=415.56..416.82 rows=503 width=8) (actual time=3.695..3.730
rows=493 loops=1)
Output: "order".id, "order".market_id
Sort Key: "order".id
Sort Method: quicksort Memory: 40kB
-> Index Scan using order_market_id_idx on public."order"
(cost=0.28..392.99 rows=503 width=8) (actual time=0.035..3.555 rows=493
loops=1)
Output: "order".id, "order".market_id
Index Cond: ("order".market_id = 1)
Filter: (("order".deleted_at IS NULL) AND ("order".archived_at
IS NULL))
Rows Removed by Filter: 871
Settings: random_page_cost = '1'
Planning Time: 0.643 ms
Execution Time: 6.301 ms

Second query. Note the dummy condition at the end.

EXPLAIN (ANALYZE, SETTINGS, VERBOSE )
SELECT id, market_id
FROM "order"
WHERE "order"."deleted_at" IS NULL AND
"order"."archived_at" IS NULL AND
NOT EXISTS (SELECT
FROM "matchmaking_request"
WHERE ("matchmaking_request"."order_id" = "order"."id"
AND
"matchmaking_request"."started_at" IS NOT NULL
AND
"matchmaking_request"."cancelled_at" IS NULL))
AND
"order"."market_id" IN (1, 1); -- dummy condition

QUERY PLAN
Merge Anti Join (cost=0.56..579.56 rows=503 width=8) (actual
time=6.622..6.773 rows=2 loops=1)
Output: "order".id, "order".market_id
Merge Cond: ("order".id = matchmaking_request.order_id)
-> Index Scan using order_pkey on public."order" (cost=0.28..467.77
rows=723 width=8) (actual time=0.078..4.435 rows=493 loops=1)
Output: <retracted>
Filter: (("order".deleted_at IS NULL) AND ("order".archived_at IS
NULL) AND ("order".market_id = ANY ('{1,1}'::integer[])))
Rows Removed by Filter: 1935
-> Index Scan using matchmaking_request_order_id_idx on
public.matchmaking_request (cost=0.28..834.07 rows=5840 width=4) (actual
time=0.017..1.996 rows=702 loops=1)
Output: <retracted>
Filter: ((matchmaking_request.started_at IS NOT NULL) AND
(matchmaking_request.cancelled_at IS NULL))
Rows Removed by Filter: 301
Settings: random_page_cost = '1'
Planning Time: 0.629 ms
Execution Time: 6.853 ms

#2Richard Guo
guofenglinux@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #18522: Wrong results with Merge Right Anti Join, inconsistent with Merge Anti Join

On Tue, Jun 25, 2024 at 2:37 AM PG Bug reporting form
<noreply@postgresql.org> wrote:

The following two queries result in different query plans and different
results, even though there is only a dummy condition change between them.
The
latter results are correct, there are two rows that match the conditions.

Thank you for reporting. As far as I recall, this is the first
field-reported bug about the right anti join!

Could you please provide the schema and necessary data for the two
tables to reproduce this bug? If there is a self-contained repro, that
would be great.

Thanks
Richard

#3Antti Lampinen
antti@lampinen.eu
In reply to: Richard Guo (#2)
Re: BUG #18522: Wrong results with Merge Right Anti Join, inconsistent with Merge Anti Join

On Tue, Jun 25, 2024 at 5:07 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Tue, Jun 25, 2024 at 2:37 AM PG Bug reporting form
<noreply@postgresql.org> wrote:

The following two queries result in different query plans and different
results, even though there is only a dummy condition change between them.
The
latter results are correct, there are two rows that match the conditions.

Could you please provide the schema and necessary data for the two
tables to reproduce this bug? If there is a self-contained repro, that
would be great.

I managed to create a self-contained repro:
https://gist.github.com/arlampin/0b86963694a936147383f3af3c83224c

This gives me consistently different results based on superfluous condition
change. See the two EXPLAIN queries in the sample.

Thanks,
Antti

#4Richard Guo
guofenglinux@gmail.com
In reply to: Antti Lampinen (#3)
Re: BUG #18522: Wrong results with Merge Right Anti Join, inconsistent with Merge Anti Join

On Tue, Jun 25, 2024 at 12:24 PM Antti Lampinen <antti@lampinen.eu> wrote:

On Tue, Jun 25, 2024 at 5:07 AM Richard Guo <guofenglinux@gmail.com> wrote:

Could you please provide the schema and necessary data for the two
tables to reproduce this bug? If there is a self-contained repro, that
would be great.

I managed to create a self-contained repro:
https://gist.github.com/arlampin/0b86963694a936147383f3af3c83224c

This gives me consistently different results based on superfluous condition
change. See the two EXPLAIN queries in the sample.

Thank you so much for the repro script. I've found the root cause:
for an inner_unique join we assume that the executor will stop scanning
for matches after the first match. Therefore, we set skip_mark_restore
to true to indicate that we can skip mark/restore overhead. However,
merge right anti join does not get this memo and continues scanning the
inner side for matches after the first match, totally ignoring the
single_match flag, while still thinking that it can skip mark/restore.

Will fix this later.

Thanks
Richard

#5Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#4)
Re: BUG #18522: Wrong results with Merge Right Anti Join, inconsistent with Merge Anti Join

On Tue, Jun 25, 2024 at 7:54 PM Richard Guo <guofenglinux@gmail.com> wrote:

Thank you so much for the repro script. I've found the root cause:
for an inner_unique join we assume that the executor will stop scanning
for matches after the first match. Therefore, we set skip_mark_restore
to true to indicate that we can skip mark/restore overhead. However,
merge right anti join does not get this memo and continues scanning the
inner side for matches after the first match, totally ignoring the
single_match flag, while still thinking that it can skip mark/restore.

Will fix this later.

I think we can fix this issue by ensuring that merge-right-anti-join
also advances to next outer tuple after the first match in inner_unique
cases. This can also help save cycles by avoiding unnecessary scanning
of inner tuples after the first match.

Although hash-right-anti-join does not suffer from this wrong results
issue, I think we can apply the same change to it as well, to help save
cycles for the same reason.

Please see attached patch for the fix. This patch still lacks a test
case. I'll try to see if I can come up with one.

Thanks
Richard

Attachments:

v1-0001-Fix-right-anti-joins-when-the-inner-relation-is-proven-unique.patchapplication/octet-stream; name=v1-0001-Fix-right-anti-joins-when-the-inner-relation-is-proven-unique.patchDownload+22-21
#6Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#5)
Re: BUG #18522: Wrong results with Merge Right Anti Join, inconsistent with Merge Anti Join

On Tue, Jun 25, 2024 at 10:45 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Tue, Jun 25, 2024 at 7:54 PM Richard Guo <guofenglinux@gmail.com> wrote:

Thank you so much for the repro script. I've found the root cause:
for an inner_unique join we assume that the executor will stop scanning
for matches after the first match. Therefore, we set skip_mark_restore
to true to indicate that we can skip mark/restore overhead. However,
merge right anti join does not get this memo and continues scanning the
inner side for matches after the first match, totally ignoring the
single_match flag, while still thinking that it can skip mark/restore.

Will fix this later.

I think we can fix this issue by ensuring that merge-right-anti-join
also advances to next outer tuple after the first match in inner_unique
cases. This can also help save cycles by avoiding unnecessary scanning
of inner tuples after the first match.

Although hash-right-anti-join does not suffer from this wrong results
issue, I think we can apply the same change to it as well, to help save
cycles for the same reason.

Please see attached patch for the fix. This patch still lacks a test
case. I'll try to see if I can come up with one.

I've added a test case in the v2 patch. I'll park it in the July
commitfest.

Thanks
Richard

Attachments:

v2-0001-Fix-right-anti-joins-when-the-inner-relation-is-proven-unique.patchapplication/octet-stream; name=v2-0001-Fix-right-anti-joins-when-the-inner-relation-is-proven-unique.patchDownload+100-21
#7Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#2)
Re: BUG #18522: Wrong results with Merge Right Anti Join, inconsistent with Merge Anti Join

On Tue, Jun 25, 2024 at 10:07 AM Richard Guo <guofenglinux@gmail.com> wrote:

Thank you for reporting. As far as I recall, this is the first
field-reported bug about the right anti join!

Just for the record, this issue was reported again with BUG #18526 by
Feliphe Pozzer in

/messages/by-id/18526-bf89f27cb20d8a18@postgresql.org

Thanks
Richard

#8Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#6)
Re: BUG #18522: Wrong results with Merge Right Anti Join, inconsistent with Merge Anti Join

On Wed, Jun 26, 2024 at 3:31 PM Richard Guo <guofenglinux@gmail.com> wrote:

I've added a test case in the v2 patch. I'll park it in the July
commitfest.

Here is an updated patch, with some tweaks suggested by pgindent, and
adjustments to the commit message suggested by Andrew (off-list).

Barring objections, I'm planning to push it next week.

Thanks
Richard

Attachments:

v3-0001-Fix-right-anti-joins-when-the-inner-relation-is-proven-unique.patchapplication/octet-stream; name=v3-0001-Fix-right-anti-joins-when-the-inner-relation-is-proven-unique.patchDownload+100-21
#9Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#8)
Re: BUG #18522: Wrong results with Merge Right Anti Join, inconsistent with Merge Anti Join

On Tue, Jul 2, 2024 at 6:22 PM Richard Guo <guofenglinux@gmail.com> wrote:

Here is an updated patch, with some tweaks suggested by pgindent, and
adjustments to the commit message suggested by Andrew (off-list).

Barring objections, I'm planning to push it next week.

Pushed. Thank you for the report. The fix will appear in August's
minor releases.

Thanks
Richard