BUG #18634: Wrong varnullingrels with merge ... when not matched by source

Started by PG Bug reporting formover 1 year ago9 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18634
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 17.0
Operating system: Ubuntu 22.04
Description:

The following script:
CREATE TABLE t (a int);
INSERT INTO t VALUES(1), (2);
CREATE VIEW v AS SELECT a FROM t WHERE EXISTS (SELECT 1 FROM t);

MERGE INTO v USING (SELECT * FROM generate_series(1,1)) AS s(a)
ON s.a = v.a WHEN NOT MATCHED BY SOURCE THEN DELETE;

produces:
ERROR: XX000: wrong varnullingrels (b) (expected (b 4)) for Var 5/1
LOCATION: search_indexed_tlist_for_var, setrefs.c:2847

At the same time,
MERGE INTO v USING (SELECT * FROM generate_series(1,1)) AS s(a)
ON s.a = v.a WHEN MATCHED THEN DELETE;
planned and executed with no error.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #18634: Wrong varnullingrels with merge ... when not matched by source

PG Bug reporting form <noreply@postgresql.org> writes:

The following script:
CREATE TABLE t (a int);
INSERT INTO t VALUES(1), (2);
CREATE VIEW v AS SELECT a FROM t WHERE EXISTS (SELECT 1 FROM t);

MERGE INTO v USING (SELECT * FROM generate_series(1,1)) AS s(a)
ON s.a = v.a WHEN NOT MATCHED BY SOURCE THEN DELETE;

produces:
ERROR: XX000: wrong varnullingrels (b) (expected (b 4)) for Var 5/1
LOCATION: search_indexed_tlist_for_var, setrefs.c:2847

I haven't run this fully to ground, but what it looks like
is that preprocess_targetlist is generating row identity
Vars that lack required varnullingrels. I don't understand
though why this only seems to affect MERGE.

regards, tom lane

#3Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #18634: Wrong varnullingrels with merge ... when not matched by source

On Fri, Sep 27, 2024 at 6:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

The following script:
CREATE TABLE t (a int);
INSERT INTO t VALUES(1), (2);
CREATE VIEW v AS SELECT a FROM t WHERE EXISTS (SELECT 1 FROM t);

MERGE INTO v USING (SELECT * FROM generate_series(1,1)) AS s(a)
ON s.a = v.a WHEN NOT MATCHED BY SOURCE THEN DELETE;

produces:
ERROR: XX000: wrong varnullingrels (b) (expected (b 4)) for Var 5/1
LOCATION: search_indexed_tlist_for_var, setrefs.c:2847

I haven't run this fully to ground, but what it looks like
is that preprocess_targetlist is generating row identity
Vars that lack required varnullingrels. I don't understand
though why this only seems to affect MERGE.

It looks like that preprocess_targetlist will add any vars used in
parse->mergeJoinCondition that belong to the source relation to the
processed tlist. This logic was introduced to support WHEN NOT
MATCHED BY SOURCE actions (see 0294df2f1). For such actions, the
source relation is on the nullable side of the outer join. But when
adding the vars used in the join condition that belong to source
relation to the tlist, we fail to mark them as nullable by the join.

I think we can check the jointype of the join between the target and
the source relation when adding the vars in mergeJoinCondition. If it
is JOIN_LEFT, we mark the vars that belong to source as nullable by
this join.

With this routine, ISTM we'd need a way for preprocess_targetlist to
access the JoinExpr that we build in transform_MERGE_to_join.

Thanks
Richard

#4Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Richard Guo (#3)
Re: BUG #18634: Wrong varnullingrels with merge ... when not matched by source

On Fri, 27 Sept 2024 at 13:52, Richard Guo <guofenglinux@gmail.com> wrote:

On Fri, Sep 27, 2024 at 6:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I haven't run this fully to ground, but what it looks like
is that preprocess_targetlist is generating row identity
Vars that lack required varnullingrels. I don't understand
though why this only seems to affect MERGE.

It looks like that preprocess_targetlist will add any vars used in
parse->mergeJoinCondition that belong to the source relation to the
processed tlist. This logic was introduced to support WHEN NOT
MATCHED BY SOURCE actions (see 0294df2f1). For such actions, the
source relation is on the nullable side of the outer join. But when
adding the vars used in the join condition that belong to source
relation to the tlist, we fail to mark them as nullable by the join.

Yes, I reached the same conclusion.

The same goes for Vars in merge action quals and targetlists, and the
RETURNING list, but not the ones added for rowmarks, if I'm
understanding it correctly.

I think we can check the jointype of the join between the target and
the source relation when adding the vars in mergeJoinCondition. If it
is JOIN_LEFT, we mark the vars that belong to source as nullable by
this join.

With this routine, ISTM we'd need a way for preprocess_targetlist to
access the JoinExpr that we build in transform_MERGE_to_join.

Another option is to do it in transform_MERGE_to_join(). That feels
safer, because the jointree might have been modified by the time we
reach preprocess_targetlist().

Something like the attached.

Regards,
Dean

Attachments:

bug18634-fix.patchtext/x-patch; charset=US-ASCII; name=bug18634-fix.patchDownload+79-0
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#4)
Re: BUG #18634: Wrong varnullingrels with merge ... when not matched by source

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Fri, 27 Sept 2024 at 13:52, Richard Guo <guofenglinux@gmail.com> wrote:

I think we can check the jointype of the join between the target and
the source relation when adding the vars in mergeJoinCondition. If it
is JOIN_LEFT, we mark the vars that belong to source as nullable by
this join.

Another option is to do it in transform_MERGE_to_join(). That feels
safer, because the jointree might have been modified by the time we
reach preprocess_targetlist().

Yeah, I think it's critical that these Vars be already correctly
marked before we engage in all the slicing-and-dicing that
prepjointree et al will do. As an example, it seems not impossible
for join removal to make wrong decisions if they aren't.

Something like the attached.

Could use some comments ... but actually, now I'm confused about why
any of this is the right thing at all:

+	 * Similarly, any non-target Vars in the join condition will be added to
+	 * the targetlist by preprocess_targetlist(), and so must be marked as
+	 * nullable by the join, for LEFT and FULL joins.

Why do we need these Vars in the tlist? If they're for re-evaluating
the join condition, isn't the already-nulled form of them the wrong
thing?

regards, tom lane

#6Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#5)
Re: BUG #18634: Wrong varnullingrels with merge ... when not matched by source

On Fri, Sep 27, 2024 at 11:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Could use some comments ... but actually, now I'm confused about why
any of this is the right thing at all:

+        * Similarly, any non-target Vars in the join condition will be added to
+        * the targetlist by preprocess_targetlist(), and so must be marked as
+        * nullable by the join, for LEFT and FULL joins.

Why do we need these Vars in the tlist? If they're for re-evaluating
the join condition, isn't the already-nulled form of them the wrong
thing?

I have the same concern. I think we should NOT mark the vars in
mergeJoinCondition as nullable, as mergeJoinCondition acts as join
quals rather than filter quals at that outer join. Instead, we should
mark them nullable when they are pulled out and ready to be added to
the targetlist, if they are really needed in the targetlist.

Thanks
Richard

#7Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Richard Guo (#6)
Re: BUG #18634: Wrong varnullingrels with merge ... when not matched by source

On Sat, 28 Sept 2024 at 01:40, Richard Guo <guofenglinux@gmail.com> wrote:

On Fri, Sep 27, 2024 at 11:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Could use some comments ... but actually, now I'm confused about why
any of this is the right thing at all:

+        * Similarly, any non-target Vars in the join condition will be added to
+        * the targetlist by preprocess_targetlist(), and so must be marked as
+        * nullable by the join, for LEFT and FULL joins.

Why do we need these Vars in the tlist? If they're for re-evaluating
the join condition, isn't the already-nulled form of them the wrong
thing?

Doh. Yes, that's broken. A previous version of the WHEN NOT MATCHED BY
SOURCE patch used a qual of the form "src IS [NOT] NULL" to
distinguish between the MATCHED and NOT MATCHED BY SOURCE cases, but I
changed that to use the merge join condition as part of a larger
refactoring [1]/messages/by-id/CAEZATCV-7j0wq6T2AGsyRbaVY5muZ8KJ07L-=8Pvt=a3w1V5vA@mail.gmail.com, which was a mistake.

[1]: /messages/by-id/CAEZATCV-7j0wq6T2AGsyRbaVY5muZ8KJ07L-=8Pvt=a3w1V5vA@mail.gmail.com

A simple case to demonstrate the problem is this:

CREATE TABLE src (a int, b text);
INSERT INTO src VALUES (1, 'src row');

CREATE TABLE tgt (a int, b text);
INSERT INTO tgt VALUES (NULL, 'tgt row');

MERGE INTO tgt USING src ON tgt.a IS NOT DISTINCT FROM src.a
WHEN MATCHED THEN UPDATE SET a = src.a, b = src.b
WHEN NOT MATCHED BY SOURCE THEN DELETE;

SELECT * FROM tgt;
a | b
---+---
|
(1 row)

which is wrong (it executes the MATCHED UPDATE action instead of the
NOT MATCHED BY SOURCE DELETE action because src.a is NULL above the
join).

The simplest fix is to add a "src IS NOT NULL" wholerow check to the
executor recheck condition, similar to the prior implementation, which
I've done in the v2 patch, attached.

A slightly better fix would be to *replace* the executor check with
that IS NOT NULL test, and not actually recheck the original join
condition at all. That's already sufficient for the initial check at
the top of ExecMergeMatched(), and it would only require a minor
change to the EPQ rechecking further down to make it work fully. It
would, however, make the field names wrong/misleading, so probably
they would need updating if we did that, making it unsuitable for
back-patching. Since this is really only an optimisation, and it's not
clear how much difference it would actually make anyway, I'm inclined
to leave it as a possible future enhancement.

I have the same concern. I think we should NOT mark the vars in
mergeJoinCondition as nullable, as mergeJoinCondition acts as join
quals rather than filter quals at that outer join. Instead, we should
mark them nullable when they are pulled out and ready to be added to
the targetlist, if they are really needed in the targetlist.

Actually, the marking is done after building the join node, so it's
only marking a copy of the join condition, for use above the join. The
original condition inside the join node remains unmarked, so I think
it's right.

For the sake of the archives, this "wrong varnullingrels" error is not
limited to the join condition. With the same table/view definitions as
in the original report, it can also be reproduced with queries like

MERGE INTO v USING (SELECT * FROM generate_series(1,1)) AS s(a) ON true
WHEN MATCHED THEN UPDATE SET a = s.a
WHEN NOT MATCHED BY SOURCE THEN DELETE;

and

MERGE INTO v USING (SELECT * FROM generate_series(1,1)) AS s(a) ON true
WHEN NOT MATCHED BY SOURCE THEN DELETE
RETURNING s.a;

Basically, anything in the ModifyTable node that accesses source
relation Vars, if the source relation is on the outer side of the
join.

I spent some time trying to figure out why none of the existing tests
hit this error, and I think the reason is that all the previous tests
involved a plan where the ModifyTable node is directly on top of the
join node, so the top targetlist was the join node's targetlist, and
therefore wasn't marked. But in the example here, there is a one-time
filter Result node between the ModifyTable node and the join node,
which means the ModifyTable node pulls from the Result node, whose
output is marked as nullable, because it's above the join. That makes
the error somewhat rare, though maybe there are other cases that can
lead to a plan node being inserted between the ModifyTable node and
the join node.

It feels a bit unsatisfactory that this wasn't detectable with a
ModifyTable node directly on top of the join node, making the bug hard
to spot, but I don't know whether it would be feasible to do anything
about that.

I have split this into 2 commits, since it's really 2 separate bugs,
and tried to improve the commentary.

Regards,
Dean

Attachments:

v2-0001-Fix-incorrect-non-strict-join-recheck-in-MERGE-WH.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Fix-incorrect-non-strict-join-recheck-in-MERGE-WH.patchDownload+101-6
v2-0002-Fix-varnullingrels-markings-for-MERGE-WHEN-NOT-MA.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Fix-varnullingrels-markings-for-MERGE-WHEN-NOT-MA.patchDownload+89-1
#8Richard Guo
guofenglinux@gmail.com
In reply to: Dean Rasheed (#7)
Re: BUG #18634: Wrong varnullingrels with merge ... when not matched by source

On Sun, Sep 29, 2024 at 3:49 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On Sat, 28 Sept 2024 at 01:40, Richard Guo <guofenglinux@gmail.com> wrote:

I have the same concern. I think we should NOT mark the vars in
mergeJoinCondition as nullable, as mergeJoinCondition acts as join
quals rather than filter quals at that outer join. Instead, we should
mark them nullable when they are pulled out and ready to be added to
the targetlist, if they are really needed in the targetlist.

Actually, the marking is done after building the join node, so it's
only marking a copy of the join condition, for use above the join. The
original condition inside the join node remains unmarked, so I think
it's right.

Ah yes, you are right. I thought the join node we build for MERGE
would use the marked mergeJoinCondition as join quals, but that's not
the case.

I spent some time trying to figure out why none of the existing tests
hit this error, and I think the reason is that all the previous tests
involved a plan where the ModifyTable node is directly on top of the
join node, so the top targetlist was the join node's targetlist, and
therefore wasn't marked. But in the example here, there is a one-time
filter Result node between the ModifyTable node and the join node,
which means the ModifyTable node pulls from the Result node, whose
output is marked as nullable, because it's above the join. That makes
the error somewhat rare, though maybe there are other cases that can
lead to a plan node being inserted between the ModifyTable node and
the join node.

It feels a bit unsatisfactory that this wasn't detectable with a
ModifyTable node directly on top of the join node, making the bug hard
to spot, but I don't know whether it would be feasible to do anything
about that.

For an outer join, any vars appearing in its targetlist (and qpqual)
should be marked nullable if they are from the nullable side, because
they are logically above the join. However, when we fix up the
targetlist and qpqual, we don't have enough info available to identify
the nullingrel bits added by the outer join. So we have to use
superset matches rather than exact matches.

This is why we don't hit this error in cases where the ModifyTable
node is directly on top of the join node, even though we fail to mark
the vars in targetlist correctly.

In Alexander's case, there is a Result node in between. When we fix
up the targetlist for the Result node, we perform exact matches for
the nullingrel bits. That's how this issue is revealed.

Yeah, it's a bit unsatisfying that we can only perform superset
matches rather than exact matches for the Vars in the targetlist and
qpqual of an outer join. Maybe we can record the ojrelid of the outer
join in Join node, and then match these Vars with input Vars'
nullingrels plus this ojrelid. But when the outer joins are
re-ordered according to identity 3, it becomes very tricky to figure
out what the correct ojrelid(s) we should use for the outer join.

Thanks
Richard

#9Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Richard Guo (#8)
Re: BUG #18634: Wrong varnullingrels with merge ... when not matched by source

On Mon, 30 Sept 2024 at 03:16, Richard Guo <guofenglinux@gmail.com> wrote:

On Sun, Sep 29, 2024 at 3:49 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

I spent some time trying to figure out why none of the existing tests
hit this error, and I think the reason is that all the previous tests
involved a plan where the ModifyTable node is directly on top of the
join node, so the top targetlist was the join node's targetlist, and
therefore wasn't marked. But in the example here, there is a one-time
filter Result node between the ModifyTable node and the join node,
which means the ModifyTable node pulls from the Result node, whose
output is marked as nullable, because it's above the join. That makes
the error somewhat rare, though maybe there are other cases that can
lead to a plan node being inserted between the ModifyTable node and
the join node.

It feels a bit unsatisfactory that this wasn't detectable with a
ModifyTable node directly on top of the join node, making the bug hard
to spot, but I don't know whether it would be feasible to do anything
about that.

For an outer join, any vars appearing in its targetlist (and qpqual)
should be marked nullable if they are from the nullable side, because
they are logically above the join. However, when we fix up the
targetlist and qpqual, we don't have enough info available to identify
the nullingrel bits added by the outer join. So we have to use
superset matches rather than exact matches.

This is why we don't hit this error in cases where the ModifyTable
node is directly on top of the join node, even though we fail to mark
the vars in targetlist correctly.

OK, that makes sense, I think.

I have pushed and back-patched both fixes.

Regards,
Dean