pgsql: Clean up all relid fields of RestrictInfos during join removal.

Started by Tom Lane2 months ago4 messagescomitters
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

Clean up all relid fields of RestrictInfos during join removal.

The original implementation of remove_rel_from_restrictinfo()
thought it could skate by with removing no-longer-valid relid
bits from only the clause_relids and required_relids fields.
This is quite bogus, although somehow we had not run across a
counterexample before now. At minimum, the left_relids and
right_relids fields need to be fixed because they will be
examined later by clause_sides_match_join(). But it seems
pretty foolish not to fix all the relid fields, so do that.

This needs to be back-patched as far as v16, because the
bug report shows a planner failure that does not occur
before v16. I'm a little nervous about back-patching,
because this could cause unexpected plan changes due to
opening up join possibilities that were rejected before.
But it's hard to argue that this isn't a regression. Also,
the fact that this changes no existing regression test results
suggests that the scope of changes may be fairly narrow.
I'll refrain from back-patching further though, since no
adverse effects have been demonstrated in older branches.

Bug: #19460
Reported-by: François Jehl <francois.jehl@pigment.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: /messages/by-id/19460-5625143cef66012f@postgresql.org
Backpatch-through: 16

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/cfcd5711160a42249def8f781bae197829cf44c7

Modified Files
--------------
src/backend/optimizer/plan/analyzejoins.c | 18 +++++++++++++-
src/test/regress/expected/join.out | 39 +++++++++++++++++++++++++++++++
src/test/regress/sql/join.sql | 23 ++++++++++++++++++
3 files changed, 79 insertions(+), 1 deletion(-)

#2Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#1)
Re: pgsql: Clean up all relid fields of RestrictInfos during join removal.

Hi Tom,

On Mon, Apr 20, 2026 at 06:48:35PM +0000, Tom Lane wrote:

Clean up all relid fields of RestrictInfos during join removal.

The original implementation of remove_rel_from_restrictinfo()
thought it could skate by with removing no-longer-valid relid
bits from only the clause_relids and required_relids fields.
This is quite bogus, although somehow we had not run across a
counterexample before now. At minimum, the left_relids and
right_relids fields need to be fixed because they will be
examined later by clause_sides_match_join(). But it seems
pretty foolish not to fix all the relid fields, so do that.

prion looks unhappy on this one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&amp;dt=2026-04-20%2022%3A50%3A01

This uses -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE.
--
Michael

#3Richard Guo
guofenglinux@gmail.com
In reply to: Michael Paquier (#2)
Re: pgsql: Clean up all relid fields of RestrictInfos during join removal.

On Tue, Apr 21, 2026 at 8:03 AM Michael Paquier <michael@paquier.xyz> wrote:

Hi Tom,

On Mon, Apr 20, 2026 at 06:48:35PM +0000, Tom Lane wrote:

Clean up all relid fields of RestrictInfos during join removal.

The original implementation of remove_rel_from_restrictinfo()
thought it could skate by with removing no-longer-valid relid
bits from only the clause_relids and required_relids fields.
This is quite bogus, although somehow we had not run across a
counterexample before now. At minimum, the left_relids and
right_relids fields need to be fixed because they will be
examined later by clause_sides_match_join(). But it seems
pretty foolish not to fix all the relid fields, so do that.

prion looks unhappy on this one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&amp;dt=2026-04-20%2022%3A50%3A01

This uses -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE.

This seems the same problem as in skink. There is a WIP patch for the
fix at:
/messages/by-id/458729.1776724816@sss.pgh.pa.us

- Richard

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#3)
Re: pgsql: Clean up all relid fields of RestrictInfos during join removal.

Richard Guo <guofenglinux@gmail.com> writes:

On Tue, Apr 21, 2026 at 8:03 AM Michael Paquier <michael@paquier.xyz> wrote:

prion looks unhappy on this one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&amp;dt=2026-04-20%2022%3A50%3A01
This uses -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE.

This seems the same problem as in skink.

Yeah, likely. As best I can tell, the reason skink is falling over is
that USE_VALGRIND enables list.c's DEBUG_LIST_MEMORY_USAGE, making
list.c much more memory-hungry and thus able to recycle freed
bitmapset storage before it would have been recycled in a regular
debug build. Probably prion's options have a similar effect.

Wanting to get the buildfarm green again, I didn't stop to dig into
this interesting question: how the heck did cfcd57111 manage to pass
regression testing in our standard test rig with CLOBBER_FREED_MEMORY
enabled? In the problematic cases, remove_rel_from_eclass reduces
cur_em->em_relids to empty causing it to be pfree'd, which means that
there is now some associated RestrictInfo whose left_relids or
right_relids is pointing at freed memory. That should result in the
later bms_del_member calls in remove_rel_from_restrictinfo blowing up
instantly on Assert(bms_is_valid_set(a)). The only way that doesn't
happen, AFAICS, is if we repopulate that freed chunk with another
Bitmapset in between. It's certainly possible given that the loop in
remove_rel_from_restrictinfo will do some bms_copy's, but you would
not think it'd happen that way reliably enough to get through
check-world.

regards, tom lane