Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

Started by nikhil rajover 1 year ago26 messages
#1nikhil raj
nikhilraj474@gmail.com
2 attachment(s)

Hi All,

I've encountered a noticeable difference in execution time and query
execution plan row counts between PostgreSQL 13 and PostgreSQL 16 when
running a query on information_schema tables. Surprisingly, PostgreSQL 16
is performing slower than PostgreSQL 13.

The query executed on both versions is as follows:
SELECT DISTINCT "tc"."constraint_name" AS "ConstraintName",
"ccu"."column_name" AS "ColumnName"
FROM
information_schema.constraint_column_usage AS "ccu" right join
information_schema.table_constraints AS "tc"
ON "tc"."constraint_catalog" = "ccu"."constraint_catalog"
AND "tc"."constraint_name" = "ccu"."constraint_name"
WHERE "tc"."constraint_type" = 'PRIMARY KEY'
AND "ccu"."table_name" = 't_c56ng1_repository'

Here are the details of the PostgreSQL versions and the execution plans:

*4PostgreSQL 13.14 (PostgreSQL 13.14 on x86_64-pc-linux-gnu, compiled by
gcc 11.4.0, 64-bit)*
Execution plan: PG13.14 Execution Plan
<https://explain.dalibo.com/plan/ag1a62a9d47dg29d&gt;

*PostgreSQL 16.4 (PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by gcc
11.4.0, 64-bit)*
Execution plan: PG16.4 Execution Plan
<https://explain.dalibo.com/plan/4c66fdfbf2hf9ed2&gt;

Has anyone else experienced similar behavior or could provide insights into
why PostgreSQL 16 might be slower for this query? Any advice or suggestions
for optimization would be greatly appreciated.

Thank you!
NOTE:- PFA the raw file of explain and analyze below.

Attachments:

PG13_32644.txttext/plain; charset=US-ASCII; name=PG13_32644.txtDownload
PG16_32644.txttext/plain; charset=US-ASCII; name=PG16_32644.txtDownload
#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: nikhil raj (#1)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On 8/26/24 14:49, nikhil raj wrote:

Hi All,

I've encountered a noticeable difference in execution time and query
execution plan row counts between PostgreSQL 13 and PostgreSQL 16 when
running a query on |information_schema| tables. Surprisingly, PostgreSQL
16 is performing slower than PostgreSQL 13.

Did you run ANALYZE on the Postgres 16 instance?

*4PostgreSQL 13.14 (PostgreSQL 13.14 on x86_64-pc-linux-gnu, compiled by
gcc 11.4.0, 64-bit)*
Execution plan: PG13.14 Execution Plan
<https://explain.dalibo.com/plan/ag1a62a9d47dg29d&gt;

*PostgreSQL 16.4 (PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by
gcc 11.4.0, 64-bit)*
Execution plan: PG16.4 Execution Plan
<https://explain.dalibo.com/plan/4c66fdfbf2hf9ed2&gt;

Use:

https://explain.depesz.com/

It is easier to follow it's output.

Has anyone else experienced similar behavior or could provide insights
into why PostgreSQL 16 might be slower for this query? Any advice or
suggestions for optimization would be greatly appreciated.

Yes when ANALYZE was not run on a new instance.

Thank you!

NOTE:-  PFA the raw file of explain and analyze below.

--
Adrian Klaver
adrian.klaver@aklaver.com

#3nikhil raj
nikhilraj474@gmail.com
In reply to: Adrian Klaver (#2)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

Hi Adrian,

Thanks for the quick response.

I've already performed a vacuum, reindex, and analyze on the entire
database, but the issue persists. As you can see from the execution plan,
the time difference in PostgreSQL 16 is still significantly higher, even
after all maintenance activities have been completed.
It seems there might be a bug in PostgreSQL 16 where the performance of
queries on *information_schema* tables is degraded. As both the tables are
postgres system tables

https://explain.depesz.com/s/bdO6b :-PG13
<https://explain.depesz.com/s/bdO6b&gt;
https://explain.depesz.com/s/bpAU :- PG16
<https://explain.depesz.com/s/bpAU&gt;

On Tue 27 Aug, 2024, 3:40 AM Adrian Klaver, <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 8/26/24 14:49, nikhil raj wrote:

Hi All,

I've encountered a noticeable difference in execution time and query
execution plan row counts between PostgreSQL 13 and PostgreSQL 16 when
running a query on |information_schema| tables. Surprisingly, PostgreSQL
16 is performing slower than PostgreSQL 13.

Did you run ANALYZE on the Postgres 16 instance?

*4PostgreSQL 13.14 (PostgreSQL 13.14 on x86_64-pc-linux-gnu, compiled by
gcc 11.4.0, 64-bit)*
Execution plan: PG13.14 Execution Plan
<https://explain.dalibo.com/plan/ag1a62a9d47dg29d&gt;

*PostgreSQL 16.4 (PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by
gcc 11.4.0, 64-bit)*
Execution plan: PG16.4 Execution Plan
<https://explain.dalibo.com/plan/4c66fdfbf2hf9ed2&gt;

Use:

https://explain.depesz.com/

It is easier to follow it's output.

Has anyone else experienced similar behavior or could provide insights
into why PostgreSQL 16 might be slower for this query? Any advice or
suggestions for optimization would be greatly appreciated.

Yes when ANALYZE was not run on a new instance.

Thank you!

NOTE:- PFA the raw file of explain and analyze below.

--
Adrian Klaver
adrian.klaver@aklaver.com

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: nikhil raj (#3)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On 8/26/24 15:41, nikhil raj wrote:

Hi Adrian,

Thanks for the quick response.

I've already performed a vacuum, reindex, and analyze on the entire
database, but the issue persists. As you can see from the execution
plan, the time difference in PostgreSQL 16 is still significantly
higher, even after all maintenance activities have been completed.

It seems there might be a bug in PostgreSQL 16 where the performance of
queries on *information_schema* tables is degraded. As both the tables
are postgres system tables

https://explain.depesz.com/s/bdO6b <https://explain.depesz.com/s/bdO6b&gt;
:-PG13 <https://explain.depesz.com/s/bdO6b&gt;

https://explain.depesz.com/s/bpAU <https://explain.depesz.com/s/bpAU&gt;
:- PG16 <https://explain.depesz.com/s/bpAU&gt;

What I see is Postgres 13:

Nested Loop (cost=9.54..119.02 rows=1 width=128) (actual
time=1.038..288.777 rows=1 loops=1)

Join Filter: (("*SELECT* 1".constraint_name)::name = "*SELECT*
1_1".conname)
Rows Removed by Join Filter: 935
Buffers: shared hit=34,675

vs Postgres 16

Nested Loop (cost=62.84..538.22 rows=1 width=128) (actual
time=1,905.153..14,006.921 rows=1 loops=1)

Join Filter: ("*SELECT* 1".conname = ("*SELECT*
1_1".constraint_name)::name)
Rows Removed by Join Filter: 997
Buffers: shared hit=5,153,054

So either switching this

("*SELECT* 1".constraint_name)::name = "*SELECT* 1_1".conname

to

"*SELECT* 1".conname = ("*SELECT* 1_1".constraint_name)::name

is more of a change then I would expect.

Or

Buffers: shared hit=34,675

vs

Buffers: shared hit=5,153,054

indicates a hardware/configuration difference.

Are both instances running on the same machine?

Is the configuration for both the same?

On Tue 27 Aug, 2024, 3:40 AM Adrian Klaver, <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

On 8/26/24 14:49, nikhil raj wrote:

Hi All,

I've encountered a noticeable difference in execution time and query
execution plan row counts between PostgreSQL 13 and PostgreSQL 16

when

running a query on |information_schema| tables. Surprisingly,

PostgreSQL

16 is performing slower than PostgreSQL 13.

Did you run ANALYZE on the Postgres 16 instance?

*4PostgreSQL 13.14 (PostgreSQL 13.14 on x86_64-pc-linux-gnu,

compiled by

gcc 11.4.0, 64-bit)*
Execution plan: PG13.14 Execution Plan
<https://explain.dalibo.com/plan/ag1a62a9d47dg29d

<https://explain.dalibo.com/plan/ag1a62a9d47dg29d&gt;&gt;

*PostgreSQL 16.4 (PostgreSQL 16.4 on x86_64-pc-linux-gnu,

compiled by

gcc 11.4.0, 64-bit)*
Execution plan: PG16.4 Execution Plan
<https://explain.dalibo.com/plan/4c66fdfbf2hf9ed2

<https://explain.dalibo.com/plan/4c66fdfbf2hf9ed2&gt;&gt;

Use:

https://explain.depesz.com/ <https://explain.depesz.com/&gt;

It is easier to follow it's output.

Has anyone else experienced similar behavior or could provide

insights

into why PostgreSQL 16 might be slower for this query? Any advice or
suggestions for optimization would be greatly appreciated.

Yes when ANALYZE was not run on a new instance.

Thank you!

NOTE:-  PFA the raw file of explain and analyze below.

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: nikhil raj (#1)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

nikhil raj <nikhilraj474@gmail.com> writes:

I've encountered a noticeable difference in execution time and query
execution plan row counts between PostgreSQL 13 and PostgreSQL 16 when
running a query on information_schema tables. Surprisingly, PostgreSQL 16
is performing slower than PostgreSQL 13.

Yeah, it looks like that condition on "table_name" is not getting
pushed down to the scan level anymore. I'm not sure why not,
but will look closer tomorrow.

regards, tom lane

#6David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#5)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On Tue, 27 Aug 2024 at 13:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, it looks like that condition on "table_name" is not getting
pushed down to the scan level anymore. I'm not sure why not,
but will look closer tomorrow.

I was looking for the offending commit as at first I thought it might
be related to Memoize. It does not seem to be.

I get the following up until 2489d76c, and from then on, it's a subquery filter.

-> Index Scan using pg_class_relname_nsp_index on pg_class r_2
(cost=0.27..8.30 rows=1 width=8) (actual time=0.004..0.004 rows=0
loops=1)
Index Cond: (relname = 't_c56ng1_repository'::name)
Filter: ((relkind = ANY ('{r,p}'::"char"[])) AND
pg_has_role(relowner, 'USAGE'::text))

So looks like it was the "Make Vars be outer-join-aware." commit that
changed this.

David

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#6)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

David Rowley <dgrowleyml@gmail.com> writes:

On Tue, 27 Aug 2024 at 13:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, it looks like that condition on "table_name" is not getting
pushed down to the scan level anymore. I'm not sure why not,
but will look closer tomorrow.

So looks like it was the "Make Vars be outer-join-aware." commit that
changed this.

Yeah, I got that same result by bisecting. It seems like it's
somehow related to the cast to information_schema.sql_identifier:
we are able to get rid of that normally but seem to fail to do so
in this query.

There was a smaller increase in the runtime at dfb75e478 "Add primary
keys and unique constraints to system catalogs", but that seems to
just be due to there being more rows in the relevant catalogs.
(That's from testing the query in an empty database; probably the
effect of dfb75e478 would be swamped in a production DB anyway.)

regards, tom lane

#8Justin Clift
justin@postgresql.org
In reply to: David Rowley (#6)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On 2024-08-27 11:50, David Rowley wrote:

On Tue, 27 Aug 2024 at 13:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, it looks like that condition on "table_name" is not getting
pushed down to the scan level anymore. I'm not sure why not,
but will look closer tomorrow.

I was looking for the offending commit as at first I thought it might
be related to Memoize. It does not seem to be.

As a general thought, seeing that this might be an actual problem
should some kind of automated testing be added that checks for
performance regressions like this?

Regards and best wishes,

Justin Clift

#9David Rowley
dgrowleyml@gmail.com
In reply to: Justin Clift (#8)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On Tue, 27 Aug 2024 at 18:00, Justin Clift <justin@postgresql.org> wrote:

As a general thought, seeing that this might be an actual problem
should some kind of automated testing be added that checks for
performance regressions like this?

We normally try to catch these sorts of things with regression tests.
Of course, that requires having a test that would catch a particular
problem, which we don't seem to have for this particular case. A
performance test would also require testing a particular scenario, so
I don't see why that's better. A regression test is better suited as
there's no middle ground between pass and fail.

David

#10David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#7)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On Tue, 27 Aug 2024 at 14:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, I got that same result by bisecting. It seems like it's
somehow related to the cast to information_schema.sql_identifier:
we are able to get rid of that normally but seem to fail to do so
in this query.

In case it saves you a bit of time, I stripped as much of the
unrelated stuff out as I could and got:

create table t (a name, b int);
explain select * from (select a::varchar,b from (select distinct a,b
from t) st) t right join t t2 on t.b=t2.b where t.a='test';

getting rid of the cast or swapping to INNER JOIN rather than RIGHT
JOIN means that qual_is_pushdown_safe() gets a Var rather than a
PlaceHolderVar.

David

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#10)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

[ switching to -hackers list ]

David Rowley <dgrowleyml@gmail.com> writes:

In case it saves you a bit of time, I stripped as much of the
unrelated stuff out as I could and got:

create table t (a name, b int);
explain select * from (select a::varchar,b from (select distinct a,b
from t) st) t right join t t2 on t.b=t2.b where t.a='test';

getting rid of the cast or swapping to INNER JOIN rather than RIGHT
JOIN means that qual_is_pushdown_safe() gets a Var rather than a
PlaceHolderVar.

Thanks. So it seems that what's happening is that we stick a
PlaceHolderVar on the intermediate subquery's output ("a::varchar"),
and then later when we realize that the RIGHT JOIN can be reduced to
an inner join we run around and remove the right join from the
PlaceHolderVar's nullingrels, leaving a useless PHV with no
nullingrels. remove_nulling_relids explains

* Note: it might seem desirable to remove the PHV altogether if
* phnullingrels goes to empty. Currently we dare not do that
* because we use PHVs in some cases to enforce separate identity
* of subexpressions; see wrap_non_vars usages in prepjointree.c.

However, then when we consider whether the upper WHERE condition
can be pushed down into the unflattened lower subquery,
qual_is_pushdown_safe punts:

* XXX Punt if we find any PlaceHolderVars in the restriction clause.
* It's not clear whether a PHV could safely be pushed down, and even
* less clear whether such a situation could arise in any cases of
* practical interest anyway. So for the moment, just refuse to push
* down.

We didn't see this particular behavior before 2489d76c49 because
pullup_replace_vars avoided inserting a PHV:

* If it contains a Var of the subquery being pulled up, and
* does not contain any non-strict constructs, then it's
* certainly nullable so we don't need to insert a
* PlaceHolderVar.

I dropped that case in 2489d76c49 because now we need to attach
nullingrels to the expression. You could imagine attaching the
nullingrels to the contained Var(s) instead of putting a PHV on top,
but that seems like a mess and I'm not quite sure it's semantically
the same. In any case it wouldn't fix adjacent cases where there is
a non-strict construct in the subquery output expression.

So it seems like we need to fix one or the other of these
implementation shortcuts to restore the previous behavior.
I'm wondering if it'd be okay for qual_is_pushdown_safe to accept
PHVs that have no nullingrels. I'm not really thrilled about trying
to back-patch any such fix though --- the odds of introducing new bugs
seem nontrivial, and the problem case seems rather narrow. If we
are willing to accept a HEAD-only fix, it'd likely be better to
attack the other end and make it possible to remove no-op PHVs.
I think that'd require marking PHVs that need to be kept because
they are serving to isolate subexpressions.

regards, tom lane

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#11)
1 attachment(s)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

I wrote:

We didn't see this particular behavior before 2489d76c49 because
pullup_replace_vars avoided inserting a PHV:
* If it contains a Var of the subquery being pulled up, and
* does not contain any non-strict constructs, then it's
* certainly nullable so we don't need to insert a
* PlaceHolderVar.
I dropped that case in 2489d76c49 because now we need to attach
nullingrels to the expression. You could imagine attaching the
nullingrels to the contained Var(s) instead of putting a PHV on top,
but that seems like a mess and I'm not quite sure it's semantically
the same. In any case it wouldn't fix adjacent cases where there is
a non-strict construct in the subquery output expression.

I realized that actually we do have the mechanism for making that
work: we could apply add_nulling_relids to the expression, if it
meets those same conditions. This is a kluge really, but it would
restore the status quo ante in a fairly localized fashion that
seems like it might be safe enough to back-patch into v16.

Here's a WIP patch that does it like that. One problem with it
is that it requires rcon->relids to be calculated in cases where
we didn't need that before, which is probably not *that* expensive
but it's annoying. If we go forward with this, I'm thinking about
changing add_nulling_relids' API contract to say "if target_relid
is NULL then all level-zero Vars/PHVs are modified", so that we
don't need that relid set in non-LATERAL cases.

The other problem with this is that it breaks one test case in
memoize.sql: a query that formerly generated a memoize plan
now does not use memoize. I am not sure why not --- does that
mean anything to you?

regards, tom lane

Attachments:

avoid-unnecessary-PHV-during-pullup-wip.patchtext/x-diff; charset=us-ascii; name=avoid-unnecessary-PHV-during-pullup-wip.patchDownload
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 969e257f70..3a12a52440 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -48,7 +48,7 @@ typedef struct pullup_replace_vars_context
 	List	   *targetlist;		/* tlist of subquery being pulled up */
 	RangeTblEntry *target_rte;	/* RTE of subquery */
 	Relids		relids;			/* relids within subquery, as numbered after
-								 * pullup (set only if target_rte->lateral) */
+								 * pullup */
 	bool	   *outer_hasSubLinks;	/* -> outer query's hasSubLinks */
 	int			varno;			/* varno of subquery */
 	bool		wrap_non_vars;	/* do we need all non-Var outputs to be PHVs? */
@@ -1163,11 +1163,8 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	rvcontext.root = root;
 	rvcontext.targetlist = subquery->targetList;
 	rvcontext.target_rte = rte;
-	if (rte->lateral)
-		rvcontext.relids = get_relids_in_jointree((Node *) subquery->jointree,
-												  true, true);
-	else						/* won't need relids */
-		rvcontext.relids = NULL;
+	rvcontext.relids = get_relids_in_jointree((Node *) subquery->jointree,
+											  true, true);
 	rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
 	rvcontext.varno = varno;
 	/* this flag will be set below, if needed */
@@ -1713,7 +1710,7 @@ pull_up_simple_values(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte)
 	rvcontext.root = root;
 	rvcontext.targetlist = tlist;
 	rvcontext.target_rte = rte;
-	rvcontext.relids = NULL;
+	rvcontext.relids = NULL;	/* XXX */
 	rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
 	rvcontext.varno = varno;
 	rvcontext.wrap_non_vars = false;
@@ -1877,7 +1874,7 @@ pull_up_constant_function(PlannerInfo *root, Node *jtnode,
 	 * lateral references, even if it's marked as LATERAL.  This means we
 	 * don't need to fill relids.
 	 */
-	rvcontext.relids = NULL;
+	rvcontext.relids = NULL;	/* XXX */
 
 	rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
 	rvcontext.varno = ((RangeTblRef *) jtnode)->rtindex;
@@ -2490,14 +2487,48 @@ pullup_replace_vars_callback(Var *var,
 				else
 					wrap = false;
 			}
+			else if (rcon->wrap_non_vars)
+			{
+				/* Caller told us to wrap all non-Vars in a PlaceHolderVar */
+				wrap = true;
+			}
 			else
 			{
 				/*
-				 * Must wrap, either because we need a place to insert
-				 * varnullingrels or because caller told us to wrap
-				 * everything.
+				 * If the node contains Var(s) or PlaceHolderVar(s) of the
+				 * subquery being pulled up, and does not contain any
+				 * non-strict constructs, then instead of adding a PHV on top
+				 * we can add the required nullingrels to those Vars/PHVs.
+				 * (This is fundamentally a generalization of the above cases
+				 * for bare Vars and PHVs.)
+				 *
+				 * This test is somewhat expensive, but it avoids pessimizing
+				 * the plan in cases where the nullingrels get removed again
+				 * later by outer join reduction.
+				 *
+				 * This analysis could be tighter: in particular, a non-strict
+				 * construct hidden within a lower-level PlaceHolderVar is not
+				 * reason to add another PHV.  But for now it doesn't seem
+				 * worth the code to be more exact.
+				 *
+				 * For a LATERAL subquery, we have to check the actual var
+				 * membership of the node, but if it's non-lateral then any
+				 * level-zero var must belong to the subquery.
 				 */
-				wrap = true;
+				if ((rcon->target_rte->lateral ?
+					 bms_overlap(pull_varnos(rcon->root, newnode),
+								 rcon->relids) :
+					 contain_vars_of_level(newnode, 0)) &&
+					!contain_nonstrict_functions(newnode))
+				{
+					/* No wrap needed */
+					wrap = false;
+				}
+				else
+				{
+					/* Else wrap it in a PlaceHolderVar */
+					wrap = true;
+				}
 			}
 
 			if (wrap)
@@ -2522,7 +2553,7 @@ pullup_replace_vars_callback(Var *var,
 	if (var->varlevelsup > 0)
 		IncrementVarSublevelsUp(newnode, var->varlevelsup, 0);
 
-	/* Propagate any varnullingrels into the replacement Var or PHV */
+	/* Propagate any varnullingrels into the replacement expression */
 	if (var->varnullingrels != NULL)
 	{
 		if (IsA(newnode, Var))
@@ -2542,7 +2573,15 @@ pullup_replace_vars_callback(Var *var,
 													var->varnullingrels);
 		}
 		else
-			elog(ERROR, "failed to wrap a non-Var");
+		{
+			/* There should be lower-level Vars/PHVs we can modify */
+			newnode = add_nulling_relids(newnode,
+										 rcon->relids,
+										 var->varnullingrels);
+			/* Assert we did put the varnullingrels into the expression */
+			Assert(bms_is_subset(var->varnullingrels,
+								 pull_varnos(rcon->root, newnode)));
+		}
 	}
 
 	return newnode;
#13David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#12)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On Wed, 28 Aug 2024 at 09:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The other problem with this is that it breaks one test case in
memoize.sql: a query that formerly generated a memoize plan
now does not use memoize. I am not sure why not --- does that
mean anything to you?

The reason it works in master is that get_memoize_path() calls
extract_lateral_vars_from_PHVs() and finds PlaceHolderVars to use as
the Memoize keys. With your patch PlannerInfo.placeholder_list is
empty.

The commit that made this work is 069d0ff02. Richard might be able to
explain better. I don't quite understand why RelOptInfo.lateral_vars
don't contain these in the first place.

David

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#13)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

David Rowley <dgrowleyml@gmail.com> writes:

On Wed, 28 Aug 2024 at 09:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The other problem with this is that it breaks one test case in
memoize.sql: a query that formerly generated a memoize plan
now does not use memoize. I am not sure why not --- does that
mean anything to you?

The reason it works in master is that get_memoize_path() calls
extract_lateral_vars_from_PHVs() and finds PlaceHolderVars to use as
the Memoize keys. With your patch PlannerInfo.placeholder_list is
empty.

That seems like a pretty fishy way to do it. Are you saying that
Memoize is never applicable if there aren't outer joins in the
query? Without OJs there probably won't be any PHVs.

regards, tom lane

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#14)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

I wrote:

That seems like a pretty fishy way to do it. Are you saying that
Memoize is never applicable if there aren't outer joins in the
query? Without OJs there probably won't be any PHVs.

Oh, scratch that, I see you mean this is an additional way to do it
not the only way to do it. But I'm confused why it works for
t1.two+1 AS c1
but not
t1.two+t2.two AS c1
Those ought to look pretty much the same for this purpose.

regards, tom lane

#16David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#15)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On Wed, 28 Aug 2024 at 11:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Oh, scratch that, I see you mean this is an additional way to do it
not the only way to do it. But I'm confused why it works for
t1.two+1 AS c1
but not
t1.two+t2.two AS c1
Those ought to look pretty much the same for this purpose.

The bms_overlap(pull_varnos(rcon->root, newnode), rcon->relids) test
is false with t1.two+1. Looks like there needs to be a Var from t2
for the bms_overlap to be true

David

#17Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#12)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On Wed, Aug 28, 2024 at 5:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I realized that actually we do have the mechanism for making that
work: we could apply add_nulling_relids to the expression, if it
meets those same conditions.

I think this should work, as long as we apply add_nulling_relids only
to Vars/PHVs that belong to the subquery in this case, because only
those Vars/PHVs would be nulled by the outer joins contained in the
nullingrels.

If we go forward with this, I'm thinking about
changing add_nulling_relids' API contract to say "if target_relid
is NULL then all level-zero Vars/PHVs are modified", so that we
don't need that relid set in non-LATERAL cases.

+1. In LATERAL case, we can always find the subquery's relids in
rcon->relids. In non-lateral case, any level-zero Vars/PHVs must
belong to the subquery - so if we change add_nulling_relids' API to be
so, we do not need to have rcon->relids set.

Thanks
Richard

#18Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#17)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On Wed, Aug 28, 2024 at 11:30 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Aug 28, 2024 at 5:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I realized that actually we do have the mechanism for making that
work: we could apply add_nulling_relids to the expression, if it
meets those same conditions.

I think this should work, as long as we apply add_nulling_relids only
to Vars/PHVs that belong to the subquery in this case, because only
those Vars/PHVs would be nulled by the outer joins contained in the
nullingrels.

To be more concrete, I know theoretically it is the whole expression
that is nullable by the outer joins, not its individual vars. But in
this case if the contained vars (that belong to the subquery) become
NULL, the whole expression would be NULL too, because it does not
contain any non-strict constructs. That's why I think this approach
should work.

Thanks
Richard

#19Justin Clift
justin@postgresql.org
In reply to: David Rowley (#9)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On 2024-08-27 20:14, David Rowley wrote:

On Tue, 27 Aug 2024 at 18:00, Justin Clift <justin@postgresql.org>
wrote:

As a general thought, seeing that this might be an actual problem
should some kind of automated testing be added that checks for
performance regressions like this?

We normally try to catch these sorts of things with regression tests.
Of course, that requires having a test that would catch a particular
problem, which we don't seem to have for this particular case. A
performance test would also require testing a particular scenario, so
I don't see why that's better. A regression test is better suited as
there's no middle ground between pass and fail.

Yeah, that's the kind of thing I was thinking.

Any idea who normally does those, and if it would be reasonable to add
test(s) for the internal information tables?

Regards and best wishes,

Justin Clift

#20Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#11)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On Wed, Aug 28, 2024 at 12:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If we
are willing to accept a HEAD-only fix, it'd likely be better to
attack the other end and make it possible to remove no-op PHVs.
I think that'd require marking PHVs that need to be kept because
they are serving to isolate subexpressions.

I think it's always desirable to remove no-op PHVs, even if we end up
with a different approach to fix the issue discussed here. Doing that
could potentially open up opportunities for optimization in other
cases. For example:

explain (costs off)
select * from t t1 left join
lateral (select t1.a as x, * from t t2) s on true
where t1.a = s.a;
QUERY PLAN
----------------------------
Nested Loop
-> Seq Scan on t t1
-> Seq Scan on t t2
Filter: (t1.a = a)
(4 rows)

The target entry s.x is wrapped in a PHV that contains lateral
reference to t1, which forces us to resort to nestloop join. However,
since the left join has been reduced to an inner join, and it is
removed from the PHV's nullingrels, leaving the nullingrels being
empty, we should be able to remove this PHV and use merge or hash
joins, depending on which is cheaper.

I think there may be more cases where no-op PHVs constrain
optimization opportunities.

In [1]/messages/by-id/CAMbWs4_2t2pqqCFdS3NYJLwMMkAzYQKBOhKweFt-wE3YOi7rGg@mail.gmail.com when working on the fix-grouping-sets patch, I included a
mechanism in 0003 to remove no-op PHVs by including a flag in
PlaceHolderVar to indicate whether it is safe to remove the PHV when
its phnullingrels becomes empty. In that patch this flag is only set
in cases where the PHV is used to carry the nullingrel bit that
represents the grouping step. Maybe we can extend its use to remove
all no-op PHVs, except those that are serving to isolate
subexpressions.

Any thoughts on this?

[1]: /messages/by-id/CAMbWs4_2t2pqqCFdS3NYJLwMMkAzYQKBOhKweFt-wE3YOi7rGg@mail.gmail.com

Thanks
Richard

#21Richard Guo
guofenglinux@gmail.com
In reply to: David Rowley (#16)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On Wed, Aug 28, 2024 at 7:58 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 28 Aug 2024 at 11:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Oh, scratch that, I see you mean this is an additional way to do it
not the only way to do it. But I'm confused why it works for
t1.two+1 AS c1
but not
t1.two+t2.two AS c1
Those ought to look pretty much the same for this purpose.

The bms_overlap(pull_varnos(rcon->root, newnode), rcon->relids) test
is false with t1.two+1. Looks like there needs to be a Var from t2
for the bms_overlap to be true

Exactly. What Tom's patch does is that if the expression contains
Vars/PHVs that belong to the subquery, and does not contain any
non-strict constructs, then it can escape being wrapped.

In expression 't1.two+t2.two', 't2.two' is a Var that belongs to the
subquery, and '+' is strict, so it can escape being wrapped.

The expression 't1.two+1' does not meet these conditions, so it is
wrapped into a PHV, and the PHV contains lateral reference to t1,
which results in a nestloop join with a parameterized inner path.
That's why Memoize can work in this query.

Thanks
Richard

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#21)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

Richard Guo <guofenglinux@gmail.com> writes:

Exactly. What Tom's patch does is that if the expression contains
Vars/PHVs that belong to the subquery, and does not contain any
non-strict constructs, then it can escape being wrapped.

In expression 't1.two+t2.two', 't2.two' is a Var that belongs to the
subquery, and '+' is strict, so it can escape being wrapped.

The expression 't1.two+1' does not meet these conditions, so it is
wrapped into a PHV, and the PHV contains lateral reference to t1,
which results in a nestloop join with a parameterized inner path.
That's why Memoize can work in this query.

Yeah. (I'd missed that t1.two is a lateral reference and t2.two is
not; sorry for the noise.)

What happens as of HEAD is that, because we wrap this subquery output
in a PHV marked as due to be evaluated at t2, the entire clause

(t1.two+t2.two) = t2.unique1

becomes a base restriction clause for t2, so that when we generate
a path for t2 it will include this as a path qual (forcing the path
to be laterally dependent on t1). Without the PHV, it's just an
ordinary join clause and it will not be evaluated at scan level
unless it can be turned into an indexqual --- which it can't.

The preceding regression-test case with "t1.two+1 = t2.unique1"
can be made into a parameterized indexscan on t2.unique1, so it is,
and then memoize can trigger off that.

I'm inclined to think that treating such a clause as a join clause
is strictly better than what happens now, so I'm not going to
apologize for the PHV not being there. If you wanted to cast
blame, you could look to set_plain_rel_pathlist, where it says

* We don't support pushing join clauses into the quals of a seqscan, but
* it could still have required parameterization due to LATERAL refs in
* its tlist.

(This comment could stand some work, as it fails to note that
labeling the path with required parameterization can result in
"join clauses" being evaluated there anyway.)

In the normal course of things I'd be dubious about the value of
pushing join clauses into a seqscan, but maybe the possibility of a
memoize'd join has moved the goalposts enough that we should
consider that. Alternatively, maybe get_memoized_path should take
more responsibility for devising plausible subpaths rather than
assuming they'll be handed to it on a platter. (I don't remember
all the conditions checked in add_path, but I wonder if we are
missing some potential memoize applications because suitable paths
fail to survive the scan rel's add_path tournament.)

In the meantime, I think this test case is mighty artificial,
and it wouldn't bother me any to just take it out again for the
time being.

regards, tom lane

#23Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#22)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On Thu, Aug 29, 2024 at 4:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In the meantime, I think this test case is mighty artificial,
and it wouldn't bother me any to just take it out again for the
time being.

Yeah, I think we can remove the 't1.two+t2.two' test case if we go
with your proposed patch to address this performance regression.

Thanks
Richard

#24David Rowley
dgrowleyml@gmail.com
In reply to: Justin Clift (#19)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

On Wed, 28 Aug 2024 at 18:59, Justin Clift <justin@postgresql.org> wrote:

Any idea who normally does those, and if it would be reasonable to add
test(s) for the internal information tables?

These tend to get added along with features and along with of bug
fixes. I imagine any tests for the information_schema views would be
for the results of the views rather than for the expected plans.
However, that seems very separate from this as the bug has nothing to
do with information_schema. It just happens to be a query to an
information_schema view that helped highlight the bug. Those views
are often quite complex and so are the resulting plans. With tests
checking the expected EXPLAIN output, it's much better to give these a
very narrow focus otherwise the expected output could be too unstable
and the purpose of the test harder to determine for anyone working on
a new patch which results in a plan change of a preexisting test.
I've seen tests before rendered useless by people blindly accepting
the plan change without properly determining what the test is supposed
to be testing. That's much more likely to happen when the purpose of
the test is less clear due to some unwieldy and complex expected plan.
I managed to get a reproducer for this down to something quite simple.
Probably that or something similar would be a better test to make sure
this bug stays gone.

David

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#23)
2 attachment(s)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

Richard Guo <guofenglinux@gmail.com> writes:

On Thu, Aug 29, 2024 at 4:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In the meantime, I think this test case is mighty artificial,
and it wouldn't bother me any to just take it out again for the
time being.

Yeah, I think we can remove the 't1.two+t2.two' test case if we go
with your proposed patch to address this performance regression.

Here's a polished-up patchset for that. I made the memoize test
removal a separate patch because (a) it only applies to master
and (b) it seems worth calling out as something we might be able
to revert later.

I found one bug in the draft patch: add_nulling_relids only processes
Vars of level zero, so we have to apply it before not after adjusting
the Vars' levelsup. An alternative could be to add a levelsup
parameter to add_nulling_relids, but that seemed like unnecessary
complication.

regards, tom lane

Attachments:

v1-0001-Remove-one-memoize-test-case-added-by-commit-069d.patchtext/x-diff; charset=us-ascii; name*0=v1-0001-Remove-one-memoize-test-case-added-by-commit-069d.p; name*1=atchDownload
From 8fafdc4852b8f2164286e6863219eb6b4d267639 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 29 Aug 2024 16:25:23 -0400
Subject: [PATCH v1 1/2] Remove one memoize test case added by commit
 069d0ff02.

This test case turns out to depend on the assumption that a non-Var
subquery output that's underneath an outer join will always get
wrapped in a PlaceHolderVar.  But that behavior causes performance
regressions in some cases compared to what happened before v16.
The next commit will avoid inserting a PHV in the same cases where
pre-v16 did, and that causes get_memoized_path to not detect that
a memoize plan could be used.

Commit this separately, in hopes that we can restore the test after
making get_memoized_path smarter.  (It's failing to find memoize
plans in adjacent cases where no PHV was ever inserted, so there
is definitely room for improvement there.)

Discussion: https://postgr.es/m/CAG1ps1xvnTZceKK24OUfMKLPvDP2vjT-d+F2AOCWbw_v3KeEgg@mail.gmail.com
---
 src/test/regress/expected/memoize.out | 30 ---------------------------
 src/test/regress/sql/memoize.sql      | 11 ----------
 2 files changed, 41 deletions(-)

diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index df2ca5ba4e..9ee09fe2f5 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -160,36 +160,6 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
   1000 | 9.5000000000000000
 (1 row)
 
--- Try with LATERAL references within PlaceHolderVars
-SELECT explain_memoize('
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
-                                   explain_memoize                                    
---------------------------------------------------------------------------------------
- Aggregate (actual rows=1 loops=N)
-   ->  Nested Loop (actual rows=1000 loops=N)
-         ->  Seq Scan on tenk1 t1 (actual rows=1000 loops=N)
-               Filter: (unique1 < 1000)
-               Rows Removed by Filter: 9000
-         ->  Memoize (actual rows=1 loops=N)
-               Cache Key: t1.two
-               Cache Mode: binary
-               Hits: 998  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
-               ->  Seq Scan on tenk1 t2 (actual rows=1 loops=N)
-                     Filter: ((t1.two + two) = unique1)
-                     Rows Removed by Filter: 9999
-(12 rows)
-
--- And check we get the expected results.
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
- count |        avg         
--------+--------------------
-  1000 | 9.0000000000000000
-(1 row)
-
 -- Ensure we do not omit the cache keys from PlaceHolderVars
 SELECT explain_memoize('
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
diff --git a/src/test/regress/sql/memoize.sql b/src/test/regress/sql/memoize.sql
index 059bec5f4f..2eaeb1477a 100644
--- a/src/test/regress/sql/memoize.sql
+++ b/src/test/regress/sql/memoize.sql
@@ -85,17 +85,6 @@ SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
 LATERAL (SELECT t1.two+1 AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
 WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
 
--- Try with LATERAL references within PlaceHolderVars
-SELECT explain_memoize('
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
-
--- And check we get the expected results.
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
-
 -- Ensure we do not omit the cache keys from PlaceHolderVars
 SELECT explain_memoize('
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-- 
2.43.5

v1-0002-Avoid-inserting-PlaceHolderVars-in-cases-where-pr.patchtext/x-diff; charset=us-ascii; name*0=v1-0002-Avoid-inserting-PlaceHolderVars-in-cases-where-pr.p; name*1=atchDownload
From 7b48394838797489e7ab869f97ca06449fdbcee3 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 29 Aug 2024 16:43:35 -0400
Subject: [PATCH v1 2/2] Avoid inserting PlaceHolderVars in cases where pre-v16
 PG did not.

Commit 2489d76c4 removed some logic from pullup_replace_vars()
that avoided wrapping a PlaceHolderVar around a pulled-up
subquery output expression if the expression could be proven
to go to NULL anyway (because it contained Vars or PHVs of the
pulled-up relation and did not contain non-strict constructs).
But removing that logic turns out to cause performance regressions
in some cases, because the extra PHV prevents subexpression folding,
and will do so even if outer-join reduction later turns it into a
no-op with no phnullingrels bits.

The reason for always adding a PHV was to ensure we had someplace
to put the varnullingrels marker bits of the Var being replaced.
However, it turns out we can optimize in exactly the same cases that
the previous code did, because we can attach the needed varnullingrels
bits to the contained Var(s)/PHV(s).

This is not a complete solution --- it would be even better if we
could remove PHVs after reducing them to no-ops.  It doesn't look
practical to back-patch such an improvement, but this change seems
safe and at least gets rid of the performance-regression cases.

Per complaint from Nikhil Raj.  Back-patch to v16 where the
problem appeared.

Discussion: https://postgr.es/m/CAG1ps1xvnTZceKK24OUfMKLPvDP2vjT-d+F2AOCWbw_v3KeEgg@mail.gmail.com
---
 src/backend/optimizer/prep/prepjointree.c | 66 ++++++++++++++++++-----
 src/backend/rewrite/rewriteManip.c        |  9 ++--
 src/test/regress/expected/subselect.out   | 29 ++++++++++
 src/test/regress/sql/subselect.sql        | 18 +++++++
 4 files changed, 107 insertions(+), 15 deletions(-)

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 969e257f70..34fbf8ee23 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -2490,14 +2490,48 @@ pullup_replace_vars_callback(Var *var,
 				else
 					wrap = false;
 			}
+			else if (rcon->wrap_non_vars)
+			{
+				/* Caller told us to wrap all non-Vars in a PlaceHolderVar */
+				wrap = true;
+			}
 			else
 			{
 				/*
-				 * Must wrap, either because we need a place to insert
-				 * varnullingrels or because caller told us to wrap
-				 * everything.
+				 * If the node contains Var(s) or PlaceHolderVar(s) of the
+				 * subquery being pulled up, and does not contain any
+				 * non-strict constructs, then instead of adding a PHV on top
+				 * we can add the required nullingrels to those Vars/PHVs.
+				 * (This is fundamentally a generalization of the above cases
+				 * for bare Vars and PHVs.)
+				 *
+				 * This test is somewhat expensive, but it avoids pessimizing
+				 * the plan in cases where the nullingrels get removed again
+				 * later by outer join reduction.
+				 *
+				 * This analysis could be tighter: in particular, a non-strict
+				 * construct hidden within a lower-level PlaceHolderVar is not
+				 * reason to add another PHV.  But for now it doesn't seem
+				 * worth the code to be more exact.
+				 *
+				 * For a LATERAL subquery, we have to check the actual var
+				 * membership of the node, but if it's non-lateral then any
+				 * level-zero var must belong to the subquery.
 				 */
-				wrap = true;
+				if ((rcon->target_rte->lateral ?
+					 bms_overlap(pull_varnos(rcon->root, newnode),
+								 rcon->relids) :
+					 contain_vars_of_level(newnode, 0)) &&
+					!contain_nonstrict_functions(newnode))
+				{
+					/* No wrap needed */
+					wrap = false;
+				}
+				else
+				{
+					/* Else wrap it in a PlaceHolderVar */
+					wrap = true;
+				}
 			}
 
 			if (wrap)
@@ -2518,18 +2552,14 @@ pullup_replace_vars_callback(Var *var,
 		}
 	}
 
-	/* Must adjust varlevelsup if replaced Var is within a subquery */
-	if (var->varlevelsup > 0)
-		IncrementVarSublevelsUp(newnode, var->varlevelsup, 0);
-
-	/* Propagate any varnullingrels into the replacement Var or PHV */
+	/* Propagate any varnullingrels into the replacement expression */
 	if (var->varnullingrels != NULL)
 	{
 		if (IsA(newnode, Var))
 		{
 			Var		   *newvar = (Var *) newnode;
 
-			Assert(newvar->varlevelsup == var->varlevelsup);
+			Assert(newvar->varlevelsup == 0);
 			newvar->varnullingrels = bms_add_members(newvar->varnullingrels,
 													 var->varnullingrels);
 		}
@@ -2537,14 +2567,26 @@ pullup_replace_vars_callback(Var *var,
 		{
 			PlaceHolderVar *newphv = (PlaceHolderVar *) newnode;
 
-			Assert(newphv->phlevelsup == var->varlevelsup);
+			Assert(newphv->phlevelsup == 0);
 			newphv->phnullingrels = bms_add_members(newphv->phnullingrels,
 													var->varnullingrels);
 		}
 		else
-			elog(ERROR, "failed to wrap a non-Var");
+		{
+			/* There should be lower-level Vars/PHVs we can modify */
+			newnode = add_nulling_relids(newnode,
+										 NULL,	/* modify all Vars/PHVs */
+										 var->varnullingrels);
+			/* Assert we did put the varnullingrels into the expression */
+			Assert(bms_is_subset(var->varnullingrels,
+								 pull_varnos(rcon->root, newnode)));
+		}
 	}
 
+	/* Must adjust varlevelsup if replaced Var is within a subquery */
+	if (var->varlevelsup > 0)
+		IncrementVarSublevelsUp(newnode, var->varlevelsup, 0);
+
 	return newnode;
 }
 
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 191f2dc0b1..b20625fbd2 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -1141,7 +1141,8 @@ AddInvertedQual(Query *parsetree, Node *qual)
 /*
  * add_nulling_relids() finds Vars and PlaceHolderVars that belong to any
  * of the target_relids, and adds added_relids to their varnullingrels
- * and phnullingrels fields.
+ * and phnullingrels fields.  If target_relids is NULL, all level-zero
+ * Vars and PHVs are modified.
  */
 Node *
 add_nulling_relids(Node *node,
@@ -1170,7 +1171,8 @@ add_nulling_relids_mutator(Node *node,
 		Var		   *var = (Var *) node;
 
 		if (var->varlevelsup == context->sublevels_up &&
-			bms_is_member(var->varno, context->target_relids))
+			(context->target_relids == NULL ||
+			 bms_is_member(var->varno, context->target_relids)))
 		{
 			Relids		newnullingrels = bms_union(var->varnullingrels,
 												   context->added_relids);
@@ -1188,7 +1190,8 @@ add_nulling_relids_mutator(Node *node,
 		PlaceHolderVar *phv = (PlaceHolderVar *) node;
 
 		if (phv->phlevelsup == context->sublevels_up &&
-			bms_overlap(phv->phrels, context->target_relids))
+			(context->target_relids == NULL ||
+			 bms_overlap(phv->phrels, context->target_relids)))
 		{
 			Relids		newnullingrels = bms_union(phv->phnullingrels,
 												   context->added_relids);
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 9eecdc1e92..2d35de3fad 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1721,6 +1721,35 @@ fetch backward all in c1;
 (2 rows)
 
 commit;
+--
+-- Verify that we correctly flatten cases involving a subquery output
+-- expression that doesn't need to be wrapped in a PlaceHolderVar
+--
+explain (costs off)
+select tname, attname from (
+select relname::information_schema.sql_identifier as tname, * from
+  (select * from pg_class c) ss1) ss2
+  right join pg_attribute a on a.attrelid = ss2.oid
+where tname = 'tenk1' and attnum = 1;
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
+ Nested Loop
+   ->  Index Scan using pg_class_relname_nsp_index on pg_class c
+         Index Cond: (relname = 'tenk1'::name)
+   ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a
+         Index Cond: ((attrelid = c.oid) AND (attnum = 1))
+(5 rows)
+
+select tname, attname from (
+select relname::information_schema.sql_identifier as tname, * from
+  (select * from pg_class c) ss1) ss2
+  right join pg_attribute a on a.attrelid = ss2.oid
+where tname = 'tenk1' and attnum = 1;
+ tname | attname 
+-------+---------
+ tenk1 | unique1
+(1 row)
+
 --
 -- Tests for CTE inlining behavior
 --
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 75a9b718b2..af6e157aca 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -890,6 +890,24 @@ fetch backward all in c1;
 
 commit;
 
+--
+-- Verify that we correctly flatten cases involving a subquery output
+-- expression that doesn't need to be wrapped in a PlaceHolderVar
+--
+
+explain (costs off)
+select tname, attname from (
+select relname::information_schema.sql_identifier as tname, * from
+  (select * from pg_class c) ss1) ss2
+  right join pg_attribute a on a.attrelid = ss2.oid
+where tname = 'tenk1' and attnum = 1;
+
+select tname, attname from (
+select relname::information_schema.sql_identifier as tname, * from
+  (select * from pg_class c) ss1) ss2
+  right join pg_attribute a on a.attrelid = ss2.oid
+where tname = 'tenk1' and attnum = 1;
+
 --
 -- Tests for CTE inlining behavior
 --
-- 
2.43.5

#26nikhil raj
nikhilraj474@gmail.com
In reply to: Tom Lane (#25)
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

Hi All,

I hope you're doing well.

I'm writing to kindly requesting if there is a bug tracker ID or any
reference number associated with this issue, I would appreciate it if you
could share it with me.

Thank you for your time and assistance. Please let me know if there's any
additional information you need from me.

Best regards,

Nikhil

On Fri, 30 Aug, 2024, 2:23 am Tom Lane, <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Richard Guo <guofenglinux@gmail.com> writes:

On Thu, Aug 29, 2024 at 4:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In the meantime, I think this test case is mighty artificial,
and it wouldn't bother me any to just take it out again for the
time being.

Yeah, I think we can remove the 't1.two+t2.two' test case if we go
with your proposed patch to address this performance regression.

Here's a polished-up patchset for that. I made the memoize test
removal a separate patch because (a) it only applies to master
and (b) it seems worth calling out as something we might be able
to revert later.

I found one bug in the draft patch: add_nulling_relids only processes
Vars of level zero, so we have to apply it before not after adjusting
the Vars' levelsup. An alternative could be to add a levelsup
parameter to add_nulling_relids, but that seemed like unnecessary
complication.

regards, tom lane