path->param_info only set for lateral?
Hello,
Over in "Parallelize correlated subqueries that execute within each
worker" [1} Richard Guo found a bug in the current version of my patch
in that thread. While debugging that issue I've been wondering why
Path's param_info field seems to be NULL unless there is a LATERAL
reference even though there may be non-lateral outer params
referenced.
Consider the query:
select * from pg_description t1 where objoid in
(select objoid from pg_description t2 where t2.description =
t1.description);
The subquery's rel has a baserestrictinfo containing an OpExpr
comparing a Var (t2.description) to a Param of type PARAM_EXEC
(t1.description). But the generated SeqScan path doesn't have its
param_info field set, which means PATH_REQ_OUTER returns NULL also
despite there being an obvious param referencing a required outer
relid. Looking at create_seqscan_path we see that param_info is
initialized with:
get_baserel_parampathinfo(root, rel, required_outer)
where required_outer is passed in from set_plain_rel_pathlist as
rel->lateral_relids. And get_baserel_parampathinfo always returns NULL
if required_outer is empty, so obviously with this query (no lateral
reference) we're not going to get any ParamPathInfo added to the path
or the rel.
Is there a reason why we don't track the required relids providing the
PARAM_EXEC params in this case?
Thanks,
James Coleman
1: /messages/by-id/CAMbWs4_evjcMzN8Gw78bHfhfo2FKJThqhEjRJRmoMZx=NXcJ7w@mail.gmail.com
James Coleman <jtc331@gmail.com> writes:
Over in "Parallelize correlated subqueries that execute within each
worker" [1} Richard Guo found a bug in the current version of my patch
in that thread. While debugging that issue I've been wondering why
Path's param_info field seems to be NULL unless there is a LATERAL
reference even though there may be non-lateral outer params
referenced.
Per pathnodes.h:
* "param_info", if not NULL, links to a ParamPathInfo that identifies outer
* relation(s) that provide parameter values to each scan of this path.
* That means this path can only be joined to those rels by means of nestloop
* joins with this path on the inside. ...
We're only interested in this for params that are coming from other
relations of the same query level, so that they affect join order and
join algorithm choices. Params coming down from outer query levels
are much like EXTERN params to the planner: they are pseudoconstants
for any one execution of the current query level.
This isn't just LATERAL stuff; it's also intentionally-generated
nestloop-with-inner-indexscan-cases. But it's not outer-level Params.
Even though those are also PARAM_EXEC Params, they are fundamentally
different animals for the planner's purposes.
regards, tom lane
On Sun, Jun 18, 2023 at 10:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Coleman <jtc331@gmail.com> writes:
Over in "Parallelize correlated subqueries that execute within each
worker" [1} Richard Guo found a bug in the current version of my patch
in that thread. While debugging that issue I've been wondering why
Path's param_info field seems to be NULL unless there is a LATERAL
reference even though there may be non-lateral outer params
referenced.Per pathnodes.h:
* "param_info", if not NULL, links to a ParamPathInfo that identifies outer
* relation(s) that provide parameter values to each scan of this path.
* That means this path can only be joined to those rels by means of nestloop
* joins with this path on the inside. ...We're only interested in this for params that are coming from other
relations of the same query level, so that they affect join order and
join algorithm choices. Params coming down from outer query levels
are much like EXTERN params to the planner: they are pseudoconstants
for any one execution of the current query level.This isn't just LATERAL stuff; it's also intentionally-generated
nestloop-with-inner-indexscan-cases. But it's not outer-level Params.
Even though those are also PARAM_EXEC Params, they are fundamentally
different animals for the planner's purposes.
Thanks for the explanation.
I wonder if it'd be worth clarifying the comment slightly to hint in
that direction (like the attached)?
Thanks,
James Coleman
Attachments:
v1-0001-Clarify-param_info-query-level.patchapplication/octet-stream; name=v1-0001-Clarify-param_info-query-level.patchDownload
From cee5df2966cefb7071c30178979903d345d80ca7 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Tue, 20 Jun 2023 20:43:15 -0400
Subject: [PATCH v1] Clarify param_info query level
---
src/include/nodes/pathnodes.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c17b53f7ad..5dbacbdf7b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1571,7 +1571,8 @@ typedef struct ParamPathInfo
* That means this path can only be joined to those rels by means of nestloop
* joins with this path on the inside. Also note that a parameterized path
* is responsible for testing all "movable" joinclauses involving this rel
- * and the specified outer rel(s).
+ * and the specified outer rel(s). Params from higher query levels are not
+ * included here.
*
* "rows" is the same as parent->rows in simple paths, but in parameterized
* paths and UniquePaths it can be less than parent->rows, reflecting the
--
2.39.2 (Apple Git-143)