Consider parallel for lateral subqueries with limit
I've been investigating parallelizing certain correlated subqueries,
and during that work stumbled across the fact that
set_rel_consider_parallel disallows parallel query on what seems like
a fairly simple case.
Consider this query:
select t.unique1
from tenk1 t
join lateral (select t.unique1 from tenk1 offset 0) l on true;
Currently, set_rel_consider_parallel sets consider_parallel=false on the
subquery rel because it has a limit/offset. That restriction makes a
lot of sense when we have a subquery whose results conceptually need
to be "shared" (or at least be the same) across multiple workers
(indeed the relevant comment in that function notes that cases where
we could prove a unique ordering would also qualify, but punts on
implementing that due to complexity). But if the subquery is LATERAL,
then no such conceptual restriction applies.
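The distinction can be illustrated with a toy model (Python; the "worker" partitioning here is a hypothetical simulation, not PostgreSQL's executor): a non-lateral subquery produces one result set that every outer row must see identically, while a lateral subquery is conceptually re-evaluated per outer row, so partitioning the outer rows across workers cannot change the join result.

```python
# Toy model of the distinction (not PostgreSQL internals): a hypothetical
# simulation of outer-row partitioning across workers.

def non_lateral_join(outer_rows, subquery):
    # Non-lateral: the subquery runs once, and every worker's outer rows
    # must join against that same shared result set.
    sub_result = subquery()
    return [(o, s) for o in outer_rows for s in sub_result]

def lateral_join(outer_rows, subquery):
    # Lateral: conceptually re-evaluated for each outer row.
    return [(o, s) for o in outer_rows for s in subquery(o)]

def parallel_lateral_join(outer_rows, subquery, n_workers=2):
    # Partition outer rows across "workers"; each row's subquery
    # invocation is self-contained, so the partitioning is invisible.
    chunks = [outer_rows[i::n_workers] for i in range(n_workers)]
    out = []
    for chunk in chunks:
        out.extend(lateral_join(chunk, subquery))
    return sorted(out)

rows = [1, 2, 3, 4]
assert non_lateral_join(rows, lambda: [99]) == [(1, 99), (2, 99), (3, 99), (4, 99)]
correlated = lambda o: [o * 10]
assert parallel_lateral_join(rows, correlated) == sorted(lateral_join(rows, correlated))
```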
If we change the code slightly to allow considering parallel query
even in the face of LIMIT/OFFSET for LATERAL subqueries, then our
query above changes from the following plan:
Nested Loop
Output: t.unique1
-> Gather
Output: t.unique1
Workers Planned: 2
-> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 t
Output: t.unique1
-> Gather
Output: NULL::integer
Workers Planned: 2
-> Parallel Index Only Scan using tenk1_hundred on public.tenk1
Output: NULL::integer
to this plan:
Gather
Output: t.unique1
Workers Planned: 2
-> Nested Loop
Output: t.unique1
-> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 t
Output: t.unique1
-> Index Only Scan using tenk1_hundred on public.tenk1
Output: NULL::integer
The code change itself is quite simple (1 line). As far as I can tell
we don't need to expressly check parallel safety of the limit/offset
expressions; that appears to happen elsewhere (and that makes sense
since the RTE_RELATION case doesn't check those clauses either).
If I'm missing something about the safety of this (or any other
issue), I'd appreciate the feedback.
James
Attachments:
v1-0001-Allow-parallel-LATERAL-subqueries-with-LIMIT-OFFS.patch (text/x-patch; charset=US-ASCII)
From 0aff5f1b5e35e37a311c01e9f53caf6e088e8d43 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Mon, 30 Nov 2020 11:36:35 -0500
Subject: [PATCH v1] Allow parallel LATERAL subqueries with LIMIT/OFFSET
The code that determined whether or not a rel should be considered for
parallel query excluded subqueries with LIMIT/OFFSET. That's correct in
the general case: as the comment notes that'd mean we have to guarantee
ordering (and claims it's not worth checking that) for results to be
consistent across workers. However there's a simpler case that hasn't
been considered: LATERAL subqueries with LIMIT/OFFSET don't fall under
the same reasoning since they're executed (when not converted to a JOIN)
per tuple anyway, so consistency of results across workers isn't a
factor.
---
src/backend/optimizer/path/allpaths.c | 4 +++-
src/test/regress/expected/select_parallel.out | 15 +++++++++++++++
src/test/regress/sql/select_parallel.sql | 6 ++++++
3 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 84a69b064a..3c9313b5a9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -686,11 +686,13 @@ set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
* inconsistent results at the top-level. (In some cases, where
* the result is ordered, we could relax this restriction. But it
* doesn't currently seem worth expending extra effort to do so.)
+ * LATERAL is an exception: LIMIT/OFFSET is safe to execute within
+ * workers since the sub-select is executed per tuple
*/
{
Query *subquery = castNode(Query, rte->subquery);
- if (limit_needed(subquery))
+ if (!rte->lateral && limit_needed(subquery))
return;
}
break;
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 9b0c418db7..9ba40ca2c5 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1042,6 +1042,21 @@ explain (costs off)
Filter: (stringu1 ~~ '%AAAA'::text)
(11 rows)
+-- ...unless it's LATERAL
+savepoint settings;
+set parallel_tuple_cost=0;
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (select t.unique1 from tenk1 offset 0) l on true;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Gather
+ Workers Planned: 4
+ -> Nested Loop
+ -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t
+ -> Index Only Scan using tenk1_hundred on tenk1
+(5 rows)
+
+rollback to savepoint settings;
-- to increase the parallel query test coverage
SAVEPOINT settings;
SET LOCAL force_parallel_mode = 1;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 5a01a98b26..5c14b78457 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -392,6 +392,12 @@ explain (costs off, verbose)
explain (costs off)
select * from tenk1 a where two in
(select two from tenk1 b where stringu1 like '%AAAA' limit 3);
+-- ...unless it's LATERAL
+savepoint settings;
+set parallel_tuple_cost=0;
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (select t.unique1 from tenk1 offset 0) l on true;
+rollback to savepoint settings;
-- to increase the parallel query test coverage
SAVEPOINT settings;
--
2.17.1
Note that near the end of grouping planner we have a similar check:
if (final_rel->consider_parallel && root->query_level > 1 &&
!limit_needed(parse))
guarding copying the partial paths from the current rel to the final
rel. I haven't managed to come up with a test case that exposes that
though since simple examples like the one above get converted into a
JOIN, so we're not in grouping_planner for a subquery. Making the
subquery above correlated results in us getting to that point, but
isn't currently marked as parallel safe for other reasons (because it
has params), so that's not a useful test. I'm not sure if there are
cases where we can't convert to a join but also don't involve params;
haven't thought about it a lot though.
James
Brian's email didn't keep the relevant headers, and so didn't show up
as a reply, so I've pasted it here and am replying in this original
thread. You can find the original at [1].
On Sun, Dec 6, 2020 at 7:34 PM Brian Davis <brian@brianlikespostgres.com> wrote:
Note that near the end of grouping planner we have a similar check:
if (final_rel->consider_parallel && root->query_level > 1 &&
    !limit_needed(parse))
guarding copying the partial paths from the current rel to the final
rel. I haven't managed to come up with a test case that exposes that
Played around with this a bit; here's a non-correlated subquery that gets us to that if statement:
DROP TABLE IF EXISTS foo;
CREATE TABLE foo (bar int);

INSERT INTO foo (bar)
SELECT
g
FROM
generate_series(1, 10000) AS g;

SELECT
(
SELECT
bar
FROM
foo
LIMIT 1
) AS y
FROM
foo;
Thanks. That triggers the limit condition in that if test -- did any
additional GUCs need to be modified to get that result? I assume a
parallel plan isn't chosen regardless (even if you remove the
limit_needed check)?
I also was thinking about the LATERAL part.
I couldn't think of any reason why the uncorrelated subquery's results would need to be shared and therefore the same, when we'll be "looping" over each row of the source table, running the subquery anew for each, conceptually.
But then I tried this...
test=# CREATE TABLE foo (bar int);
CREATE TABLE
test=#
test=# INSERT INTO foo (bar)
test-# SELECT
test-# g
test-# FROM
test-# generate_series(1, 10) AS g;
INSERT 0 10
test=#
test=#
test=# SELECT
test-# foo.bar,
test-# lat.bar
test-# FROM
test-# foo JOIN LATERAL (
test(# SELECT
test(# bar
test(# FROM
test(# foo AS foo2
test(# ORDER BY
test(# random()
test(# LIMIT 1
test(# ) AS lat ON true;
bar | bar
-----+-----
1 | 7
2 | 7
3 | 7
4 | 7
5 | 7
6 | 7
7 | 7
8 | 7
9 | 7
10 | 7
(10 rows)

As you can see, random() is only called once. If postgres were supposed to be running the subquery for each source row, conceptually, it would be a mistake to cache the results of a volatile function like random().
This was genuinely surprising to me. I think one could argue that this
is just an optimization (after all -- if there is no correlation, then
running it once is conceptually/safely the same as running it multiple
times), but that doesn't seem to hold water with the volatile function
in play.
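A toy sketch of the two candidate semantics (Python; the caching model is a hypothetical illustration, not the actual executor): run the uncorrelated subquery once and share the result, versus re-run it per outer row. Only a volatile subquery can tell the two apart, which is exactly why the random() example is revealing.

```python
# Hypothetical model of "run once and share" vs. "run per row" for an
# uncorrelated LATERAL subquery (not PostgreSQL internals).
import random

def run_per_row(outer_rows, subquery):
    # Conceptual semantics: a fresh invocation for every outer row.
    return [subquery() for _ in outer_rows]

def run_once_and_share(outer_rows, subquery):
    # What the random() example shows happening: one invocation,
    # its result shared by all outer rows.
    cached = subquery()
    return [cached for _ in outer_rows]

volatile = lambda: random.random()  # stand-in for ORDER BY random() LIMIT 1
shared = run_once_and_share(range(10), volatile)
fresh = run_per_row(range(10), volatile)
assert len(set(shared)) == 1   # the single draw repeats for every row
assert len(set(fresh)) == 10   # volatile subquery visibly re-executed
```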
Of course, given the volatile function we'd never parallelize this
anyway. But we still have to consider the case where the result is
otherwise ordered differently between workers (just by virtue of disk
order, for example).
I've tried the above query using tenk1 from the regress tests to get a
bit more data, and, along with modifying several GUCs, can force
parallel plans. However in no case can I get it to execute that
uncorrelated lateral in multiple workers. That makes me suspicious
that there's another check in play here ensuring the lateral subquery
is executed for each group, and that in the uncorrelated case really
that rule still holds -- it's just a single group.
The docs say: "When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item providing the cross-referenced column(s), or set of rows of multiple FROM items providing the columns, the LATERAL item is evaluated using that row or row set's values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s)."
They don't say what happens with LATERAL when there aren't cross-references though. As we expect, adding one does show random() being called once for each source row.
If my theory above is correct then it's implicit that the row set is
the whole previous FROM group.
test=# SELECT
test-# foo.bar,
test-# lat.bar
test-# FROM
test-# foo JOIN LATERAL (
test(# SELECT
test(# bar
test(# FROM
test(# foo AS foo2
test(# WHERE
test(# foo2.bar < foo.bar + 100000
test(# ORDER BY
test(# random()
test(# LIMIT 1
test(# ) AS lat ON true;
bar | bar
-----+-----
1 | 5
2 | 8
3 | 3
4 | 4
5 | 5
6 | 5
7 | 1
8 | 3
9 | 7
10 | 3
(10 rows)

It seems like to keep the same behavior that exists today, results of LATERAL subqueries would need to be the same if they aren't correlated, and so you couldn't run them in parallel with a limit if the order wasn't guaranteed. But I'll be the first to admit that it's easy enough for me to miss a key piece of logic on something like this, so I could be way off base too.
If it weren't for the volatile function in the example, I think I
could argue we could change the behavior (i.e., my original theorizing
above about this just being an optimization)...but yes, it seems like
this behavior shouldn't change. I can't seem to make it break with the
patch, though.
While I haven't actually tracked down to guarantee this is handled
elsewhere, a thought experiment -- I think -- shows it must be so.
Here's why: suppose we don't have a limit here, but the query return
order is different in different backends. Then we would have the same
problem you bring up. In that case this code is already setting
consider_parallel=true on the rel. So I don't think we're changing any
behavior here.
James
[1]: /messages/by-id/a50766a4-a927-41c4-984c-76e513b6d1c4@www.fastmail.com
On 12/7/20 6:45 PM, James Coleman wrote:
On Sun, Dec 6, 2020 at 7:34 PM Brian Davis <brian@brianlikespostgres.com> wrote:
Played around with this a bit, here's a non-correlated subquery that gets us to that if statement
So it looks like you and Brian are satisfied that this change is not
allowing bad behavior.
Seems like an obvious win. Hopefully we can get some other concurring
opinions.
Regards,
--
-David
david@pgmasters.net
On Tue, Dec 8, 2020 at 10:46 AM James Coleman <jtc331@gmail.com> wrote:
While I haven't actually tracked down to guarantee this is handled
elsewhere, a thought experiment -- I think -- shows it must be so.
Here's why: suppose we don't have a limit here, but the query return
order is different in different backends. Then we would have the same
problem you bring up. In that case this code is already setting
consider_parallel=true on the rel. So I don't think we're changing any
behavior here.
AFAICS, the patch seems very reasonable and specifically targets
lateral subqueries with limit/offset. It seems like the uncorrelated
case is the only real concern.
I generally agree that the current patch is probably not changing any
behavior in the uncorrelated case (and like yourself, haven't yet
found a case for which it breaks), but I'm not sure Brian's concerns
can be ruled out entirely.
How about a minor update to the patch to make it slightly more
restrictive, to exclude the case when there are no lateral
cross-references, so we'll be allowing parallelism only when we know
the lateral subquery will be evaluated anew for each source row?
I was thinking of the following patch modification:
BEFORE:
- if (limit_needed(subquery))
+ if (!rte->lateral && limit_needed(subquery))
AFTER:
- if (limit_needed(subquery))
+ if ((!rte->lateral || bms_is_empty(rel->lateral_relids)) &&
+ limit_needed(subquery))
Thoughts?
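As a sanity check, the AFTER condition can be written out as a toy truth table (Python; the function name and boolean arguments are illustrative stand-ins for limit_needed(), rte->lateral, and bms_is_empty(rel->lateral_relids), not planner APIs):

```python
def blocks_parallel(limit_needed, is_lateral, lateral_refs_empty):
    # Mirrors the proposed test: bail out of considering parallelism when
    # a LIMIT/OFFSET is present, unless the subquery is lateral *and*
    # actually has lateral cross-references.
    return limit_needed and (not is_lateral or lateral_refs_empty)

assert blocks_parallel(True, False, True)       # plain subquery + LIMIT: blocked
assert blocks_parallel(True, True, True)        # lateral, no cross-refs: blocked
assert not blocks_parallel(True, True, False)   # lateral with cross-refs: allowed
assert not blocks_parallel(False, False, True)  # no LIMIT: not blocked here
```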
Regards,
Greg Nancarrow
Fujitsu Australia
Apologies for the delayed response; this seems fine to me. I've
attached patch v2.
Thanks,
James Coleman
Attachments:
v2-0001-Allow-parallel-LATERAL-subqueries-with-LIMIT-OFFS.patch (application/octet-stream)
From f7f507714b1eb1b44daac83dc67de2f854e2bb2a Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Mon, 30 Nov 2020 11:36:35 -0500
Subject: [PATCH v2] Allow parallel LATERAL subqueries with LIMIT/OFFSET
The code that determined whether or not a rel should be considered for
parallel query excluded subqueries with LIMIT/OFFSET. That's correct in
the general case: as the comment notes that'd mean we have to guarantee
ordering (and claims it's not worth checking that) for results to be
consistent across workers. However there's a simpler case that hasn't
been considered: LATERAL subqueries with LIMIT/OFFSET don't fall under
the same reasoning since they're executed (when not converted to a JOIN)
per tuple anyway, so consistency of results across workers isn't a
factor.
---
src/backend/optimizer/path/allpaths.c | 5 ++++-
src/test/regress/expected/select_parallel.out | 15 +++++++++++++++
src/test/regress/sql/select_parallel.sql | 6 ++++++
3 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 353454b183..b7c9b17f01 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -682,11 +682,14 @@ set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
* inconsistent results at the top-level. (In some cases, where
* the result is ordered, we could relax this restriction. But it
* doesn't currently seem worth expending extra effort to do so.)
+ * LATERAL is an exception: LIMIT/OFFSET is safe to execute within
+ * workers since the sub-select is executed per tuple
*/
{
Query *subquery = castNode(Query, rte->subquery);
- if (limit_needed(subquery))
+ if (limit_needed(subquery) &&
+ (!rte->lateral || bms_is_empty(rel->lateral_relids)))
return;
}
break;
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 4ea1aa7dfd..2303f70d6e 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1040,6 +1040,21 @@ explain (costs off)
Filter: (stringu1 ~~ '%AAAA'::text)
(11 rows)
+-- ...unless it's LATERAL
+savepoint settings;
+set parallel_tuple_cost=0;
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (select t.unique1 from tenk1 offset 0) l on true;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Gather
+ Workers Planned: 4
+ -> Nested Loop
+ -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t
+ -> Index Only Scan using tenk1_hundred on tenk1
+(5 rows)
+
+rollback to savepoint settings;
-- to increase the parallel query test coverage
SAVEPOINT settings;
SET LOCAL force_parallel_mode = 1;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index f924731248..019e17e751 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -390,6 +390,12 @@ explain (costs off, verbose)
explain (costs off)
select * from tenk1 a where two in
(select two from tenk1 b where stringu1 like '%AAAA' limit 3);
+-- ...unless it's LATERAL
+savepoint settings;
+set parallel_tuple_cost=0;
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (select t.unique1 from tenk1 offset 0) l on true;
+rollback to savepoint settings;
-- to increase the parallel query test coverage
SAVEPOINT settings;
--
2.20.1
Greg,
Do you believe this is now ready for committer?
Thanks,
James Coleman
On Thu, Nov 4, 2021 at 12:49 AM James Coleman <jtc331@gmail.com> wrote:
Greg,
Do you believe this is now ready for committer?
The patch LGTM.
I have set the status to "Ready for Committer".
Regards,
Greg Nancarrow
Fujitsu Australia
Greg Nancarrow <gregn4422@gmail.com> writes:
The patch LGTM.
I have set the status to "Ready for Committer".
I don't really see why this patch is even a little bit safe.
The argument for it seems to be that a lateral subquery will
necessarily be executed in such a way that each complete iteration
of the subquery, plus joining to its outer rel, happens within a
single worker ... but where is the guarantee of that? Once
you've marked the rel as parallel-safe, the planner is free to
consider all sorts of parallel join structures. I'm afraid this
would be easily broken as soon as you look at cases with three or
more rels. Or maybe even just two. The reason for the existing
restriction boils down to this structure being unsafe:
Gather
NestLoop
Scan ...
Limit
Scan ...
and I don't see how the existence of a lateral reference
makes it any safer.
regards, tom lane
Thanks for taking a look. I'm not following how the structure you
posited is inherently unsafe when it's a lateral reference. That
limit/scan (if lateral) has to be executed per tuple of the outer
scan, right? And if it's a unique execution per tuple, then
consistency across tuples (that are in different workers) can't be a
concern.
Is there a scenario I'm missing where lateral can currently be
executed in that way in that structure (or a different one)?
Thanks,
James Coleman
To expand on this:
Suppose lateral is not in play. Then if we have a plan like:
Gather
NestLoop
Scan X
Limit
Scan Y
Because we have the result "X join Limit(Y)" we need "Limit(Y)" to be
consistent across all of the possible executions of "Limit(Y)" (i.e.,
in each worker it executes in). That means (absent infrastructure for
guaranteeing a unique ordering) we obviously can't parallelize the
inner side of the join as the limit may be applied in different ways
in each worker's execution.
Now suppose lateral is in play. Then (given the same plan) instead of
our result being "X join Limit(Y)" the result is "X join Limit(Y sub
X)", that is, each row in X is joined to a unique invocation of
"Limit(Y)". In this case we are already conceivably getting different
results for each execution of the subquery "Limit(Y)" even if we're
not running those executions across multiple workers. Whether we've
optimized such a subquery into running a single execution per row in X
or have managed to optimize it in such a way that a single execution
of "Limit(Y)" may be shared by multiple rows in X makes no difference
because there are no relational guarantees that that is the case
(conceptually each row in X gets its own results for "Limit(Y)").
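A minimal simulation of that claim (Python; the worker partitioning is a hypothetical model, not the planner's actual join machinery): because each outer row carries its own invocation of the limited subquery, any partitioning of the outer rows across workers yields the same join result.

```python
# Toy model of "X join Limit(Y sub X)": each outer row gets its own
# invocation of the limited subquery, so splitting the outer rows
# across hypothetical workers cannot change the result.

def lateral_limit_join(x_rows, y_for, limit):
    # One Limit(Y sub x) invocation per outer row x.
    return {x: y_for(x)[:limit] for x in x_rows}

def split_across_workers(x_rows, y_for, limit, n_workers=2):
    # Partition the outer rows; each "worker" runs its own invocations.
    result = {}
    for w in range(n_workers):
        result.update(lateral_limit_join(x_rows[w::n_workers], y_for, limit))
    return result

x = [1, 2, 3, 4]
y_for = lambda xv: [xv, xv + 1, xv + 2]  # correlated inner rows
assert split_across_workers(x, y_for, 2) == lateral_limit_join(x, y_for, 2)
```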
I've not been able to come up with a scenario where this doesn't hold
-- even if part of the join or the subquery execution happens in a
different worker. I believe that for there to be a parallel safety
problem here you'd need to have a given subquery execution for a row
in X be executed multiple times. Alternatively I've been trying to
reason about whether there might be a scenario where a 3rd rel is
involved (i.e., the inner rel is itself a join rel), but as far as I
can tell the moment we end up with a join structure such that the
lateral rel is the one with the limit we'd be back to being safe again
for the reasons claimed earlier.
If there's something obvious (or not so obvious) I'm missing, I'd
appreciate a counterexample, but I'm currently unable to falsify my
original claim that the lateral reference is a fundamental difference
that allows us to consider this parallel safe.
Thanks,
James Coleman
This argument seems to be assuming that Y is laterally dependent on X,
but the patch as written will take *any* lateral dependency as a
get-out-of-jail-free card. If what we have is "Limit(Y sub Z)" where
Z is somewhere else in the query tree, it's not apparent to me that
your argument holds.
But more generally, I don't think you've addressed the fundamental
concern, which is that a query involving Limit is potentially
nondeterministic (if it lacks a fully-deterministic ORDER BY),
so that different workers could get different answers from it if
they're using a plan type that permits that to happen. (See the
original discussion that led to 75f9c4ca5, at [1].) I do not see
how a lateral dependency removes that hazard. The bug report that
started the original discussion hit the problem because it
generated a plan like
Gather
-> Hash Semi Join
-> Parallel Seq Scan
-> Hash
-> Limit
-> Seq Scan
We didn't make the submitter drill down far enough to verify
exactly why he got nondeterministic results from the Limit, but
I suppose the reason was that the table was big enough to trigger
"synchronize_seqscans" behavior, allowing different workers to
read different parts of that table. Now, that particular case
didn't have any lateral dependency, but if there was one it'd
just have resulted in changing the hash join to a nestloop join,
and the nondeterminism hazard would be exactly the same AFAICS.
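The hazard is easy to model (Python; a toy circular-scan simulation, not the actual heapam behavior): once a scan may start at an arbitrary point and wrap around, as synchronize_seqscans permits, LIMIT without ORDER BY returns whichever rows happen to come first in each worker.

```python
# Hypothetical model of the synchronize_seqscans hazard: a sequential
# scan that may begin mid-table and wrap around, so LIMIT without
# ORDER BY is nondeterministic across executions.

def seq_scan(table, start_page):
    # Circular scan beginning at start_page.
    return table[start_page:] + table[:start_page]

def limit_scan(table, start_page, n=3):
    return seq_scan(table, start_page)[:n]

table = list(range(10))
assert limit_scan(table, 0) == [0, 1, 2]
assert limit_scan(table, 7) == [7, 8, 9]
assert limit_scan(table, 0) != limit_scan(table, 7)  # same query, different rows
```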
In this case we are already conceivably getting different
results for each execution of the subquery "Limit(Y)" even if we're
not running those executions across multiple workers.
That seems to be about the same argument Andres made initially
in the old thread, but we soon shot that down as not being the
level of guarantee we want to provide. There's nothing in the
SQL standard that says that
select * from events where account in
(select account from events
where data->>'page' = 'success.html' limit 3);
(the original problem query) shall execute the sub-query
only once, but people expect it to act that way.
If you want to improve this area, my feeling is that it'd be
better to look into what was speculated about in the old
thread: LIMIT doesn't create nondeterminism if the query has
an ORDER BY that imposes a unique row ordering, ie
order-by-primary-key. We didn't have planner infrastructure
that would allow checking that cheaply in 2018, but maybe
there is some now?
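For concreteness, here is a sketch of the kind of query that would qualify (hypothetical, assuming the events table has a primary key named id):

```sql
-- Sketch: with a unique ORDER BY key, every execution of this
-- subquery must return the same three rows in the same order, so
-- the LIMIT introduces no nondeterminism across workers.
select account from events
order by id        -- assumes id is the primary key of events
limit 3;
```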
regards, tom lane
[1]: /messages/by-id/153417684333.10284.11356259990921828616@wrigleys.postgresql.org
This entry has been waiting on author input for a while (our current
threshold is roughly two weeks), so I've marked it Returned with
Feedback.
Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting
https://commitfest.postgresql.org/38/2851/
and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)
Thanks,
--Jacob
On Tue, Mar 1, 2022 at 5:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
But more generally, I don't think you've addressed the fundamental
concern, which is that a query involving Limit is potentially
nondeterministic (if it lacks a fully-deterministic ORDER BY),
so that different workers could get different answers from it if
they're using a plan type that permits that to happen. (See the
original discussion that led to 75f9c4ca5, at [1].) I do not see
how a lateral dependency removes that hazard.
...In this case we are already conceivably getting different
results for each execution of the subquery "Limit(Y)" even if we're
not running those executions across multiple workers.
That seems to be about the same argument Andres made initially
in the old thread, but we soon shot that down as not being the
level of guarantee we want to provide. There's nothing in the
SQL standard that says that
select * from events where account in
(select account from events
where data->>'page' = 'success.html' limit 3);
(the original problem query) shall execute the sub-query
only once, but people expect it to act that way.
I understand that particular case being a bit of a gotcha (i.e., what
we would naturally expect exceeds what the spec says).
But in the case where there's correlation via LATERAL we already don't
guarantee unique executions for a given set of params into the lateral
subquery execution, right? For example, suppose we have:
select *
from foo
left join lateral (
select n
from bar
where bar.a = foo.a
limit 1
) on true
and suppose that foo.a (in the table scan) returns these values in
order: 1, 2, 1. In that case we'll execute the lateral subquery 3
separate times rather than attempting to order the results of foo.a
such that we can re-execute the subquery only when the param changes
to a new unique value (and we definitely don't cache the results to
guarantee unique subquery executions).
So, assuming that we can guarantee we're talking about the proper
lateral reference (not something unrelated as you pointed out earlier)
then I don't believe we're actually changing even implicit guarantees
about how the query executes. I think that's probably true both in
theory (I claim that once someone declares a lateral join they're
definitionally expecting a subquery execution per outer row) and in
practice (e.g., your hypothesis about synchronized seq scans would
apply here also to subsequent executions of the subquery).
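To illustrate (a sketch using the hypothetical foo/bar tables from above):

```sql
-- Sketch: even in a fully serial plan, nothing pins down which bar row
-- each execution of the LIMIT subquery returns. With foo.a yielding
-- 1, 2, 1, the subquery runs three times, and the two executions with
-- a = 1 need not pick the same bar row (e.g. if synchronize_seqscans
-- causes the inner seq scan to start at a different page).
select *
from foo
left join lateral (
  select n from bar where bar.a = foo.a limit 1
) l on true;
```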
Is there still something I'm missing about the concern you have?
If you want to improve this area, my feeling is that it'd be
better to look into what was speculated about in the old
thread: LIMIT doesn't create nondeterminism if the query has
an ORDER BY that imposes a unique row ordering, ie
order-by-primary-key. We didn't have planner infrastructure
that would allow checking that cheaply in 2018, but maybe
there is some now?
Assuming what I argued earlier holds true for lateral [at minimum when
correlated] subquery execution, then I believe your suggestion here is
orthogonal and would expand the use cases even more. For example, if
we were able to guarantee a unique result set (including order), then
we could allow parallelizing subqueries even if they're not lateral
and correlated.
James Coleman
On Mon, Sep 19, 2022 at 3:58 PM James Coleman <jtc331@gmail.com> wrote:
But in the case where there's correlation via LATERAL we already don't
guarantee unique executions for a given set of params into the lateral
subquery execution, right? For example, suppose we have:
select *
from foo
left join lateral (
select n
from bar
where bar.a = foo.a
limit 1
) on true
and suppose that foo.a (in the table scan) returns these values in
order: 1, 2, 1. In that case we'll execute the lateral subquery 3
separate times rather than attempting to order the results of foo.a
such that we can re-execute the subquery only when the param changes
to a new unique value (and we definitely don't cache the results to
guarantee unique subquery executions).
I think this is true, but I don't really understand why we should
focus on LATERAL here. What we really need, and I feel like we've
talked about this before, is a way to reason about where parameters
are set and used. Your sample query gets a plan like this:
Nested Loop Left Join (cost=0.00..1700245.00 rows=10000 width=8)
-> Seq Scan on foo (cost=0.00..145.00 rows=10000 width=4)
-> Limit (cost=0.00..170.00 rows=1 width=4)
-> Seq Scan on bar (cost=0.00..170.00 rows=1 width=4)
Filter: (foo.a = a)
If this were to occur inside a larger plan tree someplace, it would be
OK to insert a Gather node above the Nested Loop node without doing
anything further, because then the parameter that stores foo.a would
be both set and used in the worker. If you wanted to insert a Gather
at any other place in this plan, things get more complicated. But just
because you have LATERAL doesn't mean that you have this problem,
because if you delete the "limit 1" then the subqueries get flattened
together and the parameter disappears, and if you delete the lateral
reference (i.e. WHERE foo.a = bar.a) then there's still a subquery but
it no longer refers to an outer parameter. And on the flip side just
because you don't have LATERAL doesn't mean that you don't have this
problem. e.g. the query could instead be:
select *, (select n from bar where bar.a = foo.a limit 1) from foo;
...which I think is pretty much equivalent to your formulation and has
the same problem as far as parallel query as your formulation but does
not involve the LATERAL keyword.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Sep 19, 2022 at 4:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 19, 2022 at 3:58 PM James Coleman <jtc331@gmail.com> wrote:
But in the case where there's correlation via LATERAL we already don't
guarantee unique executions for a given set of params into the lateral
subquery execution, right? For example, suppose we have:
select *
from foo
left join lateral (
select n
from bar
where bar.a = foo.a
limit 1
) on true
and suppose that foo.a (in the table scan) returns these values in
order: 1, 2, 1. In that case we'll execute the lateral subquery 3
separate times rather than attempting to order the results of foo.a
such that we can re-execute the subquery only when the param changes
to a new unique value (and we definitely don't cache the results to
guarantee unique subquery executions).
I think this is true, but I don't really understand why we should
focus on LATERAL here. What we really need, and I feel like we've
talked about this before, is a way to reason about where parameters
are set and used.
Yes, over in the thread "Parallelize correlated subqueries that
execute within each worker" [1] which was meant to build on top of
this work (though is technically separate). Your bringing it up here
too is making me wonder if we can combine the two and instead of
always allowing subqueries with LIMIT instead (like the other patch
does) delay final determination of parallel safety of rels and paths
(perhaps, as that thread is discussing, until gather/gather merge path
creation).
Side note: I'd kinda like a way (maybe a development GUC or even a
compile time option) to have EXPLAIN show where params are set. If
something like that exists, egg on my face I guess, but I'd love to
know about it.
Your sample query gets a plan like this:
Nested Loop Left Join (cost=0.00..1700245.00 rows=10000 width=8)
-> Seq Scan on foo (cost=0.00..145.00 rows=10000 width=4)
-> Limit (cost=0.00..170.00 rows=1 width=4)
-> Seq Scan on bar (cost=0.00..170.00 rows=1 width=4)
Filter: (foo.a = a)
If this were to occur inside a larger plan tree someplace, it would be
OK to insert a Gather node above the Nested Loop node without doing
anything further, because then the parameter that stores foo.a would
be both set and used in the worker. If you wanted to insert a Gather
at any other place in this plan, things get more complicated. But just
because you have LATERAL doesn't mean that you have this problem,
because if you delete the "limit 1" then the subqueries get flattened
together and the parameter disappears,
For future reference in this email thread when you remove the "limit
1" this is the plan you get:
Merge Right Join (cost=372.18..815.71 rows=28815 width=8)
Merge Cond: (bar.a = foo.a)
-> Sort (cost=158.51..164.16 rows=2260 width=8)
Sort Key: bar.a
-> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=8)
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: foo.a
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4)
Just to make sure I'm following: by "doesn't mean that you have this
problem" you mean "doesn't mean you have this limitation on parallel
query"?
The "flattening out" (conversion to join between two table scans
executed a single time each) is an interesting case because I consider
that to be "just" a performance optimization, and therefore I don't
think anything about the guarantees a user expects should change. But
interestingly: it does end up providing stronger guarantees about the
query results than it would if the conversion didn't happen (the
subquery results in only a single table scan whereas without the
change a scan per outer row means it's *possible* to get different
results across different outer rows even with the same join key
value).
and if you delete the lateral
reference (i.e. WHERE foo.a = bar.a) then there's still a subquery but
it no longer refers to an outer parameter. And on the flip side just
because you don't have LATERAL doesn't mean that you don't have this
problem. e.g. the query could instead be:
select *, (select n from bar where bar.a = foo.a limit 1) from foo;
...which I think is pretty much equivalent to your formulation and has
the same problem as far as parallel query as your formulation but does
not involve the LATERAL keyword.
Yes, that's a good point too. I need to play with these examples and
confirm whether lateral_relids gets set in that case. IIRC when that's
set isn't exactly the same as whether or not the LATERAL keyword is
used, and I should clarify that my claims here are meant to be about
when we execute it that way regardless of the keyword usage. The
keyword usage I'd assumed just made it easier to talk about, but maybe
you're implying that it's actually generating confusion.
James Coleman
1: /messages/by-id/CA+TgmoYXm2NCLt1nikWfYj1_r3=fsoNCHCtDVdN7X1uX_xuXgw@mail.gmail.com
On Thu, Sep 22, 2022 at 5:19 PM James Coleman <jtc331@gmail.com> wrote:
On Mon, Sep 19, 2022 at 4:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 19, 2022 at 3:58 PM James Coleman <jtc331@gmail.com> wrote:
But in the case where there's correlation via LATERAL we already don't
guarantee unique executions for a given set of params into the lateral
subquery execution, right? For example, suppose we have:
select *
from foo
left join lateral (
select n
from bar
where bar.a = foo.a
limit 1
) on true
and suppose that foo.a (in the table scan) returns these values in
order: 1, 2, 1. In that case we'll execute the lateral subquery 3
separate times rather than attempting to order the results of foo.a
such that we can re-execute the subquery only when the param changes
to a new unique value (and we definitely don't cache the results to
guarantee unique subquery executions).
I think this is true, but I don't really understand why we should
focus on LATERAL here. What we really need, and I feel like we've
talked about this before, is a way to reason about where parameters
are set and used.
Yes, over in the thread "Parallelize correlated subqueries that
execute within each worker" [1] which was meant to build on top of
this work (though is technically separate). Your bringing it up here
too is making me wonder if we can combine the two and instead of
always allowing subqueries with LIMIT instead (like the other patch
does) delay final determination of parallel safety of rels and paths
(perhaps, as that thread is discussing, until gather/gather merge path
creation).
Upon further investigation and thought I believe it *might* be
possible to do what I'd wondered about above (delay final
determination of parallel safety of the rel until later on in planning
by marking e.g. rels as tentatively safe and re-evaluating that as we
go) as my original patch did on the referenced thread, but that thread
also ended up with a much simpler proposed approach that still moved
final parallel safety determination to later in the planner, but it
did it by checking (in generate_gather_paths and
generate_user_gather_paths) whether that point in the plan tree
supplies the params required by the partial path.
So the current approach in that other thread is orthogonal to (if
complementary in some queries) the current question, which is "must we
immediately disallow parallelism on a rel that has a limit?"
Tom's concern about my arguments about special casing lateral was:
This argument seems to be assuming that Y is laterally dependent on X,
but the patch as written will take *any* lateral dependency as a
get-out-of-jail-free card. If what we have is "Limit(Y sub Z)" where
Z is somewhere else in the query tree, it's not apparent to me that
your argument holds.
I do see now that there was a now-obvious flaw in the original patch:
rte->lateral may well be set to true even in cases where there's no
actual lateral dependency. That being said I don't see a need to
determine if the current subtree provides the required lateral
dependency, for the following reasons:
1. We don't want to set rel->consider_parallel to false immediately
because that will then poison everything higher in the tree, despite
the fact that it may well be that it's only higher up the plan tree
that the lateral dependency is provided. Regardless of the level in
the plan tree at which the param is provided we are going to execute
the subquery (even in serial execution) once per outer tuple at the
point in the join tree where the lateral join lies.
2. We're *not* at this point actually checking parallel safety of a
given path (i.e., is the path parallel safe at this point given the
params currently provided), we're only checking to see if the rel
itself should be entirely excluded from consideration for parallel
plans at any point in the future.
3. We already know that the question of whether or not a param is
provided can't be the concern here since it isn't under consideration
in the existing code path. That is, if a subquery doesn't have a limit
then this particular check won't determine that the subquery's
existence should result in setting rel->consider_parallel to false
because of any params it may or may not contain.
4. It is already the case that a subplan using exec params in the
where clause will not be considered parallel safe in path generation.
I believe the proper check for when to exclude subqueries with limit
clauses is (as in the attached patch) prohibiting a limit when
rel->lateral_relids is empty. Here are several examples of queries and
plans and how this code plays out to attempt to validate that
hypothesis:
select *
from foo
left join lateral (
select n
from bar
where bar.a = foo.a
limit 1
) on true;
Nested Loop Left Join (cost=0.00..8928.05 rows=2550 width=8)
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4)
-> Limit (cost=0.00..3.48 rows=1 width=4)
-> Seq Scan on bar (cost=0.00..38.25 rows=11 width=4)
Filter: (a = foo.a)
This was my base case for the past few emails. It hits the modified
code path with rte->lateral = true and
bms_num_members(rel->lateral_relids) = 1. The patch would allow the
subquery rel.consider_parallel to be set to true, however with solely
this patch we still won't put the limit subquery within the
parallelized portion of the plan because of the exec param used in the
where clause.
select *
from foo
left join lateral (
select foo.a
from bar
limit 1
) on true;
Nested Loop Left Join (cost=0.00..97.78 rows=2550 width=8)
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4)
-> Limit (cost=0.00..0.01 rows=1 width=4)
-> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=4)
In this case the lateral reference is only in the target list of the
subquery, and this form is where the patch kicks in to allow placing
the gather node above the whole join tree (thus executing the limit
subquery within each worker).
select *, (select n from bar where bar.a = foo.a limit 1) from foo;
Seq Scan on foo (cost=0.00..8902.55 rows=2550 width=8)
SubPlan 1
-> Limit (cost=0.00..3.48 rows=1 width=4)
-> Seq Scan on bar (cost=0.00..38.25 rows=11 width=4)
Filter: (a = foo.a)
Robert wondered if this was effectively the same thing, and it
definitely, in my opinion, ought to be the same--in terms of the
results you expect--as my original example. However this example
doesn't appear to hit this code path at all. We also already
parallelize this form of query (both when the outer tuple reference is
in the subquery's target list and when it's in the subquery's where
clause).
select *
from foo
left join lateral (
select n
from bar
where bar.a = foo.a
) on true;
Merge Right Join (cost=372.18..815.71 rows=28815 width=8)
Merge Cond: (bar.a = foo.a)
-> Sort (cost=158.51..164.16 rows=2260 width=8)
Sort Key: bar.a
-> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=8)
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: foo.a
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4)
Removing the limit results in the correlated subquery being rewritten
as a join. Because this rewrite removes the correlation (at least in
execution) this query also doesn't hit the modified code path at all.
I do find this one interesting because it's one of these examples of
how we already provide different guarantees around correlated subquery
execution (even when executing serially): when there's a limit here
the subquery executes multiple times (which means, for example, that
the results of the scan may be returned in a different order) but
without the limit it executes a single time (so the order is
guaranteed).
select *
from foo
left join lateral (
select n
from bar
) on true;
Nested Loop Left Join (cost=0.00..140795.50 rows=5763000 width=8)
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=4)
If we remove the correlation but retain the lateral keyword we also
don't hit this code path. By the way, the plan is also the same if you
remove the useless lateral keyword in this query.
select *
from foo
where foo.a in (
select bar.a from bar limit 1
);
Hash Semi Join (cost=0.03..42.37 rows=13 width=4)
Hash Cond: (foo.a = bar.a)
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4)
-> Hash (cost=0.01..0.01 rows=1 width=4)
-> Limit (cost=0.00..0.01 rows=1 width=4)
-> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=4)
This is the query form Tom referenced from a bug report that
originally brought in the current code that excludes all subqueries
with limits from parallelization. This form does indeed hit the code
block in question, but rte->lateral is false and
bms_num_members(rel->lateral_relids) is 0 so it is unaffected by this
patch.
select *
from foo
left join (
select n
from bar
limit 1
) on true;
Nested Loop Left Join (cost=0.00..67.39 rows=2550 width=8)
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4)
-> Materialize (cost=0.00..0.02 rows=1 width=4)
-> Limit (cost=0.00..0.01 rows=1 width=4)
-> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=4)
This query also has rte->lateral is false and
bms_num_members(rel->lateral_relids) is 0 when reaching the line of
code changed in this patch. The interesting thing about this query is
that if you set enable_material = off the materialize node goes away,
and we do not attempt to rewrite the query into something that would
execute the subquery only once, so this is a case where we already
don't provide the theoretically strongest possible guarantees, though
one could reasonably argue it's a bit artificial with materialization
turned off.
In summary we already allow parallelizing this type of execution
pattern if the subquery is in the select clause; we apply stricter
standards to all subqueries in from clauses. I believe the prohibition
on parallelizing subqueries with limits in from clauses was
unnecessarily restrictive when it was added. When we have lateral
dependencies we execute the query in the same way we would when the
subquery is in the target list (i.e., per tuple at that point in the
join tree), and so we should be able to parallelize those subqueries
in the from clause just like we already do when they are in the target
list.
In the attached series the first patch adds a bunch of new tests to
show a bunch of permutations of queries. Most of the added test
queries don't end up changing with the 2nd patch applied (the actual
code changes) so that you can easily see the narrow scope of what's
affected. I don't envision most of the tests sticking around if this
is committed, but hopefully it provides a helpful way to evaluate the
semantics of the change I'm proposing.
Thanks,
James Coleman
Attachments:
v3-0002-Subqueries-with-LIMIT-can-be-parallel-safe-when-e.patch
From 5530d65ed6df1b0edc86c8176e7d10f8f10ea6f8 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Mon, 30 Nov 2020 11:36:35 -0500
Subject: [PATCH v3 2/2] Subqueries with LIMIT can be parallel safe when
executing per-outer tuple
The code that determined whether or not a rel should be considered for
parallel query excluded subqueries with LIMIT/OFFSET. That's correct in
the general case: as the comment notes that'd mean we have to guarantee
ordering (and claims it's not worth checking that) for results to be
consistent across workers. However, there's a simpler case to handle
than proving unique or consistently ordered results: when we're
already going to execute the subquery within the context of each outer
tuple then whether we do that repeated execution within a single process
or multiple processes isn't going to affect the guarantees we offer
about consistency of results.
---
src/backend/optimizer/path/allpaths.c | 7 +++++-
src/test/regress/expected/select_parallel.out | 22 +++++++++----------
2 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8fc28007f5..6a8756430d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -684,11 +684,16 @@ set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
* inconsistent results at the top-level. (In some cases, where
* the result is ordered, we could relax this restriction. But it
* doesn't currently seem worth expending extra effort to do so.)
+ * We can carve out an exception, however, for cases in which the
+ * subquery with a limit is already going to be executed in the
+ * context of a single outer tuple. In that case we execute the
+ * subquery more than once anyway, and so we already cannot
+ * guarantee row-order determinism whether parallel or not.
*/
{
Query *subquery = castNode(Query, rte->subquery);
- if (limit_needed(subquery))
+ if (bms_is_empty(rel->lateral_relids) && limit_needed(subquery))
return;
}
break;
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index ada786d434..6030dafde5 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1097,25 +1097,23 @@ explain (costs off) select t.unique1 from tenk1 t
join lateral (select t.unique1 from tenk1 offset 0) l on true;
QUERY PLAN
---------------------------------------------------------------------
- Nested Loop
- -> Gather
- Workers Planned: 4
+ Gather
+ Workers Planned: 4
+ -> Nested Loop
-> Parallel Index Only Scan using tenk1_unique1 on tenk1 t
- -> Gather
- Workers Planned: 4
- -> Parallel Index Only Scan using tenk1_hundred on tenk1
-(7 rows)
+ -> Index Only Scan using tenk1_hundred on tenk1
+(5 rows)
explain (costs off) select t.unique1 from tenk1 t
join lateral (select t.unique1 from tenk1 limit 1) l on true;
QUERY PLAN
---------------------------------------------------------------------
- Nested Loop
- -> Gather
- Workers Planned: 4
+ Gather
+ Workers Planned: 4
+ -> Nested Loop
-> Parallel Index Only Scan using tenk1_unique1 on tenk1 t
- -> Limit
- -> Seq Scan on tenk1
+ -> Limit
+ -> Seq Scan on tenk1
(6 rows)
explain (costs off) select t.unique1 from tenk1 t
--
2.32.1 (Apple Git-133)
v3-0001-Add-tests-before-change.patch
From 91f3407a5d5737c4f0c08c4d988aaf7b3cd7ebdc Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Sat, 24 Sep 2022 09:58:45 -0400
Subject: [PATCH v3 1/2] Add tests before change
---
src/test/regress/expected/select_parallel.out | 164 ++++++++++++++++++
src/test/regress/sql/select_parallel.sql | 45 +++++
2 files changed, 209 insertions(+)
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 91f74fe47a..ada786d434 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1040,6 +1040,170 @@ explain (costs off)
Filter: (stringu1 ~~ '%AAAA'::text)
(11 rows)
+-- There are some special cases with LATERAL...
+savepoint settings;
+set parallel_tuple_cost=0;
+explain (costs off) select *, (select t.unique1 from tenk1 t2) from tenk1 t;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Gather
+ Workers Planned: 4
+ -> Parallel Seq Scan on tenk1 t
+ SubPlan 1
+ -> Gather
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_hundred on tenk1 t2
+(7 rows)
+
+explain (costs off) select *, (select t.unique1 from tenk1 t2 limit 1) from tenk1 t;
+ QUERY PLAN
+------------------------------------
+ Gather
+ Workers Planned: 4
+ -> Parallel Seq Scan on tenk1 t
+ SubPlan 1
+ -> Limit
+ -> Seq Scan on tenk1 t2
+(6 rows)
+
+explain (costs off) select *,
+ (select t2.two from tenk1 t2 where t.unique1 = t2.unique1 limit 1)
+ from tenk1 t;
+ QUERY PLAN
+----------------------------------------------------------
+ Gather
+ Workers Planned: 4
+ -> Parallel Seq Scan on tenk1 t
+ SubPlan 1
+ -> Limit
+ -> Index Scan using tenk1_unique1 on tenk1 t2
+ Index Cond: (unique1 = t.unique1)
+(7 rows)
+
+explain (costs off) select *,
+ (select t2.two from tenk1 t2 where t.unique1 = t2.unique1)
+ from tenk1 t;
+ QUERY PLAN
+----------------------------------------------------
+ Gather
+ Workers Planned: 4
+ -> Parallel Seq Scan on tenk1 t
+ SubPlan 1
+ -> Index Scan using tenk1_unique1 on tenk1 t2
+ Index Cond: (unique1 = t.unique1)
+(6 rows)
+
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (select t.unique1 from tenk1 offset 0) l on true;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop
+ -> Gather
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t
+ -> Gather
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_hundred on tenk1
+(7 rows)
+
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (select t.unique1 from tenk1 limit 1) l on true;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop
+ -> Gather
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t
+ -> Limit
+ -> Seq Scan on tenk1
+(6 rows)
+
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (
+ select *
+ from tenk1
+ where t.unique1 = tenk1.unique1
+ limit 1
+) l on true;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop
+ -> Gather
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t
+ -> Limit
+ -> Index Only Scan using tenk1_unique1 on tenk1
+ Index Cond: (unique1 = t.unique1)
+(7 rows)
+
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (
+ select *
+ from tenk1
+ where t.unique1 = tenk1.unique1
+) l on true;
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Gather
+ Workers Planned: 4
+ -> Parallel Hash Join
+ Hash Cond: (t.unique1 = tenk1.unique1)
+ -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t
+ -> Parallel Hash
+ -> Parallel Index Only Scan using tenk1_unique1 on tenk1
+(7 rows)
+
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (
+ select *
+ from tenk1
+ limit 1
+) l on true;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop
+ -> Limit
+ -> Seq Scan on tenk1
+ -> Gather
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t
+(6 rows)
+
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (
+ select *
+ from tenk1
+) l on true;
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Nested Loop
+ -> Gather
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t
+ -> Materialize
+ -> Gather
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_hundred on tenk1
+(8 rows)
+
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (
+ select t.unique1
+ from tenk1
+) l on true;
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Nested Loop
+ -> Gather
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on tenk1 t
+ -> Materialize
+ -> Gather
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_hundred on tenk1
+(8 rows)
+
+rollback to savepoint settings;
-- to increase the parallel query test coverage
SAVEPOINT settings;
SET LOCAL force_parallel_mode = 1;
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 62fb68c7a0..ed3691d807 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -390,6 +390,51 @@ explain (costs off, verbose)
explain (costs off)
select * from tenk1 a where two in
(select two from tenk1 b where stringu1 like '%AAAA' limit 3);
+-- There are some special cases with LATERAL...
+savepoint settings;
+set parallel_tuple_cost=0;
+explain (costs off) select *, (select t.unique1 from tenk1 t2) from tenk1 t;
+explain (costs off) select *, (select t.unique1 from tenk1 t2 limit 1) from tenk1 t;
+explain (costs off) select *,
+ (select t2.two from tenk1 t2 where t.unique1 = t2.unique1 limit 1)
+ from tenk1 t;
+explain (costs off) select *,
+ (select t2.two from tenk1 t2 where t.unique1 = t2.unique1)
+ from tenk1 t;
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (select t.unique1 from tenk1 offset 0) l on true;
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (select t.unique1 from tenk1 limit 1) l on true;
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (
+ select *
+ from tenk1
+ where t.unique1 = tenk1.unique1
+ limit 1
+) l on true;
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (
+ select *
+ from tenk1
+ where t.unique1 = tenk1.unique1
+) l on true;
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (
+ select *
+ from tenk1
+ limit 1
+) l on true;
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (
+ select *
+ from tenk1
+) l on true;
+explain (costs off) select t.unique1 from tenk1 t
+join lateral (
+ select t.unique1
+ from tenk1
+) l on true;
+rollback to savepoint settings;
-- to increase the parallel query test coverage
SAVEPOINT settings;
--
2.32.1 (Apple Git-133)
On Thu, Sep 22, 2022 at 5:19 PM James Coleman <jtc331@gmail.com> wrote:
Your sample query gets a plan like this:
Nested Loop Left Join (cost=0.00..1700245.00 rows=10000 width=8)
-> Seq Scan on foo (cost=0.00..145.00 rows=10000 width=4)
-> Limit (cost=0.00..170.00 rows=1 width=4)
-> Seq Scan on bar (cost=0.00..170.00 rows=1 width=4)
Filter: (foo.a = a)
If this were to occur inside a larger plan tree someplace, it would be
OK to insert a Gather node above the Nested Loop node without doing
anything further, because then the parameter that stores foo.a would
be both set and used in the worker. If you wanted to insert a Gather
at any other place in this plan, things get more complicated. But just
because you have LATERAL doesn't mean that you have this problem,
because if you delete the "limit 1" then the subqueries get flattened
together and the parameter disappears,
For future reference in this email thread when you remove the "limit
1" this is the plan you get:
Merge Right Join (cost=372.18..815.71 rows=28815 width=8)
Merge Cond: (bar.a = foo.a)
-> Sort (cost=158.51..164.16 rows=2260 width=8)
Sort Key: bar.a
-> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=8)
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: foo.a
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4)
Just to make sure I'm following: by "doesn't mean that you have this
problem" you mean "doesn't mean you have this limitation on parallel
query"?
I'm talking specifically about whether there's a parameter. The
problem here isn't created by LATERAL, but by parameters. In the
nested loop plan, there's a parameter that's storing foo.a, and the
storage location that holds that parameter value is in backend-private
memory, so you can't set the value in the leader and then use it in a
worker, and that restricts where you can insert a Gather node. But in
the Merge Join plan (or if you got a Hash Join plan) there is no
parameter. So the fact that parameter storage isn't shared between
leaders and workers does not matter.
Yes, that's a good point too. I need to play with these examples and
confirm whether lateral_relids gets set in that case. IIRC when that's
set isn't exactly the same as whether or not the LATERAL keyword is
used, and I should clarify that my claims here are meant to be about
when we execute it that way regardless of the keyword usage. The
keyword usage I'd assumed just made it easier to talk about, but maybe
you're implying that it's actually generating confusion.
Yes, I think so.
Stepping back a bit, commit 75f9c4ca5a8047d7a9cfbc7d51a610933d04dc7f
introduced the code that is at issue here, and it took the position
that limit/offset should be marked parallel-restricted because they're
nondeterministic. I'm not sure that's really quite the right thing.
The original idea behind having things be parallel-restricted was that
there might be stuff which depends on state in the leader that is not
replicated to or shared with the workers, and thus it must be executed
in the leader to get the right answer. Here, however, there is no such
problem. Something like LIMIT/OFFSET that is nondeterministic is
perfectly safe to execute in a worker, or in the leader. It's just not
safe to execute the same computation more than once and assume that we
will get the same answer each time. But that's really a different
problem.
And the problem that you've run into here, I think, is that if a Limit
node is getting fed a parameter from higher up in the query plan, then
it's not really the same computation being performed every time, and
thus the non-determinism doesn't matter, and thus the
parallel-restriction goes away, but the code doesn't know that.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Sep 26, 2022 at 10:37 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 22, 2022 at 5:19 PM James Coleman <jtc331@gmail.com> wrote:
Your sample query gets a plan like this:
Nested Loop Left Join (cost=0.00..1700245.00 rows=10000 width=8)
-> Seq Scan on foo (cost=0.00..145.00 rows=10000 width=4)
-> Limit (cost=0.00..170.00 rows=1 width=4)
-> Seq Scan on bar (cost=0.00..170.00 rows=1 width=4)
Filter: (foo.a = a)
If this were to occur inside a larger plan tree someplace, it would be
OK to insert a Gather node above the Nested Loop node without doing
anything further, because then the parameter that stores foo.a would
be both set and used in the worker. If you wanted to insert a Gather
at any other place in this plan, things get more complicated. But just
because you have LATERAL doesn't mean that you have this problem,
because if you delete the "limit 1" then the subqueries get flattened
together and the parameter disappears,
For future reference in this email thread when you remove the "limit
1" this is the plan you get:
Merge Right Join (cost=372.18..815.71 rows=28815 width=8)
Merge Cond: (bar.a = foo.a)
-> Sort (cost=158.51..164.16 rows=2260 width=8)
Sort Key: bar.a
-> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=8)
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: foo.a
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4)
Just to make sure I'm following: by "doesn't mean that you have this
problem" you mean "doesn't mean you have this limitation on parallel
query"?
I'm talking specifically about whether there's a parameter. The
problem here isn't created by LATERAL, but by parameters. In the
nested loop plan, there's a parameter that's storing foo.a, and the
storage location that holds that parameter value is in backend-private
memory, so you can't set the value in the leader and then use it in a
worker, and that restricts where you can insert a Gather node. But in
the Merge Join plan (or if you got a Hash Join plan) there is no
parameter. So the fact that parameter storage isn't shared between
leaders and workers does not matter.
Ah, yes, right.
Yes, that's a good point too. I need to play with these examples and
confirm whether lateral_relids gets set in that case. IIRC when that's
set isn't exactly the same as whether or not the LATERAL keyword is
used, and I should clarify that my claims here are meant to be about
when we execute it that way regardless of the keyword usage. The
keyword usage I'd assumed just made it easier to talk about, but maybe
you're implying that it's actually generating confusion.
Yes, I think so.
Stepping back a bit, commit 75f9c4ca5a8047d7a9cfbc7d51a610933d04dc7f
introduced the code that is at issue here, and it took the position
that limit/offset should be marked parallel-restricted because they're
nondeterministic. I'm not sure that's really quite the right thing.
The original idea behind having things be parallel-restricted was that
there might be stuff which depends on state in the leader that is not
replicated to or shared with the workers, and thus it must be executed
in the leader to get the right answer. Here, however, there is no such
problem. Something like LIMIT/OFFSET that is nondeterministic is
perfectly safe to execute in a worker, or in the leader. It's just not
safe to execute the same computation more than once and assume that we
will get the same answer each time. But that's really a different
problem.
And the problem that you've run into here, I think, is that if a Limit
node is getting fed a parameter from higher up in the query plan, then
it's not really the same computation being performed every time, and
thus the non-determinism doesn't matter, and thus the
parallel-restriction goes away, but the code doesn't know that.
Correct. I think the simplest description of the proper limitation is
that we can't parallelize when either:
1. The param comes from outside the worker, or
2. The "same" param value from the same tuple is computed in multiple
workers (as distinct from the same value from different outer tuples).
Or, to put it positively, we can parallelize when:
1. There are no params, or
2. The param is uniquely generated and used inside a single worker.
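To make that concrete with the queries already in the regression tests above, the two parallelizable cases look like this (assuming the patch is applied; without it the second query gets the two-Gather plan shown earlier in the thread):

```sql
-- Case 1: no params. The subquery is uncorrelated, so there is no
-- Param to worry about and parallelism is unrestricted.
explain (costs off)
select t.unique1 from tenk1 t
join lateral (select * from tenk1 limit 1) l on true;

-- Case 2: the Param carrying t.unique1 is both generated (by the
-- outer scan) and consumed (by the subquery) below the same Gather,
-- so each worker sets and uses its own private copy.
explain (costs off)
select t.unique1 from tenk1 t
join lateral (select t.unique1 from tenk1 offset 0) l on true;
```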
I believe the followup email I sent (with an updated patch) details
the correctness of that approach; I'd be interested in your feedback
on that (I recognize it's quite a long email, though...) if you get a
chance.
Thanks for your review on this so far,
James Coleman
This patch has been "Needs Review" and bouncing from CF to CF. It
actually looks like it got quite a bit of design discussion, and while
James Coleman seems convinced of its safety, it doesn't sound like Tom
Lane and Robert Haas are yet, and it doesn't look like that's going to
change in this CF.
I think I'm going to mark this Returned With Feedback on the basis
that it did actually get a significant amount of discussion and no
further review seems to be forthcoming. Perhaps a more rigorous
approach of proving the correctness or perhaps an
easier-to-reason-about set of constraints would make it easier to
reach a consensus?
Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting
https://commitfest.postgresql.org/42/2851/
and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)
--
Gregory Stark
As Commitfest Manager