enable_incremental_sort changes query behavior
Hi,
With sqlsmith I found a query that gives this error:
ERROR: ORDER/GROUP BY expression not found in targetlist
I noted the query (sql query below, sorry it uses custom tables i
couldn't replicate with regression tables) because it doesn't include
an ORDER/GROUP BY clause.
--- 0 ----
select distinct
subq_0.c1 as c0,
ref_0.radi_usua_radi as c1,
ref_0.radi_nume_asoc as c2,
subq_0.c1 as c3,
case when (cast(null as pg_lsn) >=
pg_catalog.pg_last_wal_receive_lsn())
and (true = pg_catalog.pg_rotate_logfile_old()) then
ref_0.radi_usua_rem else ref_0.radi_usua_rem end
as c4,
cast(nullif((select hist_codi from public.hist_eventos_2
limit 1 offset 4)
,
pg_catalog.pg_stat_get_buf_alloc()) as int8) as c5
from
public.radicado_2 as ref_0,
lateral (select
ref_0.radi_text_temp as c0,
ref_0.radi_usua_actu as c1
from
public.hist_eventos_1 as ref_1
where cast(nullif(cast(null as float4),
cast(null as float4)) as float4) >= pg_catalog.pi()) as subq_0
where ref_0.radi_usua_dest is not NULL;
--- 0 ----
Attached the stack trace produced until de elog that produces the message.
But if I set enable_incremental_sort to off the query gets executed
without problems (attached the explain produced for that case)
--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:
Hi,
With sqlsmith I found a query that gives this error:
ERROR: ORDER/GROUP BY expression not found in targetlistI noted the query (sql query below, sorry it uses custom tables i
couldn't replicate with regression tables) because it doesn't include
an ORDER/GROUP BY clause.--- 0 ---- select distinct subq_0.c1 as c0, ref_0.radi_usua_radi as c1, ref_0.radi_nume_asoc as c2, subq_0.c1 as c3, case when (cast(null as pg_lsn) >= pg_catalog.pg_last_wal_receive_lsn()) and (true = pg_catalog.pg_rotate_logfile_old()) then ref_0.radi_usua_rem else ref_0.radi_usua_rem end as c4, cast(nullif((select hist_codi from public.hist_eventos_2 limit 1 offset 4) , pg_catalog.pg_stat_get_buf_alloc()) as int8) as c5 from public.radicado_2 as ref_0, lateral (select ref_0.radi_text_temp as c0, ref_0.radi_usua_actu as c1 from public.hist_eventos_1 as ref_1 where cast(nullif(cast(null as float4), cast(null as float4)) as float4) >= pg_catalog.pi()) as subq_0 where ref_0.radi_usua_dest is not NULL; --- 0 ----Attached the stack trace produced until de elog that produces the message.
But if I set enable_incremental_sort to off the query gets executed
without problems (attached the explain produced for that case)
Thanks for the report.
Is there by an chance an index on ref_0.radi_text_temp?
And if you set enable_hashagg = off what plan do you get (or error)?
Thanks,
James
On Wed, 30 Sep 2020 at 21:21, James Coleman <jtc331@gmail.com> wrote:
On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:Hi,
With sqlsmith I found a query that gives this error:
ERROR: ORDER/GROUP BY expression not found in targetlist
[...]
But if I set enable_incremental_sort to off the query gets executed
without problems (attached the explain produced for that case)Thanks for the report.
Hi,
by experiment I reduced the query to this
--- 0 ---
select distinct
subq_0.c1 as c0,
case when (true = pg_catalog.pg_rotate_logfile_old()) then
ref_0.t else ref_0.t
end
as c4
from
public.ref_0,
lateral (select
ref_0.i as c1
from
generate_series(1, 100) as ref_1) as subq_0
--- 0 ---
the only custom table already needed can be created with this commands:
--- 0 ---
create table ref_0 as select repeat('abcde', (random() * 10)::integer)
t, random() * 1000 i from generate_series(1, 500000);
create index on ref_0 (i);
analyze ref_0 ;
--- 0 ---
Is there by an chance an index on ref_0.radi_text_temp?
there is an index involved but not on that field, commands above
create the index in the right column... after that, ANALYZE the table
And if you set enable_hashagg = off what plan do you get (or error)?
same error
--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Oct 1, 2020 at 3:09 AM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:
On Wed, 30 Sep 2020 at 21:21, James Coleman <jtc331@gmail.com> wrote:
On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:Hi,
With sqlsmith I found a query that gives this error:
ERROR: ORDER/GROUP BY expression not found in targetlist[...]
But if I set enable_incremental_sort to off the query gets executed
without problems (attached the explain produced for that case)Thanks for the report.
Hi,
by experiment I reduced the query to this
--- 0 --- select distinct subq_0.c1 as c0, case when (true = pg_catalog.pg_rotate_logfile_old()) then ref_0.t else ref_0.t end as c4 from public.ref_0, lateral (selectref_0.i as c1 from generate_series(1, 100) as ref_1) as subq_0 --- 0 ---the only custom table already needed can be created with this commands:
--- 0 --- create table ref_0 as select repeat('abcde', (random() * 10)::integer) t, random() * 1000 i from generate_series(1, 500000); create index on ref_0 (i); analyze ref_0 ; --- 0 ---Is there by an chance an index on ref_0.radi_text_temp?
there is an index involved but not on that field, commands above
create the index in the right column... after that, ANALYZE the tableAnd if you set enable_hashagg = off what plan do you get (or error)?
same error
I was able to reproduce the error without incremental sort enabled
(i.e., it happens with a full sort also). The function call in the
SELECT doesn't have to be in a case expression; for example I was able
to reproduce changing that to `random()::text || ref_0.t`.
It looks like the issue happens when:
1. The sort happens within a parallel node.
2. One of the sort keys is an expression containing a volatile
function call and a column from the lateral join.
Here are the settings I used with your above repro case to show it
with regular sort:
enable_hashagg=off
enable_incremental_sort=off
enable_seqscan=off
parallel_setup_cost=10
parallel_tuple_cost=0
The plan (obtained by replacing the volatile function with a stable one):
Unique
-> Nested Loop
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1
Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
I haven't been able to dig further than that yet, but my intuition is
to poke around in the parallel query machinery?
James
On Thu, Oct 01, 2020 at 09:02:57AM -0400, James Coleman wrote:
On Thu, Oct 1, 2020 at 3:09 AM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:On Wed, 30 Sep 2020 at 21:21, James Coleman <jtc331@gmail.com> wrote:
On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:Hi,
With sqlsmith I found a query that gives this error:
ERROR: ORDER/GROUP BY expression not found in targetlist[...]
But if I set enable_incremental_sort to off the query gets executed
without problems (attached the explain produced for that case)Thanks for the report.
Hi,
by experiment I reduced the query to this
--- 0 --- select distinct subq_0.c1 as c0, case when (true = pg_catalog.pg_rotate_logfile_old()) then ref_0.t else ref_0.t end as c4 from public.ref_0, lateral (selectref_0.i as c1 from generate_series(1, 100) as ref_1) as subq_0 --- 0 ---the only custom table already needed can be created with this commands:
--- 0 --- create table ref_0 as select repeat('abcde', (random() * 10)::integer) t, random() * 1000 i from generate_series(1, 500000); create index on ref_0 (i); analyze ref_0 ; --- 0 ---Is there by an chance an index on ref_0.radi_text_temp?
there is an index involved but not on that field, commands above
create the index in the right column... after that, ANALYZE the tableAnd if you set enable_hashagg = off what plan do you get (or error)?
same error
I was able to reproduce the error without incremental sort enabled
(i.e., it happens with a full sort also). The function call in the
SELECT doesn't have to be in a case expression; for example I was able
to reproduce changing that to `random()::text || ref_0.t`.It looks like the issue happens when:
1. The sort happens within a parallel node.
2. One of the sort keys is an expression containing a volatile
function call and a column from the lateral join.Here are the settings I used with your above repro case to show it
with regular sort:enable_hashagg=off
enable_incremental_sort=off
enable_seqscan=off
parallel_setup_cost=10
parallel_tuple_cost=0The plan (obtained by replacing the volatile function with a stable one):
Unique
-> Nested Loop
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
I haven't been able to dig further than that yet, but my intuition is
to poke around in the parallel query machinery?
Nope. Bisect says this was introduced by this commit:
ba3e76cc57 Consider Incremental Sort paths at additional places
Looking at the diff, I suspect generate_useful_gather_paths (or maybe
get_useful_pathkeys_for_relation) fails to do something with the
pathkeys.
Of course, that'd explain why it only happens for parallel plans, as
this builds gather nodes.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Oct 1, 2020 at 6:08 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On Thu, Oct 01, 2020 at 09:02:57AM -0400, James Coleman wrote:
On Thu, Oct 1, 2020 at 3:09 AM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:On Wed, 30 Sep 2020 at 21:21, James Coleman <jtc331@gmail.com> wrote:
On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:Hi,
With sqlsmith I found a query that gives this error:
ERROR: ORDER/GROUP BY expression not found in targetlist[...]
But if I set enable_incremental_sort to off the query gets executed
without problems (attached the explain produced for that case)Thanks for the report.
Hi,
by experiment I reduced the query to this
--- 0 --- select distinct subq_0.c1 as c0, case when (true = pg_catalog.pg_rotate_logfile_old()) then ref_0.t else ref_0.t end as c4 from public.ref_0, lateral (selectref_0.i as c1 from generate_series(1, 100) as ref_1) as subq_0 --- 0 ---the only custom table already needed can be created with this commands:
--- 0 --- create table ref_0 as select repeat('abcde', (random() * 10)::integer) t, random() * 1000 i from generate_series(1, 500000); create index on ref_0 (i); analyze ref_0 ; --- 0 ---Is there by an chance an index on ref_0.radi_text_temp?
there is an index involved but not on that field, commands above
create the index in the right column... after that, ANALYZE the tableAnd if you set enable_hashagg = off what plan do you get (or error)?
same error
I was able to reproduce the error without incremental sort enabled
(i.e., it happens with a full sort also). The function call in the
SELECT doesn't have to be in a case expression; for example I was able
to reproduce changing that to `random()::text || ref_0.t`.It looks like the issue happens when:
1. The sort happens within a parallel node.
2. One of the sort keys is an expression containing a volatile
function call and a column from the lateral join.Here are the settings I used with your above repro case to show it
with regular sort:enable_hashagg=off
enable_incremental_sort=off
enable_seqscan=off
parallel_setup_cost=10
parallel_tuple_cost=0The plan (obtained by replacing the volatile function with a stable one):
Unique
-> Nested Loop
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
I haven't been able to dig further than that yet, but my intuition is
to poke around in the parallel query machinery?Nope. Bisect says this was introduced by this commit:
ba3e76cc57 Consider Incremental Sort paths at additional places
Looking at the diff, I suspect generate_useful_gather_paths (or maybe
get_useful_pathkeys_for_relation) fails to do something with the
pathkeys.Of course, that'd explain why it only happens for parallel plans, as
this builds gather nodes.
I should have known I'd regret making a wild guess. I was in the
middle of bisecting too, but had to set it aside for a bit and hadn't
finished.
I'll try to look into that commit (and I agree those functions are the
likely places to look) as I get a chance here.
James
On Thu, Oct 1, 2020 at 9:10 PM James Coleman <jtc331@gmail.com> wrote:
On Thu, Oct 1, 2020 at 6:08 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Thu, Oct 01, 2020 at 09:02:57AM -0400, James Coleman wrote:
On Thu, Oct 1, 2020 at 3:09 AM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:On Wed, 30 Sep 2020 at 21:21, James Coleman <jtc331@gmail.com> wrote:
On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:Hi,
With sqlsmith I found a query that gives this error:
ERROR: ORDER/GROUP BY expression not found in targetlist[...]
But if I set enable_incremental_sort to off the query gets executed
without problems (attached the explain produced for that case)Thanks for the report.
Hi,
by experiment I reduced the query to this
--- 0 --- select distinct subq_0.c1 as c0, case when (true = pg_catalog.pg_rotate_logfile_old()) then ref_0.t else ref_0.t end as c4 from public.ref_0, lateral (selectref_0.i as c1 from generate_series(1, 100) as ref_1) as subq_0 --- 0 ---the only custom table already needed can be created with this commands:
--- 0 --- create table ref_0 as select repeat('abcde', (random() * 10)::integer) t, random() * 1000 i from generate_series(1, 500000); create index on ref_0 (i); analyze ref_0 ; --- 0 ---Is there by an chance an index on ref_0.radi_text_temp?
there is an index involved but not on that field, commands above
create the index in the right column... after that, ANALYZE the tableAnd if you set enable_hashagg = off what plan do you get (or error)?
same error
I was able to reproduce the error without incremental sort enabled
(i.e., it happens with a full sort also). The function call in the
SELECT doesn't have to be in a case expression; for example I was able
to reproduce changing that to `random()::text || ref_0.t`.It looks like the issue happens when:
1. The sort happens within a parallel node.
2. One of the sort keys is an expression containing a volatile
function call and a column from the lateral join.Here are the settings I used with your above repro case to show it
with regular sort:enable_hashagg=off
enable_incremental_sort=off
enable_seqscan=off
parallel_setup_cost=10
parallel_tuple_cost=0The plan (obtained by replacing the volatile function with a stable one):
Unique
-> Nested Loop
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
I haven't been able to dig further than that yet, but my intuition is
to poke around in the parallel query machinery?Nope. Bisect says this was introduced by this commit:
ba3e76cc57 Consider Incremental Sort paths at additional places
Looking at the diff, I suspect generate_useful_gather_paths (or maybe
get_useful_pathkeys_for_relation) fails to do something with the
pathkeys.Of course, that'd explain why it only happens for parallel plans, as
this builds gather nodes.I should have known I'd regret making a wild guess. I was in the
middle of bisecting too, but had to set it aside for a bit and hadn't
finished.I'll try to look into that commit (and I agree those functions are the
likely places to look) as I get a chance here.
I've been able to confirm that the problem goes away if we stop adding
the gather merge paths in generate_useful_gather_paths().
I'm not sure yet what conclusion that leads us to. It seems to be that
the biggest clue remains that all of this works correctly unless one
of the selected columns (which happens to be a pathkey at this point
because it's a DISTINCT query) contains a volatile expression.
James
On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
...
I've been able to confirm that the problem goes away if we stop adding
the gather merge paths in generate_useful_gather_paths().I'm not sure yet what conclusion that leads us to. It seems to be that
the biggest clue remains that all of this works correctly unless one
of the selected columns (which happens to be a pathkey at this point
because it's a DISTINCT query) contains a volatile expression.
Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
which is calling find_em_expr_for_rel and is happy with anything it
returns. But this was copied from postgres_fdw, which however does a bit
more here:
if (pathkey_ec->ec_has_volatile ||
!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
!is_foreign_expr(root, rel, em_expr))
So not only does it explicitly check volatility of the pathkey, it also
calls is_foreign_expr which checks the expression for mutable functions.
The attached patch seems to fix this, but it only adds the check for
mutable functions. Maybe it should check ec_has_volatile too ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
incremental-sort-fix.patchtext/plain; charset=us-asciiDownload
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b399592ff8..33df9b0783 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2771,6 +2771,7 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
{
PathKey *pathkey = (PathKey *) lfirst(lc);
EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+ Expr *em_expr;
/*
* We can only build an Incremental Sort for pathkeys which
@@ -2783,7 +2784,8 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
* enable not just an incremental sort on the entirety of
* query_pathkeys but also incremental sort below a JOIN.
*/
- if (!find_em_expr_for_rel(pathkey_ec, rel))
+ if (!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
+ contain_mutable_functions((Node *) em_expr))
break;
npathkeys++;
On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
...
I've been able to confirm that the problem goes away if we stop adding
the gather merge paths in generate_useful_gather_paths().I'm not sure yet what conclusion that leads us to. It seems to be that
the biggest clue remains that all of this works correctly unless one
of the selected columns (which happens to be a pathkey at this point
because it's a DISTINCT query) contains a volatile expression.Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
which is calling find_em_expr_for_rel and is happy with anything it
returns. But this was copied from postgres_fdw, which however does a bit
more here:if (pathkey_ec->ec_has_volatile ||
!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
!is_foreign_expr(root, rel, em_expr))So not only does it explicitly check volatility of the pathkey, it also
calls is_foreign_expr which checks the expression for mutable functions.The attached patch seems to fix this, but it only adds the check for
mutable functions. Maybe it should check ec_has_volatile too ...
We actually discussed the volatility check in that function back on
the original thread [1], and we'd concluded that was specifically
necessary for the fdw code because the function would execute on two
different servers (and thus produce different results), but that in a
local server only scenario it should be fine.
My understanding (correct me if I'm wrong) is that the volatile
function should only be executed once (at the scan level?) to build
the tuple and from then on the datum isn't going to change, so I'm not
sure why the volatility would matter here.
James
1: /messages/by-id/20200328025830.6v6ogkseohakc32q@development
On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
...
I've been able to confirm that the problem goes away if we stop adding
the gather merge paths in generate_useful_gather_paths().I'm not sure yet what conclusion that leads us to. It seems to be that
the biggest clue remains that all of this works correctly unless one
of the selected columns (which happens to be a pathkey at this point
because it's a DISTINCT query) contains a volatile expression.Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
which is calling find_em_expr_for_rel and is happy with anything it
returns. But this was copied from postgres_fdw, which however does a bit
more here:if (pathkey_ec->ec_has_volatile ||
!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
!is_foreign_expr(root, rel, em_expr))So not only does it explicitly check volatility of the pathkey, it also
calls is_foreign_expr which checks the expression for mutable functions.The attached patch seems to fix this, but it only adds the check for
mutable functions. Maybe it should check ec_has_volatile too ...We actually discussed the volatility check in that function back on
the original thread [1], and we'd concluded that was specifically
necessary for the fdw code because the function would execute on two
different servers (and thus produce different results), but that in a
local server only scenario it should be fine.My understanding (correct me if I'm wrong) is that the volatile
function should only be executed once (at the scan level?) to build
the tuple and from then on the datum isn't going to change, so I'm not
sure why the volatility would matter here.James
1: /messages/by-id/20200328025830.6v6ogkseohakc32q@development
Oh, hmm, could what I said all be true, but there still be some rule
that you shouldn't compare datums generated from volatile expressions
in different backends (i.e., parallel query)?
James
On Fri, Oct 02, 2020 at 10:55:14AM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
...
I've been able to confirm that the problem goes away if we stop adding
the gather merge paths in generate_useful_gather_paths().I'm not sure yet what conclusion that leads us to. It seems to be that
the biggest clue remains that all of this works correctly unless one
of the selected columns (which happens to be a pathkey at this point
because it's a DISTINCT query) contains a volatile expression.Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
which is calling find_em_expr_for_rel and is happy with anything it
returns. But this was copied from postgres_fdw, which however does a bit
more here:if (pathkey_ec->ec_has_volatile ||
!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
!is_foreign_expr(root, rel, em_expr))So not only does it explicitly check volatility of the pathkey, it also
calls is_foreign_expr which checks the expression for mutable functions.The attached patch seems to fix this, but it only adds the check for
mutable functions. Maybe it should check ec_has_volatile too ...We actually discussed the volatility check in that function back on
the original thread [1], and we'd concluded that was specifically
necessary for the fdw code because the function would execute on two
different servers (and thus produce different results), but that in a
local server only scenario it should be fine.My understanding (correct me if I'm wrong) is that the volatile
function should only be executed once (at the scan level?) to build
the tuple and from then on the datum isn't going to change, so I'm not
sure why the volatility would matter here.James
1: /messages/by-id/20200328025830.6v6ogkseohakc32q@development
Oh, hmm, could what I said all be true, but there still be some rule
that you shouldn't compare datums generated from volatile expressions
in different backends (i.e., parallel query)?
I'm not sure it's all that related to parallel query - it's more likely
that when constructing the paths below the gather merge, this new code
fails to do something important.
I'm not 100% sure how the grouping and volatile functions work, so let
me think aloud here ...
The backtrace looks like this:
#0 get_sortgroupref_tle
#1 0x0000000000808ab9 in prepare_sort_from_pathkeys
#2 0x000000000080926c in make_sort_from_pathkeys
#3 0x0000000000801032 in create_sort_plan
#4 0x00000000007fe7e0 in create_plan_recurse
#5 0x0000000000800b2c in create_gather_merge_plan
#6 0x00000000007fe94d in create_plan_recurse
#7 0x0000000000805328 in create_nestloop_plan
#8 0x00000000007ff3c5 in create_join_plan
#9 0x00000000007fe5f8 in create_plan_recurse
#10 0x0000000000800d68 in create_projection_plan
#11 0x00000000007fe662 in create_plan_recurse
#12 0x0000000000801252 in create_upper_unique_plan
#13 0x00000000007fe760 in create_plan_recurse
#14 0x00000000007fe4f2 in create_plan
#15 0x000000000081082f in standard_planner
and the create_sort_plan works with lefttree that is IndexScan, so the
query we're constructing looks like this:
Distinct
-> Nestloop
-> Gather Merge
-> Sort
-> Index Scan
and it's the sort that expects to find the expression in the Index Scan
target list. Which seems rather bogus, because clearly the index scan
does not include the expression. (I wonder if it's somehow related that
indexes can't be built on volatile expressions ...)
Anyway, the index scan clearly does not include the expression the sort
references, hence the failure. And the index can can't compute it,
because we probably need to compute it on top of the join I think
(otherwise we might get duplicate values for volatile functions etc.)
Looking at this from a slightly different angle, the root cause here
seems to be that generate_useful_gather_paths uses the pathkeys it gets
from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
But all other create_gather_merge_calls use root->sort_pathkeys, so
maybe this is the actual problem and get_useful_pathkeys_for_relation
should use root->sort_pathkeys instead. That does fix the issue for me
too (and it passes all regression tests).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Oct 2, 2020 at 2:25 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On Fri, Oct 02, 2020 at 10:55:14AM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
...
I've been able to confirm that the problem goes away if we stop adding
the gather merge paths in generate_useful_gather_paths().I'm not sure yet what conclusion that leads us to. It seems to be that
the biggest clue remains that all of this works correctly unless one
of the selected columns (which happens to be a pathkey at this point
because it's a DISTINCT query) contains a volatile expression.Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
which is calling find_em_expr_for_rel and is happy with anything it
returns. But this was copied from postgres_fdw, which however does a bit
more here:if (pathkey_ec->ec_has_volatile ||
!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
!is_foreign_expr(root, rel, em_expr))So not only does it explicitly check volatility of the pathkey, it also
calls is_foreign_expr which checks the expression for mutable functions.The attached patch seems to fix this, but it only adds the check for
mutable functions. Maybe it should check ec_has_volatile too ...We actually discussed the volatility check in that function back on
the original thread [1], and we'd concluded that was specifically
necessary for the fdw code because the function would execute on two
different servers (and thus produce different results), but that in a
local server only scenario it should be fine.My understanding (correct me if I'm wrong) is that the volatile
function should only be executed once (at the scan level?) to build
the tuple and from then on the datum isn't going to change, so I'm not
sure why the volatility would matter here.James
1: /messages/by-id/20200328025830.6v6ogkseohakc32q@development
Oh, hmm, could what I said all be true, but there still be some rule
that you shouldn't compare datums generated from volatile expressions
in different backends (i.e., parallel query)?I'm not sure it's all that related to parallel query - it's more likely
that when constructing the paths below the gather merge, this new code
fails to do something important.I'm not 100% sure how the grouping and volatile functions work, so let
me think aloud here ...The backtrace looks like this:
#0 get_sortgroupref_tle
#1 0x0000000000808ab9 in prepare_sort_from_pathkeys
#2 0x000000000080926c in make_sort_from_pathkeys
#3 0x0000000000801032 in create_sort_plan
#4 0x00000000007fe7e0 in create_plan_recurse
#5 0x0000000000800b2c in create_gather_merge_plan
#6 0x00000000007fe94d in create_plan_recurse
#7 0x0000000000805328 in create_nestloop_plan
#8 0x00000000007ff3c5 in create_join_plan
#9 0x00000000007fe5f8 in create_plan_recurse
#10 0x0000000000800d68 in create_projection_plan
#11 0x00000000007fe662 in create_plan_recurse
#12 0x0000000000801252 in create_upper_unique_plan
#13 0x00000000007fe760 in create_plan_recurse
#14 0x00000000007fe4f2 in create_plan
#15 0x000000000081082f in standard_plannerand the create_sort_plan works with lefttree that is IndexScan, so the
query we're constructing looks like this:Distinct
-> Nestloop
-> Gather Merge
-> Sort
-> Index Scanand it's the sort that expects to find the expression in the Index Scan
target list. Which seems rather bogus, because clearly the index scan
does not include the expression. (I wonder if it's somehow related that
indexes can't be built on volatile expressions ...)
The index scan does include values not found in the index though,
right? It returns the whole tuple -- though I'm not sure if it
includes calculated values (BTW: is this what is meant by a
"projection" being added? Or is that something else?)
Anyway, the index scan clearly does not include the expression the sort
references, hence the failure. And the index can can't compute it,
because we probably need to compute it on top of the join I think
(otherwise we might get duplicate values for volatile functions etc.)
In this particular query there's no reason why we'd have to wait to
compute it until after the join, right?
For example, if we take away the parallelism but still force a Unique
plan, we get this:
Unique
-> Sort
Sort Key: ref_0.i, (CASE WHEN pg_rotate_logfile_old() THEN
ref_0.t ELSE ref_0.t END)
-> Nested Loop
-> Seq Scan on ref_0
-> Function Scan on generate_series ref_1
And I don't see any reason why the CASE statement couldn't in theory
(I don't know the internals enough to know when it actually happens)
be done as part of the base relation scan (in this case, the seq
scan). It's not dependent on any information from the join.
Furthermore the same plan works correctly for a non-volatile
expression, so that means (I think) that the relation scan is capable
of producing that expression in the tuple it generates (either that or
the sort is generating it?).
Looking at this from a slightly different angle, the root cause here
seems to be that generate_useful_gather_paths uses the pathkeys it gets
from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
But all other create_gather_merge_calls use root->sort_pathkeys, so
maybe this is the actual problem and get_useful_pathkeys_for_relation
should use root->sort_pathkeys instead. That does fix the issue for me
too (and it passes all regression tests).
So I'm not convinced this is the correct way forward. It seems we want
to be able to apply this as broadly as possible, and using
sort_pathkeys here means we wouldn't use it for the DISTINCT case at
all, right?
So far I think excluding volatile expressions (or mutable functions?)
is the best way we've discussed so far, though I'm clear yet on why it
is that that's a necessary restriction. If the answer is "there are
cases where it would be unsafe for parallel query even though it isn't
here...", I'm fine with that, but if so, we should make sure that's in
the code comments/readmes somewhere.
I also verified that the functions I've been playing with
(pg_rotate_logfile_old and md5) are marked parallel safe in pg_proc.
James
On Fri, Oct 02, 2020 at 04:12:11PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 2:25 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 10:55:14AM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
...
I've been able to confirm that the problem goes away if we stop adding
the gather merge paths in generate_useful_gather_paths().I'm not sure yet what conclusion that leads us to. It seems to be that
the biggest clue remains that all of this works correctly unless one
of the selected columns (which happens to be a pathkey at this point
because it's a DISTINCT query) contains a volatile expression.Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
which is calling find_em_expr_for_rel and is happy with anything it
returns. But this was copied from postgres_fdw, which however does a bit
more here:if (pathkey_ec->ec_has_volatile ||
!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
!is_foreign_expr(root, rel, em_expr))So not only does it explicitly check volatility of the pathkey, it also
calls is_foreign_expr which checks the expression for mutable functions.The attached patch seems to fix this, but it only adds the check for
mutable functions. Maybe it should check ec_has_volatile too ...We actually discussed the volatility check in that function back on
the original thread [1], and we'd concluded that was specifically
necessary for the fdw code because the function would execute on two
different servers (and thus produce different results), but that in a
local server only scenario it should be fine.My understanding (correct me if I'm wrong) is that the volatile
function should only be executed once (at the scan level?) to build
the tuple and from then on the datum isn't going to change, so I'm not
sure why the volatility would matter here.James
1: /messages/by-id/20200328025830.6v6ogkseohakc32q@development
Oh, hmm, could what I said all be true, but there still be some rule
that you shouldn't compare datums generated from volatile expressions
in different backends (i.e., parallel query)?I'm not sure it's all that related to parallel query - it's more likely
that when constructing the paths below the gather merge, this new code
fails to do something important.I'm not 100% sure how the grouping and volatile functions work, so let
me think aloud here ...The backtrace looks like this:
#0 get_sortgroupref_tle
#1 0x0000000000808ab9 in prepare_sort_from_pathkeys
#2 0x000000000080926c in make_sort_from_pathkeys
#3 0x0000000000801032 in create_sort_plan
#4 0x00000000007fe7e0 in create_plan_recurse
#5 0x0000000000800b2c in create_gather_merge_plan
#6 0x00000000007fe94d in create_plan_recurse
#7 0x0000000000805328 in create_nestloop_plan
#8 0x00000000007ff3c5 in create_join_plan
#9 0x00000000007fe5f8 in create_plan_recurse
#10 0x0000000000800d68 in create_projection_plan
#11 0x00000000007fe662 in create_plan_recurse
#12 0x0000000000801252 in create_upper_unique_plan
#13 0x00000000007fe760 in create_plan_recurse
#14 0x00000000007fe4f2 in create_plan
#15 0x000000000081082f in standard_plannerand the create_sort_plan works with lefttree that is IndexScan, so the
query we're constructing looks like this:Distinct
-> Nestloop
-> Gather Merge
-> Sort
-> Index Scanand it's the sort that expects to find the expression in the Index Scan
target list. Which seems rather bogus, because clearly the index scan
does not include the expression. (I wonder if it's somehow related that
indexes can't be built on volatile expressions ...)The index scan does include values not found in the index though,
right? It returns the whole tuple -- though I'm not sure if it
includes calculated values (BTW: is this what is meant by a
"projection" being added? Or is that something else?)
AFAIK index scans are projection-capable - at least it seems like that
from is_projection_capable_path. But the volatility blocks that, I think
(see below).
Anyway, the index scan clearly does not include the expression the sort
references, hence the failure. And the index can can't compute it,
because we probably need to compute it on top of the join I think
(otherwise we might get duplicate values for volatile functions etc.)In this particular query there's no reason why we'd have to wait to
compute it until after the join, right?For example, if we take away the parallelism but still force a Unique
plan, we get this:Unique
-> Sort
Sort Key: ref_0.i, (CASE WHEN pg_rotate_logfile_old() THEN
ref_0.t ELSE ref_0.t END)
-> Nested Loop
-> Seq Scan on ref_0
-> Function Scan on generate_series ref_1And I don't see any reason why the CASE statement couldn't in theory
(I don't know the internals enough to know when it actually happens)
be done as part of the base relation scan (in this case, the seq
scan). It's not dependent on any information from the join.
Imagine a query like this:
select t1.id, volatile_func() from t1 join t2 using (id);
and now assume t2 contains duplicate values. If the volatile_func gets
evaluated as part of the t1 scan, then we'll get multiple occurrences
in the results, due to the t2 duplicates. I belive volatile functions
are not expected to behave like that - the docs say:
A query using a volatile function will re-evaluate the function at
every row where its value is needed..
And I assume this references to rows at the level where the function is
used, i.e. after the join.
Furthermore the same plan works correctly for a non-volatile
expression, so that means (I think) that the relation scan is capable
of producing that expression in the tuple it generates (either that or
the sort is generating it?).
I do believe the reason is exactly that non-volatile expressions can be
pushed down, so that we actually find them in the target list of the
node below the sort.
See explains for the second set of queries, where I used a simple
non-volatile expression.
Looking at this from a slightly different angle, the root cause here
seems to be that generate_useful_gather_paths uses the pathkeys it gets
from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
But all other create_gather_merge_calls use root->sort_pathkeys, so
maybe this is the actual problem and get_useful_pathkeys_for_relation
should use root->sort_pathkeys instead. That does fix the issue for me
too (and it passes all regression tests).So I'm not convinced this is the correct way forward. It seems we want
to be able to apply this as broadly as possible, and using
sort_pathkeys here means we wouldn't use it for the DISTINCT case at
all, right?So far I think excluding volatile expressions (or mutable functions?)
is the best way we've discussed so far, though I'm clear yet on why it
is that that's a necessary restriction. If the answer is "there are
cases where it would be unsafe for parallel query even though it isn't
here...", I'm fine with that, but if so, we should make sure that's in
the code comments/readmes somewhere.I also verified that the functions I've been playing with
(pg_rotate_logfile_old and md5) are marked parallel safe in pg_proc.
More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(
Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On Fri, Oct 02, 2020 at 04:12:11PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 2:25 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 10:55:14AM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
...
I've been able to confirm that the problem goes away if we stop adding
the gather merge paths in generate_useful_gather_paths().I'm not sure yet what conclusion that leads us to. It seems to be that
the biggest clue remains that all of this works correctly unless one
of the selected columns (which happens to be a pathkey at this point
because it's a DISTINCT query) contains a volatile expression.Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
which is calling find_em_expr_for_rel and is happy with anything it
returns. But this was copied from postgres_fdw, which however does a bit
more here:if (pathkey_ec->ec_has_volatile ||
!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
!is_foreign_expr(root, rel, em_expr))So not only does it explicitly check volatility of the pathkey, it also
calls is_foreign_expr which checks the expression for mutable functions.The attached patch seems to fix this, but it only adds the check for
mutable functions. Maybe it should check ec_has_volatile too ...We actually discussed the volatility check in that function back on
the original thread [1], and we'd concluded that was specifically
necessary for the fdw code because the function would execute on two
different servers (and thus produce different results), but that in a
local server only scenario it should be fine.My understanding (correct me if I'm wrong) is that the volatile
function should only be executed once (at the scan level?) to build
the tuple and from then on the datum isn't going to change, so I'm not
sure why the volatility would matter here.James
1: /messages/by-id/20200328025830.6v6ogkseohakc32q@development
Oh, hmm, could what I said all be true, but there still be some rule
that you shouldn't compare datums generated from volatile expressions
in different backends (i.e., parallel query)?I'm not sure it's all that related to parallel query - it's more likely
that when constructing the paths below the gather merge, this new code
fails to do something important.I'm not 100% sure how the grouping and volatile functions work, so let
me think aloud here ...The backtrace looks like this:
#0 get_sortgroupref_tle
#1 0x0000000000808ab9 in prepare_sort_from_pathkeys
#2 0x000000000080926c in make_sort_from_pathkeys
#3 0x0000000000801032 in create_sort_plan
#4 0x00000000007fe7e0 in create_plan_recurse
#5 0x0000000000800b2c in create_gather_merge_plan
#6 0x00000000007fe94d in create_plan_recurse
#7 0x0000000000805328 in create_nestloop_plan
#8 0x00000000007ff3c5 in create_join_plan
#9 0x00000000007fe5f8 in create_plan_recurse
#10 0x0000000000800d68 in create_projection_plan
#11 0x00000000007fe662 in create_plan_recurse
#12 0x0000000000801252 in create_upper_unique_plan
#13 0x00000000007fe760 in create_plan_recurse
#14 0x00000000007fe4f2 in create_plan
#15 0x000000000081082f in standard_plannerand the create_sort_plan works with lefttree that is IndexScan, so the
query we're constructing looks like this:Distinct
-> Nestloop
-> Gather Merge
-> Sort
-> Index Scanand it's the sort that expects to find the expression in the Index Scan
target list. Which seems rather bogus, because clearly the index scan
does not include the expression. (I wonder if it's somehow related that
indexes can't be built on volatile expressions ...)The index scan does include values not found in the index though,
right? It returns the whole tuple -- though I'm not sure if it
includes calculated values (BTW: is this what is meant by a
"projection" being added? Or is that something else?)AFAIK index scans are projection-capable - at least it seems like that
from is_projection_capable_path. But the volatility blocks that, I think
(see below).Anyway, the index scan clearly does not include the expression the sort
references, hence the failure. And the index can can't compute it,
because we probably need to compute it on top of the join I think
(otherwise we might get duplicate values for volatile functions etc.)In this particular query there's no reason why we'd have to wait to
compute it until after the join, right?For example, if we take away the parallelism but still force a Unique
plan, we get this:Unique
-> Sort
Sort Key: ref_0.i, (CASE WHEN pg_rotate_logfile_old() THEN
ref_0.t ELSE ref_0.t END)
-> Nested Loop
-> Seq Scan on ref_0
-> Function Scan on generate_series ref_1And I don't see any reason why the CASE statement couldn't in theory
(I don't know the internals enough to know when it actually happens)
be done as part of the base relation scan (in this case, the seq
scan). It's not dependent on any information from the join.Imagine a query like this:
select t1.id, volatile_func() from t1 join t2 using (id);
and now assume t2 contains duplicate values. If the volatile_func gets
evaluated as part of the t1 scan, then we'll get multiple occurrences
in the results, due to the t2 duplicates. I belive volatile functions
are not expected to behave like that - the docs say:A query using a volatile function will re-evaluate the function at
every row where its value is needed..And I assume this references to rows at the level where the function is
used, i.e. after the join.Furthermore the same plan works correctly for a non-volatile
expression, so that means (I think) that the relation scan is capable
of producing that expression in the tuple it generates (either that or
the sort is generating it?).I do believe the reason is exactly that non-volatile expressions can be
pushed down, so that we actually find them in the target list of the
node below the sort.See explains for the second set of queries, where I used a simple
non-volatile expression.Looking at this from a slightly different angle, the root cause here
seems to be that generate_useful_gather_paths uses the pathkeys it gets
from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
But all other create_gather_merge_calls use root->sort_pathkeys, so
maybe this is the actual problem and get_useful_pathkeys_for_relation
should use root->sort_pathkeys instead. That does fix the issue for me
too (and it passes all regression tests).So I'm not convinced this is the correct way forward. It seems we want
to be able to apply this as broadly as possible, and using
sort_pathkeys here means we wouldn't use it for the DISTINCT case at
all, right?So far I think excluding volatile expressions (or mutable functions?)
is the best way we've discussed so far, though I'm clear yet on why it
is that that's a necessary restriction. If the answer is "there are
cases where it would be unsafe for parallel query even though it isn't
here...", I'm fine with that, but if so, we should make sure that's in
the code comments/readmes somewhere.I also verified that the functions I've been playing with
(pg_rotate_logfile_old and md5) are marked parallel safe in pg_proc.More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).
So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.
James
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
And I don't see any reason why the CASE statement couldn't in theory
(I don't know the internals enough to know when it actually happens)
be done as part of the base relation scan (in this case, the seq
scan). It's not dependent on any information from the join.Imagine a query like this:
select t1.id, volatile_func() from t1 join t2 using (id);
and now assume t2 contains duplicate values. If the volatile_func gets
evaluated as part of the t1 scan, then we'll get multiple occurrences
in the results, due to the t2 duplicates. I belive volatile functions
are not expected to behave like that - the docs say:A query using a volatile function will re-evaluate the function at
every row where its value is needed..And I assume this references to rows at the level where the function is
used, i.e. after the join.
Ah, this makes sense. I was thinking exactly the opposite: that you'd
want not to execute volatile functions multiple times for the same
base rel row, but now that you describe it it makes sense why they
shouldn't be.
James
On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:...
More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.
Yes, it does fix all the three queries in the SQL script.
The question however is whether this is the root issue, and whether it's
the right way to fix it. For example - volatility is not the only reason
that may block qual pushdown. If you look at qual_is_pushdown_safe, it
also blocks pushdown of leaky functions in security_barrier views. So I
wonder if that could cause failures too, somehow. But I haven't managed
to create such example.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:...
More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.Yes, it does fix all the three queries in the SQL script.
The question however is whether this is the root issue, and whether it's
the right way to fix it. For example - volatility is not the only reason
that may block qual pushdown. If you look at qual_is_pushdown_safe, it
also blocks pushdown of leaky functions in security_barrier views. So I
wonder if that could cause failures too, somehow. But I haven't managed
to create such example.
I was about to say that the issue here is slightly different from qual
etc. pushdown, since we're not concerned about quals here, and so I
wonder where we determine what target list entries to put in a given
scan path, but then I realized that implies (maybe!) a simpler
solution. Instead of duplicating checks on target list entries would
be safe, why not check directly in get_useful_pathkeys_for_relation()
whether or not the pathkey has a target list entry?
I haven't been able to try that out yet, and so maybe I'm missing
something, but I need to step out for a bit, so I'll have to look at
it later.
James
On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:...
More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.Yes, it does fix all the three queries in the SQL script.
The question however is whether this is the root issue, and whether it's
the right way to fix it. For example - volatility is not the only reason
that may block qual pushdown. If you look at qual_is_pushdown_safe, it
also blocks pushdown of leaky functions in security_barrier views. So I
wonder if that could cause failures too, somehow. But I haven't managed
to create such example.I was about to say that the issue here is slightly different from qual
etc. pushdown, since we're not concerned about quals here, and so I
wonder where we determine what target list entries to put in a given
scan path, but then I realized that implies (maybe!) a simpler
solution. Instead of duplicating checks on target list entries would
be safe, why not check directly in get_useful_pathkeys_for_relation()
whether or not the pathkey has a target list entry?I haven't been able to try that out yet, and so maybe I'm missing
something, but I need to step out for a bit, so I'll have to look at
it later.
I've started poking at this, but haven't yet found a workable
solution. See the attached patch which "solves" the problem by
breaking putting the sort under the gather merge, but it at least
demonstrates conceptually what I think we're interested in doing.
The issue currently is that the comparison of expressions fails -- in
our query, the first column selected shows up as a Var (with the same
contents) but is a different pointer in the em_expr and the reltarget
exprs list. I don't yet see a good way to get the equivalence class
for a PathTarget entry.
James
Attachments:
incremental_sort_fix_expr_lookup_idea.patchtext/x-patch; charset=US-ASCII; name=incremental_sort_fix_expr_lookup_idea.patchDownload
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index b68a5a0ec7..454b3abe5c 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -777,8 +777,17 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
foreach(lc_em, ec->ec_members)
{
EquivalenceMember *em = lfirst(lc_em);
+ PathTarget *target = rel->reltarget;
+ ListCell *lc_expr;
+ bool foundExpr = false;
- if (bms_is_subset(em->em_relids, rel->relids) &&
+ foreach(lc_expr, target->exprs)
+ {
+ if (lfirst(lc_expr) == em->em_expr)
+ foundExpr = true;
+ }
+
+ if (foundExpr && bms_is_subset(em->em_relids, rel->relids) &&
!bms_is_empty(em->em_relids))
{
/*
On Fri, Oct 2, 2020 at 2:25 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
The backtrace looks like this:
#0 get_sortgroupref_tle
#1 0x0000000000808ab9 in prepare_sort_from_pathkeys
#2 0x000000000080926c in make_sort_from_pathkeys
#3 0x0000000000801032 in create_sort_plan
#4 0x00000000007fe7e0 in create_plan_recurse
#5 0x0000000000800b2c in create_gather_merge_plan
#6 0x00000000007fe94d in create_plan_recurse
#7 0x0000000000805328 in create_nestloop_plan
#8 0x00000000007ff3c5 in create_join_plan
#9 0x00000000007fe5f8 in create_plan_recurse
#10 0x0000000000800d68 in create_projection_plan
#11 0x00000000007fe662 in create_plan_recurse
#12 0x0000000000801252 in create_upper_unique_plan
#13 0x00000000007fe760 in create_plan_recurse
#14 0x00000000007fe4f2 in create_plan
#15 0x000000000081082f in standard_plannerand the create_sort_plan works with lefttree that is IndexScan, so the
query we're constructing looks like this:Distinct
-> Nestloop
-> Gather Merge
-> Sort
-> Index Scanand it's the sort that expects to find the expression in the Index Scan
target list. Which seems rather bogus, because clearly the index scan
does not include the expression. (I wonder if it's somehow related that
indexes can't be built on volatile expressions ...)Anyway, the index scan clearly does not include the expression the sort
references, hence the failure. And the index can can't compute it,
because we probably need to compute it on top of the join I think
(otherwise we might get duplicate values for volatile functions etc.)Looking at this from a slightly different angle, the root cause here
seems to be that generate_useful_gather_paths uses the pathkeys it gets
from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
But all other create_gather_merge_calls use root->sort_pathkeys, so
maybe this is the actual problem and get_useful_pathkeys_for_relation
should use root->sort_pathkeys instead. That does fix the issue for me
too (and it passes all regression tests).
So I've been a bit confused how our error could come from working with
root->query_pathkeys when that's what's supposedly being set from the
make_pathkeys_for_sortclauses() call in the backtrace Jaime reported,
but I just realized that the trace I get when reproducing the error is
different -- and matches the one you shared above.
Jaime: was the backtrace in the original report by any chance record
from breakpointing in the first call to get_sortgroupref_tle() (and
one that successfully returned a sort group ref) rather than a call
that hit the elog error on line 379?
James
On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:...
More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.Yes, it does fix all the three queries in the SQL script.
The question however is whether this is the root issue, and whether it's
the right way to fix it. For example - volatility is not the only reason
that may block qual pushdown. If you look at qual_is_pushdown_safe, it
also blocks pushdown of leaky functions in security_barrier views. So I
wonder if that could cause failures too, somehow. But I haven't managed
to create such example.I was about to say that the issue here is slightly different from qual
etc. pushdown, since we're not concerned about quals here, and so I
wonder where we determine what target list entries to put in a given
scan path, but then I realized that implies (maybe!) a simpler
solution. Instead of duplicating checks on target list entries would
be safe, why not check directly in get_useful_pathkeys_for_relation()
whether or not the pathkey has a target list entry?I haven't been able to try that out yet, and so maybe I'm missing
something, but I need to step out for a bit, so I'll have to look at
it later.I've started poking at this, but haven't yet found a workable
solution. See the attached patch which "solves" the problem by
breaking putting the sort under the gather merge, but it at least
demonstrates conceptually what I think we're interested in doing.The issue currently is that the comparison of expressions fails -- in
our query, the first column selected shows up as a Var (with the same
contents) but is a different pointer in the em_expr and the reltarget
exprs list. I don't yet see a good way to get the equivalence class
for a PathTarget entry.
Hmm, I think I was looking to do is the attached patch. I didn't
realize until I did a lot of reading through source that we have an
equal() function that can compare exprs. That (plus the realization in
[1]: the originally reported backtrace wasn't where the error actually came from) convinced me that what we need is to confirm not only that the all of the vars in the ec member come from the relids in the rel but also that the expr is actually represented in the target of the rel.
came from) convinced me that what we need is to confirm not only that
the all of the vars in the ec member come from the relids in the rel
but also that the expr is actually represented in the target of the
rel.
With the gucs changed as I mentioned earlier both of the plans (with
and without a volatile call in the 2nd select entry) now look like:
Unique
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Nested Loop
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1
Without the gucs changed the minimal repro case now doesn't error, but
results in this plan:
HashAggregate
Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
ELSE ref_0.t END
-> Nested Loop
-> Seq Scan on ref_0
-> Function Scan on generate_series ref_1
Similarly in your six queries I now only see parallel query showing up
in the last one.
I created an entirely new function because adding the target expr
lookup to the existing find_em_expr_for_rel() function broke a bunch
of postgres_fdw tests. That maybe raises questions about whether that
code also could have problems in theory/in the future, but I didn't
investigate further. In any case we already know it excludes
volatile...so maybe it's fine because in practice that's actually a
broader exclusion than what we're doing here.
This seems to fix the issue, but I'd like feedback on whether it's too
strict. We could of course just check em_has_volatile, but I'm
wondering if that's simultaneously too strict (by not allowing the
volatile expression to be computed in the gather merge supposing
there's no join) and too loose (maybe there are other cases we should
care about?). It also just strikes me as re-encoding rules that should
have already been applied (and thus we should be able to look up in
the data we have if it's safe to use the expr/pathkey). Those are
mostly intuitions though.
James
1: /messages/by-id/CAAaqYe8zqDAv0Sfak5Riu+DKsm-i3ARPursn5v6qTwiCXmkXKQ@mail.gmail.com
Attachments:
v1-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patchDownload
From 1b2d39d2119ab6bd09f6b984b4d3a7f46be090e9 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Sat, 3 Oct 2020 10:35:47 -0400
Subject: [PATCH v1] Fix get_useful_pathkeys_for_relation for volatile exprs
below joins
It's not sufficient to find an ec member whose Vars are all in the rel
we're working with; volatile expressions in particular present a case
where we also need to know if the rel's target contains the expression
or not before we can assume the pathkey for that ec is safe to use at
this point in the query. If the pathkey expr is volatile and we're
below a join, then the expr can't appear in the target until we're above
the join, so we can't sort with it yet.
---
src/backend/optimizer/path/allpaths.c | 11 +++----
src/backend/optimizer/path/equivclass.c | 38 +++++++++++++++++++++++++
src/include/optimizer/paths.h | 1 +
3 files changed, 45 insertions(+), 5 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b399592ff8..53050b825b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2760,7 +2760,8 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
/*
* Considering query_pathkeys is always worth it, because it might allow
* us to avoid a total sort when we have a partially presorted path
- * available.
+ * available or to push the total sort into the parallel portion of the
+ * query.
*/
if (root->query_pathkeys)
{
@@ -2773,17 +2774,17 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
/*
- * We can only build an Incremental Sort for pathkeys which
- * contain an EC member in the current relation, so ignore any
+ * We can only build a sort for pathkeys which
+ * contain an EC member in the current relation's target, so ignore any
* suffix of the list as soon as we find a pathkey without an EC
- * member the relation.
+ * member in the relation.
*
* By still returning the prefix of the pathkeys list that does
* meet criteria of EC membership in the current relation, we
* enable not just an incremental sort on the entirety of
* query_pathkeys but also incremental sort below a JOIN.
*/
- if (!find_em_expr_for_rel(pathkey_ec, rel))
+ if (!find_em_expr_for_rel_target(pathkey_ec, rel))
break;
npathkeys++;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index b68a5a0ec7..6cc358e60d 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -794,6 +794,44 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
return NULL;
}
+/*
+ * Find an equivalence class member expression that is in the
+ * the indicated relation's target.
+ */
+Expr *
+find_em_expr_for_rel_target(EquivalenceClass *ec, RelOptInfo *rel)
+{
+ ListCell *lc_em;
+
+ foreach(lc_em, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst(lc_em);
+ PathTarget *target = rel->reltarget;
+ ListCell *lc_expr;
+ bool has_target_expr = false;
+
+ foreach(lc_expr, target->exprs)
+ {
+ if (equal(lfirst(lc_expr), em->em_expr))
+ has_target_expr = true;
+ }
+
+ if (has_target_expr && bms_is_subset(em->em_relids, rel->relids) &&
+ !bms_is_empty(em->em_relids))
+ {
+ /*
+ * If there is more than one equivalence member whose Vars are
+ * taken entirely from this relation, we'll be content to choose
+ * any one of those.
+ */
+ return em->em_expr;
+ }
+ }
+
+ /* We didn't find any suitable equivalence class expression */
+ return NULL;
+}
+
/*
* generate_base_implied_equalities
* Generate any restriction clauses that we can deduce from equivalence
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..6ddb54f415 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -135,6 +135,7 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
Relids rel,
bool create_it);
extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
+extern Expr *find_em_expr_for_rel_target(EquivalenceClass *ec, RelOptInfo *rel);
extern void generate_base_implied_equalities(PlannerInfo *root);
extern List *generate_join_implied_equalities(PlannerInfo *root,
Relids join_relids,
--
2.17.1
On Sat, 3 Oct 2020 at 08:15, James Coleman <jtc331@gmail.com> wrote:
Jaime: was the backtrace in the original report by any chance record
from breakpointing in the first call to get_sortgroupref_tle() (and
one that successfully returned a sort group ref) rather than a call
that hit the elog error on line 379?
I remember the first backtrace a made for this was on the line of the
elog but then a second backtrace was with the function, now that
repeat it with line of the function for the elog (and the original
query) i see again create_incrementalsort_plan() which made me try
with enable_incremental_sort disabled...
sorry for that, attached a better backtrace
--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:...
More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.Yes, it does fix all the three queries in the SQL script.
The question however is whether this is the root issue, and whether it's
the right way to fix it. For example - volatility is not the only reason
that may block qual pushdown. If you look at qual_is_pushdown_safe, it
also blocks pushdown of leaky functions in security_barrier views. So I
wonder if that could cause failures too, somehow. But I haven't managed
to create such example.I was about to say that the issue here is slightly different from qual
etc. pushdown, since we're not concerned about quals here, and so I
wonder where we determine what target list entries to put in a given
scan path, but then I realized that implies (maybe!) a simpler
solution. Instead of duplicating checks on target list entries would
be safe, why not check directly in get_useful_pathkeys_for_relation()
whether or not the pathkey has a target list entry?I haven't been able to try that out yet, and so maybe I'm missing
something, but I need to step out for a bit, so I'll have to look at
it later.I've started poking at this, but haven't yet found a workable
solution. See the attached patch which "solves" the problem by
breaking putting the sort under the gather merge, but it at least
demonstrates conceptually what I think we're interested in doing.The issue currently is that the comparison of expressions fails -- in
our query, the first column selected shows up as a Var (with the same
contents) but is a different pointer in the em_expr and the reltarget
exprs list. I don't yet see a good way to get the equivalence class
for a PathTarget entry.Hmm, I think I was looking to do is the attached patch. I didn't
realize until I did a lot of reading through source that we have an
equal() function that can compare exprs. That (plus the realization in
[1] the originally reported backtrace wasn't where the error actually
came from) convinced me that what we need is to confirm not only that
the all of the vars in the ec member come from the relids in the rel
but also that the expr is actually represented in the target of the
rel.With the gucs changed as I mentioned earlier both of the plans (with
and without a volatile call in the 2nd select entry) now look like:Unique
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Nested Loop
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Without the gucs changed the minimal repro case now doesn't error, but
results in this plan:HashAggregate
Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
ELSE ref_0.t END
-> Nested Loop
-> Seq Scan on ref_0
-> Function Scan on generate_series ref_1Similarly in your six queries I now only see parallel query showing up
in the last one.
OK, that seems reasonable I think.
I created an entirely new function because adding the target expr
lookup to the existing find_em_expr_for_rel() function broke a bunch
of postgres_fdw tests. That maybe raises questions about whether that
code also could have problems in theory/in the future, but I didn't
investigate further. In any case we already know it excludes
volatile...so maybe it's fine because in practice that's actually a
broader exclusion than what we're doing here.
I don't think postgres_fdw needs these new checks, because FDW scan's
are not allowed in parallel part of the plan - if that changes in the
future, I'm sure that'll require fixes in plenty other places.
OTOH it's interesting that it triggers those failures - I wonder if we
could learn something from them? AFAICS those paths can't be built by
generate_useful_gather_paths (because of the postgres_fdw vs. parallel
query restrictions), so how do these plans look?
This seems to fix the issue, but I'd like feedback on whether it's too
strict. We could of course just check em_has_volatile, but I'm
wondering if that's simultaneously too strict (by not allowing the
volatile expression to be computed in the gather merge supposing
there's no join) and too loose (maybe there are other cases we should
care about?). It also just strikes me as re-encoding rules that should
have already been applied (and thus we should be able to look up in
the data we have if it's safe to use the expr/pathkey). Those are
mostly intuitions though.
I don't know :-( As I mentioned before, I suspect checking just the
volatility may not be enough in some cases (leakyness, etc.) and I think
you're right it may be too strict in other cases.
Not sure I understand which rules you think we're re-encoding, but I
have a nagging feeling there's a piece of code somewhere earlier in
the query planner meant to prevent such cases (sort on path without all
the pathkeys), and that fixing it here is just a band aid :-(
I'm sure we're building plans with Sort on top of Index Scan paths, so
how come those don't fail? How come the non-parallel version of these
queries don't fail?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:...
More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.Yes, it does fix all the three queries in the SQL script.
The question however is whether this is the root issue, and whether it's
the right way to fix it. For example - volatility is not the only reason
that may block qual pushdown. If you look at qual_is_pushdown_safe, it
also blocks pushdown of leaky functions in security_barrier views. So I
wonder if that could cause failures too, somehow. But I haven't managed
to create such example.I was about to say that the issue here is slightly different from qual
etc. pushdown, since we're not concerned about quals here, and so I
wonder where we determine what target list entries to put in a given
scan path, but then I realized that implies (maybe!) a simpler
solution. Instead of duplicating checks on target list entries would
be safe, why not check directly in get_useful_pathkeys_for_relation()
whether or not the pathkey has a target list entry?I haven't been able to try that out yet, and so maybe I'm missing
something, but I need to step out for a bit, so I'll have to look at
it later.I've started poking at this, but haven't yet found a workable
solution. See the attached patch which "solves" the problem by
breaking putting the sort under the gather merge, but it at least
demonstrates conceptually what I think we're interested in doing.The issue currently is that the comparison of expressions fails -- in
our query, the first column selected shows up as a Var (with the same
contents) but is a different pointer in the em_expr and the reltarget
exprs list. I don't yet see a good way to get the equivalence class
for a PathTarget entry.Hmm, I think I was looking to do is the attached patch. I didn't
realize until I did a lot of reading through source that we have an
equal() function that can compare exprs. That (plus the realization in
[1] the originally reported backtrace wasn't where the error actually
came from) convinced me that what we need is to confirm not only that
the all of the vars in the ec member come from the relids in the rel
but also that the expr is actually represented in the target of the
rel.With the gucs changed as I mentioned earlier both of the plans (with
and without a volatile call in the 2nd select entry) now look like:Unique
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Nested Loop
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Without the gucs changed the minimal repro case now doesn't error, but
results in this plan:HashAggregate
Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
ELSE ref_0.t END
-> Nested Loop
-> Seq Scan on ref_0
-> Function Scan on generate_series ref_1Similarly in your six queries I now only see parallel query showing up
in the last one.OK, that seems reasonable I think.
If we proceed with this patch I'd like to tweak it to check for the
relids subset first on the theory that it will be cheaper than the
expr equality check loop; that change is attached.
I created an entirely new function because adding the target expr
lookup to the existing find_em_expr_for_rel() function broke a bunch
of postgres_fdw tests. That maybe raises questions about whether that
code also could have problems in theory/in the future, but I didn't
investigate further. In any case we already know it excludes
volatile...so maybe it's fine because in practice that's actually a
broader exclusion than what we're doing here.I don't think postgres_fdw needs these new checks, because FDW scan's
are not allowed in parallel part of the plan - if that changes in the
future, I'm sure that'll require fixes in plenty other places.OTOH it's interesting that it triggers those failures - I wonder if we
could learn something from them? AFAICS those paths can't be built by
generate_useful_gather_paths (because of the postgres_fdw vs. parallel
query restrictions), so how do these plans look?
Well the fdw code doesn't trigger the errors, but both
generate_useful_gather_paths() and the fdw code originally relied on
find_em_expr_for_rel(), but now they're diverging. IIRC the fdw code
also uses it for determining what path keys are valid -- in that case
which ones are safe to be sent to the foreign server (so all of the
vars have to be from rels on the foreign server). The fdw code has
additional checks too though, for example no volatile expressions may
be pushed to the remote.
This seems to fix the issue, but I'd like feedback on whether it's too
strict. We could of course just check em_has_volatile, but I'm
wondering if that's simultaneously too strict (by not allowing the
volatile expression to be computed in the gather merge supposing
there's no join) and too loose (maybe there are other cases we should
care about?). It also just strikes me as re-encoding rules that should
have already been applied (and thus we should be able to look up in
the data we have if it's safe to use the expr/pathkey). Those are
mostly intuitions though.I don't know :-( As I mentioned before, I suspect checking just the
volatility may not be enough in some cases (leakyness, etc.) and I think
you're right it may be too strict in other cases.Not sure I understand which rules you think we're re-encoding, but I
have a nagging feeling there's a piece of code somewhere earlier in
the query planner meant to prevent such cases (sort on path without all
the pathkeys), and that fixing it here is just a band aid :-(
I'm getting at exactly what you're saying below: there's got to be
some existing rules (and presumably code already enforcing it in some
places) at what level in the path tree a given pathkey is
safe/correct, and we don't want to reinvent those rules (or ideally
that code). Verifying the expr is in the rel seems to me to be
intuitively correct, but I wish there was another place in the code
that could validate that.
I'm sure we're building plans with Sort on top of Index Scan paths, so
how come those don't fail? How come the non-parallel version of these
queries don't fail?
Yeah, exactly. Something has to be doing something similar -- I don't
think everywhere else using sort_pathkeys instead of query_pathkeys
really changes the problem. I've read through all of the places we use
generate_useful_gather_paths() as well as root->sort_pathkeys to try
to get a better understanding of this. Most of the places I convinced
myself it's already safe -- for example places like
create_orderd_paths are working with the upper rel, and the full
target list is available. A few other places I added comments (see the
"questions" patch in the attached series) where I'm not sure why we
apply projections *after* we create a sort path; given the issue we're
discussing here I would have thought you'd want to do exactly the
opposite.
I haven't yet tried to read through the code in other places where we
build an explicit sort to see if I can find any hints as to what
existing code does to ensure this situation doesn't occur, but so far
I haven't found anything else obvious.
Is there someone who might be able to point us in the right
direction/validate what we're thinking?
James
Attachments:
v2-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patchDownload
From e7b94d4efda41becb2f28123eca2db2a88d0d4d0 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Sat, 3 Oct 2020 10:35:47 -0400
Subject: [PATCH v2 1/2] Fix get_useful_pathkeys_for_relation for volatile
exprs below joins
It's not sufficient to find an ec member whose Vars are all in the rel
we're working with; volatile expressions in particular present a case
where we also need to know if the rel's target contains the expression
or not before we can assume the pathkey for that ec is safe to use at
this point in the query. If the pathkey expr is volatile and we're
below a join, then the expr can't appear in the target until we're above
the join, so we can't sort with it yet.
---
src/backend/optimizer/path/allpaths.c | 11 ++++---
src/backend/optimizer/path/equivclass.c | 43 +++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 9 ++++--
src/include/optimizer/paths.h | 1 +
4 files changed, 57 insertions(+), 7 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b399592ff8..53050b825b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2760,7 +2760,8 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
/*
* Considering query_pathkeys is always worth it, because it might allow
* us to avoid a total sort when we have a partially presorted path
- * available.
+ * available or to push the total sort into the parallel portion of the
+ * query.
*/
if (root->query_pathkeys)
{
@@ -2773,17 +2774,17 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
/*
- * We can only build an Incremental Sort for pathkeys which
- * contain an EC member in the current relation, so ignore any
+ * We can only build a sort for pathkeys which
+ * contain an EC member in the current relation's target, so ignore any
* suffix of the list as soon as we find a pathkey without an EC
- * member the relation.
+ * member in the relation.
*
* By still returning the prefix of the pathkeys list that does
* meet criteria of EC membership in the current relation, we
* enable not just an incremental sort on the entirety of
* query_pathkeys but also incremental sort below a JOIN.
*/
- if (!find_em_expr_for_rel(pathkey_ec, rel))
+ if (!find_em_expr_for_rel_target(pathkey_ec, rel))
break;
npathkeys++;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index b68a5a0ec7..444c805dfb 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -794,6 +794,49 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
return NULL;
}
+/*
+ * Find an equivalence class member expression that is in the
+ * the indicated relation's target.
+ */
+Expr *
+find_em_expr_for_rel_target(EquivalenceClass *ec, RelOptInfo *rel)
+{
+ ListCell *lc_em;
+
+ foreach(lc_em, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst(lc_em);
+ PathTarget *target = rel->reltarget;
+ ListCell *lc_expr;
+
+ /*
+ * First all fo the Vars in the equivalence member have to be taken
+ * entirely from this relation.
+ */
+ if (bms_is_subset(em->em_relids, rel->relids) &&
+ !bms_is_empty(em->em_relids))
+ {
+ /*
+ * Then we must verify that the equivalence member's expr is
+ * generated in the relation's target.
+ */
+ foreach(lc_expr, target->exprs)
+ {
+ /*
+ * If there is more than one equivalence member matching these
+ * requirements we'll be content to choose any one of them.
+ */
+ if (equal(lfirst(lc_expr), em->em_expr))
+ return em->em_expr;
+ }
+
+ }
+ }
+
+ /* We didn't find any suitable equivalence class expression */
+ return NULL;
+}
+
/*
* generate_base_implied_equalities
* Generate any restriction clauses that we can deduce from equivalence
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f331f82a6c..a48721d2fc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7417,7 +7417,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* current reltarget is. We don't do this in the case where the
* target is parallel-safe, since we will be able to generate superior
* paths by doing it after the final scan/join target has been
- * applied.
+ * applied. Since the target isn't parallel safe, we don't need
+ * to apply projections to the partial paths before building
+ * gather paths.
*/
generate_useful_gather_paths(root, rel, false);
@@ -7570,7 +7572,10 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* if the relation is parallel safe, and we don't do it for child rels to
* avoid creating multiple Gather nodes within the same plan. We must do
* this after all paths have been generated and before set_cheapest, since
- * one of the generated paths may turn out to be the cheapest one.
+ * one of the generated paths may turn out to be the cheapest one. We've
+ * already applied any necessary projections to the partial paths above
+ * so any Gather Merge paths will be able to make use of path keys in
+ * requested target..
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
generate_useful_gather_paths(root, rel, false);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..6ddb54f415 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -135,6 +135,7 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
Relids rel,
bool create_it);
extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
+extern Expr *find_em_expr_for_rel_target(EquivalenceClass *ec, RelOptInfo *rel);
extern void generate_base_implied_equalities(PlannerInfo *root);
extern List *generate_join_implied_equalities(PlannerInfo *root,
Relids join_relids,
--
2.17.1
v2-0002-questions.patchtext/x-patch; charset=US-ASCII; name=v2-0002-questions.patchDownload
From 98d98ddab02c36bf7ed2faa614c86f29d487d929 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Sat, 3 Oct 2020 22:05:08 -0400
Subject: [PATCH v2 2/2] questions
---
src/backend/optimizer/plan/planner.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a48721d2fc..45463946b5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5021,6 +5021,10 @@ create_ordered_paths(PlannerInfo *root,
root->sort_pathkeys,
limit_tuples);
/* Add projection step if needed */
+ /* TODO: why don't we apply the projection to the path
+ * before sorting? Is it because it's already been done
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
@@ -5051,6 +5055,10 @@ create_ordered_paths(PlannerInfo *root,
limit_tuples);
/* Add projection step if needed */
+ /* TODO: why don't we apply the projection to the path
+ * before sorting? Is it because it's already been done
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
@@ -5102,6 +5110,10 @@ create_ordered_paths(PlannerInfo *root,
&total_groups);
/* Add projection step if needed */
+ /* TODO: why can't we apply the projection to the partial
+ * path? Is it because it's already been done if possible
+ * by apply_scanjoin_target_to_paths?
+ */
if (path->pathtarget != target)
path = apply_projection_to_path(root, ordered_rel,
path, target);
@@ -5163,6 +5175,10 @@ create_ordered_paths(PlannerInfo *root,
&total_groups);
/* Add projection step if needed */
+ /* TODO: why can't we apply the projection to the partial
+ * path? Is it because it's already been done if possible
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
--
2.17.1
On Sat, Oct 3, 2020 at 10:10 PM James Coleman <jtc331@gmail.com> wrote:
On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:...
More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.Yes, it does fix all the three queries in the SQL script.
The question however is whether this is the root issue, and whether it's
the right way to fix it. For example - volatility is not the only reason
that may block qual pushdown. If you look at qual_is_pushdown_safe, it
also blocks pushdown of leaky functions in security_barrier views. So I
wonder if that could cause failures too, somehow. But I haven't managed
to create such example.I was about to say that the issue here is slightly different from qual
etc. pushdown, since we're not concerned about quals here, and so I
wonder where we determine what target list entries to put in a given
scan path, but then I realized that implies (maybe!) a simpler
solution. Instead of duplicating checks on target list entries would
be safe, why not check directly in get_useful_pathkeys_for_relation()
whether or not the pathkey has a target list entry?I haven't been able to try that out yet, and so maybe I'm missing
something, but I need to step out for a bit, so I'll have to look at
it later.I've started poking at this, but haven't yet found a workable
solution. See the attached patch which "solves" the problem by
breaking putting the sort under the gather merge, but it at least
demonstrates conceptually what I think we're interested in doing.The issue currently is that the comparison of expressions fails -- in
our query, the first column selected shows up as a Var (with the same
contents) but is a different pointer in the em_expr and the reltarget
exprs list. I don't yet see a good way to get the equivalence class
for a PathTarget entry.Hmm, I think I was looking to do is the attached patch. I didn't
realize until I did a lot of reading through source that we have an
equal() function that can compare exprs. That (plus the realization in
[1] the originally reported backtrace wasn't where the error actually
came from) convinced me that what we need is to confirm not only that
the all of the vars in the ec member come from the relids in the rel
but also that the expr is actually represented in the target of the
rel.With the gucs changed as I mentioned earlier both of the plans (with
and without a volatile call in the 2nd select entry) now look like:Unique
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Nested Loop
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Without the gucs changed the minimal repro case now doesn't error, but
results in this plan:HashAggregate
Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
ELSE ref_0.t END
-> Nested Loop
-> Seq Scan on ref_0
-> Function Scan on generate_series ref_1Similarly in your six queries I now only see parallel query showing up
in the last one.OK, that seems reasonable I think.
If we proceed with this patch I'd like to tweak it to check for the
relids subset first on the theory that it will be cheaper than the
expr equality check loop; that change is attached.I created an entirely new function because adding the target expr
lookup to the existing find_em_expr_for_rel() function broke a bunch
of postgres_fdw tests. That maybe raises questions about whether that
code also could have problems in theory/in the future, but I didn't
investigate further. In any case we already know it excludes
volatile...so maybe it's fine because in practice that's actually a
broader exclusion than what we're doing here.I don't think postgres_fdw needs these new checks, because FDW scan's
are not allowed in parallel part of the plan - if that changes in the
future, I'm sure that'll require fixes in plenty other places.OTOH it's interesting that it triggers those failures - I wonder if we
could learn something from them? AFAICS those paths can't be built by
generate_useful_gather_paths (because of the postgres_fdw vs. parallel
query restrictions), so how do these plans look?Well the fdw code doesn't trigger the errors, but both
generate_useful_gather_paths() and the fdw code originally relied on
find_em_expr_for_rel(), but now they're diverging. IIRC the fdw code
also uses it for determining what path keys are valid -- in that case
which ones are safe to be sent to the foreign server (so all of the
vars have to be from rels on the foreign server). The fdw code has
additional checks too though, for example no volatile expressions may
be pushed to the remote.This seems to fix the issue, but I'd like feedback on whether it's too
strict. We could of course just check em_has_volatile, but I'm
wondering if that's simultaneously too strict (by not allowing the
volatile expression to be computed in the gather merge supposing
there's no join) and too loose (maybe there are other cases we should
care about?). It also just strikes me as re-encoding rules that should
have already been applied (and thus we should be able to look up in
the data we have if it's safe to use the expr/pathkey). Those are
mostly intuitions though.I don't know :-( As I mentioned before, I suspect checking just the
volatility may not be enough in some cases (leakyness, etc.) and I think
you're right it may be too strict in other cases.Not sure I understand which rules you think we're re-encoding, but I
have a nagging feeling there's a piece of code somewhere earlier in
the query planner meant to prevent such cases (sort on path without all
the pathkeys), and that fixing it here is just a band aid :-(I'm getting at exactly what you're saying below: there's got to be
some existing rules (and presumably code already enforcing it in some
places) at what level in the path tree a given pathkey is
safe/correct, and we don't want to reinvent those rules (or ideally
that code). Verifying the expr is in the rel seems to me to be
intuitively correct, but I wish there was another place in the code
that could validate that.I'm sure we're building plans with Sort on top of Index Scan paths, so
how come those don't fail? How come the non-parallel version of these
queries don't fail?Yeah, exactly. Something has to be doing something similar -- I don't
think everywhere else using sort_pathkeys instead of query_pathkeys
really changes the problem. I've read through all of the places we use
generate_useful_gather_paths() as well as root->sort_pathkeys to try
to get a better understanding of this. Most of the places I convinced
myself it's already safe -- for example places like
create_orderd_paths are working with the upper rel, and the full
target list is available. A few other places I added comments (see the
"questions" patch in the attached series) where I'm not sure why we
apply projections *after* we create a sort path; given the issue we're
discussing here I would have thought you'd want to do exactly the
opposite.I haven't yet tried to read through the code in other places where we
build an explicit sort to see if I can find any hints as to what
existing code does to ensure this situation doesn't occur, but so far
I haven't found anything else obvious.
I did a lot more reading through code, and I concluded that generally
(maybe exclusively?) create_sort_path() is only called on upper rels
(e.g., from create_ordered_paths()), so we already have the full
target list available. I also stepped through in the debugger and
confirmed that non-var exprs generally don't seem to show up in the
pathtarget of paths either on base rels or on join rels. Finally, the
last piece of relevant data is that we don't actually push down sorts
on non-parallel paths even when it would be beneficial. For example,
this query:
select i, md5(i::text) from t2, generate_series(1,2) order by i;
generates this plan:
Sort
Sort Key: t2.i
-> Nested Loop
-> Function Scan on generate_series
-> Materialize
-> Seq Scan on t2
but it seems obvious to me that it would be better to move the sort to
immediately above the seq scan.
So I conclude from all of this that the real reason we didn't see this
problem before was that we didn't have anything that was trying to
generate useful sort paths lower in the query, so it just worked. To
state it from the other direction, I don't see any indication that
there actually is any code trying to determine what's safe to put in
the target at various join levels; from what I can tell only vars are
distributed at those points. Thinking about it some more, this
actually makes sense, since get_useful_pathkeys() is new code, and
something like it would have been needed already if we wanted to push
sorts down further into the plan.
As a bit of a side note: combined with the above idea about pushing
sort down, it could also be beneficial to try to distribute
expressions lower too.
So I think the previously attached patch is probably correct; the
second patch with the comment questions shows I still don't fully
understand how/when projections are applied (and when they're needed),
though it might have something to do with what I said about only
operating on upper rels.
James
On Sun, Oct 4, 2020 at 9:40 PM James Coleman <jtc331@gmail.com> wrote:
On Sat, Oct 3, 2020 at 10:10 PM James Coleman <jtc331@gmail.com> wrote:
On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:...
More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.Yes, it does fix all the three queries in the SQL script.
The question however is whether this is the root issue, and whether it's
the right way to fix it. For example - volatility is not the only reason
that may block qual pushdown. If you look at qual_is_pushdown_safe, it
also blocks pushdown of leaky functions in security_barrier views. So I
wonder if that could cause failures too, somehow. But I haven't managed
to create such example.I was about to say that the issue here is slightly different from qual
etc. pushdown, since we're not concerned about quals here, and so I
wonder where we determine what target list entries to put in a given
scan path, but then I realized that implies (maybe!) a simpler
solution. Instead of duplicating checks on target list entries would
be safe, why not check directly in get_useful_pathkeys_for_relation()
whether or not the pathkey has a target list entry?I haven't been able to try that out yet, and so maybe I'm missing
something, but I need to step out for a bit, so I'll have to look at
it later.I've started poking at this, but haven't yet found a workable
solution. See the attached patch which "solves" the problem by
breaking putting the sort under the gather merge, but it at least
demonstrates conceptually what I think we're interested in doing.The issue currently is that the comparison of expressions fails -- in
our query, the first column selected shows up as a Var (with the same
contents) but is a different pointer in the em_expr and the reltarget
exprs list. I don't yet see a good way to get the equivalence class
for a PathTarget entry.Hmm, I think I was looking to do is the attached patch. I didn't
realize until I did a lot of reading through source that we have an
equal() function that can compare exprs. That (plus the realization in
[1] the originally reported backtrace wasn't where the error actually
came from) convinced me that what we need is to confirm not only that
the all of the vars in the ec member come from the relids in the rel
but also that the expr is actually represented in the target of the
rel.With the gucs changed as I mentioned earlier both of the plans (with
and without a volatile call in the 2nd select entry) now look like:Unique
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Nested Loop
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Without the gucs changed the minimal repro case now doesn't error, but
results in this plan:HashAggregate
Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
ELSE ref_0.t END
-> Nested Loop
-> Seq Scan on ref_0
-> Function Scan on generate_series ref_1Similarly in your six queries I now only see parallel query showing up
in the last one.OK, that seems reasonable I think.
If we proceed with this patch I'd like to tweak it to check for the
relids subset first on the theory that it will be cheaper than the
expr equality check loop; that change is attached.I created an entirely new function because adding the target expr
lookup to the existing find_em_expr_for_rel() function broke a bunch
of postgres_fdw tests. That maybe raises questions about whether that
code also could have problems in theory/in the future, but I didn't
investigate further. In any case we already know it excludes
volatile...so maybe it's fine because in practice that's actually a
broader exclusion than what we're doing here.I don't think postgres_fdw needs these new checks, because FDW scan's
are not allowed in parallel part of the plan - if that changes in the
future, I'm sure that'll require fixes in plenty other places.OTOH it's interesting that it triggers those failures - I wonder if we
could learn something from them? AFAICS those paths can't be built by
generate_useful_gather_paths (because of the postgres_fdw vs. parallel
query restrictions), so how do these plans look?Well the fdw code doesn't trigger the errors, but both
generate_useful_gather_paths() and the fdw code originally relied on
find_em_expr_for_rel(), but now they're diverging. IIRC the fdw code
also uses it for determining what path keys are valid -- in that case
which ones are safe to be sent to the foreign server (so all of the
vars have to be from rels on the foreign server). The fdw code has
additional checks too though, for example no volatile expressions may
be pushed to the remote.This seems to fix the issue, but I'd like feedback on whether it's too
strict. We could of course just check em_has_volatile, but I'm
wondering if that's simultaneously too strict (by not allowing the
volatile expression to be computed in the gather merge supposing
there's no join) and too loose (maybe there are other cases we should
care about?). It also just strikes me as re-encoding rules that should
have already been applied (and thus we should be able to look up in
the data we have if it's safe to use the expr/pathkey). Those are
mostly intuitions though.I don't know :-( As I mentioned before, I suspect checking just the
volatility may not be enough in some cases (leakyness, etc.) and I think
you're right it may be too strict in other cases.Not sure I understand which rules you think we're re-encoding, but I
have a nagging feeling there's a piece of code somewhere earlier in
the query planner meant to prevent such cases (sort on path without all
the pathkeys), and that fixing it here is just a band aid :-(I'm getting at exactly what you're saying below: there's got to be
some existing rules (and presumably code already enforcing it in some
places) at what level in the path tree a given pathkey is
safe/correct, and we don't want to reinvent those rules (or ideally
that code). Verifying the expr is in the rel seems to me to be
intuitively correct, but I wish there was another place in the code
that could validate that.I'm sure we're building plans with Sort on top of Index Scan paths, so
how come those don't fail? How come the non-parallel version of these
queries don't fail?Yeah, exactly. Something has to be doing something similar -- I don't
think everywhere else using sort_pathkeys instead of query_pathkeys
really changes the problem. I've read through all of the places we use
generate_useful_gather_paths() as well as root->sort_pathkeys to try
to get a better understanding of this. Most of the places I convinced
myself it's already safe -- for example places like
create_orderd_paths are working with the upper rel, and the full
target list is available. A few other places I added comments (see the
"questions" patch in the attached series) where I'm not sure why we
apply projections *after* we create a sort path; given the issue we're
discussing here I would have thought you'd want to do exactly the
opposite.I haven't yet tried to read through the code in other places where we
build an explicit sort to see if I can find any hints as to what
existing code does to ensure this situation doesn't occur, but so far
I haven't found anything else obvious.I did a lot more reading through code, and I concluded that generally
(maybe exclusively?) create_sort_path() is only called on upper rels
(e.g., from create_ordered_paths()), so we already have the full
target list available. I also stepped through in the debugger and
confirmed that non-var exprs generally don't seem to show up in the
pathtarget of paths either on base rels or on join rels. Finally, the
last piece of relevant data is that we don't actually push down sorts
on non-parallel paths even when it would be beneficial. For example,
this query:select i, md5(i::text) from t2, generate_series(1,2) order by i;
generates this plan:
Sort
Sort Key: t2.i
-> Nested Loop
-> Function Scan on generate_series
-> Materialize
-> Seq Scan on t2but it seems obvious to me that it would be better to move the sort to
immediately above the seq scan.So I conclude from all of this that the real reason we didn't see this
problem before was that we didn't have anything that was trying to
generate useful sort paths lower in the query, so it just worked. To
state it from the other direction, I don't see any indication that
there actually is any code trying to determine what's safe to put in
the target at various join levels; from what I can tell only vars are
distributed at those points. Thinking about it some more, this
actually makes sense, since get_useful_pathkeys() is new code, and
something like it would have been needed already if we wanted to push
sorts down further into the plan.As a bit of a side note: combined with the above idea about pushing
sort down, it could also be beneficial to try to distribute
expressions lower too.So I think the previously attached patch is probably correct; the
second patch with the comment questions shows I still don't fully
understand how/when projections are applied (and when they're needed),
though it might have something to do with what I said about only
operating on upper rels.
Just FYI in case the patch seems mergeable; I'm adding test cases now,
and I think I might make some minor tweaks to the patch.
I hope to have a new version out ~this evening.
James
On Mon, Oct 5, 2020 at 11:33 AM James Coleman <jtc331@gmail.com> wrote:
On Sun, Oct 4, 2020 at 9:40 PM James Coleman <jtc331@gmail.com> wrote:
On Sat, Oct 3, 2020 at 10:10 PM James Coleman <jtc331@gmail.com> wrote:
On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:...
More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.Yes, it does fix all the three queries in the SQL script.
The question however is whether this is the root issue, and whether it's
the right way to fix it. For example - volatility is not the only reason
that may block qual pushdown. If you look at qual_is_pushdown_safe, it
also blocks pushdown of leaky functions in security_barrier views. So I
wonder if that could cause failures too, somehow. But I haven't managed
to create such example.I was about to say that the issue here is slightly different from qual
etc. pushdown, since we're not concerned about quals here, and so I
wonder where we determine what target list entries to put in a given
scan path, but then I realized that implies (maybe!) a simpler
solution. Instead of duplicating checks on target list entries would
be safe, why not check directly in get_useful_pathkeys_for_relation()
whether or not the pathkey has a target list entry?I haven't been able to try that out yet, and so maybe I'm missing
something, but I need to step out for a bit, so I'll have to look at
it later.I've started poking at this, but haven't yet found a workable
solution. See the attached patch which "solves" the problem by
breaking putting the sort under the gather merge, but it at least
demonstrates conceptually what I think we're interested in doing.The issue currently is that the comparison of expressions fails -- in
our query, the first column selected shows up as a Var (with the same
contents) but is a different pointer in the em_expr and the reltarget
exprs list. I don't yet see a good way to get the equivalence class
for a PathTarget entry.Hmm, I think I was looking to do is the attached patch. I didn't
realize until I did a lot of reading through source that we have an
equal() function that can compare exprs. That (plus the realization in
[1] the originally reported backtrace wasn't where the error actually
came from) convinced me that what we need is to confirm not only that
the all of the vars in the ec member come from the relids in the rel
but also that the expr is actually represented in the target of the
rel.With the gucs changed as I mentioned earlier both of the plans (with
and without a volatile call in the 2nd select entry) now look like:Unique
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Nested Loop
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Without the gucs changed the minimal repro case now doesn't error, but
results in this plan:HashAggregate
Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
ELSE ref_0.t END
-> Nested Loop
-> Seq Scan on ref_0
-> Function Scan on generate_series ref_1Similarly in your six queries I now only see parallel query showing up
in the last one.OK, that seems reasonable I think.
If we proceed with this patch I'd like to tweak it to check for the
relids subset first on the theory that it will be cheaper than the
expr equality check loop; that change is attached.I created an entirely new function because adding the target expr
lookup to the existing find_em_expr_for_rel() function broke a bunch
of postgres_fdw tests. That maybe raises questions about whether that
code also could have problems in theory/in the future, but I didn't
investigate further. In any case we already know it excludes
volatile...so maybe it's fine because in practice that's actually a
broader exclusion than what we're doing here.I don't think postgres_fdw needs these new checks, because FDW scan's
are not allowed in parallel part of the plan - if that changes in the
future, I'm sure that'll require fixes in plenty other places.OTOH it's interesting that it triggers those failures - I wonder if we
could learn something from them? AFAICS those paths can't be built by
generate_useful_gather_paths (because of the postgres_fdw vs. parallel
query restrictions), so how do these plans look?Well the fdw code doesn't trigger the errors, but both
generate_useful_gather_paths() and the fdw code originally relied on
find_em_expr_for_rel(), but now they're diverging. IIRC the fdw code
also uses it for determining what path keys are valid -- in that case
which ones are safe to be sent to the foreign server (so all of the
vars have to be from rels on the foreign server). The fdw code has
additional checks too though, for example no volatile expressions may
be pushed to the remote.This seems to fix the issue, but I'd like feedback on whether it's too
strict. We could of course just check em_has_volatile, but I'm
wondering if that's simultaneously too strict (by not allowing the
volatile expression to be computed in the gather merge supposing
there's no join) and too loose (maybe there are other cases we should
care about?). It also just strikes me as re-encoding rules that should
have already been applied (and thus we should be able to look up in
the data we have if it's safe to use the expr/pathkey). Those are
mostly intuitions though.I don't know :-( As I mentioned before, I suspect checking just the
volatility may not be enough in some cases (leakyness, etc.) and I think
you're right it may be too strict in other cases.Not sure I understand which rules you think we're re-encoding, but I
have a nagging feeling there's a piece of code somewhere earlier in
the query planner meant to prevent such cases (sort on path without all
the pathkeys), and that fixing it here is just a band aid :-(I'm getting at exactly what you're saying below: there's got to be
some existing rules (and presumably code already enforcing it in some
places) at what level in the path tree a given pathkey is
safe/correct, and we don't want to reinvent those rules (or ideally
that code). Verifying the expr is in the rel seems to me to be
intuitively correct, but I wish there was another place in the code
that could validate that.I'm sure we're building plans with Sort on top of Index Scan paths, so
how come those don't fail? How come the non-parallel version of these
queries don't fail?Yeah, exactly. Something has to be doing something similar -- I don't
think everywhere else using sort_pathkeys instead of query_pathkeys
really changes the problem. I've read through all of the places we use
generate_useful_gather_paths() as well as root->sort_pathkeys to try
to get a better understanding of this. Most of the places I convinced
myself it's already safe -- for example places like
create_orderd_paths are working with the upper rel, and the full
target list is available. A few other places I added comments (see the
"questions" patch in the attached series) where I'm not sure why we
apply projections *after* we create a sort path; given the issue we're
discussing here I would have thought you'd want to do exactly the
opposite.I haven't yet tried to read through the code in other places where we
build an explicit sort to see if I can find any hints as to what
existing code does to ensure this situation doesn't occur, but so far
I haven't found anything else obvious.I did a lot more reading through code, and I concluded that generally
(maybe exclusively?) create_sort_path() is only called on upper rels
(e.g., from create_ordered_paths()), so we already have the full
target list available. I also stepped through in the debugger and
confirmed that non-var exprs generally don't seem to show up in the
pathtarget of paths either on base rels or on join rels. Finally, the
last piece of relevant data is that we don't actually push down sorts
on non-parallel paths even when it would be beneficial. For example,
this query:select i, md5(i::text) from t2, generate_series(1,2) order by i;
generates this plan:
Sort
Sort Key: t2.i
-> Nested Loop
-> Function Scan on generate_series
-> Materialize
-> Seq Scan on t2but it seems obvious to me that it would be better to move the sort to
immediately above the seq scan.So I conclude from all of this that the real reason we didn't see this
problem before was that we didn't have anything that was trying to
generate useful sort paths lower in the query, so it just worked. To
state it from the other direction, I don't see any indication that
there actually is any code trying to determine what's safe to put in
the target at various join levels; from what I can tell only vars are
distributed at those points. Thinking about it some more, this
actually makes sense, since get_useful_pathkeys() is new code, and
something like it would have been needed already if we wanted to push
sorts down further into the plan.As a bit of a side note: combined with the above idea about pushing
sort down, it could also be beneficial to try to distribute
expressions lower too.So I think the previously attached patch is probably correct; the
second patch with the comment questions shows I still don't fully
understand how/when projections are applied (and when they're needed),
though it might have something to do with what I said about only
operating on upper rels.Just FYI in case the patch seems mergeable; I'm adding test cases now,
and I think I might make some minor tweaks to the patch.I hope to have a new version out ~this evening.
All right, here's a modified patch. I did a bit of surgery: the
concept is the same, but I decided to explicitly not the parallels to
(and follow as possible) prepare_sort_from_pathkeys(). That means we
only have to do the expression equality check if the expr is volatile,
and the relids bms check doesn't have to bail if it's empty.
This properly allows us to push the sort all the way down to the base
rel when the only things in the base rel are referenced (including
stable expressions) but pushes the sort up to the upper rel when a
volatile expression is used (technically it would be safe for it to go
lower, but in the current planner that's the only time it appears in
the target list, so allowing that would be a larger functional
change).
I left the questions patch here, though obviously it doesn't need to
be committed. If someone has answers to the questions, I'd be
interested though, but I don't think it affects the correctness of
this patch.
James
Attachments:
v3-0002-questions.patchtext/x-patch; charset=US-ASCII; name=v3-0002-questions.patchDownload
From eb494a83c0aba4b63fb44a34f7c06536ed131e5b Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Sat, 3 Oct 2020 22:05:08 -0400
Subject: [PATCH v3 2/2] questions
---
src/backend/optimizer/plan/planner.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b7377c0c92..76b02232cb 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5021,6 +5021,10 @@ create_ordered_paths(PlannerInfo *root,
root->sort_pathkeys,
limit_tuples);
/* Add projection step if needed */
+ /* TODO: why don't we apply the projection to the path
+ * before sorting? Is it because it's already been done
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
@@ -5051,6 +5055,10 @@ create_ordered_paths(PlannerInfo *root,
limit_tuples);
/* Add projection step if needed */
+ /* TODO: why don't we apply the projection to the path
+ * before sorting? Is it because it's already been done
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
@@ -5102,6 +5110,10 @@ create_ordered_paths(PlannerInfo *root,
&total_groups);
/* Add projection step if needed */
+ /* TODO: why can't we apply the projection to the partial
+ * path? Is it because it's already been done if possible
+ * by apply_scanjoin_target_to_paths?
+ */
if (path->pathtarget != target)
path = apply_projection_to_path(root, ordered_rel,
path, target);
@@ -5163,6 +5175,10 @@ create_ordered_paths(PlannerInfo *root,
&total_groups);
/* Add projection step if needed */
+ /* TODO: why can't we apply the projection to the partial
+ * path? Is it because it's already been done if possible
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
--
2.17.1
v3-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patchDownload
From 63dc76f2fcdf6d7bf1f8a1c23e29fb7a9e4e3e50 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Sat, 3 Oct 2020 10:35:47 -0400
Subject: [PATCH v3 1/2] Fix get_useful_pathkeys_for_relation for volatile
exprs below joins
It's not sufficient to find an ec member whose Vars are all in the rel
we're working with; volatile expressions in particular present a case
where we also need to know if the rel's target contains the expression
or not before we can assume the pathkey for that ec is safe to use at
this point in the query. If the pathkey expr is volatile and we're
below a join, then the expr can't appear in the target until we're above
the join, so we can't sort with it yet.
---
src/backend/optimizer/path/allpaths.c | 13 +--
src/backend/optimizer/path/equivclass.c | 62 ++++++++++++
src/backend/optimizer/plan/planner.c | 9 +-
src/include/optimizer/paths.h | 1 +
.../regress/expected/incremental_sort.out | 98 +++++++++++++++++++
src/test/regress/sql/incremental_sort.sql | 31 ++++++
6 files changed, 206 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b399592ff8..3393e1d37a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2760,7 +2760,8 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
/*
* Considering query_pathkeys is always worth it, because it might allow
* us to avoid a total sort when we have a partially presorted path
- * available.
+ * available or to push the total sort into the parallel portion of the
+ * query.
*/
if (root->query_pathkeys)
{
@@ -2773,17 +2774,17 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
/*
- * We can only build an Incremental Sort for pathkeys which
- * contain an EC member in the current relation, so ignore any
- * suffix of the list as soon as we find a pathkey without an EC
- * member the relation.
+ * We can only build a sort for pathkeys which contain an EC
+ * member in the current relation's target, so ignore any suffix
+ * of the list as soon as we find a pathkey without an EC member
+ * in the relation.
*
* By still returning the prefix of the pathkeys list that does
* meet criteria of EC membership in the current relation, we
* enable not just an incremental sort on the entirety of
* query_pathkeys but also incremental sort below a JOIN.
*/
- if (!find_em_expr_for_rel(pathkey_ec, rel))
+ if (!find_em_expr_usable_for_sorting_rel(pathkey_ec, rel))
break;
npathkeys++;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index b68a5a0ec7..6f4e841d05 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -794,6 +794,68 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
return NULL;
}
+/*
+ * Find an equivalence class member expression that can be safely used by a
+ * sort node on top of the provided relation. The rules here must match those
+ * applied in prepare_sort_from_pathkeys.
+ */
+Expr *
+find_em_expr_usable_for_sorting_rel(EquivalenceClass *ec, RelOptInfo *rel)
+{
+ ListCell *lc_em;
+
+ /*
+ * If there is more than one equivalence member matching these
+ * requirements we'll be content to choose any one of them.
+ */
+ foreach(lc_em, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst(lc_em);
+ PathTarget *target = rel->reltarget;
+ ListCell *lc_expr;
+
+ /*
+ * We shouldn't be trying to sort by an equivalence class that
+ * contains a constant, so no need to consider such cases any further.
+ */
+ if (em->em_is_const)
+ continue;
+
+ /*
+ * Any Vars in the equivalence class member need to come from this
+ * relation. This is a superset of prepare_sort_from_pathkeys ignoring
+ * child members unless they belong to the rel being sorted.
+ */
+ if (!bms_is_subset(em->em_relids, rel->relids))
+ continue;
+
+ /*
+ * As long as the expression isn't volatile then
+ * prepare_sort_from_pathkeys is able to generate a new target entry,
+ * so there's no need to verify that one already exists.
+ */
+ if (!ec->ec_has_volatile)
+ return em->em_expr;
+ else
+ {
+ /*
+ * If, however, it's volatile, we have to verify that the
+ * equivalence member's expr is already generated in the
+ * relation's target.
+ */
+ foreach(lc_expr, target->exprs)
+ {
+ /* XXX: Do we need to strip relabel types here? */
+ if (equal(lfirst(lc_expr), em->em_expr))
+ return em->em_expr;
+ }
+ }
+ }
+
+ /* We didn't find any suitable equivalence class expression */
+ return NULL;
+}
+
/*
* generate_base_implied_equalities
* Generate any restriction clauses that we can deduce from equivalence
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f331f82a6c..b7377c0c92 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7417,7 +7417,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* current reltarget is. We don't do this in the case where the
* target is parallel-safe, since we will be able to generate superior
* paths by doing it after the final scan/join target has been
- * applied.
+ * applied. Since the target isn't parallel safe, we don't need to
+ * apply projections to the partial paths before building gather
+ * paths.
*/
generate_useful_gather_paths(root, rel, false);
@@ -7570,7 +7572,10 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* if the relation is parallel safe, and we don't do it for child rels to
* avoid creating multiple Gather nodes within the same plan. We must do
* this after all paths have been generated and before set_cheapest, since
- * one of the generated paths may turn out to be the cheapest one.
+ * one of the generated paths may turn out to be the cheapest one. We've
+ * already applied any necessary projections to the partial paths above so
+ * any Gather Merge paths will be able to make use of path keys in
+ * requested target..
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
generate_useful_gather_paths(root, rel, false);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..d72b3a5d42 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -135,6 +135,7 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
Relids rel,
bool create_it);
extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
+extern Expr *find_em_expr_usable_for_sorting_rel(EquivalenceClass *ec, RelOptInfo *rel);
extern void generate_base_implied_equalities(PlannerInfo *root);
extern List *generate_join_implied_equalities(PlannerInfo *root,
Relids join_relids,
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index e376ea1276..bc7c8d9451 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1469,3 +1469,101 @@ explain (costs off) select * from t union select * from t order by 1,3;
(13 rows)
drop table t;
+-- Sort pushdown can't go below where expressions are part of the rel target.
+-- In particular this is interesting for volatile expressions which have to
+-- go above joins since otherwise we'll incorrectly use expression evaluations
+-- across multiple rows.
+set enable_hashagg=off;
+set enable_seqscan=off;
+set enable_incremental_sort = off;
+set parallel_tuple_cost=0;
+set parallel_setup_cost=0;
+set min_parallel_table_scan_size = 0;
+set min_parallel_index_scan_size = 0;
+-- Parallel sort below join.
+explain (costs off) select distinct sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+--------------------------------------------------------------------------
+ Unique
+ -> Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, tenk1.stringu1
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, tenk1.stringu1
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
+-- Parallel sort but with expression that can be generated at the base rel..
+explain (costs off) select distinct sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Unique
+ -> Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, (md5((tenk1.stringu1)::text)) COLLATE "C"
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, (md5((tenk1.stringu1)::text)) COLLATE "C"
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
+-- Parallel sort but with expression not available until the upper rel.
+explain (costs off) select distinct sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: tenk1.unique1, (((tenk1.stringu1)::text || (random())::text)) COLLATE "C"
+ -> Gather
+ Workers Planned: 2
+ -> Nested Loop
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Sort Key: tenk1.unique1, (((tenk1.stringu1)::text || (random())::text)) COLLATE "C"
+ -> Gather
+ Workers Planned: 2
+ -> Nested Loop
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 9c040c90e6..12afb86408 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -221,3 +221,34 @@ set enable_hashagg to off;
explain (costs off) select * from t union select * from t order by 1,3;
drop table t;
+
+-- Sort pushdown can't go below where expressions are part of the rel target.
+-- In particular this is interesting for volatile expressions which have to
+-- go above joins since otherwise we'll incorrectly use expression evaluations
+-- across multiple rows.
+set enable_hashagg=off;
+set enable_seqscan=off;
+set enable_incremental_sort = off;
+set parallel_tuple_cost=0;
+set parallel_setup_cost=0;
+set min_parallel_table_scan_size = 0;
+set min_parallel_index_scan_size = 0;
+
+-- Parallel sort below join.
+explain (costs off) select distinct sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+-- Parallel sort but with expression that can be generated at the base rel..
+explain (costs off) select distinct sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+-- Parallel sort but with expression not available until the upper rel.
+explain (costs off) select distinct sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
--
2.17.1
On Mon, Oct 5, 2020 at 10:38 PM James Coleman <jtc331@gmail.com> wrote:
On Mon, Oct 5, 2020 at 11:33 AM James Coleman <jtc331@gmail.com> wrote:
On Sun, Oct 4, 2020 at 9:40 PM James Coleman <jtc331@gmail.com> wrote:
On Sat, Oct 3, 2020 at 10:10 PM James Coleman <jtc331@gmail.com> wrote:
On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:...
More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.Yes, it does fix all the three queries in the SQL script.
The question however is whether this is the root issue, and whether it's
the right way to fix it. For example - volatility is not the only reason
that may block qual pushdown. If you look at qual_is_pushdown_safe, it
also blocks pushdown of leaky functions in security_barrier views. So I
wonder if that could cause failures too, somehow. But I haven't managed
to create such example.I was about to say that the issue here is slightly different from qual
etc. pushdown, since we're not concerned about quals here, and so I
wonder where we determine what target list entries to put in a given
scan path, but then I realized that implies (maybe!) a simpler
solution. Instead of duplicating checks on target list entries would
be safe, why not check directly in get_useful_pathkeys_for_relation()
whether or not the pathkey has a target list entry?I haven't been able to try that out yet, and so maybe I'm missing
something, but I need to step out for a bit, so I'll have to look at
it later.I've started poking at this, but haven't yet found a workable
solution. See the attached patch which "solves" the problem by
breaking putting the sort under the gather merge, but it at least
demonstrates conceptually what I think we're interested in doing.The issue currently is that the comparison of expressions fails -- in
our query, the first column selected shows up as a Var (with the same
contents) but is a different pointer in the em_expr and the reltarget
exprs list. I don't yet see a good way to get the equivalence class
for a PathTarget entry.Hmm, I think I was looking to do is the attached patch. I didn't
realize until I did a lot of reading through source that we have an
equal() function that can compare exprs. That (plus the realization in
[1] the originally reported backtrace wasn't where the error actually
came from) convinced me that what we need is to confirm not only that
the all of the vars in the ec member come from the relids in the rel
but also that the expr is actually represented in the target of the
rel.With the gucs changed as I mentioned earlier both of the plans (with
and without a volatile call in the 2nd select entry) now look like:Unique
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Nested Loop
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Without the gucs changed the minimal repro case now doesn't error, but
results in this plan:HashAggregate
Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
ELSE ref_0.t END
-> Nested Loop
-> Seq Scan on ref_0
-> Function Scan on generate_series ref_1Similarly in your six queries I now only see parallel query showing up
in the last one.OK, that seems reasonable I think.
If we proceed with this patch I'd like to tweak it to check for the
relids subset first on the theory that it will be cheaper than the
expr equality check loop; that change is attached.I created an entirely new function because adding the target expr
lookup to the existing find_em_expr_for_rel() function broke a bunch
of postgres_fdw tests. That maybe raises questions about whether that
code also could have problems in theory/in the future, but I didn't
investigate further. In any case we already know it excludes
volatile...so maybe it's fine because in practice that's actually a
broader exclusion than what we're doing here.I don't think postgres_fdw needs these new checks, because FDW scan's
are not allowed in parallel part of the plan - if that changes in the
future, I'm sure that'll require fixes in plenty other places.OTOH it's interesting that it triggers those failures - I wonder if we
could learn something from them? AFAICS those paths can't be built by
generate_useful_gather_paths (because of the postgres_fdw vs. parallel
query restrictions), so how do these plans look?Well the fdw code doesn't trigger the errors, but both
generate_useful_gather_paths() and the fdw code originally relied on
find_em_expr_for_rel(), but now they're diverging. IIRC the fdw code
also uses it for determining what path keys are valid -- in that case
which ones are safe to be sent to the foreign server (so all of the
vars have to be from rels on the foreign server). The fdw code has
additional checks too though, for example no volatile expressions may
be pushed to the remote.This seems to fix the issue, but I'd like feedback on whether it's too
strict. We could of course just check em_has_volatile, but I'm
wondering if that's simultaneously too strict (by not allowing the
volatile expression to be computed in the gather merge supposing
there's no join) and too loose (maybe there are other cases we should
care about?). It also just strikes me as re-encoding rules that should
have already been applied (and thus we should be able to look up in
the data we have if it's safe to use the expr/pathkey). Those are
mostly intuitions though.I don't know :-( As I mentioned before, I suspect checking just the
volatility may not be enough in some cases (leakyness, etc.) and I think
you're right it may be too strict in other cases.Not sure I understand which rules you think we're re-encoding, but I
have a nagging feeling there's a piece of code somewhere earlier in
the query planner meant to prevent such cases (sort on path without all
the pathkeys), and that fixing it here is just a band aid :-(I'm getting at exactly what you're saying below: there's got to be
some existing rules (and presumably code already enforcing it in some
places) at what level in the path tree a given pathkey is
safe/correct, and we don't want to reinvent those rules (or ideally
that code). Verifying the expr is in the rel seems to me to be
intuitively correct, but I wish there was another place in the code
that could validate that.I'm sure we're building plans with Sort on top of Index Scan paths, so
how come those don't fail? How come the non-parallel version of these
queries don't fail?Yeah, exactly. Something has to be doing something similar -- I don't
think everywhere else using sort_pathkeys instead of query_pathkeys
really changes the problem. I've read through all of the places we use
generate_useful_gather_paths() as well as root->sort_pathkeys to try
to get a better understanding of this. Most of the places I convinced
myself it's already safe -- for example places like
create_orderd_paths are working with the upper rel, and the full
target list is available. A few other places I added comments (see the
"questions" patch in the attached series) where I'm not sure why we
apply projections *after* we create a sort path; given the issue we're
discussing here I would have thought you'd want to do exactly the
opposite.I haven't yet tried to read through the code in other places where we
build an explicit sort to see if I can find any hints as to what
existing code does to ensure this situation doesn't occur, but so far
I haven't found anything else obvious.I did a lot more reading through code, and I concluded that generally
(maybe exclusively?) create_sort_path() is only called on upper rels
(e.g., from create_ordered_paths()), so we already have the full
target list available. I also stepped through in the debugger and
confirmed that non-var exprs generally don't seem to show up in the
pathtarget of paths either on base rels or on join rels. Finally, the
last piece of relevant data is that we don't actually push down sorts
on non-parallel paths even when it would be beneficial. For example,
this query:select i, md5(i::text) from t2, generate_series(1,2) order by i;
generates this plan:
Sort
Sort Key: t2.i
-> Nested Loop
-> Function Scan on generate_series
-> Materialize
-> Seq Scan on t2but it seems obvious to me that it would be better to move the sort to
immediately above the seq scan.So I conclude from all of this that the real reason we didn't see this
problem before was that we didn't have anything that was trying to
generate useful sort paths lower in the query, so it just worked. To
state it from the other direction, I don't see any indication that
there actually is any code trying to determine what's safe to put in
the target at various join levels; from what I can tell only vars are
distributed at those points. Thinking about it some more, this
actually makes sense, since get_useful_pathkeys() is new code, and
something like it would have been needed already if we wanted to push
sorts down further into the plan.As a bit of a side note: combined with the above idea about pushing
sort down, it could also be beneficial to try to distribute
expressions lower too.So I think the previously attached patch is probably correct; the
second patch with the comment questions shows I still don't fully
understand how/when projections are applied (and when they're needed),
though it might have something to do with what I said about only
operating on upper rels.Just FYI in case the patch seems mergeable; I'm adding test cases now,
and I think I might make some minor tweaks to the patch.I hope to have a new version out ~this evening.
All right, here's a modified patch. I did a bit of surgery: the
concept is the same, but I decided to explicitly not the parallels to
(and follow as possible) prepare_sort_from_pathkeys(). That means we
only have to do the expression equality check if the expr is volatile,
and the relids bms check doesn't have to bail if it's empty.This properly allows us to push the sort all the way down to the base
rel when the only things in the base rel are referenced (including
stable expressions) but pushes the sort up to the upper rel when a
volatile expression is used (technically it would be safe for it to go
lower, but in the current planner that's the only time it appears in
the target list, so allowing that would be a larger functional
change).I left the questions patch here, though obviously it doesn't need to
be committed. If someone has answers to the questions, I'd be
interested though, but I don't think it affects the correctness of
this patch.
And with a typo fixed.
James
Attachments:
v4-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patchDownload
From b8f104dbafd09cc719c90f0cbfafffa849c043f2 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Sat, 3 Oct 2020 10:35:47 -0400
Subject: [PATCH v4 1/2] Fix get_useful_pathkeys_for_relation for volatile
exprs below joins
It's not sufficient to find an ec member whose Vars are all in the rel
we're working with; volatile expressions in particular present a case
where we also need to know if the rel's target contains the expression
or not before we can assume the pathkey for that ec is safe to use at
this point in the query. If the pathkey expr is volatile and we're
below a join, then the expr can't appear in the target until we're above
the join, so we can't sort with it yet.
---
src/backend/optimizer/path/allpaths.c | 13 +--
src/backend/optimizer/path/equivclass.c | 62 ++++++++++++
src/backend/optimizer/plan/planner.c | 9 +-
src/include/optimizer/paths.h | 1 +
.../regress/expected/incremental_sort.out | 98 +++++++++++++++++++
src/test/regress/sql/incremental_sort.sql | 31 ++++++
6 files changed, 206 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b399592ff8..3393e1d37a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2760,7 +2760,8 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
/*
* Considering query_pathkeys is always worth it, because it might allow
* us to avoid a total sort when we have a partially presorted path
- * available.
+ * available or to push the total sort into the parallel portion of the
+ * query.
*/
if (root->query_pathkeys)
{
@@ -2773,17 +2774,17 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
/*
- * We can only build an Incremental Sort for pathkeys which
- * contain an EC member in the current relation, so ignore any
- * suffix of the list as soon as we find a pathkey without an EC
- * member the relation.
+ * We can only build a sort for pathkeys which contain an EC
+ * member in the current relation's target, so ignore any suffix
+ * of the list as soon as we find a pathkey without an EC member
+ * in the relation.
*
* By still returning the prefix of the pathkeys list that does
* meet criteria of EC membership in the current relation, we
* enable not just an incremental sort on the entirety of
* query_pathkeys but also incremental sort below a JOIN.
*/
- if (!find_em_expr_for_rel(pathkey_ec, rel))
+ if (!find_em_expr_usable_for_sorting_rel(pathkey_ec, rel))
break;
npathkeys++;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index b68a5a0ec7..6f4e841d05 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -794,6 +794,68 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
return NULL;
}
+/*
+ * Find an equivalence class member expression that can be safely used by a
+ * sort node on top of the provided relation. The rules here must match those
+ * applied in prepare_sort_from_pathkeys.
+ */
+Expr *
+find_em_expr_usable_for_sorting_rel(EquivalenceClass *ec, RelOptInfo *rel)
+{
+ ListCell *lc_em;
+
+ /*
+ * If there is more than one equivalence member matching these
+ * requirements we'll be content to choose any one of them.
+ */
+ foreach(lc_em, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst(lc_em);
+ PathTarget *target = rel->reltarget;
+ ListCell *lc_expr;
+
+ /*
+ * We shouldn't be trying to sort by an equivalence class that
+ * contains a constant, so no need to consider such cases any further.
+ */
+ if (em->em_is_const)
+ continue;
+
+ /*
+ * Any Vars in the equivalence class member need to come from this
+ * relation. This is a superset of prepare_sort_from_pathkeys ignoring
+ * child members unless they belong to the rel being sorted.
+ */
+ if (!bms_is_subset(em->em_relids, rel->relids))
+ continue;
+
+ /*
+ * As long as the expression isn't volatile then
+ * prepare_sort_from_pathkeys is able to generate a new target entry,
+ * so there's no need to verify that one already exists.
+ */
+ if (!ec->ec_has_volatile)
+ return em->em_expr;
+ else
+ {
+ /*
+ * If, however, it's volatile, we have to verify that the
+ * equivalence member's expr is already generated in the
+ * relation's target.
+ */
+ foreach(lc_expr, target->exprs)
+ {
+ /* XXX: Do we need to strip relabel types here? */
+ if (equal(lfirst(lc_expr), em->em_expr))
+ return em->em_expr;
+ }
+ }
+ }
+
+ /* We didn't find any suitable equivalence class expression */
+ return NULL;
+}
+
/*
* generate_base_implied_equalities
* Generate any restriction clauses that we can deduce from equivalence
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f331f82a6c..b7377c0c92 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7417,7 +7417,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* current reltarget is. We don't do this in the case where the
* target is parallel-safe, since we will be able to generate superior
* paths by doing it after the final scan/join target has been
- * applied.
+ * applied. Since the target isn't parallel safe, we don't need to
+ * apply projections to the partial paths before building gather
+ * paths.
*/
generate_useful_gather_paths(root, rel, false);
@@ -7570,7 +7572,10 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* if the relation is parallel safe, and we don't do it for child rels to
* avoid creating multiple Gather nodes within the same plan. We must do
* this after all paths have been generated and before set_cheapest, since
- * one of the generated paths may turn out to be the cheapest one.
+ * one of the generated paths may turn out to be the cheapest one. We've
+ * already applied any necessary projections to the partial paths above so
+ * any Gather Merge paths will be able to make use of path keys in
+ * requested target..
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
generate_useful_gather_paths(root, rel, false);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..d72b3a5d42 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -135,6 +135,7 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
Relids rel,
bool create_it);
extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
+extern Expr *find_em_expr_usable_for_sorting_rel(EquivalenceClass *ec, RelOptInfo *rel);
extern void generate_base_implied_equalities(PlannerInfo *root);
extern List *generate_join_implied_equalities(PlannerInfo *root,
Relids join_relids,
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index e376ea1276..bc7c8d9451 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1469,3 +1469,101 @@ explain (costs off) select * from t union select * from t order by 1,3;
(13 rows)
drop table t;
+-- Sort pushdown can't go below where expressions are part of the rel target.
+-- In particular this is interesting for volatile expressions which have to
+-- go above joins since otherwise we'll incorrectly use expression evaluations
+-- across multiple rows.
+set enable_hashagg=off;
+set enable_seqscan=off;
+set enable_incremental_sort = off;
+set parallel_tuple_cost=0;
+set parallel_setup_cost=0;
+set min_parallel_table_scan_size = 0;
+set min_parallel_index_scan_size = 0;
+-- Parallel sort below join.
+explain (costs off) select distinct sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+--------------------------------------------------------------------------
+ Unique
+ -> Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, tenk1.stringu1
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, tenk1.stringu1
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
+-- Parallel sort but with expression that can be generated at the base rel..
+explain (costs off) select distinct sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Unique
+ -> Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, (md5((tenk1.stringu1)::text)) COLLATE "C"
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, (md5((tenk1.stringu1)::text)) COLLATE "C"
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
+-- Parallel sort but with expression not available until the upper rel.
+explain (costs off) select distinct sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: tenk1.unique1, (((tenk1.stringu1)::text || (random())::text)) COLLATE "C"
+ -> Gather
+ Workers Planned: 2
+ -> Nested Loop
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Sort Key: tenk1.unique1, (((tenk1.stringu1)::text || (random())::text)) COLLATE "C"
+ -> Gather
+ Workers Planned: 2
+ -> Nested Loop
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 9c040c90e6..3ee11c394b 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -221,3 +221,34 @@ set enable_hashagg to off;
explain (costs off) select * from t union select * from t order by 1,3;
drop table t;
+
+-- Sort pushdown can't go below where expressions are part of the rel target.
+-- In particular this is interesting for volatile expressions which have to
+-- go above joins since otherwise we'll incorrectly use expression evaluations
+-- across multiple rows.
+set enable_hashagg=off;
+set enable_seqscan=off;
+set enable_incremental_sort = off;
+set parallel_tuple_cost=0;
+set parallel_setup_cost=0;
+set min_parallel_table_scan_size = 0;
+set min_parallel_index_scan_size = 0;
+
+-- Parallel sort below join.
+explain (costs off) select distinct sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+-- Parallel sort but with expression that can be safely generated at the base rel.
+explain (costs off) select distinct sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+-- Parallel sort but with expression not available until the upper rel.
+explain (costs off) select distinct sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
--
2.17.1
v4-0002-questions.patchtext/x-patch; charset=US-ASCII; name=v4-0002-questions.patchDownload
From 45e4fc922b1b33ec4c1ea282b3a66d943f3b51f0 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Sat, 3 Oct 2020 22:05:08 -0400
Subject: [PATCH v4 2/2] questions
---
src/backend/optimizer/plan/planner.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b7377c0c92..76b02232cb 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5021,6 +5021,10 @@ create_ordered_paths(PlannerInfo *root,
root->sort_pathkeys,
limit_tuples);
/* Add projection step if needed */
+ /* TODO: why don't we apply the projection to the path
+ * before sorting? Is it because it's already been done
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
@@ -5051,6 +5055,10 @@ create_ordered_paths(PlannerInfo *root,
limit_tuples);
/* Add projection step if needed */
+ /* TODO: why don't we apply the projection to the path
+ * before sorting? Is it because it's already been done
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
@@ -5102,6 +5110,10 @@ create_ordered_paths(PlannerInfo *root,
&total_groups);
/* Add projection step if needed */
+ /* TODO: why can't we apply the projection to the partial
+ * path? Is it because it's already been done if possible
+ * by apply_scanjoin_target_to_paths?
+ */
if (path->pathtarget != target)
path = apply_projection_to_path(root, ordered_rel,
path, target);
@@ -5163,6 +5175,10 @@ create_ordered_paths(PlannerInfo *root,
&total_groups);
/* Add projection step if needed */
+ /* TODO: why can't we apply the projection to the partial
+ * path? Is it because it's already been done if possible
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
--
2.17.1
On Tue, 6 Oct 2020 at 06:46, James Coleman <jtc331@gmail.com> wrote:
All right, here's a modified patch. I did a bit of surgery: the
concept is the same, but I decided to explicitly not the parallels to
(and follow as possible) prepare_sort_from_pathkeys(). That means we
only have to do the expression equality check if the expr is volatile,
and the relids bms check doesn't have to bail if it's empty.This properly allows us to push the sort all the way down to the base
rel when the only things in the base rel are referenced (including
stable expressions) but pushes the sort up to the upper rel when a
volatile expression is used (technically it would be safe for it to go
lower, but in the current planner that's the only time it appears in
the target list, so allowing that would be a larger functional
change).I left the questions patch here, though obviously it doesn't need to
be committed. If someone has answers to the questions, I'd be
interested though, but I don't think it affects the correctness of
this patch.And with a typo fixed.
Hi James,
Thanks for working on this, I haven't tried the latest patch but the
previous single patch worked fine in the cases you and Tomas found and
with my original example.
I haven't been able to reproduce the same problem with other queries
(I tried RLS and views with barriers, i also wanted to test with LEAK
functions but didn't have the time yet).
Can you please create an entry in the commitfest for this one so we
don't lose track of it?
--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:
Can you please create an entry in the commitfest for this one so we
don't lose track of it?
Done.
On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote:
The plan (obtained by replacing the volatile function with a stable one):
Unique
-> Nested Loop
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
As far as I remember this stuff, target lists for any nodes below the
top of the join tree were previously always just Var nodes. Any
expressions that needed to be computed were computed at the level of
the topmost join, before adding upper planner steps like Agg,
WindowAgg, Limit, etc. But, here you've got a complex expression --
md5(ref_0.t) -- begin computed somewhere in the middle of the plan
tree so that the sort can be performed at that point. However, that
seems likely to break a bunch of assumptions all over the planner,
because, unless a lot of work and surgery has been done since I looked
at this last, that md5(ref_0.t) value is not going to "bubble up" to
higher levels of the plan through the tlists of the individual nodes.
That poses a risk that it will be recomputed multiple times, which is
particularly bad for volatile functions because you might not always
get the same answer, but it seems like it might be bad even for
non-volatile functions, because it could be slow if the value is used
at multiple places in the query. It feels like the sort of thing that
might have broken some other assumptions, too, although I can't say
exactly what.
I wonder if whatever part of the incremental sort patch allowed
computation of tlist expressions below the top level of the join tree
ought to just be reverted outright, but that might be overkill. I
can't say that I really understand exactly what's going on here, but
it seems like a pretty significant behavior change to make as part of
another project, and I wonder if we're going to keep finding other
problems with it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Oct 7, 2020 at 3:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote:
The plan (obtained by replacing the volatile function with a stable one):
Unique
-> Nested Loop
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
As far as I remember this stuff, target lists for any nodes below the
top of the join tree were previously always just Var nodes. Any
expressions that needed to be computed were computed at the level of
the topmost join, before adding upper planner steps like Agg,
WindowAgg, Limit, etc. But, here you've got a complex expression --
md5(ref_0.t) -- begin computed somewhere in the middle of the plan
tree so that the sort can be performed at that point. However, that
seems likely to break a bunch of assumptions all over the planner,
because, unless a lot of work and surgery has been done since I looked
at this last, that md5(ref_0.t) value is not going to "bubble up" to
higher levels of the plan through the tlists of the individual nodes.
That poses a risk that it will be recomputed multiple times, which is
particularly bad for volatile functions because you might not always
get the same answer, but it seems like it might be bad even for
non-volatile functions, because it could be slow if the value is used
at multiple places in the query. It feels like the sort of thing that
might have broken some other assumptions, too, although I can't say
exactly what.
The distinction between volatile and stable expressions here has been
discussed at length in this thread; I think it'd be worth reading
through the rest of the thread before offering sledgehammer ideas
below like reverting the entire piece.
In particular, most of the discussion in the thread has been about
what is actually safe to reference at lower levels in the plan. It
would be extremely easy tomake this only allow sorting by Vars at
lower levels in the plan, though the current revision of the patch
follows what prepare_sort_from_pathkeys allows and disallows.
Obviously it'd be nice to be able to sort by non-Var expressions at
lower levels too. But again, it'd be easy to change that if it's not
safe, so it'd be helpful if you could point to areas that don't bubble
non-Vars up so that we can actually comment about that in the source
and discuss specifics.
I wonder if whatever part of the incremental sort patch allowed
computation of tlist expressions below the top level of the join tree
ought to just be reverted outright, but that might be overkill. I
can't say that I really understand exactly what's going on here, but
it seems like a pretty significant behavior change to make as part of
another project, and I wonder if we're going to keep finding other
problems with it.
James
On Wed, Oct 07, 2020 at 04:01:27PM -0400, James Coleman wrote:
On Wed, Oct 7, 2020 at 3:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote:
The plan (obtained by replacing the volatile function with a stable one):
Unique
-> Nested Loop
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
As far as I remember this stuff, target lists for any nodes below the
top of the join tree were previously always just Var nodes. Any
expressions that needed to be computed were computed at the level of
the topmost join, before adding upper planner steps like Agg,
WindowAgg, Limit, etc. But, here you've got a complex expression --
md5(ref_0.t) -- begin computed somewhere in the middle of the plan
tree so that the sort can be performed at that point. However, that
seems likely to break a bunch of assumptions all over the planner,
because, unless a lot of work and surgery has been done since I looked
at this last, that md5(ref_0.t) value is not going to "bubble up" to
higher levels of the plan through the tlists of the individual nodes.
That poses a risk that it will be recomputed multiple times, which is
particularly bad for volatile functions because you might not always
get the same answer, but it seems like it might be bad even for
non-volatile functions, because it could be slow if the value is used
at multiple places in the query. It feels like the sort of thing that
might have broken some other assumptions, too, although I can't say
exactly what.The distinction between volatile and stable expressions here has been
discussed at length in this thread; I think it'd be worth reading
through the rest of the thread before offering sledgehammer ideas
below like reverting the entire piece.
Um, that's a tad too harsh, I guess ...
We've discussed the volatile/stable expressions before, but AFAIK we
failed to notice the planner assumes such expressions are expected to be
computed only in the upper parts of the plan. If this assumption really
exists, it probably relies on simply calling create_sort_path just about
right, and the new code looking at useful pathkeys below gather merge
might be breaking that simply by doing it too early.
In that case we can either rip this code out, effectively reverting most
of ba3e76cc571, or consider only pathkeys that we can find in the tlist
of the child path.
I'm still not entirely sure I understand what's happening, or what the
exact rule is. Consider this query:
explain (verbose) select distinct i, t, md5(t) from ref_0;
which on PG12 (i.e. before incremental sort) is planned like this:
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=78120.92..83120.92 rows=500000 width=65)
Output: i, t, (md5(t))
-> Sort (cost=78120.92..79370.92 rows=500000 width=65)
Output: i, t, (md5(t))
Sort Key: ref_0.i, ref_0.t, (md5(ref_0.t))
-> Seq Scan on public.ref_0 (cost=0.00..10282.00 rows=500000 width=65)
Output: i, t, md5(t)
(7 rows)
i.e. the (stable) function is pushed all the way to the scan node. And
even if we replace it with a volatile expression it gets pushed down:
explain (verbose) select distinct i, t, md5(random()::text || t) from ref_0;
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=83120.92..88120.92 rows=500000 width=65)
Output: i, t, (md5(((random())::text || t)))
-> Sort (cost=83120.92..84370.92 rows=500000 width=65)
Output: i, t, (md5(((random())::text || t)))
Sort Key: ref_0.i, ref_0.t, (md5(((random())::text || ref_0.t)))
-> Seq Scan on public.ref_0 (cost=0.00..15282.00 rows=500000 width=65)
Output: i, t, md5(((random())::text || t))
(7 rows)
But perhaps I just don't understand the assumption correctly?
In particular, most of the discussion in the thread has been about
what is actually safe to reference at lower levels in the plan. It
would be extremely easy tomake this only allow sorting by Vars at
lower levels in the plan, though the current revision of the patch
follows what prepare_sort_from_pathkeys allows and disallows.Obviously it'd be nice to be able to sort by non-Var expressions at
lower levels too. But again, it'd be easy to change that if it's not
safe, so it'd be helpful if you could point to areas that don't bubble
non-Vars up so that we can actually comment about that in the source
and discuss specifics.
I think the proble here might be "what we think is theoretically safe"
vs. "what the planner expects". I wouldn't be surprised if this was safe
in theoretical sense, but planner not expecting that.
I wonder if whatever part of the incremental sort patch allowed
computation of tlist expressions below the top level of the join tree
ought to just be reverted outright, but that might be overkill. I
can't say that I really understand exactly what's going on here, but
it seems like a pretty significant behavior change to make as part of
another project, and I wonder if we're going to keep finding other
problems with it.
It certainly was not my intention to make such change in this patch. I
don't think the earlier parts of the incremental sort have this issue
because all those places were already constructing regular sort before.
But this code is different because it considers adding sort node (both
regular and incremental) below a gather merge, which was not happening
before. I suspect this is the problem, somehow.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Oct 07, 2020 at 03:52:04PM -0400, Robert Haas wrote:
On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote:
The plan (obtained by replacing the volatile function with a stable one):
Unique
-> Nested Loop
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
As far as I remember this stuff, target lists for any nodes below the
top of the join tree were previously always just Var nodes. Any
expressions that needed to be computed were computed at the level of
the topmost join, before adding upper planner steps like Agg,
WindowAgg, Limit, etc. But, here you've got a complex expression --
md5(ref_0.t) -- begin computed somewhere in the middle of the plan
tree so that the sort can be performed at that point. However, that
seems likely to break a bunch of assumptions all over the planner,
because, unless a lot of work and surgery has been done since I looked
at this last, that md5(ref_0.t) value is not going to "bubble up" to
higher levels of the plan through the tlists of the individual nodes.That poses a risk that it will be recomputed multiple times, which is
particularly bad for volatile functions because you might not always
get the same answer, but it seems like it might be bad even for
non-volatile functions, because it could be slow if the value is used
at multiple places in the query. It feels like the sort of thing that
might have broken some other assumptions, too, although I can't say
exactly what.
I agree this might be a clear correctness issue for volatile functions.
Not sure about stable functions, though - it's easy to construct cases
where pushing the function down results in much faster execution.
Consider this:
create table t (id int, val text);
insert into t select 1, repeat('x', 1000) from generate_series(1,100) s(i);
-- no pushdown
select id, md5(t1.val) from t t1 join t t2 using (id) join t t3 using (id);
Time: 5222.895 ms (00:05.223)
-- manual pushdown
with x as (select id, md5(t.val) from t)
select id, t1.md5 from x t1 join x t2 using (id) join x t3 using (id);
Time: 486.455 ms
Of course, this is independent of what the planner assumes correctly,
which is what's being discussed in this thread. And I'm sure there are
plenty counter-examples too, so we'd need to cost this properly.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Oct 7, 2020 at 6:22 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On Wed, Oct 07, 2020 at 04:01:27PM -0400, James Coleman wrote:
On Wed, Oct 7, 2020 at 3:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote:
The plan (obtained by replacing the volatile function with a stable one):
Unique
-> Nested Loop
-> Gather Merge
Workers Planned: 2
-> Sort
Sort Key: ref_0.i, (md5(ref_0.t))
-> Parallel Index Scan using ref_0_i_idx on ref_0
-> Function Scan on generate_series ref_1Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
As far as I remember this stuff, target lists for any nodes below the
top of the join tree were previously always just Var nodes. Any
expressions that needed to be computed were computed at the level of
the topmost join, before adding upper planner steps like Agg,
WindowAgg, Limit, etc. But, here you've got a complex expression --
md5(ref_0.t) -- begin computed somewhere in the middle of the plan
tree so that the sort can be performed at that point. However, that
seems likely to break a bunch of assumptions all over the planner,
because, unless a lot of work and surgery has been done since I looked
at this last, that md5(ref_0.t) value is not going to "bubble up" to
higher levels of the plan through the tlists of the individual nodes.
That poses a risk that it will be recomputed multiple times, which is
particularly bad for volatile functions because you might not always
get the same answer, but it seems like it might be bad even for
non-volatile functions, because it could be slow if the value is used
at multiple places in the query. It feels like the sort of thing that
might have broken some other assumptions, too, although I can't say
exactly what.The distinction between volatile and stable expressions here has been
discussed at length in this thread; I think it'd be worth reading
through the rest of the thread before offering sledgehammer ideas
below like reverting the entire piece.Um, that's a tad too harsh, I guess ...
Yes, it was too harsh. So, my apologies, Robert.
Nonetheless I do think that reverting would be a surprisingly strong
reaction for this because I don't think the problem scope is
particularly dire.
Two reasons why:
1. As noted earlier it'd be pretty easy to limit this sort pushdown
only to sorts on Vars. But I don't think that;s necessary, because:
2. From my time in the debugger on this issue, I don't see how the
planner can be assuming these things won't bubble up.
For one thing, your Tomas's example below shows the stable expression
being computed at the scan node level and bubbling up through.
For another thing, earlier in this thread [1] I have a patch that
compares expressions in the path target to the expression in the
pathkey, and even under a gather merge the stable expression (md5) is
already in the path target at that level of the path.
We've discussed the volatile/stable expressions before, but AFAIK we
failed to notice the planner assumes such expressions are expected to be
computed only in the upper parts of the plan. If this assumption really
exists, it probably relies on simply calling create_sort_path just about
right, and the new code looking at useful pathkeys below gather merge
might be breaking that simply by doing it too early.
I don't think such an assumption exists, as explained above, or at
least I can't find an indication of it, and it's already violated in
pre-incremental sort code.
In that case we can either rip this code out, effectively reverting most
of ba3e76cc571, or consider only pathkeys that we can find in the tlist
of the child path.
This is exactly what the patch in [1] does. The current patch
effectively does it also, but I believe it does so in a more refined
way.
I'm still not entirely sure I understand what's happening, or what the
exact rule is. Consider this query:explain (verbose) select distinct i, t, md5(t) from ref_0;
which on PG12 (i.e. before incremental sort) is planned like this:
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=78120.92..83120.92 rows=500000 width=65)
Output: i, t, (md5(t))
-> Sort (cost=78120.92..79370.92 rows=500000 width=65)
Output: i, t, (md5(t))
Sort Key: ref_0.i, ref_0.t, (md5(ref_0.t))
-> Seq Scan on public.ref_0 (cost=0.00..10282.00 rows=500000 width=65)
Output: i, t, md5(t)
(7 rows)i.e. the (stable) function is pushed all the way to the scan node. And
even if we replace it with a volatile expression it gets pushed down:explain (verbose) select distinct i, t, md5(random()::text || t) from ref_0;
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=83120.92..88120.92 rows=500000 width=65)
Output: i, t, (md5(((random())::text || t)))
-> Sort (cost=83120.92..84370.92 rows=500000 width=65)
Output: i, t, (md5(((random())::text || t)))
Sort Key: ref_0.i, ref_0.t, (md5(((random())::text || ref_0.t)))
-> Seq Scan on public.ref_0 (cost=0.00..15282.00 rows=500000 width=65)
Output: i, t, md5(((random())::text || t))
(7 rows)But perhaps I just don't understand the assumption correctly?
In particular, most of the discussion in the thread has been about
what is actually safe to reference at lower levels in the plan. It
would be extremely easy tomake this only allow sorting by Vars at
lower levels in the plan, though the current revision of the patch
follows what prepare_sort_from_pathkeys allows and disallows.Obviously it'd be nice to be able to sort by non-Var expressions at
lower levels too. But again, it'd be easy to change that if it's not
safe, so it'd be helpful if you could point to areas that don't bubble
non-Vars up so that we can actually comment about that in the source
and discuss specifics.I think the proble here might be "what we think is theoretically safe"
vs. "what the planner expects". I wouldn't be surprised if this was safe
in theoretical sense, but planner not expecting that.
Agreed as a general rule, but as above I'm having a hard time matching
up pre-this-code behavior with the claim that the planner isn't
expecting it.
I wonder if whatever part of the incremental sort patch allowed
computation of tlist expressions below the top level of the join tree
ought to just be reverted outright, but that might be overkill. I
can't say that I really understand exactly what's going on here, but
it seems like a pretty significant behavior change to make as part of
another project, and I wonder if we're going to keep finding other
problems with it.It certainly was not my intention to make such change in this patch. I
don't think the earlier parts of the incremental sort have this issue
because all those places were already constructing regular sort before.But this code is different because it considers adding sort node (both
regular and incremental) below a gather merge, which was not happening
before. I suspect this is the problem, somehow.
If my above understanding holds, then it's not a significant behavior
change, but if it is, we have a simple workaround (looking only at
pathkeys referencing a Var in the path target list).
James
1: /messages/by-id/CAAaqYe9Q5TL-d0Fcs1m9bes7YbfkrizmEVFF+LdTLE6ZEXCwhg@mail.gmail.com
On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:Can you please create an entry in the commitfest for this one so we
don't lose track of it?
We're not too far from the next minor release, so I've been looking at
this fix again and I intend to get it committed shortly (on Monday or
so). Attached is a slightly modified version of the v4 patch that:
(a) Removes comments about projections added to code that is not
directly related to the fix. I'm not against adding such comments
separately.
(b) Fixes comment in expected output of incremental_sort test.
(c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
not quite needed thanks to the "return" in the "if" branch. IMO this
makes it more elegant.
I do have two questions about find_em_expr_usable_for_sorting_rel.
(a) Where does the em_is_const check come from? The comment claims we
should not be trying to sort by equivalence class members, but I can't
convince myself it's actually true and I don't see it discussed in this
thread.
(b) In find_em_expr_for_rel, which was what was used before, the
condition on relids was this:
if (bms_is_subset(em->em_relids, rel->relids) &&
!bms_is_empty(em->em_relids))
{
return em->em_expr;
}
but here we're using only
if (!bms_is_subset(em->em_relids, rel->relids))
continue;
Isn't this missing the second bms_is_empty condition?
Of course, an alternative to this fix would be reverting ba3e76cc571
(completely or just the part introducing generate_useful_gather_paths).
If anyone thinks that's what we should do, please speak now.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v5-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patchtext/plain; charset=us-asciiDownload
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b399592ff8..3393e1d37a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2760,7 +2760,8 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
/*
* Considering query_pathkeys is always worth it, because it might allow
* us to avoid a total sort when we have a partially presorted path
- * available.
+ * available or to push the total sort into the parallel portion of the
+ * query.
*/
if (root->query_pathkeys)
{
@@ -2773,17 +2774,17 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
/*
- * We can only build an Incremental Sort for pathkeys which
- * contain an EC member in the current relation, so ignore any
- * suffix of the list as soon as we find a pathkey without an EC
- * member the relation.
+ * We can only build a sort for pathkeys which contain an EC
+ * member in the current relation's target, so ignore any suffix
+ * of the list as soon as we find a pathkey without an EC member
+ * in the relation.
*
* By still returning the prefix of the pathkeys list that does
* meet criteria of EC membership in the current relation, we
* enable not just an incremental sort on the entirety of
* query_pathkeys but also incremental sort below a JOIN.
*/
- if (!find_em_expr_for_rel(pathkey_ec, rel))
+ if (!find_em_expr_usable_for_sorting_rel(pathkey_ec, rel))
break;
npathkeys++;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 823422edad..9662dd3262 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -794,6 +794,66 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
return NULL;
}
+/*
+ * Find an equivalence class member expression that can be safely used by a
+ * sort node on top of the provided relation. The rules here must match those
+ * applied in prepare_sort_from_pathkeys.
+ */
+Expr *
+find_em_expr_usable_for_sorting_rel(EquivalenceClass *ec, RelOptInfo *rel)
+{
+ ListCell *lc_em;
+
+ /*
+ * If there is more than one equivalence member matching these
+ * requirements we'll be content to choose any one of them.
+ */
+ foreach(lc_em, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst(lc_em);
+ PathTarget *target = rel->reltarget;
+ ListCell *lc_expr;
+
+ /*
+ * We shouldn't be trying to sort by an equivalence class that
+ * contains a constant, so no need to consider such cases any further.
+ */
+ if (em->em_is_const)
+ continue;
+
+ /*
+ * Any Vars in the equivalence class member need to come from this
+ * relation. This is a superset of prepare_sort_from_pathkeys ignoring
+ * child members unless they belong to the rel being sorted.
+ */
+ if (!bms_is_subset(em->em_relids, rel->relids))
+ continue;
+
+ /*
+ * As long as the expression isn't volatile then
+ * prepare_sort_from_pathkeys is able to generate a new target entry,
+ * so there's no need to verify that one already exists.
+ */
+ if (!ec->ec_has_volatile)
+ return em->em_expr;
+
+ /*
+ * If, however, it's volatile, we have to verify that the
+ * equivalence member's expr is already generated in the
+ * relation's target.
+ */
+ foreach(lc_expr, target->exprs)
+ {
+ /* XXX: Do we need to strip relabel types here? */
+ if (equal(lfirst(lc_expr), em->em_expr))
+ return em->em_expr;
+ }
+ }
+
+ /* We didn't find any suitable equivalence class expression */
+ return NULL;
+}
+
/*
* generate_base_implied_equalities
* Generate any restriction clauses that we can deduce from equivalence
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..d72b3a5d42 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -135,6 +135,7 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
Relids rel,
bool create_it);
extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
+extern Expr *find_em_expr_usable_for_sorting_rel(EquivalenceClass *ec, RelOptInfo *rel);
extern void generate_base_implied_equalities(PlannerInfo *root);
extern List *generate_join_implied_equalities(PlannerInfo *root,
Relids join_relids,
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index e376ea1276..7cf2eee7c1 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1469,3 +1469,101 @@ explain (costs off) select * from t union select * from t order by 1,3;
(13 rows)
drop table t;
+-- Sort pushdown can't go below where expressions are part of the rel target.
+-- In particular this is interesting for volatile expressions which have to
+-- go above joins since otherwise we'll incorrectly use expression evaluations
+-- across multiple rows.
+set enable_hashagg=off;
+set enable_seqscan=off;
+set enable_incremental_sort = off;
+set parallel_tuple_cost=0;
+set parallel_setup_cost=0;
+set min_parallel_table_scan_size = 0;
+set min_parallel_index_scan_size = 0;
+-- Parallel sort below join.
+explain (costs off) select distinct sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+--------------------------------------------------------------------------
+ Unique
+ -> Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, tenk1.stringu1
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, tenk1.stringu1
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
+-- Parallel sort but with expression that can be safely generated at the base rel.
+explain (costs off) select distinct sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Unique
+ -> Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, (md5((tenk1.stringu1)::text)) COLLATE "C"
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, (md5((tenk1.stringu1)::text)) COLLATE "C"
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
+-- Parallel sort but with expression not available until the upper rel.
+explain (costs off) select distinct sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: tenk1.unique1, (((tenk1.stringu1)::text || (random())::text)) COLLATE "C"
+ -> Gather
+ Workers Planned: 2
+ -> Nested Loop
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Sort Key: tenk1.unique1, (((tenk1.stringu1)::text || (random())::text)) COLLATE "C"
+ -> Gather
+ Workers Planned: 2
+ -> Nested Loop
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 9c040c90e6..3ee11c394b 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -221,3 +221,34 @@ set enable_hashagg to off;
explain (costs off) select * from t union select * from t order by 1,3;
drop table t;
+
+-- Sort pushdown can't go below where expressions are part of the rel target.
+-- In particular this is interesting for volatile expressions which have to
+-- go above joins since otherwise we'll incorrectly use expression evaluations
+-- across multiple rows.
+set enable_hashagg=off;
+set enable_seqscan=off;
+set enable_incremental_sort = off;
+set parallel_tuple_cost=0;
+set parallel_setup_cost=0;
+set min_parallel_table_scan_size = 0;
+set min_parallel_index_scan_size = 0;
+
+-- Parallel sort below join.
+explain (costs off) select distinct sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+-- Parallel sort but with expression that can be safely generated at the base rel.
+explain (costs off) select distinct sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+-- Parallel sort but with expression not available until the upper rel.
+explain (costs off) select distinct sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
On Thu, Oct 29, 2020 at 6:06 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:Can you please create an entry in the commitfest for this one so we
don't lose track of it?We're not too far from the next minor release, so I've been looking at
this fix again and I intend to get it committed shortly (on Monday or
so). Attached is a slightly modified version of the v4 patch that:(a) Removes comments about projections added to code that is not
directly related to the fix. I'm not against adding such comments
separately.
Seems fine. Do you want to commit them at the same time (just a
separate commit)? Or have a separate patch? It seems a bit overkill to
start a new thread just for those.
(b) Fixes comment in expected output of incremental_sort test.
Thanks.
(c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
not quite needed thanks to the "return" in the "if" branch. IMO this
makes it more elegant.
No objection.
I do have two questions about find_em_expr_usable_for_sorting_rel.
(a) Where does the em_is_const check come from? The comment claims we
should not be trying to sort by equivalence class members, but I can't
convince myself it's actually true and I don't see it discussed in this
thread.
That comes from find_ec_member_for_tle which is called by
prepare_sort_from_pathkeys which we note in the comments contains the
set of rules we need to mirror.
(b) In find_em_expr_for_rel, which was what was used before, the
condition on relids was this:if (bms_is_subset(em->em_relids, rel->relids) &&
!bms_is_empty(em->em_relids))
{
return em->em_expr;
}but here we're using only
if (!bms_is_subset(em->em_relids, rel->relids))
continue;Isn't this missing the second bms_is_empty condition?
I definitely remember intentionally removing that condition. For one,
it's not present in prepare_sort_from_pathkeys. My memory is a bit
fuzzy on all of the whys, but isn't it the case that if we've already
excluded const expressions, that we must have vars (and therefore
rels) present, and therefore the relids bms must be non-empty?
Probably more importantly, we're going to check that an actual em expr
matches, which means the bms subset check is really just an
optimization to avoid unnecessarily scanning the exprs for equality.
Perhaps it's worth adding a comment saying as such?
By the way, the fact that this is parallel to
prepare_sort_from_pathkeys is the reason the XXX comment is still
there about possibly needing to remove relabel types (since
prepare_sort_from_pathkeys does that). I don't think it's a hard
requirement: the worst case is that by not digging into relabel types
we're being unnecessarily strict.
Both of these I can add if desired.
James
On Fri, Oct 30, 2020 at 01:26:08PM -0400, James Coleman wrote:
On Thu, Oct 29, 2020 at 6:06 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:Can you please create an entry in the commitfest for this one so we
don't lose track of it?We're not too far from the next minor release, so I've been looking at
this fix again and I intend to get it committed shortly (on Monday or
so). Attached is a slightly modified version of the v4 patch that:(a) Removes comments about projections added to code that is not
directly related to the fix. I'm not against adding such comments
separately.Seems fine. Do you want to commit them at the same time (just a
separate commit)? Or have a separate patch? It seems a bit overkill to
start a new thread just for those.
Probably sometime later, as a separate patch. I haven't thought very
much about those comments, it just seemed unrelated to the fix so I've
removed that for now. Let's not bother with this until after the minor
release.
(b) Fixes comment in expected output of incremental_sort test.
Thanks.
(c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
not quite needed thanks to the "return" in the "if" branch. IMO this
makes it more elegant.No objection.
I do have two questions about find_em_expr_usable_for_sorting_rel.
(a) Where does the em_is_const check come from? The comment claims we
should not be trying to sort by equivalence class members, but I can't
convince myself it's actually true and I don't see it discussed in this
thread.That comes from find_ec_member_for_tle which is called by
prepare_sort_from_pathkeys which we note in the comments contains the
set of rules we need to mirror.
Thanks for the pointer. I'll take a look.
(b) In find_em_expr_for_rel, which was what was used before, the
condition on relids was this:if (bms_is_subset(em->em_relids, rel->relids) &&
!bms_is_empty(em->em_relids))
{
return em->em_expr;
}but here we're using only
if (!bms_is_subset(em->em_relids, rel->relids))
continue;Isn't this missing the second bms_is_empty condition?
I definitely remember intentionally removing that condition. For one,
it's not present in prepare_sort_from_pathkeys. My memory is a bit
fuzzy on all of the whys, but isn't it the case that if we've already
excluded const expressions, that we must have vars (and therefore
rels) present, and therefore the relids bms must be non-empty?
Probably more importantly, we're going to check that an actual em expr
matches, which means the bms subset check is really just an
optimization to avoid unnecessarily scanning the exprs for equality.
Perhaps it's worth adding a comment saying as such?By the way, the fact that this is parallel to
prepare_sort_from_pathkeys is the reason the XXX comment is still
there about possibly needing to remove relabel types (since
prepare_sort_from_pathkeys does that). I don't think it's a hard
requirement: the worst case is that by not digging into relabel types
we're being unnecessarily strict.Both of these I can add if desired.
Yeah, it'd be good to explain the reasoning why it's fine to have the
conditions like this. Thanks.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Oct 30, 2020 at 5:03 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On Fri, Oct 30, 2020 at 01:26:08PM -0400, James Coleman wrote:
On Thu, Oct 29, 2020 at 6:06 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:Can you please create an entry in the commitfest for this one so we
don't lose track of it?We're not too far from the next minor release, so I've been looking at
this fix again and I intend to get it committed shortly (on Monday or
so). Attached is a slightly modified version of the v4 patch that:(a) Removes comments about projections added to code that is not
directly related to the fix. I'm not against adding such comments
separately.Seems fine. Do you want to commit them at the same time (just a
separate commit)? Or have a separate patch? It seems a bit overkill to
start a new thread just for those.Probably sometime later, as a separate patch. I haven't thought very
much about those comments, it just seemed unrelated to the fix so I've
removed that for now. Let's not bother with this until after the minor
release.
For whatever it's worth, I didn't originally consider them unrelated;
the purpose was to explain why those places were safe relative to the
same kinds of issues under discussion here: whether an expr will be in
the target and when it might need to be added.
EIther way I've split it into a separate patch in this series.
(b) Fixes comment in expected output of incremental_sort test.
Thanks.
(c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
not quite needed thanks to the "return" in the "if" branch. IMO this
makes it more elegant.No objection.
I do have two questions about find_em_expr_usable_for_sorting_rel.
(a) Where does the em_is_const check come from? The comment claims we
should not be trying to sort by equivalence class members, but I can't
convince myself it's actually true and I don't see it discussed in this
thread.That comes from find_ec_member_for_tle which is called by
prepare_sort_from_pathkeys which we note in the comments contains the
set of rules we need to mirror.Thanks for the pointer. I'll take a look.
(b) In find_em_expr_for_rel, which was what was used before, the
condition on relids was this:if (bms_is_subset(em->em_relids, rel->relids) &&
!bms_is_empty(em->em_relids))
{
return em->em_expr;
}but here we're using only
if (!bms_is_subset(em->em_relids, rel->relids))
continue;Isn't this missing the second bms_is_empty condition?
I definitely remember intentionally removing that condition. For one,
it's not present in prepare_sort_from_pathkeys. My memory is a bit
fuzzy on all of the whys, but isn't it the case that if we've already
excluded const expressions, that we must have vars (and therefore
rels) present, and therefore the relids bms must be non-empty?
Probably more importantly, we're going to check that an actual em expr
matches, which means the bms subset check is really just an
optimization to avoid unnecessarily scanning the exprs for equality.
Perhaps it's worth adding a comment saying as such?By the way, the fact that this is parallel to
prepare_sort_from_pathkeys is the reason the XXX comment is still
there about possibly needing to remove relabel types (since
prepare_sort_from_pathkeys does that). I don't think it's a hard
requirement: the worst case is that by not digging into relabel types
we're being unnecessarily strict.Both of these I can add if desired.
Yeah, it'd be good to explain the reasoning why it's fine to have the
conditions like this. Thanks.
I was looking at this some more, and I'm still convinced that this is
correct, but I don't think a comment about it being an optimization
(though I suspect that that its major benefit), since it came from,
and must parallel, find_ec_member_for_tle, and there is no such
explanation there. I don't want to backfill reasoning onto that
decision, but I do think it's important to parallel it since it's
ultimately what is going to cause errors if we're out of sync with it.
As to why find_em_expr_for_rel is different, I think the answer is
that it's used by the FDW code, and it seems to me to make sense that
in that case we'd only send sort conditions to the foreign server if
the em actually contains rels.
The new series attached includes the primary fix for this issue, the
additional comments that could either go in at the same time or not,
and my patch with semi-related questions (which isn't intended to be
committed, but to keep track of the questions).
James
Attachments:
v6-0003-questions.patchtext/x-patch; charset=US-ASCII; name=v6-0003-questions.patchDownload
From b81e133200086058cf2c8f683e31a643cc32aea7 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Sat, 3 Oct 2020 22:05:08 -0400
Subject: [PATCH v6 3/3] questions
---
src/backend/optimizer/plan/planner.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6989c5c0e3..f78bbcec02 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5018,6 +5018,10 @@ create_ordered_paths(PlannerInfo *root,
root->sort_pathkeys,
limit_tuples);
/* Add projection step if needed */
+ /* TODO: why don't we apply the projection to the path
+ * before sorting? Is it because it's already been done
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
@@ -5048,6 +5052,10 @@ create_ordered_paths(PlannerInfo *root,
limit_tuples);
/* Add projection step if needed */
+ /* TODO: why don't we apply the projection to the path
+ * before sorting? Is it because it's already been done
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
@@ -5099,6 +5107,10 @@ create_ordered_paths(PlannerInfo *root,
&total_groups);
/* Add projection step if needed */
+ /* TODO: why can't we apply the projection to the partial
+ * path? Is it because it's already been done if possible
+ * by apply_scanjoin_target_to_paths?
+ */
if (path->pathtarget != target)
path = apply_projection_to_path(root, ordered_rel,
path, target);
@@ -5160,6 +5172,10 @@ create_ordered_paths(PlannerInfo *root,
&total_groups);
/* Add projection step if needed */
+ /* TODO: why can't we apply the projection to the partial
+ * path? Is it because it's already been done if possible
+ * by apply_scanjoin_target_to_paths?
+ */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
--
2.17.1
v6-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patchtext/x-patch; charset=US-ASCII; name=v6-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patchDownload
From c9ca57f75b6423916a63c2ddfddae9e00bbfef10 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Sat, 3 Oct 2020 10:35:47 -0400
Subject: [PATCH v6 1/3] Fix get_useful_pathkeys_for_relation for volatile
exprs below joins
It's not sufficient to find an ec member whose Vars are all in the rel
we're working with; volatile expressions in particular present a case
where we also need to know if the rel's target contains the expression
or not before we can assume the pathkey for that ec is safe to use at
this point in the query. If the pathkey expr is volatile and we're
below a join, then the expr can't appear in the target until we're above
the join, so we can't sort with it yet.
---
src/backend/optimizer/path/allpaths.c | 13 +--
src/backend/optimizer/path/equivclass.c | 67 +++++++++++++
src/include/optimizer/paths.h | 1 +
.../regress/expected/incremental_sort.out | 98 +++++++++++++++++++
src/test/regress/sql/incremental_sort.sql | 31 ++++++
5 files changed, 204 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b399592ff8..3393e1d37a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2760,7 +2760,8 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
/*
* Considering query_pathkeys is always worth it, because it might allow
* us to avoid a total sort when we have a partially presorted path
- * available.
+ * available or to push the total sort into the parallel portion of the
+ * query.
*/
if (root->query_pathkeys)
{
@@ -2773,17 +2774,17 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
/*
- * We can only build an Incremental Sort for pathkeys which
- * contain an EC member in the current relation, so ignore any
- * suffix of the list as soon as we find a pathkey without an EC
- * member the relation.
+ * We can only build a sort for pathkeys which contain an EC
+ * member in the current relation's target, so ignore any suffix
+ * of the list as soon as we find a pathkey without an EC member
+ * in the relation.
*
* By still returning the prefix of the pathkeys list that does
* meet criteria of EC membership in the current relation, we
* enable not just an incremental sort on the entirety of
* query_pathkeys but also incremental sort below a JOIN.
*/
- if (!find_em_expr_for_rel(pathkey_ec, rel))
+ if (!find_em_expr_usable_for_sorting_rel(pathkey_ec, rel))
break;
npathkeys++;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index a21b3b4756..483e68780e 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -797,6 +797,73 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
return NULL;
}
+/*
+ * Find an equivalence class member expression that can be safely used by a
+ * sort node on top of the provided relation. The rules here must match those
+ * applied in prepare_sort_from_pathkeys.
+ */
+Expr *
+find_em_expr_usable_for_sorting_rel(EquivalenceClass *ec, RelOptInfo *rel)
+{
+ ListCell *lc_em;
+
+ /*
+ * If there is more than one equivalence member matching these
+ * requirements we'll be content to choose any one of them.
+ */
+ foreach(lc_em, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst(lc_em);
+ Expr *em_expr = em->em_expr;
+ PathTarget *target = rel->reltarget;
+ ListCell *lc_target_expr;
+
+ /*
+ * We shouldn't be trying to sort by an equivalence class that
+ * contains a constant, so no need to consider such cases any further.
+ */
+ if (em->em_is_const)
+ continue;
+
+ /*
+ * Any Vars in the equivalence class member need to come from this
+ * relation. This is a superset of prepare_sort_from_pathkeys ignoring
+ * child members unless they belong to the rel being sorted.
+ */
+ if (!bms_is_subset(em->em_relids, rel->relids))
+ continue;
+
+ /*
+ * As long as the expression isn't volatile then
+ * prepare_sort_from_pathkeys is able to generate a new target entry,
+ * so there's no need to verify that one already exists.
+ */
+ if (!ec->ec_has_volatile)
+ return em->em_expr;
+
+ /*
+ * If, however, it's volatile, we have to verify that the
+ * equivalence member's expr is already generated in the
+ * relation's target (but we can strip relabels).
+ */
+ while (em_expr && IsA(em_expr, RelabelType))
+ em_expr = ((RelabelType *) em_expr)->arg;
+ foreach(lc_target_expr, target->exprs)
+ {
+ Expr *target_expr = lfirst(lc_target_expr);
+
+ while (target_expr && IsA(target_expr, RelabelType))
+ target_expr = ((RelabelType *) target_expr)->arg;
+
+ if (equal(target_expr, em_expr))
+ return em->em_expr;
+ }
+ }
+
+ /* We didn't find any suitable equivalence class expression */
+ return NULL;
+}
+
/*
* generate_base_implied_equalities
* Generate any restriction clauses that we can deduce from equivalence
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 2134227ebc..8a4c6f8b59 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -135,6 +135,7 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
Relids rel,
bool create_it);
extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
+extern Expr *find_em_expr_usable_for_sorting_rel(EquivalenceClass *ec, RelOptInfo *rel);
extern void generate_base_implied_equalities(PlannerInfo *root);
extern List *generate_join_implied_equalities(PlannerInfo *root,
Relids join_relids,
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index e376ea1276..7cf2eee7c1 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1469,3 +1469,101 @@ explain (costs off) select * from t union select * from t order by 1,3;
(13 rows)
drop table t;
+-- Sort pushdown can't go below where expressions are part of the rel target.
+-- In particular this is interesting for volatile expressions which have to
+-- go above joins since otherwise we'll incorrectly use expression evaluations
+-- across multiple rows.
+set enable_hashagg=off;
+set enable_seqscan=off;
+set enable_incremental_sort = off;
+set parallel_tuple_cost=0;
+set parallel_setup_cost=0;
+set min_parallel_table_scan_size = 0;
+set min_parallel_index_scan_size = 0;
+-- Parallel sort below join.
+explain (costs off) select distinct sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+--------------------------------------------------------------------------
+ Unique
+ -> Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, tenk1.stringu1
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, tenk1.stringu1
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
+-- Parallel sort but with expression that can be safely generated at the base rel.
+explain (costs off) select distinct sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Unique
+ -> Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, (md5((tenk1.stringu1)::text)) COLLATE "C"
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, (md5((tenk1.stringu1)::text)) COLLATE "C"
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
+-- Parallel sort but with expression not available until the upper rel.
+explain (costs off) select distinct sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: tenk1.unique1, (((tenk1.stringu1)::text || (random())::text)) COLLATE "C"
+ -> Gather
+ Workers Planned: 2
+ -> Nested Loop
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Sort Key: tenk1.unique1, (((tenk1.stringu1)::text || (random())::text)) COLLATE "C"
+ -> Gather
+ Workers Planned: 2
+ -> Nested Loop
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 9c040c90e6..3ee11c394b 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -221,3 +221,34 @@ set enable_hashagg to off;
explain (costs off) select * from t union select * from t order by 1,3;
drop table t;
+
+-- Sort pushdown can't go below where expressions are part of the rel target.
+-- In particular this is interesting for volatile expressions which have to
+-- go above joins since otherwise we'll incorrectly use expression evaluations
+-- across multiple rows.
+set enable_hashagg=off;
+set enable_seqscan=off;
+set enable_incremental_sort = off;
+set parallel_tuple_cost=0;
+set parallel_setup_cost=0;
+set min_parallel_table_scan_size = 0;
+set min_parallel_index_scan_size = 0;
+
+-- Parallel sort below join.
+explain (costs off) select distinct sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+-- Parallel sort but with expression that can be safely generated at the base rel.
+explain (costs off) select distinct sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+-- Parallel sort but with expression not available until the upper rel.
+explain (costs off) select distinct sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
--
2.17.1
v6-0002-Add-comments-explaining-where-projections-aren-t-.patchtext/x-patch; charset=US-ASCII; name=v6-0002-Add-comments-explaining-where-projections-aren-t-.patchDownload
From 945cb596cc0e62780994bd084eaea5355af85554 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Fri, 30 Oct 2020 19:27:13 -0400
Subject: [PATCH v6 2/3] Add comments explaining where projections aren't
necessary
---
src/backend/optimizer/plan/planner.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 986d7a52e3..6989c5c0e3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7414,7 +7414,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* current reltarget is. We don't do this in the case where the
* target is parallel-safe, since we will be able to generate superior
* paths by doing it after the final scan/join target has been
- * applied.
+ * applied. Since the target isn't parallel safe, we don't need to
+ * apply projections to the partial paths before building gather
+ * paths.
*/
generate_useful_gather_paths(root, rel, false);
@@ -7567,7 +7569,10 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
* if the relation is parallel safe, and we don't do it for child rels to
* avoid creating multiple Gather nodes within the same plan. We must do
* this after all paths have been generated and before set_cheapest, since
- * one of the generated paths may turn out to be the cheapest one.
+ * one of the generated paths may turn out to be the cheapest one. We've
+ * already applied any necessary projections to the partial paths above so
+ * any Gather Merge paths will be able to make use of path keys in
+ * requested target.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
generate_useful_gather_paths(root, rel, false);
--
2.17.1
On Fri, Oct 30, 2020 at 07:37:33PM -0400, James Coleman wrote:
...
I was looking at this some more, and I'm still convinced that this is
correct, but I don't think a comment about it being an optimization
(though I suspect that that its major benefit), since it came from,
and must parallel, find_ec_member_for_tle, and there is no such
explanation there. I don't want to backfill reasoning onto that
decision, but I do think it's important to parallel it since it's
ultimately what is going to cause errors if we're out of sync with it.As to why find_em_expr_for_rel is different, I think the answer is
that it's used by the FDW code, and it seems to me to make sense that
in that case we'd only send sort conditions to the foreign server if
the em actually contains rels.The new series attached includes the primary fix for this issue, the
additional comments that could either go in at the same time or not,
and my patch with semi-related questions (which isn't intended to be
committed, but to keep track of the questions).
Thanks. Attached are two patches that I plan to get committed
0001 is what you sent, with slightly reworded commit message. This is
the actual fix.
I'm still thinking about Robert's comment that perhaps we should be
considering only expressions already in the relation's target, which
would imply checking for volatile expressions is not enough. But I've
been unable to convince myself it's correct or incorrect. In any case
0001 is a clear improvement - in the worst case we'll have to make it
stricter in the future.
0002 partially reverts ba3e76cc571 by moving find_em_expr_for_rel back
to postgres_fdw.c. We've moved it to equivclass.c so that it could be
called from both the FDW and allpaths.c, but now that the functions
diverged it's only called from the FDW again. So maybe we should move
it back, but I guess that's not a good thing in a minor release.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v7-0001-Fix-get_useful_pathkeys_for_relation-for-volatile-ex.patchtext/plain; charset=us-asciiDownload
From b2c8085256aa6d6e34f6c9b8394738ee1bef3997 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Sat, 3 Oct 2020 10:35:47 -0400
Subject: [PATCH 1/3] Fix get_useful_pathkeys_for_relation for volatile
expressions
When considering Incremental Sort below a Gather Merge, we need to be
a bit more careful when matching pathkeys to EC members. It's not enough
to find a member whose Vars are all in the current relation's target;
volatile expressions in particular need to be contained in the target,
otherwise it's too early to use the pathkey.
Reported-by: Jaime Casanova
Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAJGNTeNaxpXgBVcRhJX%2B2vSbq%2BF2kJqGBcvompmpvXb7pq%2BoFA%40mail.gmail.com
---
src/backend/optimizer/path/allpaths.c | 13 +--
src/backend/optimizer/path/equivclass.c | 70 +++++++++++++
src/include/optimizer/paths.h | 1 +
.../regress/expected/incremental_sort.out | 98 +++++++++++++++++++
src/test/regress/sql/incremental_sort.sql | 31 ++++++
5 files changed, 207 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8ad6384c6a..84a69b064a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2816,7 +2816,8 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
/*
* Considering query_pathkeys is always worth it, because it might allow
* us to avoid a total sort when we have a partially presorted path
- * available.
+ * available or to push the total sort into the parallel portion of the
+ * query.
*/
if (root->query_pathkeys)
{
@@ -2829,17 +2830,17 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
/*
- * We can only build an Incremental Sort for pathkeys which
- * contain an EC member in the current relation, so ignore any
- * suffix of the list as soon as we find a pathkey without an EC
- * member the relation.
+ * We can only build a sort for pathkeys which contain an EC
+ * member in the current relation's target, so ignore any suffix
+ * of the list as soon as we find a pathkey without an EC member
+ * in the relation.
*
* By still returning the prefix of the pathkeys list that does
* meet criteria of EC membership in the current relation, we
* enable not just an incremental sort on the entirety of
* query_pathkeys but also incremental sort below a JOIN.
*/
- if (!find_em_expr_for_rel(pathkey_ec, rel))
+ if (!find_em_expr_usable_for_sorting_rel(pathkey_ec, rel))
break;
npathkeys++;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index a21b3b4756..f98fd7b3eb 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -797,6 +797,76 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
return NULL;
}
+/*
+ * Find an equivalence class member expression that can be safely used by a
+ * sort node on top of the provided relation. The rules here must match those
+ * applied in prepare_sort_from_pathkeys.
+ */
+Expr *
+find_em_expr_usable_for_sorting_rel(EquivalenceClass *ec, RelOptInfo *rel)
+{
+ ListCell *lc_em;
+
+ /*
+ * If there is more than one equivalence member matching these
+ * requirements we'll be content to choose any one of them.
+ */
+ foreach(lc_em, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst(lc_em);
+ Expr *em_expr = em->em_expr;
+ PathTarget *target = rel->reltarget;
+ ListCell *lc_target_expr;
+
+ /*
+ * We shouldn't be trying to sort by an equivalence class that
+ * contains a constant, so no need to consider such cases any further.
+ */
+ if (em->em_is_const)
+ continue;
+
+ /*
+ * Any Vars in the equivalence class member need to come from this
+ * relation. This is a superset of prepare_sort_from_pathkeys ignoring
+ * child members unless they belong to the rel being sorted.
+ */
+ if (!bms_is_subset(em->em_relids, rel->relids))
+ continue;
+
+ /*
+ * As long as the expression isn't volatile then
+ * prepare_sort_from_pathkeys is able to generate a new target entry,
+ * so there's no need to verify that one already exists.
+ */
+ if (!ec->ec_has_volatile)
+ return em->em_expr;
+
+ /*
+ * If, however, it's volatile, we have to verify that the
+ * equivalence member's expr is already generated in the
+ * relation's target (we do strip relabels first from both
+ * expressions, which is cheap and might allow us to match
+ * more expressions).
+ */
+ while (em_expr && IsA(em_expr, RelabelType))
+ em_expr = ((RelabelType *) em_expr)->arg;
+
+ foreach(lc_target_expr, target->exprs)
+ {
+ Expr *target_expr = lfirst(lc_target_expr);
+
+ while (target_expr && IsA(target_expr, RelabelType))
+ target_expr = ((RelabelType *) target_expr)->arg;
+
+ if (equal(target_expr, em_expr))
+ return em->em_expr;
+ }
+ }
+
+ /* We didn't find any suitable equivalence class expression */
+ return NULL;
+}
+
/*
* generate_base_implied_equalities
* Generate any restriction clauses that we can deduce from equivalence
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 2134227ebc..8a4c6f8b59 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -135,6 +135,7 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
Relids rel,
bool create_it);
extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
+extern Expr *find_em_expr_usable_for_sorting_rel(EquivalenceClass *ec, RelOptInfo *rel);
extern void generate_base_implied_equalities(PlannerInfo *root);
extern List *generate_join_implied_equalities(PlannerInfo *root,
Relids join_relids,
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index e376ea1276..7cf2eee7c1 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1469,3 +1469,101 @@ explain (costs off) select * from t union select * from t order by 1,3;
(13 rows)
drop table t;
+-- Sort pushdown can't go below where expressions are part of the rel target.
+-- In particular this is interesting for volatile expressions which have to
+-- go above joins since otherwise we'll incorrectly use expression evaluations
+-- across multiple rows.
+set enable_hashagg=off;
+set enable_seqscan=off;
+set enable_incremental_sort = off;
+set parallel_tuple_cost=0;
+set parallel_setup_cost=0;
+set min_parallel_table_scan_size = 0;
+set min_parallel_index_scan_size = 0;
+-- Parallel sort below join.
+explain (costs off) select distinct sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+--------------------------------------------------------------------------
+ Unique
+ -> Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, tenk1.stringu1
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, tenk1.stringu1
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
+-- Parallel sort but with expression that can be safely generated at the base rel.
+explain (costs off) select distinct sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Unique
+ -> Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, (md5((tenk1.stringu1)::text)) COLLATE "C"
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Nested Loop
+ -> Gather Merge
+ Workers Planned: 2
+ -> Sort
+ Sort Key: tenk1.unique1, (md5((tenk1.stringu1)::text)) COLLATE "C"
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
+-- Parallel sort but with expression not available until the upper rel.
+explain (costs off) select distinct sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: tenk1.unique1, (((tenk1.stringu1)::text || (random())::text)) COLLATE "C"
+ -> Gather
+ Workers Planned: 2
+ -> Nested Loop
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(8 rows)
+
+explain (costs off) select sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Sort Key: tenk1.unique1, (((tenk1.stringu1)::text || (random())::text)) COLLATE "C"
+ -> Gather
+ Workers Planned: 2
+ -> Nested Loop
+ -> Parallel Index Scan using tenk1_unique1 on tenk1
+ -> Function Scan on generate_series
+(7 rows)
+
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 9c040c90e6..3ee11c394b 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -221,3 +221,34 @@ set enable_hashagg to off;
explain (costs off) select * from t union select * from t order by 1,3;
drop table t;
+
+-- Sort pushdown can't go below where expressions are part of the rel target.
+-- In particular this is interesting for volatile expressions which have to
+-- go above joins since otherwise we'll incorrectly use expression evaluations
+-- across multiple rows.
+set enable_hashagg=off;
+set enable_seqscan=off;
+set enable_incremental_sort = off;
+set parallel_tuple_cost=0;
+set parallel_setup_cost=0;
+set min_parallel_table_scan_size = 0;
+set min_parallel_index_scan_size = 0;
+
+-- Parallel sort below join.
+explain (costs off) select distinct sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, stringu1
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+-- Parallel sort but with expression that can be safely generated at the base rel.
+explain (costs off) select distinct sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, md5(stringu1)
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
+-- Parallel sort but with expression not available until the upper rel.
+explain (costs off) select distinct sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
+explain (costs off) select sub.unique1, stringu1 || random()::text
+from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
+order by 1, 2;
--
2.25.4
v7-0002-partially-revert-ba3e76cc571eba3dea19c9465ff15ac3ac1.patchtext/plain; charset=us-asciiDownload
From aefe619b1cecb4ca70f2035b2b62ff270a4e80e7 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Tue, 3 Nov 2020 03:22:50 +0100
Subject: [PATCH 2/3] partially revert ba3e76cc571eba3dea19c9465ff15ac3ac186576
---
contrib/postgres_fdw/postgres_fdw.c | 29 +++++++++++++++++++++++++
src/backend/optimizer/path/equivclass.c | 29 -------------------------
src/include/optimizer/paths.h | 1 -
3 files changed, 29 insertions(+), 30 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaacc51..dcf533d700 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6524,6 +6524,35 @@ conversion_error_callback(void *arg)
}
}
+/*
+ * Find an equivalence class member expression, all of whose Vars, come from
+ * the indicated relation.
+ */
+Expr *
+find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
+{
+ ListCell *lc_em;
+
+ foreach(lc_em, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst(lc_em);
+
+ if (bms_is_subset(em->em_relids, rel->relids) &&
+ !bms_is_empty(em->em_relids))
+ {
+ /*
+ * If there is more than one equivalence member whose Vars are
+ * taken entirely from this relation, we'll be content to choose
+ * any one of those.
+ */
+ return em->em_expr;
+ }
+ }
+
+ /* We didn't find any suitable equivalence class expression */
+ return NULL;
+}
+
/*
* Find an equivalence class member expression to be computed as a sort column
* in the given target.
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index f98fd7b3eb..a507a76bcf 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -768,35 +768,6 @@ get_eclass_for_sort_expr(PlannerInfo *root,
return newec;
}
-/*
- * Find an equivalence class member expression, all of whose Vars, come from
- * the indicated relation.
- */
-Expr *
-find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
-{
- ListCell *lc_em;
-
- foreach(lc_em, ec->ec_members)
- {
- EquivalenceMember *em = lfirst(lc_em);
-
- if (bms_is_subset(em->em_relids, rel->relids) &&
- !bms_is_empty(em->em_relids))
- {
- /*
- * If there is more than one equivalence member whose Vars are
- * taken entirely from this relation, we'll be content to choose
- * any one of those.
- */
- return em->em_expr;
- }
- }
-
- /* We didn't find any suitable equivalence class expression */
- return NULL;
-}
-
/*
* Find an equivalence class member expression that can be safely used by a
* sort node on top of the provided relation. The rules here must match those
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8a4c6f8b59..5f6ffbd860 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -134,7 +134,6 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
Index sortref,
Relids rel,
bool create_it);
-extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
extern Expr *find_em_expr_usable_for_sorting_rel(EquivalenceClass *ec, RelOptInfo *rel);
extern void generate_base_implied_equalities(PlannerInfo *root);
extern List *generate_join_implied_equalities(PlannerInfo *root,
--
2.25.4
On Tue, Nov 03, 2020 at 03:37:43AM +0100, Tomas Vondra wrote:
On Fri, Oct 30, 2020 at 07:37:33PM -0400, James Coleman wrote:
...
I was looking at this some more, and I'm still convinced that this is
correct, but I don't think a comment about it being an optimization
(though I suspect that that its major benefit), since it came from,
and must parallel, find_ec_member_for_tle, and there is no such
explanation there. I don't want to backfill reasoning onto that
decision, but I do think it's important to parallel it since it's
ultimately what is going to cause errors if we're out of sync with it.As to why find_em_expr_for_rel is different, I think the answer is
that it's used by the FDW code, and it seems to me to make sense that
in that case we'd only send sort conditions to the foreign server if
the em actually contains rels.The new series attached includes the primary fix for this issue, the
additional comments that could either go in at the same time or not,
and my patch with semi-related questions (which isn't intended to be
committed, but to keep track of the questions).Thanks. Attached are two patches that I plan to get committed
0001 is what you sent, with slightly reworded commit message. This is
the actual fix.I'm still thinking about Robert's comment that perhaps we should be
considering only expressions already in the relation's target, which
would imply checking for volatile expressions is not enough. But I've
been unable to convince myself it's correct or incorrect. In any case
0001 is a clear improvement - in the worst case we'll have to make it
stricter in the future.0002 partially reverts ba3e76cc571 by moving find_em_expr_for_rel back
to postgres_fdw.c. We've moved it to equivclass.c so that it could be
called from both the FDW and allpaths.c, but now that the functions
diverged it's only called from the FDW again. So maybe we should move
it back, but I guess that's not a good thing in a minor release.
I've pushed the 0001 part, i.e. the main fix. Not sure about the other
parts (comments and moving the code back to postgres_fdw) yet.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I've pushed the 0001 part, i.e. the main fix. Not sure about the other
parts (comments and moving the code back to postgres_fdw) yet.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, Tomas!
I'm still having trouble with this problem. Tom commited something [1]https://git.postgresql.org/pg/commitdiff/936043c9eacb9e9c7356a8190a410d2c4e4ea03a to avoid core dumps, but all I get now is: subplan "SubPlan 1" was not initialized
I upgraded to 13.1 over the weekend but the problem still occurs.
This is the topic I started: /messages/by-id/CAAaqYe--qpnBOUiqnHSAc71K2HpXetnGedRAX0Z04zx=Y0ExHA@mail.gmail.com
[1]: https://git.postgresql.org/pg/commitdiff/936043c9eacb9e9c7356a8190a410d2c4e4ea03a
Hmm, I missed that other thread. That indeed seems like a bug in the
same area already tweaked by ebb7ae839d033d0f2 for similar cases.
The attached patch fixes this simply by adding is_parallel_safe to
get_useful_pathkeys_for_relation - that does fix the reproducer, but I'm
not entirely sure that's the right place. Maybe it should be done in
find_em_expr_usable_for_sorting_rel (which might make a difference if an
EC clause can contain a mix of parallel safe and unsafe expressions). Or
maybe we should do it in the caller (which would allow using
get_useful_pathkeys_for_relation in contexts not requiring parallel
safety in the future).
Anyway, while this is not an "incremental sort" bug, it seems like the
root cause is the same as for ebb7ae839d033d0f2 - one of the incremental
sort patches started considering sorting below gather nodes, not
realizing these possible consequences.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
incremental-sort-fix-parallel-safe.patchtext/x-patch; charset=UTF-8; name=incremental-sort-fix-parallel-safe.patchDownload
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 84a69b064a..93db261011 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2826,6 +2826,7 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
foreach(lc, root->query_pathkeys)
{
+ Expr *expr;
PathKey *pathkey = (PathKey *) lfirst(lc);
EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
@@ -2840,7 +2841,14 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
* enable not just an incremental sort on the entirety of
* query_pathkeys but also incremental sort below a JOIN.
*/
- if (!find_em_expr_usable_for_sorting_rel(pathkey_ec, rel))
+ if (!(expr = find_em_expr_usable_for_sorting_rel(pathkey_ec, rel)))
+ break;
+
+ /*
+ * Also stop when the expression is not parallel-safe. We plan
+ * to add all of this under a Gather node.
+ */
+ if (!is_parallel_safe(root, (Node *) expr))
break;
npathkeys++;
On Mon, Nov 16, 2020 at 11:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
Hmm, I missed that other thread. That indeed seems like a bug in the
same area already tweaked by ebb7ae839d033d0f2 for similar cases.
I meant to bring it up in this thread before we got the other patch
committed, but I just ended up not having time to look into it.
The attached patch fixes this simply by adding is_parallel_safe to
get_useful_pathkeys_for_relation - that does fix the reproducer, but I'm
not entirely sure that's the right place. Maybe it should be done in
find_em_expr_usable_for_sorting_rel (which might make a difference if an
EC clause can contain a mix of parallel safe and unsafe expressions). Or
maybe we should do it in the caller (which would allow using
get_useful_pathkeys_for_relation in contexts not requiring parallel
safety in the future).Anyway, while this is not an "incremental sort" bug, it seems like the
root cause is the same as for ebb7ae839d033d0f2 - one of the incremental
sort patches started considering sorting below gather nodes, not
realizing these possible consequences.
Yeah. I'd like to investigate a bit if really we should also be adding
it to prepare_sort_from_pathkeys (which
find_em_expr_usable_for_sorting_rel parallels) or similar, since this
seems to be a broader concern that's been previously missed (even if
the bug was otherwise not yet exposed). In particular, as I understand
it, we did do sorts below gather nodes before, just in fewer places,
which meant that "in theory" the code was already broken (or at least
not complete) for parallel queries.
James
On 11/17/20 3:03 PM, James Coleman wrote:
On Mon, Nov 16, 2020 at 11:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:Hmm, I missed that other thread. That indeed seems like a bug in the
same area already tweaked by ebb7ae839d033d0f2 for similar cases.I meant to bring it up in this thread before we got the other patch
committed, but I just ended up not having time to look into it.The attached patch fixes this simply by adding is_parallel_safe to
get_useful_pathkeys_for_relation - that does fix the reproducer, but I'm
not entirely sure that's the right place. Maybe it should be done in
find_em_expr_usable_for_sorting_rel (which might make a difference if an
EC clause can contain a mix of parallel safe and unsafe expressions). Or
maybe we should do it in the caller (which would allow using
get_useful_pathkeys_for_relation in contexts not requiring parallel
safety in the future).Anyway, while this is not an "incremental sort" bug, it seems like the
root cause is the same as for ebb7ae839d033d0f2 - one of the incremental
sort patches started considering sorting below gather nodes, not
realizing these possible consequences.Yeah. I'd like to investigate a bit if really we should also be adding
it to prepare_sort_from_pathkeys (which
find_em_expr_usable_for_sorting_rel parallels) or similar, since this
seems to be a broader concern that's been previously missed (even if
the bug was otherwise not yet exposed). In particular, as I understand
it, we did do sorts below gather nodes before, just in fewer places,
which meant that "in theory" the code was already broken (or at least
not complete) for parallel queries.
True. It's possible the bug was there before, but the affected plans
were just very unlikely for some reason, and the new plans introduced by
incremental sort just made it easier to trigger.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Nov 17, 2020 at 11:20 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
On 11/17/20 3:03 PM, James Coleman wrote:
On Mon, Nov 16, 2020 at 11:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:Hmm, I missed that other thread. That indeed seems like a bug in the
same area already tweaked by ebb7ae839d033d0f2 for similar cases.I meant to bring it up in this thread before we got the other patch
committed, but I just ended up not having time to look into it.The attached patch fixes this simply by adding is_parallel_safe to
get_useful_pathkeys_for_relation - that does fix the reproducer, but I'm
not entirely sure that's the right place. Maybe it should be done in
find_em_expr_usable_for_sorting_rel (which might make a difference if an
EC clause can contain a mix of parallel safe and unsafe expressions). Or
maybe we should do it in the caller (which would allow using
get_useful_pathkeys_for_relation in contexts not requiring parallel
safety in the future).Anyway, while this is not an "incremental sort" bug, it seems like the
root cause is the same as for ebb7ae839d033d0f2 - one of the incremental
sort patches started considering sorting below gather nodes, not
realizing these possible consequences.Yeah. I'd like to investigate a bit if really we should also be adding
it to prepare_sort_from_pathkeys (which
find_em_expr_usable_for_sorting_rel parallels) or similar, since this
seems to be a broader concern that's been previously missed (even if
the bug was otherwise not yet exposed). In particular, as I understand
it, we did do sorts below gather nodes before, just in fewer places,
which meant that "in theory" the code was already broken (or at least
not complete) for parallel queries.True. It's possible the bug was there before, but the affected plans
were just very unlikely for some reason, and the new plans introduced by
incremental sort just made it easier to trigger.
For additional patches on the broader topic (e.g., parallel safety) is
it preferable to keep them here or in the thread Luis started (or even
a new one besides that -- since that's on -bugs not -hackers)?
James
On Wed, Oct 7, 2020 at 6:22 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
I'm still not entirely sure I understand what's happening, or what the
exact rule is. Consider this query:explain (verbose) select distinct i, t, md5(t) from ref_0;
which on PG12 (i.e. before incremental sort) is planned like this:
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=78120.92..83120.92 rows=500000 width=65)
Output: i, t, (md5(t))
-> Sort (cost=78120.92..79370.92 rows=500000 width=65)
Output: i, t, (md5(t))
Sort Key: ref_0.i, ref_0.t, (md5(ref_0.t))
-> Seq Scan on public.ref_0 (cost=0.00..10282.00 rows=500000 width=65)
Output: i, t, md5(t)
(7 rows)i.e. the (stable) function is pushed all the way to the scan node. And
even if we replace it with a volatile expression it gets pushed down:explain (verbose) select distinct i, t, md5(random()::text || t) from ref_0;
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=83120.92..88120.92 rows=500000 width=65)
Output: i, t, (md5(((random())::text || t)))
-> Sort (cost=83120.92..84370.92 rows=500000 width=65)
Output: i, t, (md5(((random())::text || t)))
Sort Key: ref_0.i, ref_0.t, (md5(((random())::text || ref_0.t)))
-> Seq Scan on public.ref_0 (cost=0.00..15282.00 rows=500000 width=65)
Output: i, t, md5(((random())::text || t))
(7 rows)But perhaps I just don't understand the assumption correctly?
This isn't a counterexample, because there's no join tree here -- or,
well, there is, but it's trivial, because there's only one relation
involved. You can't have a non-Var expression computed before you
finish all the joins, because there are no joins.
What I said was: "target lists for any nodes below the top of the join
tree were previously always just Var nodes." The topmost join allowed
non-Var nodes before, but not lower levels.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Nov 20, 2020 at 12:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Oct 7, 2020 at 6:22 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:I'm still not entirely sure I understand what's happening, or what the
exact rule is. Consider this query:explain (verbose) select distinct i, t, md5(t) from ref_0;
which on PG12 (i.e. before incremental sort) is planned like this:
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=78120.92..83120.92 rows=500000 width=65)
Output: i, t, (md5(t))
-> Sort (cost=78120.92..79370.92 rows=500000 width=65)
Output: i, t, (md5(t))
Sort Key: ref_0.i, ref_0.t, (md5(ref_0.t))
-> Seq Scan on public.ref_0 (cost=0.00..10282.00 rows=500000 width=65)
Output: i, t, md5(t)
(7 rows)i.e. the (stable) function is pushed all the way to the scan node. And
even if we replace it with a volatile expression it gets pushed down:explain (verbose) select distinct i, t, md5(random()::text || t) from ref_0;
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=83120.92..88120.92 rows=500000 width=65)
Output: i, t, (md5(((random())::text || t)))
-> Sort (cost=83120.92..84370.92 rows=500000 width=65)
Output: i, t, (md5(((random())::text || t)))
Sort Key: ref_0.i, ref_0.t, (md5(((random())::text || ref_0.t)))
-> Seq Scan on public.ref_0 (cost=0.00..15282.00 rows=500000 width=65)
Output: i, t, md5(((random())::text || t))
(7 rows)But perhaps I just don't understand the assumption correctly?
This isn't a counterexample, because there's no join tree here -- or,
well, there is, but it's trivial, because there's only one relation
involved. You can't have a non-Var expression computed before you
finish all the joins, because there are no joins.What I said was: "target lists for any nodes below the top of the join
tree were previously always just Var nodes." The topmost join allowed
non-Var nodes before, but not lower levels.
As I understand what you're saying, the attached (from the repro case
in [1]'s discussion about parallel safety here) is a counterexample.
Specifically we have a plan like:
Merge Right Join
-> Unique
-> Gather Merge
-> Sort
-> Nested Loop
The pathtarget of the nested loop contains non-var expressions (in
this case a CASE expression).
Am I misunderstanding what you're saying?
I've attached verbose output (and the query).
James
Attachments:
On Fri, Nov 20, 2020 at 1:51 PM James Coleman <jtc331@gmail.com> wrote:
This isn't a counterexample, because there's no join tree here -- or,
well, there is, but it's trivial, because there's only one relation
involved. You can't have a non-Var expression computed before you
finish all the joins, because there are no joins.What I said was: "target lists for any nodes below the top of the join
tree were previously always just Var nodes." The topmost join allowed
non-Var nodes before, but not lower levels.As I understand what you're saying, the attached (from the repro case
in [1]'s discussion about parallel safety here) is a counterexample.Specifically we have a plan like:
Merge Right Join
-> Unique
-> Gather Merge
-> Sort
-> Nested LoopThe pathtarget of the nested loop contains non-var expressions (in
this case a CASE expression).
Well, in this case there are two join trees, one for the subquery and
one for the outer query. The expressions for the subquery are computed
at the top of the plan tree for that subquery.
I guess I didn't specify before that I meant "at the top of the join
tree for a subquery level," but I did mean that. On principle, the
planner can't very realistically postpone evaluating a subquery's
expressions until the top of the outer query's join tree, because, as
in your example here, the subquery might contain an ORDER BY or
DISTINCT clause. And anyway, if you look at the planner code, you'll
see that every subquery is basically planned separately, unless it
gets flattened into the parent query, so even if it were possible to
postpone expression evaluation to outer subquery levels in some case,
the current code structure definitely would fail to achieve that goal.
However, apart from incremental sort, you can postpone evaluating a
join tree's expressions until you reach the top of the join tree, and
I think that's what we have always done in the past. Each baserel in
the join tree gets a target list containing a list of vars which
corresponds to its column list, and the joinrels get lists of vars
which correspond to the columns from the inputs are needed at higher
levels of the plan tree. At the top of the join tree, we stick in the
expressions, replacing the target list of the top node in the plan
tree; the expressions have to be computable from the inputs because
the inputs contain all the columns that we figured out were needed at
this level at the beginning of planning. But, if there are scans or
joins below the top level of the plan tree, then do they bubble up to
higher levels of the plan? Should they? It's pretty complicated.
Theoretically if I've computed length(somelongvar) then perhaps I can
just bubble the computed value up the plan tree, rather than the
original column, which might be cheaper. But that only works if the
higher levels only need length(somelongvar) and not somelongvar
itself. I don't think there's any logic to work stuff like this out,
unless the incremental sort patch added some, because I don't think we
ever did it before. And it may be that it doesn't really matter, at
least when there aren't volatile functions: perhaps the worst case is
that we evaluate a non-volatile function twice, and maybe that's just
a performance consequence that we can live with. But on the other
hand, maybe it breaks stuff.
Or, on the third hand, maybe I'm wrong about what the rule was in the
first place. This is certainly a complicated topic. I don't think I'm
wrong, or I wouldn't be bothering to write emails about it, but that
doesn't mean I'm not wrong anyway.
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks much for the detailed followup; this is super helpful.
On Fri, Nov 20, 2020 at 2:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Nov 20, 2020 at 1:51 PM James Coleman <jtc331@gmail.com> wrote:
This isn't a counterexample, because there's no join tree here -- or,
well, there is, but it's trivial, because there's only one relation
involved. You can't have a non-Var expression computed before you
finish all the joins, because there are no joins.What I said was: "target lists for any nodes below the top of the join
tree were previously always just Var nodes." The topmost join allowed
non-Var nodes before, but not lower levels.As I understand what you're saying, the attached (from the repro case
in [1]'s discussion about parallel safety here) is a counterexample.Specifically we have a plan like:
Merge Right Join
-> Unique
-> Gather Merge
-> Sort
-> Nested LoopThe pathtarget of the nested loop contains non-var expressions (in
this case a CASE expression).Well, in this case there are two join trees, one for the subquery and
one for the outer query. The expressions for the subquery are computed
at the top of the plan tree for that subquery.I guess I didn't specify before that I meant "at the top of the join
tree for a subquery level," but I did mean that. On principle, the
planner can't very realistically postpone evaluating a subquery's
expressions until the top of the outer query's join tree, because, as
in your example here, the subquery might contain an ORDER BY or
DISTINCT clause. And anyway, if you look at the planner code, you'll
see that every subquery is basically planned separately, unless it
gets flattened into the parent query, so even if it were possible to
postpone expression evaluation to outer subquery levels in some case,
the current code structure definitely would fail to achieve that goal.
Ah, the "for a subquery level" clarification is helpful.
However, apart from incremental sort, you can postpone evaluating a
join tree's expressions until you reach the top of the join tree, and
I think that's what we have always done in the past. Each baserel in
the join tree gets a target list containing a list of vars which
corresponds to its column list, and the joinrels get lists of vars
which correspond to the columns from the inputs are needed at higher
levels of the plan tree. At the top of the join tree, we stick in the
expressions, replacing the target list of the top node in the plan
tree; the expressions have to be computable from the inputs because
the inputs contain all the columns that we figured out were needed at
this level at the beginning of planning. But, if there are scans or
joins below the top level of the plan tree, then do they bubble up to
higher levels of the plan? Should they? It's pretty complicated.
Theoretically if I've computed length(somelongvar) then perhaps I can
just bubble the computed value up the plan tree, rather than the
original column, which might be cheaper. But that only works if the
higher levels only need length(somelongvar) and not somelongvar
itself.
Apologies for some rambling/overlapping thoughts below; I'm try to
absorb all of this.
A join rel (at the top of a subquery join tree) would have to somehow
know that it needs to pass up both length(somelongvar) and somelongvar
right? Since both might be needed by a higher level?
If I understand correctly you're saying that:
1. Target list entries for join rels contain vars and non-var expressions
2. Target list entries for base rels contain only vars
3. By implication the planner knows how to propagate non-var
expression results up from join rels, but not from base rels.
Point 3 is the one that would surprise me, or at least, I'd be
interested in knowing what makes the two different in that case (is it
an artificial restriction, a result of how things are structured in
code, something else?). I feel like I'm still missing context on why
this is inherently a problem.
Does this mean a simple "SELECT bar * 2 FROM foo" ends up with a join
rel at the top? Or that a projection handles it (maybe this is the
wrong question -- perhaps the project results in a join rel? I don't
know enough here to tell if this ends up being an odd question)? And
if a projection, then would inserting a projection above a base rel
but before the top of the join tree solve the problem while allowing
us to push expression evaluation down?
Perhaps the key (to what I'm missing) is just saying something like:
"the work hasn't been down to push expression evaluation down and
ensure it remains in the target lists at every level of the join tree
so as to be propagated up, and so adding it to the base rel will
almost certainly mean duplicate evaluation". That still leaves my
question about how propagating it up from a subquery join tree works
(maybe that's an already handled "special" case).
I don't think there's any logic to work stuff like this out,
unless the incremental sort patch added some, because I don't think we
ever did it before. And it may be that it doesn't really matter, at
least when there aren't volatile functions: perhaps the worst case is
that we evaluate a non-volatile function twice, and maybe that's just
a performance consequence that we can live with. But on the other
hand, maybe it breaks stuff.
It seems to me that (given the new restriction on volatile
expressions) it'd likely not be the end of the world from the
unnecessary execution perspective; after all, we already do lots of
duplicate expression evaluation in other areas like filter clauses.
Or, on the third hand, maybe I'm wrong about what the rule was in the
first place. This is certainly a complicated topic. I don't think I'm
wrong, or I wouldn't be bothering to write emails about it, but that
doesn't mean I'm not wrong anyway.
Well, I for one definitely didn't understand what you were trying to
point out earlier, so I appreciate your continuing the discussion.
It seems to me a (likely?) outcome of this is requiring that the
expressions be in the target list to be considered for sorting here.
And indeed that fixes the non-parallel subplan issue we're also
looking at.
But I think that still leaves something missing: after all,
prepare_sort_from_pathkeys does know how to insert new target list
entries, so something either there (or in the caller/how its called)
has to be enforcing an apparently implicit rule about what point in
the tree it's safe to do such. Or even just no path generation had
ever considered it before (that feels unsatisfactory as an explanation
to me, because it feels more implicit than I'd like, but...)
James
James Coleman <jtc331@gmail.com> writes:
But I think that still leaves something missing: after all,
prepare_sort_from_pathkeys does know how to insert new target list
entries, so something either there (or in the caller/how its called)
has to be enforcing an apparently implicit rule about what point in
the tree it's safe to do such. Or even just no path generation had
ever considered it before (that feels unsatisfactory as an explanation
to me, because it feels more implicit than I'd like, but...)
Hi guys,
I hadn't been paying any attention to this thread, but Robert asked
me to take a look. A few comments:
1. James was wondering, far upthread, why we would do projections
pre-sort or post-sort. I think the answers can be found by studying
planner.c's make_sort_input_target(), which separates out what we want
to do pre-sort and post-sort. (There, the "sort" is envisioned as a
standalone Sort node atop all the scan/join steps; but its decisions
nonetheless constrain what will be considered for sorting that's
implemented further down.) It has a *very* long introductory comment
laying out all the considerations.
2. Robert is correct that under normal circumstances, the targetlists of
both baserels and join rels contain only Vars. Any computations we have
to do are postponed to the final top-level targetlist, whether they are
volatile or not. The fact that this policy is applied regardless of
volatility may explain why James isn't seeing volatility checks where he
expected them. The core (and, I think, only) exception to this is that
an expression that is a sort or group key has to be evaluated earlier.
But even then, it won't be pushed down as far as the reltargets of any
scan or join relations; it'll be computed after the final join.
3. If we have a volatile sort/group key, that constrains the set of plans
we can validly generate, because the key expression mustn't be evaluated
redundantly. With the addition of parallelism, a non-parallel-safe
sort/group key expression likewise constrains the set of valid plans,
even if it's not considered volatile.
4. In the past, the only way we'd consider non-Var sort keys in any way
during scan/join planning was if (a) a relation had an expression index
matching a requested sort key; of course, that's a-fortiori going to be
a non-volatile expression. Or (b) we needed to sort in order to provide
suitable input for a merge join ... but again, volatile expressions
aren't considered candidates for mergejoin. So there basically wasn't
any need to consider sorting by volatiles below the top level sort/group
processing, and that again contributes to why you don't see explicit
tests in that area. I'm not sure offhand whether the parallel-query
patches added anything to guard against nonvolatile-but-parallel-unsafe
sort expressions. If not, there might be live bugs there independently
of incremental sort; or that might be unreachable because of overall
limitations on parallelism.
5. While I've not dug far enough into the code to identify the exact
issue, the impression I have about this bug and the parallel-unsafe
subplan bug is that the incremental sort code is generating Paths
without due regard to point 3. If it is making paths based on explicit
sorts for final-stage pathkeys, independently of either expression
indexes or mergejoin requirements, that could well explain why it's
got problems that weren't seen before.
6. I did look at ebb7ae839, and I suspect that the last part of
find_em_expr_usable_for_sorting_rel(), ie the search of the reltarget
list, is dead code. There is no case where a volatile expression would
appear there, per point 2. The committed regression test cases
certainly do not provide a counterexample: a look at the code coverage
report will show you that that loop never finds a match in any
regression test case. I'd be inclined to flush that code and just
say that we won't return volatile expressions here.
regards, tom lane
On Mon, Nov 23, 2020 at 2:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Coleman <jtc331@gmail.com> writes:
But I think that still leaves something missing: after all,
prepare_sort_from_pathkeys does know how to insert new target list
entries, so something either there (or in the caller/how its called)
has to be enforcing an apparently implicit rule about what point in
the tree it's safe to do such. Or even just no path generation had
ever considered it before (that feels unsatisfactory as an explanation
to me, because it feels more implicit than I'd like, but...)Hi guys,
I hadn't been paying any attention to this thread, but Robert asked
me to take a look. A few comments:
Thanks for jumping in (and thanks Robert for asking for Tom to take a
look); I appreciate the input.
1. James was wondering, far upthread, why we would do projections
pre-sort or post-sort. I think the answers can be found by studying
planner.c's make_sort_input_target(), which separates out what we want
to do pre-sort and post-sort. (There, the "sort" is envisioned as a
standalone Sort node atop all the scan/join steps; but its decisions
nonetheless constrain what will be considered for sorting that's
implemented further down.) It has a *very* long introductory comment
laying out all the considerations.
That comment is helpful; I wish I'd discovered it sooner.
Does it imply we need to intentionally avoid SRFs also?
2. Robert is correct that under normal circumstances, the targetlists of
both baserels and join rels contain only Vars. Any computations we have
to do are postponed to the final top-level targetlist, whether they are
volatile or not. The fact that this policy is applied regardless of
volatility may explain why James isn't seeing volatility checks where he
expected them. The core (and, I think, only) exception to this is that
an expression that is a sort or group key has to be evaluated earlier.
But even then, it won't be pushed down as far as the reltargets of any
scan or join relations; it'll be computed after the final join.
Is that primarily a project policy? Or a limitation of our current
planner (just can't push things down/carry the results back up as much
as we'd like)? Or something else? In particular, it seems plausible
there are cases where pushing down the sort on a non-Var expression to
the base rel could improve plans, so I'm wondering if there's a reason
to intentionally avoid that in the long or short run (or both).
3. If we have a volatile sort/group key, that constrains the set of plans
we can validly generate, because the key expression mustn't be evaluated
redundantly. With the addition of parallelism, a non-parallel-safe
sort/group key expression likewise constrains the set of valid plans,
even if it's not considered volatile.
This makes a lot of sense (just unfortunate we had to re-discover it).
4. In the past, the only way we'd consider non-Var sort keys in any way
during scan/join planning was if (a) a relation had an expression index
matching a requested sort key; of course, that's a-fortiori going to be
a non-volatile expression. Or (b) we needed to sort in order to provide
suitable input for a merge join ... but again, volatile expressions
aren't considered candidates for mergejoin. So there basically wasn't
any need to consider sorting by volatiles below the top level sort/group
processing, and that again contributes to why you don't see explicit
tests in that area. I'm not sure offhand whether the parallel-query
patches added anything to guard against nonvolatile-but-parallel-unsafe
sort expressions. If not, there might be live bugs there independently
of incremental sort; or that might be unreachable because of overall
limitations on parallelism.
Interesting: so merge joins are an example of us pushing down sorts,
which I assume means (part of) the answer to my question on (2) is
that there's nothing inherently wrong/broken with evaluating
expressions lower down the tree as long as we're careful about what is
safe/unsafe with respect to volatility and parallelism?
5. While I've not dug far enough into the code to identify the exact
issue, the impression I have about this bug and the parallel-unsafe
subplan bug is that the incremental sort code is generating Paths
without due regard to point 3. If it is making paths based on explicit
sorts for final-stage pathkeys, independently of either expression
indexes or mergejoin requirements, that could well explain why it's
got problems that weren't seen before.
Yes, that's correct. Tomas already pushed the fix for the volatility
case, and there's a new thread [1] for the parallel safety concerns.
6. I did look at ebb7ae839, and I suspect that the last part of
find_em_expr_usable_for_sorting_rel(), ie the search of the reltarget
list, is dead code. There is no case where a volatile expression would
appear there, per point 2. The committed regression test cases
certainly do not provide a counterexample: a look at the code coverage
report will show you that that loop never finds a match in any
regression test case. I'd be inclined to flush that code and just
say that we won't return volatile expressions here.
Hmm, I see what you mean. It's theoretically valuable if we ended up
with volatile expressions there, but...that seems at best unlikely.
I have wondered if we should strictly require the expression to be in
the target list even if nonvolatile, but prepare_sort_from_pathkeys
doesn't seem to think that's necessary, so I'm comfortable with that
unless there's something you know we haven't considered.
Alternatively, if we decide we need to be more strict (whether because
of discussion under (1) or because we otherwise decide only Vars can
be allowed here), then would we still potentially need/want the
relabel stripping?
Depending on what we conclude on that, I'll plan to address that in
the thread in [1].
James
1: /messages/by-id/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
James Coleman <jtc331@gmail.com> writes:
On Mon, Nov 23, 2020 at 2:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. James was wondering, far upthread, why we would do projections
pre-sort or post-sort. I think the answers can be found by studying
planner.c's make_sort_input_target(), which separates out what we want
to do pre-sort and post-sort.
Does it imply we need to intentionally avoid SRFs also?
It's sort of a wart that we allow SRFs in ORDER BY at all, but my
expectation is that make_sort_input_target would prevent lower levels
of the planner from needing to think about that. We don't allow SRFs
in index expressions, nor in WHERE clauses (so they'll never come up
as mergejoin sort keys). So really there's no way that scan/join
processing should need to consider such cases. Such a sort would
only ever be implemented via a final Sort atop ProjectSet.
2. Robert is correct that under normal circumstances, the targetlists of
both baserels and join rels contain only Vars.
Is that primarily a project policy? Or a limitation of our current
planner (just can't push things down/carry the results back up as much
as we'd like)? Or something else? In particular, it seems plausible
there are cases where pushing down the sort on a non-Var expression to
the base rel could improve plans, so I'm wondering if there's a reason
to intentionally avoid that in the long or short run (or both).
I think you've just rediscovered Joe Hellerstein's thesis topic [1]/messages/by-id/28216.1023340706@sss.pgh.pa.us.
We ripped out the remnants of that code ages ago (search very early
git states for "JMH" if you're interested), because the complexity
vs. benefit ratio seemed pretty bad. Maybe there'll be a case for
putting it back some day, but I'm dubious. Note that we have the
ability to push down sorts-on-expressions anyway; that's not constrained
by what is in the relation targetlists.
Interesting: so merge joins are an example of us pushing down sorts,
which I assume means (part of) the answer to my question on (2) is
that there's nothing inherently wrong/broken with evaluating
expressions lower down the tree as long as we're careful about what is
safe/unsafe with respect to volatility and parallelism?
Right, I don't see any fundamental problem with that, we just have
to be careful about these constraints.
I have wondered if we should strictly require the expression to be in
the target list even if nonvolatile, but prepare_sort_from_pathkeys
doesn't seem to think that's necessary, so I'm comfortable with that
unless there's something you know we haven't considered.
No, prepare_sort_from_pathkeys is happy to build a sort expression if it
can't find it already computed in the input. The secret here is that
we should never get to that code with a "dangerous" sort expression,
because we should never have made a Path that would request such a thing
in the first place. It's fairly obvious to me how we deal with the
consideration for volatile sortkeys. We cannot restrict parallel-unsafe
sortkeys quite the same way, because the restriction only applies to
parallel not non-parallel plans. Maybe it's sufficient to mark Paths
as parallel unsafe if they sort by parallel-unsafe sortkeys? Is there
anyplace that is Just Assuming that paths it builds will be
parallel-safe?
regards, tom lane
On Tue, Nov 24, 2020 at 2:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Coleman <jtc331@gmail.com> writes:
On Mon, Nov 23, 2020 at 2:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. James was wondering, far upthread, why we would do projections
pre-sort or post-sort. I think the answers can be found by studying
planner.c's make_sort_input_target(), which separates out what we want
to do pre-sort and post-sort.Does it imply we need to intentionally avoid SRFs also?
It's sort of a wart that we allow SRFs in ORDER BY at all, but my
expectation is that make_sort_input_target would prevent lower levels
of the planner from needing to think about that. We don't allow SRFs
in index expressions, nor in WHERE clauses (so they'll never come up
as mergejoin sort keys). So really there's no way that scan/join
processing should need to consider such cases. Such a sort would
only ever be implemented via a final Sort atop ProjectSet.
In this case though we're sorting based on "interesting" pathkeys,
which means we don't don't necessarily have index expressions or
mergejoin sort keys. For example, even with the bugfix patch (from the
parallel fix thread I've linked to previously) applied, I'm able to
generate this (with of course some GUCs "tweaked"):
select unique1 from tenk1 order by unnest('{1}'::int[]);
Gather Merge
Workers Planned: 2
-> Sort
Sort Key: (unnest('{1}'::integer[]))
-> ProjectSet
-> Parallel Index Only Scan using tenk1_unique1 on tenk1
If I understand correctly, that's a violation of the spirit of what
the comments above make_sort_input_target(). If that's true, then it
should be pretty easy to disallow them from being considered. Given
the existing restrictions on where SRFs can be placed in a SELECT
(e.g., no CASE/COALESCE) and my assumption (without having thoroughly
verified this) that SRFs aren't allowed as arguments to functions or
as arguments to any other expression (I assume only scalars are
allowed), would it be sufficient to check the pathkey expression
(without recursion) to see if it's a FuncExpr that returns a set?
2. Robert is correct that under normal circumstances, the targetlists of
both baserels and join rels contain only Vars.Is that primarily a project policy? Or a limitation of our current
planner (just can't push things down/carry the results back up as much
as we'd like)? Or something else? In particular, it seems plausible
there are cases where pushing down the sort on a non-Var expression to
the base rel could improve plans, so I'm wondering if there's a reason
to intentionally avoid that in the long or short run (or both).I think you've just rediscovered Joe Hellerstein's thesis topic [1].
We ripped out the remnants of that code ages ago (search very early
git states for "JMH" if you're interested), because the complexity
vs. benefit ratio seemed pretty bad. Maybe there'll be a case for
putting it back some day, but I'm dubious. Note that we have the
ability to push down sorts-on-expressions anyway; that's not constrained
by what is in the relation targetlists.
I'm doing some grepping now to see what that work entailed, but I'm
not sure that what I'm describing is the same thing. By "pushing down
a non-Var expression to the base rel" I mean what (to my ears) sounds
like what you're saying by "we have the ability to push down
sorts-on-expressions anyway; that's not constrained by what is in the
relation targetlists". The target list entries I'm talking about for
these expressions are the ones inserted by
prepare_sort_from_pathkeys(), which in PG13 happens just a bit more
frequently since, at least in parallel query, consider sort paths
lower than we did before based on interesting pathkeys (for now just
query_pathkeys) in generate_useful_gather_paths().
Interesting: so merge joins are an example of us pushing down sorts,
which I assume means (part of) the answer to my question on (2) is
that there's nothing inherently wrong/broken with evaluating
expressions lower down the tree as long as we're careful about what is
safe/unsafe with respect to volatility and parallelism?Right, I don't see any fundamental problem with that, we just have
to be careful about these constraints.
Great. That's exactly what I was looking for. To summarize: the
constraints are on volatility, parallel safety, and SRFs.
I have wondered if we should strictly require the expression to be in
the target list even if nonvolatile, but prepare_sort_from_pathkeys
doesn't seem to think that's necessary, so I'm comfortable with that
unless there's something you know we haven't considered.No, prepare_sort_from_pathkeys is happy to build a sort expression if it
can't find it already computed in the input. The secret here is that
we should never get to that code with a "dangerous" sort expression,
because we should never have made a Path that would request such a thing
in the first place. It's fairly obvious to me how we deal with the
consideration for volatile sortkeys. We cannot restrict parallel-unsafe
sortkeys quite the same way, because the restriction only applies to
parallel not non-parallel plans. Maybe it's sufficient to mark Paths
as parallel unsafe if they sort by parallel-unsafe sortkeys? Is there
anyplace that is Just Assuming that paths it builds will be
parallel-safe?
It seems actually quite simple as I understand it: in my proposed
patch in [1] we know in find_em_expr_usable_for_sorting_rel() whether
or not we're working on parallel paths, and so we're able to rely
simply on is_parallel_safe() for the expr under consideration.
James
1: /messages/by-id/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
On Wed, Nov 25, 2020 at 2:53 PM James Coleman <jtc331@gmail.com> wrote:
On Tue, Nov 24, 2020 at 2:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Coleman <jtc331@gmail.com> writes:
On Mon, Nov 23, 2020 at 2:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. James was wondering, far upthread, why we would do projections
pre-sort or post-sort. I think the answers can be found by studying
planner.c's make_sort_input_target(), which separates out what we want
to do pre-sort and post-sort.Does it imply we need to intentionally avoid SRFs also?
It's sort of a wart that we allow SRFs in ORDER BY at all, but my
expectation is that make_sort_input_target would prevent lower levels
of the planner from needing to think about that. We don't allow SRFs
in index expressions, nor in WHERE clauses (so they'll never come up
as mergejoin sort keys). So really there's no way that scan/join
processing should need to consider such cases. Such a sort would
only ever be implemented via a final Sort atop ProjectSet.In this case though we're sorting based on "interesting" pathkeys,
which means we don't don't necessarily have index expressions or
mergejoin sort keys. For example, even with the bugfix patch (from the
parallel fix thread I've linked to previously) applied, I'm able to
generate this (with of course some GUCs "tweaked"):select unique1 from tenk1 order by unnest('{1}'::int[]);
Gather Merge
Workers Planned: 2
-> Sort
Sort Key: (unnest('{1}'::integer[]))
-> ProjectSet
-> Parallel Index Only Scan using tenk1_unique1 on tenk1If I understand correctly, that's a violation of the spirit of what
the comments above make_sort_input_target(). If that's true, then it
should be pretty easy to disallow them from being considered. Given
the existing restrictions on where SRFs can be placed in a SELECT
(e.g., no CASE/COALESCE) and my assumption (without having thoroughly
verified this) that SRFs aren't allowed as arguments to functions or
as arguments to any other expression (I assume only scalars are
allowed), would it be sufficient to check the pathkey expression
(without recursion) to see if it's a FuncExpr that returns a set?
Here's the plan with a change to restrict SRFs here:
Sort
Sort Key: (unnest('{1,2}'::integer[]))
-> Gather
Workers Planned: 2
-> ProjectSet
-> Parallel Index Only Scan using tenk1_unique1 on tenk1
I'm a bit surprised the ProjectSet is above the Index Scan rather than
above the Gather, but maybe this is still more correct?
James
On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
I've pushed the 0001 part, i.e. the main fix. Not sure about the other
parts (comments and moving the code back to postgres_fdw) yet.
I noticed the CF entry [1] got moved to the next CF; I'm thinking this
entry should be marked as committed since the fix for the initial bug
reported on this thread has been pushed. We have the parallel safety
issue outstanding, but there's a separate thread and patch for that,
so I'll make a new CF entry for that.
I can mark it as committed, but I'm not sure how to "undo" (or if
that's desirable) the move to the next CF.
Thoughts?
James
On 01.12.2020 03:08, James Coleman wrote:
On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:I've pushed the 0001 part, i.e. the main fix. Not sure about the other
parts (comments and moving the code back to postgres_fdw) yet.I noticed the CF entry [1] got moved to the next CF; I'm thinking this
entry should be marked as committed since the fix for the initial bug
reported on this thread has been pushed. We have the parallel safety
issue outstanding, but there's a separate thread and patch for that,
so I'll make a new CF entry for that.I can mark it as committed, but I'm not sure how to "undo" (or if
that's desirable) the move to the next CF.Thoughts?
James
Oops...
I must have rushed with this one, thank you for noticing.
I don't see how to move it back either. I think it's fine to mark it as
Committed where it is now.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Dec 1, 2020 at 4:08 AM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
On 01.12.2020 03:08, James Coleman wrote:
On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:I've pushed the 0001 part, i.e. the main fix. Not sure about the other
parts (comments and moving the code back to postgres_fdw) yet.I noticed the CF entry [1] got moved to the next CF; I'm thinking this
entry should be marked as committed since the fix for the initial bug
reported on this thread has been pushed. We have the parallel safety
issue outstanding, but there's a separate thread and patch for that,
so I'll make a new CF entry for that.I can mark it as committed, but I'm not sure how to "undo" (or if
that's desirable) the move to the next CF.Thoughts?
James
Oops...
I must have rushed with this one, thank you for noticing.
I don't see how to move it back either. I think it's fine to mark it as
Committed where it is now.
BTW, I still see this one as needs review
--
Jaime Casanova
Professional PostgreSQL: Soporte 24x7 y capacitación
On 12/16/20 1:51 AM, Jaime Casanova wrote:
On Tue, Dec 1, 2020 at 4:08 AM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:On 01.12.2020 03:08, James Coleman wrote:
On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:I've pushed the 0001 part, i.e. the main fix. Not sure about the other
parts (comments and moving the code back to postgres_fdw) yet.I noticed the CF entry [1] got moved to the next CF; I'm thinking this
entry should be marked as committed since the fix for the initial bug
reported on this thread has been pushed. We have the parallel safety
issue outstanding, but there's a separate thread and patch for that,
so I'll make a new CF entry for that.I can mark it as committed, but I'm not sure how to "undo" (or if
that's desirable) the move to the next CF.Thoughts?
James
Oops...
I must have rushed with this one, thank you for noticing.
I don't see how to move it back either. I think it's fine to mark it as
Committed where it is now.BTW, I still see this one as needs review
Hmm, not sure what was the plan, but I'll mark it as committed once I
push the remaining patches (for parallel-safe checks, SRFs etc.) in a
couple days.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Just a bump to see if there is a time frame for a fix on this.
Google Cloud doesn't yet support setting the *enable_incremental_sort * flag
yet.
Was able to temporarily resolve by running the following though:
ALTER DATABASE corpdb SET enable_incremental_sort TO OFF;
Hope this helps others until the core issue has been resolved.
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On 3/1/21 8:23 PM, rbrooks wrote:
Just a bump to see if there is a time frame for a fix on this.
Google Cloud doesn't yet support setting the *enable_incremental_sort * flag
yet.
Was able to temporarily resolve by running the following though:ALTER DATABASE corpdb SET enable_incremental_sort TO OFF;
Hope this helps others until the core issue has been resolved.
I don't follow - all fixes in this thread were committed - that's why
it's marked like that in the CF app. And thus it should be in the recent
13.2 release.
I don't know what version is currently supported by Google Cloud, maybe
they haven't upgraded their PostgreSQL version yet.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thanks for clarifying, Tomas.
Yes - it appears Google Cloud SQL automatically updates minor versions of
Postgres but hasn't updated beyond 13.1 yet.
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html