"could not find pathkey item to sort" for TPC-DS queries 94-96

Started by Luc Vlamingabout 5 years ago29 messageshackers
Jump to latest
#1Luc Vlaming
luc@swarm64.com

Hi,

When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
95 and 96 on a SF10 I get the error "could not find pathkey item to sort".
When I disable enable_gathermerge the problem goes away and then the
plan for query 94 looks like below. I tried figuring out what the
problem is but to be honest I would need some pointers as the code that
tries to matching equivalence members in prepare_sort_from_pathkeys is
something i'm really not familiar with.

To reproduce you can either ingest and test using the toolkit I used too
(see https://github.com/swarm64/s64da-benchmark-toolkit/), or
alternatively just use the schema (see
https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)

Best,
Luc

------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=229655.62..229655.63 rows=1 width=72)
-> Sort (cost=229655.62..229655.63 rows=1 width=72)
Sort Key: (count(DISTINCT ws1.ws_order_number))
-> Aggregate (cost=229655.60..229655.61 rows=1 width=72)
-> Nested Loop Semi Join (cost=1012.65..229655.59
rows=1 width=16)
-> Nested Loop (cost=1012.22..229653.73 rows=1
width=20)
Join Filter: (ws1.ws_web_site_sk =
web_site.web_site_sk)
-> Nested Loop (cost=1012.22..229651.08
rows=1 width=24)
-> Gather (cost=1011.80..229650.64
rows=1 width=28)
Workers Planned: 2
-> Nested Loop Anti Join
(cost=11.80..228650.54 rows=1 width=28)
-> Hash Join
(cost=11.37..227438.35 rows=2629 width=28)
Hash Cond:
(ws1.ws_ship_date_sk = date_dim.d_date_sk)
-> Parallel Seq
Scan on web_sales ws1 (cost=0.00..219548.92 rows=3000992 width=32)
-> Hash
(cost=10.57..10.57 rows=64 width=4)
-> Index Scan
using idx_d_date on date_dim (cost=0.29..10.57 rows=64 width=4)
Index
Cond: ((d_date >= '2000-03-01'::date) AND (d_date <= '2000-04-30'::date))
-> Index Only Scan using
idx_wr_order_number on web_returns wr1 (cost=0.42..0.46 rows=2 width=4)
Index Cond:
(wr_order_number = ws1.ws_order_number)
-> Index Scan using
customer_address_pkey on customer_address (cost=0.42..0.44 rows=1 width=4)
Index Cond: (ca_address_sk =
ws1.ws_ship_addr_sk)
Filter: ((ca_state)::text =
'GA'::text)
-> Seq Scan on web_site (cost=0.00..2.52
rows=10 width=4)
Filter: ((web_company_name)::text =
'pri'::text)
-> Index Scan using idx_ws_order_number on
web_sales ws2 (cost=0.43..1.84 rows=59 width=8)
Index Cond: (ws_order_number =
ws1.ws_order_number)
Filter: (ws1.ws_warehouse_sk <> ws_warehouse_sk)

The top of the stacktrace is:
#0 errfinish (filename=0x5562dc1a5125 "createplan.c", lineno=6186,
funcname=0x5562dc1a54d0 <__func__.14> "prepare_sort_from_pathkeys") at
elog.c:514
#1 0x00005562dbc2d7de in prepare_sort_from_pathkeys
(lefttree=0x5562dc5a2f58, pathkeys=0x5562dc4eabc8, relids=0x0,
reqColIdx=0x0, adjust_tlist_in_place=<optimized out>,
p_numsortkeys=0x7ffc0b8cda84, p_sortColIdx=0x7ffc0b8cda88,
p_sortOperators=0x7ffc0b8cda90, p_collations=0x7ffc0b8cda98,
p_nullsFirst=0x7ffc0b8cdaa0) at createplan.c:6186
#2 0x00005562dbe8d695 in make_sort_from_pathkeys (lefttree=<optimized
out>, pathkeys=<optimized out>, relids=<optimized out>) at createplan.c:6313
#3 0x00005562dbe8eba3 in create_sort_plan (flags=1,
best_path=0x5562dc548d68, root=0x5562dc508cf8) at createplan.c:2118
#4 create_plan_recurse (root=0x5562dc508cf8, best_path=0x5562dc548d68,
flags=1) at createplan.c:489
#5 0x00005562dbe8f315 in create_gather_merge_plan
(best_path=0x5562dc5782f8, root=0x5562dc508cf8) at createplan.c:1885
#6 create_plan_recurse (root=0x5562dc508cf8, best_path=0x5562dc5782f8,
flags=<optimized out>) at createplan.c:541
#7 0x00005562dbe8ddad in create_nestloop_plan
(best_path=0x5562dc585668, root=0x5562dc508cf8) at createplan.c:4237
#8 create_join_plan (best_path=0x5562dc585668, root=0x5562dc508cf8) at
createplan.c:1062
#9 create_plan_recurse (root=0x5562dc508cf8, best_path=0x5562dc585668,
flags=<optimized out>) at createplan.c:418
#10 0x00005562dbe8ddad in create_nestloop_plan
(best_path=0x5562dc5c4428, root=0x5562dc508cf8) at createplan.c:4237
#11 create_join_plan (best_path=0x5562dc5c4428, root=0x5562dc508cf8) at
createplan.c:1062
#12 create_plan_recurse (root=0x5562dc508cf8, best_path=0x5562dc5c4428,
flags=<optimized out>) at createplan.c:418
#13 0x00005562dbe8ddad in create_nestloop_plan
(best_path=0x5562dc5d3bd8, root=0x5562dc508cf8) at createplan.c:4237
#14 create_join_plan (best_path=0x5562dc5d3bd8, root=0x5562dc508cf8) at
createplan.c:1062
#15 create_plan_recurse (root=0x5562dc508cf8, best_path=0x5562dc5d3bd8,
flags=<optimized out>) at createplan.c:418
#16 0x00005562dbe8e428 in create_agg_plan (best_path=0x5562dc5d6f08,
root=0x5562dc508cf8) at createplan.c:2238
#17 create_plan_recurse (root=0x5562dc508cf8, best_path=0x5562dc5d6f08,
flags=3) at createplan.c:509
#18 0x00005562dbe8eb73 in create_sort_plan (flags=1,
best_path=0x5562dc5d7378, root=0x5562dc508cf8) at createplan.c:2109
#19 create_plan_recurse (root=0x5562dc508cf8, best_path=0x5562dc5d7378,
flags=1) at createplan.c:489
#20 0x00005562dbe8e7e8 in create_limit_plan (flags=1,
best_path=0x5562dc5d7a08, root=0x5562dc508cf8) at createplan.c:2784
#21 create_plan_recurse (root=0x5562dc508cf8, best_path=0x5562dc5d7a08,
flags=1) at createplan.c:536
#22 0x00005562dbe914ae in create_plan (root=root@entry=0x5562dc508cf8,
best_path=<optimized out>) at createplan.c:349

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Luc Vlaming (#1)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On 4/12/21 2:24 PM, Luc Vlaming wrote:

Hi,

When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
95 and 96 on a SF10 I get the error "could not find pathkey item to sort".
When I disable enable_gathermerge the problem goes away and then the
plan for query 94 looks like below. I tried figuring out what the
problem is but to be honest I would need some pointers as the code that
tries to matching equivalence members in prepare_sort_from_pathkeys is
something i'm really not familiar with.

Could be related to incremental sort, which allowed some gather merge
paths that were impossible before. We had a couple issues related to
that fixed in November, IIRC.

To reproduce you can either ingest and test using the toolkit I used too
(see https://github.com/swarm64/s64da-benchmark-toolkit/), or
alternatively just use the schema (see
https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)

Thanks, I'll see if I can reproduce that with your schema.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3James Coleman
jtc331@gmail.com
In reply to: Tomas Vondra (#2)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/12/21 2:24 PM, Luc Vlaming wrote:

Hi,

When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
95 and 96 on a SF10 I get the error "could not find pathkey item to sort".
When I disable enable_gathermerge the problem goes away and then the
plan for query 94 looks like below. I tried figuring out what the
problem is but to be honest I would need some pointers as the code that
tries to matching equivalence members in prepare_sort_from_pathkeys is
something i'm really not familiar with.

Could be related to incremental sort, which allowed some gather merge
paths that were impossible before. We had a couple issues related to
that fixed in November, IIRC.

To reproduce you can either ingest and test using the toolkit I used too
(see https://github.com/swarm64/s64da-benchmark-toolkit/), or
alternatively just use the schema (see
https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)

Thanks, I'll see if I can reproduce that with your schema.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

The query in question is:

select count(*)
from store_sales
,household_demographics
,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
and ss_hdemo_sk = household_demographics.hd_demo_sk
and ss_store_sk = s_store_sk
and time_dim.t_hour = 15
and time_dim.t_minute >= 30
and household_demographics.hd_dep_count = 7
and store.s_store_name = 'ese'
order by count(*)
limit 100;

From debugging output it looks like this is the plan being chosen
(cheapest total path):
Gather(store_sales household_demographics time_dim) rows=60626
cost=3145.73..699910.15
HashJoin(store_sales household_demographics time_dim)
rows=25261 cost=2145.73..692847.55
clauses: store_sales.ss_hdemo_sk =
household_demographics.hd_demo_sk
HashJoin(store_sales time_dim) rows=252609
cost=1989.73..692028.08
clauses: store_sales.ss_sold_time_sk =
time_dim.t_time_sk
SeqScan(store_sales) rows=11998564
cost=0.00..658540.64
SeqScan(time_dim) rows=1070
cost=0.00..1976.35
SeqScan(household_demographics) rows=720
cost=0.00..147.00

prepare_sort_from_pathkeys fails to find a pathkey because
tlist_member_ignore_relabel returns null -- which seemed weird because
the sortexpr is an Aggref (in a single member equivalence class) and
the tlist contains a single member that's also an Aggref. It turns out
that the only difference between the two Aggrefs is that the tlist
entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
aggsplit = AGGSPLIT_SIMPLE.

That's as far as I've gotten so far, but I figured I'd get that info
out to see if it means anything obvious to anyone else.

James

#4Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#2)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Could be related to incremental sort, which allowed some gather merge
paths that were impossible before. We had a couple issues related to
that fixed in November, IIRC.

Hmm, could be. Although, the stack trace at issue doesn't seem to show
a call to create_incrementalsort_plan().

--
Robert Haas
EDB: http://www.enterprisedb.com

#5James Coleman
jtc331@gmail.com
In reply to: Robert Haas (#4)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Wed, Apr 14, 2021 at 8:16 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Could be related to incremental sort, which allowed some gather merge
paths that were impossible before. We had a couple issues related to
that fixed in November, IIRC.

Hmm, could be. Although, the stack trace at issue doesn't seem to show
a call to create_incrementalsort_plan().

The changes to gather merge path generation made it possible to use
those paths in more cases for both incremental sort and regular sort,
so by "incremental sort" I read Tomas as saying "the patches that
brought in incremental sort" not specifically "incremental sort
itself".

James

#6Robert Haas
robertmhaas@gmail.com
In reply to: James Coleman (#3)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Wed, Apr 14, 2021 at 5:43 PM James Coleman <jtc331@gmail.com> wrote:

The query in question is:
select count(*)
from store_sales
,household_demographics
,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
and ss_hdemo_sk = household_demographics.hd_demo_sk
and ss_store_sk = s_store_sk
and time_dim.t_hour = 15
and time_dim.t_minute >= 30
and household_demographics.hd_dep_count = 7
and store.s_store_name = 'ese'
order by count(*)
limit 100;

From debugging output it looks like this is the plan being chosen
(cheapest total path):
Gather(store_sales household_demographics time_dim) rows=60626
cost=3145.73..699910.15
HashJoin(store_sales household_demographics time_dim)
rows=25261 cost=2145.73..692847.55
clauses: store_sales.ss_hdemo_sk =
household_demographics.hd_demo_sk
HashJoin(store_sales time_dim) rows=252609
cost=1989.73..692028.08
clauses: store_sales.ss_sold_time_sk =
time_dim.t_time_sk
SeqScan(store_sales) rows=11998564
cost=0.00..658540.64
SeqScan(time_dim) rows=1070
cost=0.00..1976.35
SeqScan(household_demographics) rows=720
cost=0.00..147.00

This doesn't really make sense to me given the strack trace in the OP.
That seems to go Limit -> Sort -> Agg -> NestLoop -> NestLoop ->
NestLoop -> GatherMerge -> Sort. If the plan were as you have it here,
there would be no Sort and no Gather Merge, so where would be getting
a failure related to pathkeys?

I think if we can get the correct plan the thing to look at would be
the tlists at the relevant levels of the plan.

--
Robert Haas
EDB: http://www.enterprisedb.com

#7Robert Haas
robertmhaas@gmail.com
In reply to: James Coleman (#5)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Wed, Apr 14, 2021 at 8:20 PM James Coleman <jtc331@gmail.com> wrote:

Hmm, could be. Although, the stack trace at issue doesn't seem to show
a call to create_incrementalsort_plan().

The changes to gather merge path generation made it possible to use
those paths in more cases for both incremental sort and regular sort,
so by "incremental sort" I read Tomas as saying "the patches that
brought in incremental sort" not specifically "incremental sort
itself".

I agree. That's why I said "hmm, could be" even though the plan
doesn't involve one.

--
Robert Haas
EDB: http://www.enterprisedb.com

#8James Coleman
jtc331@gmail.com
In reply to: Robert Haas (#6)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Wed, Apr 14, 2021 at 8:21 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Apr 14, 2021 at 5:43 PM James Coleman <jtc331@gmail.com> wrote:

The query in question is:
select count(*)
from store_sales
,household_demographics
,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
and ss_hdemo_sk = household_demographics.hd_demo_sk
and ss_store_sk = s_store_sk
and time_dim.t_hour = 15
and time_dim.t_minute >= 30
and household_demographics.hd_dep_count = 7
and store.s_store_name = 'ese'
order by count(*)
limit 100;

From debugging output it looks like this is the plan being chosen
(cheapest total path):
Gather(store_sales household_demographics time_dim) rows=60626
cost=3145.73..699910.15
HashJoin(store_sales household_demographics time_dim)
rows=25261 cost=2145.73..692847.55
clauses: store_sales.ss_hdemo_sk =
household_demographics.hd_demo_sk
HashJoin(store_sales time_dim) rows=252609
cost=1989.73..692028.08
clauses: store_sales.ss_sold_time_sk =
time_dim.t_time_sk
SeqScan(store_sales) rows=11998564
cost=0.00..658540.64
SeqScan(time_dim) rows=1070
cost=0.00..1976.35
SeqScan(household_demographics) rows=720
cost=0.00..147.00

This doesn't really make sense to me given the strack trace in the OP.
That seems to go Limit -> Sort -> Agg -> NestLoop -> NestLoop ->
NestLoop -> GatherMerge -> Sort. If the plan were as you have it here,
there would be no Sort and no Gather Merge, so where would be getting
a failure related to pathkeys?

Ah, yeah, I'm not sure where the original stacktrace came from, but
here's the stack for the query I reproduced it with (perhaps it does
so on different queries or there was some other GUC change in the
reported plan):

#0 errfinish (filename=filename@entry=0x56416eefa845 "createplan.c",
lineno=lineno@entry=6186,
funcname=funcname@entry=0x56416eefb660 <__func__.24872>
"prepare_sort_from_pathkeys") at elog.c:514
#1 0x000056416eb6ed52 in prepare_sort_from_pathkeys
(lefttree=0x564170552658, pathkeys=0x5641704f2640, relids=0x0,
reqColIdx=reqColIdx@entry=0x0,
adjust_tlist_in_place=adjust_tlist_in_place@entry=false,
p_numsortkeys=p_numsortkeys@entry=0x7fff1252817c,
p_sortColIdx=0x7fff12528170,
p_sortOperators=0x7fff12528168, p_collations=0x7fff12528160,
p_nullsFirst=0x7fff12528158) at createplan.c:6186
#2 0x000056416eb6ee69 in make_sort_from_pathkeys (lefttree=<optimized
out>, pathkeys=<optimized out>, relids=<optimized out>) at
createplan.c:6313
#3 0x000056416eb71fc7 in create_sort_plan
(root=root@entry=0x564170511a70,
best_path=best_path@entry=0x56417054f650, flags=flags@entry=1)
at createplan.c:2118
#4 0x000056416eb6f638 in create_plan_recurse
(root=root@entry=0x564170511a70, best_path=0x56417054f650,
flags=flags@entry=1) at createplan.c:489
#5 0x000056416eb72e06 in create_gather_merge_plan
(root=root@entry=0x564170511a70,
best_path=best_path@entry=0x56417054f6e8) at createplan.c:1885
#6 0x000056416eb6f723 in create_plan_recurse
(root=root@entry=0x564170511a70, best_path=0x56417054f6e8,
flags=flags@entry=4) at createplan.c:541
#7 0x000056416eb726fb in create_agg_plan
(root=root@entry=0x564170511a70,
best_path=best_path@entry=0x56417054f8c8) at createplan.c:2238
#8 0x000056416eb6f67b in create_plan_recurse
(root=root@entry=0x564170511a70, best_path=0x56417054f8c8,
flags=flags@entry=3) at createplan.c:509
#9 0x000056416eb71f8e in create_sort_plan
(root=root@entry=0x564170511a70,
best_path=best_path@entry=0x56417054f560, flags=flags@entry=1)
at createplan.c:2109
#10 0x000056416eb6f638 in create_plan_recurse
(root=root@entry=0x564170511a70, best_path=0x56417054f560,
flags=flags@entry=1) at createplan.c:489
#11 0x000056416eb72c83 in create_limit_plan
(root=root@entry=0x564170511a70,
best_path=best_path@entry=0x56417054ffa0, flags=flags@entry=1)
at createplan.c:2784
#12 0x000056416eb6f713 in create_plan_recurse
(root=root@entry=0x564170511a70, best_path=0x56417054ffa0,
flags=flags@entry=1) at createplan.c:536
#13 0x000056416eb6f79d in create_plan (root=root@entry=0x564170511a70,
best_path=<optimized out>) at createplan.c:349
#14 0x000056416eb7fe93 in standard_planner (parse=0x564170437268,
query_string=<optimized out>, cursorOptions=2048,
boundParams=<optimized out>)
at planner.c:407

I think if we can get the correct plan the thing to look at would be
the tlists at the relevant levels of the plan.

Does the information in [1] help at all? The tlist does have an
Aggref, as expected, but its aggsplit value doesn't match the
pathkey's Aggref's aggsplit value.

James

1: /messages/by-id/CAAaqYe_NU4hO9COoJdcXWqjtH=dGMknYdsSdJjZ=JOHPTea-Nw@mail.gmail.com

#9James Coleman
jtc331@gmail.com
In reply to: Robert Haas (#6)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Wed, Apr 14, 2021 at 8:21 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Apr 14, 2021 at 5:43 PM James Coleman <jtc331@gmail.com> wrote:

The query in question is:
select count(*)
from store_sales
,household_demographics
,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
and ss_hdemo_sk = household_demographics.hd_demo_sk
and ss_store_sk = s_store_sk
and time_dim.t_hour = 15
and time_dim.t_minute >= 30
and household_demographics.hd_dep_count = 7
and store.s_store_name = 'ese'
order by count(*)
limit 100;

From debugging output it looks like this is the plan being chosen
(cheapest total path):
Gather(store_sales household_demographics time_dim) rows=60626
cost=3145.73..699910.15
HashJoin(store_sales household_demographics time_dim)
rows=25261 cost=2145.73..692847.55
clauses: store_sales.ss_hdemo_sk =
household_demographics.hd_demo_sk
HashJoin(store_sales time_dim) rows=252609
cost=1989.73..692028.08
clauses: store_sales.ss_sold_time_sk =
time_dim.t_time_sk
SeqScan(store_sales) rows=11998564
cost=0.00..658540.64
SeqScan(time_dim) rows=1070
cost=0.00..1976.35
SeqScan(household_demographics) rows=720
cost=0.00..147.00

This doesn't really make sense to me given the strack trace in the OP.
That seems to go Limit -> Sort -> Agg -> NestLoop -> NestLoop ->
NestLoop -> GatherMerge -> Sort. If the plan were as you have it here,
there would be no Sort and no Gather Merge, so where would be getting
a failure related to pathkeys?

Also I just realized why this didn't make sense -- I'm not sure what
the above path is. It'd gotten logged as part of the debug options I
have configured, but it must be 1.) incomplete (perhaps at a lower
level of path generation) and/or not the final path selected.

My apologies for the confusion.

James

#10James Coleman
jtc331@gmail.com
In reply to: James Coleman (#3)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Wed, Apr 14, 2021 at 5:42 PM James Coleman <jtc331@gmail.com> wrote:

On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/12/21 2:24 PM, Luc Vlaming wrote:

Hi,

When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
95 and 96 on a SF10 I get the error "could not find pathkey item to sort".
When I disable enable_gathermerge the problem goes away and then the
plan for query 94 looks like below. I tried figuring out what the
problem is but to be honest I would need some pointers as the code that
tries to matching equivalence members in prepare_sort_from_pathkeys is
something i'm really not familiar with.

Could be related to incremental sort, which allowed some gather merge
paths that were impossible before. We had a couple issues related to
that fixed in November, IIRC.

To reproduce you can either ingest and test using the toolkit I used too
(see https://github.com/swarm64/s64da-benchmark-toolkit/), or
alternatively just use the schema (see
https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)

Thanks, I'll see if I can reproduce that with your schema.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

The query in question is:

select count(*)
from store_sales
,household_demographics
,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
and ss_hdemo_sk = household_demographics.hd_demo_sk
and ss_store_sk = s_store_sk
and time_dim.t_hour = 15
and time_dim.t_minute >= 30
and household_demographics.hd_dep_count = 7
and store.s_store_name = 'ese'
order by count(*)
limit 100;

From debugging output it looks like this is the plan being chosen
(cheapest total path):
Gather(store_sales household_demographics time_dim) rows=60626
cost=3145.73..699910.15
HashJoin(store_sales household_demographics time_dim)
rows=25261 cost=2145.73..692847.55
clauses: store_sales.ss_hdemo_sk =
household_demographics.hd_demo_sk
HashJoin(store_sales time_dim) rows=252609
cost=1989.73..692028.08
clauses: store_sales.ss_sold_time_sk =
time_dim.t_time_sk
SeqScan(store_sales) rows=11998564
cost=0.00..658540.64
SeqScan(time_dim) rows=1070
cost=0.00..1976.35
SeqScan(household_demographics) rows=720
cost=0.00..147.00

prepare_sort_from_pathkeys fails to find a pathkey because
tlist_member_ignore_relabel returns null -- which seemed weird because
the sortexpr is an Aggref (in a single member equivalence class) and
the tlist contains a single member that's also an Aggref. It turns out
that the only difference between the two Aggrefs is that the tlist
entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
aggsplit = AGGSPLIT_SIMPLE.

That's as far as I've gotten so far, but I figured I'd get that info
out to see if it means anything obvious to anyone else.

This really goes back to [1] where we fixed a similar issue by making
find_em_expr_usable_for_sorting_rel parallel the rules in
prepare_sort_from_pathkeys.

Most of those conditions got copied, and the case we were trying to
handle is the fact that prepare_sort_from_pathkeys can generate a
target list entry under those conditions if one doesn't exist. However
there's a further restriction there I don't remember looking at: it
uses pull_var_clause and tlist_member_ignore_relabel to ensure that
all of the vars that feed into the sort expression are found in the
target list. As I understand it, that is: it will build a target list
entry for something like "md5(column)" if "column" (and that was one
of our test cases for the previous fix) is in the target list already.

But there's an additional detail here: the call to pull_var_clause
requests aggregates, window functions, and placeholders be treated as
vars. That means for our Aggref case it would require that the two
Aggrefs be fully equal, so the differing aggsplit member would cause a
target list entry not to be built, hence our error here.

I've attached a quick and dirty patch that encodes that final rule
from prepare_sort_from_pathkeys into
find_em_expr_usable_for_sorting_rel. I can't help but think that
there's a cleaner way to do with this with less code duplication, but
hindering that is that prepare_sort_from_pathkeys is working with a
TargetList while find_em_expr_usable_for_sorting_rel is working with a
list of expressions.

James

1: /messages/by-id/CAAaqYe9C3f6A_tZCRfr9Dm7hPpgGwpp4i-K_=NS9GWXuNiFANg@mail.gmail.com

Attachments:

v1-0001-Fix-find_em_expr_usable_for_sorting_rel.patchapplication/octet-stream; name=v1-0001-Fix-find_em_expr_usable_for_sorting_rel.patchDownload+42-3
#11Luc Vlaming
luc@swarm64.com
In reply to: James Coleman (#10)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On 15-04-2021 04:01, James Coleman wrote:

On Wed, Apr 14, 2021 at 5:42 PM James Coleman <jtc331@gmail.com> wrote:

On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/12/21 2:24 PM, Luc Vlaming wrote:

Hi,

When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
95 and 96 on a SF10 I get the error "could not find pathkey item to sort".
When I disable enable_gathermerge the problem goes away and then the
plan for query 94 looks like below. I tried figuring out what the
problem is but to be honest I would need some pointers as the code that
tries to matching equivalence members in prepare_sort_from_pathkeys is
something i'm really not familiar with.

Could be related to incremental sort, which allowed some gather merge
paths that were impossible before. We had a couple issues related to
that fixed in November, IIRC.

To reproduce you can either ingest and test using the toolkit I used too
(see https://github.com/swarm64/s64da-benchmark-toolkit/), or
alternatively just use the schema (see
https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)

Thanks, I'll see if I can reproduce that with your schema.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

The query in question is:

select count(*)
from store_sales
,household_demographics
,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
and ss_hdemo_sk = household_demographics.hd_demo_sk
and ss_store_sk = s_store_sk
and time_dim.t_hour = 15
and time_dim.t_minute >= 30
and household_demographics.hd_dep_count = 7
and store.s_store_name = 'ese'
order by count(*)
limit 100;

From debugging output it looks like this is the plan being chosen
(cheapest total path):
Gather(store_sales household_demographics time_dim) rows=60626
cost=3145.73..699910.15
HashJoin(store_sales household_demographics time_dim)
rows=25261 cost=2145.73..692847.55
clauses: store_sales.ss_hdemo_sk =
household_demographics.hd_demo_sk
HashJoin(store_sales time_dim) rows=252609
cost=1989.73..692028.08
clauses: store_sales.ss_sold_time_sk =
time_dim.t_time_sk
SeqScan(store_sales) rows=11998564
cost=0.00..658540.64
SeqScan(time_dim) rows=1070
cost=0.00..1976.35
SeqScan(household_demographics) rows=720
cost=0.00..147.00

prepare_sort_from_pathkeys fails to find a pathkey because
tlist_member_ignore_relabel returns null -- which seemed weird because
the sortexpr is an Aggref (in a single member equivalence class) and
the tlist contains a single member that's also an Aggref. It turns out
that the only difference between the two Aggrefs is that the tlist
entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
aggsplit = AGGSPLIT_SIMPLE.

That's as far as I've gotten so far, but I figured I'd get that info
out to see if it means anything obvious to anyone else.

This really goes back to [1] where we fixed a similar issue by making
find_em_expr_usable_for_sorting_rel parallel the rules in
prepare_sort_from_pathkeys.

Most of those conditions got copied, and the case we were trying to
handle is the fact that prepare_sort_from_pathkeys can generate a
target list entry under those conditions if one doesn't exist. However
there's a further restriction there I don't remember looking at: it
uses pull_var_clause and tlist_member_ignore_relabel to ensure that
all of the vars that feed into the sort expression are found in the
target list. As I understand it, that is: it will build a target list
entry for something like "md5(column)" if "column" (and that was one
of our test cases for the previous fix) is in the target list already.

But there's an additional detail here: the call to pull_var_clause
requests aggregates, window functions, and placeholders be treated as
vars. That means for our Aggref case it would require that the two
Aggrefs be fully equal, so the differing aggsplit member would cause a
target list entry not to be built, hence our error here.

I've attached a quick and dirty patch that encodes that final rule
from prepare_sort_from_pathkeys into
find_em_expr_usable_for_sorting_rel. I can't help but think that
there's a cleaner way to do with this with less code duplication, but
hindering that is that prepare_sort_from_pathkeys is working with a
TargetList while find_em_expr_usable_for_sorting_rel is working with a
list of expressions.

James

1: /messages/by-id/CAAaqYe9C3f6A_tZCRfr9Dm7hPpgGwpp4i-K_=NS9GWXuNiFANg@mail.gmail.com

Hi,

The patch seems to make the planner proceed and not error out anymore.
Cannot judge if it's doing the right thing however or if its enough :)
It works for me for all reported queries however (queries 94-96).

And sorry for the confusion wrt the stacktrace and plan. I tried to
produce a plan to possibly help with debugging but that would ofc then
not have the problem of the missing sortkey as otherwise i cannot
present a plan :) The stacktrace was however correct, and the plan
considered involved a gather-merge with a sort. Unfortunately I could
not (easily) get the plan outputted in the end; even when setting the
costs to 0 somehow...

Regards,
Luc

#12James Coleman
jtc331@gmail.com
In reply to: Luc Vlaming (#11)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Thu, Apr 15, 2021 at 5:33 AM Luc Vlaming <luc@swarm64.com> wrote:

On 15-04-2021 04:01, James Coleman wrote:

On Wed, Apr 14, 2021 at 5:42 PM James Coleman <jtc331@gmail.com> wrote:

On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/12/21 2:24 PM, Luc Vlaming wrote:

Hi,

When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
95 and 96 on a SF10 I get the error "could not find pathkey item to sort".
When I disable enable_gathermerge the problem goes away and then the
plan for query 94 looks like below. I tried figuring out what the
problem is but to be honest I would need some pointers as the code that
tries to matching equivalence members in prepare_sort_from_pathkeys is
something i'm really not familiar with.

Could be related to incremental sort, which allowed some gather merge
paths that were impossible before. We had a couple issues related to
that fixed in November, IIRC.

To reproduce you can either ingest and test using the toolkit I used too
(see https://github.com/swarm64/s64da-benchmark-toolkit/), or
alternatively just use the schema (see
https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)

Thanks, I'll see if I can reproduce that with your schema.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

The query in question is:

select count(*)
from store_sales
,household_demographics
,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
and ss_hdemo_sk = household_demographics.hd_demo_sk
and ss_store_sk = s_store_sk
and time_dim.t_hour = 15
and time_dim.t_minute >= 30
and household_demographics.hd_dep_count = 7
and store.s_store_name = 'ese'
order by count(*)
limit 100;

From debugging output it looks like this is the plan being chosen
(cheapest total path):
Gather(store_sales household_demographics time_dim) rows=60626
cost=3145.73..699910.15
HashJoin(store_sales household_demographics time_dim)
rows=25261 cost=2145.73..692847.55
clauses: store_sales.ss_hdemo_sk =
household_demographics.hd_demo_sk
HashJoin(store_sales time_dim) rows=252609
cost=1989.73..692028.08
clauses: store_sales.ss_sold_time_sk =
time_dim.t_time_sk
SeqScan(store_sales) rows=11998564
cost=0.00..658540.64
SeqScan(time_dim) rows=1070
cost=0.00..1976.35
SeqScan(household_demographics) rows=720
cost=0.00..147.00

prepare_sort_from_pathkeys fails to find a pathkey because
tlist_member_ignore_relabel returns null -- which seemed weird because
the sortexpr is an Aggref (in a single member equivalence class) and
the tlist contains a single member that's also an Aggref. It turns out
that the only difference between the two Aggrefs is that the tlist
entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
aggsplit = AGGSPLIT_SIMPLE.

That's as far as I've gotten so far, but I figured I'd get that info
out to see if it means anything obvious to anyone else.

This really goes back to [1] where we fixed a similar issue by making
find_em_expr_usable_for_sorting_rel parallel the rules in
prepare_sort_from_pathkeys.

Most of those conditions got copied, and the case we were trying to
handle is the fact that prepare_sort_from_pathkeys can generate a
target list entry under those conditions if one doesn't exist. However
there's a further restriction there I don't remember looking at: it
uses pull_var_clause and tlist_member_ignore_relabel to ensure that
all of the vars that feed into the sort expression are found in the
target list. As I understand it, that is: it will build a target list
entry for something like "md5(column)" if "column" (and that was one
of our test cases for the previous fix) is in the target list already.

But there's an additional detail here: the call to pull_var_clause
requests aggregates, window functions, and placeholders be treated as
vars. That means for our Aggref case it would require that the two
Aggrefs be fully equal, so the differing aggsplit member would cause a
target list entry not to be built, hence our error here.

I've attached a quick and dirty patch that encodes that final rule
from prepare_sort_from_pathkeys into
find_em_expr_usable_for_sorting_rel. I can't help but think that
there's a cleaner way to do with this with less code duplication, but
hindering that is that prepare_sort_from_pathkeys is working with a
TargetList while find_em_expr_usable_for_sorting_rel is working with a
list of expressions.

James

1: /messages/by-id/CAAaqYe9C3f6A_tZCRfr9Dm7hPpgGwpp4i-K_=NS9GWXuNiFANg@mail.gmail.com

Hi,

The patch seems to make the planner proceed and not error out anymore.
Cannot judge if it's doing the right thing however or if its enough :)
It works for me for all reported queries however (queries 94-96).

And sorry for the confusion wrt the stacktrace and plan. I tried to
produce a plan to possibly help with debugging but that would ofc then
not have the problem of the missing sortkey as otherwise i cannot
present a plan :) The stacktrace was however correct, and the plan
considered involved a gather-merge with a sort. Unfortunately I could
not (easily) get the plan outputted in the end; even when setting the
costs to 0 somehow...

Regards,
Luc

Same patch, but with a test case now.

James

Attachments:

v2-0001-Fix-find_em_expr_usable_for_sorting_rel.patchapplication/octet-stream; name=v2-0001-Fix-find_em_expr_usable_for_sorting_rel.patchDownload+75-3
#13Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#7)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On 4/15/21 2:21 AM, Robert Haas wrote:

On Wed, Apr 14, 2021 at 8:20 PM James Coleman <jtc331@gmail.com> wrote:

Hmm, could be. Although, the stack trace at issue doesn't seem to show
a call to create_incrementalsort_plan().

The changes to gather merge path generation made it possible to use
those paths in more cases for both incremental sort and regular sort,
so by "incremental sort" I read Tomas as saying "the patches that
brought in incremental sort" not specifically "incremental sort
itself".

I agree. That's why I said "hmm, could be" even though the plan
doesn't involve one.

Yeah, that's what I meant. The difference to pre-13 behavior is that we
now call generate_useful_gather_paths, which also considers adding extra
sort (unlike plain generate_gather_paths).

So now we can end up with "Gather Merge -> Sort" paths that would not be
considered before.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: James Coleman (#12)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On 4/15/21 7:35 PM, James Coleman wrote:

On Thu, Apr 15, 2021 at 5:33 AM Luc Vlaming <luc@swarm64.com> wrote:

On 15-04-2021 04:01, James Coleman wrote:

On Wed, Apr 14, 2021 at 5:42 PM James Coleman <jtc331@gmail.com> wrote:

On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/12/21 2:24 PM, Luc Vlaming wrote:

Hi,

When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
95 and 96 on a SF10 I get the error "could not find pathkey item to sort".
When I disable enable_gathermerge the problem goes away and then the
plan for query 94 looks like below. I tried figuring out what the
problem is but to be honest I would need some pointers as the code that
tries to matching equivalence members in prepare_sort_from_pathkeys is
something i'm really not familiar with.

Could be related to incremental sort, which allowed some gather merge
paths that were impossible before. We had a couple issues related to
that fixed in November, IIRC.

To reproduce you can either ingest and test using the toolkit I used too
(see https://github.com/swarm64/s64da-benchmark-toolkit/), or
alternatively just use the schema (see
https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)

Thanks, I'll see if I can reproduce that with your schema.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

The query in question is:

select count(*)
from store_sales
,household_demographics
,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
and ss_hdemo_sk = household_demographics.hd_demo_sk
and ss_store_sk = s_store_sk
and time_dim.t_hour = 15
and time_dim.t_minute >= 30
and household_demographics.hd_dep_count = 7
and store.s_store_name = 'ese'
order by count(*)
limit 100;

From debugging output it looks like this is the plan being chosen
(cheapest total path):
Gather(store_sales household_demographics time_dim) rows=60626
cost=3145.73..699910.15
HashJoin(store_sales household_demographics time_dim)
rows=25261 cost=2145.73..692847.55
clauses: store_sales.ss_hdemo_sk =
household_demographics.hd_demo_sk
HashJoin(store_sales time_dim) rows=252609
cost=1989.73..692028.08
clauses: store_sales.ss_sold_time_sk =
time_dim.t_time_sk
SeqScan(store_sales) rows=11998564
cost=0.00..658540.64
SeqScan(time_dim) rows=1070
cost=0.00..1976.35
SeqScan(household_demographics) rows=720
cost=0.00..147.00

prepare_sort_from_pathkeys fails to find a pathkey because
tlist_member_ignore_relabel returns null -- which seemed weird because
the sortexpr is an Aggref (in a single member equivalence class) and
the tlist contains a single member that's also an Aggref. It turns out
that the only difference between the two Aggrefs is that the tlist
entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
aggsplit = AGGSPLIT_SIMPLE.

That's as far as I've gotten so far, but I figured I'd get that info
out to see if it means anything obvious to anyone else.

This really goes back to [1] where we fixed a similar issue by making
find_em_expr_usable_for_sorting_rel parallel the rules in
prepare_sort_from_pathkeys.

Most of those conditions got copied, and the case we were trying to
handle is the fact that prepare_sort_from_pathkeys can generate a
target list entry under those conditions if one doesn't exist. However
there's a further restriction there I don't remember looking at: it
uses pull_var_clause and tlist_member_ignore_relabel to ensure that
all of the vars that feed into the sort expression are found in the
target list. As I understand it, that is: it will build a target list
entry for something like "md5(column)" if "column" (and that was one
of our test cases for the previous fix) is in the target list already.

But there's an additional detail here: the call to pull_var_clause
requests aggregates, window functions, and placeholders be treated as
vars. That means for our Aggref case it would require that the two
Aggrefs be fully equal, so the differing aggsplit member would cause a
target list entry not to be built, hence our error here.

I've attached a quick and dirty patch that encodes that final rule
from prepare_sort_from_pathkeys into
find_em_expr_usable_for_sorting_rel. I can't help but think that
there's a cleaner way to do with this with less code duplication, but
hindering that is that prepare_sort_from_pathkeys is working with a
TargetList while find_em_expr_usable_for_sorting_rel is working with a
list of expressions.

Yeah, I think it'll be difficult to reuse code from later planner stages
exactly because it operates on different representation. So something
like your patch is likely necessary.

As for the patch, I have a couple comments:

1) expr_list_member_ignore_relabel would deserve a better comment, and
maybe a reference to tlist_member_ignore_relabel which it copies

2) I suppose the comment before "if (ec->ec_has_volatile)" needs
updating, because now it says we're done as long as the expression is
not volatile (but we're doing more stuff).

3) Shouldn't find_em_expr_usable_for_sorting_rel now mostly mimic what
prepare_sort_from_pathkeys does? That is, try to match the entries
directly first, before the new pull_vars() business?

4) I've simplified the foreach() loop a bit. prepare_sort_from_pathkeys
does it differently, but that's because there are multiple foreach
levels, I think. Yes, we'll not free the list, but I that's what most
other places in planner do ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Fix-find_em_expr_usable_for_sorting_rel.patchtext/x-patch; charset=UTF-8; name=0001-Fix-find_em_expr_usable_for_sorting_rel.patchDownload+75-3
0002-tweaks.patchtext/x-patch; charset=UTF-8; name=0002-tweaks.patchDownload+20-6
#15Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#14)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Thu, Apr 15, 2021 at 6:27 PM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

On 4/15/21 7:35 PM, James Coleman wrote:

On Thu, Apr 15, 2021 at 5:33 AM Luc Vlaming <luc@swarm64.com> wrote:

On 15-04-2021 04:01, James Coleman wrote:

On Wed, Apr 14, 2021 at 5:42 PM James Coleman <jtc331@gmail.com>

wrote:

On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/12/21 2:24 PM, Luc Vlaming wrote:

Hi,

When trying to run on master (but afaik also PG-13) TPC-DS queries

94,

95 and 96 on a SF10 I get the error "could not find pathkey item to

sort".

When I disable enable_gathermerge the problem goes away and then the
plan for query 94 looks like below. I tried figuring out what the
problem is but to be honest I would need some pointers as the code

that

tries to matching equivalence members in prepare_sort_from_pathkeys

is

something i'm really not familiar with.

Could be related to incremental sort, which allowed some gather merge
paths that were impossible before. We had a couple issues related to
that fixed in November, IIRC.

To reproduce you can either ingest and test using the toolkit I

used too

(see https://github.com/swarm64/s64da-benchmark-toolkit/), or
alternatively just use the schema (see

https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native
)

Thanks, I'll see if I can reproduce that with your schema.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

The query in question is:

select count(*)
from store_sales
,household_demographics
,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
and ss_hdemo_sk = household_demographics.hd_demo_sk
and ss_store_sk = s_store_sk
and time_dim.t_hour = 15
and time_dim.t_minute >= 30
and household_demographics.hd_dep_count = 7
and store.s_store_name = 'ese'
order by count(*)
limit 100;

From debugging output it looks like this is the plan being chosen
(cheapest total path):
Gather(store_sales household_demographics time_dim)

rows=60626

cost=3145.73..699910.15
HashJoin(store_sales household_demographics time_dim)
rows=25261 cost=2145.73..692847.55
clauses: store_sales.ss_hdemo_sk =
household_demographics.hd_demo_sk
HashJoin(store_sales time_dim) rows=252609
cost=1989.73..692028.08
clauses: store_sales.ss_sold_time_sk =
time_dim.t_time_sk
SeqScan(store_sales) rows=11998564
cost=0.00..658540.64
SeqScan(time_dim) rows=1070
cost=0.00..1976.35
SeqScan(household_demographics) rows=720
cost=0.00..147.00

prepare_sort_from_pathkeys fails to find a pathkey because
tlist_member_ignore_relabel returns null -- which seemed weird because
the sortexpr is an Aggref (in a single member equivalence class) and
the tlist contains a single member that's also an Aggref. It turns out
that the only difference between the two Aggrefs is that the tlist
entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
aggsplit = AGGSPLIT_SIMPLE.

That's as far as I've gotten so far, but I figured I'd get that info
out to see if it means anything obvious to anyone else.

This really goes back to [1] where we fixed a similar issue by making
find_em_expr_usable_for_sorting_rel parallel the rules in
prepare_sort_from_pathkeys.

Most of those conditions got copied, and the case we were trying to
handle is the fact that prepare_sort_from_pathkeys can generate a
target list entry under those conditions if one doesn't exist. However
there's a further restriction there I don't remember looking at: it
uses pull_var_clause and tlist_member_ignore_relabel to ensure that
all of the vars that feed into the sort expression are found in the
target list. As I understand it, that is: it will build a target list
entry for something like "md5(column)" if "column" (and that was one
of our test cases for the previous fix) is in the target list already.

But there's an additional detail here: the call to pull_var_clause
requests aggregates, window functions, and placeholders be treated as
vars. That means for our Aggref case it would require that the two
Aggrefs be fully equal, so the differing aggsplit member would cause a
target list entry not to be built, hence our error here.

I've attached a quick and dirty patch that encodes that final rule
from prepare_sort_from_pathkeys into
find_em_expr_usable_for_sorting_rel. I can't help but think that
there's a cleaner way to do with this with less code duplication, but
hindering that is that prepare_sort_from_pathkeys is working with a
TargetList while find_em_expr_usable_for_sorting_rel is working with a
list of expressions.

Yeah, I think it'll be difficult to reuse code from later planner stages
exactly because it operates on different representation. So something
like your patch is likely necessary.

As for the patch, I have a couple comments:

1) expr_list_member_ignore_relabel would deserve a better comment, and
maybe a reference to tlist_member_ignore_relabel which it copies

2) I suppose the comment before "if (ec->ec_has_volatile)" needs
updating, because now it says we're done as long as the expression is
not volatile (but we're doing more stuff).

3) Shouldn't find_em_expr_usable_for_sorting_rel now mostly mimic what
prepare_sort_from_pathkeys does? That is, try to match the entries
directly first, before the new pull_vars() business?

4) I've simplified the foreach() loop a bit. prepare_sort_from_pathkeys
does it differently, but that's because there are multiple foreach
levels, I think. Yes, we'll not free the list, but I that's what most
other places in planner do ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Hi,

            if (!expr_list_member_ignore_relabel(lfirst(k), target->exprs))
-               break;
+               return NULL;

I think it would be better if list_free(exprvars) is called before the
return.

Cheers

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#14)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

[ sorry for not getting to this thread till now ]

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

3) Shouldn't find_em_expr_usable_for_sorting_rel now mostly mimic what
prepare_sort_from_pathkeys does? That is, try to match the entries
directly first, before the new pull_vars() business?

Yeah. I concur that the problem here is that
find_em_expr_usable_for_sorting_rel isn't fully accounting for what
prepare_sort_from_pathkeys can and can't do. However, I don't like this
patch much:

* As written, I think it may just move the pain somewhere else. The point
of the logic in prepare_sort_from_pathkeys is to handle either full
expression matches (e.g. sort by "A+B" when "A+B" is an expression in
the input tlist) or computable expressions (sort by "A+B" when A and B
are individually available). I think you've fixed the second case and
broken the first one. Now it's possible that the case never arises,
and certainly failing to generate an early sort isn't catastrophic anyway.
But we ought to get it right.

* If the goal is to match what prepare_sort_from_pathkeys can do, I
think that doubling down on the strategy of having a duplicate copy
is not the path to a maintainable fix.

I think it's time for some refactoring of this code so that we can
actually share the logic. Accordingly, I propose the attached.
It's really not that hard to share, as long as you accept the idea
that the list passed to the shared subroutine can be either a list of
TargetEntries or of bare expressions.

Also, I don't much care for either the name or API of
find_em_expr_usable_for_sorting_rel. The sole current caller only
really needs a boolean result, and if it did need more than that
it'd likely need the whole EquivalenceMember not just the em_expr
(certainly createplan.c does). So 0002 attached is some bikeshedding
on that. I kept that separate because it might be wise to do it only
in HEAD, just in case somebody out there is calling the function from
an extension.

(BTW, responding to an upthread question: I think the looping to
remove multiple levels of RelabelType is probably now redundant,
but I didn't remove it. If we want to do that there are more
places to touch than just this code.)

regards, tom lane

Attachments:

0001-Fix-find_em_expr_usable_for_sorting_rel-2.patchtext/x-diff; charset=us-ascii; name=0001-Fix-find_em_expr_usable_for_sorting_rel-2.patchDownload+243-180
0002-change-function-name.patchtext/x-diff; charset=us-ascii; name=0002-change-function-name.patchDownload+30-29
#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#16)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

I wrote:

I think it's time for some refactoring of this code so that we can
actually share the logic. Accordingly, I propose the attached.

After sleeping on it, here's an improved version that gets rid of
an unnecessary assumption about ECs usually not containing both
parallel-safe and parallel-unsafe members. I'd tried to do this
yesterday but didn't like the amount of side-effects on createplan.c
(caused by the direct call sites not being passed the "root" pointer).
However, we can avoid refactoring createplan.c APIs by saying that it's
okay to pass root = NULL to find_computable_ec_member if you're not
asking it to check parallel safety. And there's not really a need to
put a parallel-safety check into find_ec_member_matching_expr at all;
that task can be left with the one caller that cares.

regards, tom lane

Attachments:

0001-Fix-find_em_expr_usable_for_sorting_rel-3.patchtext/x-diff; charset=us-ascii; name=0001-Fix-find_em_expr_usable_for_sorting_rel-3.patchDownload+264-181
0002-change-function-name-3.patchtext/x-diff; charset=us-ascii; name=0002-change-function-name-3.patchDownload+25-24
#18James Coleman
jtc331@gmail.com
In reply to: Tom Lane (#17)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Sun, Apr 18, 2021 at 1:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

I think it's time for some refactoring of this code so that we can
actually share the logic. Accordingly, I propose the attached.

After sleeping on it, here's an improved version that gets rid of
an unnecessary assumption about ECs usually not containing both
parallel-safe and parallel-unsafe members. I'd tried to do this
yesterday but didn't like the amount of side-effects on createplan.c
(caused by the direct call sites not being passed the "root" pointer).
However, we can avoid refactoring createplan.c APIs by saying that it's
okay to pass root = NULL to find_computable_ec_member if you're not
asking it to check parallel safety. And there's not really a need to
put a parallel-safety check into find_ec_member_matching_expr at all;
that task can be left with the one caller that cares.

I like the refactoring here.

Two things I wonder:
1. Should we add tests for the relabel code path?
2. It'd be nice not to have the IS_SRF_CALL duplicated, but that might
add enough complexity that it's not worth it.

Thanks,
James Coleman

#19James Coleman
jtc331@gmail.com
In reply to: Tom Lane (#16)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

On Sat, Apr 17, 2021 at 3:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

...
Also, I don't much care for either the name or API of
find_em_expr_usable_for_sorting_rel. The sole current caller only
really needs a boolean result, and if it did need more than that
it'd likely need the whole EquivalenceMember not just the em_expr
(certainly createplan.c does). So 0002 attached is some bikeshedding
on that. I kept that separate because it might be wise to do it only
in HEAD, just in case somebody out there is calling the function from
an extension.

I forgot to comment on this in my previous email, but it seems to me
that relation_has_safe_ec_member, while less wordy, isn't quite
descriptive enough. Perhaps something like
relation_has_sort_safe_ec_member?

James Coleman

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: James Coleman (#19)
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

James Coleman <jtc331@gmail.com> writes:

I forgot to comment on this in my previous email, but it seems to me
that relation_has_safe_ec_member, while less wordy, isn't quite
descriptive enough. Perhaps something like
relation_has_sort_safe_ec_member?

I'm not wedded to that name, certainly, but it seems like neither
of these is quite getting at the issue. An EC can be sorted on,
by definition, but there are some things we don't want to sort
on till the final output step. I was trying to think of something
using the terminology "early sort", but didn't much like
"relation_has_early_sortable_ec_member" or obvious variants of that.

regards, tom lane

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: James Coleman (#18)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#22)
#24James Coleman
jtc331@gmail.com
In reply to: Tom Lane (#23)
In reply to: Tom Lane (#23)
In reply to: Dagfinn Ilmari Mannsåker (#25)
#27James Coleman
jtc331@gmail.com
In reply to: Dagfinn Ilmari Mannsåker (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dagfinn Ilmari Mannsåker (#26)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: James Coleman (#24)