v12.0: ERROR: could not find pathkey item to sort

Started by Justin Pryzbyover 6 years ago20 messageshackers
Jump to latest
#1Justin Pryzby
pryzby@telsasoft.com

I've reduced the failing query as much as possible to this:

-- This is necessary to fail:
SET enable_nestloop=off;

SELECT * FROM
(SELECT start_time, t1.site_id
FROM pgw_kpi_view t1
-- Apparently the where clause is necessary to fail...
WHERE (start_time>='2019-10-10' AND start_time<'2019-10-11')
-- The group by MAY be necessary to fail...
GROUP BY 1,2
) AS data
JOIN sites ON ( sites.site_location='' OR sites.site_office=data.site_id)

The view is actually a join of two relkind=p partitioned tables (which I
will acknowledge probably performs poorly).

(gdb) bt
#0 errfinish (dummy=dummy@entry=0) at elog.c:411
#1 0x000000000087a959 in elog_finish (elevel=elevel@entry=20, fmt=fmt@entry=0x9d93d8 "could not find pathkey item to sort") at elog.c:1365
#2 0x00000000006a587f in prepare_sort_from_pathkeys (lefttree=0x7f7cb84492e8, pathkeys=<optimized out>, relids=0x7f7cb8410700, reqColIdx=reqColIdx@entry=0x0, adjust_tlist_in_place=<optimized out>,
adjust_tlist_in_place@entry=false, p_numsortkeys=p_numsortkeys@entry=0x7ffc4b2e10c4, p_sortColIdx=p_sortColIdx@entry=0x7ffc4b2e10c8, p_sortOperators=p_sortOperators@entry=0x7ffc4b2e10d0,
p_collations=p_collations@entry=0x7ffc4b2e10d8, p_nullsFirst=p_nullsFirst@entry=0x7ffc4b2e10e0) at createplan.c:5893
#3 0x00000000006a5a6a in make_sort_from_pathkeys (lefttree=<optimized out>, pathkeys=<optimized out>, relids=<optimized out>) at createplan.c:6020
#4 0x00000000006a6e30 in create_sort_plan (flags=4, best_path=0x7f7cb8410cc8, root=0x7f7fdc3ac6b0) at createplan.c:1985
#5 create_plan_recurse (root=root@entry=0x7f7fdc3ac6b0, best_path=0x7f7cb8410cc8, flags=flags@entry=4) at createplan.c:459
#6 0x00000000006a6e4e in create_group_plan (best_path=0x7f7cb8410d58, root=0x7f7fdc3ac6b0) at createplan.c:2012
#7 create_plan_recurse (root=root@entry=0x7f7fdc3ac6b0, best_path=best_path@entry=0x7f7cb8410d58, flags=flags@entry=1) at createplan.c:464
#8 0x00000000006a8278 in create_merge_append_plan (flags=4, best_path=0x7f7cb8446cd8, root=0x7f7fdc3ac6b0) at createplan.c:1333
#9 create_plan_recurse (root=root@entry=0x7f7fdc3ac6b0, best_path=0x7f7cb8446cd8, flags=flags@entry=4) at createplan.c:402
#10 0x00000000006a6e4e in create_group_plan (best_path=0x7f7cb84486c8, root=0x7f7fdc3ac6b0) at createplan.c:2012
#11 create_plan_recurse (root=root@entry=0x7f7fdc3ac6b0, best_path=0x7f7cb84486c8, flags=flags@entry=1) at createplan.c:464
#12 0x00000000006a9739 in create_plan (root=0x7f7fdc3ac6b0, best_path=<optimized out>) at createplan.c:325
#13 0x00000000006aa988 in create_subqueryscan_plan (scan_clauses=0x0, tlist=0x7f7cb8450820, best_path=0x7f7cb8448db8, root=0x7f7fdc34b948) at createplan.c:3385
#14 create_scan_plan (root=root@entry=0x7f7fdc34b948, best_path=best_path@entry=0x7f7cb8448db8, flags=<optimized out>, flags@entry=0) at createplan.c:670
#15 0x00000000006a6d31 in create_plan_recurse (root=root@entry=0x7f7fdc34b948, best_path=0x7f7cb8448db8, flags=flags@entry=0) at createplan.c:427
#16 0x00000000006a983a in create_nestloop_plan (best_path=0x7f7cb844fb80, root=0x7f7fdc34b948) at createplan.c:4008
#17 create_join_plan (root=root@entry=0x7f7fdc34b948, best_path=best_path@entry=0x7f7cb844fb80) at createplan.c:1020
#18 0x00000000006a6d75 in create_plan_recurse (root=root@entry=0x7f7fdc34b948, best_path=0x7f7cb844fb80, flags=flags@entry=1) at createplan.c:393
#19 0x00000000006a9739 in create_plan (root=root@entry=0x7f7fdc34b948, best_path=<optimized out>) at createplan.c:325
#20 0x00000000006b5a04 in standard_planner (parse=0x1bd2308, cursorOptions=256, boundParams=0x0) at planner.c:413
#21 0x000000000075fb2e in pg_plan_query (querytree=querytree@entry=0x1bd2308, cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at postgres.c:878
#22 0x000000000075fbee in pg_plan_queries (querytrees=<optimized out>, cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at postgres.c:968
#23 0x000000000076007a in exec_simple_query (
query_string=0x1ba36e0 "SELECT * FROM\n\t(SELECT start_time, t1.site_id\n\tFROM pgw_kpi_view t1\n\t\n\tWHERE (start_time>='2019-10-10' AND start_time<'2019-10-11')\n\t\n\tGROUP BY 1,2\n\t) AS data\nJOIN sites ON ( sites.site_location='' OR"...) at postgres.c:1143
#24 0x0000000000761212 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1bd8e70, dbname=0x1bd8d10 "ts", username=<optimized out>) at postgres.c:4236
#25 0x0000000000483d02 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4431
#26 BackendStartup (port=0x1bd5190) at postmaster.c:4122
#27 ServerLoop () at postmaster.c:1704
#28 0x00000000006f0b1f in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1b9e280) at postmaster.c:1377
#29 0x0000000000484c93 in main (argc=3, argv=0x1b9e280) at main.c:228

bt f:

#2 0x00000000006a587f in prepare_sort_from_pathkeys (lefttree=0x7f7cb84492e8, pathkeys=<optimized out>, relids=0x7f7cb8410700, reqColIdx=reqColIdx@entry=0x0, adjust_tlist_in_place=<optimized out>,
adjust_tlist_in_place@entry=false, p_numsortkeys=p_numsortkeys@entry=0x7ffc4b2e10c4, p_sortColIdx=p_sortColIdx@entry=0x7ffc4b2e10c8, p_sortOperators=p_sortOperators@entry=0x7ffc4b2e10d0,
p_collations=p_collations@entry=0x7ffc4b2e10d8, p_nullsFirst=p_nullsFirst@entry=0x7ffc4b2e10e0) at createplan.c:5893
sortexpr = <optimized out>
ec = 0x7f7cb8edbe28
em = <optimized out>
tle = <optimized out>
pathkey = <optimized out>
pk_datatype = <optimized out>
sortop = <optimized out>
j = <optimized out>
tlist = 0x7f7cb8451bb8
i = 0x7f7cb8edc2d8
numsortkeys = 0
sortColIdx = 0x7f7cb8451c58
sortOperators = 0x7f7cb8451c70
collations = 0x7f7cb8451c88
nullsFirst = 0x7f7cb8451ca0
__func__ = "prepare_sort_from_pathkeys"
#3 0x00000000006a5a6a in make_sort_from_pathkeys (lefttree=<optimized out>, pathkeys=<optimized out>, relids=<optimized out>) at createplan.c:6020
numsortkeys = 32636
sortColIdx = 0x7f7cb8447468
sortOperators = 0x7f7cb83fa278
collations = 0x0
nullsFirst = 0x7f7cb8edc2f8

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#1)
Re: v12.0: ERROR: could not find pathkey item to sort

Justin Pryzby <pryzby@telsasoft.com> writes:

I've reduced the failing query as much as possible to this:
-- This is necessary to fail:
SET enable_nestloop=off;

SELECT * FROM
(SELECT start_time, t1.site_id
FROM pgw_kpi_view t1
-- Apparently the where clause is necessary to fail...
WHERE (start_time>='2019-10-10' AND start_time<'2019-10-11')
-- The group by MAY be necessary to fail...
GROUP BY 1,2
) AS data
JOIN sites ON ( sites.site_location='' OR sites.site_office=data.site_id)

The view is actually a join of two relkind=p partitioned tables (which I
will acknowledge probably performs poorly).

Could you provide a self-contained test case please?

regards, tom lane

#3Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#2)
Re: v12.0: ERROR: could not find pathkey item to sort

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

The view is actually a join of two relkind=p partitioned tables (which I
will acknowledge probably performs poorly).

Could you provide a self-contained test case please?

Working on it. FWIW explain for a v11 customer looks like this:

Nested Loop (cost=10000011818.10..10000076508.23 rows=734500 width=159)
Join Filter: ((s.site_location = ''::text) OR (s.site_office = (COALESCE(huawei_ggsn_201610.ne_name, huawei_ggsn_gw_201610.ne_name))))
-> Group (cost=11818.10..11946.31 rows=2937 width=40)
Group Key: (COALESCE(huawei_ggsn_201610.start_time, huawei_ggsn_gw_201610.start_time)), (COALESCE(huawei_ggsn_201610.ne_name, huawei_ggsn_gw_201610.ne_name))
-> Merge Append (cost=11818.10..11931.59 rows=2944 width=40)
Sort Key: (COALESCE(huawei_ggsn_201610.start_time, huawei_ggsn_gw_201610.start_time)), (COALESCE(huawei_ggsn_201610.ne_name, huawei_ggsn_gw_201610.ne_name))
-> Group (cost=332.48..333.10 rows=83 width=40)
Group Key: (COALESCE(huawei_ggsn_201610.start_time, huawei_ggsn_gw_201610.start_time)), (COALESCE(huawei_ggsn_201610.ne_name, huawei_ggsn_gw_201610.ne_name))
-> Sort (cost=332.48..332.69 rows=83 width=40)
Sort Key: (COALESCE(huawei_ggsn_201610.start_time, huawei_ggsn_gw_201610.start_time)), (COALESCE(huawei_ggsn_201610.ne_name, huawei_ggsn_gw_201610.ne_name))
-> Hash Full Join (cost=46.48..329.84 rows=83 width=40)
Hash Cond: ((huawei_ggsn_201610.ne_name = huawei_ggsn_gw_201610.ne_name) AND (huawei_ggsn_201610.ggsn_function = huawei_ggsn_gw_201610.ggsn_function) AND (huawei_ggsn_201610.start_time = huawei_
ggsn_gw_201610.start_time) AND (huawei_ggsn_201610.interval_seconds = huawei_ggsn_gw_201610.interval_seconds) AND (huawei_ggsn_201610.device_id = huawei_ggsn_gw_201610.device_id) AND (huawei_ggsn_201610.c_134710251 = huawei_ggs
n_gw_201610.c_134710251) AND (huawei_ggsn_201610.c_134710252 = huawei_ggsn_gw_201610.c_134710252) AND (huawei_ggsn_201610.c_134710253 = huawei_ggsn_gw_201610.c_134710253) AND (huawei_ggsn_201610.ne_id = huawei_ggsn_gw_201610.ne
_id) AND (huawei_ggsn_201610.ugw_function = huawei_ggsn_gw_201610.ugw_function))
Filter: ((COALESCE(huawei_ggsn_201610.start_time, huawei_ggsn_gw_201610.start_time) >= '2019-10-01 00:00:00-11'::timestamp with time zone) AND (COALESCE(huawei_ggsn_201610.start_time, huawei_ggs
n_gw_201610.start_time) < '2019-10-02 00:00:00-11'::timestamp with time zone))
-> Seq Scan on huawei_ggsn_201610 (cost=0.00..255.44 rows=744 width=94)
-> Hash (cost=20.44..20.44 rows=744 width=94)
-> Seq Scan on huawei_ggsn_gw_201610 (cost=0.00..20.44 rows=744 width=94)
[...]

I'm suspecting this; is it useful to test with this commit reverted ?

commit 8edd0e79460b414b1d971895312e549e95e12e4f
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon Mar 25 15:42:35 2019 -0400

Suppress Append and MergeAppend plan nodes that have a single child.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#3)
Re: v12.0: ERROR: could not find pathkey item to sort

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Could you provide a self-contained test case please?

I'm suspecting this; is it useful to test with this commit reverted ?

I wouldn't bother; we'd still need a test case to find out what the
problem is.

regards, tom lane

#5Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#2)
Re: v12.0: ERROR: could not find pathkey item to sort

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Could you provide a self-contained test case please?

SET enable_partitionwise_aggregate = 'on';
SET enable_partitionwise_join = 'on';
SET max_parallel_workers_per_gather=0;
-- maybe not important but explain(settings) suggests I should include them for completeness:
SET effective_io_concurrency = '0';
SET work_mem = '512MB';
SET jit = 'off';

CREATE TABLE s(site_id int, site_location text, site_office text);
INSERT INTO s SELECT generate_series(1,99),'','';

CREATE TABLE t(start_time timestamp, site_id text, i int)PARTITION BY RANGE(start_time);
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM ('2019-10-01')TO('2019-10-02');
INSERT INTO t1 SELECT a,b FROM generate_series( '2019-10-01'::timestamp, '2019-10-01 23:45'::timestamp, '15 minutes')a, generate_series(1,99)b, generate_series(1,99)c;
CREATE TABLE t2 PARTITION OF t FOR VALUES FROM ('2019-10-02')TO('2019-10-03');
INSERT INTO t2 SELECT a,b FROM generate_series( '2019-10-02'::timestamp, '2019-10-02 23:45'::timestamp, '15 minutes')a, generate_series(1,99)b, generate_series(1,99)c;

ANALYZE s,t;

explain
SELECT s.* FROM
(SELECT start_time, site_id::int
FROM t t1 FULL JOIN t t2 USING(start_time,site_id)
WHERE (start_time>='2019-10-01' AND start_time<'2019-10-01 01:00')
GROUP BY 1,2) AS data
JOIN s ON (s.site_location='' OR s.site_office::int=data.site_id)

Justin

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#5)
Re: v12.0: ERROR: could not find pathkey item to sort

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Could you provide a self-contained test case please?

[ test case ]

Yup, fails for me too. Will look shortly.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#5)
Re: v12.0: ERROR: could not find pathkey item to sort

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Could you provide a self-contained test case please?

[ test case ]

Oh, this is the same issue Amit described in

/messages/by-id/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com

namely that we're not generating EquivalenceClass members corresponding
to sub-joins of a partitionwise join.

Are you interested in helping to test the patches proposed there?

regards, tom lane

#8Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#7)
Re: v12.0: ERROR: could not find pathkey item to sort

On Sun, Oct 13, 2019 at 02:06:02PM -0400, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:

Could you provide a self-contained test case please?

[ test case ]

Oh, this is the same issue Amit described in

/messages/by-id/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com

namely that we're not generating EquivalenceClass members corresponding
to sub-joins of a partitionwise join.

Are you interested in helping to test the patches proposed there?

Sure. Any requests other than testing that our original query works correctly
and maybe endeavoring to read the patch ?

BTW it probably should've been documented as an "Open Item" for v12.

--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581

#9Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#8)
Re: v12.0: ERROR: could not find pathkey item to sort

On Sun, Oct 13, 2019 at 01:30:29PM -0500, Justin Pryzby wrote:

BTW it probably should've been documented as an "Open Item" for v12.

https://commitfest.postgresql.org/25/2278/
I realized possibly people were thinking of that as a "feature" and not a
bugfix for backpatch (?)

But, my issue is a query which worked under v11 PWJ but fails under v12
(apparently broken by d25ea01275).

In my mind, if the planner doesn't support that query with PWJ, I think it
should run without PWJ rather than fail.

Justin

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#9)
Re: v12.0: ERROR: could not find pathkey item to sort

Justin Pryzby <pryzby@telsasoft.com> writes:

On Sun, Oct 13, 2019 at 01:30:29PM -0500, Justin Pryzby wrote:

BTW it probably should've been documented as an "Open Item" for v12.

https://commitfest.postgresql.org/25/2278/
I realized possibly people were thinking of that as a "feature" and not a
bugfix for backpatch (?)
But, my issue is a query which worked under v11 PWJ but fails under v12
(apparently broken by d25ea01275).

Yeah, this should have been dealt with as an open item, but it
slipped through the cracks. We'll make sure to get it fixed,
one way or another, for 12.1.

In view of the proposed patches being dependent on some other
13-only changes, I wonder if we should fix v12 by reverting
d25ea0127. The potential planner performance loss for large
partition sets could be nasty, but failing to plan at all is worse.

regards, tom lane

#11Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#10)
Re: v12.0: ERROR: could not find pathkey item to sort

Sorry about the late reply.

On Mon, Oct 14, 2019 at 11:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Sun, Oct 13, 2019 at 01:30:29PM -0500, Justin Pryzby wrote:

BTW it probably should've been documented as an "Open Item" for v12.

https://commitfest.postgresql.org/25/2278/
I realized possibly people were thinking of that as a "feature" and not a
bugfix for backpatch (?)
But, my issue is a query which worked under v11 PWJ but fails under v12
(apparently broken by d25ea01275).

Yeah, this should have been dealt with as an open item, but it
slipped through the cracks. We'll make sure to get it fixed,
one way or another, for 12.1.

In view of the proposed patches being dependent on some other
13-only changes, I wonder if we should fix v12 by reverting
d25ea0127. The potential planner performance loss for large
partition sets could be nasty, but failing to plan at all is worse.

Actually, the patch I proposed to fix equivalence code can be applied
on its own. The example I gave on that thread needs us to fix
partitionwise code to even work, which is perhaps a 13-only change,
but we have an example here that is broken due to d25ea01275.
Perhaps, considering applying my patch seems better than alternatives
which are either reverting d25ea01275 or avoiding doing partitionwise
join for such queries.

Since we've got 3373c71553 ("Speed up finding EquivalenceClasses for a
given set of rels") in HEAD, need two versions of the patch; please
see attached.

Thanks,
Amit

Attachments:

d25ea01275-fixup-v12.patchapplication/octet-stream; name=d25ea01275-fixup-v12.patchDownload+135-17
d25ea01275-fixup-v13.patchapplication/octet-stream; name=d25ea01275-fixup-v13.patchDownload+154-26
#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#11)
Re: v12.0: ERROR: could not find pathkey item to sort

Amit Langote <amitlangote09@gmail.com> writes:

On Mon, Oct 14, 2019 at 11:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In view of the proposed patches being dependent on some other
13-only changes, I wonder if we should fix v12 by reverting
d25ea0127. The potential planner performance loss for large
partition sets could be nasty, but failing to plan at all is worse.

Actually, the patch I proposed to fix equivalence code can be applied
on its own. The example I gave on that thread needs us to fix
partitionwise code to even work, which is perhaps a 13-only change,
but we have an example here that is broken due to d25ea01275.
Perhaps, considering applying my patch seems better than alternatives
which are either reverting d25ea01275 or avoiding doing partitionwise
join for such queries.

I looked at this a bit, and I see that the core idea is to generate
the missing EC members by applying add_child_rel_equivalences when
building child join rels. Perhaps we can make that work, but this
patch fails to, because you've ignored the comment pointing out
that the nullable_relids fixup logic only works for baserels:

* And likewise for nullable_relids. Note this code assumes
* parent and child relids are singletons.

We could improve that to work for joinrels, I think, but it would become
very much more complicated (and slower). AFAICS the logic would have
to be "iterate through top_parent_relids, see which ones are in
em_nullable_relids, and for each one that is, find the corresponding
child_relid and substitute that in new_nullable_relids". This is a
bit of a problem because we don't have any really convenient way to
map individual top parent relids to child relids. I wonder if we
should extend AppendRelInfo to include the top parent relid as well as
the immediate parent. (Perhaps, while we're at it, we could make
adjust_appendrel_attrs_multilevel less of an inefficient and
underdocumented mess.)

Also, I'm pretty sure this addition is wrong/broken:

+
+                /*
+                 * There aren't going to be more expressions to translate in
+                 * the same EC.
+                 */
+                break;

What makes you think that an EC can only contain one entry per rel?

More generally, as long as this patch requires changing
add_child_rel_equivalences' API anyway, I wonder if we should
rethink that altogether. Looking at the code now, I realize that
d25ea01275 resulted in horribly bastardizing that function's API,
because the parent_rel and appinfo arguments are only consulted in
some cases, while in other cases we disregard them and rely on
child_rel->top_parent_relids to figure out how to translate stuff.
It would be possible to make the argument list be just (root, child_rel)
and always rely on child_rel->top_parent_relids. At the very least,
if we keep the extra arguments, we should document them as being just
optimizations.

regards, tom lane

#13Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#12)
Re: v12.0: ERROR: could not find pathkey item to sort

Thanks for taking a look and sorry about the delay in replying.

On Fri, Oct 25, 2019 at 1:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

On Mon, Oct 14, 2019 at 11:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In view of the proposed patches being dependent on some other
13-only changes, I wonder if we should fix v12 by reverting
d25ea0127. The potential planner performance loss for large
partition sets could be nasty, but failing to plan at all is worse.

Actually, the patch I proposed to fix equivalence code can be applied
on its own. The example I gave on that thread needs us to fix
partitionwise code to even work, which is perhaps a 13-only change,
but we have an example here that is broken due to d25ea01275.
Perhaps, considering applying my patch seems better than alternatives
which are either reverting d25ea01275 or avoiding doing partitionwise
join for such queries.

I looked at this a bit, and I see that the core idea is to generate
the missing EC members by applying add_child_rel_equivalences when
building child join rels. Perhaps we can make that work, but this
patch fails to, because you've ignored the comment pointing out
that the nullable_relids fixup logic only works for baserels:

* And likewise for nullable_relids. Note this code assumes
* parent and child relids are singletons.

We could improve that to work for joinrels, I think, but it would become
very much more complicated (and slower). AFAICS the logic would have
to be "iterate through top_parent_relids, see which ones are in
em_nullable_relids, and for each one that is, find the corresponding
child_relid and substitute that in new_nullable_relids". This is a
bit of a problem because we don't have any really convenient way to
map individual top parent relids to child relids.

Actually, there is adjust_child_relids_multilevel() which translates
the top parent relids in em_nullable_relids to child relids.

I have updated the patches that way.

I wonder if we
should extend AppendRelInfo to include the top parent relid as well as
the immediate parent. (Perhaps, while we're at it, we could make
adjust_appendrel_attrs_multilevel less of an inefficient and
underdocumented mess.)

Hmm, I agree we should try to fix that situation somehow. Even better
if we could do away with child expressions in ECs, because they cause
EC related code to show up in profiles when executing complex queries
with thousands of partitions.

Also, I'm pretty sure this addition is wrong/broken:

+
+                /*
+                 * There aren't going to be more expressions to translate in
+                 * the same EC.
+                 */
+                break;

What makes you think that an EC can only contain one entry per rel?

I was wrong about that. Fixed.

More generally, as long as this patch requires changing
add_child_rel_equivalences' API anyway, I wonder if we should
rethink that altogether. Looking at the code now, I realize that
d25ea01275 resulted in horribly bastardizing that function's API,
because the parent_rel and appinfo arguments are only consulted in
some cases, while in other cases we disregard them and rely on
child_rel->top_parent_relids to figure out how to translate stuff.
It would be possible to make the argument list be just (root, child_rel)
and always rely on child_rel->top_parent_relids.

Actually, as of 3373c71553, add_child_rel_equivalences() looks at
parent_rel->eclass_indexes to look up ECs, so maybe we can't take out
parent_rel.

At the very least,
if we keep the extra arguments, we should document them as being just
optimizations.

For common cases that doesn't involve multi-level partitioning, it
really helps to have the appinfos be supplied by the caller because
they're already available. I've added a comment at the top about
that.

Attached updated patches.

Thanks,
Amit

Attachments:

d25ea01275-fixup-HEAD-v2.patchapplication/octet-stream; name=d25ea01275-fixup-HEAD-v2.patchDownload+190-47
d25ea01275-fixup-PG12-v2.patchapplication/octet-stream; name=d25ea01275-fixup-PG12-v2.patchDownload+163-28
#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#13)
Re: v12.0: ERROR: could not find pathkey item to sort

Amit Langote <amitlangote09@gmail.com> writes:

Attached updated patches.

[ looks at that... ] I seriously, seriously dislike what you did
in build_join_rel, ie adding the new joinrel to the global data
structures before it's fully filled in. That's inevitably going
to bite us on the ass someday, and you couldn't even be bothered
with a comment.

Worse, the reason you did that seems to be so that
generate_join_implied_equalities can find and modify the joinrel,
which is an undocumented and not very well thought out side-effect.
There are other call sites for that where the joinrel may or may not
already exist, meaning that it might or might not add more members into
the joinrel's eclass_indexes. At best that's wasted work, and at
worst it costs efficiency, since we could in principle get different
sets of common relids depending on which join input pair we're
considering. They're equally valid sets, but unioning them is
just going to add more relids than we need.

Also, the existing logic around eclass_indexes is that it's only
set for baserels and we know it is valid after we've finished
EC merging. I don't much like modifying add_child_rel_equivalences
to have some different opinions about that for joinrels.

It'd probably be better to just eat the cost of doing
get_common_eclass_indexes over again when it's time to do
add_child_rel_equivalences for a joinrel, since we aren't (I hope)
going to do that more than once per joinrel anyway. This would
probably require refactoring things so that there are separate
entry points to add child equivalences for base rels and join rels.
But that seems cleaner anyway than what you've got here.

David --- much of the complexity here comes from the addition of
the eclass_indexes infrastructure, so do you have any thoughts?

regards, tom lane

#15David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#14)
Re: v12.0: ERROR: could not find pathkey item to sort

On Thu, 31 Oct 2019 at 05:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David --- much of the complexity here comes from the addition of
the eclass_indexes infrastructure, so do you have any thoughts?

Hindsight makes me think I should have mentioned in the comment for
eclass_indexes that it's only used for simple rels and remains NULL
for anything else.

All the code in equivclass.c either uses get_common_eclass_indexes()
and get_eclass_indexes_for_relids() which go down to the simple rel
level to obtain their eclass_indexes. When calling
get_eclass_indexes_for_relids() we'll build a union Bitmapset with the
indexes from each simple rel that the join rel is made from. We only
ever directly use the eclass_indexes field when we're certain we're
dealing with a simple rel already. get_eclass_indexes_for_relids()
would do the same job, but using the field directly saves a bit of
needless effort and memory allocations. So, in short, I don't really
see why we need to set eclass_indexes for anything other than simple
rels.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#16Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#14)
Re: v12.0: ERROR: could not find pathkey item to sort

Thanks for checking.

On Thu, Oct 31, 2019 at 1:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, the existing logic around eclass_indexes is that it's only
set for baserels and we know it is valid after we've finished
EC merging. I don't much like modifying add_child_rel_equivalences
to have some different opinions about that for joinrels.

It'd probably be better to just eat the cost of doing
get_common_eclass_indexes over again when it's time to do
add_child_rel_equivalences for a joinrel, since we aren't (I hope)
going to do that more than once per joinrel anyway.

If you mean once per child joinrel, then yes. Implemented this
approach in the attached updated patch.

ISTM, get_common_eclass_indexes() returns the same value for all the
child joinrels (or really for its outer and inner component rels) as
it does for the parent joinrel. So, it might be better to set
eclass_indexes in parent joinrel once and use the same value for all
its descendant joinrels. Although, we'd need to export
get_common_eclass_indexes() out of equivclass.c to call it from
build_join_rel() such that it doesn't require messing with where the
joinrel is added to the global data structure. Maybe that complicates
eclass_indexes infrastructure though.

This would
probably require refactoring things so that there are separate
entry points to add child equivalences for base rels and join rels.
But that seems cleaner anyway than what you've got here.

Separate entry points sounds better, but only in HEAD? Should we have
separate entry points in PG 12 too?

Attached updated patch only for HEAD.

Thanks,
Amit

Attachments:

d25ea01275-fixup-HEAD-v3.patchapplication/octet-stream; name=d25ea01275-fixup-HEAD-v3.patchDownload+233-2
#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#16)
Re: v12.0: ERROR: could not find pathkey item to sort

Amit Langote <amitlangote09@gmail.com> writes:

This would
probably require refactoring things so that there are separate
entry points to add child equivalences for base rels and join rels.
But that seems cleaner anyway than what you've got here.

Separate entry points sounds better, but only in HEAD?

I had actually had in mind that we might have two wrappers around a
common search-and-replace routine, but after studying the code I see that
there's just enough differences to make it probably not worth the trouble
to do it like that. I did spend a bit of time removing cosmetic
differences between the two versions, though, mostly by creating
common local variables.

I think the way you did the matching_ecs computation is actually wrong:
we need to find ECs that reference any rel of the join, not only those
that reference both inputs. If nothing else, the way you have it here
makes the results dependent on which pair of input rels gets considered
first, and that's certainly bad for multiway joins.

Also, I thought we should try to put more conditions on whether we
invoke add_child_join_rel_equivalences at all. In the attached I did

if ((enable_partitionwise_join || enable_partitionwise_aggregate) &&
(joinrel->has_eclass_joins ||
has_useful_pathkeys(root, parent_joinrel)))

but I wonder if we could do more, like checking to see if the parent
joinrel is partitioned. Alternatively, maybe it's unnecessary because
we won't try to build child joinrels unless these conditions are true?

I did not like the test case either. Creating a whole new (and rather
large) test table just for this one case is unreasonably expensive ---
it about doubles the runtime of the equivclass test for me. There's
already a perfectly good test table in partition_join.sql, which seems
like a more natural home for this anyhow. After a bit of finagling
I was able to adapt the test query to fail on that table.

Patch v4 attached. I've not looked at what we need to do to make this
work in v12.

regards, tom lane

Attachments:

d25ea01275-fixup-HEAD-v4.patchtext/x-diff; charset=us-ascii; name=d25ea01275-fixup-HEAD-v4.patchDownload+261-13
#18Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#17)
Re: v12.0: ERROR: could not find pathkey item to sort

On Sun, Nov 3, 2019 at 4:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

This would
probably require refactoring things so that there are separate
entry points to add child equivalences for base rels and join rels.
But that seems cleaner anyway than what you've got here.

Separate entry points sounds better, but only in HEAD?

I had actually had in mind that we might have two wrappers around a
common search-and-replace routine, but after studying the code I see that
there's just enough differences to make it probably not worth the trouble
to do it like that. I did spend a bit of time removing cosmetic
differences between the two versions, though, mostly by creating
common local variables.

Agree that having two totally separate routines is better.

I think the way you did the matching_ecs computation is actually wrong:
we need to find ECs that reference any rel of the join, not only those
that reference both inputs. If nothing else, the way you have it here
makes the results dependent on which pair of input rels gets considered
first, and that's certainly bad for multiway joins.

I'm not sure I fully understand the problems, but maybe you're right.

Also, I thought we should try to put more conditions on whether we
invoke add_child_join_rel_equivalences at all. In the attached I did

if ((enable_partitionwise_join || enable_partitionwise_aggregate) &&
(joinrel->has_eclass_joins ||
has_useful_pathkeys(root, parent_joinrel)))

but I wonder if we could do more, like checking to see if the parent
joinrel is partitioned. Alternatively, maybe it's unnecessary because
we won't try to build child joinrels unless these conditions are true?

Actually, I think we can assert in add_child_rel_equivalences() that
enable_partitionwise_join is true. Then checking
enable_partitionwise_aggregate is unnecessary.

I did not like the test case either. Creating a whole new (and rather
large) test table just for this one case is unreasonably expensive ---
it about doubles the runtime of the equivclass test for me. There's
already a perfectly good test table in partition_join.sql, which seems
like a more natural home for this anyhow. After a bit of finagling
I was able to adapt the test query to fail on that table.

That's great. I tried but I can only finagle so much when it comes to
twisting around plan shapes to what I need. :)

Patch v4 attached. I've not looked at what we need to do to make this
work in v12.

Thanks a lot for the revised patch.

Maybe the only difference between HEAD and v12 is that the former has
eclass_indexes infrastructure, whereas the latter doesn't? I have
attached a version of your patch adapted for v12.

Also, looking at this in the patched code:

+ /*
+ * We may ignore expressions that reference a single baserel,
+ * because add_child_rel_equivalences should have handled them.
+ */
+ if (bms_membership(cur_em->em_relids) != BMS_MULTIPLE)
+ continue;

I have been thinking maybe add_child_rel_equivalences() doesn't need
to translate EC members that reference multiple appendrels, because
there top_parent_relids is always a singleton set, whereas em_relids
of such expressions is not? Those half-translated expressions are
useless, only adding to the overhead of scanning ec_members. I'm
thinking that we should apply this diff:

diff --git a/src/backend/optimizer/path/equivclass.c
b/src/backend/optimizer/path/equivclass.c
index e8e9e9a314..d4d80c8101 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2169,7 +2169,7 @@ add_child_rel_equivalences(PlannerInfo *root,
                 continue;       /* ignore children here */
             /* Does this member reference child's topmost parent rel? */
-            if (bms_overlap(cur_em->em_relids, top_parent_relids))
+            if (bms_is_subset(cur_em->em_relids, top_parent_relids))
             {
                 /* Yes, generate transformed child version */
                 Expr       *child_expr;

Thoughts?

Thanks,
Amit

Attachments:

d25ea01275-fixup-PG12-v4.patchapplication/octet-stream; name=d25ea01275-fixup-PG12-v4.patchDownload+261-13
#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#18)
Re: v12.0: ERROR: could not find pathkey item to sort

Amit Langote <amitlangote09@gmail.com> writes:

On Sun, Nov 3, 2019 at 4:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, I thought we should try to put more conditions on whether we
invoke add_child_join_rel_equivalences at all. In the attached I did
if ((enable_partitionwise_join || enable_partitionwise_aggregate) &&
(joinrel->has_eclass_joins ||
has_useful_pathkeys(root, parent_joinrel)))
but I wonder if we could do more, like checking to see if the parent
joinrel is partitioned. Alternatively, maybe it's unnecessary because
we won't try to build child joinrels unless these conditions are true?

Actually, I think we can assert in add_child_rel_equivalences() that
enable_partitionwise_join is true. Then checking
enable_partitionwise_aggregate is unnecessary.

After tracing the call sites back a bit further, I agree that we won't
be here in the first place unless enable_partitionwise_join is true,
so the extra tests I proposed are unnecessary. I took them out again.

I have been thinking maybe add_child_rel_equivalences() doesn't need
to translate EC members that reference multiple appendrels, because
there top_parent_relids is always a singleton set, whereas em_relids
of such expressions is not?  Those half-translated expressions are
useless, only adding to the overhead of scanning ec_members.  I'm
thinking that we should apply this diff:
-            if (bms_overlap(cur_em->em_relids, top_parent_relids))
+            if (bms_is_subset(cur_em->em_relids, top_parent_relids))

Meh, I'm not really convinced. The case where this would be relevant
is an EC generated from something like "WHERE (a.x + b.y) = c.z"
where "a" is partitioned. It's possible that we'd never have a use
for a sort key corresponding to "a_child.x + b.y", but I think that's
not obvious, and probably short-sighted. Anyway such EC members are
pretty rare in the first place, so we're not going to win much
performance by trying to optimize them.

Anyway, I've pushed the fix for Justin's problem to v12 and HEAD.
The problem with poor planning of multiway joins that you mentioned
in the other thread remains open, but I imagine the patches you
posted there are going to need rebasing over this commit, so I
set that CF entry to Waiting On Author.

regards, tom lane

#20Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#19)
Re: v12.0: ERROR: could not find pathkey item to sort

On Wed, Nov 6, 2019 at 1:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

I have been thinking maybe add_child_rel_equivalences() doesn't need
to translate EC members that reference multiple appendrels, because
there top_parent_relids is always a singleton set, whereas em_relids
of such expressions is not?  Those half-translated expressions are
useless, only adding to the overhead of scanning ec_members.  I'm
thinking that we should apply this diff:
-            if (bms_overlap(cur_em->em_relids, top_parent_relids))
+            if (bms_is_subset(cur_em->em_relids, top_parent_relids))

Meh, I'm not really convinced. The case where this would be relevant
is an EC generated from something like "WHERE (a.x + b.y) = c.z"
where "a" is partitioned. It's possible that we'd never have a use
for a sort key corresponding to "a_child.x + b.y", but I think that's
not obvious, and probably short-sighted. Anyway such EC members are
pretty rare in the first place, so we're not going to win much
performance by trying to optimize them.

OK.

Anyway, I've pushed the fix for Justin's problem to v12 and HEAD.
The problem with poor planning of multiway joins that you mentioned
in the other thread remains open, but I imagine the patches you
posted there are going to need rebasing over this commit, so I
set that CF entry to Waiting On Author.

Thank you. I will send rebased patches on that thread.

Regards,
Amit