BUG #15795: ERROR: could not find pathkey item to sort

Started by PG Bug reporting formalmost 7 years ago7 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15795
Logged by: Suresh Kumar R
Email address: suresh.arsenal29@gmail.com
PostgreSQL version: 10.3
Operating system: Ubuntu 16.04
Description:

I have a table 'person' with properties _id, _actual_type_name and name when
I tried the below query, I got an error saying 'could not find pathkey item
to sort'.

Here is query:
SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0
FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as
_actual_type_name0, name as name0 FROM hello_world.person ) union all (
SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name
as name0 FROM hello_world.person)) as A WHERE ( A.name0 = A.name0 );

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #15795: ERROR: could not find pathkey item to sort

On Tuesday, May 7, 2019, PG Bug reporting form <noreply@postgresql.org>
wrote:

The following bug has been logged on the website:

Bug reference: 15795
Logged by: Suresh Kumar R
Email address: suresh.arsenal29@gmail.com
PostgreSQL version: 10.3
Operating system: Ubuntu 16.04
Description:

I have a table 'person' with properties _id, _actual_type_name and name
when
I tried the below query, I got an error saying 'could not find pathkey item
to sort'.

Upgrade to the latest release (10.7)

David J.

#3Suresh Kumar R
suresh.arsenal29@gmail.com
In reply to: David G. Johnston (#2)
Re: BUG #15795: ERROR: could not find pathkey item to sort

I tried in the latest 10.7 release too. I still get the same error.

Suresh.

On Wed, May 8, 2019 at 12:40 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

Show quoted text

On Tuesday, May 7, 2019, PG Bug reporting form <noreply@postgresql.org>
wrote:

The following bug has been logged on the website:

Bug reference: 15795
Logged by: Suresh Kumar R
Email address: suresh.arsenal29@gmail.com
PostgreSQL version: 10.3
Operating system: Ubuntu 16.04
Description:

I have a table 'person' with properties _id, _actual_type_name and name
when
I tried the below query, I got an error saying 'could not find pathkey
item
to sort'.

Upgrade to the latest release (10.7)

David J.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #15795: ERROR: could not find pathkey item to sort

PG Bug reporting form <noreply@postgresql.org> writes:

Here is query:
SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0
FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as
_actual_type_name0, name as name0 FROM hello_world.person ) union all (
SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name
as name0 FROM hello_world.person)) as A WHERE ( A.name0 = A.name0 );

It's politer to provide a self-contained test case, rather than expect us
to reverse-engineer details that might be critical.

For the archives, though, this isn't hard to reproduce:

regression=# create table person(_id int, _actual_type_name text, name text);
CREATE TABLE
regression=# SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0
FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as
_actual_type_name0, name as name0 FROM person ) union all (
SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name
as name0 FROM person)) as A WHERE ( A.name0 = A.name0 );
ERROR: could not find pathkey item to sort

Curiously, this only fails for me in 9.6 and 10, not earlier or later
branches.

regards, tom lane

#5Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#4)
Re: BUG #15795: ERROR: could not find pathkey item to sort

On 2019/05/08 19:33, Tom Lane wrote:

For the archives, though, this isn't hard to reproduce:

regression=# create table person(_id int, _actual_type_name text, name text);
CREATE TABLE
regression=# SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0
FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as
_actual_type_name0, name as name0 FROM person ) union all (
SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name
as name0 FROM person)) as A WHERE ( A.name0 = A.name0 );
ERROR: could not find pathkey item to sort

Curiously, this only fails for me in 9.6 and 10, not earlier or later
branches.

Here are my findings after debugging this on 9.5, 9.6, 10, and 11:

As you also said, the problem only seems to occur in 9.6 and 10. It does
because the MergeAppend path created for UNION ALL subquery in the outer
query's FROM contains one pathkey item too many in its pathkeys field.
That's because each of its child subpaths has pathkeys (_id,
_actual_type_name, name), whereas the outer query only wants (_id,
_actual_type_name). I hoped that convert_subquery_pathkeys() called
during child subquery's path creation would've taken the unnecessary
"name" out because the outer query doesn't need it, but it didn't. It
fails to do so because the subquery's pathkey "name" is being matched
successfully to an outer EC installed due to the outer query's WHERE
(A.name0 = A.name0); that's via the following code in
convert_subquery_pathkeys():

outer_ec =
get_eclass_for_sort_expr(root,
outer_expr,

I checked if and how convert_subquery_pathkeys() determines which of the
subquery's pathkeys are useful to the outer query and there does exist a
scoring system to evaluate the usefulness of subquery's pathkeys to outer
query, but I didn't fully understand it. In any case, the main problem
seems to be that convert_subquery_pathkeys() can't keep "name" from
appearing in the output pathkeys that it produces. Based on that premise,
I added the following code to convert_subquery_pathkeys():

+
+		if (retvallen == outer_query_keys)
+			 break;

which seems to fix the issue. Alternatively, maybe we can apply
truncate_useless_pathkeys() to the result of convert_subquery_pathkeys().

The problem doesn't manifest with 9.5 or 11 (even HEAD for that matter),
because a sort-based path is not chosen in their case for unique-fying
(UNION ALL uses Append, not MergeAppend on cost grounds), so there's no
attempt to look for "pathkey item to sort" to begin with.

In 11's (and HEAD's) case, even if MergeAppend had won in terms of
costing, the problem wouldn't occur at least for this query, because
"name" doesn't appear in the outer query's ECs due to the following commit:

commit 8ec5429e2f422f4d570d4909507db0d4ca83bbac
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun Oct 8 12:23:32 2017 -0400

Reduce "X = X" to "X IS NOT NULL", if it's easy to do so.

However it's easy to tweak the query such that "name" *will* end up in
outer query's ECs as follows:

SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0
FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as
_actual_type_name0, name as name0 FROM person ) union all (
SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name
as name0 FROM person)) as A INNER JOIN person A1 ON ( A.name0 = A1.name );

Now, A.name0 and A1.name in the join condition do form a an EC, which does
trick convert_subquery_pathkeys() into accepting the child subqueries'
"name" key.

IOW, if the "fix" I mentioned above is correct, it will have to applied to
all the branches, because this seems to be a fundamental problem with
convert_subquery_pathkeys().

Thoughts?

Thanks,
Amit

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#5)
Re: BUG #15795: ERROR: could not find pathkey item to sort

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

On 2019/05/08 19:33, Tom Lane wrote:

For the archives, though, this isn't hard to reproduce:

regression=# create table person(_id int, _actual_type_name text, name text);
CREATE TABLE
regression=# SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0
FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as
_actual_type_name0, name as name0 FROM person ) union all (
SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name
as name0 FROM person)) as A WHERE ( A.name0 = A.name0 );
ERROR: could not find pathkey item to sort

Curiously, this only fails for me in 9.6 and 10, not earlier or later
branches.

... In any case, the main problem
seems to be that convert_subquery_pathkeys() can't keep "name" from
appearing in the output pathkeys that it produces. Based on that premise,
I added the following code to convert_subquery_pathkeys():

+
+		if (retvallen == outer_query_keys)
+			 break;

That's just a kluge.

which seems to fix the issue. Alternatively, maybe we can apply
truncate_useless_pathkeys() to the result of convert_subquery_pathkeys().

Yeah, I thought about that too. The comment on convert_subquery_pathkeys
that says truncate_useless_pathkeys would do nothing is wrong. However,
if you apply truncate_useless_pathkeys to the results, you'll find that
some of the regression test query plans change, and not for the better.
I'm thinking of changing that comment to read like

* We intentionally don't do truncate_useless_pathkeys() here, because there
* are situations where seeing the raw ordering of the subquery is helpful.
* For example, if it returns ORDER BY x DESC, that may prompt us to
* construct a mergejoin using DESC order rather than ASC order; but the
* right_merge_direction heuristic would have us throw the knowledge away.

Anyway, yes, the basic issue is that convert_subquery_pathkeys is
returning pathkeys that are outside the norm for the outer query,
and some places are falling over because of that. I've found at
least two bugs that I believe are present in multiple branches,
although I'm having difficulty in constructing branch-independent
tests for them, because the bugs interact with each other and with
other changes that we've made.

First off, the reported problem can be reproduced with an existing
regression-test table, eg

regression=# select distinct q1 from (select distinct * from int8_tbl union all select distinct * from int8_tbl) ss where (q2 = q2);
ERROR: could not find pathkey item to sort

The reason that this doesn't fail in >= v11 is that we do not build
an EquivalenceClass for q2 in the outer query, so that there's nothing
for convert_subquery_pathkeys to match it to. And that happens because
the q2 = q2 clause is converted to "q2 is not null". In <= v10, that
clause is left alone and since it's a mergejoinable operator we end
up assigning ECs to each side. (Conceivably, since it's not actually
a join clause, we don't need to do that, but I'm hesitant to muck with
that; it would only be a band-aid over the true problem anyhow.)

Anyway, what's happening in v10 is

1. convert_subquery_pathkeys sees that the subpath is sorted by q1, q2,
and it successfully generates a 2-element pathkeys list representing
that in the outer query, which it can do because (a) q2 is in the
subpath tlist and (b) there is an EC in the outer query for q2.
The other UNION ALL arm has an identical path, too.

2. When we come to build a MergeAppend path for the UNION ALL, we
label it with the common pathkeys list of the two inputs, i.e.
q1, q2, although the outer query really only cares about sort-by-q1.

3. When we come to build a plan for the MergeAppend, we need to
locate the sort columns in the outputs of its child SubqueryScan
plans ... and q2 is not there. Ooops.

Now, how is that happening given that convert_subquery_pathkeys
is specifically checking that the column is emitted by the subquery?
The reason is that *it's checking the wrong thing*. It's looking
at what the native output of the subquery is, not at what the
SubqueryScan that we're going to stack atop it will produce.
And in this case, because q2 is not required by the outer query
(once we've pushed the WHERE clause down to or below the subquery
scan), q2 is not in the reltarget list for the subquery RTE, so
it's not going to be emitted by the SubqueryScan.

I haven't decided what to do about this. The minimally invasive
fix would be to teach convert_subquery_pathkeys that it can't emit
anything not listed in rel->reltarget. That seems like it might
lose useful information, though perhaps the consequences are minimal
given how hard it is to hit this bug. Better would be to add
entries to the reltarget for useful sort columns --- but I think
it's likely too late to do so here. (We may have already generated
some paths using the existing reltarget contents.)

Anyway, we're not out of the woods by any means. Wondering whether
the issue couldn't be reproduced in >= v11 by doing something that
wouldn't be reduced to an IS NOT NULL, I stumbled across this in HEAD:

regression=# select distinct q1 from (select distinct * from int8_tbl union all select distinct * from int8_tbl) ss where (-q1 = q2);
psql: server closed the connection unexpectedly

That's hitting this assertion:

TRAP: FailedAssertion("!(list_length(dest_tlist) == list_length(src_tlist))", File: "tlist.c", Line: 345)

Investigation shows that in this case, we successfully generate
a Plan, but the final apply_tlist_labeling step fails, because
create_merge_append_plan has blithely ignored the CP_EXACT_TLIST
flag and stuck on extra resjunk columns. That seems like a pretty
clear violation of createplan.c's internal expectations, so I made
a patch that teaches that function to stick on a filtering Result
if it added extra columns in violation of a flag demanding that it
not do so. As of HEAD, create_append_plan has similar logic so I
made the same change there, although I have not found a way to reach
that bug today. (See patch attached.)

With that patch, we successfully build a plan, but it looks a bit odd:

Unique
-> Result
-> Merge Append
Sort Key: "*SELECT* 1".q1, ((- "*SELECT* 1".q1))
-> Subquery Scan on "*SELECT* 1"
-> Unique
-> Sort
Sort Key: i81.q1, i81.q2
-> Seq Scan on int8_tbl i81
Filter: ((- q1) = q2)
-> Subquery Scan on "*SELECT* 2"
-> Unique
-> Sort
Sort Key: i82.q1, i82.q2
-> Seq Scan on int8_tbl i82
Filter: ((- q1) = q2)

What's happening there is that the outer query has an EC containing
{ -q1, q2 }, and convert_subquery_pathkeys successfully matches the q2
pathkey to that EC. Then, when prepare_sort_from_pathkeys tries to
find a sort column for that EC, it still doesn't find q2 in the
SubqueryScan output --- but it does find q1, so it can make an
expression matching the other EC entry instead of failing.

This is kind of grotty, of course. It'd be nicer if the outer
MergeAppend only considered as many sort columns as we actually
semantically need. But I'm not sure of a good way to arrange that.

Another thing that I've not quite got to the bottom of is how come the
MergeAppend path is getting stuck with CP_EXACT_TLIST responsibility in
the first place. Generally, since we know MergeAppend can't project,
there'd be a ProjectionPath on top of it. Usually there is, which is
how come this bug has escaped detection this long --- so maybe there's
another bug that we really ought to be using a ProjectionPath but are
not, in some particular circumstance? Not that I think it would be okay
for create_merge_append_plan to just ignore the flag.

Anyway, because there are so many different things that can mask these
bugs, getting good test cases for them might be hopeless. In the
attached I have

select distinct q1 from
(select distinct * from int8_tbl i81
union all
select distinct * from int8_tbl i82) ss
where q2 = q2;

which exposes the reltarget-vs-subplan-tlist issue, but only in 9.6
and 10, though it surely exists in other branches including HEAD, and

select distinct q1 from
(select distinct * from int8_tbl i81
union all
select distinct * from int8_tbl i82) ss
where -q1 = q2;

which exposes create_merge_append_plan's misfeasance, but only in
11 and HEAD, though it surely exists at least as far back as v10.
Getting more robust test cases would be nice -- any ideas?

regards, tom lane

Attachments:

handle-exact-tlist-for-append-plans-1.patchtext/x-diff; charset=us-ascii; name=handle-exact-tlist-for-append-plans-1.patchDownload+175-67
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: BUG #15795: ERROR: could not find pathkey item to sort

I wrote:

Now, how is that happening given that convert_subquery_pathkeys
is specifically checking that the column is emitted by the subquery?
The reason is that *it's checking the wrong thing*. It's looking
at what the native output of the subquery is, not at what the
SubqueryScan that we're going to stack atop it will produce.
...
I haven't decided what to do about this. The minimally invasive
fix would be to teach convert_subquery_pathkeys that it can't emit
anything not listed in rel->reltarget. That seems like it might
lose useful information, though perhaps the consequences are minimal
given how hard it is to hit this bug.

I concluded that that should be a reasonable fix. The outer query
doesn't really have any use for sort keys that are not relevant to
either mergejoins or sorting/grouping, and in either of those cases
the sort keys would have been seen to be needed above the scan level
so they'd be included in the reltarget list.

Hence, attached is a draft patch for that part against v10. (I started
there since we don't have a test case to show this bug in HEAD.)
I realized that the existing tests for !resjunk are just a half-baked
version of "is the column available in the outer query", so I folded
those into the improved checks.

Notice that with this in place, we don't get the funny extra
sort key in the MergeAppend, since convert_subquery_pathkeys doesn't
try to create that pathkey. That's good for plan quality I guess
but it means that we have no live test case for the bug in
create_merge_append_plan --- which is still a bug nonetheless.

Anyway, I plan to go ahead with applying the combination of these
fixes. If we find better test cases we can add them later, but
right now I'm not that hopeful about that.

regards, tom lane

Attachments:

make-convert_subquery_pathkeys-more-cautious-v10.patchtext/x-diff; charset=us-ascii; name=make-convert_subquery_pathkeys-more-cautious-v10.patchDownload+153-20