Sort keys are omitted incorrectly after grouping sets

Started by Richard Guoover 4 years ago2 messages
#1Richard Guo
Richard Guo
guofenglinux@gmail.com

Hi hackers,

In each PathKey we have an EC representing the value being sorted on.
This works correctly and efficiently in scan/join planning since the
members in an EC are always equal to each other.

However, after applying GROUP BY grouping sets, the members in an EC may
not be that equivalent any more, because NULLS may be emitted by
grouping sets. As a result, we may lose some sort keys incorrectly.

# explain (costs off) select a, b from foo where a = b group by cube(a, b)
order by a, b;
QUERY PLAN
-------------------------------
Sort
Sort Key: a
-> MixedAggregate
Hash Key: a, b
Hash Key: a
Hash Key: b
Group Key: ()
-> Seq Scan on foo
Filter: (a = b)
(9 rows)

I believe we should not ignore sort key 'b' in the query above.

Is this a problem we should be worried about?

Thanks
Richard

#2Greg Stark
Greg Stark
stark@mit.edu
In reply to: Richard Guo (#1)
Re: Sort keys are omitted incorrectly after grouping sets

On Tue, 3 Aug 2021 at 00:04, Richard Guo <guofenglinux@gmail.com> wrote:

Is this a problem we should be worried about?

It's easy to see this produce output in the wrong order:

postgres=# select a, b from (values (1,1),(2,2)) as foo(a,b) where a =
b group by cube(a, b) order by a, b nulls first;
a | b
---+---
1 |
1 | 1
2 | 2
2 |
|
| 1
| 2
(7 rows)

postgres=# select a, b from (values (1,1),(2,2)) as foo(a,b) where a =
b group by cube(a, b) order by a, b nulls last;
a | b
---+---
1 |
1 | 1
2 | 2
2 |
|
| 1
| 2
(7 rows)

I know we had a hack to deal with outer joins "placeholder vars" or
something like that. I imagine the same thing needs to happen here.

Incidentally, the same thing doesn't happen for a VALUES clause with a
single row value. There it seems we inline the row value and the plan
ends up ordering on both -- though it's hard to tell because the way
the explain plan is formatted makes it hard to see what's going on:

postgres=# select a, b from (values (1,1)) as foo(a,b) where a = b
group by cube(a, b) order by a, b nulls first;
a | b
---+---
1 |
1 | 1
|
| 1
(4 rows)

postgres=# explain select a, b from (values (1,1)) as foo(a,b) where a
= b group by cube(a, b) order by a, b nulls first;
QUERY PLAN
--------------------------------------------------------
Sort (cost=0.10..0.11 rows=4 width=8)
Sort Key: (1), (1) NULLS FIRST
-> MixedAggregate (cost=0.00..0.06 rows=4 width=8)
Hash Key: 1, 1
Hash Key: 1
Hash Key: 1
Group Key: ()
-> Result (cost=0.00..0.01 rows=1 width=8)
One-Time Filter: (1 = 1)
(9 rows)

With two rows we're clearly not inlining it and clearly ordering on
only the first column:

postgres=# explain select a, b from (values (1,1),(2,2)) as foo(a,b)
where a = b group by cube(a, b) order by a, b nulls first;
QUERY PLAN
-------------------------------------------------------------------------
Sort (cost=0.12..0.13 rows=4 width=8)
Sort Key: "*VALUES*".column1
-> MixedAggregate (cost=0.00..0.08 rows=4 width=8)
Hash Key: "*VALUES*".column1, "*VALUES*".column2
Hash Key: "*VALUES*".column1
Hash Key: "*VALUES*".column2
Group Key: ()
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=1 width=8)
Filter: (column1 = column2)
(9 rows)

--
greg