Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

Started by Ranier Vilelaover 5 years ago11 messages

ranier.vf@gmail.com

over 5 years ago

1 attachment(s)

Hi,

In case gd->any_hashable is FALSE, grouping_is_hashable is never called.
In this case, the planner could use HASH for groupings, but will never know.

Apparently gd pointer, will never be NULL there, verified with Assert(gd !=
NULL).

regards,
Ranier Vilela

Attachments:

v1-0001-check_grouping_is_hashable.patchapplication/octet-stream; name=v1-0001-check_grouping_is_hashable.patchDownload

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8007e205ed..629425544e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3867,7 +3867,7 @@ create_grouping_paths(PlannerInfo *root,
 		 */
 		if ((parse->groupClause != NIL &&
 			 agg_costs->numOrderedAggs == 0 &&
-			 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
+			 (gd && gd->any_hashable || grouping_is_hashable(parse->groupClause))))
 			flags |= GROUPING_CAN_USE_HASH;
 
 		/*

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 5 years ago

In reply to: Ranier Vilela (#1)

Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:

Hi,

In case gd->any_hashable is FALSE, grouping_is_hashable is never called.
In this case, the planner could use HASH for groupings, but will never know.

The condition is pretty simple - if the query has grouping sets, look at
grouping sets, otherwise look at groupClause. For this to be an issue
the query would need to have both grouping sets and (independent) group
clause - which AFAIK is not possible.

For hashing to be worth considering, at least one grouping set has to be
hashable - otherwise it's pointless.

Granted, you could have something like

GROUP BY GROUPING SETS ((a), (b)), c

which I think essentially says "add c to every grouping set" and that
will be covered by the any_hashable check.

Or how would a query triggering this look like?

Apparently gd pointer, will never be NULL there, verified with Assert(gd !=
NULL).

Um, what? If you add the assert right before the if condition, you won't
even be able to do initdb. It's pretty clear it'll crash for any query
without grouping sets.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Ranier Vilela

ranier.vf@gmail.com

over 5 years ago

In reply to: Tomas Vondra (#2)

Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

Em qui., 17 de set. de 2020 às 17:09, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:

Hi,

In case gd->any_hashable is FALSE, grouping_is_hashable is never called.
In this case, the planner could use HASH for groupings, but will never

know.

The condition is pretty simple - if the query has grouping sets, look at
grouping sets, otherwise look at groupClause. For this to be an issue
the query would need to have both grouping sets and (independent) group
clause - which AFAIK is not possible.

Uh?
(parse->groupClause != NIL) If different from NIL we have ((independent)
group clause), grouping_is_hashable should check?
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
If gd is not NIL and gd->any_hashtable is FALSE?

For hashing to be worth considering, at least one grouping set has to be
hashable - otherwise it's pointless.

Granted, you could have something like

GROUP BY GROUPING SETS ((a), (b)), c

which I think essentially says "add c to every grouping set" and that
will be covered by the any_hashable check.

I am not going into the merit of whether or not it is worth hashing.
grouping_is_hashable as a last resort, must decide.

Apparently gd pointer, will never be NULL there, verified with Assert(gd

!=

NULL).

Um, what? If you add the assert right before the if condition, you won't
even be able to do initdb. It's pretty clear it'll crash for any query
without grouping sets.

Here not:
Assert(gd != NULL);
create_ordinary_grouping_paths(root, input_rel, grouped_rel,
agg_costs, gd, &extra,
&partially_grouped_rel);

Otherwise Coverity would be right.
CID 1412604 (#1 of 1): Dereference after null check (FORWARD_NULL)13.
var_deref_model: Passing null pointer gd to create_ordinary_grouping_paths,
which dereferences it. [show details
<https://scan6.coverity.com/eventId=31389494-14&modelId=31389494-0&fileInstanceId=105740687&filePath=%2Fdll%2Fpostgres%2Fsrc%2Fbackend%2Foptimizer%2Fplan%2Fplanner.c&fileStart=4053&fileEnd=4180>]

Which cannot be true, gd is never NULL here.

regards,
Ranier Vilela

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 5 years ago

In reply to: Ranier Vilela (#3)

1 attachment(s)

Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:

Em qui., 17 de set. de 2020 ï¿½s 17:09, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:

Hi,

In case gd->any_hashable is FALSE, grouping_is_hashable is never called.
In this case, the planner could use HASH for groupings, but will never

know.

The condition is pretty simple - if the query has grouping sets, look at
grouping sets, otherwise look at groupClause. For this to be an issue
the query would need to have both grouping sets and (independent) group
clause - which AFAIK is not possible.

Uh?
(parse->groupClause != NIL) If different from NIL we have ((independent)
group clause), grouping_is_hashable should check?
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
If gd is not NIL and gd->any_hashtable is FALSE?

Sorry, I'm not quite sure I understand what this is meant to say :-(

Anyway, (groupClause != NIL) does not mean the groupClause is somehow
independent (of what?). Add some debugging to create_grouping_paths and
you'll see that e.g. this query ends up with groupClause with 3 items:

select 1 from t group by grouping sets ((a), (b), (c));

and so does this one:

select 1 from t group by grouping sets ((a,c), (a,b));

I'm no expert in grouping sets, but I'd bet this means we transform the
grouping sets into a groupClause by extracting the keys. I haven't
investigated why exactly we do this, but I'm sure there's a reason (e.g.
it gives us SortGroupClause).

You seem to believe a query can have both grouping sets and regular
grouping at the same level - but how would such query look like? Surely
you can't have two GROuP BY clauses. You can do

select 1 from t group by a, grouping sets ((b), (c));

which is however mostly equivalent to (AFAICS)

select 1 from t group by grouping sets ((a,b), (a,c))

so it's not like we have an independent groupClause either I think.

The whole point of the original condition is - if there are grouping
sets, check if at least one can be executed using hashing (i.e. all keys
are hashable). Otherwise (without grouping sets) look at the grouping as
a whole.

I don't see how your change improves this - if there are grouping sets,
it's futile to look at the whole groupClause if at least one grouping
set can't be hashed.

But there's a simple way to disprove this - show us a case (table and a
query) where your code correctly allows hashing while the current one
does not.

For hashing to be worth considering, at least one grouping set has to be
hashable - otherwise it's pointless.

Granted, you could have something like

GROUP BY GROUPING SETS ((a), (b)), c

which I think essentially says "add c to every grouping set" and that
will be covered by the any_hashable check.

I am not going into the merit of whether or not it is worth hashing.
grouping_is_hashable as a last resort, must decide.

I don't know what this is supposed to say either. The whole point of
this check is to simply skip construction of hash-based paths in cases
when it's obviously futile (i.e. when some of the keys don't support
hashing). We do this as early as possible, because the whole point is to
save precious CPU cycles during query planning.

Apparently gd pointer, will never be NULL there, verified with Assert(gd

!=

NULL).

Um, what? If you add the assert right before the if condition, you won't
even be able to do initdb. It's pretty clear it'll crash for any query
without grouping sets.

Here not:
Assert(gd != NULL);
create_ordinary_grouping_paths(root, input_rel, grouped_rel,
agg_costs, gd, &extra,
&partially_grouped_rel);

I have no idea where you're adding this assert. But simply adding it to
create_grouping_paths (right before the if condition changed by your
patch) will trigger a failure during initdb. Simply because for queries
without grouping sets (and with regular grouping) we pass gs=NULL.

Try applying the attached patch and do "pg_ctl -D ... init" - you'll get
a failure proving that gd=NULL.

Otherwise Coverity would be right.
CID 1412604 (#1 of 1): Dereference after null check (FORWARD_NULL)13.
var_deref_model: Passing null pointer gd to create_ordinary_grouping_paths,
which dereferences it. [show details
<https://scan6.coverity.com/eventId=31389494-14&modelId=31389494-0&fileInstanceId=105740687&filePath=%2Fdll%2Fpostgres%2Fsrc%2Fbackend%2Foptimizer%2Fplan%2Fplanner.c&fileStart=4053&fileEnd=4180>]

Which cannot be true, gd is never NULL here.

Or maybe coverity simply does not realize the NULL-dereference can't
happen, because it's guarded by other conditions, or something like
that ... As the assert failure demonstrates, we do indeed get NULLs
here, and it does not crash so coverity is missing something.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

grouping-assert.patchtext/plain; charset=us-asciiDownload

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8007e205ed..f20c31b9dc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3845,6 +3845,8 @@ create_grouping_paths(PlannerInfo *root,
 			|| grouping_is_sortable(parse->groupClause))
 			flags |= GROUPING_CAN_USE_SORT;
 
+		Assert(gd != NULL);
+
 		/*
 		 * Determine whether we should consider hash-based implementations of
 		 * grouping.

Ranier Vilela

ranier.vf@gmail.com

over 5 years ago

In reply to: Ranier Vilela (#1)

Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

Em seg., 21 de set. de 2020 às 13:37, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Mon, Sep 21, 2020 at 08:32:35AM -0300, Ranier Vilela wrote:

Em dom., 20 de set. de 2020 às 21:10, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Sun, Sep 20, 2020 at 08:09:56PM -0300, Ranier Vilela wrote:

Em sex., 18 de set. de 2020 às 10:37, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:

Em qui., 17 de set. de 2020 às 17:09, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:

Hi,

In case gd->any_hashable is FALSE, grouping_is_hashable is never

called.

In this case, the planner could use HASH for groupings, but will

never

know.

The condition is pretty simple - if the query has grouping sets,

look at

grouping sets, otherwise look at groupClause. For this to be an

issue

the query would need to have both grouping sets and (independent)

group

clause - which AFAIK is not possible.

Uh?
(parse->groupClause != NIL) If different from NIL we have

((independent)

group clause), grouping_is_hashable should check?
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
If gd is not NIL and gd->any_hashtable is FALSE?

Sorry, I'm not quite sure I understand what this is meant to say :-(

Anyway, (groupClause != NIL) does not mean the groupClause is somehow
independent (of what?). Add some debugging to create_grouping_paths

and

you'll see that e.g. this query ends up with groupClause with 3

items:

select 1 from t group by grouping sets ((a), (b), (c));

and so does this one:

select 1 from t group by grouping sets ((a,c), (a,b));

I'm no expert in grouping sets, but I'd bet this means we transform

the

grouping sets into a groupClause by extracting the keys. I haven't
investigated why exactly we do this, but I'm sure there's a reason

(e.g.

it gives us SortGroupClause).

You seem to believe a query can have both grouping sets and regular
grouping at the same level - but how would such query look like?

Surely

you can't have two GROuP BY clauses. You can do

Translating into code:
gd is grouping sets and
parse->groupClause is regular grouping
and we cannot have both at the same time.

Have you verified the claim that we can't have both at the same time? As
I already explained, we build groupClause from grouping sets. I suggest
you do some debugging using the queries I used as examples, and you'll
see the claim is not true.

I think we already agreed that we cannot have both at the same time.

No, we haven't. I think we've agreed that you can't have both a group by
clause with grouping sets and then another group by clause with "plain"
grouping. But even with grouping sets you'll have groupClause generated
from the grouping sets. See preprocess_grouping_sets.

select 1 from t group by a, grouping sets ((b), (c));

which is however mostly equivalent to (AFAICS)

select 1 from t group by grouping sets ((a,b), (a,c))

so it's not like we have an independent groupClause either I think.

The whole point of the original condition is - if there are grouping
sets, check if at least one can be executed using hashing (i.e. all

keys

are hashable). Otherwise (without grouping sets) look at the

grouping as

a whole.

Or if gd is NULL check parse->groupClause.

Which is what I'm saying.

I don't see how your change improves this - if there are grouping

sets,

it's futile to look at the whole groupClause if at least one grouping
set can't be hashed.

But there's a simple way to disprove this - show us a case (table

and a

query) where your code correctly allows hashing while the current one
does not.

I'm not an expert in grouping either.
The question I have here is whether gd is populated and has gd->
any_hashable as FALSE,
Its mean no use checking parse-> groupClause, it's a waste of time, Ok.

I'm sorry, I don't follow your logic. Those are two separate cases. If
we have grouping sets, we have to check if at least one can be hashed.
If we don't have grouping sets, we have to check groupClause directly.
Why would that be a waste of time is unclear to me.

This is clear to me.
The problem is how it was implemented in create_grouping_paths.

You haven't demonstrated what the problem is, though. Showing us a query
that fails would make it very clear.

For hashing to be worth considering, at least one grouping set has

to be

hashable - otherwise it's pointless.

Granted, you could have something like

GROUP BY GROUPING SETS ((a), (b)), c

which I think essentially says "add c to every grouping set" and

that

will be covered by the any_hashable check.

I am not going into the merit of whether or not it is worth hashing.
grouping_is_hashable as a last resort, must decide.

I don't know what this is supposed to say either. The whole point of
this check is to simply skip construction of hash-based paths in

cases

when it's obviously futile (i.e. when some of the keys don't support
hashing). We do this as early as possible, because the whole point

is to

save precious CPU cycles during query planning.

Apparently gd pointer, will never be NULL there, verified with

Assert(gd

!=

NULL).

Um, what? If you add the assert right before the if condition, you

won't

even be able to do initdb. It's pretty clear it'll crash for any

query

without grouping sets.

Here not:
Assert(gd != NULL);
create_ordinary_grouping_paths(root, input_rel, grouped_rel,
agg_costs, gd, &extra,
&partially_grouped_rel);

I have no idea where you're adding this assert. But simply adding it

to

create_grouping_paths (right before the if condition changed by your
patch) will trigger a failure during initdb. Simply because for

queries

without grouping sets (and with regular grouping) we pass gs=NULL.

Try applying the attached patch and do "pg_ctl -D ... init" - you'll

get

a failure proving that gd=NULL.

Well, I did a test right now, downloaded the latest HEAD and added the
Assertive.
I compiled everything, ran the regression tests, installed the

binaries,

loaded the server and installed a client database.
Everything is working.
Maybe in all these steps, there is no grouping that would trigger the

code

in question, but I doubt it.

For me doing this leads to reliable crashes when pg_regress does initdb
(so before the actual checks):

patch -p1 < ~/grouping-assert.patch
./configure --enable-debug --enable-cassert
make -s clean && make -s -j4 && make check

And the "make check" it immediately crashes like this:

============== creating temporary instance ==============
============== initializing database system ==============

pg_regress: initdb failed
Examine /home/user/work/postgres/src/test/regress/log/initdb.log for
the reason.

on the assert. So yeah, I'd guess there's something wrong with your
build. What does pg_config say?

Default.
I never change anything. I simply clone the last head:
git clone --single-branch https://github.com/postgres/postgres/
and have it compiled.

Compiled how?

build Debug

regards,
Ranier Vilela

Import Notes

Reply to msg id not found: 20200921163659.y336m2zlyal3ly7z@development

Ranier Vilela

ranier.vf@gmail.com

over 5 years ago

In reply to: Ranier Vilela (#1)

Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

Em seg., 21 de set. de 2020 às 14:24, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Sun, Sep 20, 2020 at 01:50:34PM -0300, Ranier Vilela wrote:

Em seg., 21 de set. de 2020 às 13:37, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Mon, Sep 21, 2020 at 08:32:35AM -0300, Ranier Vilela wrote:

Em dom., 20 de set. de 2020 às 21:10, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Sun, Sep 20, 2020 at 08:09:56PM -0300, Ranier Vilela wrote:

Em sex., 18 de set. de 2020 às 10:37, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:

Em qui., 17 de set. de 2020 às 17:09, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:

Hi,

In case gd->any_hashable is FALSE, grouping_is_hashable is

never

called.

In this case, the planner could use HASH for groupings, but

will

never

know.

The condition is pretty simple - if the query has grouping

sets,

look at

grouping sets, otherwise look at groupClause. For this to be an

issue

the query would need to have both grouping sets and

(independent)

group

clause - which AFAIK is not possible.

Uh?
(parse->groupClause != NIL) If different from NIL we have

((independent)

group clause), grouping_is_hashable should check?
(gd ? gd->any_hashable :

grouping_is_hashable(parse->groupClause))))

If gd is not NIL and gd->any_hashtable is FALSE?

Sorry, I'm not quite sure I understand what this is meant to say

:-(

Anyway, (groupClause != NIL) does not mean the groupClause is

somehow

independent (of what?). Add some debugging to

create_grouping_paths

and

you'll see that e.g. this query ends up with groupClause with 3

items:

select 1 from t group by grouping sets ((a), (b), (c));

and so does this one:

select 1 from t group by grouping sets ((a,c), (a,b));

I'm no expert in grouping sets, but I'd bet this means we

transform

the

grouping sets into a groupClause by extracting the keys. I haven't
investigated why exactly we do this, but I'm sure there's a reason

(e.g.

it gives us SortGroupClause).

You seem to believe a query can have both grouping sets and

regular

grouping at the same level - but how would such query look like?

Surely

you can't have two GROuP BY clauses. You can do

Translating into code:
gd is grouping sets and
parse->groupClause is regular grouping
and we cannot have both at the same time.

Have you verified the claim that we can't have both at the same

time? As

I already explained, we build groupClause from grouping sets. I

suggest

you do some debugging using the queries I used as examples, and

you'll

see the claim is not true.

I think we already agreed that we cannot have both at the same time.

No, we haven't. I think we've agreed that you can't have both a group by
clause with grouping sets and then another group by clause with "plain"
grouping. But even with grouping sets you'll have groupClause generated
from the grouping sets. See preprocess_grouping_sets.

select 1 from t group by a, grouping sets ((b), (c));

which is however mostly equivalent to (AFAICS)

select 1 from t group by grouping sets ((a,b), (a,c))

so it's not like we have an independent groupClause either I

think.

The whole point of the original condition is - if there are

grouping

sets, check if at least one can be executed using hashing (i.e.

all

keys

are hashable). Otherwise (without grouping sets) look at the

grouping as

a whole.

Or if gd is NULL check parse->groupClause.

Which is what I'm saying.

I don't see how your change improves this - if there are grouping

sets,

it's futile to look at the whole groupClause if at least one

grouping

set can't be hashed.

But there's a simple way to disprove this - show us a case (table

and a

query) where your code correctly allows hashing while the current

one

does not.

I'm not an expert in grouping either.
The question I have here is whether gd is populated and has gd->
any_hashable as FALSE,
Its mean no use checking parse-> groupClause, it's a waste of time,

Ok.

I'm sorry, I don't follow your logic. Those are two separate cases.

If

we have grouping sets, we have to check if at least one can be

hashed.

If we don't have grouping sets, we have to check groupClause

directly.

Why would that be a waste of time is unclear to me.

This is clear to me.
The problem is how it was implemented in create_grouping_paths.

You haven't demonstrated what the problem is, though. Showing us a query
that fails would make it very clear.

For hashing to be worth considering, at least one grouping set

has

to be

hashable - otherwise it's pointless.

Granted, you could have something like

GROUP BY GROUPING SETS ((a), (b)), c

which I think essentially says "add c to every grouping set"

and

that

will be covered by the any_hashable check.

I am not going into the merit of whether or not it is worth

hashing.

grouping_is_hashable as a last resort, must decide.

I don't know what this is supposed to say either. The whole point

of

this check is to simply skip construction of hash-based paths in

cases

when it's obviously futile (i.e. when some of the keys don't

support

hashing). We do this as early as possible, because the whole point

is to

save precious CPU cycles during query planning.

Apparently gd pointer, will never be NULL there, verified with

Assert(gd

!=

NULL).

Um, what? If you add the assert right before the if condition,

you

won't

even be able to do initdb. It's pretty clear it'll crash for

any

query

without grouping sets.

Here not:
Assert(gd != NULL);
create_ordinary_grouping_paths(root, input_rel, grouped_rel,
agg_costs, gd, &extra,
&partially_grouped_rel);

I have no idea where you're adding this assert. But simply adding

it

to

create_grouping_paths (right before the if condition changed by

your

patch) will trigger a failure during initdb. Simply because for

queries

without grouping sets (and with regular grouping) we pass gs=NULL.

Try applying the attached patch and do "pg_ctl -D ... init" -

you'll

get

a failure proving that gd=NULL.

Well, I did a test right now, downloaded the latest HEAD and added

the

Assertive.
I compiled everything, ran the regression tests, installed the

binaries,

loaded the server and installed a client database.
Everything is working.
Maybe in all these steps, there is no grouping that would trigger

the

code

in question, but I doubt it.

For me doing this leads to reliable crashes when pg_regress does

initdb

(so before the actual checks):

patch -p1 < ~/grouping-assert.patch
./configure --enable-debug --enable-cassert
make -s clean && make -s -j4 && make check

And the "make check" it immediately crashes like this:

============== creating temporary instance

==============

============== initializing database system

==============

pg_regress: initdb failed
Examine /home/user/work/postgres/src/test/regress/log/initdb.log

for

the reason.

on the assert. So yeah, I'd guess there's something wrong with your
build. What does pg_config say?

Default.
I never change anything. I simply clone the last head:
git clone --single-branch https://github.com/postgres/postgres/
and have it compiled.

Compiled how?

build Debug

What does that mean? Wouldn't it be simpler to just show pg_config
output, which shows the exact flags used for configure / compile?

I'm sorry.

BINDIR = C:/dll/postgres/Debug/PG_CON~1
DOCDIR = /doc
HTMLDIR = /doc
INCLUDEDIR = /include
PKGINCLUDEDIR = /include
INCLUDEDIR-SERVER = /include/server
LIBDIR = /lib
PKGLIBDIR = /lib
LOCALEDIR = /share/locale
MANDIR = /man
SHAREDIR = /share
SYSCONFDIR = /etc
PGXS = /lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = --enable-thread-safety --with-ldap --without-zlib
CC = not recorded
CPPFLAGS = not recorded
CFLAGS = not recorded
CFLAGS_SL = not recorded
LDFLAGS = not recorded
LDFLAGS_EX = not recorded
LDFLAGS_SL = not recorded
LIBS = not recorded
VERSION = PostgreSQL 14devel

msvc 2019 (64 bits)
windows 10 (64bits)
steps:
cd\dll\postgres\src\tools\msvc
build Debug

Maybe, in Windows builds, has another file?

regards,
Ranier Vilela

Import Notes

Reply to msg id not found: 20200921172426.ei3crgn6hqq3ph24@development

Ranier Vilela

ranier.vf@gmail.com

over 5 years ago

In reply to: Tomas Vondra (#4)

2 attachment(s)

Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

Em sex., 18 de set. de 2020 às 10:37, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:

Em qui., 17 de set. de 2020 às 17:09, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:

Hi,

In case gd->any_hashable is FALSE, grouping_is_hashable is never

called.

In this case, the planner could use HASH for groupings, but will never

know.

The condition is pretty simple - if the query has grouping sets, look at
grouping sets, otherwise look at groupClause. For this to be an issue
the query would need to have both grouping sets and (independent) group
clause - which AFAIK is not possible.

Uh?
(parse->groupClause != NIL) If different from NIL we have ((independent)
group clause), grouping_is_hashable should check?
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
If gd is not NIL and gd->any_hashtable is FALSE?

Sorry, I'm not quite sure I understand what this is meant to say :-(

Anyway, (groupClause != NIL) does not mean the groupClause is somehow
independent (of what?). Add some debugging to create_grouping_paths and
you'll see that e.g. this query ends up with groupClause with 3 items:

select 1 from t group by grouping sets ((a), (b), (c));

and so does this one:

select 1 from t group by grouping sets ((a,c), (a,b));

I'm no expert in grouping sets, but I'd bet this means we transform the
grouping sets into a groupClause by extracting the keys. I haven't
investigated why exactly we do this, but I'm sure there's a reason (e.g.
it gives us SortGroupClause).

You seem to believe a query can have both grouping sets and regular
grouping at the same level - but how would such query look like? Surely
you can't have two GROuP BY clauses. You can do

Translating into code:
gd is grouping sets and
parse->groupClause is regular grouping
and we cannot have both at the same time.

select 1 from t group by a, grouping sets ((b), (c));

which is however mostly equivalent to (AFAICS)

select 1 from t group by grouping sets ((a,b), (a,c))

so it's not like we have an independent groupClause either I think.

The whole point of the original condition is - if there are grouping
sets, check if at least one can be executed using hashing (i.e. all keys
are hashable). Otherwise (without grouping sets) look at the grouping as
a whole.

Or if gd is NULL check parse->groupClause.

I don't see how your change improves this - if there are grouping sets,
it's futile to look at the whole groupClause if at least one grouping
set can't be hashed.

But there's a simple way to disprove this - show us a case (table and a
query) where your code correctly allows hashing while the current one
does not.

I'm not an expert in grouping either.
The question I have here is whether gd is populated and has gd->
any_hashable as FALSE,
Its mean no use checking parse-> groupClause, it's a waste of time, Ok.

For hashing to be worth considering, at least one grouping set has to be
hashable - otherwise it's pointless.

Granted, you could have something like

GROUP BY GROUPING SETS ((a), (b)), c

which I think essentially says "add c to every grouping set" and that
will be covered by the any_hashable check.

I am not going into the merit of whether or not it is worth hashing.
grouping_is_hashable as a last resort, must decide.

I don't know what this is supposed to say either. The whole point of
this check is to simply skip construction of hash-based paths in cases
when it's obviously futile (i.e. when some of the keys don't support
hashing). We do this as early as possible, because the whole point is to
save precious CPU cycles during query planning.

Apparently gd pointer, will never be NULL there, verified with

Assert(gd

!=

NULL).

Um, what? If you add the assert right before the if condition, you won't
even be able to do initdb. It's pretty clear it'll crash for any query
without grouping sets.

Here not:
Assert(gd != NULL);
create_ordinary_grouping_paths(root, input_rel, grouped_rel,
agg_costs, gd, &extra,
&partially_grouped_rel);

I have no idea where you're adding this assert. But simply adding it to
create_grouping_paths (right before the if condition changed by your
patch) will trigger a failure during initdb. Simply because for queries
without grouping sets (and with regular grouping) we pass gs=NULL.

Try applying the attached patch and do "pg_ctl -D ... init" - you'll get
a failure proving that gd=NULL.

Well, I did a test right now, downloaded the latest HEAD and added the
Assertive.
I compiled everything, ran the regression tests, installed the binaries,
loaded the server and installed a client database.
Everything is working.
Maybe in all these steps, there is no grouping that would trigger the code
in question, but I doubt it.

Otherwise Coverity would be right.
CID 1412604 (#1 of 1): Dereference after null check (FORWARD_NULL)13.
var_deref_model: Passing null pointer gd to

create_ordinary_grouping_paths,

which dereferences it. [show details
<

https://scan6.coverity.com/eventId=31389494-14&modelId=31389494-0&fileInstanceId=105740687&filePath=%2Fdll%2Fpostgres%2Fsrc%2Fbackend%2Foptimizer%2Fplan%2Fplanner.c&fileStart=4053&fileEnd=4180

]

Which cannot be true, gd is never NULL here.

Or maybe coverity simply does not realize the NULL-dereference can't
happen, because it's guarded by other conditions, or something like
that ... As the assert failure demonstrates, we do indeed get NULLs
here, and it does not crash so coverity is missing something.

1. With Assert.
All works.

2. create_ordinary_grouping_paths is called with gd var.

3. create_ordinary_grouping_paths call get_number_of_groups

4. get_number_of_groups dereference gd pointer (line 3705).
foreach(lc, gd->rollups)

5. line (3701:
Assert(gd); /* keep Coverity happy */

Of course, this works, gd is never NULL here.

6. grouping_is_hashable is never called here, because gd is never NULL here.
if ((parse->groupClause != NIL &&
agg_costs->numOrderedAggs == 0 &&
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
flags |= GROUPING_CAN_USE_HASH;

The question is is it worth it or not worth calling (grouping_is_hashable),
at that point?

regards,
Ranier Vilela

Attachments:

full_with_assert.txttext/plain; charset=US-ASCII; name=full_with_assert.txtDownload

gd_is_null.patchapplication/octet-stream; name=gd_is_null.patchDownload

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8007e205ed..ea8be886ef 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3829,6 +3829,8 @@ create_grouping_paths(PlannerInfo *root,
 	{
 		int			flags = 0;
 		GroupPathExtraData extra;
+		
+		Assert(gd != NULL);
 
 		/*
 		 * Determine whether it's possible to perform sort-based

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 5 years ago

In reply to: Ranier Vilela (#7)

Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

On Sun, Sep 20, 2020 at 08:09:56PM -0300, Ranier Vilela wrote:

Em sex., 18 de set. de 2020 ï¿½s 10:37, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:

Em qui., 17 de set. de 2020 ï¿½s 17:09, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:

Hi,

In case gd->any_hashable is FALSE, grouping_is_hashable is never

called.

In this case, the planner could use HASH for groupings, but will never

know.

The condition is pretty simple - if the query has grouping sets, look at
grouping sets, otherwise look at groupClause. For this to be an issue
the query would need to have both grouping sets and (independent) group
clause - which AFAIK is not possible.

Uh?
(parse->groupClause != NIL) If different from NIL we have ((independent)
group clause), grouping_is_hashable should check?
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
If gd is not NIL and gd->any_hashtable is FALSE?

Sorry, I'm not quite sure I understand what this is meant to say :-(

Anyway, (groupClause != NIL) does not mean the groupClause is somehow
independent (of what?). Add some debugging to create_grouping_paths and
you'll see that e.g. this query ends up with groupClause with 3 items:

select 1 from t group by grouping sets ((a), (b), (c));

and so does this one:

select 1 from t group by grouping sets ((a,c), (a,b));

I'm no expert in grouping sets, but I'd bet this means we transform the
grouping sets into a groupClause by extracting the keys. I haven't
investigated why exactly we do this, but I'm sure there's a reason (e.g.
it gives us SortGroupClause).

You seem to believe a query can have both grouping sets and regular
grouping at the same level - but how would such query look like? Surely
you can't have two GROuP BY clauses. You can do

Translating into code:
gd is grouping sets and
parse->groupClause is regular grouping
and we cannot have both at the same time.

Have you verified the claim that we can't have both at the same time? As
I already explained, we build groupClause from grouping sets. I suggest
you do some debugging using the queries I used as examples, and you'll
see the claim is not true.

select 1 from t group by a, grouping sets ((b), (c));

which is however mostly equivalent to (AFAICS)

select 1 from t group by grouping sets ((a,b), (a,c))

so it's not like we have an independent groupClause either I think.

The whole point of the original condition is - if there are grouping
sets, check if at least one can be executed using hashing (i.e. all keys
are hashable). Otherwise (without grouping sets) look at the grouping as
a whole.

Or if gd is NULL check parse->groupClause.

Which is what I'm saying.

I don't see how your change improves this - if there are grouping sets,
it's futile to look at the whole groupClause if at least one grouping
set can't be hashed.

But there's a simple way to disprove this - show us a case (table and a
query) where your code correctly allows hashing while the current one
does not.

I'm not an expert in grouping either.
The question I have here is whether gd is populated and has gd->
any_hashable as FALSE,
Its mean no use checking parse-> groupClause, it's a waste of time, Ok.

I'm sorry, I don't follow your logic. Those are two separate cases. If
we have grouping sets, we have to check if at least one can be hashed.
If we don't have grouping sets, we have to check groupClause directly.
Why would that be a waste of time is unclear to me.

For hashing to be worth considering, at least one grouping set has to be
hashable - otherwise it's pointless.

Granted, you could have something like

GROUP BY GROUPING SETS ((a), (b)), c

which I think essentially says "add c to every grouping set" and that
will be covered by the any_hashable check.

I am not going into the merit of whether or not it is worth hashing.
grouping_is_hashable as a last resort, must decide.

I don't know what this is supposed to say either. The whole point of
this check is to simply skip construction of hash-based paths in cases
when it's obviously futile (i.e. when some of the keys don't support
hashing). We do this as early as possible, because the whole point is to
save precious CPU cycles during query planning.

Apparently gd pointer, will never be NULL there, verified with

Assert(gd

!=

NULL).

Um, what? If you add the assert right before the if condition, you won't
even be able to do initdb. It's pretty clear it'll crash for any query
without grouping sets.

Here not:
Assert(gd != NULL);
create_ordinary_grouping_paths(root, input_rel, grouped_rel,
agg_costs, gd, &extra,
&partially_grouped_rel);

I have no idea where you're adding this assert. But simply adding it to
create_grouping_paths (right before the if condition changed by your
patch) will trigger a failure during initdb. Simply because for queries
without grouping sets (and with regular grouping) we pass gs=NULL.

Try applying the attached patch and do "pg_ctl -D ... init" - you'll get
a failure proving that gd=NULL.

Well, I did a test right now, downloaded the latest HEAD and added the
Assertive.
I compiled everything, ran the regression tests, installed the binaries,
loaded the server and installed a client database.
Everything is working.
Maybe in all these steps, there is no grouping that would trigger the code
in question, but I doubt it.

For me doing this leads to reliable crashes when pg_regress does initdb
(so before the actual checks):

patch -p1 < ~/grouping-assert.patch
./configure --enable-debug --enable-cassert
make -s clean && make -s -j4 && make check

And the "make check" it immediately crashes like this:

============== creating temporary instance ==============
============== initializing database system ==============

pg_regress: initdb failed
Examine /home/user/work/postgres/src/test/regress/log/initdb.log for the reason.

on the assert. So yeah, I'd guess there's something wrong with your
build. What does pg_config say?

Otherwise Coverity would be right.
CID 1412604 (#1 of 1): Dereference after null check (FORWARD_NULL)13.
var_deref_model: Passing null pointer gd to

create_ordinary_grouping_paths,

which dereferences it. [show details
<

https://scan6.coverity.com/eventId=31389494-14&modelId=31389494-0&fileInstanceId=105740687&filePath=%2Fdll%2Fpostgres%2Fsrc%2Fbackend%2Foptimizer%2Fplan%2Fplanner.c&fileStart=4053&fileEnd=4180

]

Which cannot be true, gd is never NULL here.

Or maybe coverity simply does not realize the NULL-dereference can't
happen, because it's guarded by other conditions, or something like
that ... As the assert failure demonstrates, we do indeed get NULLs
here, and it does not crash so coverity is missing something.

1. With Assert.
All works.

No, it doesn't.

2. create_ordinary_grouping_paths is called with gd var.

3. create_ordinary_grouping_paths call get_number_of_groups

4. get_number_of_groups dereference gd pointer (line 3705).
foreach(lc, gd->rollups)

Yes. And that only happens in branch

if (parse->groupingSets) { ... }

which means we know gd is not null here. But coverity does not realize
that, and produces bogus report.

5. line (3701:
Assert(gd); /* keep Coverity happy */

Of course, this works, gd is never NULL here.

6. grouping_is_hashable is never called here, because gd is never NULL here.
if ((parse->groupClause != NIL &&
agg_costs->numOrderedAggs == 0 &&
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
flags |= GROUPING_CAN_USE_HASH;

The question is is it worth it or not worth calling (grouping_is_hashable),
at that point?

How would I know? You're proposing this optimization, so it's up to you
to show it's worth it. Or are you just randomly changing the condition
based on a wild speculation that it might help?

FWIW I'm not going to respond to this thread until you post an example
of a query where the proposed change actually improves things. That
should be a must-have part of an patch claiming to optimize something.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Ranier Vilela

ranier.vf@gmail.com

over 5 years ago

In reply to: Tomas Vondra (#8)

Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

Em dom., 20 de set. de 2020 às 21:10, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Sun, Sep 20, 2020 at 08:09:56PM -0300, Ranier Vilela wrote:

Em sex., 18 de set. de 2020 às 10:37, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:

Em qui., 17 de set. de 2020 às 17:09, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:

Hi,

In case gd->any_hashable is FALSE, grouping_is_hashable is never

called.

In this case, the planner could use HASH for groupings, but will

never

know.

The condition is pretty simple - if the query has grouping sets,

look at

grouping sets, otherwise look at groupClause. For this to be an issue
the query would need to have both grouping sets and (independent)

group

clause - which AFAIK is not possible.

Uh?
(parse->groupClause != NIL) If different from NIL we have

((independent)

group clause), grouping_is_hashable should check?
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
If gd is not NIL and gd->any_hashtable is FALSE?

Sorry, I'm not quite sure I understand what this is meant to say :-(

Anyway, (groupClause != NIL) does not mean the groupClause is somehow
independent (of what?). Add some debugging to create_grouping_paths and
you'll see that e.g. this query ends up with groupClause with 3 items:

select 1 from t group by grouping sets ((a), (b), (c));

and so does this one:

select 1 from t group by grouping sets ((a,c), (a,b));

I'm no expert in grouping sets, but I'd bet this means we transform the
grouping sets into a groupClause by extracting the keys. I haven't
investigated why exactly we do this, but I'm sure there's a reason (e.g.
it gives us SortGroupClause).

You seem to believe a query can have both grouping sets and regular
grouping at the same level - but how would such query look like? Surely
you can't have two GROuP BY clauses. You can do

Translating into code:
gd is grouping sets and
parse->groupClause is regular grouping
and we cannot have both at the same time.

Have you verified the claim that we can't have both at the same time? As
I already explained, we build groupClause from grouping sets. I suggest
you do some debugging using the queries I used as examples, and you'll
see the claim is not true.

I think we already agreed that we cannot have both at the same time.

select 1 from t group by a, grouping sets ((b), (c));

which is however mostly equivalent to (AFAICS)

select 1 from t group by grouping sets ((a,b), (a,c))

so it's not like we have an independent groupClause either I think.

The whole point of the original condition is - if there are grouping
sets, check if at least one can be executed using hashing (i.e. all keys
are hashable). Otherwise (without grouping sets) look at the grouping as
a whole.

Or if gd is NULL check parse->groupClause.

Which is what I'm saying.

I don't see how your change improves this - if there are grouping sets,
it's futile to look at the whole groupClause if at least one grouping
set can't be hashed.

But there's a simple way to disprove this - show us a case (table and a
query) where your code correctly allows hashing while the current one
does not.

I'm not an expert in grouping either.
The question I have here is whether gd is populated and has gd->
any_hashable as FALSE,
Its mean no use checking parse-> groupClause, it's a waste of time, Ok.

I'm sorry, I don't follow your logic. Those are two separate cases. If
we have grouping sets, we have to check if at least one can be hashed.
If we don't have grouping sets, we have to check groupClause directly.
Why would that be a waste of time is unclear to me.

This is clear to me.
The problem is how it was implemented in create_grouping_paths.

For hashing to be worth considering, at least one grouping set has

to be

hashable - otherwise it's pointless.

Granted, you could have something like

GROUP BY GROUPING SETS ((a), (b)), c

which I think essentially says "add c to every grouping set" and that
will be covered by the any_hashable check.

I am not going into the merit of whether or not it is worth hashing.
grouping_is_hashable as a last resort, must decide.

I don't know what this is supposed to say either. The whole point of
this check is to simply skip construction of hash-based paths in cases
when it's obviously futile (i.e. when some of the keys don't support
hashing). We do this as early as possible, because the whole point is to
save precious CPU cycles during query planning.

Apparently gd pointer, will never be NULL there, verified with

Assert(gd

!=

NULL).

Um, what? If you add the assert right before the if condition, you

won't

even be able to do initdb. It's pretty clear it'll crash for any

query

without grouping sets.

Here not:
Assert(gd != NULL);
create_ordinary_grouping_paths(root, input_rel, grouped_rel,
agg_costs, gd, &extra,
&partially_grouped_rel);

I have no idea where you're adding this assert. But simply adding it to
create_grouping_paths (right before the if condition changed by your
patch) will trigger a failure during initdb. Simply because for queries
without grouping sets (and with regular grouping) we pass gs=NULL.

Try applying the attached patch and do "pg_ctl -D ... init" - you'll get
a failure proving that gd=NULL.

Well, I did a test right now, downloaded the latest HEAD and added the
Assertive.
I compiled everything, ran the regression tests, installed the binaries,
loaded the server and installed a client database.
Everything is working.
Maybe in all these steps, there is no grouping that would trigger the code
in question, but I doubt it.

For me doing this leads to reliable crashes when pg_regress does initdb
(so before the actual checks):

patch -p1 < ~/grouping-assert.patch
./configure --enable-debug --enable-cassert
make -s clean && make -s -j4 && make check

And the "make check" it immediately crashes like this:

============== creating temporary instance ==============
============== initializing database system ==============

pg_regress: initdb failed
Examine /home/user/work/postgres/src/test/regress/log/initdb.log for
the reason.

on the assert. So yeah, I'd guess there's something wrong with your
build. What does pg_config say?

Default.
I never change anything. I simply clone the last head:
git clone --single-branch https://github.com/postgres/postgres/
and have it compiled.

Otherwise Coverity would be right.
CID 1412604 (#1 of 1): Dereference after null check (FORWARD_NULL)13.
var_deref_model: Passing null pointer gd to

create_ordinary_grouping_paths,

which dereferences it. [show details
<

https://scan6.coverity.com/eventId=31389494-14&modelId=31389494-0&fileInstanceId=105740687&filePath=%2Fdll%2Fpostgres%2Fsrc%2Fbackend%2Foptimizer%2Fplan%2Fplanner.c&fileStart=4053&fileEnd=4180

]

Which cannot be true, gd is never NULL here.

Or maybe coverity simply does not realize the NULL-dereference can't
happen, because it's guarded by other conditions, or something like
that ... As the assert failure demonstrates, we do indeed get NULLs
here, and it does not crash so coverity is missing something.

1. With Assert.
All works.

No, it doesn't.

Here yes.
Latest head, msvc 2019 (64 bits), Windows 10 64 bits.
vcregress check
install c:\postgres_debug
pg_ctl -D c:\postgres_debug\data -l c:\postgres_debug\log\logfile start

locmaq.db=# select 1 from tbPersons group by grouping sets ((a), (b), (c));
ERROR: column "a" does not exist
LINE 1: select 1 from tbPersons group by grouping sets ((a), (b), (c...

None crash.

2. create_ordinary_grouping_paths is called with gd var.

3. create_ordinary_grouping_paths call get_number_of_groups

4. get_number_of_groups dereference gd pointer (line 3705).
foreach(lc, gd->rollups)

Yes. And that only happens in branch

if (parse->groupingSets) { ... }

which means we know gd is not null here. But coverity does not realize
that, and produces bogus report.

5. line (3701:
Assert(gd); /* keep Coverity happy */

Of course, this works, gd is never NULL here.

6. grouping_is_hashable is never called here, because gd is never NULL

here.

if ((parse->groupClause != NIL &&
agg_costs->numOrderedAggs == 0 &&
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
flags |= GROUPING_CAN_USE_HASH;

The question is is it worth it or not worth calling

(grouping_is_hashable),

at that point?

Here crash. Line (2199, planner.c):

if (have_grouping)
{
Assert(gset_data != NULL);

^Clocmaq.db=?# select 1 from tbpersons group by grouping sets ((a), (b),
(c));
no connection to the server
The connection to the server was lost. Attempting reset: Failed.
!?>
!?> \q

regards,
Ranier Vilela

#10

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 5 years ago

In reply to: Ranier Vilela (#9)

Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

On Mon, Sep 21, 2020 at 08:32:35AM -0300, Ranier Vilela wrote:

Em dom., 20 de set. de 2020 ï¿½s 21:10, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Sun, Sep 20, 2020 at 08:09:56PM -0300, Ranier Vilela wrote:

Em sex., 18 de set. de 2020 ï¿½s 10:37, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:

Em qui., 17 de set. de 2020 ï¿½s 17:09, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:

Hi,

In case gd->any_hashable is FALSE, grouping_is_hashable is never

called.

In this case, the planner could use HASH for groupings, but will

never

know.

The condition is pretty simple - if the query has grouping sets,

look at

grouping sets, otherwise look at groupClause. For this to be an issue
the query would need to have both grouping sets and (independent)

group

clause - which AFAIK is not possible.

Uh?
(parse->groupClause != NIL) If different from NIL we have

((independent)

group clause), grouping_is_hashable should check?
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
If gd is not NIL and gd->any_hashtable is FALSE?

Sorry, I'm not quite sure I understand what this is meant to say :-(

Anyway, (groupClause != NIL) does not mean the groupClause is somehow
independent (of what?). Add some debugging to create_grouping_paths and
you'll see that e.g. this query ends up with groupClause with 3 items:

select 1 from t group by grouping sets ((a), (b), (c));

and so does this one:

select 1 from t group by grouping sets ((a,c), (a,b));

I'm no expert in grouping sets, but I'd bet this means we transform the
grouping sets into a groupClause by extracting the keys. I haven't
investigated why exactly we do this, but I'm sure there's a reason (e.g.
it gives us SortGroupClause).

You seem to believe a query can have both grouping sets and regular
grouping at the same level - but how would such query look like? Surely
you can't have two GROuP BY clauses. You can do

Translating into code:
gd is grouping sets and
parse->groupClause is regular grouping
and we cannot have both at the same time.

Have you verified the claim that we can't have both at the same time? As
I already explained, we build groupClause from grouping sets. I suggest
you do some debugging using the queries I used as examples, and you'll
see the claim is not true.

I think we already agreed that we cannot have both at the same time.

No, we haven't. I think we've agreed that you can't have both a group by
clause with grouping sets and then another group by clause with "plain"
grouping. But even with grouping sets you'll have groupClause generated
from the grouping sets. See preprocess_grouping_sets.

select 1 from t group by a, grouping sets ((b), (c));

which is however mostly equivalent to (AFAICS)

select 1 from t group by grouping sets ((a,b), (a,c))

so it's not like we have an independent groupClause either I think.

The whole point of the original condition is - if there are grouping
sets, check if at least one can be executed using hashing (i.e. all keys
are hashable). Otherwise (without grouping sets) look at the grouping as
a whole.

Or if gd is NULL check parse->groupClause.

Which is what I'm saying.

I don't see how your change improves this - if there are grouping sets,
it's futile to look at the whole groupClause if at least one grouping
set can't be hashed.

But there's a simple way to disprove this - show us a case (table and a
query) where your code correctly allows hashing while the current one
does not.

I'm not an expert in grouping either.
The question I have here is whether gd is populated and has gd->
any_hashable as FALSE,
Its mean no use checking parse-> groupClause, it's a waste of time, Ok.

I'm sorry, I don't follow your logic. Those are two separate cases. If
we have grouping sets, we have to check if at least one can be hashed.
If we don't have grouping sets, we have to check groupClause directly.
Why would that be a waste of time is unclear to me.

This is clear to me.
The problem is how it was implemented in create_grouping_paths.

You haven't demonstrated what the problem is, though. Showing us a query
that fails would make it very clear.

For hashing to be worth considering, at least one grouping set has

to be

hashable - otherwise it's pointless.

Granted, you could have something like

GROUP BY GROUPING SETS ((a), (b)), c

which I think essentially says "add c to every grouping set" and that
will be covered by the any_hashable check.

I am not going into the merit of whether or not it is worth hashing.
grouping_is_hashable as a last resort, must decide.

I don't know what this is supposed to say either. The whole point of
this check is to simply skip construction of hash-based paths in cases
when it's obviously futile (i.e. when some of the keys don't support
hashing). We do this as early as possible, because the whole point is to
save precious CPU cycles during query planning.

Apparently gd pointer, will never be NULL there, verified with

Assert(gd

!=

NULL).

Um, what? If you add the assert right before the if condition, you

won't

even be able to do initdb. It's pretty clear it'll crash for any

query

without grouping sets.

Here not:
Assert(gd != NULL);
create_ordinary_grouping_paths(root, input_rel, grouped_rel,
agg_costs, gd, &extra,
&partially_grouped_rel);

I have no idea where you're adding this assert. But simply adding it to
create_grouping_paths (right before the if condition changed by your
patch) will trigger a failure during initdb. Simply because for queries
without grouping sets (and with regular grouping) we pass gs=NULL.

Try applying the attached patch and do "pg_ctl -D ... init" - you'll get
a failure proving that gd=NULL.

Well, I did a test right now, downloaded the latest HEAD and added the
Assertive.
I compiled everything, ran the regression tests, installed the binaries,
loaded the server and installed a client database.
Everything is working.
Maybe in all these steps, there is no grouping that would trigger the code
in question, but I doubt it.

For me doing this leads to reliable crashes when pg_regress does initdb
(so before the actual checks):

patch -p1 < ~/grouping-assert.patch
./configure --enable-debug --enable-cassert
make -s clean && make -s -j4 && make check

And the "make check" it immediately crashes like this:

============== creating temporary instance ==============
============== initializing database system ==============

pg_regress: initdb failed
Examine /home/user/work/postgres/src/test/regress/log/initdb.log for
the reason.

on the assert. So yeah, I'd guess there's something wrong with your
build. What does pg_config say?

Default.
I never change anything. I simply clone the last head:
git clone --single-branch https://github.com/postgres/postgres/
and have it compiled.

Compiled how?

Otherwise Coverity would be right.
CID 1412604 (#1 of 1): Dereference after null check (FORWARD_NULL)13.
var_deref_model: Passing null pointer gd to

create_ordinary_grouping_paths,

which dereferences it. [show details
<

https://scan6.coverity.com/eventId=31389494-14&modelId=31389494-0&fileInstanceId=105740687&filePath=%2Fdll%2Fpostgres%2Fsrc%2Fbackend%2Foptimizer%2Fplan%2Fplanner.c&fileStart=4053&fileEnd=4180

]

Which cannot be true, gd is never NULL here.

Or maybe coverity simply does not realize the NULL-dereference can't
happen, because it's guarded by other conditions, or something like
that ... As the assert failure demonstrates, we do indeed get NULLs
here, and it does not crash so coverity is missing something.

1. With Assert.
All works.

No, it doesn't.

Here yes.
Latest head, msvc 2019 (64 bits), Windows 10 64 bits.
vcregress check
install c:\postgres_debug
pg_ctl -D c:\postgres_debug\data -l c:\postgres_debug\log\logfile start

locmaq.db=# select 1 from tbPersons group by grouping sets ((a), (b), (c));
ERROR: column "a" does not exist
LINE 1: select 1 from tbPersons group by grouping sets ((a), (b), (c...

None crash.

This very obviously fails during parse analysis, so it does not even
execute the planner. Hence it can't hit the assert and fail. You might
even have Assert(false) there and it would not crash.

2. create_ordinary_grouping_paths is called with gd var.

3. create_ordinary_grouping_paths call get_number_of_groups

4. get_number_of_groups dereference gd pointer (line 3705).
foreach(lc, gd->rollups)

Yes. And that only happens in branch

if (parse->groupingSets) { ... }

which means we know gd is not null here. But coverity does not realize
that, and produces bogus report.

5. line (3701:
Assert(gd); /* keep Coverity happy */

Of course, this works, gd is never NULL here.

6. grouping_is_hashable is never called here, because gd is never NULL

here.

if ((parse->groupClause != NIL &&
agg_costs->numOrderedAggs == 0 &&
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
flags |= GROUPING_CAN_USE_HASH;

The question is is it worth it or not worth calling

(grouping_is_hashable),

at that point?

Here crash. Line (2199, planner.c):

if (have_grouping)
{
Assert(gset_data != NULL);

^Clocmaq.db=?# select 1 from tbpersons group by grouping sets ((a), (b),
(c));
no connection to the server
The connection to the server was lost. Attempting reset: Failed.
!?>
!?> \q

Not sure what is this assert supposed to prove? create_grouping_paths
obviously expects gset_data to be NULL and it does necessary checks
before dereferencing it. So the assert seems somewhat bogus.

I suggest you try to create a query that actually triggers the segfault
(without any extra asserts added to bogus places). A reproducer is about
the only thing that would convince me to spend more time on this.

Also, initially this was proposed as an optimization patch, but without
any demonstration of an improvement ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 5 years ago

In reply to: Ranier Vilela (#5)

Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

On Sun, Sep 20, 2020 at 01:50:34PM -0300, Ranier Vilela wrote:

Em seg., 21 de set. de 2020 ï¿½s 13:37, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Mon, Sep 21, 2020 at 08:32:35AM -0300, Ranier Vilela wrote:

Em dom., 20 de set. de 2020 ï¿½s 21:10, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Sun, Sep 20, 2020 at 08:09:56PM -0300, Ranier Vilela wrote:

Em sex., 18 de set. de 2020 ï¿½s 10:37, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:

Em qui., 17 de set. de 2020 ï¿½s 17:09, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:

Hi,

In case gd->any_hashable is FALSE, grouping_is_hashable is never

called.

In this case, the planner could use HASH for groupings, but will

never

know.

The condition is pretty simple - if the query has grouping sets,

look at

grouping sets, otherwise look at groupClause. For this to be an

issue

the query would need to have both grouping sets and (independent)

group

clause - which AFAIK is not possible.

Uh?
(parse->groupClause != NIL) If different from NIL we have

((independent)

group clause), grouping_is_hashable should check?
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
If gd is not NIL and gd->any_hashtable is FALSE?

Sorry, I'm not quite sure I understand what this is meant to say :-(

Anyway, (groupClause != NIL) does not mean the groupClause is somehow
independent (of what?). Add some debugging to create_grouping_paths

and

you'll see that e.g. this query ends up with groupClause with 3

items:

select 1 from t group by grouping sets ((a), (b), (c));

and so does this one:

select 1 from t group by grouping sets ((a,c), (a,b));

I'm no expert in grouping sets, but I'd bet this means we transform

the

grouping sets into a groupClause by extracting the keys. I haven't
investigated why exactly we do this, but I'm sure there's a reason

(e.g.

it gives us SortGroupClause).

You seem to believe a query can have both grouping sets and regular
grouping at the same level - but how would such query look like?

Surely

you can't have two GROuP BY clauses. You can do

Translating into code:
gd is grouping sets and
parse->groupClause is regular grouping
and we cannot have both at the same time.

Have you verified the claim that we can't have both at the same time? As
I already explained, we build groupClause from grouping sets. I suggest
you do some debugging using the queries I used as examples, and you'll
see the claim is not true.

I think we already agreed that we cannot have both at the same time.

No, we haven't. I think we've agreed that you can't have both a group by
clause with grouping sets and then another group by clause with "plain"
grouping. But even with grouping sets you'll have groupClause generated
from the grouping sets. See preprocess_grouping_sets.

select 1 from t group by a, grouping sets ((b), (c));

which is however mostly equivalent to (AFAICS)

select 1 from t group by grouping sets ((a,b), (a,c))

so it's not like we have an independent groupClause either I think.

The whole point of the original condition is - if there are grouping
sets, check if at least one can be executed using hashing (i.e. all

keys

are hashable). Otherwise (without grouping sets) look at the

grouping as

a whole.

Or if gd is NULL check parse->groupClause.

Which is what I'm saying.

I don't see how your change improves this - if there are grouping

sets,

it's futile to look at the whole groupClause if at least one grouping
set can't be hashed.

But there's a simple way to disprove this - show us a case (table

and a

query) where your code correctly allows hashing while the current one
does not.

I'm not an expert in grouping either.
The question I have here is whether gd is populated and has gd->
any_hashable as FALSE,
Its mean no use checking parse-> groupClause, it's a waste of time, Ok.

I'm sorry, I don't follow your logic. Those are two separate cases. If
we have grouping sets, we have to check if at least one can be hashed.
If we don't have grouping sets, we have to check groupClause directly.
Why would that be a waste of time is unclear to me.

This is clear to me.
The problem is how it was implemented in create_grouping_paths.

You haven't demonstrated what the problem is, though. Showing us a query
that fails would make it very clear.

For hashing to be worth considering, at least one grouping set has

to be

hashable - otherwise it's pointless.

Granted, you could have something like

GROUP BY GROUPING SETS ((a), (b)), c

which I think essentially says "add c to every grouping set" and

that

will be covered by the any_hashable check.

I am not going into the merit of whether or not it is worth hashing.
grouping_is_hashable as a last resort, must decide.

I don't know what this is supposed to say either. The whole point of
this check is to simply skip construction of hash-based paths in

cases

when it's obviously futile (i.e. when some of the keys don't support
hashing). We do this as early as possible, because the whole point

is to

save precious CPU cycles during query planning.

Apparently gd pointer, will never be NULL there, verified with

Assert(gd

!=

NULL).

Um, what? If you add the assert right before the if condition, you

won't

even be able to do initdb. It's pretty clear it'll crash for any

query

without grouping sets.

Here not:
Assert(gd != NULL);
create_ordinary_grouping_paths(root, input_rel, grouped_rel,
agg_costs, gd, &extra,
&partially_grouped_rel);

I have no idea where you're adding this assert. But simply adding it

to

create_grouping_paths (right before the if condition changed by your
patch) will trigger a failure during initdb. Simply because for

queries

without grouping sets (and with regular grouping) we pass gs=NULL.

Try applying the attached patch and do "pg_ctl -D ... init" - you'll

get

a failure proving that gd=NULL.

Well, I did a test right now, downloaded the latest HEAD and added the
Assertive.
I compiled everything, ran the regression tests, installed the

binaries,

loaded the server and installed a client database.
Everything is working.
Maybe in all these steps, there is no grouping that would trigger the

code

in question, but I doubt it.

For me doing this leads to reliable crashes when pg_regress does initdb
(so before the actual checks):

patch -p1 < ~/grouping-assert.patch
./configure --enable-debug --enable-cassert
make -s clean && make -s -j4 && make check

And the "make check" it immediately crashes like this:

============== creating temporary instance ==============
============== initializing database system ==============

pg_regress: initdb failed
Examine /home/user/work/postgres/src/test/regress/log/initdb.log for
the reason.

on the assert. So yeah, I'd guess there's something wrong with your
build. What does pg_config say?

Default.
I never change anything. I simply clone the last head:
git clone --single-branch https://github.com/postgres/postgres/
and have it compiled.

Compiled how?

build Debug

What does that mean? Wouldn't it be simpler to just show pg_config
output, which shows the exact flags used for configure / compile?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services