Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Started by Andy Fanover 5 years ago39 messageshackers

zhihui.fan1213@gmail.com

over 5 years ago

Hi:

This patch is the first patch in UniqueKey patch series[1]/messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com, since I need to
revise
that series many times but the first one would be not that often, so I'd
like to
submit this one for review first so that I don't need to maintain it again
and again.

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are not
null
in current query. The not null is judged by checking pg_attribute and
query's
restrictinfo. The info is only maintained at Base RelOptInfo and Partition's
RelOptInfo level so far.

Any thoughts?

[1]: /messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com
/messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Andy Fan

zhihui.fan1213@gmail.com

over 5 years ago

In reply to: Andy Fan (#1)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Wed, Feb 10, 2021 at 11:18 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:

This patch is the first patch in UniqueKey patch series[1], since I need
to revise
that series many times but the first one would be not that often, so I'd
like to
submit this one for review first so that I don't need to maintain it again
and again.

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are not
null
in current query. The not null is judged by checking pg_attribute and
query's
restrictinfo. The info is only maintained at Base RelOptInfo and
Partition's
RelOptInfo level so far.

Any thoughts?

[1]
/messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Add the missed patch..

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

over 5 years ago

In reply to: Andy Fan (#2)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Can this information be part of PathTarget structure and hence part of
RelOptInfo::reltarget, so that it can be extended to join, group and
other kinds of RelOptInfo in future? In fact, it might be easy to do
that in this patch itself.

On Wed, Feb 10, 2021 at 8:57 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Wed, Feb 10, 2021 at 11:18 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:

This patch is the first patch in UniqueKey patch series[1], since I need to revise
that series many times but the first one would be not that often, so I'd like to
submit this one for review first so that I don't need to maintain it again and again.

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are not null
in current query. The not null is judged by checking pg_attribute and query's
restrictinfo. The info is only maintained at Base RelOptInfo and Partition's
RelOptInfo level so far.

Any thoughts?

[1] /messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com
--
Best Regards
Andy Fan (https://www.aliyun.com/)

Add the missed patch..

--
Best Regards
Andy Fan (https://www.aliyun.com/)

--
Best Wishes,
Ashutosh Bapat

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Ashutosh Bapat (#3)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:

Can this information be part of PathTarget structure and hence part of
RelOptInfo::reltarget, so that it can be extended to join, group and
other kinds of RelOptInfo in future?

Why would that be better than keeping it in RelOptInfo?

regards, tom lane

David Rowley

dgrowleyml@gmail.com

over 5 years ago

In reply to: Andy Fan (#1)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Wed, 10 Feb 2021 at 16:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are not null
in current query. The not null is judged by checking pg_attribute and query's
restrictinfo. The info is only maintained at Base RelOptInfo and Partition's
RelOptInfo level so far.

Any thoughts?

I'm not that happy with what exactly the definition is of
RelOptInfo.notnullattrs.

The comment for the field says:
+ /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */

So you could expect someone to assume that these are a Bitmapset of
attnums for all columns in the relation marked as NOT NULL. However,
that's not true since you use find_nonnullable_vars() to chase down
quals that filter out NULL values and you mark those too.

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

I think you either need to explain in detail what the field means or
separate out the two meanings somehow.

David

Andy Fan

zhihui.fan1213@gmail.com

over 5 years ago

In reply to: David Rowley (#5)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Thank you all, friends!

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 10 Feb 2021 at 16:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are

not null

in current query. The not null is judged by checking pg_attribute and

query's

restrictinfo. The info is only maintained at Base RelOptInfo and

Partition's

RelOptInfo level so far.

Any thoughts?

I'm not that happy with what exactly the definition is of
RelOptInfo.notnullattrs.

The comment for the field says:
+ /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */

So you could expect someone to assume that these are a Bitmapset of
attnums for all columns in the relation marked as NOT NULL. However,
that's not true since you use find_nonnullable_vars() to chase down
quals that filter out NULL values and you mark those too.

The comment might be unclear, but the behavior is on purpose. I want
to find more cases which can make the attr NOT NULL, all of them are
useful for UniqueKey stuff.

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals are
evaluated? I think we have lots of data structures which are set just
after some
stage. but notnullattrs is special because it is set at more than 1
stage. However
I'm doubtful it is unacceptable, Some fields ever change their meaning at
different
stages like Var->varno. If a user has a misunderstanding on it, it
probably will find it
at the testing stage.

I think you either need to explain in detail what the field means or
separate out the two meanings somehow.

Agreed. Besides the not null comes from 2 places (metadata and quals), it
also
means it is based on the relation, rather than the RelTarget. for sample:
A is not
null, but SELECT return_null_udf(A) FROM t, return_null_udf is NULL.
I think
this is not well documented as well. How about just change the documents
as:

1. /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber, the
NOT NULL
* comes from pg_attribtes and quals at different planning stages.
*/

2. /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber, the
NOT NULL
* comes from pg_attribtes and quals at different planning stages. And
it just means
* the base attr rather than RelOptInfo->reltarget.
*/

I don't like to separate them into 2 fields because it may make the usage
harder a
bit as well.
--
Best Regards
Andy Fan (https://www.aliyun.com/)

Andy Fan

zhihui.fan1213@gmail.com

over 5 years ago

In reply to: Ashutosh Bapat (#3)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Thu, Feb 11, 2021 at 9:09 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:

Can this information be part of PathTarget structure and hence part of
RelOptInfo::reltarget, so that it can be extended to join, group and
other kinds of RelOptInfo in future?

I think you want to expand this field in a more generic way. For example:
SELECT udf(a) FROM t WHERE a is not null; In current implementation, I
only
knows a is not null, nothing about if udf(a) is null or not. And we can't
present anything
for joinrel as well since it is just attno.

At the same time, looks we can't tell if UDF(A) is null even if the UDF
is strict and
A is not null?

In fact, it might be easy to do that in this patch itself.

Actually I can't think out the method:)

On Wed, Feb 10, 2021 at 8:57 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Wed, Feb 10, 2021 at 11:18 AM Andy Fan <zhihui.fan1213@gmail.com>

wrote:

Hi:

This patch is the first patch in UniqueKey patch series[1], since I

need to revise

that series many times but the first one would be not that often, so

I'd like to

submit this one for review first so that I don't need to maintain it

again and again.

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are

not null

in current query. The not null is judged by checking pg_attribute and

query's

restrictinfo. The info is only maintained at Base RelOptInfo and

Partition's

RelOptInfo level so far.

Any thoughts?

[1]

/messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Add the missed patch..

--
Best Regards
Andy Fan (https://www.aliyun.com/)

--
Best Wishes,
Ashutosh Bapat

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

over 5 years ago

In reply to: Tom Lane (#4)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Thu, Feb 11, 2021 at 8:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:

Can this information be part of PathTarget structure and hence part of
RelOptInfo::reltarget, so that it can be extended to join, group and
other kinds of RelOptInfo in future?

Why would that be better than keeping it in RelOptInfo?

regards, tom lane

We have all the expressions relevant to a given relation (simple,
join, group whatever) in Pathtarget. We could remember notnullness of
attributes of a simple relation in RelOptInfo. But IMO non/nullness of
the TLEs of a relation is more useful that attributes and thus
associate those in the PathTarget which is used to produce TLEs. That
way we could use this infra in more general ways.

--
Best Wishes,
Ashutosh Bapat

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Ashutosh Bapat (#8)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:

On Thu, Feb 11, 2021 at 8:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:

Can this information be part of PathTarget structure and hence part of
RelOptInfo::reltarget, so that it can be extended to join, group and
other kinds of RelOptInfo in future?

Why would that be better than keeping it in RelOptInfo?

We have all the expressions relevant to a given relation (simple,
join, group whatever) in Pathtarget. We could remember notnullness of
attributes of a simple relation in RelOptInfo. But IMO non/nullness of
the TLEs of a relation is more useful that attributes and thus
associate those in the PathTarget which is used to produce TLEs. That
way we could use this infra in more general ways.

That argument seems nearly vacuous to me, because for pretty much any
expression that isn't a base-relation Var, the answer will have to be
"don't know". Meanwhile, there are clear costs to keeping such info
in PathTarget, namely having to copy it around. Another issue with
keeping it in PathTarget is that I'm not convinced it'll be readily
available where you need it: most places that would be interested in
making such proofs are only looking at expression trees.

Now there is one angle that *might* become easier if the info were in
PathTarget, namely that it could be simpler and more reliable to mark
nullable output columns of an outer join as being nullable (even if
they came from not-null base columns). However, as I've muttered
about elsewhere, I'd prefer to deal with that can of worms by altering
the representation of the Vars themselves. Again, if you're looking
at a WHERE clause, it's not real clear how you would find a relevant
PathTarget.

regards, tom lane

#10

David Rowley

dgrowleyml@gmail.com

over 5 years ago

In reply to: Andy Fan (#6)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com> wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals are
evaluated? I think we have lots of data structures which are set just after some
stage. but notnullattrs is special because it is set at more than 1 stage. However
I'm doubtful it is unacceptable, Some fields ever change their meaning at different
stages like Var->varno. If a user has a misunderstanding on it, it probably will find it
at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

I know there's another discussion here between Ashutosh and Tom about
PathTarget's and Vars. I had the Var idea too once myself [1]/messages/by-id/14678.1401639369@sss.pgh.pa.us but it
was quickly shot down. Tom's reasoning there in [1]/messages/by-id/14678.1401639369@sss.pgh.pa.us seems legit. I
guess we'd need some sort of planner version of Var and never confuse
it with the Parse version of Var. That sounds like quite a big
project which would have quite a lot of code churn. I'm not sure how
acceptable it would be to have Var represent both these things. It
gets complex when you do equal(var1, var2) and expect that to return
true when everything matches apart from the notnull field. We
currently have this issue with the "location" field and we even have a
special macro which just ignores those in equalfuncs.c. I imagine not
many people would like to expand that to other fields.

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

David

[1]: /messages/by-id/14678.1401639369@sss.pgh.pa.us

#11

Andy Fan

zhihui.fan1213@gmail.com

over 5 years ago

In reply to: David Rowley (#10)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals

are

evaluated? I think we have lots of data structures which are set just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

I think I get what you mean. You are saying notnullattrs is only correct
at the *current* stage, namely set_rel_size. It would not be true after
that, but the data is still there. That would cause some confusion. I
admit
that is something I didn't realize before. I checked other fields of
RelOptInfo,
looks no one filed works like this, so I am not really happy with this
design
now. I'm OK with saying more things along these lines. That can be done
we all understand each other well. Any better design is welcome as well.
I think the "Var represents null stuff" is good, until I see your comments
below.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

The current patch can detect t.nullable is not null correctly. That
is done by find_nonnullable_vars(qual) and deconstruct_recure stage.

I know there's another discussion here between Ashutosh and Tom about
PathTarget's and Vars. I had the Var idea too once myself [1] but it
was quickly shot down. Tom's reasoning there in [1] seems legit. I
guess we'd need some sort of planner version of Var and never confuse
it with the Parse version of Var. That sounds like quite a big
project which would have quite a lot of code churn. I'm not sure how
acceptable it would be to have Var represent both these things. It
gets complex when you do equal(var1, var2) and expect that to return
true when everything matches apart from the notnull field. We
currently have this issue with the "location" field and we even have a
special macro which just ignores those in equalfuncs.c. I imagine not
many people would like to expand that to other fields.

Thanks for sharing this.

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

Agreed. The simplest way is just adding some comments. If go a
step further, how about just reset the notnullattrs when it is nullable
later like outer join? I have added this logic in the attached patch.
(comment for the notnullattrs is still not touched). I think we only
need to handle this in build_join_rel stage. With the v2 commit 2,
notnullattrs might be unset too early, but if the value is there, then
it is correct.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#12

Andy Fan

zhihui.fan1213@gmail.com

over 5 years ago

In reply to: Andy Fan (#11)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Feb 16, 2021 at 10:03 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com>
wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals

are

evaluated? I think we have lots of data structures which are set just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

I think I get what you mean. You are saying notnullattrs is only correct
at the *current* stage, namely set_rel_size. It would not be true after
that, but the data is still there. That would cause some confusion. I
admit
that is something I didn't realize before. I checked other fields of
RelOptInfo,
looks no one filed works like this, so I am not really happy with this
design
now. I'm OK with saying more things along these lines. That can be done
we all understand each other well. Any better design is welcome as well.
I think the "Var represents null stuff" is good, until I see your
comments below.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

The current patch can detect t.nullable is not null correctly. That
is done by find_nonnullable_vars(qual) and deconstruct_recure stage.

I know there's another discussion here between Ashutosh and Tom about
PathTarget's and Vars. I had the Var idea too once myself [1] but it
was quickly shot down. Tom's reasoning there in [1] seems legit. I
guess we'd need some sort of planner version of Var and never confuse
it with the Parse version of Var. That sounds like quite a big
project which would have quite a lot of code churn. I'm not sure how
acceptable it would be to have Var represent both these things. It
gets complex when you do equal(var1, var2) and expect that to return
true when everything matches apart from the notnull field. We
currently have this issue with the "location" field and we even have a
special macro which just ignores those in equalfuncs.c. I imagine not
many people would like to expand that to other fields.

Thanks for sharing this.

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

Agreed. The simplest way is just adding some comments. If go a
step further, how about just reset the notnullattrs when it is nullable
later like outer join? I have added this logic in the attached patch.
(comment for the notnullattrs is still not touched). I think we only
need to handle this in build_join_rel stage.

With the v2 commit 2,
notnullattrs might be unset too early, but if the value is there, then
it is correct.

This looks bad as well. How about adding an extra field in RelOptInfo for
the
outer join case. For example:

@@ -710,8 +710,14 @@ typedef struct RelOptInfo
        PlannerInfo *subroot;           /* if subquery */
        List       *subplan_params; /* if subquery */
        int                     rel_parallel_workers;   /* wanted number of
parallel workers */
-       /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */
+       /*
+        * Not null attrs, start from -FirstLowInvalidHeapAttributeNumber.
The nullness
+        * might be changed after outer join, So we need to consult with
leftouter_relids
+        * before using it.
+        */
        Bitmapset               *notnullattrs;
+       /* A list of Relids which will be a outer rel when join with this
relation. */
+       List    *leftouter_relids;

/* Information about foreign tables and foreign joins */
Oid serverid; /* identifies
server for the table or join */

leftout_relids should be able to be filled with root->join_info_list. If
we go with this
direction, not sure leftouter_relids should be a List or not since I even
can't think
out a query which can have more than one relids for a relation.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#13

Andy Fan

zhihui.fan1213@gmail.com

over 5 years ago

In reply to: David Rowley (#10)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals

are

evaluated? I think we have lots of data structures which are set just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

I checked again and found I do miss the check on JoinExpr->quals. I have
fixed it in v3 patch. Thanks for the review!

In the attached v3, commit 1 is the real patch, and commit 2 is just add
some logs to help local testing. notnull.sql/notnull.out is the test case
for
this patch, both commit 2 and notnull.* are not intended to be committed
at last.

Besides the above fix in v3, I changed the comments alongs the notnullattrs
as below and added a true positive helper function is_var_nullable.

+       /*
+        * Not null attrs, the values are calculated by looking into
pg_attribute and quals
+        * However both cases are not reliable in some outer join cases. So
when
+        * we want to check if a Var is nullable, function is_var_nullable
is a good
+        * place to start with, which is true positive.
+        */
+       Bitmapset               *notnullattrs;

I also found separating the two meanings is unnecessary since both of them
are not reliable in the outer join case and we are just wanting which attr
is
nullable, no match how we know it. The below example shows why
not-null-by-qual
is not readable as well (at least with current implementation)

create table n1(a int, b int not null);
create table n2(a int, b int not null);
create table n3(a int, b int not null);

select * from n1 left join n2 on n1.a = n2.a full join n3 on n2.a = n3.a;

In this case, when we check (n1 left join n2 on n1.a = n2.a) , we know
n1.a is not nullable. However after full join with n3, it changed.

I know there's another discussion here between Ashutosh and Tom about
PathTarget's and Vars. I had the Var idea too once myself [1] but it
was quickly shot down. Tom's reasoning there in [1] seems legit. I
guess we'd need some sort of planner version of Var and never confuse
it with the Parse version of Var. That sounds like quite a big
project which would have quite a lot of code churn. I'm not sure how
acceptable it would be to have Var represent both these things. It
gets complex when you do equal(var1, var2) and expect that to return
true when everything matches apart from the notnull field. We
currently have this issue with the "location" field and we even have a
special macro which just ignores those in equalfuncs.c. I imagine not
many people would like to expand that to other fields.

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

David

[1]
/messages/by-id/14678.1401639369@sss.pgh.pa.us

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#14

Dmitry Dolgov

9erthalion6@gmail.com

about 5 years ago

In reply to: Andy Fan (#13)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Thu, Feb 18, 2021 at 08:58:13PM +0800, Andy Fan wrote:

Thanks for continuing work on this patch!

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals

are

evaluated? I think we have lots of data structures which are set just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

I checked again and found I do miss the check on JoinExpr->quals. I have
fixed it in v3 patch. Thanks for the review!

In the attached v3, commit 1 is the real patch, and commit 2 is just add
some logs to help local testing. notnull.sql/notnull.out is the test case
for
this patch, both commit 2 and notnull.* are not intended to be committed
at last.

Just to clarify, this version of notnullattrs here is the latest one,
and another one from "UniqueKey on Partitioned table" thread should be
disregarded?

Besides the above fix in v3, I changed the comments alongs the notnullattrs
as below and added a true positive helper function is_var_nullable.

With "true positive" you mean it will always correctly say if a Var is
nullable or not? I'm not sure about this, but couldn't be there still
some cases when a Var belongs to nullable_baserels, but still has some
constraints preventing it from being nullable (e.g. a silly example when
the not nullable column belong to the table, and the query does full
join of this table on itself using this column)?

Is this function necessary for the following patches? I've got an
impression that the discussion in this thread was mostly evolving about
correct description when notnullattrs could be used, not making it
bullet proof.

Bitmapset *notnullattrs;

It looks like RelOptInfo has its own out function _outRelOptInfo,
probably the notnullattrs should be also present there as BITMAPSET_FIELD?

As a side note, I've attached those two new threads to CF item [1]https://commitfest.postgresql.org/32/2433/,
hopefully it's correct.

[1]: https://commitfest.postgresql.org/32/2433/

#15

Andy Fan

zhihui.fan1213@gmail.com

about 5 years ago

In reply to: Dmitry Dolgov (#14)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Fri, Mar 5, 2021 at 12:00 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Thu, Feb 18, 2021 at 08:58:13PM +0800, Andy Fan wrote:

Thanks for continuing work on this patch!

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com>

wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com>

wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base

quals

are

evaluated? I think we have lots of data structures which are set

just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their

meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

I checked again and found I do miss the check on JoinExpr->quals. I have
fixed it in v3 patch. Thanks for the review!

In the attached v3, commit 1 is the real patch, and commit 2 is just add
some logs to help local testing. notnull.sql/notnull.out is the test

case

for
this patch, both commit 2 and notnull.* are not intended to be committed
at last.

Just to clarify, this version of notnullattrs here is the latest one,
and another one from "UniqueKey on Partitioned table" thread should be
disregarded?

Actually they are different sections for UniqueKey. Since I don't want to
mess
two topics in one place, I open another thread. The topic here is how to
represent
a not null attribute, which is a precondition for all UniqueKey stuff. The
thread
" UniqueKey on Partitioned table[1]/messages/by-id/CAKU4AWrU35c9g3cE15JmVwh6B2Hzf4hf7cZUkRsiktv7AKR3Ag@mail.gmail.com " is talking about how to maintain the
UniqueKey on a partitioned table only.

Besides the above fix in v3, I changed the comments alongs the

notnullattrs

as below and added a true positive helper function is_var_nullable.

With "true positive" you mean it will always correctly say if a Var is
nullable or not?

not null. If I say it is not null (return value is false), it is not null
for sure. If
it is nullable (true), it may be still not null for some stages. But I
don't want
to distinguish them too much, so I just say it is nullable.

I'm not sure about this, but couldn't be there still
some cases when a Var belongs to nullable_baserels, but still has some
constraints preventing it from being nullable (e.g. a silly example when
the not nullable column belong to the table, and the query does full
join of this table on itself using this column)?

Do you say something like "SELECT * FROM t1 left join t2 on t1.a = t2.a

WHERE
t2.b = 3; "? In this case, the outer join will be reduced to inner join
at
reduce_outer_join stage, which means t2 will not be shown in
nullable_baserels.

Is this function necessary for the following patches? I've got an
impression that the discussion in this thread was mostly evolving about
correct description when notnullattrs could be used, not making it
bullet proof.

Exactly, that is the blocker issue right now. I hope more authorities can
give
some suggestions to move on.

Bitmapset *notnullattrs;

It looks like RelOptInfo has its own out function _outRelOptInfo,
probably the notnullattrs should be also present there as BITMAPSET_FIELD?

Yes, it should be added.

As a side note, I've attached those two new threads to CF item [1],
hopefully it's correct.

[1]: https://commitfest.postgresql.org/32/2433/

Thanks for doing that. It is correct.

[1]: /messages/by-id/CAKU4AWrU35c9g3cE15JmVwh6B2Hzf4hf7cZUkRsiktv7AKR3Ag@mail.gmail.com
/messages/by-id/CAKU4AWrU35c9g3cE15JmVwh6B2Hzf4hf7cZUkRsiktv7AKR3Ag@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#16

Dmitry Dolgov

9erthalion6@gmail.com

about 5 years ago

In reply to: Andy Fan (#15)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Fri, Mar 05, 2021 at 10:22:45AM +0800, Andy Fan wrote:

I checked again and found I do miss the check on JoinExpr->quals. I have
fixed it in v3 patch. Thanks for the review!

In the attached v3, commit 1 is the real patch, and commit 2 is just add
some logs to help local testing. notnull.sql/notnull.out is the test

case

for
this patch, both commit 2 and notnull.* are not intended to be committed
at last.

Just to clarify, this version of notnullattrs here is the latest one,
and another one from "UniqueKey on Partitioned table" thread should be
disregarded?

Actually they are different sections for UniqueKey. Since I don't want to
mess
two topics in one place, I open another thread. The topic here is how to
represent
a not null attribute, which is a precondition for all UniqueKey stuff. The
thread
" UniqueKey on Partitioned table[1] " is talking about how to maintain the
UniqueKey on a partitioned table only.

Sure, those two threads are addressing different topics. But [1] also
includes the patch for notnullattrs (I guess it's the same as one of the
older versions from this thread), so it would be good to specify which
one should be used to avoid any confusion.

I'm not sure about this, but couldn't be there still
some cases when a Var belongs to nullable_baserels, but still has some
constraints preventing it from being nullable (e.g. a silly example when
the not nullable column belong to the table, and the query does full
join of this table on itself using this column)?

Do you say something like "SELECT * FROM t1 left join t2 on t1.a = t2.a

WHERE
t2.b = 3; "? In this case, the outer join will be reduced to inner join
at
reduce_outer_join stage, which means t2 will not be shown in
nullable_baserels.

Nope, as I said it's a bit useless example of full self join t1 on
itself. In this case not null column "a" will be considered as nullable,
but following your description for is_var_nullable it's fine (although
couple of commentaries to this function are clearly necessary).

Is this function necessary for the following patches? I've got an
impression that the discussion in this thread was mostly evolving about
correct description when notnullattrs could be used, not making it
bullet proof.

Exactly, that is the blocker issue right now. I hope more authorities can
give
some suggestions to move on.

Hm...why essentially a documentation question is the blocker? Or if you
mean it's a question of the patch scope, are there any arguments for
extending it?

#17

Andy Fan

zhihui.fan1213@gmail.com

about 5 years ago

In reply to: Dmitry Dolgov (#16)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Fri, Mar 5, 2021 at 4:16 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Fri, Mar 05, 2021 at 10:22:45AM +0800, Andy Fan wrote:

I checked again and found I do miss the check on JoinExpr->quals. I

have

fixed it in v3 patch. Thanks for the review!

In the attached v3, commit 1 is the real patch, and commit 2 is

just add

some logs to help local testing. notnull.sql/notnull.out is the test

case

for
this patch, both commit 2 and notnull.* are not intended to be

committed

at last.

Just to clarify, this version of notnullattrs here is the latest one,
and another one from "UniqueKey on Partitioned table" thread should be
disregarded?

Actually they are different sections for UniqueKey. Since I don't want

to

mess
two topics in one place, I open another thread. The topic here is how to
represent
a not null attribute, which is a precondition for all UniqueKey stuff.

The

thread
" UniqueKey on Partitioned table[1] " is talking about how to maintain

the

UniqueKey on a partitioned table only.

Sure, those two threads are addressing different topics. But [1] also
includes the patch for notnullattrs (I guess it's the same as one of the
older versions from this thread), so it would be good to specify which
one should be used to avoid any confusion.

I'm not sure about this, but couldn't be there still
some cases when a Var belongs to nullable_baserels, but still has some
constraints preventing it from being nullable (e.g. a silly example

when

the not nullable column belong to the table, and the query does full
join of this table on itself using this column)?

Do you say something like "SELECT * FROM t1 left join t2 on t1.a = t2.a

WHERE
t2.b = 3; "? In this case, the outer join will be reduced to inner join
at
reduce_outer_join stage, which means t2 will not be shown in
nullable_baserels.

Nope, as I said it's a bit useless example of full self join t1 on
itself. In this case not null column "a" will be considered as nullable,
but following your description for is_var_nullable it's fine (although
couple of commentaries to this function are clearly necessary).

Is this function necessary for the following patches? I've got an
impression that the discussion in this thread was mostly evolving about
correct description when notnullattrs could be used, not making it
bullet proof.

Exactly, that is the blocker issue right now. I hope more authorities can
give
some suggestions to move on.

Hm...why essentially a documentation question is the blocker? Or if you
mean it's a question of the patch scope, are there any arguments for
extending it?

I treat the below comment as the blocker issue:

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

David/Tom/Ashutosh, do you mind to share more insights to this?
I mean the target is the patch is in a committable state.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#18

Andy Fan

zhihui.fan1213@gmail.com

about 5 years ago

In reply to: David Rowley (#10)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals

are

evaluated? I think we have lots of data structures which are set just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

I read your comments again and find I miss your point before. So I'd
summarize
the my current understanding to make sure we are in the same place for
further
working.

I want to define a notnullattrs on RelOptInfo struct. The not nullable
may comes from catalog definition or quals on the given query. For example:

CREATE TABLE t(a INT NOT NULL, nullable INT);
SELECT * FROM t; ==> a is not null for sure by definition.
SELECT * FROM t WHERE nullable > 3; ==> nullable is not null as well by
qual.

However the thing becomes complex with the below 2 cases.

1. SELECT * FROM t INNER JOIN j on t.nullable = q.b;
We know t.b will not be null **finally**. But the current plan may something
like this:

QUERY PLAN
------------------------------------------
Merge Join
Merge Cond: (t.nullable = j.something)
-> Sort
Sort Key: t.nullable
-> Seq Scan on t
-> Sort
Sort Key: j.something
-> Seq Scan on j
(8 rows)

which means the Path "Seq Scan on t" still contains some null values. At
least,
we should not assume t.nullable is "not nullable" the base relation stage.

2. SELECT t.a FROM j LEFT JOIN t ON t.b = t.a;
Even the t.a is not null by definition, but it may have null **finally**
due to
the outer join.

My current patch doesn't handle the 2 cases well since t.nullable is marked
as
NOT NULL for both cases.

I know there's another discussion here between Ashutosh and Tom about
PathTarget's and Vars. I had the Var idea too once myself [1] but it
was quickly shot down. Tom's reasoning there in [1] seems legit. I
guess we'd need some sort of planner version of Var and never confuse
it with the Parse version of Var. That sounds like quite a big
project which would have quite a lot of code churn. I'm not sure how
acceptable it would be to have Var represent both these things. It
gets complex when you do equal(var1, var2) and expect that to return
true when everything matches apart from the notnull field. We
currently have this issue with the "location" field and we even have a
special macro which just ignores those in equalfuncs.c. I imagine not
many people would like to expand that to other fields.

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

Looks we have to maintain not null on the general RelOptInfo level rather
than Base
RelOptInfo. But I don't want to teach Var about the notnull so far. The
reasons are: 1).
We need to maintain the Planner version and Parser version due to the VIEW
case.
2). We have to ignore the extra part for equal(Var, Var) . 3). Var is
usually shared among
different RelOptInfo. which means we have to maintain different copies for
this purpose IIUC.

I assume we want to know if a Var is nullable with a function like.
is_var_notnullable(Var *var, Relids relids). If so, we can define the
data as below:

struct RelOptInfo {

Bitmapset** notnullattrs;
..
};

After this we can implement the function as:

bool
is_var_notnullable(Var* var, Relids relids)
{
RelOptInfo *rel = find_rel_by_relids(reldis);
return bms_is_member(var->varattno, rel->notnullattrs[var->varno]);
}

Probably we can make some hackers to reduce the notnullattrs's memory usage
overhead.

Any thoughts?

David

[1]
/messages/by-id/14678.1401639369@sss.pgh.pa.us

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#19

Andy Fan

zhihui.fan1213@gmail.com

about 5 years ago

In reply to: Andy Fan (#18)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

I assume we want to know if a Var is nullable with a function like.
is_var_notnullable(Var *var, Relids relids). If so, we can define the
data as below:

struct RelOptInfo {

Bitmapset** notnullattrs;
..
};

After this we can implement the function as:

bool
is_var_notnullable(Var* var, Relids relids)
{
RelOptInfo *rel = find_rel_by_relids(reldis);
return bms_is_member(var->varattno, rel->notnullattrs[var->varno]);
}

Probably we can make some hackers to reduce the notnullattrs's memory usage
overhead.

To be more precise, to make the rel->notnullattrs shorter, we can do the
following methods:
1). Relids only has single element, we can always use a 1-len array rather
than rel->varno
elements. 2). For multi-elements relids, we use the max(varno) as the
length of rel->notnullattrs.
3). For some cases, the notnullattrs of a baserel is not changed in later
stages, we can just
reuse the same Bitmapset * in later stages.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#20

Andy Fan

zhihui.fan1213@gmail.com

about 5 years ago

In reply to: Andy Fan (#18)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

However the thing becomes complex with the below 2 cases.

1. SELECT * FROM t INNER JOIN j on t.nullable = q.b;
We know t.b will be not null **finally**. But the current plan may
something
like this:

QUERY PLAN
------------------------------------------
Merge Join
Merge Cond: (t.nullable = j.something)
-> Sort
Sort Key: t.nullable
-> Seq Scan on t
-> Sort
Sort Key: j.something
-> Seq Scan on j
(8 rows)

which means the Path "Seq Scan on t" still contains some null values. At
least,
we should not assume t.nullable is "not nullable" the base relation stage.

2. SELECT t.a FROM j LEFT JOIN t ON t.b = t.a;
Even the t.a is not null by definition, but it may have null **finally**
due to
the outer join.

The above 2 cases have been addressed by defining the notnullattrs on
every RelOptInfo, and maintaining them on every join. However, per offline
discussion with David, IIUC, there is a more case to think about.

CREATE TABLE t (a INT, b INT);
SELECT * FROM t WHERE a = 1 and b = 2;

We know b is not null after we evaluate the qual b = 2, but it may still
nullable when we just evaluate a = 1;

I prefer to not handle it by saying the semantics of notnullattrs is correct
after we evaluate all the quals on its RelOptInfo.

It would be good to agree on the correct representation for Vars that

cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

Looks we have to maintain not null on the general RelOptInfo level rather
than Base
RelOptInfo. But I don't want to teach Var about the notnull so far. The
reasons are: 1).
We need to maintain the Planner version and Parser version due to the VIEW
case.
2). We have to ignore the extra part for equal(Var, Var) . 3). Var is
usually shared among
different RelOptInfo. which means we have to maintain different copies for
this purpose IIUC.

I assume we want to know if a Var is nullable with a function like.
is_var_notnullable(Var *var, Relids relids). If so, we can define the
data as below:

struct RelOptInfo {

Bitmapset** notnullattrs;
..
};

After this we can implement the function as:

/*
* is_var_notnullable
* Check if the var is nullable for a given RelOptIno after
* all the quals on it have been evaluated.
*
* var is the var to check, relids is the ids of a RelOptInfo
* we will check on.
*/
bool
is_var_notnullable(Var* var, Relids relids)
{
RelOptInfo *rel = find_rel_by_relids(reldis);
return bms_is_member(var->varattno, rel->notnullattrs[var->varno]);
}

Do you think this is a reasonable solution?

bool

is_var_notnullable(Var* var, Relids relids)
{
RelOptInfo *rel = find_rel_by_relids(reldis);
return bms_is_member(var->varattno, rel->notnullattrs[var->varno]);
}

Probably we can make some hackers to reduce the notnullattrs's memory usage
overhead.

--

Best Regards
Andy Fan (https://www.aliyun.com/)

#21

Andy Fan

zhihui.fan1213@gmail.com

almost 5 years ago

In reply to: Andy Fan (#20)

#22

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: Andy Fan (#21)

#23