Self contradictory examining on rel's baserestrictinfo

Started by bigbro_wq@hotmail.com, over 1 year ago, 16 messages
#1 bigbro_wq@hotmail.com
bigbro_wq@hotmail.com

Hi,
I have attached patches implementing the major features of self-contradictory
checking on a rel's baserestrictinfo.
I'd like to explain them in the following aspects.

1. Background
A few months ago, while reading the B-tree source code in the routine
_bt_preprocess_keys, I found that there are more contradiction-checking
cases we could add. I sent an email to pgsql-hackers, and a community
contributor replied that someone had already raised this question.
Thanks for taking the time to address my question. After several
conversations, I found that we can do more: we can perform these checks
at planning time.
2. Scope
In practice, B-tree indexes are widely used, and binary operator
expressions of the form Var op Const are common, usually without data
type coercion or collation concerns; so those are the cases supported.
There are several considerations: 1) self-contradictory situations
rarely appear; 2) it is expensive to spend much effort only to find
that the rel is not self-contradictory; 3) the expression evaluation
machinery will do the right thing at run time anyway. So this is a
compromise.
3. Details
We get the base restriction list after the join tree has been
deconstructed.
1) The baserestrictinfo is an implicitly-ANDed list, which means that
if one element of the list contradicts the others or is itself
self-contradictory, the whole rel is self-contradictory, and we can
mark the rel dummy.

2) Some of the B-tree strategy operators (>, >=, =, <, <=) may
contradict each other. Another operator to consider is <>, the negator
of =. The pairs to check are:
2.1) = against >, >=, =, <, <=, <>
2.2) >, >= against <, <=
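As an illustration, the pairwise checks in 2.1 and 2.2 can be sketched
like this (a hypothetical helper, not the patch's actual code; NULL
semantics are ignored):

```python
# Sketch of the pairwise contradiction test between two B-tree strategy
# quals on the same Var. Each qual is modeled as an (op, const) pair.

def contradicts(q1, q2):
    """Return True if q1 AND q2 can never both hold for a non-null input."""
    if q2[0] == '=':                  # 2.1) let '=' drive the check
        q1, q2 = q2, q1
    op1, c1 = q1
    op2, c2 = q2
    if op1 == '=':
        # substitute the '=' constant into the other qual and evaluate it
        holds = {'=': c1 == c2, '<>': c1 != c2, '<': c1 < c2,
                 '<=': c1 <= c2, '>': c1 > c2, '>=': c1 >= c2}[op2]
        return not holds
    if '<>' in (op1, op2):
        return False                  # '<>' alone only conflicts with '='
    # 2.2) a lower bound against an upper bound
    if op1 in ('<', '<='):            # orient: q1 lower bound, q2 upper
        (op1, c1), (op2, c2) = (op2, c2), (op1, c1)
    if op1 not in ('>', '>=') or op2 not in ('<', '<='):
        return False                  # two bounds in the same direction
    return c1 > c2 or (c1 == c2 and not (op1 == '>=' and op2 == '<='))
```

For '=' the check simply evaluates the other qual at the constant; for a
pair of range operators, only an opposing lower/upper bound pair can
conflict.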

3) We can eliminate the weaker restriction when a more restrictive one
exists (e.g. given x > 4 and x >= 5, the sub-expression x > 4 is
abandoned). This simplifies the subsequent checks.
A special case to watch for: if we encounter the operator '=' and there
is no contradiction, '=' dominates the other operators.
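A minimal sketch of this tightening step, assuming contradiction has
already been ruled out (the helper name and representation are
illustrative, and '<>' quals are tracked separately as described next):

```python
# Sketch: keep only the most restrictive bound in each direction, with
# '=' dominating everything when present.

def tighten(quals):
    """quals: (op, const) pairs over one Var, already non-contradictory.
    Returns at most one lower bound and one upper bound, or just '='."""
    eq = [q for q in quals if q[0] == '=']
    if eq:
        return [eq[0]]                # '=' dominates the other operators
    best = {}
    for op, c in quals:
        if op not in ('>', '>=', '<', '<='):
            continue                  # '<>' is handled separately
        d = 'lo' if op in ('>', '>=') else 'hi'
        cur = best.get(d)
        if cur is None:
            best[d] = (op, c)
            continue
        cop, cc = cur
        if d == 'lo':
            stricter = c > cc or (c == cc and op == '>')
        else:
            stricter = c < cc or (c == cc and op == '<')
        if stricter:
            best[d] = (op, c)
    return list(best.values())
```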

4) We need to record the constants seen with the operator '<>' and then
check them one by one when the operator '=' appears.
I use linear search when the number of elements is at most 4, or when
no support function is available to sort them. If a support function is
found and there are more than 4 elements, I sort them and use binary
search.
I chose the number 4 for two reasons: 1) the size of the stored const
values times 4 usually matches a CPU cache line; 2) in practice,
queries rarely contain many not-equal expressions.
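The search strategy could be sketched as follows (the threshold
constant mirrors the 4 described above; helper names are hypothetical):

```python
# Sketch: record '<>' constants; sort and binary-search only when the
# set is large enough and a sort support function exists.
from bisect import bisect_left

LINEAR_MAX = 4   # mirrors the heuristic threshold described above

def build_ne_index(ne_consts, sortable=True):
    """Record constants seen with '<>'; sort only when it pays off."""
    if sortable and len(ne_consts) > LINEAR_MAX:
        return ('sorted', sorted(ne_consts))
    return ('linear', list(ne_consts))

def eq_contradicted(ne_index, eq_const):
    """Does x = eq_const contradict one of the recorded x <> c quals?"""
    kind, consts = ne_index
    if kind == 'sorted':
        i = bisect_left(consts, eq_const)
        return i < len(consts) and consts[i] == eq_const
    return eq_const in consts   # linear scan for small/unsortable sets
```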

5) NULL test
An IS NULL test obviously contradicts IS NOT NULL. It also contradicts
operators that implicitly require IS NOT NULL: if we find one of the
operators (>, >=, =, <, <=, <>) and the expression with that operator
is meaningful (e.g. no NULL appears in it), then it contradicts an
IS NULL test.
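A sketch of the rule, modeling each qual as a tuple (an assumed
representation, not the patch's data structures):

```python
# Sketch: an IS NULL test contradicts IS NOT NULL, and also any strict
# operator applied to a non-null constant (which implies IS NOT NULL).

STRICT_OPS = {'>', '>=', '=', '<', '<=', '<>'}

def nulltest_contradiction(quals):
    """quals: tuples like ('IS NULL',), ('IS NOT NULL',) or (op, const).
    Returns True when the conjunction can never be satisfied because of
    a NULL test."""
    has_is_null = ('IS NULL',) in quals
    implies_not_null = ('IS NOT NULL',) in quals or any(
        q[0] in STRICT_OPS and q[1] is not None
        for q in quals if len(q) == 2)
    return has_is_null and implies_not_null
```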

6) Boolean type
The boolean type is a little tricky, because it can appear in both
tests and operator expressions.
When we encounter one of the operators = or <>, or the expression is a
bare Var or a NOT expression over a bare Var, we extract the details
and normalize it to a test form (e.g. x = true becomes x IS TRUE;
x <> NOT false becomes x IS FALSE).

When a Var is tested with IS_TRUE, this contradicts IS_NOT_TRUE,
IS_FALSE and IS_UNKNOWN.
When a Var is tested with IS_FALSE, this contradicts IS_TRUE,
IS_NOT_FALSE and IS_UNKNOWN.
When a Var is tested with IS_UNKNOWN, this contradicts IS_NOT_UNKNOWN,
IS_TRUE and IS_FALSE.
      
When both a NULL test and a boolean test of the same Var appear:
if IS NULL is set, it contradicts IS TRUE, IS FALSE and IS NOT
UNKNOWN; if IS NOT NULL is set, it contradicts IS UNKNOWN.
      
Finally, we need to check carefully when a strategy operator appears.
If a boolean expression is written like x > true or x < false, it lies
outside the range of boolean values, so it obviously cannot be
satisfied.
If IS_UNKNOWN is set, it contradicts any strategy operator.
If IS_NOT_FALSE is set, the equivalent expression is (x IS NULL OR
x IS TRUE). The NULL branch always fails against a strategy operator,
so we only need to test whether the sub-expression (x IS TRUE)
contradicts the strategy operator. IS_NOT_TRUE is handled similarly.
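One way to see all of these boolean-test rules at once is to model each
test by the set of values of x that satisfy it; two tests contradict
exactly when their satisfying sets are disjoint. A sketch (the names
and representation are illustrative):

```python
# Sketch: each boolean/null test maps to the subset of {True, False,
# None} it accepts; disjoint sets mean the tests are contradictory.

SAT = {
    'IS_TRUE':        {True},
    'IS_NOT_TRUE':    {False, None},
    'IS_FALSE':       {False},
    'IS_NOT_FALSE':   {True, None},
    'IS_UNKNOWN':     {None},
    'IS_NOT_UNKNOWN': {True, False},
    'IS_NULL':        {None},
    'IS_NOT_NULL':    {True, False},
}

def booltests_contradict(t1, t2):
    return not (SAT[t1] & SAT[t2])
```

Note that x IS NOT TRUE and x IS NOT FALSE do not contradict: both are
satisfied by a NULL x, which is why IS_NOT_FALSE reduces to testing the
(x IS TRUE) branch against a strategy operator.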
      
7) Scalar array comparison expressions
First we deconstruct the const array and separate the null and
non-null elements.
If the ALL flag is set and the array contains a NULL, nothing can
match (e.g. x <= ALL(array[56, null])): it is contradictory.
Otherwise we sort the non-null elements and eliminate any duplicates;
we give up if we cannot find a sorting support function.
1. Expressions with operator <>
If the ALL flag is set we can examine the elements one by one, just as
for the ordinary expressions (e.g. x <> 2 AND x <> 3). If the ANY flag
is set and there is only one sorted element, we can convert the scalar
array expression to a binary operator expression; with more than one
element we cannot do anything further.

2. Expressions with one of the operators >, >=, <, <=
We just need to check the boundary element (e.g. x > ALL(array[1,2,3])
is equivalent to x > 3) and convert the expression to the equivalent
binary operator expression.

3. Expressions with operator =
We treat it as a binary operator expression if there is only one
sorted element. When there are more than one: if the ALL flag is set,
it is contradictory; if the ANY flag is set, we must postpone the
check until all expressions of the same kind have been collected
(e.g. for x in (1,3) and x in (1,5) we can extract the common value 1
once all such expressions are collected), and then examine the
intersection of the element sets.
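The whole set of scalar-array simplifications above can be sketched as
one dispatch function (a simplified model, not the patch's code;
use_or=True stands for ANY, False for ALL):

```python
# Sketch of the scalar-array simplifications: x op ANY/ALL(elems).

def simplify_saop(op, use_or, elems):
    """Returns ('contradiction',), ('binop', op, const),
    ('ne_list', vals), ('in_list', vals), or ('keep',)."""
    nonnull = [e for e in elems if e is not None]
    if not use_or and len(nonnull) < len(elems):
        return ('contradiction',)     # x op ALL(... NULL ...) yields nothing
    vals = sorted(set(nonnull))       # sort and eliminate duplicates
    if not vals:
        return ('contradiction',)     # only NULLs in the array
    if op in ('>', '>=', '<', '<='):
        lo, hi = vals[0], vals[-1]    # only the boundary element matters
        if op in ('>', '>='):
            bound = lo if use_or else hi   # ANY: weakest; ALL: strongest
        else:
            bound = hi if use_or else lo
        return ('binop', op, bound)
    if op == '=':
        if len(vals) == 1:
            return ('binop', '=', vals[0])
        # '= ALL' over distinct values can never hold; ANY is postponed
        return ('in_list', vals) if use_or else ('contradiction',)
    if op == '<>':
        if not use_or:
            return ('ne_list', vals)  # behaves like x<>v1 AND x<>v2 ...
        if len(vals) == 1:
            return ('binop', '<>', vals[0])
        return ('keep',)
    return ('keep',)
```

The postponed in_list results from case 3 can later be intersected,
e.g. sorted(set(a) & set(b)), which is what the phase-2 example in
section 11 relies on.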
      
8) Row comparison expressions
We need to convert the row comparison into a non-row form.
The operators <> and = are already handled in the analysis phase, so
we only need to take care of the operators <, <=, > and >=.
The basic rule for evaluating a row comparison expression is that the
row elements are compared left to right, stopping as soon as an
unequal or null pair of elements is found.
Ignoring for the moment any NULLs that may appear in the row
comparison, the logic of deconstructing it looks like this:
ROW(a1, a2, ..., an) >= ROW(b1, b2, ..., bn)
Normal form:
(a1 > b1)
OR (a1 = b1 AND a2 > b2)
OR (a1 = b1 AND a2 = b2 AND a3 > b3)
...
OR (a1 = b1 AND a2 = b2 AND ... AND an >= bn)
The only difference between the operators is that the strict operator
is used in the last sub-condition of every OR arm (e.g. a3 > b3 uses
the strict >), while the last sub-condition (an >= bn) of the last OR
arm uses the operator of the row comparison itself.
This gives two rules:
1. The operator '=' is used in every sub-condition of an OR arm except
the last one (the first arm has no equality sub-condition at all). So
we can record the equality sub-conditions in a list (I call it the
sub-condition list) and iterate over it to generate the sub-conditions
of subsequent OR arms as we meet each new pair of elements.
2. The strict operator is used in the last sub-condition of every OR
arm except the last one; the row comparison's own operator is used in
the last sub-condition of the last OR arm.

Now for the special cases.
1. NULL values.
If either element of a pair is NULL, we can skip the evaluation of the
subsequent elements; and if the NULL appears in the first position,
the whole expression yields nothing.
For example, ROW(a, NULL, c) > ROW(1, 2, 3) has the normal form:
a > 1
OR (a = 1 AND NULL > 2)
OR (a = 1 AND NULL = 2 AND c > 3)
An OR arm containing a NULL sub-condition can never be true, so the
final converted expression is a > 1.

2. Equal pairs of elements.
If they contain volatile functions, we give up and simply return.
If they are not Consts, equality implies the element IS NOT NULL, so
we add a NULL test to the sub-condition list used when generating the
subsequent OR arms. If they are Consts, we can ignore them and add
nothing to the list.
We skip generating the current OR arm; but if the equal pair appears
in the last position we must be careful: if the operator of the row
comparison is strict, we skip generating the last OR arm, whereas if
it is non-strict we must still generate it.
For example, ROW(a,b,c) > ROW(1,b,3) has the normal form
a > 1
OR (a = 1 AND b > b)
OR (a = 1 AND b = b AND c > 3)
and the final expression is
a > 1 OR (a = 1 AND b IS NOT NULL AND c > 3)

ROW(a,2,c) > ROW(1,2,3) has the normal form
a > 1
OR (a = 1 AND 2 > 2)
OR (a = 1 AND 2 = 2 AND c > 3)
and the final expression is
a > 1 OR (a = 1 AND c > 3)

ROW(a,b,c) > ROW(1,2,c) has the normal form
a > 1
OR (a = 1 AND b > 2)
OR (a = 1 AND b = 2 AND c > c)
and the final expression is
a > 1 OR (a = 1 AND b > 2)

ROW(a,b,c) >= ROW(1,2,c) has the normal form
a > 1
OR (a = 1 AND b > 2)
OR (a = 1 AND b = 2 AND c >= c)
and the final expression is
a > 1 OR (a = 1 AND b > 2) OR (a = 1 AND b = 2 AND c IS NOT NULL)
If cases 1 and 2 both appear in a row expression: we skip all
subsequent pairs of elements once we meet the NULL, so only the
ordering with case 2 in front of case 1 needs to be considered.
A special situation is when all pairs of elements are equal before we
meet the NULL: by the rule of case 2 we generate no OR arm for an
equal pair, so the converted expression is empty, and we need a flag
recording the result of evaluating the last pair of elements; the
result of the row expression then follows that flag.

3. Pairs of Const elements that are not equal.
Looking at the normal form above, there is no need to generate OR arms
for the subsequent pairs of elements once we meet a pair of Consts
that are not equal. Whether the current OR arm needs to be generated
depends on the result of evaluating the current pair: if the result is
true we generate the OR arm, otherwise we do nothing.
If the pair appears in the first position, the result of the whole row
comparison follows the result of that pair: if it is true, the row
comparison is true.
Another situation is when all pairs of elements are equal before we
meet the unequal pair; this is similar to what we discussed in case 2.

In the end we obtain an OR expression. We can then run the
contradiction checks and prune it; we keep the output only when it
reduces to a simple expression, since the OR form is no better than
the original row expression for subsequent processing. To inspect the
generated OR expression, define the macro DEBUG_ROW_DECODE when
configuring the build.
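The expansion rules above (an equality-prefix list, the strict operator
in each arm, the row operator in the last arm) can be sketched for the
NULL-free case as follows (column names and helper are illustrative):

```python
# Sketch: expand ROW(a1..an) cmp ROW(b1..bn) into its OR-of-ANDs
# normal form; each OR arm is a list of sub-condition strings.

def expand_row_cmp(lhs, rhs, op):
    """lhs, rhs: lists of expression names; op: '>', '>=', '<' or '<='."""
    strict = op.rstrip('=')           # '>=' -> '>', '<=' -> '<'
    arms = []
    eq_prefix = []                    # the "sub-condition list" of equalities
    n = len(lhs)
    for i in range(n):
        last = (i == n - 1)
        this_op = op if last else strict   # row op only in the last arm
        arms.append(eq_prefix + [f"{lhs[i]} {this_op} {rhs[i]}"])
        eq_prefix = eq_prefix + [f"{lhs[i]} = {rhs[i]}"]
    return arms
```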

9) OR expressions
When we meet an OR expression we postpone it, because the other
expressions at the same semantic level can be pushed down into it;
then each sub-expression can be checked together with the pushed-down
details.
We abandon a sub-expression if we find that it is contradictory.
If all of the sub-expressions are contradictory, the whole OR
expression is contradictory.
A special situation: once we find a sub-expression that is constant
true, the OR expression is constant true.
For example, in x > 10 AND (x < 9 OR y > 10) we push x > 10 down into
the OR expression and find that the sub-expression x < 9 contradicts
x > 10, so x < 9 is abandoned and the final expression is
x > 10 AND y > 10.
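A sketch of the push-down-and-prune step for range quals, with quals
modeled as (var, op, const) triples (hypothetical helpers, not the
patch's code):

```python
# Sketch: push outer range quals into an OR expression and drop the
# arms they contradict; an empty result means the OR is contradictory.

def range_conflict(q1, q2):
    """True when two range quals on the same Var cannot both hold."""
    (v1, op1, c1), (v2, op2, c2) = q1, q2
    if v1 != v2:
        return False
    if op1 in ('<', '<='):            # orient: q1 lower bound, q2 upper
        (op1, c1), (op2, c2) = (op2, c2), (op1, c1)
    if op1 not in ('>', '>=') or op2 not in ('<', '<='):
        return False
    return c1 > c2 or (c1 == c2 and not (op1 == '>=' and op2 == '<='))

def prune_or(pushed, arms):
    """pushed: quals ANDed with the OR; arms: list of per-arm qual lists.
    Keep only the arms that survive every pushed-down qual."""
    return [arm for arm in arms
            if not any(range_conflict(p, q) for p in pushed for q in arm)]
```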

10) The EC situation
After all these jobs are finished, we may end up with a clause that
matches the equivalence-class (EC) machinery. For example, in
x > 10 AND (x < 9 OR y = 10) we are left with the expression y = 10,
which matches an EC, so we need to add it to the EC.

11) The phase-2 check
As case 10 shows, once we add an EC we may get a contradictory case
after generate_base_implied_equalities is invoked. So we need to
recheck a rel whenever clauses are added to its baserestrictinfo.
For example: select * from self_a x join self_a y on x.a = y.a where
x.a in (2,3,45) and x.a in (45, 78) and y.a <> 45.
We obtain the intersection element 45, convert (x.a in (2,3,45) and
x.a in (45, 78)) to x.a = 45, and add it to the EC. After
generate_base_implied_equalities is invoked we get the generated
clause y.a = 45, which contradicts y.a <> 45, so the alias y is
self-contradictory.

Best regards

Attachments:

patch.diff (application/octet-stream, +2711 −5)
regresssion_patch.diff (application/octet-stream, +4228 −111)
#2 Robert Haas
robertmhaas@gmail.com
In reply to: bigbro_wq@hotmail.com (#1)
Re: Self contradictory examining on rel's baserestrictinfo

On Mon, Nov 25, 2024 at 3:58 AM ro b <bigbro_wq@hotmail.com> wrote:

1. Background
A few months ago, while reading the B-tree source code in the routine
_bt_preprocess_keys, I found that there are more contradiction-checking
cases we could add. I sent an email to pgsql-hackers, and a community
contributor replied that someone had already raised this question.
Thanks for taking the time to address my question. After several
conversations, I found that we can do more: we can perform these checks
at planning time.

When you're referring to things that happened in the past, you should
provide links to specific messages and names of specific contributors.
It will be difficult for anyone to find the previous discussion based
on your description of "a few months ago" and a "community
contributor".

I'm a little confused because it seems like you think we don't do any
of this kind of thing already. But:

robert.haas=# explain select * from foo where a < 1 and a > 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a < 1 and a = 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a <> 1 and a = 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a in (1,2,3) and a is null;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)

There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.

7) Scalar array comparison expressions
First we deconstruct the const array and separate the null and
non-null elements.
If the ALL flag is set and the array contains a NULL, nothing can
match (e.g. x <= ALL(array[56, null])): it is contradictory.

True, but that already seems to be working:

robert.haas=# explain select * from foo where a <= all(array[56, null]);
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)

I'm not saying there is no room for improvement here, but I think you
will need to (1) be more clear about exactly which cases we are
already handling vs. which ones you want to handle, (2) possibly split
the patch into smaller patches each of which handles one specific case
instead of bundling many improvements together, and (3) improve the
comments and commit message in the patch(es).

--
Robert Haas
EDB: http://www.enterprisedb.com

#3 Peter Geoghegan
pg@bowt.ie
In reply to: Robert Haas (#2)
Re: Self contradictory examining on rel's baserestrictinfo

On Mon, Nov 25, 2024 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote:

There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.

Actually, we do use the more restrictive operator with cases like "a>1
and a>2" -- but only in contexts that happen to involve a B-Tree index
scan, where _bt_preprocess_keys gets called. So it's a bit hit or
miss.

Tom has in the past expressed an interest in moving the stuff in
_bt_preprocess_keys into the planner:

/messages/by-id/2587523.1647982549@sss.pgh.pa.us

Note that almost nothing in _bt_preprocess_keys is particularly
related to nbtree itself, except perhaps for the stuff that marks scan
keys required. The fact that "WHERE a IN (1, 2, 3) AND a < 2" can
actually be simplified to "WHERE a = 1" has exactly nothing to do with
B-Tree index scans, but we'll only do this simplification in the
context of a B-Tree index scan. There is also some redundancy between
_bt_preprocess_keys and the planner, which is less than ideal.

--
Peter Geoghegan

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#3)
Re: Self contradictory examining on rel's baserestrictinfo

Peter Geoghegan <pg@bowt.ie> writes:

On Mon, Nov 25, 2024 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote:

There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.

Actually, we do use the more restrictive operator with cases like "a>1
and a>2" -- but only in contexts that happen to involve a B-Tree index
scan, where _bt_preprocess_keys gets called. So it's a bit hit or
miss.

Worth noting also is that _bt_preprocess_keys can detect cases that
the planner cannot because the relevant values are not constants.

I'm a little skeptical that we should expend a lot more effort on
the sorts of cases discussed here. Basically, this sort of patch
improves matters for people who write crummy queries while penalizing
everybody else. We need to be very careful about adding cycles to
planner runtime in pursuit of optimizations that are only rarely
successful.

I'm not trying to say that we should reject all of this out-of-hand,
but I want to see some analysis of how much each check costs and
how likely it is to win in real-world queries.

BTW, it's also worth pointing out the constraint_exclusion GUC.
If that's turned up to "on" from its default setting of "partition",
it authorizes the planner to spend more effort looking for
contradictory quals. It might be reasonable to gate some of these
checks with that. (I also can't help wondering if any of them are
already implemented but are hidden behind that. I think Robert
must have had constraint_exclusion = on for his examples a
couple messages back.)

regards, tom lane

#5 Peter Geoghegan
pg@bowt.ie
In reply to: Tom Lane (#4)
Re: Self contradictory examining on rel's baserestrictinfo

On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm a little skeptical that we should expend a lot more effort on
the sorts of cases discussed here. Basically, this sort of patch
improves matters for people who write crummy queries while penalizing
everybody else.

I think that it's more complicated than that. Rather than explain what
I mean in general terms, I'll give you a specific example of the kind
of thing that seems quite interesting to me:

It would be fairly easy to teach nbtree about a new kind of
ScalarArrayOp that worked just like a conventional SAOP, but also
returned tuples matching "IS NULL" (IS NULL uses the equals strategy
internally already, so it'd be almost the same as treating NULL as
just another array element). This could have significant advantages
over what's even possible right now, particularly in cases involving
ORDER BY ... LIMIT.

I suppose that we'd have to invent some kind of new syntax for this.
But wouldn't it also make sense if it worked with "WHERE a IN (1, 2)
OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"? Theoretically
this would still be a case that amounted to improving matters for
badly written queries...but not really (we can hardly expect many
users to adopt our esoteric non-standard syntax).

In fact, you could make a similar argument for rewriting IN() into an
"= ANY()" SAOP (in the way that we always have).

We need to be very careful about adding cycles to
planner runtime in pursuit of optimizations that are only rarely
successful.

I agree.

--
Peter Geoghegan

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#5)
Re: Self contradictory examining on rel's baserestrictinfo

Peter Geoghegan <pg@bowt.ie> writes:

It would be fairly easy to teach nbtree about a new kind of
ScalarArrayOp that worked just like a conventional SAOP, but also
returned tuples matching "IS NULL" (IS NULL uses the equals strategy
internally already, so it'd be almost the same as treating NULL as
just another array element). This could have significant advantages
over what's even possible right now, particularly in cases involving
ORDER BY ... LIMIT.

I suppose that we'd have to invent some kind of new syntax for this.
But wouldn't it also make sense if it worked with "WHERE a IN (1, 2)
OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"?

I'd be a strong -1 for inventing new syntax for that. However, if
that's actually a common query pattern (I'm not convinced of it)
we could certainly build something to recognize that usage and
transform it into a suitable executable structure.

However, I don't see the connection between that and the current
thread. That pattern is not self-contradictory. My doubts
about its usefulness have more to do with it being evidence of
the SQL anti-pattern of treating NULL as a normal data value.
But once you've made that choice in your data representation,
you're going to end up with queries like this, and there's
nothing you could do to write them "better".

regards, tom lane

#7 Peter Geoghegan
pg@bowt.ie
In reply to: Tom Lane (#6)
Re: Self contradictory examining on rel's baserestrictinfo

On Mon, Nov 25, 2024 at 6:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Geoghegan <pg@bowt.ie> writes:

I suppose that we'd have to invent some kind of new syntax for this.
But wouldn't it also make sense if it worked with "WHERE a IN (1, 2)
OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"?

I'd be a strong -1 for inventing new syntax for that. However, if
that's actually a common query pattern (I'm not convinced of it)
we could certainly build something to recognize that usage and
transform it into a suitable executable structure.

My basic point is this: SQL is supposed to be declarative -- in theory
it isn't supposed to matter how you write the query.

I don't see any hard distinction between the sorts of transformations
that the OP is talking about, and what you're describing here. There's
quite a lot of gray area, at least.

However, I don't see the connection between that and the current
thread. That pattern is not self-contradictory. My doubts
about its usefulness have more to do with it being evidence of
the SQL anti-pattern of treating NULL as a normal data value.

It's just an example, chosen to be easy to explain. I guess you didn't
like that example, so I'll go with another:

What about the "general OR optimization" stuff described by the MDAM
paper? The redundant and contradictory qual stuff is needed there, to
make sure that the final index scan access path (consisting of a
series of disjunctive index scans in key space order) will reliably
avoid returning duplicate rows.

(FWIW I think that the OP took some cues from the MDAM paper, since
they talked about doing things like transforming SQL Standard row
value constructor expressions into disjuncts.)

I don't have a concrete proposal here, or anything like it. I just
want to express that I believe that there's a lack of joined-up
thinking about nbtree preprocessing and the optimizer (not for the
first time). We *are* missing a few interesting tricks here.

--
Peter Geoghegan

#8 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: Self contradictory examining on rel's baserestrictinfo

On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, it's also worth pointing out the constraint_exclusion GUC.
If that's turned up to "on" from its default setting of "partition",
it authorizes the planner to spend more effort looking for
contradictory quals. It might be reasonable to gate some of these
checks with that. (I also can't help wondering if any of them are
already implemented but are hidden behind that. I think Robert
must have had constraint_exclusion = on for his examples a
couple messages back.)

I definitely didn't change the value of that setting. It's possible I
was running on an older branch, but I don't think it was anything
ancient.

--
Robert Haas
EDB: http://www.enterprisedb.com

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#8)
Re: Self contradictory examining on rel's baserestrictinfo

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

... I think Robert
must have had constraint_exclusion = on for his examples a
couple messages back.)

I definitely didn't change the value of that setting. It's possible I
was running on an older branch, but I don't think it was anything
ancient.

Hmm, constraint_exclusion has defaulted to "partition" for many
years now. But when I tried to reproduce your examples at [1]:

regression=# create table foo (a int);
CREATE TABLE
regression=# explain select * from foo where a < 1 and a > 1;
QUERY PLAN
-----------------------------------------------------
Seq Scan on foo (cost=0.00..48.25 rows=13 width=4)
Filter: ((a < 1) AND (a > 1))
(2 rows)

regression=# show constraint_exclusion;
constraint_exclusion
----------------------
partition
(1 row)

regression=# set constraint_exclusion = on;
SET
regression=# explain select * from foo where a < 1 and a > 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)

and similarly for some of the other examples (I didn't try
every one). Maybe your test table was partitioned??

regards, tom lane

[1]: /messages/by-id/CA+TgmoZqiCwHbZczXXLjucfuHi=7EahSyzEj5yrqYKMQ0QOL9Q@mail.gmail.com

#10 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#9)
Re: Self contradictory examining on rel's baserestrictinfo

On Tue, Nov 26, 2024 at 12:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe your test table was partitioned??

Ah, yes, it was.

--
Robert Haas
EDB: http://www.enterprisedb.com

#11 bigbro_wq@hotmail.com
bigbro_wq@hotmail.com
In reply to: Robert Haas (#2)
Re: Self contradictory examining on rel's baserestrictinfo

Thank you for your advice and guidance.
I didn't know the constraint_exclusion switch existed.
Following your advice, I need to make what I am doing clearer.

There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.

The patch I committed not just tests contradictions but also does the
simplification. The simplification is limited in the BTREE.
Could you interpret the case in a little more detail.

Best regards


#12 Robert Haas
robertmhaas@gmail.com
In reply to: bigbro_wq@hotmail.com (#11)
Re: Self contradictory examining on rel's baserestrictinfo

On Wed, Nov 27, 2024 at 9:30 AM ro b <bigbro_wq@hotmail.com> wrote:

The patch I committed not just tests contradictions but also does the
simplification. The simplification is limited in the BTREE.
Could you interpret the case in a little more detail.

I am not able to understand this paragraph.

Regretfully,

--
Robert Haas
EDB: http://www.enterprisedb.com

#13 bigbro_wq@hotmail.com
bigbro_wq@hotmail.com
In reply to: Robert Haas (#12)
Re: Self contradictory examining on rel's baserestrictinfo

I mean, why can't we draw conclusions
in cases like a>1 and a>2, which can be simplified
to a>2 if the operator > belongs to a btree opclass?

Best regards


#14 Robert Haas
robertmhaas@gmail.com
In reply to: bigbro_wq@hotmail.com (#13)
Re: Self contradictory examining on rel's baserestrictinfo

On Wed, Nov 27, 2024 at 10:28 AM ro b <bigbro_wq@hotmail.com> wrote:

I mean: why can't we draw conclusions in a case
like a > 1 AND a > 2, which can be simplified
to a > 2 if the operator > is in a btree opclass?

This question has already been discussed in some detail on this email
thread. First, we sometimes do detect it, as discussed by Peter.
Second, it is an unusual case and so not worth spending a lot of CPU
cycles to detect, as discussed by Tom.

We might still accept a patch to improve things in this area. For
example, if somebody could find a way to change the code so that we
are able to detect more cases without needing to spend more CPU
cycles, I think everyone would like that. However, that may be a
tricky goal to accomplish.

--
Robert Haas
EDB: http://www.enterprisedb.com

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#14)
Re: Self contradictory examining on rel's baserestrictinfo

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Nov 27, 2024 at 10:28 AM ro b <bigbro_wq@hotmail.com> wrote:

I mean: why can't we draw conclusions in a case
like a > 1 AND a > 2, which can be simplified
to a > 2 if the operator > is in a btree opclass?

This question has already been discussed in some detail on this email
thread. First, we sometimes do detect it, as discussed by Peter.
Second, it is an unusual case and so not worth spending a lot of CPU
cycles to detect, as discussed by Tom.

We might still accept a patch to improve things in this area. For
example, if somebody could find a way to change the code so that we
are able to detect more cases without needing to spend more CPU
cycles, I think everyone would like that. However, that may be a
tricky goal to accomplish.

This discussion trailed off without any final resolution about what
to do with the patch submission. I think we owe the submitter at
least a minimal actual review, so ...

I took a quick look at the "self_contradictory.sql" regression test
included in the patch, with an eye to seeing how much of it we pass
already when constraint_exclusion is set to "on". The answer is
"a fair amount", in particular we successfully detect many of the
cases where the clauses can be proved contradictory. The ones
where we fail to do so are mostly cases like

 explain (costs off)
 SELECT * FROM self_a WHERE z IS UNKNOWN AND z IS NOT UNKNOWN;
-        QUERY PLAN        
---------------------------
- Result
-   One-Time Filter: false
+                    QUERY PLAN                     
+---------------------------------------------------
+ Seq Scan on self_a
+   Filter: ((z IS UNKNOWN) AND (z IS NOT UNKNOWN))
 (2 rows)

This is not a fundamental shortcoming in the system, it's just
that optimizer/util/predtest.c has not been built out very far
with respect to BooleanTest nodes. We could certainly do that,
and it wouldn't add much runtime for queries that contain no
BooleanTest conditions, but I have to question whether it's
worth the trouble. Is a query like the above really common?
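
[Editor's note: for illustration only, the BooleanTest refutation being discussed is simple in principle. This is a hypothetical sketch in Python, not predtest.c's actual representation: model each BooleanTest as the set of three-valued truth states it accepts, and an AND-ed list is contradictory exactly when the intersection of those sets is empty.]

```python
# Three-valued states: True, False, and None standing in for SQL UNKNOWN (null).
# Each BooleanTest accepts a subset of those states.
ACCEPTS = {
    'IS TRUE':        {True},
    'IS FALSE':       {False},
    'IS UNKNOWN':     {None},
    'IS NOT TRUE':    {False, None},
    'IS NOT FALSE':   {True, None},
    'IS NOT UNKNOWN': {True, False},
}

def contradictory(tests):
    """AND-ed BooleanTests on one Var are contradictory iff no state
    satisfies all of them, i.e. the accepted-state sets intersect to empty."""
    states = {True, False, None}
    for t in tests:
        states &= ACCEPTS[t]
    return not states
```

With this representation, `z IS UNKNOWN AND z IS NOT UNKNOWN` reduces to an empty intersection, while `z IS NOT TRUE AND z IS NOT FALSE` does not (UNKNOWN satisfies both).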

There are also a bunch of cases that are not contradictory
but that the patch manages to simplify, for example

 explain (costs off)
 SELECT * FROM self_a WHERE a <= 10 AND a = 10;
-     QUERY PLAN     
---------------------
+             QUERY PLAN             
+------------------------------------
  Seq Scan on self_a
-   Filter: (a = 10)
+   Filter: ((a <= 10) AND (a = 10))
 (2 rows)

The constraint_exclusion framework can't make this sort of
transformation because it is only focused on proving a relation's
quals to be contradictory or not. So this part is new work ...
but again I have to wonder if this query pattern is common enough
to be worth spending planner cycles to check for.
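
[Editor's note: to illustrate the kind of per-column reduction under discussion, here is a hypothetical sketch, not code from the patch or from PostgreSQL. Each btree strategy on a Var-op-Const clause is treated as a bound; only the strongest bound of each kind is kept, and a contradiction is reported when the bounds cannot all hold.]

```python
def simplify(preds):
    """preds: list of (op, const) with op in {'>', '>=', '<', '<=', '='},
    all on the same column. Returns the reduced predicate list, or None
    when the AND-ed list is self-contradictory (rel could be marked dummy)."""
    lower = upper = eq = None            # strongest bounds seen so far
    for op, c in preds:
        if op == '=':
            if eq is not None and eq != c:
                return None              # x = 1 AND x = 2
            eq = c
        elif op in ('>', '>='):
            # Larger constant wins; at equal constants, strict beats non-strict.
            cand = (c, op == '>')
            if lower is None or cand > lower:
                lower = cand
        else:                            # '<' or '<='
            cand = (c, op == '<')
            if upper is None or (cand[0], not cand[1]) < (upper[0], not upper[1]):
                upper = cand
    if eq is not None:
        # x = k must satisfy each surviving bound; then '=' subsumes them,
        # which is how "a <= 10 AND a = 10" collapses to "a = 10".
        if lower and (eq < lower[0] or (eq == lower[0] and lower[1])):
            return None                  # e.g. x = 3 AND x > 3
        if upper and (eq > upper[0] or (eq == upper[0] and upper[1])):
            return None
        return [('=', eq)]
    if lower and upper:
        (lc, ls), (uc, us) = lower, upper
        if lc > uc or (lc == uc and (ls or us)):
            return None                  # e.g. x > 5 AND x < 5
    out = []
    if lower:
        out.append(('>' if lower[1] else '>=', lower[0]))
    if upper:
        out.append(('<' if upper[1] else '<=', upper[0]))
    return out
```

The sketch assumes a single opclass and no type coercion, which matches the patch's stated scope; the expensive part in a real planner is not this arithmetic but grouping clauses by variable in the first place, which is exactly why integrating with indxpath.c's existing clause collection is attractive.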

As Peter mentioned, we do actually have logic that performs the
equivalent transformation within btree indexqual setup. However, the
context is pretty different: first, the work of identifying clauses
that bear on the same variable has already been done, and second we
really have to do at least some of this work on the way to identifying
the appropriate initial search condition for btree descent. It's not
apparent to me that it's worth having a whole new planner pass that
is searching for clauses involving the same variable. That is not
especially cheap if there are a lot of clauses. If we were to pursue
this, I'd want to think about somehow integrating it with the work
that indxpath.c already does to collect index-relevant conditions.

So in short, my main review comment is that you should be looking
to integrate with existing planner functionality such as predtest.c
and indxpath.c, rather than writing a few thousand lines of
not-connected-to-anything-else code. It's too hard to make the
case that these optimizations are worth the cost if they require
an entire separate analysis pass.

Some other comments:

* You need to work harder on the comments in whatever you write.
contradictory.c is drastically undercommented, and what there is
is frequently unhelpful, like the obviously copied-and-pasted
file header. You also did things like adding entirely-undocumented
fields to common structures such as RestrictInfo. This certainly
wouldn't be committable as-is; but it also makes the patch painful
to review in any detail, since the reviewer must reverse-engineer
a lot of information that should have been provided by comments.

* The side-effects on existing regression test cases are pretty
concerning: it's not clear that those tests still exercise what
they were meant to. (You can be sure that a test case intentionally
written with redundant conditions was not written that way just to
provide a case you could optimize later.) We'd have to analyze
all those test cases in detail and figure out whether the
simplification is OK or we need to provide some way to defeat it.

I'm going to mark this submission Returned With Feedback, because
I don't foresee anything very close to this getting committed.
But I'd encourage you to pursue the ideas in other forms as
suggested by this discussion.

regards, tom lane

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#15)
Re: Self contradictory examining on rel's baserestrictinfo

I wrote:

This is not a fundamental shortcoming in the system, it's just
that optimizer/util/predtest.c has not been built out very far
with respect to BooleanTest nodes.

Oh ... I'd momentarily forgotten that there's already a patch
in the queue to attack that:

https://commitfest.postgresql.org/51/4690/

That's been stalled for lack of review, so if you were interested
in helping out, that'd be great.

regards, tom lane