Replace IN VALUES with ANY in WHERE clauses during optimization

Started by Ivan Kushover 1 year ago39 messageshackers

ivan.kush@tantorlabs.com

over 1 year ago

Hello, hackers! I with my friends propose the patch to replace IN VALUES
to ANY in WHERE clauses.

# Intro

The `VALUES` in the `IN VALUES` construct is replaced with with an array
of values when `VALUES` contains 1 column. In the end it will be
replaced with ANY by the existing function makeA_Expr
(src/backend/nodes/makefuncs.c)

This improves performance, especially if the values are small.

# Patch

v1-in_values_to_array.patch

# How realized

`VALUES` statement corresponds to `values_clause` nonterminal symbol in
gram.y, where it's parsed to `SelectStmt` node.

`IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1
column, parser extracts data from `SelectStmt` and passes it

to function call `makeSimpleA_Expr` where simple `A_Expr` is created.

Later during optimizations of parser tree this `A_Expr` will be
transformed to `ArrayExpr` (already realized in Postgres)

# Authors.
Author: Ivan Kush <ivan.kush@tantorlabs.com>
Author: Vadim Yacenko <vadim.yacenko@tantorlabs.com>
Author: Alexander Simonov <alexander.simonov@tantorlabs.com>

# Tests
Implementation contains many regression tests of varying complexity,
which check supported features.

# Platform
This patch was checkouted from tag REL_17_STABLE. Code is developed in
Linux, doesn't contain platfrom-specific code, only Postgres internal
data structures and functions.

# Documentation
Regression tests contain many examples

# Performance
It increases performance

# Example
Let's compare result. With path the execution time is significantly lower.

We have a table table1 with 10000 rows.

Let's execute several commands
see commands.sql

Plan no patch
see plan_no_patch.txt

Plan with patch
see plan_with_patch.txt

--
Best wishes,
Ivan Kush
Tantor Labs LLC

Import Notes

Reply to msg id not found: 30853de3-5e98-4f8a-bc1a-a3d794cde74b@tantorlabs.comReference msg id not found: 30853de3-5e98-4f8a-bc1a-a3d794cde74b@tantorlabs.com

Alena Rybakina

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Ivan Kush (#1)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Hi!

On 03.10.2024 22:52, Ivan Kush wrote:

Hello, hackers! I with my friends propose the patch to replace IN
VALUES to ANY in WHERE clauses.

# Intro

The `VALUES` in the `IN VALUES` construct is replaced with with an
array of values when `VALUES` contains 1 column. In the end it will be
replaced with ANY by the existing function makeA_Expr
(src/backend/nodes/makefuncs.c)

This improves performance, especially if the values are small.

# Patch

v1-in_values_to_array.patch

# How realized

`VALUES` statement corresponds to `values_clause` nonterminal symbol
in gram.y, where it's parsed to `SelectStmt` node.

`IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1
column, parser extracts data from `SelectStmt` and passes it

to function call `makeSimpleA_Expr` where simple `A_Expr` is created.

Later during optimizations of parser tree this `A_Expr` will be
transformed to `ArrayExpr` (already realized in Postgres)

# Authors.
Author: Ivan Kush <ivan.kush@tantorlabs.com>
Author: Vadim Yacenko <vadim.yacenko@tantorlabs.com>
Author: Alexander Simonov <alexander.simonov@tantorlabs.com>

# Tests
Implementation contains many regression tests of varying complexity,
which check supported features.

# Platform
This patch was checkouted from tag REL_17_STABLE. Code is developed in
Linux, doesn't contain platfrom-specific code, only Postgres internal
data structures and functions.

# Documentation
Regression tests contain many examples

# Performance
It increases performance

# Example
Let's compare result. With path the execution time is significantly
lower.

We have a table table1 with 10000 rows.

postgres=# \d table1;
                         Table "public.table1"
Column |            Type             | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
fld1   | timestamp without time zone |           | not null |
fld2   | bytea                       |           | not null |
Indexes:
    "table1index" btree (fld2)

Let's execute several commands
see commands.sql

Plan no patch
see plan_no_patch.txt

Plan with patch
see plan_with_patch.txt

I think you should think about putting these constants in ANY Array

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
select * from t
    where x in (VALUES(1200), (1));
                    QUERY PLAN
---------------------------------------------------
Seq Scan on t (actual rows=1 loops=1)
   Filter: (x = ANY ('{1200,1}'::integer[]))
(3 rows)

Anlrey Lepikhov and I recently described this in an article [0]https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665 here and
the implementation already exists, but for now it was posted a binary
application for testing. The acceleration is significant I agree.

[0]: https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665

--
Regards,
Alena Rybakina
Postgres Professional

Laurenz Albe

laurenz.albe@cybertec.at

over 1 year ago

In reply to: Alena Rybakina (#2)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote:

On 03.10.2024 22:52, Ivan Kush wrote:

Hello, hackers! I with my friends propose the patch to replace IN
VALUES to ANY in WHERE clauses.

# Intro

The `VALUES` in the `IN VALUES` construct is replaced with with an
array of values when `VALUES` contains 1 column. In the end it will be
replaced with ANY by the existing function makeA_Expr
(src/backend/nodes/makefuncs.c)

This improves performance, especially if the values are small.

Anlrey Lepikhov and I recently described this in an article [0] here and
the implementation already exists, but for now it was posted a binary
application for testing. The acceleration is significant I agree.

[0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665

I believe that the speed improvement is significant, but who writes a
query like

... WHERE col IN (VALUES (1), (2), (3))

when they could write the much shorter

... WHERE col IN (1, 2, 3)

which is already converted to "= ANY"?

I wonder if it is worth the extra planning time to detect and improve
such queries.

Yours,
Laurenz Albe

Tom Lane

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Laurenz Albe (#3)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Laurenz Albe <laurenz.albe@cybertec.at> writes:

I wonder if it is worth the extra planning time to detect and improve
such queries.

I'm skeptical too. I'm *very* skeptical of implementing it in the
grammar as shown here --- I'd go so far as to say that that approach
cannot be accepted. That's far too early, and it risks all sorts
of problems. An example is that the code as given seems to assume
that all the sublists are the same length ... but we haven't checked
that yet. I also suspect that this does not behave the same as the
original construct for purposes like resolving dissimilar types in
the VALUES list. (In an ideal world, perhaps it'd behave the same,
but that ship sailed a couple decades ago.)

regards, tom lane

Ivan Kush

ivan.kush@tantorlabs.com

over 1 year ago

In reply to: Alena Rybakina (#2)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Do you mean, that I should try to execute such command?

In this patch it gives ANY

postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
select * from table1
where fld2 in
(VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea),
('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte
a));
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------

Bitmap Heap Scan on table1 (actual rows=0 loops=1)
Recheck Cond: (fld2 = ANY
('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303
45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[]))

-> Bitmap Index Scan on table1index (actual rows=0 loops=1)
Index Cond: (fld2 = ANY
('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3
030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[]))

(4 rows)

Do you plan to send your implementation to the hackers?

On 10/3/24 23:10, Alena Rybakina wrote:

I think you should think about putting these constants in ANY Array

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
select * from t
    where x in (VALUES(1200), (1));
                    QUERY PLAN
---------------------------------------------------
Seq Scan on t (actual rows=1 loops=1)
   Filter: (x = ANY ('{1200,1}'::integer[]))
(3 rows)

--
Best wishes,
Ivan Kush
Tantor Labs LLC

Ivan Kush

ivan.kush@tantorlabs.com

over 1 year ago

In reply to: Laurenz Albe (#3)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Some ORMs or proprietary software may write it mistakenly. In these
cases this idea may be helpful.

This patch contains GUC to enable/disable this optimization

On 10/3/24 23:19, Laurenz Albe wrote:

On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote:

On 03.10.2024 22:52, Ivan Kush wrote:

Hello, hackers! I with my friends propose the patch to replace IN
VALUES to ANY in WHERE clauses.

# Intro

The `VALUES` in the `IN VALUES` construct is replaced with with an
array of values when `VALUES` contains 1 column. In the end it will be
replaced with ANY by the existing function makeA_Expr
(src/backend/nodes/makefuncs.c)

This improves performance, especially if the values are small.

Anlrey Lepikhov and I recently described this in an article [0] here and
the implementation already exists, but for now it was posted a binary
application for testing. The acceleration is significant I agree.

[0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665

I believe that the speed improvement is significant, but who writes a
query like

... WHERE col IN (VALUES (1), (2), (3))

when they could write the much shorter

... WHERE col IN (1, 2, 3)

which is already converted to "= ANY"?

I wonder if it is worth the extra planning time to detect and improve
such queries.

Yours,
Laurenz Albe

--
Best wishes,
Ivan Kush
Tantor Labs LLC

Andrei Lepikhov

lepihov@gmail.com

over 1 year ago

In reply to: Tom Lane (#4)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

On 10/4/24 04:08, Tom Lane wrote:

Laurenz Albe <laurenz.albe@cybertec.at> writes:

I wonder if it is worth the extra planning time to detect and improve
such queries.

I'm skeptical too. I'm *very* skeptical of implementing it in the
grammar as shown here --- I'd go so far as to say that that approach
cannot be accepted. That's far too early, and it risks all sorts
of problems. An example is that the code as given seems to assume
that all the sublists are the same length ... but we haven't checked
that yet. I also suspect that this does not behave the same as the
original construct for purposes like resolving dissimilar types in
the VALUES list. (In an ideal world, perhaps it'd behave the same,
but that ship sailed a couple decades ago.)

We also have an implementation of VALUES -> ARRAY transformation.
Because enterprises must deal with users' problems, many of these users
employ automatically generated queries.
Being informed very well of the consensus about that stuff, we've
designed it as a library. But, looking into the code now, I see that it
only needs a few cycles if no one 'x IN VALUES' expression is presented
in the query. Who knows? It may be OK for the core.
So, I've rewritten the code into the patch - see it in the attachment.

The idea is quite simple - at the same place as
convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
RTE and perform the transformation if it's convertible.

--
regards, Andrei Lepikhov

Alena Rybakina

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Ivan Kush (#5)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

On 04.10.2024 11:43, Ivan Kush wrote:

Do you mean, that I should try to execute such command?

In this patch it gives ANY

postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
select * from table1
   where fld2 in
(VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea),
('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte
a));
                                                                                                                             QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------

Bitmap Heap Scan on table1 (actual rows=0 loops=1)
  Recheck Cond: (fld2 = ANY
('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303
45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[]))

  -> Bitmap Index Scan on table1index (actual rows=0 loops=1)
        Index Cond: (fld2 = ANY
('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3
030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[]))

(4 rows)

Yes I meant it.

Do you plan to send your implementation to the hackers?

It was sent here [0]/messages/by-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d@gmail.com.

[0]: /messages/by-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d@gmail.com
/messages/by-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d@gmail.com

--
Regards,
Alena Rybakina
Postgres Professional

Ivan Kush

ivan.kush@tantorlabs.com

over 1 year ago

In reply to: Alena Rybakina (#8)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

I agree, your realization is better: reliability is better and debugging
is simplier.
I've looked at the code, looks good to me. Only style notes like
VTA/VtA, SELECT/select, etc. may be corrected

On 10/4/24 12:15, Alena Rybakina wrote:

It was sent here [0].

[0]
/messages/by-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d@gmail.com

--
Best wishes,
Ivan Kush
Tantor Labs LLC

#10

Alena Rybakina

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Andrei Lepikhov (#7)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Hi!

On 04.10.2024 12:05, Andrei Lepikhov wrote:

We also have an implementation of VALUES -> ARRAY transformation.
Because enterprises must deal with users' problems, many of these
users employ automatically generated queries.
Being informed very well of the consensus about that stuff, we've
designed it as a library. But, looking into the code now, I see that
it only needs a few cycles if no one 'x IN VALUES' expression is
presented in the query. Who knows? It may be OK for the core.
So, I've rewritten the code into the patch - see it in the attachment.

The idea is quite simple - at the same place as
convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
RTE and perform the transformation if it's convertible.

I updated the patch due to the problem with the coercion types for both
sides of the expression.

We must find a common type for both leftop of the expression and rightop
including constants for correct transformation, and at the same time
check that the resulting types are compatible.

To do this we find an operator for the two input types if it is
possible, and also remember the target types for the left and right
sides, and after that make a coercion.

This processing is only needed in cases where we are not working with
parameters since the final type is not specified for the parameters.

--
Regards,
Alena Rybakina
Postgres Professional

#11

Alexander Korotkov

aekorotkov@gmail.com

about 1 year ago

In reply to: Alena Rybakina (#10)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Hi, Alena!

On Thu, Jan 9, 2025 at 3:11 PM Alena Rybakina <a.rybakina@postgrespro.ru>
wrote:

On 04.10.2024 12:05, Andrei Lepikhov wrote:

We also have an implementation of VALUES -> ARRAY transformation.
Because enterprises must deal with users' problems, many of these
users employ automatically generated queries.
Being informed very well of the consensus about that stuff, we've
designed it as a library. But, looking into the code now, I see that
it only needs a few cycles if no one 'x IN VALUES' expression is
presented in the query. Who knows? It may be OK for the core.
So, I've rewritten the code into the patch - see it in the attachment.

The idea is quite simple - at the same place as
convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
RTE and perform the transformation if it's convertible.

I updated the patch due to the problem with the coercion types for both
sides of the expression.

We must find a common type for both leftop of the expression and rightop
including constants for correct transformation, and at the same time
check that the resulting types are compatible.

To do this we find an operator for the two input types if it is
possible, and also remember the target types for the left and right
sides, and after that make a coercion.

This processing is only needed in cases where we are not working with
parameters since the final type is not specified for the parameters.

I took a look at this patch.

+ /* TODO: remember parameters */

What was intended to do here?

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be
transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values
(t1.i), (1)));

------
Regards,
Alexander Korotkov
Supabase

#12

Alexander Korotkov

aekorotkov@gmail.com

about 1 year ago

In reply to: Alexander Korotkov (#11)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

On Sun, Feb 9, 2025 at 1:58 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Thu, Jan 9, 2025 at 3:11 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:

On 04.10.2024 12:05, Andrei Lepikhov wrote:

We also have an implementation of VALUES -> ARRAY transformation.
Because enterprises must deal with users' problems, many of these
users employ automatically generated queries.
Being informed very well of the consensus about that stuff, we've
designed it as a library. But, looking into the code now, I see that
it only needs a few cycles if no one 'x IN VALUES' expression is
presented in the query. Who knows? It may be OK for the core.
So, I've rewritten the code into the patch - see it in the attachment.

The idea is quite simple - at the same place as
convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
RTE and perform the transformation if it's convertible.

I updated the patch due to the problem with the coercion types for both
sides of the expression.

We must find a common type for both leftop of the expression and rightop
including constants for correct transformation, and at the same time
check that the resulting types are compatible.

To do this we find an operator for the two input types if it is
possible, and also remember the target types for the left and right
sides, and after that make a coercion.

This processing is only needed in cases where we are not working with
parameters since the final type is not specified for the parameters.

I took a look at this patch.

+ /* TODO: remember parameters */

What was intended to do here?

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));

Also, I think there is quite a code duplication about construction of
SAOP between match_orclause_to_indexcol() and convert_VALUES_to_ANY()
functions. I would like to see a refactoring as a separate first
patch, which extracts the common part into a function.

------
Regards,
Alexander Korotkov
Supabase

#13

Alena Rybakina

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alexander Korotkov (#12)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Hi! Thank you for your review!

On 09.02.2025 18:38, Alexander Korotkov wrote:

On Sun, Feb 9, 2025 at 1:58 PM Alexander Korotkov<aekorotkov@gmail.com> wrote:

On Thu, Jan 9, 2025 at 3:11 PM Alena Rybakina<a.rybakina@postgrespro.ru> wrote:

On 04.10.2024 12:05, Andrei Lepikhov wrote:

We also have an implementation of VALUES -> ARRAY transformation.
Because enterprises must deal with users' problems, many of these
users employ automatically generated queries.
Being informed very well of the consensus about that stuff, we've
designed it as a library. But, looking into the code now, I see that
it only needs a few cycles if no one 'x IN VALUES' expression is
presented in the query. Who knows? It may be OK for the core.
So, I've rewritten the code into the patch - see it in the attachment.

The idea is quite simple - at the same place as
convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
RTE and perform the transformation if it's convertible.

I updated the patch due to the problem with the coercion types for both
sides of the expression.

We must find a common type for both leftop of the expression and rightop
including constants for correct transformation, and at the same time
check that the resulting types are compatible.

To do this we find an operator for the two input types if it is
possible, and also remember the target types for the left and right
sides, and after that make a coercion.

This processing is only needed in cases where we are not working with
parameters since the final type is not specified for the parameters.

I took a look at this patch.

+ /* TODO: remember parameters */

This comment is not relevant anymore.This comment was added during patch
development and so it is enough to save const params in
arrayExpr->elements. I'll delete it.

Andrei did review of my last code and improved it. I'll add his code too.

What was intended to do here?

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));

I am willing to agree with you because I didn't see any limitations for
that. After analyzing diff of regression tests and your example (below),
I think I will need to add a piece of logic of preparation to pull up
the sub-select into top range table like here [0]/messages/by-id/975a3736-a8b5-49b3-8009-4d4e86867aa1@postgrespro.ru to correct processing
vars elements based on their position in the query.

[0]: /messages/by-id/975a3736-a8b5-49b3-8009-4d4e86867aa1@postgrespro.ru
/messages/by-id/975a3736-a8b5-49b3-8009-4d4e86867aa1@postgrespro.ru

alena@postgres=# explain select * from t t1, lateral (select * from t t2
where t2.x in (val
ues (t1.x), (1)));
ERROR: bogus varlevelsup: 1 offset 0

So, I'm working on it.

Also, I think there is quite a code duplication about construction of
SAOP between match_orclause_to_indexcol() and convert_VALUES_to_ANY()
functions. I would like to see a refactoring as a separate first
patch, which extracts the common part into a function.

I completely agree with you. Ill add it.

--
Regards,
Alena Rybakina
Postgres Professional

#14

Alena Rybakina

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alexander Korotkov (#12)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Hi!

On 09.02.2025 18:38, Alexander Korotkov wrote:

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));

I'm still working on it.

Also, I think there is quite a code duplication about construction of
SAOP between match_orclause_to_indexcol() and convert_VALUES_to_ANY()
functions. I would like to see a refactoring as a separate first
patch, which extracts the common part into a function.

Done.

I have attached a patch. In addition to the transfer, I added the
process of searching for a suitable operator and type for the left
expression for input expressions: const and left expression, since they
may differ from the declared types. Additionally, we convert the left
expr to a type suitable for the found operator.

--
Regards,
Alena Rybakina
Postgres Professional

#15

newtglobal postgresql_contributors

postgresql_contributors@newtglobalcorp.com

about 1 year ago

In reply to: Ivan Kush (#1)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation: tested, failed

Hi Ivan Kush
I tested the patch with `commands.sql` and observed noticeable improvements in planning and execution time, especially with multiple tables. Even single-table queries show small time reductions (0.02–0.04 ms). The patch optimizes `IN` clauses effectively, particularly with `VALUES`. For example, `col IN (VALUES ('a'), ('b'), ('c'))` now behaves similarly to `col IN ('a', 'b', 'c')`, leading to faster execution and reduced planning overhead.
Regards,
Postgresql Contributors - NewtGlobal

#16

Alena Rybakina

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alena Rybakina (#14)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Hi!

On 21.02.2025 00:09, Alena Rybakina wrote:

Hi!

On 09.02.2025 18:38, Alexander Korotkov wrote:

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));

I added it and attached a patch with diff file. To be honest, I didn't
find queries except for var with volatile functions where the transform
can't be applied.

I'm not sure about only cases where var can refer to something outside
available_rels list but I couldn't come up with an example where that's
possible, what do you think?

--
Regards,
Alena Rybakina
Postgres Professional

#17

Alena Rybakina

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alena Rybakina (#16)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

On 28.02.2025 14:48, Alena Rybakina wrote:

Hi!

On 21.02.2025 00:09, Alena Rybakina wrote:

Hi!

On 09.02.2025 18:38, Alexander Korotkov wrote:

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));

I added it and attached a patch with diff file. To be honest, I didn't
find queries except for var with volatile functions where the
transform can't be applied.

I removed the function volatility check that I added in the previous
version, since we already check it in is_simple_values_sequence.

I'm not sure about only cases where var can refer to something outside
available_rels list but I couldn't come up with an example where
that's possible, what do you think?

Considering it again, I think we can't face problems like that because
we don't work with join.

I attached a diff file as a difference with the 3rd version of the
patch, when we did not consider the values with var for transformation.

--
Regards,
Alena Rybakina
Postgres Professional

#18

Alexander Korotkov

aekorotkov@gmail.com

about 1 year ago

In reply to: Alena Rybakina (#17)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Hi, Alena!

On Sat, Mar 1, 2025 at 1:39 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:

On 09.02.2025 18:38, Alexander Korotkov wrote:

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));

I added it and attached a patch with diff file. To be honest, I didn't find queries except for var with volatile functions where the transform can't be applied.

I removed the function volatility check that I added in the previous version, since we already check it in is_simple_values_sequence.

I'm not sure about only cases where var can refer to something outside available_rels list but I couldn't come up with an example where that's possible, what do you think?

Considering it again, I think we can't face problems like that because we don't work with join.

I attached a diff file as a difference with the 3rd version of the patch, when we did not consider the values with var for transformation.

I take detailed look at makeSAOPArrayExpr() function, which is much
more complex than corresponding fragment from
match_orclause_to_indexcol(). And I found it to be mostly wrong. We
are working in post parse-analyze stage. That means it's too late to
do type coercion or lookup operator by name. We have already all the
catalog objects nailed down. In connection with that, second argument
of OpExpr shouldn't be ignored as it might contain amrelevant type
cast. I think I've fixed the most of them problems in the attached
patchset.

------
Regards,
Alexander Korotkov
Supabase

#19

Alena Rybakina

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alexander Korotkov (#18)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Hi, Alexander!

On 06.03.2025 11:23, Alexander Korotkov wrote:

Hi, Alena!

On Sat, Mar 1, 2025 at 1:39 PM Alena Rybakina<a.rybakina@postgrespro.ru> wrote:

On 09.02.2025 18:38, Alexander Korotkov wrote:

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));

I added it and attached a patch with diff file. To be honest, I didn't find queries except for var with volatile functions where the transform can't be applied.

I removed the function volatility check that I added in the previous version, since we already check it in is_simple_values_sequence.

I'm not sure about only cases where var can refer to something outside available_rels list but I couldn't come up with an example where that's possible, what do you think?

Considering it again, I think we can't face problems like that because we don't work with join.

I attached a diff file as a difference with the 3rd version of the patch, when we did not consider the values with var for transformation.

I take detailed look at makeSAOPArrayExpr() function, which is much
more complex than corresponding fragment from
match_orclause_to_indexcol(). And I found it to be mostly wrong. We
are working in post parse-analyze stage. That means it's too late to
do type coercion or lookup operator by name. We have already all the
catalog objects nailed down. In connection with that, second argument
of OpExpr shouldn't be ignored as it might contain amrelevant type
cast. I think I've fixed the most of them problems in the attached
patchset.

I agree with your conclusion and changes.

--
Regards,
Alena Rybakina
Postgres Professional

#20

Alexander Korotkov

aekorotkov@gmail.com

about 1 year ago

In reply to: Alena Rybakina (#19)

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

On Wed, Mar 12, 2025 at 8:11 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:

On 06.03.2025 11:23, Alexander Korotkov wrote:

Hi, Alena!

On Sat, Mar 1, 2025 at 1:39 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:

On 09.02.2025 18:38, Alexander Korotkov wrote:

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));

I added it and attached a patch with diff file. To be honest, I didn't find queries except for var with volatile functions where the transform can't be applied.

I removed the function volatility check that I added in the previous version, since we already check it in is_simple_values_sequence.

I'm not sure about only cases where var can refer to something outside available_rels list but I couldn't come up with an example where that's possible, what do you think?

Considering it again, I think we can't face problems like that because we don't work with join.

I attached a diff file as a difference with the 3rd version of the patch, when we did not consider the values with var for transformation.

I take detailed look at makeSAOPArrayExpr() function, which is much
more complex than corresponding fragment from
match_orclause_to_indexcol(). And I found it to be mostly wrong. We
are working in post parse-analyze stage. That means it's too late to
do type coercion or lookup operator by name. We have already all the
catalog objects nailed down. In connection with that, second argument
of OpExpr shouldn't be ignored as it might contain amrelevant type
cast. I think I've fixed the most of them problems in the attached
patchset.

I agree with your conclusion and changes.

I've revised the patchset. Mostly comments/commit messages and minor
refactoring. One thing I have to fix: we must do
IncrementVarSublevelsUp() unconditionally for all expressions as Vars
could be deeper inside. Also, I've removed our subquery check
completely. Not sure if we need it at all. I'll further analyze
that.

------
Regards,
Alexander Korotkov
Supabase