Simple IN vs IN values performance

Started by Oleksandr Voytsekhovskyy about 5 years ago · 9 messages · general

Greetings,

We have queries with IN filters containing long lists of INT values.

Sometimes they run extremely slowly, and I found a suggestion to use the syntax
Field IN (VALUES(1465), (1478), ...
instead of
Field IN (1465, 1478, ...
In some cases it helps, but in others it makes the query run 1000+ times slower.

So the issue is:
- some queries much faster with simple IN
- some queries much faster with IN + VALUES

Here is the EXPLAIN output for a case where it is extremely slow:

->  HashAggregate  (cost=5.78..9.62 rows=385 width=4)
      Group Key: "*VALUES*".column1
      ->  Values Scan on "*VALUES*"  (cost=0.00..4.81 rows=385 width=4)
What is the right way to pass a long list of INT values to an IN filter?

I am using PostgreSQL 13.1 on Ubuntu
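
For reference, the two forms being compared, written as complete statements (the table and column names here are hypothetical):

```sql
-- Plain IN list: planned as a scalar "= ANY" condition on the column.
SELECT * FROM orders WHERE customer_id IN (1465, 1478, 1492);

-- IN (VALUES ...): planned as a join against a Values Scan node,
-- which is what the HashAggregate / Values Scan in the plan above shows.
SELECT * FROM orders WHERE customer_id IN (VALUES (1465), (1478), (1492));
```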

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Oleksandr Voytsekhovskyy (#1)
Re: Simple IN vs IN values performance

On Monday, February 22, 2021, Oleksandr Voytsekhovskyy <av@uniweb.ua> wrote:

What is the right way to pass a long list of INT values to an IN filter

Don’t.

Pass in a delimited string, then parse that string into an array and use “=
any(array)”.

This has the primary benefit of making the input a single parameter.
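
A minimal sketch of this approach (the table, column, and parameter names are hypothetical):

```sql
-- Pass the whole list as one text parameter, e.g. '1465,1478,1492',
-- then parse it into an int[] and compare with = ANY:
SELECT *
FROM orders
WHERE customer_id = ANY (string_to_array($1, ',')::int[]);

-- If the client driver can bind arrays directly, the parsing step
-- can be skipped:
-- WHERE customer_id = ANY ($1::int[])
```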

David J.

#3Michael Lewis
mlewis@entrata.com
In reply to: David G. Johnston (#2)
Re: Simple IN vs IN values performance

Wouldn't using “= any(array)” change how the query is planned? Or is the
concern just parsing the values?

#4Ron
ronljohnsonjr@gmail.com
In reply to: Oleksandr Voytsekhovskyy (#1)
Re: Simple IN vs IN values performance

On 2/22/21 9:00 AM, Oleksandr Voytsekhovskyy wrote:

Greetings,

We have queries with IN filters containing long lists of INT values.

Sometimes they run extremely slowly, and I found a suggestion to use the
syntax
Field IN (VALUES(1465), (1478), ...
instead of
Field IN (1465, 1478, ...
In some cases it helps, but in others it makes the query run 1000+ times slower.

Insert the values into a TEMPORARY TABLE, then join that to your main table?

So the issue is:
- some queries much faster with simple IN
- some queries much faster with IN + VALUES

Here is the EXPLAIN output for a case where it is extremely slow:

->  HashAggregate  (cost=5.78..9.62 rows=385 width=4)
      Group Key: "*VALUES*".column1
      ->  Values Scan on "*VALUES*"  (cost=0.00..4.81 rows=385 width=4)
What is the right way to pass a long list of INT values to an IN filter?

I am using PostgreSQL 13.1 on Ubuntu

--
Angular momentum makes the world go 'round.
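
Ron's temp-table suggestion might look like this sketch (table and column names are hypothetical):

```sql
-- Load the IDs once into a session-local table...
CREATE TEMPORARY TABLE wanted_ids (id int PRIMARY KEY);
INSERT INTO wanted_ids VALUES (1465), (1478);  -- ... rest of the list

-- ...then join against it instead of using an IN list:
SELECT o.*
FROM orders o
JOIN wanted_ids w ON w.id = o.customer_id;
```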

#5Michael Lewis
mlewis@entrata.com
In reply to: Ron (#4)
Re: Simple IN vs IN values performance

Insert the values into a TEMPORARY TABLE, then join that to your main
table?

In my experience, this is very performant but needs an analyze command
after populating the temp table to ensure there are statistics so the plan
doesn't go awry. Otherwise, I'm not sure it is different from a
materialized CTE or a subquery with OFFSET 0 at the end.
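
The two alternatives Michael mentions, sketched with hypothetical names (both act as optimization fences, keeping the planner from flattening the value list into the outer query):

```sql
-- Materialized CTE (PostgreSQL 12+): the MATERIALIZED keyword forces
-- the CTE to be evaluated once, as its own node in the plan.
WITH ids AS MATERIALIZED (
    SELECT * FROM (VALUES (1465), (1478)) AS v(id)
)
SELECT o.*
FROM orders o
JOIN ids ON ids.id = o.customer_id;

-- Subquery with OFFSET 0: an older idiom with the same effect, since
-- the planner will not pull up a subquery that has an OFFSET clause.
SELECT o.*
FROM orders o
JOIN (SELECT v.id
      FROM (VALUES (1465), (1478)) AS v(id)
      OFFSET 0) ids ON ids.id = o.customer_id;
```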

#6Alexander Voytsekhovskyy
young.inbox@gmail.com
In reply to: Michael Lewis (#5)
Re: Simple IN vs IN values performance

Greetings,

I didn't get your idea about a materialized CTE or a subquery with OFFSET 0.

Could you please show simple example?


#7Alexander Voytsekhovskyy
young.inbox@gmail.com
In reply to: Ron (#4)
Re: Simple IN vs IN values performance

Thanks for the suggestion.

With temp tables there is another issue: the queries already involve 5-6 tables
and 2-3 IN filters. If I replace the filters with temp tables, the join count may
hit the query planner's limits and it will start producing terrible query plans,
for example when the genetic query optimizer kicks in.


#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alexander Voytsekhovskyy (#7)
Re: Simple IN vs IN values performance

Hi

On Tue, Feb 23, 2021 at 10:36 Alexander Voytsekhovskyy <
young.inbox@gmail.com> wrote:

Thanks for the suggestion.

With temp tables there is another issue: the queries already involve 5-6 tables
and 2-3 IN filters. If I replace the filters with temp tables, the join count may
hit the query planner's limits and it will start producing terrible query plans,
for example when the genetic query optimizer kicks in.

You can increase these limits; they are relatively low by default and can be
raised on modern CPUs.

https://www.postgresql.org/docs/current/runtime-config-query.html
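
The limits in question can be raised per session. These are real PostgreSQL settings; the values below are only illustrative:

```sql
-- The planner flattens subqueries and reorders joins only up to these
-- item counts; past geqo_threshold it switches to the genetic optimizer.
SET from_collapse_limit = 20;   -- default 8
SET join_collapse_limit = 20;   -- default 8
SET geqo_threshold = 20;        -- default 12
```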

Regards

Pavel


#9Alexander Voytsekhovskyy
young.inbox@gmail.com
In reply to: David G. Johnston (#2)
Re: Simple IN vs IN values performance

Greetings,

I have tested both options, ANY with a string plus parsing and ANY with a plain
array, and there are cases where execution time is 100+ times worse than with IN.
