Simple IN vs IN values performance

Started by Oleksandr Voytsekhovskyy about 5 years ago · 9 messages · general

Greetings,

We have queries with IN filters containing long lists of INT values.

Sometimes they run extremely slowly, and I found a suggestion to use the syntax
Field IN (VALUES(1465), (1478), ...
instead of
Field IN (1465, 1478, ...
In some cases it helps, but in others it makes the query run 1000+ times slower.

So the issue is:
- some queries much faster with simple IN
- some queries much faster with IN + VALUES

Here is the EXPLAIN output for a case where it is extremely slow:

->  HashAggregate  (cost=5.78..9.62 rows=385 width=4)
      Group Key: "*VALUES*".column1
      ->  Values Scan on "*VALUES*"  (cost=0.00..4.81 rows=385 width=4)
What is the right way to pass a long list of INT values to an IN filter?

I am using PostgreSQL 13.1 on Ubuntu
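
For reference, the two forms being compared, written as complete statements (the table and column names here are hypothetical):

```sql
-- Plain IN list: planned as a scalar "= ANY" condition on the column.
SELECT * FROM orders WHERE customer_id IN (1465, 1478, 1492);

-- IN (VALUES ...): planned as a join against a Values Scan node,
-- which is what the HashAggregate / Values Scan in the plan above shows.
SELECT * FROM orders WHERE customer_id IN (VALUES (1465), (1478), (1492));
```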

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Oleksandr Voytsekhovskyy (#1)
Re: Simple IN vs IN values performance

On Monday, February 22, 2021, Oleksandr Voytsekhovskyy <av@uniweb.ua> wrote:

What is the right way to pass a long list of INT values to an IN filter

Don’t.

Pass in a delimited string, then parse that string into an array and use “=
any(array)”.

This has the primary benefit of making the input a single parameter.
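
A minimal sketch of this approach (the table, column, and parameter names are hypothetical):

```sql
-- Pass the whole list as one text parameter, e.g. '1465,1478,1492',
-- then parse it into an int[] and compare with = ANY:
SELECT *
FROM orders
WHERE customer_id = ANY (string_to_array($1, ',')::int[]);

-- If the client driver can bind arrays directly, the parsing step
-- can be skipped:
-- WHERE customer_id = ANY ($1::int[])
```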

David J.

#3Michael Lewis
mlewis@entrata.com
In reply to: David G. Johnston (#2)
Re: Simple IN vs IN values performance

Wouldn't using “= any(array)” change how the query is planned? Or is the
concern just parsing the values?

#4Ron
ronljohnsonjr@gmail.com
In reply to: Oleksandr Voytsekhovskyy (#1)
Re: Simple IN vs IN values performance

On 2/22/21 9:00 AM, Oleksandr Voytsekhovskyy wrote:

Greetings,

We have queries with IN filters containing long lists of INT values.

Sometimes they run extremely slowly, and I found a suggestion to use the
syntax
Field IN (VALUES(1465), (1478), ...
instead of
Field IN (1465, 1478, ...
In some cases it helps, but in others it makes the query run 1000+ times slower.

Insert the values into a TEMPORARY TABLE, then join that to your main table?

So the issue is:
- some queries much faster with simple IN
- some queries much faster with IN + VALUES

Here is the EXPLAIN output for a case where it is extremely slow:

->  HashAggregate  (cost=5.78..9.62 rows=385 width=4)
      Group Key: "*VALUES*".column1
      ->  Values Scan on "*VALUES*"  (cost=0.00..4.81 rows=385 width=4)
What is the right way to pass a long list of INT values to an IN filter?

I am using PostgreSQL 13.1 on Ubuntu

--
Angular momentum makes the world go 'round.
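
Ron's temp-table suggestion might look like this sketch (table and column names are hypothetical):

```sql
-- Load the IDs once into a session-local table...
CREATE TEMPORARY TABLE wanted_ids (id int PRIMARY KEY);
INSERT INTO wanted_ids VALUES (1465), (1478);  -- ... rest of the list

-- ...then join against it instead of using an IN list:
SELECT o.*
FROM orders o
JOIN wanted_ids w ON w.id = o.customer_id;
```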

#5Michael Lewis
mlewis@entrata.com
In reply to: Ron (#4)
Re: Simple IN vs IN values performance

Insert the values into a TEMPORARY TABLE, then join that to your main
table?

In my experience, this is very performant but needs an analyze command
after populating the temp table to ensure there are statistics so the plan
doesn't go awry. Otherwise, I'm not sure it is different from a
materialized CTE or a subquery with OFFSET 0 at the end.
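
The two alternatives Michael mentions, sketched with hypothetical names (both act as optimization fences, keeping the planner from flattening the value list into the outer query):

```sql
-- Materialized CTE (PostgreSQL 12+): the MATERIALIZED keyword forces
-- the CTE to be evaluated once, as its own node in the plan.
WITH ids AS MATERIALIZED (
    SELECT * FROM (VALUES (1465), (1478)) AS v(id)
)
SELECT o.*
FROM orders o
JOIN ids ON ids.id = o.customer_id;

-- Subquery with OFFSET 0: an older idiom with the same effect, since
-- the planner will not pull up a subquery that has an OFFSET clause.
SELECT o.*
FROM orders o
JOIN (SELECT v.id
      FROM (VALUES (1465), (1478)) AS v(id)
      OFFSET 0) ids ON ids.id = o.customer_id;
```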

#6Alexander Voytsekhovskyy
young.inbox@gmail.com
In reply to: Michael Lewis (#5)
Re: Simple IN vs IN values performance

Greetings,

I didn't get your idea about a materialized CTE or a subquery with OFFSET 0.

Could you please show simple example?


#7Alexander Voytsekhovskyy
young.inbox@gmail.com
In reply to: Ron (#4)
Re: Simple IN vs IN values performance

Thanks for the suggestion.

With temp tables there is another issue: the queries already involve 5-6 tables
and 2-3 IN filters. If I replace the filters with temp tables, the join count may
hit the query planner's limits and it will start producing terrible query plans,
for example when the genetic query optimizer kicks in.


#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alexander Voytsekhovskyy (#7)
Re: Simple IN vs IN values performance

Hi

On Tue, Feb 23, 2021 at 10:36 Alexander Voytsekhovskyy <
young.inbox@gmail.com> wrote:

Thanks for the suggestion.

With temp tables there is another issue: the queries already involve 5-6 tables
and 2-3 IN filters. If I replace the filters with temp tables, the join count may
hit the query planner's limits and it will start producing terrible query plans,
for example when the genetic query optimizer kicks in.

You can increase these limits; they are relatively low by default and can be
raised on modern CPUs.

https://www.postgresql.org/docs/current/runtime-config-query.html
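
The limits in question can be raised per session. These are real PostgreSQL settings; the values below are only illustrative:

```sql
-- The planner flattens subqueries and reorders joins only up to these
-- item counts; past geqo_threshold it switches to the genetic optimizer.
SET from_collapse_limit = 20;   -- default 8
SET join_collapse_limit = 20;   -- default 8
SET geqo_threshold = 20;        -- default 12
```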

Regards

Pavel


#9Alexander Voytsekhovskyy
young.inbox@gmail.com
In reply to: David G. Johnston (#2)
Re: Simple IN vs IN values performance

Greetings,

I have tested both options, ANY with a string plus parsing and ANY with a plain
array, and there are cases where execution time is 100+ times worse than with IN.
