Different query plan used for the same query depending on how parameters are passed

Started by David Chapmanalmost 9 years ago5 messagesgeneral
Jump to latest
#1David Chapman
david.chapman@mavensecurities.com

I have a table that includes two text columns t1 and t2, and a composite
index on these columns. When issuing a query of the following form:

SELECT * FROM test WHERE t1 = 'X' and t2 = ANY(ARRAY['Y1', 'Y2', ..])

I have observed that it will use the index and have reasonable performance
if the whole query is passed as a single big string. However if it is
parameterised (I'm using Npgsql) it switches to doing a sequence scan and
performs terribly.

The table contains approx 2.3 million records and the query matches about
20k records (i.e. there are 20k 'Y' values in the array).

I have experimented with changing work_mem, preparing the statement in
advance, ANALYZEing the table, none of these change the behavior.

Why does the query planner choose to ignore the index when the command is
parameterised?

--

This e-mail together with any attachments (the "Message") is confidential
and may contain privileged information. If you are not the intended
recipient or if you have received this e-mail in error, please notify the
sender immediately and permanently delete this Message from your system.
Do not copy, disclose or distribute the information contained in this
Message.

Maven Investment Partners Ltd (No. 07511928), Maven Derivatives Ltd (No.
07511840) , MVN Asset Management Limited (No. 09659116), Maven Europe Ltd
(No. 08966593), Maven Derivatives Asia Limited (No.10361312) & Maven
Securities Holding Ltd (No. 07505438) are registered as companies in
England and Wales and their registered address is Level 3, 6 Bevis Marks,
London EC3A 7BA, United Kingdom. The companies’ VAT No. is 135539016. Maven
Asia (Hong Kong) Ltd (No. 2444041) is registered in Hong Kong and its
registered address is 20/F, 198 Wellington St, Hong Kong. Only Maven
Derivatives Ltd and MVN Asset Management Limited are authorised and
regulated by the Financial Conduct Authority (Maven Derivatives Ltd FRN:
607267, MVN Asset Management Limited FRN: 714429)

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: David Chapman (#1)
Re: Different query plan used for the same query depending on how parameters are passed

On 05/16/2017 06:01 AM, David Chapman wrote:

I have a table that includes two text columns t1 and t2, and a composite
index on these columns. When issuing a query of the following form:

SELECT * FROM test WHERE t1 = 'X' and t2 = ANY(ARRAY['Y1', 'Y2', ..])

I have observed that it will use the index and have reasonable
performance if the whole query is passed as a single big string. However
if it is parameterised (I'm using Npgsql) it switches to doing a
sequence scan and performs terribly.

What Postgres version?

Can you show the parametrized version?

What is the output if you add EXPLAIN ANALYZE to the beginning of the
query?:

https://www.postgresql.org/docs/9.6/static/sql-explain.html

The table contains approx 2.3 million records and the query matches
about 20k records (i.e. there are 20k 'Y' values in the array).

I have experimented with changing work_mem, preparing the statement in
advance, ANALYZEing the table, none of these change the behavior.

Why does the query planner choose to ignore the index when the command
is parameterised?

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Igor Neyman
ineyman@perceptron.com
In reply to: David Chapman (#1)
Re: Different query plan used for the same query depending on how parameters are passed

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of David Chapman
Sent: Tuesday, May 16, 2017 9:02 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Different query plan used for the same query depending on how parameters are passed

Attention: This email was sent from someone outside of Perceptron. Always exercise caution when opening attachments or clicking links from unknown senders or when receiving unexpected emails.

I have a table that includes two text columns t1 and t2, and a composite index on these columns. When issuing a query of the following form:

SELECT * FROM test WHERE t1 = 'X' and t2 = ANY(ARRAY['Y1', 'Y2', ..])

I have observed that it will use the index and have reasonable performance if the whole query is passed as a single big string. However if it is parameterised (I'm using Npgsql) it switches to doing a sequence scan and performs terribly.

The table contains approx 2.3 million records and the query matches about 20k records (i.e. there are 20k 'Y' values in the array).

I have experimented with changing work_mem, preparing the statement in advance, ANALYZEing the table, none of these change the behavior.

Why does the query planner choose to ignore the index when the command is parameterised?

It’s because when optimizer builds execution plan for parametrized queiry, it doesn’t know what values for t1 and t2 will be provided for WHERE clause.

Regards,

Igor Neyman

#4David Chapman
david.chapman@mavensecurities.com
In reply to: Adrian Klaver (#2)
Re: Different query plan used for the same query depending on how parameters are passed

Version is 'PostgreSQL 9.5.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit'

Here is the output of EXPLAIN ANALYZE on the two queries.

Index Scan using test_index_t1_t2 on test (cost=0.43..684.11 rows=71
width=245) (actual time=0.022..1.147 rows=99 loops=1)
Index Cond: ((t1 = 'X'::bpchar) AND (t2 = ANY ('{2286575,2139022,2139030,
2139032,1912037,860952,2139035,2139037,2278996,2139043,72062644919750111,
2139045,2139047,1904847,2139049,2259635,2259633,2293287,2293281,2206950,
2127033,2206952,2146439,2139064,2139066,2139070,2139074,2305447,2139077,
2377493,2139079,2282669,2139083,2375729,2260985,2286579,2274454,2286583,
2286582,2281512,2286588,2286591,2286592,1912217,2286596,2307845,2307846,
2293294,287038,2293295,2293291,2356192,2317356,2356191,2356190,2356181,
2356195,2356197,2389077,2356199,2356217,2374842,2374844,2374846,2337988,
2374856,1950450,2374833,2374839,2282877,2374843,2374849,2374855,2390631,
2028018,2374834,2139011,2139004,2344765,2344767,2390620,2393192,2344771,
2344773,2344775,2390201,2344777,1893069,2344779,2344781,2344783,2374838,
2304999,2344762,2344764,2344766,2344768,2344772,867138,2276706}'::text[])))
Planning time: 0.779 ms
Execution time: 1.417 ms

Seq Scan on test (cost=0.00..403725.30 rows=1 width=245) (actual
time=47.543..5362.518 rows=99 loops=1)
Filter: (((t1)::text = 'X'::text) AND (t2 = ANY
('{2286575,2139022,2139030,2139032,1912037,860952,2139035,2139037,2278996,
2139043,72062644919750111,2139045,2139047,1904847,2139049,2259635,2259633,
2293287,2293281,2206950,2127033,2206952,2146439,2139064,2139066,2139070,
2139074,2305447,2139077,2377493,2139079,2282669,2139083,2375729,2260985,
2286579,2274454,2286583,2286582,2281512,2286588,2286591,2286592,1912217,
2286596,2307845,2307846,2293294,287038,2293295,2293291,2356192,2317356,
2356191,2356190,2356181,2356195,2356197,2389077,2356199,2356217,2374842,
2374844,2374846,2337988,2374856,1950450,2374833,2374839,2282877,2374843,
2374849,2374855,2390631,2028018,2374834,2139011,2139004,2344765,2344767,
2390620,2393192,2344771,2344773,2344775,2390201,2344777,1893069,2344779,
2344781,2344783,2374838,2304999,2344762,2344764,2344766,2344768,2344772,
867138,2276706}'::text[])))
Rows Removed by Filter: 2327145
Planning time: 0.179 ms
Execution time: 5362.558 ms

As it turns out t1 was actually defined as character(1) rather than text. I
can see Postgres has cast the first parameter to bpchar in the first case
but I guess Npgsql is explicitly sending it as type text, which bypasses
the index.

On 16 May 2017 at 15:17, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 05/16/2017 06:01 AM, David Chapman wrote:

I have a table that includes two text columns t1 and t2, and a composite
index on these columns. When issuing a query of the following form:

SELECT * FROM test WHERE t1 = 'X' and t2 = ANY(ARRAY['Y1', 'Y2', ..])

I have observed that it will use the index and have reasonable
performance if the whole query is passed as a single big string. However if
it is parameterised (I'm using Npgsql) it switches to doing a sequence scan
and performs terribly.

What Postgres version?

Can you show the parametrized version?

What is the output if you add EXPLAIN ANALYZE to the beginning of the
query?:

https://www.postgresql.org/docs/9.6/static/sql-explain.html

The table contains approx 2.3 million records and the query matches about
20k records (i.e. there are 20k 'Y' values in the array).

I have experimented with changing work_mem, preparing the statement in
advance, ANALYZEing the table, none of these change the behavior.

Why does the query planner choose to ignore the index when the command is
parameterised?

--
Adrian Klaver
adrian.klaver@aklaver.com

--

This e-mail together with any attachments (the "Message") is confidential
and may contain privileged information. If you are not the intended
recipient or if you have received this e-mail in error, please notify the
sender immediately and permanently delete this Message from your system.
Do not copy, disclose or distribute the information contained in this
Message.

Maven Investment Partners Ltd (No. 07511928), Maven Derivatives Ltd (No.
07511840) , MVN Asset Management Limited (No. 09659116), Maven Europe Ltd
(No. 08966593), Maven Derivatives Asia Limited (No.10361312) & Maven
Securities Holding Ltd (No. 07505438) are registered as companies in
England and Wales and their registered address is Level 3, 6 Bevis Marks,
London EC3A 7BA, United Kingdom. The companies’ VAT No. is 135539016. Maven
Asia (Hong Kong) Ltd (No. 2444041) is registered in Hong Kong and its
registered address is 20/F, 198 Wellington St, Hong Kong. Only Maven
Derivatives Ltd and MVN Asset Management Limited are authorised and
regulated by the Financial Conduct Authority (Maven Derivatives Ltd FRN:
607267, MVN Asset Management Limited FRN: 714429)

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Chapman (#4)
Re: Different query plan used for the same query depending on how parameters are passed

David Chapman <david.chapman@mavensecurities.com> writes:

Here is the output of EXPLAIN ANALYZE on the two queries.

Index Scan using test_index_t1_t2 on test (cost=0.43..684.11 rows=71
width=245) (actual time=0.022..1.147 rows=99 loops=1)
Index Cond: ((t1 = 'X'::bpchar) AND (t2 = ANY ('{2286575,2139022,2139030,

^^^^^^^^^^^^^^^^^^

Seq Scan on test (cost=0.00..403725.30 rows=1 width=245) (actual
time=47.543..5362.518 rows=99 loops=1)
Filter: (((t1)::text = 'X'::text) AND (t2 = ANY

^^^^^^^^^^^^^^^^^^^^^^^^

There's your problem. t1 is evidently of char(n) type, and when you
write "t1 = 'X'" the literal also becomes char(n) and so you have
a condition that can match the index. But the parameter is evidently
being assigned type text, which wins the type conflict so then you have
"t1::text text-eq text-constant", and that operator doesn't match the index.

Probably, casting the parameter to char(n) explicitly would fix this.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general