SELECT with LIKE clause makes full table scan

Started by Matthias Apitzabout 4 years ago12 messagesgeneral
Jump to latest
#1Matthias Apitz
guru@unixarea.de

Hello,

We face in a PostgreSQL 11.4 installation on a potent Linux host a
serious performance degree.

A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a
column with an Index ignores this and does a full table scan:

sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=102040 read=560674
-> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0 loops=5)
Filter: (d01ort ~~ 'Z 9610%'::text)
Rows Removed by Filter: 1055853
Buffers: shared hit=102040 read=560674
Planning Time: 2.028 ms
Execution Time: 1349.593 ms
(10 Zeilen)

Why is this (ignoring the Index) and what could be done?

Thanks

matthias

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#2Josef Šimánek
josef.simanek@gmail.com
In reply to: Matthias Apitz (#1)
Re: SELECT with LIKE clause makes full table scan

st 26. 1. 2022 v 11:55 odesílatel Matthias Apitz <guru@unixarea.de> napsal:

Hello,

We face in a PostgreSQL 11.4 installation on a potent Linux host a
serious performance degree.

A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a
column with an Index ignores this and does a full table scan:

Which index is ignored? Can you share the CREATE INDEX command as well?

Show quoted text

sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=102040 read=560674
-> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0 loops=5)
Filter: (d01ort ~~ 'Z 9610%'::text)
Rows Removed by Filter: 1055853
Buffers: shared hit=102040 read=560674
Planning Time: 2.028 ms
Execution Time: 1349.593 ms
(10 Zeilen)

Why is this (ignoring the Index) and what could be done?

Thanks

matthias

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#3Matthias Apitz
guru@unixarea.de
In reply to: Josef Šimánek (#2)
Re: SELECT with LIKE clause makes full table scan

El día miércoles, enero 26, 2022 a las 12:20:08 +0100, Josef Šimánek escribió:

st 26. 1. 2022 v 11:55 odesílatel Matthias Apitz <guru@unixarea.de> napsal:

Hello,

We face in a PostgreSQL 11.4 installation on a potent Linux host a
serious performance degree.

A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a
column with an Index ignores this and does a full table scan:

Which index is ignored? Can you share the CREATE INDEX command as well?

/* # $Revision: 1.1.2.21 $ */
create UNIQUE INDEX d01sig on d01buch(d01gsi,d01ex) ;
/* alter table d01buch add constraint d01sig unique (d01gsi,d01ex)
deferrable initially deferred; *//* D01SIG */
create INDEX d01mcopyno on d01buch(d01mcopyno) ;/* D01MCOPYNO */
create INDEX d01bnr on d01buch(d01bnr) ;/* D01BNR */
create INDEX d01ort on d01buch(d01ort) ;/* D01ORT */
create INDEX d01mcopynozweig on d01buch(d01mcopyno,d01zweig) ;/* D01KATZWEIG */
create INDEX d01ort2 on d01buch(d01ort2) ;/* D02ORT2 */
create INDEX d01aufnahme on d01buch(d01aufnahme) ;/* D01aufnahme */
create INDEX d01titlecatkey on d01buch(d01titlecatkey) ;/* D01TITLECATKEY */
create INDEX d01invkrnr on d01buch(d01invkreis,d01invnr) ;/* D01invkrnr */

matthias

sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=102040 read=560674
-> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0 loops=5)
Filter: (d01ort ~~ 'Z 9610%'::text)
Rows Removed by Filter: 1055853
Buffers: shared hit=102040 read=560674
Planning Time: 2.028 ms
Execution Time: 1349.593 ms
(10 Zeilen)

Why is this (ignoring the Index) and what could be done?

Thanks

matthias

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#4Dominique Devienne
ddevienne@gmail.com
In reply to: Matthias Apitz (#1)
Re: SELECT with LIKE clause makes full table scan

On Wed, Jan 26, 2022 at 11:55 AM Matthias Apitz <guru@unixarea.de> wrote:

A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a
column with an Index ignores this and does a full table scan:

sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
-> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952)...
Filter: (d01ort ~~ 'Z 9610%'::text)

Could it be a collation issue? They need to match between the query
and the index typically.

See below how an index-scan transforms into a sequential-scan just
from using ILIKE instead of LIKE.
Just to illustrate how collations affect plans for prefix-like
queries. Show the relevant DDL for the `d01ort` column,
and its indexes, and that should help diagnose this.

Perhaps it could be related to statistics too? --DD

```
ddevienne=> explain select count(*) from pg_class where relname like 'PNS%';
-> Index Only Scan using pg_class_relname_nsp_index on pg_class
(cost=0.41..8.44 rows=5 width=0)
Index Cond: ((relname >= 'PNS'::text) AND (relname < 'PNT'::text))
Filter: (relname ~~ 'PNS%'::text)
Time: 1.647 ms

ddevienne=> explain select count(*) from pg_class where relname ilike 'PNS%';
Aggregate (cost=2682.35..2682.36 rows=1 width=8)
-> Seq Scan on pg_class (cost=0.00..2682.34 rows=5 width=0)
Filter: (relname ~~* 'PNS%'::text)
Time: 1.262 ms
```

#5Julien Rouhaud
rjuju123@gmail.com
In reply to: Matthias Apitz (#3)
Re: SELECT with LIKE clause makes full table scan

Hi,

On Wed, Jan 26, 2022 at 12:39:25PM +0100, Matthias Apitz wrote:

sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=102040 read=560674
-> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0 loops=5)
Filter: (d01ort ~~ 'Z 9610%'::text)
Rows Removed by Filter: 1055853
Buffers: shared hit=102040 read=560674
Planning Time: 2.028 ms
Execution Time: 1349.593 ms
(10 Zeilen)

Why is this (ignoring the Index) and what could be done?

[...]
create INDEX d01ort on d01buch(d01ort) ;/* D01ORT */

That index can't be used with a LIKE that has a trailing wildcard. You need to
either create an index with text_pattern_ops operator class (see
https://www.postgresql.org/docs/current/indexes-opclass.html), or a GIN index
using pg_trgm (which will also work with non-trailing wildcards), see
https://www.postgresql.org/docs/current/pgtrgm.html.

#6Dominique Devienne
ddevienne@gmail.com
In reply to: Julien Rouhaud (#5)
Re: SELECT with LIKE clause makes full table scan

On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Why is this (ignoring the Index) and what could be done?

[...]
create INDEX d01ort on d01buch(d01ort) ;/* D01ORT */

That index can't be used with a LIKE that has a trailing wildcard.

Really? That seems to contradict the doc, i.e. default index type is B-Tree,
which definitely supports trailing wildcard LIKE-predicates, as
explicitly stated in said doc:
https://www.postgresql.org/docs/14/indexes-types.html#INDEXES-TYPES-BTREE

So what makes you say that? --DD

PS: That also contradicts the small demo I made earlier up-thread:
```
ddevienne=> \d+ pg_class
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace) <<<<<<<<<
"pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)
```
Whether the index is UNIQUE or not does not matter in this case.

#7Julien Rouhaud
rjuju123@gmail.com
In reply to: Dominique Devienne (#6)
Re: SELECT with LIKE clause makes full table scan

Hi,

On Wed, Jan 26, 2022 at 02:34:21PM +0100, Dominique Devienne wrote:

On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Why is this (ignoring the Index) and what could be done?

[...]
create INDEX d01ort on d01buch(d01ort) ;/* D01ORT */

That index can't be used with a LIKE that has a trailing wildcard.

Really? That seems to contradict the doc, i.e. default index type is B-Tree,
which definitely supports trailing wildcard LIKE-predicates, as
explicitly stated in said doc:
https://www.postgresql.org/docs/14/indexes-types.html#INDEXES-TYPES-BTREE

So what makes you say that? --DD

This part of the documentation you mentioned:

However, if your database does not use the C locale you will need to create
the index with a special operator class to support indexing of
pattern-matching queries; see Section 11.10 below.

PS: That also contradicts the small demo I made earlier up-thread:

relname datatype is name, which has a default C collation, so you are in the
only case that natively work for btree indexes:

# select unnest(indcollation)::regcollation from pg_index where indexrelid = 'pg_class_relname_nsp_index'::regclass;
unnest
--------
"C"
-
(2 rows)

I highly doubt that OP tables are also using C collation, so almost no one
does that.

#8Imre Samu
pella.samu@gmail.com
In reply to: Matthias Apitz (#1)
Re: SELECT with LIKE clause makes full table scan

We face in a PostgreSQL 11.4 installation on a potent Linux host a
...
Why is this (ignoring the Index) and what could be done?

IMHO: 11.4 is very old. ( Release date: 2019-06-20 ) and missing a lot of
patches.
The latest patch release is 11.14 ( see
https://www.postgresql.org/docs/11/release.html )

You can test the PG11.14 with the PostgreSQL docker image (
https://hub.docker.com/_/postgres )
- docker pull postgres:11.14-bullseye
- import table ( d01buch )
- create indexes
- test your query

Regards,
Imre

Matthias Apitz <guru@unixarea.de> ezt írta (időpont: 2022. jan. 26., Sze,
11:55):

Show quoted text

Hello,

We face in a PostgreSQL 11.4 installation on a potent Linux host a
serious performance degree.

A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a
column with an Index ignores this and does a full table scan:

sisis=# explain (analyze, buffers) select * from d01buch where d01ort like
'Z 9610%' ;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..680262.71 rows=510 width=952) (actual
time=1324.096..1349.429 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=102040 read=560674
-> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952)
(actual time=1117.663..1315.062 rows=0 loops=5)
Filter: (d01ort ~~ 'Z 9610%'::text)
Rows Removed by Filter: 1055853
Buffers: shared hit=102040 read=560674
Planning Time: 2.028 ms
Execution Time: 1349.593 ms
(10 Zeilen)

Why is this (ignoring the Index) and what could be done?

Thanks

matthias

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/
+49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#7)
Re: SELECT with LIKE clause makes full table scan

Julien Rouhaud <rjuju123@gmail.com> writes:

On Wed, Jan 26, 2022 at 02:34:21PM +0100, Dominique Devienne wrote:

On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

That index can't be used with a LIKE that has a trailing wildcard.

So what makes you say that? --DD

This part of the documentation you mentioned:

However, if your database does not use the C locale you will need to create
the index with a special operator class to support indexing of
pattern-matching queries; see Section 11.10 below.

Note that declaring the index with C collation should also work,
and might be preferable to using the pattern_ops opclass.
C collation has at least some chance of being used explicitly
in queries, whereas a pattern_ops index is basically never going
to match anything but LIKE/regex searches.

regards, tom lane

#10Matthias Apitz
guru@unixarea.de
In reply to: Julien Rouhaud (#7)
Re: SELECT with LIKE clause makes full table scan

We changed two relevant Indexes to

CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops );
CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops );

and now the same queries are fast. We're looking through our code for
more such LIKE clauses on VCHAR columns.

Thanks for all the hints

matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#11Julien Rouhaud
rjuju123@gmail.com
In reply to: Matthias Apitz (#10)
Re: SELECT with LIKE clause makes full table scan

Hi,

On Wed, Jan 26, 2022 at 11:07 PM Matthias Apitz <guru@unixarea.de> wrote:

We changed two relevant Indexes to

CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops );
CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops );

When you said changed, did you drop the previous ones? As Tom
mentioned, those indexes are specialized and are only useful for LIKE
'something%' queries. It's quite likely that your existing indexes
were useful for other queries, which may not be as fast without those
indexes. You can check in pg_stat_user_indexes if your indexes seems
to be used before actually dropping them for instance:
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW

#12Matthias Apitz
guru@unixarea.de
In reply to: Julien Rouhaud (#11)
Re: SELECT with LIKE clause makes full table scan

El día miércoles, enero 26, 2022 a las 11:21:12p. m. +0800, Julien Rouhaud escribió:

Hi,

On Wed, Jan 26, 2022 at 11:07 PM Matthias Apitz <guru@unixarea.de> wrote:

We changed two relevant Indexes to

CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops );
CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops );

When you said changed, did you drop the previous ones?

Yes, of course.

As Tom
mentioned, those indexes are specialized and are only useful for LIKE
'something%' queries. It's quite likely that your existing indexes
were useful for other queries, which may not be as fast without those
indexes. You can check in pg_stat_user_indexes if your indexes seems
to be used before actually dropping them for instance:
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW

Thanks, we're still investigating more cases with LIKE clause and will
consider your hint.

matthias

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub