index on ILIKE/LIKE - PostgreSQL 9.2

Started by drum.lucas@gmail.comalmost 10 years ago15 messagesgeneral

Jump to latest

drum.lucas@gmail.com

almost 10 years ago

Hi there!

I've got a simple but slow query:

SELECT DISTINCT title

FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239 AND time_job > 1457826264
order BY title
limit 10

Explain analyze:

Limit (cost=5946.40..5946.41 rows=1 width=19) (actual

time=2746.759..2746.772 rows=1 loops=1)
-> Unique (cost=5946.40..5946.41 rows=1 width=19) (actual
time=2746.753..2746.763 rows=1 loops=1)
-> Sort (cost=5946.40..5946.41 rows=1 width=19) (actual
time=2746.750..2746.754 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=49.02..5946.39
rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264))
Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
Rows Removed by Filter: 791
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870
rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND
("time_job" > 1457826264))
Total runtime: 2746.879 ms

Then, I created a trgm index:

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title
gin_trgm_ops);

Explain analyze after the index: (Yes, I ran the analyze)

Limit (cost=389.91..389.91 rows=1 width=20) (actual

time=3720.511..3720.511 rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual
time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual
time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90
rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
Rows Removed by Index Recheck: 4
-> BitmapAnd (cost=385.88..385.88 rows=1 width=0)
(actual time=3720.469..3720.469 rows=0 loops=1)
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795
loops=1)
Index Cond: (("clientid" = 31239) AND
("time_job" > 1457826264))
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin"
(cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213
rows=32 loops=1)
Index Cond: (("title")::"text" ~~ '%RYAN
WER%'::"text")
Total runtime: 3720.653 ms

so.. the query is still slow..
Do you guys know what can be done ? related to the ILIKE?

cheers
Lucas

Melvin Davidson

melvin6925@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#1)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

On Wed, May 11, 2016 at 8:30 PM, Lucas Possamai <drum.lucas@gmail.com>
wrote:

Hi there!

I've got a simple but slow query:

SELECT DISTINCT title

FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239 AND time_job > 1457826264
order BY title
limit 10

Explain analyze:

Limit (cost=5946.40..5946.41 rows=1 width=19) (actual

time=2746.759..2746.772 rows=1 loops=1)
-> Unique (cost=5946.40..5946.41 rows=1 width=19) (actual
time=2746.753..2746.763 rows=1 loops=1)
-> Sort (cost=5946.40..5946.41 rows=1 width=19) (actual
time=2746.750..2746.754 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=49.02..5946.39
rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264))
Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
Rows Removed by Filter: 791
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870
rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND
("time_job" > 1457826264))
Total runtime: 2746.879 ms

Then, I created a trgm index:

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title
gin_trgm_ops);

Explain analyze after the index: (Yes, I ran the analyze)

Limit (cost=389.91..389.91 rows=1 width=20) (actual

time=3720.511..3720.511 rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual
time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual
time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90
rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
Rows Removed by Index Recheck: 4
-> BitmapAnd (cost=385.88..385.88 rows=1 width=0)
(actual time=3720.469..3720.469 rows=0 loops=1)
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795
loops=1)
Index Cond: (("clientid" = 31239) AND
("time_job" > 1457826264))
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin"
(cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213
rows=32 loops=1)
Index Cond: (("title")::"text" ~~ '%RYAN
WER%'::"text")
Total runtime: 3720.653 ms

so.. the query is still slow..
Do you guys know what can be done ? related to the ILIKE?

cheers
Lucas

The main problem is WHERE title ILIKE '%RYAN WER%'
When you put a % on the left of the text, there is no way to optimize that,
so yes, it will be slow.

If you can eliminate the leading percent and just have trailing, it will be
much faster.

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

drum.lucas@gmail.com

almost 10 years ago

In reply to: Melvin Davidson (#2)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

The main problem is WHERE title ILIKE '%RYAN WER%'
When you put a % on the left of the text, there is no way to optimize
that, so yes, it will be slow.

If you can eliminate the leading percent and just have trailing, it will
be much faster.

Hmm.. yep.. I suppose I can do that.

But, taking the left % off, the query is still slow:

Limit (cost=418.57..418.58 rows=1 width=20) (actual

time=4439.367..4439.381 rows=1 loops=1)
Buffers: shared hit=6847
-> Unique (cost=418.57..418.58 rows=1 width=20) (actual
time=4439.363..4439.374 rows=1 loops=1)
Buffers: shared hit=6847
-> Sort (cost=418.57..418.58 rows=1 width=20) (actual
time=4439.360..4439.365 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=6847
-> Bitmap Heap Scan on "ja_jobs" (cost=414.55..418.56
rows=1 width=20) (actual time=4439.312..4439.329 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264) AND (("title")::"text" ~~* 'RYAN SHOWER%'::"text"))
Buffers: shared hit=6847
-> BitmapAnd (cost=414.55..414.55 rows=1 width=0)
(actual time=4439.280..4439.280 rows=0 loops=1)
Buffers: shared hit=6843
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..50.67 rows=1711 width=0) (actual time=0.142..0.142 rows=795
loops=1)
Index Cond: (("clientid" = 31239) AND
("time_job" > 1457826264))
Buffers: shared hit=8
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin"
(cost=0.00..363.62 rows=483 width=0) (actual time=4439.014..4439.014
rows=32 loops=1)
Index Cond: (("title")::"text" ~~* 'RYAN
SHOWER%'::"text")
Buffers: shared hit=6835
Total runtime: 4439.427 ms

Here [1]https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/ it appears to be working even with two %.. But it's not for me....

[1]: https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/

Any ideia? lol

Melvin Davidson

melvin6925@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#3)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

On Wed, May 11, 2016 at 9:36 PM, Lucas Possamai <drum.lucas@gmail.com>
wrote:

The main problem is WHERE title ILIKE '%RYAN WER%'
When you put a % on the left of the text, there is no way to optimize
that, so yes, it will be slow.

If you can eliminate the leading percent and just have trailing, it will
be much faster.

Hmm.. yep.. I suppose I can do that.

But, taking the left % off, the query is still slow:

Limit (cost=418.57..418.58 rows=1 width=20) (actual

time=4439.367..4439.381 rows=1 loops=1)
Buffers: shared hit=6847
-> Unique (cost=418.57..418.58 rows=1 width=20) (actual
time=4439.363..4439.374 rows=1 loops=1)
Buffers: shared hit=6847
-> Sort (cost=418.57..418.58 rows=1 width=20) (actual
time=4439.360..4439.365 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=6847
-> Bitmap Heap Scan on "ja_jobs" (cost=414.55..418.56
rows=1 width=20) (actual time=4439.312..4439.329 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264) AND (("title")::"text" ~~* 'RYAN SHOWER%'::"text"))
Buffers: shared hit=6847
-> BitmapAnd (cost=414.55..414.55 rows=1 width=0)
(actual time=4439.280..4439.280 rows=0 loops=1)
Buffers: shared hit=6843
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..50.67 rows=1711 width=0) (actual time=0.142..0.142 rows=795
loops=1)
Index Cond: (("clientid" = 31239) AND
("time_job" > 1457826264))
Buffers: shared hit=8
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin"
(cost=0.00..363.62 rows=483 width=0) (actual time=4439.014..4439.014
rows=32 loops=1)
Index Cond: (("title")::"text" ~~* 'RYAN
SHOWER%'::"text")
Buffers: shared hit=6835
Total runtime: 4439.427 ms

Here [1] it appears to be working even with two %.. But it's not for
me....

[1] https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/

Any ideia? lol

Trying redoing the query with CTE as below:

WITH ja_jobs as
(SELECT DISTINCT title
FROM ja_jobs
WHERE clientid = 31239 AND time_job > 1457826264
)
SELECT title
FROM ja_jobs
WHERE title ILIKE 'RYAN WER%'
ORDER BY title
LIMIT 10;

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

drum.lucas@gmail.com

almost 10 years ago

In reply to: Melvin Davidson (#4)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

Trying redoing the query with CTE as below:

WITH ja_jobs as
(SELECT DISTINCT title
FROM ja_jobs
WHERE clientid = 31239 AND time_job > 1457826264
)
SELECT title
FROM ja_jobs
WHERE title ILIKE 'RYAN WER%'
ORDER BY title
LIMIT 10;

hmm.. still slow =(

and it's not hitting the index: (i had to change the clientid because the
previous one was in cache)

Limit (cost=93790.08..93790.09 rows=1 width=218) (actual

Show quoted text

time=284.293..284.308 rows=5 loops=1)
Buffers: shared hit=42284
CTE ja_jobs
-> HashAggregate (cost=93774.31..93779.16 rows=485 width=20) (actual
time=207.235..228.141 rows=16320 loops=1)
Buffers: shared hit=42284
-> Bitmap Heap Scan on "ja_jobs" (cost=882.98..93697.86
rows=30578 width=20) (actual time=21.942..133.380 rows=48472 loops=1)
Recheck Cond: (("clientid" = 14635) AND ("time_job" >
1436731799))
Buffers: shared hit=42284
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..875.34 rows=30578 width=0) (actual time=12.389..12.389
rows=48472 loops=1)
Index Cond: (("clientid" = 14635) AND ("time_job" >
1436731799))
Buffers: shared hit=243
-> Sort (cost=10.92..10.93 rows=1 width=218) (actual
time=284.289..284.293 rows=5 loops=1)
Sort Key: "ja_jobs"."title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=42284
-> CTE Scan on "ja_jobs" (cost=0.00..10.91 rows=1 width=218)
(actual time=236.248..284.263 rows=5 loops=1)
Filter: (("title")::"text" ~~* '%To Electrical%'::"text")
Rows Removed by Filter: 16315
Buffers: shared hit=42284
Total runtime: 287.633 ms

Jan de Visser

jan@de-visser.net

almost 10 years ago

In reply to: drum.lucas@gmail.com (#5)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

On Wed, May 11, 2016 at 10:03 PM, Lucas Possamai <drum.lucas@gmail.com>
wrote:

Trying redoing the query with CTE as below:

WITH ja_jobs as
(SELECT DISTINCT title
FROM ja_jobs
WHERE clientid = 31239 AND time_job > 1457826264
)
SELECT title
FROM ja_jobs
WHERE title ILIKE 'RYAN WER%'
ORDER BY title
LIMIT 10;

hmm.. still slow =(

and it's not hitting the index: (i had to change the clientid because the
previous one was in cache)

Limit (cost=93790.08..93790.09 rows=1 width=218) (actual

time=284.293..284.308 rows=5 loops=1)
Buffers: shared hit=42284
CTE ja_jobs
-> HashAggregate (cost=93774.31..93779.16 rows=485 width=20)
(actual time=207.235..228.141 rows=16320 loops=1)
Buffers: shared hit=42284
-> Bitmap Heap Scan on "ja_jobs" (cost=882.98..93697.86
rows=30578 width=20) (actual time=21.942..133.380 rows=48472 loops=1)
Recheck Cond: (("clientid" = 14635) AND ("time_job" >
1436731799))
Buffers: shared hit=42284
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..875.34 rows=30578 width=0) (actual time=12.389..12.389
rows=48472 loops=1)
Index Cond: (("clientid" = 14635) AND ("time_job" >
1436731799))
Buffers: shared hit=243
-> Sort (cost=10.92..10.93 rows=1 width=218) (actual
time=284.289..284.293 rows=5 loops=1)
Sort Key: "ja_jobs"."title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=42284
-> CTE Scan on "ja_jobs" (cost=0.00..10.91 rows=1 width=218)
(actual time=236.248..284.263 rows=5 loops=1)
Filter: (("title")::"text" ~~* '%To Electrical%'::"text")
Rows Removed by Filter: 16315
Buffers: shared hit=42284
Total runtime: 287.633 ms

I think a GIN index can't be used for sorting.

Jeff Janes

jeff.janes@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#1)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

On Wed, May 11, 2016 at 5:30 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:

Hi there!

I've got a simple but slow query:

SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239 AND time_job > 1457826264
order BY title
limit 10

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title
gin_trgm_ops);

Explain analyze after the index: (Yes, I ran the analyze)

Limit (cost=389.91..389.91 rows=1 width=20) (actual
time=3720.511..3720.511 rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual
time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual
time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90
rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
Rows Removed by Index Recheck: 4
-> BitmapAnd (cost=385.88..385.88 rows=1 width=0)
(actual time=3720.469..3720.469 rows=0 loops=1)
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795
loops=1)
Index Cond: (("clientid" = 31239) AND
("time_job" > 1457826264))
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin"
(cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32
loops=1)
Index Cond: (("title")::"text" ~~ '%RYAN
WER%'::"text")
Total runtime: 3720.653 ms

How big is the table? The gin index? shared_buffers? RAM? What
kind of IO system do you have, and how many other things were going on
with it?

It would be interesting to see the output of explain (analyze,
buffers) with track_io_timing turned on.

There have been improvements in this area since 9.2, you should
consider upgrading to at least 9.4.

Cheers,

Jeff

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

drum.lucas@gmail.com

almost 10 years ago

In reply to: Jeff Janes (#7)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

How big is the table? The gin index? shared_buffers? RAM? What
kind of IO system do you have, and how many other things were going on
with it?

- Just a reminder that I'm not running these tests on my prod server.. I'm
running on my test server. So the confs will be different

The table is 9GB big

The gin index is 400MB big
shared_buffers = 1536MB
RAM = 8 GB

I just wanted to understand why the GIN index is not working, but it works
here: https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/

It would be interesting to see the output of explain (analyze,
buffers) with track_io_timing turned on.

explain analyze buffer with track_io_timing turned on:

Limit (cost=93466.83..93466.83 rows=1 width=218) (actual

time=24025.463..24025.478 rows=5 loops=1)
Buffers: shared hit=8 read=42285
I/O Timings: read=23599.672
CTE ja_jobs
-> HashAggregate (cost=93451.05..93455.90 rows=485 width=20) (actual
time=23946.801..23967.660 rows=16320 loops=1)
Buffers: shared hit=3 read=42285
I/O Timings: read=23599.672
-> Bitmap Heap Scan on "ja_jobs" (cost=877.70..93374.92
rows=30453 width=20) (actual time=161.372..23835.632 rows=48472 loops=1)
Recheck Cond: (("clientid" = 14635) AND ("time_job" >
1436731799))
Buffers: shared hit=3 read=42285
I/O Timings: read=23599.672
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..870.09 rows=30453 width=0) (actual time=133.920..133.920
rows=48472 loops=1)
Index Cond: (("clientid" = 14635) AND ("time_job" >
1436731799))
Buffers: shared hit=3 read=244
I/O Timings: read=120.137
-> Sort (cost=10.92..10.93 rows=1 width=218) (actual
time=24025.457..24025.462 rows=5 loops=1)
Sort Key: "ja_jobs"."title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=8 read=42285
I/O Timings: read=23599.672
-> CTE Scan on "ja_jobs" (cost=0.00..10.91 rows=1 width=218)
(actual time=23977.095..24025.325 rows=5 loops=1)
Filter: (("title")::"text" ~~* '%To Electrical%'::"text")
Rows Removed by Filter: 16315
Buffers: shared hit=3 read=42285
I/O Timings: read=23599.672
Total runtime: 24028.551 ms

There have been improvements in this area since 9.2, you should
consider upgrading to at least 9.4.

Yep I know. The upgrade will happen, but I don't know when.

Andreas Joseph Krogh

andreas@visena.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#1)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

På torsdag 12. mai 2016 kl. 02:30:33, skrev Lucas Possamai <drum.lucas@gmail.com
<mailto:drum.lucas@gmail.com>>:
Hi there!
I've got a simple but slow query:

SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239 AND time_job > 1457826264
order BY title
limit 10

Explain analyze:

Limit (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772
rows=1 loops=1)
-> Unique (cost=5946.40..5946.41 rows=1 width=19) (actual
time=2746.753..2746.763 rows=1 loops=1)
-> Sort (cost=5946.40..5946.41 rows=1 width=19) (actual
time=2746.750..2746.754 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=49.02..5946.39 rows=1
width=19) (actual time=576.275..2746.609 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264))
Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
Rows Removed by Filter: 791
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795
loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" >
1457826264))
Total runtime: 2746.879 ms
Then, I created a trgm index:

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title
gin_trgm_ops);

Explain analyze after the index: (Yes, I ran the analyze)

Limit (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511
rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual
time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual
time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90 rows=1
width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
Rows Removed by Index Recheck: 4
-> BitmapAnd (cost=385.88..385.88 rows=1 width=0)
(actual time=3720.469..3720.469 rows=0 loops=1)
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795
loops=1)
Index Cond: (("clientid" = 31239) AND
("time_job" > 1457826264))
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin"
(cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32
loops=1)
Index Cond: (("title")::"text" ~~ '%RYAN WER
%'::"text")
Total runtime: 3720.653 ms

so.. the query is still slow..
Do you guys know what can be done ? related to the ILIKE?

cheers
Lucas

It uses available indexes, but that isn't good enough.

Try including clientid in the index, using the btree_gin extension:

CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title
gin_trgm_ops, clientid);

Note that if clientid is a bigint you have to cast the value to bigint for
btree_gin to use it (note that this isn't necessary if you use a prepared
statement):

SELECT DISTINCT title
       FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
       and clientid = 31239::bigint AND time_job > 1457826264
       order BY title
       limit 10

Also note that the index cannot ant won't be used for sorting. A bitmap-AND is
also inevitable because GIN-indexes cannot be used for the '>' operator, so PG
uses the ix_jobs_client_times btree-index and bigmap-ANDs the result.

Can you post you complete schema?

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com <mailto:andreas@visena.com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

#10

Andreas Joseph Krogh

andreas@visena.com

almost 10 years ago

In reply to: Andreas Joseph Krogh (#9)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

På torsdag 12. mai 2016 kl. 09:57:58, skrev Andreas Joseph Krogh <
andreas@visena.com <mailto:andreas@visena.com>>:
På torsdag 12. mai 2016 kl. 02:30:33, skrev Lucas Possamai <
drum.lucas@gmail.com <mailto:drum.lucas@gmail.com>>:
Hi there!
I've got a simple but slow query:

SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239 AND time_job > 1457826264
order BY title
limit 10

Explain analyze:

Limit (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772
rows=1 loops=1)
-> Unique (cost=5946.40..5946.41 rows=1 width=19) (actual
time=2746.753..2746.763 rows=1 loops=1)
-> Sort (cost=5946.40..5946.41 rows=1 width=19) (actual
time=2746.750..2746.754 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=49.02..5946.39 rows=1
width=19) (actual time=576.275..2746.609 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264))
Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
Rows Removed by Filter: 791
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795
loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" >
1457826264))
Total runtime: 2746.879 ms
Then, I created a trgm index:

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title
gin_trgm_ops);

Explain analyze after the index: (Yes, I ran the analyze)

Limit (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511
rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual
time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual
time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90 rows=1
width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
Rows Removed by Index Recheck: 4
-> BitmapAnd (cost=385.88..385.88 rows=1 width=0)
(actual time=3720.469..3720.469 rows=0 loops=1)
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795
loops=1)
Index Cond: (("clientid" = 31239) AND
("time_job" > 1457826264))
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin"
(cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32
loops=1)
Index Cond: (("title")::"text" ~~ '%RYAN WER
%'::"text")
Total runtime: 3720.653 ms

so.. the query is still slow..
Do you guys know what can be done ? related to the ILIKE?

cheers
Lucas

Also note that the index cannot ant won't be used for sorting. A bitmap-AND is
also inevitable because GIN-indexes cannot be used for the '>' operator, so PG
uses the ix_jobs_client_times btree-index and bigmap-ANDs the result.

Can you post you complete schema?

I created this test:

create table ja_jobs(id bigserial primary key, title varchar not null,
clientid bigint not null, time_job bigint not null);
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title
gin_trgm_ops, clientid);

 --- insert some test-data
 
As you see, this uses the index (when casting clientid to bigint):

andreak=# explain analyze SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%ras du%'
and clientid = 12::bigint AND time_job > 257826264
order BY title
limit 10;
QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=8.43..8.44 rows=1 width=32) (actual time=0.033..0.034 rows=1
loops=1)
   -> Unique (cost=8.43..8.44 rows=1 width=32) (actual time=0.032..0.032
rows=1 loops=1)
         -> Sort (cost=8.43..8.43 rows=1 width=32) (actual
time=0.032..0.032 rows=1 loops=1)
               Sort Key: title
               Sort Method: quicksort Memory: 25kB
               -> Bitmap Heap Scan on ja_jobs (cost=7.20..8.42 rows=1
width=32) (actual time=0.025..0.025 rows=1 loops=1)
                     Recheck Cond: (((title)::text ~~* '%ras du%'::text) AND
(clientid = '12'::bigint))
                     Filter: (time_job > 257826264)
                     Heap Blocks: exact=1
                     -> Bitmap Index Scan on ix_ja_jobs_trgm_clientid_gin
(cost=0.00..7.20 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=1)
                           Index Cond: (((title)::text ~~* '%ras du%'::text)
AND (clientid = '12'::bigint))
Planning time: 0.169 ms
Execution time: 0.061 ms
(13 rows)

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com <mailto:andreas@visena.com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

#11

Andreas Joseph Krogh

andreas@visena.com

almost 10 years ago

In reply to: Andreas Joseph Krogh (#10)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

På torsdag 12. mai 2016 kl. 10:05:01, skrev Andreas Joseph Krogh <
andreas@visena.com <mailto:andreas@visena.com>>:
[snp] I created this test:

create table ja_jobs(id bigserial primary key, title varchar not null,
clientid bigint not null, time_job bigint not null);
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title
gin_trgm_ops, clientid);

 --- insert some test-data
 
As you see, this uses the index (when casting clientid to bigint):

andreak=# explain analyze SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%ras du%'
and clientid = 12::bigint AND time_job > 257826264
order BY title
limit 10;
QUERY
PLAN

Forgot to say, this is in PG-9.6 (master), but should work on previous
versions.

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com <mailto:andreas@visena.com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

#12

Jeff Janes

jeff.janes@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#8)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

On Wed, May 11, 2016 at 11:59 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:

How big is the table? The gin index? shared_buffers? RAM? What
kind of IO system do you have, and how many other things were going on
with it?

- Just a reminder that I'm not running these tests on my prod server.. I'm
running on my test server. So the confs will be different

The table is 9GB big
The gin index is 400MB big
shared_buffers = 1536MB
RAM = 8 GB

With those sizes, the gin index will probably be naturally kept mostly
in the file-system cache, if it is used regularly. So the original
slowness of your first query is likely just a cold-cache problem. Can
you generate a stream of realistic queries and see what it stabilizes
at?

I just wanted to understand why the GIN index is not working, but it works
here: https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/

In your first email, the gin index did "work", according to the
execution plan. It just wasn't as fast as you wanted. In general,
the longer the query string is between the %%, the worse it will
perform (until version 9.6, and to a smaller degree even with 9.6).
But it still seems oddly slow to me, unless you have a cold-cache and
really bad (or overloaded) IO.

It would be interesting to see the output of explain (analyze,
buffers) with track_io_timing turned on.

explain analyze buffer with track_io_timing turned on:

...

That is the wrong query. The CTE (i.e. the WITH part) is an
optimization fence, so it can't use the gin index, simply because of
the way you query is written. (I think Melvin suggested it because he
noticed that using the gin index actually slowed down the query, so he
wanted to force it to not be used.)

Cheers,

Jeff

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13

drum.lucas@gmail.com

almost 10 years ago

In reply to: Jeff Janes (#12)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

With those sizes, the gin index will probably be naturally kept mostly
in the file-system cache, if it is used regularly. So the original
slowness of your first query is likely just a cold-cache problem. Can
you generate a stream of realistic queries and see what it stabilizes
at?

I just wanted to understand why the GIN index is not working, but it

works

here:

https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/

In your first email, the gin index did "work", according to the
execution plan. It just wasn't as fast as you wanted. In general,
the longer the query string is between the %%, the worse it will
perform (until version 9.6, and to a smaller degree even with 9.6).
But it still seems oddly slow to me, unless you have a cold-cache and
really bad (or overloaded) IO.

It would be interesting to see the output of explain (analyze,
buffers) with track_io_timing turned on.

explain analyze buffer with track_io_timing turned on:

...

That is the wrong query. The CTE (i.e. the WITH part) is an
optimization fence, so it can't use the gin index, simply because of
the way you query is written. (I think Melvin suggested it because he
noticed that using the gin index actually slowed down the query, so he
wanted to force it to not be used.)

Oh ok.

- Here is the explain analyze buffer with the original query I posted here
with the gin index:

Query:

explain (analyze, buffers)
SELECT title
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239 AND time_job > 1457826264
order BY title
limit 10

Explain analyze:

Show quoted text

Limit (cost=390.07..390.08 rows=1 width=20) (actual
time=3945.263..3945.280 rows=4 loops=1)
Buffers: shared hit=5956 read=10
I/O Timings: read=60.323
-> Sort (cost=390.07..390.08 rows=1 width=20) (actual
time=3945.256..3945.260 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=5956 read=10
I/O Timings: read=60.323
-> Bitmap Heap Scan on "ja_jobs" (cost=386.05..390.06 rows=1
width=20) (actual time=3944.857..3945.127 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" >
1457826264) AND (("title")::"text" ~~* '% WER%'::"text"))
Buffers: shared hit=5951 read=10
I/O Timings: read=60.323
-> BitmapAnd (cost=386.05..386.05 rows=1 width=0) (actual
time=3929.540..3929.540 rows=0 loops=1)
Buffers: shared hit=5950 read=7
I/O Timings: read=45.021
-> Bitmap Index Scan on "ix_jobs_client_times"
(cost=0.00..50.16 rows=1660 width=0) (actual time=45.536..45.536 rows=795
loops=1)
Index Cond: (("clientid" = 31239) AND
("time_job" > 1457826264))
Buffers: shared hit=5 read=7
I/O Timings: read=45.021
-> Bitmap Index Scan on "ix_jobs_trgm_gin"
(cost=0.00..335.64 rows=485 width=0) (actual time=3883.886..3883.886
rows=32 loops=1)
Index Cond: (("title")::"text" ~~* '%RYAN WER
%'::"text")
Buffers: shared hit=5945
Total runtime: 3945.554 ms

#14

Jeff Janes

jeff.janes@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#13)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

On Thu, May 12, 2016 at 2:32 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:

-> Bitmap Index Scan on "ix_jobs_trgm_gin"
(cost=0.00..335.64 rows=485 width=0) (actual time=3883.886..3883.886 rows=32
loops=1)
Index Cond: (("title")::"text" ~~* '%RYAN
WER%'::"text")
Buffers: shared hit=5945
Total runtime: 3945.554 ms

So it is not cold-cache or IO problems, but a CPU problem. Your query
only has 6 trigrams in it, and that is causing nearly 6000 buffer
hits. I'm guessing the " w" trigram is extremely common in your data
set. Anyway, you have some huge posting lists there, and they were
not dealt with very well in 9.2 or 9.3.

Cheers,

Jeff

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#15

drum.lucas@gmail.com

almost 10 years ago

In reply to: Jeff Janes (#14)

Re: index on ILIKE/LIKE - PostgreSQL 9.2

Hi there,

The problem was solved by using lowercase

*New index:*

CREATE INDEX CONCURRENTLY ON public.ja_jobs (clientid, lower(title)
varchar_pattern_ops, time_job);

*New query:*

SELECT DISTINCT title
FROM public.ja_jobs WHERE lower(title) LIKE lower('RYAN
WER')
AND clientid = 31239
AND time_job > 1457826264
order BY title
limit 10;

- Improvement of 1400%

Thanks

Lucas