Trigram is slow when 10m rows

Started by Aaron Lewisover 9 years ago5 messagesgeneral
Jump to latest
#1Aaron Lewis
the.warl0ck.1989@gmail.com

I have a simple table with Trigram index,

create table mytable(hash char(40), title text);
create index title_trgm_idx on mytable using gin(title gin_trgm_ops);

When I run a query with 10m rows, it uses the Trigram index, but takes
3s to execute, very slow.
(I have 80m rows, but only inserted 10m for testing purpose)

test=# select count(*) from mytable;
count
----------
13971887
(1 row)

test=# explain analyze select * from mytable where title ilike 'x264';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on mytable (cost=462.69..5639.67 rows=1380
width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
Recheck Cond: (title ~~* 'x264'::text)
Rows Removed by Index Recheck: 11402855
Heap Blocks: exact=39557 lossy=158010
-> Bitmap Index Scan on title_trgm_idx (cost=0.00..462.35
rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
Index Cond: (title ~~* 'x264'::text)
Planning time: 0.611 ms
Execution time: 2937.729 ms
(8 rows)

Any ideas to speed things up?

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Oleg Bartunov
oleg@sai.msu.su
In reply to: Aaron Lewis (#1)
Re: Trigram is slow when 10m rows

On Sun, Nov 13, 2016 at 2:54 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
wrote:

I have a simple table with Trigram index,

create table mytable(hash char(40), title text);
create index title_trgm_idx on mytable using gin(title gin_trgm_ops);

When I run a query with 10m rows, it uses the Trigram index, but takes
3s to execute, very slow.
(I have 80m rows, but only inserted 10m for testing purpose)

test=# select count(*) from mytable;
count
----------
13971887
(1 row)

test=# explain analyze select * from mytable where title ilike 'x264';
QUERY PLAN
------------------------------------------------------------
--------------------------------------------------------------------------
Bitmap Heap Scan on mytable (cost=462.69..5639.67 rows=1380
width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
Recheck Cond: (title ~~* 'x264'::text)
Rows Removed by Index Recheck: 11402855
Heap Blocks: exact=39557 lossy=158010
-> Bitmap Index Scan on title_trgm_idx (cost=0.00..462.35
rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
Index Cond: (title ~~* 'x264'::text)
Planning time: 0.611 ms
Execution time: 2937.729 ms
(8 rows)

Any ideas to speed things up?

Rows Removed by Index Recheck: 11402855
Heap Blocks: exact=39557 lossy=158010

You need to increase work_mem

Show quoted text

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Aaron Lewis
the.warl0ck.1989@gmail.com
In reply to: Oleg Bartunov (#2)
Re: Trigram is slow when 10m rows

Thanks Oleg.

I've increased work_mem to 128MB, now the query falls down to 1.7s,
faster but still not good enough.

Is there any other thing I can do about it?

test=# explain analyze select * from mytable where title ilike 'x264';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on mytable (cost=462.69..5639.67 rows=1380
width=83) (actual time=1754.656..1754.656 rows=0 loops=1)
Recheck Cond: (title ~~* 'x264'::text)
Rows Removed by Index Recheck: 1220793
Heap Blocks: exact=197567
-> Bitmap Index Scan on title_trgm_idx (cost=0.00..462.35
rows=1380 width=0) (actual time=346.663..346.663 rows=1220793 loops=1)
Index Cond: (title ~~* 'x264'::text)
Planning time: 1.168 ms
Execution time: 1755.944 ms

On Sun, Nov 13, 2016 at 10:04 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

On Sun, Nov 13, 2016 at 2:54 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
wrote:

I have a simple table with Trigram index,

create table mytable(hash char(40), title text);
create index title_trgm_idx on mytable using gin(title gin_trgm_ops);

When I run a query with 10m rows, it uses the Trigram index, but takes
3s to execute, very slow.
(I have 80m rows, but only inserted 10m for testing purpose)

test=# select count(*) from mytable;
count
----------
13971887
(1 row)

test=# explain analyze select * from mytable where title ilike 'x264';
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on mytable (cost=462.69..5639.67 rows=1380
width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
Recheck Cond: (title ~~* 'x264'::text)
Rows Removed by Index Recheck: 11402855
Heap Blocks: exact=39557 lossy=158010
-> Bitmap Index Scan on title_trgm_idx (cost=0.00..462.35
rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
Index Cond: (title ~~* 'x264'::text)
Planning time: 0.611 ms
Execution time: 2937.729 ms
(8 rows)

Any ideas to speed things up?

Rows Removed by Index Recheck: 11402855
Heap Blocks: exact=39557 lossy=158010

You need to increase work_mem

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Oleg Bartunov
oleg@sai.msu.su
In reply to: Aaron Lewis (#3)
Re: Trigram is slow when 10m rows

On Sun, Nov 13, 2016 at 5:25 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
wrote:

Thanks Oleg.

I've increased work_mem to 128MB, now the query falls down to 1.7s,
faster but still not good enough.

Is there any other thing I can do about it?

your query 'x264' is short in terms of the number of trigrams, so trigram
index isn't good. Did you tried text_pattern_ops for btree ? Something like
create index title_btree_idx on mytable using btree(title text_pattern_ops
);

Show quoted text

test=# explain analyze select * from mytable where title ilike 'x264';
QUERY PLAN
------------------------------------------------------------
--------------------------------------------------------------------------
Bitmap Heap Scan on mytable (cost=462.69..5639.67 rows=1380
width=83) (actual time=1754.656..1754.656 rows=0 loops=1)
Recheck Cond: (title ~~* 'x264'::text)
Rows Removed by Index Recheck: 1220793
Heap Blocks: exact=197567
-> Bitmap Index Scan on title_trgm_idx (cost=0.00..462.35
rows=1380 width=0) (actual time=346.663..346.663 rows=1220793 loops=1)
Index Cond: (title ~~* 'x264'::text)
Planning time: 1.168 ms
Execution time: 1755.944 ms

On Sun, Nov 13, 2016 at 10:04 PM, Oleg Bartunov <obartunov@gmail.com>
wrote:

On Sun, Nov 13, 2016 at 2:54 PM, Aaron Lewis <the.warl0ck.1989@gmail.com

wrote:

I have a simple table with Trigram index,

create table mytable(hash char(40), title text);
create index title_trgm_idx on mytable using gin(title gin_trgm_ops);

When I run a query with 10m rows, it uses the Trigram index, but takes
3s to execute, very slow.
(I have 80m rows, but only inserted 10m for testing purpose)

test=# select count(*) from mytable;
count
----------
13971887
(1 row)

test=# explain analyze select * from mytable where title ilike 'x264';
QUERY PLAN

------------------------------------------------------------

--------------------------------------------------------------------------

Bitmap Heap Scan on mytable (cost=462.69..5639.67 rows=1380
width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
Recheck Cond: (title ~~* 'x264'::text)
Rows Removed by Index Recheck: 11402855
Heap Blocks: exact=39557 lossy=158010
-> Bitmap Index Scan on title_trgm_idx (cost=0.00..462.35
rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
Index Cond: (title ~~* 'x264'::text)
Planning time: 0.611 ms
Execution time: 2937.729 ms
(8 rows)

Any ideas to speed things up?

Rows Removed by Index Recheck: 11402855
Heap Blocks: exact=39557 lossy=158010

You need to increase work_mem

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

#5Jeff Janes
jeff.janes@gmail.com
In reply to: Aaron Lewis (#1)
Re: Trigram is slow when 10m rows

On Sun, Nov 13, 2016 at 3:54 AM, Aaron Lewis <the.warl0ck.1989@gmail.com>
wrote:

I have a simple table with Trigram index,

create table mytable(hash char(40), title text);
create index title_trgm_idx on mytable using gin(title gin_trgm_ops);

When I run a query with 10m rows, it uses the Trigram index, but takes
3s to execute, very slow.
(I have 80m rows, but only inserted 10m for testing purpose)

test=# select count(*) from mytable;
count
----------
13971887
(1 row)

test=# explain analyze select * from mytable where title ilike 'x264';
QUERY PLAN
------------------------------------------------------------
--------------------------------------------------------------------------
Bitmap Heap Scan on mytable (cost=462.69..5639.67 rows=1380
width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
Recheck Cond: (title ~~* 'x264'::text)
Rows Removed by Index Recheck: 11402855
Heap Blocks: exact=39557 lossy=158010
-> Bitmap Index Scan on title_trgm_idx (cost=0.00..462.35
rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
Index Cond: (title ~~* 'x264'::text)
Planning time: 0.611 ms
Execution time: 2937.729 ms
(8 rows)

Any ideas to speed things up?

What version of postgresql and pg_trgm are you using? It might work better
under 9.6/1.3

Since your query doesn't use wildcards, it is probably more well suited to
a regular btree index, perhaps with citext.

Cheers,

Jeff