Why is this query not using GIN index?

Started by Aaron Lewisover 9 years ago8 messagesgeneral
Jump to latest
#1Aaron Lewis
the.warl0ck.1989@gmail.com

I have a simple table, and a gin index,

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english', 'title'));
create unique index md5_uniq_idx on mytable(hash);

When I execute a query with tsquery, the GIN index was not in use:

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..277.35 rows=10 width=83) (actual
time=0.111..75.549 rows=10 loops=1)
-> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83)
(actual time=0.110..75.546 rows=10 loops=1)
Filter: (to_tsvector('english'::regconfig, (title)::text) @@
'''abc'' | ''def'''::tsquery)
Rows Removed by Filter: 10221
Planning time: 0.176 ms
Execution time: 75.564 ms
(6 rows)

Any ideas?

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Oleg Bartunov
oleg@sai.msu.su
In reply to: Aaron Lewis (#1)
Re: Why is this query not using GIN index?

On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
wrote:

I have a simple table, and a gin index,

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english',
'title'));

^^^^^

Show quoted text

create unique index md5_uniq_idx on mytable(hash);

When I execute a query with tsquery, the GIN index was not in use:

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
QUERY PLAN
------------------------------------------------------------
--------------------------------------------------------
Limit (cost=0.00..277.35 rows=10 width=83) (actual
time=0.111..75.549 rows=10 loops=1)
-> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83)
(actual time=0.110..75.546 rows=10 loops=1)
Filter: (to_tsvector('english'::regconfig, (title)::text) @@
'''abc'' | ''def'''::tsquery)
Rows Removed by Filter: 10221
Planning time: 0.176 ms
Execution time: 75.564 ms
(6 rows)

Any ideas?

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Aaron Lewis
the.warl0ck.1989@gmail.com
In reply to: Oleg Bartunov (#2)
Re: Why is this query not using GIN index?

Hi Oleg,

Can you elaborate on the title column? I don't get it.

On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
wrote:

I have a simple table, and a gin index,

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english',
'title'));

^^^^^

create unique index md5_uniq_idx on mytable(hash);

When I execute a query with tsquery, the GIN index was not in use:

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..277.35 rows=10 width=83) (actual
time=0.111..75.549 rows=10 loops=1)
-> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83)
(actual time=0.110..75.546 rows=10 loops=1)
Filter: (to_tsvector('english'::regconfig, (title)::text) @@
'''abc'' | ''def'''::tsquery)
Rows Removed by Filter: 10221
Planning time: 0.176 ms
Execution time: 75.564 ms
(6 rows)

Any ideas?

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Julien Rouhaud
rjuju123@gmail.com
In reply to: Aaron Lewis (#3)
Re: Why is this query not using GIN index?

On 13/11/2016 15:26, Aaron Lewis wrote:

Hi Oleg,

Can you elaborate on the title column? I don't get it.

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english',
'title'));

You created an index on the text 'title', not on the title column, so
the index is useless.

Drop the existing index and create this one instead:

create index name_fts on mytable using gin(to_tsvector('english', title));

On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
wrote:

I have a simple table, and a gin index,

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english',
'title'));

^^^^^

create unique index md5_uniq_idx on mytable(hash);

When I execute a query with tsquery, the GIN index was not in use:

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..277.35 rows=10 width=83) (actual
time=0.111..75.549 rows=10 loops=1)
-> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83)
(actual time=0.110..75.546 rows=10 loops=1)
Filter: (to_tsvector('english'::regconfig, (title)::text) @@
'''abc'' | ''def'''::tsquery)
Rows Removed by Filter: 10221
Planning time: 0.176 ms
Execution time: 75.564 ms
(6 rows)

Any ideas?

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Aaron Lewis
the.warl0ck.1989@gmail.com
In reply to: Julien Rouhaud (#4)
Re: Why is this query not using GIN index?

Sigh, didn't notice that. Thanks for the heads up.

It takes 500ms with 10m rows, could it be faster?
I've increased work_mem to 256MB

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=684.06..2949.42 rows=1000 width=83) (actual
time=348.506..536.483 rows=1000 loops=1)
   ->  Bitmap Heap Scan on mytable  (cost=661.41..158917.22 rows=69859
width=83) (actual time=345.354..536.199 rows=1010 loops=1)
         Recheck Cond: (to_tsvector('english'::regconfig, title) @@
'''x264'''::tsquery)
         Rows Removed by Index Recheck: 12242
         Heap Blocks: exact=20 lossy=186
         ->  Bitmap Index Scan on name_fts  (cost=0.00..643.95
rows=69859 width=0) (actual time=333.703..333.703 rows=1044673
loops=1)
               Index Cond: (to_tsvector('english'::regconfig, title)
@@ '''x264'''::tsquery)
 Planning time: 0.144 ms
 Execution time: 537.212 ms
(9 rows)

On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:

On 13/11/2016 15:26, Aaron Lewis wrote:

Hi Oleg,

Can you elaborate on the title column? I don't get it.

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english',
'title'));

You created an index on the text 'title', not on the title column, so
the index is useless.

Drop the existing index and create this one instead:

create index name_fts on mytable using gin(to_tsvector('english', title));

On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
wrote:

I have a simple table, and a gin index,

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english',
'title'));

^^^^^

create unique index md5_uniq_idx on mytable(hash);

When I execute a query with tsquery, the GIN index was not in use:

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..277.35 rows=10 width=83) (actual
time=0.111..75.549 rows=10 loops=1)
-> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83)
(actual time=0.110..75.546 rows=10 loops=1)
Filter: (to_tsvector('english'::regconfig, (title)::text) @@
'''abc'' | ''def'''::tsquery)
Rows Removed by Filter: 10221
Planning time: 0.176 ms
Execution time: 75.564 ms
(6 rows)

Any ideas?

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Oleg Bartunov
oleg@sai.msu.su
In reply to: Aaron Lewis (#5)
Re: Why is this query not using GIN index?

On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
wrote:

Sigh, didn't notice that. Thanks for the heads up.

It takes 500ms with 10m rows, could it be faster?

sure. Recheck with function call is pretty expensive, so I'd not recommend
to create functional index, just create separate column of type tsvector
(materialize to_tsvector) and create gin index on it. You should surprise.

Show quoted text

I've increased work_mem to 256MB

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10;
QUERY PLAN
------------------------------------------------------------
------------------------------------------------------------
---------------
Limit  (cost=684.06..2949.42 rows=1000 width=83) (actual
time=348.506..536.483 rows=1000 loops=1)
->  Bitmap Heap Scan on mytable  (cost=661.41..158917.22 rows=69859
width=83) (actual time=345.354..536.199 rows=1010 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, title) @@
'''x264'''::tsquery)
Rows Removed by Index Recheck: 12242
Heap Blocks: exact=20 lossy=186
->  Bitmap Index Scan on name_fts  (cost=0.00..643.95
rows=69859 width=0) (actual time=333.703..333.703 rows=1044673
loops=1)
Index Cond: (to_tsvector('english'::regconfig, title)
@@ '''x264'''::tsquery)
Planning time: 0.144 ms
Execution time: 537.212 ms
(9 rows)

On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:

On 13/11/2016 15:26, Aaron Lewis wrote:

Hi Oleg,

Can you elaborate on the title column? I don't get it.

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english',
'title'));

You created an index on the text 'title', not on the title column, so
the index is useless.

Drop the existing index and create this one instead:

create index name_fts on mytable using gin(to_tsvector('english',

title));

On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com>

wrote:

On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <

the.warl0ck.1989@gmail.com>

wrote:

I have a simple table, and a gin index,

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english',
'title'));

^^^^^

create unique index md5_uniq_idx on mytable(hash);

When I execute a query with tsquery, the GIN index was not in use:

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
QUERY PLAN

------------------------------------------------------------

--------------------------------------------------------

Limit (cost=0.00..277.35 rows=10 width=83) (actual
time=0.111..75.549 rows=10 loops=1)
-> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83)
(actual time=0.110..75.546 rows=10 loops=1)
Filter: (to_tsvector('english'::regconfig, (title)::text) @@
'''abc'' | ''def'''::tsquery)
Rows Removed by Filter: 10221
Planning time: 0.176 ms
Execution time: 75.564 ms
(6 rows)

Any ideas?

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#6)
Re: Why is this query not using GIN index?

Oleg Bartunov <obartunov@gmail.com> writes:

On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>

It takes 500ms with 10m rows, could it be faster?

sure. Recheck with function call is pretty expensive, so I'd not recommend
to create functional index, just create separate column of type tsvector
(materialize to_tsvector) and create gin index on it. You should surprise.

I doubt it'll help that much --- more than half the time is going into the
bitmap indexscan, and with over 1m candidate matches, there's no way
that's going to be super cheap.

I wonder whether a gist index would be better here, since it would support
a plain indexscan which should require scanning much less of the index
given the small LIMIT.

(Materializing the tsvector would probably help for gist, too, by reducing
the cost of lossy-index rechecks.)

BTW, it still looks like the performance is being significantly hurt by
inadequate work_mem.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Aaron Lewis
the.warl0ck.1989@gmail.com
In reply to: Tom Lane (#7)
Re: Why is this query not using GIN index?

Hey guys,

I'm trying to understand the performance impact of "Index Recheck", I
googled for Index Recheck, but didn't find much details about it,
where can I know more about it?

And how did you know the performance is being significantly hurt by
inadequate work_mem?

I'm running PG 9.6.1, built from source.

On Mon, Nov 14, 2016 at 2:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Oleg Bartunov <obartunov@gmail.com> writes:

On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>

It takes 500ms with 10m rows, could it be faster?

sure. Recheck with function call is pretty expensive, so I'd not recommend
to create functional index, just create separate column of type tsvector
(materialize to_tsvector) and create gin index on it. You should surprise.

I doubt it'll help that much --- more than half the time is going into the
bitmap indexscan, and with over 1m candidate matches, there's no way
that's going to be super cheap.

I wonder whether a gist index would be better here, since it would support
a plain indexscan which should require scanning much less of the index
given the small LIMIT.

(Materializing the tsvector would probably help for gist, too, by reducing
the cost of lossy-index rechecks.)

BTW, it still looks like the performance is being significantly hurt by
inadequate work_mem.

regards, tom lane

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general