Wrong rows count estimation (explain, gist, tsearch)

Started by Sergey Konoplev, over 16 years ago, 4 messages, general
#1 Sergey Konoplev
gray.ru@gmail.com

Hi, community

I have a table containing a column for FTS and an appropriate index:

zzz=# \d search_table
...
obj_tsvector | tsvector | not null default ''::tsvector
...
"i_search_table__tsvector_1" gist (obj_tsvector) WHERE obj_status_did = 1

The table is filled with about 7.5E+6 rows. Most of them have
non-default values in the obj_tsvector column. I use the "estimated
rows count trick" to make the search results counter faster, and every
time obj_tsvector is used the estimated row count differs wildly from
the actual one (e.g. 6821 vs 372012). I played with SET STATISTICS but
had no success.
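
For readers unfamiliar with the "estimated rows count trick": instead of running an exact count, the application asks the planner for a plan and reads the rows= figure from it. Below is a minimal Python sketch of the parsing step only, assuming the plan text has already been fetched with a plain EXPLAIN (no ANALYZE). The estimated_rows helper and the sample input are illustrative assumptions, not code from the application discussed here.

```python
import re

# Matches the planner's estimate in a plan line such as
# "Bitmap Heap Scan on t  (cost=465.16..25209.57 rows=6821 width=0)".
_ESTIMATE = re.compile(r"\(cost=\d+\.\d+\.\.\d+\.\d+ rows=(\d+) width=\d+\)")


def estimated_rows(plan_lines):
    """Return the row estimate of the first (topmost) plan node found."""
    for line in plan_lines:
        match = _ESTIMATE.search(line)
        if match:
            return int(match.group(1))
    raise ValueError("no row estimate found in plan output")


# Sample input adapted from the plan in this thread, as plain EXPLAIN
# would print it for the search query without the count() wrapper.
plan = [
    "Bitmap Heap Scan on search_table  (cost=465.16..25209.57 rows=6821 width=0)",
    "  Recheck Cond: (obj_status_did = 1)",
]
print(estimated_rows(plan))
```

The counter is only as good as the planner's estimate, which is why the mismatch described above matters for this use case.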

zzz=# EXPLAIN ANALYZE SELECT count(1) FROM search_table WHERE
obj_status_did = 1 AND obj_tsvector @@ (make_tsquery('(музыка)',
'utf8_russian'));

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=25226.63..25226.64 rows=1 width=0) (actual time=14832.455..14832.455 rows=1 loops=1)
   ->  Bitmap Heap Scan on search_table  (cost=465.16..25209.57 rows=6821 width=0) (actual time=3202.390..14731.096 rows=371026 loops=1)
         Recheck Cond: (obj_status_did = 1)
         Filter: (obj_tsvector @@ '''музыка'''::tsquery)
         ->  Bitmap Index Scan on i_search_table__tsvector_1  (cost=0.00..463.45 rows=6821 width=0) (actual time=2919.257..2919.257 rows=372012 loops=1)
               Index Cond: (obj_tsvector @@ '''музыка'''::tsquery)
 Total runtime: 14832.555 ms
(7 rows)
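
To put a number on the mismatch, the estimated and actual row counts can be read per node from EXPLAIN ANALYZE output. The following is a rough Python sketch, assuming each plan node sits on a single unwrapped line; misestimates is a hypothetical helper name, and the plan text is the one shown above.

```python
import re

# One plan node per line: optional "->" prefix, node description,
# then the "(cost=... rows=N ...)" and "(actual ... rows=M ...)" groups.
NODE = re.compile(
    r"^[\s>-]*(?P<node>.+?)\s+\(cost=\S+ rows=(?P<est>\d+) width=\d+\)"
    r"\s+\(actual time=\S+ rows=(?P<act>\d+) loops=\d+\)"
)

PLAN = """\
 Aggregate  (cost=25226.63..25226.64 rows=1 width=0) (actual time=14832.455..14832.455 rows=1 loops=1)
   ->  Bitmap Heap Scan on search_table  (cost=465.16..25209.57 rows=6821 width=0) (actual time=3202.390..14731.096 rows=371026 loops=1)
         Recheck Cond: (obj_status_did = 1)
         ->  Bitmap Index Scan on i_search_table__tsvector_1  (cost=0.00..463.45 rows=6821 width=0) (actual time=2919.257..2919.257 rows=372012 loops=1)
"""


def misestimates(plan_text):
    """Return (node, estimated, actual, actual/estimated) for each plan node."""
    result = []
    for line in plan_text.splitlines():
        match = NODE.match(line)
        if match:
            est, act = int(match["est"]), int(match["act"])
            # Actual rows are reported per loop; every node here has
            # loops=1, so no scaling is applied in this sketch.
            result.append((match["node"], est, act, act / max(est, 1)))
    return result


for node, est, act, factor in misestimates(PLAN):
    print(f"{node}: estimated {est}, actual {act}, off by {factor:.1f}x")
```

For the bitmap index scan this gives a factor of roughly 54.5, i.e. the planner expects about 1/54 of the rows that actually match.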

PG version - 8.3.7, STATISTICS is set to 500 for the column.

What's wrong with it? Is it possible to solve the problem? Thanks.

--
Regards,
Sergey Konoplev

#2 Sergey Konoplev
gray.ru@gmail.com
In reply to: Sergey Konoplev (#1)
Re: Wrong rows count estimation (explain, gist, tsearch)

BTW, dead tuples <5%

On Mon, Sep 28, 2009 at 11:09 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:

> Hi, community
>
> I have a table containing a column for FTS and an appropriate index:
>
> zzz=# \d search_table
> ...
> obj_tsvector | tsvector | not null default ''::tsvector
> ...
>    "i_search_table__tsvector_1" gist (obj_tsvector) WHERE obj_status_did = 1
>
> The table is filled with about 7.5E+6 rows. Most of them have
> non-default values in the obj_tsvector column. I use the "estimated
> rows count trick" to make the search results counter faster, and every
> time obj_tsvector is used the estimated row count differs wildly from
> the actual one (e.g. 6821 vs 372012). I played with SET STATISTICS but
> had no success.
>
> zzz=# EXPLAIN ANALYZE SELECT count(1) FROM search_table WHERE
> obj_status_did = 1 AND obj_tsvector @@ (make_tsquery('(музыка)',
> 'utf8_russian'));
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=25226.63..25226.64 rows=1 width=0) (actual time=14832.455..14832.455 rows=1 loops=1)
>    ->  Bitmap Heap Scan on search_table  (cost=465.16..25209.57 rows=6821 width=0) (actual time=3202.390..14731.096 rows=371026 loops=1)
>          Recheck Cond: (obj_status_did = 1)
>          Filter: (obj_tsvector @@ '''музыка'''::tsquery)
>          ->  Bitmap Index Scan on i_search_table__tsvector_1  (cost=0.00..463.45 rows=6821 width=0) (actual time=2919.257..2919.257 rows=372012 loops=1)
>                Index Cond: (obj_tsvector @@ '''музыка'''::tsquery)
>  Total runtime: 14832.555 ms
> (7 rows)
>
> PG version - 8.3.7, STATISTICS is set to 500 for the column.
>
> What's wrong with it? Is it possible to solve the problem? Thanks.
>
> --
> Regards,
> Sergey Konoplev

--
Regards,
Sergey Konoplev
--
PostgreSQL articles in English & Russian
http://gray-hemp.blogspot.com/search/label/postgresql/

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sergey Konoplev (#1)
Re: Wrong rows count estimation (explain, gist, tsearch)

Sergey Konoplev <gray.ru@gmail.com> writes:

> The table is filled with about 7.5E+6 rows. Most of them have
> non-default values in the obj_tsvector column. I use the "estimated
> rows count trick" to make the search results counter faster, and every
> time obj_tsvector is used the estimated row count differs wildly from
> the actual one (e.g. 6821 vs 372012). I played with SET STATISTICS but
> had no success.

8.3 has just a stub estimator for @@. You might have better results
with 8.4. In the particular example you're showing, though, I don't
think the poor rowcount estimate is making any difference to the
plan choice.
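
As an illustration of the difference between a stub estimator and one driven by collected statistics, here is a simplified Python sketch. It is not PostgreSQL's code: the 0.001 constant for the stub, the 0.005 default, the halving rule for lexemes missing from the statistics, the independence assumption for AND/OR, and the lexeme frequencies are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Union

STUB_SELECTIVITY = 0.001     # assumed constant; ignores the query entirely
DEFAULT_SELECTIVITY = 0.005  # assumed fallback for lexemes without statistics


@dataclass(frozen=True)
class Lexeme:
    word: str


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Not:
    arg: "Query"


Query = Union[Lexeme, And, Or, Not]


def stub_selectivity(query: Query) -> float:
    """Stub: the same guess for every query, however common the lexeme."""
    return STUB_SELECTIVITY


def stats_selectivity(query: Query, freqs: Dict[str, float]) -> float:
    """Estimate from per-lexeme frequencies (fraction of rows containing it)."""
    if isinstance(query, Lexeme):
        if query.word in freqs:
            return freqs[query.word]
        if not freqs:
            return DEFAULT_SELECTIVITY
        # Not among the tracked lexemes, so assume it is rarer than all of them.
        return min(DEFAULT_SELECTIVITY, min(freqs.values()) / 2)
    if isinstance(query, Not):
        return 1.0 - stats_selectivity(query.arg, freqs)
    left = stats_selectivity(query.left, freqs)
    right = stats_selectivity(query.right, freqs)
    if isinstance(query, And):
        return left * right               # treats the operands as independent
    return left + right - left * right    # Or, same independence assumption


# Hypothetical frequencies and table size, made up for this example.
freqs = {"музыка": 0.05, "кино": 0.02}
total_rows = 1_000_000
query = Lexeme("музыка")
print(round(stub_selectivity(query) * total_rows))
print(round(stats_selectivity(query, freqs) * total_rows))
```

With a fixed guess, raising the statistics target cannot change the estimate, which would be consistent with SET STATISTICS having no effect on 8.3.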

regards, tom lane

#4 Sergey Konoplev
gray.ru@gmail.com
In reply to: Tom Lane (#3)
Re: Wrong rows count estimation (explain, gist, tsearch)

On Mon, Sep 28, 2009 at 6:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Sergey Konoplev <gray.ru@gmail.com> writes:
>
>> The table is filled with about 7.5E+6 rows. Most of them have
>> non-default values in the obj_tsvector column. I use the "estimated
>> rows count trick" to make the search results counter faster, and every
>> time obj_tsvector is used the estimated row count differs wildly from
>> the actual one (e.g. 6821 vs 372012). I played with SET STATISTICS but
>> had no success.
>
> 8.3 has just a stub estimator for @@. You might have better results
> with 8.4. In the particular example you're showing, though, I don't
> think the poor rowcount estimate is making any difference to the
> plan choice.

Thanks, Tom. Will try 8.4.

> regards, tom lane

--
Regards,
Sergey Konoplev