tsearch 2 query

Started by Matthew Terenzioover 17 years ago5 messagesgeneral
Jump to latest
#1Matthew Terenzio
mterenzio@gmail.com

There are less than 20,000 records being searched here, but the query takes
several minutes.

I know this may not be enough info, but would one suggest I optimize the
query or put my attention towards other areas.

SELECT id,date,headline as head,headline(body,q),rank(vectors,q),timestamp
FROM stories,to_tsquery('$query') AS q WHERE vectors @@ q ORDER BY $sort
DESC OFFSET $offset LIMIT 20

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew Terenzio (#1)
Re: tsearch 2 query

"Matthew Terenzio" <mterenzio@gmail.com> writes:

There are less than 20,000 records being searched here, but the query takes
several minutes.
I know this may not be enough info, but would one suggest I optimize the
query or put my attention towards other areas.

What does EXPLAIN ANALYZE show for it?

regards, tom lane

#3Matthew Terenzio
mterenzio@gmail.com
In reply to: Tom Lane (#2)
Re: tsearch 2 query

Thanks Tom, Sorry if that last post went over multiple times. I was getting
a mailing failure (or so I thought)

here is EXPLAIN ANALYZE . I really need to work on my skills at analyzing
these:

Limit (cost=105505.78..105505.83 rows=20 width=655) (actual
time=74806.973..74807.037 rows=20 loops=1)

-> Sort (cost=105505.78..105555.44 rows=19861 width=655) (actual
time=74806.968..74806.989 rows=20 loops=1)

Sort Key: stories."timestamp"

-> Nested Loop (cost=0.00..90497.94 rows=19861 width=655)
(actual time=720.251..74798.672 rows=680 loops=1)

-> Function Scan on q (cost=0.00..12.50 rows=1000
width=32) (actual time=0.013..0.017 rows=1 loops=1)

-> Index Scan using description_index on stories
(cost=0.00..90.14 rows=20 width=623) (actual time=700.633..63243.713
rows=680 loops=1)

Index Cond: (stories.vectors @@ "outer".q)

Total runtime: 74847.177 ms

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew Terenzio (#3)
Re: tsearch 2 query

"Matthew Terenzio" <mterenzio@gmail.com> writes:

here is EXPLAIN ANALYZE . I really need to work on my skills at analyzing
these:

Limit (cost=105505.78..105505.83 rows=20 width=655) (actual
time=74806.973..74807.037 rows=20 loops=1)
-> Sort (cost=105505.78..105555.44 rows=19861 width=655) (actual
time=74806.968..74806.989 rows=20 loops=1)
Sort Key: stories."timestamp"
-> Nested Loop (cost=0.00..90497.94 rows=19861 width=655)
(actual time=720.251..74798.672 rows=680 loops=1)
-> Function Scan on q (cost=0.00..12.50 rows=1000
width=32) (actual time=0.013..0.017 rows=1 loops=1)
-> Index Scan using description_index on stories
(cost=0.00..90.14 rows=20 width=623) (actual time=700.633..63243.713
rows=680 loops=1)
Index Cond: (stories.vectors @@ "outer".q)

Total runtime: 74847.177 ms

Huh. The plan looks fine --- I had thought maybe the optimizer was
dropping the ball, but this seems to be more or less what you need.
The indexscan seems awfully slow though.

The only thought I have to offer is that you're apparently using quite
an old version of Postgres --- the 1000-row estimate for a scalar
function scan would only have happened in 8.0 or before. Perhaps
updating to something newer would help. I'm not sure if there are any
big performance improvements in GIST indexes per se, but in 8.2 or
8.3 you'd have the option to switch to a GIN index instead. If this
table is read-mostly then that'd be a win.

regards, tom lane

#5Oleg Bartunov
oleg@sai.msu.su
In reply to: Matthew Terenzio (#3)
Re: tsearch 2 query

ok,

you calculate headline() 19861 times, while you need only 20.
Use subselect and will be surprized

Oleg
On Thu, 2 Oct 2008, Matthew Terenzio wrote:

Thanks Tom, Sorry if that last post went over multiple times. I was getting
a mailing failure (or so I thought)

here is EXPLAIN ANALYZE . I really need to work on my skills at analyzing
these:

Limit (cost=105505.78..105505.83 rows=20 width=655) (actual
time=74806.973..74807.037 rows=20 loops=1)

-> Sort (cost=105505.78..105555.44 rows=19861 width=655) (actual
time=74806.968..74806.989 rows=20 loops=1)

Sort Key: stories."timestamp"

-> Nested Loop (cost=0.00..90497.94 rows=19861 width=655)
(actual time=720.251..74798.672 rows=680 loops=1)

-> Function Scan on q (cost=0.00..12.50 rows=1000
width=32) (actual time=0.013..0.017 rows=1 loops=1)

-> Index Scan using description_index on stories
(cost=0.00..90.14 rows=20 width=623) (actual time=700.633..63243.713
rows=680 loops=1)

Index Cond: (stories.vectors @@ "outer".q)

Total runtime: 74847.177 ms

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83