Query Optimisation

Started by Matover 22 years ago4 messagesgeneral
Jump to latest
#1Mat
psql-mail@freeuk.com

Hi, I have the following query - is there anything i've missed or is it
just slow?!

I have an index on meta.date that i thought might have been used but
isn't (I know it would only be a small performance increase in the
current plan).

meta.date is between 1999 and 2003. I think generally the most
efficient order to do things would be to extract all the messages
within the date range and then search over just them.

I am currently in the process of setting up full text indexing as
described in the techdocs.postgresql.org i guess this is the main way
of speeding up searches through ~40GB of bulk text?

Thanks!...
m

EXPLAIN ANALYZE SELECT meta.msg_id, date, from_line, subject FROM
message ,meta WHERE meta.date >= '15-06-2003 00:00:00' AND meta.date <=
'26-06-2003 00:00:00' AND message.header||message.body ILIKE '%chicken%'
AND meta.sys_id = message.sys_id ORDER BY meta.date DESC;
QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
------
Sort (cost=50.16..50.17 rows=1 width=120) (actual time=412333.65..
412333.76 rows=166 loops=1)
Sort Key: meta.date
-> Nested Loop (cost=0.00..50.15 rows=1 width=120) (actual time=
400713.41..412332.53 rows=166 loops=1)
-> Seq Scan on message (cost=0.00..25.00 rows=5 width=8) (actual time=
58.18..410588.49 rows=20839 loops=1)
Filter: ((header || body) ~~* '%chicken%'::text)
-> Index Scan using meta_pkey on meta (cost=0.00..5.02 rows=1 width=
112) (actual time=0.07..0.07 rows=0 loops=20839)
Index Cond: (meta.sys_id = "outer".sys_id)
Filter: ((date >= '2003-06-15 00:00:00'::timestamp without time zone)
AND (date <= '2003-06-26 00:00:00'::timestamp without time zone))
Total runtime: 412334.08 msec
(9 rows)

#2Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Mat (#1)
Re: [GENERAL] Query Optimisation

[replying to -performance]

On Tue, 15 Jul 2003 psql-mail@freeuk.com wrote:

Hi, I have the following query - is there anything i've missed or is it
just slow?!

The fact that it underestimates the number of matching message rows by a
factor of about 4000 doesn't help. I'm not sure you're going to be able to
get a better estimate using message.header||message.body ILIKE '%chicken%'
(possibly using two ilikes with or might help but probably not enough).
Have you vacuum analyzed the two tables recently? The seq scan cost on
message seems fairly low given what I would expect to be the size of that
table.

I am currently in the process of setting up full text indexing as
described in the techdocs.postgresql.org i guess this is the main way
of speeding up searches through ~40GB of bulk text?

That's still probably the best way.

Show quoted text

EXPLAIN ANALYZE SELECT meta.msg_id, date, from_line, subject FROM
message ,meta WHERE meta.date >= '15-06-2003 00:00:00' AND meta.date <=
'26-06-2003 00:00:00' AND message.header||message.body ILIKE '%chicken%'
AND meta.sys_id = message.sys_id ORDER BY meta.date DESC;
QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
------
Sort (cost=50.16..50.17 rows=1 width=120) (actual time=412333.65..
412333.76 rows=166 loops=1)
Sort Key: meta.date
-> Nested Loop (cost=0.00..50.15 rows=1 width=120) (actual time=
400713.41..412332.53 rows=166 loops=1)
-> Seq Scan on message (cost=0.00..25.00 rows=5 width=8) (actual time=
58.18..410588.49 rows=20839 loops=1)
Filter: ((header || body) ~~* '%chicken%'::text)
-> Index Scan using meta_pkey on meta (cost=0.00..5.02 rows=1 width=
112) (actual time=0.07..0.07 rows=0 loops=20839)
Index Cond: (meta.sys_id = "outer".sys_id)
Filter: ((date >= '2003-06-15 00:00:00'::timestamp without time zone)
AND (date <= '2003-06-26 00:00:00'::timestamp without time zone))
Total runtime: 412334.08 msec
(9 rows)

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

#3Jonathan Bartlett
johnnyb@eskimo.com
In reply to: Mat (#1)
Re: Query Optimisation

Try explicitly casting the values into the appropriate date type.

Jon

On Tue, 15 Jul 2003 psql-mail@freeuk.com wrote:

Show quoted text

Hi, I have the following query - is there anything i've missed or is it
just slow?!

I have an index on meta.date that i thought might have been used but
isn't (I know it would only be a small performance increase in the
current plan).

meta.date is between 1999 and 2003. I think generally the most
efficient order to do things would be to extract all the messages
within the date range and then search over just them.

I am currently in the process of setting up full text indexing as
described in the techdocs.postgresql.org i guess this is the main way
of speeding up searches through ~40GB of bulk text?

Thanks!...
m

EXPLAIN ANALYZE SELECT meta.msg_id, date, from_line, subject FROM
message ,meta WHERE meta.date >= '15-06-2003 00:00:00' AND meta.date <=
'26-06-2003 00:00:00' AND message.header||message.body ILIKE '%chicken%'
AND meta.sys_id = message.sys_id ORDER BY meta.date DESC;
QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
------
Sort (cost=50.16..50.17 rows=1 width=120) (actual time=412333.65..
412333.76 rows=166 loops=1)
Sort Key: meta.date
-> Nested Loop (cost=0.00..50.15 rows=1 width=120) (actual time=
400713.41..412332.53 rows=166 loops=1)
-> Seq Scan on message (cost=0.00..25.00 rows=5 width=8) (actual time=
58.18..410588.49 rows=20839 loops=1)
Filter: ((header || body) ~~* '%chicken%'::text)
-> Index Scan using meta_pkey on meta (cost=0.00..5.02 rows=1 width=
112) (actual time=0.07..0.07 rows=0 loops=20839)
Index Cond: (meta.sys_id = "outer".sys_id)
Filter: ((date >= '2003-06-15 00:00:00'::timestamp without time zone)
AND (date <= '2003-06-26 00:00:00'::timestamp without time zone))
Total runtime: 412334.08 msec
(9 rows)

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mat (#1)
Re: Query Optimisation

psql-mail@freeuk.com writes:

-> Seq Scan on message (cost=0.00..25.00 rows=5 width=8) (actual time=
58.18..410588.49 rows=20839 loops=1)
Filter: ((header || body) ~~* '%chicken%'::text)

Estimated cost of a seqscan only 25? Have you ever vacuumed or analyzed
that table? The planner evidently thinks it is tiny ...

regards, tom lane