Re: FTI is really really slow; what am I doing wrong?
Did you vacuum after
populating the tables?
If not you should do it
Show quoted text
On Wed, Aug 22, 2001 at 11:08:55AM -0400, Paul C. wrote:
Greetings,
I am trying to test out the performance of the contrib/fulltextindex
package and I am getting horrid performance results.
The Setup:
I created a simple table, ST (id SERIAL, body varchar(1024), which is to be
searched. I created the ST_FTI table, trigger and indices as per
instructions in the FTI readme and C file. To populate the table, I took a
flat text version of 'War and Peace' I found on the net, broke it up into
sentences and inserted each sentence into ST as a row. So I have about
38,000 sentences and my ST_FTI table is about 2 million rows.
The Test:
There is exactly one sentence (row) that has the strings 'Newton' and
'Kepler' in it. That is my target. For a straight select on ST:
select * from st where body ~* 'newton' and body ~* 'kepler';
the cost is 1100.41
BUT for an query using the FTI indices:
select s.* from st s, st_fti f1, st_fti f2 where f1.string
~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
and s.oid = f2.id;
the cost becomes a staggering 80628.92!!! The plans are pasted at the end
of this message.
Now, I have all the indices created (on id of st_fti, on string of st_fti
and on oid of st). I cannot figure out why this is so much worse than the
straight query. Indeed, the cost to look up a single string in the st_fti
table is way high:
select * from st_fti where string ~ '^kepler';
costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
exists.
What am I doing wrong? Is it the sheer size of the st_fti table that is
causing problems? Any help would be greatly appreciated.
Thanks,
Paul C.FTI search
NOTICE: QUERY PLAN:
Merge Join (cost=80046.91..80628.92 rows=110 width=28)
-> Sort (cost=41827.54..41827.54 rows=19400 width=24)
-> Hash Join (cost=1992.80..40216.39 rows=19400 width=24)
-> Seq Scan on st_fti f2 (cost=0.00..36703.40 rows=19400
width=4)
-> Hash (cost=929.94..929.94 rows=34094 width=20)
-> Seq Scan on st s (cost=0.00..929.94 rows=34094
width=20)
-> Sort (cost=38219.37..38219.37 rows=19400 width=4)
-> Seq Scan on st_fti f1 (cost=0.00..36703.40 rows=19400 width=4)
EXPLAINPlain search:
NOTICE: QUERY PLAN:
Seq Scan on st (cost=0.00..1100.41 rows=1 width=16)
EXPLAIN_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Import Notes
Reply to msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.comReference msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.com
You've vacuum analyze 'd the database, haven't you?
-Mitch
Show quoted text
There is exactly one sentence (row) that has the strings 'Newton' and
'Kepler' in it. That is my target. For a straight select on ST:
select * from st where body ~* 'newton' and body ~* 'kepler';
the cost is 1100.41
BUT for an query using the FTI indices:
select s.* from st s, st_fti f1, st_fti f2 where f1.string
~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
and s.oid = f2.id;
the cost becomes a staggering 80628.92!!! The plans are pasted at the end
of this message.
Now, I have all the indices created (on id of st_fti, on string of st_fti
and on oid of st). I cannot figure out why this is so much worse than the
straight query. Indeed, the cost to look up a single string in the st_fti
table is way high:
select * from st_fti where string ~ '^kepler';
costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
exists.
What am I doing wrong? Is it the sheer size of the st_fti table that is
causing problems? Any help would be greatly appreciated.
Thanks,
Import Notes
Reference msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.com | Resolved by subject fallback
Meant to send this to the whole list...
I had vaccum'ed it, but not 'vacuum analyze' -ed it. That did the trick.
Cost down to 12.09 from 80628.92.
Thank you!
From: "Mitch Vincent" <mvincent@cablespeed.com>
To: "Paul C." <ulive1x@hotmail.com>, <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] FTI is really really slow; what am I doing wrong?
Date: Wed, 22 Aug 2001 11:28:04 -0400You've vacuum analyze 'd the database, haven't you?
-Mitch
There is exactly one sentence (row) that has the strings 'Newton' and
'Kepler' in it. That is my target. For a straight select on ST:
select * from st where body ~* 'newton' and body ~* 'kepler';
the cost is 1100.41
BUT for an query using the FTI indices:
select s.* from st s, st_fti f1, st_fti f2 where f1.string
~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
and s.oid = f2.id;
the cost becomes a staggering 80628.92!!! The plans are pasted at theend
of this message.
Now, I have all the indices created (on id of st_fti, on string ofst_fti
and on oid of st). I cannot figure out why this is so much worse than
the
straight query. Indeed, the cost to look up a single string in the
st_fti
table is way high:
select * from st_fti where string ~ '^kepler';
costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
exists.
What am I doing wrong? Is it the sheer size of the st_fti table that is
causing problems? Any help would be greatly appreciated.
Thanks,
_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp
Import Notes
Resolved by subject fallback
I think the problem is that the indexes on the words are not usable for
regular expression matching. If the words are folded to lower case
when the index is built, then using an exact match (or even like) should
be much faster.
Import Notes
Reply to msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.comReference msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.com | Resolved by subject fallback
"Paul C." <ulive1x@hotmail.com> writes:
Indeed, the cost to look up a single string in the st_fti
table is way high:
select * from st_fti where string ~ '^kepler';
costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
exists.
Have you done a VACUUM ANALYZE on st_fti? Are you running the database
in plain C locale?
regards, tom lane
Import Notes
Reply to msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.comReference msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.com | Resolved by subject fallback