Re: FTI is really really slow; what am I doing wrong?

Started by Nonamealmost 25 years ago5 messagesgeneral

Jump to latest

Noname

newsreader@mediaone.net

almost 25 years ago

Did you vacuum after
populating the tables?
If not you should do it

Show quoted text

On Wed, Aug 22, 2001 at 11:08:55AM -0400, Paul C. wrote:

Greetings,
I am trying to test out the performance of the contrib/fulltextindex
package and I am getting horrid performance results.
The Setup:
I created a simple table, ST (id SERIAL, body varchar(1024), which is to be
searched. I created the ST_FTI table, trigger and indices as per
instructions in the FTI readme and C file. To populate the table, I took a
flat text version of 'War and Peace' I found on the net, broke it up into
sentences and inserted each sentence into ST as a row. So I have about
38,000 sentences and my ST_FTI table is about 2 million rows.
The Test:
There is exactly one sentence (row) that has the strings 'Newton' and
'Kepler' in it. That is my target. For a straight select on ST:
select * from st where body ~* 'newton' and body ~* 'kepler';
the cost is 1100.41
BUT for an query using the FTI indices:
select s.* from st s, st_fti f1, st_fti f2 where f1.string
~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
and s.oid = f2.id;
the cost becomes a staggering 80628.92!!! The plans are pasted at the end
of this message.
Now, I have all the indices created (on id of st_fti, on string of st_fti
and on oid of st). I cannot figure out why this is so much worse than the
straight query. Indeed, the cost to look up a single string in the st_fti
table is way high:
select * from st_fti where string ~ '^kepler';
costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
exists.
What am I doing wrong? Is it the sheer size of the st_fti table that is
causing problems? Any help would be greatly appreciated.
Thanks,
Paul C.

FTI search
NOTICE: QUERY PLAN:
Merge Join (cost=80046.91..80628.92 rows=110 width=28)
-> Sort (cost=41827.54..41827.54 rows=19400 width=24)
-> Hash Join (cost=1992.80..40216.39 rows=19400 width=24)
-> Seq Scan on st_fti f2 (cost=0.00..36703.40 rows=19400
width=4)
-> Hash (cost=929.94..929.94 rows=34094 width=20)
-> Seq Scan on st s (cost=0.00..929.94 rows=34094
width=20)
-> Sort (cost=38219.37..38219.37 rows=19400 width=4)
-> Seq Scan on st_fti f1 (cost=0.00..36703.40 rows=19400 width=4)
EXPLAIN

Plain search:
NOTICE: QUERY PLAN:
Seq Scan on st (cost=0.00..1100.41 rows=1 width=16)
EXPLAIN

_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Import Notes

Reply to msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.comReference msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.com

Mitch Vincent

mvincent@cablespeed.com

almost 25 years ago

In reply to: Noname (#1)

You've vacuum analyze 'd the database, haven't you?

-Mitch

Show quoted text

There is exactly one sentence (row) that has the strings 'Newton' and
'Kepler' in it. That is my target. For a straight select on ST:
select * from st where body ~* 'newton' and body ~* 'kepler';
the cost is 1100.41
BUT for an query using the FTI indices:
select s.* from st s, st_fti f1, st_fti f2 where f1.string
~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
and s.oid = f2.id;
the cost becomes a staggering 80628.92!!! The plans are pasted at the end
of this message.
Now, I have all the indices created (on id of st_fti, on string of st_fti
and on oid of st). I cannot figure out why this is so much worse than the
straight query. Indeed, the cost to look up a single string in the st_fti
table is way high:
select * from st_fti where string ~ '^kepler';
costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
exists.
What am I doing wrong? Is it the sheer size of the st_fti table that is
causing problems? Any help would be greatly appreciated.
Thanks,

Import Notes

Reference msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.com | Resolved by subject fallback

Paul C.

ulive1x@hotmail.com

almost 25 years ago

In reply to: Mitch Vincent (#2)

Meant to send this to the whole list...

I had vaccum'ed it, but not 'vacuum analyze' -ed it. That did the trick.
Cost down to 12.09 from 80628.92.

Thank you!

From: "Mitch Vincent" <mvincent@cablespeed.com>
To: "Paul C." <ulive1x@hotmail.com>, <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] FTI is really really slow; what am I doing wrong?
Date: Wed, 22 Aug 2001 11:28:04 -0400

You've vacuum analyze 'd the database, haven't you?

-Mitch

There is exactly one sentence (row) that has the strings 'Newton' and
'Kepler' in it. That is my target. For a straight select on ST:
select * from st where body ~* 'newton' and body ~* 'kepler';
the cost is 1100.41
BUT for an query using the FTI indices:
select s.* from st s, st_fti f1, st_fti f2 where f1.string
~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
and s.oid = f2.id;
the cost becomes a staggering 80628.92!!! The plans are pasted at the

end

of this message.
Now, I have all the indices created (on id of st_fti, on string of

st_fti

and on oid of st). I cannot figure out why this is so much worse than

the

straight query. Indeed, the cost to look up a single string in the

st_fti

table is way high:
select * from st_fti where string ~ '^kepler';
costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
exists.
What am I doing wrong? Is it the sheer size of the st_fti table that is
causing problems? Any help would be greatly appreciated.
Thanks,

_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp

Import Notes

Resolved by subject fallback

Bruno Wolff III

bruno@wolff.to

almost 25 years ago

In reply to: Paul C. (#3)

I think the problem is that the indexes on the words are not usable for
regular expression matching. If the words are folded to lower case
when the index is built, then using an exact match (or even like) should
be much faster.

Import Notes

Reply to msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.comReference msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.com | Resolved by subject fallback

Tom Lane

tgl@sss.pgh.pa.us

almost 25 years ago

In reply to: Bruno Wolff III (#4)

"Paul C." <ulive1x@hotmail.com> writes:

Indeed, the cost to look up a single string in the st_fti
table is way high:
select * from st_fti where string ~ '^kepler';
costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
exists.

Have you done a VACUUM ANALYZE on st_fti? Are you running the database
in plain C locale?

regards, tom lane

Import Notes

Reply to msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.comReference msg id not found: F104VWDKpzxDPHprp5y00014a94@hotmail.com | Resolved by subject fallback