Performance Question Followup No.2

Started by Gordan Bobicover 24 years ago6 messagesgeneral
Jump to latest
#1Gordan Bobic
gordan@bobich.net

After just having split the action into two parts (FTI delete + Master
delete), it would appear that most of the delay does come from the triggers
executing.

delete from masterfti where masterfti.id = master.id and master.entrytime <
'31-Oct-2001';
DELETE 172609

Came back in about 10 seconds.

delete from master where entrytime < '31-Oct-2001';

is still going after 10 minutes even though there are no records in the FTI
table that are being deleted.

Is there an explanation for this? Anything that I should check? Any
performance tweaks that would improve this situation?

Sorry to go on about this, but I'm totally puzzled by this.

Regards.

Gordan

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gordan Bobic (#1)
Re: Performance Question Followup No.2

Gordan Bobic <gordan@bobich.net> writes:

After just having split the action into two parts (FTI delete + Master
delete), it would appear that most of the delay does come from the triggers
executing.

I imagine that the problem is that the triggers have to delete the FTI
records retail --- one master record's worth at a time. That's
inherently far less efficient than getting rid of all of them in a
single query, as your comparison case is doing. I see no easy way
to get around that in the context of the existing FTI design.

There is a new "tsearch" contrib module in 7.2 that might be worth your
time to look at instead. I'm not sure whether it's any better on this
measure, but at least it's a fresh implementation...

regards, tom lane

#3Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Gordan Bobic (#1)
Re: Performance Question Followup No.2

On Wed, 7 Nov 2001, Gordan Bobic wrote:

After just having split the action into two parts (FTI delete + Master
delete), it would appear that most of the delay does come from the triggers
executing.

delete from masterfti where masterfti.id = master.id and master.entrytime <
'31-Oct-2001';
DELETE 172609

Came back in about 10 seconds.

delete from master where entrytime < '31-Oct-2001';

is still going after 10 minutes even though there are no records in the FTI
table that are being deleted.

Is there an explanation for this? Anything that I should check? Any
performance tweaks that would improve this situation?

Sorry to go on about this, but I'm totally puzzled by this.

Have you tried restarting your connection and doing the set for the
sequence scan and then doing the first delete? The foreign key
stuff caches a query plan the first time IIRC, so if you do the
set after it's run once, it isn't going to change its plan.

#4Gordan Bobic
gordan@bobich.net
In reply to: Tom Lane (#2)
Re: Performance Question Followup No.2

On Wednesday 07 Nov 2001 17:25, Tom Lane wrote:

Gordan Bobic <gordan@bobich.net> writes:

Thanks for the reply.

After just having split the action into two parts (FTI delete + Master
delete), it would appear that most of the delay does come from the
triggers executing.

I imagine that the problem is that the triggers have to delete the FTI
records retail --- one master record's worth at a time. That's
inherently far less efficient than getting rid of all of them in a
single query, as your comparison case is doing. I see no easy way
to get around that in the context of the existing FTI design.

Would that really explain such a HUGE difference in performance? Even without
any corresponding FTI records (if they are deleted first - I tried it)?

I am not talking about a few percent, or even factor 2 difference. I am
talking about a difference between 10 seconds to completion and aborting
after 45 minutes - on a 1 GHz machine.

There is a new "tsearch" contrib module in 7.2 that might be worth your
time to look at instead. I'm not sure whether it's any better on this
measure, but at least it's a fresh implementation...

I didn't use the FTI module implementation because again, it uses triggers -
this, yet again proved to be too slow. The query performance wasn't improved,
though, even with properly set up indices. In order to get it to be of
benefit I
1) Implemented it "in software" in the application layer.
2) Made it not insert duplicates.
3) Made it not do word-stemming/subwords.
4) Made the stop-word table separate (for ease of use - application reads
this).
5) Inserted in excess of 200 stop words (finding them all wasy hard work, and
it is a rather application specific thing to do) to get the Master/FTI ratio
to under 35 unique words/record.

Now the performance is slightly improved, although with enough memory and a
fast processor, the difference isn't all that great when compared to an ILIKE
search on the text fields. It's a few times faster, but I guess I was
expecting more...

Regards.

Gordan

#5Gordan Bobic
gordan@bobich.net
In reply to: Stephan Szabo (#3)
Re: Performance Question Followup No.2

On Wednesday 07 Nov 2001 17:30, Stephan Szabo wrote:

On Wed, 7 Nov 2001, Gordan Bobic wrote:

After just having split the action into two parts (FTI delete + Master
delete), it would appear that most of the delay does come from the
triggers executing.

delete from masterfti where masterfti.id = master.id and master.entrytime
< '31-Oct-2001';
DELETE 172609

Came back in about 10 seconds.

delete from master where entrytime < '31-Oct-2001';

is still going after 10 minutes even though there are no records in the
FTI table that are being deleted.

Is there an explanation for this? Anything that I should check? Any
performance tweaks that would improve this situation?

Sorry to go on about this, but I'm totally puzzled by this.

Have you tried restarting your connection and doing the set for the
sequence scan and then doing the first delete? The foreign key
stuff caches a query plan the first time IIRC, so if you do the
set after it's run once, it isn't going to change its plan.

Yes, I did, but with triggers enabled, doing bulk cascaded deletes just takes
too long. I have now removed all the triggers and am handling things at the
application level. It is by no means as neat, tidy and elegant as triggers,
but it is worth it for a several hundred fold performance improvement.

Thanks for telling me about the foreign key caching, though. I can see how
that could cause interesting performance tuning situations. :-)

Gordan

#6Antoine Reid
antoiner@hansonpublications.com
In reply to: Gordan Bobic (#1)
Re: Performance Question Followup No.2

[comments inline...]
On Wed, Nov 07, 2001 at 01:16:57PM +0000, Gordan Bobic wrote:

After just having split the action into two parts (FTI delete + Master
delete), it would appear that most of the delay does come from the triggers
executing.

If I understand correctly, you experience delays when doing UPDATEs or
DELETEs on the master table? (Due to triggers updating the fti table)

I am still using 7.1.3 but am using contrib/fulltextindex from CVS as of a
couple days ago. One thing worth noting is a change in the documentation that
comes with it. There are also some functionality changes.

In 7.1.3 (release) they suggest ONE index on ("string", "id") while in CVS
they suggest TWO separate indexes. After doing this change on my database,
I do see a big difference in the time it takes to make an UPDATE.
My understanding is that it is now able to use the 2nd index (on "id") to
remove the old entries in the fti table, while it previously couldn't use
an index. I may be wrong, but I do see a good speedup.

OTOH, I'm also using the newest fti.c (which supports more than 2 arguments,
to index multiple fields into the same fti table), with the old Makefile,
so I'm not sure what makes the biggest difference.

My biggest consumer of fti is a master table with 10963140 records, with a
corresponding fti table of ~350000000 records. (yes, 350 M records) :)

My next step is probably to add an extra argument to fti() before the
list of fields to include, to control whether substrings are included or not.
In my application, complete words would be sufficient..

Has anybody got a patch to fti.c to disable that? My C pointers logic is
rusty...

Is there an explanation for this? Anything that I should check? Any
performance tweaks that would improve this situation?

In my case, doing UPDATEs one record at a time on the master table, in
separate transactions, seems to work very well now that I have separate
indexes on the fti table. Perhaps you could try it in a test database,
do some measurements (one index spanning 2 fields vs. 2 separate indexes)
and post your results here?

I'm really looking forward to contrib/tsearch in 7.2. However ISTR the docs
clearly saying it will only work with >= 7.2, but I think it can solve many
of our problems.. I haven't tried beta2 yet, but will setup a test machine
probably next week. I could probably get to a point where I can do some
measurements. Anybody interested in seeing my numbers?

Sorry to go on about this, but I'm totally puzzled by this.

Regards.

Gordan

HTH
Antoine