Poor Delete performance

Started by Bill Huff · about 25 years ago · 9 messages · general
#1Bill Huff
bhuff@colltech.com

I am having some performance issues with deletion.

It appears that in deleting records from a table with a significant
number of rows (in this case 1.3 million) it takes about 1 hour per
100K rows deleted if deleting more than 400K at a time. This sounds
way too slow to me. If I break it up into smaller deletes then I can
usually get about 5K rows per minute, but even that seems awfully slow
to me.
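The "smaller deletes" approach is usually written as a loop that removes a bounded batch per transaction. A minimal sketch in Python using the bundled sqlite3 module as a stand-in for PostgreSQL (the `log` table and its columns are invented for illustration; the real schema was never posted):

```python
import sqlite3

# Throwaway table standing in for the real one ("log"/"payload" are
# made-up names; the actual schema is not shown in the thread).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO log (payload) VALUES (?)",
                 [("row",)] * 10000)
conn.commit()

# Delete in chunks instead of one giant statement, committing after
# each chunk so no single transaction grows too large.
CHUNK = 1000
while True:
    cur = conn.execute(
        "DELETE FROM log WHERE id IN "
        "(SELECT id FROM log WHERE payload = ? LIMIT ?)",
        ("row", CHUNK))
    conn.commit()
    if cur.rowcount == 0:       # nothing left to delete
        break

remaining = conn.execute("SELECT count(*) FROM log").fetchone()[0]
```

Each pass touches at most CHUNK rows, which keeps per-transaction bookkeeping bounded at the cost of re-running the qualifying SELECT once per batch.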

The machine is a pretty beefy box (PIII 800 MHz, 256 MB RAM, SCSI III)
and is basically doing nothing else. When performing the delete the
system goes to 100% utilization (>95% is postmaster) and stays that
way until the delete finishes. My guess is that the overhead of MVCC
tracking all of the rows deleted, so that a rollback can be performed,
is killing me here, but I can see no way to get around it.

I have searched the on-line archives for all the postgres lists and
checked the FAQ and I can't find anything that gives any insight into
increasing delete performance.

Any information whatsoever would be greatly appreciated.

--
Bill

--
_____
/ ___/___ | Bill Huff / bhuff@colltech.com
/ /__ __/ | Voice: (512) 263-0770 x 262
/ /__/ / | Fax: (512) 263-8921
\___/ /ollective | Pager: 1-800-946-4646 # 1406217
\/echnologies |------[ http://www.colltech.com ] ------

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bill Huff (#1)
Re: Poor Delete performance

Bill Huff <bhuff@colltech.com> writes:

> It appears that in deleting records from a table with a significant
> number of rows (in this case 1.3 million) it takes about 1 hour per
> 100K rows deleted if deleting more than 400K at a time. This sounds
> way too slow to me.

Me too. What PG version is this? What's the query exactly, and what
does EXPLAIN show for it? What is the full table definition ("pg_dump
-s -t tablename dbname" is the most reliable way to present the table)?

> I have searched the on-line archives for all the postgres lists and
> checked the FAQ and I can't find anything that gives any insight into
> increasing delete performance.

DELETE (and UPDATE) are pretty much like SELECT as far as performance
considerations go. Selecting the tuples to delete/update is the primary
performance issue. So the docs about performance generally just talk
about SELECT to keep things simple.
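Tom's point, that tuning a DELETE is really tuning the SELECT that finds its victim rows, can be checked with EXPLAIN on the SELECT twin of the query. A small sketch using Python's sqlite3 in place of PostgreSQL (table, index, and predicate are invented; PostgreSQL's EXPLAIN output looks different, but the technique carries over):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.execute("CREATE INDEX events_kind ON events (kind)")
conn.executemany("INSERT INTO events (kind) VALUES (?)",
                 [("old",), ("new",)] * 100)

# Explain the SELECT form of the DELETE's WHERE clause: the same
# predicate drives the row lookup in both statements.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM events WHERE kind = 'old'"
).fetchall()

# With the index in place the plan is an indexed SEARCH rather than a
# full-table SCAN; the detail text is the fourth column of each row.
uses_index = any("SEARCH" in row[3] for row in plan)
```

If the plan shows a full scan instead, adding an index on the WHERE column speeds up the DELETE exactly as it would the SELECT.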

regards, tom lane

#3Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Bill Huff (#1)
Re: Poor Delete performance

As an outside chance, does the table in question have any constraints
defined on it?

On Mon, 12 Mar 2001, Bill Huff wrote:


> I am having some performance issues with deletion.
>
> It appears that in deleting records from a table with a significant
> number of rows (in this case 1.3 million) it takes about 1 hour per
> 100K rows deleted if deleting more than 400K at a time. This sounds
> way too slow to me. If I break it up into smaller deletes then I can
> usually get about 5K rows per minute, but even that seems awfully slow
> to me.
>
> The machine is a pretty beefy box (PIII 800 MHz, 256 MB RAM, SCSI III)
> and is basically doing nothing else. When performing the delete the
> system goes to 100% utilization (>95% is postmaster) and stays that
> way until the delete finishes. My guess is that the overhead of MVCC
> tracking all of the rows deleted, so that a rollback can be performed,
> is killing me here, but I can see no way to get around it.
>
> I have searched the on-line archives for all the postgres lists and
> checked the FAQ and I can't find anything that gives any insight into
> increasing delete performance.
>
> Any information whatsoever would be greatly appreciated.

#4Bill Huff
bhuff@colltech.com
In reply to: Stephan Szabo (#3)
Re: Poor Delete performance

There is a foreign key constraint, but that is accomplished by an
INSERT/UPDATE trigger. I don't see why that should cause any problems.

--
Bill

On Mon, Mar 12, 2001 at 08:27:51AM -0800, Stephan Szabo wrote:

> As an outside chance, does the table in question have any constraints
> defined on it?
>
> On Mon, 12 Mar 2001, Bill Huff wrote:
>
> > I am having some performance issues with deletion.
> >
> > It appears that in deleting records from a table with a significant
> > number of rows (in this case 1.3 million) it takes about 1 hour per
> > 100K rows deleted if deleting more than 400K at a time. This sounds
> > way too slow to me. If I break it up into smaller deletes then I can
> > usually get about 5K rows per minute, but even that seems awfully slow
> > to me.
> >
> > The machine is a pretty beefy box (PIII 800 MHz, 256 MB RAM, SCSI III)
> > and is basically doing nothing else. When performing the delete the
> > system goes to 100% utilization (>95% is postmaster) and stays that
> > way until the delete finishes. My guess is that the overhead of MVCC
> > tracking all of the rows deleted, so that a rollback can be performed,
> > is killing me here, but I can see no way to get around it.
> >
> > I have searched the on-line archives for all the postgres lists and
> > checked the FAQ and I can't find anything that gives any insight into
> > increasing delete performance.
> >
> > Any information whatsoever would be greatly appreciated.


#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bill Huff (#4)
Re: Poor Delete performance

Bill Huff <bhuff@colltech.com> writes:

There is a foreign key constraint, but that is accomplished by an
INSERT/UPDATE trigger. I don't see why that should cause any problems.

Ah, I believe I see the problem: it's the inefficient implementation of
AFTER EVENT triggers. The existence of either an AFTER UPDATE or AFTER
DELETE trigger causes trigger.c to add an event to its event queue for
every operation on the affected table. And every addition of an event
searches that queue --- with a linear scan. Thus operations on such a
table incur O(N^2) search cost if N tuples are affected in one
operation.

This needs to be improved (if we can't get rid of the lookup completely,
maybe use a hash table instead of sequential scan?) but it's much too
late to consider fixing it for 7.1 :-(.

Actually, though, it might be even stupider than that: it looks like
the queue should only be searched if the tuple being deleted was
inserted/modified earlier in the same transaction. Assuming that that
doesn't apply to Bill's case, the only thing I can see that could be
causing O(N^2) behavior is the lappend() in deferredTriggerAddEvent.
That's simple enough that we *could* fix it for 7.1 ...
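The quadratic behavior Tom describes is easy to reproduce in miniature: if every enqueue first walks the whole list (as a naive linked-list lappend must), N insertions cost O(N^2) total steps, while keeping a tail pointer makes each enqueue O(1). A Python sketch of the two strategies (an illustration of the complexity argument, not the actual trigger.c code):

```python
def enqueue_with_scan(queue, event):
    """lappend-style insert: walk past every existing node, then append.
    Returns the number of steps so total work can be compared."""
    steps = len(queue)          # simulates traversing the whole list
    queue.append(event)
    return steps

def enqueue_with_tail(queue, event):
    """Tail-pointer insert: constant work per event."""
    queue.append(event)
    return 1

def total_work(enqueue, n):
    queue, work = [], 0
    for i in range(n):
        work += enqueue(queue, i)
    return work

scan_work = total_work(enqueue_with_scan, 1000)   # n*(n-1)/2 = 499500 steps
tail_work = total_work(enqueue_with_tail, 1000)   # n = 1000 steps
```

For 1000 events the scanning version does roughly 500x the work of the tail-pointer version, and the gap grows linearly with N, which matches the hour-per-100K-rows behavior reported above.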

regards, tom lane

#6Matthew
matt@ctlno.com
In reply to: Tom Lane (#5)
RE: Poor Delete performance

> [snip]
>
> This needs to be improved (if we can't get rid of the lookup completely,
> maybe use a hash table instead of sequential scan?) but it's much too
> late to consider fixing it for 7.1 :-(.
>
> Actually, though, it might be even stupider than that: it looks like
> the queue should only be searched if the tuple being deleted was
> inserted/modified earlier in the same transaction. Assuming that that
> doesn't apply to Bill's case, the only thing I can see that could be
> causing O(N^2) behavior is the lappend() in deferredTriggerAddEvent.
> That's simple enough that we *could* fix it for 7.1 ...

This would be a welcome improvement. I have for a long time noticed
very slow delete performance when deleting a large number of records in one
command. I can give more detail if so desired.

#7Bill Huff
bhuff@colltech.com
In reply to: Matthew (#6)
Re: Poor Delete performance

Just for everyone's information: I removed the foreign key constraint
that was on the table and was able to delete about 190K records in
just over 5 seconds. That is much more like it. B^)

This does appear to be an interesting *feature* in the way that
constraints are handled.
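The speedup Bill sees lines up with the per-row cost of an AFTER DELETE trigger: with the constraint present, every deleted row generates an extra trigger event. A minimal illustration of the per-row firing using Python's sqlite3 (sqlite is a stand-in here; table and trigger names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE audit (n INTEGER);
    -- Fires once per deleted row, like the per-row event queueing
    -- described earlier in the thread.
    CREATE TRIGGER parent_del AFTER DELETE ON parent
    BEGIN
        INSERT INTO audit VALUES (old.id);
    END;
""")
conn.executemany("INSERT INTO parent VALUES (?)",
                 [(i,) for i in range(500)])

conn.execute("DELETE FROM parent")   # one trigger firing per row
fired = conn.execute("SELECT count(*) FROM audit").fetchone()[0]
```

Dropping the trigger (or, in Bill's case, the foreign key constraint that creates one) removes that per-row work entirely, which is why the bulk delete fell from hours to seconds.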

--
Bill

On Mon, Mar 12, 2001 at 12:23:41PM -0600, Matthew wrote:

> [snip]
>
> > This needs to be improved (if we can't get rid of the lookup completely,
> > maybe use a hash table instead of sequential scan?) but it's much too
> > late to consider fixing it for 7.1 :-(.
> >
> > Actually, though, it might be even stupider than that: it looks like
> > the queue should only be searched if the tuple being deleted was
> > inserted/modified earlier in the same transaction. Assuming that that
> > doesn't apply to Bill's case, the only thing I can see that could be
> > causing O(N^2) behavior is the lappend() in deferredTriggerAddEvent.
> > That's simple enough that we *could* fix it for 7.1 ...
>
> This would be a welcome improvement. I have for a long time noticed
> very slow delete performance when deleting a large number of records in one
> command. I can give more detail if so desired.


#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew (#6)
Re: Poor Delete performance

Matthew <matt@ctlno.com> writes:

> This would be a welcome improvement. I have for a long time noticed
> very slow delete performance when deleting a large number of records in one
> command. I can give more detail if so desired.

Is this in a table that contains foreign key references? (Or, more
generally, has any AFTER UPDATE or AFTER DELETE triggers?)

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)
Re: Poor Delete performance

I've applied a patch for this problem to current sources. The
improvement was more than I expected --- a test case involving
deleting 80000 tuples from a foreign-key-referencing table dropped
from ~15min to ~8sec. Insertion of a large number of tuples in one
transaction sped up quite a bit too.

If you are in a hurry for a fix, you could probably adapt the diff
to 7.0.3 sources:
http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/commands/trigger.c.diff?r1=1.86&r2=1.87&f=c

regards, tom lane