on delete cascade slowing down delete
I have a large table with a PK and several smaller tables with FKs
referencing it.
Deleting from the first table is very slow.
Not all the referencing FK columns have indexes, but those tables are
VERY small (0 to 100 records) while the main table contains 600-800K
records.
The
explain delete p;
doesn't give any clue.
Any hint to track down the problem?
thanks
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Thursday 21 August 2008, Ivan Sergio Borgonovo <mail@webthatworks.it>
wrote:
I've a large table with a pk and several smaller tables with fk
referencing to it.
Deleting from the first table is very slow.
Not all the related fk have indexes but they are VERY small (0 to
100 records) while the main table contain 600-800K records.
The
explain delete p;
doesn't give any clue.
Any hint to track down the problem?
1) Define slow.
2) Tell us what your disk subsystem consists of.
3) Are there any delete triggers on the tables?
--
Alan
On Thu, 21 Aug 2008 10:16:21 -0700
Alan Hodgson <ahodgson@simkin.ca> wrote:
Any hint to track down the problem?
1) Define slow.
2) Tell us what your disk subsystem consists of.
3) Are there any delete triggers on the tables?
And the EXPLAIN ANALYZE
Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Thu, Aug 21, 2008 at 07:06:32PM +0200, Ivan Sergio Borgonovo wrote:
I've a large table with a pk and several smaller tables with fk
referencing to it.
Deleting from the first table is very slow.
Not all the related fk have indexes but they are VERY small (0 to
100 records) while the main table contain 600-800K records.
The
explain delete p;
doesn't give any clue.
Any hint to track down the problem?
BEGIN;
EXPLAIN ANALYZE DELETE ...
ROLLBACK;
Lack of indexes on the referencing tables might be an issue, as might
any triggers.
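To make that concrete, a minimal sketch (the `p`/`s` schema and the `pid` value are illustrative, borrowed from the example table definitions given later in the thread):

```sql
-- Time the delete without committing it; ROLLBACK undoes the work.
BEGIN;
EXPLAIN ANALYZE DELETE FROM p WHERE pid = 42;
ROLLBACK;

-- If the plan output shows most of the time spent in the FK constraint
-- triggers, an index on the referencing column usually fixes cascaded deletes:
CREATE INDEX s_pid_idx ON s (pid);
```

EXPLAIN ANALYZE reports per-trigger times, so a slow `RI_ConstraintTrigger` line points straight at the unindexed referencing table.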
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Thu, 21 Aug 2008 10:19:44 -0700
Joshua Drake <jd@commandprompt.com> wrote:
On Thu, 21 Aug 2008 10:16:21 -0700
Alan Hodgson <ahodgson@simkin.ca> wrote:
Any hint to track down the problem?
1) Define slow.
Still undefined anyway; roughly 20-30 min before, compared to over 2h
now and still running.
2) Tell us what your disk subsystem consists of.
RAID5 on SCSI, 4Gb RAM, 2xXeon (single core, HT, 3.2GHz)
3) Are there any delete triggers on the tables?
Yes, as in the subject... several, but on very small tables (no more
than a few hundred records, mostly in the order of a few tens).
BTW work_mem is 32MB
And the EXPLAIN ANALYZE
Isn't EXPLAIN ANALYZE going to take forever, since it actually runs
the statements?
btw if I have
create table p(
pid int primary key
);
create table s(
sid int primary key,
pid int references p (pid) on delete cascade
);
Is it going to make things faster if I:
delete from s;
reindex table s;
delete from p;
or, once everything else is in place (e.g. correct indexes that will
be useful for normal operation anyway, a postgresql.conf suited for my
workload and box, something I still have to discover, etc...) and
doesn't require bookkeeping on my part, will it be no different from a
simpler
delete from p;
?
thanks
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Hi all,
I would like to ask a question about the serial datatype. Say I have a
field of type serial, and say for the sake of example that the range of
a serial goes from 1 to 5 (inclusive). I insert 10 entries into the
table, so the table is 'full':
INSERT INTO my_table (my_serial) VALUES (DEFAULT);
INSERT INTO my_table (my_serial) VALUES (DEFAULT);
INSERT INTO my_table (my_serial) VALUES (DEFAULT);
INSERT INTO my_table (my_serial) VALUES (DEFAULT);
Next I delete a random entry, say the one with value 3:
DELETE FROM my_table WHERE my_serial = 3;
Is it possible to insert a new entry? Will the serial sequence somehow
be able to find the gap (3)?
The reason why I am asking is because I have a table in which entries
are constantly being deleted and inserted. What happens if the serial
sequence is exhausted? If it is not able to go to the next gap, how is
it possible to keep inserting and deleting entries once the serial
sequence has been exhausted? I can't find this anywhere in the docs.
To me, it is perfectly possible that there is only one entry in the
table, with a serial value equal to its upper limit.
Thanks in advance. Kind regards,
Peter
Oops, my example was a bit incorrectly edited.
I wanted to say that the range of a serial datatype goes from 1 to 5
(inclusive) and I insert five entries (not 10).
Peter
On Aug 21, 2008, at 2:23 PM, Peter Billen wrote:
Oops, my example was a bit incorrectly edited.
I wanted to say that the range of a serial datatype goes from 1 to 5
(inclusive) and I insert five entries (not 10).
Peter
Maybe you want to use BIGSERIAL if you are worried?
Ries
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
My concern is not that the table will become full, but that the sequence
will be exhausted. Doing INSERT, DELETE, INSERT, DELETE ... will exhaust
the sequence. What will happen then? Do I have to manually re-order my
serial values and reset the sequence to start at MAX() + 1?
Thanks in advance,
Peter
On Thu, Aug 21, 2008 at 1:08 PM, Peter Billen <peter@clueless.be> wrote:
Hi all,
I would like to ask a question about the serial datatype. Say I have a field
of type serial, and say for the sake of example that the range of a serial
goes from 1 to 5 (inclusive). I insert 5 (ed) entries into the table, so the
table is 'full':
INSERT INTO my_table (my_serial) VALUES (DEFAULT);
INSERT INTO my_table (my_serial) VALUES (DEFAULT);
INSERT INTO my_table (my_serial) VALUES (DEFAULT);
INSERT INTO my_table (my_serial) VALUES (DEFAULT);
Next I delete a random entry, say the one with value 3:
DELETE FROM my_table WHERE my_serial = 3;
Is it possible to insert a new entry? Will the serial sequence somehow be
able to find the gap (3)?
No, sequences do not fill in gaps.
The reason why I am asking is because I have a table in which constantly
entries are being deleted and inserted. What happens if the serial sequence
is exhausted? If it is not able to go to the next gap, how is it possible
to keep inserting and deleting entries once the serial sequence has been
exhausted? I can't find this anywhere in docs.
Regular SERIAL type is limited to a 32 bit int. BIGSERIAL uses a 64
bit int. That will give you an upper limit of 2^63, assuming positive
values only in the sequence. If you run out of that many values
you're running a seriously busy database over a very long time.
My rough guesstimate is that at 2000 inserts per second, it would take
approximately 145,865,043 years to exhaust a BIGSERIAL. I might be
off by a factor of ten or so there. But I don't think I am. Note
that an insert rate of 2000 per second would exhaust a regular SERIAL
type (2^31 size) in 12 days.
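That guesstimate is easy to sanity-check directly in psql (a sketch; `^` is PostgreSQL's power operator and the figures are approximate):

```sql
-- Days to exhaust a 32-bit SERIAL (2^31 positive values) at 2000 inserts/s:
SELECT 2^31 / 2000 / 86400 AS serial_days;                 -- roughly 12.4

-- Years to exhaust a 64-bit BIGSERIAL (2^63) at the same rate:
SELECT 2^63 / 2000 / (86400 * 365.25) AS bigserial_years;  -- roughly 146 million
```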
To me, it is perfectly possible that there is only one entry in the table,
with a serial value equal to its upper limit.
That's fine too. If you need gapless sequences, be prepared to pay
more in terms of overhead costs. If you don't need gapless sequences
(and usually you don't) then use either SERIAL or BIGSERIAL.
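For illustration, one common way to get gapless numbers, at the overhead cost mentioned above (every insert serializes on a single row lock; the table and column names here are hypothetical):

```sql
-- A single-row counter table; concurrent callers queue up behind the row lock.
CREATE TABLE gapless_counter (id int PRIMARY KEY, last_value bigint NOT NULL);
INSERT INTO gapless_counter VALUES (1, 0);

-- Inside the inserting transaction: take the next gapless value.
-- If the transaction rolls back, the increment rolls back too, so no gap.
UPDATE gapless_counter
   SET last_value = last_value + 1
 WHERE id = 1
RETURNING last_value;
```

The row lock is exactly why sequences don't work this way: nextval() never blocks or rolls back, which is what makes it fast and gap-prone.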
On Thu, Aug 21, 2008 at 1:51 PM, Peter Billen <peter@clueless.be> wrote:
My concern is not that the table will become full, but that the sequence
will be exhausted. Doing INSERT, DELETE, INSERT, DELETE ... will exhaust the
sequence. What will happen then? Do I have to manually re-order my serial
values and reset the start sequence ID to MAX() + 1?
See my other reply. short answer: use bigserial
Thanks. I thought it was a bit counter-intuitive to have a BIGSERIAL
while I will only ever have a few thousand entries, which are constantly
updated (by DELETE and INSERT).
Thanks Scott,
Peter
Is it going to make things faster if I:
delete from s;
reindex table s;
Why do you think this step would help you any? There's no index on p
to begin with. You'd just be reindexing the auto-generated unique
index on s (due to it being a PK).
delete from p;
And no, this would most likely be slower.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
On Fri, 22 Aug 2008 08:48:30 +0200
Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
Is it going to make things faster if I:
delete from s;
reindex table s;
Why do you think this step would help you any? There's no index on
p to begin with. You'd just be reindexing the auto-generated
unique index on s (due to it being a PK).
Sorry I forgot to add the index in the example.
What if there were an index on s.pid too?
But mostly... if I delete s will the deletion of p be faster?
delete from p;
And no, this would most likely be slower.
Why?
Stopping a
delete from p;
I can see that PostgreSQL is actually also executing a
delete from s where pid=$1;
If s is already empty, and there are no other cascading deletes on s,
the lookup should be faster. I was wondering whether it makes a
difference in terms of performance if
a) I've an index on pid on both tables
or
b) s is already empty
and... should I reindex s after the "delete from s" if I want some
speed-up on the delete from p?
Anyway this looks more and more a dead end once things get more and
more complicated since it requires too much bookkeeping.
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Thu, Aug 21, 2008 at 09:08:24PM +0200, Peter Billen wrote:
Is it possible to insert a new entry? Will the serial sequence somehow
be able to find the gap (3)?
As others have said, no it's not going to. Sequences will only return
values out of order when explicitly told to. The main reason is to
help prevent ambiguities in the data; if it could automatically reset
it would be much more difficult to determine if 7 was "older" than 3
or, worse, which 7 is the correct one. In a well designed system this
shouldn't ever occur, but if something does go horribly wrong it's much
easier to put the pieces back together this way.
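A quick sketch of that behaviour (hypothetical table; note the freed value is never handed out again):

```sql
CREATE TABLE my_table (my_serial serial PRIMARY KEY);
INSERT INTO my_table VALUES (DEFAULT), (DEFAULT), (DEFAULT);  -- 1, 2, 3
DELETE FROM my_table WHERE my_serial = 2;
INSERT INTO my_table VALUES (DEFAULT);                        -- gets 4, not 2
SELECT my_serial FROM my_table ORDER BY my_serial;            -- 1, 3, 4
```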
Sam
On Aug 22, 2008, at 9:45 AM, Ivan Sergio Borgonovo wrote:
On Fri, 22 Aug 2008 08:48:30 +0200
Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
Is it going to make things faster if I:
delete from s;
reindex table s;
Why do you think this step would help you any? There's no index on
p to begin with. You'd just be reindexing the auto-generated
unique index on s (due to it being a PK).
Sorry I forgot to add the index in the example.
What if there was an index in s.pid too?
But mostly... if I delete s will the deletion of p be faster?
Hard to tell without the results from explain analyse. It depends on
what the planner decides to do, but it's often faster than the things
we come up with to work around the planner. As a rule of thumb, if
you're trying to work around the planner it is likely your problem is
caused by something else.
Without an explain plan everything is just speculation really, the
planner is quite smart and it knows your data. It tends to outsmart
the devs.
delete from p;
And no, this would most likely be slower.
Why?
Because of the extra reindex step. If you'd replace that with an
analyse of p, then it may be faster. Or it may not.
You seem to misinterpret the use case for REINDEX. Read here:
http://www.postgresql.org/docs/8.3/interactive/sql-reindex.html
Especially note the usage scenarios ;)
Stopping a
delete from p;
I can see that actually postgresql is also executing a
delete from s where pid=$1;
if s is already empty, and there are no other cascading delete on s,
the lookup should be faster. I was wondering if that doesn't make a
difference in terms of performance if
a) I've an index on pid on both tables
or
b) s is already empty
and... should I reindex s if I "delete from s" first if I want some
speed up on delete from p;
Anyway this looks more and more a dead end once things get more and
more complicated since it requires too much bookkeeping.
Maybe you shouldn't try to speculate on solutions before you
ascertained what the problem is? People asked for an EXPLAIN ANALYSE,
we can't really help you without that.
With it we can probably exactly tell you what's causing your problem,
if we still need to once you see the results.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
On Aug 21, 2008, at 9:51 PM, Peter Billen wrote:
My concern is not that the table will become full, but that the
sequence will be exhausted. Doing INSERT, DELETE, INSERT,
DELETE ... will exhaust the sequence. What will happen then? Do I
have to manually re-order my serial values and reset the start
sequence ID to MAX() + 1?
DELETEs don't use your sequence so will not exhaust it. In practice
only INSERTs do. I saw you mention sequences in combination with
DELETEs a few times, just making sure you're not confused ;)
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
On Sun, 24 Aug 2008 13:37:11 +0200
Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
You seem to misinterpret the use case for REINDEX. Read here:
http://www.postgresql.org/docs/8.3/interactive/sql-reindex.html
Especially note the usage scenarios ;)
http://www.postgresql.org/docs/8.1/interactive/routine-reindex.html
So on versions later than 7.4... what's going to happen if I delete a
whole table?
It looks like it is not an issue, and at least reindexing can be
avoided.
Maybe you shouldn't try to speculate on solutions before you
ascertained what the problem is? People asked for an EXPLAIN
ANALYSE, we can't really help you without that.
To my understanding, EXPLAIN ANALYSE does actually run the
query... but it was so damn slow that it wouldn't return a result in
useful time.
I re-engineered the tables and now the stuff works at a reasonable
speed.
Does the planner optimise multiple statements in a transaction or
just a statement at a time?
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Thu, 2008-08-21 at 13:53 -0600, Scott Marlowe wrote:
Regular SERIAL type is limited to a 32 bit int. BIGSERIAL uses a 64
bit int.
I think one of the things that would be offsetting is the size
difference between the two types (32 vs 64 bits; 5 foreign keys and a
billion rows or so makes for a lot of pain).
-Mark
On Mon, Aug 25, 2008 at 10:23 AM, Mark Roberts
<mailing_lists@pandapocket.com> wrote:
On Thu, 2008-08-21 at 13:53 -0600, Scott Marlowe wrote:
Regular SERIAL type is limited to a 32 bit int. BIGSERIAL uses a 64
bit int.
I think one of the things that would be offsetting is the size
difference between the two types (32 vs 64 bits, 5 foreign keys, and a
billion rows or so makes for a lot of pain).
Well, of course a 64 bit int is gonna be bigger than a 32 bit one, but
with alignment issues and on 64 bit hardware, I'm guessing the
difference isn't exactly twice as slow / twice as much storage. And
it's way faster than a GUID, which is what I think started this
thread.