Deletes and large tables

Started by Edmund Dengler on 10 Jun 2005 · 7 messages · general
#1 Edmund Dengler
edmundd@eSentire.com

Greetings!

We have a table with more than 250 million rows. I am trying to delete the
first 100,000 rows (based on a bigint primary key), and I had to cancel
after 4 hours of the system not actually finishing the delete. I wrote a
script to delete rows 10,000 at a time using transactions, and am finding
each individual delete takes on the order of 0.1 seconds to 2-3 seconds.
There are 4 indexes on the table, one of which is very "hashlike" (i.e.,
sequential rows are distributed throughout the index).

Is this considered "normal" delete speed? Currently using 8.0.1. The
drives are capable of 40 MB/sec sustained transfers, but we are only
achieving about 2-10 kB/sec, and are mostly CPU bound.

Regards!
Ed
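[Editor's note: the batching Ed describes is typically done by primary-key range, along these lines (a sketch only; big_table and id are placeholder names, not from the thread):]

```sql
-- Delete one 10,000-key slice of the bigint primary key per
-- transaction; advance the bounds and repeat until no rows remain.
BEGIN;
DELETE FROM big_table
 WHERE id >= 0
   AND id <  10000;
COMMIT;
```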

#2 Richard Huxton
dev@archonet.com
In reply to: Edmund Dengler (#1)
Re: Deletes and large tables

I don't suppose it's busy checking foreign keys in a lot of other tables, is it?

--
Richard Huxton
Archonet Ltd

#3 Edmund Dengler
edmundd@eSentire.com
In reply to: Richard Huxton (#2)
Re: Deletes and large tables

Greetings!

This table has 3 foreign keys, but those should not matter during deletes.
In addition, the tables being referred to are small and should be in
cache.

There are no tables depending on it for references, so no dependent
triggers should be running.

Also, if this were a foreign key issue, I would expect to be I/O bound,
not CPU bound.

Regards!
Ed


#4 David Gagnon
dgagnon@siunik.com
In reply to: Edmund Dengler (#1)
Re: Deletes and large tables

Hi,

Do you have any foreign keys pointing to this table? That's the problem I
had when I wanted to delete all rows from a table with 5 FKs. You can
search for my name in the list archives and find the thread on this matter.

Ciao
/David

#5 David Gagnon
dgagnon@siunik.com
In reply to: Edmund Dengler (#3)
Re: Deletes and large tables

> This table has 3 foreign keys, but that should not matter during deletes.
> In addition, the tables being referred to are small, and should be in
> cache.

I'm talking about FKs that point to this table, not FKs defined on this
table that point to other tables. If table A is referenced by 10 other
tables, 10 referential checks are needed.

> There are no tables depending on it for references, so no dependent
> triggers should be running.
>
> Also, if this was a foreign key issue, I would expect I/O issues/bounds
> and not CPU.

Maybe... I'm honestly not sure.

Like I said in my previous mail, I had a similar problem (maybe not the
same). It was taking 10 minutes to delete 10k rows from a table. I turned
on some logging in postgresql.conf and saw that for each row deleted, 4
SELECTs were issued to check FKs. I dropped those FKs, and after that the
delete took less than a second.

Hope it helps
/David


#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Edmund Dengler (#3)
Re: Deletes and large tables

Edmund Dengler <edmundd@eSentire.com> writes:

> This table has 3 foreign keys, but that should not matter during deletes.

Richard was inquiring about foreign keys linking *into* this table, not
out of it.

In particular, foreign keys that don't have indexes on the referencing
columns; those will incur seqscans to check for appearances of the value
you are deleting.

regards, tom lane
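
[Editor's note: inbound foreign keys of the kind Tom describes can be listed with a catalog query like the following (a sketch; big_table is a placeholder for the actual table name):]

```sql
-- List foreign-key constraints on other tables whose target
-- (confrelid) is big_table, i.e. FKs pointing *into* it.
SELECT conname,
       conrelid::regclass  AS referencing_table,
       confrelid::regclass AS referenced_table
  FROM pg_constraint
 WHERE contype = 'f'
   AND confrelid = 'big_table'::regclass;
```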

#7 Edmund Dengler
edmundd@eSentire.com
In reply to: David Gagnon (#5)
Re: Deletes and large tables

I just did a sanity check. I dumped the DB schema, and there is indeed a
foreign key reference into the table. Interestingly, the referencing table
has no index on that column, but it is a relatively small table with
entries only near the end of the large table.

So it looks like I was CPU bound because each delete triggered a
sequential scan of the smaller table.

Dropped the constraint, and deletes are now much faster.

Regards!
Ed
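
[Editor's note: an alternative to dropping the constraint is to index the referencing column, so that each FK check during the delete becomes an index probe instead of a sequential scan (a sketch; table and column names are placeholders):]

```sql
-- Index the FK column on the referencing table so deletes on
-- the large table can check for references via the index.
CREATE INDEX small_table_big_id_idx
    ON small_table (big_table_id);
```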
