Fast Deletion For Large Tables

Started by Raymond Chui · over 23 years ago · 9 messages · general
#1Raymond Chui
raymond.chui@noaa.gov

I have some tables with a huge amount of data.
The tables have a timestamp column and a float column.
I am trying to keep up to 6 days of data values.
What I do is execute the SQL below from crontab (the UNIX
command scheduler).

BEGIN;
DELETE FROM table_1 WHERE column_time < ('now'::timestamp - '6 days'::interval);
.....
DELETE FROM table_n WHERE column_time < ('now'::timestamp - '6 days'::interval);
COMMIT;

Everything is running fine, except it takes a long time to finish,
because some tables store from 50,000 to 100,000 rows,
and some deletions need to delete up to 45,000 rows.

So I am thinking of just deleting the rows by their row number or row ID,
like

DELETE FROM a_table WHERE row_id < 45000;

I know there is row_id in Oracle.
Is there row_id for a table in Postgres?
Thank Q!

#2Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Raymond Chui (#1)
Re: Fast Deletion For Large Tables

On 2 Oct 2002 at 14:20, Raymond Chui wrote:

Everything is running fine, except it takes a long time to finish,
because some tables store from 50,000 to 100,000 rows,
and some deletions need to delete up to 45,000 rows.

That's not much. How much time does it take, exactly? Is there an index on
the timestamp field?

So I am thinking of just deleting the rows by their row number or row ID,
like

DELETE FROM a_table WHERE row_id < 45000;

I don't think that's the problem. Do you have big enough WALs for deleting
45000 records in one transaction? Try deleting them in batches, or increase
the WAL files and buffers. That should help.
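The batching suggestion above can be sketched like this (illustrative only; the table and column names follow the original post, and the one-day window size is an assumption):

```sql
-- Sketch: purge one day per transaction instead of six-plus days at once.
-- Each statement touches fewer rows, so each transaction needs less WAL.
BEGIN;
DELETE FROM table_1
WHERE column_time <  ('now'::timestamp - '6 days'::interval)
  AND column_time >= ('now'::timestamp - '7 days'::interval);
COMMIT;
-- Run daily from cron; anything older was removed by previous runs.
```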

I know there is row_id in Oracle.
Is there row_id for a table in Postgres?

That's called an OID in PostgreSQL. But as I said, it's likely that you don't
have a sufficiently big WAL. Try setting some higher values, or don't delete
in one transaction if possible.

Bye
Shridhar

--
Consent decree: A document in which a hapless company consents never to commit
in the future whatever heinous violations of Federal law it never admitted to
in the first place.

#3Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Raymond Chui (#1)
Re: [ADMIN] Fast Deletion For Large Tables

On Wed, 2 Oct 2002, Raymond Chui wrote:

I have some tables with a huge amount of data.
The tables have a timestamp column and a float column.
I am trying to keep up to 6 days of data values.
What I do is execute the SQL below from crontab (the UNIX
command scheduler).

BEGIN;
DELETE FROM table_1 WHERE column_time < ('now'::timestamp - '6 days'::interval);
.....
DELETE FROM table_n WHERE column_time < ('now'::timestamp - '6 days'::interval);
COMMIT;

Everything is running fine, except it takes a long time to finish,
because some tables store from 50,000 to 100,000 rows,
and some deletions need to delete up to 45,000 rows.

So I am thinking of just deleting the rows by their row number or row ID,
like

DELETE FROM a_table WHERE row_id < 45000;

I know there is row_id in Oracle.
Is there row_id for a table in Postgres?

Not really, if I recall Oracle's row_id definition correctly, although you
could probably fake something with a sequence.

#5Roland Roberts
roland@astrofoto.org
In reply to: Raymond Chui (#1)
Re: [GENERAL] Fast Deletion For Large Tables

"rc" == Raymond Chui <raymond.chui@noaa.gov> writes:

rc> Everything is running fine, except it takes a long time to finish,
rc> because some tables store from 50,000 to 100,000 rows,
rc> and some deletions need to delete up to 45,000 rows.

Is column_time indexed?
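If it is not, adding one is the usual first step (the index name here is illustrative, not from the thread):

```sql
-- Lets the planner find the expiring rows without scanning the whole table:
CREATE INDEX table_1_column_time_idx ON table_1 (column_time);
```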

roland
--
PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD RL Enterprises
roland@rlenter.com 76-15 113th Street, Apt 3B
roland@astrofoto.org Forest Hills, NY 11375

#6hubert depesz lubaczewski
In reply to: Raymond Chui (#1)
Re: Fast Deletion For Large Tables

On Wed, Oct 02, 2002 at 02:20:39PM -0400, Raymond Chui wrote:

Everything is running fine, except it takes a long time to finish,
because some tables store from 50,000 to 100,000 rows,
and some deletions need to delete up to 45,000 rows.

how often do you run this "delete" process?
do you vacuum before *and* after it?
when deleting 45000 out of 50000 (or 100000) records, no index can speed
it up. anyway - deleting 45000 records shouldn't take very long.
do you have any foreign keys from this table to others? any indices?

try maybe this path:
connect
vacuum
drop indices
disconnect
connect
delete from ...
create indices
vacuum analyze
disconnect

could work better.
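Spelled out in SQL, the path above might look like this (the index name is hypothetical; the statements follow the order of the steps listed):

```sql
VACUUM table_1;
DROP INDEX table_1_column_time_idx;      -- hypothetical index name
DELETE FROM table_1
WHERE column_time < ('now'::timestamp - '6 days'::interval);
CREATE INDEX table_1_column_time_idx ON table_1 (column_time);
VACUUM ANALYZE table_1;
```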

depesz

--
hubert depesz lubaczewski http://www.depesz.pl/
------------------------------------------------------------------------
My God, make me keep silent until I am sure that I really have
something to say. (c) 1998 depesz

#7Nikolaus Dilger
nikolaus@dilger.cc
In reply to: hubert depesz lubaczewski (#6)
Re: [ADMIN] Fast Deletion For Large Tables

Raymond,

Partitioned tables would solve your issue, since you
could just truncate a partition in order to delete the
unneeded data. Unfortunately they are not available in
PostgreSQL. But maybe in a future release.

Unfortunately you also cannot use a UNION ALL in a view
to fake partitions.

So only several changes to your application would allow
you to partition your data. But depending on your
situation it may be worthwhile. E.g., if you always
want to keep one week's worth of data and purge a day at
a time, you could have 7 tables instead of one:
table1_mon, table1_tue, table1_wed, etc. And you could
select from all of them with a UNION ALL.
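A sketch of that scheme (table names from the paragraph above; TRUNCATE frees the space immediately, unlike DELETE):

```sql
-- Purge Monday's data instantly:
TRUNCATE TABLE table1_mon;

-- Read across the days with UNION ALL:
SELECT * FROM table1_mon
UNION ALL SELECT * FROM table1_tue
UNION ALL SELECT * FROM table1_wed;
-- ...and so on through table1_sun
```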

There is no rowid in PostgreSQL. Adding an
additional column with a sequence would help, as long as
you have an index on it; just indexing your date column
may be easier. But either way you pay with slower inserts,
plus the time and effort to build the index.

Regards,
Nikolaus Dilger

On Wed, 02 October 2002, Raymond Chui wrote:


I have some tables with a huge amount of data.
The tables have a timestamp column and a float column.
I am trying to keep up to 6 days of data values.
What I do is execute the SQL below from crontab (the UNIX
command scheduler).

BEGIN;
DELETE FROM table_1 WHERE column_time < ('now'::timestamp - '6 days'::interval);
.....
DELETE FROM table_n WHERE column_time < ('now'::timestamp - '6 days'::interval);
COMMIT;

Everything is running fine, except it takes a long time to finish,
because some tables store from 50,000 to 100,000 rows,
and some deletions need to delete up to 45,000 rows.

So I am thinking of just deleting the rows by their row
number or row ID, like

DELETE FROM a_table WHERE row_id < 45000;

I know there is row_id in Oracle.
Is there row_id for a table in Postgres?
Thank Q!

#8Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Nikolaus Dilger (#7)
Re: [GENERAL] Fast Deletion For Large Tables

On 5 Oct 2002 at 20:17, Nikolaus Dilger wrote:

Partitioned tables would solve your issue, since you
could just truncate a partition in order to delete the
unneeded data. Unfortunately they are not available in
PostgreSQL. But maybe in a future release.

If you don't mind realigning your schema, inherited tables can closely mimic
partitions to a certain extent.

Reading the original post below, I recommend you consider the inheritance
approach. Create a base table which is empty, and create inherited partitions.
You can get all the data in the child tables by querying the base table, so
your application would not require any modification as far as selecting data
is concerned.

You have to write a wrapper around the insert/update/delete code to operate
on a particular partition.
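A minimal sketch of the inheritance approach (table and column names are made up for illustration):

```sql
-- Empty base table; one inherited child per partition:
CREATE TABLE readings (column_time timestamp, value float);
CREATE TABLE readings_mon () INHERITS (readings);
CREATE TABLE readings_tue () INHERITS (readings);

-- SELECTs on the base table see every child's rows:
SELECT count(*) FROM readings;

-- The wrapper must route writes to the right child:
INSERT INTO readings_mon VALUES (now(), 1.0);

-- Purging a day is then just:
TRUNCATE TABLE readings_mon;
```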

There is no rowid in PostgreSQL. Adding an
additional column with a sequence would help, as long as
you have an index on it; just indexing your date column
may be easier. But either way you pay with slower inserts,
plus the time and effort to build the index.

There is the OID, and if you don't use it you can disable it, saving some
space if you have large data.

HTH

Bye
Shridhar

--
Bachelor: A man who chases women and never Mrs. one.

#9Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Nikolaus Dilger (#7)
Re: [ADMIN] Fast Deletion For Large Tables

On Sat, Oct 05, 2002 at 08:17:38PM -0700, Nikolaus Dilger wrote:

Raymond,

Partitioned tables would solve your issue, since you
could just truncate a partition in order to delete the
unneeded data. Unfortunately they are not available in
PostgreSQL. But maybe in a future release.

Unfortunately you also cannot use a UNION ALL in a view
to fake partitions.

Hmm, you haven't tried this recently, have you? With pgsql 7.2.1,
it seems to work just fine:

test=# create view transactions as select * from monday union all select * from tuesday union all select * from wednesday ;
CREATE
test=# \d transactions
View "transactions"
Column | Type | Modifiers
----------+---------+-----------
daynum | integer |
transact | integer |
View definition: (SELECT monday.daynum, monday.transact FROM monday UNION ALL SELECT tuesday.daynum, tuesday.transact FROM tuesday) UNION ALL SELECT wednesday.daynum, wednesday.transact FROM wednesday;

test=# select * from transactions;
daynum | transact
--------+----------
1 | 1
1 | 2
1 | 3
2 | 4
2 | 5
2 | 6
(6 rows)

test=#

Ross