Re: Need help in reclaiming disk space by deleting the selected records

Started by Yelai, Ramkumar IN BLR STS · over 13 years ago · 5 messages · general
#1 Yelai, Ramkumar IN BLR STS
ramkumar.yelai@siemens.com

Thanks Steve and Andres,

I read these articles
http://www.linuxinsight.com/optimize_postgresql_database_size.html
http://yuval.bar-or.org/blog/2010/08/sluggish-postgresql-databases-and-reindexing-indexes/
http://www.if-not-true-then-false.com/2009/partitioning-large-postgresql-tables-and-handle-millions-of-rows-efficiently-and-quickly/

and I have some more questions about Steve's comments.

1. Do I need to run REINDEX to reclaim space, or will autovacuum handle reindexing?
2. CLUSTER, REINDEX and VACUUM FULL lock the table; hence, do we need to avoid database operations (SELECT, DELETE, INSERT) while doing disk cleanup? Just curious: what happens if I keep inserting while running these commands?
3. Do all three commands need some additional disk space to do their work? Am I correct?
4. Would all database servers (Oracle, SQL Server and MySQL) need downtime while doing disk cleanup?
5. I am very happy to use TRUNCATE and table partitioning; it satisfies my requirements. But in order to achieve this, for 10 years (I currently have 6 unique archiving tables) I have to create 1440 monthly tables. Will this create any issues, and is there anything I need to consider carefully while doing this?

Thanks & Regards,
Ramkumar
_____________________________________________
From: Yelai, Ramkumar IN BLR STS
Sent: Thursday, September 13, 2012 7:03 PM
To: 'pgsql-general@postgresql.org'
Subject: Need help in reclaiming disk space by deleting the selected records

Hi All,

I am a beginner in PostgreSQL and databases. I have a requirement to reclaim disk space by deleting the rows in a selected time span. I went through documents and articles to learn how to get the table size (http://wiki.postgresql.org/wiki/Disk_Usage).

But before letting the user delete, I have to show the size of the records in the selected time span, and I don't know how to calculate the size of the selected records.
In addition to this, I assume that after deleting the records I have to run the VACUUM FULL command to reclaim the space (please correct me if I am wrong, or let me know the best approach).

The table looks like this

CREATE TABLE IF NOT EXISTS "SN_SamplTable"
(
"ID" integer NOT NULL,
"Data" integer,
"CLIENT_COUNT_TIMESTAMP" timestamp without time zone
);
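For reference, one rough way to show the size of the rows in a selected time span is to sum pg_column_size() over them. This is only an estimate, since it ignores per-row header, page, and index overhead; the table and column names below are those of the sample table, and the dates are illustrative:

```sql
-- Approximate payload size of the rows in a selected time span.
-- Undercounts real disk usage: excludes the ~24-byte row header,
-- page-level overhead, and index entries.
SELECT pg_size_pretty(SUM(pg_column_size(t.*))::bigint) AS approx_size
FROM "SN_SamplTable" AS t
WHERE "CLIENT_COUNT_TIMESTAMP" >= '2012-01-01'
  AND "CLIENT_COUNT_TIMESTAMP" <  '2012-07-01';
```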

Please help me with how to proceed on this.

Thanks & Regards,
Ramkumar.

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Yelai, Ramkumar IN BLR STS (#1)
Re: Re: Need help in reclaiming disk space by deleting the selected records

Yelai, Ramkumar wrote:

> 1. Do I need run REINDEX to reduce space or auto vacuum will handle
> re indexing?

Autovacuum will not rebuild the index as REINDEX does.
It will only free index entries that can be reused later.
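A quick way to see whether a rebuild is worthwhile is to compare the index size before and after; a sketch (the index name is only an example):

```sql
-- Check the index's current on-disk size.
SELECT pg_size_pretty(pg_relation_size('sn_sampltable_ts_idx'));

-- Rebuild it from scratch; note this locks out writes to the table
-- (and reads of the index) until it finishes.
REINDEX INDEX sn_sampltable_ts_idx;

-- Compare the size afterwards.
SELECT pg_size_pretty(pg_relation_size('sn_sampltable_ts_idx'));
```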

> 2. Cluster, Re index and Vacuum full locks the table, Hence do we
> need to avoid database operations ( select, delete, insert ) while
> doing disk clean up? Just curious what if I keep inserting while
> running this command?

The INSERT would get blocked until the CLUSTER, REINDEX or VACUUM FULL
is done.

> 3. All the three commands needs some additional space to do this
> operation? Am I correct?

Yes.

> 4. Would all database server ( oracle, sqlserver and mysql ) needs
> downtime while doing disk clean up?

"Disk cleanup" is a very vague term.
All database management systems handle these things differently.

Note that PostgreSQL normally does not need any downtime if the
vacuum strategy is right.
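For a table with heavy delete traffic, that strategy can be as simple as making autovacuum more aggressive per table, so dead space is reused before bloat builds up. The values below are only illustrative, not recommendations:

```sql
-- Per-table autovacuum tuning (example values only):
-- vacuum once ~5% of rows are dead, plus a fixed floor of 1000 rows.
ALTER TABLE "SN_SamplTable" SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold    = 1000
);
```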

> 5. I am very happy to use Truncate and table partitioning, it is
> satisfying my requirements. But in order to achieve this, for 10
> years ( currently 6 unique archiving tables I have ) I have to
> create 1440 month tables. Will it creates any issue and is there
> anything I need to consider carefully while doing this?

Anything exceeding a few hundred partitions is not considered a good
idea.
The system needs to keep track of all the tables, and query planning
for such a partitioned table might be expensive.

1440 is probably pushing the limits, but maybe somebody with more
experience can say more.
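For concreteness, in the 9.1-era inheritance scheme each monthly partition is a child table with a CHECK constraint (so constraint_exclusion can skip it) plus a routing trigger on the parent. A minimal sketch for one month of the sample table, with illustrative names:

```sql
-- The parent stays empty; the CHECK constraint lets the planner
-- exclude this child for queries outside September 2012.
CREATE TABLE "SN_SamplTable_2012_09" (
    CHECK ("CLIENT_COUNT_TIMESTAMP" >= DATE '2012-09-01'
       AND "CLIENT_COUNT_TIMESTAMP" <  DATE '2012-10-01')
) INHERITS ("SN_SamplTable");

CREATE INDEX ON "SN_SamplTable_2012_09" ("CLIENT_COUNT_TIMESTAMP");

-- INSERTs against the parent must be routed to the right child.
CREATE OR REPLACE FUNCTION sn_sampltable_insert() RETURNS trigger AS $$
BEGIN
    IF NEW."CLIENT_COUNT_TIMESTAMP" >= DATE '2012-09-01'
       AND NEW."CLIENT_COUNT_TIMESTAMP" < DATE '2012-10-01' THEN
        INSERT INTO "SN_SamplTable_2012_09" VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for timestamp %',
                        NEW."CLIENT_COUNT_TIMESTAMP";
    END IF;
    RETURN NULL;  -- the row is not stored in the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sn_sampltable_route
    BEFORE INSERT ON "SN_SamplTable"
    FOR EACH ROW EXECUTE PROCEDURE sn_sampltable_insert();
```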

Yours,
Laurenz Albe

#3 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Yelai, Ramkumar IN BLR STS (#1)
Re: Re: Need help in reclaiming disk space by deleting the selected records

Yelai, Ramkumar wrote:

>> Anything exceeding a few hundred partitions is not considered a good
>> idea.
>> The system needs to keep track of all the tables, and query planning
>> for such a partitioned table might be expensive.
>>
>> 1440 is probably pushing the limits, but maybe somebody with more
>> experience can say more.

> By mistake I added 1440 tables, but it is incorrect, below is the
> total number of tables:
>
> 7 base tables X 120 months = 840 child tables.
>
> As per your statement, If I create these many table then it will
> affect the performance. But as per the document
> (http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html)
> constraint_exclusion will improve query performance. Please clarify me
> here how query planning will be expensive?

The planner will have to decide which of the 840 tables
to access.

> I have one more strategy that instead of creating 10 years, I'd like
> to use batch processing like create 2 years of tables ( 240 tables )
> and when we are going above 2 years we will create next 2 year table
> and update the trigger or use 5 years (480 tables ) instead of 2
> years.
>
> The above approach will not create a more partitioned table and if
> user wants space they can truncate the old tables. Please let me know
> is this good approach?

I don't understand that in detail.

I would recommend that you prototype some of these variants
and run some performance tests. That's the only good way to
know what will perform well in your environment.

Yours,
Laurenz Albe

#4 John R Pierce
pierce@hogranch.com
In reply to: Laurenz Albe (#3)
Re: Re: Need help in reclaiming disk space by deleting the selected records

On 09/28/12 12:36 AM, Albe Laurenz wrote:

> Yelai, Ramkumar wrote:
>> 7 base tables X 120 months = 840 child tables. As per your statement,
>> If I create these many table then it will affect the performance. But
>> as per the document
>> (http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html)
>> constraint_exclusion will improve query performance. Please clarify me
>> here how query planning will be expensive?
>
> The planner will have to decide which of the 840 tables
> to access.

Well, really, which of the 120 tables for a given base table; he's got
7 different base tables. 120 partitions is still too many.

If I were partitioning a table for 2-year retention, I'd probably do it
by month, or even by quarter, so there would be 24 or 8 child tables for
the 2 years. We use weekly tables for 6 months, but almost all of our
activity is on the latest week and the one before that; it's quite rare
that we need to dig back to older records.
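With per-period children like that, expiring the oldest period is then a cheap metadata operation instead of DELETE followed by VACUUM FULL; a sketch, with an illustrative partition name:

```sql
-- Instantly returns the expired quarter's disk space to the OS;
-- only a brief exclusive lock on that one child table.
TRUNCATE TABLE "SN_SamplTable_2010_q3";

-- Or drop the child outright once it will never be queried again:
DROP TABLE "SN_SamplTable_2010_q3";
```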

--
john r pierce N 37, W 122
santa cruz ca mid-left coast

#5 Yelai, Ramkumar IN BLR STS
ramkumar.yelai@siemens.com
In reply to: Laurenz Albe (#3)
Re: Re: Need help in reclaiming disk space by deleting the selected records

-----Original Message-----
From: Albe Laurenz [mailto:laurenz.albe@wien.gv.at]
Sent: Friday, September 28, 2012 1:07 PM
To: Yelai, Ramkumar IN BLR STS; pgsql-general@postgresql.org
Cc: scrawford@pinpointresearch.com; andres@2ndquadrant.com
Subject: RE: [GENERAL] Re: Need help in reclaiming disk space by deleting the selected records

Yelai, Ramkumar wrote:

>> Anything exceeding a few hundred partitions is not considered a good
>> idea.
>> The system needs to keep track of all the tables, and query planning
>> for such a partitioned table might be expensive.
>>
>> 1440 is probably pushing the limits, but maybe somebody with more
>> experience can say more.

> By mistake I added 1440 tables, but it is incorrect, below is the
> total number of tables:
>
> 7 base tables X 120 months = 840 child tables.
>
> As per your statement, If I create these many table then it will
> affect the performance. But as per the document
> (http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html)
> constraint_exclusion will improve query performance. Please clarify me
> here how query planning will be expensive?

The planner will have to decide which of the 840 tables to access.

> I have one more strategy that instead of creating 10 years, I'd like
> to use batch processing like create 2 years of tables ( 240 tables )
> and when we are going above 2 years we will create next 2 year table
> and update the trigger or use 5 years (480 tables ) instead of 2
> years.
>
> The above approach will not create a more partitioned table and if
> user wants space they can truncate the old tables. Please let me know
> is this good approach?

I don't understand that in detail.

I would recommend that you prototype some of these variants and run
some performance tests. That's the only good way to know what will
perform well in your environment.

Yours,
Laurenz Albe

Thanks Laurenz Albe.

After going through the articles below, I understand the query plans for partitioned tables and their limits.

http://stackoverflow.com/questions/6104774/how-many-table-partitions-is-too-many-in-postgres
http://postgresql.1045698.n5.nabble.com/Table-partitioning-td3410542.html

As per our functionality (the 7 tables represent 7 unique archiving logs), we will not be querying all 840 tables or all 7 base tables at the same time; i.e. each unique archiving log table will have only 120 child tables, hence the planner will have to decide which of 120 tables to access.

In addition to this, at any time I will be reading only one partition table among the 120 tables as per our computations. In these computations we will not use any joins or combine the partition tables.

As Laurenz said, I will build some prototypes and check the query plans based on our queries.

Please let me know if you have any points or suggestions.
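Checking those plans can be as simple as an EXPLAIN against the parent table; a sketch, assuming monthly children with CHECK constraints as discussed:

```sql
-- 'partition' is the default since 8.4: the planner uses child
-- CHECK constraints to exclude partitions, but only for queries
-- that actually reference inheritance trees.
SET constraint_exclusion = partition;

-- With proper CHECK constraints, the plan should scan only the one
-- child that can contain September 2012 rows.
EXPLAIN
SELECT * FROM "SN_SamplTable"
WHERE "CLIENT_COUNT_TIMESTAMP" >= DATE '2012-09-01'
  AND "CLIENT_COUNT_TIMESTAMP" <  DATE '2012-10-01';
```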

Thanks & regards,
Ramkumar