Monthly table partitioning for fast purges?

Started by Roger Handover 22 years ago11 messagesgeneral
Jump to latest
#1Roger Hand
rhand@ragingnet.com

We are moving an application from Oracle 8i to Postgres and I've run into a problem attempting to duplicate a feature we currently use.

In Oracle you can divide a table into partitions. We use this feature to break up the data by month. Each month we store several tens of millions of rows in a particular table, and each month we drop the partition that's a year old. In other words, we always keep the last 12 months of data (12 partitions). This is clean and fast. Since the partition is by a timestamp column, it also gives us a certain amount of automatic indexing.

Postgres doesn't support table partitions (correct me if I'm wrong!) so the only option appears to be to dump everything into one big table. What I'm worried about is the purging of the data from 12 months ago ... I'm worried that this will be a slow and expensive operation.

Does anyone have any advice for how best to handle this?

Thanks in advance,

-Roger

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Roger Hand (#1)
Re: Monthly table partitioning for fast purges?

On Fri, Aug 01, 2003 at 01:46:54PM -0700, Roger Hand wrote:

We are moving an application from Oracle 8i to Postgres and I've run into
a problem attempting to duplicate a feature we currently use.

In Oracle you can divide a table into partitions. We use this feature to
break up the data by month. Each month we store several tens of millions
of rows in a particular table, and each month we drop the partition that's
a year old. In other words, we always keep the last 12 months of data (12
partitions). This is clean and fast. Since the partition is by a timestamp
column, it also gives us a certain amount of automatic indexing.

Postgres doesn't support table partitions (correct me if I'm wrong!) so
the only option appears to be to dump everything into one big table. What
I'm worried about is the purging of the data from 12 months ago ... I'm
worried that this will be a slow and expensive operation.

Does anyone have any advice for how best to handle this?

I feel your pain! No, PortgreSQL doesn't support this. There were some
proposals recently on -hackers but there didn't seem to be a great deal of
interest. The best solution I've come up with is by creating base tables for
each year by hand and using a view to combine them.

You can create RULEs to automatically move new data to various tables. As
long as you're not doing UPDATEs you can avoid a lot of the complexity.
Similar effects can be acheived using inheritance.

Good luck!

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

"All that is needed for the forces of evil to triumph is for enough good
men to do nothing." - Edmond Burke
"The penalty good people pay for not being interested in politics is to be
governed by people worse than themselves." - Plato

#3Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Roger Hand (#1)
Re: Monthly table partitioning for fast purges?

On 1 Aug 2003 at 13:46, Roger Hand wrote:

In Oracle you can divide a table into partitions. We use this feature to break up the data by month. Each month we store several tens of millions of rows in a particular table, and each month we drop the partition that's a year old. In other words, we always keep the last 12 months of data (12 partitions). This is clean and fast. Since the partition is by a timestamp column, it also gives us a certain amount of automatic indexing.

Postgres doesn't support table partitions (correct me if I'm wrong!) so the only option appears to be to dump everything into one big table. What I'm worried about is the purging of the data from 12 months ago ... I'm worried that this will be a slow and expensive operation.

Well, you need to emulate this feature on postgresql, as pointed out already.
It is possible in postgresql but is not provided out of the box.

Here is what you can do.

* Create a empty base table.
* Create a trigger the directs the select on timestamps accordingly
* Write a schedule to create a new table at the start of every month and update
the trigger accordingly
* Create and maintain rules to insert into base table depending upon the
timestamp.

This is just a rough idea.

There might be fair amount of work to get this working but surely it is not
imposible.

HTH

Bye
Shridhar

--
Justice, n.: A decision in your favor.

#4Ron Johnson
ron.l.johnson@cox.net
In reply to: Shridhar Daithankar (#3)
Re: Monthly table partitioning for fast purges?

On Sun, 2003-08-03 at 06:31, Shridhar Daithankar wrote:

On 1 Aug 2003 at 13:46, Roger Hand wrote:

[snip]

Here is what you can do.

* Create a empty base table.
* Create a trigger the directs the select on timestamps accordingly
* Write a schedule to create a new table at the start of every month and update
the trigger accordingly
* Create and maintain rules to insert into base table depending upon the
timestamp.

This is just a rough idea.

There might be fair amount of work to get this working but surely it is not
imposible.

And you get a big performance hit when all those records are moved.

Partitioning "should" be put on the TODO list soon after tablespaces
(or DBA-defined directories) is implemented.

-- 
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+
#5Mat
psql-mail@freeuk.com
In reply to: Ron Johnson (#4)
Re: Monthly table partitioning for fast purges?

I am looking at ways to speed up queries, the most common way by for
queries to be constrianed is by date range. I have indexed the date
column. Queries are still slower than i would like.

Would there be any performance increase for these types of queries if
the tables were split by month as described by Shridhar (method1) so
only the required tables were loaded from disk?

Will there be any performance increase if table partitioning is
implemented?

If method1 is likely to yield a performance increase, will there be a
large hit for other types
of queries which require all the data?

I'd be happy to post the EXPLAIN ANALYZE output for a typical query if
more info is needed.

The date column is of type timestamp (and so goes right down to seconds)
, most user queries are only concerned about whole days without the
times, (but hte time data is required for other queries) can i do
something with an index of the timestamps cast to dates? and then cast
the queries to dates too?

Thanks!

Ron Johnson Wrote:

On Sun, 2003-08-03 at 06:31, Shridhar Daithankar wrote:

On 1 Aug 2003 at 13:46, Roger Hand wrote:

[snip]

Here is what you can do.

* Create a empty base table.
* Create a trigger the directs the select on timestamps accordingly

* Write a schedule to create a new table at the start of every

month and update

the trigger accordingly
* Create and maintain rules to insert into base table depending

upon the

timestamp.

This is just a rough idea.

There might be fair amount of work to get this working but surely

it is not

imposible.

And you get a big performance hit when all those records are moved.

Partitioning "should" be put on the TODO list soon after tablespaces
(or DBA-defined directories) is implemented.

--

#6Jan Poslusny
pajout@gingerall.cz
In reply to: Mat (#5)
Re: Monthly table partitioning for fast purges?

Did you think about cluster on index ?

psql-mail@freeuk.com wrote:

Show quoted text

I am looking at ways to speed up queries, the most common way by for
queries to be constrianed is by date range. I have indexed the date
column. Queries are still slower than i would like.

Would there be any performance increase for these types of queries if
the tables were split by month as described by Shridhar (method1) so
only the required tables were loaded from disk?

Will there be any performance increase if table partitioning is
implemented?

If method1 is likely to yield a performance increase, will there be a
large hit for other types
of queries which require all the data?

I'd be happy to post the EXPLAIN ANALYZE output for a typical query if
more info is needed.

The date column is of type timestamp (and so goes right down to seconds)
, most user queries are only concerned about whole days without the
times, (but hte time data is required for other queries) can i do
something with an index of the timestamps cast to dates? and then cast
the queries to dates too?

Thanks!

Ron Johnson Wrote:

On Sun, 2003-08-03 at 06:31, Shridhar Daithankar wrote:

On 1 Aug 2003 at 13:46, Roger Hand wrote:

[snip]

Here is what you can do.

* Create a empty base table.
* Create a trigger the directs the select on timestamps accordingly

* Write a schedule to create a new table at the start of every

month and update

the trigger accordingly
* Create and maintain rules to insert into base table depending

upon the

timestamp.

This is just a rough idea.

There might be fair amount of work to get this working but surely

it is not

imposible.

And you get a big performance hit when all those records are moved.

Partitioning "should" be put on the TODO list soon after tablespaces
(or DBA-defined directories) is implemented.

#7Benjamin Jury
benjamin.jury@mpuk.com
In reply to: Jan Poslusny (#6)
Re: Monthly table partitioning for fast purges?

You could create an index on the function date(), which strips the time
information.

-----Original Message-----
From: psql-mail@freeuk.com [mailto:psql-mail@freeuk.com]
Sent: 04 August 2003 14:01
To: PgSQL General ML
Subject: Re: [GENERAL] Monthly table partitioning for fast purges?

I am looking at ways to speed up queries, the most common way by for
queries to be constrianed is by date range. I have indexed the date
column. Queries are still slower than i would like.

...

Show quoted text

The date column is of type timestamp (and so goes right down
to seconds)
, most user queries are only concerned about whole days without the
times, (but hte time data is required for other queries) can i do
something with an index of the timestamps cast to dates? and
then cast
the queries to dates too?

#8scott.marlowe
scott.marlowe@ihs.com
In reply to: Roger Hand (#1)
Re: Monthly table partitioning for fast purges?

On Fri, 1 Aug 2003, Roger Hand wrote:

We are moving an application from Oracle 8i to Postgres and I've run
into a problem attempting to duplicate a feature we currently use.

In Oracle you can divide a table into partitions. We use this feature
to break up the data by month. Each month we store several tens of
millions of rows in a particular table, and each month we drop the
partition that's a year old. In other words, we always keep the last 12
months of data (12 partitions). This is clean and fast. Since the
partition is by a timestamp column, it also gives us a certain amount of
automatic indexing.

Postgres doesn't support table partitions (correct me if I'm wrong!) so
the only option appears to be to dump everything into one big table.
What I'm worried about is the purging of the data from 12 months ago
... I'm worried that this will be a slow and expensive operation.

There are two approaches I can think of, and you should test each one to
see how it holds up to your usage.

Approach the first: Using scripts, create a table for each month. Create
a view that combines all of these months. When a month goes out of date,
simply remove it from the view. Deleting the month can be done at your
leisure, as it only saves disk space at this point, but since it isn't in
the view, it doesn't slow you down to keep them.

Approach the second: Use partial indexes to make it look like the table
is partitioned. I.e. every month create a new partial index like:

create index on bigoldtable (datefield) where datefield >=1stofmonth and
datefield<=lastofmonth.

Then include the "where date >=firstofmonth AND date <= lastofmonth

This should then hit the partial index, which will be small compared to
the master table with all the rows, or the main index, which will index
all fields.

#9Ron Johnson
ron.l.johnson@cox.net
In reply to: Benjamin Jury (#7)
Re: Monthly table partitioning for fast purges?

On Mon, 2003-08-04 at 09:02, Benjamin Jury wrote:

You could create an index on the function date(), which strips the time
information.

How much of a hit would there be if he had separate "monthly tables"
and UNIONed them?

-----Original Message-----
From: psql-mail@freeuk.com [mailto:psql-mail@freeuk.com]
Sent: 04 August 2003 14:01
To: PgSQL General ML
Subject: Re: [GENERAL] Monthly table partitioning for fast purges?

I am looking at ways to speed up queries, the most common way by for
queries to be constrianed is by date range. I have indexed the date
column. Queries are still slower than i would like.

...

The date column is of type timestamp (and so goes right down
to seconds)
, most user queries are only concerned about whole days without the
times, (but hte time data is required for other queries) can i do
something with an index of the timestamps cast to dates? and
then cast
the queries to dates too?

-- 
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+
#10Jean-Luc Lachance
jllachan@nsd.ca
In reply to: Shridhar Daithankar (#3)
Re: Monthly table partitioning for fast purges?

I second that.
I have discussed adding partitioning tables almost a year ago...
No need to partition a functional index or anything like that.
Just partition on a specific field.

Ron Johnson wrote:

Show quoted text

[...]
Partitioning "should" be put on the TODO list soon after tablespaces
(or DBA-defined directories) is implemented.

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#11Ron Johnson
ron.l.johnson@cox.net
In reply to: Jean-Luc Lachance (#10)
Re: Monthly table partitioning for fast purges?

On Tue, 2003-08-05 at 10:28, Jean-Luc Lachance wrote:

I second that.
I have discussed adding partitioning tables almost a year ago...
No need to partition a functional index or anything like that.
Just partition on a specific field.

Well.....

If you're going to partition a *huge* table by, say, a date field,
and there's also an index on that date field, partitioning the
index on that same field may have some advantages.

Ron Johnson wrote:

[...]
Partitioning "should" be put on the TODO list soon after tablespaces
(or DBA-defined directories) is implemented.

-- 
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+