Table Partitioning Advice Request

Started by Vincenzo Romanoover 16 years ago10 messagesgeneral

vincenzo.romano@notorand.it

over 16 years ago

Hi all.

I'm planning to implement table partitioning as "suggested" (among
other sources) in the official documentation.
I'm using v8.4.2 at the moment.

My case is far from the general one as:
1. I'll never UPDATE or DELETE rows from partitioned tables (only INSERTs)
2. Rows will be inserted one-by-one or, in the worse case, in bunches
of two or three
3. Partitioning will be based upon TIMESTAMP ranges
4. The "virtual" tables should approach (and possibly go past) 100M rows
5. Most (99%) of the INSERTs (and possibly SELECTs) will actually
operate on a rather small number of partitions (hardly more than 2).

My main TABLE is like the following one:

CREATE TABLE events (
eventtype text not null,
item_id int8 not null,
event_date timestamp not null default now(),
row_date timestamp not null default now(),
event_id serial8 primary key
);

where the partitioning would happen over the values of the event_date column.
The row_date columns is to record the row creation TIMESTAMP as events
can be created relatively to the past, the future or the current time.

In my mind a solution which is simple to maintain is to add a simple
RULE ... ON INSERT for every newly created partition table.
The TRIGGER approach, in my opinion, is much more complex to maintain
as either the body of the function needs to be rewritten as new
partitions are added, or
some external TABLE lookup is needed to choose the actual table name
to be used for a (dynamically created) INSERT.

Now the questions.

1. As the number of RULEs will grow with the time, how will change
the efficiency of the query planner while "browsing" among the RULES?
2. In the case the previous answer would lead to bad news for me, is
there any better/different approach to partitioning with TRIGGERs?
3. Is there any more general advise for such approaches?

--
Vincenzo Romano
NON QVIETIS MARIBVS NAVTA PERITVS

Sam Jas

samjas33@yahoo.com

over 16 years ago

In reply to: Vincenzo Romano (#1)

Re: Table Partitioning Advice Request

Rule is not advisable, Trigger is the best solution.

---
Thanks
Sam Jas

--- On Thu, 17/12/09, Vincenzo Romano <vincenzo.romano@notorand.it> wrote:

From: Vincenzo Romano <vincenzo.romano@notorand.it>
Subject: [GENERAL] Table Partitioning Advice Request
To: pgsql-general@postgresql.org
Date: Thursday, 17 December, 2009, 11:05 AM

Hi all.

I'm planning to implement table partitioning as "suggested" (among
other sources) in the official documentation.
I'm using v8.4.2 at the moment.

My main TABLE is like the following one:

CREATE TABLE events (
eventtype text not null,
item_id int8 not null,
event_date timestamp not null default now(),
row_date timestamp not null default now(),
event_id serial8 primary key
);

Now the questions.

--
Vincenzo Romano
NON QVIETIS MARIBVS NAVTA PERITVS

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

The INTERNET now has a personality. YOURS! See your Yahoo! Homepage. http://in.yahoo.com/

Vincenzo Romano

vincenzo.romano@notorand.it

over 16 years ago

In reply to: Sam Jas (#2)

Re: Table Partitioning Advice Request

Why?

If you have to choose among a couple hundred partition tables, the
trigger function body is far from trivial!
You really think that calling and running a trigger function for every
line is the best solution?

2009/12/17 Sam Jas <samjas33@yahoo.com>

Rule is not advisable, Trigger is the best solution.

---
Thanks
Sam Jas
--- On Thu, 17/12/09, Vincenzo Romano <vincenzo.romano@notorand.it> wrote:
From: Vincenzo Romano <vincenzo.romano@notorand.it>
Subject: [GENERAL] Table Partitioning Advice Request
To: pgsql-general@postgresql.org
Date: Thursday, 17 December, 2009, 11:05 AM

Hi all.

I'm planning to implement table partitioning as "suggested" (among
other sources) in the official documentation.
I'm using v8.4.2 at the moment.

My case is far from the general one as:
1. I'll never UPDATE or DELETE rows from partitioned tables (only INSERTs)
2. Rows will be inserted one-by-one or, in the worse case, in bunches
of two or three
3. Partitioning will be based upon TIMESTAMP ranges
4. The "virtual" tables should approach (and possibly go past) 100M rows
5. Most (99%) of the INSERTs (and possibly SELECTs) will actually
operate on a rather small number of partitions (hardly more than 2).

My main TABLE is like the following one:

CREATE TABLE events (
eventtype text not null,
item_id int8 not null,
event_date timestamp not null default now(),
row_date timestamp not null default now(),
event_id serial8 primary key
);

where the partitioning would happen over the values of the event_date column.
The row_date columns is to record the row creation TIMESTAMP as events
can be created relatively to the past, the future or the current time.

In my mind a solution which is simple to maintain is to add a simple
RULE ... ON INSERT for every newly created partition table.
The TRIGGER approach, in my opinion, is much more complex to maintain
as either the body of the function needs to be rewritten as new
partitions are added, or
some external TABLE lookup is needed to choose the actual table name
to be used for a (dynamically created) INSERT.

Now the questions.

1. As the number of RULEs will grow with the time, how will change
the efficiency of the query planner while "browsing" among the RULES?
2. In the case the previous answer would lead to bad news for me, is
there any better/different approach to partitioning with TRIGGERs?
3. Is there any more general advise for such approaches?

--
Vincenzo Romano
NON QVIETIS MARIBVS NAVTA PERITVS

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

________________________________
The INTERNET now has a personality. YOURS! See your Yahoo! Homepage.

--
Vincenzo Romano
NotOrAnd Information Technologies
cel. +39 339 8083886 | gtalk. vincenzo.romano@notorand.it
fix. +39 0823 454163 | skype. notorand.it
fax. +39 02 700506964 | msn. notorand.it
NON QVIETIS MARIBVS NAVTA PERITVS

David Fetter

david@fetter.org

over 16 years ago

In reply to: Vincenzo Romano (#3)

Re: Table Partitioning Advice Request

On Thu, Dec 17, 2009 at 02:41:40PM +0100, Vincenzo Romano wrote:

2009/12/17 Sam Jas <samjas33@yahoo.com>
Rule is not advisable, Trigger is the best solution.

If you have to choose among a couple hundred partition tables, the
trigger function body is far from trivial!

It's possible to generate such code and deploy it automatically.

You really think that calling and running a trigger function for every
line is the best solution?

Yes. The trigger function is choosing from a small subset of the
tables, or you know which tables exactly the rows are going into and
insert them there.

Oh, and please do trim, and don't top-post. I've fix this in this
post.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Vincenzo Romano

vincenzo.romano@notorand.it

over 16 years ago

In reply to: David Fetter (#4)

Re: Table Partitioning Advice Request

2009/12/17 David Fetter <david@fetter.org>:

You really think that calling and running a trigger function for every
line is the best solution?

Yes. The trigger function is choosing from a small subset of the
tables, or you know which tables exactly the rows are going into and
insert them there.

So I understand that when a table has multiple RULEs, the planner just
browse all of them
to check the one(s) that will apply.
What I have in mind is a "side table" with some bookkeeping data to be
used by the trigger
function to select the actual table to be used.
Do you think this is faster than RULEs?

Is there any performance study for the trigger-based implementation?

Oh, and please do trim, and don't top-post. I've fix this in this
post.

Sorry! :-)

Dimitri Fontaine

dimitri@2ndQuadrant.fr

over 16 years ago

In reply to: Vincenzo Romano (#5)

Re: Table Partitioning Advice Request

Vincenzo Romano <vincenzo.romano@notorand.it> writes:

Is there any performance study for the trigger-based implementation?

Consider that if you use RULE to partition, when you DROP a partition
the INSERTs are locked out because the query depends on the table being
droped.

That alone could lead you to stop considering RULEs for partitioning.
--
dim

Vincenzo Romano

vincenzo.romano@notorand.it

over 16 years ago

In reply to: Sam Jas (#2)

Re: Table Partitioning Advice Request

2009/12/17 Sam Jas <samjas33@yahoo.com>

Rule is not advisable, Trigger is the best solution.

Does the trigger solution need the TABLE CHECK constraint?
It looks to me it won't.

Vincenzo Romano

vincenzo.romano@notorand.it

over 16 years ago

In reply to: Dimitri Fontaine (#6)

Re: Table Partitioning Advice Request

2009/12/17 Dimitri Fontaine <dfontaine@hi-media.com>:

Vincenzo Romano <vincenzo.romano@notorand.it> writes:

Is there any performance study for the trigger-based implementation?

Consider that if you use RULE to partition, when you DROP a partition
the INSERTs are locked out because the query depends on the table being
droped.

That alone could lead you to stop considering RULEs for partitioning.

In that case I would also drop the relevant rule(s).
But it seems anyway that the rule system would need to scan all the
rules in order to know which one(s) to apply.
And, at least in my case, I would have hundreds of rules ...

Thanks for the hint, anyway.

--
Vincenzo Romano
NON QVIETIS MARIBVS NAVTA PERITVS

Scott Marlowe

scott.marlowe@gmail.com

over 16 years ago

In reply to: Vincenzo Romano (#7)

Re: Table Partitioning Advice Request

On Thu, Dec 17, 2009 at 8:29 AM, Vincenzo Romano
<vincenzo.romano@notorand.it> wrote:

2009/12/17 Sam Jas <samjas33@yahoo.com>

Rule is not advisable, Trigger is the best solution.

Does the trigger solution need the TABLE CHECK constraint?
It looks to me it won't.

The table check constraint is used during selects to pick just the
child tables needed to deliver the data you are asking for, assuming
the key value is in the where clause.

The main reason to avoid rules is that they're much less efficient
than triggers. We use partitioning at work for our stats db, and
partition by day, and we have about 2 years worth of stats data, so
our insert trigger has about 700 if / elseif / else conditions in it,
and the trigger is still very very fast. I tried it with rules before
and it was way too slow.

Note that the absolute fastest way to insert to the right child table
is to do it from the application side, choosing the right child table
there.

We automated our trigger creation. While the trigger is huge in terms
of number of lines, speed wise the creation of the new trigger each
night at midnight is measured in milliseconds

#10

Vincenzo Romano

vincenzo.romano@notorand.it

over 16 years ago

In reply to: Scott Marlowe (#9)

Re: Table Partitioning Advice Request

2009/12/18 Scott Marlowe <scott.marlowe@gmail.com>:

The main reason to avoid rules is that they're much less efficient
than triggers. We use partitioning at work for our stats db, and
partition by day, and we have about 2 years worth of stats data, so
our insert trigger has about 700 if / elseif / else conditions in it,
and the trigger is still very very fast. I tried it with rules before
and it was way too slow.

I still don't understand the poor efficiency of the rule system. I presume it's
in the implementation, not the concept.

Well, for the sake of efficiency I would like to avoid that if/elseif chain
with an extra support table to look for the right partition.
Unless you have a different advise against dynamic SQL (EXECUTE...)
to define the partition table name.

Note that the absolute fastest way to insert to the right child table
is to do it from the application side, choosing the right child table
there.

I definitely disagree on this advise. For the sake of speed I could
also run some library level (and thus application level) database
solution.
The main objective for using an RDBMS (at least in my view) is to
gather in a single place (the RDBMS) the data storage and
management stuff.
Moving policies to the application level would just scatter that
stuff with faster but more error prone solutions.
And the aplpication needs to know the partitioning policies.

We automated our trigger creation. While the trigger is huge in terms
of number of lines, speed wise the creation of the new trigger each
night at midnight is measured in milliseconds

Correct.
There is still another point for me to investigate into this partitioning
area. When the partitions get created.
I see from your answers that everyone is expecting the partitions to
be already in place at the moment the trigger or the rule get fired.
In my application you can generate events for the past or the future
where there's no partition yet.

--
Vincenzo Romano
NON QVIETIS MARIBVS NAVTA PERITVS