about partitioning

Started by Joao Ferreira gmailover 17 years ago11 messagesgeneral

Jump to latest

Joao Ferreira gmail

joao.miguel.c.ferreira@gmail.com

over 17 years ago

Hello all,

my application is coming to a point on which 'partitioning' seems to be
the solution for many problems:

- query speed up
- data elimination speed up

I'dd like to get the feeling of it by talking to people who use
partitioning, in general..

- good, bad,
- hard to manage, easy to manage,
- processing over-head during INSERT/UPDATE,
- stability/compatibility of pg_dump and restore operations,
- how many partitions would be reasonable for read _and_ write access
optimal speed;

thx
joao

Robert Treat

xzilla@users.sourceforge.net

over 17 years ago

In reply to: Joao Ferreira gmail (#1)

Re: about partitioning

On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:

Hello all,

my application is coming to a point on which 'partitioning' seems to be
the solution for many problems:

- query speed up
- data elimination speed up

I'dd like to get the feeling of it by talking to people who use
partitioning, in general..

- good, bad,

good :-)

- hard to manage, easy to manage,

I think the upfront costs for managing a partitioning setup are higher with
postgres than other systems, but there is nothing that you shouldn't be able
to automate in a cron script (at which point management becomes easy), plus
postgres gives you some interesting flexibility that is harder to find in
other setups.

- processing over-head during INSERT/UPDATE,

you can setup inserts to have relativly little overhead, but it requires more
management/maintence work up front. Updates within a partition also have
relativly little extra overhead, especially if you put in a little
application logic to figure out how to work on a partition directly. Updates
where you are changing the partition key value are always more problematic
though.

- stability/compatibility of pg_dump and restore operations,

no real issues here as long as your on recent enough versions to do wildcard
table matching for individual tables.

- how many partitions would be reasonable for read _and_ write access
optimal speed;

again, this depends on how exactly your working on the data. For example, we
have tables with over a thousand partitions on them; in those scenarios all
data is written into a single partition (with a new partition created daily),
and the qeury patterns are really straightforward... last month gets a lot of
queries, lasat three months not so much, last year barely any, and beyond
that is pretty much just archive info. That said, we have other systems where
that wouldnt work at all (for example, a static number of partitions, all of
which are queried activly).

For some more info, I've given at least one presentation on the topic, which
seems to be missing from the omniti site, but I've uploaded it to
slideshare...
http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentation

HTH.

--
Robert Treat
http://www.omniti.com
Database: Scalability: Consulting:

Blazej

bl.oleszkiewicz@gmail.com

over 17 years ago

In reply to: Robert Treat (#2)

Re: about partitioning

8><

For some more info, I've given at least one presentation on the topic, which
seems to be missing from the omniti site, but I've uploaded it to
slideshare...
http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentation

HTH.

8><

Very nice presentation. I have 2 additional questions:

(1) Are you testing INSERT/UPDATE performance when is only one rule
script for all partition tables?
(2) Are you using "SET constraint_exclusion = on;"?

Regards,
Blazej

Joao Ferreira gmail

joao.miguel.c.ferreira@gmail.com

over 17 years ago

In reply to: Robert Treat (#2)

Re: about partitioning

hei,

thanks everybody for this great feedback.

I'll sure look deeper into this.

Joao

Show quoted text

On Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:

On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:

Hello all,

my application is coming to a point on which 'partitioning' seems to be
the solution for many problems:

- query speed up
- data elimination speed up

I'dd like to get the feeling of it by talking to people who use
partitioning, in general..

- good, bad,

good :-)

- hard to manage, easy to manage,

I think the upfront costs for managing a partitioning setup are higher with
postgres than other systems, but there is nothing that you shouldn't be able
to automate in a cron script (at which point management becomes easy), plus
postgres gives you some interesting flexibility that is harder to find in
other setups.

- processing over-head during INSERT/UPDATE,

you can setup inserts to have relativly little overhead, but it requires more
management/maintence work up front. Updates within a partition also have
relativly little extra overhead, especially if you put in a little
application logic to figure out how to work on a partition directly. Updates
where you are changing the partition key value are always more problematic
though.

- stability/compatibility of pg_dump and restore operations,

no real issues here as long as your on recent enough versions to do wildcard
table matching for individual tables.

- how many partitions would be reasonable for read _and_ write access
optimal speed;

again, this depends on how exactly your working on the data. For example, we
have tables with over a thousand partitions on them; in those scenarios all
data is written into a single partition (with a new partition created daily),
and the qeury patterns are really straightforward... last month gets a lot of
queries, lasat three months not so much, last year barely any, and beyond
that is pretty much just archive info. That said, we have other systems where
that wouldnt work at all (for example, a static number of partitions, all of
which are queried activly).

For some more info, I've given at least one presentation on the topic, which
seems to be missing from the omniti site, but I've uploaded it to
slideshare...
http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentation

HTH.

Robert Treat

xzilla@users.sourceforge.net

over 17 years ago

In reply to: Blazej (#3)

Re: about partitioning

Blazej wrote:

8><

For some more info, I've given at least one presentation on the topic, which
seems to be missing from the omniti site, but I've uploaded it to
slideshare...
http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentation

HTH.

8><

Very nice presentation. I have 2 additional questions:

(1) Are you testing INSERT/UPDATE performance when is only one rule
script for all partition tables?

I've tested a number of different scenarios; I guess I'd say as a
general rule, the more rules you have, likely the worse your performance
will get; but really I would use triggers anyway

(2) Are you using "SET constraint_exclusion = on;"?

yes, otherwise there isnt much point :-)

--
Robert Treat
http://www.omniti.com/
Database: Scalability: Consulting:

Joao Ferreira gmail

joao.miguel.c.ferreira@gmail.com

over 17 years ago

In reply to: Robert Treat (#2)

Re: about partitioning

Hi Robert and all,

I've been trying to follow the examples on the pg docs, section 5.9,
with your presentation as aditional guide.

I've bumped into the UPDATES. I'm getting this:

----------------------------------------------------------
CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
DO INSTEAD UPDATE INTO measurement_y2004m02 VALUES (NEW.city_id,
NEW.logdate, NEW.peaktemp, NEW.unitsales);
ERROR: syntax error at or near "INTO"
LINE 1: ...D logdate < DATE '2004-03-01' ) DO INSTEAD UPDATE INTO
measu...
-------------------------------------------------

it doesn't like the INTO part.

can you help me?

thanks
joao

Show quoted text

On Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:

On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:

Kevin Kempter

kevin@kevinkempterllc.com

over 17 years ago

In reply to: Joao Ferreira gmail (#6)

Re: about partitioning

I may be wrong and I didn't double check the docs but I think you need
this (leave the INTO keyword off):

CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
DO INSTEAD UPDATE measurement_y2004m02 VALUES (NEW.city_id,
NEW.logdate, NEW.peaktemp, NEW.unitsales)

On Sep 15, 2008, at 4:48 PM, Joao Ferreira gmail wrote:

Hi Robert and all,

I've been trying to follow the examples on the pg docs, section 5.9,
with your presentation as aditional guide.

I've bumped into the UPDATES. I'm getting this:

----------------------------------------------------------
CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
DO INSTEAD UPDATE INTO measurement_y2004m02 VALUES (NEW.city_id,
NEW.logdate, NEW.peaktemp, NEW.unitsales);
ERROR: syntax error at or near "INTO"
LINE 1: ...D logdate < DATE '2004-03-01' ) DO INSTEAD UPDATE INTO
measu...
-------------------------------------------------

it doesn't like the INTO part.

can you help me?

thanks
joao

On Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:

On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

David Fetter

david@fetter.org

over 17 years ago

In reply to: Joao Ferreira gmail (#6)

Re: about partitioning

On Mon, Sep 15, 2008 at 11:48:49PM +0100, Joao Ferreira gmail wrote:

Hi Robert and all,

I've been trying to follow the examples on the pg docs, section 5.9,
with your presentation as aditional guide.

I've bumped into the UPDATES. I'm getting this:

----------------------------------------------------------
CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
DO INSTEAD UPDATE INTO measurement_y2004m02 VALUES (NEW.city_id,
NEW.logdate, NEW.peaktemp, NEW.unitsales);
ERROR: syntax error at or near "INTO"
LINE 1: ...D logdate < DATE '2004-03-01' ) DO INSTEAD UPDATE INTO
measu...
-------------------------------------------------

it doesn't like the INTO part.

can you help me?

thanks
joao

Joï¿½o,

Rules are not how you want to do this anyhow. Write triggers :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Simon Riggs

simon@2ndQuadrant.com

over 17 years ago

In reply to: Robert Treat (#5)

Re: about partitioning

On Mon, 2008-09-15 at 10:05 -0400, Robert Treat wrote:

Blazej wrote:

(1) Are you testing INSERT/UPDATE performance when is only one rule
script for all partition tables?

I've tested a number of different scenarios; I guess I'd say as a
general rule, the more rules you have, likely the worse your performance
will get; but really I would use triggers anyway

I would try to load data directly into a partition if possible. If this
is possible, you get to skip worrying about all that stuff completely
and performance is much better. You can't COPY into a table using RULEs,
and if you've got enough data to use partitioning then you will want to
use COPY to load data. Choosing a partition key that is optimal for
loading and queries will help you.

(2) Are you using "SET constraint_exclusion = on;"?

yes, otherwise there isnt much point :-)

I think Blazej means "in the postgresql.conf".

Probably yes, but not in all cases. If you have queries that access the
partitioned table and yet don't benefit from constraint exclusion, then
you will be adding time and CPU for no benefit. If you see it CPU bound
then its worth looking at in more detail.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#10

Joao Ferreira gmail

joao.miguel.c.ferreira@gmail.com

over 17 years ago

In reply to: David Fetter (#8)

Re: about partitioning

On Mon, 2008-09-15 at 17:30 -0700, David Fetter wrote:

João,

Rules are not how you want to do this anyhow. Write triggers :)

yes. you're right. I was just trying to follow the tutorial to get a
better grasp of the whole scenario.

Triggers seem to be a better approach, in my case.

João

PS: thanks for the tilde ~ :) I keep forgetting

Show quoted text

Cheers,
David.

#11

Joao Ferreira gmail

joao.miguel.c.ferreira@gmail.com

over 17 years ago

In reply to: Robert Treat (#2)

Re: about partitioning

Hello all,

still with partitioning...

wheter I use rules or triggers is there a way for me _not_ to specify
field-by-field all the fields I wish to be redirected to the
child-table...

as example:

instead of this:
---------------------------------------------------------
create rule insert_t_1 AS ON INSERT TO teste WHERE (t>=1000 and t<2000)
DO INSTEAD INSERT INTO t_1 VALUES (NEW.t, NEW.s1, NEW.s2 NEW.s3, NEW.s4
NEW.s5, NEW.s6, NEW.s7, NEW.s8);
---------------------------------------------------------

something like this:
---------------------------------------------------------
create rule insert_t_1 AS ON INSERT TO teste WHERE (t>=1000 and t<2000)
DO INSTEAD INSERT INTO t_1 VALUES (__ALL__);
---------------------------------------------------------

of course this assumes that the child table inherits all fields from the
parent table _and_ has no extra fields which is exactly my case.

any hints.

thx
j

Show quoted text

On Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:

On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:

Hello all,

my application is coming to a point on which 'partitioning' seems to be
the solution for many problems:

- query speed up
- data elimination speed up

I'dd like to get the feeling of it by talking to people who use
partitioning, in general..

- good, bad,

good :-)

- hard to manage, easy to manage,

I think the upfront costs for managing a partitioning setup are higher with
postgres than other systems, but there is nothing that you shouldn't be able
to automate in a cron script (at which point management becomes easy), plus
postgres gives you some interesting flexibility that is harder to find in
other setups.

- processing over-head during INSERT/UPDATE,

you can setup inserts to have relativly little overhead, but it requires more
management/maintence work up front. Updates within a partition also have
relativly little extra overhead, especially if you put in a little
application logic to figure out how to work on a partition directly. Updates
where you are changing the partition key value are always more problematic
though.

- stability/compatibility of pg_dump and restore operations,

no real issues here as long as your on recent enough versions to do wildcard
table matching for individual tables.

- how many partitions would be reasonable for read _and_ write access
optimal speed;

again, this depends on how exactly your working on the data. For example, we
have tables with over a thousand partitions on them; in those scenarios all
data is written into a single partition (with a new partition created daily),
and the qeury patterns are really straightforward... last month gets a lot of
queries, lasat three months not so much, last year barely any, and beyond
that is pretty much just archive info. That said, we have other systems where
that wouldnt work at all (for example, a static number of partitions, all of
which are queried activly).

For some more info, I've given at least one presentation on the topic, which
seems to be missing from the omniti site, but I've uploaded it to
slideshare...
http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentation

HTH.