about partitioning
Hello all,
my application is coming to a point on which 'partitioning' seems to be
the solution for many problems:
- query speed up
- data elimination speed up
I'dd like to get the feeling of it by talking to people who use
partitioning, in general..
- good, bad,
- hard to manage, easy to manage,
- processing over-head during INSERT/UPDATE,
- stability/compatibility of pg_dump and restore operations,
- how many partitions would be reasonable for read _and_ write access
optimal speed;
thx
joao
On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:
Hello all,
my application is coming to a point on which 'partitioning' seems to be
the solution for many problems:- query speed up
- data elimination speed upI'dd like to get the feeling of it by talking to people who use
partitioning, in general..- good, bad,
good :-)
- hard to manage, easy to manage,
I think the upfront costs for managing a partitioning setup are higher with
postgres than other systems, but there is nothing that you shouldn't be able
to automate in a cron script (at which point management becomes easy), plus
postgres gives you some interesting flexibility that is harder to find in
other setups.
- processing over-head during INSERT/UPDATE,
you can setup inserts to have relativly little overhead, but it requires more
management/maintence work up front. Updates within a partition also have
relativly little extra overhead, especially if you put in a little
application logic to figure out how to work on a partition directly. Updates
where you are changing the partition key value are always more problematic
though.
- stability/compatibility of pg_dump and restore operations,
no real issues here as long as your on recent enough versions to do wildcard
table matching for individual tables.
- how many partitions would be reasonable for read _and_ write access
optimal speed;
again, this depends on how exactly your working on the data. For example, we
have tables with over a thousand partitions on them; in those scenarios all
data is written into a single partition (with a new partition created daily),
and the qeury patterns are really straightforward... last month gets a lot of
queries, lasat three months not so much, last year barely any, and beyond
that is pretty much just archive info. That said, we have other systems where
that wouldnt work at all (for example, a static number of partitions, all of
which are queried activly).
For some more info, I've given at least one presentation on the topic, which
seems to be missing from the omniti site, but I've uploaded it to
slideshare...
http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentation
HTH.
--
Robert Treat
http://www.omniti.com
Database: Scalability: Consulting:
8><
For some more info, I've given at least one presentation on the topic, which
seems to be missing from the omniti site, but I've uploaded it to
slideshare...
http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentationHTH.
8><
Very nice presentation. I have 2 additional questions:
(1) Are you testing INSERT/UPDATE performance when is only one rule
script for all partition tables?
(2) Are you using "SET constraint_exclusion = on;"?
Regards,
Blazej
hei,
thanks everybody for this great feedback.
I'll sure look deeper into this.
Joao
Show quoted text
On Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:
On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:
Hello all,
my application is coming to a point on which 'partitioning' seems to be
the solution for many problems:- query speed up
- data elimination speed upI'dd like to get the feeling of it by talking to people who use
partitioning, in general..- good, bad,
good :-)
- hard to manage, easy to manage,
I think the upfront costs for managing a partitioning setup are higher with
postgres than other systems, but there is nothing that you shouldn't be able
to automate in a cron script (at which point management becomes easy), plus
postgres gives you some interesting flexibility that is harder to find in
other setups.- processing over-head during INSERT/UPDATE,
you can setup inserts to have relativly little overhead, but it requires more
management/maintence work up front. Updates within a partition also have
relativly little extra overhead, especially if you put in a little
application logic to figure out how to work on a partition directly. Updates
where you are changing the partition key value are always more problematic
though.- stability/compatibility of pg_dump and restore operations,
no real issues here as long as your on recent enough versions to do wildcard
table matching for individual tables.- how many partitions would be reasonable for read _and_ write access
optimal speed;again, this depends on how exactly your working on the data. For example, we
have tables with over a thousand partitions on them; in those scenarios all
data is written into a single partition (with a new partition created daily),
and the qeury patterns are really straightforward... last month gets a lot of
queries, lasat three months not so much, last year barely any, and beyond
that is pretty much just archive info. That said, we have other systems where
that wouldnt work at all (for example, a static number of partitions, all of
which are queried activly).For some more info, I've given at least one presentation on the topic, which
seems to be missing from the omniti site, but I've uploaded it to
slideshare...
http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentationHTH.
Blazej wrote:
8><
For some more info, I've given at least one presentation on the topic, which
seems to be missing from the omniti site, but I've uploaded it to
slideshare...
http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentationHTH.
8><
Very nice presentation. I have 2 additional questions:
(1) Are you testing INSERT/UPDATE performance when is only one rule
script for all partition tables?
I've tested a number of different scenarios; I guess I'd say as a
general rule, the more rules you have, likely the worse your performance
will get; but really I would use triggers anyway
(2) Are you using "SET constraint_exclusion = on;"?
yes, otherwise there isnt much point :-)
--
Robert Treat
http://www.omniti.com/
Database: Scalability: Consulting:
Hi Robert and all,
I've been trying to follow the examples on the pg docs, section 5.9,
with your presentation as aditional guide.
I've bumped into the UPDATES. I'm getting this:
----------------------------------------------------------
CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
DO INSTEAD UPDATE INTO measurement_y2004m02 VALUES (NEW.city_id,
NEW.logdate, NEW.peaktemp, NEW.unitsales);
ERROR: syntax error at or near "INTO"
LINE 1: ...D logdate < DATE '2004-03-01' ) DO INSTEAD UPDATE INTO
measu...
-------------------------------------------------
it doesn't like the INTO part.
can you help me?
thanks
joao
Show quoted text
On Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:
On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:
I may be wrong and I didn't double check the docs but I think you need
this (leave the INTO keyword off):
CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
DO INSTEAD UPDATE measurement_y2004m02 VALUES (NEW.city_id,
NEW.logdate, NEW.peaktemp, NEW.unitsales)
On Sep 15, 2008, at 4:48 PM, Joao Ferreira gmail wrote:
Hi Robert and all,
I've been trying to follow the examples on the pg docs, section 5.9,
with your presentation as aditional guide.I've bumped into the UPDATES. I'm getting this:
----------------------------------------------------------
CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
DO INSTEAD UPDATE INTO measurement_y2004m02 VALUES (NEW.city_id,
NEW.logdate, NEW.peaktemp, NEW.unitsales);
ERROR: syntax error at or near "INTO"
LINE 1: ...D logdate < DATE '2004-03-01' ) DO INSTEAD UPDATE INTO
measu...
-------------------------------------------------it doesn't like the INTO part.
can you help me?
thanks
joaoOn Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:
On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Mon, Sep 15, 2008 at 11:48:49PM +0100, Joao Ferreira gmail wrote:
Hi Robert and all,
I've been trying to follow the examples on the pg docs, section 5.9,
with your presentation as aditional guide.I've bumped into the UPDATES. I'm getting this:
----------------------------------------------------------
CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
DO INSTEAD UPDATE INTO measurement_y2004m02 VALUES (NEW.city_id,
NEW.logdate, NEW.peaktemp, NEW.unitsales);
ERROR: syntax error at or near "INTO"
LINE 1: ...D logdate < DATE '2004-03-01' ) DO INSTEAD UPDATE INTO
measu...
-------------------------------------------------it doesn't like the INTO part.
can you help me?
thanks
joao
Jo�o,
Rules are not how you want to do this anyhow. Write triggers :)
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, 2008-09-15 at 10:05 -0400, Robert Treat wrote:
Blazej wrote:
(1) Are you testing INSERT/UPDATE performance when is only one rule
script for all partition tables?I've tested a number of different scenarios; I guess I'd say as a
general rule, the more rules you have, likely the worse your performance
will get; but really I would use triggers anyway
I would try to load data directly into a partition if possible. If this
is possible, you get to skip worrying about all that stuff completely
and performance is much better. You can't COPY into a table using RULEs,
and if you've got enough data to use partitioning then you will want to
use COPY to load data. Choosing a partition key that is optimal for
loading and queries will help you.
(2) Are you using "SET constraint_exclusion = on;"?
yes, otherwise there isnt much point :-)
I think Blazej means "in the postgresql.conf".
Probably yes, but not in all cases. If you have queries that access the
partitioned table and yet don't benefit from constraint exclusion, then
you will be adding time and CPU for no benefit. If you see it CPU bound
then its worth looking at in more detail.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Mon, 2008-09-15 at 17:30 -0700, David Fetter wrote:
João,
Rules are not how you want to do this anyhow. Write triggers :)
yes. you're right. I was just trying to follow the tutorial to get a
better grasp of the whole scenario.
Triggers seem to be a better approach, in my case.
João
PS: thanks for the tilde ~ :) I keep forgetting
Show quoted text
Cheers,
David.
Hello all,
still with partitioning...
wheter I use rules or triggers is there a way for me _not_ to specify
field-by-field all the fields I wish to be redirected to the
child-table...
as example:
instead of this:
---------------------------------------------------------
create rule insert_t_1 AS ON INSERT TO teste WHERE (t>=1000 and t<2000)
DO INSTEAD INSERT INTO t_1 VALUES (NEW.t, NEW.s1, NEW.s2 NEW.s3, NEW.s4
NEW.s5, NEW.s6, NEW.s7, NEW.s8);
---------------------------------------------------------
something like this:
---------------------------------------------------------
create rule insert_t_1 AS ON INSERT TO teste WHERE (t>=1000 and t<2000)
DO INSTEAD INSERT INTO t_1 VALUES (__ALL__);
---------------------------------------------------------
of course this assumes that the child table inherits all fields from the
parent table _and_ has no extra fields which is exactly my case.
any hints.
thx
j
Show quoted text
On Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:
On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:
Hello all,
my application is coming to a point on which 'partitioning' seems to be
the solution for many problems:- query speed up
- data elimination speed upI'dd like to get the feeling of it by talking to people who use
partitioning, in general..- good, bad,
good :-)
- hard to manage, easy to manage,
I think the upfront costs for managing a partitioning setup are higher with
postgres than other systems, but there is nothing that you shouldn't be able
to automate in a cron script (at which point management becomes easy), plus
postgres gives you some interesting flexibility that is harder to find in
other setups.- processing over-head during INSERT/UPDATE,
you can setup inserts to have relativly little overhead, but it requires more
management/maintence work up front. Updates within a partition also have
relativly little extra overhead, especially if you put in a little
application logic to figure out how to work on a partition directly. Updates
where you are changing the partition key value are always more problematic
though.- stability/compatibility of pg_dump and restore operations,
no real issues here as long as your on recent enough versions to do wildcard
table matching for individual tables.- how many partitions would be reasonable for read _and_ write access
optimal speed;again, this depends on how exactly your working on the data. For example, we
have tables with over a thousand partitions on them; in those scenarios all
data is written into a single partition (with a new partition created daily),
and the qeury patterns are really straightforward... last month gets a lot of
queries, lasat three months not so much, last year barely any, and beyond
that is pretty much just archive info. That said, we have other systems where
that wouldnt work at all (for example, a static number of partitions, all of
which are queried activly).For some more info, I've given at least one presentation on the topic, which
seems to be missing from the omniti site, but I've uploaded it to
slideshare...
http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentationHTH.