Hash partitioning.
Hi,
Do we have any plans to implement Hash Partitioning, maybe I missing
this feature?
Sincerely yours,
Yuri Levinsky, DBA
Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel
Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222
Attachments:
image002.jpgimage/jpeg; name=image002.jpgDownload
On Tue, Jun 25, 2013 at 03:48:19PM +0300, Yuri Levinsky wrote:
Hi,
Do we have any plans to implement Hash Partitioning, maybe I missing this
feature?
You can do it by writing your own constraint and trigger functions that
control the hashing.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Bruce,
Many thanks. According to PostgreSQL documentation it's only range and
list partitions are supported. My question is: when I am following your
advice, is PostgreSQL will do partitioning pruning on select? My
expectation is:
I divided my table on 128 hash partitions according let's say user_id.
When I do select * from users where user_id=? , I am expecting the
engine select from some particular partition according to my function.
The issue is critical when you working with big tables, that you can't
normally partition by range/list. The feature allow parallel select from
such table: each thread might select from his own dedicated partition.
The feature also (mainly) allow to decrease index b-tree level on
partition key column by dividing index into smaller parts.
Sincerely yours,
Yuri Levinsky, DBA
Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel
Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222
-----Original Message-----
From: Bruce Momjian [mailto:bruce@momjian.us]
Sent: Tuesday, June 25, 2013 4:21 PM
To: Yuri Levinsky
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Hash partitioning.
On Tue, Jun 25, 2013 at 03:48:19PM +0300, Yuri Levinsky wrote:
Hi,
Do we have any plans to implement Hash Partitioning, maybe I missing
this feature?
You can do it by writing your own constraint and trigger functions that
control the hashing.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
This mail was received via Mail-SeCure System.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 05:19:47PM +0300, Yuri Levinsky wrote:
Bruce,
Many thanks. According to PostgreSQL documentation it's only range and
list partitions are supported. My question is: when I am following your
advice, is PostgreSQL will do partitioning pruning on select? My
expectation is:
I divided my table on 128 hash partitions according let's say user_id.
When I do select * from users where user_id=? , I am expecting the
engine select from some particular partition according to my function.
The issue is critical when you working with big tables, that you can't
normally partition by range/list. The feature allow parallel select from
such table: each thread might select from his own dedicated partition.
The feature also (mainly) allow to decrease index b-tree level on
partition key column by dividing index into smaller parts.
Uh, where do you see that we only support range and list? You aren't
using an EnterpriseDB closed-source product, are you?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 9:21 AM, Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Jun 25, 2013 at 03:48:19PM +0300, Yuri Levinsky wrote:
Hi,
Do we have any plans to implement Hash Partitioning, maybe I missing this
feature?You can do it by writing your own constraint and trigger functions that
control the hashing.
Not really. Constraint exclusion won't kick in for a constraint like
CHECK (hashme(a) % 16 == 3) and a WHERE clause of the form a = 42.
Of course, since partitioning generally doesn't improve performance in
PostgreSQL anyway, it's not clear why you'd want to do this in the
first place. But the fact that constraint exclusion won't work if you
do is kind of a knockout blow.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 11:02:40AM -0400, Robert Haas wrote:
On Tue, Jun 25, 2013 at 9:21 AM, Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Jun 25, 2013 at 03:48:19PM +0300, Yuri Levinsky wrote:
Hi,
Do we have any plans to implement Hash Partitioning, maybe I missing this
feature?You can do it by writing your own constraint and trigger functions that
control the hashing.Not really. Constraint exclusion won't kick in for a constraint like
CHECK (hashme(a) % 16 == 3) and a WHERE clause of the form a = 42.
Uh, I thought we checked the constant against every CHECK constraint and
only scanned partitions that matched. Why does this not work?
Of course, since partitioning generally doesn't improve performance in
PostgreSQL anyway, it's not clear why you'd want to do this in the
I think partitioning does improve performance by reducing index depth.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 11:06 AM, Bruce Momjian <bruce@momjian.us> wrote:
Not really. Constraint exclusion won't kick in for a constraint like
CHECK (hashme(a) % 16 == 3) and a WHERE clause of the form a = 42.Uh, I thought we checked the constant against every CHECK constraint and
only scanned partitions that matched. Why does this not work?
That's a pretty fuzzy description of what we do. For this to work,
we'd have to be able to use the predicate a = 42 to prove that
hashme(a) % 16 = 3 is false. But we can't actually substitute 42 in
for a and then evaluate hashme(42) % 16 = 3, because we don't know
that the a = 42 in the WHERE clause means exact equality for all
purposes, only that it means "has the numerically same value". For
integers, equality under = is sufficient to prove equivalence.
But for numeric values, for example, it is not. The values
'42'::numeric and '42.0'::numeric are equal according to =(numeric,
numeric), but they are not the same. If the hashme() function did
something like length($1::text), it would get different answers for
those two values. IOW, the theorem prover has no way of knowing that
the hash function provided has semantics that are compatible with the
opclass of the operator used in the query.
Of course, since partitioning generally doesn't improve performance in
PostgreSQL anyway, it's not clear why you'd want to do this in theI think partitioning does improve performance by reducing index depth.
Generally, I think traversing an extra level of the index is cheaper
than opening extra relations and going through the theorem-prover
machinery. There are benefits to partitioning, but they have to do
with management - e.g. each partition can be vacuumed independently;
old partitions can be dropped more efficiently than you can
bulk-delete rows spread throughout a table - rather than performance.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 11:15:24AM -0400, Robert Haas wrote:
On Tue, Jun 25, 2013 at 11:06 AM, Bruce Momjian <bruce@momjian.us> wrote:
Not really. Constraint exclusion won't kick in for a constraint like
CHECK (hashme(a) % 16 == 3) and a WHERE clause of the form a = 42.Uh, I thought we checked the constant against every CHECK constraint and
only scanned partitions that matched. Why does this not work?That's a pretty fuzzy description of what we do. For this to work,
we'd have to be able to use the predicate a = 42 to prove that
hashme(a) % 16 = 3 is false. But we can't actually substitute 42 in
for a and then evaluate hashme(42) % 16 = 3, because we don't know
that the a = 42 in the WHERE clause means exact equality for all
purposes, only that it means "has the numerically same value". For
integers, equality under = is sufficient to prove equivalence.But for numeric values, for example, it is not. The values
'42'::numeric and '42.0'::numeric are equal according to =(numeric,
numeric), but they are not the same. If the hashme() function did
something like length($1::text), it would get different answers for
those two values. IOW, the theorem prover has no way of knowing that
the hash function provided has semantics that are compatible with the
opclass of the operator used in the query.
I looked at predtest.c but I can't see how we accept >= and <= ranges,
but not CHECK (a % 16 == 3). It is the '%' operator? I am not sure why
the hashme() function is there. Wouldn't it work if hashme() was an
immutable function?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 11:45 AM, Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Jun 25, 2013 at 11:15:24AM -0400, Robert Haas wrote:
On Tue, Jun 25, 2013 at 11:06 AM, Bruce Momjian <bruce@momjian.us> wrote:
Not really. Constraint exclusion won't kick in for a constraint like
CHECK (hashme(a) % 16 == 3) and a WHERE clause of the form a = 42.Uh, I thought we checked the constant against every CHECK constraint and
only scanned partitions that matched. Why does this not work?That's a pretty fuzzy description of what we do. For this to work,
we'd have to be able to use the predicate a = 42 to prove that
hashme(a) % 16 = 3 is false. But we can't actually substitute 42 in
for a and then evaluate hashme(42) % 16 = 3, because we don't know
that the a = 42 in the WHERE clause means exact equality for all
purposes, only that it means "has the numerically same value". For
integers, equality under = is sufficient to prove equivalence.But for numeric values, for example, it is not. The values
'42'::numeric and '42.0'::numeric are equal according to =(numeric,
numeric), but they are not the same. If the hashme() function did
something like length($1::text), it would get different answers for
those two values. IOW, the theorem prover has no way of knowing that
the hash function provided has semantics that are compatible with the
opclass of the operator used in the query.I looked at predtest.c but I can't see how we accept >= and <= ranges,
but not CHECK (a % 16 == 3). It is the '%' operator? I am not sure why
the hashme() function is there. Wouldn't it work if hashme() was an
immutable function?
Let me back up a minute. You told the OP that he could make hash
partitioning by writing his own constraint and trigger functions. I
think that won't work. But I'm happy to be proven wrong. Do you have
an example showing how to do it?
Here's why I think it WON'T work:
rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# create table foo0 (check ((a % 16) = 0)) inherits (foo);
CREATE TABLE
rhaas=# create table foo1 (check ((a % 16) = 1)) inherits (foo);
CREATE TABLE
rhaas=# create table foo2 (check ((a % 16) = 2)) inherits (foo);
CREATE TABLE
rhaas=# create table foo3 (check ((a % 16) = 3)) inherits (foo);
CREATE TABLE
rhaas=# explain select * from foo where a = 1;
QUERY PLAN
------------------------------------------------------------
Append (cost=0.00..101.50 rows=25 width=36)
-> Seq Scan on foo (cost=0.00..0.00 rows=1 width=36)
Filter: (a = 1)
-> Seq Scan on foo0 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
-> Seq Scan on foo1 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
-> Seq Scan on foo2 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
-> Seq Scan on foo3 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
(11 rows)
Notice we get a scan on every partition. Now let's try it with no
modulo arithmetic, just a straightforward one-partition-per-value:
rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# create table foo0 (check (a = 0)) inherits (foo);
CREATE TABLE
rhaas=# create table foo1 (check (a = 1)) inherits (foo);
CREATE TABLE
rhaas=# create table foo2 (check (a = 2)) inherits (foo);
CREATE TABLE
rhaas=# create table foo3 (check (a = 3)) inherits (foo);
CREATE TABLE
rhaas=# explain select * from foo where a = 1;
QUERY PLAN
------------------------------------------------------------
Append (cost=0.00..25.38 rows=7 width=36)
-> Seq Scan on foo (cost=0.00..0.00 rows=1 width=36)
Filter: (a = 1)
-> Seq Scan on foo1 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
(5 rows)
Voila, now constraint exclusion is working.
I confess that I'm not entirely clear about the details either, but
the above tests speak for themselves.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Bruce Momjian <bruce@momjian.us> writes:
I looked at predtest.c but I can't see how we accept >= and <= ranges,
but not CHECK (a % 16 == 3). It is the '%' operator? I am not sure why
the hashme() function is there. Wouldn't it work if hashme() was an
immutable function?
No. Robert's description is exactly correct: it's a question of whether
we can know that the semantics of function X have anything to do with
the behavior of operator Y. In the case of something like CHECK (X >= 16)
combined with WHERE X = 10, if the given = and >= operators belong to
the same btree opclass family then we can assume that their semantics
are compatible and then apply reasoning to show that these two clauses
can't both be true for the same value of X. We can *not* use "X = 10"
to reason about the behavior of anything that isn't in the = operator's
btree opclass, because we don't assume that "=" means "absolutely
identical for every purpose". And in fact it does not mean that for
several pretty common datatypes (float being another example besides
numeric).
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Guys,
I am sorry for taking your time. The reason for my question is:
As former Oracle DBA and now simple beginner PostgreSQL DBA I would like
to say: the current partitioning mechanism might be improved. Sorry, it
seems to me far behind yesterday requirements. As model for improvement
the Oracle might be taken as example. Unfortunately I am not writing an
C code and see my benefit to PostgreSQL community in only rising this
issue. I'll be very happy to be helpful in something else, but...
Sincerely yours,
Yuri Levinsky, DBA
Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel
Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222
-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Tuesday, June 25, 2013 6:55 PM
To: Bruce Momjian
Cc: Yuri Levinsky; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Hash partitioning.
On Tue, Jun 25, 2013 at 11:45 AM, Bruce Momjian <bruce@momjian.us>
wrote:
On Tue, Jun 25, 2013 at 11:15:24AM -0400, Robert Haas wrote:
On Tue, Jun 25, 2013 at 11:06 AM, Bruce Momjian <bruce@momjian.us>
wrote:
Not really. Constraint exclusion won't kick in for a constraint
like CHECK (hashme(a) % 16 == 3) and a WHERE clause of the form a
= 42.
Uh, I thought we checked the constant against every CHECK
constraint and only scanned partitions that matched. Why does this
not work?
That's a pretty fuzzy description of what we do. For this to work,
we'd have to be able to use the predicate a = 42 to prove that
hashme(a) % 16 = 3 is false. But we can't actually substitute 42 in
for a and then evaluate hashme(42) % 16 = 3, because we don't know
that the a = 42 in the WHERE clause means exact equality for all
purposes, only that it means "has the numerically same value". For
integers, equality under = is sufficient to prove equivalence.But for numeric values, for example, it is not. The values
'42'::numeric and '42.0'::numeric are equal according to =(numeric,
numeric), but they are not the same. If the hashme() function did
something like length($1::text), it would get different answers for
those two values. IOW, the theorem prover has no way of knowing that
the hash function provided has semantics that are compatible with the
opclass of the operator used in the query.
I looked at predtest.c but I can't see how we accept >= and <= ranges,
but not CHECK (a % 16 == 3). It is the '%' operator? I am not sure
why the hashme() function is there. Wouldn't it work if hashme() was
an immutable function?
Let me back up a minute. You told the OP that he could make hash
partitioning by writing his own constraint and trigger functions. I
think that won't work. But I'm happy to be proven wrong. Do you have
an example showing how to do it?
Here's why I think it WON'T work:
rhaas=# create table foo (a int, b text); CREATE TABLE rhaas=# create
table foo0 (check ((a % 16) = 0)) inherits (foo); CREATE TABLE rhaas=#
create table foo1 (check ((a % 16) = 1)) inherits (foo); CREATE TABLE
rhaas=# create table foo2 (check ((a % 16) = 2)) inherits (foo); CREATE
TABLE rhaas=# create table foo3 (check ((a % 16) = 3)) inherits (foo);
CREATE TABLE rhaas=# explain select * from foo where a = 1;
QUERY PLAN
------------------------------------------------------------
Append (cost=0.00..101.50 rows=25 width=36)
-> Seq Scan on foo (cost=0.00..0.00 rows=1 width=36)
Filter: (a = 1)
-> Seq Scan on foo0 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
-> Seq Scan on foo1 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
-> Seq Scan on foo2 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
-> Seq Scan on foo3 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
(11 rows)
Notice we get a scan on every partition. Now let's try it with no
modulo arithmetic, just a straightforward one-partition-per-value:
rhaas=# create table foo (a int, b text); CREATE TABLE rhaas=# create
table foo0 (check (a = 0)) inherits (foo); CREATE TABLE rhaas=# create
table foo1 (check (a = 1)) inherits (foo); CREATE TABLE rhaas=# create
table foo2 (check (a = 2)) inherits (foo); CREATE TABLE rhaas=# create
table foo3 (check (a = 3)) inherits (foo); CREATE TABLE rhaas=# explain
select * from foo where a = 1;
QUERY PLAN
------------------------------------------------------------
Append (cost=0.00..25.38 rows=7 width=36)
-> Seq Scan on foo (cost=0.00..0.00 rows=1 width=36)
Filter: (a = 1)
-> Seq Scan on foo1 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
(5 rows)
Voila, now constraint exclusion is working.
I confess that I'm not entirely clear about the details either, but the
above tests speak for themselves.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
Company
This mail was received via Mail-SeCure System.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 12:08:34PM -0400, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
I looked at predtest.c but I can't see how we accept >= and <= ranges,
but not CHECK (a % 16 == 3). It is the '%' operator? I am not sure why
the hashme() function is there. Wouldn't it work if hashme() was an
immutable function?No. Robert's description is exactly correct: it's a question of whether
we can know that the semantics of function X have anything to do with
the behavior of operator Y. In the case of something like CHECK (X >= 16)
combined with WHERE X = 10, if the given = and >= operators belong to
the same btree opclass family then we can assume that their semantics
are compatible and then apply reasoning to show that these two clauses
can't both be true for the same value of X. We can *not* use "X = 10"
to reason about the behavior of anything that isn't in the = operator's
btree opclass, because we don't assume that "=" means "absolutely
identical for every purpose". And in fact it does not mean that for
several pretty common datatypes (float being another example besides
numeric).
OK, so it is really the index comparisons that we are using; makes
sense.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Yuri Levinsky escribi�:
As former Oracle DBA and now simple beginner PostgreSQL DBA I would like
to say: the current partitioning mechanism might be improved. Sorry, it
seems to me far behind yesterday requirements.
I don't think you'll find anybody that disagrees with this.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 12:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Let me back up a minute. You told the OP that he could make hash
partitioning by writing his own constraint and trigger functions. I
think that won't work. But I'm happy to be proven wrong. Do you have
an example showing how to do it?Here's why I think it WON'T work:
rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# create table foo0 (check ((a % 16) = 0)) inherits (foo);
CREATE TABLE
rhaas=# create table foo1 (check ((a % 16) = 1)) inherits (foo);
CREATE TABLE
rhaas=# create table foo2 (check ((a % 16) = 2)) inherits (foo);
CREATE TABLE
rhaas=# create table foo3 (check ((a % 16) = 3)) inherits (foo);
CREATE TABLE
rhaas=# explain select * from foo where a = 1;
QUERY PLAN
------------------------------------------------------------
Append (cost=0.00..101.50 rows=25 width=36)
-> Seq Scan on foo (cost=0.00..0.00 rows=1 width=36)
Filter: (a = 1)
-> Seq Scan on foo0 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
-> Seq Scan on foo1 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
-> Seq Scan on foo2 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
-> Seq Scan on foo3 (cost=0.00..25.38 rows=6 width=36)
Filter: (a = 1)
(11 rows)Notice we get a scan on every partition. Now let's try it with no
modulo arithmetic, just a straightforward one-partition-per-value:rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# create table foo0 (check (a = 0)) inherits (foo);
CREATE TABLE
rhaas=# create table foo1 (check (a = 1)) inherits (foo);
CREATE TABLE
rhaas=# create table foo2 (check (a = 2)) inherits (foo);
CREATE TABLE
rhaas=# create table foo3 (check (a = 3)) inherits (foo);
CREATE TABLE
rhaas=# explain select * from foo where a = 1;
Did you try "select * from foo where (a % 16) = (1::int % 16)"?
A few views I have that span multiple "partitions" (in quotes since
they're not exactly partitions, but close), I can make constraint
exclusion work if I match the expression EXACTLY, including types
(I've posted a few questions about this to pg-performance).
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 12:08 PM, Yuri Levinsky <yuril@celltick.com> wrote:
Guys,
I am sorry for taking your time. The reason for my question is:
As former Oracle DBA and now simple beginner PostgreSQL DBA I would like
to say: the current partitioning mechanism might be improved. Sorry, it
seems to me far behind yesterday requirements. As model for improvement
the Oracle might be taken as example. Unfortunately I am not writing an
C code and see my benefit to PostgreSQL community in only rising this
issue. I'll be very happy to be helpful in something else, but...
Please don't flee over this...
As I think you can see, now, the partitioning problem is tougher than it
may at first seem to be. It's quite useful to quickly get to the point of
understanding that.
There would indeed be merit in improving the partitioning apparatus,
and actually, I think it's been a couple of years since there has been
serious discussion of this.
The discussion tends to head into the rabbit hole of disputing about
whether one mechanism or another is ideal. That's the wrong starting
point - we shouldn't start with what's easiest to make "ideal," we
should start by determining what is required/desirable, without too
much reference, at least initially, on to how to achieve it.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
Christopher Browne <cbbrowne@gmail.com> writes:
There would indeed be merit in improving the partitioning apparatus,
and actually, I think it's been a couple of years since there has been
serious discussion of this.
We could certainly use a partitioning mechanism that's easier to use
than what we have now, which is basically "build it yourself, here's
the parts bin". There would also be some performance benefits from
moving the partitioning logic into hard-wired code.
However, I find it hard to think that hash partitioning as such is very
high on the to-do list. As was pointed out upthread, the main practical
advantage of partitioning is *not* performance of routine queries, but
improved bulk-data management such as the ability to do periodic
housecleaning by dropping a partition. If your partitioning is on a
hash, you've thrown away any such advantage, because there's no
real-world meaning to the way the data's been split up. So I find range
and list partitioning way more plausible.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
However, I find it hard to think that hash partitioning as such is very
high on the to-do list. As was pointed out upthread, the main practical
advantage of partitioning is *not* performance of routine queries, but
improved bulk-data management such as the ability to do periodic
housecleaning by dropping a partition. If your partitioning is on a
hash, you've thrown away any such advantage, because there's no
real-world meaning to the way the data's been split up. So I find range
and list partitioning way more plausible.
It would be nice if range partitioning based on some user-defined
function was completely automatic, as in:
* You define a function that returns a partition name for a given input.
* You define a table to somehow be auto-partitioned on
your_function(some_column)
* The planner knows now it's some_column applied to your_function, so
it can do constraint exclusion checks (your_function would probably
need to be stable at least)
* If a returned partition is missing... what? (auto-create? that'd be nice)
It's pretty much what we have already, albeit easier to use. And,
perhaps constraint exclusion logic could be specialized for this case,
and made more robust.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Claudio Freire <klaussfreire@gmail.com> wrote:
Did you try "select * from foo where (a % 16) = (1::int % 16)"?
I did. Using Robert's hashed partitioning table definitions:
test=# explain select * from foo where a = 1 and (a % 16) = (1 % 16);
QUERY PLAN
------------------------------------------------------------
Append (cost=0.00..31.53 rows=2 width=36)
-> Seq Scan on foo (cost=0.00..0.00 rows=1 width=36)
Filter: ((a = 1) AND ((a % 16) = 1))
-> Seq Scan on foo1 (cost=0.00..31.53 rows=1 width=36)
Filter: ((a = 1) AND ((a % 16) = 1))
(5 rows)
So if you are generating your queries through something capable of
generating that last clause off of the first, this could work. Not
all applications need to remain as flexible about the operators as
we want the database engine itself to be.
I agree though, that having an index implementation that can do the
first level split faster than any partitioning mechanism can do is
better, and that the main benefits of partitioning are in
administration, *not* searching. At least until we have parallel
query execution. At *that* point this all changes.
One other thing worth noting is that I have several times seen
cases where the planner cannot exclude partitions, but at execution
time it finds that it doesn't need to execute all of the plan
nodes. I think it makes sense to not work quite so hard to
eliminate partitions at plan time if we can skip the unneeded ones
at run time, once we have more data values resolved.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 6:52 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
I agree though, that having an index implementation that can do the
first level split faster than any partitioning mechanism can do is
better, and that the main benefits of partitioning are in
administration, *not* searching.
Indeed, but the proposal for hash partitions isn't fundamentally
different from range partitions. It's "easy-to-use partitions over
user-defined functions", hash or not.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Tom,
I clearly understand your point. I actually came from corporate market
such as Verizon, Barclays... I remember very good that PostgreSQL is
open source, but let's forget it for a moment. The key issue for
corporate market always been a partitioning(vertical and lately
horizontal). Because of that Oracle has too many types and combinations
of partitions, the other vendors as well. Easy partitions maintenance
(automatic, simple syntax) is very important for everybody who lives in
corporate RDBMS world and not only use "DB's for free" in order to
create some virtual shop. The main purpose of partitioning in my world
is to store billions of rows and be able to search by date, hour or even
minute as fast as possible. When you dealing with company, which has
~350.000.000 users, and you don't want to use key/value data stores: you
need hash partitioned tables and hash partitioned table clusters to
perform fast search and 4-6 tables join based on user phone number for
example. I believe to increase PostgreSQL popularity in corporate
world, to make real money from support, the next features might be:
better vertical and later horizontal partitioning, columnar-oriented
tables, DB freeze for NetApp/EMC snapshots and similar.
Sincerely yours,
Yuri Levinsky, DBA
Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel
Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, June 25, 2013 10:33 PM
To: Christopher Browne
Cc: Yuri Levinsky; Robert Haas; Bruce Momjian; PostgreSQL Mailing Lists
Subject: Re: [HACKERS] Hash partitioning.
Christopher Browne <cbbrowne@gmail.com> writes:
There would indeed be merit in improving the partitioning apparatus,
and actually, I think it's been a couple of years since there has been
serious discussion of this.
We could certainly use a partitioning mechanism that's easier to use
than what we have now, which is basically "build it yourself, here's the
parts bin". There would also be some performance benefits from moving
the partitioning logic into hard-wired code.
However, I find it hard to think that hash partitioning as such is very
high on the to-do list. As was pointed out upthread, the main practical
advantage of partitioning is *not* performance of routine queries, but
improved bulk-data management such as the ability to do periodic
housecleaning by dropping a partition. If your partitioning is on a
hash, you've thrown away any such advantage, because there's no
real-world meaning to the way the data's been split up. So I find range
and list partitioning way more plausible.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers