partitioned table query question

Started by Mason Haleover 18 years ago26 messageshackersgeneral

masonhale@gmail.com

over 18 years ago

hackersgeneral

I'm implementing table partitioning on 8.2.5 -- I've got the tables set up
to partition based on the % 10 value of a key.
My problem is that I can't get the planner to take advantage of the
partitioning without also adding a key % 10 to the where clause.
Is there any way around that?

My child table definitions are:

CREATE TABLE topic_version_page_0 (
CHECK (topic_version_id % 10 = 0::integer )
) inherits (topic_version_page);

...

CREATE TABLE topic_version_page_9 (
CHECK (topic_version_id % 10 = 9::integer )
) inherits (topic_version_page);

I've also created indexes and constraints for each child table, and an
insert trigger on the master table (topic_version_page).

If I include a 'topic_version_id % 10 = [some value]' in my query, then the
partitioning shows up in the query plan:

test=> explain select * from topic_version_page where topic_version_id % 10
= (102 % 10) and topic_version_id = 102;
QUERY PLAN

But if I don't explicitly include a 'topic_version_id % 10' -- the plan
gets much worse, checking every table (see below).

test=> explain select * from topic_version_page where topic_version_id =
102;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Result (cost=4.27..105.68 rows=22 width=194)
-> Append (cost=4.27..105.68 rows=22 width=194)
-> Bitmap Heap Scan on topic_version_page (cost=4.27..9.61 rows=2
width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_on_topic_version_id_and_created_at (cost=
0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_0 topic_version_page
(cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_0_on_topic_version_id_and_page_id
(cost=0.00..4.27rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_1 topic_version_page
(cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_1_on_topic_version_id_and_page_id
(cost=0.00..4.27rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_2 topic_version_page
(cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_2_on_topic_version_id_and_page_id
(cost=0.00..4.27rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_3 topic_version_page
(cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_3_on_topic_version_id_and_page_id
(cost=0.00..4.27rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_4 topic_version_page
(cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_4_on_topic_version_id_and_page_id
(cost=0.00..4.27rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_5 topic_version_page
(cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_5_on_topic_version_id_and_page_id
(cost=0.00..4.27rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_6 topic_version_page
(cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_6_on_topic_version_id_and_page_id
(cost=0.00..4.27rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_7 topic_version_page
(cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_7_on_topic_version_id_and_page_id
(cost=0.00..4.27rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_8 topic_version_page
(cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_8_on_topic_version_id_and_page_id
(cost=0.00..4.27rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_9 topic_version_page
(cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_9_on_topic_version_id_and_page_id
(cost=0.00..4.27rows=2 width=0)
Index Cond: (topic_version_id = 102)
(46 rows)

Is there anyway to get the benefit of partitioning without adding a
additional 'topic_version_id % 10' condition to every query that touches
this table?

Thanks in advance.

Mason

Erik Jones

erik@myemma.com

over 18 years ago

In reply to: Mason Hale (#1)

hackersgeneral

Re: partitioned table query question

On Dec 7, 2007, at 10:51 PM, Mason Hale wrote:

I'm implementing table partitioning on 8.2.5 -- I've got the tables
set up to partition based on the % 10 value of a key.

My problem is that I can't get the planner to take advantage of the
partitioning without also adding a key % 10 to the where clause.
Is there any way around that?

My child table definitions are:

CREATE TABLE topic_version_page_0 (
CHECK (topic_version_id % 10 = 0::integer )
) inherits (topic_version_page);

...

CREATE TABLE topic_version_page_9 (
CHECK (topic_version_id % 10 = 9::integer )
) inherits (topic_version_page);

I've also created indexes and constraints for each child table, and
an insert trigger on the master table (topic_version_page).

If I include a 'topic_version_id % 10 = [some value]' in my query,
then the partitioning shows up in the query plan:

test=> explain select * from topic_version_page where
topic_version_id % 10 = (102 % 10) and topic_version_id = 102;
QUERY
PLAN
----------------------------------------------------------------------
----------------------------------------------------------------
Result (cost=4.27..19.23 rows=2 width=194)
-> Append (cost= 4.27..19.23 rows=2 width=194)
-> Bitmap Heap Scan on topic_version_page
(cost=4.27..9.62 rows=1 width=194)
Recheck Cond: (topic_version_id = 102)
Filter: ((topic_version_id % 10) = 2)
-> Bitmap Index Scan on
index_topic_version_page_on_topic_version_id_and_created_at
(cost=0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_2
topic_version_page (cost=4.27..9.62 rows=1 width=194)
Recheck Cond: (topic_version_id = 102)
Filter: ((topic_version_id % 10) = 2)
-> Bitmap Index Scan on
index_topic_version_page_2_on_topic_version_id_and_page_id
(cost=0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
(12 rows)

But if I don't explicitly include a 'topic_version_id % 10' -- the
plan gets much worse, checking every table (see below).

test=> explain select * from topic_version_page where
topic_version_id = 102;
QUERY
PLAN
----------------------------------------------------------------------
----------------------------------------------------------------
Result (cost=4.27..105.68 rows=22 width=194)
-> Append (cost= 4.27..105.68 rows=22 width=194)
-> Bitmap Heap Scan on topic_version_page
(cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_on_topic_version_id_and_created_at (cost=
0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_0
topic_version_page (cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_0_on_topic_version_id_and_page_id
(cost=0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_1
topic_version_page (cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_1_on_topic_version_id_and_page_id (cost=
0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_2
topic_version_page (cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_2_on_topic_version_id_and_page_id
(cost=0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_3
topic_version_page (cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_3_on_topic_version_id_and_page_id (cost=
0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_4
topic_version_page (cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_4_on_topic_version_id_and_page_id
(cost=0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_5
topic_version_page (cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_5_on_topic_version_id_and_page_id (cost=
0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_6
topic_version_page (cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_6_on_topic_version_id_and_page_id
(cost=0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_7
topic_version_page (cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_7_on_topic_version_id_and_page_id (cost=
0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_8
topic_version_page (cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_8_on_topic_version_id_and_page_id
(cost=0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
-> Bitmap Heap Scan on topic_version_page_9
topic_version_page (cost=4.27..9.61 rows=2 width=194)
Recheck Cond: (topic_version_id = 102)
-> Bitmap Index Scan on
index_topic_version_page_9_on_topic_version_id_and_page_id (cost=
0.00..4.27 rows=2 width=0)
Index Cond: (topic_version_id = 102)
(46 rows)

Is there anyway to get the benefit of partitioning without adding a
additional 'topic_version_id % 10' condition to every query that
touches this table?

Thanks in advance.

You beat me to the punch on this one. I was wanting to use modulo
operations for bin style partitioning as well, but this makes things
pretty awkward as well as unintuitive. So, to the postgres gurus:
What are the limitations of check constraints when used with
constraint exclusion? Is this really the intended behavior?

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

Vick Khera

vivek@khera.org

over 18 years ago

In reply to: Erik Jones (#2)

hackersgeneral

Re: partitioned table query question

On Dec 10, 2007, at 1:21 PM, Erik Jones wrote:

You beat me to the punch on this one. I was wanting to use modulo
operations for bin style partitioning as well, but this makes things
pretty awkward as well as unintuitive. So, to the postgres gurus:
What are the limitations of check constraints when used with
constraint exclusion? Is this really the intended behavior?

/me too!

I have vague recollection of reading that the constraints on the child
tables needed to be free of computation (ie, just straight comparison
ranges) but I can't find that reference now.

But in my case, I can almost always pick the appropriate sub-table
from the application level anyway.

Nathan Wilhelmi

wilhelmi@ucar.edu

over 18 years ago

In reply to: Vick Khera (#3)

hackersgeneral

Script to reset all sequence values in the a given DB?

Hello - Does anyone happen to have a SQL script or function that can
reset all the sequence values found in a given DB? When we rebuild the
DB it would be handy to be able to set all the sequence back to a known
starting place.

Thanks!

-Nate

Pavel Stehule

pavel.stehule@gmail.com

over 18 years ago

In reply to: Nathan Wilhelmi (#4)

hackersgeneral

Re: Script to reset all sequence values in the a given DB?

On 10/12/2007, Nathan Wilhelmi <wilhelmi@ucar.edu> wrote:

Hello - Does anyone happen to have a SQL script or function that can
reset all the sequence values found in a given DB? When we rebuild the
DB it would be handy to be able to set all the sequence back to a known
starting place.

create or replace function resetall()
returns void as $$
declare
v varchar;
m integer;
begin
for v in
select n.nspname || '.' || c.relname
from pg_catalog.pg_class c
left join
pg_catalog.pg_namespace n
on n.oid = c.relnamespace
where c.relkind = 'S'
loop
execute 'select min_value from '||v into m;
setval(v, m, false);
end loop;
return;
end; $$ language plpgsql;

Regards

Pavel Stehule

Show quoted text

Thanks!

-Nate

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Obe, Regina

robe.dnd@cityofboston.gov

over 18 years ago

In reply to: Pavel Stehule (#5)

hackersgeneral

Re: Script to reset all sequence values in the a given DB?

Did you want to set to a specific known value or the min value of the
sequence. I think Pavel's sets to the min value of the sequence.

The below sets all the sequences to the same value

CREATE AGGREGATE sum ( BASETYPE = text,
SFUNC = textcat,
STYPE = text,
INITCOND = '' );

CREATE OR REPLACE FUNCTION cp_resetsequences(resetto integer)
RETURNS void AS
$BODY$
BEGIN
EXECUTE (SELECT SUM('ALTER SEQUENCE ' || sequence_schema || '.'
|| sequence_name || ' RESTART WITH ' || CAST(resetto As varchar(50)) ||
'; ' )
FROM information_schema.sequences);
END
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

--Note this will set all the sequences in the database to 150
SELECT cp_resetsequences(150);

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Pavel Stehule
Sent: Monday, December 10, 2007 4:33 PM
To: Nathan Wilhelmi
Cc: PGSQL Mailing List
Subject: Re: [GENERAL] Script to reset all sequence values in the a
given DB?

On 10/12/2007, Nathan Wilhelmi <wilhelmi@ucar.edu> wrote:

Hello - Does anyone happen to have a SQL script or function that can
reset all the sequence values found in a given DB? When we rebuild the
DB it would be handy to be able to set all the sequence back to a

known

starting place.

Regards

Pavel Stehule

Thanks!

-Nate

---------------------------(end of

broadcast)---------------------------

TIP 5: don't forget to increase your free space map settings

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.

Vivek Khera

khera@kcilink.com

over 18 years ago

In reply to: Nathan Wilhelmi (#4)

hackersgeneral

Re: Script to reset all sequence values in the a given DB?

please don't hijack old threads ("partitioned table query question" in
this case) and change the subject line to start your new question. it
messes up threaded mail readers.

thanks.

On Dec 10, 2007, at 3:00 PM, Nathan Wilhelmi wrote:

Show quoted text

Hello - Does anyone happen to have a SQL script or function that can
reset all the sequence values found in a given DB? When we rebuild
the DB it would be handy to be able to set all the sequence back to
a known starting place.

Thanks!

-Nate

---------------------------(end of
broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Erik Jones (#2)

hackersgeneral

Re: partitioned table query question

Erik Jones <erik@myemma.com> writes:

You beat me to the punch on this one. I was wanting to use modulo
operations for bin style partitioning as well, but this makes things
pretty awkward as well as unintuitive. So, to the postgres gurus:
What are the limitations of check constraints when used with
constraint exclusion? Is this really the intended behavior?

Don't hold your breath. predtest.c has some intelligence about
btree-indexable comparison operators, but none about modulo.

In the particular case here, the reason that
WHERE (foo % 10) = 3
is seen to be incompatible with a check constraint
(foo % 10) = 9
is that the "=" is btree indexable, so predtest knows something about
its semantics; and given that % is an immutable operator, the code is
able to see that these could only both be true if 3 = 9. This
deduction involves exactly zero %-specific knowledge. In particular
it doesn't require assuming that "a=b" implies "(a % c) = (b % c)",
which would involve much more knowledge about the specific operators
involved than is available to the planner. (The fact that an operator
is a btree equality member doesn't mean that it might not consider two
values to be equal that are distinct to some other operators of the
data type. See plus and minus zero in IEEE float arithmetic for one
handy example ... and that's not even considering nonstandard versions
of equality.)

regards, tom lane

Erik Jones

erik@myemma.com

over 18 years ago

In reply to: Tom Lane (#8)

hackersgeneral

Re: partitioned table query question

On Dec 10, 2007, at 4:29 PM, Tom Lane wrote:

Erik Jones <erik@myemma.com> writes:

You beat me to the punch on this one. I was wanting to use modulo
operations for bin style partitioning as well, but this makes things
pretty awkward as well as unintuitive. So, to the postgres gurus:
What are the limitations of check constraints when used with
constraint exclusion? Is this really the intended behavior?

Don't hold your breath. predtest.c has some intelligence about
btree-indexable comparison operators, but none about modulo.

In the particular case here, the reason that
WHERE (foo % 10) = 3
is seen to be incompatible with a check constraint
(foo % 10) = 9
is that the "=" is btree indexable, so predtest knows something about
its semantics; and given that % is an immutable operator, the code is
able to see that these could only both be true if 3 = 9.

I get that.

This
deduction involves exactly zero %-specific knowledge. In particular
it doesn't require assuming that "a=b" implies "(a % c) = (b % c)",
which would involve much more knowledge about the specific operators
involved than is available to the planner. (The fact that an operator
is a btree equality member doesn't mean that it might not consider two
values to be equal that are distinct to some other operators of the
data type. See plus and minus zero in IEEE float arithmetic for one
handy example ... and that's not even considering nonstandard versions
of equality.)

Forgive me if I'm nagging on this, I just want to understand this
better. Why does evaluating a CHECK constraint like 'CHECK some_id %
100 = 32' against WHERE clause like 'WHERE some_id=1132' need to know
anything about equality properites of %? Or, rather, why does it
stop there? Can't it just substitute the given value for some_id in
to the check expression, execute it and check the result value for
TRUE/FALSE?

On a related note, how would you recommend implementing some kind of
bin based (i.e. hash values, round robin, etc...) partitioning scheme
if this won't work? I've tried a number of different approaches with
functions in the check constraint but can't seem to get anything
going there either. I'm the third person this week (all in this
thread and another I had going) that's interested in this approach.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#10

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Erik Jones (#9)

hackersgeneral

Re: partitioned table query question

Erik Jones <erik@myemma.com> writes:

Forgive me if I'm nagging on this, I just want to understand this
better. Why does evaluating a CHECK constraint like 'CHECK some_id %
100 = 32' against WHERE clause like 'WHERE some_id=1132' need to know
anything about equality properites of %? Or, rather, why does it
stop there? Can't it just substitute the given value for some_id in
to the check expression, execute it and check the result value for
TRUE/FALSE?

What "given value"?

What you're missing is that the condition "a = b" does not mean that
"f(a) = f(b)" for every function f. It is possible to define
constraints on equality that would make that true, but such constraints
would be far stronger than what is required to make btree (or even hash)
indexes work.

In the example I gave, we are able to conclude that 3 is unequal to 9
not because of any a-priori knowledge, but because we apply the specific
operator to the specific constants and find out that it yields false.
Our knowledge of the consistency requirements that are imposed on btree
equality operators then allows us to determine that the two original
conditions can't be true at the same time.

This does *not* imply assuming that the two constants are really "the
same" in the sense that no other operator in the system could tell them
apart. This isn't mere academic hairsplitting: there actually are
standard equality operators in the system for which such a conclusion
would fail. I already mentioned float comparison, and numeric
comparison has similar behaviors --- for instance,

regression=# select '0.00'::numeric = '0.0'::numeric;
?column?
----------
t
(1 row)

regression=# select text('0.00'::numeric) = text('0.0'::numeric);
?column?
----------
f
(1 row)

regards, tom lane

#11

Erik Jones

erik@myemma.com

over 18 years ago

In reply to: Tom Lane (#10)

hackersgeneral

Re: partitioned table query question

On Dec 10, 2007, at 5:50 PM, Tom Lane wrote:

Erik Jones <erik@myemma.com> writes:

Forgive me if I'm nagging on this, I just want to understand this
better. Why does evaluating a CHECK constraint like 'CHECK some_id %
100 = 32' against WHERE clause like 'WHERE some_id=1132' need to know
anything about equality properites of %? Or, rather, why does it
stop there? Can't it just substitute the given value for some_id in
to the check expression, execute it and check the result value for
TRUE/FALSE?

What "given value"?

The where clause in the select query.

What you're missing is that the condition "a = b" does not mean that
"f(a) = f(b)" for every function f. It is possible to define
constraints on equality that would make that true, but such
constraints
would be far stronger than what is required to make btree (or even
hash)
indexes work.

In the table constraint CHECK (some_id % 100 = 32), isn't that just f
(a) = 32?

In the example I gave, we are able to conclude that 3 is unequal to 9
not because of any a-priori knowledge, but because we apply the
specific
operator to the specific constants and find out that it yields false.
Our knowledge of the consistency requirements that are imposed on
btree
equality operators then allows us to determine that the two original
conditions can't be true at the same time.

This does *not* imply assuming that the two constants are really "the
same" in the sense that no other operator in the system could tell
them
apart. This isn't mere academic hairsplitting: there actually are
standard equality operators in the system for which such a conclusion
would fail. I already mentioned float comparison, and numeric
comparison has similar behaviors --- for instance,

regression=# select '0.00'::numeric = '0.0'::numeric;
?column?
----------
t
(1 row)

regression=# select text('0.00'::numeric) = text('0.0'::numeric);
?column?
----------
f
(1 row)

I get your arguments wrt the known (to the planner) semantics of the
equality operator. I guess what I don't understand is that given the
query

SELECT COUNT(*)
FROM table
WHERE some_id=34;

on a table with the much discussed constraint (34 % 100) = 32 isn't
simply evaluated as a one-time filter whenever whatever constraint
exclusion code examines child partition tables' constraints.

Again, though, is there some better way to go about implementing some
kind of hash based partitioning in postgres besides this that would
be more natural wrt queries?

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#12

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Erik Jones (#11)

hackersgeneral

Re: partitioned table query question

Erik Jones <erik@myemma.com> writes:

I guess what I don't understand is that given the query

SELECT COUNT(*)
FROM table
WHERE some_id=34;

on a table with the much discussed constraint (34 % 100) = 32 isn't
simply evaluated as a one-time filter whenever whatever constraint
exclusion code examines child partition tables' constraints.

I'm not sure how else to explain it: the fact that the WHERE clause
asserts that some operator named "=" will succeed on some_id and 34
is not sufficient grounds to assume that "some_id % 100" and "34 % 100"
will give the same result. Knowing that the "=" operator is a btree
equality operator gives us latitude to make certain conclusions, but
not that one, because there is no way to know whether the semantics
of the particular btree operator class have anything to do with the
behavior of "%".

If you dig in the PG archives you will find some discussions of
inventing a "real equality" flag for operators, which would authorize
the planner to make such deductions for any immutable operator/function.
The idea hasn't gone anywhere, partly because it's not clear that it
would really help in very many common cases. The fact that we could
*not* set the flag on such common cases as float and numeric equality
is a bit discouraging in that connection.

regards, tom lane

#13

Trevor Talbot

quension@gmail.com

over 18 years ago

In reply to: Tom Lane (#12)

hackersgeneral

Re: partitioned table query question

On 12/10/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Erik Jones <erik@myemma.com> writes:

I guess what I don't understand is that given the query

SELECT COUNT(*)
FROM table
WHERE some_id=34;

on a table with the much discussed constraint (34 % 100) = 32 isn't
simply evaluated as a one-time filter whenever whatever constraint
exclusion code examines child partition tables' constraints.

I'm not sure how else to explain it: the fact that the WHERE clause
asserts that some operator named "=" will succeed on some_id and 34
is not sufficient grounds to assume that "some_id % 100" and "34 % 100"
will give the same result. Knowing that the "=" operator is a btree
equality operator gives us latitude to make certain conclusions, but
not that one, because there is no way to know whether the semantics
of the particular btree operator class have anything to do with the
behavior of "%".

Erik is questioning is why it has to assume anything. Why can't it
just execute the expression and find out? On a high level, the
partitioning system looks exactly like partial expression indexes.

#14

Trevor Talbot

quension@gmail.com

over 18 years ago

In reply to: Trevor Talbot (#13)

hackersgeneral

Re: partitioned table query question

On 12/10/07, Trevor Talbot <quension@gmail.com> wrote:

On 12/10/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Erik Jones <erik@myemma.com> writes:

I guess what I don't understand is that given the query

SELECT COUNT(*)
FROM table
WHERE some_id=34;

on a table with the much discussed constraint (34 % 100) = 32 isn't
simply evaluated as a one-time filter whenever whatever constraint
exclusion code examines child partition tables' constraints.

I'm not sure how else to explain it: the fact that the WHERE clause
asserts that some operator named "=" will succeed on some_id and 34
is not sufficient grounds to assume that "some_id % 100" and "34 % 100"
will give the same result. Knowing that the "=" operator is a btree
equality operator gives us latitude to make certain conclusions, but
not that one, because there is no way to know whether the semantics
of the particular btree operator class have anything to do with the
behavior of "%".

Erik is questioning is why it has to assume anything. Why can't it
just execute the expression and find out? On a high level, the
partitioning system looks exactly like partial expression indexes.

...Oops. I sit here for 10 minutes pondering it, and figure out the
comparison with expression indexes isn't really true 2 seconds after I
hit "send". Sigh.

#15

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Trevor Talbot (#13)

hackersgeneral

Re: partitioned table query question

"Trevor Talbot" <quension@gmail.com> writes:

Erik is questioning is why it has to assume anything. Why can't it
just execute the expression and find out?

Because the whole point of the problem is to *not* execute the
expression, but to assume that it must yield false, for every row
of a given partition. Without a solid logical basis for that
assumption, you're just building a house of cards.

The bottom line here is that we have built a partitioning facility
out of spare parts, ie, a very generalized contradiction-proving
section of the planner. It's been an interesting exercise, and
it's certainly resulted in a better contradiction-prover than
we would have had otherwise, but it's got obvious limitations both
in planning performance and in the sorts of partitioning rules we
can support. My feeling is that trying to push the current approach to
do bin or hash partitioning transparently is likely not reasonable.
Eventually we'll have to push an understanding of partitioning down to
some lower level of the system --- that is, if we think it's critical
enough to justify that much effort.

regards, tom lane

#16

Robert Treat

xzilla@users.sourceforge.net

over 18 years ago

In reply to: Erik Jones (#11)

hackersgeneral

Re: partitioned table query question

On Monday 10 December 2007 20:01, Erik Jones wrote:

Again, though, is there some better way to go about implementing some
kind of hash based partitioning in postgres besides this that would
be more natural wrt queries?

One way is to set a static bin id for each partition, then do a select with
where bin_id = mod(2112,3); Where 2112 equals the number your looking for,
and 3 would be the number of buckets.

--
Robert Treat
http://www.omniti.com/
We're Big. On PostgreSQL.

#17

Simon Riggs

simon@2ndQuadrant.com

over 18 years ago

In reply to: Tom Lane (#12)

hackersgeneral

Re: partitioned table query question

On Mon, 2007-12-10 at 23:18 -0500, Tom Lane wrote:

Erik Jones <erik@myemma.com> writes:

If you dig in the PG archives you will find some discussions of
inventing a "real equality" flag for operators, which would authorize
the planner to make such deductions for any immutable operator/function.
The idea hasn't gone anywhere, partly because it's not clear that it
would really help in very many common cases.

I think the only thing we can really say is that we haven't done it yet,
everything takes time and this is a tricky problem.

People are asking for this to work with an integer to allow hashing. We
already use that exact concept internally for hash aggregation and hash
joins, so I don't think we can argue it wouldn't help in common cases.

The fact that we could
*not* set the flag on such common cases as float and numeric equality
is a bit discouraging in that connection.

With respect, that argument is circular. You required the implementation
to be a marking of the operator in that way, which leads to the
restriction you note. It can work for float and numeric, just not in all
conceivable cases. Most datatypes have some kind of weirdness that
prevents us from specifying it for everything in an easy manner.

Can I ask you to please lead us out of this impasse? The main problem is
working out an acceptable way to mark operators/datatypes/whatever or
combinations or thereof as "able to be transformed". Once we have that,
working out the rest is complex but doable. And it will then allow
partitioning, partial indexes and functional indexes to have a wider
range of application.

We really only want this for
- hashing functions, various, mostly on integers, uuids, text?
- upper() and lower() on text
- substr() on text
- very few others

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#18

Bruce Momjian

bruce@momjian.us

over 18 years ago

In reply to: Tom Lane (#15)

hackersgeneral

Re: partitioned table query question

[moved from -general]

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

The bottom line here is that we have built a partitioning facility
out of spare parts, ie, a very generalized contradiction-proving
section of the planner.

That's true but I think the conclusions you reach are a stretch. The main
problem with our existing approach is that there's no way to make the kind of
deductions we want which depend on an holistic view of all the constraints. So
you can't sort the ranges and search them using a binary scan, or join
matching partitions from separate tables before appending them, or any number
of similar ideas.

But I don't see that handling hash or bin partitioning is really going to be
so hard with our current scheme. I think we need to have some form of
understanding of which functions preserve which btree ordering information in
order to make better use of expression indexes anyways.

Eventually we'll have to push an understanding of partitioning down to
some lower level of the system --- that is, if we think it's critical
enough to justify that much effort.

If by a lower level of the system you mean the storage manager or anything
like that then I definitely don't agree. If on the other hand you just mean a
simpler more regular structure than turing-complete constraints then I would
agree, but for the reasons above -- not for the problems with hash/bin
partitioning and equality semantics.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

#19

Mike Rylander

mrylander@gmail.com

over 18 years ago

In reply to: Erik Jones (#11)

hackersgeneral

Re: partitioned table query question

On Dec 10, 2007 8:01 PM, Erik Jones <erik@myemma.com> wrote:

[snip]

Again, though, is there some better way to go about implementing some
kind of hash based partitioning in postgres besides this that would
be more natural wrt queries?

Adding a column to hold the result of the %, perhaps updated by a
trigger so your app needn't change, and partitioning on that would be
the obvious way to get what you want today. If you have a byte or two
of slack space in the tuple (by alignment), just use a "char" or an
INT2. Assuming you don't affect fully aligned base tuple size, there
should be no table bloat, and no noticeable effect on speed. As far
as being more natural WRT queries, well, you'd add to your where
clause

bin = 34

instead of

some_id % 100 = 34

The former seems to me to be more natural from the narrow perspective
of the SELECT statement.

--miker

#20

Erik Jones

erik@myemma.com

over 18 years ago

In reply to: Mike Rylander (#19)

hackersgeneral

Re: partitioned table query question

On Dec 11, 2007, at 7:20 AM, Mike Rylander wrote:

On Dec 10, 2007 8:01 PM, Erik Jones <erik@myemma.com> wrote:

[snip]

Again, though, is there some better way to go about implementing some
kind of hash based partitioning in postgres besides this that would
be more natural wrt queries?

Adding a column to hold the result of the %, perhaps updated by a
trigger so your app needn't change, and partitioning on that would be
the obvious way to get what you want today. If you have a byte or two
of slack space in the tuple (by alignment), just use a "char" or an
INT2. Assuming you don't affect fully aligned base tuple size, there
should be no table bloat, and no noticeable effect on speed. As far
as being more natural WRT queries, well, you'd add to your where
clause

bin = 34

instead of

some_id % 100 = 34

The former seems to me to be more natural from the narrow perspective
of the SELECT statement.

Well, given that the bin is computed as a function of some_id, the
most natural way would be to not have to mention that bin in SELECT
statements at all. However, it does appear that either a.) including
the bin as a table attribute and in the where clause (either directly
or the computation) or b.) precomputing the bin and directly
accessing the child table will be the only options we have for now.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#21

Mason Hale

masonhale@gmail.com

over 18 years ago

In reply to: Erik Jones (#20)

hackersgeneral

#22

Bruce Momjian

bruce@momjian.us