On partitioning

Started by Alvaro Herreraover 11 years ago178 messageshackers
Jump to latest
#1Alvaro Herrera
alvherre@2ndquadrant.com

Prompted by a comment in the UPDATE/LIMIT thread, I saw Marko Tiikkaja
reference Tom's post
/messages/by-id/1598.1399826841@sss.pgh.pa.us
which mentions the possibility of a different partitioning
implementation than what we have so far. As it turns out, I've been
thinking about partitioning recently, so I thought I would share what
I'm thinking so that others can poke holes. My intention is to try to
implement this as soon as possible.

Declarative partitioning
========================

In this design, partitions are first-class objects, not normal tables in
inheritance hierarchies. There are no pg_inherits entries involved at all.

Partitions are a physical implementation detail. Therefore we do not allow
the owner to be changed, or permissions to be granted directly to partitions;
all these operations happen to the parent relation instead.

System Catalogs
---------------

In pg_class we have two additional relkind values:

* relkind RELKIND_PARTITIONED_REL 'P' indicates a partitioned relation.
It is used to indicate a parent table, i.e. one the user can directly
address in DML queries. Such relations DO NOT have their own storage.
These use the same rules as regular tables for access privileges,
ownership and so on.

* relkind RELKIND_PARTITION 'p' indicates a partition within a partitioned
relation (its parent). These cannot be addressed directly in DML
queries and only limited DDL support is provided. They don't have
their own pg_attribute entries either and therefore they are always
identical in column definitions to the parent relation. Since they
are not accessible directly, there is no need for ACL considerations;
the parent relation's owner is the owner, and grants are applied to
the parent relation only.
XXX --- is there a need for a partition having different column
default values than its parent relation?

Partitions are numbered sequentially, normally from 1 onwards; but it is
valid to have negative partition numbers and 0. Partitions don't have
names (except automatically generated ones for pg_class.relname, but
they are unusable in DDL).

Each partition is assigned an Expression that receives a tuple and
returns boolean. This expression returns true if a given tuple belongs
into it, false otherwise. If a tuple for a partitioned relation is run
through expressions of all partitions, exactly one should return true.
If none returns true, it might be because the partition has not been
created yet. A user-facing error is raised in this case (Rationale: if
user creates a partitioned rel and there is no partition that accepts
some given tuple, it's the user's fault.)

Additionally, each partitioned relation may have a master expression.
This receives a tuple and returns an integer, which corresponds to the
number of the partition it belongs into.

There are two new system catalogs:

pg_partitioned_rel --> (prelrelid, prelexpr) pg_partition -->
(partrelid, partseq, partexpr, partoverflow)

For partitioned rels that have prelexpr, we run that expression and
obtain the partition number; as a crosscheck we run partexpr and ensure
it returns true. For partitioned rels that don't have prelexpr, we run
partexpr for each partition in turn until one returns true. This means
that for a properly set up partitioned table, we need to run a single
expression on a tuple to find out what partition the tuple belongs into.

Per-partition expressions are formed as each partition is created, and
are based on the user-supplied partitioning criterion. Master
expressions are formed at relation creation time. (XXX Can we change
the master expression later, as a result of some ALTER command?
Presumably this would mean that all partitions might need to be
rewritten.)

Triggers --------

(These are user-defined triggers, not partitioning triggers. In fact
there are no partitioning triggers at all.)

Triggers are attached to the parent relation, not to the specific
partition. When a trigger function runs on a tuple inserted, updated or
modified on a partition, the data received by the trigger function makes
it appear that the tuple belongs to the parent relation. There is no
need to let the trigger know which partition the tuple went in or came
from. XXX is there a need to give it the partition number that the
tuple went it?

Syntax ------

CREATE TABLE xyz ( ... ) PARTITION BY RANGE ( a_expr ) This creates the
main table only: no partitions are created automatically.

We do not support other types of partitioning at this stage. We will
implement these later.

We do not currently support ALTER TABLE/PARTITION BY (i.e. partition a
table after the fact). We leave this as a future improvement.

Allowed actions on RELKIND_PARTITIONED_REL: * ALTER TABLE <xyz> CREATE
PARTITION <n> This creates a new partition * ALTER TABLE <xyz> CREATE
PARTITION FOR <value> Same as above; the partition number is determined
automatically.

Allowed actions on a RELKIND_PARTITION:

* ALTER PARTITION <n> ON TABLE <xyz> SET TABLESPACE * ALTER PARTITION
<n> ON TABLE <xyz> DROP * CREATE INDEX .. ON PARTITION <n> ON TABLE
<xyz> * VACUUM parent PARTITION <n>

As a future extension we will allow partitions to become detached from
the parent relation, thus becoming an independent table. This might be
a relatively expensive operation: pg_attribute entries need to be
created, for example.

Overflow Partitions -------------------

There is no explicit concept of overflow partitions.

Vacuum, aging -------------

PARTITIONED_RELs, not containing tuples directly, do not have
relfrozenxid or relminmxid. Each partition has individual values for
these variables.

Autovacuum knows to ignore PARTITIONED_RELs, and considers each
RELKIND_PARTITION separately.

Each partition is vacuumed as a normal relation.

Planner -------

A partitioned relation behaves just like a regular relation for purposes
of planner. XXX do we need special considerations regarding relation
size estimation?

For scan plans, we need to prepare Append lists which are used to scan
for tuples in a partitioned relation. We can setup fake constraint
expressions based on the partitioning expressions, which let the planner
discard unnecessary partitions by way of constraint exclusion.

(In the future we might be interested in creating specialized plan and
execution nodes that know more about partitioned relations, to avoid
creating useless Append trees only to prune them later.)

Executor --------

When doing an INSERT or UPDATE ResultRelInfo needs to be expanded for
partitioned relations: the target relation of an insertion is the parent
relation, but the actual partition needs to be resolved at ModifyTable
execution time. This means RelOptInfo needs to know about partitions;
either we deal with them as "other rels" terms, or we create a new
RelOptKind. At any rate, running the partitioning expression on the new
tuple would give an partition index. This needs to be done once for
each new tuple.

I think during ExecInsert, after running triggers and before executing
constraints, we need to switch resultRelationDesc from the parent
relation into the partition-specific relation.

ExecInsertIndexTuples only knows about partitions. It's an error to
call it using a partitioned rel.

Heap Access Method ------------------ For the purposes of low-level
routines in heapam.c, only partitions exist; trying to insert or modify
tuples in a RELKIND_PARTITIONED_REL is an error. heap_insert and
heap_multi_insert only accept inserting tuples into an individual
partition. These routines do not check that the tuples belong into the
specific partition; that's responsibility of higher-level code. Because
of this, code like COPY will need to make its own checks. Maybe we
should offer another API (in between high-level things such as
ModifyTable/COPY and heapam.c) that receives tuples into a
PARTITIONED_REL and routes them into specific partitions. Note: need to
ensure we do not slow down COPY for the regular case of
RELKIND_RELATION.

Taking backups --------------

pg_dump is able to dump a partitioned relation as a CREATE
TABLE/PARTITION command and a series of ALTER TABLE/CREATE PARTITION
commands. The data of all partitions is considered a single COPY
operation.

XXX this limits the ability to restore in parallel. To fix we might consider
using one COPY for each partition. It's not clear what relation should be
mentioned in such a COPY command, though -- my instinct is that it
should reference the parent table only, not the individual partition.

Previous Discussion
-------------------

/messages/by-id/d3c4af540703292358s8ed731el7771ab14083aa610@mail.gmail.com
Auto Partitioning Patch - WIP version 1
(Nikhil Sontakke, March 2007)

/messages/by-id/20080111231945.GY6934@europa.idg.com.au
Declarative partitioning grammar
(Gavin Sherry, January 2008)

/messages/by-id/bd8134a40906080702s96c90a9q3bbb581b9bd0d5d7@mail.gmail.com
Patch for automating partitions in PostgreSQL 8.4 Beta 2
(Kedar Potdar, Jun 2009)

/messages/by-id/20091029111531.96CD.52131E4D@oss.ntt.co.jp
Syntax for partitioning
(Itagaki Takahiro, Oct 2009)

/messages/by-id/AANLkTikP-1_8B04eyIK0sDf8uA5KMo64o8sorFBZE_CT@mail.gmail.com
Partitioning syntax
(Itagaki Takahiro, Jan 2010)

Not really related:
/messages/by-id/1199296574.7260.149.camel@ebony.site
Dynamic Partitioning using Segment Visibility Maps
(Simon Riggs, January 2008)

Still To Be Designed
--------------------
* Dependency issues
* Are indexes/constraints inherited from the parent rel?
* Multiple keys? Subpartitioning? Hash partitioning?

Open Questions
--------------

* What's the syntax to refer to specific partitions within a partitioned
table?
We could do "TABLE <xyz> PARTITION <n>", but for example if in
the future we add hash partitioning, we might need some non-integer
addressing (OTOH assigning sequential numbers to hash partitions doesn't
seem so bad). Discussing with users of other DBMSs partitioning feature,
one useful phrase is "TABLE <xyz> PARTITION FOR <value>".

* Do we want to provide partitioned materialized views?

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: On partitioning

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

[ partition sketch ]

In this design, partitions are first-class objects, not normal tables in
inheritance hierarchies. There are no pg_inherits entries involved at all.

Hm, actually I'd say they are *not* first class objects; the problem with
the existing design is exactly that child tables *are* first class
objects. This is merely a terminology quibble though.

* relkind RELKIND_PARTITION 'p' indicates a partition within a partitioned
relation (its parent). These cannot be addressed directly in DML
queries and only limited DDL support is provided. They don't have
their own pg_attribute entries either and therefore they are always
identical in column definitions to the parent relation.

Not sure that not storing the pg_attribute rows is a good thing; but
that's something that won't be clear till you try to code it.

Each partition is assigned an Expression that receives a tuple and
returns boolean. This expression returns true if a given tuple belongs
into it, false otherwise.

-1, in fact minus a lot. One of the core problems of the current approach
is that the system, particularly the planner, hasn't got a lot of insight
into exactly what the partitioning scheme is in a partitioned table built
on inheritance. If you allow the partitioning rule to be a black box then
that doesn't get any better. I want to see a design wherein the system
understands *exactly* what the partitioning behavior is. I'd start with
supporting range-based partitioning explicitly, and maybe we could add
other behaviors such as hashing later.

In particular, there should never be any question at all that there is
exactly one partition that a given row belongs to, not more, not less.
You can't achieve that with a set of independent filter expressions;
a meta-rule that says "exactly one of them should return true" is an
untrustworthy band-aid.

(This does not preclude us from mapping the tuple through the partitioning
rule and finding that the corresponding partition doesn't currently exist.
I think we could view the partitioning rule as a function from tuples to
partition numbers, and then we look in pg_class to see if such a partition
exists.)

Additionally, each partitioned relation may have a master expression.
This receives a tuple and returns an integer, which corresponds to the
number of the partition it belongs into.

I guess this might be the same thing I'm arguing for, except that I say
it is not optional but is *the* way you define the partitioning. And
I don't really want black-box expressions even in this formulation.
If you're looking for arbitrary partitioning rules, you can keep on
using inheritance. The point of inventing partitioning, IMHO, is for
the system to have a lot more understanding of the behavior than is
possible now.

As an example of the point I'm trying to make, the planner should be able
to discard range-based partitions that are eliminated by a WHERE clause
with something a great deal cheaper than the theorem prover it currently
has to use for the purpose. Black-box partitioning rules not only don't
improve that situation, they actually make it worse.

Other than that, this sketch seems reasonable ...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#1)
Re: On partitioning

On Fri, Aug 29, 2014 at 4:56 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

For scan plans, we need to prepare Append lists which are used to scan
for tuples in a partitioned relation. We can setup fake constraint
expressions based on the partitioning expressions, which let the planner
discard unnecessary partitions by way of constraint exclusion.

(In the future we might be interested in creating specialized plan and
execution nodes that know more about partitioned relations, to avoid
creating useless Append trees only to prune them later.)

This seems like a big part of the point of doing first class
partitions. If we have an equivalence class that specifies a constant
for all the variables in the master expression then we should be able
to look up the corresponding partition as a O(1) operation (or
O(log(n) if it involves searching a list) rather than iterating over
all the partitions and trying to prove lots of exclusions. We might
even need a btree index to store the partitions so that we can handle
scaling up and still find the corresponding partitions quickly.

And I think there are still unanswered questions about indexes. You
seem to be implying that users would be free to create any index they
want on any partition. It's probably going to be necessary to support
creating an index on the partitioned table which would create an index
on each of the partitions and, crucially, automatically create
corresponding indexes whenever new partitions are added.

That said, everything that's here sounds pretty spot-on to me.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#2)
Re: On partitioning

2014-08-29 18:35 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

[ partition sketch ]

In this design, partitions are first-class objects, not normal tables in
inheritance hierarchies. There are no pg_inherits entries involved at

all.

Hm, actually I'd say they are *not* first class objects; the problem with
the existing design is exactly that child tables *are* first class
objects. This is merely a terminology quibble though.

+1 .. only few partitions slowdown planning significantly from 1ms to 20ms,
what is a issue with very simple queries over PK

Show quoted text

* relkind RELKIND_PARTITION 'p' indicates a partition within a

partitioned

relation (its parent). These cannot be addressed directly in DML
queries and only limited DDL support is provided. They don't have
their own pg_attribute entries either and therefore they are always
identical in column definitions to the parent relation.

Not sure that not storing the pg_attribute rows is a good thing; but
that's something that won't be clear till you try to code it.

Each partition is assigned an Expression that receives a tuple and
returns boolean. This expression returns true if a given tuple belongs
into it, false otherwise.

-1, in fact minus a lot. One of the core problems of the current approach
is that the system, particularly the planner, hasn't got a lot of insight
into exactly what the partitioning scheme is in a partitioned table built
on inheritance. If you allow the partitioning rule to be a black box then
that doesn't get any better. I want to see a design wherein the system
understands *exactly* what the partitioning behavior is. I'd start with
supporting range-based partitioning explicitly, and maybe we could add
other behaviors such as hashing later.

In particular, there should never be any question at all that there is
exactly one partition that a given row belongs to, not more, not less.
You can't achieve that with a set of independent filter expressions;
a meta-rule that says "exactly one of them should return true" is an
untrustworthy band-aid.

(This does not preclude us from mapping the tuple through the partitioning
rule and finding that the corresponding partition doesn't currently exist.
I think we could view the partitioning rule as a function from tuples to
partition numbers, and then we look in pg_class to see if such a partition
exists.)

Additionally, each partitioned relation may have a master expression.
This receives a tuple and returns an integer, which corresponds to the
number of the partition it belongs into.

I guess this might be the same thing I'm arguing for, except that I say
it is not optional but is *the* way you define the partitioning. And
I don't really want black-box expressions even in this formulation.
If you're looking for arbitrary partitioning rules, you can keep on
using inheritance. The point of inventing partitioning, IMHO, is for
the system to have a lot more understanding of the behavior than is
possible now.

As an example of the point I'm trying to make, the planner should be able
to discard range-based partitions that are eliminated by a WHERE clause
with something a great deal cheaper than the theorem prover it currently
has to use for the purpose. Black-box partitioning rules not only don't
improve that situation, they actually make it worse.

Other than that, this sketch seems reasonable ...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#3)
Re: On partitioning

Greg Stark <stark@mit.edu> writes:

And I think there are still unanswered questions about indexes.

One other interesting thought that occurs to me: are we going to support
UPDATEs that cause a row to belong to a different partition? If so, how
are we going to handle the update chain links?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: On partitioning

Tom Lane wrote:

Greg Stark <stark@mit.edu> writes:

And I think there are still unanswered questions about indexes.

One other interesting thought that occurs to me: are we going to support
UPDATEs that cause a row to belong to a different partition? If so, how
are we going to handle the update chain links?

Bah, I didn't mention it? My current thinking is that it would be
disallowed; if you have chosen your partitioning key well enough it
shouldn't be necessary. As a workaround you can always DELETE/INSERT.
Maybe we can allow it later, but for a first cut this seems more than
good enough.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: On partitioning

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

One other interesting thought that occurs to me: are we going to support
UPDATEs that cause a row to belong to a different partition? If so, how
are we going to handle the update chain links?

Bah, I didn't mention it? My current thinking is that it would be
disallowed; if you have chosen your partitioning key well enough it
shouldn't be necessary. As a workaround you can always DELETE/INSERT.
Maybe we can allow it later, but for a first cut this seems more than
good enough.

Hm. I certainly agree that it's a case that could be disallowed for a
first cut, but it'd be nice to have some clue about how we might allow it
eventually.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#7)
Re: On partitioning

On 2014-08-29 13:15:16 -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

One other interesting thought that occurs to me: are we going to support
UPDATEs that cause a row to belong to a different partition? If so, how
are we going to handle the update chain links?

Bah, I didn't mention it? My current thinking is that it would be
disallowed; if you have chosen your partitioning key well enough it
shouldn't be necessary. As a workaround you can always DELETE/INSERT.
Maybe we can allow it later, but for a first cut this seems more than
good enough.

Hm. I certainly agree that it's a case that could be disallowed for a
first cut, but it'd be nice to have some clue about how we might allow it
eventually.

Not pretty, but we could set t_ctid to some 'magic' value when switching
partitions. Everything chasing ctid chains could then error out when
hitting a invisible row with such a t_ctid. The usecases for doing such
updates really are more maintenance style commands, so it's possibly not
too bad from a usability POV :(

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#8)
Re: On partitioning

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-08-29 13:15:16 -0400, Tom Lane wrote:

Hm. I certainly agree that it's a case that could be disallowed for a
first cut, but it'd be nice to have some clue about how we might allow it
eventually.

Not pretty, but we could set t_ctid to some 'magic' value when switching
partitions. Everything chasing ctid chains could then error out when
hitting a invisible row with such a t_ctid.

An actual fix would presumably involve adding a partition number to the
ctid chain field in tuples in partitioned tables. The reason I bring it
up now is that we'd have to commit to doing that (or at least leaving room
for it) in the first implementation, if we don't want to have an on-disk
compatibility break.

There is certainly room to argue that the value of this capability isn't
worth the disk space this solution would eat. But we should have that
argument while the option is still feasible ...

The usecases for doing such
updates really are more maintenance style commands, so it's possibly not
too bad from a usability POV :(

I'm afraid that might just be wishful thinking.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#7)
Re: On partitioning

Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

One other interesting thought that occurs to me: are we going to support
UPDATEs that cause a row to belong to a different partition? If so, how
are we going to handle the update chain links?

Bah, I didn't mention it? My current thinking is that it would be
disallowed; if you have chosen your partitioning key well enough it
shouldn't be necessary. As a workaround you can always DELETE/INSERT.
Maybe we can allow it later, but for a first cut this seems more than
good enough.

Hm. I certainly agree that it's a case that could be disallowed for a
first cut, but it'd be nice to have some clue about how we might allow it
eventually.

I hesitate to suggest this, but we have free flag bits in
MultiXactStatus. We could use a specially marked multixact member to
indicate the OID of the target relation; perhaps set an infomask bit to
indicate that this has happened. Of course, no HOT updates are possible
so I think it's okay from a heap_prune_chain perspective. This abuses
the knowledge that OIDs and XIDs are both 32 bits long.

Since nowhere else we have the space necessary to store the longer data
that a cross-partition update would require, I don't see anything else
ATM. (For a moment I thought about abusing combo CIDs, but that doesn't
work because this requires to be persistent and visible from other
backends, neither of which is a quality of combocids.)

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#9)
Re: On partitioning

On 2014-08-29 13:29:19 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-08-29 13:15:16 -0400, Tom Lane wrote:

Hm. I certainly agree that it's a case that could be disallowed for a
first cut, but it'd be nice to have some clue about how we might allow it
eventually.

Not pretty, but we could set t_ctid to some 'magic' value when switching
partitions. Everything chasing ctid chains could then error out when
hitting a invisible row with such a t_ctid.

An actual fix would presumably involve adding a partition number to the
ctid chain field in tuples in partitioned tables. The reason I bring it
up now is that we'd have to commit to doing that (or at least leaving room
for it) in the first implementation, if we don't want to have an on-disk
compatibility break.

Right. Just adding it unconditionally doesn't sound feasible to me. Our
per-row overhead is already too large. And it doesn't sound fun to have
the first-class partitions use a different heap tuple format than plain
relations.

What we could do is to add some sort of 'jump' tuple when moving a tuple
from one relation to another. So, when updating a tuple between
partitions we add another in the old partition with xmin_jump =
xmax_jump = xmax_old and have the jump tuple's content point to the new
relation.
Far from pretty, but it'd only matter overhead wise when used.

The usecases for doing such
updates really are more maintenance style commands, so it's possibly not
too bad from a usability POV :(

I'm afraid that might just be wishful thinking.

I admit that you might very well be right there :(

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#11)
Re: On partitioning

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-08-29 13:29:19 -0400, Tom Lane wrote:

An actual fix would presumably involve adding a partition number to the
ctid chain field in tuples in partitioned tables. The reason I bring it
up now is that we'd have to commit to doing that (or at least leaving room
for it) in the first implementation, if we don't want to have an on-disk
compatibility break.

What we could do is to add some sort of 'jump' tuple when moving a tuple
from one relation to another. So, when updating a tuple between
partitions we add another in the old partition with xmin_jump =
xmax_jump = xmax_old and have the jump tuple's content point to the new
relation.

Hm, that might work. It sounds more feasible than Alvaro's suggestion
of abusing cmax --- I don't think that field is free for use in this
context.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#7)
Re: On partitioning

On 08/29/2014 07:15 PM, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

One other interesting thought that occurs to me: are we going to support
UPDATEs that cause a row to belong to a different partition? If so, how
are we going to handle the update chain links?

Bah, I didn't mention it? My current thinking is that it would be
disallowed; if you have chosen your partitioning key well enough it
shouldn't be necessary. As a workaround you can always DELETE/INSERT.
Maybe we can allow it later, but for a first cut this seems more than
good enough.

Hm. I certainly agree that it's a case that could be disallowed for a
first cut, but it'd be nice to have some clue about how we might allow it
eventually.

There needs to be some structure that is specific to partitions and not
multiple plain tables which would then be used for both update chains and
cross-partition indexes (as you seem to imply by jumping from indexes
to update chains a few posts back).

It would need to replace plain tid (pagenr, tupnr) with triple of (partid,
pagenr, tupnr).

Cross-partition indexes are especially needed if we want to allow putting
UNIQUE constraints on non-partition-key columns.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic O�

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Hannu Krosing
hannu@tm.ee
In reply to: Alvaro Herrera (#1)
Re: On partitioning

On 08/29/2014 05:56 PM, Alvaro Herrera wrote:

Prompted by a comment in the UPDATE/LIMIT thread, I saw Marko Tiikkaja
reference Tom's post
/messages/by-id/1598.1399826841@sss.pgh.pa.us
which mentions the possibility of a different partitioning
implementation than what we have so far. As it turns out, I've been
thinking about partitioning recently, so I thought I would share what
I'm thinking so that others can poke holes. My intention is to try to
implement this as soon as possible.

Declarative partitioning
========================
...
Still To Be Designed
--------------------
* Dependency issues
* Are indexes/constraints inherited from the parent rel?

I'd say mostly yes.

There could some extra "constraint exclusion type" magic for
conditional indexes, but the rest probably should come from "main table"

And there should be some kind of cross-partition indexes. At
"partitioning" capability, this can probably wait for version 2.

* Multiple keys?

Why not. But probably just for hash partitioning.

Subpartitioning?

Probably not. If you need speed for huge numbers of partitions, use
Gregs idea of keeping the partitions in a tree (or just having a
partition index).

Hash partitioning?

At some point definitely.

Also one thing you left unmentioned is dropping (and perhaps also
truncating)
a partition. We still may want to do historic data management the same way
we do it now, by just getting rid of the whole partition or its data.

At some point we may also want to do redistributing data between
partitions,
maybe for case where we end up with 90% of the data in on partition due to
bad partitioning key or partitioning function choice. This is again
something
that is hard now and can therefore be left to a later version.

Open Questions
--------------

* What's the syntax to refer to specific partitions within a partitioned
table?
We could do "TABLE <xyz> PARTITION <n>", but for example if in
the future we add hash partitioning, we might need some non-integer
addressing (OTOH assigning sequential numbers to hash partitions doesn't
seem so bad). Discussing with users of other DBMSs partitioning feature,
one useful phrase is "TABLE <xyz> PARTITION FOR <value>".

Or more generally

TABLE <xyz> PARTITION FOR/WHERE col1=val1, col2=val2, ...;

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic O�

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Hannu Krosing (#13)
Re: On partitioning

Hannu Krosing wrote:

Cross-partition indexes are especially needed if we want to allow putting
UNIQUE constraints on non-partition-key columns.

I'm not going to implement cross-partition indexes in the first patch.
They are a huge can of worms.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#1)
Re: On partitioning

On Fri, Aug 29, 2014 at 11:56 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

In this design, partitions are first-class objects, not normal tables in
inheritance hierarchies. There are no pg_inherits entries involved at all.

Whoa. I always assumed that table inheritance was a stepping-stone to
real partitioning, and that real partitioning would be built on top of
table inheritance. In particular, I assume that (as Itagaki
Takahiro's patch did those many years ago) we'd add some metadata
somewhere to allow fast tuple routing (for both pruning and
inserts/updates). What's the benefit of inventing something new
instead?

I'm skeptical about your claim that there will be no pg_inherits
entries involved at all. You need some way to know which partitions
go with which parent table. You can store that many-to-one mapping
someplace other than pg_inherits, but it seems to me that that doesn't
buy you anything; they're just pg_inherits entries under some other
name. Why reinvent that?

Each partition is assigned an Expression that receives a tuple and
returns boolean. This expression returns true if a given tuple belongs
into it, false otherwise. If a tuple for a partitioned relation is run
through expressions of all partitions, exactly one should return true.
If none returns true, it might be because the partition has not been
created yet. A user-facing error is raised in this case (Rationale: if
user creates a partitioned rel and there is no partition that accepts
some given tuple, it's the user's fault.)

Additionally, each partitioned relation may have a master expression.
This receives a tuple and returns an integer, which corresponds to the
number of the partition it belongs into.

I agree with Tom: this is a bad design. In particular, if we want to
scale to large numbers of partitions (a principal weakness of the
present system) we need the operation of routing a tuple to a
partition to be as efficient as possible. Range partitioning can be
O(lg n) where n is the number of partitions: store a list of the
boundaries and binary-search it. List partitioning can be O(lg k)
where k is the number of values (which may be more than the number of
partitions) via a similar technique. Hash partitioning can be O(1).
I'm not sure what other kind of partitioning anybody would want to do,
but it's likely that they *won't* want it to be O(1) in the number of
partitions. So I'd say have *only* the master expression.

But, really, I don't think an expression is the right way to store
this; evaluating that repeatedly will, I think, still be too slow.
Think about what happens in PL/pgsql: minimizing the number of times
that you enter and exit the executor helps performance enormously,
even if the expressions are simple enough not to need planning. I
think the representation should be more like an array of partition
boundaries and the pg_proc OID of a comparator.

Per-partition expressions are formed as each partition is created, and
are based on the user-supplied partitioning criterion. Master
expressions are formed at relation creation time. (XXX Can we change
the master expression later, as a result of some ALTER command?
Presumably this would mean that all partitions might need to be
rewritten.)

This is another really important point. If you store an opaque
expression mapping partitioning keys to partition numbers, you can't
do things like this efficiently. With a more transparent
representation, like a sorted array of partition boundaries for range
partitioning, or a sorted array of hash values for consistent hashing,
you can do things like split and merge partitions efficiently,
minimizing rewriting.

Planner -------

A partitioned relation behaves just like a regular relation for purposes
of planner. XXX do we need special considerations regarding relation
size estimation?

For scan plans, we need to prepare Append lists which are used to scan
for tuples in a partitioned relation. We can setup fake constraint
expressions based on the partitioning expressions, which let the planner
discard unnecessary partitions by way of constraint exclusion.

So if we're going to do all this, why bother making the partitions
anything other than inheritance children? There might be some benefit
in having the partitions be some kind of stripped-down object if we
could avoid some of these planner gymnastics and get, e.g. efficient
run-time partition pruning. But if you're going to generate Append
plans and switch ResultRelInfos and stuff just as you would for an
inheritance hierarchy, why not just make it an inheritance hierarchy?

It seems pretty clear to me that we need partitioned tables to have
the same tuple descriptor throughout the relation, for efficient tuple
routing and so on. But the other restrictions you're proposing to
impose on partitions have no obvious value that I can see. We could
have a rule that when you inherit from a partition root, you can only
inherit from that one table (no multiple inheritance) and your tuple
descriptor must match precisely (down to dropped columns and column
ordering) and that would give you everything I think you really need
here. There's no gain to be had in forbidding partitions from having
different owners, or being selected from directly, or having
user-visible names. The first of those is arguably useless, but it's
not really causing us any problems, and the latter two are extremely
useful features. Unless you are going to implement partition pruning
is so good that it will never fail to realize a situation where only
one partition needs to be scanned, letting users target the partition
directly is a very important escape hatch.

(In the future we might be interested in creating specialized plan and
execution nodes that know more about partitioned relations, to avoid
creating useless Append trees only to prune them later.)

Good idea.

pg_dump is able to dump a partitioned relation as a CREATE
TABLE/PARTITION command and a series of ALTER TABLE/CREATE PARTITION
commands. The data of all partitions is considered a single COPY
operation.

XXX this limits the ability to restore in parallel. To fix we might consider
using one COPY for each partition. It's not clear what relation should be
mentioned in such a COPY command, though -- my instinct is that it
should reference the parent table only, not the individual partition.

Targeting the individual partitions seems considerably better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#1)
Re: On partitioning

On Sat, Aug 30, 2014 at 12:56 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Prompted by a comment in the UPDATE/LIMIT thread, I saw Marko Tiikkaja
reference Tom's post
/messages/by-id/1598.1399826841@sss.pgh.pa.us
which mentions the possibility of a different partitioning
implementation than what we have so far. As it turns out, I've been
thinking about partitioning recently, so I thought I would share what
I'm thinking so that others can poke holes. My intention is to try to
implement this as soon as possible.

+1.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: On partitioning

Another thought about this general topic:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

...
Allowed actions on a RELKIND_PARTITION:
* CREATE INDEX .. ON PARTITION <n> ON TABLE <xyz>
...
Still To Be Designed
--------------------
* Are indexes/constraints inherited from the parent rel?

I think one of the key design decisions we have to make is whether
partitions are all constrained to have exactly the same set of indexes.
If we don't insist on that it will greatly complicate planning compared
to what we'll get if we do insist on it, because then the planner will
need to generate a separate customized plan subtree for each partition.
Aside from costing planning time, most likely that would forever prevent
us from pushing some types of intelligence about partitioning into the
executor.

Now, in the current model, it's up to the user what indexes to create
on each partition, and sometimes one might feel that maintaining a
particular index is unnecessary in some partitions. But the flip side
of that is it's awfully easy to screw yourself by forgetting to add
some index when you add a new partition. So I'm not real sure which
approach is superior from a purely user-oriented perspective.

I'm not trying to push one or the other answer right now, just noting
that this is a critical decision.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#18)
Re: On partitioning

On 08/31/2014 10:03 PM, Tom Lane wrote:

Another thought about this general topic:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

...
Allowed actions on a RELKIND_PARTITION:
* CREATE INDEX .. ON PARTITION <n> ON TABLE <xyz>
...
Still To Be Designed
--------------------
* Are indexes/constraints inherited from the parent rel?

I think one of the key design decisions we have to make is whether
partitions are all constrained to have exactly the same set of indexes.
If we don't insist on that it will greatly complicate planning compared
to what we'll get if we do insist on it, because then the planner will
need to generate a separate customized plan subtree for each partition.
Aside from costing planning time, most likely that would forever prevent
us from pushing some types of intelligence about partitioning into the
executor.

Now, in the current model, it's up to the user what indexes to create
on each partition, and sometimes one might feel that maintaining a
particular index is unnecessary in some partitions. But the flip side
of that is it's awfully easy to screw yourself by forgetting to add
some index when you add a new partition.

The "forgetting" part is easy to solve by inheriting all indexes from
parent (or template) partition unless explicitly told not to.

One other thing that has been bothering me about this proposal
is the ability to take partitions offline for maintenance or to load
them offline ant then switch in.

In current scheme we do this using ALTER TABLE ... [NO] INHERIT ...

If we also want to have this with the not-directly-accessible partitions
then perhaps it could be done by having a possibility to move
a partition between two tables with exactly the same structure ?

So I'm not real sure which
approach is superior from a purely user-oriented perspective.

What we currently have is a very flexible scheme which has a few
drawbacks

1) unnecessarily complex for simple case
2) easy to shoot yourself in the foot by forgetting something
3) can be hard on planner, especially with huge number of partitions

An alternative way of solving these problems is adding some
(meta-)constraints to current way of doing things and some more
automation

CREATE TABLE FOR PARTITIONMASTER
WITH (ALL_INDEXES_SAME=ON,
SAME_STRUCTURE_ALWAYS=ON,
SINGLE_INHERITANCE_ONLY=ON,
NESTED_INHERITS=OFF,
PARTITION_FUNCTION=default_range_partitioning(int)
);

and then force these when adding inherited tables (in this case
partition tables)
either via CREATE TABLE or ALTER TABLE

Best Regards

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic O�

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#2)
Re: On partitioning

On Fri, Aug 29, 2014 at 12:35:50PM -0400, Tom Lane wrote:

Each partition is assigned an Expression that receives a tuple and
returns boolean. This expression returns true if a given tuple belongs
into it, false otherwise.

-1, in fact minus a lot. One of the core problems of the current approach
is that the system, particularly the planner, hasn't got a lot of insight
into exactly what the partitioning scheme is in a partitioned table built
on inheritance. If you allow the partitioning rule to be a black box then
that doesn't get any better. I want to see a design wherein the system
understands *exactly* what the partitioning behavior is. I'd start with
supporting range-based partitioning explicitly, and maybe we could add
other behaviors such as hashing later.

In particular, there should never be any question at all that there is
exactly one partition that a given row belongs to, not more, not less.
You can't achieve that with a set of independent filter expressions;
a meta-rule that says "exactly one of them should return true" is an
untrustworthy band-aid.

(This does not preclude us from mapping the tuple through the partitioning
rule and finding that the corresponding partition doesn't currently exist.
I think we could view the partitioning rule as a function from tuples to
partition numbers, and then we look in pg_class to see if such a partition
exists.)

There is one situation where you need to be more flexible, and that is
if you ever want to support online repartitioning. To do that you have
to distinguish between "I want to insert tuple X, which partition
should it go into" and "I want to know which partitions I need to look
for partition_key=Y".

For the latter you really have need an expression per partition, or
something equivalent. If performance is an issue I suppose you could
live with having an "old" and an "new" partition scheme, so you
couldn't have two "live repartitionings" happening simultaneously.

Now, if you want to close the door on online repartitioning forever
then that fine. But being in the position of having to say "yes our
partitioning scheme sucks, but we would have to take the database down
for a week to fix it" is no fun.

Unless logical replication provides a way out maybe??

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.

-- Arthur Schopenhauer

#21Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#18)
#22Andres Freund
andres@anarazel.de
In reply to: Hannu Krosing (#13)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#21)
#24Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#18)
#25Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#23)
#26Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#23)
#27Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#23)
#28Hannu Krosing
hannu@tm.ee
In reply to: Andres Freund (#22)
#29Craig Ringer
craig@2ndquadrant.com
In reply to: Andres Freund (#22)
#30Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#18)
#31Bruce Momjian
bruce@momjian.us
In reply to: Martijn van Oosterhout (#20)
#32Martijn van Oosterhout
kleptog@svana.org
In reply to: Bruce Momjian (#31)
#33Robert Haas
robertmhaas@gmail.com
In reply to: Martijn van Oosterhout (#32)
#34Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#1)
#35Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#34)
#36Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#35)
#37Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Bruce Momjian (#36)
#38Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#37)
#39Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#38)
#40Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andres Freund (#39)
#41Andres Freund
andres@anarazel.de
In reply to: Amit Langote (#40)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#41)
#43Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#42)
#44Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#42)
#45Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#44)
#46Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#45)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#46)
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#47)
#49Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#48)
#50Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#49)
#51Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#49)
#52Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Stephen Frost (#51)
#53Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#52)
#54Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#51)
#55Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#54)
#56Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#55)
#57Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#47)
#58Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#57)
#59Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#58)
#60Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#59)
#61Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#59)
#62Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#61)
#63Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Amit Langote (#62)
#64Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#62)
In reply to: Alvaro Herrera (#64)
In reply to: Kenneth Marshall (#65)
#67Stephen Frost
sfrost@snowman.net
In reply to: Kenneth Marshall (#65)
#68Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Jim Nasby (#63)
#69Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#64)
#70Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#68)
#71Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#64)
#72Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Kapila (#70)
#73Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#61)
#74Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#72)
#75Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Kapila (#74)
#76Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Kapila (#74)
#77Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#62)
#78Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#73)
#79Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#75)
#80Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Amit Langote (#76)
#81Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#80)
#82Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#81)
#83Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#82)
#84Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#83)
#85Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#77)
#86Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#78)
#87Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#77)
#88Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Kapila (#85)
#89Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Kapila (#86)
#90Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#88)
#91Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Kapila (#90)
#92Josh Berkus
josh@agliodbs.com
In reply to: Amit Langote (#91)
#93Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#85)
#94Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#87)
#95Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#86)
#96Josh Berkus
josh@agliodbs.com
In reply to: Alvaro Herrera (#1)
#97Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#95)
#98Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#96)
#99Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#97)
#100Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#99)
#101Josh Berkus
josh@agliodbs.com
In reply to: Alvaro Herrera (#1)
#102Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#100)
#103Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#101)
#104Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#95)
#105Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Josh Berkus (#92)
#106Josh Berkus
josh@agliodbs.com
In reply to: Jim Nasby (#105)
#107Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#93)
#108Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#107)
#109Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#102)
#110Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Kapila (#108)
#111Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Kapila (#108)
#112Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Josh Berkus (#92)
#113Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Kapila (#109)
#114Josh Berkus
josh@agliodbs.com
In reply to: Alvaro Herrera (#113)
#115Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Josh Berkus (#106)
#116Amit Kapila
amit.kapila16@gmail.com
In reply to: Josh Berkus (#114)
#117Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#113)
#118Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Kapila (#116)
#119Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Kapila (#117)
#120Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#119)
#121Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#120)
#122Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#104)
#123Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#108)
#124Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#102)
#125Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#124)
#126Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#120)
#127Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#123)
#128Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#127)
#129Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#128)
#130Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#128)
#131Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#129)
#132Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#131)
#133Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#132)
#134Claudio Freire
klaussfreire@gmail.com
In reply to: Robert Haas (#133)
#135Tom Lane
tgl@sss.pgh.pa.us
In reply to: Claudio Freire (#134)
#136Claudio Freire
klaussfreire@gmail.com
In reply to: Tom Lane (#135)
#137Josh Berkus
josh@agliodbs.com
In reply to: Alvaro Herrera (#1)
#138Claudio Freire
klaussfreire@gmail.com
In reply to: Josh Berkus (#137)
#139Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Claudio Freire (#138)
#140Claudio Freire
klaussfreire@gmail.com
In reply to: Alvaro Herrera (#139)
#141José Luis Tallón
jltallon@adv-solutions.net
In reply to: Amit Langote (#129)
#142José Luis Tallón
jltallon@adv-solutions.net
In reply to: Alvaro Herrera (#139)
#143Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#133)
#144David Fetter
david@fetter.org
In reply to: Robert Haas (#131)
#145José Luis Tallón
jltallon@adv-solutions.net
In reply to: José Luis Tallón (#142)
#146Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: José Luis Tallón (#142)
#147Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: José Luis Tallón (#141)
#148Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#139)
#149Claudio Freire
klaussfreire@gmail.com
In reply to: Amit Langote (#148)
#150Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Claudio Freire (#149)
#151José Luis Tallón
jltallon@adv-solutions.net
In reply to: Claudio Freire (#149)
#152Claudio Freire
klaussfreire@gmail.com
In reply to: José Luis Tallón (#151)
#153Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#148)
#154Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#153)
#155Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#154)
#156Claudio Freire
klaussfreire@gmail.com
In reply to: Robert Haas (#155)
#157Josh Berkus
josh@agliodbs.com
In reply to: Amit Kapila (#127)
#158Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#155)
#159Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Claudio Freire (#156)
#160Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#157)
#161Josh Berkus
josh@agliodbs.com
In reply to: Amit Kapila (#127)
#162Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#161)
#163Josh Berkus
josh@agliodbs.com
In reply to: Amit Kapila (#127)
#164Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Josh Berkus (#163)
#165Josh Berkus
josh@agliodbs.com
In reply to: Amit Kapila (#127)
#166Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#163)
#167Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#166)
#168Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#167)
#169Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#168)
#170Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Robert Haas (#169)
#171Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#170)
#172Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#169)
#173Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#172)
#174Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#172)
#175Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#174)
#176Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#175)
#177Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Amit Langote (#176)
#178Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Jim Nasby (#177)