Re: transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, May 1, 2017 at 12:51 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
What we could document now is that partitioned tables don't allow
specifying triggers that reference transition tables. Although, I am
wondering where this note really belongs - the partitioning chapter, the
triggers chapter or the CREATE TRIGGER reference page? Maybe, Kevin and
Thomas have something to say about that. If it turns out that the
partitioning chapter is a good place, here is a documentation patch.
I think that before we document this behavior, we'd better make sure
we understand exactly what the behavior is, and we'd better make sure
it's correct. Currently, triggers that involve transition tables are
altogether prohibited when the root relation is partitioned, but are
allowed in inheritance cases. However, the actual behavior appears to
be buggy. Here's what happens when I define a parent and a child and
update all the rows:
rhaas=# CREATE FUNCTION t() RETURNS trigger
rhaas-# LANGUAGE plpgsql
rhaas-# AS $$declare q record; begin raise notice 'table %',
tg_table_name; for q in select * from old loop raise notice 'table %
got value %', tg_table_name, q.a; end loop; return null; end;$$;
CREATE FUNCTION
rhaas=# CREATE TABLE p (a int, b text);
CREATE TABLE
rhaas=# CREATE TABLE p1 () INHERITS (p);
CREATE TABLE
rhaas=# CREATE TRIGGER x AFTER UPDATE ON p REFERENCING OLD TABLE AS
old NEW TABLE AS new FOR EACH STATEMENT EXECUTE PROCEDURE t();
CREATE TRIGGER
rhaas=# INSERT INTO p VALUES (0, 'zero');
INSERT 0 1
rhaas=# INSERT INTO p1 VALUES (1, 'one');
INSERT 0 1
rhaas=# INSERT INTO p1 VALUES (2, 'two');
INSERT 0 1
rhaas=# UPDATE p SET b = 'whatever';
NOTICE: table p
NOTICE: table p got value 0
UPDATE 3
Only the rows in the parent show up in the transition table. But now
look what happens if I add an unrelated trigger that also uses
transition tables to the children:
rhaas=# CREATE FUNCTION u() RETURNS trigger LANGUAGE plpgsql AS
$$begin null; end$$;
CREATE FUNCTION
rhaas=# CREATE TRIGGER x1 AFTER UPDATE ON p1 REFERENCING OLD TABLE AS
old NEW TABLE AS new FOR EACH STATEMENT EXECUTE PROCEDURE u();
CREATE TRIGGER
rhaas=# UPDATE p SET b = 'whatever';
NOTICE: table p
NOTICE: table p got value 0
NOTICE: table p got value 1
NOTICE: table p got value 2
UPDATE 3
It seems pretty clear to me that this is busted. The existence of
trigger x1 on p1 shouldn't affect whether trigger x on p sees changes
to p1's rows in its transition tables. Either all changes to any
descendants of p should be captured by the transition tables, or only
changes to the root table should be captured. If we do the former,
the restriction against using transition tables in triggers on
partitioned tables should be removed, I would think. If we do the
latter, then what we should document is not that partitioned tables
have a restriction that doesn't apply to inheritance but rather that
the restriction on the partitioned case flows from the fact that only
the parent's changes are captured, and the parent is always empty in
the partitioning case. In deciding between these two cases, we should
consider the case where the inheritance children have extra columns
and/or different column orderings.
Adding this as an open item. Kevin?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, May 1, 2017 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
It seems pretty clear to me that this is busted.
I don't think you actually tested anything that is dependent on any
of my patches there.
Adding this as an open item. Kevin?
It will take some time to establish what legacy behavior is and how
the new transition tables are impacted. My first reaction is that a
trigger on the parent should fire for any related action on a child
(unless maybe the trigger is defined with an ONLY keyword???) using
the TupleDesc of the parent. Note that the SQL spec mandates that
even in a AFTER EACH ROW trigger the transition tables must
represent all rows affected by the STATEMENT. I think that this
should be independent of triggers fired at the row level. I think
the rules should be similar for updateable views.
This will take some time to investigate, discuss and produce a
patch. I think best case is Friday.
--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, May 1, 2017 at 12:10 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
On Mon, May 1, 2017 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
It seems pretty clear to me that this is busted.
I don't think you actually tested anything that is dependent on any
of my patches there.
I was testing which rows show up in a transition table, so I assumed
that was related to the transition tables patch. Note that this is
not about which triggers are fired, just about how inheritance
interacts with transition tables.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, May 1, 2017 at 11:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, May 1, 2017 at 12:10 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
On Mon, May 1, 2017 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
It seems pretty clear to me that this is busted.
I don't think you actually tested anything that is dependent on any
of my patches there.I was testing which rows show up in a transition table, so I assumed
that was related to the transition tables patch. Note that this is
not about which triggers are fired, just about how inheritance
interacts with transition tables.
Yeah, I got confused a bit there, comparing to the updateable views case.
--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, May 2, 2017 at 3:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
[...]
Only the rows in the parent show up in the transition table. But now
look what happens if I add an unrelated trigger that also uses
transition tables to the children:rhaas=# CREATE FUNCTION u() RETURNS trigger LANGUAGE plpgsql AS
$$begin null; end$$;
CREATE FUNCTION
rhaas=# CREATE TRIGGER x1 AFTER UPDATE ON p1 REFERENCING OLD TABLE AS
old NEW TABLE AS new FOR EACH STATEMENT EXECUTE PROCEDURE u();
CREATE TRIGGER
rhaas=# UPDATE p SET b = 'whatever';
NOTICE: table p
NOTICE: table p got value 0
NOTICE: table p got value 1
NOTICE: table p got value 2
UPDATE 3It seems pretty clear to me that this is busted. The existence of
trigger x1 on p1 shouldn't affect whether trigger x on p sees changes
to p1's rows in its transition tables.
Yikes -- agreed. See analysis and draft patch for discussion below.
Either all changes to any
descendants of p should be captured by the transition tables, or only
changes to the root table should be captured. If we do the former,
the restriction against using transition tables in triggers on
partitioned tables should be removed, I would think. If we do the
latter, then what we should document is not that partitioned tables
have a restriction that doesn't apply to inheritance but rather that
the restriction on the partitioned case flows from the fact that only
the parent's changes are captured, and the parent is always empty in
the partitioning case. In deciding between these two cases, we should
consider the case where the inheritance children have extra columns
and/or different column orderings.
I think that we should only capture transition tuples captured from
the explicitly named relation, since we only fire AFTER STATEMENT
triggers on that relation. I see no inconsistency with the policy of
rejecting transition tables on partitioned tables (as I proposed and
Kevin accepted[1]/messages/by-id/CACjxUsNhdm4ZCgaVreLK5kAwHTZUkqJAVXiywwi-HNVsuTLMnA@mail.gmail.com), because partitioned tables can't have any data so
there would be no point. In contrast, every parent table in an
inheritance hierarchy is also a regular table and can hold data, so I
think we should allow transition tables on them, and capture
transition tuples from that table only when you modify it directly.
The transition table infrastructure only supports exactly one relation
being modified at each query level, and it's a bug that this example
captures tuples from p1 into the tuplestore intended for p's tuples
even though it is not even going to fire the after statement trigger
on p1. It's only a coincidence that the tuples have compatible
TupleDescs.
The pre-existing behaviour for triggers with inheritance is that
STATEMENT triggers fire only for the directly named relation, but ROW
triggers fire for all affected relations. The transition table patch
didn't change that, but it added code to AfterTriggerSaveEvent, which
is called by ExecAR(Insert|Update|Delete)Triggers, to capture
transitions. That gets called for every updated relation (ie
including partitions and inheritance sub-tables) to implement the ROW
policy. It needs to be taught not to capture transition tuples from
relations except the one directly named.
One solution to this problem is for nodeModifyTable.c to tell the
ExecAR* functions explicitly whether to capture transition tuples. It
knows when it has modified the explicitly named relation in a
hierarchy (mt_whichplan == 0) without rerouting via a partitioned
table (mt_partition_dispatch_info == NULL). See attached patch for
discussion (it lacks tests and needs better comments). Does this make
sense? Do you see a better way?
[1]: /messages/by-id/CACjxUsNhdm4ZCgaVreLK5kAwHTZUkqJAVXiywwi-HNVsuTLMnA@mail.gmail.com
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
transition-tables-from-one-relation-only.patchapplication/octet-stream; name=transition-tables-from-one-relation-only.patchDownload+70-49
On Tue, May 2, 2017 at 9:44 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
I think that we should only capture transition tuples captured from
the explicitly named relation, since we only fire AFTER STATEMENT
triggers on that relation. I see no inconsistency with the policy of
rejecting transition tables on partitioned tables (as I proposed and
Kevin accepted[1]), because partitioned tables can't have any data so
there would be no point. In contrast, every parent table in an
inheritance hierarchy is also a regular table and can hold data, so I
think we should allow transition tables on them, and capture
transition tuples from that table only when you modify it directly.
I suspect that most users would find it more useful to capture all of
the rows that the statement actually touched, regardless of whether
they hit the named table or an inheritance child. I just don't know
if it's practical to make that work. (And, of course, I don't know if
other people agree with my assessment of what is useful ... but
generally there seems to be support for making partitioned tables, at
least, look more like a single table that happens to have partitions
and less like a bunch of separate tables attached to each other with
duct tape.)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas wrote:
I suspect that most users would find it more useful to capture all of
the rows that the statement actually touched, regardless of whether
they hit the named table or an inheritance child.
Yes, agreed. For the plain inheritance cases each row would need to
have an indicator of which relation it comes from (tableoid); I'm not
sure if such a thing would be useful in the partitioning case.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, May 3, 2017 at 12:02 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
I suspect that most users would find it more useful to capture all of
the rows that the statement actually touched, regardless of whether
they hit the named table or an inheritance child.Yes, agreed. For the plain inheritance cases each row would need to
have an indicator of which relation it comes from (tableoid); I'm not
sure if such a thing would be useful in the partitioning case.
I think it would be about equally useful.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, May 03, 2017 at 11:47:04AM -0400, Robert Haas wrote:
On Tue, May 2, 2017 at 9:44 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:I think that we should only capture transition tuples captured from
the explicitly named relation, since we only fire AFTER STATEMENT
triggers on that relation. I see no inconsistency with the policy of
rejecting transition tables on partitioned tables (as I proposed and
Kevin accepted[1]), because partitioned tables can't have any data so
there would be no point. In contrast, every parent table in an
inheritance hierarchy is also a regular table and can hold data, so I
think we should allow transition tables on them, and capture
transition tuples from that table only when you modify it directly.I suspect that most users would find it more useful to capture all of
the rows that the statement actually touched, regardless of whether
they hit the named table or an inheritance child. I just don't know
if it's practical to make that work. (And, of course, I don't know if
other people agree with my assessment of what is useful ... but
generally there seems to be support for making partitioned tables, at
least, look more like a single table that happens to have partitions
and less like a bunch of separate tables attached to each other with
duct tape.)
+1 on the not-duct-tape view of partitioned tables.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
I suspect that most users would find it more useful to capture all of
the rows that the statement actually touched, regardless of whether
they hit the named table or an inheritance child.Yes, agreed. For the plain inheritance cases each row would need to
have an indicator of which relation it comes from (tableoid); I'm not
sure if such a thing would be useful in the partitioning case.
On Thu, May 4, 2017 at 4:26 AM, David Fetter <david@fetter.org> wrote:
+1 on the not-duct-tape view of partitioned tables.
Hmm. Ok. Are we talking about PG10 or PG11 here? Does this approach
makes sense?
1. Remove the prohibition on creating transition-capturing triggers
on a partitioned table.
2. Make sure that the ExecAR* functions call AfterTriggerSaveEvent
when modifying partition tables if the explicitly named parent
partitioned table has after triggers with transition tables. Not sure
how exactly how but doesn't seem difficult.
3. Convert tuples to the TupleDesc of the relation that owns the
statement trigger (ie the partitioned table) when inserting them into
the tuplestore. One way to do that might be to build an array of
TupleConversionMap objects that does the opposite of the conversions
done by tup_conv_maps. While tup_conv_maps is for converting tuples
to the layout needed for a partition, tup_unconv_maps (or better name)
would be for converting the old and new tuples to the TupleDesc of the
partitioned table. Then the appropriate TupleConversionMap could be
passed into the ExecAR* functions as a new argument 'transition_map'.
AfterTriggerSaveEvent would use 'oldtup' and 'newtup' directly for ROW
triggers, but convert using the passed in map if it needs to insert
them into the transition tuplestores.
The same thing could work for inheritance, if tupconvert.c had a new
kind of conversion that allows slicing of tuples (converting a wider
child table's tuples to the parent's subset of columns) rather the
just conversion between logically equivalent TupleDescs.
To avoid the whiff of duct tape, we'd probably also want to make ROW
triggers created on the partitioned table(s) containing partition to
fire too, with appropriate TypeConversionMap treatment. Not sure what
exactly is involved there.
On the other hand, doing that with inheritance hierarchies would be an
incompatible behavioural change, which I guess people don't want -- am
I right?
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, May 4, 2017 at 4:46 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
I suspect that most users would find it more useful to capture all of
the rows that the statement actually touched, regardless of whether
they hit the named table or an inheritance child.Yes, agreed. For the plain inheritance cases each row would need to
have an indicator of which relation it comes from (tableoid); I'm not
sure if such a thing would be useful in the partitioning case.On Thu, May 4, 2017 at 4:26 AM, David Fetter <david@fetter.org> wrote:
+1 on the not-duct-tape view of partitioned tables.
Hmm. Ok. Are we talking about PG10 or PG11 here? Does this approach
makes sense?
I was thinking PG10 if it can be done straightforwardly.
1. Remove the prohibition on creating transition-capturing triggers
on a partitioned table.2. Make sure that the ExecAR* functions call AfterTriggerSaveEvent
when modifying partition tables if the explicitly named parent
partitioned table has after triggers with transition tables. Not sure
how exactly how but doesn't seem difficult.3. Convert tuples to the TupleDesc of the relation that owns the
statement trigger (ie the partitioned table) when inserting them into
the tuplestore. One way to do that might be to build an array of
TupleConversionMap objects that does the opposite of the conversions
done by tup_conv_maps. While tup_conv_maps is for converting tuples
to the layout needed for a partition, tup_unconv_maps (or better name)
would be for converting the old and new tuples to the TupleDesc of the
partitioned table. Then the appropriate TupleConversionMap could be
passed into the ExecAR* functions as a new argument 'transition_map'.
AfterTriggerSaveEvent would use 'oldtup' and 'newtup' directly for ROW
triggers, but convert using the passed in map if it needs to insert
them into the transition tuplestores.The same thing could work for inheritance, if tupconvert.c had a new
kind of conversion that allows slicing of tuples (converting a wider
child table's tuples to the parent's subset of columns) rather the
just conversion between logically equivalent TupleDescs.To avoid the whiff of duct tape, we'd probably also want to make ROW
triggers created on the partitioned table(s) containing partition to
fire too, with appropriate TypeConversionMap treatment. Not sure what
exactly is involved there.On the other hand, doing that with inheritance hierarchies would be an
incompatible behavioural change, which I guess people don't want -- am
I right?
Incompatible with what? Transition tables haven't been released yet.
If we're going to fix the definition of what they do, now's the time.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, May 5, 2017 at 2:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, May 4, 2017 at 4:46 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
I suspect that most users would find it more useful to capture all of
the rows that the statement actually touched, regardless of whether
they hit the named table or an inheritance child.Yes, agreed. For the plain inheritance cases each row would need to
have an indicator of which relation it comes from (tableoid); I'm not
sure if such a thing would be useful in the partitioning case.On Thu, May 4, 2017 at 4:26 AM, David Fetter <david@fetter.org> wrote:
+1 on the not-duct-tape view of partitioned tables.
Hmm. Ok. Are we talking about PG10 or PG11 here? Does this approach
makes sense?I was thinking PG10 if it can be done straightforwardly.
Ok, I will draft a patch to do it the way I described and see what people think.
To avoid the whiff of duct tape, we'd probably also want to make ROW
triggers created on the partitioned table(s) containing partition to
fire too, with appropriate TypeConversionMap treatment. Not sure what
exactly is involved there.On the other hand, doing that with inheritance hierarchies would be an
incompatible behavioural change, which I guess people don't want -- am
I right?Incompatible with what? Transition tables haven't been released yet.
If we're going to fix the definition of what they do, now's the time.
The last two paragraphs of my email were about ROW triggers created on
partitioned tables and tables with inheritance children, not the new
transition table stuff. I will forget that for now and look only at
making the transition tables duct-tape-free.
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, May 01, 2017 at 11:10:52AM -0500, Kevin Grittner wrote:
On Mon, May 1, 2017 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
It seems pretty clear to me that this is busted.
I don't think you actually tested anything that is dependent on any
of my patches there.Adding this as an open item. Kevin?
It will take some time to establish what legacy behavior is and how
the new transition tables are impacted. My first reaction is that a
trigger on the parent should fire for any related action on a child
(unless maybe the trigger is defined with an ONLY keyword???) using
the TupleDesc of the parent. Note that the SQL spec mandates that
even in a AFTER EACH ROW trigger the transition tables must
represent all rows affected by the STATEMENT. I think that this
should be independent of triggers fired at the row level. I think
the rules should be similar for updateable views.This will take some time to investigate, discuss and produce a
patch. I think best case is Friday.
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. Kevin,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1]/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, May 5, 2017 at 8:29 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Fri, May 5, 2017 at 2:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, May 4, 2017 at 4:46 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
I suspect that most users would find it more useful to capture all of
the rows that the statement actually touched, regardless of whether
they hit the named table or an inheritance child.Yes, agreed. For the plain inheritance cases each row would need to
have an indicator of which relation it comes from (tableoid); I'm not
sure if such a thing would be useful in the partitioning case.On Thu, May 4, 2017 at 4:26 AM, David Fetter <david@fetter.org> wrote:
+1 on the not-duct-tape view of partitioned tables.
Hmm. Ok. Are we talking about PG10 or PG11 here? Does this approach
makes sense?I was thinking PG10 if it can be done straightforwardly.
Ok, I will draft a patch to do it the way I described and see what people think.
FYI I am still working on this and will post a draft patch to do this
(that is: make transition tables capture changes from children with
appropriate tuple conversion) in the next 24 hours.
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, May 8, 2017 at 7:09 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Fri, May 5, 2017 at 8:29 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:On Fri, May 5, 2017 at 2:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, May 4, 2017 at 4:46 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
I suspect that most users would find it more useful to capture all of
the rows that the statement actually touched, regardless of whether
they hit the named table or an inheritance child.Yes, agreed. For the plain inheritance cases each row would need to
have an indicator of which relation it comes from (tableoid); I'm not
sure if such a thing would be useful in the partitioning case.On Thu, May 4, 2017 at 4:26 AM, David Fetter <david@fetter.org> wrote:
+1 on the not-duct-tape view of partitioned tables.
Hmm. Ok. Are we talking about PG10 or PG11 here? Does this approach
makes sense?I was thinking PG10 if it can be done straightforwardly.
Ok, I will draft a patch to do it the way I described and see what people think.
FYI I am still working on this and will post a draft patch to do this
(that is: make transition tables capture changes from children with
appropriate tuple conversion) in the next 24 hours.
Ok, here is a first swing at it, for discussion.
In master, the decision to populate transition tables happens in
AfterTriggerSaveEvent (called by the ExecAR* functions) in trigger.c.
It does that if it sees that there are triggers on the
relation-being-modified that have transition tables.
With this patch, nodeModifyTuple.c makes a 'TriggerTransitionFilter'
object to override that behaviour, if there are child tables of either
kind. That is passed to the ExecAR* functions and thence
AfterTriggerSaveEvent, overriding its usual behaviour, so that it can
know that it needs collect tuples from child nodes and how to convert
them to the layout needed for the tuplestores if necessary.
Thoughts? I'm not yet sure about the locking situation. Generally
needs some more testing.
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
transition-tuples-from-child-tables.patchapplication/octet-stream; name=transition-tuples-from-child-tables.patchDownload+326-59
On Tue, May 9, 2017 at 10:29 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
In master, the decision to populate transition tables happens in
AfterTriggerSaveEvent (called by the ExecAR* functions) in trigger.c.
It does that if it sees that there are triggers on the
relation-being-modified that have transition tables.With this patch, nodeModifyTuple.c makes a 'TriggerTransitionFilter'
object to override that behaviour, if there are child tables of either
kind. That is passed to the ExecAR* functions and thence
AfterTriggerSaveEvent, overriding its usual behaviour, so that it can
know that it needs collect tuples from child nodes and how to convert
them to the layout needed for the tuplestores if necessary.Thoughts? I'm not yet sure about the locking situation. Generally
needs some more testing.
Here is a new version with tidied up tests and a couple of small bug
fixes. However, I've realised that there is a surprising behaviour
with this approach, and I'm not sure what to do about it.
Recall that transition tables can be specified for statement-level
triggers AND row-level triggers. If you specify them for row-level
triggers, then they can see all rows changed so far each time they
fire. Now our policy of firing the statement level triggers only for
the named relation but firing row-level triggers for all modified
relations leads to a tricky problem for the inheritance case: what
type of transition tuples should the child table's row-level triggers
see?
Suppose you have an inheritance hierarchy like this:
animal
-> mammal
-> cat
You define a statement-level trigger on "animal" and another
statement-level trigger on "mammal". You define a row-level trigger
on "cat". When you update either "animal" or "mammal", the row
triggers on "cat" might run. Row-level triggers on "cat" see OLD and
NEW as "cat" tuples, of course, but if they are configured to see
transition tables, should they see "cat", "mammal" or "animal" tuples
in the transition tables? With my patch as it is, that depends on
which level of the hierarchy you explicitly updated!
No such problem exists for partition hierarchies since the tables all
appear as the same type to user code (though conversions may be
happening for technical reasons).
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
transition-tuples-from-child-tables-v2.patchapplication/octet-stream; name=transition-tuples-from-child-tables-v2.patchDownload+344-90
Thomas Munro wrote:
Recall that transition tables can be specified for statement-level
triggers AND row-level triggers. If you specify them for row-level
triggers, then they can see all rows changed so far each time they
fire.
Uhmm ... why do we do this? It seems like a great way to cause much
confusion. Shouldn't we see the transition table containing the whole
set for statement-level triggers only, and give row-level triggers just
the individual affected row each time?
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, May 10, 2017 at 9:57 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Thomas Munro wrote:
Recall that transition tables can be specified for statement-level
triggers AND row-level triggers. If you specify them for row-level
triggers, then they can see all rows changed so far each time they
fire.Uhmm ... why do we do this? It seems like a great way to cause much
confusion. Shouldn't we see the transition table containing the whole
set for statement-level triggers only, and give row-level triggers just
the individual affected row each time?
I assumed that had come from the standard. I don't have a published
standard, but I have just taken a quick look at one of the publicly
available drafts dated 2006. I think its model is that the transition
tables are always conceptually there, and NEW and OLD are just range
variables over those tables. That may explain why transition tables
are mentioned in the context of row-level triggers, and it may be that
the spec's authors never intended row-level triggers to be able to see
the (partial) transition table other than through the range variables
that access exactly one row, but I don't see any wording that
explicitly says so in the spec. Do you? Thoughts, Kevin?
After thinking about this some more, it's not only the conversion to
some arbitrary parent tuple type that would be surprising to a user of
inheritance + triggers + transition tables, it's the fact that a
row-level trigger on a given child table will also see tuples
collected from other tables in the hierarchy. I think I didn't quite
get that right in -v2: it should probably build
TriggerTransitionFilter from union of all child tables' transition
table flags, not just the named table, so that if any child table has
a row trigger we start collecting transition tuples from all others
for it to see... That sounds pretty crazy to me.
So, assuming we want to proceed with this plan of collecting
transition tuples from children, I see approximately 3 choices:
1. Reject transition tables on row-level triggers.
2. Reject transition tables on row-level triggers on tables that
inherit from other tables.
3. Continue to allow transition tables on row-level triggers, but
document that they must be prepared to see transition tuples as they
would when querying arbitrary parent tables, and see tuples from other
tables in the hierarchy (!)
Another possibility which I haven't seriously considered is per-table
transition tables so you'd collect each child's tuples separately.
I take Alvaro's comment as a vote for 1. I vote for 1 too.
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/05/10 6:51, Thomas Munro wrote:
No such problem exists for partition hierarchies since the tables all
appear as the same type to user code (though conversions may be
happening for technical reasons).
To clarify a bit, there may exist differences in the ordering of columns,
either between the parent and its partitions or between different
partitions. For example, while parent's rowtype is (a int, b char, c
float), a partition's may be (b char, a int, c float), and yet another
partition may have (c float, a int, b char). If some user code happens to
depend on the ordering of columns, selecting from the parent and selecting
from a partition directly may return the same result but in different formats.
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, May 10, 2017 at 2:31 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/05/10 6:51, Thomas Munro wrote:
No such problem exists for partition hierarchies since the tables all
appear as the same type to user code (though conversions may be
happening for technical reasons).To clarify a bit, there may exist differences in the ordering of columns,
either between the parent and its partitions or between different
partitions. For example, while parent's rowtype is (a int, b char, c
float), a partition's may be (b char, a int, c float), and yet another
partition may have (c float, a int, b char). If some user code happens to
depend on the ordering of columns, selecting from the parent and selecting
from a partition directly may return the same result but in different formats.
Right. And the patch I posted converts all transition tuples it
collects from child tables to match the TupleDescriptor of the
relation you named, which it gets from
estate->es_root_result_relations[0]. Is that right? I suppose it
will be very common for partitions to have matching TupleDescriptors,
so the TupleConversionMap will usually be NULL meaning no conversion
is ever done. But in the inheritance case they might be different on
purpose, and in both inheritance and partitioning cases they might be
different in physical ways that aren't logically important as you said
(column order, dropped columns).
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers