BUG #17817: DISABLE TRIGGER ALL on a partitioned table with foreign key fails
The following bug has been logged on the website:
Bug reference: 17817
Logged by: Alan Hodgson
Email address: ahodgson@simkin.ca
PostgreSQL version: 15.2
Operating system: linux-amd64
Description:
This works on 14.7. It fails on 15.2. The Ruby on Rails test suites use
DISABLE TRIGGER ALL extensively.
BEGIN;
CREATE TABLE test_fk (id serial primary key);
CREATE TABLE test_table (test serial, created_at timestamp not null, fk_id int not null references test_fk(id)) PARTITION BY RANGE (created_at);
CREATE TABLE test_table_2017 PARTITION OF test_table FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
ALTER TABLE test_table DISABLE TRIGGER ALL;
ROLLBACK;
ERROR: trigger "RI_ConstraintTrigger_c_46838897" for table "test_table_2017" does not exist
Yeah, duplicated here. Bisecting says it broke at
commit ec0925c22a3da7199650c9903a03a0017705ed5c
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu Aug 4 20:02:02 2022 +0200
Fix ENABLE/DISABLE TRIGGER to handle recursion correctly
Using ATSimpleRecursion() in ATPrepCmd() to do so as bbb927b4db9b did is
not correct, because ATPrepCmd() can't distinguish between triggers that
may be cloned and those that may not, so would wrongly try to recurse
for the latter category of triggers.
So this commit restores the code in EnableDisableTrigger() that
86f575948c77 had added to do the recursion, which would do it only for
triggers that may be cloned, that is, row-level triggers. This also
changes tablecmds.c such that ATExecCmd() is able to pass the value of
ONLY flag down to EnableDisableTrigger() using its new 'recurse'
parameter.
Interestingly, although that commit was back-patched to v11, the failure
does not occur in pre-v15 branches. So what's different about v15?
One clue is that the contents of pg_trigger are quite a bit different:
# select oid, tgparentid, tgrelid::regclass, tgname from pg_trigger where tgrelid in ('test_fk'::regclass, 'test_table'::regclass, 'test_table_2017'::regclass);
  oid  | tgparentid |     tgrelid     |            tgname
-------+------------+-----------------+------------------------------
 40997 |          0 | test_fk         | RI_ConstraintTrigger_a_40997
 40998 |          0 | test_fk         | RI_ConstraintTrigger_a_40998
 40999 |          0 | test_table      | RI_ConstraintTrigger_c_40999
 41000 |          0 | test_table      | RI_ConstraintTrigger_c_41000
 41006 |      40999 | test_table_2017 | RI_ConstraintTrigger_c_41006
 41007 |      41000 | test_table_2017 | RI_ConstraintTrigger_c_41007
(6 rows)
# ALTER TABLE test_table DISABLE TRIGGER ALL;
ERROR: trigger "RI_ConstraintTrigger_c_40999" for table "test_table_2017" does not exist
whereas in v14 I see
# select oid, tgparentid, tgrelid::regclass, tgname from pg_trigger where tgrelid in ('test_fk'::regclass, 'test_table'::regclass, 'test_table_2017'::regclass);
  oid  | tgparentid |     tgrelid     |            tgname
-------+------------+-----------------+------------------------------
 38169 |          0 | test_fk         | RI_ConstraintTrigger_a_38169
 38170 |          0 | test_fk         | RI_ConstraintTrigger_a_38170
 38176 |          0 | test_table_2017 | RI_ConstraintTrigger_c_38176
 38177 |          0 | test_table_2017 | RI_ConstraintTrigger_c_38177
(4 rows)
It's a reasonable bet that we're trying to look up the child trigger
using the name of its parent trigger ... but why are we searching by
name at all, rather than OID? Seems mighty failure-prone.
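The suspected name mismatch can be checked directly in the catalog. The query below is a sketch, assuming the tables from the report exist and the server has the tgparentid column in pg_trigger (v13 or later). The column aliases are illustrative only. It pairs each trigger on the partition with its parent trigger and reports whether the two names agree.

```sql
-- Pair each cloned trigger on the partition with its parent trigger.
-- The aliases (child_table, names_match, ...) are illustrative,
-- not catalog columns.
SELECT c.tgrelid::regclass AS child_table,
       c.tgname            AS child_trigger,
       p.tgname            AS parent_trigger,
       c.tgname = p.tgname AS names_match
FROM pg_trigger AS c
JOIN pg_trigger AS p ON p.oid = c.tgparentid
WHERE c.tgrelid = 'test_table_2017'::regclass
ORDER BY c.oid;
```

Going by the v15 listing above, names_match would be false for both rows, because each RI trigger carries its own OID in its name.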
Stack trace from the lookup failure is
#0 errfinish (filename=0xac4aeb "trigger.c", lineno=1838,
funcname=0xad8fe0 <__func__.30665> "EnableDisableTrigger") at elog.c:480
#1 0x00000000004c968f in EnableDisableTrigger (rel=<optimized out>,
tgname=0x7f10283ad88c "RI_ConstraintTrigger_c_40999",
fires_when=<optimized out>, skip_system=false, recurse=true, lockmode=6)
at trigger.c:1835
#2 0x000000000069a717 in EnableDisableTrigger (rel=rel@entry=0x7f1031892768,
tgname=tgname@entry=0x0, fires_when=fires_when@entry=68 'D',
skip_system=false, recurse=true, lockmode=6) at trigger.c:1819
#3 0x0000000000691db4 in ATExecEnableDisableTrigger (
lockmode=<optimized out>, recurse=<optimized out>,
skip_system=<optimized out>, fires_when=<optimized out>,
trigname=<optimized out>, rel=<optimized out>) at tablecmds.c:14729
#4 ATExecCmd (wqueue=0x7fffb8534408, tab=0x17f4788, cmd=<optimized out>,
lockmode=6, cur_pass=<optimized out>, context=0x7fffb85345a0)
at tablecmds.c:5165
#5 0x0000000000693108 in ATRewriteCatalogs (context=0x7fffb85345a0,
lockmode=6, wqueue=0x7fffb8534408)
at ../../../src/include/nodes/nodes.h:193
regards, tom lane
On 2023-Mar-01, Tom Lane wrote:
Interestingly, although that commit was back-patched to v11, the failure
does not occur in pre-v15 branches. So what's different about v15?
Hmm, I think f4566345cf40 probably explains that.
It's a reasonable bet that we're trying to look up the child trigger
using the name of its parent trigger ... but why are we searching by
name at all, rather than OID? Seems mighty failure-prone.
I have no recollection of this, but we probably didn't have the OID
originally. It may be that simply changing that is enough to solve the
problem. I'll try to have a look later today, but I'm not sure I'll
have time.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2023-Mar-01, Tom Lane wrote:
It's a reasonable bet that we're trying to look up the child trigger
using the name of its parent trigger ... but why are we searching by
name at all, rather than OID? Seems mighty failure-prone.
I have no recollection of this, but we probably didn't have the OID
originally. It may be that simply changing that is enough to solve the
problem. I'll try to have a look later today, but I'm not sure I'll
have time.
I can throw together a patch for what I was thinking of.
regards, tom lane
I wrote:
I can throw together a patch for what I was thinking of.
Basically just make the recursive steps match on tgparentid instead
of name, like this.
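The difference between the two lookup strategies can be approximated from SQL. This is a catalog-level sketch of the idea only, not the patch, and it assumes the tables from the original report exist on a v15 server. The aliases found_by_name and found_by_parentid are made up for illustration. For each trigger on the parent table, it counts how many triggers on its direct children would be found by name and how many by tgparentid.

```sql
-- For every trigger on the partitioned parent, count the child-table
-- triggers that match by name versus by tgparentid linkage.
-- found_by_name / found_by_parentid are illustrative aliases.
SELECT p.tgname AS parent_trigger,
       count(*) FILTER (WHERE c.tgname = p.tgname)    AS found_by_name,
       count(*) FILTER (WHERE c.tgparentid = p.oid)   AS found_by_parentid
FROM pg_trigger AS p
JOIN pg_inherits AS i ON i.inhparent = p.tgrelid
LEFT JOIN pg_trigger AS c ON c.tgrelid = i.inhrelid
WHERE p.tgrelid = 'test_table'::regclass
GROUP BY p.oid, p.tgname
ORDER BY p.oid;
```

With the v15 catalog contents shown earlier in the thread, each parent trigger should report zero matches by name and one match by tgparentid, which is the case the name-based recursion trips over.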
I wonder whether anyplace else is making a similar mistake? Although
there's not much we will let you do to a foreign key trigger, so it
might not matter for anything else.
regards, tom lane
Attachments:
fix-bug-17817.patch (text/x-diff; charset=us-ascii; +60 -5)
On 2023-Mar-03, Tom Lane wrote:
I wrote:
I can throw together a patch for what I was thinking of.
Basically just make the recursive steps match on tgparentid instead
of name, like this.
Thank you, looks sane to me.
I wonder whether anyplace else is making a similar mistake? Although
there's not much we will let you do to a foreign key trigger, so it
might not matter for anything else.
Right ... triggers created in the normal way would have matching names,
so the previous code would work correctly.
I wonder how come this problem took so long to be detected with Ruby on
Rails; it's been in released 13.x and 14.x for seven months now. I
suppose it would be very useful if the Ruby on Rails community would run
their tests more often on new Postgres versions (or even on the tip of
stable branches).
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
And a voice from the chaos spoke to me and said,
"Smile and be happy, it could be worse."
And I smiled. And I was happy.
And it got worse.
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2023-Mar-03, Tom Lane wrote:
Basically just make the recursive steps match on tgparentid instead
of name, like this.
Thank you, looks sane to me.
OK, will work on getting it committed.
I wonder how come this problem took so long to be detected with Ruby on
Rails; it's been in released 13.x and 14.x for seven months now. I
suppose it would be very useful if the Ruby on Rails community would run
their tests more often on new Postgres versions (or even on the tip of
stable branches).
Um ... 13.x and 14.x aren't showing the problem, or is there something
I missed? But I agree that it'd be good if we could get some Ruby folk
to test more promptly --- it's pretty sad that this didn't get noticed
sooner in v15.
regards, tom lane
I wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
I wonder how come this problem took so long to be detected with Ruby on
Rails; it's been in released 13.x and 14.x for seven months now.
Um ... 13.x and 14.x aren't showing the problem, or is there something
I missed?
Oh! Running the same test shows that while 12.x through 14.x do not
throw an error, they don't disable the child table's triggers either.
Moreover, we can't apply this fix idea since there is no tgparentid
linkage (or indeed any parent trigger to link to).
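One way to observe this is to inspect tgenabled after the ALTER. The following is a sketch that reuses the table definitions from the original report. It rolls everything back at the end, and it needs a superuser, since DISABLE TRIGGER ALL also touches internally generated constraint triggers. On an affected v15 server the ALTER raises the reported error instead, and the SELECT then fails because the transaction is aborted.

```sql
BEGIN;
CREATE TABLE test_fk (id serial primary key);
CREATE TABLE test_table (test serial, created_at timestamp not null,
    fk_id int not null references test_fk(id)) PARTITION BY RANGE (created_at);
CREATE TABLE test_table_2017 PARTITION OF test_table
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

ALTER TABLE test_table DISABLE TRIGGER ALL;

-- tgenabled: 'O' = fires in origin/local mode (enabled), 'D' = disabled
SELECT tgrelid::regclass, tgname, tgenabled
FROM pg_trigger
WHERE tgrelid IN ('test_table'::regclass, 'test_table_2017'::regclass)
ORDER BY tgrelid, oid;
ROLLBACK;
```

Rows for test_table_2017 that still show 'O' are triggers the command did not reach.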
It's not hard to see one way to fix it: if the initial call is "for
all triggers", forget about recursing for individual triggers and
instead recursively do a "for all triggers" on the child. However,
that would be the sort of semantics change that people tend to bitch
about in stable branches, because it'd nuke non-inherited triggers
too.
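The semantics of that alternative can be imitated from the client side. The block below is only a user-level sketch, not a proposal for the server code. It assumes the test_table hierarchy from the report, a superuser session, and a server with pg_partition_tree() (v12 or later). It issues a non-recursive DISABLE TRIGGER ALL against every table in the partition tree.

```sql
-- Apply "DISABLE TRIGGER ALL" table by table across the partition tree.
-- ONLY keeps each ALTER from recursing on its own.
DO $$
DECLARE
    part regclass;
BEGIN
    FOR part IN
        SELECT relid FROM pg_partition_tree('test_table'::regclass)
    LOOP
        EXECUTE format('ALTER TABLE ONLY %s DISABLE TRIGGER ALL', part);
    END LOOP;
END
$$;
```

Note that this disables every trigger on every partition, including triggers that were created directly on a partition and have no counterpart on the parent. That is the behavior change described above.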
I'm kind of inclined to leave things alone pre-v15. I assume the
existing behavior had been that way all along, or do you have reason
to think it changed recently in those branches?
regards, tom lane