Delete cascade with three levels bug ?

Started by Marcelo Costaabout 16 years ago6 messages
#1Marcelo Costa
marcelojscosta@gmail.com

Hi, hackers

I have a problem at PostgreSQL 8.3.5 (Slackware Server and Win 2003 Server)
SO independent.

When run the scripts below I receive the error:

---------------------------------------------------------------------------
testes=# DELETE FROM pai WHERE co_pai = 1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
---------------------------------------------------------------------------

The script creates the father table, son and grandson. All conected with FK
Cascade.

I changed the comportment of the pg_trigger to postpone the validation of
constraint
(make compatible with 7.4 version because we migrate it).

When I delete the father register the next error ocourr.

PS1.: On Windows need reinicialize the service.
PS2.: On Linux all connection down but not is needed reinicialize the
service.
PS3.: I make a test on 8.4 version and the same error ocourr.

-------Scripts-------
CREATE TABLE father
(
co_father double precision NOT NULL,
no_description character varying(50) NOT NULL,
CONSTRAINT pk_father PRIMARY KEY (co_father)
)
WITH (OIDS=TRUE);
ALTER TABLE father OWNER TO postgres;

CREATE TABLE son
(
co_son double precision NOT NULL,
co_father double precision NOT NULL,
no_description character varying(50) NOT NULL,
CONSTRAINT pk_son PRIMARY KEY (co_son)
)
WITH (OIDS=TRUE);
ALTER TABLE son OWNER TO postgres;

CREATE TABLE grandson
(
co_grandson double precision NOT NULL,
co_son double precision NOT NULL,
no_description character varying(50) NOT NULL,
CONSTRAINT pk_grandson PRIMARY KEY (co_grandson)
)
WITH (OIDS=TRUE);
ALTER TABLE son OWNER TO postgres;

ALTER TABLE son
ADD CONSTRAINT fk_son_father FOREIGN KEY (co_father)
REFERENCES father (co_father) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE grandson
ADD CONSTRAINT fk_grandson_son FOREIGN KEY (co_son)
REFERENCES son (co_son) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE;

UPDATE pg_constraint SET condeferred = TRUE, condeferrable = TRUE;
UPDATE pg_trigger SET tgdeferrable = TRUE, tginitdeferred = TRUE;

INSERT INTO father VALUES(1, 'FATHER-1');

INSERT INTO son VALUES(1,1, 'Son FATHER-1');
INSERT INTO son VALUES(2,1, 'Son FATHER-1');
INSERT INTO son VALUES(3,1, 'Son FATHER-1');

INSERT INTO grandson VALUES(1,1, 'Grandson FATHER-1');
INSERT INTO grandson VALUES(2,2, 'Grandson FATHER-1');
INSERT INTO grandson VALUES(3,3, 'Grandson FATHER-1');

DELETE FROM father WHERE co_father = 1

My LOGS
-----------------------------------------------------------------------------------------------------------

0LOG: 00000: server process (PID 23470) was terminated by signal 11:
Segmentation fault
0LOCATION: LogChildExit, postmaster.c:2529
0LOG: 00000: terminating any other active server processes
0LOCATION: HandleChildCrash, postmaster.c:2374
0FATAL: 57P03: the database system is in recovery mode
0LOCATION: ProcessStartupPacket, postmaster.c:1648
0LOG: 00000: all server processes terminated; reinitializing
0LOCATION: PostmasterStateMachine, postmaster.c:2690
0LOG: 00000: database system was interrupted; last known up at 2009-10-27
11:43:37 BRST
0LOCATION: StartupXLOG, xlog.c:4836
0DEBUG: 00000: checkpoint record is at 1/1C865BD4
0LOCATION: StartupXLOG, xlog.c:4906
0DEBUG: 00000: redo record is at 1/1C865BD4; shutdown TRUE
0LOCATION: StartupXLOG, xlog.c:4932
0DEBUG: 00000: next transaction ID: 0/27113; next OID: 67190
0LOCATION: StartupXLOG, xlog.c:4936
0DEBUG: 00000: next MultiXactId: 1; next MultiXactOffset: 0
0LOCATION: StartupXLOG, xlog.c:4939
0LOG: 00000: database system was not properly shut down; automatic recovery
in progress
0LOCATION: StartupXLOG, xlog.c:5003
0LOG: 00000: record with zero length at 1/1C865C14
0LOCATION: ReadRecord, xlog.c:3126
0LOG: 00000: redo is not required
0LOCATION: StartupXLOG, xlog.c:5146
0DEBUG: 00000: transaction ID wrap limit is 2147484026, limited by database
"template1"
0LOCATION: SetTransactionIdLimit, varsup.c:283
0LOG: 00000: database system is ready to accept connections
0LOCATION: reaper, postmaster.c:2156

-----------------------------------------------------------------------------------------------------------------------------

Has I know, the deletion cascade with three levels don't give this problem.
The same think occourr if I change the pg_trigger comportment to postpone
validation. This is right?

Sorry, but what wrap limit do ?

Thanks in advanced.

Sincerely,
--
Marcelo Costa
<http://www.marcelocosta.net&gt;
-------------------------------------------------
“You can't always get what want”,

Doctor House in apology to Mike Jagger

In reply to: Marcelo Costa (#1)
Re: Delete cascade with three levels bug ?

Marcelo Costa escreveu:

Hi, hackers

I have a problem at PostgreSQL 8.3.5 (Slackware Server and Win 2003
Server) SO independent.

When run the scripts below I receive the error:

This is not a bug. There are many ways to shoot yourself in the foot; and it
is one of them...

UPDATE pg_trigger SET tgdeferrable = TRUE, tginitdeferred = TRUE;

This query will defer all of the triggers, including the cascade ones. If you
compiles with --enable-cassert you will see:

TRAP: FailedAssertion("!(afterTriggers->query_depth >= 0)", Arquivo:
"/a/pgsql/src/backend/commands/trigger.c", Linha: 3893)

Why don't you use the syntax (DEFERRABLE and INITIALLY DEFERRED)? That will do
the Right Thing (TM).

BTW, is it worth preventing such a crash putting an elog message in trigger.c?

--
Euler Taveira de Oliveira
http://www.timbira.com/

#3Robert Haas
robertmhaas@gmail.com
In reply to: Euler Taveira de Oliveira (#2)
Re: Delete cascade with three levels bug ?

On Tue, Oct 27, 2009 at 10:41 AM, Euler Taveira de Oliveira
<euler@timbira.com> wrote:

Marcelo Costa escreveu:

Hi, hackers

I have a problem at PostgreSQL 8.3.5 (Slackware Server and Win 2003
Server) SO independent.

When run the scripts below I receive the error:

This is not a bug. There are many ways to shoot yourself in the foot; and it
is one of them...

UPDATE pg_trigger SET tgdeferrable = TRUE, tginitdeferred = TRUE;

This query will defer all of the triggers, including the cascade ones. If you
compiles with --enable-cassert you will see:

TRAP: FailedAssertion("!(afterTriggers->query_depth >= 0)", Arquivo:
"/a/pgsql/src/backend/commands/trigger.c", Linha: 3893)

Why don't you use the syntax (DEFERRABLE and INITIALLY DEFERRED)? That will do
the Right Thing (TM).

BTW, is it worth preventing such a crash putting an elog message in trigger.c?

It doesn't seem right to allow a catalog change that results in an
assertion failure. Seems like we should either prevent the catalog
change, or have an elog() there rather than Assert().

...Robert

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: Delete cascade with three levels bug ?

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Oct 27, 2009 at 10:41 AM, Euler Taveira de Oliveira

BTW, is it worth preventing such a crash putting an elog message in trigger.c?

It doesn't seem right to allow a catalog change that results in an
assertion failure. Seems like we should either prevent the catalog
change, or have an elog() there rather than Assert().

We can't prevent a superuser from making stupid catalog changes. I
agree though that it's not very nice to Assert or dump core afterwards.
Changing the assert to an elog seems like an appropriate answer,
assuming that this isn't masking some more significant issue.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marcelo Costa (#1)
Re: Delete cascade with three levels bug ?

Marcelo Costa <marcelojscosta@gmail.com> writes:

[ trying to defer RI_FKey_cascade_del trigger crashes the backend ]

I looked at this a bit more and think that there's more to it than pilot
error. The crash occurs because we queue a deferred trigger here:

#0 AfterTriggerSaveEvent (relinfo=0x40181778, event=1, row_trigger=1 '\001', oldtup=0x4018c1b8, newtup=0x0, recheckIndexes=0x0, modifiedCols=0x0) at trigger.c:3890
#1 0x2db9c0 in ExecARDeleteTriggers (estate=0x40181778, relinfo=0x40181778, tupleid=0x4018c1b8) at trigger.c:1896
#2 0x317124 in ExecDelete (tupleid=0x7b03c388, planSlot=0x4018b2c0, epqstate=0x4018b664, estate=0x401816a8) at nodeModifyTable.c:368
#3 0x317894 in ExecModifyTable (node=0x4018b618) at nodeModifyTable.c:738
#4 0x2fe01c in ExecProcNode (node=0x4018b618) at execProcnode.c:359
#5 0x2fc034 in ExecutePlan (estate=0x401816a8, planstate=0x4018b618, operation=CMD_DELETE, sendTuples=0 '\000', numberTuples=0, direction=ForwardScanDirection, dest=0x40003a80) at execMain.c:1188
#6 0x2fadf0 in standard_ExecutorRun (queryDesc=0x4017e2a0, direction=ForwardScanDirection, count=0) at execMain.c:278
#7 0x2fac94 in ExecutorRun (queryDesc=0x40181778, direction=ForwardScanDirection, count=1) at execMain.c:227
#8 0x3268c8 in _SPI_pquery (queryDesc=0x4017e2a0, fire_triggers=0 '\000', tcount=0) at spi.c:2009
#9 0x326520 in _SPI_execute_plan (plan=0x40180288, paramLI=0x4017e280, snapshot=0x0, crosscheck_snapshot=0x0, read_only=0, fire_triggers=0 '\000', tcount=0) at spi.c:1831
#10 0x32383c in SPI_execute_snapshot (plan=0x40180288, Values=0x7b03be88, Nulls=0x7b03bf88 " ", snapshot=0x0, crosscheck_snapshot=0x0, read_only=0, fire_triggers=0, tcount=0) at spi.c:441
#11 0x493d08 in ri_PerformCheck (qkey=0x7b03bb20, qplan=0x40180288, fk_rel=0x40133600, pk_rel=0x401262f0, old_tuple=0x7b03b5a8, new_tuple=0x0, detectNewRows=1, expect_OK=8, constrname=0x7b03b8cc "fk_son_father") at ri_triggers.c:3350
#12 0x4901d4 in RI_FKey_cascade_del (fcinfo=0x40181778) at ri_triggers.c:1166
#13 0x2db140 in ExecCallTriggerFunc (trigdata=0x7b03b588, tgindx=0, finfo=0x7b03b850, instr=0x0, per_tuple_context=0x40153b60) at trigger.c:1631
#14 0x2dccc8 in AfterTriggerExecute (event=0x40170678, rel=0x401262f0, trigdesc=0x1, finfo=0x4016a0d0, instr=0x0, per_tuple_context=0x40153b60) at trigger.c:2830
#15 0x2dd038 in afterTriggerInvokeEvents (events=0x40170a58, firing_id=1, estate=0x40169e40, delete_ok=1 '\001') at trigger.c:3009
#16 0x2dd5ac in AfterTriggerFireDeferred () at trigger.c:3247
#17 0x1fd230 in CommitTransaction () at xact.c:1581
#18 0x1fdfe8 in CommitTransactionCommand () at xact.c:2323
#19 0x3ed0e8 in finish_xact_command () at postgres.c:2372
#20 0x3eae20 in exec_simple_query (query_string=0x400625b8 "DELETE FROM father WHERE co_father = 1;") at postgres.c:1039
#21 0x3ef344 in PostgresMain (argc=2, argv=0x51, username=0x4005f9d8 "postgres") at postgres.c:3573

when we are not inside any AfterTriggerBeginQuery/AfterTriggerEndQuery
pair. Normally _SPI_pquery() would take care of that detail, but it's
been specifically told not to by the RI trigger code (notice the
fire_triggers=0 arguments). It is not immediately obvious that an
RI trigger query could never cause AFTER triggers to be queued, so
I think this is at least a latent bug, even if this particular symptom
involves intentional misconfiguration of standard triggers.

The two reasonable fixes seem to be to change our minds about not
letting SPI fire deferred triggers in this usage, or to add code to
AfterTriggerFireDeferred to invoke AfterTriggerBegin/EndQuery.
I do not recall what the reasoning was for deferring triggers during
ri_triggers calls, and am afraid to monkey with that behavior unless
someone can lay out an argument why it's okay to change. So modifying
AfterTriggerFireDeferred seems like the right solution.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: Delete cascade with three levels bug ?

I wrote:

The crash occurs because we queue a deferred trigger here ...
when we are not inside any AfterTriggerBeginQuery/AfterTriggerEndQuery
pair. Normally _SPI_pquery() would take care of that detail, but it's
been specifically told not to by the RI trigger code (notice the
fire_triggers=0 arguments). It is not immediately obvious that an
RI trigger query could never cause AFTER triggers to be queued, so
I think this is at least a latent bug, even if this particular symptom
involves intentional misconfiguration of standard triggers.

After further study I've gone back to the let's-just-make-the-assert-an-elog
camp. The reason we have an issue is exactly that the cascade-delete
trigger assumes that it will never be deferred, which is how come it
can get away with telling SPI to not deal with any invoked AFTER triggers
immediately. There isn't any good reason to support deferring that
trigger, nor is there any way to get into the state short of clueless
superuser manipulation of the catalogs.

I did look at modifying AfterTriggerFireDeferred, but concluded that's
not a great solution on two grounds:

* there doesn't seem to be any way to do it without taking a noticeable
performance hit (in particular, an extra useless scan over the deferred
trigger queue to verify everything's been fired);

* the crash occurs all the way back to 8.0, and such a fix would be a
great deal more trouble to back-port.

So elog it is ...

regards, tom lane