foreign keys and RI triggers

Started by Neil Conwayover 20 years ago16 messages
#1Neil Conway
neilc@samurai.com

I spent a little while looking into a performance issue with a large
UPDATE on a table with foreign keys. A few questions:

(1) When a PK table is updated, we skip firing the per-row UPDATE RI
triggers if none of the referenced columns in the PK table have been
modified. However, AFAICS we do not apply a similar optimization for
updates of foreign key tables: if a user does not modify the foreign key
column, we needn't check for the presence of the FK column value in the
primary key table. Is there a reason we don't implement this?

(2) For per-row RI triggers of all kinds, we save the trigger under
CurTransactionContext and invoke it at the end of the current query.
There is not even overflow to disk (the report that prompted me to look
into this was someone's database crashing because they kept running OOM
when doing an UPDATE of a large table with FKs on a pretty lowend
machine). While avoiding consuming a lot of memory for queued trigger
execution is worth doing anyway, ISTM we needn't queue RI triggers in
the first place. Is there a reason we can't just invoke after-row RI
triggers immediately?

(Hmm, I suppose we would need to defer firing the trigger until the
command ID is incremented if the foreign key references its own table.
But even so, this should not be an issue for non-self-referential
foreign keys.)

(3) This is minor, but AFAICS RI_FKey_check_upd() is not used --
RI_FKey_check_ins() is used to validate both inserts and updates on
tables with foreign keys (see tablecmds.c circa 4423). Both functions
are just wrappers over RI_FKey_check() anyway. This is rather confusing;
would anyone object if I removed both functions and made RI_FKey_check()
public?

-Neil

#2Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Neil Conway (#1)
Re: foreign keys and RI triggers

On Thu, 26 May 2005, Neil Conway wrote:

I spent a little while looking into a performance issue with a large
UPDATE on a table with foreign keys. A few questions:

(1) When a PK table is updated, we skip firing the per-row UPDATE RI
triggers if none of the referenced columns in the PK table have been
modified. However, AFAICS we do not apply a similar optimization for
updates of foreign key tables: if a user does not modify the foreign key
column, we needn't check for the presence of the FK column value in the
primary key table. Is there a reason we don't implement this?

Are you sure? RI_FKey_Check seems to have a section on
TRIGGER_FIRED_BY_UPDATE which seems to check if the keys are equal if the
old row wasn't part of this transaction. I'm not sure why it's doing the
transaction id check, but it looks like it will do an equals check at
least some of the time.

(2) For per-row RI triggers of all kinds, we save the trigger under
CurTransactionContext and invoke it at the end of the current query.
There is not even overflow to disk (the report that prompted me to look
into this was someone's database crashing because they kept running OOM
when doing an UPDATE of a large table with FKs on a pretty lowend
machine). While avoiding consuming a lot of memory for queued trigger
execution is worth doing anyway, ISTM we needn't queue RI triggers in
the first place. Is there a reason we can't just invoke after-row RI
triggers immediately?

If I'm understanding the question, there's two things. First is deferred
constraints and the second is that constraints happen after the entire
statement.

In a case like:
insert into pk values(2);
insert into pk values(1);
insert into fk values(2);
update pk set key=key+1;

In no action, that's not an error AFAIK because the constraint is
satisfied at end of statement. If the order of updates happened such that
the key=2 row were updated first we couldn't know whether or not the
constraint would be satisfied by later updates to the same table.

#3Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Stephan Szabo (#2)
Re: foreign keys and RI triggers

On Thu, 26 May 2005, Stephan Szabo wrote:

On Thu, 26 May 2005, Neil Conway wrote:

(2) For per-row RI triggers of all kinds, we save the trigger under
CurTransactionContext and invoke it at the end of the current query.
There is not even overflow to disk (the report that prompted me to look
into this was someone's database crashing because they kept running OOM
when doing an UPDATE of a large table with FKs on a pretty lowend
machine). While avoiding consuming a lot of memory for queued trigger
execution is worth doing anyway, ISTM we needn't queue RI triggers in
the first place. Is there a reason we can't just invoke after-row RI
triggers immediately?

If I'm understanding the question, there's two things. First is deferred
constraints and the second is that constraints happen after the entire
statement.

In a case like:
insert into pk values(2);
insert into pk values(1);
insert into fk values(2);
update pk set key=key+1;

In no action, that's not an error AFAIK because the constraint is
satisfied at end of statement. If the order of updates happened such that
the key=2 row were updated first we couldn't know whether or not the
constraint would be satisfied by later updates to the same table.

Although, we could almost run referential actions that way. The almost
comes from some behavior for set default cases to make sure the default
still exists which I think would still have to happen at end of statement
to be spec complient (I don't have easy access to my copy of SQL99 from
here). I think we're still a little short of entirely complient on timing
in any case because unless I'm misremembering constraint checks happen
after user defined triggers and referential actions happen before which
would be difficult if not impossible to do right now with the way we fire
triggers.

#4Neil Conway
neilc@samurai.com
In reply to: Stephan Szabo (#2)
Re: foreign keys and RI triggers

Stephan Szabo wrote:

Are you sure? RI_FKey_Check seems to have a section on
TRIGGER_FIRED_BY_UPDATE which seems to check if the keys are equal if the
old row wasn't part of this transaction.

Well, regardless of how RI_FKey_Check() itself works, ISTM there is no
need to enqueue the RI trigger in the first place. That's when the
update-on-PK-table optimization is applied -- see trigger.c circa 3005.
The specific case I was looking at resulted in the postgres backend
allocating a few hundred MB just to store all the pending RI triggers,
even though the UPDATE in question didn't change the foreign key field,
so it didn't matter a great deal how quickly RI_FKey_Check() was able to
bail out.

If I'm understanding the question, there's two things. First is deferred
constraints

Right -- obviously we can't fire RI triggers for deferred constraints
immediately. Immediate constraints are the common case, though.

constraints happen after the entire statement.
In a case like:
insert into pk values(2);
insert into pk values(1);
insert into fk values(2);
update pk set key=key+1;

Hmm, good point. But ISTM there are still some circumstances in which we
can safely check the RI trigger immediately, rather than at end of
statement. For example, updating the FK table, inserting into the FK
table, or deleting from the PK table.

-Neil

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#2)
Re: foreign keys and RI triggers

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

Are you sure? RI_FKey_Check seems to have a section on
TRIGGER_FIRED_BY_UPDATE which seems to check if the keys are equal if the
old row wasn't part of this transaction. I'm not sure why it's doing the
transaction id check, but it looks like it will do an equals check at
least some of the time.

I think the reason for the xact check is that if we have deferred
triggers and we do

begin;
insert new FK value;
update new row;
commit;

then when the trigger for the INSERT event fires, it will do nothing
because the tuple it's triggered on is now dead. So the trigger for
the UPDATE event had better make the check. It's possible we could
skip the UPDATE event if we could be certain the INSERT trigger had
already fired, but I'm not sure how to be certain about that.

While avoiding consuming a lot of memory for queued trigger
execution is worth doing anyway, ISTM we needn't queue RI triggers in
the first place. Is there a reason we can't just invoke after-row RI
triggers immediately?

If I'm understanding the question, there's two things. First is deferred
constraints and the second is that constraints happen after the entire
satement.

Right. RI constraints are actually the only kind we do "right" in
terms of enforcing the check when the SQL spec says we should.

The thoughts I've had about special-casing RI events to save memory
have to do with the idea of lossy storage. As you accumulate more
per-row events, at some point it becomes more efficient to forget
the individual rows and just reapply the original full-table check
query when it's time to check the constraint. So if we could recognize
RI events as being associated with the same constraint, and keep track
of how many are pending for each constraint, we could make a decision to
discard the queue and instead register one event to apply a full-table
check. It's not clear how to do that efficiently though.

regards, tom lane

#6Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Neil Conway (#4)
Re: foreign keys and RI triggers

On Fri, 27 May 2005, Neil Conway wrote:

Stephan Szabo wrote:

Are you sure? RI_FKey_Check seems to have a section on
TRIGGER_FIRED_BY_UPDATE which seems to check if the keys are equal if the
old row wasn't part of this transaction.

Well, regardless of how RI_FKey_Check() itself works, ISTM there is no
need to enqueue the RI trigger in the first place. That's when the
update-on-PK-table optimization is applied -- see trigger.c circa 3005.
The specific case I was looking at resulted in the postgres backend
allocating a few hundred MB just to store all the pending RI triggers,
even though the UPDATE in question didn't change the foreign key field,
so it didn't matter a great deal how quickly RI_FKey_Check() was able to
bail out.

Okay, I can't think of cases even with triggers and the like where
removing the check on equal valued rows would give appreciably different
results, but I haven't thought too hard about it.

If I'm understanding the question, there's two things. First is deferred
constraints

Right -- obviously we can't fire RI triggers for deferred constraints
immediately. Immediate constraints are the common case, though.

constraints happen after the entire statement.
In a case like:
insert into pk values(2);
insert into pk values(1);
insert into fk values(2);
update pk set key=key+1;

Hmm, good point. But ISTM there are still some circumstances in which we
can safely check the RI trigger immediately, rather than at end of
statement. For example, updating the FK table, inserting into the FK
table, or deleting from the PK table.

Unfortunately, I don't think so, if my assumption that user triggers are
supposed to happen before constraint checks is true. In that case, we
must wait until not only the action but all triggers fired by that action
happen in order to run the constraint check because a trigger could make
an otherwise invalid row valid.

#7Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Stephan Szabo (#6)
Re: foreign keys and RI triggers

On Thu, 26 May 2005, Stephan Szabo wrote:

On Fri, 27 May 2005, Neil Conway wrote:

Stephan Szabo wrote:

Are you sure? RI_FKey_Check seems to have a section on
TRIGGER_FIRED_BY_UPDATE which seems to check if the keys are equal if the
old row wasn't part of this transaction.

Well, regardless of how RI_FKey_Check() itself works, ISTM there is no
need to enqueue the RI trigger in the first place. That's when the
update-on-PK-table optimization is applied -- see trigger.c circa 3005.
The specific case I was looking at resulted in the postgres backend
allocating a few hundred MB just to store all the pending RI triggers,
even though the UPDATE in question didn't change the foreign key field,
so it didn't matter a great deal how quickly RI_FKey_Check() was able to
bail out.

Okay, I can't think of cases even with triggers and the like where
removing the check on equal valued rows would give appreciably different
results, but I haven't thought too hard about it.

Err, except the case that Tom mentions in his message.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#7)
Re: foreign keys and RI triggers

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

Okay, I can't think of cases even with triggers and the like where
removing the check on equal valued rows would give appreciably different
results, but I haven't thought too hard about it.

Err, except the case that Tom mentions in his message.

But the check could incorporate the same transaction ID test already
in use. I think Neil is right that it'd be a win to apply the test
before enqueueing the trigger instead of after.

regards, tom lane

#9Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#8)
Re: foreign keys and RI triggers

On Thu, 26 May 2005, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

Okay, I can't think of cases even with triggers and the like where
removing the check on equal valued rows would give appreciably different
results, but I haven't thought too hard about it.

Err, except the case that Tom mentions in his message.

But the check could incorporate the same transaction ID test already
in use. I think Neil is right that it'd be a win to apply the test
before enqueueing the trigger instead of after.

Good point. That would help in many cases anyway.

#10Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#5)
Re: foreign keys and RI triggers

On Thu, 26 May 2005, Tom Lane wrote:

The thoughts I've had about special-casing RI events to save memory
have to do with the idea of lossy storage. As you accumulate more
per-row events, at some point it becomes more efficient to forget
the individual rows and just reapply the original full-table check
query when it's time to check the constraint. So if we could recognize

One problem with that is that it works for the constraint check but not
for referential actions, although if we instead fired the referential
actions truly immediately rather than queued to statement end that'd
prevent those from being an issue. The only thing there is that we'd have
to also have a constraint check for at least set default.

RI events as being associated with the same constraint, and keep track
of how many are pending for each constraint, we could make a decision to
discard the queue and instead register one event to apply a full-table
check. It's not clear how to do that efficiently though.

Yeah, I was thinking we could keep a separate structure for (foreign key
trigger oid, action) where we could keep track of a current count and
whether or not we've consolidated already and scan the queue when we do
the consolidation removing items for that oid. That's still not very good
though.

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#10)
Re: foreign keys and RI triggers

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

On Thu, 26 May 2005, Tom Lane wrote:

The thoughts I've had about special-casing RI events to save memory
have to do with the idea of lossy storage.

One problem with that is that it works for the constraint check but not
for referential actions,

True. But even fixing it only for constraint checks would be a win.

Sooner or later, we will have to implement spill-to-disk for the trigger
event queue anyway ...

regards, tom lane

#12Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#11)
Re: foreign keys and RI triggers

On Thu, 26 May 2005, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

On Thu, 26 May 2005, Tom Lane wrote:

The thoughts I've had about special-casing RI events to save memory
have to do with the idea of lossy storage.

One problem with that is that it works for the constraint check but not
for referential actions,

True. But even fixing it only for constraint checks would be a win.

Yeah, I'm just wondering if going the extra step and forcing really really
immediate referential actions (even if that sometimes means adding a no
action trigger on the event as well) would be worth doing as that could
remove the queued pk actions for cascade and set null as well and at least
turn the queued pk action for set default into one that could be
consolidated.

#13Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#8)
Re: foreign keys and RI triggers

Tom Lane wrote:

But the check could incorporate the same transaction ID test already
in use. I think Neil is right that it'd be a win to apply the test
before enqueueing the trigger instead of after.

Speaking of which, does anyone see a reason why RI_FKey_keyequal_upd()
is implemented as a pseudo-trigger function -- e.g. taking a pointer to
a TriggerData? It seems a kludge.

-Neil

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#13)
Re: foreign keys and RI triggers

Neil Conway <neilc@samurai.com> writes:

Speaking of which, does anyone see a reason why RI_FKey_keyequal_upd()
is implemented as a pseudo-trigger function -- e.g. taking a pointer to
a TriggerData?

Because (a) it needs all the same arguments and (b) it can share
infrastructure with the other RI triggers.

regards, tom lane

#15Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#14)
Re: foreign keys and RI triggers

Tom Lane wrote:

Because (a) it needs all the same arguments

Well, it needs the Trigger that we're in the process of queueing, the
old tuple, the new tuple, and the updated relation. It doesn't need the
rest of the content of TriggerData. trigger.c has to manually construct
a TriggerData to pass to it, so it's not like it's a notational convenience.

(b) it can share infrastructure with the other RI triggers.

Such as? I don't see anything it allows us to share.

-Neil

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#15)
Re: foreign keys and RI triggers

Neil Conway <neilc@samurai.com> writes:

(b) it can share infrastructure with the other RI triggers.

Such as? I don't see anything it allows us to share.

Note the ri_BuildQueryKeyFull call, and the arguments thereto.

As to the notational convenience issue, I think it's good that
RI_FKey_keyequal_upd looks almost exactly like the other RI trigger
functions. Having it look different in order to save a couple lines
in trigger.c doesn't seem like a win to me.

regards, tom lane