RI triggers and schemas
As of CVS tip, referential integrity triggers are kinda broken: they
will only work for tablenames that are in the current search path.
I think that instead of storing just table names in the trigger
parameters, we should store either table OIDs or schema name + table
name. Do you have any preferences about this?
An advantage of using OIDs is that we could forget the pushups that
ALTER TABLE RENAME presently goes through to update RI triggers.
On the other hand, as long as the RI implementation depends on
generating textual queries, it'd be faster to have the names available
than to have to look them up from the OID. But I seem to recall Stephan
threatening to rewrite that code at a lower level pretty soon, so the
speed issue might go away. In any case it's probably a minor issue
compared to generating the query plan.
So I'm leaning towards OIDs, but wanted to see if anyone had a beef
with that.
regards, tom lane
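A minimal illustration of the search-path breakage described above (schema and table names are hypothetical, chosen only for the sketch):

```sql
-- The RI trigger arguments store only the bare name 'pk_tab', which is
-- resolved through the search path each time the trigger fires.
CREATE SCHEMA s1;
CREATE TABLE s1.pk_tab (a int PRIMARY KEY);
CREATE TABLE s1.fk_tab (a int REFERENCES s1.pk_tab);

SET search_path TO public;         -- s1 is no longer in the path
INSERT INTO s1.fk_tab VALUES (1);  -- the RI check looks up 'pk_tab'
                                   -- by bare name and cannot find it
```

Storing OIDs (or schema-qualified names) in the trigger arguments would pin the referenced table regardless of the session's search path.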
Tom Lane wrote:
Jan Wieck <janwieck@yahoo.com> writes:
Actually I'm kicking around a slightly different idea, how to
resolve the entire problem. We could build up the
querystring, required to do the check, at trigger creation
time, parse it and store the querytree node-print or hand it
to the trigger as argument.
Hm. Seems kinda bulky; and the parse step alone is not that expensive.
(You could only do raw grammar parsing I think, not the parse analysis
phase, unless you wanted to deal with having to outdate these stored
querytrees after changes in table schemas.)
You're right, as soon as other details than the column or
table name might change, this could get even more screwed.
I think the existing scheme of generating the plan during first use
in a particular backend is fine. At least as long as we're sticking
with standard plans at all ... IIRC Stephan was wondering about
bypassing the whole parse/plan mechanism in favor of heap-access-level
operations.
I don't know if using heap-access directly in the RI triggers
is such a good idea.
It is guaranteed that there is a unique key covering all the
referenced columns (and only them). I'm not sure though if it
has to be in the same column order as the reference. Nor do I
think that matters other than making the creation of the
scankey a bit more difficult.
But there could be no, some, or a full matching index, maybe one
with extra columns at the end on the foreign key. So for the
referential action, the entire process of deciding which
index fits best, pushing some of the qualification into the
index scankey, and doing the rest on the heap tuples, has to be
duplicated here.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com
On Tue, 26 Mar 2002, Tom Lane wrote:
As of CVS tip, referential integrity triggers are kinda broken: they
will only work for tablenames that are in the current search path.
I think that instead of storing just table names in the trigger
parameters, we should store either table OIDs or schema name + table
name. Do you have any preferences about this?
An advantage of using OIDs is that we could forget the pushups that
ALTER TABLE RENAME presently goes through to update RI triggers.
On the other hand, as long as the RI implementation depends on
generating textual queries, it'd be faster to have the names available
than to have to look them up from the OID. But I seem to recall Stephan
threatening to rewrite that code at a lower level pretty soon, so the
speed issue might go away. In any case it's probably a minor issue
compared to generating the query plan.
So I'm leaning towards OIDs, but wanted to see if anyone had a beef
with that.
I'd say oids are better (and probably attnos for the columns). That's
generally what I've been assuming in my attempts on rewriting the
code. I've been working on getting something together using the
heap_*/index_* scanning functions. I feel like I'm reimplementing a lot of
wheels though, so I need to see what I can use from other places.
Tom Lane wrote:
As of CVS tip, referential integrity triggers are kinda broken: they
will only work for tablenames that are in the current search path.
I think that instead of storing just table names in the trigger
parameters, we should store either table OIDs or schema name + table
name. Do you have any preferences about this?
An advantage of using OIDs is that we could forget the pushups that
ALTER TABLE RENAME presently goes through to update RI triggers.
I'm always suspicious about the spec if it makes RENAME easy.
regards,
Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
Tom Lane wrote:
An advantage of using OIDs is that we could forget the pushups that
ALTER TABLE RENAME presently goes through to update RI triggers.
I'm always suspicious about the spec if it makes RENAME easy.
Point taken ;-)
However, unless someone can give a specific reason to make life hard,
I'm inclined to simplify this.
regards, tom lane
Tom Lane wrote:
An advantage of using OIDs is that we could forget the pushups that
ALTER TABLE RENAME presently goes through to update RI triggers.
I'm always suspicious about the spec if it makes RENAME easy.
Point taken ;-)
I don't get it???
Chris
On Tue, 26 Mar 2002, Jan Wieck wrote:
Tom Lane wrote:
I think the existing scheme of generating the plan during first use
in a particular backend is fine. At least as long as we're sticking
with standard plans at all ... IIRC Stephan was wondering about
bypassing the whole parse/plan mechanism in favor of heap-access-level
operations.
I don't know if using heap-access directly in the RI triggers
is such a good idea.
It is guaranteed that there is a unique key covering all the
referenced columns (and only them). I'm not sure though if it
has to be in the same column order as the reference. Nor do I
think that matters other than making the creation of the
scankey a bit more difficult.
But there could be no, some, or a full matching index, maybe one
with extra columns at the end on the foreign key. So for the
referential action, the entire process of deciding which
index fits best, pushing some of the qualification into the
index scankey, and doing the rest on the heap tuples, has to be
duplicated here.
That is the problem that I've run into in working on doing it. I'm
still trying to figure out what levels of that code can be used.
The advantage that I see is that we get more control over the time
qualifications used for tuples which may come into play for match
partial. I'm not sure that it's worth the effort to try doing it
this way, but I figured I'd try it.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
The advantage that I see is that we get more control over the time
qualifications used for tuples which may come into play for match
partial. I'm not sure that it's worth the effort to try doing it
this way, but I figured I'd try it.
It might be better to address that directly, eg:
- define another SnapShot value that has the semantics you want
- add a field to Scan plan nodes to specify explicitly the snapshot
you want used. Presumably by default the planner would fill this
with the standard QuerySnapshot, but you could
- find a way to override the default (if nothing else, walk the
completed plan tree and tweak the snapshot settings).
I believe it's already true that scan plan nodes lock down the target
snapshot during plan node startup, by copying QuerySnapshot into node
local execution state. So maybe you don't even need the above hack;
perhaps just twiddling QuerySnapshot right before ExecutorStart would
get the job done.
It might be useful to discuss exactly what is bad or broken about the
current RI implementation, so we can get a clearer idea of what ought
to be done. I know that y'all are dissatisfied with it but I'm not
sure I fully understand the issues.
regards, tom lane
On Tue, 26 Mar 2002, Stephan Szabo wrote:
On Tue, 26 Mar 2002, Jan Wieck wrote:
Tom Lane wrote:
I think the existing scheme of generating the plan during first use
in a particular backend is fine. At least as long as we're sticking
with standard plans at all ... IIRC Stephan was wondering about
bypassing the whole parse/plan mechanism in favor of heap-access-level
operations.
I don't know if using heap-access directly in the RI triggers
is such a good idea.
It is guaranteed that there is a unique key covering all the
referenced columns (and only them). I'm not sure though if it
has to be in the same column order as the reference. Nor do I
think that matters other than making the creation of the
scankey a bit more difficult.
But there could be no, some, or a full matching index, maybe one
with extra columns at the end on the foreign key. So for the
referential action, the entire process of deciding which
index fits best, pushing some of the qualification into the
index scankey, and doing the rest on the heap tuples, has to be
duplicated here.
That is the problem that I've run into in working on doing it. I'm
still trying to figure out what levels of that code can be used.
The advantage that I see is that we get more control over the time
qualifications used for tuples which may come into play for match
partial. I'm not sure that it's worth the effort to try doing it
this way, but I figured I'd try it.
Another thing to bear in mind:
We (www.seatbooker.net, that is) have had a certain amount of trouble with
contention for locks taken out by RI triggers - in particular triggers on
FK tables where large numbers of rows refer to a small number of rows in
the PK table.
Having had a look at it one possible solution seemed to be to do two
special queries in these triggers. Whilst checking an UPDATE/INSERT on
the FK table do a special 'SELECT ... FOR UPDATE BARRIER' query which is
exactly like 'SELECT ... FOR UPDATE', including waiting for transactions
with the row marked for update to commit/rollback, except that it doesn't
actually mark the row for update afterwards. Whilst checking DELETE/UPDATE
on the PK table do a 'SELECT ... FOR UPDATE INCLUDE UNCOMMITTED LIMIT 1'
(or 'SELECT ... FOR UPDATE READ UNCOMMITTED' if READ UNCOMMITTED can be
made to work) which would do everything which SELECT .. FOR UPDATE does
but also wait for any transactions with matching uncommitted rows to
complete before returning.
If the RI triggers could have more direct control over the time
qualifications then this could be implemented without the need for these
two queries....which after all are a bit of a hack.
Hmm, come to think of it the check which is triggered on the PK update
probably doesn't need to mark anything for update at all - it might work
with just an update barrier that could include uncommitted rows and return
a 'matching rows existed' vs 'no matching rows existed' status. Perhaps
this would help eliminate the possibility of deadlocking whilst checking
the two constraints simultaneously for concurrent updates, too...
Unfortunately, whilst I've managed to write a (seemingly, anyway) working
'SELECT .. FOR UPDATE BARRIER' I haven't really got much time to work on
this any more. Comments on it wouldn't go amiss, though.
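For concreteness, the two special queries proposed above might look like this. Both syntaxes are hypothetical extensions sketched in this message, and the table and column names are assumptions; neither variant exists in PostgreSQL:

```sql
-- On INSERT/UPDATE of the FK table: wait for transactions that have the
-- matching PK row marked for update to commit or roll back, but leave
-- no row mark behind afterwards.
SELECT 1 FROM pk_table WHERE pk_col = $1 FOR UPDATE BARRIER;

-- On DELETE/UPDATE of the PK table: behave like SELECT ... FOR UPDATE,
-- but additionally wait for transactions holding uncommitted matching
-- FK rows to complete before returning.
SELECT 1 FROM fk_table WHERE fk_col = $1
    FOR UPDATE INCLUDE UNCOMMITTED LIMIT 1;
```

The point of both is to get the waiting behaviour of FOR UPDATE without the lock contention its row marks cause on heavily referenced PK rows.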
On Wed, 27 Mar 2002, Tom Lane wrote:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
The advantage that I see is that we get more control over the time
qualifications used for tuples which may come into play for match
partial. I'm not sure that it's worth the effort to try doing it
this way, but I figured I'd try it.
It might be better to address that directly, eg:
- define another SnapShot value that has the semantics you want
- add a field to Scan plan nodes to specify explicitly the snapshot
you want used. Presumably by default the planner would fill this
with the standard QuerySnapshot, but you could
- find a way to override the default (if nothing else, walk the
completed plan tree and tweak the snapshot settings).
I believe it's already true that scan plan nodes lock down the target
snapshot during plan node startup, by copying QuerySnapshot into node
local execution state. So maybe you don't even need the above hack;
perhaps just twiddling QuerySnapshot right before ExecutorStart would
get the job done.
It might be useful to discuss exactly what is bad or broken about the
current RI implementation, so we can get a clearer idea of what ought
to be done. I know that y'all are dissatisfied with it but I'm not
sure I fully understand the issues.
Well, let's see, the big things in the current functionality are:
For update locking is much stronger than we actually need to guarantee
the constraint.
There are some cases that the current constraints may get wrong. We
haven't come to an agreement on some of these cases, but...
On the insert/update fk check, we should not check rows that
aren't valid since the intermediate states don't need to be valid. In
fact this is already patched, but it opens up another possible failure
case below, so I'm mentioning it.
On the noaction pk checks, if other rows have been added such that there
are no failures of the constraint there shouldn't be an error. That
was the NOT EXISTS addition to the constraint that was objected to
in a previous patch. For match full this could be a simple check for
an equal row, but for match partial it seems a lot more complicated
since each fk row may have multiple matching rows in the pk table and
those rows may be different for each fk row.
On the referential actions, we need to agree on the behavior of the
cases. If you do something like (with a deferred on delete cascade)
begin; delete from pk; insert into fk; end;
is it supposed to be a failure? On 7.2 it would be. Currently it
wouldn't be because it sees the inserted row as being invalid by the
time it checks. I think it should be, but the old check may not
have been the right place depending on the answers to the below:
If we did instead:
begin; delete from pk; insert into fk; insert into pk; end;
is there a row in fk at the end or not?
If we did:
begin; insert into fk; delete from pk; insert into fk; insert into pk;
end;
do we end up with zero, one or two rows in fk?
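Spelled out as a runnable setup, the first scenario above looks like this (table and column names are assumptions for illustration; the question is what should happen at END):

```sql
CREATE TABLE pk (id int PRIMARY KEY);
CREATE TABLE fk (id int REFERENCES pk
                     ON DELETE CASCADE
                     DEFERRABLE INITIALLY DEFERRED);
INSERT INTO pk VALUES (1);

BEGIN;
DELETE FROM pk;             -- the cascade is deferred, so nothing
                            -- happens to fk yet
INSERT INTO fk VALUES (1);  -- references a row deleted earlier in
                            -- this same transaction
END;  -- failure (7.2 behaviour), or should the deferred cascade
      -- silently remove the new fk row as well?
```

The other two scenarios differ only in re-inserting into pk (and fk) before END, which is what makes the check-time semantics ambiguous.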
Some things that would be good to add:
Making the foreign key stuff work with inheritance.
Adding match partial. This gets complicated with the referential actions,
particularly update cascade. My reading of match partial update
cascade is that an fk row gets updated only when all of its
matching pk rows are updated by the same statement, and all of the rows
that matched it are updated to non-distinct values in the
columns where the fk row is not NULL.
Last week I said:
I think that instead of storing just table names in the trigger
parameters, we should store either table OIDs or schema name + table
name. [ ... ]
So I'm leaning towards OIDs, but wanted to see if anyone had a beef
with that.
I've just realized that if we change the RI trigger arguments this way,
we will have a really serious problem with accepting pg_dump scripts
from prior versions. The scripts' representation of foreign key
constraints will contain commands like
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "bar" FROM "baz" NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd" ('<unnamed>', 'baz', 'bar', 'UNSPECIFIED', 'f1', 'f1');
which will absolutely not work at all if the 7.3 triggers are expecting
to find OIDs in those arguments.
I thought about allowing the triggers to take qualified names in the
style of what nextval() is doing in current sources, but that's still
going to have a lot of nasty compatibility issues --- mixed-case
names, names containing dots, etc are all going to be interpreted
differently than before.
I think we may have little choice except to create two sets of RI trigger
procedures, one that takes the old-style arguments and one that takes
new-style arguments. However the old-style set will be horribly fragile
because they'll have to interpret their arguments based on the current
namespace search path.
Of course the *real* problem here is that pg_dump is outputting a
low-level representation of the original constraints. We knew all along
that that would get us into trouble eventually ... and that trouble is
now upon us. We really need to fix pg_dump to emit ALTER TABLE ADD
CONSTRAINT type commands instead of trigger definitions.
A possible escape from the dilemma is to fix pg_dump so that it can emit
ADD CONSTRAINT commands when it sees RI triggers, release that in 7.2.2,
and then *require* people to use 7.2.2 or later pg_dump when it comes
time to update to 7.3. I do not much like this ... but it may be better
than the alternative of trying to maintain backwards-compatible
triggers.
Comments? Better ideas?
regards, tom lane
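To make the proposed pg_dump change concrete, the trigger-based dump form above corresponds to a declarative command roughly like the following. This is a sketch: the FK direction and column pairing are inferred from the trigger's argument order (constraint name, fk table, pk table, match type, then fk/pk column pairs):

```sql
-- Old dump form (low-level; one of three such triggers per constraint):
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "bar" FROM "baz"
    NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW
    EXECUTE PROCEDURE "RI_FKey_noaction_upd"
    ('<unnamed>', 'baz', 'bar', 'UNSPECIFIED', 'f1', 'f1');

-- Declarative form pg_dump could emit instead:
ALTER TABLE baz ADD FOREIGN KEY (f1) REFERENCES bar (f1);
```

The declarative form survives argument-representation changes because the server, not the dump script, decides how to store the constraint internally.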
If pg_upgrade was shipped with 7.3 in working order, with the ability
to convert the old foreign key commands to the new ones, I don't think
anyone would care how many funny things are involved. Just fix the
foreign key stuff for 7.3 pg_dump and only support upgrades using that
version or an included pg_upgrade script (any 7.2 release to 7.3).
That said, it doesn't look like it'll be a pretty thing to do with a
shell script.
Hoop jumping may be required to go from 6.5 or 7.0/1 directly to 7.3.
Downside is pg_upgrade is fairly new (can it be trusted -- made to
work 100%?)
Upside is no changes would be required to 7.2 and lots of people would
be really happy to have a fast upgrade process (dump / restore can
take quite a while on large dbs)
--
Rod Taylor
Your eyes are weary from staring at the CRT. You feel sleepy. Notice
how restful it is to watch the cursor blink. Close your eyes. The
opinions stated above are yours. You cannot imagine why you ever felt
otherwise.
I've just realized that if we change the RI trigger arguments this way,
we will have a really serious problem with accepting pg_dump scripts
from prior versions. The scripts' representation of foreign key
constraints will contain commands like
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "bar" FROM "baz" NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd" ('<unnamed>', 'baz', 'bar', 'UNSPECIFIED', 'f1', 'f1');
which will absolutely not work at all if the 7.3 triggers are expecting
to find OIDs in those arguments.
Why can't we just hack up the CREATE CONSTRAINT TRIGGER code to look up
the OIDs, etc. for the arguments and convert them internally to an ALTER
TABLE/ADD CONSTRAINT or whatever...
Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
Why can't we just hack up the CREATE CONSTRAINT TRIGGER code to look up
the OIDs, etc. for the arguments and convert them internally to an ALTER
TABLE/ADD CONSTRAINT or whatever...
Hmm ... seems pretty ugly, but then again the alternatives are pretty
durn ugly themselves. This ugliness would at least be localized.
Might be a plan.
regards, tom lane
Yeah, although it'd still be a good idea probably to convert the dump form
to ALTER TABLE in any case. The one downside that was brought up in the
past was the time involved in checking dumped (presumably correct) data
when the constraint is added to very large tables. I can probably make
that faster since right now it's just running the check on each row,
but it'll still be slow on big tables possibly. Another option would
be to have an argument that would disable the check on an add constraint,
except that wouldn't work for unique/primary key.
Maybe it could be a really evil SET CONSTRAINTS command like:
SET CONSTRAINTS UNCHECKED;
...
Chris
On Sun, 31 Mar 2002, Tom Lane wrote:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
Why can't we just hack up the CREATE CONSTRAINT TRIGGER code to look up
the OIDs, etc. for the arguments and convert them internally to an ALTER
TABLE/ADD CONSTRAINT or whatever...
Hmm ... seems pretty ugly, but then again the alternatives are pretty
durn ugly themselves. This ugliness would at least be localized.
Might be a plan.
Yeah, although it'd still be a good idea probably to convert the dump form
to ALTER TABLE in any case. The one downside that was brought up in the
past was the time involved in checking dumped (presumably correct) data
when the constraint is added to very large tables. I can probably make
that faster since right now it's just running the check on each row,
but it'll still be slow on big tables possibly. Another option would
be to have an argument that would disable the check on an add constraint,
except that wouldn't work for unique/primary key.
Christopher Kings-Lynne wrote:
I've just realized that if we change the RI trigger arguments this way,
we will have a really serious problem with accepting pg_dump scripts
from prior versions. The scripts' representation of foreign key
constraints will contain commands like
CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER UPDATE ON "bar" FROM "baz" NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd" ('<unnamed>', 'baz', 'bar', 'UNSPECIFIED', 'f1', 'f1');
which will absolutely not work at all if the 7.3 triggers are expecting
to find OIDs in those arguments.
Why can't we just hack up the CREATE CONSTRAINT TRIGGER code to look up
the OIDs, etc. for the arguments and convert them internally to an ALTER
TABLE/ADD CONSTRAINT or whatever...
And what language hack do you suggest to suppress the
complete referential check of the foreign key table at ALTER
TABLE ... time? Currently, it does a sequential scan of the
entire table to check every single row. So adding 3
constraints to a 10M row table might take some time.
Note that that language hack will again make the dump non-ANSI
compliant, and thus I don't consider the entire change
to ALTER TABLE an improvement at all.
Jan
Christopher Kings-Lynne wrote:
Yeah, although it'd still be a good idea probably to convert the dump form
to ALTER TABLE in any case. The one downside that was brought up in the
past was the time involved in checking dumped (presumably correct) data
when the constraint is added to very large tables. I can probably make
that faster since right now it's just running the check on each row,
but it'll still be slow on big tables possibly. Another option would
be to have an argument that would disable the check on an add constraint,
except that wouldn't work for unique/primary key.
Maybe it could be a really evil SET CONSTRAINTS command like:
SET CONSTRAINTS UNCHECKED;
Hmmm, I would like this one if restricted to superusers or
database owner.
Jan
Jan Wieck <janwieck@yahoo.com> writes:
Christopher Kings-Lynne wrote:
Why can't we just hack up the CREATE CONSTRAINT TRIGGER code to look up
the OIDs, etc. for the arguments and convert them internally to an ALTER
TABLE/ADD CONSTRAINT or whatever...
And what language hack do you suggest to suppress the
complete referential check of the foreign key table at ALTER
TABLE ... time?
Actually, I was interpreting his idea to mean that we add intelligence
to CREATE TRIGGER to adjust the specified trigger arguments if it sees
the referenced trigger procedure is one of the RI triggers. It'd be
fairly self-contained, really, since the CREATE TRIGGER code could use
its "ON table" and "FROM table" arguments to derive the correct OIDs
to insert. This could be done always (whether the incoming arguments
look like OIDs or not), which'd also give us a short-term answer for
dumping/reloading 7.3-style RI triggers. I'd still like to change
pg_dump to output some kind of ALTER command in place of CREATE TRIGGER,
but we'd have some breathing room to debate about how.
I'm now inclined to leave the attribute arguments alone (stick to names
not numbers) just to avoid possible conversion problems there.
regards, tom lane