ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
The new SQL Standard (SQL:2011) contains this:
"Table constraints are either enforced or not enforced. Domain
constraints and assertions are always enforced.", 4.17.2
The SQL Standard allows you to turn the checking on and off for CHECK
constraints, UNIQUE constraints and FOREIGN KEYS.
Which of those make sense for us, if any? The ability to create FKs
without checking all the data has been frequently requested to me over
many years. OTOH, I can't really see any point in turning on/off all of
the other aspects mentioned by the SQL Standard, especially indexes.
It's lots of work and seems likely to end with poorer data quality. And
the obvious thing is if you don't want a CHECK constraint, just drop
it...
My proposal is that we add a short and simple clause NOT ENFORCED onto
the ADD constraint syntax. So we have
ALTER TABLE foo
ADD FOREIGN KEY .... NOT ENFORCED;
The "enforced" state is not persisted - once added the FK is checked
every time. So there is no additional column on pg_constraint.
The benefit here is that we implement a capability that allows skipping
very long running SQL statements when required, and doesn't require too
much code. It has been discussed before on hackers, but that was before
it was part of the SQL Standard. Oracle has had this for years and it is
popular feature. We can expect other RDBMS to implement this feature,
now it is part of the standard.
If you want more than my good-bits-only proposal, it really isn't going
to happen for 9.1, and seems pretty pointless anyway.
Very short hack to implement this attached for discussion. No tests, not
even a compile - just showing how quick a patch this can be.
Thoughts? Alternative syntax?
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Attachments:
fk_not_enforced.v1.patchtext/x-patch; charset=UTF-8; name=fk_not_enforced.v1.patchDownload+22-13
Simon Riggs <simon@2ndQuadrant.com> writes:
The new SQL Standard (SQL:2011) contains this:
"Table constraints are either enforced or not enforced. Domain
constraints and assertions are always enforced.", 4.17.2
The SQL Standard allows you to turn the checking on and off for CHECK
constraints, UNIQUE constraints and FOREIGN KEYS.
Huh? It allows you to postpone the check until commit. That's far from
not enforcing it.
regards, tom lane
On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
The new SQL Standard (SQL:2011) contains this:
"Table constraints are either enforced or not enforced. Domain
constraints and assertions are always enforced.", 4.17.2The SQL Standard allows you to turn the checking on and off for CHECK
constraints, UNIQUE constraints and FOREIGN KEYS.Huh? It allows you to postpone the check until commit. That's far from
not enforcing it.
"When a <commit statement> is executed, all enforced constraints are
effectively checked and, if any enforced
constraint is not satisfied, then an exception condition is raised and
the SQL-transaction is terminated by an
implicit <rollback statement>."
This clearly implies that un-enforced constraints are not checked at
commit.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> writes:
On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
Huh? It allows you to postpone the check until commit. That's far from
not enforcing it.
This clearly implies that un-enforced constraints are not checked at
commit.
[ shrug... ] I can't argue with you about what may or may not be in an
unpublished draft of an unratified version of the standard, since I
don't have a copy. But allow me to harbor doubts that they really
intend to allow someone to force a constraint to be considered valid
without any verification. This proposal strikes me as something mysql
would do, not the standards committee. (In particular, can a constraint
go from not-enforced to enforced state without getting checked at that
time?)
Even if you're reading the draft correctly, and the wording makes it
into a released standard, the implementation you propose would break our
code. The incremental FK checks are designed on the assumption that the
constraint condition held before; they aren't likely to behave very
sanely if the data is bad. I'd want to see a whole lot more analysis of
the resulting behavior before even considering an idea like this.
regards, tom lane
On Sun, Dec 12, 2010 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
Huh? It allows you to postpone the check until commit. That's far from
not enforcing it.This clearly implies that un-enforced constraints are not checked at
commit.[ shrug... ] I can't argue with you about what may or may not be in an
unpublished draft of an unratified version of the standard, since I
don't have a copy. But allow me to harbor doubts that they really
intend to allow someone to force a constraint to be considered valid
without any verification. This proposal strikes me as something mysql
would do, not the standards committee. (In particular, can a constraint
go from not-enforced to enforced state without getting checked at that
time?)Even if you're reading the draft correctly, and the wording makes it
into a released standard, the implementation you propose would break our
code. The incremental FK checks are designed on the assumption that the
constraint condition held before; they aren't likely to behave very
sanely if the data is bad. I'd want to see a whole lot more analysis of
the resulting behavior before even considering an idea like this.
Wow, you've managed to bash Simon, MySQL, and the SQL standards
committee all in one email.
I'm not going to argue that careful analysis isn't needed before doing
something like this - and, in particular, if we ever get inner-join
removal, which I'm still hoping to do at some point, a foreign key
that isn't actually guaranteed to be valid might result in queries
returning different answers depending on whether or not a join is
removed. I guess we'd have to define that as the user's problem for
alleging a foreign-key relationship that doesn't truly exist. On the
other hand, there's clearly also a use case for this behavior. If a
bulk load of prevalidated data forces an expensive revalidation of
constraints that are already known to hold, there's a real chance the
DBA will be backed into a corner where he simply has no choice but to
not use foreign keys, even though he might really want to validate the
foreign-key relationships on a going-forward basis.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
... On the
other hand, there's clearly also a use case for this behavior. If a
bulk load of prevalidated data forces an expensive revalidation of
constraints that are already known to hold, there's a real chance the
DBA will be backed into a corner where he simply has no choice but to
not use foreign keys, even though he might really want to validate the
foreign-key relationships on a going-forward basis.
There may well be a case to be made for doing this on grounds of
practical usefulness. I'm just voicing extreme skepticism that it can
be supported by reference to the standard.
Personally I'd prefer to see us look into whether we couldn't arrange
for low-impact establishment of a verified FK relationship, analogous to
CREATE INDEX CONCURRENTLY. We don't let people just arbitrarily claim
that a uniqueness condition exists, and ISTM that if we can handle that
case we probably ought to be able to handle FK checking similarly.
regards, tom lane
On Sun, Dec 12, 2010 at 7:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
... On the
other hand, there's clearly also a use case for this behavior. If a
bulk load of prevalidated data forces an expensive revalidation of
constraints that are already known to hold, there's a real chance the
DBA will be backed into a corner where he simply has no choice but to
not use foreign keys, even though he might really want to validate the
foreign-key relationships on a going-forward basis.There may well be a case to be made for doing this on grounds of
practical usefulness. I'm just voicing extreme skepticism that it can
be supported by reference to the standard.
Dunno, I haven't read it either. But it does seem like the natural
interpretation of "NOT ENFORCED".
Personally I'd prefer to see us look into whether we couldn't arrange
for low-impact establishment of a verified FK relationship, analogous to
CREATE INDEX CONCURRENTLY. We don't let people just arbitrarily claim
that a uniqueness condition exists, and ISTM that if we can handle that
case we probably ought to be able to handle FK checking similarly.
That'd be useful, too, but I don't think it would remove the use case
for skipping the check altogether.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Dec 12, 2010 at 4:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Dec 12, 2010 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
Huh? It allows you to postpone the check until commit. That's far from
not enforcing it.This clearly implies that un-enforced constraints are not checked at
commit.[ shrug... ] I can't argue with you about what may or may not be in an
unpublished draft of an unratified version of the standard, since I
don't have a copy. But allow me to harbor doubts that they really
intend to allow someone to force a constraint to be considered valid
without any verification. This proposal strikes me as something mysql
would do, not the standards committee. (In particular, can a constraint
go from not-enforced to enforced state without getting checked at that
time?)Even if you're reading the draft correctly, and the wording makes it
into a released standard, the implementation you propose would break our
code. The incremental FK checks are designed on the assumption that the
constraint condition held before; they aren't likely to behave very
sanely if the data is bad. I'd want to see a whole lot more analysis of
the resulting behavior before even considering an idea like this.Wow, you've managed to bash Simon, MySQL, and the SQL standards
committee all in one email.I'm not going to argue that careful analysis isn't needed before doing
something like this - and, in particular, if we ever get inner-join
removal, which I'm still hoping to do at some point, a foreign key
that isn't actually guaranteed to be valid might result in queries
returning different answers depending on whether or not a join is
removed. I guess we'd have to define that as the user's problem for
alleging a foreign-key relationship that doesn't truly exist. On the
other hand, there's clearly also a use case for this behavior. If a
bulk load of prevalidated data forces an expensive revalidation of
constraints that are already known to hold, there's a real chance the
DBA will be backed into a corner where he simply has no choice but to
not use foreign keys, even though he might really want to validate the
foreign-key relationships on a going-forward basis.--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
MySQL does in fact have this feature and it is used by mysqldump. This
feature is very useful.
--
Rob Wultsch
wultsch@gmail.com
On 12/12/2010 08:27 PM, Rob Wultsch wrote:
MySQL does in fact have this feature and it is used by mysqldump. This
feature is very useful.
The trouble is that FK's have more than one use. In particular, they
have a documentary use that's used by tools that analyze databases, as
well as by tools like htsql. They also have a role as an enforced
constraint.
In fact it's possible now to disable FK enforcement, by disabling the
triggers. It's definitely a footgun though. Just the other day I was
asked how data violating the constraint could have got into the table,
and caused some surprise by demonstrating how easy this was to produce.
So what would actually be an advance in my view would be a mechanism
that allowed explicit disabling of a constraint but ensured that it was
not violated when re-enabling it.
cheers
andrew
I wouldn't like to comment on whether or not Simon has correctly
interpreted the words of the SQL standards committee, because
standards committees sometimes word things in an intentionally
ambiguous way to placate different interests, and because it seems
fairly inconsequential in this case. IMHO this is a useful feature
that should be pursued.
There is another precedent that no one mentioned - DB2. From their docs:
"You can add a foreign key with the NOT ENFORCED option to create an
informational referential constraint. This action does not leave the
table space in CHECK-pending status, and you do not need to execute
CHECK DATA."
I understand that DB2's informational referential constraints won't
ever be enforced (they just show intent, which is useful to their
planner), so this isn't really the same thing. However, DB2 apparently
doesn't initially enforce referential integrity when an FK is created
on a table with existing data, without any special syntax on the
CREATE:
"DB2 does not validate the data when you add the foreign key. Instead,
if the table is populated....the table space that contains the table
is placed in CHECK-pending status, just as if it had been loaded with
ENFORCE NO. In this case, you need to execute the CHECK DATA utility
to clear the CHECK-pending status."
If I am not mistaken, this is almost exactly the behaviour described
by Simon, because referential integrity is, presumably, enforced after
the FK is created, but before the CHECK DATA utility is optionally run
to ensure that we actually have referential integrity at a later time.
I believe that Simon's proposal is essentially sound. I don't know why
CHECK DATA operates at the tablespace granularity rather than the FK
granularity - IANADB2U.
If we followed this behaviour, we wouldn't "let people just
arbitrarily claim" that a referential condition exists - rather, we'd
let them assert that it /ought/ to exist, and that it will be
maintained going forward, and give them the option of verifying that
assertion at a later time, after which it actually exists.
Unfortunately, this refinement of Simon's proposal would probably
entail adding an additional column to pg_constraint.
--
Regards,
Peter Geoghegan
Peter Geoghegan wrote:
If we followed this behaviour, we wouldn't "let people just
arbitrarily claim" that a referential condition exists - rather,
we'd let them assert that it /ought/ to exist, and that it will be
maintained going forward, and give them the option of verifying
that assertion at a later time, after which it actually exists.
What you outline would be quite valuable to our shop. Under the
law, the "custodians of the data" are the elected clerks of circuit
court, and under state law and rules of the state supreme court we
can't "clean up" even the most glaring apparent data problems
without the OK of the elected official or his or her designee. We
have a very complex schema (although no more complex than necessary
to model the reality of the data) with hundreds of foreign key
relationships.
For various reasons (conversions from old systems, etc.), these
relationships don't hold on all tables in all county databases. It
would be desirable to have foreign key definitions define the
intended relationships anyway, and very useful for them to prevent
further data degradation. For those situations where we get a
business analyst to work with clerk of court staff to clean up
orphaned rows, it would be very slick to be able to run some
statement (like CHECK DATA) to see if the cleanup is complete and
successful and to flag that the constraint is now enforced.
So +1 on what Peter outlined as current DB2 features in this regard.
-Kevin
Import Notes
Resolved by subject fallback
On Sun, Dec 12, 2010 at 7:24 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
In fact it's possible now to disable FK enforcement, by disabling the
triggers. It's definitely a footgun though. Just the other day I was asked
how data violating the constraint could have got into the table, and caused
some surprise by demonstrating how easy this was to produce.
Ugh. I have read the entire pg manual and I did not recall that
footgun. At least in MySQL disabling fk's is explicit. There is
something to be said for being able to tell the database: "Hey, hold
my beer and watch this, it might be stupid but it is what we are going
to do". The database telling it's user that is a much larger issue
(and yes, MySQL is generally worse). The user at least gets to talk to
db through sql, the database only really gets to talk to the user
through errors and the manual.
The fact that fk checks are implemented by the trigger system somehow
seems "surprising".
--
Rob Wultsch
wultsch@gmail.com
On Sun, 2010-12-12 at 19:07 -0500, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
... On the
other hand, there's clearly also a use case for this behavior. If a
bulk load of prevalidated data forces an expensive revalidation of
constraints that are already known to hold, there's a real chance the
DBA will be backed into a corner where he simply has no choice but to
not use foreign keys, even though he might really want to validate the
foreign-key relationships on a going-forward basis.There may well be a case to be made for doing this on grounds of
practical usefulness. I'm just voicing extreme skepticism that it can
be supported by reference to the standard.Personally I'd prefer to see us look into whether we couldn't arrange
for low-impact establishment of a verified FK relationship, analogous to
CREATE INDEX CONCURRENTLY. We don't let people just arbitrarily claim
that a uniqueness condition exists, and ISTM that if we can handle that
case we probably ought to be able to handle FK checking similarly.
I think we should do *both* things. Sometimes you already know the check
will pass, sometimes you don't. In particular, reloading data from
another source where you knew the checks passed. Enforcing re-checking
in that case reduces data availability.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Tom Lane <tgl@sss.pgh.pa.us> writes:
The incremental FK checks are designed on the assumption that the
constraint condition held before; they aren't likely to behave very
sanely if the data is bad. I'd want to see a whole lot more analysis of
the resulting behavior before even considering an idea like this.
ALTER TABLE foo DISABLE TRIGGER ALL;
I must have missed the point when PostgreSQL stoped providing this foot
gun already, so that it's arguable less a surprise to spell the
misfeature NOT ENFORCED rather than DISABLE TRIGGER.
Seriously, real-world use cases such as Kevin's one seems to warrant
that we are able to create a table withouth enforcing the FK. That's
horrid, yes, that's needed, too. Maybe some operations would have to be
instructed that the constraint ain't trustworthy but just declared to be
so by the user?
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
2010/12/13 Tom Lane <tgl@sss.pgh.pa.us>:
But allow me to harbor doubts that they really
intend to allow someone to force a constraint to be considered valid
without any verification.
"Table constraints are either enforced or not enforced. Domain
constraints and assertions are always enforced.", 4.17.2
I don't read that as meaning that such unenforced constraints are
considered "valid". It sounds as if unenforced constraints are the
same as non-existing constraints (think: constraint "templates"),
possibly as a means to "remember" that they should be re-enabled at
some point.
I.e., marking a constraint as unenforced and then as enforced again
would be a shortcut for removing and re-adding the constraint, while
having the advantage that one doesn't have to keep a list of
constraint definitions that must be re-added.
(In particular, can a constraint
go from not-enforced to enforced state without getting checked at that
time?)
I assume not.
Nicolas
On 13 December 2010 10:30, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Seriously, real-world use cases such as Kevin's one seems to warrant
that we are able to create a table withouth enforcing the FK. That's
horrid, yes, that's needed, too. Maybe some operations would have to be
instructed that the constraint ain't trustworthy but just declared to be
so by the user?
Might I suggest that we call them "aspirational foreign keys", while
sticking with Simon's syntax?
Reasons:
1. It's suggestive of the lack of certainty about the referential
integrity of the underlying data - They're like a foreign key, but not
quite as good.
2. It's indicative that they may one day become actual foreign keys
through the use of something like the CHECK DATA utility. I'd favour
doing this with a separate DDL command.
3. It's suggestive that they aren't just syntactic sugar or an
expression of intent, as DB2's NOT ENFORCED FKs are, but rather that
they behave like Oracle's NOT ENFORCED FKs.
4. It's memorable, I think.
By the way, the DISABLE TRIGGER ALL method isn't equivalent to this.
Apart from hackishly depending on an implementation detail, it isn't
possible to prevent the big, up-front enforcement in the first place
when the FK is declared, because DISABLE TRIGGER ALL only disables
existing triggers. Perhaps, if and when this feature is implemented,
it will also be possible to use some explicit mechanism to disable and
re-enable an FK. However, that's secondary I think.
--
Regards,
Peter Geoghegan
On Mon, Dec 13, 2010 at 12:59 AM, Rob Wultsch <wultsch@gmail.com> wrote:
On Sun, Dec 12, 2010 at 7:24 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
In fact it's possible now to disable FK enforcement, by disabling the
triggers. It's definitely a footgun though. Just the other day I was asked
how data violating the constraint could have got into the table, and caused
some surprise by demonstrating how easy this was to produce.Ugh. I have read the entire pg manual and I did not recall that
footgun. At least in MySQL disabling fk's is explicit. There is
something to be said for being able to tell the database: "Hey, hold
my beer and watch this, it might be stupid but it is what we are going
to do".
I couldn't agree more, and that's a great way to put it. The user is
in charge. Our job is to prevent the user from *accidentally*
shooting themselves in the foot. But if a crocodile is biting their
foot off and they want to fire their gun in that direction and take
their chances, it's not our job to say "oh, no, you might injure your
foot". DBAs hate getting eaten by crocodiles.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, 2010-12-13 at 11:54 +0000, Peter Geoghegan wrote:
On 13 December 2010 10:30, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Seriously, real-world use cases such as Kevin's one seems to warrant
that we are able to create a table withouth enforcing the FK. That's
horrid, yes, that's needed, too. Maybe some operations would have to be
instructed that the constraint ain't trustworthy but just declared to be
so by the user?Might I suggest that we call them "aspirational foreign keys", while
sticking with Simon's syntax?
Just checking what we are saying:
1.
(a) ALTER TABLE ... ADD FOREIGN KEY ... NOT VALIDATED INITIALLY;
will add a FK but NOT run the check - we mark it as "check pending".
Lock held: ShareRowExclusiveLock
(b) Every new change to the table has the FK enforced - the triggers are
fully enabled and active. (That could mean we update a row and have the
update fail because of a FK violation.)
2. pg_validate_foreign_key('constraint name');
Returns immediately if FK is valid
Returns SETOF rows that violate the constraint, or if no rows are
returned it updates constraint to show it is now valid.
Lock held: AccessShareLock
Note that 1 & 2 together are the equivalent of ADD FK CONCURRENTLY,
except that step 2 more usefully tells you which rows fail.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Dec 13, 2010 at 10:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
(a) ALTER TABLE ... ADD FOREIGN KEY ... NOT VALIDATED INITIALLY;
will add a FK but NOT run the check - we mark it as "check pending".
Lock held: ShareRowExclusiveLock
Seems about right. Not sure whether the lock strength is correct.
(b) Every new change to the table has the FK enforced - the triggers are
fully enabled and active. (That could mean we update a row and have the
update fail because of a FK violation.)
Also seems about right.
2. pg_validate_foreign_key('constraint name');
Returns immediately if FK is valid
Returns SETOF rows that violate the constraint, or if no rows are
returned it updates constraint to show it is now valid.
Lock held: AccessShareLock
I'm less sure about this part. I think there should be a DDL
statement to validate the foreign key. The "return the problem" rows
behavior could be done some other way, or just left to the user to
write their own query.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 13 December 2010 16:08, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Dec 13, 2010 at 10:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
2. pg_validate_foreign_key('constraint name');
Returns immediately if FK is valid
Returns SETOF rows that violate the constraint, or if no rows are
returned it updates constraint to show it is now valid.
Lock held: AccessShareLockI'm less sure about this part. I think there should be a DDL
statement to validate the foreign key. The "return the problem" rows
behavior could be done some other way, or just left to the user to
write their own query.
+1. I think that a DDL statement is more appropriate, because it makes
the process sort of symmetrical.
Perhaps we could error on the first FK violation found, and give a
"value 'foo' not present in table bar". It ought to not matter that
there could be a lot of violations, because they will be exceptional
if you're using the feature as intended - presumably, you're going to
want to comb through the data to find out exactly what has gone wrong
for each violation. On the off chance that it actually is a problem,
the user can go ahead and write their own query, like Robert
suggested.
--
Regards,
Peter Geoghegan