Allow foreign keys to reference a superset of unique columns
Hi everyone:
I'd like to propose a change to PostgreSQL to allow the creation of a foreign
key constraint referencing a superset of uniquely constrained columns.
As it currently stands:
CREATE TABLE foo (a integer PRIMARY KEY, b integer);
CREATE TABLE bar (x integer, y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b));
ERROR: there is no unique constraint matching given keys for referenced
table "foo"
This is despite the fact that in "foo" the combination of columns (a, b) is
guaranteed to be unique, by virtue of being a superset of the primary key (a).
This capability has been requested before, at least once in 2004 [1] and again
in 2021 [2].
To illustrate when it'd be useful to define such a foreign key constraint,
consider a database that will store graphs (consisting of nodes and edges) where
graphs are discrete and intergraph edges are prohibited:
CREATE TABLE graph (
id INTEGER PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
);
CREATE TABLE node (
id INTEGER PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
graph_id INTEGER NOT NULL REFERENCES graph(id)
);
CREATE TABLE edge (
graph_id INTEGER,
source_id INTEGER,
FOREIGN KEY (source_id, graph_id) REFERENCES node(id, graph_id),
target_id INTEGER,
FOREIGN KEY (target_id, graph_id) REFERENCES node(id, graph_id),
PRIMARY KEY (graph_id, source_id, target_id)
);
PostgreSQL rejects this schema unless an extra unique constraint is added:
ALTER TABLE node ADD UNIQUE (id, graph_id);
However, this constraint, as well as its underlying unique index, is superfluous,
as node(id) is itself unique. Its addition serves no semantic purpose but incurs
the cost of additional on-disk storage and update time. Note that the prohibition
of intergraph edges isn't enforceable on "edge" without the composite foreign
keys (or triggers).
An alternative approach is to redefine node's PRIMARY KEY as (id, graph_id).
However, this would force every table referring to "node" to duplicate both
columns into its schema, even when a single "node_id" would suffice. This is
undesirable if there are many tables referring to "node" that have no such
intergraph restrictions and few that do.
While it can be argued that this schema contains some degree of denormalization,
it isn't uncommon, and a recent patch was merged to support exactly this kind of
design [3]. In that case, the SET NULL and SET DEFAULT referential actions
gained support for an explicit column list to accommodate this type of design.
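For context, the column-list form of those referential actions looks roughly
like this (a sketch; the table and column names here are made up for
illustration):

CREATE TABLE parent (tenant INT, id INT, PRIMARY KEY (tenant, id));
CREATE TABLE child (
    tenant INT,
    parent_id INT,
    FOREIGN KEY (tenant, parent_id) REFERENCES parent
        ON DELETE SET NULL (parent_id) -- null only parent_id, keep tenant
);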
A problem raised by Tom Lane in the 2004 discussion was that, were this
permitted, the index supporting a foreign key constraint could be ambiguous:

    I think one reason for this is that otherwise it's not clear which
    unique constraint the FK constraint depends on. Consider

        create table a (f1 int unique, f2 int unique);
        create table b (f1 int, f2 int,
                        foreign key (f1,f2) references a(f1,f2));

    How would you decide which constraint to make the FK depend on?
    It'd be purely arbitrary.
I propose that this problem be addressed, with an extension to the SQL standard,
as follows, in the definition of a foreign key constraint:
1. Change the restriction on the referenced columns in a foreign key constraint
to:
The referenced columns must be a *superset* (the same, or a strict superset)
of the columns of a non-deferrable unique or primary key index on the
referenced table.
2. The FOREIGN KEY constraint syntax gains a [ USING INDEX index_name ] clause
optionally following the referenced column list.
The index specified by this clause is used to support the foreign key
constraint, and it must be a non-deferrable unique or primary key index on
the referenced table compatible with the referenced columns.
Here, compatible means that the columns of the index are a subset (the same,
or a strict subset) of the referenced columns.
3. If the referenced columns are the same as the columns of such a unique (or
primary key) index on the referenced table, then the behavior in PostgreSQL
doesn't change.
4. If the referenced columns are a strict superset of the columns of such an
index on the referenced table, then:
1. If the primary key of the referenced table is a strict subset of the
referenced columns, then its index is used to support the foreign key if
no USING INDEX clause is present.
2. Otherwise, the USING INDEX clause is required.
I believe that this scheme is unambiguous and should stably round trip a dump
and restore. In my previous example, the foreign key constraints could then be
defined as:
FOREIGN KEY (source_id, graph_id) REFERENCES node(id, graph_id),
FOREIGN KEY (target_id, graph_id) REFERENCES node(id, graph_id),
Or alternatively:
FOREIGN KEY (source_id, graph_id) REFERENCES node(id, graph_id)
USING INDEX node_pkey,
FOREIGN KEY (target_id, graph_id) REFERENCES node(id, graph_id)
USING INDEX node_pkey,
Also, the addition of a USING INDEX clause may be useful in its own right. It's
already possible to create multiple unique indexes on a table with the same set
of columns, but differing storage parameters or tablespaces. In this situation,
when multiple indexes can support a foreign key constraint, the index chosen
appears to be determined by OID order. This clause would allow a user to
unambiguously specify which index to use.
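To illustrate, a sketch using the proposed syntax (the table and index names
are hypothetical):

CREATE TABLE t (a INT, b INT);
CREATE UNIQUE INDEX t_ab_fast ON t (a, b) WITH (fillfactor = 70);
CREATE UNIQUE INDEX t_ab_dense ON t (a, b) WITH (fillfactor = 100);

-- Today, whichever index backs this constraint is effectively arbitrary
-- (OID order); with the proposed clause it can be pinned explicitly:
CREATE TABLE r (
    a INT,
    b INT,
    FOREIGN KEY (a, b) REFERENCES t (a, b) USING INDEX t_ab_dense
);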
I've attached a first draft patch that implements what I've described and would
love some feedback. For all referenced columns that aren't covered by the chosen
index, it chooses the opclass to use by finding the default opclass for the
column's type for the same access method as the chosen index. (Would it be
useful to allow the specification of opclasses in the referenced column list of
a foreign key constraint, similar to the column list in CREATE INDEX?)
Also, in pg_get_constraintdef_worker(), it adds a USING INDEX clause only when a
foreign key constraint is supported by an index that:
1. Isn't a primary key, and
2. Has a different number of key columns than the number of constrained columns
I *think* that this should ensure that a USING INDEX clause is added only when
required.
[1]: /messages/by-id/CAF+2_SGhbc6yGUoNFjDOgjq1VpKpE5WZfOo0M+UwcPH=mddNMg@mail.gmail.com
[2]: /messages/by-id/1092734724.2627.4.camel@dicaprio.akademie1.de
[3]: /messages/by-id/CACqFVBZQyMYJV=njbSMxf+rbDHpx=W=B7AEaMKn8dWn9OZJY7w@mail.gmail.com
Attachments:
fk-unique-superset.diff (application/octet-stream, +338 -19)
On 07.06.22 20:59, Kaiting Chen wrote:
2. The FOREIGN KEY constraint syntax gains a [ USING INDEX index_name ] clause
optionally following the referenced column list.
The index specified by this clause is used to support the foreign key
constraint, and it must be a non-deferrable unique or primary key index on
the referenced table compatible with the referenced columns.
Here, compatible means that the columns of the index are a subset (the same,
or a strict subset) of the referenced columns.
I think this should be referring to a constraint name, not an index name.
On Fri, 10 Jun 2022 at 15:08, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
On 07.06.22 20:59, Kaiting Chen wrote:
2. The FOREIGN KEY constraint syntax gains a [ USING INDEX index_name ] clause
optionally following the referenced column list.
The index specified by this clause is used to support the foreign key
constraint, and it must be a non-deferrable unique or primary key index on
the referenced table compatible with the referenced columns.
Here, compatible means that the columns of the index are a subset (the same,
or a strict subset) of the referenced columns.
I think this should be referring to a constraint name, not an index name.
Can you explain why you think that?
My thoughts are that it should be an index name. I'm basing that on
the fact that transformFkeyCheckAttrs() looks for valid unique indexes
rather than constraints. The referenced table does not need any
primary key or unique constraints to be referenced by a foreign key.
It just needs a unique index matching the referenced columns.
It would seem very strange to me if we required a unique or primary
key constraint to exist only when this new syntax is being used. Maybe
I'm missing something though?
David
On 10.06.22 05:47, David Rowley wrote:
I think this should be referring to constraint name, not an index name.
Can you explain why you think that?
If you wanted to specify this feature in the SQL standard (I'm not
proposing that, but it seems plausible), then you need to deal in terms
of constraints, not indexes. Maybe referring to an index directly could
be a backup option if desired, but I don't see why that would be
necessary, since you can easily create a real constraint on top of an index.
On Fri, 10 Jun 2022 at 16:14, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
On 10.06.22 05:47, David Rowley wrote:
I think this should be referring to constraint name, not an index name.
Can you explain why you think that?
If you wanted to specify this feature in the SQL standard (I'm not
proposing that, but it seems plausible), then you need to deal in terms
of constraints, not indexes. Maybe referring to an index directly could
be a backup option if desired, but I don't see why that would be
necessary, since you can easily create a real constraint on top of an index.
That's a good point, but, if we invented syntax for specifying a
constraint name, would that not increase the likelihood that we'd end
up with something that conflicts with some future extension to the SQL
standard?
We already have USING INDEX as an extension to ADD CONSTRAINT.
David
On Fri, Jun 10, 2022 at 12:14 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
On 10.06.22 05:47, David Rowley wrote:
I think this should be referring to constraint name, not an index name.
Can you explain why you think that?
If you wanted to specify this feature in the SQL standard (I'm not
proposing that, but it seems plausible), then you need to deal in terms
of constraints, not indexes. Maybe referring to an index directly could
be a backup option if desired, but I don't see why that would be
necessary, since you can easily create a real constraint on top of an index.
I think that there's a subtle difference between specifying a
constraint or an index, in that the ALTER TABLE ADD CONSTRAINT USING
INDEX command prohibits the creation of a constraint using an index
whose key columns are associated with non-default opclasses. As
far as I can tell, a foreign key constraint *can* pick an index with
non-default opclasses. So specifying a constraint name in the FOREIGN
KEY syntax would result in a strange situation where the foreign key
constraint could implicitly pick a supporting index with non-default
opclasses, but there'd be no way to explicitly specify that
index.
Kaiting Chen <ktchen14@gmail.com> writes:
I'd like to propose a change to PostgreSQL to allow the creation of a foreign
key constraint referencing a superset of uniquely constrained columns.
TBH, I think this is a fundamentally bad idea and should be rejected
outright. It fuzzes the semantics of the FK relationship, and I'm
not convinced that there are legitimate use-cases. Your example
schema could easily be dismissed as bad design that should be done
some other way.
For one example of where the semantics get fuzzy, it's not
very clear how the extra-baggage columns ought to participate in
CASCADE updates. Currently, if we have
CREATE TABLE foo (a integer PRIMARY KEY, b integer);
then an update that changes only foo.b doesn't need to update
referencing tables, and I think we even have optimizations that
assume that if no unique-key columns are touched then RI checks
need not be made. But if you did
CREATE TABLE bar (x integer, y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
then perhaps you expect bar.y to be updated ... or maybe you don't?
Another example is that I think the idea is only well-defined when
the subset column(s) are a primary key, or at least all marked NOT NULL.
Otherwise they're not as unique as you're claiming. But then the FK
constraint really has to be dependent on a PK constraint not just an
index definition, since indexes in themselves don't enforce not-nullness.
That gets back to Peter's complaint that referring to an index isn't
good enough.
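(To make that concrete, a minimal sketch of the null behavior:

CREATE TABLE foo (a INT UNIQUE, b INT);
INSERT INTO foo VALUES (NULL, 1), (NULL, 2);
-- Both rows are accepted: a unique index treats NULLs as distinct,
-- so (a) alone doesn't guarantee at most one matching row unless
-- a is also declared NOT NULL.
)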
Anyway, seeing that the patch touches neither ri_triggers.c nor any
regression tests, I find it hard to believe that such semantic
questions have been thought through.
It's also unclear to me how this ought to interact with the
information_schema views concerning foreign keys. We generally
feel that we don't want to present any non-SQL-compatible data
in information_schema, for fear that it will confuse applications
that expect to see SQL-spec behavior there. So do we leave such
FKs out of the views altogether, or show only the columns involving
the associated unique constraint? Neither answer seems pleasant.
FWIW, the patch is currently failing to apply per the cfbot.
I think you don't need to manually update backend/nodes/ anymore,
but the gram.y changes look to have been sideswiped by some
other recent commit.
regards, tom lane
Kaiting Chen:
I'd like to propose a change to PostgreSQL to allow the creation of a foreign
key constraint referencing a superset of uniquely constrained columns.
+1
Tom Lane:
TBH, I think this is a fundamentally bad idea and should be rejected
outright. It fuzzes the semantics of the FK relationship, and I'm
not convinced that there are legitimate use-cases. Your example
schema could easily be dismissed as bad design that should be done
some other way.
I had to add quite a few unique constraints on a superset of already
uniquely constrained columns in the past, just to be able to support FKs
to those columns. I think those cases most often come up when dealing
with slightly denormalized schemas, e.g. for efficiency.
One other use-case I had recently was along the following lines, in
abstract terms:
CREATE TABLE classes (class INT PRIMARY KEY, ...);
CREATE TABLE instances (
instance INT PRIMARY KEY,
class INT REFERENCES classes,
...
);
Think about classes and instances as in OOP. So the table classes
contains some definitions for different types of object and the table
instances realizes them into concrete objects.
Now, assume you have some property of a class that is best modeled as a
table like this:
CREATE TABLE classes_prop (
property INT PRIMARY KEY,
class INT REFERENCES classes,
...
);
Now, assume you need to store data for each of those classes_prop rows
for each instance. You'd do the following:
CREATE TABLE instances_prop (
instance INT REFERENCES instances,
property INT REFERENCES classes_prop,
...
);
However, this does not ensure that the instance and the property you're
referencing in instances_prop are actually from the same class, so you
add a class column:
CREATE TABLE instances_prop (
instance INT,
class INT,
property INT,
FOREIGN KEY (instance, class) REFERENCES instances,
FOREIGN KEY (property, class) REFERENCES classes_prop,
...
);
But this won't work, without creating some UNIQUE constraints on those
supersets of the PK column first.
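Today, the workaround is to add redundant unique constraints first, roughly
(a sketch):

ALTER TABLE instances ADD UNIQUE (instance, class);
ALTER TABLE classes_prop ADD UNIQUE (property, class);

Each of these creates an extra index whose uniqueness is already implied by
the respective primary key.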
For one example of where the semantics get fuzzy, it's not
very clear how the extra-baggage columns ought to participate in
CASCADE updates. Currently, if we have
CREATE TABLE foo (a integer PRIMARY KEY, b integer);
then an update that changes only foo.b doesn't need to update
referencing tables, and I think we even have optimizations that
assume that if no unique-key columns are touched then RI checks
need not be made. But if you did
CREATE TABLE bar (x integer, y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
then perhaps you expect bar.y to be updated ... or maybe you don't?
In all use-cases I had so far, I would expect bar.y to be updated, too.
I think it would not even be possible to NOT update bar.y, because the
FK would then not match anymore. foo.a is the PK, so the value in bar.x
already forces bar.y to be the same as foo.b at all times.
bar.y is a little bit like a generated value in that sense; it should
always match foo.b. I think it would be great if we could actually go a
step further, too: on an update to bar.x to a new value, if foo.a=bar.x
exists, I would like to set bar.y automatically to the new foo.b.
Otherwise those kinds of updates always have to either query foo before,
or add a trigger to do the same.
In the classes/instances example above, when updating
instances_prop.property to a new value, instances_prop.class would be
updated automatically to match classes_prop.class. This would fail, when
the class is different than the class required by the FK to instances,
though, providing exactly the safeguard that this constraint was
supposed to provide, without incurring additional overhead in update
statements.
In the foo/bar example above, which is just a bit of denormalization,
this automatic update would also be helpful - because rejecting the
update on the grounds that the columns don't match doesn't make sense here.
Another example is that I think the idea is only well-defined when
the subset column(s) are a primary key, or at least all marked NOT NULL.
Otherwise they're not as unique as you're claiming.
I fail to see why. My understanding is that rows with NULL values in the
referenced table can't participate in FK matches anyway, because both
MATCH SIMPLE and MATCH FULL wouldn't require a match when any/all of the
columns in the referencing table are NULL. MATCH PARTIAL is not
implemented, so I can't tell whether the semantics would be different there.
I'm not sure whether a FK on a superset of unique columns would be
useful with MATCH SIMPLE. Maybe it could be forced to be MATCH FULL, if
MATCH SIMPLE is indeed not well-defined.
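For reference, a minimal sketch of the existing MATCH semantics (using an FK
on exactly the unique columns, since that's all that's allowed today):

CREATE TABLE p (a INT, b INT, PRIMARY KEY (a, b));
CREATE TABLE c_simple (x INT, y INT,
    FOREIGN KEY (x, y) REFERENCES p);            -- MATCH SIMPLE is the default
CREATE TABLE c_full (x INT, y INT,
    FOREIGN KEY (x, y) REFERENCES p MATCH FULL);
INSERT INTO c_simple VALUES (1, NULL); -- allowed: any NULL skips the check
INSERT INTO c_full VALUES (1, NULL);   -- ERROR: MATCH FULL forbids mixing
                                       -- null and non-null key columns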
It's also unclear to me how this ought to interact with the
information_schema views concerning foreign keys. We generally
feel that we don't want to present any non-SQL-compatible data
in information_schema, for fear that it will confuse applications
that expect to see SQL-spec behavior there. So do we leave such
FKs out of the views altogether, or show only the columns involving
the associated unique constraint? Neither answer seems pleasant.
Instead of tweaking FKs, maybe it would be possible to define a UNIQUE
constraint re-using an existing index that guarantees uniqueness on a
subset of columns already? This would allow creating those FK
relationships by adding another unique constraint - without the
overhead of creating yet another index.
So roughly something like this:
ALTER TABLE foo ADD UNIQUE (a, b) USING INDEX foo_pk;
This should give a consistent output for information_schema views?
Best
Wolfgang
On Fri, Sep 2, 2022 at 5:42 AM Wolfgang Walther <walther@technowledgy.de> wrote:
Kaiting Chen:
I'd like to propose a change to PostgreSQL to allow the creation of a foreign
key constraint referencing a superset of uniquely constrained columns.
+1
Tom Lane:
TBH, I think this is a fundamentally bad idea and should be rejected
outright. It fuzzes the semantics of the FK relationship, and I'm
not convinced that there are legitimate use-cases. Your example
schema could easily be dismissed as bad design that should be done
some other way.
I had to add quite a few unique constraints on a superset of already
uniquely constrained columns in the past, just to be able to support FKs
to those columns. I think those cases most often come up when dealing
with slightly denormalized schemas, e.g. for efficiency.
One other use-case I had recently was along the following lines, in
abstract terms:
CREATE TABLE classes (class INT PRIMARY KEY, ...);
CREATE TABLE instances (
instance INT PRIMARY KEY,
class INT REFERENCES classes,
...
);
Think about classes and instances as in OOP. So the table classes
contains some definitions for different types of object and the table
instances realizes them into concrete objects.
Now, assume you have some property of a class that is best modeled as a
table like this:
CREATE TABLE classes_prop (
property INT PRIMARY KEY,
class INT REFERENCES classes,
...
);
Now, assume you need to store data for each of those classes_prop rows
for each instance. You'd do the following:
CREATE TABLE instances_prop (
instance INT REFERENCES instances,
property INT REFERENCES classes_prop,
...
);
However, this does not ensure that the instance and the property you're
referencing in instances_prop are actually from the same class, so you
add a class column:
CREATE TABLE instances_prop (
instance INT,
class INT,
property INT,
FOREIGN KEY (instance, class) REFERENCES instances,
FOREIGN KEY (property, class) REFERENCES classes_prop,
...
);
But this won't work without creating some UNIQUE constraints on those
supersets of the PK column first.
If I'm following properly this sounds like an overengineered EAV
schema, and neither of those things inspires me to think "this is a
use case I want to support".
That being said, I know that sometimes examples that have been
abstracted enough to share aren't always the best, so perhaps there's
something underlying this that's a more valuable example.
For one example of where the semantics get fuzzy, it's not
very clear how the extra-baggage columns ought to participate in
CASCADE updates. Currently, if we have
CREATE TABLE foo (a integer PRIMARY KEY, b integer);
then an update that changes only foo.b doesn't need to update
referencing tables, and I think we even have optimizations that
assume that if no unique-key columns are touched then RI checks
need not be made. But if you did
CREATE TABLE bar (x integer, y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
then perhaps you expect bar.y to be updated ... or maybe you don't?
In all use-cases I had so far, I would expect bar.y to be updated, too.
I think it would not even be possible to NOT update bar.y, because the
FK would then not match anymore. foo.a is the PK, so the value in bar.x
already forces bar.y to be the same as foo.b at all times.
bar.y is a little bit like a generated value in that sense, it should
always match foo.b. I think it would be great, if we could actually go a
step further, too: On an update to bar.x to a new value, if foo.a=bar.x
exists, I would like to set bar.y automatically to the new foo.b.
Otherwise those kinds of updates always have to either query foo before,
or add a trigger to do the same.
Isn't this actually contradictory to the behavior you currently have
with a multi-column foreign key? In the example above then an update
to bar.x is going to update the rows in foo that match bar.x = foo.a
and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
new values. You seem to be suggesting that instead it should look for
other rows that already match the *new value* of only one of the
columns in the constraint. If I'm understanding the example correctly,
that seems like a *very* bad idea.
James Coleman
James Coleman:
If I'm following properly this sounds like an overengineered EAV
schema, and neither of those things inspires me to think "this is a
use case I want to support".
That being said, I know that sometimes examples that have been
abstracted enough to share aren't always the best, so perhaps there's
something underlying this that's a more valuable example.
Most of my use-cases are slightly denormalized and I was looking for an
example that didn't require those kinds of FKs only because of the
denormalization. So that's why it might have been a bit artificial or
abstracted too much.
Take another example: I deal with multi-tenancy systems, for which I
want to use RLS to separate the data between tenants:
CREATE TABLE tenants (tenant INT PRIMARY KEY);
Each tenant can create multiple users and groups:
CREATE TABLE users (
"user" INT PRIMARY KEY,
tenant INT NOT NULL REFERENCES tenants
);
CREATE TABLE groups (
"group" INT PRIMARY KEY,
tenant INT NOT NULL REFERENCES tenants
);
Users can be members of groups. The simple approach would be:
CREATE TABLE members (
PRIMARY KEY ("user", "group"),
"user" INT REFERENCES users,
"group" INT REFERENCES groups
);
But this falls short in two aspects:
- To make RLS policies simpler to write and quicker to execute, I want
to add "tenant" columns to **all** other tables. A slightly denormalized
schema for efficiency.
- The schema above does not ensure that users can only be members in
groups of the same tenant. Our business model requires to separate
tenants cleanly, but as written above, cross-tenant memberships would be
allowed.
In comes the "tenant" column, which solves both of these:
CREATE TABLE members (
PRIMARY KEY ("user", "group"),
tenant INT REFERENCES tenants,
"user" INT,
"group" INT,
FOREIGN KEY ("user", tenant) REFERENCES users ("user", tenant),
FOREIGN KEY ("group", tenant) REFERENCES groups ("group", tenant)
);
This is not possible to do right now, without adding more UNIQUE
constraints to the users and groups tables - on a superset of already
unique columns.
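(Today the workaround would be roughly the following sketch, each statement
adding an index that duplicates uniqueness already guaranteed by the primary
keys:

ALTER TABLE users ADD UNIQUE ("user", tenant);
ALTER TABLE groups ADD UNIQUE ("group", tenant);
)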
bar.y is a little bit like a generated value in that sense, it should
always match foo.b. I think it would be great, if we could actually go a
step further, too: On an update to bar.x to a new value, if foo.a=bar.x
exists, I would like to set bar.y automatically to the new foo.b.
Otherwise those kinds of updates always have to either query foo before,
or add a trigger to do the same.
Isn't this actually contradictory to the behavior you currently have
with a multi-column foreign key? In the example above then an update
to bar.x is going to update the rows in foo that match bar.x = foo.a
and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
new values.
No, I think there was a misunderstanding. An update to bar should not
update rows in foo. An update to bar.x should update bar.y implicitly,
to match the new value of foo.b.
You seem to be suggesting that instead it should look for
other rows that already match the *new value* of only one of the
columns in the constraint.
Yes. I think basically what I'm suggesting is, that for an FK to a
superset of unique columns, all the FK-logic should still be done on the
already unique set of columns only - and then the additional columns
should be mirrored into the referencing table. The referencing table can
then put additional constraints on this column. In the members example
above, this additional constraint is the fact that the tenant column
can't be filled with two different values for the users and groups FKs.
But this could also be a CHECK constraint to allow FKs only to a subset
of rows in the target table:
CREATE TYPE foo_type AS ENUM ('A', 'B', 'C');
CREATE TABLE foo (
f INT PRIMARY KEY,
type foo_type
);
CREATE TABLE bar (
b INT PRIMARY KEY,
f INT,
ftype foo_type CHECK (ftype <> 'C'),
FOREIGN KEY (f, ftype) REFERENCES foo (f, type)
);
In this example, the additional ftype column is just used to enforce
that bar can only reference rows with type A or B, but not C. Assume:
INSERT INTO foo VALUES (1, 'A'), (2, 'B'), (3, 'C');
In this case, it would be nice to be able to do the following, i.e.
derive the value for bar.ftype automatically:
INSERT INTO bar (b, f) VALUES (10, 1); -- bar.ftype is then 'A'
UPDATE bar SET f = 2 WHERE b = 10; -- bar.ftype is then 'B'
And it would throw errors in the following cases, because the
automatically derived value fails the CHECK constraint:
INSERT INTO bar (b, f) VALUES (20, 3);
UPDATE bar SET f = 3 WHERE b = 10;
Note: This "automatically derived columns" extension would be a separate
feature. Really nice to have, but the above mentioned FKs to supersets
of unique columns would be very valuable without it already.
Best
Wolfgang
On Sun, Sep 25, 2022 at 4:49 AM Wolfgang Walther
<walther@technowledgy.de> wrote:
James Coleman:
If I'm following properly this sounds like an overengineered EAV
schema, and neither of those things inspires me to think "this is a
use case I want to support".
That being said, I know that sometimes examples that have been
abstracted enough to share aren't always the best, so perhaps there's
something underlying this that's a more valuable example.
Most of my use-cases are slightly denormalized and I was looking for an
example that didn't require those kinds of FKs only because of the
denormalization. So that's why it might have been a bit artificial or
abstracted too much.
Take another example: I deal with multi-tenancy systems, for which I
want to use RLS to separate the data between tenants:
CREATE TABLE tenants (tenant INT PRIMARY KEY);
Each tenant can create multiple users and groups:
CREATE TABLE users (
"user" INT PRIMARY KEY,
tenant INT NOT NULL REFERENCES tenants
);
CREATE TABLE groups (
"group" INT PRIMARY KEY,
tenant INT NOT NULL REFERENCES tenants
);
Users can be members of groups. The simple approach would be:
CREATE TABLE members (
PRIMARY KEY ("user", "group"),
"user" INT REFERENCES users,
"group" INT REFERENCES groups
);
But this falls short in two aspects:
- To make RLS policies simpler to write and quicker to execute, I want
to add "tenant" columns to **all** other tables. A slightly denormalized
schema for efficiency.
- The schema above does not ensure that users can only be members in
groups of the same tenant. Our business model requires to separate
tenants cleanly, but as written above, cross-tenant memberships would be
allowed.
In comes the "tenant" column, which solves both of these:
CREATE TABLE members (
PRIMARY KEY ("user", "group"),
tenant INT REFERENCES tenants,
"user" INT,
"group" INT,
FOREIGN KEY ("user", tenant) REFERENCES users ("user", tenant),
FOREIGN KEY ("group", tenant) REFERENCES groups ("group", tenant)
);
This is not possible to do right now, without adding more UNIQUE
constraints to the users and groups tables - on a superset of already
unique columns.
Thanks, that's a more interesting use case IMO (and doesn't smell in
the way the other did).
bar.y is a little bit like a generated value in that sense, it should
always match foo.b. I think it would be great, if we could actually go a
step further, too: On an update to bar.x to a new value, if foo.a=bar.x
exists, I would like to set bar.y automatically to the new foo.b.
Otherwise those kinds of updates always have to either query foo before,
or add a trigger to do the same.
Isn't this actually contradictory to the behavior you currently have
with a multi-column foreign key? In the example above then an update
to bar.x is going to update the rows in foo that match bar.x = foo.a
and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
new values.
No, I think there was a misunderstanding. An update to bar should not
update rows in foo. An update to bar.x should update bar.y implicitly,
to match the new value of foo.b.
You seem to be suggesting that instead it should look for
other rows that already match the *new value* of only one of the
columns in the constraint.
Yes. I think basically what I'm suggesting is that, for an FK to a
superset of unique columns, all the FK logic should still be done on the
already-unique set of columns only - and then the additional columns
should be mirrored into the referencing table. The referencing table can
then put additional constraints on this column. In the members example
above, this additional constraint is the fact that the tenant column
can't be filled with two different values for the users and groups FKs.
If we have a declared constraint on x,y where x is unique based on an
index on x alone, I do not think we should have that FK constraint
work differently than a constraint on x,y where there is a unique
index on x,y. That would seem to be incredibly confusing behavior
(even if it would be useful for some specific use case).
But this could also be a CHECK constraint to allow FKs only to a subset
of rows in the target table:
Are you suggesting a check constraint that queries another table?
Because check constraints are not supposed to do that (I assume this
is technically possible to declare via a function, just like it is
technically possible to do in a functional index, but like in the
index case it's a bad idea since it's not actually guaranteed).
CREATE TYPE foo_type AS ENUM ('A', 'B', 'C');
CREATE TABLE foo (
f INT PRIMARY KEY,
type foo_type
);

CREATE TABLE bar (
b INT PRIMARY KEY,
f INT,
ftype foo_type CHECK (ftype <> 'C'),
FOREIGN KEY (f, ftype) REFERENCES foo (f, type)
);

In this example, the additional ftype column is just used to enforce
that bar can only reference rows with type A or B, but not C. Assume:

INSERT INTO foo VALUES (1, 'A'), (2, 'B'), (3, 'C');
In this case, it would be nice to be able to do the following, i.e.
derive the value for bar.ftype automatically:

INSERT INTO bar (b, f) VALUES (10, 1); -- bar.ftype is then 'A'
UPDATE bar SET f = 2 WHERE b = 10;     -- bar.ftype is then 'B'

And it would throw errors in the following cases, because the
automatically derived value fails the CHECK constraint:

INSERT INTO bar (b, f) VALUES (20, 3);
UPDATE bar SET f = 3 WHERE b = 10;

Note: This "automatically derived columns" extension would be a separate
feature. Really nice to have, but the above mentioned FKs to supersets
of unique columns would be very valuable without it already.
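For reference, the workaround mentioned earlier ("add a trigger to do the
same") could look roughly like this today, using the foo/bar tables from
the example above. This is only a sketch, not part of any proposal; the
function and trigger names are placeholders:

```sql
-- Sketch of today's trigger-based workaround: keep bar.ftype in sync
-- with foo.type whenever bar.f is set or changed.
CREATE FUNCTION bar_sync_ftype() RETURNS trigger AS $$
BEGIN
    -- Look up the type of the referenced foo row and mirror it.
    SELECT foo.type INTO NEW.ftype FROM foo WHERE foo.f = NEW.f;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bar_sync_ftype
    BEFORE INSERT OR UPDATE OF f ON bar
    FOR EACH ROW EXECUTE FUNCTION bar_sync_ftype();
```

Since BEFORE ROW triggers fire before constraint checking, a statement
like INSERT INTO bar (b, f) VALUES (20, 3) would set ftype to 'C' and
then fail the CHECK constraint, matching the behavior described above.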
This "derive the value automatically" is not what foreign key
constraints do right now at all, right? And in fact it's contradictory
to existing behavior, no?
This part just seems like a very bad idea. Unless I'm misunderstanding
I think we should reject this part of the proposals on this thread as
something we would not even consider implementing.
James Coleman
Hello Kaiting,
The use case you're looking to handle seems interesting to me.
On Wed, Jul 27, 2022 at 3:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kaiting Chen <ktchen14@gmail.com> writes:
I'd like to propose a change to PostgreSQL to allow the creation of a foreign
key constraint referencing a superset of uniquely constrained columns.

TBH, I think this is a fundamentally bad idea and should be rejected
outright. It fuzzes the semantics of the FK relationship, and I'm
not convinced that there are legitimate use-cases. Your example
schema could easily be dismissed as bad design that should be done
some other way.

For one example of where the semantics get fuzzy, it's not
very clear how the extra-baggage columns ought to participate in
CASCADE updates. Currently, if we have
CREATE TABLE foo (a integer PRIMARY KEY, b integer);
then an update that changes only foo.b doesn't need to update
referencing tables, and I think we even have optimizations that
assume that if no unique-key columns are touched then RI checks
need not be made. But if you did
CREATE TABLE bar (x integer, y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
then perhaps you expect bar.y to be updated ... or maybe you don't?

Another example is that I think the idea is only well-defined when
the subset column(s) are a primary key, or at least all marked NOT NULL.
Otherwise they're not as unique as you're claiming. But then the FK
constraint really has to be dependent on a PK constraint not just an
index definition, since indexes in themselves don't enforce not-nullness.
That gets back to Peter's complaint that referring to an index isn't
good enough.

Anyway, seeing that the patch touches neither ri_triggers.c nor any
regression tests, I find it hard to believe that such semantic
questions have been thought through.

It's also unclear to me how this ought to interact with the
information_schema views concerning foreign keys. We generally
feel that we don't want to present any non-SQL-compatible data
in information_schema, for fear that it will confuse applications
that expect to see SQL-spec behavior there. So do we leave such
FKs out of the views altogether, or show only the columns involving
the associated unique constraint? Neither answer seems pleasant.

FWIW, the patch is currently failing to apply per the cfbot.
I think you don't need to manually update backend/nodes/ anymore,
but the gram.y changes look to have been sideswiped by some
other recent commit.
As I was reading through the email chain I had this thought: could you
get the same benefit (or 90% of it anyway) by instead allowing the
creation of a uniqueness constraint that contains more columns than
the index backing it? So long as the index backing it still guaranteed
the uniqueness on a subset of columns that would seem to be safe.
Tom notes the additional columns being nullable is something to think
about. But if we go the route of allowing unique constraints with
backing indexes having a subset of columns from the constraint I don't
think the nullability is an issue since it's already the case that a
unique constraint can be declared on columns that are nullable. Indeed
it's also the case that we already support a foreign key constraint
backed by a unique constraint including nullable columns.
Because such an approach would, I believe, avoid changing the foreign
key code (or semantics) at all, I believe that would address Tom's
concerns about information_schema and fuzziness of semantics.
After writing down that idea I noticed Wolfgang Walther had commented
similarly, but it appears that that idea got lost (or at least not
responded to).
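Concretely, such a wider-than-its-index unique constraint might be
declared with something like the following. This is purely hypothetical
syntax, shown against the node table from the start of the thread; it is
not valid in any released PostgreSQL:

```sql
-- Hypothetical: a UNIQUE constraint on (id, graph_id) backed by the
-- narrower index on (id) alone, which already guarantees uniqueness
-- of any column set containing id. Not implemented anywhere today.
CREATE UNIQUE INDEX node_id_idx ON node (id);
ALTER TABLE node ADD CONSTRAINT node_id_graph_id_key
    UNIQUE (id, graph_id) USING INDEX node_id_idx;
```

Today's ADD CONSTRAINT ... UNIQUE USING INDEX requires the index columns
to match the constraint columns exactly; the extension here is allowing
a column list that is a superset of the index's columns.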
I'd be happy to sign up to review an updated patch if you're
interested in continuing this effort. If so, could you register the
patch in the CF app (if not there already)?
Thanks,
James Coleman
James Coleman:
If we have a declared constraint on x,y where x is unique based on an
index including on x I do not think we should have that fk constraint
work differently than a constraint on x,y where there is a unique
index on x,y. That would seem to be incredibly confusing behavior
(even if it would be useful for some specific use case).
I don't think it's behaving differently from how it does now. See below.
But I can see how that could be confusing. Maybe it's just about
describing the feature in a better way than I did so far. Or maybe it
needs a different syntax.
Anyway, I don't think it's just a specific use case. In every use case I
had for $subject so far, the immediate next step was to write some
triggers to fetch those derived values from the referenced table.
Ultimately it's a question of efficiency: We can achieve the same thing
in two ways today:
- We can either **not** add the additional column (members.tenant,
bar.ftype in my examples) to the referencing table at all, and add
constraint triggers that do all those checks instead. This adds
complexity to write the triggers and more complicated RLS policies etc,
and also is potentially slower when executing those more complicated
queries.
- Or we can add the additional column, but also add an additional unique
index on the referenced table, and then make it part of the FK. This
removes some of the constraint triggers and makes RLS policies simpler
and likely faster to execute queries. It comes at a cost of additional
cost of storage, though - and this is something that $subject tries to
address.
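For comparison, the second approach is what the schema at the top of the
thread requires in PostgreSQL today:

```sql
-- Works today, but needs a redundant unique index on (id, graph_id)
-- even though node.id alone is already unique via the primary key.
ALTER TABLE node ADD UNIQUE (id, graph_id);
```

With that in place, the composite foreign keys on edge become legal; the
extra index is pure storage and maintenance overhead, which is what
$subject aims to eliminate.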
Still, even when $subject is allowed, in practice we need some of the
triggers to fetch those dependent values. Considering that the current
FK triggers already do the same kind of queries at the same times, it'd
be more efficient to have those FK queries fetch those dependent values.
But this could also be a CHECK constraint to allow FKs only to a subset
of rows in the target table:

Are you suggesting a check constraint that queries another table?
No. I was talking about the CHECK constraint in my example in the next
paragraph of that mail. The CHECK constraint on bar.ftype is a regular
CHECK constraint, but because of how ftype is updated automatically, it
effectively behaves like some kind of additional constraint on the FK
itself.
This "derive the value automatically" is not what foreign key
constraints do right now at all, right? And in fact it's contradictory
to existing behavior, no?
I don't think it's contradicting. Maybe a better way to put my idea is this:
For a foreign key to a superset of unique columns, the already-unique
columns should behave according to the specified ON UPDATE clause.
However, the extra columns should always behave as if they were ON UPDATE
CASCADE. And additionally, they should behave similar to something like
ON INSERT CASCADE. Although that INSERT is about the referencing table,
not the referenced table, so the analogy isn't 100%.
I guess this would also be a more direct answer to Tom's earlier
question about what to expect in the ON UPDATE scenario.
Best
Wolfgang
On Mon, Sep 26, 2022 at 2:28 AM Wolfgang Walther
<walther@technowledgy.de> wrote:
James Coleman:
If we have a declared constraint on x,y where x is unique based on an
index including on x I do not think we should have that fk constraint
work differently than a constraint on x,y where there is a unique
index on x,y. That would seem to be incredibly confusing behavior
(even if it would be useful for some specific use case).

I don't think it's behaving differently from how it does now. See below.
But I can see how that could be confusing. Maybe it's just about
describing the feature in a better way than I did so far. Or maybe it
needs a different syntax.

Anyway, I don't think it's just a specific use case. In every use case I
had for $subject so far, the immediate next step was to write some
triggers to fetch those derived values from the referenced table.

Ultimately it's a question of efficiency: We can achieve the same thing
in two ways today:
- We can either **not** add the additional column (members.tenant,
bar.ftype in my examples) to the referencing table at all, and add
constraint triggers that do all those checks instead. This adds
complexity to write the triggers and more complicated RLS policies etc,
and also is potentially slower when executing those more complicated
queries.
- Or we can add the additional column, but also add an additional unique
index on the referenced table, and then make it part of the FK. This
removes some of the constraint triggers and makes RLS policies simpler
and likely faster to execute queries. It comes at a cost of additional
cost of storage, though - and this is something that $subject tries to
address.

Still, even when $subject is allowed, in practice we need some of the
triggers to fetch those dependent values. Considering that the current
FK triggers already do the same kind of queries at the same times, it'd
be more efficient to have those FK queries fetch those dependent values.

But this could also be a CHECK constraint to allow FKs only to a subset
of rows in the target table:

Are you suggesting a check constraint that queries another table?
No. I was talking about the CHECK constraint in my example in the next
paragraph of that mail. The CHECK constraint on bar.ftype is a regular
CHECK constraint, but because of how ftype is updated automatically, it
effectively behaves like some kind of additional constraint on the FK
itself.
Ah, OK.
This "derive the value automatically" is not what foreign key
constraints do right now at all, right? And in fact it's contradictory
to existing behavior, no?

I don't think it's contradicting. Maybe a better way to put my idea is this:
For a foreign key to a superset of unique columns, the already-unique
columns should behave according to the specified ON UPDATE clause.
However, the extra columns should always behave as if they were ON UPDATE
CASCADE. And additionally, they should behave similar to something like
ON INSERT CASCADE. Although that INSERT is about the referencing table,
not the referenced table, so the analogy isn't 100%.

I guess this would also be a more direct answer to Tom's earlier
question about what to expect in the ON UPDATE scenario.
So the broader point I'm trying to make is that, as I understand it,
indexes backing foreign key constraints are an implementation detail.
The SQL standard details the behavior of foreign key constraints
regardless of implementation details like a backing index. That means
that the behavior of two column foreign key constraints is defined in
a single way whether or not there's a backing index at all or whether
such a backing index, if present, contains one or two columns.
I understand that for the use case you're describing this isn't the
absolute most efficient way to implement the desired data semantics.
But it would be incredibly confusing (and, I think, a violation of the
SQL standard) to have one foreign key constraint work in a different
way from another such constraint when both are indistinguishable at
the constraint level (the backing index isn't an attribute of the
constraint; it's merely an implementation detail).
James Coleman
James Coleman:
So the broader point I'm trying to make is that, as I understand it,
indexes backing foreign key constraints are an implementation detail.
The SQL standard details the behavior of foreign key constraints
regardless of implementation details like a backing index. That means
that the behavior of two column foreign key constraints is defined in
a single way whether or not there's a backing index at all or whether
such a backing index, if present, contains one or two columns.I understand that for the use case you're describing this isn't the
absolute most efficient way to implement the desired data semantics.
But it would be incredibly confusing (and, I think, a violation of the
SQL standard) to have one foreign key constraint work in a different
way from another such constraint when both are indistinguishable at
the constraint level (the backing index isn't an attribute of the
constraint; it's merely an implementation detail).
Ah, thanks, I understand better now.
The two would only be indistinguishable at the constraint level, if
$subject was implemented by allowing to create unique constraints on a
superset of unique columns, backed by a different index (the suggestion
we both made independently). But if it was possible to reference a
superset of unique columns, where there was only a unique constraint put
on a subset of the referenced columns (the idea originally introduced in
this thread), then there would be a difference, right?
That's if it was only the backing index that is not part of the SQL
standard, and not also the fact that a foreign key should reference a
primary key or unique constraint?
Anyway, I can see very well how that would be quite confusing overall.
It would probably be wiser to allow something roughly like this (if at
all, of course):
CREATE TABLE bar (
b INT PRIMARY KEY,
f INT,
ftype foo_type GENERATED ALWAYS AS REFERENCE TO foo.type,
FOREIGN KEY (f, ftype) REFERENCES foo (f, type)
);
It likely wouldn't work exactly like that, but given a foreign key to
foo, the GENERATED clause could be used to fetch the value through the
same triggers that form that FK for efficiency. My main point for now
is: With a much more explicit syntax anywhere near that, this would
certainly be an entirely different feature than $subject **and** it
would be possible to implement on top of $subject. If at all.
So no need for me to distract this thread from $subject anymore. I think
the idea of allowing to create unique constraints on a superset of the
columns of an already existing unique index is a good one, so let's
discuss this further.
Best
Wolfgang
James Coleman:
As I was reading through the email chain I had this thought: could you
get the same benefit (or 90% of it anyway) by instead allowing the
creation of a uniqueness constraint that contains more columns than
the index backing it? So long as the index backing it still guaranteed
the uniqueness on a subset of columns that would seem to be safe.

Tom notes the additional columns being nullable is something to think
about. But if we go the route of allowing unique constraints with
backing indexes having a subset of columns from the constraint I don't
think the nullability is an issue since it's already the case that a
unique constraint can be declared on columns that are nullable. Indeed
it's also the case that we already support a foreign key constraint
backed by a unique constraint including nullable columns.

Because such an approach would, I believe, avoid changing the foreign
key code (or semantics) at all, I believe that would address Tom's
concerns about information_schema and fuzziness of semantics.
Could we create this additional unique constraint implicitly, when using
FOREIGN KEY ... REFERENCES on a superset of unique columns? This would
make it easier to use, but still give proper information_schema output.
Best
Wolfgang
On Mon, Sep 26, 2022 at 9:59 AM Wolfgang Walther
<walther@technowledgy.de> wrote:
James Coleman:
So the broader point I'm trying to make is that, as I understand it,
indexes backing foreign key constraints are an implementation detail.
The SQL standard details the behavior of foreign key constraints
regardless of implementation details like a backing index. That means
that the behavior of two column foreign key constraints is defined in
a single way whether or not there's a backing index at all or whether
such a backing index, if present, contains one or two columns.

I understand that for the use case you're describing this isn't the
absolute most efficient way to implement the desired data semantics.
But it would be incredibly confusing (and, I think, a violation of the
SQL standard) to have one foreign key constraint work in a different
way from another such constraint when both are indistinguishable at
the constraint level (the backing index isn't an attribute of the
constraint; it's merely an implementation detail).

Ah, thanks, I understand better now.
The two would only be indistinguishable at the constraint level, if
$subject was implemented by allowing to create unique constraints on a
superset of unique columns, backed by a different index (the suggestion
we both made independently). But if it was possible to reference a
superset of unique columns, where there was only a unique constraint put
on a subset of the referenced columns (the idea originally introduced in
this thread), then there would be a difference, right?

That's if it was only the backing index that is not part of the SQL
standard, and not also the fact that a foreign key should reference a
primary key or unique constraint?
I think that's not true: the SQL standard doesn't have the option of
"this foreign key is backed by this unique constraint", does it? So in
either case I believe we would be at minimum implementing an extension
to the standard (and as I argued already I think it would actually be
contradictory to the standard).
Anyway, I can see very well how that would be quite confusing overall.
It would probably be wiser to allow something roughly like this (if at
all, of course):

CREATE TABLE bar (
b INT PRIMARY KEY,
f INT,
ftype foo_type GENERATED ALWAYS AS REFERENCE TO foo.type,
FOREIGN KEY (f, ftype) REFERENCES foo (f, type)
);

It likely wouldn't work exactly like that, but given a foreign key to
foo, the GENERATED clause could be used to fetch the value through the
same triggers that form that FK for efficiency. My main point for now
is: With a much more explicit syntax anywhere near that, this would
certainly be an entirely different feature than $subject **and** it
would be possible to implement on top of $subject. If at all.
Yeah, I think that would make more sense if one were proposing an
addition to the SQL standard (or an explicit extension to it that
Postgres would support independently of the standard).
So no need for me to distract this thread from $subject anymore. I think
the idea of allowing to create unique constraints on a superset of the
columns of an already existing unique index is a good one, so let's
discuss this further.
Sounds good to me!
James Coleman
On Mon, Sep 26, 2022 at 10:04 AM Wolfgang Walther
<walther@technowledgy.de> wrote:
James Coleman:
As I was reading through the email chain I had this thought: could you
get the same benefit (or 90% of it anyway) by instead allowing the
creation of a uniqueness constraint that contains more columns than
the index backing it? So long as the index backing it still guaranteed
the uniqueness on a subset of columns that would seem to be safe.

Tom notes the additional columns being nullable is something to think
about. But if we go the route of allowing unique constraints with
backing indexes having a subset of columns from the constraint I don't
think the nullability is an issue since it's already the case that a
unique constraint can be declared on columns that are nullable. Indeed
it's also the case that we already support a foreign key constraint
backed by a unique constraint including nullable columns.

Because such an approach would, I believe, avoid changing the foreign
key code (or semantics) at all, I believe that would address Tom's
concerns about information_schema and fuzziness of semantics.

Could we create this additional unique constraint implicitly, when using
FOREIGN KEY ... REFERENCES on a superset of unique columns? This would
make it easier to use, but still give proper information_schema output.
Possibly. It'd be my preference to discuss that as a second patch
(could be in the same series); my intuition is that it'd be easier to
get agreement on the first part first, but of course I could be wrong
(if some committer, for example, thought the feature only made sense
as an implicit creation of such a constraint to back the use case
Kaiting opened with).
James Coleman
On Tue, 27 Sept 2022 at 06:08, James Coleman <jtc331@gmail.com> wrote:
On Mon, Sep 26, 2022 at 9:59 AM Wolfgang Walther
<walther@technowledgy.de> wrote:

So no need for me to distract this thread from $subject anymore. I think
the idea of allowing to create unique constraints on a superset of the
columns of an already existing unique index is a good one, so let's
discuss this further.

Sounds good to me!
I don't see any immediate problems with allowing UNIQUE constraints to
be supported using a unique index which contains only a subset of
columns that are mentioned in the constraint. There would be a few
things to think about, e.g. INSERT ON CONFLICT might need some
attention as a unique constraint can be specified for use as the
arbiter.
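To illustrate the arbiter point: today a unique constraint named in ON
CONFLICT resolves to a backing index whose columns match the constraint
exactly. A small self-contained example of the current behavior (table
and constraint names are just for illustration):

```sql
-- Today, the arbiter constraint's backing index covers exactly the
-- constraint's columns. If the constraint could be wider than its
-- index, this constraint-to-index resolution would need adjusting.
CREATE TABLE t (a integer, b integer,
    CONSTRAINT t_a_key UNIQUE (a));

INSERT INTO t VALUES (1, 10);
INSERT INTO t VALUES (1, 20)
    ON CONFLICT ON CONSTRAINT t_a_key
    DO UPDATE SET b = EXCLUDED.b;  -- row becomes (1, 20)
```

With a constraint on (a, b) backed only by an index on (a), the code
that infers the arbiter index from the constraint's column list would
presumably need to consult the index's columns instead.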
Perhaps the patch could be broken down as follows:
0001:
* Extend ALTER TABLE ADD CONSTRAINT UNIQUE syntax to allow a column
list when specifying USING INDEX.
* Add checks to ensure the index in USING INDEX contains only columns
mentioned in the column list.
* Do any required work for INSERT ON CONFLICT. I've not looked at the
code but maybe some adjustments are required for where it gets the
list of columns.
* Address any other places that assume the supporting index contains
all columns of the unique constraint.
0002:
* Adjust transformFkeyCheckAttrs() to have it look at UNIQUE
constraints as well as unique indexes
* Ensure information_schema.referential_constraints view still works correctly.
I think that would address all of Tom's concerns he mentioned in [1]/messages/by-id/3057718.1658949103@sss.pgh.pa.us.
I wasn't quite sure I understood the NOT NULL concern there since
going by RI_FKey_pk_upd_check_required(), we don't enforce FKs when
the referenced table has a NULL in the FK's columns.
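For what it's worth, the referencing-side half of that NULL behavior is
easy to demonstrate with the default MATCH SIMPLE semantics, where a
referencing row with any NULL key column is not checked at all:

```sql
-- Default MATCH SIMPLE: a referencing row containing a NULL in its
-- FK columns is exempt from the reference check entirely.
CREATE TABLE pk (a integer, b integer, UNIQUE (a, b));
CREATE TABLE fk (x integer, y integer,
    FOREIGN KEY (x, y) REFERENCES pk (a, b));

INSERT INTO fk VALUES (NULL, 42);  -- accepted: no matching pk row required
INSERT INTO fk VALUES (1, 42);     -- rejected: no row (1, 42) in pk
```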
David
For one example of where the semantics get fuzzy, it's not
very clear how the extra-baggage columns ought to participate in
CASCADE updates. Currently, if we have
CREATE TABLE foo (a integer PRIMARY KEY, b integer);
then an update that changes only foo.b doesn't need to update
referencing tables, and I think we even have optimizations that
assume that if no unique-key columns are touched then RI checks
need not be made. But if you did
CREATE TABLE bar (x integer, y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
then perhaps you expect bar.y to be updated ... or maybe you don't?
I'd expect bar.y to be updated. In my mind, the FOREIGN KEY constraint
should behave the same, regardless of whether the underlying unique
index on the referenced side is an equivalent set to, or a strict
subset of, the referenced columns.
Another example is that I think the idea is only well-defined when
the subset column(s) are a primary key, or at least all marked NOT NULL.
Otherwise they're not as unique as you're claiming. But then the FK
constraint really has to be dependent on a PK constraint not just an
index definition, since indexes in themselves don't enforce not-nullness.
That gets back to Peter's complaint that referring to an index isn't
good enough.
I think that uniqueness should be guaranteed enough even if the subset
columns are nullable:
CREATE TABLE foo (a integer UNIQUE, b integer);
CREATE TABLE bar (
x integer,
y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b)
);
The unique index underlying foo.a guarantees that (foo.a, foo.b) is
unique if foo.a isn't NULL. That is, there can be multiple rows
(NULL, 1) in foo. However, such a row can't be the target of the
foreign key constraint anyway. So, I'm fairly certain that, where it
matters, a unique index on a nullable subset of the referenced columns
guarantees a distinct referenced row.
It's also unclear to me how this ought to interact with the
information_schema views concerning foreign keys. We generally
feel that we don't want to present any non-SQL-compatible data
in information_schema, for fear that it will confuse applications
that expect to see SQL-spec behavior there. So do we leave such
FKs out of the views altogether, or show only the columns involving
the associated unique constraint? Neither answer seems pleasant.
Here's the information_schema output for this example:
CREATE TABLE foo (a integer, b integer);
CREATE UNIQUE INDEX ON foo (a, b);
CREATE TABLE bar (
x integer,
y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b)
);
# SELECT * FROM information_schema.referential_constraints
WHERE constraint_name = 'bar_x_y_fkey';
-[ RECORD 1 ]-------------+----------------------------------------------
constraint_catalog | kaitingc
constraint_schema | public
constraint_name | bar_x_y_fkey
unique_constraint_catalog |
unique_constraint_schema |
unique_constraint_name |
match_option | NONE
update_rule | NO ACTION
delete_rule | NO ACTION
# SELECT * FROM information_schema.key_column_usage
WHERE constraint_name = 'bar_x_y_fkey';
-[ RECORD 173 ]---------------+----------------------------------------------
constraint_catalog | kaitingc
constraint_schema | public
constraint_name | bar_x_y_fkey
table_catalog | kaitingc
table_schema | public
table_name | bar
column_name | x
ordinal_position | 1
position_in_unique_constraint | 1
-[ RECORD 174 ]---------------+----------------------------------------------
constraint_catalog | kaitingc
constraint_schema | public
constraint_name | bar_x_y_fkey
table_catalog | kaitingc
table_schema | public
table_name | bar
column_name | y
ordinal_position | 2
position_in_unique_constraint | 2
It appears that currently in PostgreSQL, the unique_constraint_catalog,
schema, and name are NULL in referential_constraints when a unique
index (without an associated unique constraint) underlies the
referenced columns. The behavior I'm proposing would be the same
vis-a-vis referential_constraints.
As for key_column_usage, I propose that position_in_unique_constraint
be NULL if the referenced column isn't indexed.