complex referential integrity constraints

Started by Robert Haasabout 19 years ago31 messagesgeneral

robertmhaas@gmail.com

about 19 years ago

So, I have the following problem.

Suppose you have two kinds of animals, sheep and wolves. Since they
have very similar properties, you create a single table to hold both
kinds of animals, and an animal_type table to specify the type of each
animal:

CREATE TABLE animal_type (
id integer not null,
name varchar(80) not null,
primary key (id)
);
INSERT INTO animal_type VALUES (1, 'Sheep');
INSERT INTO animal_type VALUES (2, 'Wolf');

CREATE TABLE animal (
id serial,
type_id integer not null references animal_type (id),
name varchar(80) not null,
age integer not null,
weight_in_pounds integer not null,
primary key (id)
);

The animal_type table is more or less written in stone, but the animal
table will be updated frequently. Now, let's suppose that we want to
keep track of all of the cases where one animal is mauled by another
animal:

CREATE TABLE mauling (
id serial,
attacker_id integer not null references animal (id),
victim_id integer not null references animal (id),
attack_time timestamp not null,
primary key (id)
);

The problem with this is that I have a very unsettled feeling about the
foreign key constraints on this table. The victim_id constraint is
fine, but the attacker_id constraint is really inadequate, because the
attacker CAN NEVER BE A SHEEP. I really want a way to write a
constraint that says that the attacker must be an animal, but
specifically, a wolf.

It would be really nice to be able to write:

FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id)

Or:

CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2;
-- and then
FOREIGN KEY (attacker_id) REFERENCES INDEX wolves

...but that's entirely speculative syntax. I don't think there's any
easy way to do this. (Please tell me I'm wrong.)

The problem really comes in when people start modifying the animal
table. Every once in a while we have a case where we record something
as a wolf, but it turns out to have been a sheep in wolf's clothing. In
this case, we want to do something like this:

UPDATE animal SET type_id = 1 WHERE id = 572;

HOWEVER, this operation MUST NOT be allowed if it turns out there is a
row in the mauling table where attacker_id = 572, because that would
violate my integrity constraints that says that sheep do not maul.

Any suggestions? I've thought about creating rules or triggers to check
the conditions, but I'm scared that this could either (a) get really
complicated when there are a lot more tables and constraints involved or
(b) introduce race conditions.

Thanks,

...Robert

Tom Lane

tgl@sss.pgh.pa.us

about 19 years ago

In reply to: Robert Haas (#1)

Re: complex referential integrity constraints

"Robert Haas" <Robert.Haas@dyntek.com> writes:

... The problem with this is that I have a very unsettled feeling about the
foreign key constraints on this table. The victim_id constraint is
fine, but the attacker_id constraint is really inadequate, because the
attacker CAN NEVER BE A SHEEP.

I think the only way to do this in SQL is to denormalize a bit. If you
copy the animal_type field into the maulings table then you can apply a
check constraint there. So

FOREIGN KEY (attacker_id, attacker_type_id) REFERENCES animal (id, type_id)
ON UPDATE CASCADE

CHECK (attacker_type_id != 'sheep')

The thing that's still a bit annoying is that you'd have to hard-wire
the numerical code for SHEEP into the check constraint; you couldn't
really write it symbolically as I did above. Perhaps you should further
denormalize and keep real animal type names not codes in the animal
type table, thus

CREATE TABLE animal_type (
name varchar(80) primary key
);

CREATE TABLE animal (
id serial,
type varchar(80) references animal_type,
...
);

whereupon the maulings table also has real type names not IDs.

No doubt some relational-theory maven will come along and slap your
wrist for doing this, but he should first explain how to do it without
denormalization...

Also, I think what you've really done here is created a "poor man's
enum". There will probably be real enum types in PG 8.3, which would
offer a more efficient solution to the problem of representing animal
types.

regards, tom lane

elein

elein@varlena.com

about 19 years ago

In reply to: Robert Haas (#1)

Re: complex referential integrity constraints

On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote:

So, I have the following problem.

Suppose you have two kinds of animals, sheep and wolves. Since they
have very similar properties, you create a single table to hold both
kinds of animals, and an animal_type table to specify the type of each
animal:

CREATE TABLE animal_type (
id integer not null,
name varchar(80) not null,
primary key (id)
);
INSERT INTO animal_type VALUES (1, 'Sheep');
INSERT INTO animal_type VALUES (2, 'Wolf');

CREATE TABLE animal (
id serial,
type_id integer not null references animal_type (id),
name varchar(80) not null,
age integer not null,
weight_in_pounds integer not null,
primary key (id)
);

The animal_type table is more or less written in stone, but the animal
table will be updated frequently. Now, let's suppose that we want to
keep track of all of the cases where one animal is mauled by another
animal:

CREATE TABLE mauling (
id serial,
attacker_id integer not null references animal (id),
victim_id integer not null references animal (id),
attack_time timestamp not null,
primary key (id)
);

The problem with this is that I have a very unsettled feeling about the
foreign key constraints on this table. The victim_id constraint is
fine, but the attacker_id constraint is really inadequate, because the
attacker CAN NEVER BE A SHEEP. I really want a way to write a
constraint that says that the attacker must be an animal, but
specifically, a wolf.

It would be really nice to be able to write:

FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id)

Or:

CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2;
-- and then
FOREIGN KEY (attacker_id) REFERENCES INDEX wolves

...but that's entirely speculative syntax. I don't think there's any
easy way to do this. (Please tell me I'm wrong.)

The problem really comes in when people start modifying the animal
table. Every once in a while we have a case where we record something
as a wolf, but it turns out to have been a sheep in wolf's clothing. In
this case, we want to do something like this:

UPDATE animal SET type_id = 1 WHERE id = 572;

HOWEVER, this operation MUST NOT be allowed if it turns out there is a
row in the mauling table where attacker_id = 572, because that would
violate my integrity constraints that says that sheep do not maul.

Any suggestions? I've thought about creating rules or triggers to check
the conditions, but I'm scared that this could either (a) get really
complicated when there are a lot more tables and constraints involved or
(b) introduce race conditions.

Why don't you add a field in animal_types that is boolean mauler.
Then you can add a trigger on the mauling table to raise an error
when the attacker_id is an animal type mauler.

--elein

Alban Hertroys

alban@magproductions.nl

about 19 years ago

In reply to: Robert Haas (#1)

Re: complex referential integrity constraints

Robert Haas wrote:

So, I have the following problem.

Suppose you have two kinds of animals, sheep and wolves. Since they
have very similar properties, you create a single table to hold both
kinds of animals, and an animal_type table to specify the type of each
animal:

CREATE TABLE animal_type (
id integer not null,
name varchar(80) not null,
primary key (id)
);
INSERT INTO animal_type VALUES (1, 'Sheep');
INSERT INTO animal_type VALUES (2, 'Wolf');

CREATE TABLE animal (
id serial,
type_id integer not null references animal_type (id),
name varchar(80) not null,
age integer not null,
weight_in_pounds integer not null,
primary key (id)
);

The animal_type table is more or less written in stone, but the animal
table will be updated frequently. Now, let's suppose that we want to
keep track of all of the cases where one animal is mauled by another
animal:

I kind of get the feeling that you'd want it like this:

CREATE TABLE predator (
...
) INHERITS animal;

And then put your foreign key constraints from predator to mauling.

You may want to be more accurate about what kind of animals sheep are as
well.

I haven't really given this much thought though, I'm just quickly
reading my mail before starting work ;)

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

Joris Dobbelsteen

Joris@familiedobbelsteen.nl

about 19 years ago

In reply to: Robert Haas (#1)

Re: complex referential integrity constraints

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of elein
Sent: zondag 18 februari 2007 23:16
To: Robert Haas
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex referential integrity constraints

On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote:

So, I have the following problem.

Suppose you have two kinds of animals, sheep and wolves. Since they
have very similar properties, you create a single table to hold both
kinds of animals, and an animal_type table to specify the

type of each

animal:

CREATE TABLE animal_type (
id integer not null,
name varchar(80) not null,
primary key (id)
);
INSERT INTO animal_type VALUES (1, 'Sheep'); INSERT INTO animal_type
VALUES (2, 'Wolf');

CREATE TABLE animal (
id serial,
type_id integer not null references

animal_type (id),

name varchar(80) not null,
age integer not null,
weight_in_pounds integer not null,
primary key (id)
);

The animal_type table is more or less written in stone, but

the animal

table will be updated frequently. Now, let's suppose that

we want to

keep track of all of the cases where one animal is mauled by another
animal:

CREATE TABLE mauling (
id serial,
attacker_id integer not null references animal (id),
victim_id integer not null references animal (id),
attack_time timestamp not null,
primary key (id)
);

The problem with this is that I have a very unsettled feeling about
the foreign key constraints on this table. The victim_id constraint
is fine, but the attacker_id constraint is really

inadequate, because

the attacker CAN NEVER BE A SHEEP. I really want a way to write a
constraint that says that the attacker must be an animal, but
specifically, a wolf.

It would be really nice to be able to write:

FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id)

Or:

CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2;
-- and then
FOREIGN KEY (attacker_id) REFERENCES INDEX wolves

...but that's entirely speculative syntax. I don't think

there's any

easy way to do this. (Please tell me I'm wrong.)

The problem really comes in when people start modifying the animal
table. Every once in a while we have a case where we record

something

as a wolf, but it turns out to have been a sheep in wolf's

clothing.

In this case, we want to do something like this:

UPDATE animal SET type_id = 1 WHERE id = 572;

HOWEVER, this operation MUST NOT be allowed if it turns out

there is a

row in the mauling table where attacker_id = 572, because that would
violate my integrity constraints that says that sheep do not maul.

Any suggestions? I've thought about creating rules or triggers to
check the conditions, but I'm scared that this could either (a) get
really complicated when there are a lot more tables and constraints
involved or
(b) introduce race conditions.

Why don't you add a field in animal_types that is boolean mauler.
Then you can add a trigger on the mauling table to raise an
error when the attacker_id is an animal type mauler.

This is only partial. You need a lot more triggers to guarentee the
constraints are enforced.
Precisely you need to validate:
* mauling on insert/update of attacker_id
* animal on update of type_id
* animal_type on update of your property

Of course you need to think about the MVCC model, such that:
Transaction 1 executes
INSERT INTO mauling VALUES ('someattacker'),
Transaction 2 executes
UPDATE animal_type SET mauler = false WHERE name = 'someattacker',
such that both transaction happen in parallel.

This is perfectly possible and will make it possible to violate the
constraint, UNLESS locking of the tuples is done correctly.

These contraints are not trivial to implement (unfortunally). It would
be great if they where.

- Joris

David Fetter

david@fetter.org

about 19 years ago

In reply to: Robert Haas (#1)

Re: complex referential integrity constraints

On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote:

So, I have the following problem.

Suppose you have two kinds of animals, sheep and wolves. Since they
have very similar properties, you create a single table to hold both
kinds of animals, and an animal_type table to specify the type of each
animal:

CREATE TABLE animal_type (
id integer not null,
name varchar(80) not null,
primary key (id)
);
INSERT INTO animal_type VALUES (1, 'Sheep');
INSERT INTO animal_type VALUES (2, 'Wolf');

CREATE TABLE animal (
id serial,
type_id integer not null references animal_type (id),
name varchar(80) not null,
age integer not null,
weight_in_pounds integer not null,
primary key (id)
);

The animal_type table is more or less written in stone, but the animal
table will be updated frequently. Now, let's suppose that we want to
keep track of all of the cases where one animal is mauled by another
animal:

CREATE TABLE mauling (
id serial,
attacker_id integer not null references animal (id),
victim_id integer not null references animal (id),
attack_time timestamp not null,
primary key (id)
);

The problem with this is that I have a very unsettled feeling about the
foreign key constraints on this table. The victim_id constraint is
fine, but the attacker_id constraint is really inadequate, because the
attacker CAN NEVER BE A SHEEP. I really want a way to write a
constraint that says that the attacker must be an animal, but
specifically, a wolf.

It would be really nice to be able to write:

FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id)

Or:

CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2;
-- and then
FOREIGN KEY (attacker_id) REFERENCES INDEX wolves

...but that's entirely speculative syntax. I don't think there's any
easy way to do this. (Please tell me I'm wrong.)

The problem really comes in when people start modifying the animal
table. Every once in a while we have a case where we record something
as a wolf, but it turns out to have been a sheep in wolf's clothing. In
this case, we want to do something like this:

UPDATE animal SET type_id = 1 WHERE id = 572;

HOWEVER, this operation MUST NOT be allowed if it turns out there is a
row in the mauling table where attacker_id = 572, because that would
violate my integrity constraints that says that sheep do not maul.

Any suggestions? I've thought about creating rules or triggers to check
the conditions, but I'm scared that this could either (a) get really
complicated when there are a lot more tables and constraints involved or
(b) introduce race conditions.

Thanks,

...Robert

I'd do something like this:

CREATE TABLE animal_type (
animal_name TEXT PRIMARY KEY,
CHECK(animal_name = trim(animal_name))
);

/* Only one of {Wolf,wolf} can be in the table. */

CREATE UNIQUE INDEX just_one_animal_name
ON animal_type(LOWER(animal_name));

CREATE TABLE predator (
animal_name TEXT NOT NULL
REFERENCES animal_type(animal_name)
ON DELETE CASCADE,
PRIMARY KEY(animal_name)
);

CREATE TABLE prey (
animal_name TEXT NOT NULL
REFERENCES animal_type(animal_name)
ON DELETE CASCADE,
PRIMARY KEY(animal_name)
);

CREATE TABLE mauling (
id SERIAL PRIMARY KEY,
attacker_id INTEGER NOT NULL REFERENCES predator (animal_type_id),
victim_id INTEGER NOT NULL REFERENCES prey (animal_type_id),
attack_time TIMESTAMP WITH TIME ZONE NOT NULL
);

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!

Richard Broersma Jr

rabroersma@yahoo.com

about 19 years ago

In reply to: David Fetter (#6)

Re: complex referential integrity constraints

I'd do something like this:

CREATE TABLE animal_type (
animal_name TEXT PRIMARY KEY,
CHECK(animal_name = trim(animal_name))
);

/* Only one of {Wolf,wolf} can be in the table. */

CREATE UNIQUE INDEX just_one_animal_name
ON animal_type(LOWER(animal_name));

CREATE TABLE predator (
animal_name TEXT NOT NULL
REFERENCES animal_type(animal_name)
ON DELETE CASCADE,
PRIMARY KEY(animal_name)
);

CREATE TABLE prey (
animal_name TEXT NOT NULL
REFERENCES animal_type(animal_name)
ON DELETE CASCADE,
PRIMARY KEY(animal_name)
);

CREATE TABLE mauling (
id SERIAL PRIMARY KEY,
attacker_id INTEGER NOT NULL REFERENCES predator (animal_type_id),
victim_id INTEGER NOT NULL REFERENCES prey (animal_type_id),
attack_time TIMESTAMP WITH TIME ZONE NOT NULL
);

Just to add to David's idea, I would create two update-able views that joined animal to predator
and another for animal to prey. This way, you only have to insert/update/delete from 1
update-able view rather than two tables.

Of course, I am still waiting for the future version of postgresql that will handle this
functionality seamlessly using table inheritance. :-)

Regards,
Richard Broersma Jr.

David Fetter

david@fetter.org

about 19 years ago

In reply to: Richard Broersma Jr (#7)

Re: complex referential integrity constraints

On Mon, Feb 19, 2007 at 10:52:51AM -0800, Richard Broersma Jr wrote:

I'd do something like this:

CREATE TABLE animal_type (
animal_name TEXT PRIMARY KEY,
CHECK(animal_name = trim(animal_name))
);

/* Only one of {Wolf,wolf} can be in the table. */

CREATE UNIQUE INDEX just_one_animal_name
ON animal_type(LOWER(animal_name));

CREATE TABLE predator (
animal_name TEXT NOT NULL
REFERENCES animal_type(animal_name)
ON DELETE CASCADE,
PRIMARY KEY(animal_name)
);

CREATE TABLE prey (
animal_name TEXT NOT NULL
REFERENCES animal_type(animal_name)
ON DELETE CASCADE,
PRIMARY KEY(animal_name)
);

CREATE TABLE mauling (
id SERIAL PRIMARY KEY,
attacker_id INTEGER NOT NULL REFERENCES predator (animal_type_id),
victim_id INTEGER NOT NULL REFERENCES prey (animal_type_id),
attack_time TIMESTAMP WITH TIME ZONE NOT NULL
);

Just to add to David's idea, I would create two update-able views
that joined animal to predator and another for animal to prey. This
way, you only have to insert/update/delete from 1 update-able view
rather than two tables.

You could just do a rewrite RULE on predator and prey for each of
INSERT, UPDATE and DELETE that has a DO INSTEAD action that writes to
animal. This wouldn't handle COPY, though.

Of course, I am still waiting for the future version of postgresql
that will handle this functionality seamlessly using table
inheritance. :-)

You mean writeable VIEWs? I think it would be nice to have some cases
handled, but there are several kinds of VIEWs I can think of where the
only sane way to write to them is to define the writing behavior on a
case-by-case basis.

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!

Robert Haas

robertmhaas@gmail.com

about 19 years ago

In reply to: David Fetter (#6)

Re: complex referential integrity constraints

The idea here is that a wolf can attack a sheep, or a wolf can attack
another wolf, but sheep can't attack anything. I suppose I could list
each wolf in both the predator and prey tables, but that seems a bit
duplicative (and causes other problems).

...Robert

-----Original Message-----
From: David Fetter [mailto:david@fetter.org]
Sent: Monday, February 19, 2007 1:04 PM
To: Robert Haas
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex referential integrity constraints

On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote:

So, I have the following problem.

Suppose you have two kinds of animals, sheep and wolves. Since they
have very similar properties, you create a single table to hold both
kinds of animals, and an animal_type table to specify the type of each
animal:

CREATE TABLE animal_type (
id integer not null,
name varchar(80) not null,
primary key (id)
);
INSERT INTO animal_type VALUES (1, 'Sheep');
INSERT INTO animal_type VALUES (2, 'Wolf');

CREATE TABLE animal (
id serial,
type_id integer not null references animal_type (id),
name varchar(80) not null,
age integer not null,
weight_in_pounds integer not null,
primary key (id)
);

The animal_type table is more or less written in stone, but the animal
table will be updated frequently. Now, let's suppose that we want to
keep track of all of the cases where one animal is mauled by another
animal:

CREATE TABLE mauling (
id serial,
attacker_id integer not null references animal (id),
victim_id integer not null references animal (id),
attack_time timestamp not null,
primary key (id)
);

The problem with this is that I have a very unsettled feeling about

the

foreign key constraints on this table. The victim_id constraint is
fine, but the attacker_id constraint is really inadequate, because the
attacker CAN NEVER BE A SHEEP. I really want a way to write a
constraint that says that the attacker must be an animal, but
specifically, a wolf.

It would be really nice to be able to write:

FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id)

Or:

CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2;
-- and then
FOREIGN KEY (attacker_id) REFERENCES INDEX wolves

...but that's entirely speculative syntax. I don't think there's any
easy way to do this. (Please tell me I'm wrong.)

The problem really comes in when people start modifying the animal
table. Every once in a while we have a case where we record something
as a wolf, but it turns out to have been a sheep in wolf's clothing.

this case, we want to do something like this:

UPDATE animal SET type_id = 1 WHERE id = 572;

HOWEVER, this operation MUST NOT be allowed if it turns out there is a
row in the mauling table where attacker_id = 572, because that would
violate my integrity constraints that says that sheep do not maul.

Any suggestions? I've thought about creating rules or triggers to

check

the conditions, but I'm scared that this could either (a) get really
complicated when there are a lot more tables and constraints involved

(b) introduce race conditions.

Thanks,

...Robert

I'd do something like this:

CREATE TABLE animal_type (
animal_name TEXT PRIMARY KEY,
CHECK(animal_name = trim(animal_name))
);

/* Only one of {Wolf,wolf} can be in the table. */

CREATE UNIQUE INDEX just_one_animal_name
ON animal_type(LOWER(animal_name));

CREATE TABLE predator (
animal_name TEXT NOT NULL
REFERENCES animal_type(animal_name)
ON DELETE CASCADE,
PRIMARY KEY(animal_name)
);

CREATE TABLE prey (
animal_name TEXT NOT NULL
REFERENCES animal_type(animal_name)
ON DELETE CASCADE,
PRIMARY KEY(animal_name)
);

CREATE TABLE mauling (
id SERIAL PRIMARY KEY,
attacker_id INTEGER NOT NULL REFERENCES predator
(animal_type_id),
victim_id INTEGER NOT NULL REFERENCES prey (animal_type_id),
attack_time TIMESTAMP WITH TIME ZONE NOT NULL
);

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!

#10

Robert Haas

robertmhaas@gmail.com

about 19 years ago

In reply to: Joris Dobbelsteen (#5)

Re: complex referential integrity constraints

Yes, exactly. And while you might not care about all of those (e.g. I
care about the first two but am not worried about the third one because
I'm the only one who will ever update that table), writing multiple
triggers to enforce each constraint of this type quickly gets old if
there are even a few of them. It is exponentially harder to write a
constraint of this type than it is to write a simple foreign key
constraint.

...Robert

-----Original Message-----
From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
Sent: Monday, February 19, 2007 5:59 AM
To: elein; Robert Haas
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

Why don't you add a field in animal_types that is boolean mauler.
Then you can add a trigger on the mauling table to raise an
error when the attacker_id is an animal type mauler.

This is perfectly possible and will make it possible to violate the
constraint, UNLESS locking of the tuples is done correctly.

These contraints are not trivial to implement (unfortunally). It would
be great if they where.

- Joris

#11

Joris Dobbelsteen

Joris@familiedobbelsteen.nl

about 19 years ago

In reply to: Robert Haas (#1)

Re: complex referential integrity constraints

I partially agree:
If people CAN do stupid things, they are 'clever' enough to find a way
to actually do it. I've seen them destroy things, by just using a system
in a way it was not intended. They effectively found a way to blow away
the very thing that part was designed for.
But indeed, it's a lot of work, especially if the number of tables that
must be referenced increases. I'm a strong supporter for ensuring
consistency. Postgres has what it takes to do the job, but it doesn't
make my life a lot easier. But it seems to be as good as it gets
today...

Perhaps we should rather define a 'database' constraint in the order of:
"For every mauling, the attacking animal must be of the attacker type"
(in a computer understandable manner). From the set theory this should
be possible without too much problems, However doing so efficiently
might be slightly harder.
This might be a fun project and useful for the TODO list. At least it
makes it a lot easier (and maintanable) to enforce database-wide
constraints.

- Joris

Show quoted text

-----Original Message-----
From: Robert Haas [mailto:Robert.Haas@dyntek.com]
Sent: woensdag 21 februari 2007 3:37
To: Joris Dobbelsteen; elein
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

Yes, exactly. And while you might not care about all of those
(e.g. I care about the first two but am not worried about the
third one because I'm the only one who will ever update that
table), writing multiple triggers to enforce each constraint
of this type quickly gets old if there are even a few of them.
It is exponentially harder to write a constraint of this type
than it is to write a simple foreign key constraint.

...Robert

-----Original Message-----
From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
Sent: Monday, February 19, 2007 5:59 AM
To: elein; Robert Haas
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

Why don't you add a field in animal_types that is boolean mauler.
Then you can add a trigger on the mauling table to raise an

error when

the attacker_id is an animal type mauler.

This is only partial. You need a lot more triggers to
guarentee the constraints are enforced.
Precisely you need to validate:
* mauling on insert/update of attacker_id
* animal on update of type_id
* animal_type on update of your property

Of course you need to think about the MVCC model, such that:
Transaction 1 executes
INSERT INTO mauling VALUES ('someattacker'), Transaction 2
executes UPDATE animal_type SET mauler = false WHERE name =
'someattacker', such that both transaction happen in parallel.

This is perfectly possible and will make it possible to
violate the constraint, UNLESS locking of the tuples is done correctly.

These contraints are not trivial to implement (unfortunally).
It would be great if they where.

- Joris

#12

Robert Haas

robertmhaas@gmail.com

about 19 years ago

In reply to: Joris Dobbelsteen (#11)

Re: complex referential integrity constraints

The ability to make a foreign key reference a specific partial unique
index (rather than just a set of columns that have a unique index) would
solve many problems of this type. As another example, you might have a
table where one of the columns is "is_deleted boolean not null". By
creating a partial unique index on the primary key of that table "WHERE
NOT is_deleted" and then pointing a foreign key at it, you could enforce
that each row in the child table references a parent who isn't deleted.

However, this would break down when there's more one than intermediate
step involved. For example, if you have:

CREATE TABLE animal_type (
id serial,
name varchar(60) not null,
is_attacker boolean not null,
primary key (id)
);

CREATE TABLE animal (
id serial,
type_id integer not null references animal_type (id),
name varchar(60) not null,
primary key (id)
);

CREATE TABLE mauling (
id serial,
attacker_id integer not null references animal (id),
victim_id integer not null references animal (id),
attack_time timestamp with time zone not null,
primary key (id)
);

It would be easy to enforce the constraint that the attacker must be an
animal of some specific type, but difficult to enforce the constraint
that the attacker must be an animal whose type, in turn, has a true
value for is_attacker.

The best idea that I can think of right now to handle multiple levels of
tables is to allow FOREIGN KEY constraints to references a VIEW, rather
than a table. Then you could say:

CREATE VIEW attackers AS
SELECT a.id FROM animal a, animal_type t WHERE a.type_id = t.id AND
t.attacker;

...and then FOREIGN KEY (attacker_id) REFERENCES attackers (id).

This syntax would solve a number of other problems as well, such as
requiring that some record in table A has a parent either in table P or
in table Q. However, I think this would probably require implementing
some kind of materialized view so that you could actually build an index
on the view, and that opens up a whole new can of worms, because it's
not very difficult to define a view that is costly to update
incrementally.

The problem is really that there is a pretty large gap between writing a
foreign key constraint, which is trivial, and enforcing a constraint
using triggers, which is quite a bit more complex (and therefore, easy
to screw up), because the foreign key automatically handles all the
cases (insert into child table, update of child table, update of parent
table, delete from parent table) whereas with triggers you have to
address each of those cases individually. Unfortunately, something
tells me that implementing a more powerful system for foreign key
constraints is a non-trivial project, however useful it would be.
Still, I'd love to see it in the TODO file, too.

...Robert

-----Original Message-----
From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
Sent: Thursday, February 22, 2007 8:03 AM
To: Robert Haas; elein
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

- Joris

Show quoted text

-----Original Message-----
From: Robert Haas [mailto:Robert.Haas@dyntek.com]
Sent: woensdag 21 februari 2007 3:37
To: Joris Dobbelsteen; elein
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

Yes, exactly. And while you might not care about all of those
(e.g. I care about the first two but am not worried about the
third one because I'm the only one who will ever update that
table), writing multiple triggers to enforce each constraint
of this type quickly gets old if there are even a few of them.
It is exponentially harder to write a constraint of this type
than it is to write a simple foreign key constraint.

...Robert

-----Original Message-----
From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
Sent: Monday, February 19, 2007 5:59 AM
To: elein; Robert Haas
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

Why don't you add a field in animal_types that is boolean mauler.
Then you can add a trigger on the mauling table to raise an

error when

the attacker_id is an animal type mauler.

This is only partial. You need a lot more triggers to
guarentee the constraints are enforced.
Precisely you need to validate:
* mauling on insert/update of attacker_id
* animal on update of type_id
* animal_type on update of your property

Of course you need to think about the MVCC model, such that:
Transaction 1 executes
INSERT INTO mauling VALUES ('someattacker'), Transaction 2
executes UPDATE animal_type SET mauler = false WHERE name =
'someattacker', such that both transaction happen in parallel.

This is perfectly possible and will make it possible to
violate the constraint, UNLESS locking of the tuples is done correctly.

These contraints are not trivial to implement (unfortunally).
It would be great if they where.

- Joris

#13

Joris Dobbelsteen

Joris@familiedobbelsteen.nl

about 19 years ago

In reply to: Robert Haas (#1)

Re: complex referential integrity constraints

-----Original Message-----
From: Robert Haas [mailto:Robert.Haas@dyntek.com]
Sent: donderdag 22 februari 2007 15:58
To: Joris Dobbelsteen; elein
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

The ability to make a foreign key reference a specific partial
unique index (rather than just a set of columns that have a
unique index) would solve many problems of this type. As
another example, you might have a table where one of the
columns is "is_deleted boolean not null". By creating a
partial unique index on the primary key of that table "WHERE
NOT is_deleted" and then pointing a foreign key at it, you
could enforce that each row in the child table references a
parent who isn't deleted.

However, this would break down when there's more one than
intermediate step involved. For example, if you have:

CREATE TABLE animal_type (
id serial,
name varchar(60) not null,
is_attacker boolean not null,
primary key (id)
);

CREATE TABLE animal (
id serial,
type_id integer not null references animal_type (id),
name varchar(60) not null,
primary key (id)
);

CREATE TABLE mauling (
id serial,
attacker_id integer not null references animal (id),
victim_id integer not null references animal (id),
attack_time timestamp with time zone not null,
primary key (id)
);

It would be easy to enforce the constraint that the attacker
must be an animal of some specific type, but difficult to
enforce the constraint that the attacker must be an animal
whose type, in turn, has a true value for is_attacker.

Even worse, I don't you can guarentee that this constraint is enforced
at all times. That means, not if you are using triggers.
The only option seems using foreign keys and put in a lot of redundant
data.

The best idea that I can think of right now to handle multiple
levels of tables is to allow FOREIGN KEY constraints to
references a VIEW, rather than a table. Then you could say:

CREATE VIEW attackers AS
SELECT a.id FROM animal a, animal_type t WHERE a.type_id =
t.id AND t.attacker;

...and then FOREIGN KEY (attacker_id) REFERENCES attackers (id).

Perhaps "Alban Hertroys" idea solves this problem a little easier.
However it lacks the possibility to make quick changes later on
(predator is a predator, or you are screwed, no second change).
This is not acceptable in problems where such things are decided after
object creation or might be changed later on.

This syntax would solve a number of other problems as well,
such as requiring that some record in table A has a parent
either in table P or in table Q. However, I think this would
probably require implementing some kind of materialized view
so that you could actually build an index on the view, and
that opens up a whole new can of worms, because it's not very
difficult to define a view that is costly to update incrementally.

You don't need a materialized view to put a database to its knees. You
can already do that today, with ease. I wouldn't worry too much about
that.
If you mean from a syntax I suggested I do not believe it's the 'right'
way to define an (materialized) view, rather use a trigger-like style of
system. There are some other issues, however.

The problem is really that there is a pretty large gap between
writing a foreign key constraint, which is trivial, and
enforcing a constraint using triggers, which is quite a bit
more complex (and therefore, easy to screw up), because the
foreign key automatically handles all the cases (insert into
child table, update of child table, update of parent table,
delete from parent table) whereas with triggers you have to
address each of those cases individually.

Exactly, that is why I suggested such a system. If its not easy to
enforce constraints, it will never happen properly. Especially if
problems get more complex.

Unfortunately,
something tells me that implementing a more powerful system
for foreign key constraints is a non-trivial project, however
useful it would be.
Still, I'd love to see it in the TODO file, too.

Me too, I get the impression that SQL is too weak for most constraints.

- Joris

Show quoted text

...Robert

-----Original Message-----
From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
Sent: Thursday, February 22, 2007 8:03 AM
To: Robert Haas; elein
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

I partially agree:
If people CAN do stupid things, they are 'clever' enough to
find a way to actually do it. I've seen them destroy things,
by just using a system in a way it was not intended. They
effectively found a way to blow away the very thing that part
was designed for.
But indeed, it's a lot of work, especially if the number of
tables that must be referenced increases. I'm a strong
supporter for ensuring consistency. Postgres has what it takes
to do the job, but it doesn't make my life a lot easier. But
it seems to be as good as it gets today...

Perhaps we should rather define a 'database' constraint in the
order of:
"For every mauling, the attacking animal must be of the attacker type"
(in a computer understandable manner). From the set theory
this should be possible without too much problems, However
doing so efficiently might be slightly harder.
This might be a fun project and useful for the TODO list. At
least it makes it a lot easier (and maintanable) to enforce
database-wide constraints.

- Joris

-----Original Message-----
From: Robert Haas [mailto:Robert.Haas@dyntek.com]
Sent: woensdag 21 februari 2007 3:37
To: Joris Dobbelsteen; elein
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

Yes, exactly. And while you might not care about all of

those (e.g. I

care about the first two but am not worried about the third

one because

I'm the only one who will ever update that table), writing multiple
triggers to enforce each constraint of this type quickly gets old if
there are even a few of them.
It is exponentially harder to write a constraint of this

type than it

is to write a simple foreign key constraint.

...Robert

-----Original Message-----
From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
Sent: Monday, February 19, 2007 5:59 AM
To: elein; Robert Haas
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

Why don't you add a field in animal_types that is boolean mauler.
Then you can add a trigger on the mauling table to raise an

error when

the attacker_id is an animal type mauler.

This is only partial. You need a lot more triggers to guarentee the
constraints are enforced.
Precisely you need to validate:
* mauling on insert/update of attacker_id
* animal on update of type_id
* animal_type on update of your property

Of course you need to think about the MVCC model, such that:
Transaction 1 executes
INSERT INTO mauling VALUES ('someattacker'), Transaction 2 executes
UPDATE animal_type SET mauler = false WHERE name =

'someattacker', such

that both transaction happen in parallel.

This is perfectly possible and will make it possible to violate the
constraint, UNLESS locking of the tuples is done correctly.

These contraints are not trivial to implement (unfortunally).
It would be great if they where.

- Joris

#14

Martijn van Oosterhout

kleptog@svana.org

about 19 years ago

In reply to: Joris Dobbelsteen (#13)

Re: complex referential integrity constraints

On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote:

Even worse, I don't you can guarentee that this constraint is enforced
at all times. That means, not if you are using triggers.
The only option seems using foreign keys and put in a lot of redundant
data.

Err, foreign keys are implemented using triggers, so this statement is
self-contradictary.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#15

Joris Dobbelsteen

Joris@familiedobbelsteen.nl

about 19 years ago

In reply to: Robert Haas (#1)

Re: complex referential integrity constraints

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of
Martijn van Oosterhout
Sent: donderdag 22 februari 2007 18:17
To: Joris Dobbelsteen
Cc: Robert Haas; pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex referential integrity constraints

On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote:

Even worse, I don't you can guarentee that this constraint

is enforced

at all times. That means, not if you are using triggers.
The only option seems using foreign keys and put in a lot of

redundant

data.

Err, foreign keys are implemented using triggers, so this
statement is self-contradictary.

Are you really sure they are executed under the same visibility rules?

- Joris

#16

Stephan Szabo

sszabo@megazone23.bigpanda.com

about 19 years ago

In reply to: Joris Dobbelsteen (#15)

Re: complex referential integrity constraints

On Thu, 22 Feb 2007, Joris Dobbelsteen wrote:

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of
Martijn van Oosterhout
Sent: donderdag 22 februari 2007 18:17
To: Joris Dobbelsteen
Cc: Robert Haas; pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex referential integrity constraints

On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote:

Even worse, I don't you can guarentee that this constraint

is enforced

at all times. That means, not if you are using triggers.
The only option seems using foreign keys and put in a lot of

redundant

data.

Err, foreign keys are implemented using triggers, so this
statement is self-contradictary.

Are you really sure they are executed under the same visibility rules?

IIRC, the ri triggers use calls that you aren't able to get at in
triggers written in any of the PLs, but I think you should be able to
replicate the feat in a trigger written in C.

#17

Martijn van Oosterhout

kleptog@svana.org

about 19 years ago

In reply to: Joris Dobbelsteen (#15)

Re: complex referential integrity constraints

On Thu, Feb 22, 2007 at 06:51:49PM +0100, Joris Dobbelsteen wrote:

Err, foreign keys are implemented using triggers, so this
statement is self-contradictary.

Are you really sure they are executed under the same visibility rules?

Reasonably. I have no idea what visibility rules would make any
difference at all. AIUI a foreign key just takes a shared lock on the
referenced row and all the magic of MVCC makes sure the row exists when
the transaction completes.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#18

Joris Dobbelsteen

Joris@familiedobbelsteen.nl

about 19 years ago

In reply to: Robert Haas (#1)

Re: complex referential integrity constraints

-----Original Message-----
From: Martijn van Oosterhout [mailto:kleptog@svana.org]
Sent: donderdag 22 februari 2007 23:15
To: Joris Dobbelsteen
Cc: Robert Haas; pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex referential integrity constraints

On Thu, Feb 22, 2007 at 06:51:49PM +0100, Joris Dobbelsteen wrote:

Err, foreign keys are implemented using triggers, so this statement
is self-contradictary.

Are you really sure they are executed under the same

visibility rules?

Reasonably. I have no idea what visibility rules would make
any difference at all. AIUI a foreign key just takes a shared
lock on the referenced row and all the magic of MVCC makes
sure the row exists when the transaction completes.

Try this:
(sorry for any typo's in SQL, if they exist)

CREATE TABLE a (val integer NOT NULL PRIMARY KEY);
CREATE TABLE b (val integer NOT NULL, val2 integer NOT NULL, PRIMARY KEY
(val, val2);
-- we will be doing foreign key ourselves
INSERT INTO a VALUES (1);
INSERT INTO a VALUES (2);
INSERT INTO a VALUES (1,1);
INSERT INTO a VALUES (2,2);

-- Now two transaction (T1, T2)
T1: BEGIN ISOLATION LEVEL SERIALIZABLE;
T2: BEGIN ISOLATION LEVEL SERIALIZABLE;
-- Lets see what we have got.
T1: SELECT * FROM a;
T1: SELECT * FROM b;
T2: SELECT * FROM a;
T2: SELECT * FROM b;
-- lets insert something...
T2: INSERT INTO a VALUES (2,100);
-- results in a lock being acquired
T2: SELECT 1 FROM a x WHERE val = 2 FOR SHARE ON x; -- this is your
lock
-- Ok, done for now...
T2: COMMIT; -- now the lock is gone
-- This means T1 doesn't see the row, right?
T1: SELECT * FROM b;
-- now lets delete
T1: DELETE FROM a WHERE val = 2;
-- on cascade delete, thus:
T1: DELETE FROM b WHERE val = 2; -- won't see new tuple
(serializable isolation)
T1: COMMIT;

SELECT * FROM b;
val val2
2 100

Sorry, constraint wasn't enforced ;)
It does matter.

Now try it with this:
CREATE TABLE b (val integer NOT NULL, val2 integer NOT NULL, PRIMARY KEY
(val, val2), FOREIGN KEY val REFERENCES a(val) ON UPDATE CASCADE ON
DELETE CASCADE);
That won't inhibit this behaviour, but proberly enforces the constraint
(as one would have expected). I believe T2 will abort as in the manual.

Your statement might be correct, but it doesn't take enough account of
how the visibility rules under MVCC are played. It seems the foreign
keys (as well as primary keys) have there rules applied differently,
they see that row and will cause an abort.

- Joris

#19

Joris Dobbelsteen

Joris@familiedobbelsteen.nl

about 19 years ago

In reply to: Robert Haas (#1)

Re: complex referential integrity constraints

-----Original Message-----
From: Stephan Szabo [mailto:sszabo@megazone.bigpanda.com]
Sent: donderdag 22 februari 2007 23:13
To: Joris Dobbelsteen
Cc: Martijn van Oosterhout; Robert Haas; pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex referential integrity constraints

On Thu, 22 Feb 2007, Joris Dobbelsteen wrote:

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of

Martijn van

Oosterhout
Sent: donderdag 22 februari 2007 18:17
To: Joris Dobbelsteen
Cc: Robert Haas; pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex referential integrity constraints

On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote:

Even worse, I don't you can guarentee that this constraint

is enforced

at all times. That means, not if you are using triggers.
The only option seems using foreign keys and put in a lot of

redundant

data.

Err, foreign keys are implemented using triggers, so this statement
is self-contradictary.

Are you really sure they are executed under the same

visibility rules?

IIRC, the ri triggers use calls that you aren't able to get at
in triggers written in any of the PLs, but I think you should
be able to replicate the feat in a trigger written in C.

Why they never did that? No need or no pratical situations where it went
wrong?

IMHO it should become possible to expose this functionality to the PL
languages or as part of the trigger system? The current system can be
shown to be underpowered to enforce constraints. It seems a bit
cumbersome to have C functions for this purpose.

However I must admit that looking through the postgres code doesn't make
it much clearer to me whats actually going for these kind of tricky
problems...

- Joris

#20

Martijn van Oosterhout

kleptog@svana.org

about 19 years ago

In reply to: Joris Dobbelsteen (#18)

Re: complex referential integrity constraints

On Fri, Feb 23, 2007 at 12:41:25AM +0100, Joris Dobbelsteen wrote:

Reasonably. I have no idea what visibility rules would make
any difference at all. AIUI a foreign key just takes a shared
lock on the referenced row and all the magic of MVCC makes
sure the row exists when the transaction completes.

Try this:
(sorry for any typo's in SQL, if they exist)

<snip>

Well, I took a look at the RI code and the only stuff I saw that looked
interesting was this:

utils/adt/ri_triggers.c:
if (IsXactIsoLevelSerializable && detectNewRows)
{
CommandCounterIncrement(); /* be sure all my own work is visible */
test_snapshot = CopySnapshot(GetLatestSnapshot());
crosscheck_snapshot = CopySnapshot(GetTransactionSnapshot());
}

It then proceeds to use that snapshot to execute the query to get the share lock.

It's probably true that other PL's can't do this directly. Not sure how
to deal with that. I got confused because I thought the first version
of RI did use straight pl/pgsql functions, so I thought that was
enough.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text