Constraint stuff

Started by Stephan Szaboover 25 years ago11 messages

sszabo@megazone23.bigpanda.com

over 25 years ago

I think my last message to Tom (and the list)
about the foreign key stuff and oids ended up
in /dev/null due to a problem on the local
mailer.

Tom had suggested storing a more
understandable form of the foreign key constraint
to make dumping more reasonable in its own table.
I'd guess like the src stored for check constraints.
However, I noticed a few problems with this and
while thinking about it I had a few germs of
ideas which aren't any kind of proposal yet, but
I thought someone might be interested in them.

The problem with storing source is that it doesn't
get changed when things change. Try altering
a column name that has a check constraint, then
dump the database. I don't think this is the
responsibility of the dumper. If we store source
we should be guaranteeing it's correct.
Plus, right now for FK constraints we do something
specific to keep track of the other table referenced
so we can remove the constraints if the table goes
away. But, what happens when we allow subqueries
in check constraints, etc...

So, what I was thinking is, that if we have another
table to store this kind of constraint info, it
should probably store information for all constraints.
I was thinking two tables, one (say pg_constraint)
which stores basic information about the constraint
(what type, the constraint name, primarily constraintd
table, maybe owner if constraints have owners in SQL)
and a source form (see more below).
The second table stores references from this constraint.
So any table, column, index, etc is stored here.
Probably something of the form constraintoid,
type of thing being referenced (the oid of the table?),
the oid of the referenced thing and a number.

The number comes in to the source form thats stored.
Anywhere that we're referencing something that a name
is insufficient for (like a column name or table name)
we put something into the source for that says
referncing column n of the referenced thing m.

Then we create something like
format_constraint(constraintoid) which gives out
an SQL compliant version of the cconstraint.

And it means that if we deleted something, we know fairly
easily whether or not it is being referenced by some
constraint somewhere without writing separate code for
fk constraints and check constraints, etc.. And
renaming wouldn't be a problem.

- There are some problems I see right off both conceptually
and implementation, but I thought someone might be able
to come up with a better idea once it was presented (even
if it's just a "not worth the effort" :) )

One of the problems I see is that if taken to its end,
would you store function oids here? If so, that might
make it harder to allow a drop function/create function
to ever work transparently in the future.
Plus, I'm not even really sure if it would be reasonable
to get a source form like I was thinking of for check
constraints really.

Philip Warner

pjw@rhyme.com.au

over 25 years ago

In reply to: Stephan Szabo (#1)

Re: Constraint stuff

At 10:29 6/08/00 -0700, Stephan Szabo wrote:

The problem with storing source is that it doesn't
get changed when things change. Try altering
a column name that has a check constraint, then
dump the database.

Or renaming a referenced table - I think the current constraint system will
handle this since OIDs don't change.

So, what I was thinking is, that if we have another
table to store this kind of constraint info, it
should probably store information for all constraints.
I was thinking two tables, one (say pg_constraint)
which stores basic information about the constraint
(what type, the constraint name, primarily constraintd
table, maybe owner if constraints have owners in SQL)
and a source form (see more below).

This sounds reasonable.

The second table stores references from this constraint.
So any table, column, index, etc is stored here.
Probably something of the form constraintoid,
type of thing being referenced (the oid of the table?),
the oid of the referenced thing and a number.

I would prefer to see this generalized: a dependencies table that lists
both the referrer OID *and* type, as well as the refrerenced thing oid &
type. This then allows things such as SQL functions to make entries in this
table as well as views etc etc.

The number comes in to the source form thats stored.
Anywhere that we're referencing something that a name
is insufficient for (like a column name or table name)
we put something into the source for that says
referncing column n of the referenced thing m.

Don't know enough about the internals, but don't we have attr ids for this,
and/or won't OIDs work in most cases? Maybe I'm missing your point.

- There are some problems I see right off both conceptually
and implementation, but I thought someone might be able
to come up with a better idea once it was presented (even
if it's just a "not worth the effort" :) )

It seems to me that:

- 'format_constraint' is a good idea
- we need the dependency stuff
- dumping source in canonical form is best put in the backend
(philosophical point)
- I presume it's a first part of a full implementation of 'alter table
add/drop constraint...'

so I don't think it's a waste of time.

One of the problems I see is that if taken to its end,
would you store function oids here?

Sounds sensible.

If so, that might
make it harder to allow a drop function/create function
to ever work transparently in the future.

I *think* it doesn't work now; yes you can drop the function, but AFAIK,
the constraint references the old one (at least that's true for normal
triggers). What you are proposing makes people aware that they are about to
break more things than they know.

Plus, I'm not even really sure if it would be reasonable
to get a source form like I was thinking of for check
constraints really.

I suspect it has to depend on how the constraint is acually checked. If
They are checkied by using table OIDs then you need to store the OID
dependency and *somehow* reconstruct the source. If they are checked using
table names (getting OID each time), then store the name and the raw source
(maybe). You need to handle renaming of tables referenced in CHECK clauses.
I hate rename (but I use it).

Maybe you can do something nasty like store the source with escape
characters and OIDs in place of names. This is not as bad as it sounds, I
think. It also gets around the problem that original source may be
unrecoverable (eg. COALESCE is translated to CASE in the parser, so a CHECK
clause that uses COALESCE will never be fully recoverable - although most
people would not see this as a problem). This messing around would have to
be done in the parser, I would guess. So:

check exists(select * from reftbl where reffld=tbl.origfld)

might become:

check exists(select * from %%table:<OID>%% where
%%table-attr:<Table-OID>,<Attd-ID>%%
= %%table:<OID>%%.%%table-attr:<Table-OID>,<Attd-ID>%%

Looking at this, maybe it's not such a good idea after all...

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

Tom Lane

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Stephan Szabo (#1)

Re: Constraint stuff

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

Tom had suggested storing a more
understandable form of the foreign key constraint
to make dumping more reasonable in its own table.
I'd guess like the src stored for check constraints.

I wasn't actually thinking of storing source, but rather precompiled
expressions (as I remarked awhile ago, I think pg_relcheck's rcsrc
column is dead weight; we could and should generate the value on demand
by reverse-listing rcbin instead). This gets you away from
rename-induced problems since everything is table OIDs, attribute column
numbers, operator and function OIDs, etc.

However, digging those references out of the expression tree is a little
bit painful; you're right that we shouldn't expect applications to do
that for themselves. We could store an additional list of referenced
items. We wouldn't necessarily have to store that explicitly either,
though --- functions to say "is this OID referenced in this stored
expression" or perhaps "give me an array of all function OIDs in this
expression" would get the job done AFAICS.

One of the problems I see is that if taken to its end,
would you store function oids here? If so, that might
make it harder to allow a drop function/create function
to ever work transparently in the future.

I don't think we should worry about that. What's actually needed IMHO
is an "ALTER FUNCTION" command that allows you to replace the body of
an existing function, and perhaps change its name, but NOT its type
signature (result type and number/types of arguments). Changing the
signature is inherently not a transparent operation because it'd
invalidate stored expressions that use the function. ALTER would let
you make safe changes to a function without changing its OID and thus
without invalidating references-by-OID.

regards, tom lane

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 25 years ago

In reply to: Philip Warner (#2)

Re: Constraint stuff

On Mon, 7 Aug 2000, Philip Warner wrote:

The second table stores references from this constraint.
So any table, column, index, etc is stored here.
Probably something of the form constraintoid,
type of thing being referenced (the oid of the table?),
the oid of the referenced thing and a number.

I would prefer to see this generalized: a dependencies table that lists
both the referrer OID *and* type, as well as the refrerenced thing oid &
type. This then allows things such as SQL functions to make entries in this
table as well as views etc etc.

That makes more sense, yes. :) Although not all of those things would
probably use it immediately.

The number comes in to the source form thats stored.
Anywhere that we're referencing something that a name
is insufficient for (like a column name or table name)
we put something into the source for that says
referncing column n of the referenced thing m.

Don't know enough about the internals, but don't we have attr ids for this,
and/or won't OIDs work in most cases? Maybe I'm missing your point.

I was thinking of it more for getting a textual representation back out
of the dependencies for constraints and thinking that I might want to
reference something other than its name that's on something that's
referenced. And actually referncing column n, meant more like attrno
n of the row that m refers to. (Sort of like your thing below for
%%table:OID,attrno)

If so, that might
make it harder to allow a drop function/create function
to ever work transparently in the future.

I *think* it doesn't work now; yes you can drop the function, but AFAIK,
the constraint references the old one (at least that's true for normal
triggers). What you are proposing makes people aware that they are about to
break more things than they know.

True, I just wanted to point it out in case someone had some thought on
changing it so that the system somehow fixed such references, but it
seems like alter function is more likely :)

Plus, I'm not even really sure if it would be reasonable
to get a source form like I was thinking of for check
constraints really.

I suspect it has to depend on how the constraint is acually checked. If
They are checkied by using table OIDs then you need to store the OID
dependency and *somehow* reconstruct the source. If they are checked using
table names (getting OID each time), then store the name and the raw source
(maybe). You need to handle renaming of tables referenced in CHECK clauses.
I hate rename (but I use it).

Maybe you can do something nasty like store the source with escape
characters and OIDs in place of names. This is not as bad as it sounds, I
think. It also gets around the problem that original source may be
unrecoverable (eg. COALESCE is translated to CASE in the parser, so a CHECK
clause that uses COALESCE will never be fully recoverable - although most
people would not see this as a problem). This messing around would have to
be done in the parser, I would guess. So:

check exists(select * from reftbl where reffld=tbl.origfld)

might become:

check exists(select * from %%table:<OID>%% where
%%table-attr:<Table-OID>,<Attd-ID>%%
= %%table:<OID>%%.%%table-attr:<Table-OID>,<Attd-ID>%%

Looking at this, maybe it's not such a good idea after all...

:) Basically that's sort of what I was proposing with the %m.n above where
I was referencing a reference rather than the oid directly and the n was
basically attd-id (and the reference stored the type so i didn't need
it. But if it had to be sent from the parser then your format probably
makes more sense. I was thinking about reversing from the stored
expression in some fashion (but that wouldn't recover a coalesce or
something like that)

Hannu Krosing

hannu@tm.ee

over 25 years ago

In reply to: Stephan Szabo (#1)

Re: Constraint stuff

Tom Lane wrote:

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

Tom had suggested storing a more
understandable form of the foreign key constraint
to make dumping more reasonable in its own table.
I'd guess like the src stored for check constraints.

...

I don't think we should worry about that. What's actually needed IMHO
is an "ALTER FUNCTION" command that allows you to replace the body of
an existing function, and perhaps change its name, but NOT its type
signature (result type and number/types of arguments).

IIRC Oracle allows the syntax CREATE OR REPLACE in many places, for
example for changing VIEWS and PROCEDURES without affecting the things
dependent on them.

CREATE OR REPLACE works also for not-yet-existing function which ALTER
probably would not.

Changing the
signature is inherently not a transparent operation because it'd
invalidate stored expressions that use the function. ALTER would let
you make safe changes to a function without changing its OID and thus
without invalidating references-by-OID.

----------
Hannu

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 25 years ago

In reply to: Tom Lane (#3)

Re: Constraint stuff

On Mon, 7 Aug 2000, Tom Lane wrote:

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

Tom had suggested storing a more
understandable form of the foreign key constraint
to make dumping more reasonable in its own table.
I'd guess like the src stored for check constraints.

I wasn't actually thinking of storing source, but rather precompiled
expressions (as I remarked awhile ago, I think pg_relcheck's rcsrc

I guess you could store the fk_constraint node that is generated for fk
constraints, but that's not really an expression... I think I must
be missing something, because I can't quite see what the precompiled
expression for an fk constraint would be...

However, digging those references out of the expression tree is a little
bit painful; you're right that we shouldn't expect applications to do
that for themselves. We could store an additional list of referenced
items. We wouldn't necessarily have to store that explicitly either,
though --- functions to say "is this OID referenced in this stored
expression" or perhaps "give me an array of all function OIDs in this
expression" would get the job done AFAICS.

The reason I was thinking of storing things was also so you could do
things like: is this oid stored in any constraint. For example,
I'm removing a column, is there any constraint that references this
column, etc, rather than having to code stuff for all of the special
cases in all places that might need it.

Don Baccus

dhogaza@pacifier.com

over 25 years ago

In reply to: Hannu Krosing (#5)

Re: Constraint stuff

At 08:23 PM 8/7/00 +0300, Hannu Krosing wrote:

IIRC Oracle allows the syntax CREATE OR REPLACE in many places

Yes, Oracle does allow this.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

Tom Lane

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Don Baccus (#7)

Re: Constraint stuff

Jan Wieck <janwieck@Yahoo.com> writes:

PL/Tcl and PL/pgSQL will load a function's source only once a
session. The functions loaded are identified by OID, so if
you drop/create a function, the PL handler will simply load a
different function too (from his point of view). At the time
we are able to ALTER a function, we might want to include
some version counter to pg_proc and in the fmgr_info?

More generally, the PL functions need to be able to deal with
recomputing saved plans after an ALTER of a table referenced by the
function. I haven't really thought about how to do that ... but it
seems like Stephan's idea of a table showing referencers and referencees
might help detect the cases where plans have to be flushed.

regards, tom lane

Import Notes

Reply to msg id not found: 200008072115.QAA03746@jupiter.greatbridge.comReference msg id not found: 200008072115.QAA03746@jupiter.greatbridge.com | Resolved by subject fallback

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 25 years ago

In reply to: Tom Lane (#8)

Re: Constraint stuff (fwd)

I need to learn how to type headers I think... :(

Stephan Szabo
sszabo@bigpanda.com

On Mon, 7 Aug 2000, Jan Wieck wrote:

More generally, the PL functions need to be able to deal with
recomputing saved plans after an ALTER of a table referenced by the
function. I haven't really thought about how to do that ... but it
seems like Stephan's idea of a table showing referencers and referencees
might help detect the cases where plans have to be flushed.

More more generally, you cannot tell which objects are
referenced from saved plans inside of a PL function. Might be
possible for PL/pgSQL, where we could use a specialized
invocation of the PL compiler to determine. But there's no
way to do it for PL/Tcl or the like.

We might end up with some general catalog sequence, bumped
any time a schema change happens, and require each function
to forget about all of it's saved plans for the next
transaction. Ugly, but the only way I see to be consistent.

As a dumb question to help me understand better...
What exactly is saved in the plans and how are the plan saved for a
PL/Tcl function that does something where it generates a query that
you say don't know the table of until run time?

Import Notes

Resolved by subject fallback

#10

Karel Zak

zakkr@zf.jcu.cz

over 25 years ago

In reply to: Tom Lane (#8)

Re: Constraint stuff

On Mon, 7 Aug 2000, Tom Lane wrote:

Jan Wieck <janwieck@Yahoo.com> writes:

PL/Tcl and PL/pgSQL will load a function's source only once a
session. The functions loaded are identified by OID, so if
you drop/create a function, the PL handler will simply load a
different function too (from his point of view). At the time
we are able to ALTER a function, we might want to include
some version counter to pg_proc and in the fmgr_info?

More generally, the PL functions need to be able to deal with
recomputing saved plans after an ALTER of a table referenced by the
function. I haven't really thought about how to do that ... but it

More and more generally, IMHO all saved plans need some validity
checking and not only for ALTER, but also for all operation those
changing relevant system tables. And this is not problem for PL only,
but for all what is based on SPI (and VIEWs?).

IMHO correct solution is _one_ space and one method for plans saving
= query/plan cache, and some common validity-checker that will work
over this cache. But how implement validity-checker is unknown...
(? Call from all command that changing system tables some handler, that
check plans in a cache ?)

Karel

BTW. - first step in this problem is (or can be) on the way ---
query cache...

#11

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 25 years ago

In reply to: Karel Zak (#10)

Re: Constraint stuff (fwd)

On Tue, 8 Aug 2000, Jan Wieck wrote:

That's not exactly what I said.

PL/Tcl has spi_exec and spi_prepare/spi_execp commands. And
of course, when the function call's spi_prepare, it is known
which objects it uses in this one query. But in contrast to
PL/pgSQL, PL/Tcl could use an argument as a tablename,
attribute, function, whatnot and build a saved plan for it
(would need to do so again for each different argument
value). So it CAN possibly reference almost every object in
the entire database, and you have no chance to know that,
even after a hundredthousand invocations of the function.

Okay, that's actually what I had thought, but I wasn't sure after
your previous message. But the saved plans themselves do actually
reference particular objects, even if the function doesn't, right?
So you wouldn't necessarily need to recompile all saved plans,
just ones that reference the changed objects, although it might
be easier to just force all of them.

And, you're not living isolated in your backend (I know -
everything would be so easy :-). There's life in other
processes too, and you need to tell them that they possibly
have to recompile saved plans for the next Xact. How
complicated do you want this to be?

Only as complicated as necessary... :) But it seems that it is
necessary to have really functional full set of alter commands.
I mean, if a particular plan references a column whose type has
changed, that sounds like it would be bad to use an old saved plan.
(Oops, that's not a varchar anymore... it's an integer...)

I guess a question is, what is the correct/desired behavior in
certain cases... And what cases are reasonably our problem and what
can we say is the admin's problem? Obviously, we're not going to get
far trying to deal with the possibility that a user changes a column
type and does something that is no longer correct, except to error,
we probably can't and shouldn't fix it, but what things should we handle
automatically?

If you have a function that makes a query like select * from foo that
isn't done via arguments and you rename foo, what *should* happen if
anything? What if you remove it entirely? What about a constraint that
references a function that's renamed, does the constraint follow the name
change? If the function is removed, do we want to remove the now broken
constraint?

Admittedly, the initial thought behind this whole thing was to allow
constraints to properly dump after renames and to make dropping tables or
columns easier to handle for removing referencing constraints (assuming
that we were going to be at some point handling subqueries inside check
constraints). All the rest of this is way past where I initially thought
it to...

Import Notes

Resolved by subject fallback