information_schema and not-null constraints

Started by Alvaro Herreraover 2 years ago19 messageshackers
Jump to latest
#1Alvaro Herrera
alvherre@2ndquadrant.com

In reference to [1]/messages/by-id/81b461c4-edab-5d8c-2f88-203108425340@enterprisedb.com, 0001 attached to this email contains the updated
view definitions that I propose.

In 0002, I took the tests added by Peter's proposed patch and put them
in a separate test file that runs at the end. There are some issues,
however. One is that the ORDER BY clause in the check_constraints view
is not fully deterministic, because the table name is not part of the
view definition, so we cannot sort by table name. In the current
regression database there is only one case[2]select constraint_name, count(*), string_agg(distinct check_clause, E'\n') from information_schema.check_constraints group by constraint_name having count(*) > 1; where two constraints have
the same name and different definition:

inh_check_constraint │ 2 │ ((f1 > 0)) NOT VALID ↵
│ │ ((f1 > 0))

(on tables invalid_check_con and invalid_check_con_child). I assume
this is going to bite us at some point. We could just add a WHERE
clause to omit that one constraint.

Another issue I notice eyeballing at the results is that foreign keys on
partitioned tables are listing the rows used to implement the
constraints on partitions, which are sort-of "internal" constraints (and
are not displayed by psql's \d). I hope this is a relatively simple fix
that we could extract from the code used by psql.

Anyway, I think I'm going to get 0001 committed sometime tomorrow, and
then play a bit more with 0002 to try and get it pushed soon also.

Thanks

[1]: /messages/by-id/81b461c4-edab-5d8c-2f88-203108425340@enterprisedb.com

[2]: select constraint_name, count(*), string_agg(distinct check_clause, E'\n') from information_schema.check_constraints group by constraint_name having count(*) > 1;
select constraint_name, count(*),
string_agg(distinct check_clause, E'\n')
from information_schema.check_constraints
group by constraint_name
having count(*) > 1;

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"You don't solve a bad join with SELECT DISTINCT" #CupsOfFail
https://twitter.com/connor_mc_d/status/1431240081726115845

Attachments:

v2-0001-update-information_schema-definition.patchtext/x-diff; charset=utf-8Download+27-48
v2-0002-add-information_schema-test.patchtext/x-diff; charset=utf-8Download+735-3
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: information_schema and not-null constraints

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

In 0002, I took the tests added by Peter's proposed patch and put them
in a separate test file that runs at the end. There are some issues,
however. One is that the ORDER BY clause in the check_constraints view
is not fully deterministic, because the table name is not part of the
view definition, so we cannot sort by table name.

I object very very strongly to this proposed test method. It
completely undoes the work I did in v15 (cc50080a8 and related)
to make the core regression test scripts mostly independent of each
other. Even without considering the use-case of running a subset of
the tests, the new test's expected output will constantly be needing
updates as side effects of unrelated changes.

regards, tom lane

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: information_schema and not-null constraints

On 2023-Sep-05, Peter Eisentraut wrote:

The following information schema views are affected by the not-null
constraint catalog entries:

1. CHECK_CONSTRAINTS
2. CONSTRAINT_COLUMN_USAGE
3. DOMAIN_CONSTRAINTS
4. TABLE_CONSTRAINTS

Note that 1 and 3 also contain domain constraints.

After looking at what happens for domain constraints in older versions
(I tested 15, but I suppose this applies everywhere), I notice that we
don't seem to handle them anywhere that I can see. My quick exercise is
just

create domain nnint as int not null;
create table foo (a nnint);

and then verify that this constraint shows nowhere -- it's not in
DOMAIN_CONSTRAINTS for starters, which is I think the most obvious place.
And nothing is shown in CHECK_CONSTRAINTS nor TABLE_CONSTRAINTS either.

This did ever work in the past? I tested with 9.3 and didn't see
anything there either.

I am hesitant to try to add domain not-null constraint support to
information_schema in the same commit as these changes. I think this
should be fixed separately.

(Note that if, in older versions, you change the table to be
create table foo (a nnint NOT NULL);
then you do get a row in table_constraints, but nothing in
check_constraints. With my proposed definition this constraint appears
in check_constraints, table_constraints and constraint_column_usage.)

On 2023-Sep-04, Tom Lane wrote:

I object very very strongly to this proposed test method. It
completely undoes the work I did in v15 (cc50080a8 and related)
to make the core regression test scripts mostly independent of each
other. Even without considering the use-case of running a subset of
the tests, the new test's expected output will constantly be needing
updates as side effects of unrelated changes.

You're absolutely right, this would be disastrous. A better alternative
is that the new test file creates a few objects for itself, either by
using a separate role or by using a separate schema, and we examine the
information_schema display for those objects only. Then it'll be better
isolated.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Subversion to GIT: the shortest path to happiness I've ever heard of
(Alexey Klyukin)

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#3)
Re: information_schema and not-null constraints

On 2023-Sep-05, Alvaro Herrera wrote:

After looking at what happens for domain constraints in older versions
(I tested 15, but I suppose this applies everywhere), I notice that we
don't seem to handle them anywhere that I can see. My quick exercise is
just

create domain nnint as int not null;
create table foo (a nnint);

and then verify that this constraint shows nowhere -- it's not in
DOMAIN_CONSTRAINTS for starters, which is I think the most obvious place.
And nothing is shown in CHECK_CONSTRAINTS nor TABLE_CONSTRAINTS either.

Looking now at what to do for CHECK_CONSTRAINTS with domain constraints,
I admit I'm completely confused about what this view is supposed to
show. Currently, we show the constraint name and a definition like
"CHECK (column IS NOT NULL)". But since the table name is not given, it
is not possible to know to what table the column name refers to. For
domains, we could show "CHECK (VALUE IS NOT NULL)" but again with no
indication of what domain it applies to, or anything at all that would
make this useful in any way whatsoever.

So this whole thing seems pretty futile and I'm disinclined to waste
much time on it.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#5Vik Fearing
vik@postgresfriends.org
In reply to: Alvaro Herrera (#4)
Re: information_schema and not-null constraints

On 9/5/23 19:15, Alvaro Herrera wrote:

On 2023-Sep-05, Alvaro Herrera wrote:

Looking now at what to do for CHECK_CONSTRAINTS with domain constraints,
I admit I'm completely confused about what this view is supposed to
show. Currently, we show the constraint name and a definition like
"CHECK (column IS NOT NULL)". But since the table name is not given, it
is not possible to know to what table the column name refers to. For
domains, we could show "CHECK (VALUE IS NOT NULL)" but again with no
indication of what domain it applies to, or anything at all that would
make this useful in any way whatsoever.

Constraint names are supposed to be unique per schema[1]SQL:2023-2 11.4 <table constraint definition> Syntax Rule 4 -- Vik Fearing so the view
contains the minimum required information to identify the constraint.

So this whole thing seems pretty futile and I'm disinclined to waste
much time on it.

Until PostgreSQL either
A) obeys the spec on this uniqueness, or
B) decides to deviate from the information_schema spec;
this view will be completely useless for actually getting any useful
information.

I would like to see us do A because it is the right thing to do. Our
autogenerated names obey this rule, but who knows how many duplicate
names per schema are out there in the wild from people specifying their
own names.

I don't know what the project would think about doing B.

[1]: SQL:2023-2 11.4 <table constraint definition> Syntax Rule 4 -- Vik Fearing
--
Vik Fearing

#6David G. Johnston
david.g.johnston@gmail.com
In reply to: Vik Fearing (#5)
Re: information_schema and not-null constraints

On Tue, Sep 5, 2023 at 2:50 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 9/5/23 19:15, Alvaro Herrera wrote:

On 2023-Sep-05, Alvaro Herrera wrote:

Looking now at what to do for CHECK_CONSTRAINTS with domain constraints,
I admit I'm completely confused about what this view is supposed to
show. Currently, we show the constraint name and a definition like
"CHECK (column IS NOT NULL)". But since the table name is not given, it
is not possible to know to what table the column name refers to. For
domains, we could show "CHECK (VALUE IS NOT NULL)" but again with no
indication of what domain it applies to, or anything at all that would
make this useful in any way whatsoever.

Constraint names are supposed to be unique per schema[1] so the view
contains the minimum required information to identify the constraint.

I'm presuming that the view constraint_column_usage [1]https://www.postgresql.org/docs/current/infoschema-constraint-column-usage.html is an integral part
of all this though I haven't taken the time to figure out exactly how we
are implementing it today.

I'm not all that for either A or B since the status quo seems workable.
Though ideally if the system has unique names per schema then everything
should just work - having the views produce duplicated information (as
opposed to nothing) if they are used when the DBA doesn't enforce the
standard's requirements seems plausible.

David J.

[1]: https://www.postgresql.org/docs/current/infoschema-constraint-column-usage.html
https://www.postgresql.org/docs/current/infoschema-constraint-column-usage.html

#7Vik Fearing
vik@postgresfriends.org
In reply to: David G. Johnston (#6)
Re: information_schema and not-null constraints

On 9/6/23 00:14, David G. Johnston wrote:

I'm not all that for either A or B since the status quo seems workable.

Pray tell, how is it workable? The view does not identify a specific
constraint because we don't obey the rules on one side and we do obey
the rules on the other side. It is completely useless and unworkable.

Though ideally if the system has unique names per schema then everything
should just work - having the views produce duplicated information (as
opposed to nothing) if they are used when the DBA doesn't enforce the
standard's requirements seems plausible.

Let us not engage in victim blaming. Postgres is the problem here.
--
Vik Fearing

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vik Fearing (#7)
Re: information_schema and not-null constraints

Vik Fearing <vik@postgresfriends.org> writes:

On 9/6/23 00:14, David G. Johnston wrote:

I'm not all that for either A or B since the status quo seems workable.

Pray tell, how is it workable? The view does not identify a specific
constraint because we don't obey the rules on one side and we do obey
the rules on the other side. It is completely useless and unworkable.

What solution do you propose? Starting to enforce the spec's rather
arbitrary requirement that constraint names be unique per-schema is
a complete nonstarter. Changing the set of columns in a spec-defined
view is also a nonstarter, or at least we've always taken it as such.

If you'd like to see some forward progress in this area, maybe you
could lobby the SQL committee to make constraint names unique per-table
not per-schema, and then make the information_schema changes that would
be required to support that.

In general though, the fact that we have any DDL extensions at all
compared to the standard means that there will be Postgres databases
that are not adequately represented by the information_schema views.
I'm not sure it's worth being more outraged about constraint names
than anything else. Or do you also want us to rip out (for starters)
unique indexes on expressions, or unique partial indexes?

regards, tom lane

#9Vik Fearing
vik@postgresfriends.org
In reply to: Tom Lane (#8)
Re: information_schema and not-null constraints

On 9/6/23 02:53, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org> writes:

On 9/6/23 00:14, David G. Johnston wrote:

I'm not all that for either A or B since the status quo seems workable.

Pray tell, how is it workable? The view does not identify a specific
constraint because we don't obey the rules on one side and we do obey
the rules on the other side. It is completely useless and unworkable.

What solution do you propose? Starting to enforce the spec's rather
arbitrary requirement that constraint names be unique per-schema is
a complete nonstarter. Changing the set of columns in a spec-defined
view is also a nonstarter, or at least we've always taken it as such.

I both semi-agree and semi-disagree that these are nonstarters. One of
them has to give.

If you'd like to see some forward progress in this area, maybe you
could lobby the SQL committee to make constraint names unique per-table
not per-schema, and then make the information_schema changes that would
be required to support that.

I could easily do that; but now you are asking to denormalize the
standard, because the constraints could be from tables, domains, or
assertions.

I don't think that will go over well, starting with my own opinion.

And for this reason, I do not believe that this is a "rather arbitrary
requirement".

In general though, the fact that we have any DDL extensions at all
compared to the standard means that there will be Postgres databases
that are not adequately represented by the information_schema views.

Sure.

I'm not sure it's worth being more outraged about constraint names
than anything else. Or do you also want us to rip out (for starters)
unique indexes on expressions, or unique partial indexes?

Indexes of any kind are not part of the standard so these examples are
basically invalid.

SQL:2023-11 Schemata is not the part I am most familiar with, but I
don't even see where regular multi-column unique constraints are listed
out, so that is both a lack in the standard and a knockdown of this
argument. I am happy to be shown wrong about this.
--
Vik Fearing

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vik Fearing (#9)
Re: information_schema and not-null constraints

Vik Fearing <vik@postgresfriends.org> writes:

On 9/6/23 02:53, Tom Lane wrote:

What solution do you propose? Starting to enforce the spec's rather
arbitrary requirement that constraint names be unique per-schema is
a complete nonstarter. Changing the set of columns in a spec-defined
view is also a nonstarter, or at least we've always taken it as such.

I both semi-agree and semi-disagree that these are nonstarters. One of
them has to give.

[ shrug... ] if you stick to a SQL-compliant schema setup, then the
information_schema views will serve for introspection. If you don't,
they won't, and you'll need to look at Postgres-specific catalog data.
This compromise has served for twenty years or so, and I'm not in a
hurry to change it. I think the odds of changing to the spec's
restriction without enormous pushback are nil, and I do not think
that the benefit could possibly be worth the ensuing pain to users.
(It's not even the absolute pain level that is a problem, so much
as the asymmetry: the pain would fall exclusively on users who get
no benefit, because they weren't relying on these views anyway.
If you think that's an easy sell, you're mistaken.)

It could possibly be a little more palatable to add column(s) to the
information_schema views, but I'm having a hard time seeing how that
moves the needle. The situation would still be precisely describable
as "if you stick to a SQL-compliant schema setup, then the standard
columns of the information_schema views will serve for introspection.
If you don't, they won't, and you'll need to look at Postgres-specific
columns". That doesn't seem like a big improvement. Also, given your
point about normalization, how would we define the additions exactly?

regards, tom lane

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#3)
Re: information_schema and not-null constraints

On 05.09.23 18:24, Alvaro Herrera wrote:

On 2023-Sep-05, Peter Eisentraut wrote:

The following information schema views are affected by the not-null
constraint catalog entries:

1. CHECK_CONSTRAINTS
2. CONSTRAINT_COLUMN_USAGE
3. DOMAIN_CONSTRAINTS
4. TABLE_CONSTRAINTS

Note that 1 and 3 also contain domain constraints.

After looking at what happens for domain constraints in older versions
(I tested 15, but I suppose this applies everywhere), I notice that we
don't seem to handle them anywhere that I can see. My quick exercise is
just

create domain nnint as int not null;
create table foo (a nnint);

and then verify that this constraint shows nowhere -- it's not in
DOMAIN_CONSTRAINTS for starters, which is I think the most obvious place.
And nothing is shown in CHECK_CONSTRAINTS nor TABLE_CONSTRAINTS either.

This did ever work in the past? I tested with 9.3 and didn't see
anything there either.

No, this was never implemented. (As I wrote in my other message on the
other thread, arguably a bit buggy.) We could fix this separately,
unless we are going to implement catalogued domain not-null constraints
soon.

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
Re: information_schema and not-null constraints

On 2023-Sep-04, Alvaro Herrera wrote:

In reference to [1], 0001 attached to this email contains the updated
view definitions that I propose.

Given the downthread discussion, I propose the attached. There are no
changes to v2, other than dropping the test part.

We can improve the situation for domains separately and likewise for
testing.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

Attachments:

v3-0001-Update-information_schema-definition-for-not-null.patchtext/x-diff; charset=utf-8Download+28-49
#13Vik Fearing
vik@postgresfriends.org
In reply to: Tom Lane (#10)
Re: information_schema and not-null constraints

On 9/6/23 05:40, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org> writes:

On 9/6/23 02:53, Tom Lane wrote:

What solution do you propose? Starting to enforce the spec's rather
arbitrary requirement that constraint names be unique per-schema is
a complete nonstarter. Changing the set of columns in a spec-defined
view is also a nonstarter, or at least we've always taken it as such.

I both semi-agree and semi-disagree that these are nonstarters. One of
them has to give.

[ shrug... ] if you stick to a SQL-compliant schema setup, then the
information_schema views will serve for introspection. If you don't,
they won't, and you'll need to look at Postgres-specific catalog data.

As someone who regularly asks people to cite chapter and verse of the
standard, do you not see this as a problem?

If there is /one thing/ I wish we were 100% compliant on, it's
information_schema.

This compromise has served for twenty years or so, and I'm not in a
hurry to change it.

Has it? Or is this just the first time someone has complained?

I think the odds of changing to the spec's
restriction without enormous pushback are nil, and I do not think
that the benefit could possibly be worth the ensuing pain to users.

That is a valid opinion, and probably one that will win out for quite a
while.

(It's not even the absolute pain level that is a problem, so much
as the asymmetry: the pain would fall exclusively on users who get
no benefit, because they weren't relying on these views anyway.
If you think that's an easy sell, you're mistaken.)

I am curious how many people we are selling this to. In my career as a
consultant, I have never once come across anyone specifying their own
constraint names. That is certainly anecdotal, and by no means means it
doesn't happen, but my personal experience says that it is very low.

And since our generated names obey the spec (see ChooseConstraintName()
which even says some apps depend on this), I don't see making this
change being a big problem in the real world.

Mind you, I am not pushing (right now) to make this change; I am just
saying that it is the right thing to do.

It could possibly be a little more palatable to add column(s) to the
information_schema views, but I'm having a hard time seeing how that
moves the needle. The situation would still be precisely describable
as "if you stick to a SQL-compliant schema setup, then the standard
columns of the information_schema views will serve for introspection.
If you don't, they won't, and you'll need to look at Postgres-specific
columns". That doesn't seem like a big improvement. Also, given your
point about normalization, how would we define the additions exactly?

This is precisely my point.
--
Vik Fearing

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#12)
Re: information_schema and not-null constraints

On 2023-Sep-06, Alvaro Herrera wrote:

On 2023-Sep-04, Alvaro Herrera wrote:

In reference to [1], 0001 attached to this email contains the updated
view definitions that I propose.

Given the downthread discussion, I propose the attached. There are no
changes to v2, other than dropping the test part.

Pushed.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#15Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#12)
Re: information_schema and not-null constraints

On 06.09.23 19:52, Alvaro Herrera wrote:

+    SELECT current_database()::information_schema.sql_identifier AS constraint_catalog,
+           rs.nspname::information_schema.sql_identifier AS constraint_schema,
+           con.conname::information_schema.sql_identifier AS constraint_name,
+           format('CHECK (%s IS NOT NULL)', at.attname)::information_schema.character_data AS check_clause

Small correction here: This should be

pg_catalog.format('%s IS NOT NULL', at.attname)::information_schema.character_data AS check_clause

That is, the word "CHECK" and the parentheses should not be part of the
produced value.

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#15)
Re: information_schema and not-null constraints

On 14.09.23 10:20, Peter Eisentraut wrote:

On 06.09.23 19:52, Alvaro Herrera wrote:

+    SELECT current_database()::information_schema.sql_identifier AS 
constraint_catalog,
+           rs.nspname::information_schema.sql_identifier AS 
constraint_schema,
+           con.conname::information_schema.sql_identifier AS 
constraint_name,
+           format('CHECK (%s IS NOT NULL)', 
at.attname)::information_schema.character_data AS check_clause

Small correction here: This should be

pg_catalog.format('%s IS NOT NULL',
at.attname)::information_schema.character_data AS check_clause

That is, the word "CHECK" and the parentheses should not be part of the
produced value.

I have committed this fix.

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#16)
Re: information_schema and not-null constraints

On 2023-Sep-18, Peter Eisentraut wrote:

On 14.09.23 10:20, Peter Eisentraut wrote:

Small correction here: This should be

pg_catalog.format('%s IS NOT NULL',
at.attname)::information_schema.character_data AS check_clause

That is, the word "CHECK" and the parentheses should not be part of the
produced value.

I have committed this fix.

Thanks.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#18Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#15)
Re: information_schema and not-null constraints

On 14.09.23 10:20, Peter Eisentraut wrote:

On 06.09.23 19:52, Alvaro Herrera wrote:

+    SELECT current_database()::information_schema.sql_identifier AS 
constraint_catalog,
+           rs.nspname::information_schema.sql_identifier AS 
constraint_schema,
+           con.conname::information_schema.sql_identifier AS 
constraint_name,
+           format('CHECK (%s IS NOT NULL)', 
at.attname)::information_schema.character_data AS check_clause

Small correction here: This should be

pg_catalog.format('%s IS NOT NULL',
at.attname)::information_schema.character_data AS check_clause

That is, the word "CHECK" and the parentheses should not be part of the
produced value.

Slightly related, so let's just tack it on here:

While testing this, I noticed that the way the check_clause of regular
check constraints is computed appears to be suboptimal. It currently does

CAST(substring(pg_get_constraintdef(con.oid) from 7) AS character_data)

which ends up with an extra set of parentheses, which is ignorable, but
it also leaves in suffixes like "NOT VALID", which don't belong into
that column. Earlier in this thread I had contemplated a fix for the
first issue, but that wouldn't address the second issue. I think we can
fix this quite simply by using pg_get_expr() instead. I don't know why
it wasn't done like that to begin with, maybe it was just a (my?)
mistake. See attached patch.

Attachments:

0001-Simplify-information-schema-check-constraint-deparsi.patchtext/plain; charset=UTF-8; name=0001-Simplify-information-schema-check-constraint-deparsi.patchDownload+1-3
#19Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#18)
Re: information_schema and not-null constraints

On 19.09.23 09:01, Peter Eisentraut wrote:

While testing this, I noticed that the way the check_clause of regular
check constraints is computed appears to be suboptimal.  It currently does

CAST(substring(pg_get_constraintdef(con.oid) from 7) AS character_data)

which ends up with an extra set of parentheses, which is ignorable, but
it also leaves in suffixes like "NOT VALID", which don't belong into
that column.  Earlier in this thread I had contemplated a fix for the
first issue, but that wouldn't address the second issue.  I think we can
fix this quite simply by using pg_get_expr() instead.  I don't know why
it wasn't done like that to begin with, maybe it was just a (my?)
mistake.  See attached patch.

committed