creating CHECK constraints as NOT VALID
This patch allows you to initially declare a CHECK constraint as NOT
VALID, similar to what we already allow for foreign keys. That is, you
create the constraint without scanning the table and after it is
committed, it is enforced for new rows; later, all rows are checked by
running ALTER TABLE VALIDATE CONSTRAINT, which doesn't need
AccessExclusive thus allowing for better concurrency.
The trickiest bit here was realizing that unlike FKs, check constraints
do inherit, and so needed special treatment for recursion. Other than
that I think this was pretty straightforward.
I intend to attempt to apply this to NOT NULL constraints as well, once
the patch to add them to pg_constraint is in.
Thoughts?
This patch courtesy of Enova Financial.
--
Álvaro Herrera <alvherre@alvh.no-ip.org>
On Tue, May 31, 2011 at 11:04 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
This patch allows you to initially declare a CHECK constraint as NOT
VALID
seems you forgot to add the patch itself
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
This patch allows you to initially declare a CHECK constraint as
NOT VALID, similar to what we already allow for foreign keys.
That is, you create the constraint without scanning the table and
< after it is committed, it is enforced for new rows; later, all
rows are checked by running ALTER TABLE VALIDATE CONSTRAINT, which
doesn't need AccessExclusive thus allowing for better concurrency.
I think it's a valuable feature, not just in terms of timing and
concurrency, but in terms of someone starting with less-than-perfect
data who wants to prevent further degradation while cleaning up the
existing problems. This feature is present in other databases I've
used.
-Kevin
On Tue, May 31, 2011 at 12:04 PM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
This patch allows you to initially declare a CHECK constraint as NOT
VALID, similar to what we already allow for foreign keys. That is, you
create the constraint without scanning the table and after it is
committed, it is enforced for new rows; later, all rows are checked by
running ALTER TABLE VALIDATE CONSTRAINT, which doesn't need
AccessExclusive thus allowing for better concurrency.The trickiest bit here was realizing that unlike FKs, check constraints
do inherit, and so needed special treatment for recursion. Other than
that I think this was pretty straightforward.I intend to attempt to apply this to NOT NULL constraints as well, once
the patch to add them to pg_constraint is in.
Seems like a logical extension of what we have now.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Jaime Casanova's message of mar may 31 12:24:09 -0400 2011:
On Tue, May 31, 2011 at 11:04 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:This patch allows you to initially declare a CHECK constraint as NOT
VALIDseems you forgot to add the patch itself
oops ... another bug in my email client, it seems.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachments:
0001-Enable-CHECK-constraints-to-be-declared-NOT-VALID.patchapplication/octet-stream; name=0001-Enable-CHECK-constraints-to-be-declared-NOT-VALID.patchDownload+278-43
Excerpts from Alvaro Herrera's message of mar may 31 12:39:48 -0400 2011:
Excerpts from Jaime Casanova's message of mar may 31 12:24:09 -0400 2011:
On Tue, May 31, 2011 at 11:04 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:This patch allows you to initially declare a CHECK constraint as NOT
VALIDseems you forgot to add the patch itself
oops ... another bug in my email client, it seems.
Hmm, found an inconsistency in the way recursion is handled -- other
commands have a AT_DoFooRecurse case. Weird. I'll change this to be
like that, though I don't readily see why we do it that way.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Here it is -- as a context patch this time, as well.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachments:
0001-Enable-CHECK-constraints-to-be-declared-NOT-VALID.patchapplication/octet-stream; name=0001-Enable-CHECK-constraints-to-be-declared-NOT-VALID.patchDownload+325-164
On Tue, May 31, 2011 at 11:35:01AM -0500, Kevin Grittner wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
This patch allows you to initially declare a CHECK constraint as
NOT VALID, similar to what we already allow for foreign keys.
That is, you create the constraint without scanning the table and< after it is committed, it is enforced for new rows; later, all
rows are checked by running ALTER TABLE VALIDATE CONSTRAINT, which
doesn't need AccessExclusive thus allowing for better concurrency.I think it's a valuable feature, not just in terms of timing and
concurrency, but in terms of someone starting with less-than-perfect
data who wants to prevent further degradation while cleaning up the
existing problems. This feature is present in other databases I've
used.
Yup, the ER triage approach to data integrity: "Stop the major bleeding,
we'll go back and make it a pretty scar later"
Follows from one of the practical maxims of databases: "The data is
always dirty" Being able to have the constraints enforced at least for
new data allows you to at least fence the bad data, and have a shot at
fixing it all. Right now, you may be forced into running with
constraints effectively 'off', depending on the app to get new data
right, while attempting to catch up. And the app probably put the bad
data in there in the first place. One of the thankless, important but
seemingly never urgent tasks.
Ross
--
Ross Reedstrom, Ph.D. reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
Excerpts from Ross J. Reedstrom's message of mar may 31 14:02:04 -0400 2011:
Follows from one of the practical maxims of databases: "The data is
always dirty" Being able to have the constraints enforced at least for
new data allows you to at least fence the bad data, and have a shot at
fixing it all. Right now, you may be forced into running with
constraints effectively 'off', depending on the app to get new data
right, while attempting to catch up. And the app probably put the bad
data in there in the first place. One of the thankless, important but
seemingly never urgent tasks.
Interesting point of view. I have to admit that I didn't realize I was
allowing that, even though I have wished for it in the past myself.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, May 31, 2011 at 1:07 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Excerpts from Ross J. Reedstrom's message of mar may 31 14:02:04 -0400 2011:
Follows from one of the practical maxims of databases: "The data is
always dirty" Being able to have the constraints enforced at least for
new data allows you to at least fence the bad data, and have a shot at
fixing it all.Interesting point of view. I have to admit that I didn't realize I was
allowing that, even though I have wished for it in the past myself.
What happens when there's bad data that the new transaction touches in
some minor way? For example updating some other column of the row or
just locking the row? What about things like cluster or table
rewrites?
Also I think NOT NULL might be used in the join elimination patch.
Make sure it understands the "valid" flag and doesn't drop joins that
aren't needed. It would be nice to have this for unique constraints as
well which would *definitely* need to have the planner understand
whether they're valid or not.
--
greg
On 31 May 2011 18:43, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Here it is -- as a context patch this time, as well.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
There is this scenario:
test=# CREATE DOMAIN things AS INT CHECK (VALUE > 5);
CREATE DOMAIN
test=# CREATE TABLE abc (id SERIAL, stuff things);
NOTICE: CREATE TABLE will create implicit sequence "abc_id_seq" for
serial column "abc.id"
CREATE TABLE
test=# INSERT INTO abc (stuff) VALUES (55);
INSERT 0 1
test=# ALTER DOMAIN things ADD CONSTRAINT meow CHECK (VALUE < 11) NOT VALID;
ERROR: column "stuff" of table "abc" contains values that violate the
new constraint
STATEMENT: ALTER DOMAIN things ADD CONSTRAINT meow CHECK (VALUE < 11)
NOT VALID;
--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 31, 2011 at 7:03 PM, Greg Stark <gsstark@mit.edu> wrote:
On Tue, May 31, 2011 at 1:07 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:Excerpts from Ross J. Reedstrom's message of mar may 31 14:02:04 -0400 2011:
Follows from one of the practical maxims of databases: "The data is
always dirty" Being able to have the constraints enforced at least for
new data allows you to at least fence the bad data, and have a shot at
fixing it all.Interesting point of view. I have to admit that I didn't realize I was
allowing that, even though I have wished for it in the past myself.What happens when there's bad data that the new transaction touches in
some minor way? For example updating some other column of the row or
just locking the row?
Updating some other column should fail unless the constraint is
satisfied for the resulting row, I think. The rule should be simple
and easy to understand: old row (versions) aren't checked, but new
ones must satisfy all constraints, whether validated or not.
There's no question that this feature has a certain amount of foot-gun
potential. But it's also really useful. And there are plenty of
people who know how to use a gun safely, without shooting themselves
in the foot. We shouldn't aim for the lowest common denominator.
What about things like cluster or table
rewrites?Also I think NOT NULL might be used in the join elimination patch.
Make sure it understands the "valid" flag and doesn't drop joins that
aren't needed. It would be nice to have this for unique constraints as
well which would *definitely* need to have the planner understand
whether they're valid or not.
Yeah.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Thom Brown's message of mar may 31 20:18:18 -0400 2011:
On 31 May 2011 18:43, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Here it is -- as a context patch this time, as well.
There is this scenario:
test=# CREATE DOMAIN things AS INT CHECK (VALUE > 5);
CREATE DOMAIN
test=# CREATE TABLE abc (id SERIAL, stuff things);
NOTICE: CREATE TABLE will create implicit sequence "abc_id_seq" for
serial column "abc.id"
CREATE TABLE
test=# INSERT INTO abc (stuff) VALUES (55);
INSERT 0 1
test=# ALTER DOMAIN things ADD CONSTRAINT meow CHECK (VALUE < 11) NOT VALID;
ERROR: column "stuff" of table "abc" contains values that violate the
new constraint
STATEMENT: ALTER DOMAIN things ADD CONSTRAINT meow CHECK (VALUE < 11)
NOT VALID;
Oooh, I hadn't realized that I was opening the door for domains and
check constraints therein. I'll have a look at this.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, May 31, 2011 at 12:04:07PM -0400, Alvaro Herrera wrote:
This patch allows you to initially declare a CHECK constraint as NOT
VALID, similar to what we already allow for foreign keys. That is, you
create the constraint without scanning the table and after it is
committed, it is enforced for new rows; later, all rows are checked by
running ALTER TABLE VALIDATE CONSTRAINT, which doesn't need
AccessExclusive thus allowing for better concurrency.The trickiest bit here was realizing that unlike FKs, check constraints
do inherit, and so needed special treatment for recursion. Other than
that I think this was pretty straightforward.I intend to attempt to apply this to NOT NULL constraints as well, once
the patch to add them to pg_constraint is in.Thoughts?
This patch courtesy of Enova Financial.
Great stuff!
A colleague brought up an interesting idea that I think is worth
exploring for all NOT VALID constraints, to wit, is there some way
(via SQL) to find which rows violate which constraints? I'm picturing
some kind of function that could be aggregated into some structure for
each violating row...
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Excerpts from David Fetter's message of mar may 31 21:42:08 -0400 2011:
A colleague brought up an interesting idea that I think is worth
exploring for all NOT VALID constraints, to wit, is there some way
(via SQL) to find which rows violate which constraints? I'm picturing
some kind of function that could be aggregated into some structure for
each violating row...
Seems like a job for a plpgsql function with a bunch of exception handlers ...
Some details like the violated constraint name would be hard to extract,
probably, though.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Thom Brown's message of mar may 31 20:18:18 -0400 2011:
test=# CREATE DOMAIN things AS INT CHECK (VALUE > 5);
CREATE DOMAIN
test=# CREATE TABLE abc (id SERIAL, stuff things);
NOTICE: CREATE TABLE will create implicit sequence "abc_id_seq" for
serial column "abc.id"
CREATE TABLE
test=# INSERT INTO abc (stuff) VALUES (55);
INSERT 0 1
test=# ALTER DOMAIN things ADD CONSTRAINT meow CHECK (VALUE < 11) NOT VALID;
ERROR: column "stuff" of table "abc" contains values that violate the
new constraint
STATEMENT: ALTER DOMAIN things ADD CONSTRAINT meow CHECK (VALUE < 11)
NOT VALID;
Okay, fixed that and added ALTER DOMAIN VALIDATE CONSTRAINT too.
Thanks for the review.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachments:
0001-Make-NOT-VALID-constraints-work-on-domains-too.patchapplication/octet-stream; name=0001-Make-NOT-VALID-constraints-work-on-domains-too.patchDownload+142-137
Here's a complete patch with all this stuff, plus doc additions and
simple regression tests for the new ALTER DOMAIN commands.
Enable CHECK constraints to be declared NOT VALID
This means that they can initially be added to a large existing table
without checking its initial contents, but new tuples must comply to
them; a separate pass invoked by ALTER TABLE / VALIDATE can verify
existing data and ensure it complies with the constraint, at which point
it is marked validated and becomes a normal part of the table ecosystem.
This patch also enables domains to have unvalidated CHECK constraints
attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT
VALID, which can later be validated with ALTER DOMAIN / VALIDATE
CONSTRAINT.
This patch was sponsored by Enova Financial.
doc/src/sgml/catalogs.sgml | 2 +-
doc/src/sgml/ref/alter_domain.sgml | 39 +++++-
doc/src/sgml/ref/alter_table.sgml | 4 +-
src/backend/catalog/heap.c | 13 +-
src/backend/commands/tablecmds.c | 227 ++++++++++++++++++++++++-----
src/backend/commands/typecmds.c | 140 ++++++++++++++++--
src/backend/parser/gram.y | 22 +++
src/backend/tcop/utility.c | 4 +
src/include/catalog/heap.h | 1 +
src/include/commands/typecmds.h | 1 +
src/include/nodes/parsenodes.h | 3 +
src/test/regress/expected/alter_table.out | 36 +++++
src/test/regress/expected/domain.out | 11 ++
src/test/regress/sql/alter_table.sql | 29 ++++
src/test/regress/sql/domain.sql | 10 ++
15 files changed, 480 insertions(+), 62 deletions(-)
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachments:
not-valid-check.patchapplication/octet-stream; name=not-valid-check.patchDownload+528-318
On 1 June 2011 23:47, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Here's a complete patch with all this stuff, plus doc additions and
simple regression tests for the new ALTER DOMAIN commands.Enable CHECK constraints to be declared NOT VALID
This means that they can initially be added to a large existing table
without checking its initial contents, but new tuples must comply to
them; a separate pass invoked by ALTER TABLE / VALIDATE can verify
existing data and ensure it complies with the constraint, at which point
it is marked validated and becomes a normal part of the table ecosystem.This patch also enables domains to have unvalidated CHECK constraints
attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT
VALID, which can later be validated with ALTER DOMAIN / VALIDATE
CONSTRAINT.
Is this expected?
postgres=# CREATE TABLE a (num INT);
CREATE TABLE
postgres=# INSERT INTO a (num) VALUES (90);
INSERT 0 1
postgres=# ALTER TABLE a ADD CONSTRAINT meow CHECK (num < 20) NOT VALID;
ALTER TABLE
postgres=# CREATE DATABASE test;
CREATE DATABASE
postgres=# \q
postgresql thom$ pg_dump -f /tmp/test.sql postgres
postgresql thom$ psql test < /tmp/test.sql
SET
SET
SET
SET
SET
COMMENT
CREATE EXTENSION
COMMENT
SET
SET
SET
CREATE TABLE
ALTER TABLE
ERROR: new row for relation "a" violates check constraint "meow"
CONTEXT: COPY a, line 1: "90"
STATEMENT: COPY a (num) FROM stdin;
ERROR: new row for relation "a" violates check constraint "meow"
CONTEXT: COPY a, line 1: "90"
REVOKE
REVOKE
GRANT
GRANT
Shouldn't the constraint be dumped as not valid too??
--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Thom Brown's message of mié jun 01 19:48:44 -0400 2011:
Is this expected?
[ pg_dump fails to preserve not-valid status of constraints ]
Certainly not.
Shouldn't the constraint be dumped as not valid too??
Sure, I'll implement that tomorrow.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Alvaro Herrera's message of mié jun 01 20:56:12 -0400 2011:
Excerpts from Thom Brown's message of mié jun 01 19:48:44 -0400 2011:
Is this expected?
[ pg_dump fails to preserve not-valid status of constraints ]Certainly not.
Shouldn't the constraint be dumped as not valid too??
Sure, I'll implement that tomorrow.
Actually, it turns out that NOT VALID foreign keys were already buggy
here, and fixing them automatically fixes this case as well, because the
fix involves touching pg_get_constraintdef to dump the flag. This also
gets it into psql's \d. Patch attached.
(Maybe the changes in psql's describe.c should be reverted, not sure.)
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support