SET NOT NULL [NOT VALID / CONCURRENTLY]?

Started by Joel Jacobsonabout 9 years ago7 messages
#1Joel Jacobson
joel@trustly.com

Hi hackers,

I would be good if it would be possible to quickly set NOT NULL for an
existing column in a table
that have no rows where the column IS NULL and where there is a full
index on the column
allowing the logics to quickly understand there are no NULL values,
and just have to take a quick
lock on the table to prevent any modifications during the short time
when the NOT NULL
is set for the column.

Currently if you want to set NOT NULL for a column in a huge table
that's not doable without blocking all writes to the table for quite
some time.
Setting NOT NULL for a 100 million row table took 28 seconds locally
on my machine.

Is anyone working on fixing this for PostgreSQL 10?

Joel Jacobson
Trustly

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Craig Ringer
craig@2ndquadrant.com
In reply to: Joel Jacobson (#1)
Re: SET NOT NULL [NOT VALID / CONCURRENTLY]?

On 21 December 2016 at 16:48, Joel Jacobson <joel@trustly.com> wrote:

Hi hackers,

I would be good if it would be possible to quickly set NOT NULL for an
existing column in a table
that have no rows where the column IS NULL and where there is a full
index on the column
allowing the logics to quickly understand there are no NULL values,
and just have to take a quick
lock on the table to prevent any modifications during the short time
when the NOT NULL
is set for the column.

Currently if you want to set NOT NULL for a column in a huge table
that's not doable without blocking all writes to the table for quite
some time.
Setting NOT NULL for a 100 million row table took 28 seconds locally
on my machine.

Is anyone working on fixing this for PostgreSQL 10?

Not as far as I know.

IMO this and other similar cases should all be handled the same way:
create the constraint NOT VALID, then VALIDATE it while holding a weak
lock that only blocks concurrent schema changes.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Joel Jacobson
joel@trustly.com
In reply to: Craig Ringer (#2)
Re: SET NOT NULL [NOT VALID / CONCURRENTLY]?

On Wed, Dec 21, 2016 at 4:24 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

Is anyone working on fixing this for PostgreSQL 10?

Not as far as I know.

IMO this and other similar cases should all be handled the same way:
create the constraint NOT VALID, then VALIDATE it while holding a weak
lock that only blocks concurrent schema changes.

Sounds like a good approach.

Similar to what we (Trustly) did when we sponsored the FOR KEY LOCK
feature to improve concurrency,
we would be very interested in also sponsoring this feature, as it
would mean a great lot to us.
I don't know if this is the right forum trying to find someone/some
company to sign up for the task,
please let me know if I should mail to some other list. Thanks.

Joel Jacobson
Trustly

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Craig Ringer
craig@2ndquadrant.com
In reply to: Joel Jacobson (#3)
Re: SET NOT NULL [NOT VALID / CONCURRENTLY]?

On 21 December 2016 at 19:01, Joel Jacobson <joel@trustly.com> wrote:

Similar to what we (Trustly) did when we sponsored the FOR KEY LOCK
feature to improve concurrency,
we would be very interested in also sponsoring this feature, as it
would mean a great lot to us.
I don't know if this is the right forum trying to find someone/some
company to sign up for the task,
please let me know if I should mail to some other list. Thanks.

You'll probably get mail off list.

For what it's worth, there's a bit of a complexity here. PostgreSQL
doesn't model NOT NULL as a true CONSTRAINT. Instead it's a column
attribute. I suspect we would need to change that in order to allow a
NOT VALID NOT NULL constraint to be created.

That's at least partly why the docs say that "option NOT VALID [...]
is currently only allowed for foreign key and CHECK constraints".

Note that "[VALIDATE] acquires only a SHARE UPDATE EXCLUSIVE lock on
the table being altered" so it's already suitable for what you need.
The challenge is making it possible to create a NOT VALID constraint
for NOT NULL.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Joel Jacobson
joel@trustly.com
In reply to: Craig Ringer (#4)
Re: SET NOT NULL [NOT VALID / CONCURRENTLY]?

If you are fully confident you have no NULL values,
e.g. if you have all your logics in db functions and you validate all
INSERTs to a table won't pass any NULL values,
and you have checked all the rows in a table are NOT NULL for the column,
would it be completely crazy to just set pg_attribute.attnotnull to
TRUE for the column?

Is anything else happening "under the hood" than just locking all rows
and verifying there are no NULL rows, and then setting attnotnull to
TRUE?

On Wed, Dec 21, 2016 at 6:37 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 21 December 2016 at 19:01, Joel Jacobson <joel@trustly.com> wrote:

Similar to what we (Trustly) did when we sponsored the FOR KEY LOCK
feature to improve concurrency,
we would be very interested in also sponsoring this feature, as it
would mean a great lot to us.
I don't know if this is the right forum trying to find someone/some
company to sign up for the task,
please let me know if I should mail to some other list. Thanks.

You'll probably get mail off list.

For what it's worth, there's a bit of a complexity here. PostgreSQL
doesn't model NOT NULL as a true CONSTRAINT. Instead it's a column
attribute. I suspect we would need to change that in order to allow a
NOT VALID NOT NULL constraint to be created.

That's at least partly why the docs say that "option NOT VALID [...]
is currently only allowed for foreign key and CHECK constraints".

Note that "[VALIDATE] acquires only a SHARE UPDATE EXCLUSIVE lock on
the table being altered" so it's already suitable for what you need.
The challenge is making it possible to create a NOT VALID constraint
for NOT NULL.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Joel Jacobson
joel@trustly.com
In reply to: Joel Jacobson (#5)
1 attachment(s)
Re: SET NOT NULL [NOT VALID / CONCURRENTLY]?

Attached is the function SET_NOT_NULL(_Schema name, _Table name, _Column
name) which does the following:

1. LOCK TABLE %I.%I IN ACCESS EXCLUSIVE MODE
just like the normal DDL commands would do

2. SELECT EXISTS (SELECT 1 FROM %I.%I WHERE %I IS NULL)
which is fast if there is an index on the column

3. UPDATE pg_catalog.pg_attribute SET attnotnull = TRUE
WHERE attrelid = %L::oid
AND attname = %L

Pragmatically, would this be a safe approach?

On Wed, Dec 21, 2016 at 6:53 PM, Joel Jacobson <joel@trustly.com> wrote:

If you are fully confident you have no NULL values,
e.g. if you have all your logics in db functions and you validate all
INSERTs to a table won't pass any NULL values,
and you have checked all the rows in a table are NOT NULL for the column,
would it be completely crazy to just set pg_attribute.attnotnull to
TRUE for the column?

Is anything else happening "under the hood" than just locking all rows
and verifying there are no NULL rows, and then setting attnotnull to
TRUE?

On Wed, Dec 21, 2016 at 6:37 PM, Craig Ringer <craig@2ndquadrant.com>
wrote:

On 21 December 2016 at 19:01, Joel Jacobson <joel@trustly.com> wrote:

Similar to what we (Trustly) did when we sponsored the FOR KEY LOCK
feature to improve concurrency,
we would be very interested in also sponsoring this feature, as it
would mean a great lot to us.
I don't know if this is the right forum trying to find someone/some
company to sign up for the task,
please let me know if I should mail to some other list. Thanks.

You'll probably get mail off list.

For what it's worth, there's a bit of a complexity here. PostgreSQL
doesn't model NOT NULL as a true CONSTRAINT. Instead it's a column
attribute. I suspect we would need to change that in order to allow a
NOT VALID NOT NULL constraint to be created.

That's at least partly why the docs say that "option NOT VALID [...]
is currently only allowed for foreign key and CHECK constraints".

Note that "[VALIDATE] acquires only a SHARE UPDATE EXCLUSIVE lock on
the table being altered" so it's already suitable for what you need.
The challenge is making it possible to create a NOT VALID constraint
for NOT NULL.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter

--
Joel Jacobson

Mobile: +46703603801
*Trustly.com <http://trustly.com/&gt; | Newsroom
<http://www.mynewsdesk.com/trustly_en&gt; | LinkedIn
<https://www.linkedin.com/company/trustly-group-ab&gt; | **Twitter
<https://twitter.com/Trustly&gt;*

* <https://trustly.com/&gt;*

Attachments:

set_not_null.sqlapplication/octet-stream; name=set_not_null.sqlDownload
#7Robert Haas
robertmhaas@gmail.com
In reply to: Joel Jacobson (#6)
Re: SET NOT NULL [NOT VALID / CONCURRENTLY]?

On Wed, Dec 21, 2016 at 7:55 PM, Joel Jacobson <joel@trustly.com> wrote:

Attached is the function SET_NOT_NULL(_Schema name, _Table name, _Column
name) which does the following:

1. LOCK TABLE %I.%I IN ACCESS EXCLUSIVE MODE
just like the normal DDL commands would do

2. SELECT EXISTS (SELECT 1 FROM %I.%I WHERE %I IS NULL)
which is fast if there is an index on the column

3. UPDATE pg_catalog.pg_attribute SET attnotnull = TRUE
WHERE attrelid = %L::oid
AND attname = %L

Pragmatically, would this be a safe approach?

Hmm, I don't see a problem with it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company