feature idea: use index when checking for NULLs before SET NOT NULL
There's the age-old problem of SET NOT NULL being impossible on large actively used tables, because it needs to lock the table and do a table scan to check if there are any existing NULL values. I currently have a table that's not particularly huge but a scan takes 70 seconds, which causes unacceptable downtime for my entire application.
Postgres is not able to use an index when doing this check: https://dba.stackexchange.com/questions/267947
Would it be possible to have Postgres use an index for this check? Given the right index, the check could be instant and the table would only need to be locked for milliseconds.
(I'm sure I'm not the first person to think of this, but I couldn't find any other discussion on this list or elsewhere.)
Thanks for reading!
John
Hello
Correct index lookup is a difficult task. I tried to implement this previously...
But the answer in SO is a bit incomplete for recent postgresql releases. Seqscan is not the only possible way to set not null in pg12+. My patch was commited ( https://commitfest.postgresql.org/22/1389/ ) and now it's possible to do this way:
alter table foos
add constraint foos_not_null
check (bar1 is not null) not valid; -- short-time exclusive lock
alter table foos validate constraint foos_not_null; -- still seqscan entire table but without exclusive lock
An then another short lock:
alter table foos alter column bar1 set not null;
alter table foos drop constraint foos_not_null;
regards, Sergei
On Fri, May 29, 2020 at 8:56 AM Sergei Kornilov <sk@zsrv.org> wrote:
Hello
Correct index lookup is a difficult task. I tried to implement this
previously...But the answer in SO is a bit incomplete for recent postgresql releases.
Seqscan is not the only possible way to set not null in pg12+. My patch was
commited ( https://commitfest.postgresql.org/22/1389/ ) and now it's
possible to do this way:alter table foos
add constraint foos_not_null
check (bar1 is not null) not valid; -- short-time exclusive lockalter table foos validate constraint foos_not_null; -- still seqscan
entire table but without exclusive lockAn then another short lock:
alter table foos alter column bar1 set not null;
alter table foos drop constraint foos_not_null;
That's really good to know, Sergei!
John, I think it's worth pointing out that Postgres most likely does a full
table scan to validate a constraint by design and not in optimization
oversight. Think of what's gonna happen if the index used for checking is
corrupted?
Cheers,
--
Alex
Wow! Thank you Sergei for working on this patch, for working for months/years to get it in, and for replying to my email!
For others reading this later:
- the feature was introduced in 12
- the commit is here https://github.com/postgres/postgres/commit/bbb96c3704c041d139181c6601e5bc770e045d26
Sergei, a few questions:
- Just to be clear, your recipe does not require any indexes, right? Because the constraint check table scan is inherently concurrent?
- Was this new behavior mentioned in the release nose?
- Do you know if there are any blog posts etc. discussing this? (I'm definitely going to write one!!)
John
Show quoted text
But the answer in SO is a bit incomplete for recent postgresql
releases. Seqscan is not the only possible way to set not null in
pg12+. My patch was commited (
https://commitfest.postgresql.org/22/1389/ ) and now it's possible to
do this way:alter table foos
add constraint foos_not_null
check (bar1 is not null) not valid; -- short-time exclusive lockalter table foos validate constraint foos_not_null; -- still seqscan
entire table but without exclusive lockAn then another short lock:
alter table foos alter column bar1 set not null;
alter table foos drop constraint foos_not_null;regards, Sergei
John, I think it's worth pointing out that Postgres most likely does a
full table scan to validate a constraint by design and not in optimization
oversight. Think of what's gonna happen if the index used for checking is
corrupted?
This can't be true: a corrupted index is a failure mode, and failure modes
are not expected in normal flow. Think of it this way: we must never use
index scan, because if index is corrupted the results are going to be
disastrous, so we will always do Seq Scans.
It's ok to assume index is not corrupted.
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
Hi
Sergei, a few questions:
- Just to be clear, your recipe does not require any indexes, right? Because the constraint check table scan is inherently concurrent?
Right. "alter table validate constraint" can not use indexes, but does not block concurrent read/write queries. Other queries in this scenario can not use indexes too, but should be fast.
- Was this new behavior mentioned in the release nose?
Yes, small note in documentation and small note in release notes https://www.postgresql.org/docs/12/release-12.html this one:
Allow ALTER TABLE ... SET NOT NULL to avoid unnecessary table scans (Sergei Kornilov)
This can be optimized when the table's column constraints can be recognized as disallowing nulls.
- Do you know if there are any blog posts etc. discussing this? (I'm definitely going to write one!!)
I do not know. Personally, I mentioned this feature in only a few Russian-speaking communities. And right now I wrote answer in dba.SO.
regards, Sergei
Hi Sergei - I just used the recipe on my production database. I didn't observe all the expected benefits, I wonder if there were confounding factors or if I did something wrong. If you have time, I'd love to get your feedback. Let me know if you need more info. I'd love to write a blog post informing the world about this potentially game-changing feature!
Here are the commands I did, with some notes. All the columns are boolean. The table has about 8,600,000 rows.
This (blocking operation) was not fast, perhaps 60-100 seconds. maybe running them individually
would have been proportionally faster. but even then, not near-instant as expected.
or, maybe running them together had some sort of aggregate negative effect, so running them individually
would have been instant? I don't have much experience with such constraints.
ALTER TABLE my_table
ADD CONSTRAINT my_table_column1_not_null CHECK (column1 IS NOT NULL) NOT VALID,
ADD CONSTRAINT my_table_column2_not_null CHECK (column2 IS NOT NULL) NOT VALID,
ADD CONSTRAINT my_table_column3_not_null CHECK (column3 IS NOT NULL) NOT VALID,
ADD CONSTRAINT my_table_column4_not_null CHECK (column4 IS NOT NULL) NOT VALID;
as expected these took as long as a table scan, and as expected they did not block.
ALTER TABLE my_table validate CONSTRAINT my_table_column1_not_null;
ALTER TABLE my_table validate CONSTRAINT my_table_column2_not_null;
ALTER TABLE my_table validate CONSTRAINT my_table_column3_not_null;
ALTER TABLE my_table validate CONSTRAINT my_table_column4_not_null;
SLOW (table scan speed) - didn't have timing on, but I think about same time as the next one.
ALTER TABLE my_table ALTER COLUMN column1 SET NOT NULL;
01:39 SLOW (table scan speed)
ALTER TABLE my_table ALTER COLUMN column2 SET NOT NULL;
00:22 - 1/4 time of table scan but still not instant like expected
ALTER TABLE my_table ALTER COLUMN column3 SET NOT NULL;
20.403 ms - instant, like expected
ALTER TABLE my_table ALTER COLUMN column4 SET NOT NULL;
all < 100ms
ALTER TABLE my_table DROP CONSTRAINT my_table_column1_not_null;
ALTER TABLE my_table DROP CONSTRAINT my_table_column2_not_null;
ALTER TABLE my_table DROP CONSTRAINT my_table_column3_not_null;
ALTER TABLE my_table DROP CONSTRAINT my_table_column4_not_null;
On Fri, May 29, 2020 at 09:53:14PM -0400, John Bachir wrote:
Hi Sergei - I just used the recipe on my production database. I didn't
observe all the expected benefits, I wonder if there were confounding factors
or if I did something wrong. If you have time, I'd love to get your feedback.
Let me know if you need more info. I'd love to write a blog post informing
the world about this potentially game-changing feature!
If you do it right, you can see a DEBUG:
postgres=# CREATE TABLE tn (i int);
postgres=# ALTER TABLE tn ADD CONSTRAINT nn CHECK (i IS NOT NULL) NOT VALID;
postgres=# ALTER TABLE tn VALIDATE CONSTRAINT nn;
postgres=# SET client_min_messages=debug;
postgres=# ALTER TABLE tn ALTER i SET NOT NULL ;
DEBUG: existing constraints on column "tn"."i" are sufficient to prove that it does not contain nulls
SLOW (table scan speed) - didn't have timing on, but I think about same time as the next one.
ALTER TABLE my_table ALTER COLUMN column1 SET NOT NULL;01:39 SLOW (table scan speed)
ALTER TABLE my_table ALTER COLUMN column2 SET NOT NULL;00:22 - 1/4 time of table scan but still not instant like expected
ALTER TABLE my_table ALTER COLUMN column3 SET NOT NULL;20.403 ms - instant, like expected
ALTER TABLE my_table ALTER COLUMN column4 SET NOT NULL;
That the duration decreased every time may have been due to caching?
How big is the table vs RAM ?
Do you know if the SET NOT NULL blocked or not ?
Maybe something else had a nontrivial lock on the table, and those commands
were waiting on lock. If you "SET deadlock_timeout='1'; SET
log_lock_waits=on;", then you could see that.
--
Justin
On Fri, May 29, 2020, at 10:10 PM, Justin Pryzby wrote:
If you do it right, you can see a DEBUG:
postgres=# SET client_min_messages=debug;
postgres=# ALTER TABLE tn ALTER i SET NOT NULL ;
DEBUG: existing constraints on column "tn"."i" are sufficient to prove
that it does not contain nulls
Thanks! I'll add that to my recipe for the future. Although by that time it would be too late, so to make use of this I would have to set up a cloned test environment and hope that all conditions are correctly cloned. Is there a way to check sufficiency before running the command?
That the duration decreased every time may have been due to caching?
How big is the table vs RAM ?
Table is about 10 gigs, machine has 16gigs, I'm hoping OS & PG did not decided to kick out everything else from ram when doing the operation. But even with caching, the final command being 20ms, and the first 2 commands being the same time as a table scan, seems like something other than caching is at play here? IDK!
Do you know if the SET NOT NULL blocked or not ?
Maybe something else had a nontrivial lock on the table, and those commands
were waiting on lock. If you "SET deadlock_timeout='1'; SET
log_lock_waits=on;", then you could see that.
I don't know if it blocked. Great idea! I'll add that to my recipe as well.
John
p.s. current recipe: https://gist.github.com/jjb/fab5cc5f0e1b23af28694db4fc01c55a
p.p.s I think one of the biggest surprises was that setting the NOT NULL condition was slow. That's totally unrelated to this feature though and out of scope for this list though, I asked about it here https://dba.stackexchange.com/questions/268301/why-is-add-constraint-not-valid-taking-a-long-time
Maybe something else had a nontrivial lock on the table, and those commands
were waiting on lock. If you "SET deadlock_timeout='1'; SET
log_lock_waits=on;", then you could see that.
Just checking - I think you mean lock_timeout? (although setting deadlock_timeout is also not a bad idea just in case).
On Mon, Jun 01, 2020 at 10:49:25AM -0400, John Bachir wrote:
On Fri, May 29, 2020, at 10:10 PM, Justin Pryzby wrote:
If you do it right, you can see a DEBUG:
postgres=# SET client_min_messages=debug;
postgres=# ALTER TABLE tn ALTER i SET NOT NULL ;
DEBUG: existing constraints on column "tn"."i" are sufficient to prove
that it does not contain nullsThanks! I'll add that to my recipe for the future. Although by that time it would be too late, so to make use of this I would have to set up a cloned test environment and hope that all conditions are correctly cloned. Is there a way to check sufficiency before running the command?
Yea, client_min_messages is there to demonstrate that the feature is working
and allow you to check whether it work using your own recipe.
If you want to avoid blocking the table for nontrivial time, maybe you'd add:
SET statement_timeout='1s';
On Mon, Jun 01, 2020 at 09:55:43PM -0400, John Bachir wrote:
Maybe something else had a nontrivial lock on the table, and those commands
were waiting on lock. If you "SET deadlock_timeout='1'; SET
log_lock_waits=on;", then you could see that.Just checking - I think you mean lock_timeout? (although setting deadlock_timeout is also not a bad idea just in case).
No, actually (but I've had to double check):
https://www.postgresql.org/docs/current/runtime-config-locks.html
|When log_lock_waits is set, this parameter also determines the length of time
|to wait before a log message is issued about the lock wait. If you are trying
|to investigate locking delays you might want to set a shorter than normal
|deadlock_timeout.
--
Justin
Thank you Justin for all that useful info! A couple nitpicky questions, so I can get my recipe right.
On Mon, Jun 1, 2020, at 10:04 PM, Justin Pryzby wrote:
On Mon, Jun 01, 2020 at 10:49:25AM -0400, John Bachir wrote:
Thanks! I'll add that to my recipe for the future. Although by that time it would be too late, so to make use of this I would have to set up a cloned test environment and hope that all conditions are correctly cloned. Is there a way to check sufficiency before running the command?
Yea, client_min_messages is there to demonstrate that the feature is working
and allow you to check whether it work using your own recipe.
Gotcha. Okay, just to double-check - you are saying there is _not_ a way to check before running the command, right?
Just checking - I think you mean lock_timeout? (although setting deadlock_timeout is also not a bad idea just in case).
No, actually (but I've had to double check):
https://www.postgresql.org/docs/current/runtime-config-locks.html
|When log_lock_waits is set, this parameter also determines the length of time
|to wait before a log message is issued about the lock wait. If you are trying
|to investigate locking delays you might want to set a shorter than normal
|deadlock_timeout.
Hah! Unexpected and useful.
I think to avoid blocking my application activity, I should _also_ set lock_timeout, and retry if it fails. (maybe this is orthogonal to what you were addressing before, but I'm just wondering what you think).
Thanks,
John