further clarification: alter table alter column set not null - table scan is skipped

Started by PG Bug reporting form9 months ago7 messagesdocs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/docguide.html
Description:

Hello there,
I like PostgreSQL a lot so this is my way of giving back.

The "table scan is skipped" optimization can use some clarification
https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-DESC-SET-DROP-NOT-NULL
My proposal is "then the table scan is skipped if the alter statement
doesn't drop the constraint."

The reason behind the proposal is documented here
https://dev.to/andrewpsy/the-set-not-null-downtime-trap-in-postgresql-1o71

Thank you for working on such an awesome project.
Cheers,
Andrew

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Bug reporting form (#1)
Re: further clarification: alter table alter column set not null - table scan is skipped

On Wed, Jul 30, 2025, 13:55 PG Doc comments form <noreply@postgresql.org>
wrote:

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/docguide.html
Description:

Hello there,
I like PostgreSQL a lot so this is my way of giving back.

The "table scan is skipped" optimization can use some clarification

https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-DESC-SET-DROP-NOT-NULL
My proposal is "then the table scan is skipped if the alter statement
doesn't drop the constraint."

The reason behind the proposal is documented here
https://dev.to/andrewpsy/the-set-not-null-downtime-trap-in-postgresql-1o71

Thank you for working on such an awesome project.
Cheers,
Andrew

I'm kinda hoping this is actually just a fixable bug...

Otherwise I'd probably go for a parenthetical:
(however, the constraint used must not be dropped in the same alter table
command).

David J.

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David G. Johnston (#2)
Re: further clarification: alter table alter column set not null - table scan is skipped

On 2025-Jul-30, David G. Johnston wrote:

On Wed, Jul 30, 2025, 13:55 PG Doc comments form <noreply@postgresql.org>
wrote:

The "table scan is skipped" optimization can use some clarification

https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-DESC-SET-DROP-NOT-NULL
My proposal is "then the table scan is skipped if the alter statement
doesn't drop the constraint."

I'm kinda hoping this is actually just a fixable bug...

I don't think so -- it's just the way ALTER TABLE is designed to work.
We don't promise that the subcommands are going to be executed in the
order that they are given, and thus this sort of thing can happen.
I suspect a mechanism that would throw an error at trying to drop the
constraint would be too complicated / brittle / laborious to write.

It's possible that there are other combinations that are similarly
affected, but I suspect the majority of them would just give an error
rather than silently wasting a lot of time; so I agree that this
subcommand specifically could use a small note. While writing it I
realized we failed to note that the addition of NOT VALID changes
behavior. So, how about like this:

      <para>
       <literal>SET NOT NULL</literal> may only be applied to a column
       provided none of the records in the table contain a
       <literal>NULL</literal> value for the column.  Ordinarily this is
       checked during the <literal>ALTER TABLE</literal> by scanning the
-      entire table; however, if a valid <literal>CHECK</literal> constraint is
-      found which proves no <literal>NULL</literal> can exist, then the
-      table scan is skipped.
+      entire table, unless <literal>NOT VALID</literal> is specified;
+      however, if a valid <literal>CHECK</literal> constraint is
+      found which proves no <literal>NULL</literal> can exist (and is not
+      dropped in the same command), then the table scan is skipped.
       If a column has an invalid not-null constraint,
       <literal>SET NOT NULL</literal> validates it.
      </para>

(This is correct for 18; for 17 and earlier, the mention of NOT VALID
needs to be removed.) Of course, in 18 you'd rely on ADD NOT NULL NOT
VALID instead of using a separate CHECK constraint.

Not sure if this reads better:

if a valid <literal>CHECK</literal> constraint is
found (and is not dropped in the same command) which
proves no <literal>NULL</literal> can exist, then

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Most hackers will be perfectly comfortable conceptualizing users as entropy
sources, so let's move on." (Nathaniel Smith)
https://mail.gnu.org/archive/html/monotone-devel/2007-01/msg00080.html

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Alvaro Herrera (#3)

On Thursday, July 31, 2025, Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Jul-30, David G. Johnston wrote:

On Wed, Jul 30, 2025, 13:55 PG Doc comments form <noreply@postgresql.org

wrote:

The "table scan is skipped" optimization can use some clarification

https://www.postgresql.org/docs/current/sql-altertable.html#

SQL-ALTERTABLE-DESC-SET-DROP-NOT-NULL

My proposal is "then the table scan is skipped if the alter statement
doesn't drop the constraint."

I'm kinda hoping this is actually just a fixable bug...

I don't think so -- it's just the way ALTER TABLE is designed to work.
We don't promise that the subcommands are going to be executed in the
order that they are given, and thus this sort of thing can happen.
I suspect a mechanism that would throw an error at trying to drop the
constraint would be too complicated / brittle / laborious to write.

I wouldn’t want an error. At the start of the command the constraint
existed and its presence then would be enough. It is immaterial that it
went away during the command. But it’s definitely not something that seems
worth spending a non-trivial amount of effort on.

(This is correct for 18; for 17 and earlier, the mention of NOT VALID
needs to be removed.) Of course, in 18 you'd rely on ADD NOT NULL NOT
VALID instead of using a separate CHECK constraint.

Yeah, the main question here is whether we want to document for v17 and
earlier what the article points out regarding locks.

Not sure if this reads better:

if a valid <literal>CHECK</literal> constraint is
found (and is not dropped in the same command) which
proves no <literal>NULL</literal> can exist, then

If a valid check constraint exists (and is not dropped in the same command)
which proves the absence of NULLs, then

I do agree the parenthetical should appear closer to the word constraint.

David J.

#5Shuyu Pan
psy2000usa@yahoo.com
In reply to: David G. Johnston (#4)
Re: further clarification: alter table alter column set not null - table scan is skipped

I like your versions that emphasize: don’t drop the constraint in the same alter table set no null command.
Similar to David’s point, I spent some time trying to figure out a simple refactoring to carry the optimization all the way to the end but it might require executing “set not null” sooner which has a big impact. Another option is only implement a special treatment for this specific use case but it is a code smell to me. I believe a small clarification for the doc entry is the most efficient thing.

Sent from Yahoo Mail for iPhone

On Thursday, July 31, 2025, 09:01, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Thursday, July 31, 2025, Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Jul-30, David G. Johnston wrote:

On Wed, Jul 30, 2025, 13:55 PG Doc comments form <noreply@postgresql.org>
wrote:

The "table scan is skipped" optimization can use some clarification

https://www.postgresql.org/doc s/current/sql-altertable.html# SQL-ALTERTABLE-DESC-SET-DROP- NOT-NULL
My proposal is "then the table scan is skipped if the alter statement
doesn't drop the constraint."

I'm kinda hoping this is actually just a fixable bug...

I don't think so -- it's just the way ALTER TABLE is designed to work.
We don't promise that the subcommands are going to be executed in the
order that they are given, and thus this sort of thing can happen.
I suspect a mechanism that would throw an error at trying to drop the
constraint would be too complicated / brittle / laborious to write.

I wouldn’t want an error.  At the start of the command the constraint existed and its presence then would be enough.  It is immaterial that it went away during the command.  But it’s definitely not something that seems worth spending a non-trivial amount of effort on. 

(This is correct for 18; for 17 and earlier, the mention of NOT VALID
needs to be removed.)  Of course, in 18 you'd rely on ADD NOT NULL NOT
VALID instead of using a separate CHECK constraint.

Yeah, the main question here is whether we want to document for v17 and earlier what the article points out regarding locks.

Not sure if this reads better:

   if a valid <literal>CHECK</literal> constraint is
   found (and is not dropped in the same command) which
   proves no <literal>NULL</literal> can exist, then

If a valid check constraint exists (and is not dropped in the same command) which proves the absence of NULLs, then
I do agree the parenthetical should appear closer to the word constraint.
David J.

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Shuyu Pan (#5)
Re: further clarification: alter table alter column set not null - table scan is skipped

On 2025-Jul-31, Shuyu Pan wrote:

I like your versions that emphasize: don’t drop the constraint in the
same alter table set no null command. Similar to David’s point, I
spent some time trying to figure out a simple refactoring to carry the
optimization all the way to the end but it might require executing
“set not null” sooner which has a big impact. Another option is only
implement a special treatment for this specific use case but it is a
code smell to me.

Oh yeah, delaying the drop is much more likely to break other things. I
was more thinking along the lines of maintaining a list of columns that
are known non-null at the start of the command (a bitmapset actually).
This could be computed in ALTER TABLE phase 1, and used later to
determine that no scans are needed. But this is a lot of mechanism
which is useless 99% of the time, and moreso now that you can directly
add the NOT NULL constraints as NOT VALID to start with, which saves
having to mess with a separate CHECK constraint.

I believe a small clarification for the doc entry is the most efficient thing.

Okay, I've pushed the change to all branches using David Johnston's
suggested wording.

Thank you all!

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#7Shuyu Pan
psy2000usa@yahoo.com
In reply to: Alvaro Herrera (#6)
Re: further clarification: alter table alter column set not null - table scan is skipped

Thanks a lot Álvaro for preparing the clarification so quickly. Looking forward to the release.
If we mark a column to skip table scan during drop (phase 0) and later skip the table scan when setting attributes (phase 7), we will not risk data corruption if the Access Exclusive lock is never released between phase 0 and 7.

Sent from Yahoo Mail for iPhone

On Monday, August 4, 2025, 04:32, Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Jul-31, Shuyu Pan wrote:

I like your versions that emphasize: don’t drop the constraint in the
same alter table set no null command.  Similar to David’s point, I
spent some time trying to figure out a simple refactoring to carry the
optimization all the way to the end but it might require executing
“set not null” sooner which has a big impact. Another option is only
implement a special treatment for this specific use case but it is a
code smell to me.

Oh yeah, delaying the drop is much more likely to break other things.  I
was more thinking along the lines of maintaining a list of columns that
are known non-null at the start of the command (a bitmapset actually).
This could be computed in ALTER TABLE phase 1, and used later to
determine that no scans are needed.  But this is a lot of mechanism
which is useless 99% of the time, and moreso now that you can directly
add the NOT NULL constraints as NOT VALID to start with, which saves
having to mess with a separate CHECK constraint.

I believe a small clarification for the doc entry is the most efficient thing.

Okay, I've pushed the change to all branches using David Johnston's
suggested wording.

Thank you all!

--
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/