Correct docs re: rewriting indexes when table rewrite is skipped
Back in 367bc42 (for 9.2!) we "avoid[ed] index rebuild[ing] for
no-rewrite ALTER TABLE
.. ALTER TYPE." However the docs still claim that "a table rewrite is
not needed; but any indexes on the affected columns must still be
rebuilt."
I've attached a simple patch to update the docs to match the current behavior.
Thanks,
James Coleman
Attachments:
v1-0001-Docs-When-table-rewriting-is-skipped-indexes-are-.patchapplication/x-patch; name=v1-0001-Docs-When-table-rewriting-is-skipped-indexes-are-.patchDownload
From f6515a5f5f39d728b4cad837480c3ca953ed4623 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Tue, 29 Mar 2022 13:56:39 +0000
Subject: [PATCH v1] Docs: When table rewriting is skipped indexes are not
rebuilt
In 367bc42 (for 9.2!) we avoid index rebuild for no-rewrite ALTER TABLE
.. ALTER TYPE. Update the docs to match.
---
doc/src/sgml/ref/alter_table.sgml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 5c0735e08a..39931c97c8 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1366,7 +1366,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
existing column, if the <literal>USING</literal> clause does not change
the column contents and the old type is either binary coercible to the new
type or an unconstrained domain over the new type, a table rewrite is not
- needed; but any indexes on the affected columns must still be rebuilt.
+ needed.
Table and/or index rebuilds may take a
significant amount of time for a large table; and will temporarily require
as much as double the disk space.
--
2.20.1
On Tue, 29 Mar 2022 at 16:04, James Coleman <jtc331@gmail.com> wrote:
Back in 367bc42 (for 9.2!) we "avoid[ed] index rebuild[ing] for
no-rewrite ALTER TABLE
.. ALTER TYPE." However the docs still claim that "a table rewrite is
not needed; but any indexes on the affected columns must still be
rebuilt."
Although indexes might indeed not need a rebuild, in many cases they
still do (e.g. when the type changes between text and a domain of text
with a different collation).
I think that the current state of the docs is better in that regard;
as it explicitly warns for index rebuilds, even when the letter of the
docs is incorrect: there are indeed cases we don't need to rebuild the
indexes; but that would require more elaboration.
Kind regards,
Matthias van de Meent
On Tue, Mar 29, 2022 at 11:29 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
On Tue, 29 Mar 2022 at 16:04, James Coleman <jtc331@gmail.com> wrote:
Back in 367bc42 (for 9.2!) we "avoid[ed] index rebuild[ing] for
no-rewrite ALTER TABLE
.. ALTER TYPE." However the docs still claim that "a table rewrite is
not needed; but any indexes on the affected columns must still be
rebuilt."Although indexes might indeed not need a rebuild, in many cases they
still do (e.g. when the type changes between text and a domain of text
with a different collation).I think that the current state of the docs is better in that regard;
as it explicitly warns for index rebuilds, even when the letter of the
docs is incorrect: there are indeed cases we don't need to rebuild the
indexes; but that would require more elaboration.
Admittedly I hadn't thought of that case. But isn't it already covered
in the existing docs by the phrase "or an unconstrained domain over
the new type"? I don't love the word "or" there because there's a
sense in which the first clause "binary coercible to the new type" is
still accurate for your example unless you narrowly separate "domain"
and "type", but I think that narrow distinction is what's technically
there already.
That being said, I could instead of removing the clause entirely
replace it with something like "indexes may still need to be rebuilt
when the new type is a constrained domain".
Thoughts?
James Coleman
On Wed, Mar 30, 2022 at 10:04 AM James Coleman <jtc331@gmail.com> wrote:
Admittedly I hadn't thought of that case. But isn't it already covered
in the existing docs by the phrase "or an unconstrained domain over
the new type"? I don't love the word "or" there because there's a
sense in which the first clause "binary coercible to the new type" is
still accurate for your example unless you narrowly separate "domain"
and "type", but I think that narrow distinction is what's technically
there already.That being said, I could instead of removing the clause entirely
replace it with something like "indexes may still need to be rebuilt
when the new type is a constrained domain".
You're talking about this as if the normal cases is that indexes don't
need to rebuilt, but sometimes they do. However, if I recall
correctly, the way the code is structured, it supposes that the
indexes do need a rebuild, and then tries to prove that they actually
don't. That disconnect makes me nervous, because it seems to me that
it could easily lead to a situation where our documentation contains
subtle errors. I think somebody should go through and study the
algorithm as it exists in the code, and then write documentation to
match it. And I think that when they do that, they should approach it
from the same point of view that the code does i.e. "these are the
conditions for skipping the index rebuild" rather than "these are the
conditions for performing an index rebuild." By doing it that way, I
think we minimize the likelihood of disconnects between code and
documentation, and also make it easier to update in future if the
algorithm gets changed.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Mar 30, 2022 at 11:41 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 30, 2022 at 10:04 AM James Coleman <jtc331@gmail.com> wrote:
Admittedly I hadn't thought of that case. But isn't it already covered
in the existing docs by the phrase "or an unconstrained domain over
the new type"? I don't love the word "or" there because there's a
sense in which the first clause "binary coercible to the new type" is
still accurate for your example unless you narrowly separate "domain"
and "type", but I think that narrow distinction is what's technically
there already.That being said, I could instead of removing the clause entirely
replace it with something like "indexes may still need to be rebuilt
when the new type is a constrained domain".You're talking about this as if the normal cases is that indexes don't
need to rebuilt, but sometimes they do. However, if I recall
correctly, the way the code is structured, it supposes that the
indexes do need a rebuild, and then tries to prove that they actually
don't. That disconnect makes me nervous, because it seems to me that
it could easily lead to a situation where our documentation contains
subtle errors. I think somebody should go through and study the
algorithm as it exists in the code, and then write documentation to
match it. And I think that when they do that, they should approach it
from the same point of view that the code does i.e. "these are the
conditions for skipping the index rebuild" rather than "these are the
conditions for performing an index rebuild." By doing it that way, I
think we minimize the likelihood of disconnects between code and
documentation, and also make it easier to update in future if the
algorithm gets changed.
Hmm, having it match the way it works makes sense. Would you feel
comfortable with an intermediate step (queueing up that as a larger
change) changing the clause to something like "indexes will still have
to be rebuilt unless the system can guarantee that the sort order is
proven to be unchanged" (with appropriate wordsmithing to be a bit
less verbose if possible)?
Thanks,
James Coleman
On Wed, Mar 30, 2022 at 4:33 PM James Coleman <jtc331@gmail.com> wrote:
Hmm, having it match the way it works makes sense. Would you feel
comfortable with an intermediate step (queueing up that as a larger
change) changing the clause to something like "indexes will still have
to be rebuilt unless the system can guarantee that the sort order is
proven to be unchanged" (with appropriate wordsmithing to be a bit
less verbose if possible)?
Yeah, that seems fine. It's arguable how much detail we should go into
here - but a statement of the form you propose is not misleading, and
that's what seems most important to me.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Mar 30, 2022 at 5:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 30, 2022 at 4:33 PM James Coleman <jtc331@gmail.com> wrote:
Hmm, having it match the way it works makes sense. Would you feel
comfortable with an intermediate step (queueing up that as a larger
change) changing the clause to something like "indexes will still have
to be rebuilt unless the system can guarantee that the sort order is
proven to be unchanged" (with appropriate wordsmithing to be a bit
less verbose if possible)?Yeah, that seems fine. It's arguable how much detail we should go into
here - but a statement of the form you propose is not misleading, and
that's what seems most important to me.
All right, thanks for feedback. Attached is v2 with such a change.
I've not included examples, and I'm about 50/50 on doing so. What are
your thoughts on adding in parens "e.g., changing from varchar to text
avoids rebuilding indexes while changing from text to a domain of text
with a different collation will require rebuilding indexes"?
Thanks,
James Coleman
Attachments:
v2-0001-Docs-Index-rebuilding-is-sometimes-skipped-along-.patchapplication/octet-stream; name=v2-0001-Docs-Index-rebuilding-is-sometimes-skipped-along-.patchDownload
From 33ef467fe03cd3b41d89b98b4cc2bf8974401f7f Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Tue, 29 Mar 2022 13:56:39 +0000
Subject: [PATCH v2] Docs: Index rebuilding is sometimes skipped along with
table rewriting
In 367bc42 (for 9.2!) we "avoid index rebuild for no-rewrite ALTER TABLE
.. ALTER TYPE." This doesn't apply in all cases, but update the docs to
match so users don't conclude indexes must always be rewritten.
---
doc/src/sgml/ref/alter_table.sgml | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 5c0735e08a..292efb67af 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1366,7 +1366,8 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
existing column, if the <literal>USING</literal> clause does not change
the column contents and the old type is either binary coercible to the new
type or an unconstrained domain over the new type, a table rewrite is not
- needed; but any indexes on the affected columns must still be rebuilt.
+ needed; however, indexes will still have to be rebuilt unless the system
+ can verify that sort order and/or hashing semantics are unchanged.
Table and/or index rebuilds may take a
significant amount of time for a large table; and will temporarily require
as much as double the disk space.
--
2.20.1
On Thu, Mar 31, 2022 at 9:17 AM James Coleman <jtc331@gmail.com> wrote:
All right, thanks for feedback. Attached is v2 with such a change.
I've not included examples, and I'm about 50/50 on doing so. What are
your thoughts on adding in parens "e.g., changing from varchar to text
avoids rebuilding indexes while changing from text to a domain of text
with a different collation will require rebuilding indexes"?
On the patch, I suggest that instead of saying "can verify that sort
order and/or hashing semantics are unchanged" you say something like
"can verify that the new index would be logically equivalent to the
current one", mostly because I do not think that "and/or" looks very
good in formal writing.
I think it would be fine to include examples, but I think that the
phrasing you suggest here doesn't seem great. I'm not sure how to fix
it exactly. Maybe it needs a little more explanation?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Mar 31, 2022 at 9:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 31, 2022 at 9:17 AM James Coleman <jtc331@gmail.com> wrote:
All right, thanks for feedback. Attached is v2 with such a change.
I've not included examples, and I'm about 50/50 on doing so. What are
your thoughts on adding in parens "e.g., changing from varchar to text
avoids rebuilding indexes while changing from text to a domain of text
with a different collation will require rebuilding indexes"?On the patch, I suggest that instead of saying "can verify that sort
order and/or hashing semantics are unchanged" you say something like
"can verify that the new index would be logically equivalent to the
current one", mostly because I do not think that "and/or" looks very
good in formal writing.
Agreed re: "and/or".
I think it would be fine to include examples, but I think that the
phrasing you suggest here doesn't seem great. I'm not sure how to fix
it exactly. Maybe it needs a little more explanation?
Is the attached more along the lines of what you were thinking?
Thanks,
James Coleman
Attachments:
v3-0001-Docs-Index-rebuilding-is-sometimes-skipped-along-.patchapplication/octet-stream; name=v3-0001-Docs-Index-rebuilding-is-sometimes-skipped-along-.patchDownload
From bd9b9a289e276c745e5d8cd2208975362472c9ad Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Tue, 29 Mar 2022 13:56:39 +0000
Subject: [PATCH v3] Docs: Index rebuilding is sometimes skipped along with
table rewriting
In 367bc42 (for 9.2!) we "avoid index rebuild for no-rewrite ALTER TABLE
.. ALTER TYPE." This doesn't apply in all cases, but update the docs to
match so users don't conclude indexes must always be rewritten.
---
doc/src/sgml/ref/alter_table.sgml | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 5c0735e08a..da576fcc89 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1366,7 +1366,12 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
existing column, if the <literal>USING</literal> clause does not change
the column contents and the old type is either binary coercible to the new
type or an unconstrained domain over the new type, a table rewrite is not
- needed; but any indexes on the affected columns must still be rebuilt.
+ needed. However, indexes must always be rebuilt unless the system can
+ verify that the new index would be logically equivalent to the existing one.
+ For example, while changing a column from type <type>VARCHAR</type> to type
+ <type>TEXT</type> (having the same collation) does not require rebuilding
+ indexes, a type change affecting the column's collation requires rebuilding
+ indexes because the logical sort order has changed.
Table and/or index rebuilds may take a
significant amount of time for a large table; and will temporarily require
as much as double the disk space.
--
2.20.1
On Thu, Mar 31, 2022 at 10:14 AM James Coleman <jtc331@gmail.com> wrote:
Is the attached more along the lines of what you were thinking?
Yeah. Maybe this would be a little clearer: "For example, if the
collation for a column has been changed, an index rebuild is always
required, because the new sort order might be different. However, in
the absence of a collation change, a column can be changed from text
to varchar or vice versa without rebuilding the indexes, because these
data types sort identically."
We don't seem to be very consistent about whether we write type names
like VARCHAR in upper case or lower case in the documentation. I'd
vote for using lower-case, but it probably doesn't matter much.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Mar 31, 2022 at 10:29 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 31, 2022 at 10:14 AM James Coleman <jtc331@gmail.com> wrote:
Is the attached more along the lines of what you were thinking?
Yeah. Maybe this would be a little clearer: "For example, if the
collation for a column has been changed, an index rebuild is always
required, because the new sort order might be different. However, in
the absence of a collation change, a column can be changed from text
to varchar or vice versa without rebuilding the indexes, because these
data types sort identically."
Updated.
We don't seem to be very consistent about whether we write type names
like VARCHAR in upper case or lower case in the documentation. I'd
vote for using lower-case, but it probably doesn't matter much.
Grepping suggests lower case is more common (2 to 1) so I switched to
that. Interestingly for "text" it's 700+ to 1 :)
Thanks,
James Coleman
Attachments:
v4-0001-Docs-Index-rebuilding-is-sometimes-skipped-along-.patchapplication/octet-stream; name=v4-0001-Docs-Index-rebuilding-is-sometimes-skipped-along-.patchDownload
From 9bba318b31394af582659ecfcd5fd77ddf778164 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Tue, 29 Mar 2022 13:56:39 +0000
Subject: [PATCH v4] Docs: Index rebuilding is sometimes skipped along with
table rewriting
In 367bc42 (for 9.2!) we "avoid index rebuild for no-rewrite ALTER TABLE
.. ALTER TYPE." This doesn't apply in all cases, but update the docs to
match so users don't conclude indexes must always be rewritten.
---
doc/src/sgml/ref/alter_table.sgml | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 5c0735e08a..7027976fa9 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1366,7 +1366,13 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
existing column, if the <literal>USING</literal> clause does not change
the column contents and the old type is either binary coercible to the new
type or an unconstrained domain over the new type, a table rewrite is not
- needed; but any indexes on the affected columns must still be rebuilt.
+ needed. However, indexes must always be rebuilt unless the system can
+ verify that the new index would be logically equivalent to the existing one.
+ For example, if the collation for a column has been changed an index rebuild
+ is always required because the new sort order might be different. However, in
+ the absence of a collation change, a column can be changed from <type>text</type>
+ to <type>varchar</type> (or vice versa) without rebuilding the indexes because
+ these data types sort identically.
Table and/or index rebuilds may take a
significant amount of time for a large table; and will temporarily require
as much as double the disk space.
--
2.20.1
On Thu, Mar 31, 2022 at 10:51 AM James Coleman <jtc331@gmail.com> wrote:
Updated.
This version looks fine to me. If nobody objects I will commit it and
credit myself as a co-author.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Mar 31, 2022 at 3:25 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 31, 2022 at 10:51 AM James Coleman <jtc331@gmail.com> wrote:
Updated.
This version looks fine to me. If nobody objects I will commit it and
credit myself as a co-author.
Sounds great; thanks again for the review.
James Coleman
On Thu, Mar 31, 2022 at 4:19 PM James Coleman <jtc331@gmail.com> wrote:
On Thu, Mar 31, 2022 at 3:25 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 31, 2022 at 10:51 AM James Coleman <jtc331@gmail.com> wrote:
Updated.
This version looks fine to me. If nobody objects I will commit it and
credit myself as a co-author.Sounds great; thanks again for the review.
Done.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Apr 1, 2022 at 8:58 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 31, 2022 at 4:19 PM James Coleman <jtc331@gmail.com> wrote:
On Thu, Mar 31, 2022 at 3:25 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 31, 2022 at 10:51 AM James Coleman <jtc331@gmail.com> wrote:
Updated.
This version looks fine to me. If nobody objects I will commit it and
credit myself as a co-author.Sounds great; thanks again for the review.
Done.
Thanks. I marked the CF entry as committed.
James Coleman