REPACK and naming

Started by Bruce Momjian, 6 months ago, 35 messages
#1Bruce Momjian
bruce@momjian.us

I am starting to get worried about the confusion of adding a REPACK
command. We already have a lot of confusion around vacuum and analyze:

* autoanalyze does vacuum and analyze
* VACUUM FULL is much different from VACUUM

It seems if we add REPACK as a command, it is somewhere between VACUUM
FULL and VACUUM in severity/impact. Should we be rethinking the naming
in this area?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#1)
Re: REPACK and naming

Hi,

On 2025-Sep-16, Bruce Momjian wrote:

I am starting to get worried about the confusion of adding a REPACK
command. We already have a lot of confusion around vacuum and analyze:

* autoanalyze does vacuum and analyze
* VACUUM FULL is much different from VACUUM

It seems if we add REPACK as a command, it is somewhere between VACUUM
FULL and VACUUM in severity/impact.

No, REPACK is exactly where VACUUM FULL is in terms of impact and
severity; it's not between anything. The confusion stems precisely from
VACUUM being a thing that's a completely different one from VACUUM FULL,
yet they have pretty much the same name. What I'm doing is giving one of
those things a different name, to reduce confusion. Note that there's
no intention to add autorepack, because that would (IMO) make no sense.

Another thing I'm doing with that patch is rename CLUSTER so that it is
also REPACK. This also makes sense, because VACUUM FULL _is_ the same
as CLUSTER, except that it follows current physical order instead of
following a specific index's order.
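
For readers less familiar with these commands, the distinction drawn above can be sketched with the statements that exist today. The table and index names (`orders`, `orders_pkey`) are hypothetical and used only for illustration.

```sql
-- Plain VACUUM: makes dead-tuple space reusable inside the existing file.
-- Takes a SHARE UPDATE EXCLUSIVE lock, so ordinary reads and writes
-- continue while it runs.
VACUUM orders;

-- VACUUM FULL: rewrites the whole table into a new file, following the
-- current physical row order. Takes an ACCESS EXCLUSIVE lock.
VACUUM FULL orders;

-- CLUSTER: the same kind of full rewrite under the same lock, but rows
-- are written in the order of the named index.
CLUSTER orders USING orders_pkey;
```

The last two statements are the operations the patch proposes to expose under a single REPACK verb.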

Peter E suggested that since we have REINDEX to rewrite indexes, then
the command to rewrite tables should be RETABLE. I haven't been able to
get myself to like that idea, and also I think that was a bit
tongue-in-cheek, but if you like RETABLE better than REPACK, then maybe
we can have a vote to decide which one of those names to use. However,
I don't think that change would make a tremendous difference, and also I
don't think RETABLE is enough of an English name to become a command
name.

Should we be rethinking the naming in this area?

I haven't seen anything that needs renaming TBH, but if you have
specific proposals, feel free to air them.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end." (2nd Commandment for C programmers)

#3Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#2)
Re: REPACK and naming

On Tue, Sep 16, 2025 at 7:42 PM Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Peter E suggested that since we have REINDEX to rewrite indexes, then
the command to rewrite tables should be RETABLE. I haven't been able to
get myself to like that idea, and also I think that was a bit
tongue-in-cheek, but if you like RETABLE better than REPACK, then maybe
we can have a vote to decide which one of those names to use. However,
I don't think that change would make a tremendous difference, and also I
don't think RETABLE is enough of an English name to become a command
name.

I think RETABLE is not a proposal to be taken seriously. That's
extremely confusing.

I don't love the name REPACK, but I think it's good enough. If we come
up with something better, great.

I agree that having a single command that does both VACUUM FULL and
CLUSTER makes a lot more sense than the status quo, which is a
confusing historical accident.

--
Robert Haas
EDB: http://www.enterprisedb.com

#4Marcos Pegoraro
marcos@f10.com.br
In reply to: Robert Haas (#3)
Re: REPACK and naming

On Tue, Sep 16, 2025 at 23:01, Robert Haas <robertmhaas@gmail.com>
wrote:

I think RETABLE is not a proposal to be taken seriously. That's
extremely confusing.

This feature could be used in a future version to rearrange fields in a
table, for better padding.
I don't think we have another one available for this purpose.

CREATE TABLE T(A text, B integer, C bigint, D integer);

We could have something like
RETABLE T USING(B, D, C, A)

So REPACK isn't the best for this, if this feature would exist some day.
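
As a rough illustration of why column order matters for padding, the two queries below compare the size of a row value holding the fixed-width columns of the example table in declaration order and in the suggested order. This is only a sketch: it uses the existing `pg_column_size` function on `ROW(...)` values rather than any proposed command, it leaves out the `text` column for simplicity, and the exact byte counts depend on the platform.

```sql
-- Declaration order from the example (B, C, D): integer, bigint, integer.
-- The bigint needs 8-byte alignment on typical 64-bit builds, so padding
-- is inserted after the first integer.
SELECT pg_column_size(ROW(1::integer, 1::bigint, 1::integer)) AS declared_order;

-- Suggested order (B, D, C): integer, integer, bigint.
-- The two integers fill one 8-byte slot, so the bigint is already aligned
-- and no padding is needed between the columns.
SELECT pg_column_size(ROW(1::integer, 1::integer, 1::bigint)) AS reordered;
```

If the second query reports a smaller size than the first, the difference is the alignment padding that a column rearrangement would save per row.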

regards
Marcos

#5Thom Brown
thom@linux.com
In reply to: Robert Haas (#3)
Re: REPACK and naming

On Wed, 17 Sept 2025, 03:01 Robert Haas, <robertmhaas@gmail.com> wrote:

On Tue, Sep 16, 2025 at 7:42 PM Álvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

Peter E suggested that since we have REINDEX to rewrite indexes, then
the command to rewrite tables should be RETABLE. I haven't been able to
get myself to like that idea, and also I think that was a bit
tongue-in-cheek, but if you like RETABLE better than REPACK, then maybe
we can have a vote to decide which one of those names to use. However,
I don't think that change would make a tremendous difference, and also I
don't think RETABLE is enough of an English name to become a command
name.

I think RETABLE is not a proposal to be taken seriously. That's
extremely confusing.

I don't love the name REPACK, but I think it's good enough. If we come
up with something better, great.

COMPACT?

Thom

#6Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#2)
Re: REPACK and naming

On Wed, Sep 17, 2025 at 01:42:29AM +0200, Álvaro Herrera wrote:

Hi,

On 2025-Sep-16, Bruce Momjian wrote:

I am starting to get worried about the confusion of adding a REPACK
command. We already have a lot of confusion around vacuum and analyze:

* autoanalyze does vacuum and analyze
* VACUUM FULL is much different from VACUUM

It seems if we add REPACK as a command, it is somewhere between VACUUM
FULL and VACUUM in severity/impact.

No, REPACK is exactly where VACUUM FULL is in terms of impact and
severity; it's not between anything. The confusion stems precisely from
VACUUM being a thing that's a completely different one from VACUUM FULL,
yet they have pretty much the same name. What I'm doing is giving one of
those things a different name, to reduce confusion. Note that there's
no intention to add autorepack, because that would (IMO) make no sense.

Another thing I'm doing with that patch is rename CLUSTER so that it is
also REPACK. This also makes sense, because VACUUM FULL _is_ the same
as CLUSTER, except that it follows current physical order instead of
following a specific index's order.

So the CLUSTER command is removed and people should use REPACK instead?
And VACUUM FULL stays unchanged?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#7Ranier Vilela
ranier.vf@gmail.com
In reply to: Bruce Momjian (#1)
Re: REPACK and naming

On Tue, Sep 16, 2025 at 13:40, Bruce Momjian <bruce@momjian.us>
wrote:

I am starting to get worried about the confusion of adding a REPACK
command. We already have a lot of confusion around vacuum and analyze:

* autoanalyze does vacuum and analyze
* VACUUM FULL is much different from VACUUM

It seems if we add REPACK as a command, it is somewhere between VACUUM
FULL and VACUUM in severity/impact. Should we be rethinking the naming
in this area?

SQL Server has a similar feature:
SHRINK

best regards,
Ranier Vilela

#8Robert Haas
robertmhaas@gmail.com
In reply to: Marcos Pegoraro (#4)
Re: REPACK and naming

On Wed, Sep 17, 2025 at 8:04 AM Marcos Pegoraro <marcos@f10.com.br> wrote:

On Tue, Sep 16, 2025 at 23:01, Robert Haas <robertmhaas@gmail.com> wrote:

I think RETABLE is not a proposal to be taken seriously. That's
extremely confusing.

This feature could be used in a future version to rearrange fields in a table, for better padding.
I don't think we have another one available for this purpose.

CREATE TABLE T(A text, B integer, C bigint, D integer);

We could have something like
RETABLE T USING(B, D, C, A)

So REPACK isn't the best for this, if this feature would exist some day.

RETABLE just isn't a word. The code sometimes calls this a REWRITE of
a table, which would be reasonable. I suspect, though, that changing
the column order would end up being a form of ALTER TABLE.

--
Robert Haas
EDB: http://www.enterprisedb.com

#9David G. Johnston
david.g.johnston@gmail.com
In reply to: Marcos Pegoraro (#4)

On Wednesday, September 17, 2025, Marcos Pegoraro <marcos@f10.com.br> wrote:

On Tue, Sep 16, 2025 at 23:01, Robert Haas <robertmhaas@gmail.com>
wrote:

I think RETABLE is not a proposal to be taken seriously. That's
extremely confusing.

This feature could be used in a future version to rearrange fields in a
table, for better padding.
I don't think we have another one available for this purpose.

CREATE TABLE T(A text, B integer, C bigint, D integer);

We could have something like
RETABLE T USING(B, D, C, A)

That changes logical aspects of a table and so would be done as part of
alter table, IMO. “AT tbl Rearrange columns (names list)”

I’m not a fan of “retable” as a command keyword.

But this digresses from the topic at hand.

I’m fine with repack itself. Deprecating vacuum full would be nice - but
actually renaming existing things is bound to just make matters worse, IMO.

Concretely, maybe we should remove vacuum full from the vacuum command
page, and just call it out as compatibility spelling of repack on its
page. Maybe do the same for cluster (I haven’t dived into the new feature
enough to confidently describe all this yet though).

David J.

#10Junwang Zhao
zhjwpku@gmail.com
In reply to: Ranier Vilela (#7)
Re: REPACK and naming

On Wed, Sep 17, 2025 at 9:01 PM Ranier Vilela <ranier.vf@gmail.com> wrote:

On Tue, Sep 16, 2025 at 13:40, Bruce Momjian <bruce@momjian.us> wrote:

I am starting to get worried about the confusion of adding a REPACK
command. We already have a lot of confusion around vacuum and analyze:

* autoanalyze does vacuum and analyze
* VACUUM FULL is much different from VACUUM

It seems if we add REPACK as a command, it is somewhere between VACUUM
FULL and VACUUM in severity/impact. Should we be rethinking the naming
in this area?

SQL Server has a similar feature:
SHRINK

C++ vector has a shrink_to_fit method which seems to serve a similar purpose.

REPACK or REBUILD looks good to me, COMPACT, on the other hand, feels
more specific to the idea of consolidating free space within a page or block.

best regards,
Ranier Vilela

--
Regards
Junwang Zhao

#11Marcos Pegoraro
marcos@f10.com.br
In reply to: David G. Johnston (#9)
Re: REPACK and naming

On Wed, Sep 17, 2025 at 10:17, David G. Johnston
<david.g.johnston@gmail.com> wrote:

That changes logical aspects of a table and so would be done as part of
alter table, IMO. “AT tbl Rearrange columns (names list)”

If this command entirely recreates the table, it is not only a logical
aspect of that table.
REPACK/RETABLE recreates the table as a replacement for VACUUM FULL/CLUSTER,
and ALTER TABLE REARRANGE COLUMNS would recreate the table too?
Would both have USING INDEX to do what CLUSTER does today? Would both
have CONCURRENTLY?

A single command, whether named REPACK, RETABLE, RECREATE TABLE, COMPACT, or
anything else, could do it all.

regards
Marcos

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David G. Johnston (#9)
Re: REPACK and naming

On 2025-Sep-17, David G. Johnston wrote:

That changes logical aspects of a table and so would be done as part of
alter table, IMO. “AT tbl Rearrange columns (names list)”

Yes.

Concretely, maybe we should remove vacuum full from the vacuum command
page, and just call it out as compatibility spelling of repack on its
page. Maybe do the same for cluster (I haven’t dived into the new feature
enough to confidently describe all this yet though).

I think we should list VACUUM FULL as deprecated, document that feature
in the REPACK documentation page, and leave VACUUM FULL in working state
so as not to break existing scripts. Same for CLUSTER.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Those who use electric razors are infidels destined to burn in hell while
we drink from rivers of beer, download free vids and mingle with naked
well shaved babes." (http://slashdot.org/comments.pl?sid=44793&cid=4647152)

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#6)
Re: REPACK and naming

On 2025-Sep-17, Bruce Momjian wrote:

So the CLUSTER command is removed and people should use REPACK instead?
And VACUUM FULL stays unchanged?

No, not removed. It's going to stay, to avoid breaking scripts. People
should use REPACK on new code going forward, but existing code is not
going to break. Same with VACUUM FULL.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"One can defend oneself against attacks; against praise one is defenseless"

#14David G. Johnston
david.g.johnston@gmail.com
In reply to: Alvaro Herrera (#12)
Re: REPACK and naming

On Wednesday, September 17, 2025, Álvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

On 2025-Sep-17, David G. Johnston wrote:

Concretely, maybe we should remove vacuum full from the vacuum command
page, and just call it out as compatibility spelling of repack on its
page. Maybe do the same for cluster (I haven’t dived into the new feature
enough to confidently describe all this yet though).

I think we should list VACUUM FULL as deprecated, document that feature
in the REPACK documentation page, and leave VACUUM FULL in working state
so as not to break existing scripts. Same for CLUSTER.

I was unclear - this is indeed what I was suggesting as well. Reframe the
documentation but leave the commands functioning.

David J.

#15Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Ranier Vilela (#7)
Re: REPACK and naming

Ranier Vilela <ranier.vf@gmail.com>:

SQL Server has a similar feature:
SHRINK

MySQL/MariaDB
OPTIMIZE TABLE table_name

SQL Server
ALTER TABLE table_name REBUILD
DBCC SHRINKFILE
DBCC SHRINKDATABASE

Oracle
ALTER TABLE table_name SHRINK SPACE

SQLite
VACUUM

IBM DB2
REORG TABLE table_name

Sybase ASE
REORG REBUILD table_name

Best regards,
Mikhail.

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#12)
Re: REPACK and naming

Álvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2025-Sep-17, David G. Johnston wrote:

Concretely, maybe we should remove vacuum full from the vacuum command
page, and just call it out as compatibility spelling of repack on its
page. Maybe do the same for cluster (I haven’t dived into the new feature
enough to confidently describe all this yet though).

I think we should list VACUUM FULL as deprecated, document that feature
in the REPACK documentation page, and leave VACUUM FULL in working state
so as not to break existing scripts. Same for CLUSTER.

I'm not at all in love with documenting VACUUM FULL and CLUSTER as
being fundamentally the same thing. I think that is an implementation
happenstance that could go away as easily as it appeared. Even if you
think we'll never again rewrite it for heap, what of other table AMs?
The underlying reality could be totally different for them.

By and large, I don't think I like this renaming proposal.
Maybe eventually it would reduce confusion, but there will be
a long interval where it adds more.

regards, tom lane

#17Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#16)
Re: REPACK and naming

On Wed, Sep 17, 2025 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

By and large, I don't think I like this renaming proposal.
Maybe eventually it would reduce confusion, but there will be
a long interval where it adds more.

I mean, it's PRETTY confusing that VACUUM FULL does something much
more similar to CLUSTER than it is to VACUUM. We can't ever get out
from under that confusion if we don't change something. I think it's
more than fair to bikeshed what the verb should be that describes the
action we currently describe by writing either VACUUM FULL or CLUSTER,
but I agree with Álvaro that having one verb for both of those things
makes a lot more sense than the status quo.

--
Robert Haas
EDB: http://www.enterprisedb.com

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#16)
Re: REPACK and naming

On 2025-Sep-17, Tom Lane wrote:

I'm not at all in love with documenting VACUUM FULL and CLUSTER as
being fundamentally the same thing. I think that is an implementation
happenstance that could go away as easily as it appeared. Even if you
think we'll never again rewrite it for heap, what of other table AMs?
The underlying reality could be totally different for them.

So there are two operations here. One is
REPACK tab USING INDEX idx
which we currently call CLUSTER, and there is also
REPACK tab
(no index specified) which we currently call VACUUM FULL. These have
the very specific charter of rewriting the table while removing bloat,
the distinction being that they keep the rows ordered according to the
index or not. Both these operations currently use the same
implementation, yes; but if we were to reimplement one of them to use
some completely different piece of code, then the new command name
continues to work, it just calls the new different implementation, while
the other command continues to call the other one. (Or maybe we decide
to reimplement both using different techniques, and we throw away
cluster.c, but still the command names continue to be sensible and would
continue to work.)
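
Side by side, the mapping described above looks roughly like this. The REPACK spellings are the ones proposed in this message and may not match whatever syntax is eventually committed; `tab` and `idx` are placeholder names.

```sql
-- Rewrite the table, ordering rows by an index.
CLUSTER tab USING idx;          -- current spelling
REPACK tab USING INDEX idx;     -- proposed spelling

-- Rewrite the table without imposing an index order.
VACUUM FULL tab;                -- current spelling
REPACK tab;                     -- proposed spelling
```

Under the proposal the current spellings would keep working, so existing scripts would not need to change.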

Thinking about the other half of your argument: if we add new table AMs
for which the cluster.c implementation doesn't work, then we'll have to
wire the table AM support routines to call some different implementation
into REPACK or REPACK USING INDEX. This is no different than if we keep
these commands being VACUUM FULL or CLUSTER; we would still need a
different implementation underneath, and we would still need to wire the
table AM support routines to call that different implementation.

So all things considered, I'm not seeing exactly what aspect of the
renaming you are uncomfortable with. We're not making the situation any
worse.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
<Schwern> It does it in a really, really complicated way
<crab> why does it need to be complicated?
<Schwern> Because it's MakeMaker.

#19David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#8)
Re: REPACK and naming

On Thu, 18 Sept 2025 at 01:09, Robert Haas <robertmhaas@gmail.com> wrote:

RETABLE just isn't a word. The code sometimes calls this a REWRITE of
a table, which would be reasonable.

+1. I was reading this yesterday wondering why "REWRITE" didn't get a
mention. The problem I have with REPACK is that "re" indicates that
something is being re-done that's been done before. If you're calling
REPACK for the first time on a table, that's not true.

David J's "REBUILD" also seems ok. In a green field, you could then
have "REBUILD TABLE ..." and "REBUILD INDEX ..."

David

#20David G. Johnston
david.g.johnston@gmail.com
In reply to: David Rowley (#19)

On Wednesday, September 17, 2025, David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 18 Sept 2025 at 01:09, Robert Haas <robertmhaas@gmail.com> wrote:

RETABLE just isn't a word. The code sometimes calls this a REWRITE of
a table, which would be reasonable.

+1. I was reading this yesterday wondering why "REWRITE" didn't get a
mention.

Agreed.

The problem I have with REPACK is that "re" indicates that
something is being re-done that's been done before. If you're calling
REPACK for the first time on a table, that's not true.

As soon as you’ve written the first tuple you’ve begun “packing” the table
- repack then is simply unpacking it and putting back the stuff you want to
keep in possibly a structured way.

David J's "REBUILD" also seems ok. In a green field, you could then
have "REBUILD TABLE ..." and "REBUILD INDEX ..."

Rebuild has some prior art apparently, which makes it appealing. But I’m
not a fan of the “shrink” usage the other products seem drawn to.

David J.

#21Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#17)
#22Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#21)
#23Vik Fearing
vik@postgresfriends.org
In reply to: Bruce Momjian (#22)
#24Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#22)
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David G. Johnston (#20)
#26David G. Johnston
david.g.johnston@gmail.com
In reply to: Alvaro Herrera (#25)
#27David Rowley
dgrowleyml@gmail.com
In reply to: Alvaro Herrera (#18)
#28Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#27)
#29David Rowley
dgrowleyml@gmail.com
In reply to: Alvaro Herrera (#28)
#30Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#28)
#31David Rowley
dgrowleyml@gmail.com
In reply to: Antonin Houska (#30)
#32Antonin Houska
ah@cybertec.at
In reply to: David Rowley (#31)
#33Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#29)
#34Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#30)
#35Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#24)