Adding REPACK [concurrently]
Hello,
Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
This is coming from [1]/messages/by-id/76278.1724760050@antos. The ultimate goal is to have an in-core tool
to allow concurrent table rewrite to get rid of bloat; right now, VACUUM
FULL does that, but it's not concurrent. Users have resorted to using
the pg_repack third-party tool, which is ancient and uses a weird
internal implementation, as well as pg_squeeze, which uses logical
decoding to capture changes that occur during the table rewrite. The
patch submitted here, largely by Antonin Houska with some changes by me,
is based on the the pg_squeeze code which he authored, and first
introduces a new command called REPACK to absorb both VACUUM FULL and
CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
of REPACK to operate online using logical decoding.
Essentially, this first patch just reshuffles the CLUSTER code to create
the REPACK command.
I made a few changes from Antonin's original at [2]/messages/by-id/152010.1751307725@localhost. First, I modified
the grammar to support "REPACK [tab] USING INDEX" without specifying the
index name. With this change, all possibilities of the old commands are
covered, which gives us the chance to flag them as obsolete. (This is
good, because having VACUUM FULL do something completely different from
regular VACUUM confuses users all the time; and on the other hand,
having a command called CLUSTER which is at odds with what most people
think of as a "database cluster" is also confusing.)
Here's a list of existing commands, and how to write them in the current
patch's proposal for REPACK:
-- re-clusters all tables that have a clustered index set
CLUSTER -> REPACK USING INDEX
-- clusters the given table using the given index
CLUSTER tab USING idx -> REPACK tab USING INDEX idx
-- clusters this table using a clustered index; error if no index clustered
CLUSTER tab -> REPACK tab USING INDEX
-- vacuum-full all tables
VACUUM FULL -> REPACK
-- vacuum-full the specified table
VACUUM FULL tab -> REPACK tab
My other change to Antonin's patch is that I made REPACK USING INDEX set
the 'indisclustered' flag to the index being used, so REPACK behaves
identically to CLUSTER. We can discuss whether we really want this.
For instance we could add an option so that by default REPACK omits
persisting the clustered index, and instead it only does that when you
give it some special option, say something like
"REPACK (persist_clustered_index=true) tab USING INDEX idx"
Overall I'm not sure this is terribly interesting, since clustered
indexes are not very useful for most users anyway.
I made a few other minor changes not worthy of individual mention, and
there are a few others pending, such as updates to the
pg_stat_progress_repack view infrastructure, as well as phasing out
pg_stat_progress_cluster (maybe the latter would offer a subset of the
former; not yet sure about this.) Also, I'd like to work on adding a
`repackdb` command for completeness.
On repackdb: I think is going to be very similar to vacuumdb, mostly in
that it is going to need to be able to run tasks in parallel; but there
are things it doesn't have to deal with, such as analyze-in-stages,
which I think is a large burden. I estimate about 1k LOC there,
extremely similar to vacuumdb. Maybe it makes sense to share the source
code and make the new executable a symlink instead, with some additional
code to support the two different modes. Again, I'm not sure about
this -- I like the idea, but I'd have to see the implementation.
I'll be rebasing the rest of Antonin's patch series afterwards,
including the logical decoding changes necessary for CONCURRENTLY. In
the meantime, if people want to review those, which would be very
valuable, they can go back to branch master from around the time he
submitted it and apply the old patches there.
[1]: /messages/by-id/76278.1724760050@antos
[2]: /messages/by-id/152010.1751307725@localhost
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Attachments:
v1-0001-Add-REPACK-command.patchtext/x-diff; charset=utf-8Download+1407-382
On Sat, Jul 26, 2025 at 5:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Hello,
Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
This is coming from [1]. The ultimate goal is to have an in-core tool
to allow concurrent table rewrite to get rid of bloat; right now, VACUUM
FULL does that, but it's not concurrent. Users have resorted to using
the pg_repack third-party tool, which is ancient and uses a weird
internal implementation, as well as pg_squeeze, which uses logical
decoding to capture changes that occur during the table rewrite. The
patch submitted here, largely by Antonin Houska with some changes by me,
is based on the the pg_squeeze code which he authored, and first
introduces a new command called REPACK to absorb both VACUUM FULL and
CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
of REPACK to operate online using logical decoding.Essentially, this first patch just reshuffles the CLUSTER code to create
the REPACK command.
Thanks for keeping this ball rolling.
My other change to Antonin's patch is that I made REPACK USING INDEX set
the 'indisclustered' flag to the index being used, so REPACK behaves
identically to CLUSTER. We can discuss whether we really want this.
For instance we could add an option so that by default REPACK omits
persisting the clustered index, and instead it only does that when you
give it some special option, say something like
"REPACK (persist_clustered_index=true) tab USING INDEX idx"
Overall I'm not sure this is terribly interesting, since clustered
indexes are not very useful for most users anyway.
I think I would lean towards having it work like CLUSTER (preserve the
index), since that helps people making the transition, and it doesn't
feel terribly useful to invent new syntax for a feature that I would
agree isn't very useful for most people.
I made a few other minor changes not worthy of individual mention, and
there are a few others pending, such as updates to the
pg_stat_progress_repack view infrastructure, as well as phasing out
pg_stat_progress_cluster (maybe the latter would offer a subset of the
former; not yet sure about this.) Also, I'd like to work on adding a
`repackdb` command for completeness.On repackdb: I think is going to be very similar to vacuumdb, mostly in
that it is going to need to be able to run tasks in parallel; but there
are things it doesn't have to deal with, such as analyze-in-stages,
which I think is a large burden. I estimate about 1k LOC there,
extremely similar to vacuumdb. Maybe it makes sense to share the source
code and make the new executable a symlink instead, with some additional
code to support the two different modes. Again, I'm not sure about
this -- I like the idea, but I'd have to see the implementation.I'll be rebasing the rest of Antonin's patch series afterwards,
including the logical decoding changes necessary for CONCURRENTLY. In
the meantime, if people want to review those, which would be very
valuable, they can go back to branch master from around the time he
submitted it and apply the old patches there.
For clarity, are you intending to commit this patch before having the
other parts ready? (If that sounds like an objection, it isn't) After
a first pass, I think there's some confusing bits in the new docs that
could use straightening out, but there likely going to overlap changes
once concurrently is brought in, so it might make sense to hold off on
those. Either way I definitely want to dive into this a bit deeper
with some fresh eyes, there's a lot to digest... speaking of, for this
bit in src/backend/commands/cluster.c
+ switch (cmd)
+ {
+ case REPACK_COMMAND_REPACK:
+ return "REPACK";
+ case REPACK_COMMAND_VACUUMFULL:
+ return "VACUUM";
+ case REPACK_COMMAND_CLUSTER:
+ return "VACUUM";
+ }
+ return "???";
The last one should return "CLUSTER" no?
Robert Treat
https://xzilla.net
On Sun, Jul 27, 2025 at 6:56 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Hello,
Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
This is coming from [1]. The ultimate goal is to have an in-core tool
to allow concurrent table rewrite to get rid of bloat;
+1
right now, VACUUM
FULL does that, but it's not concurrent. Users have resorted to using
the pg_repack third-party tool, which is ancient and uses a weird
internal implementation, as well as pg_squeeze, which uses logical
decoding to capture changes that occur during the table rewrite. The
patch submitted here, largely by Antonin Houska with some changes by me,
is based on the the pg_squeeze code which he authored, and first
introduces a new command called REPACK to absorb both VACUUM FULL and
CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
of REPACK to operate online using logical decoding.
Does this mean REPACK CONCURRENTLY requires wal_level = logical,
while plain REPACK (without CONCURRENTLY) works with any wal_level
setting? If we eventually deprecate VACUUM FULL and CLUSTER,
I think plain REPACK should still be allowed with wal_level = minimal
or replica, so users with those settings can perform equivalent
processing.
+ if (!cluster_is_permitted_for_relation(tableOid, userid,
+ CLUSTER_COMMAND_CLUSTER))
As for the patch you attached, it seems to be an early WIP and
might not be ready for review yet?? BTW, I got the following
compilation failure and probably CLUSTER_COMMAND_CLUSTER
the above should be GetUserId().
-----------------
cluster.c:455:14: error: use of undeclared identifier 'CLUSTER_COMMAND_CLUSTER'
455 |
CLUSTER_COMMAND_CLUSTER))
|
^
1 error generated.
-----------------
Regards,
--
Fujii Masao
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I made a few changes from Antonin's original at [2]. First, I modified
the grammar to support "REPACK [tab] USING INDEX" without specifying the
index name. With this change, all possibilities of the old commands are
covered,
...
Here's a list of existing commands, and how to write them in the current
patch's proposal for REPACK:-- re-clusters all tables that have a clustered index set
CLUSTER -> REPACK USING INDEX-- clusters the given table using the given index
CLUSTER tab USING idx -> REPACK tab USING INDEX idx-- clusters this table using a clustered index; error if no index clustered
CLUSTER tab -> REPACK tab USING INDEX-- vacuum-full all tables
VACUUM FULL -> REPACK-- vacuum-full the specified table
VACUUM FULL tab -> REPACK tab
Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
let's mention that in the user documentation of REPACK?
(Besides that, VACUUM FULL accepts TRUNCATE and INDEX_CLEANUP options, but I
think these have no effect.)
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I made a few changes from Antonin's original at [2]. First, I modified
the grammar to support "REPACK [tab] USING INDEX" without specifying the
index name. With this change, all possibilities of the old commands are
covered,...
Here's a list of existing commands, and how to write them in the current
patch's proposal for REPACK:-- re-clusters all tables that have a clustered index set
CLUSTER -> REPACK USING INDEX-- clusters the given table using the given index
CLUSTER tab USING idx -> REPACK tab USING INDEX idx-- clusters this table using a clustered index; error if no index clustered
CLUSTER tab -> REPACK tab USING INDEX
In the v18 patch, the docs say that repack doesn't remember the index,
but it seems we are still calling mark_index_clustered, so I think the
above is true but we need to update the docs(?).
-- vacuum-full all tables
VACUUM FULL -> REPACK-- vacuum-full the specified table
VACUUM FULL tab -> REPACK tabNow that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
let's mention that in the user documentation of REPACK?
I would note that both pg_repack and pg_squeeze analyze by default,
and running "vacuum full analyze" is the recommended behavior, so not
having analyze included is a step backwards.
(Besides that, VACUUM FULL accepts TRUNCATE and INDEX_CLEANUP options, but I
think these have no effect.)
Yeah, these seem safe to ignore.
Robert Treat
https://xzilla.net
On 2025-Aug-16, Robert Treat wrote:
On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:
Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
let's mention that in the user documentation of REPACK?I would note that both pg_repack and pg_squeeze analyze by default,
and running "vacuum full analyze" is the recommended behavior, so not
having analyze included is a step backwards.
Make sense to add ANALYZE as an option to repack, yeah.
So if I repack a single table with
REPACK (ANALYZE) table USING INDEX;
then do you expect that this would first cluster the table under
AccessExclusiveLock, then release the lock to do the analyze step, or
would the analyze be done under the same lock? This is significant for
a query that starts while repack is running, because if we release the
AEL then the query is planned when there are no stats for the table,
which might be bad.
I think the time to run the analyze step should be considerable shorter
than the time to run the repacking step, so running both together under
the same lock should be okay.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Computing is too important to be left to men." (Karen Spärck Jones)
On 2025-Aug-16, Robert Treat wrote:
In the v18 patch, the docs say that repack doesn't remember the index,
but it seems we are still calling mark_index_clustered, so I think the
above is true but we need to update the docs(?).
Yes, the docs are obsolete on this point, I'm in the process of updating
them. Thanks for pointing this out.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"
Hello,
Here's a second cut of the initial REPACK work. Antonin added an
implementation of pg_repackdb, and there's also a couple of bug fixes
that were reported in the thread. I also added support for the ANALYZE
option as noted by Robert Treat, though it only works if you specify a
single non-partitioned table. Adding for the multi-table case is likely
easy, but I didn't try.
I purposefully do not include the CONCURRENTLY work yet -- I want to get
this part commitable-clean first, then we can continue work on the
logical decoding work on top of that.
Note choice of shell command name: though all the other programs in
src/bin/scripts do not use the "pg_" prefix, this one does; we thought
it made no sense to follow the old programs as precedent because there
seems to be a lament for the lack of pg_ prefix in those, and we only
keep what they are because of their long history. This one has no
history.
Still on pg_repackdb, the implementation here is to install a symlink
called pg_repackdb which points to vacuumdb, and make the program behave
differently when called in this way. The amount of additional code for
this is relatively small, so I think this is a worthy technique --
assuming it works. If it doesn't, Antonin proposed a separate binary
that just calls some functions from vacuumdb. Or maybe we could have a
common source file that both utilities call.
I edited the docs a bit, limiting the exposure of CLUSTER and VACUUM
FULL, and instead redirecting the user to the REPACK docs. In the
REPACK docs I modified things for additional clarity.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Selbst das größte Genie würde nicht weit kommen, wenn es
alles seinem eigenen Innern verdanken wollte." (Johann Wolfgang von Goethe)
Ni aún el genio más grande llegaría muy lejos si
quisiera sacarlo todo de su propio interior.
Attachments:
v2-0001-Add-REPACK-command.patchtext/x-diff; charset=utf-8Download+2375-511
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2025-Aug-16, Robert Treat wrote:
On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:
Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
let's mention that in the user documentation of REPACK?I would note that both pg_repack and pg_squeeze analyze by default,
and running "vacuum full analyze" is the recommended behavior, so not
having analyze included is a step backwards.Make sense to add ANALYZE as an option to repack, yeah.
So if I repack a single table with
REPACK (ANALYZE) table USING INDEX;then do you expect that this would first cluster the table under
AccessExclusiveLock, then release the lock to do the analyze step, or
would the analyze be done under the same lock? This is significant for
a query that starts while repack is running, because if we release the
AEL then the query is planned when there are no stats for the table,
which might be bad.I think the time to run the analyze step should be considerable shorter
than the time to run the repacking step, so running both together under
the same lock should be okay.
AFAICS, VACUUM FULL first releases the AEL, then it analyzes the table. If
users did not complain so far, I'd assume that vacuum_rel() (effectively
cluster_rel() in the FULL case) does not change the stats that much.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Álvaro Herrera <alvherre@kurilemu.de> wrote:
Still on pg_repackdb, the implementation here is to install a symlink
called pg_repackdb which points to vacuumdb, and make the program behave
differently when called in this way. The amount of additional code for
this is relatively small, so I think this is a worthy technique --
assuming it works. If it doesn't, Antonin proposed a separate binary
that just calls some functions from vacuumdb. Or maybe we could have a
common source file that both utilities call.
There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:
ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb
(The reason it does not happen on other platforms might be that the build is
slower on Windows, and thus it's more prone to some specific race conditions.)
It appears that the 'point_to' argument of the 'install_symlink()' function
[1]: https://mesonbuild.com/Reference-manual_functions.html#install_symlink
the function does not wait for the creation of the 'vacuumdb' executable.
I could not find another symlink of this kind in the tree. (AFAICS, the
postmaster->postgres symlink had been removed before Meson has been
introduced.)
Does anyone happen to have a clue? Thanks.
[1]: https://mesonbuild.com/Reference-manual_functions.html#install_symlink
[2]: https://mesonbuild.com/Reference-manual_returned_tgt.html
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On 2025-Aug-20, Antonin Houska wrote:
There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb
Hmm, that's not the problem I see in the CI run from the commitfest app:
https://cirrus-ci.com/task/5608274336153600
[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0
The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Álvaro Herrera <alvherre@kurilemu.de> wrote:
On 2025-Aug-20, Antonin Houska wrote:
There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb
Hmm, that's not the problem I see in the CI run from the commitfest app:
I was referring to the other build that you shared off-list (probably
independent from cfbot):
https://cirrus-ci.com/build/4726227505774592
[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168
The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
the code compiled well. The compilation failure mentioned above comes from
"Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
that the symlink issue will occur there once the compilation is fixed.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hi,
On 2025-08-20 16:22:41 +0200, Antonin Houska wrote:
�lvaro Herrera <alvherre@kurilemu.de> wrote:
On 2025-Aug-20, Antonin Houska wrote:
There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb
Hmm, that's not the problem I see in the CI run from the commitfest app:
I was referring to the other build that you shared off-list (probably
independent from cfbot):https://cirrus-ci.com/build/4726227505774592
[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
the code compiled well. The compilation failure mentioned above comes from
"Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
that the symlink issue will occur there once the compilation is fixed.
FWIW, I don't think it's particularly wise to rely on symlinks on windows -
IIRC they will often not be enabled outside of development environments.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2025-08-20 16:22:41 +0200, Antonin Houska wrote:
Álvaro Herrera <alvherre@kurilemu.de> wrote:
On 2025-Aug-20, Antonin Houska wrote:
There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb
Hmm, that's not the problem I see in the CI run from the commitfest app:
I was referring to the other build that you shared off-list (probably
independent from cfbot):https://cirrus-ci.com/build/4726227505774592
[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
the code compiled well. The compilation failure mentioned above comes from
"Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
that the symlink issue will occur there once the compilation is fixed.FWIW, I don't think it's particularly wise to rely on symlinks on windows -
IIRC they will often not be enabled outside of development environments.
ok, installing a copy of the same executable with a different name seems more
reliable. At least that's how the postmaster->postgres link used to be
handled, if I read Makefile correctly. Thanks.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hi,
On 2025-08-21 20:14:14 +0200, Antonin Houska wrote:
ok, installing a copy of the same executable with a different name seems more
reliable. At least that's how the postmaster->postgres link used to be
handled, if I read Makefile correctly. Thanks.
I have not followed this thread, but I don't think the whole thing of having a
single executable with multiple names is worth doing. Just make whatever an
option, instead of having multiple "executables".
Greetings,
Andres
On Tue, Aug 19, 2025 at 2:53 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
Note choice of shell command name: though all the other programs in
src/bin/scripts do not use the "pg_" prefix, this one does; we thought
it made no sense to follow the old programs as precedent because there
seems to be a lament for the lack of pg_ prefix in those, and we only
keep what they are because of their long history. This one has no
history.Still on pg_repackdb, the implementation here is to install a symlink
called pg_repackdb which points to vacuumdb, and make the program behave
differently when called in this way. The amount of additional code for
this is relatively small, so I think this is a worthy technique --
assuming it works. If it doesn't, Antonin proposed a separate binary
that just calls some functions from vacuumdb. Or maybe we could have a
common source file that both utilities call.
What's the plan for clusterdb? It seems like we'd ideally create a
stand alone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.
Robert Treat
https://xzilla.net
On 2025-Aug-21, Robert Treat wrote:
What's the plan for clusterdb? It seems like we'd ideally create a
stand alone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.
I don't think we should remove clusterdb, to avoid breaking any scripts
that work today. As you say, I would create the standalone pg_repackdb
to do what we need it to do (namely: run the REPACK commands) and leave
vacuumdb and clusterdb alone. Removing the obsolete commands and
options can be done in a few years.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Fri, Aug 22, 2025, at 6:40 AM, Álvaro Herrera wrote:
On 2025-Aug-21, Robert Treat wrote:
What's the plan for clusterdb? It seems like we'd ideally create a
stand alone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.I don't think we should remove clusterdb, to avoid breaking any scripts
that work today. As you say, I would create the standalone pg_repackdb
to do what we need it to do (namely: run the REPACK commands) and leave
vacuumdb and clusterdb alone. Removing the obsolete commands and
options can be done in a few years.
I would say that we need to plan the removal of these binaries (clusterdb and
vacuumdb). We can start with a warning into clusterdb saying they should use
pg_repackdb. In a few years, we can remove clusterdb. There were complaints
about binary names without a pg_ prefix in the past [1]/messages/by-id/CAJgfmqXYYKXR+QUhEa3cq6pc8dV0Hu7QvOUccm7R0TkC=T-+=A@mail.gmail.com.
I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
to pg_repackdb. We can add a similar warning message saying they should use
pg_repackdb if the symlink is used.
[1]: /messages/by-id/CAJgfmqXYYKXR+QUhEa3cq6pc8dV0Hu7QvOUccm7R0TkC=T-+=A@mail.gmail.com
--
Euler Taveira
EDB https://www.enterprisedb.com/
Hi,
On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:
On Fri, Aug 22, 2025, at 6:40 AM, �lvaro Herrera wrote:
On 2025-Aug-21, Robert Treat wrote:
What's the plan for clusterdb? It seems like we'd ideally create a
stand alone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.I don't think we should remove clusterdb, to avoid breaking any scripts
that work today. As you say, I would create the standalone pg_repackdb
to do what we need it to do (namely: run the REPACK commands) and leave
vacuumdb and clusterdb alone. Removing the obsolete commands and
options can be done in a few years.I would say that we need to plan the removal of these binaries (clusterdb and
vacuumdb). We can start with a warning into clusterdb saying they should use
pg_repackdb. In a few years, we can remove clusterdb. There were complaints
about binary names without a pg_ prefix in the past [1].
Yeah.
I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
to pg_repackdb. We can add a similar warning message saying they should use
pg_repackdb if the symlink is used.
Unless pg_repack has the same (or a superset of) CLI and behaviour as
vacuumdb (I haven't checked, but doubt it?), I think replacing vacuumdb
with a symlink to pg_repack will lead to much more breakage in existing
scripts/automation than clusterdb, which I guess is used orders of
magnitude less frequently than vacumdb.
Michael
On 2025-08-23, Michael Banck wrote:
On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:
I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
to pg_repackdb. We can add a similar warning message saying they should use
pg_repackdb if the symlink is used.Unless pg_repack has the same (or a superset of) CLI and behaviour as
vacuumdb (I haven't checked, but doubt it?), I think replacing vacuumdb
with a symlink to pg_repack will lead to much more breakage in existing
scripts/automation than clusterdb, which I guess is used orders of
magnitude less frequently than vacumdb.
Yeah, I completely disagree with the idea of getting rid of vacuumdb. We can, maybe, in a distant future, get rid of the --full option to vacuumdb. But the rest of the vacuumdb behavior must stay, I think, because REPACK is not VACUUM — it is only VACUUM FULL. And we want to make that distinction very clear.
We can also, in a few years, get rid of clusterdb. But I don't think we need to deprecate it just yet.
--
Álvaro Herrera