Adding REPACK [concurrently]

Started by Alvaro Herrera8 months ago136 messages
Jump to latest
#1Alvaro Herrera
alvherre@2ndquadrant.com

Hello,

Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
This is coming from [1]/messages/by-id/76278.1724760050@antos. The ultimate goal is to have an in-core tool
to allow concurrent table rewrite to get rid of bloat; right now, VACUUM
FULL does that, but it's not concurrent. Users have resorted to using
the pg_repack third-party tool, which is ancient and uses a weird
internal implementation, as well as pg_squeeze, which uses logical
decoding to capture changes that occur during the table rewrite. The
patch submitted here, largely by Antonin Houska with some changes by me,
is based on the the pg_squeeze code which he authored, and first
introduces a new command called REPACK to absorb both VACUUM FULL and
CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
of REPACK to operate online using logical decoding.

Essentially, this first patch just reshuffles the CLUSTER code to create
the REPACK command.

I made a few changes from Antonin's original at [2]/messages/by-id/152010.1751307725@localhost. First, I modified
the grammar to support "REPACK [tab] USING INDEX" without specifying the
index name. With this change, all possibilities of the old commands are
covered, which gives us the chance to flag them as obsolete. (This is
good, because having VACUUM FULL do something completely different from
regular VACUUM confuses users all the time; and on the other hand,
having a command called CLUSTER which is at odds with what most people
think of as a "database cluster" is also confusing.)

Here's a list of existing commands, and how to write them in the current
patch's proposal for REPACK:

-- re-clusters all tables that have a clustered index set
CLUSTER -> REPACK USING INDEX

-- clusters the given table using the given index
CLUSTER tab USING idx -> REPACK tab USING INDEX idx

-- clusters this table using a clustered index; error if no index clustered
CLUSTER tab -> REPACK tab USING INDEX

-- vacuum-full all tables
VACUUM FULL -> REPACK

-- vacuum-full the specified table
VACUUM FULL tab -> REPACK tab

My other change to Antonin's patch is that I made REPACK USING INDEX set
the 'indisclustered' flag to the index being used, so REPACK behaves
identically to CLUSTER. We can discuss whether we really want this.
For instance we could add an option so that by default REPACK omits
persisting the clustered index, and instead it only does that when you
give it some special option, say something like
"REPACK (persist_clustered_index=true) tab USING INDEX idx"
Overall I'm not sure this is terribly interesting, since clustered
indexes are not very useful for most users anyway.

I made a few other minor changes not worthy of individual mention, and
there are a few others pending, such as updates to the
pg_stat_progress_repack view infrastructure, as well as phasing out
pg_stat_progress_cluster (maybe the latter would offer a subset of the
former; not yet sure about this.) Also, I'd like to work on adding a
`repackdb` command for completeness.

On repackdb: I think is going to be very similar to vacuumdb, mostly in
that it is going to need to be able to run tasks in parallel; but there
are things it doesn't have to deal with, such as analyze-in-stages,
which I think is a large burden. I estimate about 1k LOC there,
extremely similar to vacuumdb. Maybe it makes sense to share the source
code and make the new executable a symlink instead, with some additional
code to support the two different modes. Again, I'm not sure about
this -- I like the idea, but I'd have to see the implementation.

I'll be rebasing the rest of Antonin's patch series afterwards,
including the logical decoding changes necessary for CONCURRENTLY. In
the meantime, if people want to review those, which would be very
valuable, they can go back to branch master from around the time he
submitted it and apply the old patches there.

[1]: /messages/by-id/76278.1724760050@antos
[2]: /messages/by-id/152010.1751307725@localhost

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

Attachments:

v1-0001-Add-REPACK-command.patchtext/x-diff; charset=utf-8Download+1407-382
#2Robert Treat
xzilla@users.sourceforge.net
In reply to: Alvaro Herrera (#1)
Re: Adding REPACK [concurrently]

On Sat, Jul 26, 2025 at 5:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Hello,

Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
This is coming from [1]. The ultimate goal is to have an in-core tool
to allow concurrent table rewrite to get rid of bloat; right now, VACUUM
FULL does that, but it's not concurrent. Users have resorted to using
the pg_repack third-party tool, which is ancient and uses a weird
internal implementation, as well as pg_squeeze, which uses logical
decoding to capture changes that occur during the table rewrite. The
patch submitted here, largely by Antonin Houska with some changes by me,
is based on the the pg_squeeze code which he authored, and first
introduces a new command called REPACK to absorb both VACUUM FULL and
CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
of REPACK to operate online using logical decoding.

Essentially, this first patch just reshuffles the CLUSTER code to create
the REPACK command.

Thanks for keeping this ball rolling.

My other change to Antonin's patch is that I made REPACK USING INDEX set
the 'indisclustered' flag to the index being used, so REPACK behaves
identically to CLUSTER. We can discuss whether we really want this.
For instance we could add an option so that by default REPACK omits
persisting the clustered index, and instead it only does that when you
give it some special option, say something like
"REPACK (persist_clustered_index=true) tab USING INDEX idx"
Overall I'm not sure this is terribly interesting, since clustered
indexes are not very useful for most users anyway.

I think I would lean towards having it work like CLUSTER (preserve the
index), since that helps people making the transition, and it doesn't
feel terribly useful to invent new syntax for a feature that I would
agree isn't very useful for most people.

I made a few other minor changes not worthy of individual mention, and
there are a few others pending, such as updates to the
pg_stat_progress_repack view infrastructure, as well as phasing out
pg_stat_progress_cluster (maybe the latter would offer a subset of the
former; not yet sure about this.) Also, I'd like to work on adding a
`repackdb` command for completeness.

On repackdb: I think is going to be very similar to vacuumdb, mostly in
that it is going to need to be able to run tasks in parallel; but there
are things it doesn't have to deal with, such as analyze-in-stages,
which I think is a large burden. I estimate about 1k LOC there,
extremely similar to vacuumdb. Maybe it makes sense to share the source
code and make the new executable a symlink instead, with some additional
code to support the two different modes. Again, I'm not sure about
this -- I like the idea, but I'd have to see the implementation.

I'll be rebasing the rest of Antonin's patch series afterwards,
including the logical decoding changes necessary for CONCURRENTLY. In
the meantime, if people want to review those, which would be very
valuable, they can go back to branch master from around the time he
submitted it and apply the old patches there.

For clarity, are you intending to commit this patch before having the
other parts ready? (If that sounds like an objection, it isn't) After
a first pass, I think there's some confusing bits in the new docs that
could use straightening out, but there likely going to overlap changes
once concurrently is brought in, so it might make sense to hold off on
those. Either way I definitely want to dive into this a bit deeper
with some fresh eyes, there's a lot to digest... speaking of, for this
bit in src/backend/commands/cluster.c

+    switch (cmd)
+    {
+        case REPACK_COMMAND_REPACK:
+            return "REPACK";
+        case REPACK_COMMAND_VACUUMFULL:
+            return "VACUUM";
+        case REPACK_COMMAND_CLUSTER:
+            return "VACUUM";
+    }
+    return "???";

The last one should return "CLUSTER" no?

Robert Treat
https://xzilla.net

#3Fujii Masao
masao.fujii@gmail.com
In reply to: Alvaro Herrera (#1)
Re: Adding REPACK [concurrently]

On Sun, Jul 27, 2025 at 6:56 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Hello,

Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
This is coming from [1]. The ultimate goal is to have an in-core tool
to allow concurrent table rewrite to get rid of bloat;

+1

right now, VACUUM
FULL does that, but it's not concurrent. Users have resorted to using
the pg_repack third-party tool, which is ancient and uses a weird
internal implementation, as well as pg_squeeze, which uses logical
decoding to capture changes that occur during the table rewrite. The
patch submitted here, largely by Antonin Houska with some changes by me,
is based on the the pg_squeeze code which he authored, and first
introduces a new command called REPACK to absorb both VACUUM FULL and
CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
of REPACK to operate online using logical decoding.

Does this mean REPACK CONCURRENTLY requires wal_level = logical,
while plain REPACK (without CONCURRENTLY) works with any wal_level
setting? If we eventually deprecate VACUUM FULL and CLUSTER,
I think plain REPACK should still be allowed with wal_level = minimal
or replica, so users with those settings can perform equivalent
processing.

+ if (!cluster_is_permitted_for_relation(tableOid, userid,
+    CLUSTER_COMMAND_CLUSTER))

As for the patch you attached, it seems to be an early WIP and
might not be ready for review yet?? BTW, I got the following
compilation failure and probably CLUSTER_COMMAND_CLUSTER
the above should be GetUserId().

-----------------
cluster.c:455:14: error: use of undeclared identifier 'CLUSTER_COMMAND_CLUSTER'
455 |
CLUSTER_COMMAND_CLUSTER))
|
^
1 error generated.
-----------------

Regards,

--
Fujii Masao

#4Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#1)
Re: Adding REPACK [concurrently]

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I made a few changes from Antonin's original at [2]. First, I modified
the grammar to support "REPACK [tab] USING INDEX" without specifying the
index name. With this change, all possibilities of the old commands are
covered,

...

Here's a list of existing commands, and how to write them in the current
patch's proposal for REPACK:

-- re-clusters all tables that have a clustered index set
CLUSTER -> REPACK USING INDEX

-- clusters the given table using the given index
CLUSTER tab USING idx -> REPACK tab USING INDEX idx

-- clusters this table using a clustered index; error if no index clustered
CLUSTER tab -> REPACK tab USING INDEX

-- vacuum-full all tables
VACUUM FULL -> REPACK

-- vacuum-full the specified table
VACUUM FULL tab -> REPACK tab

Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
let's mention that in the user documentation of REPACK?

(Besides that, VACUUM FULL accepts TRUNCATE and INDEX_CLEANUP options, but I
think these have no effect.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#5Robert Treat
xzilla@users.sourceforge.net
In reply to: Antonin Houska (#4)
Re: Adding REPACK [concurrently]

On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I made a few changes from Antonin's original at [2]. First, I modified
the grammar to support "REPACK [tab] USING INDEX" without specifying the
index name. With this change, all possibilities of the old commands are
covered,

...

Here's a list of existing commands, and how to write them in the current
patch's proposal for REPACK:

-- re-clusters all tables that have a clustered index set
CLUSTER -> REPACK USING INDEX

-- clusters the given table using the given index
CLUSTER tab USING idx -> REPACK tab USING INDEX idx

-- clusters this table using a clustered index; error if no index clustered
CLUSTER tab -> REPACK tab USING INDEX

In the v18 patch, the docs say that repack doesn't remember the index,
but it seems we are still calling mark_index_clustered, so I think the
above is true but we need to update the docs(?).

-- vacuum-full all tables
VACUUM FULL -> REPACK

-- vacuum-full the specified table
VACUUM FULL tab -> REPACK tab

Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
let's mention that in the user documentation of REPACK?

I would note that both pg_repack and pg_squeeze analyze by default,
and running "vacuum full analyze" is the recommended behavior, so not
having analyze included is a step backwards.

(Besides that, VACUUM FULL accepts TRUNCATE and INDEX_CLEANUP options, but I
think these have no effect.)

Yeah, these seem safe to ignore.

Robert Treat
https://xzilla.net

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Treat (#5)
Re: Adding REPACK [concurrently]

On 2025-Aug-16, Robert Treat wrote:

On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:

Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
let's mention that in the user documentation of REPACK?

I would note that both pg_repack and pg_squeeze analyze by default,
and running "vacuum full analyze" is the recommended behavior, so not
having analyze included is a step backwards.

Make sense to add ANALYZE as an option to repack, yeah.

So if I repack a single table with
REPACK (ANALYZE) table USING INDEX;

then do you expect that this would first cluster the table under
AccessExclusiveLock, then release the lock to do the analyze step, or
would the analyze be done under the same lock? This is significant for
a query that starts while repack is running, because if we release the
AEL then the query is planned when there are no stats for the table,
which might be bad.

I think the time to run the analyze step should be considerable shorter
than the time to run the repacking step, so running both together under
the same lock should be okay.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Computing is too important to be left to men." (Karen Spärck Jones)

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Treat (#5)
Re: Adding REPACK [concurrently]

On 2025-Aug-16, Robert Treat wrote:

In the v18 patch, the docs say that repack doesn't remember the index,
but it seems we are still calling mark_index_clustered, so I think the
above is true but we need to update the docs(?).

Yes, the docs are obsolete on this point, I'm in the process of updating
them. Thanks for pointing this out.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
Re: Adding REPACK [concurrently]

Hello,

Here's a second cut of the initial REPACK work. Antonin added an
implementation of pg_repackdb, and there's also a couple of bug fixes
that were reported in the thread. I also added support for the ANALYZE
option as noted by Robert Treat, though it only works if you specify a
single non-partitioned table. Adding for the multi-table case is likely
easy, but I didn't try.

I purposefully do not include the CONCURRENTLY work yet -- I want to get
this part commitable-clean first, then we can continue work on the
logical decoding work on top of that.

Note choice of shell command name: though all the other programs in
src/bin/scripts do not use the "pg_" prefix, this one does; we thought
it made no sense to follow the old programs as precedent because there
seems to be a lament for the lack of pg_ prefix in those, and we only
keep what they are because of their long history. This one has no
history.

Still on pg_repackdb, the implementation here is to install a symlink
called pg_repackdb which points to vacuumdb, and make the program behave
differently when called in this way. The amount of additional code for
this is relatively small, so I think this is a worthy technique --
assuming it works. If it doesn't, Antonin proposed a separate binary
that just calls some functions from vacuumdb. Or maybe we could have a
common source file that both utilities call.

I edited the docs a bit, limiting the exposure of CLUSTER and VACUUM
FULL, and instead redirecting the user to the REPACK docs. In the
REPACK docs I modified things for additional clarity.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Selbst das größte Genie würde nicht weit kommen, wenn es
alles seinem eigenen Innern verdanken wollte." (Johann Wolfgang von Goethe)
Ni aún el genio más grande llegaría muy lejos si
quisiera sacarlo todo de su propio interior.

Attachments:

v2-0001-Add-REPACK-command.patchtext/x-diff; charset=utf-8Download+2375-511
#9Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#6)
Re: Adding REPACK [concurrently]

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2025-Aug-16, Robert Treat wrote:

On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:

Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
let's mention that in the user documentation of REPACK?

I would note that both pg_repack and pg_squeeze analyze by default,
and running "vacuum full analyze" is the recommended behavior, so not
having analyze included is a step backwards.

Make sense to add ANALYZE as an option to repack, yeah.

So if I repack a single table with
REPACK (ANALYZE) table USING INDEX;

then do you expect that this would first cluster the table under
AccessExclusiveLock, then release the lock to do the analyze step, or
would the analyze be done under the same lock? This is significant for
a query that starts while repack is running, because if we release the
AEL then the query is planned when there are no stats for the table,
which might be bad.

I think the time to run the analyze step should be considerable shorter
than the time to run the repacking step, so running both together under
the same lock should be okay.

AFAICS, VACUUM FULL first releases the AEL, then it analyzes the table. If
users did not complain so far, I'd assume that vacuum_rel() (effectively
cluster_rel() in the FULL case) does not change the stats that much.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#10Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#8)
Re: Adding REPACK [concurrently]

Álvaro Herrera <alvherre@kurilemu.de> wrote:

Still on pg_repackdb, the implementation here is to install a symlink
called pg_repackdb which points to vacuumdb, and make the program behave
differently when called in this way. The amount of additional code for
this is relatively small, so I think this is a worthy technique --
assuming it works. If it doesn't, Antonin proposed a separate binary
that just calls some functions from vacuumdb. Or maybe we could have a
common source file that both utilities call.

There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:

ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

(The reason it does not happen on other platforms might be that the build is
slower on Windows, and thus it's more prone to some specific race conditions.)

It appears that the 'point_to' argument of the 'install_symlink()' function
[1]: https://mesonbuild.com/Reference-manual_functions.html#install_symlink
the function does not wait for the creation of the 'vacuumdb' executable.

I could not find another symlink of this kind in the tree. (AFAICS, the
postmaster->postgres symlink had been removed before Meson has been
introduced.)

Does anyone happen to have a clue? Thanks.

[1]: https://mesonbuild.com/Reference-manual_functions.html#install_symlink
[2]: https://mesonbuild.com/Reference-manual_returned_tgt.html

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#10)
Re: Adding REPACK [concurrently]

On 2025-Aug-20, Antonin Houska wrote:

There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:

ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

Hmm, that's not the problem I see in the CI run from the commitfest app:

https://cirrus-ci.com/task/5608274336153600

[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0

The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#12Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#11)
Re: Adding REPACK [concurrently]

Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Aug-20, Antonin Houska wrote:

There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:

ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

Hmm, that's not the problem I see in the CI run from the commitfest app:

https://cirrus-ci.com/task/5608274336153600

I was referring to the other build that you shared off-list (probably
independent from cfbot):

https://cirrus-ci.com/build/4726227505774592

[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0

The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168

The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
the code compiled well. The compilation failure mentioned above comes from
"Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
that the symlink issue will occur there once the compilation is fixed.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#13Andres Freund
andres@anarazel.de
In reply to: Antonin Houska (#12)
Re: Adding REPACK [concurrently]

Hi,

On 2025-08-20 16:22:41 +0200, Antonin Houska wrote:

�lvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Aug-20, Antonin Houska wrote:

There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:

ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

Hmm, that's not the problem I see in the CI run from the commitfest app:

https://cirrus-ci.com/task/5608274336153600

I was referring to the other build that you shared off-list (probably
independent from cfbot):

https://cirrus-ci.com/build/4726227505774592

[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0

The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168

The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
the code compiled well. The compilation failure mentioned above comes from
"Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
that the symlink issue will occur there once the compilation is fixed.

FWIW, I don't think it's particularly wise to rely on symlinks on windows -
IIRC they will often not be enabled outside of development environments.

Greetings,

Andres Freund

#14Antonin Houska
ah@cybertec.at
In reply to: Andres Freund (#13)
Re: Adding REPACK [concurrently]

Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2025-08-20 16:22:41 +0200, Antonin Houska wrote:

Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Aug-20, Antonin Houska wrote:

There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:

ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

Hmm, that's not the problem I see in the CI run from the commitfest app:

https://cirrus-ci.com/task/5608274336153600

I was referring to the other build that you shared off-list (probably
independent from cfbot):

https://cirrus-ci.com/build/4726227505774592

[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0

The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168

The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
the code compiled well. The compilation failure mentioned above comes from
"Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
that the symlink issue will occur there once the compilation is fixed.

FWIW, I don't think it's particularly wise to rely on symlinks on windows -
IIRC they will often not be enabled outside of development environments.

ok, installing a copy of the same executable with a different name seems more
reliable. At least that's how the postmaster->postgres link used to be
handled, if I read Makefile correctly. Thanks.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#15Andres Freund
andres@anarazel.de
In reply to: Antonin Houska (#14)
Re: Adding REPACK [concurrently]

Hi,

On 2025-08-21 20:14:14 +0200, Antonin Houska wrote:

ok, installing a copy of the same executable with a different name seems more
reliable. At least that's how the postmaster->postgres link used to be
handled, if I read Makefile correctly. Thanks.

I have not followed this thread, but I don't think the whole thing of having a
single executable with multiple names is worth doing. Just make whatever an
option, instead of having multiple "executables".

Greetings,

Andres

#16Robert Treat
xzilla@users.sourceforge.net
In reply to: Alvaro Herrera (#8)
Re: Adding REPACK [concurrently]

On Tue, Aug 19, 2025 at 2:53 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:

Note choice of shell command name: though all the other programs in
src/bin/scripts do not use the "pg_" prefix, this one does; we thought
it made no sense to follow the old programs as precedent because there
seems to be a lament for the lack of pg_ prefix in those, and we only
keep what they are because of their long history. This one has no
history.

Still on pg_repackdb, the implementation here is to install a symlink
called pg_repackdb which points to vacuumdb, and make the program behave
differently when called in this way. The amount of additional code for
this is relatively small, so I think this is a worthy technique --
assuming it works. If it doesn't, Antonin proposed a separate binary
that just calls some functions from vacuumdb. Or maybe we could have a
common source file that both utilities call.

What's the plan for clusterdb? It seems like we'd ideally create a
stand alone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.

Robert Treat
https://xzilla.net

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Treat (#16)
Re: Adding REPACK [concurrently]

On 2025-Aug-21, Robert Treat wrote:

What's the plan for clusterdb? It seems like we'd ideally create a
stand alone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.

I don't think we should remove clusterdb, to avoid breaking any scripts
that work today. As you say, I would create the standalone pg_repackdb
to do what we need it to do (namely: run the REPACK commands) and leave
vacuumdb and clusterdb alone. Removing the obsolete commands and
options can be done in a few years.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#18Euler Taveira
euler@eulerto.com
In reply to: Alvaro Herrera (#17)
Re: Adding REPACK [concurrently]

On Fri, Aug 22, 2025, at 6:40 AM, Álvaro Herrera wrote:

On 2025-Aug-21, Robert Treat wrote:

What's the plan for clusterdb? It seems like we'd ideally create a
stand alone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.

I don't think we should remove clusterdb, to avoid breaking any scripts
that work today. As you say, I would create the standalone pg_repackdb
to do what we need it to do (namely: run the REPACK commands) and leave
vacuumdb and clusterdb alone. Removing the obsolete commands and
options can be done in a few years.

I would say that we need to plan the removal of these binaries (clusterdb and
vacuumdb). We can start with a warning into clusterdb saying they should use
pg_repackdb. In a few years, we can remove clusterdb. There were complaints
about binary names without a pg_ prefix in the past [1]/messages/by-id/CAJgfmqXYYKXR+QUhEa3cq6pc8dV0Hu7QvOUccm7R0TkC=T-+=A@mail.gmail.com.

I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
to pg_repackdb. We can add a similar warning message saying they should use
pg_repackdb if the symlink is used.

[1]: /messages/by-id/CAJgfmqXYYKXR+QUhEa3cq6pc8dV0Hu7QvOUccm7R0TkC=T-+=A@mail.gmail.com

--
Euler Taveira
EDB https://www.enterprisedb.com/

#19Michael Banck
michael.banck@credativ.de
In reply to: Euler Taveira (#18)
Re: Adding REPACK [concurrently]

Hi,

On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:

On Fri, Aug 22, 2025, at 6:40 AM, �lvaro Herrera wrote:

On 2025-Aug-21, Robert Treat wrote:

What's the plan for clusterdb? It seems like we'd ideally create a
stand alone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.

I don't think we should remove clusterdb, to avoid breaking any scripts
that work today. As you say, I would create the standalone pg_repackdb
to do what we need it to do (namely: run the REPACK commands) and leave
vacuumdb and clusterdb alone. Removing the obsolete commands and
options can be done in a few years.

I would say that we need to plan the removal of these binaries (clusterdb and
vacuumdb). We can start with a warning into clusterdb saying they should use
pg_repackdb. In a few years, we can remove clusterdb. There were complaints
about binary names without a pg_ prefix in the past [1].

Yeah.

I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
to pg_repackdb. We can add a similar warning message saying they should use
pg_repackdb if the symlink is used.

Unless pg_repack has the same (or a superset of) CLI and behaviour as
vacuumdb (I haven't checked, but doubt it?), I think replacing vacuumdb
with a symlink to pg_repack will lead to much more breakage in existing
scripts/automation than clusterdb, which I guess is used orders of
magnitude less frequently than vacumdb.

Michael

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Banck (#19)
Re: Adding REPACK [concurrently]

On 2025-08-23, Michael Banck wrote:

On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:

I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
to pg_repackdb. We can add a similar warning message saying they should use
pg_repackdb if the symlink is used.

Unless pg_repack has the same (or a superset of) CLI and behaviour as
vacuumdb (I haven't checked, but doubt it?), I think replacing vacuumdb
with a symlink to pg_repack will lead to much more breakage in existing
scripts/automation than clusterdb, which I guess is used orders of
magnitude less frequently than vacumdb.

Yeah, I completely disagree with the idea of getting rid of vacuumdb. We can, maybe, in a distant future, get rid of the --full option to vacuumdb. But the rest of the vacuumdb behavior must stay, I think, because REPACK is not VACUUM — it is only VACUUM FULL. And we want to make that distinction very clear.

We can also, in a few years, get rid of clusterdb. But I don't think we need to deprecate it just yet.

--
Álvaro Herrera

#21Robert Treat
xzilla@users.sourceforge.net
In reply to: Alvaro Herrera (#20)
#22Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#22)
#24Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Alvaro Herrera (#23)
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mihail Nikalayeu (#24)
#26Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#24)
#27Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#26)
#28Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Alvaro Herrera (#25)
#29Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#27)
#30Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#29)
#31Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#30)
#32Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#23)
#33Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#22)
#34Marcos Pegoraro
marcos@f10.com.br
In reply to: Alvaro Herrera (#33)
#35Robert Treat
xzilla@users.sourceforge.net
In reply to: Marcos Pegoraro (#34)
#36Marcos Pegoraro
marcos@f10.com.br
In reply to: Robert Treat (#35)
#37Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Alvaro Herrera (#33)
#38Robert Treat
xzilla@users.sourceforge.net
In reply to: Alvaro Herrera (#33)
#39Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
#40Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
#41jian he
jian.universality@gmail.com
In reply to: Alvaro Herrera (#40)
In reply to: jian he (#41)
#43Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Alvaro Herrera (#40)
#44Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#43)
#45jian he
jian.universality@gmail.com
In reply to: Alvaro Herrera (#40)
#46Robert Treat
xzilla@users.sourceforge.net
In reply to: jian he (#45)
#47Antonin Houska
ah@cybertec.at
In reply to: Robert Treat (#46)
#48jian he
jian.universality@gmail.com
In reply to: Robert Treat (#46)
#49Robert Treat
xzilla@users.sourceforge.net
In reply to: Antonin Houska (#47)
#50Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#44)
#51Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#50)
#52Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#51)
#53Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#52)
#54Antonin Houska
ah@cybertec.at
In reply to: jian he (#41)
#55David Klika
david.klika@atlas.cz
In reply to: Antonin Houska (#54)
#56Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Klika (#55)
#57Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#40)
#58Marcos Pegoraro
marcos@f10.com.br
In reply to: Alvaro Herrera (#56)
#59Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#57)
#60Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Marcos Pegoraro (#58)
#61Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#59)
#62Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#61)
#63Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#59)
#64David Klika
david.klika@atlas.cz
In reply to: Alvaro Herrera (#56)
#65Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#61)
#66Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#62)
#67Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#66)
#68Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#66)
#69Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#68)
#70Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#67)
#71Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#69)
#72Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#70)
#73Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#68)
#74Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#70)
#75Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#69)
#76Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#73)
#77Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#74)
#78Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#77)
#79Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#75)
#80Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#79)
#81Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#80)
#82Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#78)
#83Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#82)
#84Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#83)
#85Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#84)
#86Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#85)
#87Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#86)
#88Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#87)
#89Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#88)
#90Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#85)
#91Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#89)
#92Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#85)
#93Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#90)
#94Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#93)
#95Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#94)
#96Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#95)
#97Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#96)
#98Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#97)
#99Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#98)
#100Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#99)
#101Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mihail Nikalayeu (#100)
#102Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Alvaro Herrera (#101)
#103Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#98)
#104Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#102)
#105Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#103)
#106Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#100)
#107Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#106)
#108Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#107)
#109Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#97)
#110Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#107)
#111Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#110)
#112Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#111)
#113Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Alvaro Herrera (#109)
#114Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#113)
#115Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#114)
#116Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mihail Nikalayeu (#115)
#117Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#116)
#118Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#109)
#119Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#118)
#120Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#119)
#121Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#120)
#122Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#121)
#123Srinath Reddy Sadipiralla
srinath2133@gmail.com
In reply to: Antonin Houska (#121)
#124Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#122)
#125Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#124)
#126Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#125)
#127Antonin Houska
ah@cybertec.at
In reply to: Srinath Reddy Sadipiralla (#123)
#128Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#127)
#129Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#126)
#130Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#129)
#131Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#130)
#132Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#131)
#133Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#132)
#134Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#127)
#135Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#133)
#136Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#135)