Adding REPACK [concurrently]

Started by Alvaro Herrera · 6 months ago · 83 messages
#1 Alvaro Herrera
alvherre@alvh.no-ip.org
1 attachment

Hello,

Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
This is coming from [1].  The ultimate goal is to have an in-core tool
to allow concurrent table rewrite to get rid of bloat; right now, VACUUM
FULL does that, but it's not concurrent. Users have resorted to using
the pg_repack third-party tool, which is ancient and uses a weird
internal implementation, as well as pg_squeeze, which uses logical
decoding to capture changes that occur during the table rewrite. The
patch submitted here, largely by Antonin Houska with some changes by me,
is based on the pg_squeeze code which he authored, and first
introduces a new command called REPACK to absorb both VACUUM FULL and
CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
of REPACK to operate online using logical decoding.

Essentially, this first patch just reshuffles the CLUSTER code to create
the REPACK command.

I made a few changes from Antonin's original at [2].  First, I modified
the grammar to support "REPACK [tab] USING INDEX" without specifying the
index name. With this change, all possibilities of the old commands are
covered, which gives us the chance to flag them as obsolete. (This is
good, because having VACUUM FULL do something completely different from
regular VACUUM confuses users all the time; and on the other hand,
having a command called CLUSTER which is at odds with what most people
think of as a "database cluster" is also confusing.)

Here's a list of existing commands, and how to write them in the current
patch's proposal for REPACK:

-- re-clusters all tables that have a clustered index set
CLUSTER -> REPACK USING INDEX

-- clusters the given table using the given index
CLUSTER tab USING idx -> REPACK tab USING INDEX idx

-- clusters this table using its clustered index; error if no index is clustered
CLUSTER tab -> REPACK tab USING INDEX

-- vacuum-full all tables
VACUUM FULL -> REPACK

-- vacuum-full the specified table
VACUUM FULL tab -> REPACK tab
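
To make this concrete, here is a quick sketch of the new syntax in action
(table and index names are made up for illustration):

    CREATE TABLE t (id int PRIMARY KEY, payload text);
    CREATE INDEX t_payload_idx ON t (payload);

    REPACK t;                            -- plain rewrite, as VACUUM FULL t
    REPACK t USING INDEX t_payload_idx;  -- rewrite in index order, as
                                         -- CLUSTER t USING t_payload_idx
    REPACK t USING INDEX;                -- use the index marked clustered;
                                         -- error if there is none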

My other change to Antonin's patch is that I made REPACK USING INDEX set
the 'indisclustered' flag on the index being used, so REPACK behaves
identically to CLUSTER. We can discuss whether we really want this.
For instance, we could add an option so that REPACK by default omits
persisting the clustered index, and only does so when you give it some
special option, say something like
"REPACK (persist_clustered_index=true) tab USING INDEX idx"
Overall I'm not sure this is terribly interesting, since clustered
indexes are not very useful for most users anyway.
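
In any case, with the current patch the effect is easy to verify from SQL
("tab" and "idx" being placeholder names):

    REPACK tab USING INDEX idx;

    SELECT indisclustered FROM pg_index
     WHERE indexrelid = 'idx'::regclass;
    -- returns true, just as after CLUSTER tab USING idx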

I made a few other minor changes not worthy of individual mention, and
there are a few others pending, such as updates to the
pg_stat_progress_repack view infrastructure, as well as phasing out
pg_stat_progress_cluster (maybe the latter would offer a subset of the
former; not yet sure about this).  Also, I'd like to work on adding a
`repackdb` command for completeness.
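
For reference, with the view definition in the attached patch, a running
REPACK can be watched from another session with something like:

    SELECT pid, relid::regclass AS tab, phase,
           heap_blks_scanned, heap_blks_total,
           heap_tuples_written, index_rebuild_count
      FROM pg_stat_progress_repack;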

On repackdb: I think it is going to be very similar to vacuumdb, mostly in
that it will need to be able to run tasks in parallel; but there
are things it doesn't have to deal with, such as analyze-in-stages,
which I think is a large burden. I estimate about 1k LOC there,
extremely similar to vacuumdb. Maybe it makes sense to share the source
code and make the new executable a symlink instead, with some additional
code to support the two different modes. Again, I'm not sure about
this -- I like the idea, but I'd have to see the implementation.

I'll be rebasing the rest of Antonin's patch series afterwards,
including the logical decoding changes necessary for CONCURRENTLY. In
the meantime, if people want to review those, which would be very
valuable, they can go back to branch master from around the time he
submitted it and apply the old patches there.

[1]: /messages/by-id/76278.1724760050@antos
[2]: /messages/by-id/152010.1751307725@localhost

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

Attachments:

v1-0001-Add-REPACK-command.patch (text/x-diff; charset=utf-8)
From 017582a10f948a89d9f49035c2bed6a3fa1c7d34 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 26 Jul 2025 19:57:26 +0200
Subject: [PATCH v1] Add REPACK command.

The existing CLUSTER command as well as VACUUM with the FULL option both
reclaim unused space by rewriting the table. Now that we want to enhance this
functionality (in particular, by adding a new option CONCURRENTLY), we would
have to enhance both commands, because they are both implemented by the same
function (cluster.c:cluster_rel). However, adding the same option to two
different commands is not very user-friendly. Therefore it was decided to
create a new command and to declare both the CLUSTER command and the FULL
option of VACUUM
deprecated. Future enhancements to this rewriting code will only affect the
new command.

Like CLUSTER, the REPACK command reorders the table according to the specified
index. Unlike CLUSTER, REPACK does not require an index: if only a table is
specified, the command acts as VACUUM FULL. As we don't want to remove CLUSTER
and VACUUM FULL yet, there are three callers of the cluster_rel() function
now: REPACK, CLUSTER and VACUUM FULL. When we need to distinguish who is
calling this function (mostly for logging, but also for progress reporting),
we can no longer use the OID of the clustering index: both REPACK and VACUUM
FULL can pass InvalidOid. Therefore, this patch introduces a new enumeration
type RepackCommand, and adds an argument of this type to the cluster_rel()
function and to all the functions that need to distinguish the caller.

Like CLUSTER and VACUUM FULL, the REPACK command without arguments processes
all the tables on which the current user has the MAINTAIN privilege.

A new pg_stat_progress_repack view is added to monitor the progress of
REPACK. Currently it displays the same information as pg_stat_progress_cluster
(except that column names might differ), but it'll also display the status of
the REPACK CONCURRENTLY command in the future, so the view definitions will
eventually diverge.

Regarding user documentation, the patch moves the information on clustering
from cluster.sgml to the new file repack.sgml. cluster.sgml now contains a
link that points to the related section of repack.sgml. A note on deprecation
and a link to repack.sgml are added to both cluster.sgml and vacuum.sgml.

Author: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/82651.1720540558@antos
---
 doc/src/sgml/monitoring.sgml             | 223 ++++++-
 doc/src/sgml/ref/allfiles.sgml           |   1 +
 doc/src/sgml/ref/cluster.sgml            |  82 +--
 doc/src/sgml/ref/repack.sgml             | 254 ++++++++
 doc/src/sgml/ref/vacuum.sgml             |   9 +
 doc/src/sgml/reference.sgml              |   1 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  26 +
 src/backend/commands/cluster.c           | 723 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   3 +-
 src/backend/parser/gram.y                |  77 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  33 +-
 src/include/commands/cluster.h           |   7 +-
 src/include/commands/progress.h          |  61 +-
 src/include/nodes/parsenodes.h           |  20 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 125 +++-
 src/test/regress/expected/rules.out      |  23 +
 src/test/regress/sql/cluster.sql         |  59 ++
 src/tools/pgindent/typedefs.list         |   2 +
 25 files changed, 1407 insertions(+), 381 deletions(-)
 create mode 100644 doc/src/sgml/ref/repack.sgml

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 823afe1b30b..924e1b1fa99 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5495,7 +5503,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -5954,6 +5963,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..c0ef654fcb4 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..ee4fd965928 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -42,18 +42,6 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
    <replaceable class="parameter">table_name</replaceable>.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
-
   <para>
    When a table is clustered, <productname>PostgreSQL</productname>
    remembers which index it was clustered by.  The form
@@ -78,6 +66,25 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
    database operations (both reads and writes) from operating on the
    table until the <command>CLUSTER</command> is finished.
   </para>
+
+  <warning>
+   <para>
+    The <command>CLUSTER</command> command is deprecated in favor of
+    <xref linkend="sql-repack"/>.
+   </para>
+  </warning>
+
+  <note>
+   <para>
+    <xref linkend="sql-repack-notes-on-clustering"/> explains how clustering
+    works, whether it is initiated by <command>CLUSTER</command> or
+    by <command>REPACK</command>. The notable difference between the two is
+    that <command>REPACK</command> does not remember the index used last
+    time. Thus if you don't specify an index, <command>REPACK</command>
+    rewrites the table but does not try to cluster it.
+   </para>
+  </note>
+
  </refsect1>
 
  <refsect1>
@@ -136,63 +143,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..a612c72d971
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,254 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX <replaceable class="parameter">index_name</replaceable> ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If <replaceable class="parameter">index_name</replaceable> is specified,
+   the table is clustered by this index. Please see the notes on clustering
+   below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    When a table is clustered, it is physically reordered based on the index
+    information.  Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using <command>REPACK</command>.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan or a sequential scan without sort is used, a temporary
+    copy of the table is created that contains the table data in the index
+    order.  Temporary copies of each index on the table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report as each table is repacked
+      at <literal>INFO</literal> level.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+  <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+  </para>
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the partition of that
+    index. <command>REPACK</command> on a partitioned table cannot be executed
+    inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> on the basis of its
+   index <literal>employees_ind</literal> (since an index is used here, this is
+   effectively clustering):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting></para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..cee1cf3926c 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -98,6 +98,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    <varlistentry>
     <term><literal>FULL</literal></term>
     <listitem>
+
      <para>
       Selects <quote>full</quote> vacuum, which can reclaim more
       space, but takes much longer and exclusively locks the table.
@@ -106,6 +107,14 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
       the operation is complete.  Usually this should only be used when a
       significant amount of space needs to be reclaimed from within the table.
      </para>
+
+     <warning>
+      <para>
+       The <option>FULL</option> parameter is deprecated in favor of
+       <xref linkend="sql-repack"/>.
+      </para>
+     </warning>
+
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..229912d35b7 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..0b03070d394 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c4029a4f3d3..3063abff9a5 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f6eca09ee15..02f091c3ed6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1279,6 +1279,32 @@ CREATE VIEW pg_stat_progress_cluster AS
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_repack AS
+    SELECT
+        S.pid AS pid,
+        S.datid AS datid,
+        D.datname AS datname,
+        S.relid AS relid,
+	-- param1 is currently unused
+        CASE S.param2 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'seq scanning heap'
+                      WHEN 2 THEN 'index scanning heap'
+                      WHEN 3 THEN 'sorting tuples'
+                      WHEN 4 THEN 'writing new heap'
+                      WHEN 5 THEN 'swapping relation files'
+                      WHEN 6 THEN 'rebuilding index'
+                      WHEN 7 THEN 'performing final cleanup'
+                      END AS phase,
+        CAST(S.param3 AS oid) AS repack_index_relid,
+        S.param4 AS heap_tuples_scanned,
+        S.param5 AS heap_tuples_written,
+        S.param6 AS heap_blks_total,
+        S.param7 AS heap_blks_scanned,
+        S.param8 AS index_rebuild_count
+    FROM pg_stat_get_progress_info('REPACK') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..752c83d4391 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,18 +67,41 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd, bool usingindex,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  MemoryContext cluster_context,
+											  Oid relid, bool rel_is_index);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid determine_clustered_index(Relation rel, bool usingindex,
+									 const char *indexname);
 
 
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
+
 /*---------------------------------------------------------------------------
  * This cluster code allows for clustering multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -104,101 +127,39 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *---------------------------------------------------------------------------
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
 	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
 			verbose = defGetBoolean(opt);
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
+					 errmsg("unrecognized %s option \"%s\"",
+							RepackCommandAsString(stmt->command),
 							opt->defname),
 					 parser_errposition(pstate, opt->location)));
 	}
 
 	params.options = (verbose ? CLUOPT_VERBOSE : 0);
 
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
 			return;
-		}
 	}
 
 	/*
@@ -207,88 +168,103 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
+		Oid			relid;
+		bool		rel_is_index;
+
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+
+		/*
+		 * If an index name was specified, resolve it now and pass it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * XXX how should this behave?  Passing no index to a partitioned
+			 * table could be useful to have certain partitions clustered by
+			 * some index, and other partitions by a different index.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, true, stmt->indexname);
+			/* XXX is this the right place for this? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
+
+		rtcs = get_tables_to_repack_partitioned(stmt->command, repack_context,
+												relid, rel_is_index);
+
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
 	}
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
-
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
-
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
-
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
-
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+			continue;
 
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, stmt->usingindex,
+					rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +280,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, bool usingindex,
+			Relation OldHeap, Oid indexOid, ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +302,25 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
 	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_REPACK_COMMAND_REPACK);
+	else if (cmd == REPACK_COMMAND_CLUSTER)
+	{
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	}
+	else
+	{
+		Assert(cmd == REPACK_COMMAND_VACUUMFULL);
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	}
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -351,63 +342,21 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * to cluster a not-previously-clustered index.
 	 */
 	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
+		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+								 params->options))
 			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index.
+	 * It would work to use an index in most respects, but the index would
+	 * only get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
-	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
+	if (usingindex && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				 errmsg("cannot run \"%s\" on a shared catalog",
+						RepackCommandAsString(cmd))));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
@@ -415,21 +364,30 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		if (OidIsValid(indexOid))
+		if (cmd == REPACK_COMMAND_CLUSTER)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
+		else if (cmd == REPACK_COMMAND_REPACK)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot repack temporary tables of other sessions")));
+		}
 		else
+		{
+			Assert(cmd == REPACK_COMMAND_VACUUMFULL);
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot vacuum temporary tables of other sessions")));
+		}
 	}
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -469,7 +427,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +440,64 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid,
+										   userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide CLUSTER or on a partitioned table), because
+	 * there is another check in cluster() which will stop any attempt to
+	 * there is another check in ExecRepack() which will stop any attempt to
+	 * cluster_rel which is redundant, but we leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +642,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd, bool usingindex,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +659,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (usingindex)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1458,8 +1475,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1526,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,69 +1649,136 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process, when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given or not, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index for indexes that have
+ * indisclustered set; if it was not given, we scan pg_class and return
+ * all plain tables and materialized views.
+ *
+ * The result is a list of RelToCluster in the given memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand command, bool usingindex,
+					 MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
+	HeapTuple	tuple;
 	MemoryContext old_context;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData		entry;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index	index;
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			index = (Form_pg_index) GETSTRUCT(tuple);
+			/*
+			 * XXX I think the only reason there's no test failure here is
+			 * that we seldom have clustered indexes that would be affected
+			 * by concurrency.  Maybe we should also do the
+			 * ConditionalLockRelationOid+SearchSysCacheExists dance that
+			 * we do below.
+			 */
+			if (!cluster_is_permitted_for_relation(command, index->indrelid,
+												   GetUserId()))
+				continue;
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
 
-		MemoryContextSwitchTo(old_context);
+			MemoryContextSwitchTo(old_context);
+		}
 	}
-	table_endscan(scan);
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
 
-	relation_close(indRelation, AccessShareLock);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.  XXX we could release at the bottom of the
+			 * loop, but for now just hold it until this transaction is
+			 * finished.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
+
+			/* Verify that the table still exists. */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			if (!cluster_is_permitted_for_relation(command, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
+
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
+	}
+
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
  * all the children leaves tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
+								 Oid relid, bool rel_is_index)
 {
 	List	   *inhoids;
 	ListCell   *lc;
@@ -1702,17 +1786,33 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 	MemoryContext old_context;
 
 	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
 
 	foreach(lc, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			inhoid = lfirst_oid(lc);
+		Oid			inhrelid,
+					inhindid;
 		RelToCluster *rtc;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(inhoid) != RELKIND_INDEX)
+				continue;
+
+			inhrelid = IndexGetRelation(inhoid, false);
+			inhindid = inhoid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(inhoid) != RELKIND_RELATION)
+				continue;
+
+			inhrelid = inhoid;
+			inhindid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
@@ -1720,15 +1820,15 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 		 * table.  We skip any partitions which the user is not permitted to
 		 * CLUSTER.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, inhrelid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
 		old_context = MemoryContextSwitchTo(cluster_context);
 
 		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		rtc->tableOid = inhrelid;
+		rtc->indexOid = inhindid;
 		rtcs = lappend(rtcs, rtc);
 
 		MemoryContextSwitchTo(old_context);
@@ -1742,13 +1842,134 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain lock on it, then determine what to do based on the relation
+ * type: if it's not a partitioned table, repack it as indicated (using an
+ * existing clustered index, or following the indicated index), and return
+ * NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened relcache entry, so that caller can process the
+ * partitions using the multiple-table handling code.  The index name is not
+ * resolved in this case.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+	{
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+	}
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process
+	 * it here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid		indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the index to use
+ * for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes doesn't
+ * change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid = InvalidOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		ListCell *lc;
+
+		/* Find an index with indisclustered set, or report error */
+		foreach(lc, RelationGetIndexList(rel))
+		{
+			indexOid = lfirst_oid(lc);
+
+			if (get_index_isclustered(indexOid))
+				break;
+			indexOid = InvalidOid;
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/*
+		 * An index name was specified; resolve it to an OID.  The index
+		 * must be in the same namespace as the relation.
+		 */
+		indexOid = get_relname_relid(indexname,
+									 rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..8863ad0e8bd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2287,7 +2287,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 				cluster_params.options |= CLUOPT_VERBOSE;
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index db43034b9db..062235817f4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -280,7 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -297,7 +297,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -316,7 +316,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		utility_option_list
@@ -763,7 +763,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1025,7 +1025,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1099,6 +1098,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1135,6 +1135,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11912,38 +11917,80 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> ] [ USING INDEX [ <index_name> ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list qualified_name USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list qualified_name opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $2;
+					n->params = NIL;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -11951,20 +11998,24 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -17960,6 +18011,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18592,6 +18644,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index babc34d0cbe..f26440a9b79 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2850,10 +2850,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2861,6 +2857,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3498,7 +3498,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 1c12ddbae49..b2ad8ba45cd 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -268,6 +268,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index dbc586c5bc3..cca12fb058b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1244,7 +1244,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -4991,6 +4991,37 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	/* If we have REPACK <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAny))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK <sth> USING, then add the index as well */
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+		else if (TailMatches("VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..7f4138c7b36 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -31,8 +31,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, bool usingindex,
+						Relation OldHeap, Oid indexOid, ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 7c736e7b03b..5f102743af4 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,24 +56,51 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Note: Since REPACK shares some code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
 
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+
+/*
+ * Commands of PROGRESS_REPACK
+ *
+ * Currently we only have one command, so the PROGRESS_REPACK_COMMAND
+ * parameter is not necessary. However it makes cluster.c simpler if we have
+ * the same set of parameters for CLUSTER and REPACK - see the note on REPACK
+ * parameters above.
+ */
+#define PROGRESS_REPACK_COMMAND_REPACK			1
+
+/*
+ * Progress parameters for cluster.
+ *
+ * Although we need to report REPACK and CLUSTER in separate views, the
+ * parameters and phases of CLUSTER are a subset of those of REPACK. Therefore
+ * we just use the appropriate values defined for REPACK above instead of
+ * defining a separate set of constants here.
+ */
 
 /* Commands of PROGRESS_CLUSTER */
 #define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 86a236bd58b..fcc25a0c592 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3949,16 +3949,26 @@ typedef struct AlterSystemStmt
 } AlterSystemStmt;
 
 /* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
+ *		Repack Statement
  * ----------------------
  */
-typedef struct ClusterStmt
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
 {
 	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
+	RepackCommand command;		/* type of command being run */
+	RangeVar   *relation;		/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
 	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
+} RepackStmt;
+
 
 /* ----------------------
  *		Vacuum and Analyze Statements
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index a4af3f717a1..22559369e2c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -374,6 +374,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..5256628b51d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -254,6 +254,63 @@ ORDER BY 1;
  clstr_tst_pkey
 (3 rows)
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -381,6 +438,35 @@ SELECT * FROM clstr_1;
  2
 (2 rows)
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison.  Unlike CLUSTER, clstr_3 should also have been
+-- processed, because no clustering index is involved here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
+SET SESSION AUTHORIZATION regress_clstr_user;
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 CREATE TABLE clustertest (key int PRIMARY KEY);
@@ -495,6 +581,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +636,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index dce8c672b40..84ae47080f5 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2066,6 +2066,29 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..cfcc3dc9761 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,6 +76,19 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
 
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
@@ -159,6 +172,34 @@ INSERT INTO clstr_1 VALUES (1);
 CLUSTER clstr_1;
 SELECT * FROM clstr_1;
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison.  Unlike CLUSTER, clstr_3 should also have been
+-- processed, because no clustering index is involved here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 
@@ -229,6 +270,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3daba26b237..dc562ca662c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -426,6 +426,7 @@ ClientCertName
 ClientConnectionInfo
 ClientData
 ClientSocket
+ClusterCommand
 ClonePtrType
 ClosePortalStmt
 ClosePtrType
@@ -2536,6 +2537,7 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption

base-commit: db6461b1c9aae122b90bb52430f06efb306b371a
-- 
2.39.5

#2Robert Treat
rob@xzilla.net
In reply to: Alvaro Herrera (#1)
Re: Adding REPACK [concurrently]

On Sat, Jul 26, 2025 at 5:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Hello,

Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
This is coming from [1]. The ultimate goal is to have an in-core tool
to allow concurrent table rewrite to get rid of bloat; right now, VACUUM
FULL does that, but it's not concurrent. Users have resorted to using
the pg_repack third-party tool, which is ancient and uses a weird
internal implementation, as well as pg_squeeze, which uses logical
decoding to capture changes that occur during the table rewrite. The
patch submitted here, largely by Antonin Houska with some changes by me,
is based on the pg_squeeze code which he authored, and first
introduces a new command called REPACK to absorb both VACUUM FULL and
CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
of REPACK to operate online using logical decoding.

Essentially, this first patch just reshuffles the CLUSTER code to create
the REPACK command.

Thanks for keeping this ball rolling.

My other change to Antonin's patch is that I made REPACK USING INDEX set
the 'indisclustered' flag to the index being used, so REPACK behaves
identically to CLUSTER. We can discuss whether we really want this.
For instance we could add an option so that by default REPACK omits
persisting the clustered index, and instead it only does that when you
give it some special option, say something like
"REPACK (persist_clustered_index=true) tab USING INDEX idx"
Overall I'm not sure this is terribly interesting, since clustered
indexes are not very useful for most users anyway.

I think I would lean towards having it work like CLUSTER (preserve the
index), since that helps people making the transition, and it doesn't
feel terribly useful to invent new syntax for a feature that I would
agree isn't very useful for most people.

I made a few other minor changes not worthy of individual mention, and
there are a few others pending, such as updates to the
pg_stat_progress_repack view infrastructure, as well as phasing out
pg_stat_progress_cluster (maybe the latter would offer a subset of the
former; not yet sure about this.) Also, I'd like to work on adding a
`repackdb` command for completeness.

On repackdb: I think it is going to be very similar to vacuumdb, mostly in
that it is going to need to be able to run tasks in parallel; but there
are things it doesn't have to deal with, such as analyze-in-stages,
which I think is a large burden. I estimate about 1k LOC there,
extremely similar to vacuumdb. Maybe it makes sense to share the source
code and make the new executable a symlink instead, with some additional
code to support the two different modes. Again, I'm not sure about
this -- I like the idea, but I'd have to see the implementation.

I'll be rebasing the rest of Antonin's patch series afterwards,
including the logical decoding changes necessary for CONCURRENTLY. In
the meantime, if people want to review those, which would be very
valuable, they can go back to branch master from around the time he
submitted it and apply the old patches there.

For clarity, are you intending to commit this patch before having the
other parts ready? (If that sounds like an objection, it isn't.) After
a first pass, I think there are some confusing bits in the new docs that
could use straightening out, but those are likely to overlap with changes
once CONCURRENTLY is brought in, so it might make sense to hold off on
them. Either way, I definitely want to dive into this a bit deeper
with some fresh eyes; there's a lot to digest... speaking of which, for
this bit in src/backend/commands/cluster.c:

+    switch (cmd)
+    {
+        case REPACK_COMMAND_REPACK:
+            return "REPACK";
+        case REPACK_COMMAND_VACUUMFULL:
+            return "VACUUM";
+        case REPACK_COMMAND_CLUSTER:
+            return "VACUUM";
+    }
+    return "???";

The last one should return "CLUSTER", no?

Robert Treat
https://xzilla.net

#3Fujii Masao
masao.fujii@gmail.com
In reply to: Alvaro Herrera (#1)
Re: Adding REPACK [concurrently]

On Sun, Jul 27, 2025 at 6:56 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Hello,

Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
This is coming from [1]. The ultimate goal is to have an in-core tool
to allow concurrent table rewrite to get rid of bloat;

+1

right now, VACUUM
FULL does that, but it's not concurrent. Users have resorted to using
the pg_repack third-party tool, which is ancient and uses a weird
internal implementation, as well as pg_squeeze, which uses logical
decoding to capture changes that occur during the table rewrite. The
patch submitted here, largely by Antonin Houska with some changes by me,
is based on the pg_squeeze code which he authored, and first
introduces a new command called REPACK to absorb both VACUUM FULL and
CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
of REPACK to operate online using logical decoding.

Does this mean REPACK CONCURRENTLY requires wal_level = logical,
while plain REPACK (without CONCURRENTLY) works with any wal_level
setting? If we eventually deprecate VACUUM FULL and CLUSTER,
I think plain REPACK should still be allowed with wal_level = minimal
or replica, so users with those settings can perform equivalent
processing.
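
For illustration, the distinction I have in mind looks like this (a
sketch; the CONCURRENTLY spelling is only the proposed one and may
change):

-- with wal_level = minimal or replica in postgresql.conf:
REPACK mytable;                -- should keep working, like VACUUM FULL today

-- with wal_level = logical, enabling logical decoding:
REPACK mytable CONCURRENTLY;   -- hypothetical syntax for the online form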

+ if (!cluster_is_permitted_for_relation(tableOid, userid,
+    CLUSTER_COMMAND_CLUSTER))

As for the patch you attached, it seems to be an early WIP and
might not be ready for review yet?? BTW, I got the following
compilation failure; probably CLUSTER_COMMAND_CLUSTER in the
code above should be GetUserId().

-----------------
cluster.c:455:14: error: use of undeclared identifier 'CLUSTER_COMMAND_CLUSTER'
  455 |                     CLUSTER_COMMAND_CLUSTER))
      |                     ^
1 error generated.
-----------------
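
(For reference, the patch declares the function as
cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid),
so presumably the call in cluster_rel_recheck was meant to be something
like:

    if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))

with the leftover CLUSTER_COMMAND_CLUSTER argument removed.)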

Regards,

--
Fujii Masao

#4Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#1)
Re: Adding REPACK [concurrently]

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I made a few changes from Antonin's original at [2]. First, I modified
the grammar to support "REPACK [tab] USING INDEX" without specifying the
index name. With this change, all possibilities of the old commands are
covered,

...

Here's a list of existing commands, and how to write them in the current
patch's proposal for REPACK:

-- re-clusters all tables that have a clustered index set
CLUSTER -> REPACK USING INDEX

-- clusters the given table using the given index
CLUSTER tab USING idx -> REPACK tab USING INDEX idx

-- clusters this table using a clustered index; error if no index clustered
CLUSTER tab -> REPACK tab USING INDEX

-- vacuum-full all tables
VACUUM FULL -> REPACK

-- vacuum-full the specified table
VACUUM FULL tab -> REPACK tab

Now that we want to cover CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe let's
just mention that in the user documentation of REPACK?

(Besides that, VACUUM FULL accepts TRUNCATE and INDEX_CLEANUP options, but I
think these have no effect.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#5Robert Treat
rob@xzilla.net
In reply to: Antonin Houska (#4)
Re: Adding REPACK [concurrently]

On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I made a few changes from Antonin's original at [2]. First, I modified
the grammar to support "REPACK [tab] USING INDEX" without specifying the
index name. With this change, all possibilities of the old commands are
covered,

...

Here's a list of existing commands, and how to write them in the current
patch's proposal for REPACK:

-- re-clusters all tables that have a clustered index set
CLUSTER -> REPACK USING INDEX

-- clusters the given table using the given index
CLUSTER tab USING idx -> REPACK tab USING INDEX idx

-- clusters this table using a clustered index; error if no index clustered
CLUSTER tab -> REPACK tab USING INDEX

In the v18 patch, the docs say that repack doesn't remember the index,
but it seems we are still calling mark_index_clustered, so I think the
above is true but we need to update the docs(?).

-- vacuum-full all tables
VACUUM FULL -> REPACK

-- vacuum-full the specified table
VACUUM FULL tab -> REPACK tab

Now that we want to cover CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe let's
just mention that in the user documentation of REPACK?

I would note that both pg_repack and pg_squeeze analyze by default,
and running "vacuum full analyze" is the recommended behavior, so not
having analyze included is a step backwards.

(Besides that, VACUUM FULL accepts TRUNCATE and INDEX_CLEANUP options, but I
think these have no effect.)

Yeah, these seem safe to ignore.

Robert Treat
https://xzilla.net

#6Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Robert Treat (#5)
Re: Adding REPACK [concurrently]

On 2025-Aug-16, Robert Treat wrote:

On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:

Now that we want to cover CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe let's
just mention that in the user documentation of REPACK?

I would note that both pg_repack and pg_squeeze analyze by default,
and running "vacuum full analyze" is the recommended behavior, so not
having analyze included is a step backwards.

Makes sense to add ANALYZE as an option to repack, yeah.

So if I repack a single table with
REPACK (ANALYZE) table USING INDEX;

then do you expect that this would first cluster the table under
AccessExclusiveLock, then release the lock to do the analyze step, or
would the analyze be done under the same lock? This is significant for
a query that starts while repack is running, because if we release the
AEL then the query is planned when there are no stats for the table,
which might be bad.

I think the time to run the analyze step should be considerably shorter
than the time to run the repacking step, so running both together under
the same lock should be okay.
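
To spell out the scenario (a sketch; the table and column names are
hypothetical):

-- session 1:
REPACK (ANALYZE) tab USING INDEX;
--   (a) cluster step under AccessExclusiveLock, then (b) analyze step

-- session 2, a query arriving between (a) and (b):
SELECT * FROM tab WHERE b = 42;
--   if the AEL were released after (a), this query could be planned
--   before the analyze step has stored fresh statistics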

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Computing is too important to be left to men." (Karen Spärck Jones)

#7Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Robert Treat (#5)
Re: Adding REPACK [concurrently]

On 2025-Aug-16, Robert Treat wrote:

In the v18 patch, the docs say that repack doesn't remember the index,
but it seems we are still calling mark_index_clustered, so I think the
above is true but we need to update the docs(?).

Yes, the docs are obsolete on this point, I'm in the process of updating
them. Thanks for pointing this out.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"

#8Álvaro Herrera
alvherre@kurilemu.de
In reply to: Alvaro Herrera (#1)
1 attachment(s)
Re: Adding REPACK [concurrently]

Hello,

Here's a second cut of the initial REPACK work. Antonin added an
implementation of pg_repackdb, and there are also a couple of fixes for
bugs that were reported in the thread. I also added support for the
ANALYZE option as noted by Robert Treat, though it only works if you
specify a single non-partitioned table. Adding support for the
multi-table case is likely easy, but I didn't try.

I purposefully do not include the CONCURRENTLY work yet -- I want to get
this part committable-clean first; then we can continue with the logical
decoding work on top of that.

A note on the choice of the shell command's name: though none of the
other programs in src/bin/scripts use the "pg_" prefix, this one does.
We thought it made no sense to follow the old programs as precedent,
because the lack of a pg_ prefix there seems widely lamented, and we
keep their names only because of their long history. This one has no
history.

Still on pg_repackdb, the implementation here is to install a symlink
called pg_repackdb which points to vacuumdb, and make the program behave
differently when invoked under that name. The amount of additional code
for this is relatively small, so I think this is a worthwhile technique --
assuming it works. If it doesn't, Antonin proposed a separate binary
that just calls some functions from vacuumdb. Or maybe we could have a
common source file that both utilities build on.
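To illustrate the technique: the whole trick amounts to a branch on the
invocation name. A minimal standalone sketch (not the patch's actual
code -- the real implementation would presumably use get_progname(),
which also handles Windows .exe suffixes):

#include <stdio.h>
#include <string.h>

int
main(int argc, char *argv[])
{
    /* strip any directory part from argv[0] */
    const char *slash = strrchr(argv[0], '/');
    const char *progname = slash ? slash + 1 : argv[0];

    /* same binary, two behaviors, depending on the name invoked */
    if (strcmp(progname, "pg_repackdb") == 0)
        printf("acting as pg_repackdb\n"); /* would emit REPACK */
    else
        printf("acting as vacuumdb\n");    /* would emit VACUUM */
    return 0;
}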

I edited the docs a bit, limiting the exposure of CLUSTER and VACUUM
FULL, and instead redirecting the user to the REPACK docs. In the
REPACK docs I modified things for additional clarity.
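As an aside, the new progress view makes it possible to watch a long
REPACK from another session, e.g.:

SELECT pid, relid::regclass, phase,
       heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_repack;

(all of these columns are defined in the attached patch).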

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Selbst das größte Genie würde nicht weit kommen, wenn es
alles seinem eigenen Innern verdanken wollte." (Johann Wolfgang von Goethe)
Ni aún el genio más grande llegaría muy lejos si
quisiera sacarlo todo de su propio interior.

Attachments:

v2-0001-Add-REPACK-command.patch (text/x-diff; charset=utf-8)
From 62b96c7acdeb3c29c1c8668189c5c57a8bdb3c97 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 26 Jul 2025 19:57:26 +0200
Subject: [PATCH v2] Add REPACK command

REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.

This also adds pg_repackdb, a new utility that can invoke the new
commands.  This is heavily based on vacuumdb.  We may still change the
implementation, depending on how well Windows likes this approach.

Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: To fill in
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 ++++++-
 doc/src/sgml/ref/allfiles.sgml           |   2 +
 doc/src/sgml/ref/cluster.sgml            |  97 +--
 doc/src/sgml/ref/pg_repackdb.sgml        | 479 ++++++++++++++
 doc/src/sgml/ref/repack.sgml             | 284 +++++++++
 doc/src/sgml/ref/vacuum.sgml             |  33 +-
 doc/src/sgml/reference.sgml              |   2 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  26 +
 src/backend/commands/cluster.c           | 758 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   3 +-
 src/backend/parser/gram.y                |  88 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  33 +-
 src/bin/scripts/Makefile                 |   7 +
 src/bin/scripts/meson.build              |   4 +
 src/bin/scripts/t/103_repackdb.pl        |  24 +
 src/bin/scripts/vacuumdb.c               | 464 +++++++++++---
 src/include/commands/cluster.h           |   8 +-
 src/include/commands/progress.h          |  61 +-
 src/include/nodes/parsenodes.h           |  20 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 125 +++-
 src/test/regress/expected/rules.out      |  23 +
 src/test/regress/sql/cluster.sql         |  59 ++
 src/tools/pgindent/typedefs.list         |   3 +
 30 files changed, 2375 insertions(+), 510 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 doc/src/sgml/ref/repack.sgml
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..12e103d319d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5506,7 +5514,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -5965,6 +5974,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..eabf92e3536 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
@@ -212,6 +213,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..cfcfb65e349 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
   <title>Description</title>
 
   <para>
-   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
-   to cluster the table specified
-   by <replaceable class="parameter">table_name</replaceable>
-   based on the index specified by
-   <replaceable class="parameter">index_name</replaceable>. The index must
-   already have been defined on
-   <replaceable class="parameter">table_name</replaceable>.
+   The <command>CLUSTER</command> command is equivalent to
+   <xref linkend="sql-repack"/> with a <literal>USING INDEX</literal>
+   clause.  See there for more details.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
+<!-- Do we need to describe exactly which options map to what?  They seem obvious to me. -->
 
-  <para>
-   When a table is clustered, <productname>PostgreSQL</productname>
-   remembers which index it was clustered by.  The form
-   <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
-   reclusters the table using the same index as before.  You can also
-   use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
-   forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
-   future cluster operations, or to clear any previous setting.
-  </para>
-
-  <para>
-   <command>CLUSTER</command> without a
-   <replaceable class="parameter">table_name</replaceable> reclusters all the
-   previously-clustered tables in the current database that the calling user
-   has privileges for.  This form of <command>CLUSTER</command> cannot be
-   executed inside a transaction block.
-  </para>
-
-  <para>
-   When a table is being clustered, an <literal>ACCESS
-   EXCLUSIVE</literal> lock is acquired on it. This prevents any other
-   database operations (both reads and writes) from operating on the
-   table until the <command>CLUSTER</command> is finished.
-  </para>
  </refsect1>
 
  <refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..32570d071cb
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,479 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.
+   <application>pg_repackdb</application> will also generate internal
+   statistics used by the <productname>PostgreSQL</productname> query
+   optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link>.  There
+   is no effective difference between repacking and analyzing databases via
+   this utility and via other methods for accessing the server.
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When <option>-a</option>/<option>--all</option> is used, connect
+        to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
+
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..42e7f7dc310
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,284 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+    ANALYSE | ANALYZE
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If a <literal>USING INDEX</literal> clause is specified, the rows are
+   physically reordered based on information from an index.  Please see the
+   notes on clustering below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    If the <literal>USING INDEX</literal> clause is specified, the rows in
+    the table are physically reordered following an index: if an index name
+    is specified in the command, then that index is used; if no index name
+    is specified, then the index previously configured as the index to
+    cluster on is used.  If no index has been configured in this way, an
+    error is raised.  The index given in the <literal>USING INDEX</literal>
+    clause is configured as the index to cluster on, just as an index
+    given to the <command>CLUSTER</command> command is.  An index can be set
+    manually using <command>ALTER TABLE ... CLUSTER ON</command>, and reset
+    with <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+   </para>
+
+   <para>
+    If no table name is specified in <command>REPACK USING INDEX</command>,
+    all tables which have a clustering index defined and which the calling
+    user has privileges for are processed.
+   </para>
+
+   <para>
+    Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using clustering.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan or a sequential scan without sort is used, a temporary
+    copy of the table is created (in index order, when an index scan is
+    used).  Temporary copies of each index on the table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>ANALYZE</literal></term>
+    <term><literal>ANALYSE</literal></term>
+    <listitem>
+     <para>
+      Runs <xref linkend="sql-analyze"/> on the table after repacking.  This is
+      only supported when a single (non-partitioned) table is specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+  <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+  </para>
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the partition of that
+    index. <command>REPACK</command> on a partitioned table cannot be executed
+    inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> on the basis of its
+   index <literal>employees_ind</literal> (since an index is used here, this
+   effectively clusters the table):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting></para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..062b658cfcd 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
-    FULL [ <replaceable class="parameter">boolean</replaceable> ]
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+    FULL [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
   <title>Parameters</title>
 
   <variablelist>
-   <varlistentry>
-    <term><literal>FULL</literal></term>
-    <listitem>
-     <para>
-      Selects <quote>full</quote> vacuum, which can reclaim more
-      space, but takes much longer and exclusively locks the table.
-      This method also requires extra disk space, since it writes a
-      new copy of the table and doesn't release the old copy until
-      the operation is complete.  Usually this should only be used when a
-      significant amount of space needs to be reclaimed from within the table.
-     </para>
-    </listitem>
-   </varlistentry>
-
    <varlistentry>
     <term><literal>FREEZE</literal></term>
     <listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FULL</literal></term>
+    <listitem>
+     <para>
+      This option, which is deprecated, makes <command>VACUUM</command>
+      behave like <command>REPACK</command> without a
+      <literal>USING INDEX</literal> clause.
+      This method of compacting the table takes much longer than
+      <command>VACUUM</command> and exclusively locks the table.
+      This method also requires extra disk space, since it writes a
+      new copy of the table and doesn't release the old copy until
+      the operation is complete.  Usually this should only be used when a
+      significant amount of space needs to be reclaimed from within the table.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..2ee08e21f41 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
@@ -257,6 +258,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..0b03070d394 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c4029a4f3d3..3063abff9a5 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1b3c5a55882..b2b7b10c2be 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1279,6 +1279,32 @@ CREATE VIEW pg_stat_progress_cluster AS
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_repack AS
+    SELECT
+        S.pid AS pid,
+        S.datid AS datid,
+        D.datname AS datname,
+        S.relid AS relid,
+        -- param1 is currently unused
+        CASE S.param2 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'seq scanning heap'
+                      WHEN 2 THEN 'index scanning heap'
+                      WHEN 3 THEN 'sorting tuples'
+                      WHEN 4 THEN 'writing new heap'
+                      WHEN 5 THEN 'swapping relation files'
+                      WHEN 6 THEN 'rebuilding index'
+                      WHEN 7 THEN 'performing final cleanup'
+                      END AS phase,
+        CAST(S.param3 AS oid) AS repack_index_relid,
+        S.param4 AS heap_tuples_scanned,
+        S.param5 AS heap_tuples_written,
+        S.param6 AS heap_blks_total,
+        S.param7 AS heap_blks_scanned,
+        S.param8 AS index_rebuild_count
+    FROM pg_stat_get_progress_info('REPACK') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..ff3154e03cc 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,18 +67,41 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd, bool usingindex,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  MemoryContext cluster_context,
+											  Oid relid, bool rel_is_index);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
 
 
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
+
 /*---------------------------------------------------------------------------
  * This cluster code allows for clustering multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -104,191 +127,155 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *---------------------------------------------------------------------------
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+		else if (strcmp(opt->defname, "analyze") == 0 ||
+				 strcmp(opt->defname, "analyse") == 0)
+			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
+					 errmsg("unrecognized %s option \"%s\"",
+							RepackCommandAsString(stmt->command),
 							opt->defname),
 					 parser_errposition(pstate, opt->location)));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
 			return;
-		}
 	}
 
+	/* Don't allow ANALYZE in the multi-table case, at least for now */
+	if (params.options & CLUOPT_ANALYZE)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot ANALYZE multiple tables"));
+
 	/*
 	 * By here, we know we are in a multi-table situation.  In order to avoid
 	 * holding locks for too long, we want to process each table in its own
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
+		Oid			relid;
+		bool		rel_is_index;
+
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+
+		/*
+		 * If an index name was specified, resolve it now and pass it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * XXX how should this behave?  Passing no index to a partitioned
+			 * table could be useful to have certain partitions clustered by
+			 * some index, and other partitions by a different index.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, true, stmt->indexname);
+			if (!OidIsValid(relid))
+				elog(ERROR, "unable to determine index to cluster on");
+			/* XXX is this the right place for this check? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
+
+		rtcs = get_tables_to_repack_partitioned(stmt->command, repack_context,
+												relid, rel_is_index);
+
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
 	}
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
-
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
-
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
-
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
-
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+		{
+			CommitTransactionCommand();
+			continue;
+		}
+
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, stmt->usingindex,
+					rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +291,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, bool usingindex,
+			Relation OldHeap, Oid indexOid, ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +313,25 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
 	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_REPACK_COMMAND_REPACK);
+	else if (cmd == REPACK_COMMAND_CLUSTER)
+	{
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	}
+	else
+	{
+		Assert(cmd == REPACK_COMMAND_VACUUMFULL);
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	}
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -351,63 +353,21 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * to cluster a not-previously-clustered index.
 	 */
 	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
+		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+								 params->options))
 			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index.
+	 * Using an index would work in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
-	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
+	if (usingindex && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				 errmsg("cannot run \"%s\" on a shared catalog",
+						RepackCommandAsString(cmd))));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
@@ -415,21 +375,30 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		if (OidIsValid(indexOid))
+		if (cmd == REPACK_COMMAND_CLUSTER)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
+		else if (cmd == REPACK_COMMAND_REPACK)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot repack temporary tables of other sessions")));
+		}
 		else
+		{
+			Assert(cmd == REPACK_COMMAND_VACUUMFULL);
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot vacuum temporary tables of other sessions")));
+		}
 	}
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -469,7 +438,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +451,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide CLUSTER or on a partitioned table), because
+	 * executing a database-wide REPACK or CLUSTER, or one on a partitioned
+	 * table), because process_single_relation() will stop any attempt to
+	 * repack remote temp tables by name.  There is another check in
+	 * cluster_rel which is redundant, but we leave it for extra safety.
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +652,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd, bool usingindex,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +669,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (usingindex)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1458,8 +1485,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1536,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,69 +1659,137 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process, when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given or not, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index to find those that have
+ * indisclustered set; if it was not given, scan pg_class and return all
+ * tables.
+ *
+ * The result is a list of RelToCluster allocated in the given memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand command, bool usingindex,
+					 MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
+	HeapTuple	tuple;
 	MemoryContext old_context;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/*
+			 * XXX I think the only reason there's no test failure here is
+			 * that we seldom have clustered indexes that would be affected by
+			 * concurrency.  Maybe we should also do the
+			 * ConditionalLockRelationOid+SearchSysCacheExists dance that we
+			 * do below.
+			 */
+			if (!cluster_is_permitted_for_relation(command, index->indrelid,
+												   GetUserId()))
+				continue;
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
 
-		MemoryContextSwitchTo(old_context);
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
 	}
-	table_endscan(scan);
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
 
-	relation_close(indRelation, AccessShareLock);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.  XXX we could release at the bottom of the
+			 * loop, but for now just hold it until this transaction is
+			 * finished.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
+
+			/* Verify that the table still exists. */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			if (!cluster_is_permitted_for_relation(command, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
+
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
+	}
+
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
  * all the children leaves tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
+								 Oid relid, bool rel_is_index)
 {
 	List	   *inhoids;
 	ListCell   *lc;
@@ -1702,17 +1797,33 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 	MemoryContext old_context;
 
 	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
 
 	foreach(lc, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			inhoid = lfirst_oid(lc);
+		Oid			inhrelid,
+					inhindid;
 		RelToCluster *rtc;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(inhoid) != RELKIND_INDEX)
+				continue;
+
+			inhrelid = IndexGetRelation(inhoid, false);
+			inhindid = inhoid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(inhoid) != RELKIND_RELATION)
+				continue;
+
+			inhrelid = inhoid;
+			inhindid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
@@ -1720,15 +1831,15 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 		 * table.  We skip any partitions which the user is not permitted to
 		 * CLUSTER.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, inhrelid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
 		old_context = MemoryContextSwitchTo(cluster_context);
 
 		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		rtc->tableOid = inhrelid;
+		rtc->indexOid = inhindid;
 		rtcs = lappend(rtcs, rtc);
 
 		MemoryContextSwitchTo(old_context);
@@ -1742,13 +1853,148 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain lock on it, then determine what to do based on the relation
+ * type: if it's not a partitioned table, repack it as indicated (using an
+ * existing clustered index, or following the indicated index), and return
+ * NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened relcache entry, so that caller can process the
+ * partitions using the multiple-table handling code.  The index name is not
+ * resolved in this case.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+	{
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+	}
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		if (OidIsValid(indexOid))
+			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+
+		/* Do an analyze, if requested */
+		if (params->options & CLUOPT_ANALYZE)
+		{
+			VacuumParams vac_params = {0};
+
+			vac_params.options |= VACOPT_ANALYZE;
+			if (params->options & CLUOPT_VERBOSE)
+				vac_params.options |= VACOPT_VERBOSE;
+			analyze_rel(RelationGetRelid(rel), NULL, vac_params, NIL, true,
+						NULL);
+		}
+
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the index to use
+ * for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes doesn't
+ * change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid = InvalidOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		ListCell   *lc;
+
+		/* Find an index with indisclustered set, or report error */
+		foreach(lc, RelationGetIndexList(rel))
+		{
+			indexOid = lfirst_oid(lc);
+
+			if (get_index_isclustered(indexOid))
+				break;
+			indexOid = InvalidOid;
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/*
+		 * An index was specified; figure out its OID.  It must be in the same
+		 * namespace as the relation.
+		 */
+		indexOid = get_relname_relid(indexname,
+									 rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
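
To make the partitioned-table flow concrete: when the named table is
partitioned, ExecRepack resolves the parent index (if USING INDEX was
given), collects the leaf partitions via get_tables_to_repack_partitioned,
and the main loop then repacks each leaf in its own transaction.  With
hypothetical names 'parted' and 'parted_pkey':

    REPACK parted USING INDEX parted_pkey;  -- each leaf repacked using its
                                            -- corresponding leaf index
    REPACK parted;                          -- leaves rewritten in physical
                                            -- order (no index)

Since each partition commits separately, these forms are rejected inside a
transaction block, same as the database-wide forms.
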
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..8863ad0e8bd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2287,7 +2287,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 				cluster_params.options |= CLUOPT_VERBOSE;
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index db43034b9db..f9152728021 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -280,7 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -297,7 +297,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -316,7 +316,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		utility_option_list
@@ -763,7 +763,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1025,7 +1025,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1099,6 +1098,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1135,6 +1135,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11912,38 +11917,91 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ USING INDEX [ <index_name> ] ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list qualified_name USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list qualified_name opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK '(' utility_option_list ')'
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = false;
+					n->params = $3;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $2;
+					n->params = NIL;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -11951,20 +12009,24 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -17960,6 +18022,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18592,6 +18655,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
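
To spell out what the new productions accept: the parenthesized option
list is parsed generically and validated only later, in ExecRepack, so
(with 'tab' a hypothetical table):

    REPACK (verbose) tab;           -- ok
    REPACK (verbose, analyze) tab;  -- ok, ANALYZE allowed for one table
    REPACK (analyze);               -- fails: "cannot ANALYZE multiple tables"

The old CLUSTER spellings all funnel into the same RepackStmt node, only
with command set to REPACK_COMMAND_CLUSTER.
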
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 4f4191b0ea6..8a30e081614 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2850,10 +2850,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2861,6 +2857,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3498,7 +3498,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
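
A side effect of the utility.c changes that is easy to miss: T_RepackStmt
takes over ClusterStmt's LOGSTMT_DDL classification, so REPACK statements
are captured by log_statement = 'ddl'.  For instance (as superuser, with
'tab' again a hypothetical table):

    SET log_statement = 'ddl';
    REPACK tab;    -- this statement shows up in the server log
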
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..a1e10e8c2f6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -268,6 +268,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
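
The pg_stat_progress_repack view added earlier resolves to this function
call; since the command name is compared with pg_strcasecmp, case does
not matter:

    SELECT * FROM pg_stat_get_progress_info('REPACK');
    SELECT * FROM pg_stat_get_progress_info('repack');  -- same thing
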
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8b10f2313f3..59ff6e0923b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1247,7 +1247,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -4997,6 +4997,37 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	/* If we have REPACK <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAny))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK <sth> USING, then add the index as well */
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+		else if (TailMatches("VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index f6b4d40810b..c3b40ebec2b 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -41,6 +41,12 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+ifneq ($(PORTNAME), win32)
+	@rm -f '$(DESTDIR)$(bindir)/pg_repackdb$(X)'
+	ln -s vacuumdb$(X) '$(DESTDIR)$(bindir)/pg_repackdb$(X)'
+else
+	$(INSTALL_PROGRAM) vacuumdb$(X) '$(DESTDIR)$(bindir)/pg_repackdb$(X)'
+endif
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index 80df7c33257..4eddb71a4e8 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -35,6 +35,10 @@ foreach binary : binaries
   bin_targets += binary
 endforeach
 
+install_symlink('pg_repackdb',
+  kwargs: {'install_dir': dir_bin, 'pointing_to': 'vacuumdb'}
+)
+
 tests += {
   'name': 'scripts',
   'sd': meson.current_source_dir(),
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..51de4d7ab34
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,23 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
+done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 79b1096eb08..2db0cdbc789 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -63,6 +63,23 @@ typedef enum
 
 static VacObjFilter objfilter = OBJFILTER_NONE;
 
+typedef enum
+{
+	MODE_VACUUM,
+	MODE_REPACK
+} RunMode;
+
+static RunMode mode = MODE_VACUUM;
+
+static struct option *get_options(void);
+static void main_vacuum(int argc, char *argv[]);
+static void main_repack(int argc, char *argv[]);
+static void main_common(ConnParams *cparams, const char *dbname,
+						const char *maintenance_db, vacuumingOptions *vacopts,
+						SimpleStringList *objects, bool analyze_in_stages,
+						int tbl_count, int concurrentCons,
+						const char *progname, bool echo, bool quiet);
+
 static SimpleStringList *retrieve_objects(PGconn *conn,
 										  vacuumingOptions *vacopts,
 										  SimpleStringList *objects,
@@ -89,7 +106,8 @@ static void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
 static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
 							   const char *table);
 
-static void help(const char *progname);
+static void help_vacuum(const char *progname);
+static void help_repack(const char *progname);
 
 void		check_objfilter(void);
 
@@ -99,55 +117,118 @@ static char *escape_quotes(const char *src);
 #define ANALYZE_NO_STAGE	-1
 #define ANALYZE_NUM_STAGES	3
 
-
 int
 main(int argc, char *argv[])
 {
-	static struct option long_options[] = {
-		{"host", required_argument, NULL, 'h'},
-		{"port", required_argument, NULL, 'p'},
-		{"username", required_argument, NULL, 'U'},
-		{"no-password", no_argument, NULL, 'w'},
-		{"password", no_argument, NULL, 'W'},
-		{"echo", no_argument, NULL, 'e'},
-		{"quiet", no_argument, NULL, 'q'},
-		{"dbname", required_argument, NULL, 'd'},
-		{"analyze", no_argument, NULL, 'z'},
-		{"analyze-only", no_argument, NULL, 'Z'},
-		{"freeze", no_argument, NULL, 'F'},
-		{"all", no_argument, NULL, 'a'},
-		{"table", required_argument, NULL, 't'},
-		{"full", no_argument, NULL, 'f'},
-		{"verbose", no_argument, NULL, 'v'},
-		{"jobs", required_argument, NULL, 'j'},
-		{"parallel", required_argument, NULL, 'P'},
-		{"schema", required_argument, NULL, 'n'},
-		{"exclude-schema", required_argument, NULL, 'N'},
-		{"maintenance-db", required_argument, NULL, 2},
-		{"analyze-in-stages", no_argument, NULL, 3},
-		{"disable-page-skipping", no_argument, NULL, 4},
-		{"skip-locked", no_argument, NULL, 5},
-		{"min-xid-age", required_argument, NULL, 6},
-		{"min-mxid-age", required_argument, NULL, 7},
-		{"no-index-cleanup", no_argument, NULL, 8},
-		{"force-index-cleanup", no_argument, NULL, 9},
-		{"no-truncate", no_argument, NULL, 10},
-		{"no-process-toast", no_argument, NULL, 11},
-		{"no-process-main", no_argument, NULL, 12},
-		{"buffer-usage-limit", required_argument, NULL, 13},
-		{"missing-stats-only", no_argument, NULL, 14},
-		{NULL, 0, NULL, 0}
-	};
+	const char *progname = get_progname(argv[0]);
 
+	if (strcmp(progname, "vacuumdb") == 0)
+	{
+		mode = MODE_VACUUM;
+		main_vacuum(argc, argv);
+	}
+	else
+	{
+		/*
+		 * The application is executed via a symbolic link.
+		 */
+		Assert(strcmp(progname, "pg_repackdb") == 0);
+
+		mode = MODE_REPACK;
+		main_repack(argc, argv);
+	}
+
+	exit(0);
+}
+
+static struct option long_options_common[] = {
+	{"host", required_argument, NULL, 'h'},
+	{"port", required_argument, NULL, 'p'},
+	{"username", required_argument, NULL, 'U'},
+	{"no-password", no_argument, NULL, 'w'},
+	{"password", no_argument, NULL, 'W'},
+	{"echo", no_argument, NULL, 'e'},
+	{"quiet", no_argument, NULL, 'q'},
+	{"dbname", required_argument, NULL, 'd'},
+	{"all", no_argument, NULL, 'a'},
+	{"table", required_argument, NULL, 't'},
+	{"verbose", no_argument, NULL, 'v'},
+	{"jobs", required_argument, NULL, 'j'},
+	{"schema", required_argument, NULL, 'n'},
+	{"exclude-schema", required_argument, NULL, 'N'},
+	{"maintenance-db", required_argument, NULL, 2}
+};
+
+static struct option long_options_vacuum[] = {
+	{"analyze", no_argument, NULL, 'z'},
+	{"analyze-only", no_argument, NULL, 'Z'},
+	{"freeze", no_argument, NULL, 'F'},
+	{"full", no_argument, NULL, 'f'},
+	{"parallel", required_argument, NULL, 'P'},
+	{"analyze-in-stages", no_argument, NULL, 3},
+	{"disable-page-skipping", no_argument, NULL, 4},
+	{"skip-locked", no_argument, NULL, 5},
+	/* TODO Consider moving to _common */
+	{"min-xid-age", required_argument, NULL, 6},
+	/* TODO Consider moving to _common */
+	{"min-mxid-age", required_argument, NULL, 7},
+	{"no-index-cleanup", no_argument, NULL, 8},
+	{"force-index-cleanup", no_argument, NULL, 9},
+	{"no-truncate", no_argument, NULL, 10},
+	{"no-process-toast", no_argument, NULL, 11},
+	{"no-process-main", no_argument, NULL, 12},
+	{"buffer-usage-limit", required_argument, NULL, 13},
+	{"missing-stats-only", no_argument, NULL, 14}
+};
+
+/* TODO Remove if there are eventually no specific options. */
+static struct option long_options_repack[] = {
+};
+
+/*
+ * Construct the options array. The result depends on whether we're doing
+ * VACUUM or REPACK.
+ */
+static struct option *
+get_options(void)
+{
+	int			ncommon = lengthof(long_options_common);
+	int			nother = mode == MODE_VACUUM ? lengthof(long_options_vacuum) :
+		lengthof(long_options_repack);
+	struct option *result = palloc_array(struct option, ncommon + nother + 1);
+	struct option *src = long_options_common;
+	struct option *dst = result;
+	int			i;
+
+	for (i = 0; i < ncommon; i++)
+	{
+		memcpy(dst, src, sizeof(struct option));
+		dst++;
+		src++;
+	}
+
+	src = mode == MODE_VACUUM ? long_options_vacuum : long_options_repack;
+	for (i = 0; i < nother; i++)
+	{
+		memcpy(dst, src, sizeof(struct option));
+		dst++;
+		src++;
+	}
+
+	/* End-of-list marker */
+	memset(dst, 0, sizeof(struct option));
+
+	return result;
+}
+
+static void
+main_vacuum(int argc, char *argv[])
+{
 	const char *progname;
 	int			optindex;
 	int			c;
 	const char *dbname = NULL;
 	const char *maintenance_db = NULL;
-	char	   *host = NULL;
-	char	   *port = NULL;
-	char	   *username = NULL;
-	enum trivalue prompt_password = TRI_DEFAULT;
 	ConnParams	cparams;
 	bool		echo = false;
 	bool		quiet = false;
@@ -167,13 +248,18 @@ main(int argc, char *argv[])
 	vacopts.process_main = true;
 	vacopts.process_toast = true;
 
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
 	pg_logging_init(argv[0]);
 	progname = get_progname(argv[0]);
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
 
-	handle_help_version_opts(argc, argv, "vacuumdb", help);
+	handle_help_version_opts(argc, argv, progname, help_vacuum);
 
-	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ",
+							get_options(), &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -194,7 +280,7 @@ main(int argc, char *argv[])
 				vacopts.freeze = true;
 				break;
 			case 'h':
-				host = pg_strdup(optarg);
+				cparams.pghost = pg_strdup(optarg);
 				break;
 			case 'j':
 				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
@@ -210,7 +296,7 @@ main(int argc, char *argv[])
 				simple_string_list_append(&objects, optarg);
 				break;
 			case 'p':
-				port = pg_strdup(optarg);
+				cparams.pgport = pg_strdup(optarg);
 				break;
 			case 'P':
 				if (!option_parse_int(optarg, "-P/--parallel", 0, INT_MAX,
@@ -226,16 +312,16 @@ main(int argc, char *argv[])
 				tbl_count++;
 				break;
 			case 'U':
-				username = pg_strdup(optarg);
+				cparams.pguser = pg_strdup(optarg);
 				break;
 			case 'v':
 				vacopts.verbose = true;
 				break;
 			case 'w':
-				prompt_password = TRI_NO;
+				cparams.prompt_password = TRI_NO;
 				break;
 			case 'W':
-				prompt_password = TRI_YES;
+				cparams.prompt_password = TRI_YES;
 				break;
 			case 'z':
 				vacopts.and_analyze = true;
@@ -379,13 +465,141 @@ main(int argc, char *argv[])
 		pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
 				 "missing-stats-only", "analyze-only", "analyze-in-stages");
 
-	/* fill cparams except for dbname, which is set below */
-	cparams.pghost = host;
-	cparams.pgport = port;
-	cparams.pguser = username;
-	cparams.prompt_password = prompt_password;
-	cparams.override_dbname = NULL;
+	main_common(&cparams, dbname, maintenance_db, &vacopts, &objects,
+				analyze_in_stages, tbl_count, concurrentCons,
+				progname, echo, quiet);
+}
 
+static void
+main_repack(int argc, char *argv[])
+{
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	bool		echo = false;
+	bool		quiet = false;
+	vacuumingOptions vacopts;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help_repack);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwW",
+							get_options(), &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				quiet = true;
+				break;
+			case 't':
+				objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter();
+
+	main_common(&cparams, dbname, maintenance_db, &vacopts, &objects,
+				false, tbl_count, concurrentCons,
+				progname, echo, quiet);
+}
+
+static void
+main_common(ConnParams *cparams, const char *dbname,
+			const char *maintenance_db, vacuumingOptions *vacopts,
+			SimpleStringList *objects, bool analyze_in_stages,
+			int tbl_count, int concurrentCons,
+			const char *progname, bool echo, bool quiet)
+{
 	setup_cancel_handler(NULL);
 
 	/* Avoid opening extra connections. */
@@ -394,11 +608,11 @@ main(int argc, char *argv[])
 
 	if (objfilter & OBJFILTER_ALL_DBS)
 	{
-		cparams.dbname = maintenance_db;
+		cparams->dbname = maintenance_db;
 
-		vacuum_all_databases(&cparams, &vacopts,
+		vacuum_all_databases(cparams, vacopts,
 							 analyze_in_stages,
-							 &objects,
+							 objects,
 							 concurrentCons,
 							 progname, echo, quiet);
 	}
@@ -414,7 +628,7 @@ main(int argc, char *argv[])
 				dbname = get_user_name_or_exit(progname);
 		}
 
-		cparams.dbname = dbname;
+		cparams->dbname = dbname;
 
 		if (analyze_in_stages)
 		{
@@ -423,23 +637,21 @@ main(int argc, char *argv[])
 
 			for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
 			{
-				vacuum_one_database(&cparams, &vacopts,
+				vacuum_one_database(cparams, vacopts,
 									stage,
-									&objects,
-									vacopts.missing_stats_only ? &found_objs : NULL,
+									objects,
+									vacopts->missing_stats_only ? &found_objs : NULL,
 									concurrentCons,
 									progname, echo, quiet);
 			}
 		}
 		else
-			vacuum_one_database(&cparams, &vacopts,
+			vacuum_one_database(cparams, vacopts,
 								ANALYZE_NO_STAGE,
-								&objects, NULL,
+								objects, NULL,
 								concurrentCons,
 								progname, echo, quiet);
 	}
-
-	exit(0);
 }
 
 /*
@@ -450,19 +662,39 @@ check_objfilter(void)
 {
 	if ((objfilter & OBJFILTER_ALL_DBS) &&
 		(objfilter & OBJFILTER_DATABASE))
-		pg_fatal("cannot vacuum all databases and a specific one at the same time");
+	{
+		if (mode == MODE_VACUUM)
+			pg_fatal("cannot vacuum all databases and a specific one at the same time");
+		else
+			pg_fatal("cannot repack all databases and a specific one at the same time");
+	}
 
 	if ((objfilter & OBJFILTER_TABLE) &&
 		(objfilter & OBJFILTER_SCHEMA))
-		pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
+	{
+		if (mode == MODE_VACUUM)
+			pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
+		else
+			pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+	}
 
 	if ((objfilter & OBJFILTER_TABLE) &&
 		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
-		pg_fatal("cannot vacuum specific table(s) and exclude schema(s) at the same time");
+	{
+		if (mode == MODE_VACUUM)
+			pg_fatal("cannot vacuum specific table(s) and exclude schema(s) at the same time");
+		else
+			pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+	}
 
 	if ((objfilter & OBJFILTER_SCHEMA) &&
 		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
-		pg_fatal("cannot vacuum all tables in schema(s) and exclude schema(s) at the same time");
+	{
+		if (mode == MODE_VACUUM)
+			pg_fatal("cannot vacuum all tables in schema(s) and exclude schema(s) at the same time");
+		else
+			pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+	}
 }
 
 /*
@@ -552,6 +784,13 @@ vacuum_one_database(ConnParams *cparams,
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
+	if (mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
 	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
 	{
 		PQfinish(conn);
@@ -644,9 +883,15 @@ vacuum_one_database(ConnParams *cparams,
 		if (stage != ANALYZE_NO_STAGE)
 			printf(_("%s: processing database \"%s\": %s\n"),
 				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
+		else if (mode == MODE_VACUUM)
 			printf(_("%s: vacuuming database \"%s\"\n"),
 				   progname, PQdb(conn));
+		else
+		{
+			Assert(mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
 		fflush(stdout);
 	}
 
@@ -749,7 +994,8 @@ vacuum_one_database(ConnParams *cparams,
 	}
 
 	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
-	if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
+	if (mode == MODE_VACUUM && vacopts->skip_database_stats &&
+		stage == ANALYZE_NO_STAGE)
 	{
 		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
 		ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
@@ -1074,6 +1320,12 @@ vacuum_all_databases(ConnParams *cparams,
 	int			i;
 
 	conn = connectMaintenanceDatabase(cparams, progname, echo);
+	if (mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
 	result = executeQuery(conn,
 						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
 						  echo);
@@ -1127,8 +1379,8 @@ vacuum_all_databases(ConnParams *cparams,
 }
 
 /*
- * Construct a vacuum/analyze command to run based on the given options, in the
- * given string buffer, which may contain previous garbage.
+ * Construct a vacuum/analyze/repack command to run based on the given
+ * options, in the given string buffer, which may contain previous garbage.
  *
  * The table name used must be already properly quoted.  The command generated
  * depends on the server version involved and it is semicolon-terminated.
@@ -1143,7 +1395,13 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
 
 	resetPQExpBuffer(sql);
 
-	if (vacopts->analyze_only)
+	if (mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(sql, "REPACK");
+		if (vacopts->verbose)
+			appendPQExpBufferStr(sql, " (VERBOSE)");
+	}
+	else if (vacopts->analyze_only)
 	{
 		appendPQExpBufferStr(sql, "ANALYZE");
 
@@ -1317,16 +1575,34 @@ run_vacuum_command(PGconn *conn, const char *sql, bool echo,
 	if (!status)
 	{
 		if (table)
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
+		{
+			if (mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+		}
 		else
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
+		{
+			if (mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+		}
 	}
 }
 
+/*
+ * The order of the options matters here, and the command name
+ * (vacuum/repack) appears in most of the lines, so generating the help
+ * text dynamically from the 'mode' variable would be unwieldy.  Use a
+ * separate function for each mode instead.
+ */
 static void
-help(const char *progname)
+help_vacuum(const char *progname)
 {
 	printf(_("%s cleans and analyzes a PostgreSQL database.\n\n"), progname);
 	printf(_("Usage:\n"));
@@ -1372,3 +1648,33 @@ help(const char *progname)
 	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
 	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
 }
+
+static void
+help_repack(const char *progname)
+{
+	printf(_("%s cleans and analyzes a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE[(COLUMNS)]'  repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command VACUUM for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..890998d84bb 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, bool usingindex,
+						Relation OldHeap, Oid indexOid, ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..5b6639c114c 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,24 +56,51 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Note: Since REPACK shares some code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
 
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+
+/*
+ * Commands of PROGRESS_REPACK
+ *
+ * Currently we only have one command, so the PROGRESS_REPACK_COMMAND
+ * parameter is not necessary. However it makes cluster.c simpler if we have
+ * the same set of parameters for CLUSTER and REPACK - see the note on REPACK
+ * parameters above.
+ */
+#define PROGRESS_REPACK_COMMAND_REPACK			1
+
+/*
+ * Progress parameters for cluster.
+ *
+ * Although we need to report REPACK and CLUSTER in separate views, the
+ * parameters and phases of CLUSTER are a subset of those of REPACK. Therefore
+ * we just use the appropriate values defined for REPACK above instead of
+ * defining a separate set of constants here.
+ */
 
 /* Commands of PROGRESS_CLUSTER */
 #define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 86a236bd58b..fcc25a0c592 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3949,16 +3949,26 @@ typedef struct AlterSystemStmt
 } AlterSystemStmt;
 
 /* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
+ *		Repack Statement
  * ----------------------
  */
-typedef struct ClusterStmt
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
 {
 	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
+	RepackCommand command;		/* type of command being run */
+	RangeVar   *relation;		/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
 	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
+} RepackStmt;
+
 
 /* ----------------------
  *		Vacuum and Analyze Statements
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index a4af3f717a1..22559369e2c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -374,6 +374,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..5256628b51d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -254,6 +254,63 @@ ORDER BY 1;
  clstr_tst_pkey
 (3 rows)
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -381,6 +438,35 @@ SELECT * FROM clstr_1;
  2
 (2 rows)
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because no clustering index is involved here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
+SET SESSION AUTHORIZATION regress_clstr_user;
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 CREATE TABLE clustertest (key int PRIMARY KEY);
@@ -495,6 +581,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +636,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..3a1d1d28282 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,6 +2071,29 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..cfcc3dc9761 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,6 +76,19 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
 
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
@@ -159,6 +172,34 @@ INSERT INTO clstr_1 VALUES (1);
 CLUSTER clstr_1;
 SELECT * FROM clstr_1;
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because no clustering index is involved here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 
@@ -229,6 +270,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e4a9ec65ab4..b8c125aa016 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2536,6 +2536,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
@@ -2602,6 +2604,7 @@ RtlNtStatusToDosError_t
 RuleInfo
 RuleLock
 RuleStmt
+RunMode
 RunningTransactions
 RunningTransactionsData
 SASLStatus
-- 
2.39.5

#9Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#6)
Re: Adding REPACK [concurrently]

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2025-Aug-16, Robert Treat wrote:

On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:

Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe we should
just mention that in the user documentation of REPACK?

I would note that both pg_repack and pg_squeeze analyze by default,
and running "vacuum full analyze" is the recommended behavior, so not
having analyze included is a step backwards.

Make sense to add ANALYZE as an option to repack, yeah.

So if I repack a single table with
REPACK (ANALYZE) table USING INDEX;

then do you expect that this would first cluster the table under
AccessExclusiveLock, then release the lock to do the analyze step, or
would the analyze be done under the same lock? This is significant for
a query that starts while repack is running, because if we release the
AEL then the query is planned when there are no stats for the table,
which might be bad.

I think the time to run the analyze step should be considerably shorter
than the time to run the repacking step, so running both together under
the same lock should be okay.

AFAICS, VACUUM FULL first releases the AEL, then it analyzes the table. If
users have not complained so far, I'd assume that vacuum_rel() (effectively
cluster_rel() in the FULL case) does not change the stats that much.
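
To illustrate, a rough sketch of how a parsed ANALYZE option could be
wired into the CLUOPT_ANALYZE flag the patch adds to commands/cluster.h;
analyze_requested is a placeholder, and the other names are just taken
from the cluster_rel() prototype, so this is not the patch's actual code:

	ClusterParams params = {0};

	if (analyze_requested)
		params.options |= CLUOPT_ANALYZE;

	cluster_rel(REPACK_COMMAND_REPACK, usingindex, OldHeap, indexOid, &params);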

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#10Antonin Houska
ah@cybertec.at
In reply to: Álvaro Herrera (#8)
Re: Adding REPACK [concurrently]

Álvaro Herrera <alvherre@kurilemu.de> wrote:

Still on pg_repackdb, the implementation here is to install a symlink
called pg_repackdb which points to vacuumdb, and make the program behave
differently when called in this way. The amount of additional code for
this is relatively small, so I think this is a worthy technique --
assuming it works. If it doesn't, Antonin proposed a separate binary
that just calls some functions from vacuumdb. Or maybe we could have a
common source file that both utilities call.

There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:

ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

(The reason it does not happen on other platforms might be that the build is
slower on Windows, and thus it's more prone to some specific race conditions.)

It appears that the 'point_to' argument of the 'install_symlink()' function
[1] can only be given as a string, not as a build target object [2], so the
function does not wait for the creation of the 'vacuumdb' executable.

I could not find another symlink of this kind in the tree. (AFAICS, the
postmaster->postgres symlink had been removed before Meson was
introduced.)

Does anyone happen to have a clue? Thanks.

[1]: https://mesonbuild.com/Reference-manual_functions.html#install_symlink
[2]: https://mesonbuild.com/Reference-manual_returned_tgt.html

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#11Álvaro Herrera
alvherre@kurilemu.de
In reply to: Antonin Houska (#10)
Re: Adding REPACK [concurrently]

On 2025-Aug-20, Antonin Houska wrote:

There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:

ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

Hmm, that's not the problem I see in the CI run from the commitfest app:

https://cirrus-ci.com/task/5608274336153600

[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0

The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168
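
For anyone following along, a minimal reconstruction of the failure mode
(the array contents are my guess; I haven't checked the exact hunk):

	#include <getopt.h>			/* struct option, getopt_long() */

	/*
	 * An empty initializer list, as in
	 *     static const struct option long_options_repack[] = { };
	 * is not valid C before C23, hence MSVC's "C2059: syntax error: '}'"
	 * and the sizeof-returns-0 warning.  Keeping at least the all-zeros
	 * terminator entry that getopt_long() expects avoids both complaints:
	 */
	static const struct option long_options_repack[] = {
		{NULL, 0, NULL, 0}
	};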

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#12Antonin Houska
ah@cybertec.at
In reply to: Álvaro Herrera (#11)
Re: Adding REPACK [concurrently]

Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Aug-20, Antonin Houska wrote:

There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:

ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

Hmm, that's not the problem I see in the CI run from the commitfest app:

https://cirrus-ci.com/task/5608274336153600

I was referring to the other build that you shared off-list (probably
independent from cfbot):

https://cirrus-ci.com/build/4726227505774592

[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0

The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168

The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
the code compiled well. The compilation failure mentioned above comes from
"Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
that the symlink issue will occur there once the compilation is fixed.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#13Andres Freund
andres@anarazel.de
In reply to: Antonin Houska (#12)
Re: Adding REPACK [concurrently]

Hi,

On 2025-08-20 16:22:41 +0200, Antonin Houska wrote:

Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Aug-20, Antonin Houska wrote:

There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:

ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

Hmm, that's not the problem I see in the CI run from the commitfest app:

https://cirrus-ci.com/task/5608274336153600

I was referring to the other build that you shared off-list (probably
independent from cfbot):

https://cirrus-ci.com/build/4726227505774592

[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0

The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168

The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
the code compiled well. The compilation failure mentioned above comes from
"Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
that the symlink issue will occur there once the compilation is fixed.

FWIW, I don't think it's particularly wise to rely on symlinks on Windows -
IIRC they will often not be enabled outside of development environments.

Greetings,

Andres Freund

#14Antonin Houska
ah@cybertec.at
In reply to: Andres Freund (#13)
Re: Adding REPACK [concurrently]

Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2025-08-20 16:22:41 +0200, Antonin Houska wrote:

Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Aug-20, Antonin Houska wrote:

There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:

ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

Hmm, that's not the problem I see in the CI run from the commitfest app:

https://cirrus-ci.com/task/5608274336153600

I was referring to the other build that you shared off-list (probably
independent from cfbot):

https://cirrus-ci.com/build/4726227505774592

[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0

The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168

The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
the code compiled well. The compilation failure mentioned above comes from
"Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
that the symlink issue will occur there once the compilation is fixed.

FWIW, I don't think it's particularly wise to rely on symlinks on Windows -
IIRC they will often not be enabled outside of development environments.

OK, installing a copy of the same executable under a different name seems more
reliable. At least that's how the postmaster->postgres link used to be
handled, if I read the Makefile correctly. Thanks.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#15Andres Freund
andres@anarazel.de
In reply to: Antonin Houska (#14)
Re: Adding REPACK [concurrently]

Hi,

On 2025-08-21 20:14:14 +0200, Antonin Houska wrote:

ok, installing a copy of the same executable with a different name seems more
reliable. At least that's how the postmaster->postgres link used to be
handled, if I read Makefile correctly. Thanks.

I have not followed this thread, but I don't think the whole thing of having a
single executable with multiple names is worth doing. Just make whatever an
option, instead of having multiple "executables".

Greetings,

Andres

#16Robert Treat
rob@xzilla.net
In reply to: Álvaro Herrera (#8)
Re: Adding REPACK [concurrently]

On Tue, Aug 19, 2025 at 2:53 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:

Note choice of shell command name: though all the other programs in
src/bin/scripts do not use the "pg_" prefix, this one does; we thought
it made no sense to follow the old programs as precedent because there
seems to be a lament for the lack of a pg_ prefix in those, and we only
keep them as they are because of their long history. This one has no
history.

Still on pg_repackdb, the implementation here is to install a symlink
called pg_repackdb which points to vacuumdb, and make the program behave
differently when called in this way. The amount of additional code for
this is relatively small, so I think this is a worthy technique --
assuming it works. If it doesn't, Antonin proposed a separate binary
that just calls some functions from vacuumdb. Or maybe we could have a
common source file that both utilities call.

What's the plan for clusterdb? It seems like we'd ideally create a
standalone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.

Robert Treat
https://xzilla.net

#17Álvaro Herrera
alvherre@kurilemu.de
In reply to: Robert Treat (#16)
Re: Adding REPACK [concurrently]

On 2025-Aug-21, Robert Treat wrote:

What's the plan for clusterdb? It seems like we'd ideally create a
standalone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.

I don't think we should remove clusterdb, to avoid breaking any scripts
that work today. As you say, I would create the standalone pg_repackdb
to do what we need it to do (namely: run the REPACK commands) and leave
vacuumdb and clusterdb alone. Removing the obsolete commands and
options can be done in a few years.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#18Euler Taveira
euler@eulerto.com
In reply to: Álvaro Herrera (#17)
Re: Adding REPACK [concurrently]

On Fri, Aug 22, 2025, at 6:40 AM, Álvaro Herrera wrote:

On 2025-Aug-21, Robert Treat wrote:

What's the plan for clusterdb? It seems like we'd ideally create a
standalone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.

I don't think we should remove clusterdb, to avoid breaking any scripts
that work today. As you say, I would create the standalone pg_repackdb
to do what we need it to do (namely: run the REPACK commands) and leave
vacuumdb and clusterdb alone. Removing the obsolete commands and
options can be done in a few years.

I would say that we need to plan the removal of these binaries (clusterdb and
vacuumdb). We can start with a warning in clusterdb saying users should use
pg_repackdb. In a few years, we can remove clusterdb. There were complaints
about binary names without a pg_ prefix in the past [1].

I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
to pg_repackdb. We can add a similar warning message saying they should use
pg_repackdb if the symlink is used.

[1]: /messages/by-id/CAJgfmqXYYKXR+QUhEa3cq6pc8dV0Hu7QvOUccm7R0TkC=T-+=A@mail.gmail.com

--
Euler Taveira
EDB https://www.enterprisedb.com/

#19Michael Banck
mbanck@gmx.net
In reply to: Euler Taveira (#18)
Re: Adding REPACK [concurrently]

Hi,

On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:

On Fri, Aug 22, 2025, at 6:40 AM, Álvaro Herrera wrote:

On 2025-Aug-21, Robert Treat wrote:

What's the plan for clusterdb? It seems like we'd ideally create a
standalone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.

I don't think we should remove clusterdb, to avoid breaking any scripts
that work today. As you say, I would create the standalone pg_repackdb
to do what we need it to do (namely: run the REPACK commands) and leave
vacuumdb and clusterdb alone. Removing the obsolete commands and
options can be done in a few years.

I would say that we need to plan the removal of these binaries (clusterdb and
vacuumdb). We can start with a warning in clusterdb saying users should use
pg_repackdb. In a few years, we can remove clusterdb. There were complaints
about binary names without a pg_ prefix in the past [1].

Yeah.

I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
to pg_repackdb. We can add a similar warning message saying they should use
pg_repackdb if the symlink is used.

Unless pg_repackdb has the same (or a superset of) CLI and behaviour as
vacuumdb (I haven't checked, but doubt it?), I think replacing vacuumdb
with a symlink to pg_repackdb will lead to much more breakage in existing
scripts/automation than clusterdb, which I guess is used orders of
magnitude less frequently than vacuumdb.

Michael

#20Álvaro Herrera
alvherre@kurilemu.de
In reply to: Michael Banck (#19)
Re: Adding REPACK [concurrently]

On 2025-08-23, Michael Banck wrote:

On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:

I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
to pg_repackdb. We can add a similar warning message saying they should use
pg_repackdb if the symlink is used.

Unless pg_repackdb has the same (or a superset of) CLI and behaviour as
vacuumdb (I haven't checked, but doubt it?), I think replacing vacuumdb
with a symlink to pg_repackdb will lead to much more breakage in existing
scripts/automation than clusterdb, which I guess is used orders of
magnitude less frequently than vacuumdb.

Yeah, I completely disagree with the idea of getting rid of vacuumdb.
We can, maybe, in a distant future, get rid of the --full option to
vacuumdb. But the rest of the vacuumdb behavior must stay, I think,
because REPACK is not VACUUM — it is only VACUUM FULL. And we want to
make that distinction very clear.

We can also, in a few years, get rid of clusterdb. But I don't think we
need to deprecate it just yet.

--
Álvaro Herrera

#21Robert Treat
rob@xzilla.net
In reply to: Álvaro Herrera (#20)
Re: Adding REPACK [concurrently]

On Sat, Aug 23, 2025 at 10:23 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-08-23, Michael Banck wrote:

On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:

I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
to pg_repackdb. We can add a similar warning message saying they should use
pg_repackdb if the symlink is used.

Unless pg_repack has the same (or a superset of) CLI and behaviour as
vacuumdb (I haven't checked, but doubt it?), I think replacing vacuumdb
with a symlink to pg_repack will lead to much more breakage in existing
scripts/automation than clusterdb, which I guess is used orders of
magnitude less frequently than vacumdb.

Yeah, I completely disagree with the idea of getting rid of vacuumdb.
We can, maybe, in a distant future, get rid of the --full option to
vacuumdb. But the rest of the vacuumdb behavior must stay, I think,
because REPACK is not VACUUM — it is only VACUUM FULL. And we want to
make that distinction very clear.

Or to put it the other way, VACUUM FULL is not really VACUUM either,
it is really a form of "repack".

We can also, in a few years, get rid of clusterdb. But I don't think we
need to deprecate it just yet.

Yeah, ISTM the long term goal should be two binaries, one of which
manages aspects of clustering/repacking type of activities, and one
which manages vacuum type activities. I don't think that's different
from what Alvaro is proposing; FWIW my original question was about
confirming that was the end goal, but also about understanding the
coordination of when these changes would take place, because the
changes to the code, changes to the SQL commands and their docs, and
changes to the command line tools seem to be moving at different
cadences. That can be fine if it's on purpose, but maybe needs to be
tightened up if not; for example, the current patchset doesn't make
any changes to clusterdb, which one might expect to emit a warning
about being deprecated in favor of pg_repackdb, if not punt completely
to pg_repackdb instead.

Robert Treat
https://xzilla.net

#22Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#1)
6 attachment(s)
Re: Adding REPACK [concurrently]

Hello,

Here's v19 of this patchset. This is mostly Antonin's v18. I added a
preparatory v19-0001 commit, which splits vacuumdb.c to create a new
file, vacuuming.c (and its header file vacuuming.h). If you look at it
under 'git show --color-moved=zebra' you should notice that most of it
is just code movement; there are hardly any actual code changes.

v19-0002 has absorbed Antonin's v18-0005 (the pg_repackdb binary)
together with the introduction of the REPACK command proper; but instead
of using a symlink, I just created a separate pg_repackdb.c source file
for it and we compile that small new source file with vacuuming.c to
create a regular binary. BTW the meson.build changes look somewhat
duplicative; maybe there's a less dumb way to go about this. (For
instance, maybe just have libscripts.a include vacuuming.o, though it's
not used by any of the other programs in that subdir.)
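
To make that concrete, pg_repackdb.c can be little more than a thin
main(); a minimal sketch, assuming vacuuming.h exports an entry point and
a mode enum roughly like this (the real names in the patch may differ):

	#include "postgres_fe.h"

	#include "vacuuming.h"

	int
	main(int argc, char *argv[])
	{
		/*
		 * vacuuming_main() and MODE_REPACK are stand-ins for whatever
		 * vacuuming.h actually exports; all the heavy lifting lives in
		 * vacuuming.c, which vacuumdb links against as well.
		 */
		return vacuuming_main(argc, argv, MODE_REPACK);
	}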

I'm not wedded to the name "vacuuming.c"; happy to take suggestions.

After 0002, the pg_repackdb utility should be ready to take clusterdb's
place, and also vacuumdb --full, with one gotcha: if you try to use
pg_repackdb with an older server version, it will fail, claiming that
REPACK is not supported. This is not ideal. Instead, we should make it
run VACUUM FULL (or CLUSTER); so if you have a fleet including older
servers you can use the new utils there too.
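
In prepare_vacuum_command() terms, the fallback could be as small as the
sketch below; this only shows the idea, and leaves out the question of
when to emit CLUSTER rather than VACUUM FULL:

	if (mode == MODE_REPACK)
	{
		/*
		 * Hypothetical fallback: speak the legacy dialect to servers
		 * that predate REPACK instead of failing with pg_fatal().
		 */
		if (serverVersion < 190000)
			appendPQExpBufferStr(sql, vacopts->verbose ?
								 "VACUUM (FULL, VERBOSE)" : "VACUUM (FULL)");
		else
		{
			appendPQExpBufferStr(sql, "REPACK");
			if (vacopts->verbose)
				appendPQExpBufferStr(sql, " (VERBOSE)");
		}
	}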

All the logic for vacuumdb to select tables to operate on has been moved
to vacuuming.c verbatim. This means this logic applies to pg_repackdb
as well. As long as you stick to repacking a single table this is okay
(read: it won't be used at all), but if you want to use parallel mode
(say to process multiple schemas), we might need to change it. For the
same reason, I think we should add an option to it (--index[=indexname])
to select whether to use the USING INDEX clause or not, and optionally
indicate which index to use; right now there's no way to select which
logic (cluster's or vacuum full's) to use.
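
As for the flag itself, getopt's optional_argument fits --index[=indexname]
naturally; a sketch with placeholder names (the usingindex/indexname fields
would have to be added to vacuumingOptions first):

	static const struct option long_options_repack[] = {
		/* ... the existing long options ... */
		{"index", optional_argument, NULL, 3},
		{NULL, 0, NULL, 0}
	};

	/* ... and in main_repack()'s option loop: */
			case 3:
				vacopts.usingindex = true;	/* emit USING INDEX */
				if (optarg)
					vacopts.indexname = pg_strdup(optarg);
				break;

Note that optional_argument only accepts the attached form
(--index=indexname); a bare --index would then request the USING INDEX
clause without naming an index.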

Then v19-0003 through v19-0005 are Antonin's subsequent patches to add
the CONCURRENTLY option; I have not reviewed these at all, so I'm
including them here just for completeness. I also included v18-0006 as
posted by Mihail previously, though I have little faith that we're going
to include it in this release.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)

Attachments:

v19-0001-Split-vacuumdb-to-create-vacuuming.c-h.patchtext/x-diff; charset=utf-8Download
From 9b7a81619278991f48b91d2f236aede2261493b1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 30 Aug 2025 14:39:49 +0200
Subject: [PATCH v19 1/6] Split vacuumdb to create vacuuming.c/h

---
 src/bin/scripts/Makefile    |    4 +-
 src/bin/scripts/meson.build |   28 +-
 src/bin/scripts/vacuumdb.c  | 1048 +----------------------------------
 src/bin/scripts/vacuuming.c |  978 ++++++++++++++++++++++++++++++++
 src/bin/scripts/vacuuming.h |   95 ++++
 5 files changed, 1119 insertions(+), 1034 deletions(-)
 create mode 100644 src/bin/scripts/vacuuming.c
 create mode 100644 src/bin/scripts/vacuuming.h

diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index f6b4d40810b..019ca06455d 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -28,7 +28,7 @@ createuser: createuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport
 dropdb: dropdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 dropuser: dropuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-vacuumdb: vacuumdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
@@ -50,7 +50,7 @@ uninstall:
 
 clean distclean:
 	rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
-	rm -f common.o $(WIN32RES)
+	rm -f common.o vacuuming.o $(WIN32RES)
 	rm -rf tmp_check
 
 export with_icu
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index 80df7c33257..a4fed59d1c9 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -12,7 +12,6 @@ binaries = [
   'createuser',
   'dropuser',
   'clusterdb',
-  'vacuumdb',
   'reindexdb',
   'pg_isready',
 ]
@@ -35,6 +34,33 @@ foreach binary : binaries
   bin_targets += binary
 endforeach
 
+vacuuming_common = static_library('libvacuuming_common',
+  files('common.c', 'vacuuming.c'),
+  dependencies: [frontend_code, libpq],
+  kwargs: internal_lib_args,
+)
+
+binaries = [
+  'vacuumdb',
+]
+foreach binary : binaries
+  binary_sources = files('@0@.c'.format(binary))
+
+  if host_system == 'windows'
+    binary_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
+      '--NAME', binary,
+      '--FILEDESC', '@0@ - PostgreSQL utility'.format(binary),])
+  endif
+
+  binary = executable(binary,
+    binary_sources,
+    link_with: [vacuuming_common],
+    dependencies: [frontend_code, libpq],
+    kwargs: default_bin_args,
+  )
+  bin_targets += binary
+endforeach
+
 tests += {
   'name': 'scripts',
   'sd': meson.current_source_dir(),
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index fd236087e90..b1be61ddf25 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -14,92 +14,13 @@
 
 #include <limits.h>
 
-#include "catalog/pg_attribute_d.h"
-#include "catalog/pg_class_d.h"
 #include "common.h"
-#include "common/connect.h"
 #include "common/logging.h"
-#include "fe_utils/cancel.h"
 #include "fe_utils/option_utils.h"
-#include "fe_utils/parallel_slot.h"
-#include "fe_utils/query_utils.h"
-#include "fe_utils/simple_list.h"
-#include "fe_utils/string_utils.h"
-
-
-/* vacuum options controlled by user flags */
-typedef struct vacuumingOptions
-{
-	bool		analyze_only;
-	bool		verbose;
-	bool		and_analyze;
-	bool		full;
-	bool		freeze;
-	bool		disable_page_skipping;
-	bool		skip_locked;
-	int			min_xid_age;
-	int			min_mxid_age;
-	int			parallel_workers;	/* >= 0 indicates user specified the
-									 * parallel degree, otherwise -1 */
-	bool		no_index_cleanup;
-	bool		force_index_cleanup;
-	bool		do_truncate;
-	bool		process_main;
-	bool		process_toast;
-	bool		skip_database_stats;
-	char	   *buffer_usage_limit;
-	bool		missing_stats_only;
-} vacuumingOptions;
-
-/* object filter options */
-typedef enum
-{
-	OBJFILTER_NONE = 0,			/* no filter used */
-	OBJFILTER_ALL_DBS = (1 << 0),	/* -a | --all */
-	OBJFILTER_DATABASE = (1 << 1),	/* -d | --dbname */
-	OBJFILTER_TABLE = (1 << 2), /* -t | --table */
-	OBJFILTER_SCHEMA = (1 << 3),	/* -n | --schema */
-	OBJFILTER_SCHEMA_EXCLUDE = (1 << 4),	/* -N | --exclude-schema */
-} VacObjFilter;
-
-static VacObjFilter objfilter = OBJFILTER_NONE;
-
-static SimpleStringList *retrieve_objects(PGconn *conn,
-										  vacuumingOptions *vacopts,
-										  SimpleStringList *objects,
-										  bool echo);
-
-static void vacuum_one_database(ConnParams *cparams,
-								vacuumingOptions *vacopts,
-								int stage,
-								SimpleStringList *objects,
-								SimpleStringList **found_objs,
-								int concurrentCons,
-								const char *progname, bool echo, bool quiet);
-
-static void vacuum_all_databases(ConnParams *cparams,
-								 vacuumingOptions *vacopts,
-								 bool analyze_in_stages,
-								 SimpleStringList *objects,
-								 int concurrentCons,
-								 const char *progname, bool echo, bool quiet);
-
-static void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
-								   vacuumingOptions *vacopts, const char *table);
-
-static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+#include "vacuuming.h"
 
 static void help(const char *progname);
-
-void		check_objfilter(void);
-
-static char *escape_quotes(const char *src);
-
-/* For analyze-in-stages mode */
-#define ANALYZE_NO_STAGE	-1
-#define ANALYZE_NUM_STAGES	3
-
+static void check_objfilter(void);
 
 int
 main(int argc, char *argv[])
@@ -145,10 +66,6 @@ main(int argc, char *argv[])
 	int			c;
 	const char *dbname = NULL;
 	const char *maintenance_db = NULL;
-	char	   *host = NULL;
-	char	   *port = NULL;
-	char	   *username = NULL;
-	enum trivalue prompt_password = TRI_DEFAULT;
 	ConnParams	cparams;
 	bool		echo = false;
 	bool		quiet = false;
@@ -168,13 +85,18 @@ main(int argc, char *argv[])
 	vacopts.process_main = true;
 	vacopts.process_toast = true;
 
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
 	pg_logging_init(argv[0]);
 	progname = get_progname(argv[0]);
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
 
-	handle_help_version_opts(argc, argv, "vacuumdb", help);
+	handle_help_version_opts(argc, argv, progname, help);
 
-	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ",
+							long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -195,7 +117,7 @@ main(int argc, char *argv[])
 				vacopts.freeze = true;
 				break;
 			case 'h':
-				host = pg_strdup(optarg);
+				cparams.pghost = pg_strdup(optarg);
 				break;
 			case 'j':
 				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
@@ -211,7 +133,7 @@ main(int argc, char *argv[])
 				simple_string_list_append(&objects, optarg);
 				break;
 			case 'p':
-				port = pg_strdup(optarg);
+				cparams.pgport = pg_strdup(optarg);
 				break;
 			case 'P':
 				if (!option_parse_int(optarg, "-P/--parallel", 0, INT_MAX,
@@ -227,16 +149,16 @@ main(int argc, char *argv[])
 				tbl_count++;
 				break;
 			case 'U':
-				username = pg_strdup(optarg);
+				cparams.pguser = pg_strdup(optarg);
 				break;
 			case 'v':
 				vacopts.verbose = true;
 				break;
 			case 'w':
-				prompt_password = TRI_NO;
+				cparams.prompt_password = TRI_NO;
 				break;
 			case 'W':
-				prompt_password = TRI_YES;
+				cparams.prompt_password = TRI_YES;
 				break;
 			case 'z':
 				vacopts.and_analyze = true;
@@ -380,66 +302,9 @@ main(int argc, char *argv[])
 		pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
 				 "missing-stats-only", "analyze-only", "analyze-in-stages");
 
-	/* fill cparams except for dbname, which is set below */
-	cparams.pghost = host;
-	cparams.pgport = port;
-	cparams.pguser = username;
-	cparams.prompt_password = prompt_password;
-	cparams.override_dbname = NULL;
-
-	setup_cancel_handler(NULL);
-
-	/* Avoid opening extra connections. */
-	if (tbl_count && (concurrentCons > tbl_count))
-		concurrentCons = tbl_count;
-
-	if (objfilter & OBJFILTER_ALL_DBS)
-	{
-		cparams.dbname = maintenance_db;
-
-		vacuum_all_databases(&cparams, &vacopts,
-							 analyze_in_stages,
-							 &objects,
-							 concurrentCons,
-							 progname, echo, quiet);
-	}
-	else
-	{
-		if (dbname == NULL)
-		{
-			if (getenv("PGDATABASE"))
-				dbname = getenv("PGDATABASE");
-			else if (getenv("PGUSER"))
-				dbname = getenv("PGUSER");
-			else
-				dbname = get_user_name_or_exit(progname);
-		}
-
-		cparams.dbname = dbname;
-
-		if (analyze_in_stages)
-		{
-			int			stage;
-			SimpleStringList *found_objs = NULL;
-
-			for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
-			{
-				vacuum_one_database(&cparams, &vacopts,
-									stage,
-									&objects,
-									vacopts.missing_stats_only ? &found_objs : NULL,
-									concurrentCons,
-									progname, echo, quiet);
-			}
-		}
-		else
-			vacuum_one_database(&cparams, &vacopts,
-								ANALYZE_NO_STAGE,
-								&objects, NULL,
-								concurrentCons,
-								progname, echo, quiet);
-	}
-
+	vacuuming_main(&cparams, dbname, maintenance_db, &vacopts, &objects,
+				   analyze_in_stages, tbl_count, concurrentCons,
+				   progname, echo, quiet);
 	exit(0);
 }
 
@@ -466,885 +331,6 @@ check_objfilter(void)
 		pg_fatal("cannot vacuum all tables in schema(s) and exclude schema(s) at the same time");
 }
 
-/*
- * Returns a newly malloc'd version of 'src' with escaped single quotes and
- * backslashes.
- */
-static char *
-escape_quotes(const char *src)
-{
-	char	   *result = escape_single_quotes_ascii(src);
-
-	if (!result)
-		pg_fatal("out of memory");
-	return result;
-}
-
-/*
- * vacuum_one_database
- *
- * Process tables in the given database.
- *
- * There are two ways to specify the list of objects to process:
- *
- * 1) The "found_objs" parameter is a double pointer to a fully qualified list
- *    of objects to process, as returned by a previous call to
- *    vacuum_one_database().
- *
- *     a) If both "found_objs" (the double pointer) and "*found_objs" (the
- *        once-dereferenced double pointer) are not NULL, this list takes
- *        priority, and anything specified in "objects" is ignored.
- *
- *     b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
- *        (the once-dereferenced double pointer) _is_ NULL, the "objects"
- *        parameter takes priority, and the results of the catalog query
- *        described in (2) are stored in "found_objs".
- *
- *     c) If "found_objs" (the double pointer) is NULL, the "objects"
- *        parameter again takes priority, and the results of the catalog query
- *        are not saved.
- *
- * 2) The "objects" parameter is a user-specified list of objects to process.
- *    When (1b) or (1c) applies, this function performs a catalog query to
- *    retrieve a fully qualified list of objects to process, as described
- *    below.
- *
- *     a) If "objects" is not NULL, the catalog query gathers only the objects
- *        listed in "objects".
- *
- *     b) If "objects" is NULL, all tables in the database are gathered.
- *
- * Note that this function is only concerned with running exactly one stage
- * when in analyze-in-stages mode; caller must iterate on us if necessary.
- *
- * If concurrentCons is > 1, multiple connections are used to vacuum tables
- * in parallel.
- */
-static void
-vacuum_one_database(ConnParams *cparams,
-					vacuumingOptions *vacopts,
-					int stage,
-					SimpleStringList *objects,
-					SimpleStringList **found_objs,
-					int concurrentCons,
-					const char *progname, bool echo, bool quiet)
-{
-	PQExpBufferData sql;
-	PGconn	   *conn;
-	SimpleStringListCell *cell;
-	ParallelSlotArray *sa;
-	int			ntups = 0;
-	bool		failed = false;
-	const char *initcmd;
-	SimpleStringList *ret = NULL;
-	const char *stage_commands[] = {
-		"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
-		"SET default_statistics_target=10; RESET vacuum_cost_delay;",
-		"RESET default_statistics_target;"
-	};
-	const char *stage_messages[] = {
-		gettext_noop("Generating minimal optimizer statistics (1 target)"),
-		gettext_noop("Generating medium optimizer statistics (10 targets)"),
-		gettext_noop("Generating default (full) optimizer statistics")
-	};
-
-	Assert(stage == ANALYZE_NO_STAGE ||
-		   (stage >= 0 && stage < ANALYZE_NUM_STAGES));
-
-	conn = connectDatabase(cparams, progname, echo, false, true);
-
-	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "disable-page-skipping", "9.6");
-	}
-
-	if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-index-cleanup", "12");
-	}
-
-	if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "force-index-cleanup", "12");
-	}
-
-	if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-truncate", "12");
-	}
-
-	if (!vacopts->process_main && PQserverVersion(conn) < 160000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-process-main", "16");
-	}
-
-	if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-process-toast", "14");
-	}
-
-	if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "skip-locked", "12");
-	}
-
-	if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--min-xid-age", "9.6");
-	}
-
-	if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--min-mxid-age", "9.6");
-	}
-
-	if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--parallel", "13");
-	}
-
-	if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--buffer-usage-limit", "16");
-	}
-
-	if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--missing-stats-only", "15");
-	}
-
-	/* skip_database_stats is used automatically if server supports it */
-	vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
-
-	if (!quiet)
-	{
-		if (stage != ANALYZE_NO_STAGE)
-			printf(_("%s: processing database \"%s\": %s\n"),
-				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
-			printf(_("%s: vacuuming database \"%s\"\n"),
-				   progname, PQdb(conn));
-		fflush(stdout);
-	}
-
-	/*
-	 * If the caller provided the results of a previous catalog query, just
-	 * use that.  Otherwise, run the catalog query ourselves and set the
-	 * return variable if provided.
-	 */
-	if (found_objs && *found_objs)
-		ret = *found_objs;
-	else
-	{
-		ret = retrieve_objects(conn, vacopts, objects, echo);
-		if (found_objs)
-			*found_objs = ret;
-	}
-
-	/*
-	 * Count the number of objects in the catalog query result.  If there are
-	 * none, we are done.
-	 */
-	for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
-		ntups++;
-
-	if (ntups == 0)
-	{
-		PQfinish(conn);
-		return;
-	}
-
-	/*
-	 * Ensure concurrentCons is sane.  If there are more connections than
-	 * vacuumable relations, we don't need to use them all.
-	 */
-	if (concurrentCons > ntups)
-		concurrentCons = ntups;
-	if (concurrentCons <= 0)
-		concurrentCons = 1;
-
-	/*
-	 * All slots need to be prepared to run the appropriate analyze stage, if
-	 * caller requested that mode.  We have to prepare the initial connection
-	 * ourselves before setting up the slots.
-	 */
-	if (stage == ANALYZE_NO_STAGE)
-		initcmd = NULL;
-	else
-	{
-		initcmd = stage_commands[stage];
-		executeCommand(conn, initcmd, echo);
-	}
-
-	/*
-	 * Setup the database connections. We reuse the connection we already have
-	 * for the first slot.  If not in parallel mode, the first slot in the
-	 * array contains the connection.
-	 */
-	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
-	ParallelSlotsAdoptConn(sa, conn);
-
-	initPQExpBuffer(&sql);
-
-	cell = ret->head;
-	do
-	{
-		const char *tabname = cell->val;
-		ParallelSlot *free_slot;
-
-		if (CancelRequested)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		free_slot = ParallelSlotsGetIdle(sa, NULL);
-		if (!free_slot)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
-							   vacopts, tabname);
-
-		/*
-		 * Execute the vacuum.  All errors are handled in processQueryResult
-		 * through ParallelSlotsGetIdle.
-		 */
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
-						   echo, tabname);
-
-		cell = cell->next;
-	} while (cell != NULL);
-
-	if (!ParallelSlotsWaitCompletion(sa))
-	{
-		failed = true;
-		goto finish;
-	}
-
-	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
-	if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
-	{
-		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
-		ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
-		if (!free_slot)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
-		if (!ParallelSlotsWaitCompletion(sa))
-			failed = true;
-	}
-
-finish:
-	ParallelSlotsTerminate(sa);
-	pg_free(sa);
-
-	termPQExpBuffer(&sql);
-
-	if (failed)
-		exit(1);
-}
-
-/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path (which
- * could be unsafe), everything in this query MUST be fully qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their optional
- * set of columns.  This is used to match any provided column lists with the
- * generated qualified identifiers and to filter for the tables provided via
- * --table.  If a listed table does not exist, the catalog query will fail.
- */
-static SimpleStringList *
-retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
-				 SimpleStringList *objects, bool echo)
-{
-	PQExpBufferData buf;
-	PQExpBufferData catalog_query;
-	PGresult   *res;
-	SimpleStringListCell *cell;
-	SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
-	bool		objects_listed = false;
-
-	initPQExpBuffer(&catalog_query);
-	for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
-	{
-		char	   *just_table = NULL;
-		const char *just_columns = NULL;
-
-		if (!objects_listed)
-		{
-			appendPQExpBufferStr(&catalog_query,
-								 "WITH listed_objects (object_oid, column_list) "
-								 "AS (\n  VALUES (");
-			objects_listed = true;
-		}
-		else
-			appendPQExpBufferStr(&catalog_query, ",\n  (");
-
-		if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
-		{
-			appendStringLiteralConn(&catalog_query, cell->val, conn);
-			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
-		}
-
-		if (objfilter & OBJFILTER_TABLE)
-		{
-			/*
-			 * Split relation and column names given by the user, this is used
-			 * to feed the CTE with values on which are performed pre-run
-			 * validity checks as well.  For now these happen only on the
-			 * relation name.
-			 */
-			splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
-								  &just_table, &just_columns);
-
-			appendStringLiteralConn(&catalog_query, just_table, conn);
-			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
-		}
-
-		if (just_columns && just_columns[0] != '\0')
-			appendStringLiteralConn(&catalog_query, just_columns, conn);
-		else
-			appendPQExpBufferStr(&catalog_query, "NULL");
-
-		appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
-
-		pg_free(just_table);
-	}
-
-	/* Finish formatting the CTE */
-	if (objects_listed)
-		appendPQExpBufferStr(&catalog_query, "\n)\n");
-
-	appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
-
-	if (objects_listed)
-		appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
-
-	appendPQExpBufferStr(&catalog_query,
-						 " FROM pg_catalog.pg_class c\n"
-						 " JOIN pg_catalog.pg_namespace ns"
-						 " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
-						 " CROSS JOIN LATERAL (SELECT c.relkind IN ("
-						 CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
-						 CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
-						 " LEFT JOIN pg_catalog.pg_class t"
-						 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
-
-	/*
-	 * Used to match the tables or schemas listed by the user, completing the
-	 * JOIN clause.
-	 */
-	if (objects_listed)
-	{
-		appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
-							 " ON listed_objects.object_oid"
-							 " OPERATOR(pg_catalog.=) ");
-
-		if (objfilter & OBJFILTER_TABLE)
-			appendPQExpBufferStr(&catalog_query, "c.oid\n");
-		else
-			appendPQExpBufferStr(&catalog_query, "ns.oid\n");
-	}
-
-	/*
-	 * Exclude temporary tables, beginning the WHERE clause.
-	 */
-	appendPQExpBufferStr(&catalog_query,
-						 " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
-						 CppAsString2(RELPERSISTENCE_TEMP) "\n");
-
-	/*
-	 * Used to match the tables or schemas listed by the user, for the WHERE
-	 * clause.
-	 */
-	if (objects_listed)
-	{
-		if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
-			appendPQExpBufferStr(&catalog_query,
-								 " AND listed_objects.object_oid IS NULL\n");
-		else
-			appendPQExpBufferStr(&catalog_query,
-								 " AND listed_objects.object_oid IS NOT NULL\n");
-	}
-
-	/*
-	 * If no tables were listed, filter for the relevant relation types.  If
-	 * tables were given via --table, don't bother filtering by relation type.
-	 * Instead, let the server decide whether a given relation can be
-	 * processed in which case the user will know about it.
-	 */
-	if ((objfilter & OBJFILTER_TABLE) == 0)
-	{
-		/*
-		 * vacuumdb should generally follow the behavior of the underlying
-		 * VACUUM and ANALYZE commands. If analyze_only is true, process
-		 * regular tables, materialized views, and partitioned tables, just
-		 * like ANALYZE (with no specific target tables) does. Otherwise,
-		 * process only regular tables and materialized views, since VACUUM
-		 * skips partitioned tables when no target tables are specified.
-		 */
-		if (vacopts->analyze_only)
-			appendPQExpBufferStr(&catalog_query,
-								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
-								 CppAsString2(RELKIND_RELATION) ", "
-								 CppAsString2(RELKIND_MATVIEW) ", "
-								 CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
-		else
-			appendPQExpBufferStr(&catalog_query,
-								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
-								 CppAsString2(RELKIND_RELATION) ", "
-								 CppAsString2(RELKIND_MATVIEW) "])\n");
-
-	}
-
-	/*
-	 * For --min-xid-age and --min-mxid-age, the age of the relation is the
-	 * greatest of the ages of the main relation and its associated TOAST
-	 * table.  The commands generated by vacuumdb will also process the TOAST
-	 * table for the relation if necessary, so it does not need to be
-	 * considered separately.
-	 */
-	if (vacopts->min_xid_age != 0)
-	{
-		appendPQExpBuffer(&catalog_query,
-						  " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
-						  " pg_catalog.age(t.relfrozenxid)) "
-						  " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
-						  " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
-						  " '0'::pg_catalog.xid\n",
-						  vacopts->min_xid_age);
-	}
-
-	if (vacopts->min_mxid_age != 0)
-	{
-		appendPQExpBuffer(&catalog_query,
-						  " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
-						  " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
-						  " '%d'::pg_catalog.int4\n"
-						  " AND c.relminmxid OPERATOR(pg_catalog.!=)"
-						  " '0'::pg_catalog.xid\n",
-						  vacopts->min_mxid_age);
-	}
-
-	if (vacopts->missing_stats_only)
-	{
-		appendPQExpBufferStr(&catalog_query, " AND (\n");
-
-		/* regular stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
-							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* extended stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
-							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
-							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
-							 " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* expression indexes */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " JOIN pg_catalog.pg_index i"
-							 " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
-							 " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* inheritance and regular stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
-							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
-							 " AND c.relhassubclass\n"
-							 " AND NOT p.inherited\n"
-							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
-							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit))\n");
-
-		/* inheritance and extended stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
-							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND c.relhassubclass\n"
-							 " AND NOT p.inherited\n"
-							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
-							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
-							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
-							 " AND d.stxdinherit))\n");
-
-		appendPQExpBufferStr(&catalog_query, " )\n");
-	}
-
-	/*
-	 * Execute the catalog query.  We use the default search_path for this
-	 * query for consistency with table lookups done elsewhere by the user.
-	 */
-	appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
-	executeCommand(conn, "RESET search_path;", echo);
-	res = executeQuery(conn, catalog_query.data, echo);
-	termPQExpBuffer(&catalog_query);
-	PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
-
-	/*
-	 * Build qualified identifiers for each table, including the column list
-	 * if given.
-	 */
-	initPQExpBuffer(&buf);
-	for (int i = 0; i < PQntuples(res); i++)
-	{
-		appendPQExpBufferStr(&buf,
-							 fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
-											   PQgetvalue(res, i, 0),
-											   PQclientEncoding(conn)));
-
-		if (objects_listed && !PQgetisnull(res, i, 2))
-			appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
-
-		simple_string_list_append(found_objs, buf.data);
-		resetPQExpBuffer(&buf);
-	}
-	termPQExpBuffer(&buf);
-	PQclear(res);
-
-	return found_objs;
-}
-
-/*
- * Vacuum/analyze all connectable databases.
- *
- * In analyze-in-stages mode, we process all databases in one stage before
- * moving on to the next stage.  That ensure minimal stats are available
- * quickly everywhere before generating more detailed ones.
- */
-static void
-vacuum_all_databases(ConnParams *cparams,
-					 vacuumingOptions *vacopts,
-					 bool analyze_in_stages,
-					 SimpleStringList *objects,
-					 int concurrentCons,
-					 const char *progname, bool echo, bool quiet)
-{
-	PGconn	   *conn;
-	PGresult   *result;
-	int			stage;
-	int			i;
-
-	conn = connectMaintenanceDatabase(cparams, progname, echo);
-	result = executeQuery(conn,
-						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
-						  echo);
-	PQfinish(conn);
-
-	if (analyze_in_stages)
-	{
-		SimpleStringList **found_objs = NULL;
-
-		if (vacopts->missing_stats_only)
-			found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
-
-		/*
-		 * When analyzing all databases in stages, we analyze them all in the
-		 * fastest stage first, so that initial statistics become available
-		 * for all of them as soon as possible.
-		 *
-		 * This means we establish several times as many connections, but
-		 * that's a secondary consideration.
-		 */
-		for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
-		{
-			for (i = 0; i < PQntuples(result); i++)
-			{
-				cparams->override_dbname = PQgetvalue(result, i, 0);
-
-				vacuum_one_database(cparams, vacopts,
-									stage,
-									objects,
-									vacopts->missing_stats_only ? &found_objs[i] : NULL,
-									concurrentCons,
-									progname, echo, quiet);
-			}
-		}
-	}
-	else
-	{
-		for (i = 0; i < PQntuples(result); i++)
-		{
-			cparams->override_dbname = PQgetvalue(result, i, 0);
-
-			vacuum_one_database(cparams, vacopts,
-								ANALYZE_NO_STAGE,
-								objects, NULL,
-								concurrentCons,
-								progname, echo, quiet);
-		}
-	}
-
-	PQclear(result);
-}
-
-/*
- * Construct a vacuum/analyze command to run based on the given options, in the
- * given string buffer, which may contain previous garbage.
- *
- * The table name used must be already properly quoted.  The command generated
- * depends on the server version involved and it is semicolon-terminated.
- */
-static void
-prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
-					   vacuumingOptions *vacopts, const char *table)
-{
-	const char *paren = " (";
-	const char *comma = ", ";
-	const char *sep = paren;
-
-	resetPQExpBuffer(sql);
-
-	if (vacopts->analyze_only)
-	{
-		appendPQExpBufferStr(sql, "ANALYZE");
-
-		/* parenthesized grammar of ANALYZE is supported since v11 */
-		if (serverVersion >= 110000)
-		{
-			if (vacopts->skip_locked)
-			{
-				/* SKIP_LOCKED is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
-				sep = comma;
-			}
-			if (vacopts->verbose)
-			{
-				appendPQExpBuffer(sql, "%sVERBOSE", sep);
-				sep = comma;
-			}
-			if (vacopts->buffer_usage_limit)
-			{
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
-								  vacopts->buffer_usage_limit);
-				sep = comma;
-			}
-			if (sep != paren)
-				appendPQExpBufferChar(sql, ')');
-		}
-		else
-		{
-			if (vacopts->verbose)
-				appendPQExpBufferStr(sql, " VERBOSE");
-		}
-	}
-	else
-	{
-		appendPQExpBufferStr(sql, "VACUUM");
-
-		/* parenthesized grammar of VACUUM is supported since v9.0 */
-		if (serverVersion >= 90000)
-		{
-			if (vacopts->disable_page_skipping)
-			{
-				/* DISABLE_PAGE_SKIPPING is supported since v9.6 */
-				Assert(serverVersion >= 90600);
-				appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
-				sep = comma;
-			}
-			if (vacopts->no_index_cleanup)
-			{
-				/* "INDEX_CLEANUP FALSE" has been supported since v12 */
-				Assert(serverVersion >= 120000);
-				Assert(!vacopts->force_index_cleanup);
-				appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
-				sep = comma;
-			}
-			if (vacopts->force_index_cleanup)
-			{
-				/* "INDEX_CLEANUP TRUE" has been supported since v12 */
-				Assert(serverVersion >= 120000);
-				Assert(!vacopts->no_index_cleanup);
-				appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
-				sep = comma;
-			}
-			if (!vacopts->do_truncate)
-			{
-				/* TRUNCATE is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
-				sep = comma;
-			}
-			if (!vacopts->process_main)
-			{
-				/* PROCESS_MAIN is supported since v16 */
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
-				sep = comma;
-			}
-			if (!vacopts->process_toast)
-			{
-				/* PROCESS_TOAST is supported since v14 */
-				Assert(serverVersion >= 140000);
-				appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
-				sep = comma;
-			}
-			if (vacopts->skip_database_stats)
-			{
-				/* SKIP_DATABASE_STATS is supported since v16 */
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
-				sep = comma;
-			}
-			if (vacopts->skip_locked)
-			{
-				/* SKIP_LOCKED is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
-				sep = comma;
-			}
-			if (vacopts->full)
-			{
-				appendPQExpBuffer(sql, "%sFULL", sep);
-				sep = comma;
-			}
-			if (vacopts->freeze)
-			{
-				appendPQExpBuffer(sql, "%sFREEZE", sep);
-				sep = comma;
-			}
-			if (vacopts->verbose)
-			{
-				appendPQExpBuffer(sql, "%sVERBOSE", sep);
-				sep = comma;
-			}
-			if (vacopts->and_analyze)
-			{
-				appendPQExpBuffer(sql, "%sANALYZE", sep);
-				sep = comma;
-			}
-			if (vacopts->parallel_workers >= 0)
-			{
-				/* PARALLEL is supported since v13 */
-				Assert(serverVersion >= 130000);
-				appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
-								  vacopts->parallel_workers);
-				sep = comma;
-			}
-			if (vacopts->buffer_usage_limit)
-			{
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
-								  vacopts->buffer_usage_limit);
-				sep = comma;
-			}
-			if (sep != paren)
-				appendPQExpBufferChar(sql, ')');
-		}
-		else
-		{
-			if (vacopts->full)
-				appendPQExpBufferStr(sql, " FULL");
-			if (vacopts->freeze)
-				appendPQExpBufferStr(sql, " FREEZE");
-			if (vacopts->verbose)
-				appendPQExpBufferStr(sql, " VERBOSE");
-			if (vacopts->and_analyze)
-				appendPQExpBufferStr(sql, " ANALYZE");
-		}
-	}
-
-	appendPQExpBuffer(sql, " %s;", table);
-}
-
-/*
- * Send a vacuum/analyze command to the server, returning after sending the
- * command.
- *
- * Any errors during command execution are reported to stderr.
- */
-static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
-{
-	bool		status;
-
-	if (echo)
-		printf("%s\n", sql);
-
-	status = PQsendQuery(conn, sql) == 1;
-
-	if (!status)
-	{
-		if (table)
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
-		else
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
-	}
-}
 
 static void
 help(const char *progname)
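
(For orientation while reading: the vacuuming.h hunk appears further down
in this attachment. Reconstructed from the declarations moved out of
vacuumdb.c above, the header amounts to roughly the following sketch; see
the patch itself for the authoritative version.)

/*
 * vacuuming.h
 *		Declarations shared by vacuumdb.c and vacuuming.c (sketch)
 */
#ifndef VACUUMING_H
#define VACUUMING_H

#include "fe_utils/connect_utils.h" /* ConnParams */
#include "fe_utils/simple_list.h"	/* SimpleStringList */
#include "pqexpbuffer.h"

/* For analyze-in-stages mode */
#define ANALYZE_NO_STAGE	-1
#define ANALYZE_NUM_STAGES	3

/* vacuum options controlled by user flags */
typedef struct vacuumingOptions
{
	bool		analyze_only;
	bool		verbose;
	bool		and_analyze;
	bool		full;
	bool		freeze;
	bool		disable_page_skipping;
	bool		skip_locked;
	int			min_xid_age;
	int			min_mxid_age;
	int			parallel_workers;	/* >= 0 if user-specified, else -1 */
	bool		no_index_cleanup;
	bool		force_index_cleanup;
	bool		do_truncate;
	bool		process_main;
	bool		process_toast;
	bool		skip_database_stats;
	char	   *buffer_usage_limit;
	bool		missing_stats_only;
} vacuumingOptions;

/* object filter options */
typedef enum
{
	OBJFILTER_NONE = 0,			/* no filter used */
	OBJFILTER_ALL_DBS = (1 << 0),	/* -a | --all */
	OBJFILTER_DATABASE = (1 << 1),	/* -d | --dbname */
	OBJFILTER_TABLE = (1 << 2), /* -t | --table */
	OBJFILTER_SCHEMA = (1 << 3),	/* -n | --schema */
	OBJFILTER_SCHEMA_EXCLUDE = (1 << 4),	/* -N | --exclude-schema */
} VacObjFilter;

extern VacObjFilter objfilter;

extern void vacuuming_main(ConnParams *cparams, const char *dbname,
						   const char *maintenance_db,
						   vacuumingOptions *vacopts,
						   SimpleStringList *objects,
						   bool analyze_in_stages, int tbl_count,
						   int concurrentCons, const char *progname,
						   bool echo, bool quiet);
extern void vacuum_one_database(ConnParams *cparams,
								vacuumingOptions *vacopts, int stage,
								SimpleStringList *objects,
								SimpleStringList **found_objs,
								int concurrentCons, const char *progname,
								bool echo, bool quiet);
extern void vacuum_all_databases(ConnParams *cparams,
								 vacuumingOptions *vacopts,
								 bool analyze_in_stages,
								 SimpleStringList *objects,
								 int concurrentCons, const char *progname,
								 bool echo, bool quiet);
extern SimpleStringList *retrieve_objects(PGconn *conn,
										  vacuumingOptions *vacopts,
										  SimpleStringList *objects,
										  bool echo);
extern void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
								   vacuumingOptions *vacopts,
								   const char *table);
extern void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
							   const char *table);

#endif							/* VACUUMING_H */
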
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
new file mode 100644
index 00000000000..9be37fcc45a
--- /dev/null
+++ b/src/bin/scripts/vacuuming.c
@@ -0,0 +1,978 @@
+/*-------------------------------------------------------------------------
+ * vacuuming.c
+ *		Common routines for vacuumdb
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/scripts/vacuuming.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "catalog/pg_attribute_d.h"
+#include "catalog/pg_class_d.h"
+#include "common/connect.h"
+#include "common/logging.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/string_utils.h"
+#include "vacuuming.h"
+
+VacObjFilter objfilter = OBJFILTER_NONE;
+
+
+/*
+ * Executes vacuum/analyze as indicated, or dies in case of failure.
+ */
+void
+vacuuming_main(ConnParams *cparams, const char *dbname,
+			   const char *maintenance_db, vacuumingOptions *vacopts,
+			   SimpleStringList *objects, bool analyze_in_stages,
+			   int tbl_count, int concurrentCons,
+			   const char *progname, bool echo, bool quiet)
+{
+	setup_cancel_handler(NULL);
+
+	/* Avoid opening extra connections. */
+	if (tbl_count && (concurrentCons > tbl_count))
+		concurrentCons = tbl_count;
+
+	if (objfilter & OBJFILTER_ALL_DBS)
+	{
+		cparams->dbname = maintenance_db;
+
+		vacuum_all_databases(cparams, vacopts,
+							 analyze_in_stages,
+							 objects,
+							 concurrentCons,
+							 progname, echo, quiet);
+	}
+	else
+	{
+		if (dbname == NULL)
+		{
+			if (getenv("PGDATABASE"))
+				dbname = getenv("PGDATABASE");
+			else if (getenv("PGUSER"))
+				dbname = getenv("PGUSER");
+			else
+				dbname = get_user_name_or_exit(progname);
+		}
+
+		cparams->dbname = dbname;
+
+		if (analyze_in_stages)
+		{
+			int			stage;
+			SimpleStringList *found_objs = NULL;
+
+			for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+			{
+				vacuum_one_database(cparams, vacopts,
+									stage,
+									objects,
+									vacopts->missing_stats_only ? &found_objs : NULL,
+									concurrentCons,
+									progname, echo, quiet);
+			}
+		}
+		else
+			vacuum_one_database(cparams, vacopts,
+								ANALYZE_NO_STAGE,
+								objects, NULL,
+								concurrentCons,
+								progname, echo, quiet);
+	}
+}
+
+
+/*
+ * vacuum_one_database
+ *
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ *    of objects to process, as returned by a previous call to
+ *    vacuum_one_database().
+ *
+ *     a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ *        once-dereferenced double pointer) are not NULL, this list takes
+ *        priority, and anything specified in "objects" is ignored.
+ *
+ *     b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ *        (the once-dereferenced double pointer) _is_ NULL, the "objects"
+ *        parameter takes priority, and the results of the catalog query
+ *        described in (2) are stored in "found_objs".
+ *
+ *     c) If "found_objs" (the double pointer) is NULL, the "objects"
+ *        parameter again takes priority, and the results of the catalog query
+ *        are not saved.
+ *
+ * 2) The "objects" parameter is a user-specified list of objects to process.
+ *    When (1b) or (1c) applies, this function performs a catalog query to
+ *    retrieve a fully qualified list of objects to process, as described
+ *    below.
+ *
+ *     a) If "objects" is not NULL, the catalog query gathers only the objects
+ *        listed in "objects".
+ *
+ *     b) If "objects" is NULL, all tables in the database are gathered.
+ *
+ * Note that this function is only concerned with running exactly one stage
+ * when in analyze-in-stages mode; caller must iterate on us if necessary.
+ *
+ * If concurrentCons is > 1, multiple connections are used to vacuum tables
+ * in parallel.
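+ *
+ * For example, to run all three analyze stages while issuing the catalog
+ * query only once (as vacuuming_main does for --missing-stats-only), a
+ * caller passes the same pointer every time; the first call fills it in
+ * and subsequent calls skip the query:
+ *
+ *		SimpleStringList *found = NULL;
+ *
+ *		for (int stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+ *			vacuum_one_database(cparams, vacopts, stage, objects,
+ *								&found, concurrentCons,
+ *								progname, echo, quiet);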
+ */
+void
+vacuum_one_database(ConnParams *cparams,
+					vacuumingOptions *vacopts,
+					int stage,
+					SimpleStringList *objects,
+					SimpleStringList **found_objs,
+					int concurrentCons,
+					const char *progname, bool echo, bool quiet)
+{
+	PQExpBufferData sql;
+	PGconn	   *conn;
+	SimpleStringListCell *cell;
+	ParallelSlotArray *sa;
+	int			ntups = 0;
+	bool		failed = false;
+	const char *initcmd;
+	SimpleStringList *ret = NULL;
+	const char *stage_commands[] = {
+		"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+		"SET default_statistics_target=10; RESET vacuum_cost_delay;",
+		"RESET default_statistics_target;"
+	};
+	const char *stage_messages[] = {
+		gettext_noop("Generating minimal optimizer statistics (1 target)"),
+		gettext_noop("Generating medium optimizer statistics (10 targets)"),
+		gettext_noop("Generating default (full) optimizer statistics")
+	};
+
+	Assert(stage == ANALYZE_NO_STAGE ||
+		   (stage >= 0 && stage < ANALYZE_NUM_STAGES));
+
+	conn = connectDatabase(cparams, progname, echo, false, true);
+
+	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "disable-page-skipping", "9.6");
+	}
+
+	if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-index-cleanup", "12");
+	}
+
+	if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "force-index-cleanup", "12");
+	}
+
+	if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-truncate", "12");
+	}
+
+	if (!vacopts->process_main && PQserverVersion(conn) < 160000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-process-main", "16");
+	}
+
+	if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-process-toast", "14");
+	}
+
+	if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "skip-locked", "12");
+	}
+
+	if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--min-xid-age", "9.6");
+	}
+
+	if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--min-mxid-age", "9.6");
+	}
+
+	if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--parallel", "13");
+	}
+
+	if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--buffer-usage-limit", "16");
+	}
+
+	if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--missing-stats-only", "15");
+	}
+
+	/* skip_database_stats is used automatically if server supports it */
+	vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
+
+	if (!quiet)
+	{
+		if (stage != ANALYZE_NO_STAGE)
+			printf(_("%s: processing database \"%s\": %s\n"),
+				   progname, PQdb(conn), _(stage_messages[stage]));
+		else
+			printf(_("%s: vacuuming database \"%s\"\n"),
+				   progname, PQdb(conn));
+		fflush(stdout);
+	}
+
+	/*
+	 * If the caller provided the results of a previous catalog query, just
+	 * use that.  Otherwise, run the catalog query ourselves and set the
+	 * return variable if provided.
+	 */
+	if (found_objs && *found_objs)
+		ret = *found_objs;
+	else
+	{
+		ret = retrieve_objects(conn, vacopts, objects, echo);
+		if (found_objs)
+			*found_objs = ret;
+	}
+
+	/*
+	 * Count the number of objects in the catalog query result.  If there are
+	 * none, we are done.
+	 */
+	for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+		ntups++;
+
+	if (ntups == 0)
+	{
+		PQfinish(conn);
+		return;
+	}
+
+	/*
+	 * Ensure concurrentCons is sane.  If there are more connections than
+	 * vacuumable relations, we don't need to use them all.
+	 */
+	if (concurrentCons > ntups)
+		concurrentCons = ntups;
+	if (concurrentCons <= 0)
+		concurrentCons = 1;
+
+	/*
+	 * All slots need to be prepared to run the appropriate analyze stage, if
+	 * caller requested that mode.  We have to prepare the initial connection
+	 * ourselves before setting up the slots.
+	 */
+	if (stage == ANALYZE_NO_STAGE)
+		initcmd = NULL;
+	else
+	{
+		initcmd = stage_commands[stage];
+		executeCommand(conn, initcmd, echo);
+	}
+
+	/*
+	 * Set up the database connections. We reuse the connection we already have
+	 * for the first slot.  If not in parallel mode, the first slot in the
+	 * array contains the connection.
+	 */
+	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+	ParallelSlotsAdoptConn(sa, conn);
+
+	initPQExpBuffer(&sql);
+
+	cell = ret->head;
+	do
+	{
+		const char *tabname = cell->val;
+		ParallelSlot *free_slot;
+
+		if (CancelRequested)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		free_slot = ParallelSlotsGetIdle(sa, NULL);
+		if (!free_slot)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+							   vacopts, tabname);
+
+		/*
+		 * Execute the vacuum.  All errors are handled in processQueryResult
+		 * through ParallelSlotsGetIdle.
+		 */
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, sql.data,
+						   echo, tabname);
+
+		cell = cell->next;
+	} while (cell != NULL);
+
+	if (!ParallelSlotsWaitCompletion(sa))
+	{
+		failed = true;
+		goto finish;
+	}
+
+	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+	if (vacopts->skip_database_stats &&
+		stage == ANALYZE_NO_STAGE)
+	{
+		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+		ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+		if (!free_slot)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+		if (!ParallelSlotsWaitCompletion(sa))
+			failed = true;
+	}
+
+finish:
+	ParallelSlotsTerminate(sa);
+	pg_free(sa);
+
+	termPQExpBuffer(&sql);
+
+	if (failed)
+		exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns.  This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table.  If a listed table does not exist, the catalog query will fail.
+ */
+SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+				 SimpleStringList *objects, bool echo)
+{
+	PQExpBufferData buf;
+	PQExpBufferData catalog_query;
+	PGresult   *res;
+	SimpleStringListCell *cell;
+	SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+	bool		objects_listed = false;
+
+	initPQExpBuffer(&catalog_query);
+	for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
+	{
+		char	   *just_table = NULL;
+		const char *just_columns = NULL;
+
+		if (!objects_listed)
+		{
+			appendPQExpBufferStr(&catalog_query,
+								 "WITH listed_objects (object_oid, column_list) AS (\n"
+								 "  VALUES (");
+			objects_listed = true;
+		}
+		else
+			appendPQExpBufferStr(&catalog_query, ",\n  (");
+
+		if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
+		{
+			appendStringLiteralConn(&catalog_query, cell->val, conn);
+			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
+		}
+
+		if (objfilter & OBJFILTER_TABLE)
+		{
+			/*
+			 * Split the relation and column names given by the user; this
+			 * is used to feed the CTE with values on which pre-run validity
+			 * checks are also performed.  For now these happen only on the
+			 * relation name.
+			 */
+			splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
+								  &just_table, &just_columns);
+
+			appendStringLiteralConn(&catalog_query, just_table, conn);
+			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
+		}
+
+		if (just_columns && just_columns[0] != '\0')
+			appendStringLiteralConn(&catalog_query, just_columns, conn);
+		else
+			appendPQExpBufferStr(&catalog_query, "NULL");
+
+		appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
+
+		pg_free(just_table);
+	}
+
+	/* Finish formatting the CTE */
+	if (objects_listed)
+		appendPQExpBufferStr(&catalog_query, "\n)\n");
+
+	appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
+
+	if (objects_listed)
+		appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
+
+	appendPQExpBufferStr(&catalog_query,
+						 " FROM pg_catalog.pg_class c\n"
+						 " JOIN pg_catalog.pg_namespace ns"
+						 " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+						 " CROSS JOIN LATERAL (SELECT c.relkind IN ("
+						 CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
+						 CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
+						 " LEFT JOIN pg_catalog.pg_class t"
+						 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
+
+	/*
+	 * Used to match the tables or schemas listed by the user, completing the
+	 * JOIN clause.
+	 */
+	if (objects_listed)
+	{
+		appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
+							 " ON listed_objects.object_oid"
+							 " OPERATOR(pg_catalog.=) ");
+
+		if (objfilter & OBJFILTER_TABLE)
+			appendPQExpBufferStr(&catalog_query, "c.oid\n");
+		else
+			appendPQExpBufferStr(&catalog_query, "ns.oid\n");
+	}
+
+	/*
+	 * Exclude temporary tables, beginning the WHERE clause.
+	 */
+	appendPQExpBufferStr(&catalog_query,
+						 " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
+						 CppAsString2(RELPERSISTENCE_TEMP) "\n");
+
+	/*
+	 * Used to match the tables or schemas listed by the user, for the WHERE
+	 * clause.
+	 */
+	if (objects_listed)
+	{
+		if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
+			appendPQExpBufferStr(&catalog_query,
+								 " AND listed_objects.object_oid IS NULL\n");
+		else
+			appendPQExpBufferStr(&catalog_query,
+								 " AND listed_objects.object_oid IS NOT NULL\n");
+	}
+
+	/*
+	 * If no tables were listed, filter for the relevant relation types.  If
+	 * tables were given via --table, don't bother filtering by relation type.
+	 * Instead, let the server decide whether a given relation can be
+	 * processed in which case the user will know about it.
+	 */
+	if ((objfilter & OBJFILTER_TABLE) == 0)
+	{
+		/*
+		 * vacuumdb should generally follow the behavior of the underlying
+		 * VACUUM and ANALYZE commands. If analyze_only is true, process
+		 * regular tables, materialized views, and partitioned tables, just
+		 * like ANALYZE (with no specific target tables) does. Otherwise,
+		 * process only regular tables and materialized views, since VACUUM
+		 * skips partitioned tables when no target tables are specified.
+		 */
+		if (vacopts->analyze_only)
+			appendPQExpBufferStr(&catalog_query,
+								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
+								 CppAsString2(RELKIND_RELATION) ", "
+								 CppAsString2(RELKIND_MATVIEW) ", "
+								 CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
+		else
+			appendPQExpBufferStr(&catalog_query,
+								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
+								 CppAsString2(RELKIND_RELATION) ", "
+								 CppAsString2(RELKIND_MATVIEW) "])\n");
+	}
+
+	/*
+	 * For --min-xid-age and --min-mxid-age, the age of the relation is the
+	 * greatest of the ages of the main relation and its associated TOAST
+	 * table.  The commands generated by vacuumdb will also process the TOAST
+	 * table for the relation if necessary, so it does not need to be
+	 * considered separately.
+	 */
+	if (vacopts->min_xid_age != 0)
+	{
+		appendPQExpBuffer(&catalog_query,
+						  " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
+						  " pg_catalog.age(t.relfrozenxid)) "
+						  " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
+						  " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
+						  " '0'::pg_catalog.xid\n",
+						  vacopts->min_xid_age);
+	}
+
+	if (vacopts->min_mxid_age != 0)
+	{
+		appendPQExpBuffer(&catalog_query,
+						  " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
+						  " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
+						  " '%d'::pg_catalog.int4\n"
+						  " AND c.relminmxid OPERATOR(pg_catalog.!=)"
+						  " '0'::pg_catalog.xid\n",
+						  vacopts->min_mxid_age);
+	}
+
+	if (vacopts->missing_stats_only)
+	{
+		appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+		/* regular stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
+							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* extended stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+							 " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* expression indexes */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " JOIN pg_catalog.pg_index i"
+							 " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
+							 " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* inheritance and regular stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
+							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
+							 " AND c.relhassubclass\n"
+							 " AND NOT p.inherited\n"
+							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit))\n");
+
+		/* inheritance and extended stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND c.relhassubclass\n"
+							 " AND NOT p.inherited\n"
+							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+							 " AND d.stxdinherit))\n");
+
+		appendPQExpBufferStr(&catalog_query, " )\n");
+	}
+
+	/*
+	 * Execute the catalog query.  We use the default search_path for this
+	 * query for consistency with table lookups done elsewhere by the user.
+	 */
+	appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
+	executeCommand(conn, "RESET search_path;", echo);
+	res = executeQuery(conn, catalog_query.data, echo);
+	termPQExpBuffer(&catalog_query);
+	PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
+
+	/*
+	 * Build qualified identifiers for each table, including the column list
+	 * if given.
+	 */
+	initPQExpBuffer(&buf);
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		appendPQExpBufferStr(&buf,
+							 fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
+											   PQgetvalue(res, i, 0),
+											   PQclientEncoding(conn)));
+
+		if (objects_listed && !PQgetisnull(res, i, 2))
+			appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
+
+		simple_string_list_append(found_objs, buf.data);
+		resetPQExpBuffer(&buf);
+	}
+	termPQExpBuffer(&buf);
+	PQclear(res);
+
+	return found_objs;
+}
+
+/*
+ * Vacuum/analyze all connectable databases.
+ *
+ * In analyze-in-stages mode, we process all databases in one stage before
+ * moving on to the next stage.  That ensures minimal stats are available
+ * quickly everywhere before generating more detailed ones.
+ */
+void
+vacuum_all_databases(ConnParams *cparams,
+					 vacuumingOptions *vacopts,
+					 bool analyze_in_stages,
+					 SimpleStringList *objects,
+					 int concurrentCons,
+					 const char *progname, bool echo, bool quiet)
+{
+	PGconn	   *conn;
+	PGresult   *result;
+	int			stage;
+	int			i;
+
+	conn = connectMaintenanceDatabase(cparams, progname, echo);
+	result = executeQuery(conn,
+						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
+						  echo);
+	PQfinish(conn);
+
+	if (analyze_in_stages)
+	{
+		SimpleStringList **found_objs = NULL;
+
+		if (vacopts->missing_stats_only)
+			found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
+		/*
+		 * When analyzing all databases in stages, we analyze them all in the
+		 * fastest stage first, so that initial statistics become available
+		 * for all of them as soon as possible.
+		 *
+		 * This means we establish several times as many connections, but
+		 * that's a secondary consideration.
+		 */
+		for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+		{
+			for (i = 0; i < PQntuples(result); i++)
+			{
+				cparams->override_dbname = PQgetvalue(result, i, 0);
+
+				vacuum_one_database(cparams, vacopts,
+									stage,
+									objects,
+									vacopts->missing_stats_only ? &found_objs[i] : NULL,
+									concurrentCons,
+									progname, echo, quiet);
+			}
+		}
+	}
+	else
+	{
+		for (i = 0; i < PQntuples(result); i++)
+		{
+			cparams->override_dbname = PQgetvalue(result, i, 0);
+
+			vacuum_one_database(cparams, vacopts,
+								ANALYZE_NO_STAGE,
+								objects, NULL,
+								concurrentCons,
+								progname, echo, quiet);
+		}
+	}
+
+	PQclear(result);
+}
+
+/*
+ * Construct a vacuum/analyze command to run based on the given
+ * options, in the given string buffer, which may contain previous garbage.
+ *
+ * The table name used must be already properly quoted.  The command generated
+ * depends on the server version involved and it is semicolon-terminated.
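+ *
+ * For example (an illustrative case, with a made-up table name): if only
+ * "freeze" and "and_analyze" are set and the server is recent enough to
+ * accept the parenthesized syntax, the generated command is
+ *     VACUUM (FREEZE, ANALYZE) public.some_table;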
+ */
+void
+prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
+					   vacuumingOptions *vacopts, const char *table)
+{
+	const char *paren = " (";
+	const char *comma = ", ";
+	const char *sep = paren;
+
+	resetPQExpBuffer(sql);
+
+	if (vacopts->analyze_only)
+	{
+		appendPQExpBufferStr(sql, "ANALYZE");
+
+		/* parenthesized grammar of ANALYZE is supported since v11 */
+		if (serverVersion >= 110000)
+		{
+			if (vacopts->skip_locked)
+			{
+				/* SKIP_LOCKED is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+				sep = comma;
+			}
+			if (vacopts->verbose)
+			{
+				appendPQExpBuffer(sql, "%sVERBOSE", sep);
+				sep = comma;
+			}
+			if (vacopts->buffer_usage_limit)
+			{
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+								  vacopts->buffer_usage_limit);
+				sep = comma;
+			}
+			if (sep != paren)
+				appendPQExpBufferChar(sql, ')');
+		}
+		else
+		{
+			if (vacopts->verbose)
+				appendPQExpBufferStr(sql, " VERBOSE");
+		}
+	}
+	else
+	{
+		appendPQExpBufferStr(sql, "VACUUM");
+
+		/* parenthesized grammar of VACUUM is supported since v9.0 */
+		if (serverVersion >= 90000)
+		{
+			if (vacopts->disable_page_skipping)
+			{
+				/* DISABLE_PAGE_SKIPPING is supported since v9.6 */
+				Assert(serverVersion >= 90600);
+				appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
+				sep = comma;
+			}
+			if (vacopts->no_index_cleanup)
+			{
+				/* "INDEX_CLEANUP FALSE" has been supported since v12 */
+				Assert(serverVersion >= 120000);
+				Assert(!vacopts->force_index_cleanup);
+				appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
+				sep = comma;
+			}
+			if (vacopts->force_index_cleanup)
+			{
+				/* "INDEX_CLEANUP TRUE" has been supported since v12 */
+				Assert(serverVersion >= 120000);
+				Assert(!vacopts->no_index_cleanup);
+				appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
+				sep = comma;
+			}
+			if (!vacopts->do_truncate)
+			{
+				/* TRUNCATE is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
+				sep = comma;
+			}
+			if (!vacopts->process_main)
+			{
+				/* PROCESS_MAIN is supported since v16 */
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
+				sep = comma;
+			}
+			if (!vacopts->process_toast)
+			{
+				/* PROCESS_TOAST is supported since v14 */
+				Assert(serverVersion >= 140000);
+				appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
+				sep = comma;
+			}
+			if (vacopts->skip_database_stats)
+			{
+				/* SKIP_DATABASE_STATS is supported since v16 */
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
+				sep = comma;
+			}
+			if (vacopts->skip_locked)
+			{
+				/* SKIP_LOCKED is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+				sep = comma;
+			}
+			if (vacopts->full)
+			{
+				appendPQExpBuffer(sql, "%sFULL", sep);
+				sep = comma;
+			}
+			if (vacopts->freeze)
+			{
+				appendPQExpBuffer(sql, "%sFREEZE", sep);
+				sep = comma;
+			}
+			if (vacopts->verbose)
+			{
+				appendPQExpBuffer(sql, "%sVERBOSE", sep);
+				sep = comma;
+			}
+			if (vacopts->and_analyze)
+			{
+				appendPQExpBuffer(sql, "%sANALYZE", sep);
+				sep = comma;
+			}
+			if (vacopts->parallel_workers >= 0)
+			{
+				/* PARALLEL is supported since v13 */
+				Assert(serverVersion >= 130000);
+				appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
+								  vacopts->parallel_workers);
+				sep = comma;
+			}
+			if (vacopts->buffer_usage_limit)
+			{
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+								  vacopts->buffer_usage_limit);
+				sep = comma;
+			}
+			if (sep != paren)
+				appendPQExpBufferChar(sql, ')');
+		}
+		else
+		{
+			if (vacopts->full)
+				appendPQExpBufferStr(sql, " FULL");
+			if (vacopts->freeze)
+				appendPQExpBufferStr(sql, " FREEZE");
+			if (vacopts->verbose)
+				appendPQExpBufferStr(sql, " VERBOSE");
+			if (vacopts->and_analyze)
+				appendPQExpBufferStr(sql, " ANALYZE");
+		}
+	}
+
+	appendPQExpBuffer(sql, " %s;", table);
+}
+
+/*
+ * Send a vacuum/analyze command to the server, returning as soon as the
+ * command has been dispatched; the caller is responsible for collecting
+ * the command's result.
+ *
+ * Any errors while sending the command are reported to stderr.
+ */
+void
+run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+				   const char *table)
+{
+	bool		status;
+
+	if (echo)
+		printf("%s\n", sql);
+
+	status = PQsendQuery(conn, sql) == 1;
+
+	if (!status)
+	{
+		if (table)
+		{
+			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+						 table, PQdb(conn), PQerrorMessage(conn));
+		}
+		else
+		{
+			pg_log_error("vacuuming of database \"%s\" failed: %s",
+						 PQdb(conn), PQerrorMessage(conn));
+		}
+	}
+}
+
+/*
+ * Returns a newly malloc'd version of 'src' with escaped single quotes and
+ * backslashes.
+ */
+char *
+escape_quotes(const char *src)
+{
+	char	   *result = escape_single_quotes_ascii(src);
+
+	if (!result)
+		pg_fatal("out of memory");
+	return result;
+}
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
new file mode 100644
index 00000000000..d3f000840fa
--- /dev/null
+++ b/src/bin/scripts/vacuuming.h
@@ -0,0 +1,95 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuuming.h
+ *		Common declarations for vacuuming.c
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/scripts/vacuuming.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef VACUUMING_H
+#define VACUUMING_H
+
+#include "common.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/simple_list.h"
+
+/* For analyze-in-stages mode */
+#define ANALYZE_NO_STAGE	-1
+#define ANALYZE_NUM_STAGES	3
+
+/* vacuum options controlled by user flags */
+typedef struct vacuumingOptions
+{
+	bool		analyze_only;
+	bool		verbose;
+	bool		and_analyze;
+	bool		full;
+	bool		freeze;
+	bool		disable_page_skipping;
+	bool		skip_locked;
+	int			min_xid_age;
+	int			min_mxid_age;
+	int			parallel_workers;	/* >= 0 indicates user specified the
+									 * parallel degree, otherwise -1 */
+	bool		no_index_cleanup;
+	bool		force_index_cleanup;
+	bool		do_truncate;
+	bool		process_main;
+	bool		process_toast;
+	bool		skip_database_stats;
+	char	   *buffer_usage_limit;
+	bool		missing_stats_only;
+} vacuumingOptions;
+
+/* object filter options (bit flags that can be OR'ed together) */
+typedef enum
+{
+	OBJFILTER_NONE = 0,			/* no filter used */
+	OBJFILTER_ALL_DBS = (1 << 0),	/* -a | --all */
+	OBJFILTER_DATABASE = (1 << 1),	/* -d | --dbname */
+	OBJFILTER_TABLE = (1 << 2), /* -t | --table */
+	OBJFILTER_SCHEMA = (1 << 3),	/* -n | --schema */
+	OBJFILTER_SCHEMA_EXCLUDE = (1 << 4),	/* -N | --exclude-schema */
+} VacObjFilter;
+
+extern VacObjFilter objfilter;
+
+extern void vacuuming_main(ConnParams *cparams, const char *dbname,
+						   const char *maintenance_db, vacuumingOptions *vacopts,
+						   SimpleStringList *objects, bool analyze_in_stages,
+						   int tbl_count, int concurrentCons,
+						   const char *progname, bool echo, bool quiet);
+
+extern SimpleStringList *retrieve_objects(PGconn *conn,
+										  vacuumingOptions *vacopts,
+										  SimpleStringList *objects,
+										  bool echo);
+
+extern void vacuum_one_database(ConnParams *cparams,
+								vacuumingOptions *vacopts,
+								int stage,
+								SimpleStringList *objects,
+								SimpleStringList **found_objs,
+								int concurrentCons,
+								const char *progname, bool echo, bool quiet);
+
+extern void vacuum_all_databases(ConnParams *cparams,
+								 vacuumingOptions *vacopts,
+								 bool analyze_in_stages,
+								 SimpleStringList *objects,
+								 int concurrentCons,
+								 const char *progname, bool echo, bool quiet);
+
+extern void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
+								   vacuumingOptions *vacopts, const char *table);
+
+extern void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+							   const char *table);
+
+extern char *escape_quotes(const char *src);
+
+#endif							/* VACUUMING_H */
-- 
2.39.5

Attachment: v19-0002-Add-REPACK-command.patch (text/x-diff; charset=utf-8)
From e4bef4dacf6207bc58abf7aeca616d765a54de60 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 26 Jul 2025 19:57:26 +0200
Subject: [PATCH v19 2/6] Add REPACK command

REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.

This also adds pg_repackdb, a new utility that can invoke the new
commands.  This is heavily based on vacuumdb.  We may still change the
implementation, depending on how Windows likes this one.

Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: To fill in
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 ++++++-
 doc/src/sgml/ref/allfiles.sgml           |   2 +
 doc/src/sgml/ref/cluster.sgml            |  97 +--
 doc/src/sgml/ref/clusterdb.sgml          |   5 +
 doc/src/sgml/ref/pg_repackdb.sgml        | 479 ++++++++++++++
 doc/src/sgml/ref/repack.sgml             | 284 +++++++++
 doc/src/sgml/ref/vacuum.sgml             |  33 +-
 doc/src/sgml/reference.sgml              |   2 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  26 +
 src/backend/commands/cluster.c           | 758 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   3 +-
 src/backend/parser/gram.y                |  88 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  33 +-
 src/bin/scripts/Makefile                 |   4 +-
 src/bin/scripts/meson.build              |   1 +
 src/bin/scripts/pg_repackdb.c            | 226 +++++++
 src/bin/scripts/t/103_repackdb.pl        |  24 +
 src/bin/scripts/vacuuming.c              |  60 +-
 src/bin/scripts/vacuuming.h              |  11 +-
 src/include/commands/cluster.h           |   8 +-
 src/include/commands/progress.h          |  61 +-
 src/include/nodes/parsenodes.h           |  20 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 125 +++-
 src/test/regress/expected/rules.out      |  23 +
 src/test/regress/sql/cluster.sql         |  59 ++
 src/tools/pgindent/typedefs.list         |   3 +
 33 files changed, 2270 insertions(+), 447 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 doc/src/sgml/ref/repack.sgml
 create mode 100644 src/bin/scripts/pg_repackdb.c
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..12e103d319d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5506,7 +5514,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -5965,6 +5974,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
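+
+  <para>
+   For example, a backend's progress can be watched with a query like the
+   following (a sample; adjust the column list as needed):
+<programlisting>
+SELECT pid, datname, relid::regclass AS relation, phase,
+       heap_blks_scanned, heap_blks_total
+FROM pg_stat_progress_repack;
+</programlisting>
+  </para>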
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..eabf92e3536 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
@@ -212,6 +213,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..cfcfb65e349 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
   <title>Description</title>
 
   <para>
-   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
-   to cluster the table specified
-   by <replaceable class="parameter">table_name</replaceable>
-   based on the index specified by
-   <replaceable class="parameter">index_name</replaceable>. The index must
-   already have been defined on
-   <replaceable class="parameter">table_name</replaceable>.
+   The <command>CLUSTER</command> command is equivalent to
+   <xref linkend="sql-repack"/> with a <literal>USING INDEX</literal>
+   clause.  See that command's documentation for more details.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
+<!-- Do we need to describe exactly which options map to what?  They seem obvious to me. -->
 
-  <para>
-   When a table is clustered, <productname>PostgreSQL</productname>
-   remembers which index it was clustered by.  The form
-   <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
-   reclusters the table using the same index as before.  You can also
-   use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
-   forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
-   future cluster operations, or to clear any previous setting.
-  </para>
-
-  <para>
-   <command>CLUSTER</command> without a
-   <replaceable class="parameter">table_name</replaceable> reclusters all the
-   previously-clustered tables in the current database that the calling user
-   has privileges for.  This form of <command>CLUSTER</command> cannot be
-   executed inside a transaction block.
-  </para>
-
-  <para>
-   When a table is being clustered, an <literal>ACCESS
-   EXCLUSIVE</literal> lock is acquired on it. This prevents any other
-   database operations (both reads and writes) from operating on the
-   table until the <command>CLUSTER</command> is finished.
-  </para>
  </refsect1>
 
  <refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 0d2051bf6f1..546c1289c31 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -64,6 +64,11 @@ PostgreSQL documentation
    this utility and via other methods for accessing the server.
   </para>
 
+  <para>
+   <application>clusterdb</application> has been superseded by
+   <application>pg_repackdb</application>.
+  </para>
+
  </refsect1>
 
 
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..32570d071cb
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,479 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.
+   <application>pg_repackdb</application> will also generate internal
+   statistics used by the <productname>PostgreSQL</productname> query
+   optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link>.  There
+   is no effective difference between repacking and analyzing databases via
+   this utility and via other methods for accessing the server.
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When <option>-a</option>/<option>--all</option> is used, connect
+        to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
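+
+   <para>
+    To repack all databases, processing four tables concurrently in each
+    (the job count here is only an illustration; tune it to your server):
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --all --jobs=4</userinput>
+</screen></para>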
+
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..fd9d89f8aaa
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,284 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+    ANALYSE | ANALYZE
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If a <literal>USING INDEX</literal> clause is specified, the rows are
+   physically reordered based on information from an index.  See the notes
+   on clustering below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    If the <literal>USING INDEX</literal> clause is specified, the rows in
+    the table are physically reordered following an index: if an index name
+    is specified in the command, then that index is used; if no index name
+    is specified, then the index previously configured as the index to
+    cluster on is used.  If no index has been configured in this way, an
+    error is raised.  The index given in the <literal>USING INDEX</literal>
+    clause is recorded as the index to cluster on, just as an index given to
+    the <command>CLUSTER</command> command is.  An index can also be set
+    manually using <command>ALTER TABLE ... CLUSTER ON</command>, and reset
+    with <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+   </para>
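+
+   <para>
+    For example, using the table and index names from the examples below,
+    the clustering index can be set and later cleared like this:
+<programlisting>
+ALTER TABLE employees CLUSTER ON employees_ind;
+ALTER TABLE employees SET WITHOUT CLUSTER;
+</programlisting>
+   </para>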
+
+   <para>
+    If no table name is specified in <command>REPACK USING INDEX</command>,
+    all tables which have a clustering index defined and which the calling
+    user has privileges for are processed.
+   </para>
+
+   <para>
+    Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using clustering.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan, or a sequential scan without a sort, is used, a
+    temporary copy of the table is created that contains the table data (in
+    index order, when an index is used).  Temporary copies of each index on
+    the table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
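+
+   <para>
+    For example, a session that steers <command>REPACK</command> away from
+    the sort method might look like this (table and index names are
+    illustrative):
+<programlisting>
+SET enable_sort = off;
+REPACK employees USING INDEX employees_ind;
+RESET enable_sort;
+</programlisting>
+   </para>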
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>ANALYZE</literal></term>
+    <term><literal>ANALYSE</literal></term>
+    <listitem>
+     <para>
+      Runs <xref linkend="sql-analyze"/> on the table after repacking.  This is
+      currently only supported when a single (non-partitioned) table is specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+  <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+  </para>
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions.  If an index
+    is specified, each partition is repacked using the corresponding partition
+    of that index.  <command>REPACK</command> on a partitioned table cannot be
+    executed inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> on the basis of its
+   index <literal>employees_ind</literal> (since an index is used here, this
+   is effectively clustering):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting></para>
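+
+  <para>
+   Repack the table <literal>employees</literal>, printing progress messages
+   and updating planner statistics afterwards:
+<programlisting>
+REPACK (VERBOSE, ANALYZE) employees;
+</programlisting>
+  </para>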
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..062b658cfcd 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
-    FULL [ <replaceable class="parameter">boolean</replaceable> ]
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+    FULL [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
   <title>Parameters</title>
 
   <variablelist>
-   <varlistentry>
-    <term><literal>FULL</literal></term>
-    <listitem>
-     <para>
-      Selects <quote>full</quote> vacuum, which can reclaim more
-      space, but takes much longer and exclusively locks the table.
-      This method also requires extra disk space, since it writes a
-      new copy of the table and doesn't release the old copy until
-      the operation is complete.  Usually this should only be used when a
-      significant amount of space needs to be reclaimed from within the table.
-     </para>
-    </listitem>
-   </varlistentry>
-
    <varlistentry>
     <term><literal>FREEZE</literal></term>
     <listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FULL</literal></term>
+    <listitem>
+     <para>
+      This option, which is deprecated, makes <command>VACUUM</command>
+      behave like <command>REPACK</command> without a
+      <literal>USING INDEX</literal> clause.
+      This method of compacting the table takes much longer than
+      <command>VACUUM</command> and exclusively locks the table.
+      This method also requires extra disk space, since it writes a
+      new copy of the table and doesn't release the old copy until
+      the operation is complete.  Usually this should only be used when a
+      significant amount of space needs to be reclaimed from within the table.
+     </para>
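+
+     <para>
+      For example, <literal>VACUUM FULL employees</literal> is now equivalent
+      to <literal>REPACK employees</literal>.
+     </para>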
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..2ee08e21f41 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
@@ -257,6 +258,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..79f9de5d760 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c4029a4f3d3..3063abff9a5 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1b3c5a55882..b2b7b10c2be 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1279,6 +1279,32 @@ CREATE VIEW pg_stat_progress_cluster AS
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_repack AS
+    SELECT
+        S.pid AS pid,
+        S.datid AS datid,
+        D.datname AS datname,
+        S.relid AS relid,
+	-- param1 is currently unused
+        CASE S.param2 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'seq scanning heap'
+                      WHEN 2 THEN 'index scanning heap'
+                      WHEN 3 THEN 'sorting tuples'
+                      WHEN 4 THEN 'writing new heap'
+                      WHEN 5 THEN 'swapping relation files'
+                      WHEN 6 THEN 'rebuilding index'
+                      WHEN 7 THEN 'performing final cleanup'
+                      END AS phase,
+        CAST(S.param3 AS oid) AS repack_index_relid,
+        S.param4 AS heap_tuples_scanned,
+        S.param5 AS heap_tuples_written,
+        S.param6 AS heap_blks_total,
+        S.param7 AS heap_blks_scanned,
+        S.param8 AS index_rebuild_count
+    FROM pg_stat_get_progress_info('REPACK') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..8b64f9e6795 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,18 +67,41 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd, bool usingindex,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  MemoryContext cluster_context,
+											  Oid relid, bool rel_is_index);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
 
 
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
+
 /*---------------------------------------------------------------------------
  * This cluster code allows for clustering multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -104,191 +127,155 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *---------------------------------------------------------------------------
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+		else if (strcmp(opt->defname, "analyze") == 0 ||
+				 strcmp(opt->defname, "analyse") == 0)
+			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
+					 errmsg("unrecognized %s option \"%s\"",
+							RepackCommandAsString(stmt->command),
 							opt->defname),
 					 parser_errposition(pstate, opt->location)));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
 			return;
-		}
 	}
 
+	/* Don't allow this for now.  Maybe we can add support for this later */
+	if (params.options & CLUOPT_ANALYZE)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot ANALYZE multiple tables"));
+
 	/*
 	 * By here, we know we are in a multi-table situation.  In order to avoid
 	 * holding locks for too long, we want to process each table in its own
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
+		Oid			relid;
+		bool		rel_is_index;
+
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+
+		/*
+		 * If an index name was specified, resolve it now and pass it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * XXX how should this behave?  Passing no index to a partitioned
+			 * table could be useful to have certain partitions clustered by
+			 * some index, and other partitions by a different index.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, true, stmt->indexname);
+			if (!OidIsValid(relid))
+				elog(ERROR, "unable to determine index to cluster on");
+			/* XXX is this the right place for this check? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
+
+		rtcs = get_tables_to_repack_partitioned(stmt->command, repack_context,
+												relid, rel_is_index);
+
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
 	}
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
-
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
-
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
-
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
-
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+		{
+			CommitTransactionCommand();
+			continue;
+		}
+
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, stmt->usingindex,
+					rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +291,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, bool usingindex,
+			Relation OldHeap, Oid indexOid, ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +313,25 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
 	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_REPACK_COMMAND_REPACK);
+	else if (cmd == REPACK_COMMAND_CLUSTER)
+	{
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	}
+	else
+	{
+		Assert(cmd == REPACK_COMMAND_VACUUMFULL);
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	}
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -351,63 +353,21 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * to cluster a not-previously-clustered index.
 	 */
 	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
+		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+								 params->options))
 			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index.  Using
+	 * an index would work in most respects, but the index would only get
+	 * marked as indisclustered in the current database, leading to unexpected
+	 * behavior if CLUSTER were later invoked in another database.
 	 */
-	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
+	if (usingindex && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				 errmsg("cannot run \"%s\" on a shared catalog",
+						RepackCommandAsString(cmd))));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
@@ -415,21 +375,30 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		if (OidIsValid(indexOid))
+		if (cmd == REPACK_COMMAND_CLUSTER)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
+		else if (cmd == REPACK_COMMAND_REPACK)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot repack temporary tables of other sessions")));
+		}
 		else
+		{
+			Assert(cmd == REPACK_COMMAND_VACUUMFULL);
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot vacuum temporary tables of other sessions")));
+		}
 	}
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -469,7 +438,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +451,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide CLUSTER or on a partitioned table), because
+	 * there is another check in process_single_relation() which will stop any
+	 * attempt to cluster remote temp tables by name.  There is another check
+	 * in cluster_rel() which is redundant, but we leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +652,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd, bool usingindex,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +669,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (usingindex)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1458,8 +1485,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1536,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,69 +1659,137 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process, when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given or not, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index to find those that have
+ * indisclustered set; if it was not given, scan pg_class and return all
+ * tables.
+ *
+ * Return it as a list of RelToCluster in the given memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand command, bool usingindex,
+					 MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
+	HeapTuple	tuple;
 	MemoryContext old_context;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/*
+			 * XXX I think the only reason there's no test failure here is
+			 * that we seldom have clustered indexes that would be affected by
+			 * concurrency.  Maybe we should also do the
+			 * ConditionalLockRelationOid+SearchSysCacheExists dance that we
+			 * do below.
+			 */
+			if (!cluster_is_permitted_for_relation(command, index->indrelid,
+												   GetUserId()))
+				continue;
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
 
-		MemoryContextSwitchTo(old_context);
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
 	}
-	table_endscan(scan);
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
 
-	relation_close(indRelation, AccessShareLock);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.  XXX we could release at the bottom of the
+			 * loop, but for now just hold it until this transaction is
+			 * finished.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
+
+			/* Verify that the table still exists. */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			if (!cluster_is_permitted_for_relation(command, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
+
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
+	}
+
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
 * all the leaf tables/indexes among its children.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
+								 Oid relid, bool rel_is_index)
 {
 	List	   *inhoids;
 	ListCell   *lc;
@@ -1702,17 +1797,33 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 	MemoryContext old_context;
 
 	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
 
 	foreach(lc, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			inhoid = lfirst_oid(lc);
+		Oid			inhrelid,
+					inhindid;
 		RelToCluster *rtc;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(inhoid) != RELKIND_INDEX)
+				continue;
+
+			inhrelid = IndexGetRelation(inhoid, false);
+			inhindid = inhoid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(inhoid) != RELKIND_RELATION)
+				continue;
+
+			inhrelid = inhoid;
+			inhindid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
@@ -1720,15 +1831,15 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 		 * table.  We skip any partitions which the user is not permitted to
 		 * CLUSTER.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, inhrelid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
 		old_context = MemoryContextSwitchTo(cluster_context);
 
 		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		rtc->tableOid = inhrelid;
+		rtc->indexOid = inhindid;
 		rtcs = lappend(rtcs, rtc);
 
 		MemoryContextSwitchTo(old_context);
@@ -1742,13 +1853,148 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain lock on it, then determine what to do based on the relation
+ * type: if it's not a partitioned table, repack it as indicated (using an
+ * existing clustered index, or following the indicated index), and return
+ * NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened relcache entry, so that the caller can process
+ * the partitions using the multiple-table handling code.  The index name is
+ * not resolved in this case.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+	{
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+	}
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		if (OidIsValid(indexOid))
+			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+
+		/* Do an analyze, if requested */
+		if (params->options & CLUOPT_ANALYZE)
+		{
+			VacuumParams vac_params = {0};
+
+			vac_params.options |= VACOPT_ANALYZE;
+			if (params->options & CLUOPT_VERBOSE)
+				vac_params.options |= VACOPT_VERBOSE;
+			analyze_rel(RelationGetRelid(rel), NULL, vac_params, NIL, true,
+						NULL);
+		}
+
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the index to use
+ * for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes doesn't
+ * change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		ListCell   *lc;
+
+		/* Find an index with indisclustered set, or report error */
+		foreach(lc, RelationGetIndexList(rel))
+		{
+			indexOid = lfirst_oid(lc);
+
+			if (get_index_isclustered(indexOid))
+				break;
+			indexOid = InvalidOid;
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/*
+		 * An index was specified; figure out its OID.  It must be in the same
+		 * namespace as the relation.
+		 */
+		indexOid = get_relname_relid(indexname,
+									 rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..8863ad0e8bd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2287,7 +2287,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 				cluster_params.options |= CLUOPT_VERBOSE;
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index db43034b9db..f9152728021 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -280,7 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -297,7 +297,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -316,7 +316,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		utility_option_list
@@ -763,7 +763,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1025,7 +1025,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1099,6 +1098,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1135,6 +1135,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11912,38 +11917,91 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ USING INDEX [ <index_name> ] ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list qualified_name USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list qualified_name opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK '(' utility_option_list ')'
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = false;
+					n->params = $3;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $2;
+					n->params = NIL;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -11951,20 +12009,24 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -17960,6 +18022,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18592,6 +18655,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 5f442bc3bd4..cf6db581007 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2851,10 +2851,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2862,6 +2858,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3499,7 +3499,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..a1e10e8c2f6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -268,6 +268,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8b10f2313f3..59ff6e0923b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1247,7 +1247,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -4997,6 +4997,37 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	/* If we have REPACK <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAny))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK <sth> USING INDEX, then complete with the index name */
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+		else if (TailMatches("VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index 019ca06455d..f0c1bd4175c 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -16,7 +16,7 @@ subdir = src/bin/scripts
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready
+PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready pg_repackdb
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
@@ -31,6 +31,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
 vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
 install: all installdirs
 	$(INSTALL_PROGRAM) createdb$(X)   '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -41,6 +42,7 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+	$(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index a4fed59d1c9..0a83abe253b 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -80,6 +80,7 @@ tests += {
       't/100_vacuumdb.pl',
       't/101_vacuumdb_all.pl',
       't/102_vacuumdb_stages.pl',
+      't/103_repackdb.pl',
       't/200_connstr.pl',
     ],
   },
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..23326372a77
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ *		A utility to run REPACK
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * FIXME: this is missing a way to specify the index to use to repack one
+ * table, or whether to pass a USING INDEX clause when multiple tables are
+ * used.  Something like --index[=indexname].  Adding that bleeds into
+ * vacuuming.c as well.
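+ *
+ * A usage sketch (assuming a table "employees" in a database "mydb"; the
+ * option set mirrors vacuumdb's):
+ *
+ *		pg_repackdb --table=employees mydb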
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "vacuuming.h"
+
+static void help(const char *progname);
+void		check_objfilter(void);
+
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+		{"echo", no_argument, NULL, 'e'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"dbname", required_argument, NULL, 'd'},
+		{"all", no_argument, NULL, 'a'},
+		{"table", required_argument, NULL, 't'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"jobs", required_argument, NULL, 'j'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
+		{"maintenance-db", required_argument, NULL, 2},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	bool		echo = false;
+	bool		quiet = false;
+	vacuumingOptions vacopts;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+	vacopts.mode = MODE_REPACK;
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwW",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				quiet = true;
+				break;
+			case 't':
+				objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter();
+
+	vacuuming_main(&cparams, dbname, maintenance_db, &vacopts, &objects,
+				   false, tbl_count, concurrentCons,
+				   progname, echo, quiet);
+	exit(0);
+}
+
+/*
+ * Verify that the filters used at command line are compatible.
+ */
+void
+check_objfilter(void)
+{
+	if ((objfilter & OBJFILTER_ALL_DBS) &&
+		(objfilter & OBJFILTER_DATABASE))
+		pg_fatal("cannot repack all databases and a specific one at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA))
+		pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+
+	if ((objfilter & OBJFILTER_SCHEMA) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+}
+
+static void
+help(const char *progname)
+{
+	printf(_("%s repacks a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE'             repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
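
(A quick usage sketch for reviewers -- the database and table names are
made up, and the exact quoting of the generated statement is an assumption
based on vacuumdb's behavior, which pg_repackdb inherits via vacuuming.c.
Running

  $ pg_repackdb --verbose --table=clstr_tst mydb

should make prepare_vacuum_command() send something like this to the server:

  REPACK (VERBOSE) public.clstr_tst;
)
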
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..51de4d7ab34
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,24 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
+
+done_testing();
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
index 9be37fcc45a..e07071c38ee 100644
--- a/src/bin/scripts/vacuuming.c
+++ b/src/bin/scripts/vacuuming.c
@@ -1,6 +1,6 @@
 /*-------------------------------------------------------------------------
  * vacuuming.c
- *		Common routines for vacuumdb
+ *		Common routines for vacuumdb and pg_repackdb
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -166,6 +166,14 @@ vacuum_one_database(ConnParams *cparams,
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		/* XXX arguably, here we should use VACUUM FULL instead of failing */
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
 	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
 	{
 		PQfinish(conn);
@@ -258,9 +266,15 @@ vacuum_one_database(ConnParams *cparams,
 		if (stage != ANALYZE_NO_STAGE)
 			printf(_("%s: processing database \"%s\": %s\n"),
 				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
+		else if (vacopts->mode == MODE_VACUUM)
 			printf(_("%s: vacuuming database \"%s\"\n"),
 				   progname, PQdb(conn));
+		else
+		{
+			Assert(vacopts->mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
 		fflush(stdout);
 	}
 
@@ -350,7 +364,7 @@ vacuum_one_database(ConnParams *cparams,
 		 * through ParallelSlotsGetIdle.
 		 */
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
+		run_vacuum_command(free_slot->connection, vacopts, sql.data,
 						   echo, tabname);
 
 		cell = cell->next;
@@ -363,7 +377,7 @@ vacuum_one_database(ConnParams *cparams,
 	}
 
 	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
-	if (vacopts->skip_database_stats &&
+	if (vacopts->mode == MODE_VACUUM && vacopts->skip_database_stats &&
 		stage == ANALYZE_NO_STAGE)
 	{
 		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
@@ -376,7 +390,7 @@ vacuum_one_database(ConnParams *cparams,
 		}
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+		run_vacuum_command(free_slot->connection, vacopts, cmd, echo, NULL);
 
 		if (!ParallelSlotsWaitCompletion(sa))
 			failed = true;
@@ -708,6 +722,12 @@ vacuum_all_databases(ConnParams *cparams,
 	int			i;
 
 	conn = connectMaintenanceDatabase(cparams, progname, echo);
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
 	result = executeQuery(conn,
 						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
 						  echo);
@@ -761,7 +781,7 @@ vacuum_all_databases(ConnParams *cparams,
 }
 
 /*
- * Construct a vacuum/analyze command to run based on the given
+ * Construct a vacuum/analyze/repack command to run based on the given
  * options, in the given string buffer, which may contain previous garbage.
  *
  * The table name used must be already properly quoted.  The command generated
@@ -777,7 +797,13 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
 
 	resetPQExpBuffer(sql);
 
-	if (vacopts->analyze_only)
+	if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(sql, "REPACK");
+		if (vacopts->verbose)
+			appendPQExpBufferStr(sql, " (VERBOSE)");
+	}
+	else if (vacopts->analyze_only)
 	{
 		appendPQExpBufferStr(sql, "ANALYZE");
 
@@ -938,8 +964,8 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
  * Any errors during command execution are reported to stderr.
  */
 void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
+run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+				   const char *sql, bool echo, const char *table)
 {
 	bool		status;
 
@@ -952,13 +978,21 @@ run_vacuum_command(PGconn *conn, const char *sql, bool echo,
 	{
 		if (table)
 		{
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
 		}
 		else
 		{
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
 		}
 	}
 }
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
index d3f000840fa..154bc9925c0 100644
--- a/src/bin/scripts/vacuuming.h
+++ b/src/bin/scripts/vacuuming.h
@@ -17,6 +17,12 @@
 #include "fe_utils/connect_utils.h"
 #include "fe_utils/simple_list.h"
 
+typedef enum
+{
+	MODE_VACUUM,
+	MODE_REPACK
+} RunMode;
+
 /* For analyze-in-stages mode */
 #define ANALYZE_NO_STAGE	-1
 #define ANALYZE_NUM_STAGES	3
@@ -24,6 +30,7 @@
 /* vacuum options controlled by user flags */
 typedef struct vacuumingOptions
 {
+	RunMode		mode;
 	bool		analyze_only;
 	bool		verbose;
 	bool		and_analyze;
@@ -87,8 +94,8 @@ extern void vacuum_all_databases(ConnParams *cparams,
 extern void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
 								   vacuumingOptions *vacopts, const char *table);
 
-extern void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+extern void run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+							   const char *sql, bool echo, const char *table);
 
 extern char *escape_quotes(const char *src);
 
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..890998d84bb 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, bool usingindex,
+						Relation OldHeap, Oid indexOid, ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..5b6639c114c 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,24 +56,51 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Note: Since REPACK shares some code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
 
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+
+/*
+ * Commands of PROGRESS_REPACK
+ *
+ * Currently we only have one command, so the PROGRESS_REPACK_COMMAND
+ * parameter is not necessary. However, it makes cluster.c simpler if we have
+ * the same set of parameters for CLUSTER and REPACK - see the note on REPACK
+ * parameters above.
+ */
+#define PROGRESS_REPACK_COMMAND_REPACK			1
+
+/*
+ * Progress parameters for cluster.
+ *
+ * Although we need to report REPACK and CLUSTER in separate views, the
+ * parameters and phases of CLUSTER are a subset of those of REPACK. Therefore
+ * we just use the appropriate values defined for REPACK above instead of
+ * defining a separate set of constants here.
+ */
 
 /* Commands of PROGRESS_CLUSTER */
 #define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 86a236bd58b..fcc25a0c592 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3949,16 +3949,26 @@ typedef struct AlterSystemStmt
 } AlterSystemStmt;
 
 /* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
+ *		Repack Statement
  * ----------------------
  */
-typedef struct ClusterStmt
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
 {
 	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
+	RepackCommand command;		/* type of command being run */
+	RangeVar   *relation;		/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
 	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
+} RepackStmt;
+
 
 /* ----------------------
  *		Vacuum and Analyze Statements
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index a4af3f717a1..22559369e2c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -374,6 +374,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..5256628b51d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -254,6 +254,63 @@ ORDER BY 1;
  clstr_tst_pkey
 (3 rows)
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -381,6 +438,35 @@ SELECT * FROM clstr_1;
  2
 (2 rows)
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- had their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed as well, since plain REPACK does not require a clustered index.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
+SET SESSION AUTHORIZATION regress_clstr_user;
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 CREATE TABLE clustertest (key int PRIMARY KEY);
@@ -495,6 +581,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +636,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..3a1d1d28282 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,6 +2071,29 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..cfcc3dc9761 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,6 +76,19 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
 
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
@@ -159,6 +172,34 @@ INSERT INTO clstr_1 VALUES (1);
 CLUSTER clstr_1;
 SELECT * FROM clstr_1;
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- had their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed as well, since plain REPACK does not require a clustered index.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 
@@ -229,6 +270,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..98242e25432 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2537,6 +2537,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
@@ -2603,6 +2605,7 @@ RtlNtStatusToDosError_t
 RuleInfo
 RuleLock
 RuleStmt
+RunMode
 RunningTransactions
 RunningTransactionsData
 SASLStatus
-- 
2.39.5
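
A note on the view above: while a REPACK is running, its progress can be
followed from another session with an ordinary query against the new view,
e.g. (a minimal sketch; any of the param-based columns shown in rules.out
can be selected the same way):

  SELECT pid, relid::regclass AS relation, phase,
         heap_blks_scanned, heap_blks_total
  FROM pg_stat_progress_repack;
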

v19-0003-Refactor-index_concurrently_create_copy-for-use-.patch (text/x-diff; charset=utf-8)
From f66096a8d81dc931fc8b1c5a2088a49875ef729d Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:31:34 +0200
Subject: [PATCH v19 3/6] Refactor index_concurrently_create_copy() for use
 with REPACK (CONCURRENTLY).

This patch moves the code to index_create_copy() and adds a "concurrently"
parameter so it can be used by REPACK (CONCURRENTLY).

With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of indexes
(including system catalog entries) *before* the actual swap, to reduce the
time for which AccessExclusiveLock needs to be held.
---
 src/backend/catalog/index.c | 36 ++++++++++++++++++++++++++++--------
 src/include/catalog/index.h |  3 +++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3063abff9a5..0dee1b1a9d8 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1290,15 +1290,31 @@ index_create(Relation heapRelation,
 /*
  * index_concurrently_create_copy
  *
- * Create concurrently an index based on the definition of the one provided by
- * caller.  The index is inserted into catalogs and needs to be built later
- * on.  This is called during concurrent reindex processing.
- *
- * "tablespaceOid" is the tablespace to use for this index.
+ * Variant of index_create_copy(), called during concurrent reindex
+ * processing.
  */
 Oid
 index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							   Oid tablespaceOid, const char *newName)
+{
+	return index_create_copy(heapRelation, oldIndexId, tablespaceOid, newName,
+							 true);
+}
+
+/*
+ * index_create_copy
+ *
+ * Create an index based on the definition of the one provided by caller.  The
+ * index is inserted into catalogs and needs to be built later on.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ *
+ * The actual implementation of index_concurrently_create_copy(), reusable for
+ * other purposes.
+ */
+Oid
+index_create_copy(Relation heapRelation, Oid oldIndexId, Oid tablespaceOid,
+				  const char *newName, bool concurrently)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1317,6 +1333,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	List	   *indexColNames = NIL;
 	List	   *indexExprs = NIL;
 	List	   *indexPreds = NIL;
+	int			flags = 0;
 
 	indexRelation = index_open(oldIndexId, RowExclusiveLock);
 
@@ -1325,9 +1342,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 
 	/*
 	 * Concurrent build of an index with exclusion constraints is not
-	 * supported.
+	 * supported. If !concurrently, ii_ExclusionOps is currently not needed.
 	 */
-	if (oldInfo->ii_ExclusionOps != NULL)
+	if (oldInfo->ii_ExclusionOps != NULL && concurrently)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("concurrent index creation for exclusion constraints is not supported")));
@@ -1435,6 +1452,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 		stattargets[i].isnull = isnull;
 	}
 
+	if (concurrently)
+		flags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+
 	/*
 	 * Now create the new index.
 	 *
@@ -1458,7 +1478,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							  indcoloptions->values,
 							  stattargets,
 							  reloptionsDatum,
-							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT,
+							  flags,
 							  0,
 							  true, /* allow table to be a system catalog? */
 							  false,	/* is_internal? */
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 4daa8bef5ee..063a891351a 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -99,6 +99,9 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid oldIndexId,
 										   Oid tablespaceOid,
 										   const char *newName);
+extern Oid	index_create_copy(Relation heapRelation, Oid oldIndexId,
+							  Oid tablespaceOid, const char *newName,
+							  bool concurrently);
 
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
-- 
2.39.5
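
(For context: the pre-existing caller of this code is the REINDEX
CONCURRENTLY path, e.g. with a made-up index name

  REINDEX INDEX CONCURRENTLY clstr_tst_c;

so with concurrently=true the refactored index_create_copy() must behave
exactly as index_concurrently_create_copy() did. The concurrently=false
case, which builds the index immediately instead of skipping the build, is
presumably what REPACK (CONCURRENTLY) will use in the later patch.)
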

v19-0004-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch (text/x-diff; charset=utf-8)
From 82351d58f29f09bef38898a2d47a23044c706938 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:23:05 +0200
Subject: [PATCH v19 4/6] Move conversion of a "historic" to MVCC snapshot to a
 separate function.

The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
---
 src/backend/replication/logical/snapbuild.c | 51 +++++++++++++++++----
 src/backend/utils/time/snapmgr.c            |  3 +-
 src/include/replication/snapbuild.h         |  1 +
 src/include/utils/snapmgr.h                 |  1 +
 4 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 98ddee20929..a2f1803622c 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
-	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -485,6 +482,31 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
+	/* Convert the historic snapshot to MVCC snapshot. */
+	return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike a regular (non-historic) MVCC snapshot, the xip array of this
+ * snapshot contains not only running main transactions, but also their
+ * subtransactions. This difference has no impact on XidInMVCCSnapshot().
+ *
+ * Pass true for 'in_place' if you don't care about modifying the source
+ * snapshot. If you need a new instance, and one that was allocated as a
+ * single chunk of memory, pass false.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+	TransactionId xid;
+	TransactionId *oldxip = snapshot->xip;
+	uint32		oldxcnt = snapshot->xcnt;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+	Snapshot	result;
+
 	/* allocate in transaction context */
 	newxip = (TransactionId *)
 		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
@@ -495,7 +517,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
 	{
 		void	   *test;
 
@@ -503,7 +525,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -520,11 +542,22 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snapshot->xcnt = newxcnt;
+	snapshot->xip = newxip;
 
-	return snap;
+	if (in_place)
+		result = snapshot;
+	else
+	{
+		result = CopySnapshot(snapshot);
+
+		/* Restore the original values so the source is intact. */
+		snapshot->xip = oldxip;
+		snapshot->xcnt = oldxcnt;
+	}
+	result->snapshot_type = SNAPSHOT_MVCC;
+
+	return result;
 }
 
 /*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 65561cc6bc3..bc7840052fe 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -212,7 +212,6 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -602,7 +601,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
+Snapshot
 CopySnapshot(Snapshot snapshot)
 {
 	Snapshot	newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..6d4d2d1814c 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..f65f83c85cd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -63,6 +63,7 @@ extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
+extern Snapshot CopySnapshot(Snapshot snapshot);
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
-- 
2.39.5

v19-0005-Add-CONCURRENTLY-option-to-REPACK-command.patch (text/x-diff; charset=utf-8)
From 0cb87a6bfc37ceb9f2f1657776b1a35464315db8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 30 Aug 2025 19:13:38 +0200
Subject: [PATCH v19 5/6] Add CONCURRENTLY option to REPACK command.

The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock in the beginning and keep it till the end of the processing.)

This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(Supposedly, the swapping should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.

First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.

Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)

Since the logical decoding system, during its startup, waits until all the
transactions which already have XID assigned have finished, there is a risk of
deadlock if a transaction that already changed anything in the database tries
to acquire a conflicting lock on the table REPACK CONCURRENTLY is working
on. As an example, consider transaction running CREATE INDEX command on the
table that is being REPACKed CONCURRENTLY. On the other hand, DML commands
(INSERT, UPDATE, DELETE) are not a problem as their lock does not conflict
with REPACK CONCURRENTLY.

The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before the logical decoding is set up and lock
it again afterwards. Such temporary unlocking would imply re-checking if the
table still meets all the requirements for REPACK CONCURRENTLY.

Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires extra space for the new relation and index files
(which coexist with the old files for some time). In addition, the
CONCURRENTLY option might introduce a lag in releasing WAL segments for
archiving / recycling. This is due to the decoding of the data changes done by
applications concurrently. When copying the table contents into the new file,
we check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table contents have been copied.) A
background worker might be a better approach for the decoding - let's consider
implementing it in the future.

The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.

Author: Antonin Houska <ah@cybertec.at>
---
 doc/src/sgml/monitoring.sgml                  |   37 +-
 doc/src/sgml/mvcc.sgml                        |   12 +-
 doc/src/sgml/ref/repack.sgml                  |  129 +-
 src/Makefile                                  |    1 +
 src/backend/access/heap/heapam.c              |   34 +-
 src/backend/access/heap/heapam_handler.c      |  227 ++-
 src/backend/access/heap/rewriteheap.c         |    6 +-
 src/backend/access/transam/xact.c             |   11 +-
 src/backend/catalog/system_views.sql          |   30 +-
 src/backend/commands/cluster.c                | 1677 +++++++++++++++--
 src/backend/commands/matview.c                |    2 +-
 src/backend/commands/tablecmds.c              |    1 +
 src/backend/commands/vacuum.c                 |   12 +-
 src/backend/meson.build                       |    1 +
 src/backend/replication/logical/decode.c      |   83 +
 src/backend/replication/logical/snapbuild.c   |   20 +
 .../replication/pgoutput_repack/Makefile      |   32 +
 .../replication/pgoutput_repack/meson.build   |   18 +
 .../pgoutput_repack/pgoutput_repack.c         |  288 +++
 src/backend/storage/ipc/ipci.c                |    1 +
 .../storage/lmgr/generate-lwlocknames.pl      |    2 +-
 src/backend/utils/cache/relcache.c            |    1 +
 src/backend/utils/time/snapmgr.c              |    3 +-
 src/bin/psql/tab-complete.in.c                |   25 +-
 src/include/access/heapam.h                   |    9 +-
 src/include/access/heapam_xlog.h              |    2 +
 src/include/access/tableam.h                  |   10 +
 src/include/commands/cluster.h                |   91 +-
 src/include/commands/progress.h               |   23 +-
 src/include/replication/snapbuild.h           |    1 +
 src/include/storage/lockdefs.h                |    4 +-
 src/include/utils/snapmgr.h                   |    2 +
 src/test/modules/injection_points/Makefile    |    5 +-
 .../injection_points/expected/repack.out      |  113 ++
 .../modules/injection_points/logical.conf     |    1 +
 src/test/modules/injection_points/meson.build |    4 +
 .../injection_points/specs/repack.spec        |  143 ++
 src/test/regress/expected/rules.out           |   29 +-
 src/tools/pgindent/typedefs.list              |    4 +
 39 files changed, 2821 insertions(+), 273 deletions(-)
 create mode 100644 src/backend/replication/pgoutput_repack/Makefile
 create mode 100644 src/backend/replication/pgoutput_repack/meson.build
 create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
 create mode 100644 src/test/modules/injection_points/expected/repack.out
 create mode 100644 src/test/modules/injection_points/logical.conf
 create mode 100644 src/test/modules/injection_points/specs/repack.spec
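
For reference, a minimal usage sketch (table and index names are made up;
as the documentation below spells out, this requires wal_level = logical,
a free replication slot, and cannot be run inside a transaction block):

  -- session 1: rewrite the table; ACCESS EXCLUSIVE is held only for the
  -- final file swap
  REPACK (CONCURRENTLY) orders USING INDEX orders_pkey;

  -- session 2, meanwhile: plain DML is not blocked and is applied to the
  -- new file via logical decoding
  UPDATE orders SET qty = qty + 1 WHERE order_id = 1;
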

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 12e103d319d..61c0197555f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6074,14 +6074,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+       <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of heap tuples written.
+       Number of heap tuples inserted.
        This counter only advances when the phase is
        <literal>seq scanning heap</literal>,
-       <literal>index scanning heap</literal>
-       or <literal>writing new heap</literal>.
+       <literal>index scanning heap</literal>,
+       <literal>writing new heap</literal>
+       or <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples updated.
+       This counter only advances when the phase is <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples deleted.
+       This counter only advances when the phase is <literal>catch-up</literal>.
       </para></entry>
      </row>
 
@@ -6162,6 +6183,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <command>REPACK</command> is currently writing the new heap.
      </entry>
     </row>
+    <row>
+     <entry><literal>catch-up</literal></entry>
+     <entry>
+       <command>REPACK CONCURRENTLY</command> is currently processing the DML
+       commands that other transactions executed during any of the preceding
+       phases.
+     </entry>
+    </row>
     <row>
      <entry><literal>swapping relation files</literal></entry>
      <entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+    TABLE</command></link>, and <command>REPACK</command> with
+    the <literal>CONCURRENTLY</literal> option, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the DDL command committed.  This will only be an
+    snapshot taken before the command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the DDL command started &mdash; any transaction that has done so
+    before the command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the DDL command until that transaction completes.
+    which would block the truncating or rewriting command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
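
(To make the new caveat concrete, a sketch with a made-up table "t";
session 1 must not have accessed "t" earlier, otherwise its ACCESS SHARE
lock would block the final file swap instead:

  -- session 1
  BEGIN ISOLATION LEVEL REPEATABLE READ;
  SELECT 1;                        -- snapshot taken; "t" not accessed yet

  -- session 2
  REPACK (CONCURRENTLY) t;         -- commits while session 1 is still open

  -- session 1: with its pre-REPACK snapshot, "t" now appears empty
  SELECT count(*) FROM t;
)
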
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index fd9d89f8aaa..ff5ce48de55 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -27,6 +27,7 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYSE | ANALYZE
+    CONCURRENTLY
 </synopsis>
  </refsynopsisdiv>
 
@@ -49,7 +50,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    processes every table and materialized view in the current database that
    the current user has the <literal>MAINTAIN</literal> privilege on. This
    form of <command>REPACK</command> cannot be executed inside a transaction
-   block.
+   block.  Also, this form is not allowed if
+   the <literal>CONCURRENTLY</literal> option is used.
   </para>
 
   <para>
@@ -62,7 +64,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
    is acquired on it. This prevents any other database operations (both reads
    and writes) from operating on the table until the <command>REPACK</command>
-   is finished.
+   is finished. If you want to keep the table accessible during the repacking,
+   consider using the <literal>CONCURRENTLY</literal> option.
   </para>
 
   <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -179,6 +182,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>CONCURRENTLY</literal></term>
+    <listitem>
+     <para>
+      Allow other transactions to use the table while it is being repacked.
+     </para>
+
+     <para>
+      Internally, <command>REPACK</command> copies the contents of the table
+      (ignoring dead tuples) into a new file, sorted by the specified index,
+      and also creates a new file for each index. Then it swaps the old and
+      new files for the table and all the indexes, and deletes the old
+      files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+      sure that the old files do not change during the processing because the
+      changes would be lost when the files are swapped.
+     </para>
+
+     <para>
+      With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+      EXCLUSIVE</literal> lock is only acquired to swap the table and index
+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be fairly short. However, the time might still be noticeable if
+      many data changes were made to the table while
+      <command>REPACK</command> was waiting for the lock: those changes must
+      be processed just before the files are swapped, while the
+      <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+     </para>
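+
+     <para>
+      For example, a hypothetical table <literal>mytab</literal> could be
+      repacked without blocking concurrent reads and writes like this:
+<programlisting>
+REPACK (CONCURRENTLY) mytab;
+</programlisting>
+     </para>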
+
+     <para>
+      Note that <command>REPACK</command> with the
+      <literal>CONCURRENTLY</literal> option does not try to order the
+      rows inserted into the table after the repacking started. Also
+      note that <command>REPACK</command> might fail to complete due to DDL
+      commands executed on the table by other transactions during the
+      repacking.
+     </para>
+
+     <note>
+      <para>
+       In addition to the temporary space requirements explained in
+       <xref linkend="sql-repack-notes-on-resources"/>,
+       the <literal>CONCURRENTLY</literal> option can increase the use of
+       temporary space somewhat. The reason is that other transactions can
+       perform DML operations which cannot be applied to the new file until
+       <command>REPACK</command> has copied all the tuples from the old
+       file. Thus the changes made to the old file during the copying are
+       stored separately in a temporary file, so that they can eventually be
+       applied to the new file.
+      </para>
+
+      <para>
+       Furthermore, the data changes performed during the copying are
+       extracted from the <link linkend="wal">write-ahead log</link> (WAL), and
+       this extraction (decoding) only takes place once a certain amount of WAL
+       has been written. Therefore, WAL removal can be delayed by up to this
+       threshold. Currently the threshold is equal to the value of
+       the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+       configuration parameter.
+      </para>
+     </note>
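+
+     <para>
+      While the command is running, the replication slot it uses for the
+      decoding can be observed, for instance, in
+      <structname>pg_replication_slots</structname> (the slot name is derived
+      from the PID of the backend executing <command>REPACK</command>):
+<programlisting>
+SELECT slot_name, plugin, restart_lsn
+FROM pg_replication_slots
+WHERE slot_name LIKE 'repack_%';
+</programlisting>
+     </para>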
+
+     <para>
+      The <literal>CONCURRENTLY</literal> option cannot be used in the
+      following cases:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+          The table is <literal>UNLOGGED</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is partitioned.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is a system catalog or a <acronym>TOAST</acronym> table.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REPACK</command> is executed inside a transaction block.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+          configuration parameter is less than <literal>logical</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+         configuration parameter does not allow for creation of an additional
+         replication slot.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
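+
+     <para>
+      For instance, the last two requirements can be checked up front:
+<programlisting>
+SHOW wal_level;
+SHOW max_replication_slots;
+</programlisting>
+     </para>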
+
+     <warning>
+      <para>
+       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+       details.
+      </para>
+     </warning>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
 	interfaces \
 	backend/replication/libpqwalreceiver \
 	backend/replication/pgoutput \
+	backend/replication/pgoutput_repack \
 	fe_utils \
 	bin \
 	pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e3e7307ef5f..f9a4fe3faed 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
-								  bool all_visible_cleared, bool new_all_visible_cleared);
+								  bool all_visible_cleared, bool new_all_visible_cleared,
+								  bool wal_logical);
 #ifdef USE_ASSERT_CHECKING
 static void check_lock_if_inplace_updateable_rel(Relation relation,
 												 ItemPointer otid,
@@ -2780,7 +2781,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 TM_Result
 heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -3027,7 +3028,8 @@ l1:
 	 * Compute replica identity tuple before entering the critical section so
 	 * we don't PANIC upon a memory allocation failure.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+	old_key_tuple = wal_logical ?
+		ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
 
 	/*
 	 * If this is the first possibly-multixact-able operation in the current
@@ -3117,6 +3119,15 @@ l1:
 				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
 		}
 
+		/*
+		 * Unlike UPDATE, DELETE is decoded even if there is no old key, so it
+		 * does not help to clear both XLH_DELETE_CONTAINS_OLD_TUPLE and
+		 * XLH_DELETE_CONTAINS_OLD_KEY. Thus we need an extra flag. TODO
+		 * Consider not decoding tuples w/o the old tuple/key instead.
+		 */
+		if (!wal_logical)
+			xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
 		XLogBeginInsert();
 		XLogRegisterData(&xlrec, SizeOfHeapDelete);
 
@@ -3209,7 +3220,8 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false /* changingPart */ );
+						 &tmfd, false,	/* changingPart */
+						 true /* wal_logical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3250,7 +3262,7 @@ TM_Result
 heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
-			TU_UpdateIndexes *update_indexes)
+			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -4143,7 +4155,8 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 wal_logical);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4501,7 +4514,8 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, &lockmode, update_indexes);
+						 &tmfd, &lockmode, update_indexes,
+						 true /* wal_logical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -8842,7 +8856,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool wal_logical)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -8853,7 +8868,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
 	Page		page = BufferGetPage(newbuf);
-	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln) &&
+		wal_logical;
 	bool		init;
 	int			bufflags;
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 79f9de5d760..d03084768e0 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+					   true);
 }
 
 
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
-						 tmfd, lockmode, update_indexes);
+						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	/*
@@ -685,13 +687,15 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
+								 Snapshot snapshot,
+								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
 								 double *tups_recently_dead)
 {
-	RewriteState rwstate;
+	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
 	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
+	bool		concurrent = snapshot != NULL;
+	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	values = (Datum *) palloc(natts * sizeof(Datum));
 	isnull = (bool *) palloc(natts * sizeof(bool));
 
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
-								 *multi_cutoff);
+	/*
+	 * Initialize the rewrite operation.
+	 */
+	if (!concurrent)
+		rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+									 *xid_cutoff, *multi_cutoff);
 
 
 	/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 *
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+	 * the snapshot passed by the caller.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									snapshot ? snapshot : SnapshotAny,
+									NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap,
+									snapshot ? snapshot : SnapshotAny,
+									0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -785,6 +801,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		HeapTuple	tuple;
 		Buffer		buf;
 		bool		isdead;
+		HTSV_Result vis;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -837,70 +854,84 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
 		buf = hslot->buffer;
 
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+		/*
+		 * Regarding CONCURRENTLY, see the comments on MVCC snapshot above.
+		 */
+		if (!concurrent)
 		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				*tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
-
-				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				*tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
-
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-
-		if (isdead)
-		{
-			*tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
+			switch ((vis = HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf)))
 			{
-				/* A previous recently-dead tuple is now known dead */
-				*tups_vacuumed += 1;
-				*tups_recently_dead -= 1;
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead */
+					isdead = true;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+					*tups_recently_dead += 1;
+					/* fall through */
+				case HEAPTUPLE_LIVE:
+					/* Live or recently dead, must copy it */
+					isdead = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+					/*
+					 * As long as we hold exclusive lock on the relation,
+					 * normally the only way to see this is if it was inserted
+					 * earlier in our own transaction.  However, it can happen
+					 * in system catalogs, since we tend to release write lock
+					 * before commit there. Also, there's no exclusive lock
+					 * during concurrent processing. Give a warning if neither
+					 * case applies; but in any case we had better copy it.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+						elog(WARNING, "concurrent insert in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as live */
+					isdead = false;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+					/*
+					 * Similar situation to INSERT_IN_PROGRESS case.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+						elog(WARNING, "concurrent delete in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as recently dead */
+					*tups_recently_dead += 1;
+					isdead = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					isdead = false; /* keep compiler quiet */
+					break;
 			}
-			continue;
+
+			if (isdead)
+			{
+				*tups_vacuumed += 1;
+				/* heap rewrite module still needs to see it... */
+				if (rewrite_heap_dead_tuple(rwstate, tuple))
+				{
+					/* A previous recently-dead tuple is now known dead */
+					*tups_vacuumed += 1;
+					*tups_recently_dead -= 1;
+				}
+
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				continue;
+			}
+
+			/*
+			 * In the concurrent case, we have a copy of the tuple, so we
+			 * don't worry whether the source tuple will be deleted / updated
+			 * after we release the lock.
+			 */
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 		}
 
 		*num_tuples += 1;
@@ -919,7 +950,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		{
 			const int	ct_index[] = {
 				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
-				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_INSERTED
 			};
 			int64		ct_val[2];
 
@@ -934,6 +965,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
+
+		/*
+		 * Process the WAL produced by the load, as well as by other
+		 * transactions, so that the replication slot can advance and WAL does
+		 * not pile up. Use wal_segment_size as a threshold so that we do not
+		 * introduce the decoding overhead too often.
+		 *
+		 * Of course, we must not apply the changes until the initial load has
+		 * completed.
+		 *
+		 * Note that our insertions into the new table should not be decoded
+		 * as we (intentionally) do not write the logical decoding specific
+		 * information to WAL.
+		 */
+		if (concurrent)
+		{
+			XLogRecPtr	end_of_wal;
+
+			end_of_wal = GetFlushRecPtr(NULL);
+			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+			{
+				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+				end_of_wal_prev = end_of_wal;
+			}
+		}
 	}
 
 	if (indexScan != NULL)
@@ -977,7 +1033,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
 		}
 
@@ -985,7 +1041,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	}
 
 	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	if (rwstate)
+		end_heap_rewrite(rwstate);
 
 	/* Clean up */
 	pfree(values);
@@ -2376,6 +2433,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
  * SET WITHOUT OIDS.
  *
  * So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate=NULL, use simple_heap_insert() instead of rewriting - in that
+ * case we still need to deform/form the tuple. TODO Shouldn't we rename the
+ * function, as it might not do any rewrite?
  */
 static void
 reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2459,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
 
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	if (rwstate)
+		/* The heap rewrite module does the rest */
+		rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	else
+	{
+		/*
+		 * Insert tuple when processing REPACK CONCURRENTLY.
+		 *
+		 * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+		 * difficult to do the same in the catch-up phase (as the logical
+		 * decoding does not provide us with sufficient visibility
+		 * information). Thus we must use heap_insert() both during the
+		 * catch-up and here.
+		 *
+		 * The following is like simple_heap_insert() except that we pass the
+		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+		 * the relation files, it drops this relation, so no logical
+		 * replication subscription should need the data.
+		 */
+		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+					HEAP_INSERT_NO_LOGICAL, NULL);
+	}
 
 	heap_freetuple(copiedTuple);
 }
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index e6d2b5fced1..6aa2ed214f2 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		int			options = HEAP_INSERT_SKIP_FSM;
 
 		/*
-		 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
-		 * for the TOAST table are not logically decoded.  The main heap is
-		 * WAL-logged as XLOG FPI records, which are not logically decoded.
+		 * While rewriting the heap for REPACK, make sure data for the TOAST
+		 * table are not logically decoded.  The main heap is WAL-logged as
+		 * XLOG FPI records, which are not logically decoded.
 		 */
 		options |= HEAP_INSERT_NO_LOGICAL;
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b46e7e9c2a6..5670f2bfbde 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -215,6 +215,7 @@ typedef struct TransactionStateData
 	bool		parallelChildXact;	/* is any parent transaction parallel? */
 	bool		chain;			/* start a new block after this one */
 	bool		topXidLogged;	/* for a subxact: is top-level XID logged? */
+	bool		internal;		/* for a subxact: launched internally? */
 	struct TransactionStateData *parent;	/* back link to parent */
 } TransactionStateData;
 
@@ -4735,6 +4736,7 @@ BeginInternalSubTransaction(const char *name)
 			/* Normal subtransaction start */
 			PushTransaction();
 			s = CurrentTransactionState;	/* changed by push */
+			s->internal = true;
 
 			/*
 			 * Savepoint names, like the TransactionState block itself, live
@@ -5251,7 +5253,13 @@ AbortSubTransaction(void)
 	LWLockReleaseAll();
 
 	pgstat_report_wait_end();
-	pgstat_progress_end_command();
+
+	/*
+	 * An internal subtransaction might be used by a user command, in which case
+	 * the command outlives the subtransaction.
+	 */
+	if (!s->internal)
+		pgstat_progress_end_command();
 
 	pgaio_error_cleanup();
 
@@ -5468,6 +5476,7 @@ PushTransaction(void)
 	s->parallelModeLevel = 0;
 	s->parallelChildXact = (p->parallelModeLevel != 0 || p->parallelChildXact);
 	s->topXidLogged = false;
+	s->internal = false;
 
 	CurrentTransactionState = s;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b2b7b10c2be..a92ac78ad9e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1266,16 +1266,17 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      -- 5 is 'catch-up', but that should not appear here.
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS cluster_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
@@ -1291,16 +1292,19 @@ CREATE VIEW pg_stat_progress_repack AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      WHEN 5 THEN 'catch-up'
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
-        S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param5 AS heap_tuples_inserted,
+        S.param6 AS heap_tuples_updated,
+        S.param7 AS heap_tuples_deleted,
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
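+
+-- Example only, not part of the catalog definition: a query like the
+-- following could be used to watch the new REPACK progress columns.
+--
+--   SELECT phase, heap_tuples_scanned, heap_tuples_inserted,
+--          heap_tuples_updated, heap_tuples_deleted
+--   FROM pg_stat_progress_repack;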
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8b64f9e6795..511b2bb6c43 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -25,6 +25,10 @@
 #include "access/toast_internals.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -32,6 +36,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
@@ -39,15 +44,21 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
+#include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -67,13 +78,45 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * The locators are used to avoid logical decoding of data that we do not need
+ * for our table.
+ */
+RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
+RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+
+	Relation	ident_index;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
-								Oid indexOid, Oid userid, int options);
+								Oid indexOid, Oid userid, LOCKMODE lmode,
+								int options);
+static void check_repack_concurrently_requirements(Relation rel);
 static void rebuild_relation(RepackCommand cmd, bool usingindex,
-							 Relation OldHeap, Relation index, bool verbose);
+							 Relation OldHeap, Relation index, Oid userid,
+							 bool verbose, bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							bool verbose, bool *pSwapToastByContent,
-							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							bool verbose,
+							bool *pSwapToastByContent,
+							TransactionId *pFreezeXid,
+							MultiXactId *pCutoffMulti);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -81,12 +124,61 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 											  Oid relid, bool rel_is_index);
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid,
+													  const char *slotname,
+													  TupleDesc tupdesc);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+									 Relation rel, ScanKey key, int nkeys,
+									 IndexInsertState *iistate);
+static void apply_concurrent_insert(Relation rel, ConcurrentChange *change,
+									HeapTuple tup, IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+									HeapTuple tup_target,
+									ConcurrentChange *change,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+									ConcurrentChange *change);
+static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
+								   HeapTuple tup_key,
+								   IndexInsertState *iistate,
+								   TupleTableSlot *ident_slot,
+								   IndexScanDesc *scan_p);
+static void process_concurrent_changes(LogicalDecodingContext *ctx,
+									   XLogRecPtr end_of_wal,
+									   Relation rel_dst,
+									   Relation rel_src,
+									   ScanKey ident_key,
+									   int ident_key_nentries,
+									   IndexInsertState *iistate);
+static IndexInsertState *get_index_insert_state(Relation relation,
+												Oid ident_index_id);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+								  int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+											   Relation cl_index,
+											   LogicalDecodingContext *ctx,
+											   bool swap_toast_by_content,
+											   TransactionId frozenXid,
+											   MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
+										LOCKMODE lockmode,
+										bool isTopLevel,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
 
 
+#define REPL_PLUGIN_NAME   "pgoutput_repack"
+
 static const char *
 RepackCommandAsString(RepackCommand cmd)
 {
@@ -95,7 +187,7 @@ RepackCommandAsString(RepackCommand cmd)
 		case REPACK_COMMAND_REPACK:
 			return "REPACK";
 		case REPACK_COMMAND_VACUUMFULL:
-			return "VACUUM";
+			return "VACUUM (FULL)";
 		case REPACK_COMMAND_CLUSTER:
 			return "CLUSTER";
 	}
@@ -132,6 +224,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	ClusterParams params = {0};
 	Relation	rel = NULL;
 	MemoryContext repack_context;
+	LOCKMODE	lockmode;
 	List	   *rtcs;
 
 	/* Parse option list */
@@ -142,6 +235,16 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		else if (strcmp(opt->defname, "analyze") == 0 ||
 				 strcmp(opt->defname, "analyse") == 0)
 			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
+		else if (strcmp(opt->defname, "concurrently") == 0 &&
+				 defGetBoolean(opt))
+		{
+			if (stmt->command != REPACK_COMMAND_REPACK)
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("CONCURRENTLY option not supported for %s",
+							   RepackCommandAsString(stmt->command)));
+			params.options |= CLUOPT_CONCURRENT;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -151,13 +254,25 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 					 parser_errposition(pstate, opt->location)));
 	}
 
+	/*
+	 * Determine the lock mode expected by cluster_rel().
+	 *
+	 * In the exclusive case, we obtain AccessExclusiveLock right away to
+	 * avoid lock-upgrade hazard in the single-transaction case. In the
+	 * CONCURRENTLY case, the AccessExclusiveLock will only be used at the end
+	 * of processing, supposedly for very short time. Until then, we'll have
+	 * to unlock the relation temporarily, so there's no lock-upgrade hazard.
+	 */
+	lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+		AccessExclusiveLock : ShareUpdateExclusiveLock;
+
 	/*
 	 * If a single relation is specified, process it and we're done ... unless
 	 * the relation is a partitioned table, in which case we fall through.
 	 */
 	if (stmt->relation != NULL)
 	{
-		rel = process_single_relation(stmt, &params);
+		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
 			return;
 	}
@@ -169,10 +284,29 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 				errmsg("cannot ANALYZE multiple tables"));
 
 	/*
-	 * By here, we know we are in a multi-table situation.  In order to avoid
-	 * holding locks for too long, we want to process each table in its own
-	 * transaction.  This forces us to disallow running inside a user
-	 * transaction block.
+	 * By here, we know we are in a multi-table situation.
+	 *
+	 * Concurrent processing is currently considered rather special (e.g. in
+	 * terms of resources consumed), so it is not performed in bulk.
+	 */
+	if (params.options & CLUOPT_CONCURRENT)
+	{
+		if (rel != NULL)
+		{
+			Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+					errhint("Consider running the command for individual partitions."));
+		}
+		else
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY requires explicit table name"));
+	}
+
+	/*
+	 * In order to avoid holding locks for too long, we want to process each
+	 * table in its own transaction.  This forces us to disallow running
+	 * inside a user transaction block.
 	 */
 	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
@@ -252,7 +386,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		 * Open the target table, coping with the case where it has been
 		 * dropped.
 		 */
-		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		rel = try_table_open(rtc->tableOid, lockmode);
 		if (rel == NULL)
 		{
 			CommitTransactionCommand();
@@ -264,7 +398,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 
 		/* Process this table */
 		cluster_rel(stmt->command, stmt->usingindex,
-					rel, rtc->indexOid, &params);
+					rel, rtc->indexOid, &params, isTopLevel);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
@@ -293,22 +427,55 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
  * If indexOid is InvalidOid, the table will be rewritten in physical order
  * instead of index order.
  *
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
  * 'cmd' indicates which command is being executed, to be used for error
  * messages.
  */
 void
 cluster_rel(RepackCommand cmd, bool usingindex,
-			Relation OldHeap, Oid indexOid, ClusterParams *params)
+			Relation OldHeap, Oid indexOid, ClusterParams *params,
+			bool isTopLevel)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Relation	index;
+	LOCKMODE	lmode;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
-	Relation	index;
+	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+	/*
+	 * Check that the correct lock is held. The lock mode is
+	 * AccessExclusiveLock for normal processing and ShareUpdateExclusiveLock
+	 * for concurrent processing (so that SELECT, INSERT, UPDATE and DELETE
+	 * commands work, but cluster_rel() cannot be called concurrently for the
+	 * same relation).
+	 */
+	lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+
+	/* There are specific requirements on concurrent processing. */
+	if (concurrent)
+	{
+		/*
+		 * Make sure we have no XID assigned, otherwise the call of
+		 * setup_logical_decoding() can cause a deadlock.
+		 *
+		 * The existence of a transaction block does not actually imply that
+		 * an XID was already assigned, but it very likely was. We might want
+		 * to check the result of GetCurrentTransactionIdIfAny() instead, but
+		 * that would be less clear from the user's perspective.
+		 */
+		PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
+
+		check_repack_concurrently_requirements(OldHeap);
+	}
 
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
@@ -351,11 +518,13 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * If this is a single-transaction CLUSTER, we can skip these tests. We
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
+	 *
+	 * XXX move [some of] these comments to where the RECHECK flag is
+	 * determined?
 	 */
-	if (recheck)
-		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
-								 params->options))
-			goto out;
+	if (recheck && !cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+										lmode, params->options))
+		goto out;
 
 	/*
 	 * We allow repacking shared catalogs only when not using an index. It
@@ -369,6 +538,12 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 				 errmsg("cannot run \"%s\" on a shared catalog",
 						RepackCommandAsString(cmd))));
 
+	/*
+	 * The CONCURRENTLY case should have been rejected earlier because it does
+	 * not support system catalogs.
+	 */
+	Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
@@ -404,7 +579,7 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OidIsValid(indexOid))
 	{
 		/* verify the index is good and lock it */
-		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, lmode);
 		/* also open it */
 		index = index_open(indexOid, NoLock);
 	}
@@ -421,7 +596,9 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
 		!RelationIsPopulated(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		if (index)
+			index_close(index, lmode);
+		relation_close(OldHeap, lmode);
 		goto out;
 	}
 
@@ -434,11 +611,35 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * invalid, because we move tuples around.  Promote them to relation
 	 * locks.  Predicate locks on indexes will be promoted when they are
 	 * reindexed.
+	 *
+	 * During concurrent processing, the heap as well as its indexes stay in
+	 * operation, so we postpone this step until they are locked using
+	 * AccessExclusiveLock near the end of the processing.
 	 */
-	TransferPredicateLocksToHeapRelation(OldHeap);
+	if (!concurrent)
+		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
+	PG_TRY();
+	{
+		/*
+		 * For concurrent processing, make sure that our logical decoding
+		 * ignores data changes of other tables than the one we are
+		 * processing.
+		 */
+		if (concurrent)
+			begin_concurrent_repack(OldHeap);
+
+		rebuild_relation(cmd, usingindex, OldHeap, index, save_userid,
+						 verbose, concurrent);
+	}
+	PG_FINALLY();
+	{
+		if (concurrent)
+			end_concurrent_repack();
+	}
+	PG_END_TRY();
+
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -457,14 +658,14 @@ out:
  */
 static bool
 cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-					Oid userid, int options)
+					Oid userid, LOCKMODE lmode, int options)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 
 	/* Check that the user still has privileges for the relation */
 	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -478,7 +679,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -489,7 +690,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		 */
 		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 
@@ -500,7 +701,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
 			!get_index_isclustered(indexOid))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 	}
@@ -641,19 +842,89 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	table_close(pg_index, RowExclusiveLock);
 }
 
+/*
+ * Check if the CONCURRENTLY option is legal for the relation.
+ */
+static void
+check_repack_concurrently_requirements(Relation rel)
+{
+	char		relpersistence,
+				replident;
+	Oid			ident_idx;
+
+	/* Data changes in system relations are not logically decoded. */
+	if (IsCatalogRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+	/*
+	 * reorderbuffer.c does not seem to handle processing of TOAST relation
+	 * alone.
+	 */
+	if (IsToastRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+	relpersistence = rel->rd_rel->relpersistence;
+	if (relpersistence != RELPERSISTENCE_PERMANENT)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+	/* With NOTHING, WAL does not contain the old tuple. */
+	replident = rel->rd_rel->relreplident;
+	if (replident == REPLICA_IDENTITY_NOTHING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("Relation \"%s\" has insufficient replication identity.",
+						 RelationGetRelationName(rel))));
+
+	/*
+	 * Identity index is not set if the replica identity is FULL, but PK might
+	 * exist in such a case.
+	 */
+	ident_idx = RelationGetReplicaIndex(rel);
+	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+		ident_idx = rel->rd_pkindex;
+	if (!OidIsValid(ident_idx))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot process relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 (errhint("Relation \"%s\" has no identity index.",
+						  RelationGetRelationName(rel)))));
+}
+
+
 /*
  * rebuild_relation: rebuild an existing relation in index or physical order
  *
- * OldHeap: table to rebuild.
+ * OldHeap: table to rebuild.  See cluster_rel() for comments on the required
+ * lock strength.
+ *
  * index: index to cluster by, or NULL to rewrite in physical order.
  *
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock.  (The
+ * function handles the lock upgrade if 'concurrent' is true.)
  */
 static void
 rebuild_relation(RepackCommand cmd, bool usingindex,
-				 Relation OldHeap, Relation index, bool verbose)
+				 Relation OldHeap, Relation index, Oid userid,
+				 bool verbose, bool concurrent)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -661,13 +932,55 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	Oid			OIDNewHeap;
 	Relation	NewHeap;
 	char		relpersistence;
-	bool		is_system_catalog;
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
+	NameData	slotname;
+	LogicalDecodingContext *ctx = NULL;
+	Snapshot	snapshot = NULL;
+#if USE_ASSERT_CHECKING
+	LOCKMODE	lmode;
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
-		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+	Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+	Assert(!usingindex || CheckRelationLockedByMe(index, lmode, false));
+#endif
+
+	if (concurrent)
+	{
+		TupleDesc	tupdesc;
+
+		/*
+		 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+		 * should never fire.
+		 */
+		Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+
+		/*
+		 * A single backend should not execute multiple REPACK commands at a
+		 * time, so use PID to make the slot unique.
+		 */
+		snprintf(NameStr(slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+		tupdesc = CreateTupleDescCopy(RelationGetDescr(OldHeap));
+
+		/*
+		 * Prepare to capture the concurrent data changes.
+		 *
+		 * Note that this call waits for all transactions with XID already
+		 * assigned to finish. If any of those transactions is waiting for a
+		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
+		 * it runs CREATE INDEX), we can end up in a deadlock. Not sure this
+		 * risk is worth unlocking/locking the table (and its clustering
+		 * index) and checking again if it's still eligible for REPACK
+		 * CONCURRENTLY.
+		 */
+		ctx = setup_logical_decoding(tableOid, NameStr(slotname), tupdesc);
+
+		snapshot = SnapBuildInitialSnapshotForRepack(ctx->snapshot_builder);
+		PushActiveSnapshot(snapshot);
+	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
 	if (usingindex)
@@ -675,7 +988,6 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
 
 	/*
 	 * Create the transient table that will receive the re-ordered data.
@@ -691,30 +1003,67 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, ctx, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
+	/* The historic snapshot won't be needed anymore. */
+	if (snapshot)
+		PopActiveSnapshot();
 
-	/* Close relcache entries, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
-	if (index)
-		index_close(index, NoLock);
+	if (concurrent)
+	{
+		/*
+		 * Push a snapshot that we will use to find old versions of rows when
+		 * processing concurrent UPDATE and DELETE commands. (That snapshot
+		 * should also be used by index expressions.)
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
 
-	/*
-	 * Close the new relation so it can be dropped as soon as the storage is
-	 * swapped. The relation is not visible to others, so no need to unlock it
-	 * explicitly.
-	 */
-	table_close(NewHeap, NoLock);
+		/*
+		 * Make sure we can find the tuples just inserted when applying DML
+		 * commands on top of those.
+		 */
+		CommandCounterIncrement();
+		UpdateActiveSnapshotCommandId();
 
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+										   ctx, swap_toast_by_content,
+										   frozenXid, cutoffMulti);
+		PopActiveSnapshot();
+
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
+
+		/* Done with decoding. */
+		cleanup_logical_decoding(ctx);
+		ReplicationSlotRelease();
+		ReplicationSlotDrop(NameStr(slotname), false);
+	}
+	else
+	{
+		bool		is_system_catalog = IsSystemRelation(OldHeap);
+
+		/* Close relcache entries, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
+		if (index)
+			index_close(index, NoLock);
+
+		/*
+		 * Close the new relation so it can be dropped as soon as the storage
+		 * is swapped. The relation is not visible to others, so no need to
+		 * unlock it explicitly.
+		 */
+		table_close(NewHeap, NoLock);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+	}
 }
 
 
@@ -849,15 +1198,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Pass
+ * iff concurrent processing is required.
+ *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
-				bool *pSwapToastByContent, TransactionId *pFreezeXid,
-				MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -875,6 +1228,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	PGRUsage	ru0;
 	char	   *nspname;
 
+	bool		concurrent = snapshot != NULL;
+
 	pg_rusage_init(&ru0);
 
 	/* Store a copy of the namespace name for logging purposes */
@@ -977,8 +1332,48 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * provided, else plain seqscan.
 	 */
 	if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
+	{
+		ResourceOwner oldowner = NULL;
+		ResourceOwner resowner = NULL;
+
+		/*
+		 * In the CONCURRENT case, use a dedicated resource owner so we don't
+		 * leave any additional locks behind us that we cannot release easily.
+		 */
+		if (concurrent)
+		{
+			Assert(CheckRelationLockedByMe(OldHeap, ShareUpdateExclusiveLock,
+										   false));
+			Assert(CheckRelationLockedByMe(OldIndex, ShareUpdateExclusiveLock,
+										   false));
+
+			resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "plan_cluster_use_sort");
+			oldowner = CurrentResourceOwner;
+			CurrentResourceOwner = resowner;
+		}
+
 		use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
 										 RelationGetRelid(OldIndex));
+
+		if (concurrent)
+		{
+			CurrentResourceOwner = oldowner;
+
+			/*
+			 * We are primarily concerned about locks, but if the planner
+			 * happened to allocate any other resources, we should release
+			 * them too because we're going to delete the whole resowner.
+			 */
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_AFTER_LOCKS,
+								 false, false);
+			ResourceOwnerDelete(resowner);
+		}
+	}
 	else
 		use_sort = false;
 
@@ -1007,7 +1402,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+									cutoffs.OldestXmin, snapshot,
+									decoding_ctx,
+									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
 									&tups_recently_dead);
@@ -1016,7 +1413,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	*pFreezeXid = cutoffs.FreezeLimit;
 	*pCutoffMulti = cutoffs.MultiXactCutoff;
 
-	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+	/*
+	 * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+	 * In the CONCURRENTLY case, we need to set it again before applying the
+	 * concurrent changes.
+	 */
 	NewHeap->rd_toastoid = InvalidOid;
 
 	num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1474,14 +1875,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool swap_toast_by_content,
 				 bool check_constraints,
 				 bool is_internal,
+				 bool reindex,
 				 TransactionId frozenXid,
 				 MultiXactId cutoffMulti,
 				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
-	int			reindex_flags;
-	ReindexParams reindex_params = {0};
 	int			i;
 
 	/* Report that we are now swapping relation files */
@@ -1507,39 +1907,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (is_system_catalog)
 		CacheInvalidateCatalog(OIDOldHeap);
 
-	/*
-	 * Rebuild each index on the relation (but not the toast table, which is
-	 * all-new at this point).  It is important to do this before the DROP
-	 * step because if we are processing a system catalog that will be used
-	 * during DROP, we want to have its indexes available.  There is no
-	 * advantage to the other order anyway because this is all transactional,
-	 * so no chance to reclaim disk space before commit.  We do not need a
-	 * final CommandCounterIncrement() because reindex_relation does it.
-	 *
-	 * Note: because index_build is called via reindex_relation, it will never
-	 * set indcheckxmin true for the indexes.  This is OK even though in some
-	 * sense we are building new indexes rather than rebuilding existing ones,
-	 * because the new heap won't contain any HOT chains at all, let alone
-	 * broken ones, so it can't be necessary to set indcheckxmin.
-	 */
-	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
-	if (check_constraints)
-		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+	if (reindex)
+	{
+		int			reindex_flags;
+		ReindexParams reindex_params = {0};
 
-	/*
-	 * Ensure that the indexes have the same persistence as the parent
-	 * relation.
-	 */
-	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
-	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+		/*
+		 * Rebuild each index on the relation (but not the toast table, which
+		 * is all-new at this point).  It is important to do this before the
+		 * DROP step because if we are processing a system catalog that will
+		 * be used during DROP, we want to have its indexes available.  There
+		 * is no advantage to the other order anyway because this is all
+		 * transactional, so no chance to reclaim disk space before commit. We
+		 * do not need a final CommandCounterIncrement() because
+		 * reindex_relation does it.
+		 *
+		 * Note: because index_build is called via reindex_relation, it will
+		 * never set indcheckxmin true for the indexes.  This is OK even
+		 * though in some sense we are building new indexes rather than
+		 * rebuilding existing ones, because the new heap won't contain any
+		 * HOT chains at all, let alone broken ones, so it can't be necessary
+		 * to set indcheckxmin.
+		 */
+		reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+		if (check_constraints)
+			reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
 
-	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
-								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+		/*
+		 * Ensure that the indexes have the same persistence as the parent
+		 * relation.
+		 */
+		if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+		else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
-	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+		/* Report that we are now reindexing relations */
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+		reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+	}
 
 	/* Report that we are now doing clean up */
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1881,7 +2289,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * resolve in this case.
  */
 static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+						ClusterParams *params)
 {
 	Relation	rel;
 	Oid			tableOid;
@@ -1890,13 +2299,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
-	/*
-	 * Find, lock, and check permissions on the table.  We obtain
-	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-	 * single-transaction case.
-	 */
+	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation,
-										AccessExclusiveLock,
+										lockmode,
 										0,
 										RangeVarCallbackMaintainsTable,
 										NULL);
@@ -1922,26 +2327,17 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 		return rel;
 	else
 	{
-		Oid			indexOid;
+		Oid			indexOid = InvalidOid;
 
-		indexOid = determine_clustered_index(rel, stmt->usingindex,
-											 stmt->indexname);
-		if (OidIsValid(indexOid))
-			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
-		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
-
-		/* Do an analyze, if requested */
-		if (params->options & CLUOPT_ANALYZE)
+		if (stmt->usingindex)
 		{
-			VacuumParams vac_params = {0};
-
-			vac_params.options |= VACOPT_ANALYZE;
-			if (params->options & CLUOPT_VERBOSE)
-				vac_params.options |= VACOPT_VERBOSE;
-			analyze_rel(RelationGetRelid(rel), NULL, vac_params, NIL, true,
-						NULL);
+			indexOid = determine_clustered_index(rel, stmt->usingindex,
+												 stmt->indexname);
+			check_index_is_clusterable(rel, indexOid, lockmode);
 		}
 
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid,
+					params, isTopLevel);
 		return NULL;
 	}
 }
@@ -1998,3 +2394,1052 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
 
 	return indexOid;
 }
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts, to set up logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that the TOAST table needs no attention here, as it's not scanned
+ * using a historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+	Oid			toastrelid;
+
+	/* Avoid logical decoding of other relations by this backend. */
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
+
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+}
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+	/*
+	 * Restore normal function of (future) logical decoding for this backend.
+	 */
+	repacked_rel_locator.relNumber = InvalidOid;
+	repacked_rel_toast_locator.relNumber = InvalidOid;
+}
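
A minimal sketch (editorial illustration, not part of the patch) of how these
two calls are presumably paired by the caller, so that the locators are reset
even on error:

    begin_concurrent_repack(rel);
    PG_TRY();
    {
        /* ... copy the data, then decode and apply concurrent changes ... */
    }
    PG_FINALLY();
    {
        end_concurrent_repack();
    }
    PG_END_TRY();
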
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends make while we copy the
+ * existing data into the temporary table), nor persisted (it's easier to
+ * handle a crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid, const char *slotname, TupleDesc tupdesc)
+{
+	LogicalDecodingContext *ctx;
+	RepackDecodingState *dstate;
+
+	/*
+	 * Check if we can use logical decoding.
+	 */
+	CheckSlotPermissions();
+	CheckLogicalDecodingRequirements();
+
+	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+	ReplicationSlotCreate(slotname, true, RS_TEMPORARY, false, false, false);
+
+	/*
+	 * Neither prepare_write nor do_write callback nor update_progress is
+	 * useful for us.
+	 *
+	 * Regarding the value of need_full_snapshot, we pass false because the
+	 * table we are processing is present in RepackedRelsHash and therefore,
+	 * regarding logical decoding, treated like a catalog.
+	 */
+	ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+									NIL,
+									false,
+									InvalidXLogRecPtr,
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+	/*
+	 * We have no control over setting fast_forward, so at least check it.
+	 */
+	Assert(!ctx->fast_forward);
+
+	DecodingContextFindStartpoint(ctx);
+
+	/* Some WAL records should have been read. */
+	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
+	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+				wal_segment_size);
+
+	/*
+	 * Set up structures to store decoded changes.
+	 */
+	dstate = palloc0(sizeof(RepackDecodingState));
+	dstate->relid = relid;
+	dstate->tstore = tuplestore_begin_heap(false, false,
+										   maintenance_work_mem);
+
+	dstate->tupdesc = tupdesc;
+
+	/* Initialize the descriptor to store the changes ... */
+	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+	/* ... as well as the corresponding slot. */
+	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+											  &TTSOpsMinimalTuple);
+
+	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "logical decoding");
+
+	ctx->output_writer_private = dstate;
+	return ctx;
+}
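
For orientation, a hedged sketch of a call site; the slot name format below is
made up for illustration (the actual caller composes its own):

    char        slotname[NAMEDATALEN];
    LogicalDecodingContext *ctx;

    snprintf(slotname, sizeof(slotname), "repack_%u_%u",
             MyDatabaseId, RelationGetRelid(rel));
    ctx = setup_logical_decoding(RelationGetRelid(rel), slotname,
                                 RelationGetDescr(rel));
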
+
+/*
+ * Retrieve tuple from ConcurrentChange structure.
+ *
+ * The input data starts with the structure but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+	HeapTupleData tup_data;
+	HeapTuple	result;
+	char	   *src;
+
+	/*
+	 * Ensure alignment before accessing the fields. (This is why we can't use
+	 * heap_copytuple() instead of this function.)
+	 */
+	src = change + offsetof(ConcurrentChange, tup_data);
+	memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+	memcpy(result, &tup_data, sizeof(HeapTupleData));
+	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+	src = change + SizeOfConcurrentChange;
+	memcpy(result->t_data, src, result->t_len);
+
+	return result;
+}
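
To spell out the alignment hazard that the memcpy() above avoids (editorial
sketch, not part of the patch):

    /*
     * UNSAFE: 'change' points into a bytea payload, which need not satisfy
     * the alignment of HeapTupleData, so a direct cast may fault on
     * alignment-strict platforms:
     */
    HeapTupleData *bad = (HeapTupleData *)
        (change + offsetof(ConcurrentChange, tup_data));

    /* SAFE: copy into properly aligned local storage first, as done above: */
    HeapTupleData aligned;

    memcpy(&aligned, change + offsetof(ConcurrentChange, tup_data),
           sizeof(HeapTupleData));
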
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+								 XLogRecPtr end_of_wal)
+{
+	RepackDecodingState *dstate;
+	ResourceOwner resowner_old;
+
+	/*
+	 * Invalidate the "present" cache before moving to "(recent) history".
+	 */
+	InvalidateSystemCaches();
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+	resowner_old = CurrentResourceOwner;
+	CurrentResourceOwner = dstate->resowner;
+
+	PG_TRY();
+	{
+		while (ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			XLogSegNo	segno_new;
+			char	   *errm = NULL;
+			XLogRecPtr	end_lsn;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			/*
+			 * If a WAL segment boundary has been crossed, inform the decoding
+			 * system that the catalog_xmin can advance. (We could confirm
+			 * more often, but filling a single WAL segment should not take
+			 * much time.)
+			 */
+			end_lsn = ctx->reader->EndRecPtr;
+			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+			if (segno_new != repack_current_segment)
+			{
+				LogicalConfirmReceivedLocation(end_lsn);
+				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+					 LSN_FORMAT_ARGS(end_lsn));
+				repack_current_segment = segno_new;
+			}
+
+			CHECK_FOR_INTERRUPTS();
+		}
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
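
For context, the calling pattern (used twice in
rebuild_relation_finish_concurrent() below) is to flush the WAL first and then
decode up to the flushed position:

    XLogRecPtr  wal_insert_ptr;
    XLogRecPtr  end_of_wal;

    wal_insert_ptr = GetInsertRecPtr();
    XLogFlush(wal_insert_ptr);
    end_of_wal = GetFlushRecPtr(NULL);

    repack_decode_concurrent_changes(ctx, end_of_wal);
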
+
+/*
+ * Apply changes that happened during the initial load.
+ *
+ * The scan key is passed by the caller so that it does not have to be
+ * constructed multiple times. Its entries have all fields initialized,
+ * except for sk_argument.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
+						 ScanKey key, int nkeys, IndexInsertState *iistate)
+{
+	TupleTableSlot *index_slot,
+			   *ident_slot;
+	HeapTuple	tup_old = NULL;
+
+	if (dstate->nchanges == 0)
+		return;
+
+	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+	/* A slot to fetch tuples from identity index. */
+	ident_slot = table_slot_create(rel, NULL);
+
+	while (tuplestore_gettupleslot(dstate->tstore, true, false,
+								   dstate->tsslot))
+	{
+		bool		shouldFree;
+		HeapTuple	tup_change,
+					tup,
+					tup_exist;
+		char	   *change_raw,
+				   *src;
+		ConcurrentChange change;
+		bool		isnull[1];
+		Datum		values[1];
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get the change from the single-column tuple. */
+		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+		Assert(!isnull[0]);
+
+		/* Make sure we access aligned data. */
+		change_raw = (char *) DatumGetByteaP(values[0]);
+		src = (char *) VARDATA(change_raw);
+		memcpy(&change, src, SizeOfConcurrentChange);
+
+		/* TRUNCATE change contains no tuple, so process it separately. */
+		if (change.kind == CHANGE_TRUNCATE)
+		{
+			/*
+			 * All the things that ExecuteTruncateGuts() does (such as firing
+			 * triggers or handling the DROP_CASCADE behavior) should have
+			 * taken place on the source relation. Thus we only do the actual
+			 * truncation of the new relation (and its indexes).
+			 */
+			heap_truncate_one_rel(rel);
+
+			pfree(tup_change);
+			continue;
+		}
+
+		/*
+		 * Extract the tuple from the change. The tuple is copied here because
+		 * it might be assigned to 'tup_old', in which case it needs to
+		 * survive into the next iteration.
+		 */
+		tup = get_changed_tuple(src);
+
+		if (change.kind == CHANGE_UPDATE_OLD)
+		{
+			Assert(tup_old == NULL);
+			tup_old = tup;
+		}
+		else if (change.kind == CHANGE_INSERT)
+		{
+			Assert(tup_old == NULL);
+
+			apply_concurrent_insert(rel, &change, tup, iistate, index_slot);
+
+			pfree(tup);
+		}
+		else if (change.kind == CHANGE_UPDATE_NEW ||
+				 change.kind == CHANGE_DELETE)
+		{
+			IndexScanDesc ind_scan = NULL;
+			HeapTuple	tup_key;
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+			{
+				tup_key = tup_old != NULL ? tup_old : tup;
+			}
+			else
+			{
+				Assert(tup_old == NULL);
+				tup_key = tup;
+			}
+
+			/*
+			 * Find the tuple to be updated or deleted.
+			 */
+			tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+										  iistate, ident_slot, &ind_scan);
+			if (tup_exist == NULL)
+				elog(ERROR, "Failed to find target tuple");
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+				apply_concurrent_update(rel, tup, tup_exist, &change, iistate,
+										index_slot);
+			else
+				apply_concurrent_delete(rel, tup_exist, &change);
+
+			if (tup_old != NULL)
+			{
+				pfree(tup_old);
+				tup_old = NULL;
+			}
+
+			pfree(tup);
+			index_endscan(ind_scan);
+		}
+		else
+			elog(ERROR, "Unrecognized kind of change: %d", change.kind);
+
+		/*
+		 * If a change was applied now, increment CID for next writes and
+		 * update the snapshot so it sees the changes we've applied so far.
+		 */
+		if (change.kind != CHANGE_UPDATE_OLD)
+		{
+			CommandCounterIncrement();
+			UpdateActiveSnapshotCommandId();
+		}
+
+		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+		Assert(shouldFree);
+		pfree(tup_change);
+	}
+
+	tuplestore_clear(dstate->tstore);
+	dstate->nchanges = 0;
+
+	/* Cleanup. */
+	ExecDropSingleTupleTableSlot(index_slot);
+	ExecDropSingleTupleTableSlot(ident_slot);
+}
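
To make the visibility bookkeeping above concrete, here is a sketch
(editorial; the variable names are illustrative) of why the
CommandCounterIncrement() / UpdateActiveSnapshotCommandId() pair matters when
two decoded changes touch the same row:

    /* change #1: an INSERT of row R, applied with the current command id */
    apply_concurrent_insert(rel, &change1, tup1, iistate, index_slot);
    CommandCounterIncrement();          /* subsequent writes get a new CID */
    UpdateActiveSnapshotCommandId();    /* subsequent scans see the INSERT */

    /*
     * change #2: an UPDATE of the same row R; find_target_tuple() scans with
     * the active snapshot, which now sees the row inserted by change #1.
     */
    tup_exist = find_target_tuple(rel, key, nkeys, tup2, iistate,
                                  ident_slot, &ind_scan);
    apply_concurrent_update(rel, tup2, tup_exist, &change2, iistate,
                            index_slot);
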
+
+static void
+apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
+						IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+	List	   *recheck;
+
+	/*
+	 * Like simple_heap_insert(), but make sure that the INSERT is not
+	 * logically decoded - see reform_and_rewrite_tuple() for more
+	 * information.
+	 */
+	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+				NULL);
+
+	/*
+	 * Update indexes.
+	 *
+	 * Functions used by the indexes may need an active snapshot; the caller
+	 * is expected to have set one.
+	 */
+	ExecStoreHeapTuple(tup, index_slot, false);
+	recheck = ExecInsertIndexTuples(iistate->rri,
+									index_slot,
+									iistate->estate,
+									false,	/* update */
+									false,	/* noDupErr */
+									NULL,	/* specConflict */
+									NIL,	/* arbiterIndexes */
+									false	/* onlySummarizing */
+		);
+
+	/*
+	 * If recheck is required, it must have been performed on the source
+	 * relation by now. (All the logical changes we process here are already
+	 * committed.)
+	 */
+	list_free(recheck);
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						ConcurrentChange *change, IndexInsertState *iistate,
+						TupleTableSlot *index_slot)
+{
+	LockTupleMode lockmode;
+	TM_FailureData tmfd;
+	TU_UpdateIndexes update_indexes;
+	TM_Result	res;
+	List	   *recheck;
+
+	/*
+	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
+	 * here.)
+	 *
+	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_update(rel, &tup_target->t_self, tup,
+					  GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd, &lockmode, &update_indexes,
+					  false /* wal_logical */ );
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+	ExecStoreHeapTuple(tup, index_slot, false);
+
+	if (update_indexes != TU_None)
+	{
+		recheck = ExecInsertIndexTuples(iistate->rri,
+										index_slot,
+										iistate->estate,
+										true,	/* update */
+										false,	/* noDupErr */
+										NULL,	/* specConflict */
+										NIL,	/* arbiterIndexes */
+		/* onlySummarizing */
+										update_indexes == TU_Summarizing);
+		list_free(recheck);
+	}
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+						ConcurrentChange *change)
+{
+	TM_Result	res;
+	TM_FailureData tmfd;
+
+	/*
+	 * Delete tuple from the new heap.
+	 *
+	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+					  InvalidSnapshot, false,
+					  &tmfd,
+					  false,	/* no wait - only we are doing changes */
+					  false /* wal_logical */ );
+
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'key' is a pre-initialized scan key, into which the function will put the
+ * key values.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * On exit, '*scan_p' contains the scan descriptor used. The caller must
+ * close it once the returned tuple is no longer needed.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
+				  IndexInsertState *iistate,
+				  TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
+{
+	IndexScanDesc scan;
+	Form_pg_index ident_form;
+	int2vector *ident_indkey;
+	HeapTuple	result = NULL;
+
+	/* XXX no instrumentation for now */
+	scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+						   NULL, nkeys, 0);
+	*scan_p = scan;
+	index_rescan(scan, key, nkeys, NULL, 0);
+
+	/* Info needed to retrieve key values from heap tuple. */
+	ident_form = iistate->ident_index->rd_index;
+	ident_indkey = &ident_form->indkey;
+
+	/* Use the incoming tuple to finalize the scan key. */
+	for (int i = 0; i < scan->numberOfKeys; i++)
+	{
+		ScanKey		entry;
+		bool		isnull;
+		int16		attno_heap;
+
+		entry = &scan->keyData[i];
+		attno_heap = ident_indkey->values[i];
+		entry->sk_argument = heap_getattr(tup_key,
+										  attno_heap,
+										  rel->rd_att,
+										  &isnull);
+		Assert(!isnull);
+	}
+	if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+	{
+		bool		shouldFree;
+
+		result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+		/* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+		Assert(!shouldFree);
+	}
+
+	return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ *
+ * Pass rel_src iff its reltoastrelid is needed.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *ctx, XLogRecPtr end_of_wal,
+						   Relation rel_dst, Relation rel_src, ScanKey ident_key,
+						   int ident_key_nentries, IndexInsertState *iistate)
+{
+	RepackDecodingState *dstate;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_CATCH_UP);
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	repack_decode_concurrent_changes(ctx, end_of_wal);
+
+	if (dstate->nchanges == 0)
+		return;
+
+	PG_TRY();
+	{
+		/*
+		 * Make sure that TOAST values can eventually be accessed via the old
+		 * relation - see comment in copy_table_data().
+		 */
+		if (rel_src)
+			rel_dst->rd_toastoid = rel_src->rd_rel->reltoastrelid;
+
+		apply_concurrent_changes(dstate, rel_dst, ident_key,
+								 ident_key_nentries, iistate);
+	}
+	PG_FINALLY();
+	{
+		if (rel_src)
+			rel_dst->rd_toastoid = InvalidOid;
+	}
+	PG_END_TRY();
+}
+
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id)
+{
+	EState	   *estate;
+	int			i;
+	IndexInsertState *result;
+
+	result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+	estate = CreateExecutorState();
+
+	result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+	InitResultRelInfo(result->rri, relation, 0, 0, 0);
+	ExecOpenIndices(result->rri, false);
+
+	/*
+	 * Find the relcache entry of the identity index so that we spend no
+	 * extra effort opening / closing it.
+	 */
+	for (i = 0; i < result->rri->ri_NumIndices; i++)
+	{
+		Relation	ind_rel;
+
+		ind_rel = result->rri->ri_IndexRelationDescs[i];
+		if (ind_rel->rd_id == ident_index_id)
+			result->ident_index = ind_rel;
+	}
+	if (result->ident_index == NULL)
+		elog(ERROR, "Failed to open identity index");
+
+	/* Only initialize fields needed by ExecInsertIndexTuples(). */
+	result->estate = estate;
+
+	return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+	Relation	ident_idx_rel;
+	Form_pg_index ident_idx;
+	int			n,
+				i;
+	ScanKey		result;
+
+	Assert(OidIsValid(ident_idx_oid));
+	ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+	ident_idx = ident_idx_rel->rd_index;
+	n = ident_idx->indnatts;
+	result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+	for (i = 0; i < n; i++)
+	{
+		ScanKey		entry;
+		int16		relattno;
+		Form_pg_attribute att;
+		Oid			opfamily,
+					opcintype,
+					opno,
+					opcode;
+
+		entry = &result[i];
+		relattno = ident_idx->indkey.values[i];
+		if (relattno >= 1)
+		{
+			TupleDesc	desc;
+
+			desc = rel_src->rd_att;
+			att = TupleDescAttr(desc, relattno - 1);
+		}
+		else
+			elog(ERROR, "Unexpected attribute number %d in index", relattno);
+
+		opfamily = ident_idx_rel->rd_opfamily[i];
+		opcintype = ident_idx_rel->rd_opcintype[i];
+		opno = get_opfamily_member(opfamily, opcintype, opcintype,
+								   BTEqualStrategyNumber);
+
+		if (!OidIsValid(opno))
+			elog(ERROR, "Failed to find = operator for type %u", opcintype);
+
+		opcode = get_opcode(opno);
+		if (!OidIsValid(opcode))
+			elog(ERROR, "Failed to find = operator for operator %u", opno);
+
+		/* Initialize everything but argument. */
+		ScanKeyInit(entry,
+					i + 1,
+					BTEqualStrategyNumber, opcode,
+					(Datum) NULL);
+		entry->sk_collation = att->attcollation;
+	}
+	index_close(ident_idx_rel, AccessShareLock);
+
+	*nentries = n;
+	return result;
+}
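
To make the "all fields except sk_argument" contract concrete, this is the
apply-time counterpart (it mirrors find_target_tuple() above; tup_key and
ident_indkey are as defined there):

    for (int i = 0; i < nentries; i++)
    {
        bool        isnull;

        key[i].sk_argument = heap_getattr(tup_key,
                                          ident_indkey->values[i],
                                          rel_src->rd_att,
                                          &isnull);
        Assert(!isnull);
    }
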
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+	ExecCloseIndices(iistate->rri);
+	FreeExecutorState(iistate->estate);
+	pfree(iistate->rri);
+	pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	ExecDropSingleTupleTableSlot(dstate->tsslot);
+	FreeTupleDesc(dstate->tupdesc_change);
+	FreeTupleDesc(dstate->tupdesc);
+	tuplestore_end(dstate->tstore);
+
+	FreeDecodingContext(ctx);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+								   Relation cl_index,
+								   LogicalDecodingContext *ctx,
+								   bool swap_toast_by_content,
+								   TransactionId frozenXid,
+								   MultiXactId cutoffMulti)
+{
+	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+	List	   *ind_oids_new;
+	Oid			old_table_oid = RelationGetRelid(OldHeap);
+	Oid			new_table_oid = RelationGetRelid(NewHeap);
+	List	   *ind_oids_old = RelationGetIndexList(OldHeap);
+	ListCell   *lc,
+			   *lc2;
+	char		relpersistence;
+	bool		is_system_catalog;
+	Oid			ident_idx_old,
+				ident_idx_new;
+	IndexInsertState *iistate;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+	XLogRecPtr	wal_insert_ptr,
+				end_of_wal;
+	char		dummy_rec_data = '\0';
+	Relation   *ind_refs,
+			   *ind_refs_p;
+	int			nind;
+
+	/* Like in cluster_rel(). */
+	lockmode_old = ShareUpdateExclusiveLock;
+	Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+	Assert(cl_index == NULL ||
+		   CheckRelationLockedByMe(cl_index, lockmode_old, false));
+	/* This is expected from the caller. */
+	Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+	ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+	/*
+	 * Unlike the exclusive case, we build new indexes for the new relation
+	 * rather than swapping the storage and reindexing the old relation. The
+	 * point is that the index build can take some time, so we do it before we
+	 * get AccessExclusiveLock on the old heap and therefore we cannot swap
+	 * the heap storage yet.
+	 *
+	 * index_create() will lock the new indexes using AccessExclusiveLock - no
+	 * need to change that.
+	 *
+	 * We assume that ShareUpdateExclusiveLock on the table prevents anyone
+	 * from dropping the existing indexes or adding new ones, so the lists of
+	 * old and new indexes should match at swap time. On the other hand, we do
+	 * not block ALTER INDEX commands that do not require a table lock (e.g.
+	 * ALTER INDEX ... SET ...).
+	 *
+	 * XXX Should we check at the end of our work whether another transaction
+	 * executed such a command, and issue a NOTICE that we might have
+	 * discarded its effects? (For example, if someone changes a storage
+	 * parameter after we have created the new index, the new value of that
+	 * parameter is lost.) Alternatively, we could lock all the indexes now in
+	 * a mode that blocks all ALTER INDEX commands (ShareUpdateExclusiveLock?)
+	 * and keep them locked till the end of the transaction. That might
+	 * increase the risk of deadlock during the lock upgrade below; however,
+	 * SELECT / DML queries should not be involved in such a deadlock.
+	 */
+	ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+	/* Processing should not have started without a valid identity index. */
+	Assert(OidIsValid(ident_idx_old));
+
+	/* Find "identity index" on the new relation. */
+	ident_idx_new = InvalidOid;
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+
+		if (ident_idx_old == ind_old)
+		{
+			ident_idx_new = ind_new;
+			break;
+		}
+	}
+	if (!OidIsValid(ident_idx_new))
+		/* Should not happen, given our lock on the old relation. */
+		ereport(ERROR,
+				(errmsg("identity index missing on the new relation")));
+
+	/* Executor state to update indexes. */
+	iistate = get_index_insert_state(NewHeap, ident_idx_new);
+
+	/*
+	 * Build scan key that we'll use to look for rows to be updated / deleted
+	 * during logical decoding.
+	 */
+	ident_key = build_identity_key(ident_idx_new, OldHeap, &ident_key_nentries);
+
+	/*
+	 * During testing, wait for another backend to perform concurrent data
+	 * changes which we will process below.
+	 */
+	INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+	/*
+	 * Flush all WAL records inserted so far (possibly except for the last
+	 * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+	 * we need to flush while holding exclusive lock on the source table.
+	 */
+	wal_insert_ptr = GetInsertRecPtr();
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply the concurrent changes a first time, to minimize the time we need
+	 * to hold AccessExclusiveLock. (A sizable amount of WAL may have been
+	 * written during the data copying and index creation.)
+	 */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/*
+	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+	 * is one) and all its indexes, so that we can swap the files.
+	 *
+	 * Before that, unlock the index temporarily to avoid deadlock in case
+	 * another transaction is trying to lock it while holding the lock on the
+	 * table.
+	 */
+	if (cl_index)
+	{
+		index_close(cl_index, ShareUpdateExclusiveLock);
+		cl_index = NULL;
+	}
+	/* Also lock the TOAST relation, whose file will be swapped as well. */
+	if (OldHeap->rd_rel->reltoastrelid)
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+	/* Finally lock the table */
+	LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+	/*
+	 * Lock all indexes now, not only the clustering one: all indexes need to
+	 * have their files swapped. While doing that, store their relation
+	 * references in an array, to handle predicate locks below.
+	 */
+	ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+	nind = 0;
+	foreach(lc, ind_oids_old)
+	{
+		Oid			ind_oid;
+		Relation	index;
+
+		ind_oid = lfirst_oid(lc);
+		index = index_open(ind_oid, AccessExclusiveLock);
+
+		/*
+		 * TODO 1) Do we need to check if ALTER INDEX was executed since the
+		 * new index was created in build_new_indexes()? 2) Specifically for
+		 * the clustering index, should check_index_is_clusterable() be called
+		 * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+		 * table probably blocks all commands that affect the result of
+		 * check_index_is_clusterable().)
+		 */
+		*ind_refs_p = index;
+		ind_refs_p++;
+		nind++;
+	}
+
+	/*
+	 * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+	 * lock is needed to swap the files.
+	 */
+	if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+	/*
+	 * Tuples and pages of the old heap will be gone, but the heap will stay.
+	 */
+	TransferPredicateLocksToHeapRelation(OldHeap);
+	/* The same for indexes. */
+	for (int i = 0; i < nind; i++)
+	{
+		Relation	index = ind_refs[i];
+
+		TransferPredicateLocksToHeapRelation(index);
+
+		/*
+		 * References to indexes on the old relation are not needed anymore;
+		 * the locks, however, stay till the end of the transaction.
+		 */
+		index_close(index, NoLock);
+	}
+	pfree(ind_refs);
+
+	/*
+	 * Flush anything we see in WAL, to make sure that all changes committed
+	 * while we were waiting for the exclusive lock are available for
+	 * decoding. This should not be necessary if all backends had
+	 * synchronous_commit set, but we can't rely on this setting.
+	 *
+	 * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+	 * position, and GetLastImportantRecPtr() points at the start of the last
+	 * record rather than at the end. Thus the simplest way to determine the
+	 * insert position is to insert a dummy record and use its LSN.
+	 *
+	 * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+	 * last record (plus the total size of all the page headers the record
+	 * spans)?
+	 */
+	XLogBeginInsert();
+	XLogRegisterData(&dummy_rec_data, 1);
+	wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/* Apply the concurrent changes again. */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/* Remember info about rel before closing OldHeap */
+	relpersistence = OldHeap->rd_rel->relpersistence;
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+	/*
+	 * Even ShareUpdateExclusiveLock should have prevented others from
+	 * creating / dropping indexes (even using the CONCURRENTLY option), so we
+	 * do not need to check whether the lists match.
+	 */
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+		Oid			mapped_tables[4];
+
+		/* Zero out possible results from swap_relation_files() */
+		memset(mapped_tables, 0, sizeof(mapped_tables));
+
+		swap_relation_files(ind_old, ind_new,
+							(old_table_oid == RelationRelationId),
+							swap_toast_by_content,
+							true,
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Concurrent processing is not supported for system relations, so
+		 * there should be no mapped tables.
+		 */
+		for (int i = 0; i < 4; i++)
+			Assert(mapped_tables[i] == 0);
+#endif
+	}
+
+	/* The new indexes must be visible for deletion. */
+	CommandCounterIncrement();
+
+	/* Close the old heap but keep lock until transaction commit. */
+	table_close(OldHeap, NoLock);
+	/* Close the new heap. (We didn't have to open its indexes.) */
+	table_close(NewHeap, NoLock);
+
+	/* Clean up what we don't need anymore. (And close the identity index.) */
+	pfree(ident_key);
+	free_index_insert_state(iistate);
+
+	/*
+	 * Swap the relations and their TOAST relations and TOAST indexes. This
+	 * also drops the new relation and its indexes.
+	 *
+	 * (System catalogs are currently not supported.)
+	 */
+	Assert(!is_system_catalog);
+	finish_heap_swap(old_table_oid, new_table_oid,
+					 is_system_catalog,
+					 swap_toast_by_content,
+					 false, true, false,
+					 frozenXid, cutoffMulti,
+					 relpersistence);
+}
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. Its order matches that of OldIndexes, so the two lists can be
+ * used to swap the index storage.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+	ListCell   *lc;
+	List	   *result = NIL;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+	foreach(lc, OldIndexes)
+	{
+		Oid			ind_oid,
+					ind_oid_new;
+		char	   *newName;
+		Relation	ind;
+
+		ind_oid = lfirst_oid(lc);
+		ind = index_open(ind_oid, AccessShareLock);
+
+		newName = ChooseRelationName(get_rel_name(ind_oid),
+									 NULL,
+									 "repacknew",
+									 get_rel_namespace(ind->rd_index->indrelid),
+									 false);
+		ind_oid_new = index_create_copy(NewHeap, ind_oid,
+										ind->rd_rel->reltablespace, newName,
+										false);
+		result = lappend_oid(result, ind_oid_new);
+
+		index_close(ind, AccessShareLock);
+	}
+
+	return result;
+}
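
Stepping back, the concurrent path added in this file boils down to the
following sequence (an editorial outline of the code above, not new code):

    /* 1. setup_logical_decoding(): temporary slot, historic snapshot       */
    /* 2. copy_table_data() into the new heap, using that snapshot          */
    /* 3. build_new_indexes() on the new heap                               */
    /* 4. process_concurrent_changes(): first catch-up, still under         */
    /*    ShareUpdateExclusiveLock                                          */
    /* 5. lock heap, TOAST relation and indexes in AccessExclusive mode     */
    /* 6. process_concurrent_changes(): final catch-up                      */
    /* 7. swap_relation_files() for each index, finish_heap_swap() for the  */
    /*    heap, with reindex=false since the indexes were built above       */
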
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 188e26f0e6e..71b73c21ebf 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -904,7 +904,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
-	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
 					 RecentXmin, ReadNextMultiXactId(), relpersistence);
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 082a3575d62..c79f5b1dc0f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5989,6 +5989,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 			finish_heap_swap(tab->relid, OIDNewHeap,
 							 false, false, true,
 							 !OidIsValid(tab->newTableSpace),
+							 true,
 							 RecentXmin,
 							 ReadNextMultiXactId(),
 							 persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 8863ad0e8bd..6de9d0ba39d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -125,7 +125,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 							  TransactionId lastSaneFrozenXid,
 							  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-					   BufferAccessStrategy bstrategy);
+					   BufferAccessStrategy bstrategy, bool isTopLevel);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -633,7 +633,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 
 			if (params.options & VACOPT_VACUUM)
 			{
-				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+								isTopLevel))
 					continue;
 			}
 
@@ -1997,7 +1998,7 @@ vac_truncate_clog(TransactionId frozenXID,
  */
 static bool
 vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-		   BufferAccessStrategy bstrategy)
+		   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	LOCKMODE	lmode;
 	Relation	rel;
@@ -2288,7 +2289,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
-						&cluster_params);
+						&cluster_params, isTopLevel);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
@@ -2331,7 +2332,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
-		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+				   isTopLevel);
 	}
 
 	/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index b831a541652..5c148131217 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
 subdir('snowball')
 subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..5dc4ae58ffe 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
 #include "catalog/pg_control.h"
+#include "commands/cluster.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/message.h"
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(buf->record);
 	SnapBuild  *builder = ctx->snapshot_builder;
 
+	/*
+	 * If the change is not intended for logical decoding, do not even
+	 * establish a transaction for it - REPACK CONCURRENTLY is the typical
+	 * use case.
+	 *
+	 * First, check if REPACK CONCURRENTLY is being performed by this backend.
+	 * If so, only decode data changes of the table that it is processing, and
+	 * the changes of its TOAST relation.
+	 *
+	 * (The TOAST locator should not be set unless the main one is.)
+	 */
+	Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+		   OidIsValid(repacked_rel_locator.relNumber));
+
+	if (OidIsValid(repacked_rel_locator.relNumber))
+	{
+		XLogReaderState *r = buf->record;
+		RelFileLocator locator;
+
+		/* Not all records contain the block. */
+		if (XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL) &&
+			!RelFileLocatorEquals(locator, repacked_rel_locator) &&
+			(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+			 !RelFileLocatorEquals(locator, repacked_rel_toast_locator)))
+			return;
+	}
+
+	/*
+	 * Second, skip records that do not contain sufficient information for
+	 * decoding.
+	 *
+	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+	 * when making changes to the new table. Those changes are of no use to
+	 * any other consumer (such as a logical replication subscription),
+	 * because the new table will eventually be dropped (after REPACK
+	 * CONCURRENTLY has assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This does happen when 1) raw_heap_insert marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, 2) REPACK CONCURRENTLY
+				 * replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}
+
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a2f1803622c..8e5116a9cab 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,6 +486,26 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	return SnapBuildMVCCFromHistoric(snap, true);
 }
 
+/*
+ * Build an MVCC snapshot for the initial data load performed by REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot set up
+ * (so we do not set MyProc->xmin). XXX Do we need to add further
+ * restrictions?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+	Snapshot	snap;
+
+	Assert(builder->state == SNAPBUILD_CONSISTENT);
+
+	snap = SnapBuildBuildSnapshot(builder);
+	return SnapBuildMVCCFromHistoric(snap, false);
+}
+
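Presumably the REPACK code pairs this with the decoding context built in
setup_logical_decoding(), along these lines (an assumption - the call site is
not part of this hunk):

    LogicalDecodingContext *ctx;
    Snapshot    snapshot;

    ctx = setup_logical_decoding(relid, slotname, tupdesc);
    snapshot = SnapBuildInitialSnapshotForRepack(ctx->snapshot_builder);
    /* ... use 'snapshot' for the initial table scan ... */
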
 /*
  * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
  *
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+#    src/backend/replication/pgoutput_repack/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	$(WIN32RES) \
+	pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+	rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+  'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+  pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pgoutput_repack',
+    '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+  pgoutput_repack_sources,
+  kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..687fbbc59bb
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,288 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ *		Logical Replication output plugin for REPACK command
+ *
+ * Copyright (c) 2012-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+						   OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+							 ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+							  ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+						  Relation rel, ReorderBufferChange *change);
+static void plugin_truncate(struct LogicalDecodingContext *ctx,
+							ReorderBufferTXN *txn, int nrelations,
+							Relation relations[],
+							ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+						 ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+	cb->startup_cb = plugin_startup;
+	cb->begin_cb = plugin_begin_txn;
+	cb->change_cb = plugin_change;
+	cb->truncate_cb = plugin_truncate;
+	cb->commit_cb = plugin_commit_txn;
+	cb->shutdown_cb = plugin_shutdown;
+}
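
The module is loaded by name from setup_logical_decoding() via
CreateInitDecodingContext(REPL_PLUGIN_NAME, ...); presumably the define
amounts to the following (an assumption - the definition is not shown in this
hunk):

    #define REPL_PLUGIN_NAME "pgoutput_repack"
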
+
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+			   bool is_init)
+{
+	ctx->output_plugin_private = NULL;
+
+	/* Probably unnecessary, as we don't use the SQL interface ... */
+	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+	if (ctx->output_plugin_options != NIL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("This plugin does not expect any options")));
+	}
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * As we don't release the slot during the processing of a particular table,
+ * there's no room for an SQL interface, even for debugging purposes.
+ * Therefore we need
+ * neither OutputPluginPrepareWrite() nor OutputPluginWrite() in the plugin
+ * callbacks. (Although we might want to write custom callbacks, this API
+ * seems to be unnecessarily generic for our purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				  XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+			  Relation relation, ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Only interested in one particular relation. */
+	if (relation->rd_id != dstate->relid)
+		return;
+
+	/* Decode entry depending on its type */
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			{
+				HeapTuple	newtuple;
+
+				newtuple = change->data.tp.newtuple;
+
+				/*
+				 * Identity checks in the main function should have made this
+				 * impossible.
+				 */
+				if (newtuple == NULL)
+					elog(ERROR, "Incomplete insert info.");
+
+				store_change(ctx, CHANGE_INSERT, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			{
+				HeapTuple	oldtuple,
+							newtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+				newtuple = change->data.tp.newtuple;
+
+				if (newtuple == NULL)
+					elog(ERROR, "Incomplete update info.");
+
+				if (oldtuple != NULL)
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			{
+				HeapTuple	oldtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+
+				if (oldtuple == NULL)
+					elog(ERROR, "Incomplete delete info.");
+
+				store_change(ctx, CHANGE_DELETE, oldtuple);
+			}
+			break;
+		default:
+			/* Should not come here */
+			Assert(false);
+			break;
+	}
+}
+
+static void
+plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				int nrelations, Relation relations[],
+				ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+	int			i;
+	Relation	relation = NULL;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Find the relation we are processing. */
+	for (i = 0; i < nrelations; i++)
+	{
+		relation = relations[i];
+
+		if (RelationGetRelid(relation) == dstate->relid)
+			break;
+	}
+
+	/* Nothing to do if our relation is not among the truncated ones. */
+	if (i == nrelations)
+		return;
+
+	store_change(ctx, CHANGE_TRUNCATE, NULL);
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+			 HeapTuple tuple)
+{
+	RepackDecodingState *dstate;
+	char	   *change_raw;
+	ConcurrentChange change;
+	bool		flattened = false;
+	Size		size;
+	Datum		values[1];
+	bool		isnull[1];
+	char	   *dst,
+			   *dst_start;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	size = MAXALIGN(VARHDRSZ) + SizeOfConcurrentChange;
+
+	if (tuple)
+	{
+		/*
+		 * ReorderBufferCommit() stores the TOAST chunks in its private memory
+		 * context and frees them after having called apply_change().
+		 * Therefore we need a flat copy (including TOAST) that we eventually
+		 * copy into the memory context which is available to
+		 * decode_concurrent_changes().
+		 */
+		if (HeapTupleHasExternal(tuple))
+		{
+			/*
+			 * toast_flatten_tuple_to_datum() might be more convenient but we
+			 * don't want the decompression it does.
+			 */
+			tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+			flattened = true;
+		}
+
+		size += tuple->t_len;
+	}
+
+	/* XXX Isn't there any function / macro to do this? */
+	if (size >= 0x3FFFFFFF)
+		elog(ERROR, "Change is too big.");
+
+	/* Construct the change. */
+	change_raw = (char *) palloc0(size);
+	SET_VARSIZE(change_raw, size);
+
+	/*
+	 * Since the varlena alignment might not be sufficient for the structure,
+	 * set the fields in a local instance and remember where it should
+	 * eventually be copied.
+	 */
+	change.kind = kind;
+	dst_start = (char *) VARDATA(change_raw);
+
+	/* No other information is needed for TRUNCATE. */
+	if (change.kind == CHANGE_TRUNCATE)
+	{
+		memcpy(dst_start, &change, SizeOfConcurrentChange);
+		goto store;
+	}
+
+	/*
+	 * Copy the tuple.
+	 *
+	 * CAUTION: change->tup_data.t_data must be fixed on retrieval!
+	 */
+	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+	dst = dst_start + SizeOfConcurrentChange;
+	memcpy(dst, tuple->t_data, tuple->t_len);
+
+	/* The data has been copied. */
+	if (flattened)
+		pfree(tuple);
+
+store:
+	/* Copy the structure so it can be stored. */
+	memcpy(dst_start, &change, SizeOfConcurrentChange);
+
+	/* Store as tuple of 1 bytea column. */
+	values[0] = PointerGetDatum(change_raw);
+	isnull[0] = false;
+	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+						 values, isnull);
+
+	/* Accounting. */
+	dstate->nchanges++;
+
+	/* Cleanup. */
+	pfree(change_raw);
+}
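
For reference, the consumer side in cluster.c (apply_concurrent_changes() and
get_changed_tuple()) undoes this encoding roughly as follows, 'values[0]'
being the single bytea column stored above:

    char       *raw = (char *) DatumGetByteaP(values[0]);
    char       *body = VARDATA(raw);    /* start of the ConcurrentChange */
    ConcurrentChange change;
    HeapTuple   tup;

    memcpy(&change, body, SizeOfConcurrentChange);
    if (change.kind != CHANGE_TRUNCATE)
        tup = get_changed_tuple(body);  /* re-aligns the HeapTupleData */
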
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..e9ddf39500c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
 #include "access/xlogprefetcher.h"
 #include "access/xlogrecovery.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index cd3e43c448a..519f3953638 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
 
 die
   "$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
-  . " missing from lwlocklist.h"
+  . "missing from lwlocklist.h"
   if $lwlock_count < scalar @wait_event_lwlocks;
 
 die
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6fe268a8eec..d27a4c30548 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -64,6 +64,7 @@
 #include "catalog/pg_type.h"
 #include "catalog/schemapg.h"
 #include "catalog/storage.h"
+#include "commands/cluster.h"
 #include "commands/policy.h"
 #include "commands/publicationcmds.h"
 #include "commands/trigger.h"
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index bc7840052fe..6d46537cbe8 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -657,7 +656,7 @@ CopySnapshot(Snapshot snapshot)
  * FreeSnapshot
  *		Free the memory associated with a snapshot.
  */
-static void
+void
 FreeSnapshot(Snapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 59ff6e0923b..528fb08154a 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -4998,18 +4998,27 @@ match_previous_words(int pattern_id,
 	}
 
 /* REPACK */
-	else if (Matches("REPACK"))
+	else if (Matches("REPACK") || Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"CONCURRENTLY");
+	else if (Matches("REPACK", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	else if (Matches("REPACK", "(*)"))
+	else if (Matches("REPACK", "(*)", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	/* If we have REPACK <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", MatchAnyExcept("(")))
+	/* If we have REPACK [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(|CONCURRENTLY")) ||
+			 Matches("REPACK", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", "(*)", MatchAny))
+	/* If we have REPACK (*) [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAnyExcept("CONCURRENTLY")) ||
+			 Matches("REPACK", "(*)", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK <sth> USING, then add the index as well */
-	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+
+	/*
+	 * Complete ... [ (*) ] [ CONCURRENTLY ] <sth> USING INDEX, with a list of
+	 * indexes for <sth>.
+	 */
+	else if (TailMatches(MatchAnyExcept("(|CONCURRENTLY"), "USING", "INDEX"))
 	{
 		set_completion_reference(prev3_wd);
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..b82dd17a966 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -323,14 +323,15 @@ extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
-							 struct TM_FailureData *tmfd, bool changingPart);
+							 struct TM_FailureData *tmfd, bool changingPart,
+							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, ItemPointer tid);
 extern void heap_abort_speculative(Relation relation, ItemPointer tid);
 extern TM_Result heap_update(Relation relation, ItemPointer otid,
 							 HeapTuple newtup,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes);
+							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
@@ -411,6 +412,10 @@ extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
 								 uint16 infomask, TransactionId xid);
+extern bool HeapTupleMVCCInserted(HeapTuple htup, Snapshot snapshot,
+								  Buffer buffer);
+extern bool HeapTupleMVCCNotDeleted(HeapTuple htup, Snapshot snapshot,
+									Buffer buffer);
 extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
 extern bool HeapTupleIsSurelyDead(HeapTuple htup,
 								  struct GlobalVisState *vistest);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..8d4af07f840 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
 #define XLH_DELETE_CONTAINS_OLD_KEY				(1<<2)
 #define XLH_DELETE_IS_SUPER						(1<<3)
 #define XLH_DELETE_IS_PARTITION_MOVE			(1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL					(1<<5)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLH_DELETE_CONTAINS_OLD						\
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..289b64edfd9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -623,6 +624,8 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
+											  Snapshot snapshot,
+											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1627,6 +1630,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ *	 (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1639,6 +1646,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
+								Snapshot snapshot,
+								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1647,6 +1656,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
+													snapshot, decoding_ctx,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 890998d84bb..4a508c57a50 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
 #ifndef CLUSTER_H
 #define CLUSTER_H
 
+#include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "replication/logical.h"
 #include "storage/lock.h"
+#include "storage/relfilelocator.h"
 #include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -25,6 +30,8 @@
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
 #define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
+#define CLUOPT_CONCURRENT 0x10	/* allow concurrent data changes */
+
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -33,14 +40,95 @@ typedef struct ClusterParams
 } ClusterParams;
 
 
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+extern RelFileLocator repacked_rel_locator;
+extern RelFileLocator repacked_rel_toast_locator;
+
+typedef enum
+{
+	CHANGE_INSERT,
+	CHANGE_UPDATE_OLD,
+	CHANGE_UPDATE_NEW,
+	CHANGE_DELETE,
+	CHANGE_TRUNCATE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+	/* See the enum above. */
+	ConcurrentChangeKind kind;
+
+	/*
+	 * The actual tuple.
+	 *
+	 * The tuple data follows the ConcurrentChange structure. Before use, make
+	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
+	 * bytea) and that tup_data.t_data is fixed up to point to it.
+	 */
+	HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+								sizeof(HeapTupleData))
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents are being copied to the new storage, along with the metadata
+ * needed to apply those changes to the table.
+ */
+typedef struct RepackDecodingState
+{
+	/* The relation whose changes we're decoding. */
+	Oid			relid;
+
+	/*
+	 * Decoded changes are stored here. Although we try to avoid excessive
+	 * batches, the changes may occasionally need to spill to disk; the
+	 * tuplestore handles that transparently.
+	 */
+	Tuplestorestate *tstore;
+
+	/* The current number of changes in tstore. */
+	double		nchanges;
+
+	/*
+	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
+	 * We can't store the tuple directly because the tuplestore only supports
+	 * minimal tuples and we may need to transfer the OID system column from
+	 * the output plugin. We also need to transfer the change kind, so it's
+	 * better to put everything in one structure than to use two tuplestores
+	 * "in parallel".
+	 */
+	TupleDesc	tupdesc_change;
+
+	/* Tuple descriptor needed to update indexes. */
+	TupleDesc	tupdesc;
+
+	/* Slot to retrieve data from tstore. */
+	TupleTableSlot *tsslot;
+
+	ResourceOwner resowner;
+} RepackDecodingState;
+
+
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, bool usingindex,
-						Relation OldHeap, Oid indexOid, ClusterParams *params);
+						Relation OldHeap, Oid indexOid, ClusterParams *params,
+						bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+											 XLogRecPtr end_of_wal);
+
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -48,6 +136,7 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 bool swap_toast_by_content,
 							 bool check_constraints,
 							 bool is_internal,
+							 bool reindex,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 5b6639c114c..93917ad5544 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -59,18 +59,20 @@
 /*
  * Progress parameters for REPACK.
  *
- * Note: Since REPACK shares some code with CLUSTER, these values are also
- * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
- * introduce a separate set of constants.)
+ * Note: Since REPACK shares some code with CLUSTER, (some of) these values
+ * are also used by CLUSTER. (CLUSTER is now deprecated, so it makes little
+ * sense to introduce a separate set of constants.)
  */
 #define PROGRESS_REPACK_COMMAND					0
 #define PROGRESS_REPACK_PHASE					1
 #define PROGRESS_REPACK_INDEX_RELID				2
 #define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED	4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED		5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED		6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		9
 
 /*
  * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -79,9 +81,10 @@
 #define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
 #define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
 #define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+#define PROGRESS_REPACK_PHASE_CATCH_UP			5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		8
 
 /*
  * Commands of PROGRESS_REPACK
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 6d4d2d1814c..802fc4b0823 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 7f3ba0352f6..2739327b0da 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
 #define AccessShareLock			1	/* SELECT */
 #define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
 #define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL), ANALYZE, CREATE
-									 * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4	/* VACUUM (non-exclusive), ANALYZE, CREATE
+									 * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
 #define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
 #define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
 									 * SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index f65f83c85cd..1f821fd2ccd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,8 @@ extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index fc82cd67f6c..f16422175f8 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -11,10 +11,11 @@ EXTENSION = injection_points
 DATA = injection_points--1.0.sql
 PGFILEDESC = "injection_points - facility for injection points"
 
-REGRESS = injection_points hashagg reindex_conc vacuum
+# REGRESS = injection_points hashagg reindex_conc vacuum
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = basic inplace syscache-update-pruned
+ISOLATION = basic inplace syscache-update-pruned repack
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing: 
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+
+step change_new: 
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1: 
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+
+step change_subxact2: 
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    2
+(1 row)
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..c8f264bc6cb
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
\ No newline at end of file
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 20390d6b4bf..29561103bbf 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -47,9 +47,13 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'repack',
       'syscache-update-pruned',
     ],
     'runningcheck': false, # see syscache-update-pruned
+    # 'repack' requires wal_level = 'logical'.
+    'regress_args': ['--temp-config', files('logical.conf')],
+
   },
   'tap': {
     'env': {
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..75850334986
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,143 @@
+# Prefix the system columns with underscore as they are not allowed as column
+# names.
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+	INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j int);
+	CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+    SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice to test whether tuple version generated by this session
+# can be found.
+step change_existing
+{
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key and
+# non-key column.
+step change_new
+{
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change_existing
+	change_new
+	change_subxact1
+	change_subxact2
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3a1d1d28282..fe227bd8a30 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1999,17 +1999,17 @@ pg_stat_progress_cluster| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS cluster_index_relid,
     s.param4 AS heap_tuples_scanned,
     s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_copy| SELECT s.pid,
@@ -2081,17 +2081,20 @@ pg_stat_progress_repack| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 5 THEN 'catch-up'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS repack_index_relid,
     s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param5 AS heap_tuples_inserted,
+    s.param6 AS heap_tuples_updated,
+    s.param7 AS heap_tuples_deleted,
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 98242e25432..b64ab8dfab4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -485,6 +485,8 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
 ConditionalStack
@@ -1257,6 +1259,7 @@ IndexElem
 IndexFetchHeapData
 IndexFetchTableData
 IndexInfo
+IndexInsertState
 IndexList
 IndexOnlyScan
 IndexOnlyScanState
@@ -2538,6 +2541,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 RepackCommand
+RepackDecodingState
 RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
-- 
2.39.5

v19-0006-Preserve-visibility-information-of-the-concurren.patch (text/x-diff; charset=utf-8)
From 7908447f751783a267706558efada48a6efddb37 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 30 Aug 2025 19:40:04 +0200
Subject: [PATCH v19 6/6] Preserve visibility information of the concurrent
 data changes.

As explained in the commit message of the preceding patch of the series, the
data changes done by applications while REPACK CONCURRENTLY is copying the
table contents to a new file are decoded from WAL and eventually also applied
to the new file. To reduce the complexity a little, the preceding patch uses
the current transaction (i.e. the transaction opened by the REPACK command) to
execute those INSERT, UPDATE and DELETE commands.

However, REPACK is not expected to change the visibility of tuples. Therefore,
this patch fixes the handling of the "concurrent data changes". It ensures
that tuples written into the new table have the same XID and command ID (CID)
as they had in the old table.
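
A condensed excerpt from the reform_and_rewrite_tuple() hunk below shows the
idea for the initial table copy: instead of stamping the copied tuple with the
current transaction's XID and command ID, the values carried by the old tuple
are reused:

    CommandId	cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
    TransactionId xid = HeapTupleHeaderGetXmin(tuple->t_data);

    heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);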

To "replay" an UPDATE or DELETE command on the new table, we need the
appropriate snapshot to find the previous tuple version in the new table. The
(historic) snapshot we used to decode the UPDATE / DELETE should (by
definition) see the state of the catalog prior to that UPDATE / DELETE. Thus
we can use the same snapshot to find the "old tuple" for UPDATE / DELETE in
the new table if:

1) REPACK CONCURRENTLY preserves visibility information of all tuples - that's
the purpose of this part of the patch series.

2) The table being REPACKed is treated as a system catalog by all transactions
that modify its data. This ensures that reorderbuffer.c generates a new
snapshot for each data change in the table.
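
With both conditions met, applying a decoded UPDATE or DELETE boils down to
using the (converted) snapshot for the index lookup of the old tuple, as in
this condensed excerpt from the apply_concurrent_changes() hunk below (error
handling omitted):

    /* Treat the decoded XID and its subxacts as the current transaction. */
    SetRepackCurrentXids(snapshot->subxip, snapshot->subxcnt);

    tup_exist = find_target_tuple(rel, key, nkeys, tup_key, snapshot,
                                  iistate, ident_slot, &ind_scan);
    ...
    ResetRepackCurrentXids();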

We ensure 2) by maintaining a shared hashtable of tables being REPACKed
CONCURRENTLY and by adjusting the RelationIsAccessibleInLogicalDecoding()
macro so it checks this hashtable. (The corresponding flag is also added to
the relation cache, so that the shared hashtable does not have to be accessed
too often.) It's essential that after adding an entry to the hashtable we wait
for completion of all the transactions that might have started to modify our
table before our entry was added. We achieve that by upgrading our lock on
the table to ShareLock temporarily: as soon as we acquire it, no DML command
should be running on the table. (This lock upgrade shouldn't cause any
deadlock because we take care not to hold locks on other objects at the same
time.)
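
Condensed from the begin_concurrent_repack() hunk below, the temporary upgrade
looks like this (the index is closed first so that we do not deadlock against
a DML backend that already holds its lock on the table and waits for the lock
on the index):

    index_close(index, ShareUpdateExclusiveLock);
    LockRelationOid(relid, ShareLock);
    UnlockRelationOid(relid, ShareLock);
    /* Re-open the index and check that it hasn't changed while unlocked. */
    check_index_is_clusterable(rel, indexid, ShareUpdateExclusiveLock);
    index = index_open(indexid, NoLock);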

As long as we preserve the tuple visibility information (which includes XID),
it's important to avoid logical decoding of the WAL generated by DMLs on the
new table: the logical decoding subsystem probably does not expect that the
incoming WAL records contain XIDs of already decoded transactions. (And of
course, repeated decoding would be wasted effort.)
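
Concretely, heap_insert() now skips the logical-decoding-only WAL information
whenever the caller passes HEAP_INSERT_NO_LOGICAL (heap_update() and
heap_delete() behave analogously via their 'wal_logical' argument), and
heap_decode() ignores such records; condensed from the heap_insert() hunk
below:

    if ((options & HEAP_INSERT_NO_LOGICAL) == 0 &&
        RelationIsAccessibleInLogicalDecoding(relation))
        log_heap_new_cid(relation, heaptup);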

Author: Antonin Houska <ah@cybertec.at>
Author: Mikhail Nikalayeu <mihailnikalayeu@gmail.com> (small changes)
---
 src/backend/access/common/toast_internals.c   |   3 +-
 src/backend/access/heap/heapam.c              |  51 ++-
 src/backend/access/heap/heapam_handler.c      |  23 +-
 src/backend/access/transam/xact.c             |  52 +++
 src/backend/commands/cluster.c                | 400 ++++++++++++++++--
 src/backend/replication/logical/decode.c      |  28 +-
 src/backend/replication/logical/snapbuild.c   |  22 +-
 .../pgoutput_repack/pgoutput_repack.c         |  68 ++-
 src/backend/storage/ipc/ipci.c                |   2 +
 .../utils/activity/wait_event_names.txt       |   1 +
 src/backend/utils/cache/inval.c               |  21 +
 src/backend/utils/cache/relcache.c            |   4 +
 src/include/access/heapam.h                   |  12 +-
 src/include/access/xact.h                     |   2 +
 src/include/commands/cluster.h                |  22 +
 src/include/storage/lwlocklist.h              |   1 +
 src/include/utils/inval.h                     |   2 +
 src/include/utils/rel.h                       |   7 +-
 src/include/utils/snapshot.h                  |   3 +
 .../injection_points/specs/repack.spec        |   4 -
 src/tools/pgindent/typedefs.list              |   1 +
 21 files changed, 635 insertions(+), 94 deletions(-)

diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index a1d0eed8953..586eb42a137 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -320,7 +320,8 @@ toast_save_datum(Relation rel, Datum value,
 		memcpy(VARDATA(&chunk_data), data_p, chunk_size);
 		toasttup = heap_form_tuple(toasttupDesc, t_values, t_isnull);
 
-		heap_insert(toastrel, toasttup, mycid, options, NULL);
+		heap_insert(toastrel, toasttup, GetCurrentTransactionId(), mycid,
+					options, NULL);
 
 		/*
 		 * Create the index entry.  We cheat a little here by not using
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f9a4fe3faed..fd17286cabe 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2070,7 +2070,7 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
 /*
  *	heap_insert		- insert tuple into a heap
  *
- * The new tuple is stamped with current transaction ID and the specified
+ * The new tuple is stamped with the specified transaction ID and the specified
  * command ID.
  *
  * See table_tuple_insert for comments about most of the input flags, except
@@ -2086,15 +2086,16 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * reflected into *tup.
  */
 void
-heap_insert(Relation relation, HeapTuple tup, CommandId cid,
-			int options, BulkInsertState bistate)
+heap_insert(Relation relation, HeapTuple tup, TransactionId xid,
+			CommandId cid, int options, BulkInsertState bistate)
 {
-	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
+	Assert(TransactionIdIsValid(xid));
+
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(tup->t_data) <=
 		   RelationGetNumberOfAttributes(relation));
@@ -2176,8 +2177,15 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		/*
 		 * If this is a catalog, we need to transmit combo CIDs to properly
 		 * decode, so log that as well.
+		 *
+		 * HEAP_INSERT_NO_LOGICAL should be set when applying data changes
+		 * done by other transactions during REPACK CONCURRENTLY. In such a
+		 * case, the insertion should not be decoded at all - see
+		 * heap_decode(). (It's also set by raw_heap_insert() for TOAST, but
+		 * TOAST does not pass this test anyway.)
 		 */
-		if (RelationIsAccessibleInLogicalDecoding(relation))
+		if ((options & HEAP_INSERT_NO_LOGICAL) == 0 &&
+			RelationIsAccessibleInLogicalDecoding(relation))
 			log_heap_new_cid(relation, heaptup);
 
 		/*
@@ -2723,7 +2731,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 void
 simple_heap_insert(Relation relation, HeapTuple tup)
 {
-	heap_insert(relation, tup, GetCurrentCommandId(true), 0, NULL);
+	heap_insert(relation, tup, GetCurrentTransactionId(),
+				GetCurrentCommandId(true), 0, NULL);
 }
 
 /*
@@ -2780,11 +2789,11 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
  */
 TM_Result
 heap_delete(Relation relation, ItemPointer tid,
-			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
+			TransactionId xid, CommandId cid, Snapshot crosscheck, bool wait,
+			TM_FailureData *tmfd, bool changingPart,
+			bool wal_logical)
 {
 	TM_Result	result;
-	TransactionId xid = GetCurrentTransactionId();
 	ItemId		lp;
 	HeapTupleData tp;
 	Page		page;
@@ -2801,6 +2810,7 @@ heap_delete(Relation relation, ItemPointer tid,
 	bool		old_key_copied = false;
 
 	Assert(ItemPointerIsValid(tid));
+	Assert(TransactionIdIsValid(xid));
 
 	AssertHasSnapshotForToast(relation);
 
@@ -3097,8 +3107,12 @@ l1:
 		/*
 		 * For logical decode we need combo CIDs to properly decode the
 		 * catalog
+		 *
+		 * Like in heap_insert(), visibility is unchanged when called from
+		 * VACUUM FULL / CLUSTER.
 		 */
-		if (RelationIsAccessibleInLogicalDecoding(relation))
+		if (wal_logical &&
+			RelationIsAccessibleInLogicalDecoding(relation))
 			log_heap_new_cid(relation, &tp);
 
 		xlrec.flags = 0;
@@ -3217,11 +3231,12 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 	TM_Result	result;
 	TM_FailureData tmfd;
 
-	result = heap_delete(relation, tid,
+	result = heap_delete(relation, tid, GetCurrentTransactionId(),
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
 						 &tmfd, false,	/* changingPart */
 						 true /* wal_logical */ );
+
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3260,12 +3275,11 @@ simple_heap_delete(Relation relation, ItemPointer tid)
  */
 TM_Result
 heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
-			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, LockTupleMode *lockmode,
+			TransactionId xid, CommandId cid, Snapshot crosscheck,
+			bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
 			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
-	TransactionId xid = GetCurrentTransactionId();
 	Bitmapset  *hot_attrs;
 	Bitmapset  *sum_attrs;
 	Bitmapset  *key_attrs;
@@ -3305,6 +3319,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				infomask2_new_tuple;
 
 	Assert(ItemPointerIsValid(otid));
+	Assert(TransactionIdIsValid(xid));
 
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
@@ -4144,8 +4159,12 @@ l2:
 		/*
 		 * For logical decoding we need combo CIDs to properly decode the
 		 * catalog.
+		 *
+		 * Like in heap_insert(), visibility is unchanged when called from
+		 * VACUUM FULL / CLUSTER.
 		 */
-		if (RelationIsAccessibleInLogicalDecoding(relation))
+		if (wal_logical &&
+			RelationIsAccessibleInLogicalDecoding(relation))
 		{
 			log_heap_new_cid(relation, &oldtup);
 			log_heap_new_cid(relation, heaptup);
@@ -4511,7 +4530,7 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 	TM_FailureData tmfd;
 	LockTupleMode lockmode;
 
-	result = heap_update(relation, otid, tup,
+	result = heap_update(relation, otid, tup, GetCurrentTransactionId(),
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
 						 &tmfd, &lockmode, update_indexes,
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d03084768e0..b50f7dc9b9c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -253,7 +253,8 @@ heapam_tuple_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
-	heap_insert(relation, tuple, cid, options, bistate);
+	heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+				bistate);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	if (shouldFree)
@@ -276,7 +277,8 @@ heapam_tuple_insert_speculative(Relation relation, TupleTableSlot *slot,
 	options |= HEAP_INSERT_SPECULATIVE;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
-	heap_insert(relation, tuple, cid, options, bistate);
+	heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+				bistate);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	if (shouldFree)
@@ -310,8 +312,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
-					   true);
+	return heap_delete(relation, tid, GetCurrentTransactionId(), cid,
+					   crosscheck, wait, tmfd, changingPart, true);
 }
 
 
@@ -329,7 +331,8 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	slot->tts_tableOid = RelationGetRelid(relation);
 	tuple->t_tableOid = slot->tts_tableOid;
 
-	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
+	result = heap_update(relation, otid, tuple, GetCurrentTransactionId(),
+						 cid, crosscheck, wait,
 						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
@@ -2477,9 +2480,15 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
 		 * the relation files, it drops this relation, so no logical
 		 * replication subscription should need the data.
+		 *
+		 * It is also crucial to stamp the new record with the exact same xid
+		 * and cid, because the tuple must be visible to the snapshot of the
+		 * applied concurrent change later.
 		 */
-		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
-					HEAP_INSERT_NO_LOGICAL, NULL);
+		CommandId	cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
+		TransactionId xid = HeapTupleHeaderGetXmin(tuple->t_data);
+
+		heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);
 	}
 
 	heap_freetuple(copiedTuple);
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 5670f2bfbde..e913594fc07 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -126,6 +126,18 @@ static FullTransactionId XactTopFullTransactionId = {InvalidTransactionId};
 static int	nParallelCurrentXids = 0;
 static TransactionId *ParallelCurrentXids;
 
+/*
+ * Another case that requires TransactionIdIsCurrentTransactionId() to behave
+ * specially is when REPACK CONCURRENTLY is processing data changes made in
+ * the old storage of a table by other transactions. When applying the changes
+ * to the new storage, the backend executing the REPACK command needs to act
+ * on behalf of those other transactions. The transactions responsible for the
+ * changes in the old storage are stored in this array, sorted by
+ * xidComparator.
+ */
+static int	nRepackCurrentXids = 0;
+static TransactionId *RepackCurrentXids = NULL;
+
 /*
  * Miscellaneous flag bits to record events which occur on the top level
  * transaction. These flags are only persisted in MyXactFlags and are intended
@@ -973,6 +985,8 @@ TransactionIdIsCurrentTransactionId(TransactionId xid)
 		int			low,
 					high;
 
+		Assert(nRepackCurrentXids == 0);
+
 		low = 0;
 		high = nParallelCurrentXids - 1;
 		while (low <= high)
@@ -992,6 +1006,21 @@ TransactionIdIsCurrentTransactionId(TransactionId xid)
 		return false;
 	}
 
+	/*
+	 * When executing REPACK CONCURRENTLY, the array of current transactions
+	 * is given.
+	 */
+	if (nRepackCurrentXids > 0)
+	{
+		Assert(nParallelCurrentXids == 0);
+
+		return bsearch(&xid,
+					   RepackCurrentXids,
+					   nRepackCurrentXids,
+					   sizeof(TransactionId),
+					   xidComparator) != NULL;
+	}
+
 	/*
 	 * We will return true for the Xid of the current subtransaction, any of
 	 * its subcommitted children, any of its parents, or any of their
@@ -5661,6 +5690,29 @@ EndParallelWorkerTransaction(void)
 	CurrentTransactionState->blockState = TBLOCK_DEFAULT;
 }
 
+/*
+ * SetRepackCurrentXids
+ *		Set the XID array that TransactionIdIsCurrentTransactionId() should
+ *		use.
+ */
+void
+SetRepackCurrentXids(TransactionId *xip, int xcnt)
+{
+	RepackCurrentXids = xip;
+	nRepackCurrentXids = xcnt;
+}
+
+/*
+ * ResetRepackCurrentXids
+ *		Undo the effect of SetRepackCurrentXids().
+ */
+void
+ResetRepackCurrentXids(void)
+{
+	RepackCurrentXids = NULL;
+	nRepackCurrentXids = 0;
+}
+
 /*
  * ShowTransactionState
  *		Debug support
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 511b2bb6c43..a44724f3757 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -82,6 +82,11 @@ typedef struct
  * The following definitions are used for concurrent processing.
  */
 
+/*
+ * OID of the table being repacked by this backend.
+ */
+static Oid	repacked_rel = InvalidOid;
+
 /*
  * The locators are used to avoid logical decoding of data that we do not need
  * for our table.
@@ -125,8 +130,10 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
 
-static void begin_concurrent_repack(Relation rel);
-static void end_concurrent_repack(void);
+static void begin_concurrent_repack(Relation rel, Relation *index_p,
+									bool *entered_p);
+static void end_concurrent_repack(bool error);
+static void cluster_before_shmem_exit_callback(int code, Datum arg);
 static LogicalDecodingContext *setup_logical_decoding(Oid relid,
 													  const char *slotname,
 													  TupleDesc tupdesc);
@@ -146,6 +153,7 @@ static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
 									ConcurrentChange *change);
 static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
 								   HeapTuple tup_key,
+								   Snapshot snapshot,
 								   IndexInsertState *iistate,
 								   TupleTableSlot *ident_slot,
 								   IndexScanDesc *scan_p);
@@ -450,6 +458,8 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
 	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
+	bool		entered,
+				success;
 
 	/*
 	 * Check that the correct lock is held. The lock mode is
@@ -620,23 +630,30 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
+	entered = false;
+	success = false;
 	PG_TRY();
 	{
 		/*
-		 * For concurrent processing, make sure that our logical decoding
-		 * ignores data changes of other tables than the one we are
-		 * processing.
+		 * For concurrent processing, make sure that
+		 *
+		 * 1) our logical decoding ignores data changes of other tables than
+		 * the one we are processing.
+		 *
+		 * 2) other transactions treat this table as if it were a system /
+		 * user catalog, and WAL-log the relevant additional information.
 		 */
 		if (concurrent)
-			begin_concurrent_repack(OldHeap);
+			begin_concurrent_repack(OldHeap, &index, &entered);
 
 		rebuild_relation(cmd, usingindex, OldHeap, index, save_userid,
 						 verbose, concurrent);
+		success = true;
 	}
 	PG_FINALLY();
 	{
-		if (concurrent)
-			end_concurrent_repack();
+		if (concurrent && entered)
+			end_concurrent_repack(!success);
 	}
 	PG_END_TRY();
 
@@ -2396,6 +2413,47 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
 }
 
 
+/*
+ * Each relation being processed by REPACK CONCURRENTLY must be in the
+ * repackedRels hashtable.
+ */
+typedef struct RepackedRel
+{
+	Oid			relid;
+	Oid			dbid;
+} RepackedRel;
+
+static HTAB *RepackedRelsHash = NULL;
+
+/*
+ * Maximum number of entries in the hashtable.
+ *
+ * A replication slot is needed for the processing, so use this GUC to
+ * allocate memory for the hashtable.
+ */
+#define	MAX_REPACKED_RELS	(max_replication_slots)
+
+Size
+RepackShmemSize(void)
+{
+	return hash_estimate_size(MAX_REPACKED_RELS, sizeof(RepackedRel));
+}
+
+void
+RepackShmemInit(void)
+{
+	HASHCTL		info;
+
+	info.keysize = sizeof(RepackedRel);
+	info.entrysize = info.keysize;
+
+	RepackedRelsHash = ShmemInitHash("Repacked Relations",
+									 MAX_REPACKED_RELS,
+									 MAX_REPACKED_RELS,
+									 &info,
+									 HASH_ELEM | HASH_BLOBS);
+}
+
 /*
  * Call this function before REPACK CONCURRENTLY starts to setup logical
  * decoding. It makes sure that other users of the table put enough
@@ -2410,11 +2468,119 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
  *
  * Note that TOAST table needs no attention here as it's not scanned using
  * historic snapshot.
+ *
+ * 'index_p' is in/out argument because the function unlocks the index
+ * temporarily.
+ *
+ * 'entered_p' receives a bool value telling whether the relation OID was
+ * entered into RepackedRelsHash or not.
  */
 static void
-begin_concurrent_repack(Relation rel)
+begin_concurrent_repack(Relation rel, Relation *index_p, bool *entered_p)
 {
-	Oid			toastrelid;
+	Oid			relid,
+				toastrelid;
+	Relation	index = NULL;
+	Oid			indexid = InvalidOid;
+	RepackedRel key,
+			   *entry;
+	bool		found;
+	static bool before_shmem_exit_callback_setup = false;
+
+	relid = RelationGetRelid(rel);
+	index = index_p ? *index_p : NULL;
+
+	/*
+	 * Make sure that we do not leave an entry in RepackedRelsHash if exiting
+	 * due to FATAL.
+	 */
+	if (!before_shmem_exit_callback_setup)
+	{
+		before_shmem_exit(cluster_before_shmem_exit_callback, 0);
+		before_shmem_exit_callback_setup = true;
+	}
+
+	memset(&key, 0, sizeof(key));
+	key.relid = relid;
+	key.dbid = MyDatabaseId;
+
+	*entered_p = false;
+	LWLockAcquire(RepackedRelsLock, LW_EXCLUSIVE);
+	entry = (RepackedRel *)
+		hash_search(RepackedRelsHash, &key, HASH_ENTER_NULL, &found);
+	if (found)
+	{
+		/*
+		 * Since REPACK CONCURRENTLY takes ShareRowExclusiveLock, a conflict
+		 * should occur much earlier. However that lock may be released
+		 * temporarily, see below.  Anyway, we should complain whatever the
+		 * reason of the conflict might be.
+		 */
+		ereport(ERROR,
+				(errmsg("relation \"%s\" is already being processed by REPACK CONCURRENTLY",
+						RelationGetRelationName(rel))));
+	}
+	if (entry == NULL)
+		ereport(ERROR,
+				(errmsg("too many requests for REPACK CONCURRENTLY at a time")),
+				(errhint("Please consider increasing the \"max_replication_slots\" configuration parameter.")));
+
+	/*
+	 * Even if anything fails below, the caller has to do cleanup in the
+	 * shared memory.
+	 */
+	*entered_p = true;
+
+	/*
+	 * Enable the callback to remove the entry in case of exit. We should not
+	 * do this earlier, otherwise an attempt to insert an already existing
+	 * entry could make us remove that entry (inserted by another backend)
+	 * during ERROR handling.
+	 */
+	Assert(!OidIsValid(repacked_rel));
+	repacked_rel = relid;
+
+	LWLockRelease(RepackedRelsLock);
+
+	/*
+	 * Make sure that other backends are aware of the new hash entry as soon
+	 * as they open our table.
+	 */
+	CacheInvalidateRelcacheImmediate(relid);
+
+	/*
+	 * Also make sure that the existing users of the table update their
+	 * relcache entry as soon as they try to run DML commands on it.
+	 *
+	 * ShareLock is the weakest lock that conflicts with DMLs. If any backend
+	 * has a lower lock, we assume it'll accept our invalidation message when
+	 * it changes the lock mode.
+	 *
+	 * Before upgrading the lock on the relation, close the index temporarily
+	 * to avoid a deadlock if another backend running DML already has its lock
+	 * (ShareLock) on the table and waits for the lock on the index.
+	 */
+	if (index)
+	{
+		indexid = RelationGetRelid(index);
+		index_close(index, ShareUpdateExclusiveLock);
+	}
+	LockRelationOid(relid, ShareLock);
+	UnlockRelationOid(relid, ShareLock);
+	if (OidIsValid(indexid))
+	{
+		/*
+		 * Re-open the index and check that it hasn't changed while unlocked.
+		 */
+		check_index_is_clusterable(rel, indexid, ShareUpdateExclusiveLock);
+
+		/*
+		 * Return the new relcache entry to the caller. (It's been locked by
+		 * the call above.)
+		 */
+		index = index_open(indexid, NoLock);
+		*index_p = index;
+	}
 
 	/* Avoid logical decoding of other relations by this backend. */
 	repacked_rel_locator = rel->rd_locator;
@@ -2432,15 +2598,122 @@ begin_concurrent_repack(Relation rel)
 
 /*
  * Call this when done with REPACK CONCURRENTLY.
+ *
+ * 'error' tells whether the function is being called in order to handle
+ * error.
  */
 static void
-end_concurrent_repack(void)
+end_concurrent_repack(bool error)
 {
+	RepackedRel key;
+	RepackedRel *entry = NULL;
+	Oid			relid = repacked_rel;
+
+	/* Remove the relation from the hash if we managed to insert one. */
+	if (OidIsValid(repacked_rel))
+	{
+		memset(&key, 0, sizeof(key));
+		key.relid = repacked_rel;
+		key.dbid = MyDatabaseId;
+		LWLockAcquire(RepackedRelsLock, LW_EXCLUSIVE);
+		entry = hash_search(RepackedRelsHash, &key, HASH_REMOVE, NULL);
+		LWLockRelease(RepackedRelsLock);
+
+		/*
+		 * Make others refresh their information whether they should still
+		 * treat the table as catalog from the perspective of writing WAL.
+		 *
+		 * XXX Unlike entering the entry into the hashtable, we do not bother
+		 * with locking and unlocking the table here:
+		 *
+		 * 1) On normal completion (and sometimes even on ERROR), the caller
+		 * is already holding AccessExclusiveLock on the table, so there
+		 * should be no relcache reference unaware of this change.
+		 *
+		 * 2) In the other cases, the worst scenario is that the other
+		 * backends will write unnecessary information to WAL until they close
+		 * the relation.
+		 *
+		 * Should we use ShareLock mode to fix 2) at least for the non-FATAL
+		 * errors? (Our before_shmem_exit callback is in charge of FATAL, and
+		 * that probably should not try to acquire any lock.)
+		 */
+		CacheInvalidateRelcacheImmediate(repacked_rel);
+
+		/*
+		 * By clearing this variable we also disable
+		 * cluster_before_shmem_exit_callback().
+		 */
+		repacked_rel = InvalidOid;
+	}
+
 	/*
 	 * Restore normal function of (future) logical decoding for this backend.
 	 */
 	repacked_rel_locator.relNumber = InvalidOid;
 	repacked_rel_toast_locator.relNumber = InvalidOid;
+
+	/*
+	 * On normal completion (!error), we should not really fail to remove the
+	 * entry. But if it wasn't there for any reason, raise ERROR to make sure
+	 * the transaction is aborted: if other transactions, while changing the
+	 * contents of the relation, didn't know that REPACK CONCURRENTLY was in
+	 * progress, they might have failed to WAL-log enough information, and
+	 * thus we could have produced inconsistent table contents.
+	 *
+	 * On the other hand, if we are already handling an error, there's no
+	 * reason to worry about inconsistent contents of the new storage because
+	 * the transaction is going to be rolled back anyway. Furthermore, by
+	 * raising ERROR here we'd shadow the original error.
+	 */
+	if (!error)
+	{
+		char	   *relname;
+
+		if (OidIsValid(relid) && entry == NULL)
+		{
+			relname = get_rel_name(relid);
+			if (!relname)
+				ereport(ERROR,
+						(errmsg("cache lookup failed for relation %u",
+								relid)));
+
+			ereport(ERROR,
+					(errmsg("relation \"%s\" not found among repacked relations",
+							relname)));
+		}
+	}
+}
+
+/*
+ * A wrapper to call end_concurrent_repack() as a before_shmem_exit callback.
+ */
+static void
+cluster_before_shmem_exit_callback(int code, Datum arg)
+{
+	if (OidIsValid(repacked_rel))
+		end_concurrent_repack(true);
+}
+
+/*
+ * Check if relation is currently being processed by REPACK CONCURRENTLY.
+ */
+bool
+is_concurrent_repack_in_progress(Oid relid)
+{
+	RepackedRel key,
+			   *entry;
+
+	memset(&key, 0, sizeof(key));
+	key.relid = relid;
+	key.dbid = MyDatabaseId;
+
+	LWLockAcquire(RepackedRelsLock, LW_SHARED);
+	entry = (RepackedRel *)
+		hash_search(RepackedRelsHash, &key, HASH_FIND, NULL);
+	LWLockRelease(RepackedRelsLock);
+
+	return entry != NULL;
 }
 
 /*
@@ -2502,6 +2775,9 @@ setup_logical_decoding(Oid relid, const char *slotname, TupleDesc tupdesc)
 	dstate->relid = relid;
 	dstate->tstore = tuplestore_begin_heap(false, false,
 										   maintenance_work_mem);
+#ifdef USE_ASSERT_CHECKING
+	dstate->last_change_xid = InvalidTransactionId;
+#endif
 
 	dstate->tupdesc = tupdesc;
 
@@ -2649,6 +2925,7 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 		char	   *change_raw,
 				   *src;
 		ConcurrentChange change;
+		Snapshot	snapshot;
 		bool		isnull[1];
 		Datum		values[1];
 
@@ -2717,8 +2994,30 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 
 			/*
 			 * Find the tuple to be updated or deleted.
+			 *
+			 * As the table being REPACKed concurrently is treated like a
+			 * catalog, new CID is WAL-logged and decoded. And since we use
+			 * the same XID that the original DMLs did, the snapshot used for
+			 * the logical decoding (by now converted to a non-historic MVCC
+			 * snapshot) should see the tuples inserted previously into the
+			 * new heap and/or updated there.
 			 */
-			tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+			snapshot = change.snapshot;
+
+			/*
+			 * Set what should be considered current transaction (and
+			 * subtransactions) during visibility check.
+			 *
+			 * Note that this snapshot was created from a historic snapshot
+			 * using SnapBuildMVCCFromHistoric(), which does not touch
+			 * 'subxip'. Thus, unlike in a regular MVCC snapshot, the array
+			 * only contains the transactions whose data changes we are
+			 * applying, and their subtransactions. That's exactly what we need
+			 * to check whether a particular xact is a "current transaction".
+			 */
+			SetRepackCurrentXids(snapshot->subxip, snapshot->subxcnt);
+
+			tup_exist = find_target_tuple(rel, key, nkeys, tup_key, snapshot,
 										  iistate, ident_slot, &ind_scan);
 			if (tup_exist == NULL)
 				elog(ERROR, "Failed to find target tuple");
@@ -2729,6 +3028,8 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 			else
 				apply_concurrent_delete(rel, tup_exist, &change);
 
+			ResetRepackCurrentXids();
+
 			if (tup_old != NULL)
 			{
 				pfree(tup_old);
@@ -2741,14 +3042,14 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 		else
 			elog(ERROR, "Unrecognized kind of change: %d", change.kind);
 
-		/*
-		 * If a change was applied now, increment CID for next writes and
-		 * update the snapshot so it sees the changes we've applied so far.
-		 */
-		if (change.kind != CHANGE_UPDATE_OLD)
+		/* Free the snapshot if this is the last change that needed it. */
+		Assert(change.snapshot->active_count > 0);
+		change.snapshot->active_count--;
+		if (change.snapshot->active_count == 0)
 		{
-			CommandCounterIncrement();
-			UpdateActiveSnapshotCommandId();
+			if (change.snapshot == dstate->snapshot)
+				dstate->snapshot = NULL;
+			FreeSnapshot(change.snapshot);
 		}
 
 		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
@@ -2768,16 +3069,35 @@ static void
 apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 						IndexInsertState *iistate, TupleTableSlot *index_slot)
 {
+	Snapshot	snapshot = change->snapshot;
 	List	   *recheck;
 
+	/*
+	 * For INSERT, the visibility information is not important, but we use the
+	 * snapshot to get CID. Index functions might need the whole snapshot
+	 * anyway.
+	 */
+	SetRepackCurrentXids(snapshot->subxip, snapshot->subxcnt);
+
+	/*
+	 * Write the tuple into the new heap.
+	 *
+	 * The snapshot is the one we used to decode the insert (though converted
+	 * to "non-historic" MVCC snapshot), i.e. the snapshot's curcid is the
+	 * tuple CID incremented by one (due to the "new CID" WAL record that got
+	 * written along with the INSERT record). Thus if we want to use the
+	 * original CID, we need to subtract 1 from curcid.
+	 */
+	Assert(snapshot->curcid != InvalidCommandId &&
+		   snapshot->curcid > FirstCommandId);
 
 	/*
 	 * Like simple_heap_insert(), but make sure that the INSERT is not
 	 * logically decoded - see reform_and_rewrite_tuple() for more
 	 * information.
 	 */
-	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
-				NULL);
+	heap_insert(rel, tup, change->xid, snapshot->curcid - 1,
+				HEAP_INSERT_NO_LOGICAL, NULL);
 
 	/*
 	 * Update indexes.
@@ -2785,6 +3105,7 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 	 * In case functions in the index need the active snapshot and caller
 	 * hasn't set one.
 	 */
+	PushActiveSnapshot(snapshot);
 	ExecStoreHeapTuple(tup, index_slot, false);
 	recheck = ExecInsertIndexTuples(iistate->rri,
 									index_slot,
@@ -2795,6 +3116,8 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 									NIL,	/* arbiterIndexes */
 									false	/* onlySummarizing */
 		);
+	PopActiveSnapshot();
+	ResetRepackCurrentXids();
 
 	/*
 	 * If recheck is required, it must have been performed on the source
@@ -2816,6 +3139,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 	TU_UpdateIndexes update_indexes;
 	TM_Result	res;
 	List	   *recheck;
+	Snapshot	snapshot = change->snapshot;
 
 	/*
 	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
@@ -2823,13 +3147,19 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 	 *
 	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
 	 * except for 'wait').
+	 *
+	 * Regarding CID, see the comment in apply_concurrent_insert().
 	 */
+	Assert(snapshot->curcid != InvalidCommandId &&
+		   snapshot->curcid > FirstCommandId);
+
 	res = heap_update(rel, &tup_target->t_self, tup,
-					  GetCurrentCommandId(true),
+					  change->xid, snapshot->curcid - 1,
 					  InvalidSnapshot,
 					  false,	/* no wait - only we are doing changes */
 					  &tmfd, &lockmode, &update_indexes,
-					  false /* wal_logical */ );
+	/* wal_logical */
+					  false);
 	if (res != TM_Ok)
 		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
 
@@ -2837,6 +3167,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 
 	if (update_indexes != TU_None)
 	{
+		PushActiveSnapshot(snapshot);
 		recheck = ExecInsertIndexTuples(iistate->rri,
 										index_slot,
 										iistate->estate,
@@ -2846,6 +3177,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 										NIL,	/* arbiterIndexes */
 		/* onlySummarizing */
 										update_indexes == TU_Summarizing);
+		PopActiveSnapshot();
 		list_free(recheck);
 	}
 
@@ -2858,6 +3190,12 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
 {
 	TM_Result	res;
 	TM_FailureData tmfd;
+	Snapshot	snapshot = change->snapshot;
+
+	/* Regarding CID, see the comment in apply_concurrent_insert(). */
+	Assert(snapshot->curcid != InvalidCommandId &&
+		   snapshot->curcid > FirstCommandId);
 
 	/*
 	 * Delete tuple from the new heap.
@@ -2865,11 +3203,11 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
 	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
 	 * except for 'wait').
 	 */
-	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
-					  InvalidSnapshot, false,
-					  &tmfd,
-					  false,	/* no wait - only we are doing changes */
-					  false /* wal_logical */ );
+	res = heap_delete(rel, &tup_target->t_self, change->xid,
+					  snapshot->curcid - 1, InvalidSnapshot, false,
+					  &tmfd, false,
+	/* wal_logical */
+					  false);
 
 	if (res != TM_Ok)
 		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
@@ -2890,7 +3228,7 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
  */
 static HeapTuple
 find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
-				  IndexInsertState *iistate,
+				  Snapshot snapshot, IndexInsertState *iistate,
 				  TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
 {
 	IndexScanDesc scan;
@@ -2899,7 +3237,7 @@ find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
 	HeapTuple	result = NULL;
 
 	/* XXX no instrumentation for now */
-	scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+	scan = index_beginscan(rel, iistate->ident_index, snapshot,
 						   NULL, nkeys, 0);
 	*scan_p = scan;
 	index_rescan(scan, key, nkeys, NULL, 0);
@@ -2971,6 +3309,8 @@ process_concurrent_changes(LogicalDecodingContext *ctx, XLogRecPtr end_of_wal,
 	}
 	PG_FINALLY();
 	{
+		ResetRepackCurrentXids();
+
 		if (rel_src)
 			rel_dst->rd_toastoid = InvalidOid;
 	}
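
(Aside, as a reading aid: pieced together from the apply_concurrent_*
hunks above, the replay protocol for one decoded INSERT now reads roughly
as the sketch below. This is not patch code; the ExecInsertIndexTuples
arguments that fall outside the hunk context are assumed to match the
current master signature.)

static void
apply_insert_sketch(Relation rel, ConcurrentChange *change, HeapTuple tup,
					IndexInsertState *iistate, TupleTableSlot *index_slot)
{
	Snapshot	snapshot = change->snapshot;
	List	   *recheck;

	/* Make the decoded (sub)transaction XIDs look like our own. */
	SetRepackCurrentXids(snapshot->subxip, snapshot->subxcnt);

	/* Replay under the original XID; the original CID is curcid - 1. */
	heap_insert(rel, tup, change->xid, snapshot->curcid - 1,
				HEAP_INSERT_NO_LOGICAL, NULL);

	/* Index maintenance runs with the decoding snapshot active. */
	PushActiveSnapshot(snapshot);
	ExecStoreHeapTuple(tup, index_slot, false);
	recheck = ExecInsertIndexTuples(iistate->rri, index_slot,
									iistate->estate, false, false,
									NULL, NIL, false);
	list_free(recheck);
	PopActiveSnapshot();

	ResetRepackCurrentXids();
}
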
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5dc4ae58ffe..9fefcffd8b3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -475,9 +475,14 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	/*
 	 * If the change is not intended for logical decoding, do not even
-	 * establish transaction for it - REPACK CONCURRENTLY is the typical use
-	 * case.
-	 *
+	 * establish a transaction for it. This is particularly important if
+	 * the record was generated by REPACK CONCURRENTLY, because this
+	 * command reuses the original XID when writing changes into the new
+	 * storage. The decoding system probably does not expect to see the
+	 * same transaction multiple times.
+	 */
+
+	/*
 	 * First, check if REPACK CONCURRENTLY is being performed by this backend.
 	 * If so, only decode data changes of the table that it is processing, and
 	 * the changes of its TOAST relation.
@@ -504,11 +509,11 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	 * Second, skip records which do not contain sufficient information for
 	 * the decoding.
 	 *
-	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
-	 * when doing changes in the new table. Those changes should not be useful
-	 * for any other user (such as logical replication subscription) because
-	 * the new table will eventually be dropped (after REPACK CONCURRENTLY has
-	 * assigned its file to the "old table").
+	 * One particular problem we solve here is that REPACK CONCURRENTLY
+	 * generates WAL when writing changes to the new table. Those changes
+	 * should not be decoded because reorderbuffer.c considers their XID
+	 * already committed. (REPACK CONCURRENTLY deliberately generates WAL
+	 * records in such a way that they are skipped here.)
 	 */
 	switch (info)
 	{
@@ -995,13 +1000,6 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	xlrec = (xl_heap_insert *) XLogRecGetData(r);
 
-	/*
-	 * Ignore insert records without new tuples (this does happen when
-	 * raw_heap_insert marks the TOAST record as HEAP_INSERT_NO_LOGICAL).
-	 */
-	if (!(xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE))
-		return;
-
 	/* only interested in our database */
 	XLogRecGetBlockTag(r, 0, &target_locator, NULL, NULL);
 	if (target_locator.dbOid != ctx->slot->data.database)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8e5116a9cab..72a38074a7b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -155,7 +155,7 @@ static bool ExportInProgress = false;
 static void SnapBuildPurgeOlderTxn(SnapBuild *builder);
 
 /* snapshot building/manipulation/distribution functions */
-static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder);
+static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder, XLogRecPtr lsn);
 
 static void SnapBuildFreeSnapshot(Snapshot snap);
 
@@ -352,12 +352,17 @@ SnapBuildSnapDecRefcount(Snapshot snap)
  * Build a new snapshot, based on currently committed catalog-modifying
  * transactions.
  *
+ * 'lsn' is the location of the commit record (of a catalog-changing
+ * transaction) that triggered creation of the snapshot. Pass
+ * InvalidXLogRecPtr for the transaction base snapshot, or if the user of
+ * the snapshot should not need the LSN.
+ *
  * In-progress transactions with catalog access are *not* allowed to modify
  * these snapshots; they have to copy them and fill in appropriate ->curcid
  * and ->subxip/subxcnt values.
  */
 static Snapshot
-SnapBuildBuildSnapshot(SnapBuild *builder)
+SnapBuildBuildSnapshot(SnapBuild *builder, XLogRecPtr lsn)
 {
 	Snapshot	snapshot;
 	Size		ssize;
@@ -425,6 +430,7 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
 	snapshot->active_count = 0;
 	snapshot->regd_count = 0;
 	snapshot->snapXactCompletionCount = 0;
+	snapshot->lsn = lsn;
 
 	return snapshot;
 }
@@ -461,7 +467,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	if (TransactionIdIsValid(MyProc->xmin))
 		elog(ERROR, "cannot build an initial slot snapshot when MyProc->xmin already is valid");
 
-	snap = SnapBuildBuildSnapshot(builder);
+	snap = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 
 	/*
 	 * We know that snap->xmin is alive, enforced by the logical xmin
@@ -502,7 +508,7 @@ SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
 
 	Assert(builder->state == SNAPBUILD_CONSISTENT);
 
-	snap = SnapBuildBuildSnapshot(builder);
+	snap = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 	return SnapBuildMVCCFromHistoric(snap, false);
 }
 
@@ -636,7 +642,7 @@ SnapBuildGetOrBuildSnapshot(SnapBuild *builder)
 	/* only build a new snapshot if we don't have a prebuilt one */
 	if (builder->snapshot == NULL)
 	{
-		builder->snapshot = SnapBuildBuildSnapshot(builder);
+		builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 		/* increase refcount for the snapshot builder */
 		SnapBuildSnapIncRefcount(builder->snapshot);
 	}
@@ -716,7 +722,7 @@ SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
 		/* only build a new snapshot if we don't have a prebuilt one */
 		if (builder->snapshot == NULL)
 		{
-			builder->snapshot = SnapBuildBuildSnapshot(builder);
+			builder->snapshot = SnapBuildBuildSnapshot(builder, lsn);
 			/* increase refcount for the snapshot builder */
 			SnapBuildSnapIncRefcount(builder->snapshot);
 		}
@@ -1130,7 +1136,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		if (builder->snapshot)
 			SnapBuildSnapDecRefcount(builder->snapshot);
 
-		builder->snapshot = SnapBuildBuildSnapshot(builder);
+		builder->snapshot = SnapBuildBuildSnapshot(builder, lsn);
 
 		/* we might need to execute invalidations, add snapshot */
 		if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
@@ -1958,7 +1964,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	{
 		SnapBuildSnapDecRefcount(builder->snapshot);
 	}
-	builder->snapshot = SnapBuildBuildSnapshot(builder);
+	builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 	SnapBuildSnapIncRefcount(builder->snapshot);
 
 	ReorderBufferSetRestartPoint(builder->reorder, lsn);
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
index 687fbbc59bb..28bd16f9cc7 100644
--- a/src/backend/replication/pgoutput_repack/pgoutput_repack.c
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -32,7 +32,8 @@ static void plugin_truncate(struct LogicalDecodingContext *ctx,
 							Relation relations[],
 							ReorderBufferChange *change);
 static void store_change(LogicalDecodingContext *ctx,
-						 ConcurrentChangeKind kind, HeapTuple tuple);
+						 ConcurrentChangeKind kind, HeapTuple tuple,
+						 TransactionId xid);
 
 void
 _PG_output_plugin_init(OutputPluginCallbacks *cb)
@@ -100,6 +101,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 			  Relation relation, ReorderBufferChange *change)
 {
 	RepackDecodingState *dstate;
+	Snapshot	snapshot;
 
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
@@ -107,6 +109,48 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	if (relation->rd_id != dstate->relid)
 		return;
 
+	/*
+	 * Catalog snapshot is fine because the table we are processing is
+	 * temporarily considered a user catalog table.
+	 */
+	snapshot = GetCatalogSnapshot(InvalidOid);
+	Assert(snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
+	Assert(!snapshot->suboverflowed);
+
+	/*
+	 * This should not happen, but if we don't have enough information to
+	 * apply a new snapshot, the consequences would be bad. Thus prefer ERROR
+	 * to Assert().
+	 */
+	if (XLogRecPtrIsInvalid(snapshot->lsn))
+		ereport(ERROR, (errmsg("snapshot has invalid LSN")));
+
+	/*
+	 * reorderbuffer.c changes the catalog snapshot as soon as it sees a new
+	 * CID or a commit record of a catalog-changing transaction.
+	 */
+	if (dstate->snapshot == NULL || snapshot->lsn != dstate->snapshot_lsn ||
+		snapshot->curcid != dstate->snapshot->curcid)
+	{
+		/* Within a single transaction, CID should not go backwards. */
+		Assert(dstate->snapshot == NULL ||
+			   snapshot->curcid >= dstate->snapshot->curcid ||
+			   change->txn->xid != dstate->last_change_xid);
+
+		/*
+		 * XXX Is it a problem that the copy is created in
+		 * TopTransactionContext?
+		 *
+		 * XXX Wouldn't it be o.k. for SnapBuildMVCCFromHistoric() to set xcnt
+		 * to 0 instead of converting xip in this case? The point is that
+		 * transactions which are still in progress from the perspective of
+		 * reorderbuffer.c could not be replayed yet, so we do not need to
+		 * examine their XIDs.
+		 */
+		dstate->snapshot = SnapBuildMVCCFromHistoric(snapshot, false);
+		dstate->snapshot_lsn = snapshot->lsn;
+	}
+
 	/* Decode entry depending on its type */
 	switch (change->action)
 	{
@@ -124,7 +168,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (newtuple == NULL)
 					elog(ERROR, "Incomplete insert info.");
 
-				store_change(ctx, CHANGE_INSERT, newtuple);
+				store_change(ctx, CHANGE_INSERT, newtuple, change->txn->xid);
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_UPDATE:
@@ -141,9 +185,11 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 					elog(ERROR, "Incomplete update info.");
 
 				if (oldtuple != NULL)
-					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple,
+								 change->txn->xid);
 
-				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple,
+							 change->txn->xid);
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_DELETE:
@@ -156,7 +202,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (oldtuple == NULL)
 					elog(ERROR, "Incomplete delete info.");
 
-				store_change(ctx, CHANGE_DELETE, oldtuple);
+				store_change(ctx, CHANGE_DELETE, oldtuple, change->txn->xid);
 			}
 			break;
 		default:
@@ -190,13 +236,13 @@ plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	if (i == nrelations)
 		return;
 
-	store_change(ctx, CHANGE_TRUNCATE, NULL);
+	store_change(ctx, CHANGE_TRUNCATE, NULL, InvalidTransactionId);
 }
 
 /* Store concurrent data change. */
 static void
 store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
-			 HeapTuple tuple)
+			 HeapTuple tuple, TransactionId xid)
 {
 	RepackDecodingState *dstate;
 	char	   *change_raw;
@@ -266,6 +312,11 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 	dst = dst_start + SizeOfConcurrentChange;
 	memcpy(dst, tuple->t_data, tuple->t_len);
 
+	/* Initialize the other fields. */
+	change.xid = xid;
+	change.snapshot = dstate->snapshot;
+	dstate->snapshot->active_count++;
+
 	/* The data has been copied. */
 	if (flattened)
 		pfree(tuple);
@@ -279,6 +330,9 @@ store:
 	isnull[0] = false;
 	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
 						 values, isnull);
+#ifdef USE_ASSERT_CHECKING
+	dstate->last_change_xid = xid;
+#endif
 
 	/* Accounting. */
 	dstate->nchanges++;
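
(Another aside: the snapshot's lifetime spans two files, which is easy to
lose track of in hunk form. Condensed, with the decode side here and the
apply side in cluster.c -- again a summary, not patch code:)

	/* decode, plugin_change(): rebuild the cached MVCC snapshot whenever
	 * reorderbuffer.c installed a new historic snapshot, detected via the
	 * new lsn field or an advanced curcid */
	dstate->snapshot = SnapBuildMVCCFromHistoric(snapshot, false);
	dstate->snapshot_lsn = snapshot->lsn;

	/* decode, store_change(): every stored change keeps one reference */
	change.snapshot = dstate->snapshot;
	dstate->snapshot->active_count++;

	/* apply, apply_concurrent_changes(): the last user frees it */
	Assert(change.snapshot->active_count > 0);
	change.snapshot->active_count--;
	if (change.snapshot->active_count == 0)
	{
		if (change.snapshot == dstate->snapshot)
			dstate->snapshot = NULL;
		FreeSnapshot(change.snapshot);
	}
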
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index e9ddf39500c..e24e1795aa9 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -151,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
 	size = add_size(size, InjectionPointShmemSize());
 	size = add_size(size, SlotSyncShmemSize());
 	size = add_size(size, AioShmemSize());
+	size = add_size(size, RepackShmemSize());
 
 	/* include additional requested shmem from preload libraries */
 	size = add_size(size, total_addin_request);
@@ -344,6 +345,7 @@ CreateOrAttachShmemStructs(void)
 	WaitEventCustomShmemInit();
 	InjectionPointShmemInit();
 	AioShmemInit();
+	RepackShmemInit();
 }
 
 /*
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 5427da5bc1b..e94c83726d6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -352,6 +352,7 @@ DSMRegistry	"Waiting to read or update the dynamic shared memory registry."
 InjectionPoint	"Waiting to read or update information related to injection points."
 SerialControl	"Waiting to read or update shared <filename>pg_serial</filename> state."
 AioWorkerSubmissionQueue	"Waiting to access AIO worker submission queue."
+RepackedRels	"Waiting to access the hash table of repacked relations."
 
 #
 # END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 02505c88b8e..ecaa2283c2a 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -1643,6 +1643,27 @@ CacheInvalidateRelcache(Relation relation)
 								 databaseId, relationId);
 }
 
+/*
+ * CacheInvalidateRelcacheImmediate
+ *		Send invalidation message for the specified relation's relcache entry.
+ *
+ * Currently this is used in REPACK CONCURRENTLY, to make sure that other
+ * backends are aware that the command is being executed for the relation.
+ */
+void
+CacheInvalidateRelcacheImmediate(Oid relid)
+{
+	SharedInvalidationMessage msg;
+
+	msg.rc.id = SHAREDINVALRELCACHE_ID;
+	msg.rc.dbId = MyDatabaseId;
+	msg.rc.relId = relid;
+	/* check AddCatcacheInvalidationMessage() for an explanation */
+	VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
+
+	SendSharedInvalidMessages(&msg, 1);
+}
+
 /*
  * CacheInvalidateRelcacheAll
  *		Register invalidation of the whole relcache at the end of command.
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index d27a4c30548..ea565b5b053 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1279,6 +1279,10 @@ retry:
 	/* make sure relation is marked as having no open file yet */
 	relation->rd_smgr = NULL;
 
+	/* Is REPACK CONCURRENTLY in progress? */
+	relation->rd_repack_concurrent =
+		is_concurrent_repack_in_progress(targetRelId);
+
 	/*
 	 * now we can free the memory allocated for pg_class_tuple
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b82dd17a966..981425f23b6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -316,22 +316,24 @@ extern BulkInsertState GetBulkInsertState(void);
 extern void FreeBulkInsertState(BulkInsertState);
 extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
-extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
-						int options, BulkInsertState bistate);
+extern void heap_insert(Relation relation, HeapTuple tup, TransactionId xid,
+						CommandId cid, int options, BulkInsertState bistate);
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
-							 CommandId cid, Snapshot crosscheck, bool wait,
+							 TransactionId xid, CommandId cid,
+							 Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart,
 							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, ItemPointer tid);
 extern void heap_abort_speculative(Relation relation, ItemPointer tid);
 extern TM_Result heap_update(Relation relation, ItemPointer otid,
-							 HeapTuple newtup,
+							 HeapTuple newtup, TransactionId xid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes, bool wal_logical);
+							 TU_UpdateIndexes *update_indexes,
+							 bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index b2bc10ee041..fbb66d559b6 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -482,6 +482,8 @@ extern Size EstimateTransactionStateSpace(void);
 extern void SerializeTransactionState(Size maxsize, char *start_address);
 extern void StartParallelWorkerTransaction(char *tstatespace);
 extern void EndParallelWorkerTransaction(void);
+extern void SetRepackCurrentXids(TransactionId *xip, int xcnt);
+extern void ResetRepackCurrentXids(void);
 extern bool IsTransactionBlock(void);
 extern bool IsTransactionOrTransactionBlock(void);
 extern char TransactionBlockStatusCode(void);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 4a508c57a50..5dba3d427f5 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -61,6 +61,14 @@ typedef struct ConcurrentChange
 	/* See the enum above. */
 	ConcurrentChangeKind kind;
 
+	/* Transaction that changes the data. */
+	TransactionId xid;
+
+	/*
+	 * Historic catalog snapshot that was used to decode this change.
+	 */
+	Snapshot	snapshot;
+
 	/*
 	 * The actual tuple.
 	 *
@@ -92,6 +100,8 @@ typedef struct RepackDecodingState
 	 * tuplestore does this transparently.
 	 */
 	Tuplestorestate *tstore;
+	/* XID of the last change added to tstore. */
+	TransactionId last_change_xid PG_USED_FOR_ASSERTS_ONLY;
 
 	/* The current number of changes in tstore. */
 	double		nchanges;
@@ -112,6 +122,14 @@ typedef struct RepackDecodingState
 	/* Slot to retrieve data from tstore. */
 	TupleTableSlot *tsslot;
 
+	/*
+	 * Historic catalog snapshot that was used to decode the most recent
+	 * change.
+	 */
+	Snapshot	snapshot;
+	/* LSN of the record for which the snapshot above was built. */
+	XLogRecPtr	snapshot_lsn;
+
 	ResourceOwner resowner;
 } RepackDecodingState;
 
@@ -141,4 +159,8 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
 
+extern Size RepackShmemSize(void);
+extern void RepackShmemInit(void);
+extern bool is_concurrent_repack_in_progress(Oid relid);
+
 #endif							/* CLUSTER_H */
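
(The implementation behind these three declarations is not visible in the
hunks quoted here, so the following is only my expectation of its shape:
the new RepackedRels lwlock and the RepackedRel typedef added to
typedefs.list suggest the usual fixed-size ShmemInitHash pattern. Entry
layout, sizing, and the MAX_REPACKED_RELS cap are guesses.)

#define MAX_REPACKED_RELS	32	/* hypothetical cap */

typedef struct RepackedRel
{
	Oid			relid;
	Oid			dbid;
} RepackedRel;

static HTAB *RepackedRelsHash;

Size
RepackShmemSize(void)
{
	return hash_estimate_size(MAX_REPACKED_RELS, sizeof(RepackedRel));
}

void
RepackShmemInit(void)
{
	HASHCTL		info;

	info.keysize = sizeof(RepackedRel);
	info.entrysize = sizeof(RepackedRel);

	RepackedRelsHash = ShmemInitHash("Repacked relations",
									 MAX_REPACKED_RELS, MAX_REPACKED_RELS,
									 &info, HASH_ELEM | HASH_BLOBS);
}

bool
is_concurrent_repack_in_progress(Oid relid)
{
	RepackedRel key;
	bool		found;

	memset(&key, 0, sizeof(key));
	key.relid = relid;
	key.dbid = MyDatabaseId;

	LWLockAcquire(RepackedRelsLock, LW_SHARED);
	(void) hash_search(RepackedRelsHash, &key, HASH_FIND, &found);
	LWLockRelease(RepackedRelsLock);

	return found;
}
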
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 06a1ffd4b08..9a9880b3073 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -85,6 +85,7 @@ PG_LWLOCK(50, DSMRegistry)
 PG_LWLOCK(51, InjectionPoint)
 PG_LWLOCK(52, SerialControl)
 PG_LWLOCK(53, AioWorkerSubmissionQueue)
+PG_LWLOCK(54, RepackedRels)
 
 /*
  * There also exist several built-in LWLock tranches.  As with the predefined
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index 9b871caef62..ae9dee394dc 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -50,6 +50,8 @@ extern void CacheInvalidateCatalog(Oid catalogId);
 
 extern void CacheInvalidateRelcache(Relation relation);
 
+extern void CacheInvalidateRelcacheImmediate(Oid relid);
+
 extern void CacheInvalidateRelcacheAll(void);
 
 extern void CacheInvalidateRelcacheByTuple(HeapTuple classTuple);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index b552359915f..66de3bc0c29 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -253,6 +253,9 @@ typedef struct RelationData
 	bool		pgstat_enabled; /* should relation stats be counted */
 	/* use "struct" here to avoid needing to include pgstat.h: */
 	struct PgStat_TableStatus *pgstat_info; /* statistics collection area */
+
+	/* Is REPACK CONCURRENTLY being performed on this relation? */
+	bool		rd_repack_concurrent;
 } RelationData;
 
 
@@ -695,7 +698,9 @@ RelationCloseSmgr(Relation relation)
 #define RelationIsAccessibleInLogicalDecoding(relation) \
 	(XLogLogicalInfoActive() && \
 	 RelationNeedsWAL(relation) && \
-	 (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))
+	 (IsCatalogRelation(relation) || \
+	  RelationIsUsedAsCatalogTable(relation) || \
+	  (relation)->rd_repack_concurrent))
 
 /*
  * RelationIsLogicallyLogged
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 0e546ec1497..014f27db7d7 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -13,6 +13,7 @@
 #ifndef SNAPSHOT_H
 #define SNAPSHOT_H
 
+#include "access/xlogdefs.h"
 #include "lib/pairingheap.h"
 
 
@@ -201,6 +202,8 @@ typedef struct SnapshotData
 	uint32		regd_count;		/* refcount on RegisteredSnapshots */
 	pairingheap_node ph_node;	/* link in the RegisteredSnapshots heap */
 
+	XLogRecPtr	lsn;			/* position in the WAL stream when taken */
+
 	/*
 	 * The transaction completion count at the time GetSnapshotData() built
 	 * this snapshot. Allows to avoid re-computing static snapshots when no
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
index 75850334986..3711a7c92b9 100644
--- a/src/test/modules/injection_points/specs/repack.spec
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -86,9 +86,6 @@ step change_new
 # When applying concurrent data changes, we should see the effects of an
 # in-progress subtransaction.
 #
-# XXX Not sure this test is useful now - it was designed for the patch that
-# preserves tuple visibility and which therefore modifies
-# TransactionIdIsCurrentTransactionId().
 step change_subxact1
 {
 	BEGIN;
@@ -103,7 +100,6 @@ step change_subxact1
 # When applying concurrent data changes, we should not see the effects of a
 # rolled back subtransaction.
 #
-# XXX Is this test useful? See above.
 step change_subxact2
 {
 	BEGIN;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b64ab8dfab4..9f5f331cad6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2540,6 +2540,7 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackedRel
 RepackCommand
 RepackDecodingState
 RepackStmt
-- 
2.39.5

#23Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#22)
6 attachment(s)
Re: Adding REPACK [concurrently]

Apparently I mismerged src/bin/scripts/meson.build. This v20 is
identical to v19, except that the mismerge has been corrected.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
In the beginning there was UNIX, and UNIX spoke and said: "Hello world\n".
It did not say "Hello New Jersey\n", nor "Hello USA\n".

Attachments:

v20-0001-Split-vacuumdb-to-create-vacuuming.c-h.patch (text/x-diff; charset=utf-8)
From 9b7a81619278991f48b91d2f236aede2261493b1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 30 Aug 2025 14:39:49 +0200
Subject: [PATCH v20 1/6] Split vacuumdb to create vacuuming.c/h

---
 src/bin/scripts/Makefile    |    4 +-
 src/bin/scripts/meson.build |   28 +-
 src/bin/scripts/vacuumdb.c  | 1048 +----------------------------------
 src/bin/scripts/vacuuming.c |  978 ++++++++++++++++++++++++++++++++
 src/bin/scripts/vacuuming.h |   95 ++++
 5 files changed, 1119 insertions(+), 1034 deletions(-)
 create mode 100644 src/bin/scripts/vacuuming.c
 create mode 100644 src/bin/scripts/vacuuming.h

diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index f6b4d40810b..019ca06455d 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -28,7 +28,7 @@ createuser: createuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport
 dropdb: dropdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 dropuser: dropuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-vacuumdb: vacuumdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
@@ -50,7 +50,7 @@ uninstall:
 
 clean distclean:
 	rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
-	rm -f common.o $(WIN32RES)
+	rm -f common.o vacuuming.o $(WIN32RES)
 	rm -rf tmp_check
 
 export with_icu
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index 80df7c33257..a4fed59d1c9 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -12,7 +12,6 @@ binaries = [
   'createuser',
   'dropuser',
   'clusterdb',
-  'vacuumdb',
   'reindexdb',
   'pg_isready',
 ]
@@ -35,6 +34,33 @@ foreach binary : binaries
   bin_targets += binary
 endforeach
 
+vacuuming_common = static_library('libvacuuming_common',
+  files('common.c', 'vacuuming.c'),
+  dependencies: [frontend_code, libpq],
+  kwargs: internal_lib_args,
+)
+
+binaries = [
+  'vacuumdb',
+]
+foreach binary : binaries
+  binary_sources = files('@0@.c'.format(binary))
+
+  if host_system == 'windows'
+    binary_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
+      '--NAME', binary,
+      '--FILEDESC', '@0@ - PostgreSQL utility'.format(binary),])
+  endif
+
+  binary = executable(binary,
+    binary_sources,
+    link_with: [vacuuming_common],
+    dependencies: [frontend_code, libpq],
+    kwargs: default_bin_args,
+  )
+  bin_targets += binary
+endforeach
+
 tests += {
   'name': 'scripts',
   'sd': meson.current_source_dir(),
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index fd236087e90..b1be61ddf25 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -14,92 +14,13 @@
 
 #include <limits.h>
 
-#include "catalog/pg_attribute_d.h"
-#include "catalog/pg_class_d.h"
 #include "common.h"
-#include "common/connect.h"
 #include "common/logging.h"
-#include "fe_utils/cancel.h"
 #include "fe_utils/option_utils.h"
-#include "fe_utils/parallel_slot.h"
-#include "fe_utils/query_utils.h"
-#include "fe_utils/simple_list.h"
-#include "fe_utils/string_utils.h"
-
-
-/* vacuum options controlled by user flags */
-typedef struct vacuumingOptions
-{
-	bool		analyze_only;
-	bool		verbose;
-	bool		and_analyze;
-	bool		full;
-	bool		freeze;
-	bool		disable_page_skipping;
-	bool		skip_locked;
-	int			min_xid_age;
-	int			min_mxid_age;
-	int			parallel_workers;	/* >= 0 indicates user specified the
-									 * parallel degree, otherwise -1 */
-	bool		no_index_cleanup;
-	bool		force_index_cleanup;
-	bool		do_truncate;
-	bool		process_main;
-	bool		process_toast;
-	bool		skip_database_stats;
-	char	   *buffer_usage_limit;
-	bool		missing_stats_only;
-} vacuumingOptions;
-
-/* object filter options */
-typedef enum
-{
-	OBJFILTER_NONE = 0,			/* no filter used */
-	OBJFILTER_ALL_DBS = (1 << 0),	/* -a | --all */
-	OBJFILTER_DATABASE = (1 << 1),	/* -d | --dbname */
-	OBJFILTER_TABLE = (1 << 2), /* -t | --table */
-	OBJFILTER_SCHEMA = (1 << 3),	/* -n | --schema */
-	OBJFILTER_SCHEMA_EXCLUDE = (1 << 4),	/* -N | --exclude-schema */
-} VacObjFilter;
-
-static VacObjFilter objfilter = OBJFILTER_NONE;
-
-static SimpleStringList *retrieve_objects(PGconn *conn,
-										  vacuumingOptions *vacopts,
-										  SimpleStringList *objects,
-										  bool echo);
-
-static void vacuum_one_database(ConnParams *cparams,
-								vacuumingOptions *vacopts,
-								int stage,
-								SimpleStringList *objects,
-								SimpleStringList **found_objs,
-								int concurrentCons,
-								const char *progname, bool echo, bool quiet);
-
-static void vacuum_all_databases(ConnParams *cparams,
-								 vacuumingOptions *vacopts,
-								 bool analyze_in_stages,
-								 SimpleStringList *objects,
-								 int concurrentCons,
-								 const char *progname, bool echo, bool quiet);
-
-static void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
-								   vacuumingOptions *vacopts, const char *table);
-
-static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+#include "vacuuming.h"
 
 static void help(const char *progname);
-
-void		check_objfilter(void);
-
-static char *escape_quotes(const char *src);
-
-/* For analyze-in-stages mode */
-#define ANALYZE_NO_STAGE	-1
-#define ANALYZE_NUM_STAGES	3
-
+static void check_objfilter(void);
 
 int
 main(int argc, char *argv[])
@@ -145,10 +66,6 @@ main(int argc, char *argv[])
 	int			c;
 	const char *dbname = NULL;
 	const char *maintenance_db = NULL;
-	char	   *host = NULL;
-	char	   *port = NULL;
-	char	   *username = NULL;
-	enum trivalue prompt_password = TRI_DEFAULT;
 	ConnParams	cparams;
 	bool		echo = false;
 	bool		quiet = false;
@@ -168,13 +85,18 @@ main(int argc, char *argv[])
 	vacopts.process_main = true;
 	vacopts.process_toast = true;
 
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
 	pg_logging_init(argv[0]);
 	progname = get_progname(argv[0]);
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
 
-	handle_help_version_opts(argc, argv, "vacuumdb", help);
+	handle_help_version_opts(argc, argv, progname, help);
 
-	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ",
+							long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -195,7 +117,7 @@ main(int argc, char *argv[])
 				vacopts.freeze = true;
 				break;
 			case 'h':
-				host = pg_strdup(optarg);
+				cparams.pghost = pg_strdup(optarg);
 				break;
 			case 'j':
 				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
@@ -211,7 +133,7 @@ main(int argc, char *argv[])
 				simple_string_list_append(&objects, optarg);
 				break;
 			case 'p':
-				port = pg_strdup(optarg);
+				cparams.pgport = pg_strdup(optarg);
 				break;
 			case 'P':
 				if (!option_parse_int(optarg, "-P/--parallel", 0, INT_MAX,
@@ -227,16 +149,16 @@ main(int argc, char *argv[])
 				tbl_count++;
 				break;
 			case 'U':
-				username = pg_strdup(optarg);
+				cparams.pguser = pg_strdup(optarg);
 				break;
 			case 'v':
 				vacopts.verbose = true;
 				break;
 			case 'w':
-				prompt_password = TRI_NO;
+				cparams.prompt_password = TRI_NO;
 				break;
 			case 'W':
-				prompt_password = TRI_YES;
+				cparams.prompt_password = TRI_YES;
 				break;
 			case 'z':
 				vacopts.and_analyze = true;
@@ -380,66 +302,9 @@ main(int argc, char *argv[])
 		pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
 				 "missing-stats-only", "analyze-only", "analyze-in-stages");
 
-	/* fill cparams except for dbname, which is set below */
-	cparams.pghost = host;
-	cparams.pgport = port;
-	cparams.pguser = username;
-	cparams.prompt_password = prompt_password;
-	cparams.override_dbname = NULL;
-
-	setup_cancel_handler(NULL);
-
-	/* Avoid opening extra connections. */
-	if (tbl_count && (concurrentCons > tbl_count))
-		concurrentCons = tbl_count;
-
-	if (objfilter & OBJFILTER_ALL_DBS)
-	{
-		cparams.dbname = maintenance_db;
-
-		vacuum_all_databases(&cparams, &vacopts,
-							 analyze_in_stages,
-							 &objects,
-							 concurrentCons,
-							 progname, echo, quiet);
-	}
-	else
-	{
-		if (dbname == NULL)
-		{
-			if (getenv("PGDATABASE"))
-				dbname = getenv("PGDATABASE");
-			else if (getenv("PGUSER"))
-				dbname = getenv("PGUSER");
-			else
-				dbname = get_user_name_or_exit(progname);
-		}
-
-		cparams.dbname = dbname;
-
-		if (analyze_in_stages)
-		{
-			int			stage;
-			SimpleStringList *found_objs = NULL;
-
-			for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
-			{
-				vacuum_one_database(&cparams, &vacopts,
-									stage,
-									&objects,
-									vacopts.missing_stats_only ? &found_objs : NULL,
-									concurrentCons,
-									progname, echo, quiet);
-			}
-		}
-		else
-			vacuum_one_database(&cparams, &vacopts,
-								ANALYZE_NO_STAGE,
-								&objects, NULL,
-								concurrentCons,
-								progname, echo, quiet);
-	}
-
+	vacuuming_main(&cparams, dbname, maintenance_db, &vacopts, &objects,
+				   analyze_in_stages, tbl_count, concurrentCons,
+				   progname, echo, quiet);
 	exit(0);
 }
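
(For reviewers who don't want to open the new files yet: the call above
implies an entry point in vacuuming.h roughly like the following --
inferred from the call site only, so the parameter names and the void
return type are guesses:)

extern void vacuuming_main(ConnParams *cparams,
						   const char *dbname,
						   const char *maintenance_db,
						   vacuumingOptions *vacopts,
						   SimpleStringList *objects,
						   bool analyze_in_stages,
						   int tbl_count,
						   int concurrentCons,
						   const char *progname,
						   bool echo, bool quiet);
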
 
@@ -466,885 +331,6 @@ check_objfilter(void)
 		pg_fatal("cannot vacuum all tables in schema(s) and exclude schema(s) at the same time");
 }
 
-/*
- * Returns a newly malloc'd version of 'src' with escaped single quotes and
- * backslashes.
- */
-static char *
-escape_quotes(const char *src)
-{
-	char	   *result = escape_single_quotes_ascii(src);
-
-	if (!result)
-		pg_fatal("out of memory");
-	return result;
-}
-
-/*
- * vacuum_one_database
- *
- * Process tables in the given database.
- *
- * There are two ways to specify the list of objects to process:
- *
- * 1) The "found_objs" parameter is a double pointer to a fully qualified list
- *    of objects to process, as returned by a previous call to
- *    vacuum_one_database().
- *
- *     a) If both "found_objs" (the double pointer) and "*found_objs" (the
- *        once-dereferenced double pointer) are not NULL, this list takes
- *        priority, and anything specified in "objects" is ignored.
- *
- *     b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
- *        (the once-dereferenced double pointer) _is_ NULL, the "objects"
- *        parameter takes priority, and the results of the catalog query
- *        described in (2) are stored in "found_objs".
- *
- *     c) If "found_objs" (the double pointer) is NULL, the "objects"
- *        parameter again takes priority, and the results of the catalog query
- *        are not saved.
- *
- * 2) The "objects" parameter is a user-specified list of objects to process.
- *    When (1b) or (1c) applies, this function performs a catalog query to
- *    retrieve a fully qualified list of objects to process, as described
- *    below.
- *
- *     a) If "objects" is not NULL, the catalog query gathers only the objects
- *        listed in "objects".
- *
- *     b) If "objects" is NULL, all tables in the database are gathered.
- *
- * Note that this function is only concerned with running exactly one stage
- * when in analyze-in-stages mode; caller must iterate on us if necessary.
- *
- * If concurrentCons is > 1, multiple connections are used to vacuum tables
- * in parallel.
- */
-static void
-vacuum_one_database(ConnParams *cparams,
-					vacuumingOptions *vacopts,
-					int stage,
-					SimpleStringList *objects,
-					SimpleStringList **found_objs,
-					int concurrentCons,
-					const char *progname, bool echo, bool quiet)
-{
-	PQExpBufferData sql;
-	PGconn	   *conn;
-	SimpleStringListCell *cell;
-	ParallelSlotArray *sa;
-	int			ntups = 0;
-	bool		failed = false;
-	const char *initcmd;
-	SimpleStringList *ret = NULL;
-	const char *stage_commands[] = {
-		"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
-		"SET default_statistics_target=10; RESET vacuum_cost_delay;",
-		"RESET default_statistics_target;"
-	};
-	const char *stage_messages[] = {
-		gettext_noop("Generating minimal optimizer statistics (1 target)"),
-		gettext_noop("Generating medium optimizer statistics (10 targets)"),
-		gettext_noop("Generating default (full) optimizer statistics")
-	};
-
-	Assert(stage == ANALYZE_NO_STAGE ||
-		   (stage >= 0 && stage < ANALYZE_NUM_STAGES));
-
-	conn = connectDatabase(cparams, progname, echo, false, true);
-
-	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "disable-page-skipping", "9.6");
-	}
-
-	if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-index-cleanup", "12");
-	}
-
-	if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "force-index-cleanup", "12");
-	}
-
-	if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-truncate", "12");
-	}
-
-	if (!vacopts->process_main && PQserverVersion(conn) < 160000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-process-main", "16");
-	}
-
-	if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-process-toast", "14");
-	}
-
-	if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "skip-locked", "12");
-	}
-
-	if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--min-xid-age", "9.6");
-	}
-
-	if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--min-mxid-age", "9.6");
-	}
-
-	if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--parallel", "13");
-	}
-
-	if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--buffer-usage-limit", "16");
-	}
-
-	if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--missing-stats-only", "15");
-	}
-
-	/* skip_database_stats is used automatically if server supports it */
-	vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
-
-	if (!quiet)
-	{
-		if (stage != ANALYZE_NO_STAGE)
-			printf(_("%s: processing database \"%s\": %s\n"),
-				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
-			printf(_("%s: vacuuming database \"%s\"\n"),
-				   progname, PQdb(conn));
-		fflush(stdout);
-	}
-
-	/*
-	 * If the caller provided the results of a previous catalog query, just
-	 * use that.  Otherwise, run the catalog query ourselves and set the
-	 * return variable if provided.
-	 */
-	if (found_objs && *found_objs)
-		ret = *found_objs;
-	else
-	{
-		ret = retrieve_objects(conn, vacopts, objects, echo);
-		if (found_objs)
-			*found_objs = ret;
-	}
-
-	/*
-	 * Count the number of objects in the catalog query result.  If there are
-	 * none, we are done.
-	 */
-	for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
-		ntups++;
-
-	if (ntups == 0)
-	{
-		PQfinish(conn);
-		return;
-	}
-
-	/*
-	 * Ensure concurrentCons is sane.  If there are more connections than
-	 * vacuumable relations, we don't need to use them all.
-	 */
-	if (concurrentCons > ntups)
-		concurrentCons = ntups;
-	if (concurrentCons <= 0)
-		concurrentCons = 1;
-
-	/*
-	 * All slots need to be prepared to run the appropriate analyze stage, if
-	 * caller requested that mode.  We have to prepare the initial connection
-	 * ourselves before setting up the slots.
-	 */
-	if (stage == ANALYZE_NO_STAGE)
-		initcmd = NULL;
-	else
-	{
-		initcmd = stage_commands[stage];
-		executeCommand(conn, initcmd, echo);
-	}
-
-	/*
-	 * Setup the database connections. We reuse the connection we already have
-	 * for the first slot.  If not in parallel mode, the first slot in the
-	 * array contains the connection.
-	 */
-	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
-	ParallelSlotsAdoptConn(sa, conn);
-
-	initPQExpBuffer(&sql);
-
-	cell = ret->head;
-	do
-	{
-		const char *tabname = cell->val;
-		ParallelSlot *free_slot;
-
-		if (CancelRequested)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		free_slot = ParallelSlotsGetIdle(sa, NULL);
-		if (!free_slot)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
-							   vacopts, tabname);
-
-		/*
-		 * Execute the vacuum.  All errors are handled in processQueryResult
-		 * through ParallelSlotsGetIdle.
-		 */
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
-						   echo, tabname);
-
-		cell = cell->next;
-	} while (cell != NULL);
-
-	if (!ParallelSlotsWaitCompletion(sa))
-	{
-		failed = true;
-		goto finish;
-	}
-
-	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
-	if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
-	{
-		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
-		ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
-		if (!free_slot)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
-		if (!ParallelSlotsWaitCompletion(sa))
-			failed = true;
-	}
-
-finish:
-	ParallelSlotsTerminate(sa);
-	pg_free(sa);
-
-	termPQExpBuffer(&sql);
-
-	if (failed)
-		exit(1);
-}
-
-/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path (which
- * could be unsafe), everything in this query MUST be fully qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their optional
- * set of columns.  This is used to match any provided column lists with the
- * generated qualified identifiers and to filter for the tables provided via
- * --table.  If a listed table does not exist, the catalog query will fail.
- */
-static SimpleStringList *
-retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
-				 SimpleStringList *objects, bool echo)
-{
-	PQExpBufferData buf;
-	PQExpBufferData catalog_query;
-	PGresult   *res;
-	SimpleStringListCell *cell;
-	SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
-	bool		objects_listed = false;
-
-	initPQExpBuffer(&catalog_query);
-	for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
-	{
-		char	   *just_table = NULL;
-		const char *just_columns = NULL;
-
-		if (!objects_listed)
-		{
-			appendPQExpBufferStr(&catalog_query,
-								 "WITH listed_objects (object_oid, column_list) "
-								 "AS (\n  VALUES (");
-			objects_listed = true;
-		}
-		else
-			appendPQExpBufferStr(&catalog_query, ",\n  (");
-
-		if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
-		{
-			appendStringLiteralConn(&catalog_query, cell->val, conn);
-			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
-		}
-
-		if (objfilter & OBJFILTER_TABLE)
-		{
-			/*
-			 * Split relation and column names given by the user, this is used
-			 * to feed the CTE with values on which are performed pre-run
-			 * validity checks as well.  For now these happen only on the
-			 * relation name.
-			 */
-			splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
-								  &just_table, &just_columns);
-
-			appendStringLiteralConn(&catalog_query, just_table, conn);
-			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
-		}
-
-		if (just_columns && just_columns[0] != '\0')
-			appendStringLiteralConn(&catalog_query, just_columns, conn);
-		else
-			appendPQExpBufferStr(&catalog_query, "NULL");
-
-		appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
-
-		pg_free(just_table);
-	}
-
-	/* Finish formatting the CTE */
-	if (objects_listed)
-		appendPQExpBufferStr(&catalog_query, "\n)\n");
-
-	appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
-
-	if (objects_listed)
-		appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
-
-	appendPQExpBufferStr(&catalog_query,
-						 " FROM pg_catalog.pg_class c\n"
-						 " JOIN pg_catalog.pg_namespace ns"
-						 " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
-						 " CROSS JOIN LATERAL (SELECT c.relkind IN ("
-						 CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
-						 CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
-						 " LEFT JOIN pg_catalog.pg_class t"
-						 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
-
-	/*
-	 * Used to match the tables or schemas listed by the user, completing the
-	 * JOIN clause.
-	 */
-	if (objects_listed)
-	{
-		appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
-							 " ON listed_objects.object_oid"
-							 " OPERATOR(pg_catalog.=) ");
-
-		if (objfilter & OBJFILTER_TABLE)
-			appendPQExpBufferStr(&catalog_query, "c.oid\n");
-		else
-			appendPQExpBufferStr(&catalog_query, "ns.oid\n");
-	}
-
-	/*
-	 * Exclude temporary tables, beginning the WHERE clause.
-	 */
-	appendPQExpBufferStr(&catalog_query,
-						 " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
-						 CppAsString2(RELPERSISTENCE_TEMP) "\n");
-
-	/*
-	 * Used to match the tables or schemas listed by the user, for the WHERE
-	 * clause.
-	 */
-	if (objects_listed)
-	{
-		if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
-			appendPQExpBufferStr(&catalog_query,
-								 " AND listed_objects.object_oid IS NULL\n");
-		else
-			appendPQExpBufferStr(&catalog_query,
-								 " AND listed_objects.object_oid IS NOT NULL\n");
-	}
-
-	/*
-	 * If no tables were listed, filter for the relevant relation types.  If
-	 * tables were given via --table, don't bother filtering by relation type.
-	 * Instead, let the server decide whether a given relation can be
-	 * processed in which case the user will know about it.
-	 */
-	if ((objfilter & OBJFILTER_TABLE) == 0)
-	{
-		/*
-		 * vacuumdb should generally follow the behavior of the underlying
-		 * VACUUM and ANALYZE commands. If analyze_only is true, process
-		 * regular tables, materialized views, and partitioned tables, just
-		 * like ANALYZE (with no specific target tables) does. Otherwise,
-		 * process only regular tables and materialized views, since VACUUM
-		 * skips partitioned tables when no target tables are specified.
-		 */
-		if (vacopts->analyze_only)
-			appendPQExpBufferStr(&catalog_query,
-								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
-								 CppAsString2(RELKIND_RELATION) ", "
-								 CppAsString2(RELKIND_MATVIEW) ", "
-								 CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
-		else
-			appendPQExpBufferStr(&catalog_query,
-								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
-								 CppAsString2(RELKIND_RELATION) ", "
-								 CppAsString2(RELKIND_MATVIEW) "])\n");
-
-	}
-
-	/*
-	 * For --min-xid-age and --min-mxid-age, the age of the relation is the
-	 * greatest of the ages of the main relation and its associated TOAST
-	 * table.  The commands generated by vacuumdb will also process the TOAST
-	 * table for the relation if necessary, so it does not need to be
-	 * considered separately.
-	 */
-	if (vacopts->min_xid_age != 0)
-	{
-		appendPQExpBuffer(&catalog_query,
-						  " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
-						  " pg_catalog.age(t.relfrozenxid)) "
-						  " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
-						  " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
-						  " '0'::pg_catalog.xid\n",
-						  vacopts->min_xid_age);
-	}
-
-	if (vacopts->min_mxid_age != 0)
-	{
-		appendPQExpBuffer(&catalog_query,
-						  " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
-						  " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
-						  " '%d'::pg_catalog.int4\n"
-						  " AND c.relminmxid OPERATOR(pg_catalog.!=)"
-						  " '0'::pg_catalog.xid\n",
-						  vacopts->min_mxid_age);
-	}
-
-	if (vacopts->missing_stats_only)
-	{
-		appendPQExpBufferStr(&catalog_query, " AND (\n");
-
-		/* regular stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
-							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* extended stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
-							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
-							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
-							 " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* expression indexes */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " JOIN pg_catalog.pg_index i"
-							 " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
-							 " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* inheritance and regular stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
-							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
-							 " AND c.relhassubclass\n"
-							 " AND NOT p.inherited\n"
-							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
-							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit))\n");
-
-		/* inheritance and extended stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
-							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND c.relhassubclass\n"
-							 " AND NOT p.inherited\n"
-							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
-							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
-							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
-							 " AND d.stxdinherit))\n");
-
-		appendPQExpBufferStr(&catalog_query, " )\n");
-	}
-
-	/*
-	 * Execute the catalog query.  We use the default search_path for this
-	 * query for consistency with table lookups done elsewhere by the user.
-	 */
-	appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
-	executeCommand(conn, "RESET search_path;", echo);
-	res = executeQuery(conn, catalog_query.data, echo);
-	termPQExpBuffer(&catalog_query);
-	PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
-
-	/*
-	 * Build qualified identifiers for each table, including the column list
-	 * if given.
-	 */
-	initPQExpBuffer(&buf);
-	for (int i = 0; i < PQntuples(res); i++)
-	{
-		appendPQExpBufferStr(&buf,
-							 fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
-											   PQgetvalue(res, i, 0),
-											   PQclientEncoding(conn)));
-
-		if (objects_listed && !PQgetisnull(res, i, 2))
-			appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
-
-		simple_string_list_append(found_objs, buf.data);
-		resetPQExpBuffer(&buf);
-	}
-	termPQExpBuffer(&buf);
-	PQclear(res);
-
-	return found_objs;
-}
-
-/*
- * Vacuum/analyze all connectable databases.
- *
- * In analyze-in-stages mode, we process all databases in one stage before
- * moving on to the next stage.  That ensure minimal stats are available
- * quickly everywhere before generating more detailed ones.
- */
-static void
-vacuum_all_databases(ConnParams *cparams,
-					 vacuumingOptions *vacopts,
-					 bool analyze_in_stages,
-					 SimpleStringList *objects,
-					 int concurrentCons,
-					 const char *progname, bool echo, bool quiet)
-{
-	PGconn	   *conn;
-	PGresult   *result;
-	int			stage;
-	int			i;
-
-	conn = connectMaintenanceDatabase(cparams, progname, echo);
-	result = executeQuery(conn,
-						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
-						  echo);
-	PQfinish(conn);
-
-	if (analyze_in_stages)
-	{
-		SimpleStringList **found_objs = NULL;
-
-		if (vacopts->missing_stats_only)
-			found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
-
-		/*
-		 * When analyzing all databases in stages, we analyze them all in the
-		 * fastest stage first, so that initial statistics become available
-		 * for all of them as soon as possible.
-		 *
-		 * This means we establish several times as many connections, but
-		 * that's a secondary consideration.
-		 */
-		for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
-		{
-			for (i = 0; i < PQntuples(result); i++)
-			{
-				cparams->override_dbname = PQgetvalue(result, i, 0);
-
-				vacuum_one_database(cparams, vacopts,
-									stage,
-									objects,
-									vacopts->missing_stats_only ? &found_objs[i] : NULL,
-									concurrentCons,
-									progname, echo, quiet);
-			}
-		}
-	}
-	else
-	{
-		for (i = 0; i < PQntuples(result); i++)
-		{
-			cparams->override_dbname = PQgetvalue(result, i, 0);
-
-			vacuum_one_database(cparams, vacopts,
-								ANALYZE_NO_STAGE,
-								objects, NULL,
-								concurrentCons,
-								progname, echo, quiet);
-		}
-	}
-
-	PQclear(result);
-}
-
-/*
- * Construct a vacuum/analyze command to run based on the given options, in the
- * given string buffer, which may contain previous garbage.
- *
- * The table name used must be already properly quoted.  The command generated
- * depends on the server version involved and it is semicolon-terminated.
- */
-static void
-prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
-					   vacuumingOptions *vacopts, const char *table)
-{
-	const char *paren = " (";
-	const char *comma = ", ";
-	const char *sep = paren;
-
-	resetPQExpBuffer(sql);
-
-	if (vacopts->analyze_only)
-	{
-		appendPQExpBufferStr(sql, "ANALYZE");
-
-		/* parenthesized grammar of ANALYZE is supported since v11 */
-		if (serverVersion >= 110000)
-		{
-			if (vacopts->skip_locked)
-			{
-				/* SKIP_LOCKED is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
-				sep = comma;
-			}
-			if (vacopts->verbose)
-			{
-				appendPQExpBuffer(sql, "%sVERBOSE", sep);
-				sep = comma;
-			}
-			if (vacopts->buffer_usage_limit)
-			{
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
-								  vacopts->buffer_usage_limit);
-				sep = comma;
-			}
-			if (sep != paren)
-				appendPQExpBufferChar(sql, ')');
-		}
-		else
-		{
-			if (vacopts->verbose)
-				appendPQExpBufferStr(sql, " VERBOSE");
-		}
-	}
-	else
-	{
-		appendPQExpBufferStr(sql, "VACUUM");
-
-		/* parenthesized grammar of VACUUM is supported since v9.0 */
-		if (serverVersion >= 90000)
-		{
-			if (vacopts->disable_page_skipping)
-			{
-				/* DISABLE_PAGE_SKIPPING is supported since v9.6 */
-				Assert(serverVersion >= 90600);
-				appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
-				sep = comma;
-			}
-			if (vacopts->no_index_cleanup)
-			{
-				/* "INDEX_CLEANUP FALSE" has been supported since v12 */
-				Assert(serverVersion >= 120000);
-				Assert(!vacopts->force_index_cleanup);
-				appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
-				sep = comma;
-			}
-			if (vacopts->force_index_cleanup)
-			{
-				/* "INDEX_CLEANUP TRUE" has been supported since v12 */
-				Assert(serverVersion >= 120000);
-				Assert(!vacopts->no_index_cleanup);
-				appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
-				sep = comma;
-			}
-			if (!vacopts->do_truncate)
-			{
-				/* TRUNCATE is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
-				sep = comma;
-			}
-			if (!vacopts->process_main)
-			{
-				/* PROCESS_MAIN is supported since v16 */
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
-				sep = comma;
-			}
-			if (!vacopts->process_toast)
-			{
-				/* PROCESS_TOAST is supported since v14 */
-				Assert(serverVersion >= 140000);
-				appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
-				sep = comma;
-			}
-			if (vacopts->skip_database_stats)
-			{
-				/* SKIP_DATABASE_STATS is supported since v16 */
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
-				sep = comma;
-			}
-			if (vacopts->skip_locked)
-			{
-				/* SKIP_LOCKED is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
-				sep = comma;
-			}
-			if (vacopts->full)
-			{
-				appendPQExpBuffer(sql, "%sFULL", sep);
-				sep = comma;
-			}
-			if (vacopts->freeze)
-			{
-				appendPQExpBuffer(sql, "%sFREEZE", sep);
-				sep = comma;
-			}
-			if (vacopts->verbose)
-			{
-				appendPQExpBuffer(sql, "%sVERBOSE", sep);
-				sep = comma;
-			}
-			if (vacopts->and_analyze)
-			{
-				appendPQExpBuffer(sql, "%sANALYZE", sep);
-				sep = comma;
-			}
-			if (vacopts->parallel_workers >= 0)
-			{
-				/* PARALLEL is supported since v13 */
-				Assert(serverVersion >= 130000);
-				appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
-								  vacopts->parallel_workers);
-				sep = comma;
-			}
-			if (vacopts->buffer_usage_limit)
-			{
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
-								  vacopts->buffer_usage_limit);
-				sep = comma;
-			}
-			if (sep != paren)
-				appendPQExpBufferChar(sql, ')');
-		}
-		else
-		{
-			if (vacopts->full)
-				appendPQExpBufferStr(sql, " FULL");
-			if (vacopts->freeze)
-				appendPQExpBufferStr(sql, " FREEZE");
-			if (vacopts->verbose)
-				appendPQExpBufferStr(sql, " VERBOSE");
-			if (vacopts->and_analyze)
-				appendPQExpBufferStr(sql, " ANALYZE");
-		}
-	}
-
-	appendPQExpBuffer(sql, " %s;", table);
-}
-
-/*
- * Send a vacuum/analyze command to the server, returning after sending the
- * command.
- *
- * Any errors during command execution are reported to stderr.
- */
-static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
-{
-	bool		status;
-
-	if (echo)
-		printf("%s\n", sql);
-
-	status = PQsendQuery(conn, sql) == 1;
-
-	if (!status)
-	{
-		if (table)
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
-		else
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
-	}
-}
 
 static void
 help(const char *progname)
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
new file mode 100644
index 00000000000..9be37fcc45a
--- /dev/null
+++ b/src/bin/scripts/vacuuming.c
@@ -0,0 +1,978 @@
+/*-------------------------------------------------------------------------
+ * vacuuming.c
+ *		Common routines for vacuumdb
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/scripts/vacuuming.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "catalog/pg_attribute_d.h"
+#include "catalog/pg_class_d.h"
+#include "common/connect.h"
+#include "common/logging.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/string_utils.h"
+#include "vacuuming.h"
+
+VacObjFilter objfilter = OBJFILTER_NONE;
+
+
+/*
+ * Executes vacuum/analyze as indicated, or dies in case of failure.
+ */
+void
+vacuuming_main(ConnParams *cparams, const char *dbname,
+			   const char *maintenance_db, vacuumingOptions *vacopts,
+			   SimpleStringList *objects, bool analyze_in_stages,
+			   int tbl_count, int concurrentCons,
+			   const char *progname, bool echo, bool quiet)
+{
+	setup_cancel_handler(NULL);
+
+	/* Avoid opening extra connections. */
+	if (tbl_count && (concurrentCons > tbl_count))
+		concurrentCons = tbl_count;
+
+	if (objfilter & OBJFILTER_ALL_DBS)
+	{
+		cparams->dbname = maintenance_db;
+
+		vacuum_all_databases(cparams, vacopts,
+							 analyze_in_stages,
+							 objects,
+							 concurrentCons,
+							 progname, echo, quiet);
+	}
+	else
+	{
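+		/*
+		 * If no database name was given, fall back on PGDATABASE, then
+		 * PGUSER, then the OS user name.
+		 */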
+		if (dbname == NULL)
+		{
+			if (getenv("PGDATABASE"))
+				dbname = getenv("PGDATABASE");
+			else if (getenv("PGUSER"))
+				dbname = getenv("PGUSER");
+			else
+				dbname = get_user_name_or_exit(progname);
+		}
+
+		cparams->dbname = dbname;
+
+		if (analyze_in_stages)
+		{
+			int			stage;
+			SimpleStringList *found_objs = NULL;
+
+			for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+			{
+				vacuum_one_database(cparams, vacopts,
+									stage,
+									objects,
+									vacopts->missing_stats_only ? &found_objs : NULL,
+									concurrentCons,
+									progname, echo, quiet);
+			}
+		}
+		else
+			vacuum_one_database(cparams, vacopts,
+								ANALYZE_NO_STAGE,
+								objects, NULL,
+								concurrentCons,
+								progname, echo, quiet);
+	}
+}
+
+
+/*
+ * vacuum_one_database
+ *
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ *    of objects to process, as returned by a previous call to
+ *    vacuum_one_database().
+ *
+ *     a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ *        once-dereferenced double pointer) are not NULL, this list takes
+ *        priority, and anything specified in "objects" is ignored.
+ *
+ *     b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ *        (the once-dereferenced double pointer) _is_ NULL, the "objects"
+ *        parameter takes priority, and the results of the catalog query
+ *        described in (2) are stored in "found_objs".
+ *
+ *     c) If "found_objs" (the double pointer) is NULL, the "objects"
+ *        parameter again takes priority, and the results of the catalog query
+ *        are not saved.
+ *
+ * 2) The "objects" parameter is a user-specified list of objects to process.
+ *    When (1b) or (1c) applies, this function performs a catalog query to
+ *    retrieve a fully qualified list of objects to process, as described
+ *    below.
+ *
+ *     a) If "objects" is not NULL, the catalog query gathers only the objects
+ *        listed in "objects".
+ *
+ *     b) If "objects" is NULL, all tables in the database are gathered.
+ *
+ * Note that this function is only concerned with running exactly one stage
+ * when in analyze-in-stages mode; caller must iterate on us if necessary.
+ *
+ * If concurrentCons is > 1, multiple connections are used to vacuum tables
+ * in parallel.
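+ *
+ * For example, a caller that wants to reuse the catalog query results
+ * across several calls (as the analyze-in-stages code does) might do:
+ *
+ *		SimpleStringList *found = NULL;
+ *
+ *		vacuum_one_database(cparams, vacopts, 0, objects, &found, ...);
+ *		vacuum_one_database(cparams, vacopts, 1, objects, &found, ...);
+ *
+ * where the first call runs the catalog query and fills "found", and the
+ * second call reuses that list without querying the catalogs again.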
+ */
+void
+vacuum_one_database(ConnParams *cparams,
+					vacuumingOptions *vacopts,
+					int stage,
+					SimpleStringList *objects,
+					SimpleStringList **found_objs,
+					int concurrentCons,
+					const char *progname, bool echo, bool quiet)
+{
+	PQExpBufferData sql;
+	PGconn	   *conn;
+	SimpleStringListCell *cell;
+	ParallelSlotArray *sa;
+	int			ntups = 0;
+	bool		failed = false;
+	const char *initcmd;
+	SimpleStringList *ret = NULL;
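+
+	/* session setup commands and messages for each analyze-in-stages pass */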
+	const char *stage_commands[] = {
+		"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+		"SET default_statistics_target=10; RESET vacuum_cost_delay;",
+		"RESET default_statistics_target;"
+	};
+	const char *stage_messages[] = {
+		gettext_noop("Generating minimal optimizer statistics (1 target)"),
+		gettext_noop("Generating medium optimizer statistics (10 targets)"),
+		gettext_noop("Generating default (full) optimizer statistics")
+	};
+
+	Assert(stage == ANALYZE_NO_STAGE ||
+		   (stage >= 0 && stage < ANALYZE_NUM_STAGES));
+
+	conn = connectDatabase(cparams, progname, echo, false, true);
+
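+	/*
+	 * Reject options that the connected server is too old to support,
+	 * before doing any work.
+	 */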
+	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "disable-page-skipping", "9.6");
+	}
+
+	if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-index-cleanup", "12");
+	}
+
+	if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "force-index-cleanup", "12");
+	}
+
+	if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-truncate", "12");
+	}
+
+	if (!vacopts->process_main && PQserverVersion(conn) < 160000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-process-main", "16");
+	}
+
+	if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-process-toast", "14");
+	}
+
+	if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "skip-locked", "12");
+	}
+
+	if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--min-xid-age", "9.6");
+	}
+
+	if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--min-mxid-age", "9.6");
+	}
+
+	if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--parallel", "13");
+	}
+
+	if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--buffer-usage-limit", "16");
+	}
+
+	if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--missing-stats-only", "15");
+	}
+
+	/* skip_database_stats is used automatically if server supports it */
+	vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
+
+	if (!quiet)
+	{
+		if (stage != ANALYZE_NO_STAGE)
+			printf(_("%s: processing database \"%s\": %s\n"),
+				   progname, PQdb(conn), _(stage_messages[stage]));
+		else
+			printf(_("%s: vacuuming database \"%s\"\n"),
+				   progname, PQdb(conn));
+		fflush(stdout);
+	}
+
+	/*
+	 * If the caller provided the results of a previous catalog query, just
+	 * use that.  Otherwise, run the catalog query ourselves and set the
+	 * return variable if provided.
+	 */
+	if (found_objs && *found_objs)
+		ret = *found_objs;
+	else
+	{
+		ret = retrieve_objects(conn, vacopts, objects, echo);
+		if (found_objs)
+			*found_objs = ret;
+	}
+
+	/*
+	 * Count the number of objects in the catalog query result.  If there are
+	 * none, we are done.
+	 */
+	for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+		ntups++;
+
+	if (ntups == 0)
+	{
+		PQfinish(conn);
+		return;
+	}
+
+	/*
+	 * Ensure concurrentCons is sane.  If there are more connections than
+	 * vacuumable relations, we don't need to use them all.
+	 */
+	if (concurrentCons > ntups)
+		concurrentCons = ntups;
+	if (concurrentCons <= 0)
+		concurrentCons = 1;
+
+	/*
+	 * All slots need to be prepared to run the appropriate analyze stage, if
+	 * caller requested that mode.  We have to prepare the initial connection
+	 * ourselves before setting up the slots.
+	 */
+	if (stage == ANALYZE_NO_STAGE)
+		initcmd = NULL;
+	else
+	{
+		initcmd = stage_commands[stage];
+		executeCommand(conn, initcmd, echo);
+	}
+
+	/*
+	 * Set up the database connections.  We reuse the connection we already have
+	 * for the first slot.  If not in parallel mode, the first slot in the
+	 * array contains the connection.
+	 */
+	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+	ParallelSlotsAdoptConn(sa, conn);
+
+	initPQExpBuffer(&sql);
+
+	cell = ret->head;
+	do
+	{
+		const char *tabname = cell->val;
+		ParallelSlot *free_slot;
+
+		if (CancelRequested)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		free_slot = ParallelSlotsGetIdle(sa, NULL);
+		if (!free_slot)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+							   vacopts, tabname);
+
+		/*
+		 * Execute the vacuum.  All errors are handled in processQueryResult
+		 * through ParallelSlotsGetIdle.
+		 */
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, sql.data,
+						   echo, tabname);
+
+		cell = cell->next;
+	} while (cell != NULL);
+
+	if (!ParallelSlotsWaitCompletion(sa))
+	{
+		failed = true;
+		goto finish;
+	}
+
+	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+	if (vacopts->skip_database_stats &&
+		stage == ANALYZE_NO_STAGE)
+	{
+		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+		ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+		if (!free_slot)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+		if (!ParallelSlotsWaitCompletion(sa))
+			failed = true;
+	}
+
+finish:
+	ParallelSlotsTerminate(sa);
+	pg_free(sa);
+
+	termPQExpBuffer(&sql);
+
+	if (failed)
+		exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns.  This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table.  If a listed table does not exist, the catalog query will fail.
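+ *
+ * With --table 'public.foo (a, b)', the query built here begins roughly
+ * like this (an abbreviated sketch):
+ *
+ *		WITH listed_objects (object_oid, column_list) AS (
+ *		  VALUES ('public.foo'::pg_catalog.regclass, '(a, b)'::pg_catalog.text))
+ *		SELECT c.relname, ns.nspname, listed_objects.column_list
+ *		FROM pg_catalog.pg_class c
+ *		JOIN pg_catalog.pg_namespace ns ON ...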
+ */
+SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+				 SimpleStringList *objects, bool echo)
+{
+	PQExpBufferData buf;
+	PQExpBufferData catalog_query;
+	PGresult   *res;
+	SimpleStringListCell *cell;
+	SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+	bool		objects_listed = false;
+
+	initPQExpBuffer(&catalog_query);
+	for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
+	{
+		char	   *just_table = NULL;
+		const char *just_columns = NULL;
+
+		if (!objects_listed)
+		{
+			appendPQExpBufferStr(&catalog_query,
+								 "WITH listed_objects (object_oid, column_list) AS (\n"
+								 "  VALUES (");
+			objects_listed = true;
+		}
+		else
+			appendPQExpBufferStr(&catalog_query, ",\n  (");
+
+		if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
+		{
+			appendStringLiteralConn(&catalog_query, cell->val, conn);
+			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
+		}
+
+		if (objfilter & OBJFILTER_TABLE)
+		{
+			/*
+			 * Split the relation and column names given by the user.  This
+			 * is used to feed the CTE with values on which pre-run validity
+			 * checks are also performed.  For now these happen only on the
+			 * relation name.
+			 */
+			splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
+								  &just_table, &just_columns);
+
+			appendStringLiteralConn(&catalog_query, just_table, conn);
+			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
+		}
+
+		if (just_columns && just_columns[0] != '\0')
+			appendStringLiteralConn(&catalog_query, just_columns, conn);
+		else
+			appendPQExpBufferStr(&catalog_query, "NULL");
+
+		appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
+
+		pg_free(just_table);
+	}
+
+	/* Finish formatting the CTE */
+	if (objects_listed)
+		appendPQExpBufferStr(&catalog_query, "\n)\n");
+
+	appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
+
+	if (objects_listed)
+		appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
+
+	appendPQExpBufferStr(&catalog_query,
+						 " FROM pg_catalog.pg_class c\n"
+						 " JOIN pg_catalog.pg_namespace ns"
+						 " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+						 " CROSS JOIN LATERAL (SELECT c.relkind IN ("
+						 CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
+						 CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
+						 " LEFT JOIN pg_catalog.pg_class t"
+						 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
+
+	/*
+	 * Used to match the tables or schemas listed by the user, completing the
+	 * JOIN clause.
+	 */
+	if (objects_listed)
+	{
+		appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
+							 " ON listed_objects.object_oid"
+							 " OPERATOR(pg_catalog.=) ");
+
+		if (objfilter & OBJFILTER_TABLE)
+			appendPQExpBufferStr(&catalog_query, "c.oid\n");
+		else
+			appendPQExpBufferStr(&catalog_query, "ns.oid\n");
+	}
+
+	/*
+	 * Exclude temporary tables, beginning the WHERE clause.
+	 */
+	appendPQExpBufferStr(&catalog_query,
+						 " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
+						 CppAsString2(RELPERSISTENCE_TEMP) "\n");
+
+	/*
+	 * Used to match the tables or schemas listed by the user, for the WHERE
+	 * clause.
+	 */
+	if (objects_listed)
+	{
+		if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
+			appendPQExpBufferStr(&catalog_query,
+								 " AND listed_objects.object_oid IS NULL\n");
+		else
+			appendPQExpBufferStr(&catalog_query,
+								 " AND listed_objects.object_oid IS NOT NULL\n");
+	}
+
+	/*
+	 * If no tables were listed, filter for the relevant relation types.  If
+	 * tables were given via --table, don't bother filtering by relation type.
+	 * Instead, let the server decide whether a given relation can be
+	 * processed; if it cannot, the user will learn about it from the error.
+	 */
+	if ((objfilter & OBJFILTER_TABLE) == 0)
+	{
+		/*
+		 * vacuumdb should generally follow the behavior of the underlying
+		 * VACUUM and ANALYZE commands. If analyze_only is true, process
+		 * regular tables, materialized views, and partitioned tables, just
+		 * like ANALYZE (with no specific target tables) does. Otherwise,
+		 * process only regular tables and materialized views, since VACUUM
+		 * skips partitioned tables when no target tables are specified.
+		 */
+		if (vacopts->analyze_only)
+			appendPQExpBufferStr(&catalog_query,
+								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
+								 CppAsString2(RELKIND_RELATION) ", "
+								 CppAsString2(RELKIND_MATVIEW) ", "
+								 CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
+		else
+			appendPQExpBufferStr(&catalog_query,
+								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
+								 CppAsString2(RELKIND_RELATION) ", "
+								 CppAsString2(RELKIND_MATVIEW) "])\n");
+	}
+
+	/*
+	 * For --min-xid-age and --min-mxid-age, the age of the relation is the
+	 * greatest of the ages of the main relation and its associated TOAST
+	 * table.  The commands generated by vacuumdb will also process the TOAST
+	 * table for the relation if necessary, so it does not need to be
+	 * considered separately.
+	 */
+	if (vacopts->min_xid_age != 0)
+	{
+		appendPQExpBuffer(&catalog_query,
+						  " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
+						  " pg_catalog.age(t.relfrozenxid)) "
+						  " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
+						  " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
+						  " '0'::pg_catalog.xid\n",
+						  vacopts->min_xid_age);
+	}
+
+	if (vacopts->min_mxid_age != 0)
+	{
+		appendPQExpBuffer(&catalog_query,
+						  " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
+						  " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
+						  " '%d'::pg_catalog.int4\n"
+						  " AND c.relminmxid OPERATOR(pg_catalog.!=)"
+						  " '0'::pg_catalog.xid\n",
+						  vacopts->min_mxid_age);
+	}
+
+	if (vacopts->missing_stats_only)
+	{
+		appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+		/* regular stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
+							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* extended stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+							 " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* expression indexes */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " JOIN pg_catalog.pg_index i"
+							 " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
+							 " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* inheritance and regular stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
+							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
+							 " AND c.relhassubclass\n"
+							 " AND NOT p.inherited\n"
+							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit))\n");
+
+		/* inheritance and extended stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND c.relhassubclass\n"
+							 " AND NOT p.inherited\n"
+							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+							 " AND d.stxdinherit))\n");
+
+		appendPQExpBufferStr(&catalog_query, " )\n");
+	}
+
+	/*
+	 * Execute the catalog query.  We use the default search_path for this
+	 * query for consistency with table lookups done elsewhere by the user.
+	 */
+	appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
+	executeCommand(conn, "RESET search_path;", echo);
+	res = executeQuery(conn, catalog_query.data, echo);
+	termPQExpBuffer(&catalog_query);
+	PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
+
+	/*
+	 * Build qualified identifiers for each table, including the column list
+	 * if given.
+	 */
+	initPQExpBuffer(&buf);
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		appendPQExpBufferStr(&buf,
+							 fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
+											   PQgetvalue(res, i, 0),
+											   PQclientEncoding(conn)));
+
+		if (objects_listed && !PQgetisnull(res, i, 2))
+			appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
+
+		simple_string_list_append(found_objs, buf.data);
+		resetPQExpBuffer(&buf);
+	}
+	termPQExpBuffer(&buf);
+	PQclear(res);
+
+	return found_objs;
+}
+
+/*
+ * Vacuum/analyze all connectable databases.
+ *
+ * In analyze-in-stages mode, we process all databases in one stage before
+ * moving on to the next stage.  That ensures minimal stats are available
+ * quickly everywhere before generating more detailed ones.
+ */
+void
+vacuum_all_databases(ConnParams *cparams,
+					 vacuumingOptions *vacopts,
+					 bool analyze_in_stages,
+					 SimpleStringList *objects,
+					 int concurrentCons,
+					 const char *progname, bool echo, bool quiet)
+{
+	PGconn	   *conn;
+	PGresult   *result;
+	int			stage;
+	int			i;
+
+	conn = connectMaintenanceDatabase(cparams, progname, echo);
+	result = executeQuery(conn,
+						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
+						  echo);
+	PQfinish(conn);
+
+	if (analyze_in_stages)
+	{
+		SimpleStringList **found_objs = NULL;
+
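+		/*
+		 * For --missing-stats-only, keep one cached object list per
+		 * database: the first stage fills each slot, and later stages
+		 * reuse it rather than repeating the catalog query.
+		 */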
+		if (vacopts->missing_stats_only)
+			found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
+		/*
+		 * When analyzing all databases in stages, we analyze them all in the
+		 * fastest stage first, so that initial statistics become available
+		 * for all of them as soon as possible.
+		 *
+		 * This means we establish several times as many connections, but
+		 * that's a secondary consideration.
+		 */
+		for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+		{
+			for (i = 0; i < PQntuples(result); i++)
+			{
+				cparams->override_dbname = PQgetvalue(result, i, 0);
+
+				vacuum_one_database(cparams, vacopts,
+									stage,
+									objects,
+									vacopts->missing_stats_only ? &found_objs[i] : NULL,
+									concurrentCons,
+									progname, echo, quiet);
+			}
+		}
+	}
+	else
+	{
+		for (i = 0; i < PQntuples(result); i++)
+		{
+			cparams->override_dbname = PQgetvalue(result, i, 0);
+
+			vacuum_one_database(cparams, vacopts,
+								ANALYZE_NO_STAGE,
+								objects, NULL,
+								concurrentCons,
+								progname, echo, quiet);
+		}
+	}
+
+	PQclear(result);
+}
+
+/*
+ * Construct a vacuum/analyze command to run based on the given
+ * options, in the given string buffer, which may contain previous garbage.
+ *
+ * The table name used must be already properly quoted.  The command generated
+ * depends on the server version involved and it is semicolon-terminated.
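+ *
+ * For example, with --skip-locked --parallel=4 against a v16 or newer
+ * server, the constructed command looks something like:
+ *
+ *		VACUUM (SKIP_DATABASE_STATS, SKIP_LOCKED, PARALLEL 4) "public"."foo";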
+ */
+void
+prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
+					   vacuumingOptions *vacopts, const char *table)
+{
+	const char *paren = " (";
+	const char *comma = ", ";
+	const char *sep = paren;
+
+	resetPQExpBuffer(sql);
+
+	if (vacopts->analyze_only)
+	{
+		appendPQExpBufferStr(sql, "ANALYZE");
+
+		/* parenthesized grammar of ANALYZE is supported since v11 */
+		if (serverVersion >= 110000)
+		{
+			if (vacopts->skip_locked)
+			{
+				/* SKIP_LOCKED is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+				sep = comma;
+			}
+			if (vacopts->verbose)
+			{
+				appendPQExpBuffer(sql, "%sVERBOSE", sep);
+				sep = comma;
+			}
+			if (vacopts->buffer_usage_limit)
+			{
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+								  vacopts->buffer_usage_limit);
+				sep = comma;
+			}
+			if (sep != paren)
+				appendPQExpBufferChar(sql, ')');
+		}
+		else
+		{
+			if (vacopts->verbose)
+				appendPQExpBufferStr(sql, " VERBOSE");
+		}
+	}
+	else
+	{
+		appendPQExpBufferStr(sql, "VACUUM");
+
+		/* parenthesized grammar of VACUUM is supported since v9.0 */
+		if (serverVersion >= 90000)
+		{
+			if (vacopts->disable_page_skipping)
+			{
+				/* DISABLE_PAGE_SKIPPING is supported since v9.6 */
+				Assert(serverVersion >= 90600);
+				appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
+				sep = comma;
+			}
+			if (vacopts->no_index_cleanup)
+			{
+				/* "INDEX_CLEANUP FALSE" has been supported since v12 */
+				Assert(serverVersion >= 120000);
+				Assert(!vacopts->force_index_cleanup);
+				appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
+				sep = comma;
+			}
+			if (vacopts->force_index_cleanup)
+			{
+				/* "INDEX_CLEANUP TRUE" has been supported since v12 */
+				Assert(serverVersion >= 120000);
+				Assert(!vacopts->no_index_cleanup);
+				appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
+				sep = comma;
+			}
+			if (!vacopts->do_truncate)
+			{
+				/* TRUNCATE is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
+				sep = comma;
+			}
+			if (!vacopts->process_main)
+			{
+				/* PROCESS_MAIN is supported since v16 */
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
+				sep = comma;
+			}
+			if (!vacopts->process_toast)
+			{
+				/* PROCESS_TOAST is supported since v14 */
+				Assert(serverVersion >= 140000);
+				appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
+				sep = comma;
+			}
+			if (vacopts->skip_database_stats)
+			{
+				/* SKIP_DATABASE_STATS is supported since v16 */
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
+				sep = comma;
+			}
+			if (vacopts->skip_locked)
+			{
+				/* SKIP_LOCKED is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+				sep = comma;
+			}
+			if (vacopts->full)
+			{
+				appendPQExpBuffer(sql, "%sFULL", sep);
+				sep = comma;
+			}
+			if (vacopts->freeze)
+			{
+				appendPQExpBuffer(sql, "%sFREEZE", sep);
+				sep = comma;
+			}
+			if (vacopts->verbose)
+			{
+				appendPQExpBuffer(sql, "%sVERBOSE", sep);
+				sep = comma;
+			}
+			if (vacopts->and_analyze)
+			{
+				appendPQExpBuffer(sql, "%sANALYZE", sep);
+				sep = comma;
+			}
+			if (vacopts->parallel_workers >= 0)
+			{
+				/* PARALLEL is supported since v13 */
+				Assert(serverVersion >= 130000);
+				appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
+								  vacopts->parallel_workers);
+				sep = comma;
+			}
+			if (vacopts->buffer_usage_limit)
+			{
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+								  vacopts->buffer_usage_limit);
+				sep = comma;
+			}
+			if (sep != paren)
+				appendPQExpBufferChar(sql, ')');
+		}
+		else
+		{
+			if (vacopts->full)
+				appendPQExpBufferStr(sql, " FULL");
+			if (vacopts->freeze)
+				appendPQExpBufferStr(sql, " FREEZE");
+			if (vacopts->verbose)
+				appendPQExpBufferStr(sql, " VERBOSE");
+			if (vacopts->and_analyze)
+				appendPQExpBufferStr(sql, " ANALYZE");
+		}
+	}
+
+	appendPQExpBuffer(sql, " %s;", table);
+}
+
+/*
+ * Send a vacuum/analyze command to the server, returning after sending the
+ * command.
+ *
+ * Any errors during command execution are reported to stderr.
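+ * The command is sent asynchronously with PQsendQuery; completion and any
+ * execution errors are handled later by the parallel slots machinery
+ * (see TableCommandResultHandler).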
+ */
+void
+run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+				   const char *table)
+{
+	bool		status;
+
+	if (echo)
+		printf("%s\n", sql);
+
+	status = PQsendQuery(conn, sql) == 1;
+
+	if (!status)
+	{
+		if (table)
+		{
+			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+						 table, PQdb(conn), PQerrorMessage(conn));
+		}
+		else
+		{
+			pg_log_error("vacuuming of database \"%s\" failed: %s",
+						 PQdb(conn), PQerrorMessage(conn));
+		}
+	}
+}
+
+/*
+ * Returns a newly malloc'd version of 'src' with escaped single quotes and
+ * backslashes.
+ */
+char *
+escape_quotes(const char *src)
+{
+	char	   *result = escape_single_quotes_ascii(src);
+
+	if (!result)
+		pg_fatal("out of memory");
+	return result;
+}
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
new file mode 100644
index 00000000000..d3f000840fa
--- /dev/null
+++ b/src/bin/scripts/vacuuming.h
@@ -0,0 +1,95 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuuming.h
+ *		Common declarations for vacuuming.c
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/scripts/vacuuming.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef VACUUMING_H
+#define VACUUMING_H
+
+#include "common.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/simple_list.h"
+
+/* For analyze-in-stages mode */
+#define ANALYZE_NO_STAGE	-1
+#define ANALYZE_NUM_STAGES	3
+
+/* vacuum options controlled by user flags */
+typedef struct vacuumingOptions
+{
+	bool		analyze_only;
+	bool		verbose;
+	bool		and_analyze;
+	bool		full;
+	bool		freeze;
+	bool		disable_page_skipping;
+	bool		skip_locked;
+	int			min_xid_age;
+	int			min_mxid_age;
+	int			parallel_workers;	/* >= 0 indicates user specified the
+									 * parallel degree, otherwise -1 */
+	bool		no_index_cleanup;
+	bool		force_index_cleanup;
+	bool		do_truncate;
+	bool		process_main;
+	bool		process_toast;
+	bool		skip_database_stats;
+	char	   *buffer_usage_limit;
+	bool		missing_stats_only;
+} vacuumingOptions;
+
+/* object filter options */
+typedef enum
+{
+	OBJFILTER_NONE = 0,			/* no filter used */
+	OBJFILTER_ALL_DBS = (1 << 0),	/* -a | --all */
+	OBJFILTER_DATABASE = (1 << 1),	/* -d | --dbname */
+	OBJFILTER_TABLE = (1 << 2), /* -t | --table */
+	OBJFILTER_SCHEMA = (1 << 3),	/* -n | --schema */
+	OBJFILTER_SCHEMA_EXCLUDE = (1 << 4),	/* -N | --exclude-schema */
+} VacObjFilter;
+
+extern VacObjFilter objfilter;
+
+extern void vacuuming_main(ConnParams *cparams, const char *dbname,
+						   const char *maintenance_db, vacuumingOptions *vacopts,
+						   SimpleStringList *objects, bool analyze_in_stages,
+						   int tbl_count, int concurrentCons,
+						   const char *progname, bool echo, bool quiet);
+
+extern SimpleStringList *retrieve_objects(PGconn *conn,
+										  vacuumingOptions *vacopts,
+										  SimpleStringList *objects,
+										  bool echo);
+
+extern void vacuum_one_database(ConnParams *cparams,
+								vacuumingOptions *vacopts,
+								int stage,
+								SimpleStringList *objects,
+								SimpleStringList **found_objs,
+								int concurrentCons,
+								const char *progname, bool echo, bool quiet);
+
+extern void vacuum_all_databases(ConnParams *cparams,
+								 vacuumingOptions *vacopts,
+								 bool analyze_in_stages,
+								 SimpleStringList *objects,
+								 int concurrentCons,
+								 const char *progname, bool echo, bool quiet);
+
+extern void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
+								   vacuumingOptions *vacopts, const char *table);
+
+extern void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+							   const char *table);
+
+extern char *escape_quotes(const char *src);
+
+#endif							/* VACUUMING_H */
-- 
2.39.5

v20-0002-Add-REPACK-command.patch (text/x-diff; charset=utf-8)
From 2b6f4a13e657d960694de8d79044bf3a268bb6c8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 26 Jul 2025 19:57:26 +0200
Subject: [PATCH v20 2/6] Add REPACK command

REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.

This also adds pg_repackdb, a new utility that can invoke the new
commands.  This is heavily based on vacuumdb.  We may still change the
implementation, depending on how well Windows likes this one.

Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: To fill in
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 ++++++-
 doc/src/sgml/ref/allfiles.sgml           |   2 +
 doc/src/sgml/ref/cluster.sgml            |  97 +--
 doc/src/sgml/ref/clusterdb.sgml          |   5 +
 doc/src/sgml/ref/pg_repackdb.sgml        | 479 ++++++++++++++
 doc/src/sgml/ref/repack.sgml             | 284 +++++++++
 doc/src/sgml/ref/vacuum.sgml             |  33 +-
 doc/src/sgml/reference.sgml              |   2 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  26 +
 src/backend/commands/cluster.c           | 758 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   3 +-
 src/backend/parser/gram.y                |  88 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  33 +-
 src/bin/scripts/Makefile                 |   4 +-
 src/bin/scripts/meson.build              |   2 +
 src/bin/scripts/pg_repackdb.c            | 226 +++++++
 src/bin/scripts/t/103_repackdb.pl        |  24 +
 src/bin/scripts/vacuuming.c              |  60 +-
 src/bin/scripts/vacuuming.h              |  11 +-
 src/include/commands/cluster.h           |   8 +-
 src/include/commands/progress.h          |  61 +-
 src/include/nodes/parsenodes.h           |  20 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 125 +++-
 src/test/regress/expected/rules.out      |  23 +
 src/test/regress/sql/cluster.sql         |  59 ++
 src/tools/pgindent/typedefs.list         |   3 +
 33 files changed, 2271 insertions(+), 447 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 doc/src/sgml/ref/repack.sgml
 create mode 100644 src/bin/scripts/pg_repackdb.c
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..12e103d319d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5506,7 +5514,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -5965,6 +5974,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
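+
+  <para>
+   For example, the progress of any <command>REPACK</command> commands
+   currently running might be monitored with a query such as:
+<programlisting>
+SELECT pid, datname, relid::regclass AS table_name, phase,
+       heap_blks_scanned, heap_blks_total
+  FROM pg_stat_progress_repack;
+</programlisting>
+  </para>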
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..eabf92e3536 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
@@ -212,6 +213,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..cfcfb65e349 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
   <title>Description</title>
 
   <para>
-   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
-   to cluster the table specified
-   by <replaceable class="parameter">table_name</replaceable>
-   based on the index specified by
-   <replaceable class="parameter">index_name</replaceable>. The index must
-   already have been defined on
-   <replaceable class="parameter">table_name</replaceable>.
+   The <command>CLUSTER</command> command is equivalent to
+   <xref linkend="sql-repack"/> with a <literal>USING INDEX</literal>
+   clause.  See there for more details.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
+<!-- Do we need to describe exactly which options map to what?  They seem obvious to me. -->
 
-  <para>
-   When a table is clustered, <productname>PostgreSQL</productname>
-   remembers which index it was clustered by.  The form
-   <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
-   reclusters the table using the same index as before.  You can also
-   use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
-   forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
-   future cluster operations, or to clear any previous setting.
-  </para>
-
-  <para>
-   <command>CLUSTER</command> without a
-   <replaceable class="parameter">table_name</replaceable> reclusters all the
-   previously-clustered tables in the current database that the calling user
-   has privileges for.  This form of <command>CLUSTER</command> cannot be
-   executed inside a transaction block.
-  </para>
-
-  <para>
-   When a table is being clustered, an <literal>ACCESS
-   EXCLUSIVE</literal> lock is acquired on it. This prevents any other
-   database operations (both reads and writes) from operating on the
-   table until the <command>CLUSTER</command> is finished.
-  </para>
  </refsect1>
 
  <refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 0d2051bf6f1..546c1289c31 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -64,6 +64,11 @@ PostgreSQL documentation
    this utility and via other methods for accessing the server.
   </para>
 
+  <para>
+   <application>clusterdb</application> has been superseded by
+   <application>pg_repackdb</application>.
+  </para>
+
  </refsect1>
 
 
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..32570d071cb
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,479 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.
+   <application>pg_repackdb</application> will also generate internal
+   statistics used by the <productname>PostgreSQL</productname> query
+   optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link>.  There
+   is no effective difference between repacking and analyzing databases via
+   this utility and via other methods for accessing the server.
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
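+       <para>
+        For example, to repack a database named <literal>mydb</literal>
+        using four parallel jobs (the database name is illustrative):
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --jobs=4 mydb</userinput>
+</screen>
+       </para>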
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When the <option>-a</option>/<option>--all</option> option is used, connect
+        to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
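+
+   <para>
+    To repack all tables in a database named <literal>xyzzy</literal>
+    except those in the <literal>bar</literal> schema (again using
+    illustrative names):
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --exclude-schema='bar' xyzzy</userinput>
+</screen></para>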
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..fd9d89f8aaa
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,284 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+    ANALYSE | ANALYZE
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If a <literal>USING INDEX</literal> clause is specified, the rows are
+   physically reordered based on information from an index.  Please see the
+   notes on clustering below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    If the <literal>USING INDEX</literal> clause is specified, the rows in
+    the table are physically reordered following an index: if an index name
+    is specified in the command, then that index is used; if no index name
+    is specified, then the index previously configured as the index to
+    cluster on is used.  If no index has been configured in this way, an
+    error is raised.  The index given in the <literal>USING INDEX</literal>
+    clause becomes the index to cluster on, just as an index given to the
+    <command>CLUSTER</command> command does.  The index can also be set
+    manually using <command>ALTER TABLE ... CLUSTER ON</command>, and reset
+    with <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+   </para>
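+
+   <para>
+    For example, the index to cluster on can be configured once and then
+    reused by later <command>REPACK</command> runs (table and index names
+    are illustrative):
+<programlisting>
+ALTER TABLE employees CLUSTER ON employees_ind;
+REPACK employees USING INDEX;
+</programlisting>
+   </para>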
+
+   <para>
+    If no table name is specified in <command>REPACK USING INDEX</command>,
+    all tables that have a clustering index defined and that the calling
+    user has privileges for are processed.
+   </para>
+
+   <para>
+    Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using clustering.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
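+
+   <para>
+    As an illustration, a range query such as the following (table, index,
+    and column names are hypothetical) can touch far fewer table pages once
+    the table has been repacked on the index that orders the range:
+<programlisting>
+CREATE INDEX orders_date_idx ON orders (order_date);
+REPACK orders USING INDEX orders_date_idx;
+SELECT * FROM orders WHERE order_date BETWEEN '2025-01-01' AND '2025-01-31';
+</programlisting>
+   </para>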
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan, or a sequential scan without sort, is used, a
+    temporary copy of the table is created that contains the table data in
+    the new order.  Temporary copies of each index on the table are created
+    as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
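+
+   <para>
+    For example, before repacking a large table one might raise the memory
+    limit for the session and, if temporary disk space is scarce, disable
+    the sort method (the setting value and names are illustrative):
+<programlisting>
+SET maintenance_work_mem = '1GB';
+SET enable_sort = off;
+REPACK big_table USING INDEX big_table_idx;
+RESET enable_sort;
+</programlisting>
+   </para>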
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>ANALYZE</literal></term>
+    <term><literal>ANALYSE</literal></term>
+    <listitem>
+     <para>
+      Runs <xref linkend="sql-analyze"/> on the table after repacking.  This is
+      currently only supported when a single (non-partitioned) table is specified.
+     </para>
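+     <para>
+      For example (the table name is illustrative):
+<programlisting>
+REPACK (ANALYZE) employees;
+</programlisting>
+     </para>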
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
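+     <para>
+      For example, the following two commands are equivalent:
+<programlisting>
+REPACK (VERBOSE) employees;
+REPACK (VERBOSE TRUE) employees;
+</programlisting>
+     </para>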
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+   <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+   </para>
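+
+   <para>
+    For example, the current phase of a running <command>REPACK</command>
+    can be monitored from another session:
+<programlisting>
+SELECT pid, relid::regclass, phase, heap_blks_scanned, heap_blks_total
+FROM pg_stat_progress_repack;
+</programlisting>
+   </para>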
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the corresponding
+    partition of that index.  <command>REPACK</command> on a partitioned
+    table cannot be executed
+    inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> on the basis of its
+   index <literal>employees_ind</literal> (since an index is used here,
+   this effectively clusters the table):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting></para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..062b658cfcd 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
-    FULL [ <replaceable class="parameter">boolean</replaceable> ]
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+    FULL [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
   <title>Parameters</title>
 
   <variablelist>
-   <varlistentry>
-    <term><literal>FULL</literal></term>
-    <listitem>
-     <para>
-      Selects <quote>full</quote> vacuum, which can reclaim more
-      space, but takes much longer and exclusively locks the table.
-      This method also requires extra disk space, since it writes a
-      new copy of the table and doesn't release the old copy until
-      the operation is complete.  Usually this should only be used when a
-      significant amount of space needs to be reclaimed from within the table.
-     </para>
-    </listitem>
-   </varlistentry>
-
    <varlistentry>
     <term><literal>FREEZE</literal></term>
     <listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FULL</literal></term>
+    <listitem>
+     <para>
+      This option, which is deprecated, makes <command>VACUUM</command>
+      behave like <command>REPACK</command> without a
+      <literal>USING INDEX</literal> clause.
+      This method of compacting the table takes much longer than
+      <command>VACUUM</command> and exclusively locks the table.
+      This method also requires extra disk space, since it writes a
+      new copy of the table and doesn't release the old copy until
+      the operation is complete.  Usually this should only be used when a
+      significant amount of space needs to be reclaimed from within the table.
+     </para>
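+     <para>
+      For example, per the description above, the deprecated form and its
+      replacement (the table name is illustrative):
+<programlisting>
+VACUUM (FULL) employees;
+REPACK employees;
+</programlisting>
+     </para>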
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..2ee08e21f41 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
@@ -257,6 +258,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..79f9de5d760 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c4029a4f3d3..3063abff9a5 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1b3c5a55882..b2b7b10c2be 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1279,6 +1279,32 @@ CREATE VIEW pg_stat_progress_cluster AS
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_repack AS
+    SELECT
+        S.pid AS pid,
+        S.datid AS datid,
+        D.datname AS datname,
+        S.relid AS relid,
+        -- param1 is currently unused
+        CASE S.param2 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'seq scanning heap'
+                      WHEN 2 THEN 'index scanning heap'
+                      WHEN 3 THEN 'sorting tuples'
+                      WHEN 4 THEN 'writing new heap'
+                      WHEN 5 THEN 'swapping relation files'
+                      WHEN 6 THEN 'rebuilding index'
+                      WHEN 7 THEN 'performing final cleanup'
+                      END AS phase,
+        CAST(S.param3 AS oid) AS repack_index_relid,
+        S.param4 AS heap_tuples_scanned,
+        S.param5 AS heap_tuples_written,
+        S.param6 AS heap_blks_total,
+        S.param7 AS heap_blks_scanned,
+        S.param8 AS index_rebuild_count
+    FROM pg_stat_get_progress_info('REPACK') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..8b64f9e6795 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,18 +67,41 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd, bool usingindex,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  MemoryContext cluster_context,
+											  Oid relid, bool rel_is_index);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
 
 
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
+
 /*---------------------------------------------------------------------------
  * This cluster code allows for clustering multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -104,191 +127,155 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *---------------------------------------------------------------------------
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+		else if (strcmp(opt->defname, "analyze") == 0 ||
+				 strcmp(opt->defname, "analyse") == 0)
+			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
+					 errmsg("unrecognized %s option \"%s\"",
+							RepackCommandAsString(stmt->command),
 							opt->defname),
 					 parser_errposition(pstate, opt->location)));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
 			return;
-		}
 	}
 
+	/* Don't allow this for now.  Maybe we can add support for this later */
+	if (params.options & CLUOPT_ANALYZE)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot ANALYZE multiple tables"));
+
 	/*
 	 * By here, we know we are in a multi-table situation.  In order to avoid
 	 * holding locks for too long, we want to process each table in its own
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
+		Oid			relid;
+		bool		rel_is_index;
+
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+
+		/*
+		 * If an index name was specified, resolve it now and pass it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * XXX how should this behave?  Passing no index to a partitioned
+			 * table could be useful to have certain partitions clustered by
+			 * some index, and other partitions by a different index.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, true, stmt->indexname);
+			if (!OidIsValid(relid))
+				elog(ERROR, "unable to determine index to cluster on");
+			/* XXX is this the right place for this check? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
+
+		rtcs = get_tables_to_repack_partitioned(stmt->command, repack_context,
+												relid, rel_is_index);
+
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
 	}
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
-
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
-
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
-
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
-
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+		{
+			CommitTransactionCommand();
+			continue;
+		}
+
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, stmt->usingindex,
+					rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +291,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, bool usingindex,
+			Relation OldHeap, Oid indexOid, ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +313,25 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
 	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_REPACK_COMMAND_REPACK);
+	else if (cmd == REPACK_COMMAND_CLUSTER)
+	{
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	}
+	else
+	{
+		Assert(cmd == REPACK_COMMAND_VACUUMFULL);
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	}
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -351,63 +353,21 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * to cluster a not-previously-clustered index.
 	 */
 	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
+		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+								 params->options))
 			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index. It
+	 * would work to use an index in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
-	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
+	if (usingindex && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				 errmsg("cannot run \"%s\" on a shared catalog",
+						RepackCommandAsString(cmd))));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
@@ -415,21 +375,30 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		if (OidIsValid(indexOid))
+		if (cmd == REPACK_COMMAND_CLUSTER)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
+		else if (cmd == REPACK_COMMAND_REPACK)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot repack temporary tables of other sessions")));
+		}
 		else
+		{
+			Assert(cmd == REPACK_COMMAND_VACUUMFULL);
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot vacuum temporary tables of other sessions")));
+		}
 	}
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -469,7 +438,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +451,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide REPACK/CLUSTER or on a partitioned table),
+	 * because there is another check in ExecRepack() which will stop any
+	 * attempt to cluster remote temp tables by name.  There is another check
+	 * in cluster_rel() which is redundant, but we leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +652,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd, bool usingindex,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +669,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (usingindex)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1458,8 +1485,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1536,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,69 +1659,137 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact procedure depends on whether
+ * USING INDEX was given, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index for indexes that have
+ * indisclustered set; if it was not, we scan pg_class and return all plain
+ * tables and materialized views.
+ *
+ * The result is a List of RelToCluster stored in the given memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand command, bool usingindex,
+					 MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
+	HeapTuple	tuple;
 	MemoryContext old_context;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/*
+			 * XXX I think the only reason there's no test failure here is
+			 * that we seldom have clustered indexes that would be affected by
+			 * concurrency.  Maybe we should also do the
+			 * ConditionalLockRelationOid+SearchSysCacheExists dance that we
+			 * do below.
+			 */
+			if (!cluster_is_permitted_for_relation(command, index->indrelid,
+												   GetUserId()))
+				continue;
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
 
-		MemoryContextSwitchTo(old_context);
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
 	}
-	table_endscan(scan);
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
 
-	relation_close(indRelation, AccessShareLock);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/* Consider only plain tables and materialized views; checking
+			 * this first avoids locking relations we would skip anyway. */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.  XXX we could release at the bottom of the
+			 * loop, but for now just hold it until this transaction is
+			 * finished.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
+
+			/* Verify that the table still exists. */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			if (!cluster_is_permitted_for_relation(command, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
+
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
+	}
+
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
  * all the leaf tables/indexes among its children.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is the OID of an index (true) or of
+ * the table itself (false).
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
+								 Oid relid, bool rel_is_index)
 {
 	List	   *inhoids;
 	ListCell   *lc;
@@ -1702,17 +1797,33 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 	MemoryContext old_context;
 
 	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
 
 	foreach(lc, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			inhoid = lfirst_oid(lc);
+		Oid			inhrelid,
+					inhindid;
 		RelToCluster *rtc;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(inhoid) != RELKIND_INDEX)
+				continue;
+
+			inhrelid = IndexGetRelation(inhoid, false);
+			inhindid = inhoid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(inhoid) != RELKIND_RELATION)
+				continue;
+
+			inhrelid = inhoid;
+			inhindid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
@@ -1720,15 +1831,15 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 		 * table.  We skip any partitions which the user is not permitted to
 		 * CLUSTER.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, inhrelid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
 		old_context = MemoryContextSwitchTo(cluster_context);
 
 		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		rtc->tableOid = inhrelid;
+		rtc->indexOid = inhindid;
 		rtcs = lappend(rtcs, rtc);
 
 		MemoryContextSwitchTo(old_context);
@@ -1742,13 +1853,148 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain a lock on it, and then determine what to do based on the
+ * relation type: if it's not a partitioned table, repack it as indicated
+ * (using an existing clustered index, or the explicitly named one), and
+ * return NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened relcache entry, so that the caller can process
+ * the partitions using the multiple-table handling code.  The index name is
+ * not resolved in this case.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+	{
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+	}
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		if (OidIsValid(indexOid))
+			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+
+		/* Do an analyze, if requested */
+		if (params->options & CLUOPT_ANALYZE)
+		{
+			VacuumParams vac_params = {0};
+
+			vac_params.options |= VACOPT_ANALYZE;
+			if (params->options & CLUOPT_VERBOSE)
+				vac_params.options |= VACOPT_VERBOSE;
+			analyze_rel(RelationGetRelid(rel), NULL, vac_params, NIL, true,
+						NULL);
+		}
+
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the index to use
+ * for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes doesn't
+ * change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid = InvalidOid;	/* in case the index list is empty */
+
+	if (indexname == NULL && usingindex)
+	{
+		ListCell   *lc;
+
+		/* Find an index with indisclustered set, or report error */
+		foreach(lc, RelationGetIndexList(rel))
+		{
+			indexOid = lfirst_oid(lc);
+
+			if (get_index_isclustered(indexOid))
+				break;
+			indexOid = InvalidOid;
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/*
+		 * An index was specified; figure out its OID.  It must be in the same
+		 * namespace as the relation.
+		 */
+		indexOid = get_relname_relid(indexname,
+									 rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
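
To make the three code paths in determine_clustered_index() concrete, here is
how the statement forms map to them (a sketch; table and index names are
placeholders):

	REPACK tab USING INDEX idx;  -- name given: resolved in tab's namespace
	REPACK tab USING INDEX;      -- no name: find the index with
	                             -- indisclustered set, else error
	REPACK tab;                  -- no USING INDEX: indexOid = InvalidOid
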
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..8863ad0e8bd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2287,7 +2287,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 				cluster_params.options |= CLUOPT_VERBOSE;
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index db43034b9db..f9152728021 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -280,7 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -297,7 +297,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -316,7 +316,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		utility_option_list
@@ -763,7 +763,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1025,7 +1025,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1099,6 +1098,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1135,6 +1135,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11912,38 +11917,91 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ USING INDEX [ <index_name> ] ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list qualified_name USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list qualified_name opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK '(' utility_option_list ')'
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = false;
+					n->params = $3;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $2;
+					n->params = NIL;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -11951,20 +12009,24 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -17960,6 +18022,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18592,6 +18655,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
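
As far as I can tell, these are the statement forms the new RepackStmt
productions accept (note that a parenthesized option list without a table
name cannot be combined with USING INDEX in this grammar):

	REPACK;
	REPACK USING INDEX;
	REPACK (VERBOSE);
	REPACK (VERBOSE) tab;
	REPACK tab USING INDEX;
	REPACK (VERBOSE) tab USING INDEX idx;
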
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 5f442bc3bd4..cf6db581007 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2851,10 +2851,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2862,6 +2858,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3499,7 +3499,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..a1e10e8c2f6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -268,6 +268,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8b10f2313f3..59ff6e0923b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1247,7 +1247,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -4997,6 +4997,37 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	/* If we have REPACK <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAny))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK <sth> USING, then add the index as well */
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+		else if (TailMatches("VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index 019ca06455d..f0c1bd4175c 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -16,7 +16,7 @@ subdir = src/bin/scripts
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready
+PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready pg_repackdb
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
@@ -31,6 +31,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
 vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
 install: all installdirs
 	$(INSTALL_PROGRAM) createdb$(X)   '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -41,6 +42,7 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+	$(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index a4fed59d1c9..be573cae682 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -42,6 +42,7 @@ vacuuming_common = static_library('libvacuuming_common',
 
 binaries = [
   'vacuumdb',
+  'pg_repackdb',
 ]
 foreach binary : binaries
   binary_sources = files('@0@.c'.format(binary))
@@ -80,6 +81,7 @@ tests += {
       't/100_vacuumdb.pl',
       't/101_vacuumdb_all.pl',
       't/102_vacuumdb_stages.pl',
+      't/103_repackdb.pl',
       't/200_connstr.pl',
     ],
   },
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..23326372a77
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ *		A utility to run REPACK
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * FIXME: this is missing a way to specify the index to use to repack one
+ * table, or whether to pass a WITH INDEX clause when multiple tables are
+ * used.  Something like --index[=indexname].  Adding that bleeds into
+ * vacuuming.c as well.
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "vacuuming.h"
+
+static void help(const char *progname);
+void		check_objfilter(void);
+
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+		{"echo", no_argument, NULL, 'e'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"dbname", required_argument, NULL, 'd'},
+		{"all", no_argument, NULL, 'a'},
+		{"table", required_argument, NULL, 't'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"jobs", required_argument, NULL, 'j'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
+		{"maintenance-db", required_argument, NULL, 2},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	bool		echo = false;
+	bool		quiet = false;
+	vacuumingOptions vacopts;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+	vacopts.mode = MODE_REPACK;
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwW",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				quiet = true;
+				break;
+			case 't':
+				objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter();
+
+	vacuuming_main(&cparams, dbname, maintenance_db, &vacopts, &objects,
+				   false, tbl_count, concurrentCons,
+				   progname, echo, quiet);
+	exit(0);
+}
+
+/*
+ * Verify that the filters used at command line are compatible.
+ */
+void
+check_objfilter(void)
+{
+	if ((objfilter & OBJFILTER_ALL_DBS) &&
+		(objfilter & OBJFILTER_DATABASE))
+		pg_fatal("cannot repack all databases and a specific one at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA))
+		pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+
+	if ((objfilter & OBJFILTER_SCHEMA) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+}
+
+static void
+help(const char *progname)
+{
+	printf(_("%s repacks a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE'             repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
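
For illustration, typical invocations would look like this, following the
vacuumdb conventions (database and table names are placeholders):

	pg_repackdb --jobs=4 --schema=public mydb
	pg_repackdb --all --echo
	pg_repackdb --table=some_table --verbose mydb
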
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..51de4d7ab34
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,24 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
+done_testing();
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
index 9be37fcc45a..e07071c38ee 100644
--- a/src/bin/scripts/vacuuming.c
+++ b/src/bin/scripts/vacuuming.c
@@ -1,6 +1,6 @@
 /*-------------------------------------------------------------------------
  * vacuuming.c
- *		Common routines for vacuumdb
+ *		Common routines for vacuumdb and pg_repackdb
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -166,6 +166,14 @@ vacuum_one_database(ConnParams *cparams,
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		/* XXX arguably, here we should use VACUUM FULL instead of failing */
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
 	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
 	{
 		PQfinish(conn);
@@ -258,9 +266,15 @@ vacuum_one_database(ConnParams *cparams,
 		if (stage != ANALYZE_NO_STAGE)
 			printf(_("%s: processing database \"%s\": %s\n"),
 				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
+		else if (vacopts->mode == MODE_VACUUM)
 			printf(_("%s: vacuuming database \"%s\"\n"),
 				   progname, PQdb(conn));
+		else
+		{
+			Assert(vacopts->mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
 		fflush(stdout);
 	}
 
@@ -350,7 +364,7 @@ vacuum_one_database(ConnParams *cparams,
 		 * through ParallelSlotsGetIdle.
 		 */
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
+		run_vacuum_command(free_slot->connection, vacopts, sql.data,
 						   echo, tabname);
 
 		cell = cell->next;
@@ -363,7 +377,7 @@ vacuum_one_database(ConnParams *cparams,
 	}
 
 	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
-	if (vacopts->skip_database_stats &&
+	if (vacopts->mode == MODE_VACUUM && vacopts->skip_database_stats &&
 		stage == ANALYZE_NO_STAGE)
 	{
 		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
@@ -376,7 +390,7 @@ vacuum_one_database(ConnParams *cparams,
 		}
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+		run_vacuum_command(free_slot->connection, vacopts, cmd, echo, NULL);
 
 		if (!ParallelSlotsWaitCompletion(sa))
 			failed = true;
@@ -708,6 +722,12 @@ vacuum_all_databases(ConnParams *cparams,
 	int			i;
 
 	conn = connectMaintenanceDatabase(cparams, progname, echo);
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
 	result = executeQuery(conn,
 						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
 						  echo);
@@ -761,7 +781,7 @@ vacuum_all_databases(ConnParams *cparams,
 }
 
 /*
- * Construct a vacuum/analyze command to run based on the given
+ * Construct a vacuum/analyze/repack command to run based on the given
  * options, in the given string buffer, which may contain previous garbage.
  *
  * The table name used must be already properly quoted.  The command generated
@@ -777,7 +797,13 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
 
 	resetPQExpBuffer(sql);
 
-	if (vacopts->analyze_only)
+	if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(sql, "REPACK");
+		if (vacopts->verbose)
+			appendPQExpBufferStr(sql, " (VERBOSE)");
+	}
+	else if (vacopts->analyze_only)
 	{
 		appendPQExpBufferStr(sql, "ANALYZE");
 
@@ -938,8 +964,8 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
  * Any errors during command execution are reported to stderr.
  */
 void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
+run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+				   const char *sql, bool echo, const char *table)
 {
 	bool		status;
 
@@ -952,13 +978,21 @@ run_vacuum_command(PGconn *conn, const char *sql, bool echo,
 	{
 		if (table)
 		{
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
 		}
 		else
 		{
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
 		}
 	}
 }
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
index d3f000840fa..154bc9925c0 100644
--- a/src/bin/scripts/vacuuming.h
+++ b/src/bin/scripts/vacuuming.h
@@ -17,6 +17,12 @@
 #include "fe_utils/connect_utils.h"
 #include "fe_utils/simple_list.h"
 
+typedef enum
+{
+	MODE_VACUUM,
+	MODE_REPACK
+} RunMode;
+
 /* For analyze-in-stages mode */
 #define ANALYZE_NO_STAGE	-1
 #define ANALYZE_NUM_STAGES	3
@@ -24,6 +30,7 @@
 /* vacuum options controlled by user flags */
 typedef struct vacuumingOptions
 {
+	RunMode		mode;
 	bool		analyze_only;
 	bool		verbose;
 	bool		and_analyze;
@@ -87,8 +94,8 @@ extern void vacuum_all_databases(ConnParams *cparams,
 extern void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
 								   vacuumingOptions *vacopts, const char *table);
 
-extern void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+extern void run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+							   const char *sql, bool echo, const char *table);
 
 extern char *escape_quotes(const char *src);
 
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..890998d84bb 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, bool usingindex,
+						Relation OldHeap, Oid indexOid, ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..5b6639c114c 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,24 +56,51 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Note: Since REPACK shares some code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
 
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+
+/*
+ * Commands of PROGRESS_REPACK
+ *
+ * Currently we only have one command, so the PROGRESS_REPACK_COMMAND
+ * parameter is not strictly necessary.  However, it keeps cluster.c simpler
+ * if CLUSTER and REPACK use the same set of parameters -- see the note on
+ * REPACK parameters above.
+ */
+#define PROGRESS_REPACK_COMMAND_REPACK			1
+
+/*
+ * Progress parameters for cluster.
+ *
+ * Although we need to report REPACK and CLUSTER in separate views, the
+ * parameters and phases of CLUSTER are a subset of those of REPACK. Therefore
+ * we just use the appropriate values defined for REPACK above instead of
+ * defining a separate set of constants here.
+ */
 
 /* Commands of PROGRESS_CLUSTER */
 #define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
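
With these constants in place, progress can be watched from another session
while a REPACK is running, using the view whose expected definition appears
in the rules.out hunk below, e.g.:

	SELECT pid, relid::regclass, phase,
	       heap_blks_scanned, heap_blks_total
	FROM pg_stat_progress_repack;
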
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 86a236bd58b..fcc25a0c592 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3949,16 +3949,26 @@ typedef struct AlterSystemStmt
 } AlterSystemStmt;
 
 /* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
+ *		Repack Statement
  * ----------------------
  */
-typedef struct ClusterStmt
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
 {
 	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
+	RepackCommand command;		/* type of command being run */
+	RangeVar   *relation;		/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
 	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
+} RepackStmt;
 
 /* ----------------------
  *		Vacuum and Analyze Statements
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index a4af3f717a1..22559369e2c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -374,6 +374,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..5256628b51d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -254,6 +254,63 @@ ORDER BY 1;
  clstr_tst_pkey
 (3 rows)
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking whether it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -381,6 +438,35 @@ SELECT * FROM clstr_1;
  2
 (2 rows)
 
+-- REPACK without an argument performs no ordering, so all we can check is
+-- which tables got a new relfilenode.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison.  Unlike with CLUSTER, clstr_3 should have been
+-- processed, because no clustered index is required here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
+SET SESSION AUTHORIZATION regress_clstr_user;
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 CREATE TABLE clustertest (key int PRIMARY KEY);
@@ -495,6 +581,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +636,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
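
One more test we could add here, if we want to verify directly that
REPACK ... USING INDEX persists the clustered-index marking (which the
mark_index_clustered() call in rebuild_relation() should ensure):

	SELECT indexrelid::regclass FROM pg_index
	WHERE indrelid = 'clstr_tst'::regclass AND indisclustered;
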
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..3a1d1d28282 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,6 +2071,29 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..cfcc3dc9761 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,6 +76,19 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
 
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
@@ -159,6 +172,34 @@ INSERT INTO clstr_1 VALUES (1);
 CLUSTER clstr_1;
 SELECT * FROM clstr_1;
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because there is nothing like clustering index here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 
@@ -229,6 +270,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..98242e25432 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2537,6 +2537,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
@@ -2603,6 +2605,7 @@ RtlNtStatusToDosError_t
 RuleInfo
 RuleLock
 RuleStmt
+RunMode
 RunningTransactions
 RunningTransactionsData
 SASLStatus
-- 
2.39.5

v20-0003-Refactor-index_concurrently_create_copy-for-use-.patch
From 7c708ce787c8d8327eeb96f2e9a3b5e2ad87226d Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:31:34 +0200
Subject: [PATCH v20 3/6] Refactor index_concurrently_create_copy() for use
 with REPACK (CONCURRENTLY).

This patch moves the code to index_create_copy() and adds a "concurrently"
parameter so it can be used by REPACK (CONCURRENTLY).

With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of indexes
(including system catalog entries) *before* the actual swap, to reduce the
time for which AccessExclusiveLock needs to be held.
---
 src/backend/catalog/index.c | 36 ++++++++++++++++++++++++++++--------
 src/include/catalog/index.h |  3 +++
 2 files changed, 31 insertions(+), 8 deletions(-)
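
To illustrate the intended use: a minimal sketch of how a REPACK
(CONCURRENTLY) code path might invoke the refactored function. The helper
name is made up, and whether InvalidOid keeps the old index's tablespace is
an assumption, so treat this as a sketch rather than the actual caller:

/*
 * Hypothetical caller: create a copy of one index for the new heap.
 * Passing concurrently = false makes index_create_copy() build the index
 * immediately, without the INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT
 * flags that the concurrent-reindex path needs.
 */
static Oid
repack_copy_one_index(Relation newHeap, Oid oldIndexId, const char *newName)
{
	/* InvalidOid: assumed to mean "keep the old index's tablespace" */
	return index_create_copy(newHeap, oldIndexId, InvalidOid, newName,
							 false /* concurrently */ );
}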

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3063abff9a5..0dee1b1a9d8 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1290,15 +1290,31 @@ index_create(Relation heapRelation,
 /*
  * index_concurrently_create_copy
  *
- * Create concurrently an index based on the definition of the one provided by
- * caller.  The index is inserted into catalogs and needs to be built later
- * on.  This is called during concurrent reindex processing.
- *
- * "tablespaceOid" is the tablespace to use for this index.
+ * Variant of index_create_copy(), called during concurrent reindex
+ * processing.
  */
 Oid
 index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							   Oid tablespaceOid, const char *newName)
+{
+	return index_create_copy(heapRelation, oldIndexId, tablespaceOid, newName,
+							 true);
+}
+
+/*
+ * index_create_copy
+ *
+ * Create an index based on the definition of the one provided by caller.  The
+ * index is inserted into catalogs and needs to be built later on.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ *
+ * The actual implementation of index_concurrently_create_copy(), reusable for
+ * other purposes.
+ */
+Oid
+index_create_copy(Relation heapRelation, Oid oldIndexId, Oid tablespaceOid,
+				  const char *newName, bool concurrently)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1317,6 +1333,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	List	   *indexColNames = NIL;
 	List	   *indexExprs = NIL;
 	List	   *indexPreds = NIL;
+	int			flags = 0;
 
 	indexRelation = index_open(oldIndexId, RowExclusiveLock);
 
@@ -1325,9 +1342,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 
 	/*
 	 * Concurrent build of an index with exclusion constraints is not
-	 * supported.
+	 * supported. If !concurrently, ii_ExclusionOps is currently not needed.
 	 */
-	if (oldInfo->ii_ExclusionOps != NULL)
+	if (oldInfo->ii_ExclusionOps != NULL && concurrently)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("concurrent index creation for exclusion constraints is not supported")));
@@ -1435,6 +1452,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 		stattargets[i].isnull = isnull;
 	}
 
+	if (concurrently)
+		flags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+
 	/*
 	 * Now create the new index.
 	 *
@@ -1458,7 +1478,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							  indcoloptions->values,
 							  stattargets,
 							  reloptionsDatum,
-							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT,
+							  flags,
 							  0,
 							  true, /* allow table to be a system catalog? */
 							  false,	/* is_internal? */
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 4daa8bef5ee..063a891351a 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -99,6 +99,9 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid oldIndexId,
 										   Oid tablespaceOid,
 										   const char *newName);
+extern Oid	index_create_copy(Relation heapRelation, Oid oldIndexId,
+							  Oid tablespaceOid, const char *newName,
+							  bool concurrently);
 
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
-- 
2.39.5

v20-0004-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch
From 32aa5b72853a9405b1f03fbc664aa81ea43c939f Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:23:05 +0200
Subject: [PATCH v20 4/6] Move conversion of a "historic" to MVCC snapshot to a
 separate function.

The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
---
 src/backend/replication/logical/snapbuild.c | 51 +++++++++++++++++----
 src/backend/utils/time/snapmgr.c            |  3 +-
 src/include/replication/snapbuild.h         |  1 +
 src/include/utils/snapmgr.h                 |  1 +
 4 files changed, 45 insertions(+), 11 deletions(-)
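
A brief usage sketch of the new function follows (variable names are
illustrative, not taken from the patch):

	/*
	 * Convert a historic snapshot for regular MVCC visibility checks.
	 * With in_place = false, a separate copy is returned - allocated as a
	 * single memory chunk via the now-exported CopySnapshot() - and the
	 * source snapshot is left intact.  SnapBuildInitialSnapshot() passes
	 * true because it does not need the historic snapshot afterwards.
	 */
	Snapshot	mvcc_snap = SnapBuildMVCCFromHistoric(historic_snap, false);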

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 98ddee20929..a2f1803622c 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
-	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -485,6 +482,31 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
+	/* Convert the historic snapshot to MVCC snapshot. */
+	return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike a regular (non-historic) MVCC snapshot, the xip array of this
+ * snapshot contains not only running main transactions, but also their
+ * subtransactions. This difference has no impact on XidInMVCCSnapshot().
+ *
+ * Pass true for 'in_place' if it is acceptable to modify the source
+ * snapshot. Pass false if you need a new instance, allocated as a single
+ * chunk of memory.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+	TransactionId xid;
+	TransactionId *oldxip = snapshot->xip;
+	uint32		oldxcnt = snapshot->xcnt;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+	Snapshot	result;
+
 	/* allocate in transaction context */
 	newxip = (TransactionId *)
 		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
@@ -495,7 +517,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
 	{
 		void	   *test;
 
@@ -503,7 +525,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -520,11 +542,22 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snapshot->xcnt = newxcnt;
+	snapshot->xip = newxip;
 
-	return snap;
+	if (in_place)
+		result = snapshot;
+	else
+	{
+		result = CopySnapshot(snapshot);
+
+		/* Restore the original values so the source is intact. */
+		snapshot->xip = oldxip;
+		snapshot->xcnt = oldxcnt;
+	}
+	result->snapshot_type = SNAPSHOT_MVCC;
+
+	return result;
 }
 
 /*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 65561cc6bc3..bc7840052fe 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -212,7 +212,6 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -602,7 +601,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
+Snapshot
 CopySnapshot(Snapshot snapshot)
 {
 	Snapshot	newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..6d4d2d1814c 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..f65f83c85cd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -63,6 +63,7 @@ extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
+extern Snapshot CopySnapshot(Snapshot snapshot);
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
-- 
2.39.5

v20-0005-Add-CONCURRENTLY-option-to-REPACK-command.patch
From 28f9c2f1ac8666473674ea08f188ba3ca211a1f4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 30 Aug 2025 19:13:38 +0200
Subject: [PATCH v20 5/6] Add CONCURRENTLY option to REPACK command.

The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock at the beginning and keep it until the end of the processing.)

This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(Supposedly, the swapping should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.

First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.

Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)

Since the logical decoding system, during its startup, waits until all the
transactions which already have XID assigned have finished, there is a risk of
deadlock if a transaction that already changed anything in the database tries
to acquire a conflicting lock on the table REPACK CONCURRENTLY is working
on. As an example, consider a transaction running the CREATE INDEX command on the
table that is being REPACKed CONCURRENTLY. On the other hand, DML commands
(INSERT, UPDATE, DELETE) are not a problem as their lock does not conflict
with REPACK CONCURRENTLY.

The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before the logical decoding is setup and lock
it again afterwards. Such temporary unlocking would imply re-checking if the
table still meets all the requirements for REPACK CONCURRENTLY.

Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires extra space for the new relation and index files
(which coexist with the old files for some time). In addition, the
CONCURRENTLY option might introduce a lag in releasing WAL segments for
archiving / recycling. This is due to the decoding of the data changes done by
applications concurrently. When copying the table contents into the new file,
we check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table contents is copied.) A
background worker might be a better approach for the decoding - let's consider
implementing it in the future.

The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.

Author: Antonin Houska <ah@cybertec.at>
---
 doc/src/sgml/monitoring.sgml                  |   37 +-
 doc/src/sgml/mvcc.sgml                        |   12 +-
 doc/src/sgml/ref/repack.sgml                  |  129 +-
 src/Makefile                                  |    1 +
 src/backend/access/heap/heapam.c              |   34 +-
 src/backend/access/heap/heapam_handler.c      |  227 ++-
 src/backend/access/heap/rewriteheap.c         |    6 +-
 src/backend/access/transam/xact.c             |   11 +-
 src/backend/catalog/system_views.sql          |   30 +-
 src/backend/commands/cluster.c                | 1677 +++++++++++++++--
 src/backend/commands/matview.c                |    2 +-
 src/backend/commands/tablecmds.c              |    1 +
 src/backend/commands/vacuum.c                 |   12 +-
 src/backend/meson.build                       |    1 +
 src/backend/replication/logical/decode.c      |   83 +
 src/backend/replication/logical/snapbuild.c   |   20 +
 .../replication/pgoutput_repack/Makefile      |   32 +
 .../replication/pgoutput_repack/meson.build   |   18 +
 .../pgoutput_repack/pgoutput_repack.c         |  288 +++
 src/backend/storage/ipc/ipci.c                |    1 +
 .../storage/lmgr/generate-lwlocknames.pl      |    2 +-
 src/backend/utils/cache/relcache.c            |    1 +
 src/backend/utils/time/snapmgr.c              |    3 +-
 src/bin/psql/tab-complete.in.c                |   25 +-
 src/include/access/heapam.h                   |    9 +-
 src/include/access/heapam_xlog.h              |    2 +
 src/include/access/tableam.h                  |   10 +
 src/include/commands/cluster.h                |   91 +-
 src/include/commands/progress.h               |   23 +-
 src/include/replication/snapbuild.h           |    1 +
 src/include/storage/lockdefs.h                |    4 +-
 src/include/utils/snapmgr.h                   |    2 +
 src/test/modules/injection_points/Makefile    |    5 +-
 .../injection_points/expected/repack.out      |  113 ++
 .../modules/injection_points/logical.conf     |    1 +
 src/test/modules/injection_points/meson.build |    4 +
 .../injection_points/specs/repack.spec        |  143 ++
 src/test/regress/expected/rules.out           |   29 +-
 src/tools/pgindent/typedefs.list              |    4 +
 39 files changed, 2821 insertions(+), 273 deletions(-)
 create mode 100644 src/backend/replication/pgoutput_repack/Makefile
 create mode 100644 src/backend/replication/pgoutput_repack/meson.build
 create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
 create mode 100644 src/test/modules/injection_points/expected/repack.out
 create mode 100644 src/test/modules/injection_points/logical.conf
 create mode 100644 src/test/modules/injection_points/specs/repack.spec
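
To make the workflow easier to follow, here is a rough outline of the
concurrent path, pieced together from the function declarations this patch
adds to cluster.c; the call sites and argument lists are simplified and
partly guessed, so treat this as a sketch rather than the actual code:

	/* Outline of REPACK CONCURRENTLY; error handling omitted. */
	begin_concurrent_repack(rel);	/* advertise the repacked relfilenode */
	ctx = setup_logical_decoding(relid, slotname, tupdesc);

	/*
	 * Initial load: scan with an MVCC snapshot derived from the historic
	 * one and insert the visible tuples into the new file.  While copying,
	 * repack_decode_concurrent_changes() runs whenever the decoding lag
	 * exceeds wal_segment_size, so that the slot can advance.
	 */
	copy_table_data(NewHeap, OldHeap, OldIndex, snapshot, ctx, ...);

	/*
	 * Build the new indexes, apply the captured changes, acquire the
	 * AccessExclusiveLock, apply whatever arrived while waiting, and
	 * swap the relation and index files.
	 */
	build_new_indexes(NewHeap, OldHeap, OldIndexes);
	rebuild_relation_finish_concurrent(NewHeap, OldHeap, cl_index, ctx, ...);

	cleanup_logical_decoding(ctx);
	end_concurrent_repack();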

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 12e103d319d..61c0197555f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6074,14 +6074,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+       <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of heap tuples written.
+       Number of heap tuples inserted.
        This counter only advances when the phase is
        <literal>seq scanning heap</literal>,
-       <literal>index scanning heap</literal>
-       or <literal>writing new heap</literal>.
+       <literal>index scanning heap</literal>,
+       <literal>writing new heap</literal>
+       or <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples updated.
+       This counter only advances when the phase is <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples deleted.
+       This counter only advances when the phase is <literal>catch-up</literal>.
       </para></entry>
      </row>
 
@@ -6162,6 +6183,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <command>REPACK</command> is currently writing the new heap.
      </entry>
     </row>
+    <row>
+     <entry><literal>catch-up</literal></entry>
+     <entry>
+       <command>REPACK CONCURRENTLY</command> is currently processing the DML
+       commands that other transactions executed during any of the preceding
+       phase.
+     </entry>
+    </row>
     <row>
      <entry><literal>swapping relation files</literal></entry>
      <entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+    TABLE</command></link> and <command>REPACK</command> with
+    the <literal>CONCURRENTLY</literal> option, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the DDL command committed.  This will only be an
+    snapshot taken before the command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the DDL command started &mdash; any transaction that has done so
+    before the command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the DDL command until that transaction completes.
+    which would block the truncating or rewriting command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index fd9d89f8aaa..ff5ce48de55 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -27,6 +27,7 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYSE | ANALYZE
+    CONCURRENTLY
 </synopsis>
  </refsynopsisdiv>
 
@@ -49,7 +50,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    processes every table and materialized view in the current database that
    the current user has the <literal>MAINTAIN</literal> privilege on. This
    form of <command>REPACK</command> cannot be executed inside a transaction
-   block.
+   block.  Also, this form is not allowed if
+   the <literal>CONCURRENTLY</literal> option is used.
   </para>
 
   <para>
@@ -62,7 +64,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
    is acquired on it. This prevents any other database operations (both reads
    and writes) from operating on the table until the <command>REPACK</command>
-   is finished.
+   is finished. If you want to keep the table accessible during the repacking,
+   consider using the <literal>CONCURRENTLY</literal> option.
   </para>
 
   <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -179,6 +182,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>CONCURRENTLY</literal></term>
+    <listitem>
+     <para>
+      Allow other transactions to use the table while it is being repacked.
+     </para>
+
+     <para>
+      Internally, <command>REPACK</command> copies the contents of the table
+      (ignoring dead tuples) into a new file, sorted by the specified index,
+      and also creates a new file for each index. Then it swaps the old and
+      new files for the table and all the indexes, and deletes the old
+      files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+      sure that the old files do not change during the processing because the
+      changes would get lost due to the swap.
+     </para>
+
+     <para>
+      With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+      EXCLUSIVE</literal> lock is only acquired to swap the table and index
+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be pretty short. However, the time might still be noticeable if
+      too many data changes have been done to the table while
+      <command>REPACK</command> was waiting for the lock: those changes must
+      be processed just before the files are swapped, while the
+      <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+     </para>
+
+     <para>
+      Note that <command>REPACK</command> with the
+      <literal>CONCURRENTLY</literal> option does not try to order the
+      rows inserted into the table after the repacking started. Also
+      note that <command>REPACK</command> might fail to complete due to DDL
+      commands executed on the table by other transactions during the
+      repacking.
+     </para>
+
+     <note>
+      <para>
+       In addition to the temporary space requirements explained in
+       <xref linkend="sql-repack-notes-on-resources"/>,
+       the <literal>CONCURRENTLY</literal> option can increase the use of
+       temporary space a bit more. The reason is that other transactions can
+       perform DML operations which cannot be applied to the new file until
+       <command>REPACK</command> has copied all the tuples from the old
+       file. Thus the tuples inserted into the old file during the copying are
+       also stored separately in a temporary file, so they can eventually be
+       applied to the new file.
+      </para>
+
+      <para>
+       Furthermore, the data changes performed during the copying are
+       extracted from the <link linkend="wal">write-ahead log</link> (WAL), and
+       this extraction (decoding) only takes place when a certain amount of WAL
+       has been written. Therefore, WAL removal can be delayed by this
+       threshold. Currently the threshold is equal to the value of
+       the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+       configuration parameter.
+      </para>
+     </note>
+
+     <para>
+      The <literal>CONCURRENTLY</literal> option cannot be used in the
+      following cases:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+          The table is <literal>UNLOGGED</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is partitioned.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is a system catalog or a <acronym>TOAST</acronym> table.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REPACK</command> is executed inside a transaction block.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+          configuration parameter is less than <literal>logical</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+         configuration parameter does not allow for creation of an additional
+         replication slot.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <warning>
+      <para>
+       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+       details.
+      </para>
+     </warning>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
 	interfaces \
 	backend/replication/libpqwalreceiver \
 	backend/replication/pgoutput \
+	backend/replication/pgoutput_repack \
 	fe_utils \
 	bin \
 	pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e3e7307ef5f..f9a4fe3faed 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
-								  bool all_visible_cleared, bool new_all_visible_cleared);
+								  bool all_visible_cleared, bool new_all_visible_cleared,
+								  bool wal_logical);
 #ifdef USE_ASSERT_CHECKING
 static void check_lock_if_inplace_updateable_rel(Relation relation,
 												 ItemPointer otid,
@@ -2780,7 +2781,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 TM_Result
 heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -3027,7 +3028,8 @@ l1:
 	 * Compute replica identity tuple before entering the critical section so
 	 * we don't PANIC upon a memory allocation failure.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+	old_key_tuple = wal_logical ?
+		ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
 
 	/*
 	 * If this is the first possibly-multixact-able operation in the current
@@ -3117,6 +3119,15 @@ l1:
 				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
 		}
 
+		/*
+		 * Unlike UPDATE, DELETE is decoded even if there is no old key, so it
+		 * does not help to clear both XLH_DELETE_CONTAINS_OLD_TUPLE and
+		 * XLH_DELETE_CONTAINS_OLD_KEY. Thus we need an extra flag. TODO
+		 * Consider not decoding tuples w/o the old tuple/key instead.
+		 */
+		if (!wal_logical)
+			xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
 		XLogBeginInsert();
 		XLogRegisterData(&xlrec, SizeOfHeapDelete);
 
@@ -3209,7 +3220,8 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false /* changingPart */ );
+						 &tmfd, false,	/* changingPart */
+						 true /* wal_logical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3250,7 +3262,7 @@ TM_Result
 heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
-			TU_UpdateIndexes *update_indexes)
+			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -4143,7 +4155,8 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 wal_logical);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4501,7 +4514,8 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, &lockmode, update_indexes);
+						 &tmfd, &lockmode, update_indexes,
+						 true /* wal_logical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -8842,7 +8856,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool wal_logical)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -8853,7 +8868,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
 	Page		page = BufferGetPage(newbuf);
-	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln) &&
+		wal_logical;
 	bool		init;
 	int			bufflags;
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 79f9de5d760..d03084768e0 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+					   true);
 }
 
 
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
-						 tmfd, lockmode, update_indexes);
+						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	/*
@@ -685,13 +687,15 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
+								 Snapshot snapshot,
+								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
 								 double *tups_recently_dead)
 {
-	RewriteState rwstate;
+	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
 	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
+	bool		concurrent = snapshot != NULL;
+	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	values = (Datum *) palloc(natts * sizeof(Datum));
 	isnull = (bool *) palloc(natts * sizeof(bool));
 
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
-								 *multi_cutoff);
+	/*
+	 * Initialize the rewrite operation.
+	 */
+	if (!concurrent)
+		rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+									 *xid_cutoff, *multi_cutoff);
 
 
 	/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 *
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+	 * the snapshot passed by the caller.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									snapshot ? snapshot : SnapshotAny,
+									NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap,
+									snapshot ? snapshot : SnapshotAny,
+									0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -785,6 +801,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		HeapTuple	tuple;
 		Buffer		buf;
 		bool		isdead;
+		HTSV_Result vis;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -837,70 +854,84 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
 		buf = hslot->buffer;
 
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+		/*
+		 * Regarding CONCURRENTLY, see the comments on MVCC snapshot above.
+		 */
+		if (!concurrent)
 		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				*tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
-
-				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				*tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
-
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-
-		if (isdead)
-		{
-			*tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
+			switch ((vis = HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf)))
 			{
-				/* A previous recently-dead tuple is now known dead */
-				*tups_vacuumed += 1;
-				*tups_recently_dead -= 1;
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead */
+					isdead = true;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+					*tups_recently_dead += 1;
+					/* fall through */
+				case HEAPTUPLE_LIVE:
+					/* Live or recently dead, must copy it */
+					isdead = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+					/*
+					 * As long as we hold exclusive lock on the relation,
+					 * normally the only way to see this is if it was inserted
+					 * earlier in our own transaction.  However, it can happen
+					 * in system catalogs, since we tend to release write lock
+					 * before commit there. Also, there's no exclusive lock
+					 * during concurrent processing. Give a warning if neither
+					 * case applies; but in any case we had better copy it.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+						elog(WARNING, "concurrent insert in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as live */
+					isdead = false;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+					/*
+					 * Similar situation to INSERT_IN_PROGRESS case.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+						elog(WARNING, "concurrent delete in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as recently dead */
+					*tups_recently_dead += 1;
+					isdead = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					isdead = false; /* keep compiler quiet */
+					break;
 			}
-			continue;
+
+			if (isdead)
+			{
+				*tups_vacuumed += 1;
+				/* heap rewrite module still needs to see it... */
+				if (rewrite_heap_dead_tuple(rwstate, tuple))
+				{
+					/* A previous recently-dead tuple is now known dead */
+					*tups_vacuumed += 1;
+					*tups_recently_dead -= 1;
+				}
+
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				continue;
+			}
+
+			/*
+			 * In the concurrent case, we have a copy of the tuple, so we
+			 * don't worry whether the source tuple will be deleted / updated
+			 * after we release the lock.
+			 */
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 		}
 
 		*num_tuples += 1;
@@ -919,7 +950,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		{
 			const int	ct_index[] = {
 				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
-				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_INSERTED
 			};
 			int64		ct_val[2];
 
@@ -934,6 +965,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
+
+		/*
+		 * Process the WAL produced by the load, as well as by other
+		 * transactions, so that the replication slot can advance and WAL does
+		 * not pile up. Use wal_segment_size as a threshold so that we do not
+		 * introduce the decoding overhead too often.
+		 *
+		 * Of course, we must not apply the changes until the initial load has
+		 * completed.
+		 *
+		 * Note that our insertions into the new table should not be decoded
+		 * as we (intentionally) do not write the logical decoding specific
+		 * information to WAL.
+		 */
+		if (concurrent)
+		{
+			XLogRecPtr	end_of_wal;
+
+			end_of_wal = GetFlushRecPtr(NULL);
+			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+			{
+				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+				end_of_wal_prev = end_of_wal;
+			}
+		}
 	}
 
 	if (indexScan != NULL)
@@ -977,7 +1033,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
 		}
 
@@ -985,7 +1041,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	}
 
 	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	if (rwstate)
+		end_heap_rewrite(rwstate);
 
 	/* Clean up */
 	pfree(values);
@@ -2376,6 +2433,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
  * SET WITHOUT OIDS.
  *
  * So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate=NULL, use simple_heap_insert() instead of rewriting - in that
+ * case we still need to deform/form the tuple. TODO Shouldn't we rename the
+ * function, as it might not do any rewrite?
  */
 static void
 reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2459,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
 
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	if (rwstate)
+		/* The heap rewrite module does the rest */
+		rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	else
+	{
+		/*
+		 * Insert tuple when processing REPACK CONCURRENTLY.
+		 *
+		 * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+		 * difficult to do the same in the catch-up phase (as the logical
+		 * decoding does not provide us with sufficient visibility
+		 * information). Thus we must use heap_insert() both during the
+		 * catch-up and here.
+		 *
+		 * The following is like simple_heap_insert() except that we pass the
+		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+		 * the relation files, it drops this relation, so no logical
+		 * replication subscription should need the data.
+		 */
+		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+					HEAP_INSERT_NO_LOGICAL, NULL);
+	}
 
 	heap_freetuple(copiedTuple);
 }
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index e6d2b5fced1..6aa2ed214f2 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		int			options = HEAP_INSERT_SKIP_FSM;
 
 		/*
-		 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
-		 * for the TOAST table are not logically decoded.  The main heap is
-		 * WAL-logged as XLOG FPI records, which are not logically decoded.
+		 * While rewriting the heap for REPACK, make sure data for the TOAST
+		 * table are not logically decoded.  The main heap is WAL-logged as
+		 * XLOG FPI records, which are not logically decoded.
 		 */
 		options |= HEAP_INSERT_NO_LOGICAL;
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b46e7e9c2a6..5670f2bfbde 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -215,6 +215,7 @@ typedef struct TransactionStateData
 	bool		parallelChildXact;	/* is any parent transaction parallel? */
 	bool		chain;			/* start a new block after this one */
 	bool		topXidLogged;	/* for a subxact: is top-level XID logged? */
+	bool		internal;		/* for a subxact: launched internally? */
 	struct TransactionStateData *parent;	/* back link to parent */
 } TransactionStateData;
 
@@ -4735,6 +4736,7 @@ BeginInternalSubTransaction(const char *name)
 			/* Normal subtransaction start */
 			PushTransaction();
 			s = CurrentTransactionState;	/* changed by push */
+			s->internal = true;
 
 			/*
 			 * Savepoint names, like the TransactionState block itself, live
@@ -5251,7 +5253,13 @@ AbortSubTransaction(void)
 	LWLockReleaseAll();
 
 	pgstat_report_wait_end();
-	pgstat_progress_end_command();
+
+	/*
+	 * Internal subtransacion might be used by an user command, in which case
+	 * An internal subtransaction might be used by a user command, in which case
+	 */
+	if (!s->internal)
+		pgstat_progress_end_command();
 
 	pgaio_error_cleanup();
 
@@ -5468,6 +5476,7 @@ PushTransaction(void)
 	s->parallelModeLevel = 0;
 	s->parallelChildXact = (p->parallelModeLevel != 0 || p->parallelChildXact);
 	s->topXidLogged = false;
+	s->internal = false;
 
 	CurrentTransactionState = s;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b2b7b10c2be..a92ac78ad9e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1266,16 +1266,17 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      -- 5 is 'catch-up', but that should not appear here.
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS cluster_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
@@ -1291,16 +1292,19 @@ CREATE VIEW pg_stat_progress_repack AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      WHEN 5 THEN 'catch-up'
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
-        S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param5 AS heap_tuples_inserted,
+        S.param6 AS heap_tuples_updated,
+        S.param7 AS heap_tuples_deleted,
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8b64f9e6795..511b2bb6c43 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -25,6 +25,10 @@
 #include "access/toast_internals.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -32,6 +36,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
@@ -39,15 +44,21 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
+#include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -67,13 +78,45 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * The locators are used to avoid logical decoding of data that does not
+ * belong to the table being repacked.
+ */
+RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
+RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+
+	Relation	ident_index;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
-								Oid indexOid, Oid userid, int options);
+								Oid indexOid, Oid userid, LOCKMODE lmode,
+								int options);
+static void check_repack_concurrently_requirements(Relation rel);
 static void rebuild_relation(RepackCommand cmd, bool usingindex,
-							 Relation OldHeap, Relation index, bool verbose);
+							 Relation OldHeap, Relation index, Oid userid,
+							 bool verbose, bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							bool verbose, bool *pSwapToastByContent,
-							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							bool verbose,
+							bool *pSwapToastByContent,
+							TransactionId *pFreezeXid,
+							MultiXactId *pCutoffMulti);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -81,12 +124,61 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 											  Oid relid, bool rel_is_index);
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid,
+													  const char *slotname,
+													  TupleDesc tupdesc);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+									 Relation rel, ScanKey key, int nkeys,
+									 IndexInsertState *iistate);
+static void apply_concurrent_insert(Relation rel, ConcurrentChange *change,
+									HeapTuple tup, IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+									HeapTuple tup_target,
+									ConcurrentChange *change,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+									ConcurrentChange *change);
+static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
+								   HeapTuple tup_key,
+								   IndexInsertState *iistate,
+								   TupleTableSlot *ident_slot,
+								   IndexScanDesc *scan_p);
+static void process_concurrent_changes(LogicalDecodingContext *ctx,
+									   XLogRecPtr end_of_wal,
+									   Relation rel_dst,
+									   Relation rel_src,
+									   ScanKey ident_key,
+									   int ident_key_nentries,
+									   IndexInsertState *iistate);
+static IndexInsertState *get_index_insert_state(Relation relation,
+												Oid ident_index_id);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+								  int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+											   Relation cl_index,
+											   LogicalDecodingContext *ctx,
+											   bool swap_toast_by_content,
+											   TransactionId frozenXid,
+											   MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
+										LOCKMODE lockmode,
+										bool isTopLevel,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
 
 
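+/* Logical decoding output plugin used to capture concurrent data changes. */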
+#define REPL_PLUGIN_NAME   "pgoutput_repack"
+
 static const char *
 RepackCommandAsString(RepackCommand cmd)
 {
@@ -95,7 +187,7 @@ RepackCommandAsString(RepackCommand cmd)
 		case REPACK_COMMAND_REPACK:
 			return "REPACK";
 		case REPACK_COMMAND_VACUUMFULL:
-			return "VACUUM";
+			return "VACUUM (FULL)";
 		case REPACK_COMMAND_CLUSTER:
 			return "CLUSTER";
 	}
@@ -132,6 +224,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	ClusterParams params = {0};
 	Relation	rel = NULL;
 	MemoryContext repack_context;
+	LOCKMODE	lockmode;
 	List	   *rtcs;
 
 	/* Parse option list */
@@ -142,6 +235,16 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		else if (strcmp(opt->defname, "analyze") == 0 ||
 				 strcmp(opt->defname, "analyse") == 0)
 			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
+		else if (strcmp(opt->defname, "concurrently") == 0 &&
+				 defGetBoolean(opt))
+		{
+			if (stmt->command != REPACK_COMMAND_REPACK)
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("CONCURRENTLY option not supported for %s",
+							   RepackCommandAsString(stmt->command)));
+			params.options |= CLUOPT_CONCURRENT;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -151,13 +254,25 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 					 parser_errposition(pstate, opt->location)));
 	}
 
+	/*
+	 * Determine the lock mode expected by cluster_rel().
+	 *
+	 * In the exclusive case, we obtain AccessExclusiveLock right away to
+	 * avoid a lock-upgrade hazard in the single-transaction case. In the
+	 * CONCURRENTLY case, AccessExclusiveLock is only taken at the end of
+	 * processing, presumably for a very short time. Before that, we unlock
+	 * the relation temporarily, so there's no lock-upgrade hazard.
+	 */
+	lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+		AccessExclusiveLock : ShareUpdateExclusiveLock;
+
 	/*
 	 * If a single relation is specified, process it and we're done ... unless
 	 * the relation is a partitioned table, in which case we fall through.
 	 */
 	if (stmt->relation != NULL)
 	{
-		rel = process_single_relation(stmt, &params);
+		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
 			return;
 	}
@@ -169,10 +284,29 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 				errmsg("cannot ANALYZE multiple tables"));
 
 	/*
-	 * By here, we know we are in a multi-table situation.  In order to avoid
-	 * holding locks for too long, we want to process each table in its own
-	 * transaction.  This forces us to disallow running inside a user
-	 * transaction block.
+	 * By here, we know we are in a multi-table situation.
+	 *
+	 * Concurrent processing is currently considered rather special (e.g. in
+	 * terms of resources consumed), so it is not performed in bulk.
+	 */
+	if (params.options & CLUOPT_CONCURRENT)
+	{
+		if (rel != NULL)
+		{
+			Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+					errhint("Consider running the command for individual partitions."));
+		}
+		else
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY requires an explicit table name"));
+	}
+
+	/*
+	 * In order to avoid holding locks for too long, we want to process each
+	 * table in its own transaction.  This forces us to disallow running
+	 * inside a user transaction block.
 	 */
 	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
@@ -252,7 +386,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		 * Open the target table, coping with the case where it has been
 		 * dropped.
 		 */
-		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		rel = try_table_open(rtc->tableOid, lockmode);
 		if (rel == NULL)
 		{
 			CommitTransactionCommand();
@@ -264,7 +398,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 
 		/* Process this table */
 		cluster_rel(stmt->command, stmt->usingindex,
-					rel, rtc->indexOid, &params);
+					rel, rtc->indexOid, &params, isTopLevel);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
@@ -293,22 +427,55 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
  * If indexOid is InvalidOid, the table will be rewritten in physical order
  * instead of index order.
  *
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
  * 'cmd' indicates which command is being executed, to be used for error
  * messages.
  */
 void
 cluster_rel(RepackCommand cmd, bool usingindex,
-			Relation OldHeap, Oid indexOid, ClusterParams *params)
+			Relation OldHeap, Oid indexOid, ClusterParams *params,
+			bool isTopLevel)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Relation	index;
+	LOCKMODE	lmode;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
-	Relation	index;
+	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+	/*
+	 * Check that the correct lock is held. The lock mode is
+	 * AccessExclusiveLock for normal processing and ShareUpdateExclusiveLock
+	 * for concurrent processing (so that SELECT, INSERT, UPDATE and DELETE
+	 * commands work, but cluster_rel() cannot be called concurrently for the
+	 * same relation).
+	 */
+	lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+
+	/* There are specific requirements on concurrent processing. */
+	if (concurrent)
+	{
+		/*
+		 * Make sure we have no XID assigned, otherwise the call of
+		 * setup_logical_decoding() can cause a deadlock.
+		 *
+		 * The existence of a transaction block does not actually imply that
+		 * an XID has been assigned, but it very likely has. We might want to
+		 * check the result of GetCurrentTransactionIdIfAny() instead, but
+		 * that would be less clear from the user's perspective.
+		 */
+		PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
+
+		check_repack_concurrently_requirements(OldHeap);
+	}
 
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
@@ -351,11 +518,13 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * If this is a single-transaction CLUSTER, we can skip these tests. We
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
+	 *
+	 * XXX move [some of] these comments to where the RECHECK flag is
+	 * determined?
 	 */
-	if (recheck)
-		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
-								 params->options))
-			goto out;
+	if (recheck && !cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+										lmode, params->options))
+		goto out;
 
 	/*
 	 * We allow repacking shared catalogs only when not using an index. It
@@ -369,6 +538,12 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 				 errmsg("cannot run \"%s\" on a shared catalog",
 						RepackCommandAsString(cmd))));
 
+	/*
+	 * The CONCURRENTLY case should have been rejected earlier because it does
+	 * not support system catalogs.
+	 */
+	Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
@@ -404,7 +579,7 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OidIsValid(indexOid))
 	{
 		/* verify the index is good and lock it */
-		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, lmode);
 		/* also open it */
 		index = index_open(indexOid, NoLock);
 	}
@@ -421,7 +596,9 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
 		!RelationIsPopulated(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		if (index)
+			index_close(index, lmode);
+		relation_close(OldHeap, lmode);
 		goto out;
 	}
 
@@ -434,11 +611,35 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * invalid, because we move tuples around.  Promote them to relation
 	 * locks.  Predicate locks on indexes will be promoted when they are
 	 * reindexed.
+	 *
+	 * During concurrent processing, the heap as well as its indexes stay in
+	 * operation, so we postpone this step until they are locked using
+	 * AccessExclusiveLock near the end of the processing.
 	 */
-	TransferPredicateLocksToHeapRelation(OldHeap);
+	if (!concurrent)
+		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
+	PG_TRY();
+	{
+		/*
+		 * For concurrent processing, make sure that our logical decoding
+		 * ignores data changes of other tables than the one we are
+		 * processing.
+		 */
+		if (concurrent)
+			begin_concurrent_repack(OldHeap);
+
+		rebuild_relation(cmd, usingindex, OldHeap, index, save_userid,
+						 verbose, concurrent);
+	}
+	PG_FINALLY();
+	{
+		if (concurrent)
+			end_concurrent_repack();
+	}
+	PG_END_TRY();
+
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -457,14 +658,14 @@ out:
  */
 static bool
 cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-					Oid userid, int options)
+					Oid userid, LOCKMODE lmode, int options)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 
 	/* Check that the user still has privileges for the relation */
 	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -478,7 +679,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -489,7 +690,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		 */
 		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 
@@ -500,7 +701,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
 			!get_index_isclustered(indexOid))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 	}
@@ -641,19 +842,89 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	table_close(pg_index, RowExclusiveLock);
 }
 
+/*
+ * Check if the CONCURRENTLY option is legal for the relation.
+ */
+static void
+check_repack_concurrently_requirements(Relation rel)
+{
+	char		relpersistence,
+				replident;
+	Oid			ident_idx;
+
+	/* Data changes in system relations are not logically decoded. */
+	if (IsCatalogRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+	/*
+	 * reorderbuffer.c does not seem to handle processing of TOAST relation
+	 * alone.
+	 */
+	if (IsToastRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+	relpersistence = rel->rd_rel->relpersistence;
+	if (relpersistence != RELPERSISTENCE_PERMANENT)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+	/* With NOTHING, WAL does not contain the old tuple. */
+	replident = rel->rd_rel->relreplident;
+	if (replident == REPLICA_IDENTITY_NOTHING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("Relation \"%s\" has insufficient replication identity.",
+						 RelationGetRelationName(rel))));
+
+	/*
+	 * Identity index is not set if the replica identity is FULL, but PK might
+	 * exist in such a case.
+	 */
+	ident_idx = RelationGetReplicaIndex(rel);
+	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+		ident_idx = rel->rd_pkindex;
+	if (!OidIsValid(ident_idx))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot process relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 (errhint("Relation \"%s\" has no identity index.",
+						  RelationGetRelationName(rel)))));
+}
+
+
 /*
  * rebuild_relation: rebuild an existing relation in index or physical order
  *
- * OldHeap: table to rebuild.
+ * OldHeap: table to rebuild.  See cluster_rel() for comments on the required
+ * lock strength.
+ *
  * index: index to cluster by, or NULL to rewrite in physical order.
  *
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock.  (The
+ * function handles the lock upgrade if 'concurrent' is true.)
  */
 static void
 rebuild_relation(RepackCommand cmd, bool usingindex,
-				 Relation OldHeap, Relation index, bool verbose)
+				 Relation OldHeap, Relation index, Oid userid,
+				 bool verbose, bool concurrent)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -661,13 +932,55 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	Oid			OIDNewHeap;
 	Relation	NewHeap;
 	char		relpersistence;
-	bool		is_system_catalog;
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
+	NameData	slotname;
+	LogicalDecodingContext *ctx = NULL;
+	Snapshot	snapshot = NULL;
+#if USE_ASSERT_CHECKING
+	LOCKMODE	lmode;
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
-		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+	Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+	Assert(!usingindex || CheckRelationLockedByMe(index, lmode, false));
+#endif
+
+	if (concurrent)
+	{
+		TupleDesc	tupdesc;
+
+		/*
+		 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+		 * should never fire.
+		 */
+		Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+
+		/*
+		 * A single backend should not execute multiple REPACK commands at a
+		 * time, so use PID to make the slot unique.
+		 */
+		snprintf(NameStr(slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+		tupdesc = CreateTupleDescCopy(RelationGetDescr(OldHeap));
+
+		/*
+		 * Prepare to capture the concurrent data changes.
+		 *
+		 * Note that this call waits for all transactions with an XID already
+		 * assigned to finish. If any of those transactions is waiting for a
+		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
+		 * it runs CREATE INDEX), we can end up in a deadlock. It's not clear
+		 * whether avoiding this risk is worth unlocking/relocking the table
+		 * (and its clustering index) and checking again whether it's still
+		 * eligible for REPACK CONCURRENTLY.
+		 */
+		ctx = setup_logical_decoding(tableOid, NameStr(slotname), tupdesc);
+
+		snapshot = SnapBuildInitialSnapshotForRepack(ctx->snapshot_builder);
+		PushActiveSnapshot(snapshot);
+	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
 	if (usingindex)
@@ -675,7 +988,6 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
 
 	/*
 	 * Create the transient table that will receive the re-ordered data.
@@ -691,30 +1003,67 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, ctx, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
+	/* The historic snapshot won't be needed anymore. */
+	if (snapshot)
+		PopActiveSnapshot();
 
-	/* Close relcache entries, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
-	if (index)
-		index_close(index, NoLock);
+	if (concurrent)
+	{
+		/*
+		 * Push a snapshot that we will use to find old versions of rows when
+		 * processing concurrent UPDATE and DELETE commands. (That snapshot
+		 * should also be used by index expressions.)
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
 
-	/*
-	 * Close the new relation so it can be dropped as soon as the storage is
-	 * swapped. The relation is not visible to others, so no need to unlock it
-	 * explicitly.
-	 */
-	table_close(NewHeap, NoLock);
+		/*
+		 * Make sure we can find the tuples just inserted when applying DML
+		 * commands on top of those.
+		 */
+		CommandCounterIncrement();
+		UpdateActiveSnapshotCommandId();
 
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+										   ctx, swap_toast_by_content,
+										   frozenXid, cutoffMulti);
+		PopActiveSnapshot();
+
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
+
+		/* Done with decoding. */
+		cleanup_logical_decoding(ctx);
+		ReplicationSlotRelease();
+		ReplicationSlotDrop(NameStr(slotname), false);
+	}
+	else
+	{
+		bool		is_system_catalog = IsSystemRelation(OldHeap);
+
+		/* Close relcache entries, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
+		if (index)
+			index_close(index, NoLock);
+
+		/*
+		 * Close the new relation so it can be dropped as soon as the storage
+		 * is swapped. The relation is not visible to others, so no need to
+		 * unlock it explicitly.
+		 */
+		table_close(NewHeap, NoLock);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+	}
 }
 
 
@@ -849,15 +1198,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Both
+ * are passed iff concurrent processing is required.
+ *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
-				bool *pSwapToastByContent, TransactionId *pFreezeXid,
-				MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -875,6 +1228,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	PGRUsage	ru0;
 	char	   *nspname;
 
+	bool		concurrent = snapshot != NULL;
+
 	pg_rusage_init(&ru0);
 
 	/* Store a copy of the namespace name for logging purposes */
@@ -977,8 +1332,48 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * provided, else plain seqscan.
 	 */
 	if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
+	{
+		ResourceOwner oldowner = NULL;
+		ResourceOwner resowner = NULL;
+
+		/*
+		 * In the CONCURRENT case, use a dedicated resource owner so we don't
+		 * leave any additional locks behind us that we cannot release easily.
+		 */
+		if (concurrent)
+		{
+			Assert(CheckRelationLockedByMe(OldHeap, ShareUpdateExclusiveLock,
+										   false));
+			Assert(CheckRelationLockedByMe(OldIndex, ShareUpdateExclusiveLock,
+										   false));
+
+			resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "plan_cluster_use_sort");
+			oldowner = CurrentResourceOwner;
+			CurrentResourceOwner = resowner;
+		}
+
 		use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
 										 RelationGetRelid(OldIndex));
+
+		if (concurrent)
+		{
+			CurrentResourceOwner = oldowner;
+
+			/*
+			 * We are primarily concerned about locks, but if the planner
+			 * happened to allocate any other resources, we should release
+			 * them too because we're going to delete the whole resowner.
+			 */
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_AFTER_LOCKS,
+								 false, false);
+			ResourceOwnerDelete(resowner);
+		}
+	}
 	else
 		use_sort = false;
 
@@ -1007,7 +1402,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+									cutoffs.OldestXmin, snapshot,
+									decoding_ctx,
+									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
 									&tups_recently_dead);
@@ -1016,7 +1413,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	*pFreezeXid = cutoffs.FreezeLimit;
 	*pCutoffMulti = cutoffs.MultiXactCutoff;
 
-	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+	/*
+	 * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+	 * In the CONCURRENTLY case, we need to set it again before applying the
+	 * concurrent changes.
+	 */
 	NewHeap->rd_toastoid = InvalidOid;
 
 	num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1474,14 +1875,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool swap_toast_by_content,
 				 bool check_constraints,
 				 bool is_internal,
+				 bool reindex,
 				 TransactionId frozenXid,
 				 MultiXactId cutoffMulti,
 				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
-	int			reindex_flags;
-	ReindexParams reindex_params = {0};
 	int			i;
 
 	/* Report that we are now swapping relation files */
@@ -1507,39 +1907,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (is_system_catalog)
 		CacheInvalidateCatalog(OIDOldHeap);
 
-	/*
-	 * Rebuild each index on the relation (but not the toast table, which is
-	 * all-new at this point).  It is important to do this before the DROP
-	 * step because if we are processing a system catalog that will be used
-	 * during DROP, we want to have its indexes available.  There is no
-	 * advantage to the other order anyway because this is all transactional,
-	 * so no chance to reclaim disk space before commit.  We do not need a
-	 * final CommandCounterIncrement() because reindex_relation does it.
-	 *
-	 * Note: because index_build is called via reindex_relation, it will never
-	 * set indcheckxmin true for the indexes.  This is OK even though in some
-	 * sense we are building new indexes rather than rebuilding existing ones,
-	 * because the new heap won't contain any HOT chains at all, let alone
-	 * broken ones, so it can't be necessary to set indcheckxmin.
-	 */
-	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
-	if (check_constraints)
-		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+	if (reindex)
+	{
+		int			reindex_flags;
+		ReindexParams reindex_params = {0};
 
-	/*
-	 * Ensure that the indexes have the same persistence as the parent
-	 * relation.
-	 */
-	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
-	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+		/*
+		 * Rebuild each index on the relation (but not the toast table, which
+		 * is all-new at this point).  It is important to do this before the
+		 * DROP step because if we are processing a system catalog that will
+		 * be used during DROP, we want to have its indexes available.  There
+		 * is no advantage to the other order anyway because this is all
+		 * transactional, so no chance to reclaim disk space before commit. We
+		 * do not need a final CommandCounterIncrement() because
+		 * reindex_relation does it.
+		 *
+		 * Note: because index_build is called via reindex_relation, it will
+		 * never set indcheckxmin true for the indexes.  This is OK even
+		 * though in some sense we are building new indexes rather than
+		 * rebuilding existing ones, because the new heap won't contain any
+		 * HOT chains at all, let alone broken ones, so it can't be necessary
+		 * to set indcheckxmin.
+		 */
+		reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+		if (check_constraints)
+			reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
 
-	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
-								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+		/*
+		 * Ensure that the indexes have the same persistence as the parent
+		 * relation.
+		 */
+		if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+		else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
-	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+		/* Report that we are now reindexing relations */
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+		reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+	}
 
 	/* Report that we are now doing clean up */
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1881,7 +2289,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * resolve in this case.
  */
 static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+						ClusterParams *params)
 {
 	Relation	rel;
 	Oid			tableOid;
@@ -1890,13 +2299,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
-	/*
-	 * Find, lock, and check permissions on the table.  We obtain
-	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-	 * single-transaction case.
-	 */
+	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation,
-										AccessExclusiveLock,
+										lockmode,
 										0,
 										RangeVarCallbackMaintainsTable,
 										NULL);
@@ -1922,26 +2327,17 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 		return rel;
 	else
 	{
-		Oid			indexOid;
+		Oid			indexOid = InvalidOid;
 
-		indexOid = determine_clustered_index(rel, stmt->usingindex,
-											 stmt->indexname);
-		if (OidIsValid(indexOid))
-			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
-		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
-
-		/* Do an analyze, if requested */
-		if (params->options & CLUOPT_ANALYZE)
+		if (stmt->usingindex)
 		{
-			VacuumParams vac_params = {0};
-
-			vac_params.options |= VACOPT_ANALYZE;
-			if (params->options & CLUOPT_VERBOSE)
-				vac_params.options |= VACOPT_VERBOSE;
-			analyze_rel(RelationGetRelid(rel), NULL, vac_params, NIL, true,
-						NULL);
+			indexOid = determine_clustered_index(rel, stmt->usingindex,
+												 stmt->indexname);
+			check_index_is_clusterable(rel, indexOid, lockmode);
 		}
 
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid,
+					params, isTopLevel);
 		return NULL;
 	}
 }
@@ -1998,3 +2394,1052 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
 
 	return indexOid;
 }
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts, to set up logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that the TOAST table needs no attention here, as it's not scanned
+ * using a historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+	Oid			toastrelid;
+
+	/* Avoid logical decoding of other relations by this backend. */
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
+
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+}
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+	/*
+	 * Restore normal function of (future) logical decoding for this backend.
+	 */
+	repacked_rel_locator.relNumber = InvalidOid;
+	repacked_rel_toast_locator.relNumber = InvalidOid;
+}
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends make while we copy the
+ * existing data into the temporary table), nor persisted (it's easier to
+ * handle a crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid, const char *slotname, TupleDesc tupdesc)
+{
+	LogicalDecodingContext *ctx;
+	RepackDecodingState *dstate;
+
+	/*
+	 * Check if we can use logical decoding.
+	 */
+	CheckSlotPermissions();
+	CheckLogicalDecodingRequirements();
+
+	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+	ReplicationSlotCreate(slotname, true, RS_TEMPORARY, false, false, false);
+
+	/*
+	 * Neither the prepare_write nor the do_write callback, nor
+	 * update_progress, is useful for us.
+	 *
+	 * Regarding the value of need_full_snapshot, we pass false because the
+	 * table we are processing is present in RepackedRelsHash and is
+	 * therefore, as far as logical decoding is concerned, treated like a
+	 * catalog.
+	 */
+	ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+									NIL,
+									false,
+									InvalidXLogRecPtr,
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+	/*
+	 * We don't control the setting of fast_forward, so at least check it.
+	 */
+	Assert(!ctx->fast_forward);
+
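+	/*
+	 * Find the start point of decoding: this reads WAL until the snapshot
+	 * builder reaches a consistent state, which involves waiting for
+	 * transactions that already have an XID assigned to finish.
+	 */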
+	DecodingContextFindStartpoint(ctx);
+
+	/* Some WAL records should have been read. */
+	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
+	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+				wal_segment_size);
+
+	/*
+	 * Setup structures to store decoded changes.
+	 */
+	dstate = palloc0(sizeof(RepackDecodingState));
+	dstate->relid = relid;
+	dstate->tstore = tuplestore_begin_heap(false, false,
+										   maintenance_work_mem);
+
+	dstate->tupdesc = tupdesc;
+
+	/* Initialize the descriptor to store the changes ... */
+	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+	/* ... as well as the corresponding slot. */
+	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+											  &TTSOpsMinimalTuple);
+
+	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "logical decoding");
+
+	ctx->output_writer_private = dstate;
+	return ctx;
+}
+
+/*
+ * Retrieve the tuple from a ConcurrentChange structure.
+ *
+ * The input data starts with the structure, but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+	HeapTupleData tup_data;
+	HeapTuple	result;
+	char	   *src;
+
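+	/*
+	 * Layout of the serialized change: a ConcurrentChange struct (whose
+	 * tup_data field is the HeapTupleData header), followed at offset
+	 * SizeOfConcurrentChange by the tuple data of tup_data.t_len bytes.
+	 */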
+	/*
+	 * Ensure alignment before accessing the fields. (This is why we can't use
+	 * heap_copytuple() instead of this function.)
+	 */
+	src = change + offsetof(ConcurrentChange, tup_data);
+	memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+	memcpy(result, &tup_data, sizeof(HeapTupleData));
+	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+	src = change + SizeOfConcurrentChange;
+	memcpy(result->t_data, src, result->t_len);
+
+	return result;
+}
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+								 XLogRecPtr end_of_wal)
+{
+	RepackDecodingState *dstate;
+	ResourceOwner resowner_old;
+
+	/*
+	 * Invalidate the "present" cache before moving to "(recent) history".
+	 */
+	InvalidateSystemCaches();
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
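+
+	/* Perform the decoding under the resource owner created during setup. */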
+	resowner_old = CurrentResourceOwner;
+	CurrentResourceOwner = dstate->resowner;
+
+	PG_TRY();
+	{
+		while (ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			XLogSegNo	segno_new;
+			char	   *errm = NULL;
+			XLogRecPtr	end_lsn;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			/*
+			 * If a WAL segment boundary has been crossed, inform the decoding
+			 * system that the catalog_xmin can advance. (We could confirm
+			 * more often, but filling a single WAL segment should not take
+			 * much time.)
+			 */
+			end_lsn = ctx->reader->EndRecPtr;
+			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+			if (segno_new != repack_current_segment)
+			{
+				LogicalConfirmReceivedLocation(end_lsn);
+				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
+				repack_current_segment = segno_new;
+			}
+
+			CHECK_FOR_INTERRUPTS();
+		}
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
+/*
+ * Apply changes that happened during the initial load.
+ *
+ * The scan key is passed by the caller so that it does not have to be
+ * constructed multiple times. Key entries have all fields initialized,
+ * except for sk_argument.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
+						 ScanKey key, int nkeys, IndexInsertState *iistate)
+{
+	TupleTableSlot *index_slot,
+			   *ident_slot;
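+	/* Old version of an updated tuple, kept until its new version arrives. */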
+	HeapTuple	tup_old = NULL;
+
+	if (dstate->nchanges == 0)
+		return;
+
+	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+	/* A slot to fetch tuples from identity index. */
+	ident_slot = table_slot_create(rel, NULL);
+
+	while (tuplestore_gettupleslot(dstate->tstore, true, false,
+								   dstate->tsslot))
+	{
+		bool		shouldFree;
+		HeapTuple	tup_change,
+					tup,
+					tup_exist;
+		char	   *change_raw,
+				   *src;
+		ConcurrentChange change;
+		bool		isnull[1];
+		Datum		values[1];
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get the change from the single-column tuple. */
+		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+		Assert(!isnull[0]);
+
+		/* Make sure we access aligned data. */
+		change_raw = (char *) DatumGetByteaP(values[0]);
+		src = (char *) VARDATA(change_raw);
+		memcpy(&change, src, SizeOfConcurrentChange);
+
+		/* A TRUNCATE change contains no tuple, so process it separately. */
+		if (change.kind == CHANGE_TRUNCATE)
+		{
+			/*
+			 * All the things that ExecuteTruncateGuts() does (such as firing
+			 * triggers or handling the DROP_CASCADE behavior) should have
+			 * taken place on the source relation. Thus we only do the actual
+			 * truncation of the new relation (and its indexes).
+			 */
+			heap_truncate_one_rel(rel);
+
+			pfree(tup_change);
+			continue;
+		}
+
+		/*
+		 * Extract the tuple from the change. The tuple is copied here because
+		 * it might be assigned to 'tup_old', in which case it needs to
+		 * survive into the next iteration.
+		 */
+		tup = get_changed_tuple(src);
+
+		if (change.kind == CHANGE_UPDATE_OLD)
+		{
+			Assert(tup_old == NULL);
+			tup_old = tup;
+		}
+		else if (change.kind == CHANGE_INSERT)
+		{
+			Assert(tup_old == NULL);
+
+			apply_concurrent_insert(rel, &change, tup, iistate, index_slot);
+
+			pfree(tup);
+		}
+		else if (change.kind == CHANGE_UPDATE_NEW ||
+				 change.kind == CHANGE_DELETE)
+		{
+			IndexScanDesc ind_scan = NULL;
+			HeapTuple	tup_key;
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+			{
+				tup_key = tup_old != NULL ? tup_old : tup;
+			}
+			else
+			{
+				Assert(tup_old == NULL);
+				tup_key = tup;
+			}
+
+			/*
+			 * Find the tuple to be updated or deleted.
+			 */
+			tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+										  iistate, ident_slot, &ind_scan);
+			if (tup_exist == NULL)
+				elog(ERROR, "failed to find target tuple");
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+				apply_concurrent_update(rel, tup, tup_exist, &change, iistate,
+										index_slot);
+			else
+				apply_concurrent_delete(rel, tup_exist, &change);
+
+			if (tup_old != NULL)
+			{
+				pfree(tup_old);
+				tup_old = NULL;
+			}
+
+			pfree(tup);
+			index_endscan(ind_scan);
+		}
+		else
+			elog(ERROR, "unrecognized kind of change: %d", change.kind);
+
+		/*
+		 * If a change was applied now, increment CID for next writes and
+		 * update the snapshot so it sees the changes we've applied so far.
+		 */
+		if (change.kind != CHANGE_UPDATE_OLD)
+		{
+			CommandCounterIncrement();
+			UpdateActiveSnapshotCommandId();
+		}
+
+		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+		Assert(shouldFree);
+		pfree(tup_change);
+	}
+
+	tuplestore_clear(dstate->tstore);
+	dstate->nchanges = 0;
+
+	/* Cleanup. */
+	ExecDropSingleTupleTableSlot(index_slot);
+	ExecDropSingleTupleTableSlot(ident_slot);
+}
+
+static void
+apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
+						IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+	List	   *recheck;
+
+	/*
+	 * Like simple_heap_insert(), but make sure that the INSERT is not
+	 * logically decoded - see reform_and_rewrite_tuple() for more
+	 * information.
+	 */
+	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+				NULL);
+
+	/*
+	 * Update indexes. An active snapshot must be set, in case functions used
+	 * by the indexes need one.
+	 */
+	ExecStoreHeapTuple(tup, index_slot, false);
+	recheck = ExecInsertIndexTuples(iistate->rri,
+									index_slot,
+									iistate->estate,
+									false,	/* update */
+									false,	/* noDupErr */
+									NULL,	/* specConflict */
+									NIL,	/* arbiterIndexes */
+									false	/* onlySummarizing */
+		);
+
+	/*
+	 * If recheck is required, it must have been performed on the source
+	 * relation by now. (All the logical changes we process here are already
+	 * committed.)
+	 */
+	list_free(recheck);
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						ConcurrentChange *change, IndexInsertState *iistate,
+						TupleTableSlot *index_slot)
+{
+	LockTupleMode lockmode;
+	TM_FailureData tmfd;
+	TU_UpdateIndexes update_indexes;
+	TM_Result	res;
+	List	   *recheck;
+
+	/*
+	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
+	 * here.)
+	 *
+	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_update(rel, &tup_target->t_self, tup,
+					  GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - we are the only ones making changes */
+					  &tmfd, &lockmode, &update_indexes,
+					  false /* wal_logical */ );
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+	ExecStoreHeapTuple(tup, index_slot, false);
+
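+	/*
+	 * Insert index entries for the new tuple version. TU_None means that a
+	 * HOT update took place and no index needs updating; TU_Summarizing
+	 * means that only summarizing (e.g. BRIN) indexes need new entries.
+	 */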
+	if (update_indexes != TU_None)
+	{
+		recheck = ExecInsertIndexTuples(iistate->rri,
+										index_slot,
+										iistate->estate,
+										true,	/* update */
+										false,	/* noDupErr */
+										NULL,	/* specConflict */
+										NIL,	/* arbiterIndexes */
+		/* onlySummarizing */
+										update_indexes == TU_Summarizing);
+		list_free(recheck);
+	}
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+						ConcurrentChange *change)
+{
+	TM_Result	res;
+	TM_FailureData tmfd;
+
+	/*
+	 * Delete tuple from the new heap.
+	 *
+	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+					  InvalidSnapshot, false,
+					  &tmfd,
+					  false,	/* no wait - we are the only ones making changes */
+					  false /* wal_logical */ );
+
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'key' is a pre-initialized scan key, into which the function will put the
+ * key values.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * On exit, '*scan_p' contains the scan descriptor used. The caller must close
+ * it once the tuple returned is no longer needed.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
+				  IndexInsertState *iistate,
+				  TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
+{
+	IndexScanDesc scan;
+	Form_pg_index ident_form;
+	int2vector *ident_indkey;
+	HeapTuple	result = NULL;
+
+	/* XXX no instrumentation for now */
+	scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+						   NULL, nkeys, 0);
+	*scan_p = scan;
+	index_rescan(scan, key, nkeys, NULL, 0);
+
+	/* Info needed to retrieve key values from heap tuple. */
+	ident_form = iistate->ident_index->rd_index;
+	ident_indkey = &ident_form->indkey;
+
+	/* Use the incoming tuple to finalize the scan key. */
+	for (int i = 0; i < scan->numberOfKeys; i++)
+	{
+		ScanKey		entry;
+		bool		isnull;
+		int16		attno_heap;
+
+		entry = &scan->keyData[i];
+		attno_heap = ident_indkey->values[i];
+		entry->sk_argument = heap_getattr(tup_key,
+										  attno_heap,
+										  rel->rd_att,
+										  &isnull);
+		Assert(!isnull);
+	}
+	if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+	{
+		bool		shouldFree;
+
+		result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+		/* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+		Assert(!shouldFree);
+	}
+
+	return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ *
+ * Pass rel_src iff its reltoastrelid is needed.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *ctx, XLogRecPtr end_of_wal,
+						   Relation rel_dst, Relation rel_src, ScanKey ident_key,
+						   int ident_key_nentries, IndexInsertState *iistate)
+{
+	RepackDecodingState *dstate;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_CATCH_UP);
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	repack_decode_concurrent_changes(ctx, end_of_wal);
+
+	if (dstate->nchanges == 0)
+		return;
+
+	PG_TRY();
+	{
+		/*
+		 * Make sure that TOAST values can eventually be accessed via the old
+		 * relation - see comment in copy_table_data().
+		 */
+		if (rel_src)
+			rel_dst->rd_toastoid = rel_src->rd_rel->reltoastrelid;
+
+		apply_concurrent_changes(dstate, rel_dst, ident_key,
+								 ident_key_nentries, iistate);
+	}
+	PG_FINALLY();
+	{
+		if (rel_src)
+			rel_dst->rd_toastoid = InvalidOid;
+	}
+	PG_END_TRY();
+}
+
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id)
+{
+	EState	   *estate;
+	int			i;
+	IndexInsertState *result;
+
+	result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+	estate = CreateExecutorState();
+
+	result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+	InitResultRelInfo(result->rri, relation, 0, 0, 0);
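+	/* 'speculative' is false: no ON CONFLICT handling is needed here. */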
+	ExecOpenIndices(result->rri, false);
+
+	/*
+	 * Find the relcache entry of the identity index so that we spend no extra
+	 * effort to open / close it.
+	 */
+	for (i = 0; i < result->rri->ri_NumIndices; i++)
+	{
+		Relation	ind_rel;
+
+		ind_rel = result->rri->ri_IndexRelationDescs[i];
+		if (ind_rel->rd_id == ident_index_id)
+			result->ident_index = ind_rel;
+	}
+	if (result->ident_index == NULL)
+		elog(ERROR, "failed to open identity index");
+
+	/* Only initialize fields needed by ExecInsertIndexTuples(). */
+	result->estate = estate;
+
+	return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+	Relation	ident_idx_rel;
+	Form_pg_index ident_idx;
+	int			n,
+				i;
+	ScanKey		result;
+
+	Assert(OidIsValid(ident_idx_oid));
+	ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+	ident_idx = ident_idx_rel->rd_index;
+	n = ident_idx->indnatts;
+	result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+	for (i = 0; i < n; i++)
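+		/*
+		 * Build a snapshot corresponding to the start point of decoding: the
+		 * initial data copy sees exactly the rows visible to this snapshot,
+		 * and everything committed afterwards is captured by the logical
+		 * decoding.
+		 */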
+	{
+		ScanKey		entry;
+		int16		relattno;
+		Form_pg_attribute att;
+		Oid			opfamily,
+					opcintype,
+					opno,
+					opcode;
+
+		entry = &result[i];
+		relattno = ident_idx->indkey.values[i];
+		if (relattno >= 1)
+		{
+			TupleDesc	desc;
+
+			desc = rel_src->rd_att;
+			att = TupleDescAttr(desc, relattno - 1);
+		}
+		else
+			elog(ERROR, "unexpected attribute number %d in index", relattno);
+
+		opfamily = ident_idx_rel->rd_opfamily[i];
+		opcintype = ident_idx_rel->rd_opcintype[i];
+		opno = get_opfamily_member(opfamily, opcintype, opcintype,
+								   BTEqualStrategyNumber);
+
+		if (!OidIsValid(opno))
+			elog(ERROR, "failed to find = operator for type %u", opcintype);
+
+		opcode = get_opcode(opno);
+		if (!OidIsValid(opcode))
+			elog(ERROR, "failed to find function for operator %u", opno);
+
+		/* Initialize everything but argument. */
+		ScanKeyInit(entry,
+					i + 1,
+					BTEqualStrategyNumber, opcode,
+					(Datum) NULL);
+		entry->sk_collation = att->attcollation;
+	}
+	index_close(ident_idx_rel, AccessShareLock);
+
+	*nentries = n;
+	return result;
+}
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+	ExecCloseIndices(iistate->rri);
+	FreeExecutorState(iistate->estate);
+	pfree(iistate->rri);
+	pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	ExecDropSingleTupleTableSlot(dstate->tsslot);
+	FreeTupleDesc(dstate->tupdesc_change);
+	FreeTupleDesc(dstate->tupdesc);
+	tuplestore_end(dstate->tstore);
+
+	FreeDecodingContext(ctx);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+								   Relation cl_index,
+								   LogicalDecodingContext *ctx,
+								   bool swap_toast_by_content,
+								   TransactionId frozenXid,
+								   MultiXactId cutoffMulti)
+{
+	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+	List	   *ind_oids_new;
+	Oid			old_table_oid = RelationGetRelid(OldHeap);
+	Oid			new_table_oid = RelationGetRelid(NewHeap);
+	List	   *ind_oids_old = RelationGetIndexList(OldHeap);
+	ListCell   *lc,
+			   *lc2;
+	char		relpersistence;
+	bool		is_system_catalog;
+	Oid			ident_idx_old,
+				ident_idx_new;
+	IndexInsertState *iistate;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+	XLogRecPtr	wal_insert_ptr,
+				end_of_wal;
+	char		dummy_rec_data = '\0';
+	Relation   *ind_refs,
+			   *ind_refs_p;
+	int			nind;
+
+	/* Like in cluster_rel(). */
+	lockmode_old = ShareUpdateExclusiveLock;
+	Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+	Assert(cl_index == NULL ||
+		   CheckRelationLockedByMe(cl_index, lockmode_old, false));
+	/* This is expected from the caller. */
+	Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+	ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+	/*
+	 * Unlike the exclusive case, we build new indexes for the new relation
+	 * rather than swapping the storage and reindexing the old relation. The
+	 * point is that the index build can take some time, so we do it before
+	 * acquiring AccessExclusiveLock on the old heap, and therefore we cannot
+	 * swap the heap storage yet.
+	 *
+	 * index_create() will lock the new indexes using AccessExclusiveLock - no
+	 * need to change that.
+	 *
+	 * We assume that ShareUpdateExclusiveLock on the table prevents anyone
+	 * from dropping the existing indexes or adding new ones, so the lists of
+	 * old and new indexes should match at swap time. On the other hand, we do
+	 * not block ALTER INDEX commands that do not require a table lock (e.g.
+	 * ALTER INDEX ... SET ...).
+	 *
+	 * XXX Should we check at the end of our work whether another transaction
+	 * executed such a command, and issue a NOTICE that we might have
+	 * discarded its effects? (For example, if someone changes a storage
+	 * parameter after we have created the new index, the new value of that
+	 * parameter is lost.) Alternatively, we could lock all the indexes now in
+	 * a mode that blocks all ALTER INDEX commands (ShareUpdateExclusiveLock?)
+	 * and keep them locked till the end of the transaction. That might
+	 * increase the risk of deadlock during the lock upgrade below; however,
+	 * SELECT / DML queries should not be involved in such a deadlock.
+	 */
+	ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+	/*
+	 * Processing shouldn't have started without a valid identity index.
+	 */
+	Assert(OidIsValid(ident_idx_old));
+
+	/* Find "identity index" on the new relation. */
+	ident_idx_new = InvalidOid;
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+
+		if (ident_idx_old == ind_old)
+		{
+			ident_idx_new = ind_new;
+			break;
+		}
+	}
+	/* Should not happen, given our lock on the old relation. */
+	if (!OidIsValid(ident_idx_new))
+		ereport(ERROR,
+				(errmsg("identity index missing on the new relation")));
+
+	/* Executor state to update indexes. */
+	iistate = get_index_insert_state(NewHeap, ident_idx_new);
+
+	/*
+	 * Build scan key that we'll use to look for rows to be updated / deleted
+	 * during logical decoding.
+	 */
+	ident_key = build_identity_key(ident_idx_new, OldHeap, &ident_key_nentries);
+
+	/*
+	 * During testing, wait for another backend to perform concurrent data
+	 * changes which we will process below.
+	 */
+	INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+	/*
+	 * Flush all WAL records inserted so far (possibly except for the last
+	 * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+	 * we need to flush while holding exclusive lock on the source table.
+	 */
+	wal_insert_ptr = GetInsertRecPtr();
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply the concurrent changes a first time, to minimize the time we need
+	 * to hold AccessExclusiveLock. (A significant amount of WAL may have been
+	 * written during the data copying and index creation.)
+	 */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/*
+	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+	 * is one), and all its indexes, so that we can swap the files.
+	 *
+	 * Before that, unlock the index temporarily to avoid deadlock in case
+	 * another transaction is trying to lock it while holding the lock on the
+	 * table.
+	 */
+	if (cl_index)
+	{
+		index_close(cl_index, ShareUpdateExclusiveLock);
+		cl_index = NULL;
+	}
+	/* Now lock the table itself. */
+	LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+	/*
+	 * Lock all indexes now, not only the clustering one: all indexes need to
+	 * have their files swapped. While doing that, store their relation
+	 * references in an array, to handle predicate locks below.
+	 */
+	ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+	nind = 0;
+	foreach(lc, ind_oids_old)
+	{
+		Oid			ind_oid;
+		Relation	index;
+
+		ind_oid = lfirst_oid(lc);
+		index = index_open(ind_oid, AccessExclusiveLock);
+
+		/*
+		 * TODO 1) Do we need to check if ALTER INDEX was executed since the
+		 * new index was created in build_new_indexes()? 2) Specifically for
+		 * the clustering index, should check_index_is_clusterable() be called
+		 * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+		 * table probably blocks all commands that affect the result of
+		 * check_index_is_clusterable().)
+		 */
+		*ind_refs_p = index;
+		ind_refs_p++;
+		nind++;
+	}
+
+	/*
+	 * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+	 * lock is needed to swap the files.
+	 */
+	if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+	/*
+	 * Tuples and pages of the old heap will be gone, but the heap will stay.
+	 */
+	TransferPredicateLocksToHeapRelation(OldHeap);
+	/* The same for indexes. */
+	for (int i = 0; i < nind; i++)
+	{
+		Relation	index = ind_refs[i];
+
+		TransferPredicateLocksToHeapRelation(index);
+
+		/*
+		 * References to indexes on the old relation are not needed anymore,
+		 * but the locks stay until the end of the transaction.
+		 */
+		index_close(index, NoLock);
+	}
+	pfree(ind_refs);
+
+	/*
+	 * Flush anything we see in WAL, to make sure that all changes committed
+	 * while we were waiting for the exclusive lock are available for
+	 * decoding. This should not be necessary if all backends had
+	 * synchronous_commit set, but we can't rely on this setting.
+	 *
+	 * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+	 * position, and GetLastImportantRecPtr() points at the start of the last
+	 * record rather than at the end. Thus the simplest way to determine the
+	 * insert position is to insert a dummy record and use its LSN.
+	 *
+	 * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+	 * last record (plus the total size of all the page headers the record
+	 * spans)?
+	 */
+	XLogBeginInsert();
+	XLogRegisterData(&dummy_rec_data, 1);
+	wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/* Apply the concurrent changes again. */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/* Remember info about rel before closing OldHeap */
+	relpersistence = OldHeap->rd_rel->relpersistence;
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+	/*
+	 * Even ShareUpdateExclusiveLock should have prevented others from
+	 * creating / dropping indexes (even using the CONCURRENTLY option), so we
+	 * do not need to check whether the lists match.
+	 */
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+		Oid			mapped_tables[4];
+
+		/* Zero out possible results from swap_relation_files */
+		memset(mapped_tables, 0, sizeof(mapped_tables));
+
+		swap_relation_files(ind_old, ind_new,
+							(old_table_oid == RelationRelationId),
+							swap_toast_by_content,
+							true,
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Concurrent processing is not supported for system relations, so
+		 * there should be no mapped tables.
+		 */
+		for (int i = 0; i < 4; i++)
+			Assert(mapped_tables[i] == 0);
+#endif
+	}
+
+	/* The new indexes must be visible for deletion. */
+	CommandCounterIncrement();
+
+	/* Close the old heap but keep lock until transaction commit. */
+	table_close(OldHeap, NoLock);
+	/* Close the new heap. (We didn't have to open its indexes). */
+	table_close(NewHeap, NoLock);
+
+	/* Clean up what we don't need anymore. (And close the identity index.) */
+	pfree(ident_key);
+	free_index_insert_state(iistate);
+
+	/*
+	 * Swap the relations and their TOAST relations and TOAST indexes. This
+	 * also drops the new relation and its indexes.
+	 *
+	 * (System catalogs are currently not supported.)
+	 */
+	Assert(!is_system_catalog);
+	finish_heap_swap(old_table_oid, new_table_oid,
+					 is_system_catalog,
+					 swap_toast_by_content,
+					 false, true, false,
+					 frozenXid, cutoffMulti,
+					 relpersistence);
+}
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. The order of items matches that of OldIndexes, so the two lists
+ * can be used to pair up the indexes whose storage is to be swapped.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+	ListCell   *lc;
+	List	   *result = NIL;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+	foreach(lc, OldIndexes)
+	{
+		Oid			ind_oid,
+					ind_oid_new;
+		char	   *newName;
+		Relation	ind;
+
+		ind_oid = lfirst_oid(lc);
+		ind = index_open(ind_oid, AccessShareLock);
+
+		newName = ChooseRelationName(get_rel_name(ind_oid),
+									 NULL,
+									 "repacknew",
+									 get_rel_namespace(ind->rd_index->indrelid),
+									 false);
+		ind_oid_new = index_create_copy(NewHeap, ind_oid,
+										ind->rd_rel->reltablespace, newName,
+										false);
+		result = lappend_oid(result, ind_oid_new);
+
+		index_close(ind, AccessShareLock);
+	}
+
+	return result;
+}
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 188e26f0e6e..71b73c21ebf 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -904,7 +904,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
-	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
 					 RecentXmin, ReadNextMultiXactId(), relpersistence);
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 082a3575d62..c79f5b1dc0f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5989,6 +5989,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 			finish_heap_swap(tab->relid, OIDNewHeap,
 							 false, false, true,
 							 !OidIsValid(tab->newTableSpace),
+							 true,
 							 RecentXmin,
 							 ReadNextMultiXactId(),
 							 persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 8863ad0e8bd..6de9d0ba39d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -125,7 +125,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 							  TransactionId lastSaneFrozenXid,
 							  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-					   BufferAccessStrategy bstrategy);
+					   BufferAccessStrategy bstrategy, bool isTopLevel);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -633,7 +633,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 
 			if (params.options & VACOPT_VACUUM)
 			{
-				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+								isTopLevel))
 					continue;
 			}
 
@@ -1997,7 +1998,7 @@ vac_truncate_clog(TransactionId frozenXID,
  */
 static bool
 vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-		   BufferAccessStrategy bstrategy)
+		   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	LOCKMODE	lmode;
 	Relation	rel;
@@ -2288,7 +2289,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
-						&cluster_params);
+						&cluster_params, isTopLevel);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
@@ -2331,7 +2332,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
-		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+				   isTopLevel);
 	}
 
 	/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index b831a541652..5c148131217 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
 subdir('snowball')
 subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..5dc4ae58ffe 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
 #include "catalog/pg_control.h"
+#include "commands/cluster.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/message.h"
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(buf->record);
 	SnapBuild  *builder = ctx->snapshot_builder;
 
+	/*
+	 * If the change is not intended for logical decoding, do not even
+	 * establish a transaction for it - REPACK CONCURRENTLY is the typical use
+	 * case.
+	 *
+	 * First, check if REPACK CONCURRENTLY is being performed by this backend.
+	 * If so, only decode data changes of the table that it is processing, and
+	 * the changes of its TOAST relation.
+	 *
+	 * (The TOAST locator should not be set unless the main one is.)
+	 */
+	Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+		   OidIsValid(repacked_rel_locator.relNumber));
+
+	if (OidIsValid(repacked_rel_locator.relNumber))
+	{
+		XLogReaderState *r = buf->record;
+		RelFileLocator locator;
+
+		/* Not all records contain the block. */
+		if (XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL) &&
+			!RelFileLocatorEquals(locator, repacked_rel_locator) &&
+			(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+			 !RelFileLocatorEquals(locator, repacked_rel_toast_locator)))
+			return;
+	}
+
+	/*
+	 * Second, skip records which do not contain sufficient information for
+	 * decoding.
+	 *
+	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+	 * when applying changes to the new table. Those changes are of no use to
+	 * any other consumer (such as a logical replication subscription),
+	 * because the new table will eventually be dropped (after REPACK
+	 * CONCURRENTLY has assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This does happen when 1) raw_heap_insert() marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, or 2) REPACK CONCURRENTLY
+				 * replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}
+
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a2f1803622c..8e5116a9cab 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,6 +486,26 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	return SnapBuildMVCCFromHistoric(snap, true);
 }
 
+/*
+ * Build an MVCC snapshot for the initial data load performed by the REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot set up
+ * (so we do not set MyProc->xmin). XXX Do we need to add any further
+ * restrictions?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+	Snapshot	snap;
+
+	Assert(builder->state == SNAPBUILD_CONSISTENT);
+
+	snap = SnapBuildBuildSnapshot(builder);
+	return SnapBuildMVCCFromHistoric(snap, false);
+}
+
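As a hedged illustration of how the caller's side might fit together
(variable names here are made up, not the patch's actual code; the callback
signature is the one extended in tableam.h later in this patch):

	/*
	 * Hypothetical caller-side sketch: build the snapshot for the initial
	 * data load, pass it to the copy callback together with the decoding
	 * context, and free it afterwards.
	 */
	snapshot = SnapBuildInitialSnapshotForRepack(ctx->snapshot_builder);
	table_relation_copy_for_cluster(OldHeap, NewHeap, cl_index, use_sort,
									OldestXmin, snapshot, ctx,
									&xid_cutoff, &multi_cutoff,
									&num_tuples, &tups_vacuumed,
									&tups_recently_dead);
	FreeSnapshot(snapshot);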
 /*
  * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
  *
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+#    src/backend/replication/pgoutput_repack/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	$(WIN32RES) \
+	pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+	rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+  'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+  pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pgoutput_repack',
+    '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+  pgoutput_repack_sources,
+  kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..687fbbc59bb
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,288 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ *		Logical Replication output plugin for REPACK command
+ *
+ * Copyright (c) 2012-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+						   OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+							 ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+							  ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+						  Relation rel, ReorderBufferChange *change);
+static void plugin_truncate(struct LogicalDecodingContext *ctx,
+							ReorderBufferTXN *txn, int nrelations,
+							Relation relations[],
+							ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+						 ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+	cb->startup_cb = plugin_startup;
+	cb->begin_cb = plugin_begin_txn;
+	cb->change_cb = plugin_change;
+	cb->truncate_cb = plugin_truncate;
+	cb->commit_cb = plugin_commit_txn;
+	cb->shutdown_cb = plugin_shutdown;
+}
+
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+			   bool is_init)
+{
+	ctx->output_plugin_private = NULL;
+
+	/* Probably unnecessary, as we don't use the SQL interface ... */
+	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+	if (ctx->output_plugin_options != NIL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("this plugin does not expect any options")));
+	}
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * As we don't release the slot while processing a particular table, there's
+ * no room for an SQL interface, even for debugging purposes. Therefore we need
+ * neither OutputPluginPrepareWrite() nor OutputPluginWrite() in the plugin
+ * callbacks. (Although we might want to write custom callbacks, this API
+ * seems to be unnecessarily generic for our purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				  XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+			  Relation relation, ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Only interested in one particular relation. */
+	if (relation->rd_id != dstate->relid)
+		return;
+
+	/* Decode entry depending on its type */
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			{
+				HeapTuple	newtuple = change->data.tp.newtuple;
+
+				/*
+				 * Identity checks in the main function should have made this
+				 * impossible.
+				 */
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete insert info");
+
+				store_change(ctx, CHANGE_INSERT, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			{
+				HeapTuple	oldtuple = change->data.tp.oldtuple;
+				HeapTuple	newtuple = change->data.tp.newtuple;
+
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete update info");
+
+				if (oldtuple != NULL)
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			{
+				HeapTuple	oldtuple = change->data.tp.oldtuple;
+
+				if (oldtuple == NULL)
+					elog(ERROR, "incomplete delete info");
+
+				store_change(ctx, CHANGE_DELETE, oldtuple);
+			}
+			break;
+		default:
+			/* Should not come here */
+			Assert(false);
+			break;
+	}
+}
+
+static void
+plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				int nrelations, Relation relations[],
+				ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+	int			i;
+	Relation	relation = NULL;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Find the relation we are processing. */
+	for (i = 0; i < nrelations; i++)
+	{
+		relation = relations[i];
+
+		if (RelationGetRelid(relation) == dstate->relid)
+			break;
+	}
+
+	/* If the truncation affects only other relations, ignore it. */
+	if (i == nrelations)
+		return;
+
+	store_change(ctx, CHANGE_TRUNCATE, NULL);
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+			 HeapTuple tuple)
+{
+	RepackDecodingState *dstate;
+	char	   *change_raw;
+	ConcurrentChange change;
+	bool		flattened = false;
+	Size		size;
+	Datum		values[1];
+	bool		isnull[1];
+	char	   *dst,
+			   *dst_start;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	size = MAXALIGN(VARHDRSZ) + SizeOfConcurrentChange;
+
+	if (tuple)
+	{
+		/*
+		 * ReorderBufferCommit() stores the TOAST chunks in its private memory
+		 * context and frees them after having called apply_change().
+		 * Therefore we need a flat copy (including TOAST) that we eventually
+		 * copy into the memory context which is available to
+		 * decode_concurrent_changes().
+		 */
+		if (HeapTupleHasExternal(tuple))
+		{
+			/*
+			 * toast_flatten_tuple_to_datum() might be more convenient but we
+			 * don't want the decompression it does.
+			 */
+			tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+			flattened = true;
+		}
+
+		size += tuple->t_len;
+	}
+
+	/* XXX Isn't there any function / macro to do this? */
+	if (size >= 0x3FFFFFFF)
+		elog(ERROR, "change is too big");
+
+	/* Construct the change. */
+	change_raw = (char *) palloc0(size);
+	SET_VARSIZE(change_raw, size);
+
+	/*
+	 * Since the varlena alignment might not be sufficient for the structure,
+	 * set the fields in a local instance and remember where it should
+	 * eventually be copied.
+	 */
+	change.kind = kind;
+	dst_start = (char *) VARDATA(change_raw);
+
+	/* No other information is needed for TRUNCATE. */
+	if (change.kind == CHANGE_TRUNCATE)
+	{
+		memcpy(dst_start, &change, SizeOfConcurrentChange);
+		goto store;
+	}
+
+	/*
+	 * Copy the tuple.
+	 *
+	 * CAUTION: change->tup_data.t_data must be fixed on retrieval!
+	 */
+	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+	dst = dst_start + SizeOfConcurrentChange;
+	memcpy(dst, tuple->t_data, tuple->t_len);
+
+	/* The data has been copied. */
+	if (flattened)
+		pfree(tuple);
+
+store:
+	/* Copy the structure so it can be stored. */
+	memcpy(dst_start, &change, SizeOfConcurrentChange);
+
+	/* Store as tuple of 1 bytea column. */
+	values[0] = PointerGetDatum(change_raw);
+	isnull[0] = false;
+	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+						 values, isnull);
+
+	/* Accounting. */
+	dstate->nchanges++;
+
+	/* Cleanup. */
+	pfree(change_raw);
+}
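To make the serialized layout concrete, here is a hedged sketch of the
retrieval side that the CAUTION comment above alludes to; the function name
and its arguments are illustrative, only the layout follows store_change():

	/*
	 * Illustrative counterpart of store_change(), not part of the patch:
	 * copy the structure from the bytea into aligned local memory and fix
	 * the t_data pointer so that the tuple can be used.
	 */
	static void
	restore_change(char *raw, ConcurrentChange *change)
	{
		char	   *src = VARDATA(raw);

		memcpy(change, src, SizeOfConcurrentChange);

		/* The tuple body immediately follows the structure. */
		if (change->kind != CHANGE_TRUNCATE)
			change->tup_data.t_data =
				(HeapTupleHeader) (src + SizeOfConcurrentChange);
	}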
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..e9ddf39500c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
 #include "access/xlogprefetcher.h"
 #include "access/xlogrecovery.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index cd3e43c448a..519f3953638 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
 
 die
   "$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
-  . " missing from lwlocklist.h"
+  . "missing from lwlocklist.h"
   if $lwlock_count < scalar @wait_event_lwlocks;
 
 die
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6fe268a8eec..d27a4c30548 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -64,6 +64,7 @@
 #include "catalog/pg_type.h"
 #include "catalog/schemapg.h"
 #include "catalog/storage.h"
+#include "commands/cluster.h"
 #include "commands/policy.h"
 #include "commands/publicationcmds.h"
 #include "commands/trigger.h"
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index bc7840052fe..6d46537cbe8 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -657,7 +656,7 @@ CopySnapshot(Snapshot snapshot)
  * FreeSnapshot
  *		Free the memory associated with a snapshot.
  */
-static void
+void
 FreeSnapshot(Snapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 59ff6e0923b..528fb08154a 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -4998,18 +4998,27 @@ match_previous_words(int pattern_id,
 	}
 
 /* REPACK */
-	else if (Matches("REPACK"))
+	else if (Matches("REPACK") || Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"CONCURRENTLY");
+	else if (Matches("REPACK", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	else if (Matches("REPACK", "(*)"))
+	else if (Matches("REPACK", "(*)", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	/* If we have REPACK <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", MatchAnyExcept("(")))
+	/* If we have REPACK [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(|CONCURRENTLY")) ||
+			 Matches("REPACK", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", "(*)", MatchAny))
+	/* If we have REPACK (*) [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAnyExcept("CONCURRENTLY")) ||
+			 Matches("REPACK", "(*)", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK <sth> USING, then add the index as well */
-	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+
+	/*
+	 * Complete ... [ (*) ] [ CONCURRENTLY ] <sth> USING INDEX, with a list of
+	 * indexes for <sth>.
+	 */
+	else if (TailMatches(MatchAnyExcept("(|CONCURRENTLY"), "USING", "INDEX"))
 	{
 		set_completion_reference(prev3_wd);
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..b82dd17a966 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -323,14 +323,15 @@ extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
-							 struct TM_FailureData *tmfd, bool changingPart);
+							 struct TM_FailureData *tmfd, bool changingPart,
+							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, ItemPointer tid);
 extern void heap_abort_speculative(Relation relation, ItemPointer tid);
 extern TM_Result heap_update(Relation relation, ItemPointer otid,
 							 HeapTuple newtup,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes);
+							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
@@ -411,6 +412,10 @@ extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
 								 uint16 infomask, TransactionId xid);
+extern bool HeapTupleMVCCInserted(HeapTuple htup, Snapshot snapshot,
+								  Buffer buffer);
+extern bool HeapTupleMVCCNotDeleted(HeapTuple htup, Snapshot snapshot,
+									Buffer buffer);
 extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
 extern bool HeapTupleIsSurelyDead(HeapTuple htup,
 								  struct GlobalVisState *vistest);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..8d4af07f840 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
 #define XLH_DELETE_CONTAINS_OLD_KEY				(1<<2)
 #define XLH_DELETE_IS_SUPER						(1<<3)
 #define XLH_DELETE_IS_PARTITION_MOVE			(1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL					(1<<5)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLH_DELETE_CONTAINS_OLD						\
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..289b64edfd9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -623,6 +624,8 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
+											  Snapshot snapshot,
+											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1627,6 +1630,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ *	 (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1639,6 +1646,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
+								Snapshot snapshot,
+								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1647,6 +1656,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
+													snapshot, decoding_ctx,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 890998d84bb..4a508c57a50 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
 #ifndef CLUSTER_H
 #define CLUSTER_H
 
+#include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "replication/logical.h"
 #include "storage/lock.h"
+#include "storage/relfilelocator.h"
 #include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -25,6 +30,8 @@
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
 #define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
+#define CLUOPT_CONCURRENT 0x10	/* allow concurrent data changes */
+
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -33,14 +40,95 @@ typedef struct ClusterParams
 } ClusterParams;
 
 
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+extern RelFileLocator repacked_rel_locator;
+extern RelFileLocator repacked_rel_toast_locator;
+
+typedef enum
+{
+	CHANGE_INSERT,
+	CHANGE_UPDATE_OLD,
+	CHANGE_UPDATE_NEW,
+	CHANGE_DELETE,
+	CHANGE_TRUNCATE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+	/* See the enum above. */
+	ConcurrentChangeKind kind;
+
+	/*
+	 * The actual tuple.
+	 *
+	 * The tuple data follows the ConcurrentChange structure. Before use make
+	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
+	 * bytea) and that tuple->t_data is fixed.
+	 */
+	HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+								sizeof(HeapTupleData))
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents is being copied to a new storage. Also the necessary metadata
+ * needed to apply these changes to the table is stored here.
+ */
+typedef struct RepackDecodingState
+{
+	/* The relation whose changes we're decoding. */
+	Oid			relid;
+
+	/*
+	 * Decoded changes are stored here. Although we try to avoid excessively
+	 * large batches, the changes may occasionally need to spill to disk; the
+	 * tuplestore handles that transparently.
+	 */
+	Tuplestorestate *tstore;
+
+	/* The current number of changes in tstore. */
+	double		nchanges;
+
+	/*
+	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
+	 * We can't store the tuple directly because the tuplestore only supports
+	 * minimal tuples and we may need to transfer the OID system column from
+	 * the output plugin. We also need to transfer the change kind, so it's
+	 * better to put everything in one structure than to use two tuplestores
+	 * "in parallel".
+	 */
+	TupleDesc	tupdesc_change;
+
+	/* Tuple descriptor needed to update indexes. */
+	TupleDesc	tupdesc;
+
+	/* Slot to retrieve data from tstore. */
+	TupleTableSlot *tsslot;
+
+	ResourceOwner resowner;
+} RepackDecodingState;
+
+
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, bool usingindex,
-						Relation OldHeap, Oid indexOid, ClusterParams *params);
+						Relation OldHeap, Oid indexOid, ClusterParams *params,
+						bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+											 XLogRecPtr end_of_wal);
+
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -48,6 +136,7 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 bool swap_toast_by_content,
 							 bool check_constraints,
 							 bool is_internal,
+							 bool reindex,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 5b6639c114c..93917ad5544 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -59,18 +59,20 @@
 /*
  * Progress parameters for REPACK.
  *
- * Note: Since REPACK shares some code with CLUSTER, these values are also
- * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
- * introduce a separate set of constants.)
+ * Note: Since REPACK shares some code with CLUSTER, (some of) these values
+ * are also used by CLUSTER. (CLUSTER is now deprecated, so it makes little
+ * sense to introduce a separate set of constants.)
  */
 #define PROGRESS_REPACK_COMMAND					0
 #define PROGRESS_REPACK_PHASE					1
 #define PROGRESS_REPACK_INDEX_RELID				2
 #define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED	4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED		5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED		6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		9
 
 /*
  * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -79,9 +81,10 @@
 #define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
 #define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
 #define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+#define PROGRESS_REPACK_PHASE_CATCH_UP			5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		8
 
 /*
  * Commands of PROGRESS_REPACK
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 6d4d2d1814c..802fc4b0823 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 7f3ba0352f6..2739327b0da 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
 #define AccessShareLock			1	/* SELECT */
 #define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
 #define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL), ANALYZE, CREATE
-									 * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4	/* VACUUM (non-exclusive), ANALYZE, CREATE
+									 * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
 #define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
 #define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
 									 * SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index f65f83c85cd..1f821fd2ccd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,8 @@ extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index fc82cd67f6c..f16422175f8 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -11,10 +11,11 @@ EXTENSION = injection_points
 DATA = injection_points--1.0.sql
 PGFILEDESC = "injection_points - facility for injection points"
 
-REGRESS = injection_points hashagg reindex_conc vacuum
+REGRESS = injection_points hashagg reindex_conc vacuum
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = basic inplace syscache-update-pruned
+ISOLATION = basic inplace syscache-update-pruned repack
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing: 
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+
+step change_new: 
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1: 
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+
+step change_subxact2: 
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    2
+(1 row)
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..c8f264bc6cb
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
\ No newline at end of file
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 20390d6b4bf..29561103bbf 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -47,9 +47,13 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'repack',
       'syscache-update-pruned',
     ],
     'runningcheck': false, # see syscache-update-pruned
+    # 'repack' requires wal_level = 'logical'.
+    'regress_args': ['--temp-config', files('logical.conf')],
+
   },
   'tap': {
     'env': {
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..75850334986
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,143 @@
+# Prefix the system columns with underscore as they are not allowed as column
+# names.
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+	INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j int);
+	CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
	SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice to test whether tuple version generated by this session
+# can be found.
+step change_existing
+{
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key and
+# non-key column.
+step change_new
+{
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change_existing
+	change_new
+	change_subxact1
+	change_subxact2
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3a1d1d28282..fe227bd8a30 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1999,17 +1999,17 @@ pg_stat_progress_cluster| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS cluster_index_relid,
     s.param4 AS heap_tuples_scanned,
     s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_copy| SELECT s.pid,
@@ -2081,17 +2081,20 @@ pg_stat_progress_repack| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 5 THEN 'catch-up'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS repack_index_relid,
     s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param5 AS heap_tuples_inserted,
+    s.param6 AS heap_tuples_updated,
+    s.param7 AS heap_tuples_deleted,
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 98242e25432..b64ab8dfab4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -485,6 +485,8 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
 ConditionalStack
@@ -1257,6 +1259,7 @@ IndexElem
 IndexFetchHeapData
 IndexFetchTableData
 IndexInfo
+IndexInsertState
 IndexList
 IndexOnlyScan
 IndexOnlyScanState
@@ -2538,6 +2541,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 RepackCommand
+RepackDecodingState
 RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
-- 
2.39.5

v20-0006-Preserve-visibility-information-of-the-concurren.patch (text/x-diff; charset=utf-8)
From b37a7a44f4e41f4a4322edf7648fbb651218c159 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 30 Aug 2025 19:40:04 +0200
Subject: [PATCH v20 6/6] Preserve visibility information of the concurrent
 data changes.

As explained in the commit message of the preceding patch of the series, the
data changes done by applications while REPACK CONCURRENTLY is copying the
table contents to a new file are decoded from WAL and eventually also applied
to the new file. To reduce the complexity a little bit, the preceding patch
uses the current transaction (i.e. the transaction opened by the REPACK command)
to execute those INSERT, UPDATE and DELETE commands.

However, REPACK is not expected to change the visibility of tuples. Therefore,
this patch fixes the handling of the "concurrent data changes". It ensures
that tuples written into the new table have the same XID and command ID (CID)
as they had in the old table.

To "replay" an UPDATE or DELETE command on the new table, we need the
appropriate snapshot to find the previous tuple version in the new table. The
(historic) snapshot we used to decode the UPDATE / DELETE should (by
definition) see the state of the catalog prior to that UPDATE / DELETE. Thus
we can use the same snapshot to find the "old tuple" for UPDATE / DELETE in
the new table if:

1) REPACK CONCURRENTLY preserves visibility information of all tuples - that's
the purpose of this part of the patch series.

2) The table being REPACKed is treated as a system catalog by all transactions
that modify its data. This ensures that reorderbuffer.c generates a new
snapshot for each data change in the table.

We ensure 2) by maintaining a shared hashtable of tables being REPACKed
CONCURRENTLY and by adjusting the RelationIsAccessibleInLogicalDecoding()
macro so it checks this hashtable. (The corresponding flag is also added to
the relation cache, so that the shared hashtable does not have to be accessed
too often.) It's essential that after adding an entry to the hashtable we wait
for completion of all the transactions that might have started to modify our
table before our entry was added. We achieve that by upgrading our lock on
the table to ShareLock temporarily: as soon as we acquire it, no DML command
should be running on the table. (This lock upgrade shouldn't cause any
deadlock because we take care not to hold locks on other objects at the same
time.)
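A minimal sketch of the lookup this implies follows; all identifiers in it
are made up for illustration (as mentioned above, the actual patch
additionally caches the flag in the relation cache):

	static HTAB *RepackedRelsHash;	/* in shared memory, keyed by table OID */

	static bool
	is_repacked_concurrently(Oid relid)
	{
		bool		found;

		LWLockAcquire(RepackedRelsLock, LW_SHARED);
		(void) hash_search(RepackedRelsHash, &relid, HASH_FIND, &found);
		LWLockRelease(RepackedRelsLock);

		return found;
	}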

As long as we preserve the tuple visibility information (which includes XID),
it's important to avoid logical decoding of the WAL generated by DMLs on the
new table: the logical decoding subsystem probably does not expect that the
incoming WAL records contain XIDs of already decoded transactions. (And of
course, repeated decoding would be wasted effort.)

Author: Antonin Houska <ah@cybertec.at>
Author: Mikhail Nikalayeu <mihailnikalayeu@gmail.com> (small changes)
---
 src/backend/access/common/toast_internals.c   |   3 +-
 src/backend/access/heap/heapam.c              |  51 ++-
 src/backend/access/heap/heapam_handler.c      |  23 +-
 src/backend/access/transam/xact.c             |  52 +++
 src/backend/commands/cluster.c                | 400 ++++++++++++++++--
 src/backend/replication/logical/decode.c      |  28 +-
 src/backend/replication/logical/snapbuild.c   |  22 +-
 .../pgoutput_repack/pgoutput_repack.c         |  68 ++-
 src/backend/storage/ipc/ipci.c                |   2 +
 .../utils/activity/wait_event_names.txt       |   1 +
 src/backend/utils/cache/inval.c               |  21 +
 src/backend/utils/cache/relcache.c            |   4 +
 src/include/access/heapam.h                   |  12 +-
 src/include/access/xact.h                     |   2 +
 src/include/commands/cluster.h                |  22 +
 src/include/storage/lwlocklist.h              |   1 +
 src/include/utils/inval.h                     |   2 +
 src/include/utils/rel.h                       |   7 +-
 src/include/utils/snapshot.h                  |   3 +
 .../injection_points/specs/repack.spec        |   4 -
 src/tools/pgindent/typedefs.list              |   1 +
 21 files changed, 635 insertions(+), 94 deletions(-)

diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index a1d0eed8953..586eb42a137 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -320,7 +320,8 @@ toast_save_datum(Relation rel, Datum value,
 		memcpy(VARDATA(&chunk_data), data_p, chunk_size);
 		toasttup = heap_form_tuple(toasttupDesc, t_values, t_isnull);
 
-		heap_insert(toastrel, toasttup, mycid, options, NULL);
+		heap_insert(toastrel, toasttup, GetCurrentTransactionId(), mycid,
+					options, NULL);
 
 		/*
 		 * Create the index entry.  We cheat a little here by not using
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f9a4fe3faed..fd17286cabe 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2070,7 +2070,7 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
 /*
  *	heap_insert		- insert tuple into a heap
  *
- * The new tuple is stamped with current transaction ID and the specified
+ * The new tuple is stamped with the specified transaction ID and the specified
  * command ID.
  *
  * See table_tuple_insert for comments about most of the input flags, except
@@ -2086,15 +2086,16 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * reflected into *tup.
  */
 void
-heap_insert(Relation relation, HeapTuple tup, CommandId cid,
-			int options, BulkInsertState bistate)
+heap_insert(Relation relation, HeapTuple tup, TransactionId xid,
+			CommandId cid, int options, BulkInsertState bistate)
 {
-	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
+	Assert(TransactionIdIsValid(xid));
+
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(tup->t_data) <=
 		   RelationGetNumberOfAttributes(relation));
@@ -2176,8 +2177,15 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		/*
 		 * If this is a catalog, we need to transmit combo CIDs to properly
 		 * decode, so log that as well.
+		 *
+		 * HEAP_INSERT_NO_LOGICAL should be set when applying data changes
+		 * done by other transactions during REPACK CONCURRENTLY. In such a
+		 * case, the insertion should not be decoded at all - see
+		 * heap_decode(). (It's also set by raw_heap_insert() for TOAST, but
+		 * TOAST does not pass this test anyway.)
 		 */
-		if (RelationIsAccessibleInLogicalDecoding(relation))
+		if ((options & HEAP_INSERT_NO_LOGICAL) == 0 &&
+			RelationIsAccessibleInLogicalDecoding(relation))
 			log_heap_new_cid(relation, heaptup);
 
 		/*
@@ -2723,7 +2731,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 void
 simple_heap_insert(Relation relation, HeapTuple tup)
 {
-	heap_insert(relation, tup, GetCurrentCommandId(true), 0, NULL);
+	heap_insert(relation, tup, GetCurrentTransactionId(),
+				GetCurrentCommandId(true), 0, NULL);
 }
 
 /*
@@ -2780,11 +2789,11 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
  */
 TM_Result
 heap_delete(Relation relation, ItemPointer tid,
-			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
+			TransactionId xid, CommandId cid, Snapshot crosscheck, bool wait,
+			TM_FailureData *tmfd, bool changingPart,
+			bool wal_logical)
 {
 	TM_Result	result;
-	TransactionId xid = GetCurrentTransactionId();
 	ItemId		lp;
 	HeapTupleData tp;
 	Page		page;
@@ -2801,6 +2810,7 @@ heap_delete(Relation relation, ItemPointer tid,
 	bool		old_key_copied = false;
 
 	Assert(ItemPointerIsValid(tid));
+	Assert(TransactionIdIsValid(xid));
 
 	AssertHasSnapshotForToast(relation);
 
@@ -3097,8 +3107,12 @@ l1:
 		/*
 		 * For logical decode we need combo CIDs to properly decode the
 		 * catalog
+		 *
+		 * Like in heap_insert(), visibility is unchanged when called from
+		 * VACUUM FULL / CLUSTER.
 		 */
-		if (RelationIsAccessibleInLogicalDecoding(relation))
+		if (wal_logical &&
+			RelationIsAccessibleInLogicalDecoding(relation))
 			log_heap_new_cid(relation, &tp);
 
 		xlrec.flags = 0;
@@ -3217,11 +3231,12 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 	TM_Result	result;
 	TM_FailureData tmfd;
 
-	result = heap_delete(relation, tid,
+	result = heap_delete(relation, tid, GetCurrentTransactionId(),
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
 						 &tmfd, false,	/* changingPart */
 						 true /* wal_logical */ );
+
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3260,12 +3275,11 @@ simple_heap_delete(Relation relation, ItemPointer tid)
  */
 TM_Result
 heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
-			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, LockTupleMode *lockmode,
+			TransactionId xid, CommandId cid, Snapshot crosscheck,
+			bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
 			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
-	TransactionId xid = GetCurrentTransactionId();
 	Bitmapset  *hot_attrs;
 	Bitmapset  *sum_attrs;
 	Bitmapset  *key_attrs;
@@ -3305,6 +3319,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				infomask2_new_tuple;
 
 	Assert(ItemPointerIsValid(otid));
+	Assert(TransactionIdIsValid(xid));
 
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
@@ -4144,8 +4159,12 @@ l2:
 		/*
 		 * For logical decoding we need combo CIDs to properly decode the
 		 * catalog.
+		 *
+		 * Like in heap_insert(), visibility is unchanged when called from
+		 * VACUUM FULL / CLUSTER.
 		 */
-		if (RelationIsAccessibleInLogicalDecoding(relation))
+		if (wal_logical &&
+			RelationIsAccessibleInLogicalDecoding(relation))
 		{
 			log_heap_new_cid(relation, &oldtup);
 			log_heap_new_cid(relation, heaptup);
@@ -4511,7 +4530,7 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 	TM_FailureData tmfd;
 	LockTupleMode lockmode;
 
-	result = heap_update(relation, otid, tup,
+	result = heap_update(relation, otid, tup, GetCurrentTransactionId(),
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
 						 &tmfd, &lockmode, update_indexes,
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d03084768e0..b50f7dc9b9c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -253,7 +253,8 @@ heapam_tuple_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
-	heap_insert(relation, tuple, cid, options, bistate);
+	heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+				bistate);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	if (shouldFree)
@@ -276,7 +277,8 @@ heapam_tuple_insert_speculative(Relation relation, TupleTableSlot *slot,
 	options |= HEAP_INSERT_SPECULATIVE;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
-	heap_insert(relation, tuple, cid, options, bistate);
+	heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+				bistate);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	if (shouldFree)
@@ -310,8 +312,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
-					   true);
+	return heap_delete(relation, tid, GetCurrentTransactionId(), cid,
+					   crosscheck, wait, tmfd, changingPart, true);
 }
 
 
@@ -329,7 +331,8 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	slot->tts_tableOid = RelationGetRelid(relation);
 	tuple->t_tableOid = slot->tts_tableOid;
 
-	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
+	result = heap_update(relation, otid, tuple, GetCurrentTransactionId(),
+						 cid, crosscheck, wait,
 						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
@@ -2477,9 +2480,15 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
 		 * the relation files, it drops this relation, so no logical
 		 * replication subscription should need the data.
+		 *
+		 * It is also crucial to stamp the new record with the exact same xid
+		 * and cid, because the tuple must be visible to the snapshot of the
+		 * applied concurrent change later.
 		 */
-		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
-					HEAP_INSERT_NO_LOGICAL, NULL);
+		CommandId	cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
+		TransactionId xid = HeapTupleHeaderGetXmin(tuple->t_data);
+
+		heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);
 	}
 
 	heap_freetuple(copiedTuple);
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 5670f2bfbde..e913594fc07 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -126,6 +126,18 @@ static FullTransactionId XactTopFullTransactionId = {InvalidTransactionId};
 static int	nParallelCurrentXids = 0;
 static TransactionId *ParallelCurrentXids;
 
+/*
+ * Another case that requires TransactionIdIsCurrentTransactionId() to behave
+ * specially is when REPACK CONCURRENTLY is processing data changes made in
+ * the old storage of a table by other transactions. When applying the changes
+ * to the new storage, the backend executing the CLUSTER command needs to act
+ * on behalf on those other transactions. The transactions responsible for the
+ * changes in the old storage are stored in this array, sorted by
+ * xidComparator.
+ */
+static int	nRepackCurrentXids = 0;
+static TransactionId *RepackCurrentXids = NULL;
+
 /*
  * Miscellaneous flag bits to record events which occur on the top level
  * transaction. These flags are only persisted in MyXactFlags and are intended
@@ -973,6 +985,8 @@ TransactionIdIsCurrentTransactionId(TransactionId xid)
 		int			low,
 					high;
 
+		Assert(nRepackCurrentXids == 0);
+
 		low = 0;
 		high = nParallelCurrentXids - 1;
 		while (low <= high)
@@ -992,6 +1006,21 @@ TransactionIdIsCurrentTransactionId(TransactionId xid)
 		return false;
 	}
 
+	/*
+	 * When executing REPACK CONCURRENTLY, the array of current transactions
+	 * is provided explicitly via SetRepackCurrentXids().
+	 */
+	if (nRepackCurrentXids > 0)
+	{
+		Assert(nParallelCurrentXids == 0);
+
+		return bsearch(&xid,
+					   RepackCurrentXids,
+					   nRepackCurrentXids,
+					   sizeof(TransactionId),
+					   xidComparator) != NULL;
+	}
+
 	/*
 	 * We will return true for the Xid of the current subtransaction, any of
 	 * its subcommitted children, any of its parents, or any of their
@@ -5661,6 +5690,29 @@ EndParallelWorkerTransaction(void)
 	CurrentTransactionState->blockState = TBLOCK_DEFAULT;
 }
 
+/*
+ * SetRepackCurrentXids
+ *		Set the XID array that TransactionIdIsCurrentTransactionId() should
+ *		use.
+ */
+void
+SetRepackCurrentXids(TransactionId *xip, int xcnt)
+{
+	RepackCurrentXids = xip;
+	nRepackCurrentXids = xcnt;
+}
+
+/*
+ * ResetRepackCurrentXids
+ *		Undo the effect of SetRepackCurrentXids().
+ */
+void
+ResetRepackCurrentXids(void)
+{
+	RepackCurrentXids = NULL;
+	nRepackCurrentXids = 0;
+}
+
 /*
  * ShowTransactionState
  *		Debug support
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 511b2bb6c43..a44724f3757 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -82,6 +82,11 @@ typedef struct
  * The following definitions are used for concurrent processing.
  */
 
+/*
+ * OID of the table being repacked by this backend.
+ */
+static Oid	repacked_rel = InvalidOid;
+
 /*
  * The locators are used to avoid logical decoding of data that we do not need
  * for our table.
@@ -125,8 +130,10 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
 
-static void begin_concurrent_repack(Relation rel);
-static void end_concurrent_repack(void);
+static void begin_concurrent_repack(Relation rel, Relation *index_p,
+									bool *entered_p);
+static void end_concurrent_repack(bool error);
+static void cluster_before_shmem_exit_callback(int code, Datum arg);
 static LogicalDecodingContext *setup_logical_decoding(Oid relid,
 													  const char *slotname,
 													  TupleDesc tupdesc);
@@ -146,6 +153,7 @@ static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
 									ConcurrentChange *change);
 static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
 								   HeapTuple tup_key,
+								   Snapshot snapshot,
 								   IndexInsertState *iistate,
 								   TupleTableSlot *ident_slot,
 								   IndexScanDesc *scan_p);
@@ -450,6 +458,8 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
 	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
+	bool		entered,
+				success;
 
 	/*
 	 * Check that the correct lock is held. The lock mode is
@@ -620,23 +630,30 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
+	entered = false;
+	success = false;
 	PG_TRY();
 	{
 		/*
-		 * For concurrent processing, make sure that our logical decoding
-		 * ignores data changes of other tables than the one we are
-		 * processing.
+		 * For concurrent processing, make sure that
+		 *
+		 * 1) our logical decoding ignores data changes of tables other than
+		 * the one we are processing.
+		 *
+		 * 2) other transactions treat this table as if it were a system / user
+		 * catalog, and WAL-log the relevant additional information.
 		 */
 		if (concurrent)
-			begin_concurrent_repack(OldHeap);
+			begin_concurrent_repack(OldHeap, &index, &entered);
 
 		rebuild_relation(cmd, usingindex, OldHeap, index, save_userid,
 						 verbose, concurrent);
+		success = true;
 	}
 	PG_FINALLY();
 	{
-		if (concurrent)
-			end_concurrent_repack();
+		if (concurrent && entered)
+			end_concurrent_repack(!success);
 	}
 	PG_END_TRY();
 
@@ -2396,6 +2413,47 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
 }
 
 
+/*
+ * Each relation being processed by REPACK CONCURRENTLY must be in the
+ * repackedRels hashtable.
+ */
+typedef struct RepackedRel
+{
+	Oid			relid;
+	Oid			dbid;
+} RepackedRel;
+
+static HTAB *RepackedRelsHash = NULL;
+
+/*
+ * Maximum number of entries in the hashtable.
+ *
+ * A replication slot is needed for the processing, so use this GUC to
+ * allocate memory for the hashtable.
+ */
+#define	MAX_REPACKED_RELS	(max_replication_slots)
+
+Size
+RepackShmemSize(void)
+{
+	return hash_estimate_size(MAX_REPACKED_RELS, sizeof(RepackedRel));
+}
+
+void
+RepackShmemInit(void)
+{
+	HASHCTL		info;
+
+	info.keysize = sizeof(RepackedRel);
+	info.entrysize = info.keysize;
+
+	RepackedRelsHash = ShmemInitHash("Repacked Relations",
+									 MAX_REPACKED_RELS,
+									 MAX_REPACKED_RELS,
+									 &info,
+									 HASH_ELEM | HASH_BLOBS);
+}
+
 /*
  * Call this function before REPACK CONCURRENTLY starts to setup logical
  * decoding. It makes sure that other users of the table put enough
@@ -2410,11 +2468,119 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
  *
  * Note that TOAST table needs no attention here as it's not scanned using
  * historic snapshot.
+ *
+ * 'index_p' is in/out argument because the function unlocks the index
+ * temporarily.
+ *
+ * 'entered_p' receives a bool value telling whether the relation OID was
+ * entered into RepackedRelsHash or not.
  */
 static void
-begin_concurrent_repack(Relation rel)
+begin_concurrent_repack(Relation rel, Relation *index_p, bool *entered_p)
 {
-	Oid			toastrelid;
+	Oid			relid,
+				toastrelid;
+	Relation	index = NULL;
+	Oid			indexid = InvalidOid;
+	RepackedRel key,
+			   *entry;
+	bool		found;
+	static bool before_shmem_exit_callback_setup = false;
+
+	relid = RelationGetRelid(rel);
+	index = index_p ? *index_p : NULL;
+
+	/*
+	 * Make sure that we do not leave an entry in RepackedRelsHash if exiting
+	 * due to FATAL.
+	 */
+	if (!before_shmem_exit_callback_setup)
+	{
+		before_shmem_exit(cluster_before_shmem_exit_callback, 0);
+		before_shmem_exit_callback_setup = true;
+	}
+
+	memset(&key, 0, sizeof(key));
+	key.relid = relid;
+	key.dbid = MyDatabaseId;
+
+	*entered_p = false;
+	LWLockAcquire(RepackedRelsLock, LW_EXCLUSIVE);
+	entry = (RepackedRel *)
+		hash_search(RepackedRelsHash, &key, HASH_ENTER_NULL, &found);
+	if (found)
+	{
+		/*
+		 * Since REPACK CONCURRENTLY takes ShareRowExclusiveLock, a conflict
+		 * should occur much earlier. However, that lock may be released
+		 * temporarily, see below.  Anyway, we should complain whatever the
+		 * reason for the conflict might be.
+		 */
+		ereport(ERROR,
+				(errmsg("relation \"%s\" is already being processed by REPACK CONCURRENTLY",
+						RelationGetRelationName(rel))));
+	}
+	if (entry == NULL)
+		ereport(ERROR,
+				(errmsg("too many requests for REPACK CONCURRENTLY at a time")),
+				(errhint("Please consider increasing the \"max_replication_slots\" configuration parameter.")));
+
+	/*
+	 * Even if anything fails below, the caller has to do cleanup in the
+	 * shared memory.
+	 */
+	*entered_p = true;
+
+	/*
+	 * Enable the callback to remove the entry in case of exit. We should not
+	 * do this earlier, otherwise an attempt to insert an already existing entry
+	 * could make us remove that entry (inserted by another backend) during
+	 * ERROR handling.
+	 */
+	Assert(!OidIsValid(repacked_rel));
+	repacked_rel = relid;
+
+	LWLockRelease(RepackedRelsLock);
+
+	/*
+	 * Make sure that other backends are aware of the new hash entry as soon
+	 * as they open our table.
+	 */
+	CacheInvalidateRelcacheImmediate(relid);
+
+	/*
+	 * Also make sure that the existing users of the table update their
+	 * relcache entry as soon as they try to run DML commands on it.
+	 *
+	 * ShareLock is the weakest lock that conflicts with DMLs. If any backend
+	 * has a lower lock, we assume it'll accept our invalidation message when
+	 * it changes the lock mode.
+	 *
+	 * Before upgrading the lock on the relation, close the index temporarily
+	 * to avoid a deadlock if another backend running DML already has its lock
+	 * (ShareLock) on the table and waits for the lock on the index.
+	 */
+	if (index)
+	{
+		indexid = RelationGetRelid(index);
+		index_close(index, ShareUpdateExclusiveLock);
+	}
+	LockRelationOid(relid, ShareLock);
+	UnlockRelationOid(relid, ShareLock);
+	if (OidIsValid(indexid))
+	{
+		/*
+		 * Re-open the index and check that it hasn't changed while unlocked.
+		 */
+		check_index_is_clusterable(rel, indexid, ShareUpdateExclusiveLock);
+
+		/*
+		 * Return the new relcache entry to the caller. (It's been locked by
+		 * the call above.)
+		 */
+		index = index_open(indexid, NoLock);
+		*index_p = index;
+	}
 
 	/* Avoid logical decoding of other relations by this backend. */
 	repacked_rel_locator = rel->rd_locator;
@@ -2432,15 +2598,122 @@ begin_concurrent_repack(Relation rel)
 
 /*
  * Call this when done with REPACK CONCURRENTLY.
+ *
+ * 'error' tells whether the function is being called in order to handle
+ * error.
  */
 static void
-end_concurrent_repack(void)
+end_concurrent_repack(bool error)
 {
+	RepackedRel key;
+	RepackedRel *entry = NULL;
+	Oid			relid = repacked_rel;
+
+	/* Remove the relation from the hash if we managed to insert one. */
+	if (OidIsValid(repacked_rel))
+	{
+		memset(&key, 0, sizeof(key));
+		key.relid = repacked_rel;
+		key.dbid = MyDatabaseId;
+		LWLockAcquire(RepackedRelsLock, LW_EXCLUSIVE);
+		entry = hash_search(RepackedRelsHash, &key, HASH_REMOVE, NULL);
+		LWLockRelease(RepackedRelsLock);
+
+		/*
+		 * Make others refresh their information about whether they should still
+		 * treat the table as catalog from the perspective of writing WAL.
+		 *
+		 * XXX Unlike entering the entry into the hashtable, we do not bother
+		 * with locking and unlocking the table here:
+		 *
+		 * 1) On normal completion (and sometimes even on ERROR), the caller
+		 * is already holding AccessExclusiveLock on the table, so there
+		 * should be no relcache reference unaware of this change.
+		 *
+		 * 2) In the other cases, the worst scenario is that the other
+		 * backends will write unnecessary information to WAL until they close
+		 * the relation.
+		 *
+		 * Should we use ShareLock mode to fix 2) at least for the non-FATAL
+		 * errors? (Our before_shmem_exit callback is in charge of FATAL, and
+		 * that probably should not try to acquire any lock.)
+		 */
+		CacheInvalidateRelcacheImmediate(repacked_rel);
+
+		/*
+		 * By clearing this variable we also disable
+		 * cluster_before_shmem_exit_callback().
+		 */
+		repacked_rel = InvalidOid;
+	}
+
 	/*
 	 * Restore normal function of (future) logical decoding for this backend.
 	 */
 	repacked_rel_locator.relNumber = InvalidOid;
 	repacked_rel_toast_locator.relNumber = InvalidOid;
+
+	/*
+	 * On normal completion (!error), we should not really fail to remove the
+	 * entry. But if it wasn't there for any reason, raise ERROR to make sure
+	 * the transaction is aborted: if other transactions, while changing the
+	 * contents of the relation, didn't know that REPACK CONCURRENTLY was in
+	 * progress, they might have failed to WAL-log enough information, and
+	 * thus we could have produced inconsistent table contents.
+	 *
+	 * On the other hand, if we are already handling an error, there's no
+	 * reason to worry about inconsistent contents of the new storage because
+	 * the transaction is going to be rolled back anyway. Furthermore, by
+	 * raising ERROR here we'd shadow the original error.
+	 */
+	if (!error)
+	{
+		char	   *relname;
+
+		if (OidIsValid(relid) && entry == NULL)
+		{
+			relname = get_rel_name(relid);
+			if (!relname)
+				ereport(ERROR,
+						(errmsg("cache lookup failed for relation %u",
+								relid)));
+
+			ereport(ERROR,
+					(errmsg("relation \"%s\" not found among repacked relations",
+							relname)));
+		}
+	}
+}
+
+/*
+ * A wrapper to call end_concurrent_repack() as a before_shmem_exit callback.
+ */
+static void
+cluster_before_shmem_exit_callback(int code, Datum arg)
+{
+	if (OidIsValid(repacked_rel))
+		end_concurrent_repack(true);
+}
+
+/*
+ * Check if relation is currently being processed by REPACK CONCURRENTLY.
+ */
+bool
+is_concurrent_repack_in_progress(Oid relid)
+{
+	RepackedRel key,
+			   *entry;
+
+	memset(&key, 0, sizeof(key));
+	key.relid = relid;
+	key.dbid = MyDatabaseId;
+
+	LWLockAcquire(RepackedRelsLock, LW_SHARED);
+	entry = (RepackedRel *)
+		hash_search(RepackedRelsHash, &key, HASH_FIND, NULL);
+	LWLockRelease(RepackedRelsLock);
+
+	return entry != NULL;
 }
 
 /*
@@ -2502,6 +2775,9 @@ setup_logical_decoding(Oid relid, const char *slotname, TupleDesc tupdesc)
 	dstate->relid = relid;
 	dstate->tstore = tuplestore_begin_heap(false, false,
 										   maintenance_work_mem);
+#ifdef USE_ASSERT_CHECKING
+	dstate->last_change_xid = InvalidTransactionId;
+#endif
 
 	dstate->tupdesc = tupdesc;
 
@@ -2649,6 +2925,7 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 		char	   *change_raw,
 				   *src;
 		ConcurrentChange change;
+		Snapshot	snapshot;
 		bool		isnull[1];
 		Datum		values[1];
 
@@ -2717,8 +2994,30 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 
 			/*
 			 * Find the tuple to be updated or deleted.
+			 *
+			 * As the table being REPACKed concurrently is treated like a
+			 * catalog, the new CID is WAL-logged and decoded. And since we use
+			 * the same XID that the original DMLs did, the snapshot used for
+			 * the logical decoding (by now converted to a non-historic MVCC
+			 * snapshot) should see the tuples inserted previously into the
+			 * new heap and/or updated there.
 			 */
-			tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+			snapshot = change.snapshot;
+
+			/*
+			 * Set what should be considered current transaction (and
+			 * subtransactions) during visibility check.
+			 *
+			 * Note that this snapshot was created from a historic snapshot
+			 * using SnapBuildMVCCFromHistoric(), which does not touch
+			 * 'subxip'. Thus, unlike in a regular MVCC snapshot, the array
+			 * only contains the transactions whose data changes we are
+			 * applying, and their subtransactions. That's exactly what we need
+			 * to check whether a particular xact is a "current transaction".
+			 */
+			SetRepackCurrentXids(snapshot->subxip, snapshot->subxcnt);
+
+			tup_exist = find_target_tuple(rel, key, nkeys, tup_key, snapshot,
 										  iistate, ident_slot, &ind_scan);
 			if (tup_exist == NULL)
 				elog(ERROR, "Failed to find target tuple");
@@ -2729,6 +3028,8 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 			else
 				apply_concurrent_delete(rel, tup_exist, &change);
 
+			ResetRepackCurrentXids();
+
 			if (tup_old != NULL)
 			{
 				pfree(tup_old);
@@ -2741,14 +3042,14 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 		else
 			elog(ERROR, "Unrecognized kind of change: %d", change.kind);
 
-		/*
-		 * If a change was applied now, increment CID for next writes and
-		 * update the snapshot so it sees the changes we've applied so far.
-		 */
-		if (change.kind != CHANGE_UPDATE_OLD)
+		/* Free the snapshot if this is the last change that needed it. */
+		Assert(change.snapshot->active_count > 0);
+		change.snapshot->active_count--;
+		if (change.snapshot->active_count == 0)
 		{
-			CommandCounterIncrement();
-			UpdateActiveSnapshotCommandId();
+			if (change.snapshot == dstate->snapshot)
+				dstate->snapshot = NULL;
+			FreeSnapshot(change.snapshot);
 		}
 
 		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
@@ -2768,16 +3069,35 @@ static void
 apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 						IndexInsertState *iistate, TupleTableSlot *index_slot)
 {
+	Snapshot	snapshot = change->snapshot;
 	List	   *recheck;
 
+	/*
+	 * For INSERT, the visibility information is not important, but we use the
+	 * snapshot to get CID. Index functions might need the whole snapshot
+	 * anyway.
+	 */
+	SetRepackCurrentXids(snapshot->subxip, snapshot->subxcnt);
+
+	/*
+	 * Write the tuple into the new heap.
+	 *
+	 * The snapshot is the one we used to decode the insert (though converted
+	 * to "non-historic" MVCC snapshot), i.e. the snapshot's curcid is the
+	 * tuple CID incremented by one (due to the "new CID" WAL record that got
+	 * written along with the INSERT record). Thus if we want to use the
+	 * original CID, we need to subtract 1 from curcid.
+	 */
+	Assert(snapshot->curcid != InvalidCommandId &&
+		   snapshot->curcid > FirstCommandId);
 
 	/*
 	 * Like simple_heap_insert(), but make sure that the INSERT is not
 	 * logically decoded - see reform_and_rewrite_tuple() for more
 	 * information.
 	 */
-	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
-				NULL);
+	heap_insert(rel, tup, change->xid, snapshot->curcid - 1,
+				HEAP_INSERT_NO_LOGICAL, NULL);
 
 	/*
 	 * Update indexes.
@@ -2785,6 +3105,7 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 	 * In case functions in the index need the active snapshot and caller
 	 * hasn't set one.
 	 */
+	PushActiveSnapshot(snapshot);
 	ExecStoreHeapTuple(tup, index_slot, false);
 	recheck = ExecInsertIndexTuples(iistate->rri,
 									index_slot,
@@ -2795,6 +3116,8 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 									NIL,	/* arbiterIndexes */
 									false	/* onlySummarizing */
 		);
+	PopActiveSnapshot();
+	ResetRepackCurrentXids();
 
 	/*
 	 * If recheck is required, it must have been preformed on the source
@@ -2816,6 +3139,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 	TU_UpdateIndexes update_indexes;
 	TM_Result	res;
 	List	   *recheck;
+	Snapshot	snapshot = change->snapshot;
 
 	/*
 	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
@@ -2823,13 +3147,19 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 	 *
 	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
 	 * except for 'wait').
+	 *
+	 * Regarding CID, see the comment in apply_concurrent_insert().
 	 */
+	Assert(snapshot->curcid != InvalidCommandId &&
+		   snapshot->curcid > FirstCommandId);
+
 	res = heap_update(rel, &tup_target->t_self, tup,
-					  GetCurrentCommandId(true),
+					  change->xid, snapshot->curcid - 1,
 					  InvalidSnapshot,
 					  false,	/* no wait - only we are doing changes */
 					  &tmfd, &lockmode, &update_indexes,
-					  false /* wal_logical */ );
+	/* wal_logical */
+					  false);
 	if (res != TM_Ok)
 		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
 
@@ -2837,6 +3167,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 
 	if (update_indexes != TU_None)
 	{
+		PushActiveSnapshot(snapshot);
 		recheck = ExecInsertIndexTuples(iistate->rri,
 										index_slot,
 										iistate->estate,
@@ -2846,6 +3177,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 										NIL,	/* arbiterIndexes */
 		/* onlySummarizing */
 										update_indexes == TU_Summarizing);
+		PopActiveSnapshot();
 		list_free(recheck);
 	}
 
@@ -2858,6 +3190,12 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
 {
 	TM_Result	res;
 	TM_FailureData tmfd;
+	Snapshot	snapshot = change->snapshot;
+
+
+	/* Regarding CID, see the comment in apply_concurrent_insert(). */
+	Assert(snapshot->curcid != InvalidCommandId &&
+		   snapshot->curcid > FirstCommandId);
 
 	/*
 	 * Delete tuple from the new heap.
@@ -2865,11 +3203,11 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
 	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
 	 * except for 'wait').
 	 */
-	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
-					  InvalidSnapshot, false,
-					  &tmfd,
-					  false,	/* no wait - only we are doing changes */
-					  false /* wal_logical */ );
+	res = heap_delete(rel, &tup_target->t_self, change->xid,
+					  snapshot->curcid - 1, InvalidSnapshot, false,
+					  &tmfd, false,
+	/* wal_logical */
+					  false);
 
 	if (res != TM_Ok)
 		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
@@ -2890,7 +3228,7 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
  */
 static HeapTuple
 find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
-				  IndexInsertState *iistate,
+				  Snapshot snapshot, IndexInsertState *iistate,
 				  TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
 {
 	IndexScanDesc scan;
@@ -2899,7 +3237,7 @@ find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
 	HeapTuple	result = NULL;
 
 	/* XXX no instrumentation for now */
-	scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+	scan = index_beginscan(rel, iistate->ident_index, snapshot,
 						   NULL, nkeys, 0);
 	*scan_p = scan;
 	index_rescan(scan, key, nkeys, NULL, 0);
@@ -2971,6 +3309,8 @@ process_concurrent_changes(LogicalDecodingContext *ctx, XLogRecPtr end_of_wal,
 	}
 	PG_FINALLY();
 	{
+		ResetRepackCurrentXids();
+
 		if (rel_src)
 			rel_dst->rd_toastoid = InvalidOid;
 	}
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5dc4ae58ffe..9fefcffd8b3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -475,9 +475,14 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	/*
 	 * If the change is not intended for logical decoding, do not even
-	 * establish transaction for it - REPACK CONCURRENTLY is the typical use
-	 * case.
-	 *
+	 * establish transaction for it. This is particularly important if the
+	 * record was generated by REPACK CONCURRENTLY because this command uses
+	 * the original XID when doing changes in the new storage. The decoding
+	 * system probably does not expect to see the same transaction multiple
+	 * times.
+	 */
+
+	/*
 	 * First, check if REPACK CONCURRENTLY is being performed by this backend.
 	 * If so, only decode data changes of the table that it is processing, and
 	 * the changes of its TOAST relation.
@@ -504,11 +509,11 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	 * Second, skip records which do not contain sufficient information for
 	 * the decoding.
 	 *
-	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
-	 * when doing changes in the new table. Those changes should not be useful
-	 * for any other user (such as logical replication subscription) because
-	 * the new table will eventually be dropped (after REPACK CONCURRENTLY has
-	 * assigned its file to the "old table").
+	 * One particular problem we solve here is that REPACK CONCURRENTLY
+	 * generates WAL when doing changes in the new table. Those changes should
+	 * not be decoded because reorderbuffer.c considers their XID already
+	 * committed. (REPACK CONCURRENTLY deliberately generates WAL records in
+	 * such a way that they are skipped here.)
 	 */
 	switch (info)
 	{
@@ -995,13 +1000,6 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	xlrec = (xl_heap_insert *) XLogRecGetData(r);
 
-	/*
-	 * Ignore insert records without new tuples (this does happen when
-	 * raw_heap_insert marks the TOAST record as HEAP_INSERT_NO_LOGICAL).
-	 */
-	if (!(xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE))
-		return;
-
 	/* only interested in our database */
 	XLogRecGetBlockTag(r, 0, &target_locator, NULL, NULL);
 	if (target_locator.dbOid != ctx->slot->data.database)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8e5116a9cab..72a38074a7b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -155,7 +155,7 @@ static bool ExportInProgress = false;
 static void SnapBuildPurgeOlderTxn(SnapBuild *builder);
 
 /* snapshot building/manipulation/distribution functions */
-static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder);
+static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder, XLogRecPtr lsn);
 
 static void SnapBuildFreeSnapshot(Snapshot snap);
 
@@ -352,12 +352,17 @@ SnapBuildSnapDecRefcount(Snapshot snap)
  * Build a new snapshot, based on currently committed catalog-modifying
  * transactions.
  *
+ * 'lsn' is the location of the commit record (of a catalog-changing
+ * transaction) that triggered creation of the snapshot. Pass
+ * InvalidXLogRecPtr for the transaction base snapshot, or if the user of
+ * the snapshot does not need the LSN.
+ *
  * In-progress transactions with catalog access are *not* allowed to modify
  * these snapshots; they have to copy them and fill in appropriate ->curcid
  * and ->subxip/subxcnt values.
  */
 static Snapshot
-SnapBuildBuildSnapshot(SnapBuild *builder)
+SnapBuildBuildSnapshot(SnapBuild *builder, XLogRecPtr lsn)
 {
 	Snapshot	snapshot;
 	Size		ssize;
@@ -425,6 +430,7 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
 	snapshot->active_count = 0;
 	snapshot->regd_count = 0;
 	snapshot->snapXactCompletionCount = 0;
+	snapshot->lsn = lsn;
 
 	return snapshot;
 }
@@ -461,7 +467,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	if (TransactionIdIsValid(MyProc->xmin))
 		elog(ERROR, "cannot build an initial slot snapshot when MyProc->xmin already is valid");
 
-	snap = SnapBuildBuildSnapshot(builder);
+	snap = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 
 	/*
 	 * We know that snap->xmin is alive, enforced by the logical xmin
@@ -502,7 +508,7 @@ SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
 
 	Assert(builder->state == SNAPBUILD_CONSISTENT);
 
-	snap = SnapBuildBuildSnapshot(builder);
+	snap = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 	return SnapBuildMVCCFromHistoric(snap, false);
 }
 
@@ -636,7 +642,7 @@ SnapBuildGetOrBuildSnapshot(SnapBuild *builder)
 	/* only build a new snapshot if we don't have a prebuilt one */
 	if (builder->snapshot == NULL)
 	{
-		builder->snapshot = SnapBuildBuildSnapshot(builder);
+		builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 		/* increase refcount for the snapshot builder */
 		SnapBuildSnapIncRefcount(builder->snapshot);
 	}
@@ -716,7 +722,7 @@ SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
 		/* only build a new snapshot if we don't have a prebuilt one */
 		if (builder->snapshot == NULL)
 		{
-			builder->snapshot = SnapBuildBuildSnapshot(builder);
+			builder->snapshot = SnapBuildBuildSnapshot(builder, lsn);
 			/* increase refcount for the snapshot builder */
 			SnapBuildSnapIncRefcount(builder->snapshot);
 		}
@@ -1130,7 +1136,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		if (builder->snapshot)
 			SnapBuildSnapDecRefcount(builder->snapshot);
 
-		builder->snapshot = SnapBuildBuildSnapshot(builder);
+		builder->snapshot = SnapBuildBuildSnapshot(builder, lsn);
 
 		/* we might need to execute invalidations, add snapshot */
 		if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
@@ -1958,7 +1964,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	{
 		SnapBuildSnapDecRefcount(builder->snapshot);
 	}
-	builder->snapshot = SnapBuildBuildSnapshot(builder);
+	builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 	SnapBuildSnapIncRefcount(builder->snapshot);
 
 	ReorderBufferSetRestartPoint(builder->reorder, lsn);
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
index 687fbbc59bb..28bd16f9cc7 100644
--- a/src/backend/replication/pgoutput_repack/pgoutput_repack.c
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -32,7 +32,8 @@ static void plugin_truncate(struct LogicalDecodingContext *ctx,
 							Relation relations[],
 							ReorderBufferChange *change);
 static void store_change(LogicalDecodingContext *ctx,
-						 ConcurrentChangeKind kind, HeapTuple tuple);
+						 ConcurrentChangeKind kind, HeapTuple tuple,
+						 TransactionId xid);
 
 void
 _PG_output_plugin_init(OutputPluginCallbacks *cb)
@@ -100,6 +101,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 			  Relation relation, ReorderBufferChange *change)
 {
 	RepackDecodingState *dstate;
+	Snapshot	snapshot;
 
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
@@ -107,6 +109,48 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	if (relation->rd_id != dstate->relid)
 		return;
 
+	/*
+	 * Catalog snapshot is fine because the table we are processing is
+	 * temporarily considered a user catalog table.
+	 */
+	snapshot = GetCatalogSnapshot(InvalidOid);
+	Assert(snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
+	Assert(!snapshot->suboverflowed);
+
+	/*
+	 * This should not happen, but if we don't have enough information to
+	 * apply a new snapshot, the consequences would be bad. Thus prefer ERROR
+	 * to Assert().
+	 */
+	if (XLogRecPtrIsInvalid(snapshot->lsn))
+		ereport(ERROR, (errmsg("snapshot has invalid LSN")));
+
+	/*
+	 * reorderbuffer.c changes the catalog snapshot as soon as it sees a new
+	 * CID or a commit record of a catalog-changing transaction.
+	 */
+	if (dstate->snapshot == NULL || snapshot->lsn != dstate->snapshot_lsn ||
+		snapshot->curcid != dstate->snapshot->curcid)
+	{
+		/* CID should not go backwards. */
+		Assert(dstate->snapshot == NULL ||
+			   snapshot->curcid >= dstate->snapshot->curcid ||
+			   change->txn->xid != dstate->last_change_xid);
+
+		/*
+		 * XXX Is it a problem that the copy is created in
+		 * TopTransactionContext?
+		 *
+		 * XXX Wouldn't it be o.k. for SnapBuildMVCCFromHistoric() to set xcnt
+		 * to 0 instead of converting xip in this case? The point is that
+		 * transactions which are still in progress from the perspective of
+		 * reorderbuffer.c could not be replayed yet, so we do not need to
+		 * examine their XIDs.
+		 */
+		dstate->snapshot = SnapBuildMVCCFromHistoric(snapshot, false);
+		dstate->snapshot_lsn = snapshot->lsn;
+	}
+
 	/* Decode entry depending on its type */
 	switch (change->action)
 	{
@@ -124,7 +168,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (newtuple == NULL)
 					elog(ERROR, "Incomplete insert info.");
 
-				store_change(ctx, CHANGE_INSERT, newtuple);
+				store_change(ctx, CHANGE_INSERT, newtuple, change->txn->xid);
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_UPDATE:
@@ -141,9 +185,11 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 					elog(ERROR, "Incomplete update info.");
 
 				if (oldtuple != NULL)
-					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple,
+								 change->txn->xid);
 
-				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple,
+							 change->txn->xid);
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_DELETE:
@@ -156,7 +202,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (oldtuple == NULL)
 					elog(ERROR, "Incomplete delete info.");
 
-				store_change(ctx, CHANGE_DELETE, oldtuple);
+				store_change(ctx, CHANGE_DELETE, oldtuple, change->txn->xid);
 			}
 			break;
 		default:
@@ -190,13 +236,13 @@ plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	if (i == nrelations)
 		return;
 
-	store_change(ctx, CHANGE_TRUNCATE, NULL);
+	store_change(ctx, CHANGE_TRUNCATE, NULL, InvalidTransactionId);
 }
 
 /* Store concurrent data change. */
 static void
 store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
-			 HeapTuple tuple)
+			 HeapTuple tuple, TransactionId xid)
 {
 	RepackDecodingState *dstate;
 	char	   *change_raw;
@@ -266,6 +312,11 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 	dst = dst_start + SizeOfConcurrentChange;
 	memcpy(dst, tuple->t_data, tuple->t_len);
 
+	/* Initialize the other fields. */
+	change.xid = xid;
+	change.snapshot = dstate->snapshot;
+	dstate->snapshot->active_count++;
+
 	/* The data has been copied. */
 	if (flattened)
 		pfree(tuple);
@@ -279,6 +330,9 @@ store:
 	isnull[0] = false;
 	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
 						 values, isnull);
+#ifdef USE_ASSERT_CHECKING
+	dstate->last_change_xid = xid;
+#endif
 
 	/* Accounting. */
 	dstate->nchanges++;
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index e9ddf39500c..e24e1795aa9 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -151,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
 	size = add_size(size, InjectionPointShmemSize());
 	size = add_size(size, SlotSyncShmemSize());
 	size = add_size(size, AioShmemSize());
+	size = add_size(size, RepackShmemSize());
 
 	/* include additional requested shmem from preload libraries */
 	size = add_size(size, total_addin_request);
@@ -344,6 +345,7 @@ CreateOrAttachShmemStructs(void)
 	WaitEventCustomShmemInit();
 	InjectionPointShmemInit();
 	AioShmemInit();
+	RepackShmemInit();
 }
 
 /*
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 5427da5bc1b..e94c83726d6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -352,6 +352,7 @@ DSMRegistry	"Waiting to read or update the dynamic shared memory registry."
 InjectionPoint	"Waiting to read or update information related to injection points."
 SerialControl	"Waiting to read or update shared <filename>pg_serial</filename> state."
 AioWorkerSubmissionQueue	"Waiting to access AIO worker submission queue."
+RepackedRels	"Waiting to access the hash table with the list of repacked relations."
 
 #
 # END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 02505c88b8e..ecaa2283c2a 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -1643,6 +1643,27 @@ CacheInvalidateRelcache(Relation relation)
 								 databaseId, relationId);
 }
 
+/*
+ * CacheInvalidateRelcacheImmediate
+ *		Send invalidation message for the specified relation's relcache entry.
+ *
+ * Currently this is used in REPACK CONCURRENTLY, to make sure that other
+ * backends are aware that the command is being executed for the relation.
+ */
+void
+CacheInvalidateRelcacheImmediate(Oid relid)
+{
+	SharedInvalidationMessage msg;
+
+	msg.rc.id = SHAREDINVALRELCACHE_ID;
+	msg.rc.dbId = MyDatabaseId;
+	msg.rc.relId = relid;
+	/* check AddCatcacheInvalidationMessage() for an explanation */
+	VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
+
+	SendSharedInvalidMessages(&msg, 1);
+}
+
 /*
  * CacheInvalidateRelcacheAll
  *		Register invalidation of the whole relcache at the end of command.
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index d27a4c30548..ea565b5b053 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1279,6 +1279,10 @@ retry:
 	/* make sure relation is marked as having no open file yet */
 	relation->rd_smgr = NULL;
 
+	/* Is REPACK CONCURRENTLY in progress? */
+	relation->rd_repack_concurrent =
+		is_concurrent_repack_in_progress(targetRelId);
+
 	/*
 	 * now we can free the memory allocated for pg_class_tuple
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b82dd17a966..981425f23b6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -316,22 +316,24 @@ extern BulkInsertState GetBulkInsertState(void);
 extern void FreeBulkInsertState(BulkInsertState);
 extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
-extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
-						int options, BulkInsertState bistate);
+extern void heap_insert(Relation relation, HeapTuple tup, TransactionId xid,
+						CommandId cid, int options, BulkInsertState bistate);
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
-							 CommandId cid, Snapshot crosscheck, bool wait,
+							 TransactionId xid, CommandId cid,
+							 Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart,
 							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, ItemPointer tid);
 extern void heap_abort_speculative(Relation relation, ItemPointer tid);
 extern TM_Result heap_update(Relation relation, ItemPointer otid,
-							 HeapTuple newtup,
+							 HeapTuple newtup, TransactionId xid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes, bool wal_logical);
+							 TU_UpdateIndexes *update_indexes,
+							 bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index b2bc10ee041..fbb66d559b6 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -482,6 +482,8 @@ extern Size EstimateTransactionStateSpace(void);
 extern void SerializeTransactionState(Size maxsize, char *start_address);
 extern void StartParallelWorkerTransaction(char *tstatespace);
 extern void EndParallelWorkerTransaction(void);
+extern void SetRepackCurrentXids(TransactionId *xip, int xcnt);
+extern void ResetRepackCurrentXids(void);
 extern bool IsTransactionBlock(void);
 extern bool IsTransactionOrTransactionBlock(void);
 extern char TransactionBlockStatusCode(void);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 4a508c57a50..5dba3d427f5 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -61,6 +61,14 @@ typedef struct ConcurrentChange
 	/* See the enum above. */
 	ConcurrentChangeKind kind;
 
+	/* Transaction that changes the data. */
+	TransactionId xid;
+
+	/*
+	 * Historic catalog snapshot that was used to decode this change.
+	 */
+	Snapshot	snapshot;
+
 	/*
 	 * The actual tuple.
 	 *
@@ -92,6 +100,8 @@ typedef struct RepackDecodingState
 	 * tuplestore does this transparently.
 	 */
 	Tuplestorestate *tstore;
+	/* XID of the last change added to tstore. */
+	TransactionId last_change_xid PG_USED_FOR_ASSERTS_ONLY;
 
 	/* The current number of changes in tstore. */
 	double		nchanges;
@@ -112,6 +122,14 @@ typedef struct RepackDecodingState
 	/* Slot to retrieve data from tstore. */
 	TupleTableSlot *tsslot;
 
+	/*
+	 * Historic catalog snapshot that was used to decode the most recent
+	 * change.
+	 */
+	Snapshot	snapshot;
+	/* LSN of the record that triggered creation of the snapshot above. */
+	XLogRecPtr	snapshot_lsn;
+
 	ResourceOwner resowner;
 } RepackDecodingState;
 
@@ -141,4 +159,8 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
 
+extern Size RepackShmemSize(void);
+extern void RepackShmemInit(void);
+extern bool is_concurrent_repack_in_progress(Oid relid);
+
 #endif							/* CLUSTER_H */
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 06a1ffd4b08..9a9880b3073 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -85,6 +85,7 @@ PG_LWLOCK(50, DSMRegistry)
 PG_LWLOCK(51, InjectionPoint)
 PG_LWLOCK(52, SerialControl)
 PG_LWLOCK(53, AioWorkerSubmissionQueue)
+PG_LWLOCK(54, RepackedRels)
 
 /*
  * There also exist several built-in LWLock tranches.  As with the predefined
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index 9b871caef62..ae9dee394dc 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -50,6 +50,8 @@ extern void CacheInvalidateCatalog(Oid catalogId);
 
 extern void CacheInvalidateRelcache(Relation relation);
 
+extern void CacheInvalidateRelcacheImmediate(Oid relid);
+
 extern void CacheInvalidateRelcacheAll(void);
 
 extern void CacheInvalidateRelcacheByTuple(HeapTuple classTuple);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index b552359915f..66de3bc0c29 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -253,6 +253,9 @@ typedef struct RelationData
 	bool		pgstat_enabled; /* should relation stats be counted */
 	/* use "struct" here to avoid needing to include pgstat.h: */
 	struct PgStat_TableStatus *pgstat_info; /* statistics collection area */
+
+	/* Is REPACK CONCURRENTLY being performed on this relation? */
+	bool		rd_repack_concurrent;
 } RelationData;
 
 
@@ -695,7 +698,9 @@ RelationCloseSmgr(Relation relation)
 #define RelationIsAccessibleInLogicalDecoding(relation) \
 	(XLogLogicalInfoActive() && \
 	 RelationNeedsWAL(relation) && \
-	 (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))
+	 (IsCatalogRelation(relation) || \
+	  RelationIsUsedAsCatalogTable(relation) || \
+	  (relation)->rd_repack_concurrent))
 
 /*
  * RelationIsLogicallyLogged
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 0e546ec1497..014f27db7d7 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -13,6 +13,7 @@
 #ifndef SNAPSHOT_H
 #define SNAPSHOT_H
 
+#include "access/xlogdefs.h"
 #include "lib/pairingheap.h"
 
 
@@ -201,6 +202,8 @@ typedef struct SnapshotData
 	uint32		regd_count;		/* refcount on RegisteredSnapshots */
 	pairingheap_node ph_node;	/* link in the RegisteredSnapshots heap */
 
+	XLogRecPtr	lsn;			/* position in the WAL stream when taken */
+
 	/*
 	 * The transaction completion count at the time GetSnapshotData() built
 	 * this snapshot. Allows to avoid re-computing static snapshots when no
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
index 75850334986..3711a7c92b9 100644
--- a/src/test/modules/injection_points/specs/repack.spec
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -86,9 +86,6 @@ step change_new
 # When applying concurrent data changes, we should see the effects of an
 # in-progress subtransaction.
 #
-# XXX Not sure this test is useful now - it was designed for the patch that
-# preserves tuple visibility and which therefore modifies
-# TransactionIdIsCurrentTransactionId().
 step change_subxact1
 {
 	BEGIN;
@@ -103,7 +100,6 @@ step change_subxact1
 # When applying concurrent data changes, we should not see the effects of a
 # rolled back subtransaction.
 #
-# XXX Is this test useful? See above.
 step change_subxact2
 {
 	BEGIN;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b64ab8dfab4..9f5f331cad6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2540,6 +2540,7 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackedRel
 RepackCommand
 RepackDecodingState
 RepackStmt
-- 
2.39.5

#24Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Alvaro Herrera (#23)
1 attachment(s)
Re: Adding REPACK [concurrently]

Hello!

I started an attempt to make a "lightweight" MVCC-safe prototype and got
stuck on an "it is not working" issue.
After some debugging I realized that Antonin's variant (catalog-mode based)
seems to be broken too...

And after a few more hours I realized that the non-MVCC variant is broken as well :)

Attached is a patch with a test that reproduces the issue with REPACK plus
concurrent modifications. It seems that some updates may be lost; a rough
sketch of the failure mode follows.
Seems like some updates may be lost.

I hope the patch logic is clear - but feel free to ask if not.

Best regards,
Mikhail.

Attachments:

v22-0002-Add-stress-tests-for-concurrent-index-builds.patch
From c7424f44a086433d2eff6153476e0fd0c6b5b576 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
Date: Sat, 30 Nov 2024 16:24:20 +0100
Subject: [PATCH v22 02/12] Add stress tests for concurrent index builds

Introduce stress tests for concurrent index operations:
- test concurrent inserts/updates during CREATE/REINDEX INDEX CONCURRENTLY
- cover various index types (btree, gin, gist, brin, hash, spgist)
- test unique and non-unique indexes
- test with expressions and predicates
- test both parallel and non-parallel operations

These tests verify the behavior of the following commits.
---
 src/bin/pg_amcheck/meson.build  |   1 +
 src/bin/pg_amcheck/t/006_cic.pl | 223 ++++++++++++++++++++++++++++++++
 2 files changed, 224 insertions(+)
 create mode 100644 src/bin/pg_amcheck/t/006_cic.pl

diff --git a/src/bin/pg_amcheck/meson.build b/src/bin/pg_amcheck/meson.build
index 316ea0d40b8..7df15435fbb 100644
--- a/src/bin/pg_amcheck/meson.build
+++ b/src/bin/pg_amcheck/meson.build
@@ -28,6 +28,7 @@ tests += {
       't/003_check.pl',
       't/004_verify_heapam.pl',
       't/005_opclass_damage.pl',
+      't/006_cic.pl',
     ],
   },
 }
diff --git a/src/bin/pg_amcheck/t/006_cic.pl b/src/bin/pg_amcheck/t/006_cic.pl
new file mode 100644
index 00000000000..2aad0e8daa8
--- /dev/null
+++ b/src/bin/pg_amcheck/t/006_cic.pl
@@ -0,0 +1,223 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test REINDEX CONCURRENTLY with concurrent modifications and HOT updates
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+Test::More->builder->todo_start('filesystem bug')
+  if PostgreSQL::Test::Utils::has_wal_read_bug;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('RC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf('postgresql.conf', 'fsync = off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql('postgres', q(CREATE TABLE tbl(i int primary key,
+								c1 money default 0, c2 money default 0,
+								c3 money default 0, updated_at timestamp,
+								ia int4[], p point)));
+$node->safe_psql('postgres', q(CREATE INDEX CONCURRENTLY idx ON tbl(i, updated_at);));
+# create sequence
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+# Create helper functions for predicate tests
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_stable() RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		EXECUTE 'SELECT txid_current()';
+		RETURN true;
+	END; $$;
+));
+
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_const(integer) RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		RETURN MOD($1, 2) = 0;
+	END; $$;
+));
+
+# Run CIC/RIC with different options concurrently with upserts
+$node->pgbench(
+	'--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=2500',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY',
+	{
+		'concurrent_ops' => q(
+			SET debug_parallel_query = off; -- needed because of the predicate_stable implementation
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set variant random(0, 5)
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_stable();
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE MOD(i, 2) = 0;
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_const(i);
+					\elif :variant = 4
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(predicate_const(i));
+					\elif :variant = 5
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, predicate_const(i), updated_at) WHERE predicate_const(i);
+					\endif
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1000, 100000)
+				BEGIN;
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+				COMMIT;
+			\endif
+		)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for unique index concurrently with upserts
+$node->pgbench(
+	'--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=2500',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for unique BTREE',
+	{
+		'concurrent_ops_unique_idx' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE UNIQUE INDEX CONCURRENTLY new_idx ON tbl(i);
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+		)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIN with upserts
+$node->pgbench(
+	'--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=2500',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIN',
+	{
+		'concurrent_ops_gin_idx' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIN (ia);
+					\sleep 10 ms
+					SELECT gin_index_check('new_idx');
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					SELECT gin_index_check('new_idx');
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+		)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIST/BRIN/HASH/SPGIST index concurrently with upserts
+$node->pgbench(
+	'--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=2500',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIN/GIST/BRIN/HASH/SPGIST',
+	{
+		'concurrent_ops_other_idx' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\set variant random(0, 3)
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIST (p);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING BRIN (updated_at);
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING HASH (updated_at);
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING SPGIST (p);
+					\endif
+					\sleep 10 ms
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+		)
+	});
+
+$node->stop;
+done_testing();
\ No newline at end of file
-- 
2.43.0

#25Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Mihail Nikalayeu (#24)
Re: Adding REPACK [concurrently]

On 2025-Aug-31, Mihail Nikalayeu wrote:

> I started an attempt to make a "lightweight" MVCC-safe prototype and
> got stuck on an "it is not working" issue.
> After some debugging I realized Antonin's variant (catalog-mode based)
> seems to be broken too...
>
> And after a few more hours I realized the non-MVCC version is broken as well :)

Ugh. Well, obviously we need to get this fixed if we want CONCURRENTLY
at all :-)

Please don't post patches that aren't the commitfest item's main patch
as attachments with a .patch extension. This confuses the CFbot into
thinking your patch is the patch-of-record (which it isn't), and it then
reports that the patch fails CI. See here:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5117
(For the same reason, it isn't useful to number them as if they were
part of the patch series).

If you want to post secondary patches, please rename them to end in
something like .txt or .nocfbot or whatever. See here:
https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches?

Thanks for your interest in this topic,

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#26Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#24)
Re: Adding REPACK [concurrently]

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> Hello!
>
> I started an attempt to make a "lightweight" MVCC-safe prototype and
> got stuck on an "it is not working" issue.
> After some debugging I realized Antonin's variant (catalog-mode based)
> seems to be broken too...
>
> And after a few more hours I realized the non-MVCC version is broken as well :)
>
> This is a patch with a test that reproduces an issue with REPACK plus
> concurrent modifications.
> It seems that some updates may be lost.
>
> I hope the patch logic is clear - but feel free to ask if not.

Are you sure the test is complete? I see no occurrence of the REPACK command
in it.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#27Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#26)
1 attachment(s)
Re: Adding REPACK [concurrently]

Hello!

Antonin Houska <ah@cybertec.at>:

> Are you sure the test is complete? I see no occurrence of the REPACK command
> in it.

Oops, I sent an invalid file. The correct one is attached.

Attachments:

Add_test_for_REPACK_CONCURRENTLY_with_concurrent_modifications.patch_ (application/octet-stream)
Subject: [PATCH] Add test for REPACK CONCURRENTLY with concurrent modifications
---
Index: contrib/amcheck/t/007_repack_concurrently_mvcc.pl
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/contrib/amcheck/t/007_repack_concurrently_mvcc.pl b/contrib/amcheck/t/007_repack_concurrently_mvcc.pl
new file mode 100644
--- /dev/null	(date 1756650098221)
+++ b/contrib/amcheck/t/007_repack_concurrently_mvcc.pl	(date 1756650098221)
@@ -0,0 +1,94 @@
+
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+# Test REPACK CONCURRENTLY with concurrent modifications
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+my $node;
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('CIC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->append_conf(
+	'postgresql.conf', qq(
+wal_level = logical
+));
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql('postgres', q(CREATE TABLE tbl1(i int PRIMARY KEY, j int)));
+$node->safe_psql('postgres', q(CREATE TABLE tbl2(i int PRIMARY KEY, j int)));
+
+
+# Insert 100 rows into tbl1
+$node->safe_psql('postgres', q(
+    INSERT INTO tbl1 SELECT i, i % 100 FROM generate_series(1,100) i
+));
+
+# Insert 100 rows into tbl2
+$node->safe_psql('postgres', q(
+    INSERT INTO tbl2 SELECT i, i % 100 FROM generate_series(1,100) i
+));
+
+
+# Create a helper function for logging mismatches
+$node->safe_psql('postgres', q(
+	CREATE OR REPLACE FUNCTION log_raise(i int, j1 int, j2 int) RETURNS VOID AS $$
+	BEGIN
+	  RAISE NOTICE 'ERROR i=% j1=% j2=%', i, j1, j2;
+	END;$$ LANGUAGE plpgsql;
+));
+
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_repack START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_repack');));
+
+
+$node->pgbench(
+'--no-vacuum --client=10 --jobs=4 --exit-on-abort --transactions=25000',
+0,
+[qr{actually processed}],
+[qr{^$}],
+'concurrent operations with REPACK CONCURRENTLY',
+{
+	'concurrent_ops' => q(
+		SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+		\if :gotlock
+			SELECT nextval('in_row_repack') AS last_value \gset
+			\if :last_value = 2
+				REPACK (CONCURRENTLY) tbl2 USING INDEX tbl2_pkey;
+				\sleep 10 ms
+			\endif
+			SELECT pg_advisory_unlock(42);
+		\else
+			\set num random(1, 100)
+			BEGIN;
+			UPDATE tbl1 SET j = j + 1 WHERE i = :num;
+			UPDATE tbl2 SET j = j + 1 WHERE i = :num;
+			COMMIT;
+			SELECT setval('in_row_repack', 1);
+
+			SELECT COUNT(*) AS suspect FROM (SELECT * FROM tbl1 LEFT OUTER JOIN tbl2 ON tbl1.i = tbl2.i WHERE tbl1.j != tbl2.j) as X  \gset p_
+			\if :p_suspect != 0
+				SELECT pg_advisory_lock(42); -- make sure there is no concurrent repack running
+				SELECT COUNT(*) AS fatal FROM (SELECT log_raise(tbl1.i, tbl1.j, tbl2.j) FROM tbl1 LEFT OUTER JOIN tbl2 ON tbl1.i = tbl2.i WHERE tbl1.j != tbl2.j) as X  \gset p_
+				\if :p_fatal != 0
+					SELECT :p_fatal / 0;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\endif
+		\endif
+	)
+});
+
+$node->stop;
+done_testing();
Index: contrib/amcheck/meson.build
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/contrib/amcheck/meson.build b/contrib/amcheck/meson.build
--- a/contrib/amcheck/meson.build	(revision 911e274d320364685721d5d84d8013f2f510e5aa)
+++ b/contrib/amcheck/meson.build	(date 1756649361369)
@@ -50,6 +50,7 @@
       't/004_verify_nbtree_unique.pl',
       't/005_pitr.pl',
       't/006_verify_gin.pl',
+      't/007_repack_concurrently_mvcc.pl',
     ],
   },
 }
#28Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Alvaro Herrera (#25)
Re: Adding REPACK [concurrently]

Hello, Álvaro!

Alvaro Herrera <alvherre@alvh.no-ip.org>:

> If you want to post secondary patches, please rename them to end in
> something like .txt or .nocfbot or whatever. See here:
> https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches?

Sorry, I missed that.
But now it is possible to send ".patch" without changing the extension [0].

It also ignores any files that start with "nocfbot".

[0]: https://discord.com/channels/1258108670710124574/1328362897189113867/1412021226528051250

#29Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#27)
Re: Adding REPACK [concurrently]

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> Antonin Houska <ah@cybertec.at>:
>
> > Are you sure the test is complete? I see no occurrence of the REPACK command
> > in it.
>
> Oops, I sent an invalid file. The correct one is attached.

Thanks!

The problem was that when removing the original "preserve visibility patch"
v12-0005 [1] from the series, I forgot to change the value of the
'need_full_snapshot' argument of CreateInitDecodingContext().

v12 and earlier treated the repacked table like a system catalog, so it was
o.k. to pass need_full_snapshot=false. However, it must be true now, otherwise
the snapshot created for the initial copy does not see commits of transactions
that do not change regular catalogs.
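
To illustrate, the call site then needs to pass true for that argument -
roughly like this (a minimal sketch modeled on the existing
CreateInitDecodingContext() call in src/backend/replication/slotfuncs.c;
the surrounding REPACK code may differ):

	ctx = CreateInitDecodingContext(plugin, NIL,
									true,	/* need_full_snapshot */
									InvalidXLogRecPtr,
									XL_ROUTINE(.page_read = read_local_xlog_page,
											   .segment_open = wal_segment_open,
											   .segment_close = wal_segment_close),
									NULL, NULL, NULL);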

The fix is as simple as

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index f481a3cec6d..7866ac01278 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -502,6 +502,7 @@ SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
 	StringInfo	buf = makeStringInfo();
 
 	Assert(builder->state == SNAPBUILD_CONSISTENT);
+	Assert(builder->building_full_snapshot);
 
 	snap = SnapBuildBuildSnapshot(builder);

I'll apply it to the next version of the "Add CONCURRENTLY option to REPACK
command" patch.

[1]: /messages/by-id/CAFj8pRDK89FtY_yyGw7-MW-zTaHOCY4m6qfLRittdoPocz+dMQ@mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#30Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#29)
6 attachment(s)
Re: Adding REPACK [concurrently]

Hello!

Antonin Houska <ah@cybertec.at>:

> I'll apply it to the next version of the "Add CONCURRENTLY option to REPACK
> command" patch.

I have added it to the v21 patchset.

Also, I’ve updated the MVCC-safe patch:
* it uses the "XactLockTableWait before replay + SnapshotSelf" approach from [0] (sketched below)
* it includes a TAP test to ensure MVCC safety - not intended to be
committed in its current form (too heavy)
* documentation has been updated.

It's now much simpler and does not negatively impact performance. It
is less aggressive in tuple freezing, but can be updated to match the
non-MVCC-safe version if needed.
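
For readers skimming the thread, the core of that approach boils down to
the following (condensed from the attached v21-0006 patch, not verbatim):

	/*
	 * Before replaying a decoded change, make sure its originating
	 * transaction is no longer seen as in-progress.  The xid has
	 * certainly committed (the change came from the reorder buffer),
	 * but the procarray may lag behind.
	 */
	if (TransactionIdIsInProgress(change.xid))
	{
		XactLockTableWait(change.xid, NULL, NULL, XLTW_None);
		Assert(TransactionIdDidCommit(change.xid));
	}

	/*
	 * Look up the tuple to be updated or deleted with SnapshotSelf, so
	 * that the last live member of a HOT chain is found; the replayed
	 * change is then stamped with the original XID.
	 */
	tup_exist = find_target_tuple(rel, key, nkeys, tup_key, SnapshotSelf,
								  iistate, ident_slot, &ind_scan);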

While testing the MVCC-safe version with the stress test
007_repack_concurrently_mvcc.pl, I encountered some random crashes with
logs like this:

2025-09-02 12:24:40.039 CEST client backend[261907] 007_repack_concurrently_mvcc.pl ERROR: relcache reference 0x7715b9f394a8 is not owned by resource owner TopTransaction
2025-09-02 12:24:40.039 CEST client backend[261907] 007_repack_concurrently_mvcc.pl STATEMENT: REPACK (CONCURRENTLY) tbl1 USING INDEX tbl1_pkey;
TRAP: failed Assert("rel->rd_refcnt > 0"), File: "../src/backend/utils/cache/relcache.c", Line: 6992, PID: 261907
postgres: CIC_test: nkey postgres [local] REPACK(ExceptionalCondition+0xbe)[0x5b7ac41d79f9]
postgres: CIC_test: nkey postgres [local] REPACK(+0x852d2e)[0x5b7ac41cbd2e]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aa4a6)[0x5b7ac42234a6]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aad3b)[0x5b7ac4223d3b]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aac69)[0x5b7ac4223c69]
postgres: CIC_test: nkey postgres [local] REPACK(ResourceOwnerRelease+0x32)[0x5b7ac4223c26]
postgres: CIC_test: nkey postgres [local] REPACK(+0x1f43bf)[0x5b7ac3b6d3bf]
postgres: CIC_test: nkey postgres [local] REPACK(+0x1f4dfa)[0x5b7ac3b6ddfa]
postgres: CIC_test: nkey postgres [local] REPACK(AbortCurrentTransaction+0xe)[0x5b7ac3b6dd6b]
postgres: CIC_test: nkey postgres [local] REPACK(PostgresMain+0x57d)[0x5b7ac3fd7238]
postgres: CIC_test: nkey postgres [local] REPACK(+0x654102)[0x5b7ac3fcd102]
postgres: CIC_test: nkey postgres [local] REPACK(postmaster_child_launch+0x191)[0x5b7ac3eceb7a]
postgres: CIC_test: nkey postgres [local] REPACK(+0x55c8c1)[0x5b7ac3ed58c1]
postgres: CIC_test: nkey postgres [local] REPACK(+0x559d1e)[0x5b7ac3ed2d1e]
postgres: CIC_test: nkey postgres [local] REPACK(PostmasterMain+0x168a)[0x5b7ac3ed25f8]
postgres: CIC_test: nkey postgres [local] REPACK(main+0x3a1)[0x5b7ac3da2bd6]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7715b962a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7715b962a28b]

This time I was clever and first tried to reproduce the issue on the
non-MVCC-safe version - and it is reproducible there as well.

Just comment out \if :p_t1 != :p_t2 (and its contents, because they
catch the non-MVCC behaviour, which is expected without the 0006 patch)
and set
'--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=25000'

It takes about a minute on my PC to get the crash.

[0]: /messages/by-id/CADzfLwXCTXNdxK-XGTKmObvT=_QnaCviwgrcGtG9chsj5sYzrg@mail.gmail.com

Best regards,
Mikhail.

Attachments:

v21-0006-Preserve-visibility-information-of-the-concurren.patch (application/octet-stream)
From 946862e2a4dbfd91ac6802c2e8da104dce81c43a Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
Date: Tue, 2 Sep 2025 11:30:55 +0200
Subject: [PATCH v21 6/6] Preserve visibility information of the concurrent 
 data changes.

As explained in the commit message of the preceding patch of the series, the
data changes done by applications while REPACK CONCURRENTLY is copying the
table contents to a new file are decoded from WAL and eventually also applied
to the new file. To reduce the complexity a little bit, the preceding patch
uses the current transaction (i.e. transaction opened by the REPACK command)
to execute those INSERT, UPDATE and DELETE commands.

However, REPACK is not expected to change visibility of tuples. Therefore,
this patch fixes the handling of the "concurrent data changes". It ensures
that tuples written into the new table have the same XID and command ID (CID)
as they had in the old table.

To "replay" an UPDATE or DELETE command on the new table, we use SnapshotSelf to find the last alive version of tuple and update with stamp with xid of original transaction. It is safe because:
* all transactions we replaying are committed
* apply worker working without any concurrent modifiers of the table

As long as we preserve the tuple visibility information (which includes XID),
it's important to avoid logical decoding of the WAL generated by DMLs on the
new table: the logical decoding subsystem probably does not expect that the
incoming WAL records contain XIDs of already decoded transactions. (And of
course, repeated decoding would be wasted effort.)

Author: Antonin Houska <ah@cybertec.at> with changes from Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
---
 contrib/amcheck/meson.build                   |   1 +
 .../amcheck/t/007_repack_concurrently_mvcc.pl | 113 ++++++++++++++++++
 doc/src/sgml/mvcc.sgml                        |  12 +-
 doc/src/sgml/ref/repack.sgml                  |   9 --
 src/backend/access/common/toast_internals.c   |   3 +-
 src/backend/access/heap/heapam.c              |  46 ++++---
 src/backend/access/heap/heapam_handler.c      |  24 ++--
 src/backend/commands/cluster.c                |  85 +++++++++----
 .../pgoutput_repack/pgoutput_repack.c         |  18 +--
 src/include/access/heapam.h                   |  12 +-
 src/include/commands/cluster.h                |   2 +
 .../injection_points/specs/repack.spec        |   4 -
 12 files changed, 249 insertions(+), 80 deletions(-)
 create mode 100644 contrib/amcheck/t/007_repack_concurrently_mvcc.pl

diff --git a/contrib/amcheck/meson.build b/contrib/amcheck/meson.build
index 1f0c347ed54..d07d6ed3f0c 100644
--- a/contrib/amcheck/meson.build
+++ b/contrib/amcheck/meson.build
@@ -50,6 +50,7 @@ tests += {
       't/004_verify_nbtree_unique.pl',
       't/005_pitr.pl',
       't/006_verify_gin.pl',
+      't/007_repack_concurrently_mvcc.pl',
     ],
   },
 }
diff --git a/contrib/amcheck/t/007_repack_concurrently_mvcc.pl b/contrib/amcheck/t/007_repack_concurrently_mvcc.pl
new file mode 100644
index 00000000000..a83fd5b8141
--- /dev/null
+++ b/contrib/amcheck/t/007_repack_concurrently_mvcc.pl
@@ -0,0 +1,113 @@
+
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+# Test REPACK CONCURRENTLY with concurrent modifications
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+my $node;
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('CIC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf(
+	'postgresql.conf', qq(
+wal_level = logical
+));
+$node->start;
+$node->safe_psql('postgres', q(CREATE TABLE tbl1(i int PRIMARY KEY, j int)));
+$node->safe_psql('postgres', q(CREATE TABLE tbl2(i int PRIMARY KEY, j int)));
+
+
+# Insert 100 rows into tbl1
+$node->safe_psql('postgres', q(
+    INSERT INTO tbl1 SELECT i, i % 100 FROM generate_series(1,100) i
+));
+
+# Insert 100 rows into tbl2
+$node->safe_psql('postgres', q(
+    INSERT INTO tbl2 SELECT i, i % 100 FROM generate_series(1,100) i
+));
+
+
+# Create a helper function for logging mismatches
+$node->safe_psql('postgres', q(
+	CREATE OR REPLACE FUNCTION log_raise(i int, j1 int, j2 int) RETURNS VOID AS $$
+	BEGIN
+	  RAISE NOTICE 'ERROR i=% j1=% j2=%', i, j1, j2;
+	END;$$ LANGUAGE plpgsql;
+));
+
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+
+$node->pgbench(
+'--no-vacuum --client=10 --jobs=4 --exit-on-abort --transactions=2500',
+0,
+[qr{actually processed}],
+[qr{^$}],
+'concurrent operations with REPACK CONCURRENTLY',
+{
+	'concurrent_ops' => q(
+		SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+		\if :gotlock
+			SELECT nextval('in_row_rebuild') AS last_value \gset
+			\if :last_value = 2
+				REPACK (CONCURRENTLY) tbl1 USING INDEX tbl1_pkey;
+				\sleep 10 ms
+				REPACK (CONCURRENTLY) tbl2 USING INDEX tbl2_pkey;
+				\sleep 10 ms
+			\endif
+			SELECT pg_advisory_unlock(42);
+		\else
+			\set num random(1, 100)
+			BEGIN;
+			UPDATE tbl1 SET j = j + 1 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl1 SET j = j + 2 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl1 SET j = j + 3 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl1 SET j = j + 4 WHERE i = :num;
+			\sleep 1 ms
+
+			UPDATE tbl2 SET j = j + 1 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl2 SET j = j + 2 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl2 SET j = j + 3 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl2 SET j = j + 4 WHERE i = :num;
+
+			COMMIT;
+			SELECT setval('in_row_rebuild', 1);
+
+			BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+			SELECT COALESCE(SUM(j), 0) AS t1 FROM tbl1 WHERE i = :num \gset p_
+			\sleep 10 ms
+			SELECT COALESCE(SUM(j), 0) AS t2 FROM tbl2 WHERE i = :num \gset p_
+			\if :p_t1 != :p_t2
+				COMMIT;
+				SELECT log_raise(tbl1.i, tbl1.j, tbl2.j) FROM tbl1 LEFT OUTER JOIN tbl2 ON tbl1.i = tbl2.i WHERE tbl1.j != tbl2.j;
+				\sleep 10 ms
+				SELECT log_raise(tbl1.i, tbl1.j, tbl2.j) FROM tbl1 LEFT OUTER JOIN tbl2 ON tbl1.i = tbl2.i WHERE tbl1.j != tbl2.j;
+				SELECT (:p_t1 + :p_t2) / 0;
+			\endif
+
+			COMMIT;
+		\endif
+	)
+});
+
+$node->stop;
+done_testing();
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 0f5c34af542..049ee75a4ba 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,17 +1833,15 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
-    TABLE</command></link> and <command>REPACK</command> with
-    the <literal>CONCURRENTLY</literal> option, are not
+    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the command committed.  This will only be an
+    snapshot taken before the DDL command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the command started &mdash; any transaction that has done so
+    before the DDL command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the truncating or rewriting command until that transaction completes.
+    which would block the DDL command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index ff5ce48de55..271923a5a60 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -292,15 +292,6 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
        </listitem>
       </itemizedlist>
      </para>
-
-     <warning>
-      <para>
-       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
-       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
-       details.
-      </para>
-     </warning>
-
     </listitem>
    </varlistentry>
 
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index a1d0eed8953..586eb42a137 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -320,7 +320,8 @@ toast_save_datum(Relation rel, Datum value,
 		memcpy(VARDATA(&chunk_data), data_p, chunk_size);
 		toasttup = heap_form_tuple(toasttupDesc, t_values, t_isnull);
 
-		heap_insert(toastrel, toasttup, mycid, options, NULL);
+		heap_insert(toastrel, toasttup, GetCurrentTransactionId(), mycid,
+					options, NULL);
 
 		/*
 		 * Create the index entry.  We cheat a little here by not using
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f9a4fe3faed..45da5902de0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2070,7 +2070,7 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
 /*
  *	heap_insert		- insert tuple into a heap
  *
- * The new tuple is stamped with current transaction ID and the specified
+ * The new tuple is stamped with specified transaction ID and the specified
  * command ID.
  *
  * See table_tuple_insert for comments about most of the input flags, except
@@ -2086,15 +2086,16 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * reflected into *tup.
  */
 void
-heap_insert(Relation relation, HeapTuple tup, CommandId cid,
-			int options, BulkInsertState bistate)
+heap_insert(Relation relation, HeapTuple tup, TransactionId xid,
+			CommandId cid, int options, BulkInsertState bistate)
 {
-	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
+	Assert(TransactionIdIsValid(xid));
+
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(tup->t_data) <=
 		   RelationGetNumberOfAttributes(relation));
@@ -2176,8 +2177,15 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		/*
 		 * If this is a catalog, we need to transmit combo CIDs to properly
 		 * decode, so log that as well.
+		 *
+		 * HEAP_INSERT_NO_LOGICAL should be set when applying data changes
+		 * done by other transactions during REPACK CONCURRENTLY. In such a
+		 * case, the insertion should not be decoded at all - see
+		 * heap_decode(). (It's also set by raw_heap_insert() for TOAST, but
+		 * TOAST does not pass this test anyway.)
 		 */
-		if (RelationIsAccessibleInLogicalDecoding(relation))
+		if ((options & HEAP_INSERT_NO_LOGICAL) == 0 &&
+			RelationIsAccessibleInLogicalDecoding(relation))
 			log_heap_new_cid(relation, heaptup);
 
 		/*
@@ -2723,7 +2731,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 void
 simple_heap_insert(Relation relation, HeapTuple tup)
 {
-	heap_insert(relation, tup, GetCurrentCommandId(true), 0, NULL);
+	heap_insert(relation, tup, GetCurrentTransactionId(),
+				GetCurrentCommandId(true), 0, NULL);
 }
 
 /*
@@ -2780,11 +2789,11 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
  */
 TM_Result
 heap_delete(Relation relation, ItemPointer tid,
-			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
+			TransactionId xid, CommandId cid, Snapshot crosscheck, bool wait,
+			TM_FailureData *tmfd, bool changingPart,
+			bool wal_logical)
 {
 	TM_Result	result;
-	TransactionId xid = GetCurrentTransactionId();
 	ItemId		lp;
 	HeapTupleData tp;
 	Page		page;
@@ -2801,6 +2810,7 @@ heap_delete(Relation relation, ItemPointer tid,
 	bool		old_key_copied = false;
 
 	Assert(ItemPointerIsValid(tid));
+	Assert(TransactionIdIsValid(xid));
 
 	AssertHasSnapshotForToast(relation);
 
@@ -3217,7 +3227,7 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 	TM_Result	result;
 	TM_FailureData tmfd;
 
-	result = heap_delete(relation, tid,
+	result = heap_delete(relation, tid, GetCurrentTransactionId(),
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
 						 &tmfd, false,	/* changingPart */
@@ -3260,12 +3270,11 @@ simple_heap_delete(Relation relation, ItemPointer tid)
  */
 TM_Result
 heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
-			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, LockTupleMode *lockmode,
+			TransactionId xid, CommandId cid, Snapshot crosscheck,
+			bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
 			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
-	TransactionId xid = GetCurrentTransactionId();
 	Bitmapset  *hot_attrs;
 	Bitmapset  *sum_attrs;
 	Bitmapset  *key_attrs;
@@ -3305,6 +3314,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				infomask2_new_tuple;
 
 	Assert(ItemPointerIsValid(otid));
+	Assert(TransactionIdIsValid(xid));
 
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
@@ -4144,8 +4154,12 @@ l2:
 		/*
 		 * For logical decoding we need combo CIDs to properly decode the
 		 * catalog.
+		 *
+		 * Like in heap_insert(), visibility is unchanged when called from
+		 * VACUUM FULL / CLUSTER.
 		 */
-		if (RelationIsAccessibleInLogicalDecoding(relation))
+		if (wal_logical &&
+			RelationIsAccessibleInLogicalDecoding(relation))
 		{
 			log_heap_new_cid(relation, &oldtup);
 			log_heap_new_cid(relation, heaptup);
@@ -4511,7 +4525,7 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 	TM_FailureData tmfd;
 	LockTupleMode lockmode;
 
-	result = heap_update(relation, otid, tup,
+	result = heap_update(relation, otid, tup, GetCurrentTransactionId(),
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
 						 &tmfd, &lockmode, update_indexes,
@@ -5351,8 +5365,6 @@ compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 	uint16		new_infomask,
 				new_infomask2;
 
-	Assert(TransactionIdIsCurrentTransactionId(add_to_xmax));
-
 l5:
 	new_infomask = 0;
 	new_infomask2 = 0;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d03084768e0..6733e5fdda6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -253,7 +253,8 @@ heapam_tuple_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
-	heap_insert(relation, tuple, cid, options, bistate);
+	heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+				bistate);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	if (shouldFree)
@@ -276,7 +277,8 @@ heapam_tuple_insert_speculative(Relation relation, TupleTableSlot *slot,
 	options |= HEAP_INSERT_SPECULATIVE;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
-	heap_insert(relation, tuple, cid, options, bistate);
+	heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+				bistate);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	if (shouldFree)
@@ -310,8 +312,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
-					   true);
+	return heap_delete(relation, tid, GetCurrentTransactionId(), cid,
+					   crosscheck, wait, tmfd, changingPart, true);
 }
 
 
@@ -329,7 +331,8 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	slot->tts_tableOid = RelationGetRelid(relation);
 	tuple->t_tableOid = slot->tts_tableOid;
 
-	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
+	result = heap_update(relation, otid, tuple, GetCurrentTransactionId(),
+						 cid, crosscheck, wait,
 						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
@@ -2477,9 +2480,16 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
 		 * the relation files, it drops this relation, so no logical
 		 * replication subscription should need the data.
+		 *
+		 * It is also crucial to stamp the new record with the exact same xid
+		 * and cid, because the tuple must be visible to the snapshots of the
+		 * concurrent transactions later.
 		 */
-		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
-					HEAP_INSERT_NO_LOGICAL, NULL);
+		// TODO: looks like cid is not required
+		CommandId	cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
+		TransactionId xid = HeapTupleHeaderGetXmin(tuple->t_data);
+
+		heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);
 	}
 
 	heap_freetuple(copiedTuple);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 61224a3adf2..936cb0ae429 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -55,6 +55,7 @@
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/procarray.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
@@ -146,6 +147,7 @@ static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
 									ConcurrentChange *change);
 static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
 								   HeapTuple tup_key,
+								   Snapshot snapshot,
 								   IndexInsertState *iistate,
 								   TupleTableSlot *ident_slot,
 								   IndexScanDesc *scan_p);
@@ -1008,7 +1010,14 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 
 	/* The historic snapshot won't be needed anymore. */
 	if (snapshot)
+	{
+		TransactionId xmin = snapshot->xmin;
 		PopActiveSnapshot();
+		Assert(concurrent);
+		// TODO: seems like it is not required; need to check SnapBuildInitialSnapshotForRepack
+		WaitForOlderSnapshots(xmin, false);
+	}
+
 
 	if (concurrent)
 	{
@@ -1299,30 +1308,35 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 	 * not to be aggressive about this.
 	 */
 	memset(&params, 0, sizeof(VacuumParams));
-	vacuum_get_cutoffs(OldHeap, params, &cutoffs);
-
-	/*
-	 * FreezeXid will become the table's new relfrozenxid, and that mustn't go
-	 * backwards, so take the max.
-	 */
+	if (!concurrent)
 	{
 		TransactionId relfrozenxid = OldHeap->rd_rel->relfrozenxid;
+		MultiXactId relminmxid = OldHeap->rd_rel->relminmxid;
 
+		vacuum_get_cutoffs(OldHeap, params, &cutoffs);
+		/*
+		 * FreezeXid will become the table's new relfrozenxid, and that mustn't go
+		 * backwards, so take the max.
+		 */
 		if (TransactionIdIsValid(relfrozenxid) &&
 			TransactionIdPrecedes(cutoffs.FreezeLimit, relfrozenxid))
 			cutoffs.FreezeLimit = relfrozenxid;
-	}
-
-	/*
-	 * MultiXactCutoff, similarly, shouldn't go backwards either.
-	 */
-	{
-		MultiXactId relminmxid = OldHeap->rd_rel->relminmxid;
-
+		/*
+		 * MultiXactCutoff, similarly, shouldn't go backwards either.
+		 */
 		if (MultiXactIdIsValid(relminmxid) &&
 			MultiXactIdPrecedes(cutoffs.MultiXactCutoff, relminmxid))
 			cutoffs.MultiXactCutoff = relminmxid;
 	}
+	else
+	{
+		/*
+		 * In concurrent mode we reuse all the existing xmin/xmax values,
+		 * so just keep the current cutoffs for simplicity.
+		 */
+		cutoffs.FreezeLimit = OldHeap->rd_rel->relfrozenxid;
+		cutoffs.MultiXactCutoff = OldHeap->rd_rel->relminmxid;
+	}
 
 	/*
 	 * Decide whether to use an indexscan or seqscan-and-optional-sort to scan
@@ -2675,6 +2689,16 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 			continue;
 		}
 
+		if (TransactionIdIsInProgress(change.xid))
+		{
+			/* The xid has certainly committed, because we got the change from
+			 * the reorder buffer. But the procarray may not be updated yet, so
+			 * this backend can still see it as in-progress. Wait it out. */
+			XactLockTableWait(change.xid, NULL, NULL, XLTW_None);
+			Assert(!TransactionIdIsInProgress(change.xid));
+			Assert(TransactionIdDidCommit(change.xid));
+		}
+
 		/*
 		 * Extract the tuple from the change. The tuple is copied here because
 		 * it might be assigned to 'tup_old', in which case it needs to
@@ -2712,9 +2736,13 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 			}
 
 			/*
-			 * Find the tuple to be updated or deleted.
+			 * Find the tuple to be updated or deleted using SnapshotSelf.
+			 * That way we get the last live version in case of a HOT chain.
+			 * It is guaranteed that there is no not-yet-committed updated
+			 * version, because we are replaying changes of committed
+			 * transactions only, with no concurrent writers involved.
 			 */
-			tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+			tup_exist = find_target_tuple(rel, key, nkeys, tup_key, SnapshotSelf,
 										  iistate, ident_slot, &ind_scan);
 			if (tup_exist == NULL)
 				elog(ERROR, "Failed to find target tuple");
@@ -2743,6 +2771,7 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 		 */
 		if (change.kind != CHANGE_UPDATE_OLD)
 		{
+			// TODO: not sure this is required at all: we are replaying committed transactions, stamping tuples with the committed XID
 			CommandCounterIncrement();
 			UpdateActiveSnapshotCommandId();
 		}
@@ -2771,9 +2800,11 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 	 * Like simple_heap_insert(), but make sure that the INSERT is not
 	 * logically decoded - see reform_and_rewrite_tuple() for more
 	 * information.
+	 *
+	 * Use already committed xid to stamp the tuple.
 	 */
-	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
-				NULL);
+	heap_insert(rel, tup, change->xid, GetCurrentCommandId(true),
+				HEAP_INSERT_NO_LOGICAL, NULL);
 
 	/*
 	 * Update indexes.
@@ -2781,6 +2812,7 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 	 * In case functions in the index need the active snapshot and caller
 	 * hasn't set one.
 	 */
+	PushActiveSnapshot(GetLatestSnapshot());
 	ExecStoreHeapTuple(tup, index_slot, false);
 	recheck = ExecInsertIndexTuples(iistate->rri,
 									index_slot,
@@ -2791,6 +2823,7 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 									NIL,	/* arbiterIndexes */
 									false	/* onlySummarizing */
 		);
+	PopActiveSnapshot();
 
 	/*
 	 * If recheck is required, it must have been preformed on the source
@@ -2819,9 +2852,11 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 	 *
 	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
 	 * except for 'wait').
+	 *
+	 * Use already committed xid to stamp the tuple.
 	 */
 	res = heap_update(rel, &tup_target->t_self, tup,
-					  GetCurrentCommandId(true),
+					  change->xid, GetCurrentCommandId(true),
 					  InvalidSnapshot,
 					  false,	/* no wait - only we are doing changes */
 					  &tmfd, &lockmode, &update_indexes,
@@ -2833,6 +2868,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 
 	if (update_indexes != TU_None)
 	{
+		PushActiveSnapshot(GetLatestSnapshot());
 		recheck = ExecInsertIndexTuples(iistate->rri,
 										index_slot,
 										iistate->estate,
@@ -2842,6 +2878,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 										NIL,	/* arbiterIndexes */
 		/* onlySummarizing */
 										update_indexes == TU_Summarizing);
+		PopActiveSnapshot();
 		list_free(recheck);
 	}
 
@@ -2860,9 +2897,11 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
 	 *
 	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
 	 * except for 'wait').
+	 *
+	 * Use already committed xid to stamp the tuple.
 	 */
-	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
-					  InvalidSnapshot, false,
+	res = heap_delete(rel, &tup_target->t_self, change->xid,
+					  GetCurrentCommandId(true), InvalidSnapshot, false,
 					  &tmfd,
 					  false,	/* no wait - only we are doing changes */
 					  false /* wal_logical */ );
@@ -2886,7 +2925,7 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
  */
 static HeapTuple
 find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
-				  IndexInsertState *iistate,
+				  Snapshot snapshot, IndexInsertState *iistate,
 				  TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
 {
 	IndexScanDesc scan;
@@ -2895,7 +2934,7 @@ find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
 	HeapTuple	result = NULL;
 
 	/* XXX no instrumentation for now */
-	scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+	scan = index_beginscan(rel, iistate->ident_index, snapshot,
 						   NULL, nkeys, 0);
 	*scan_p = scan;
 	index_rescan(scan, key, nkeys, NULL, 0);
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
index 687fbbc59bb..020ff7b7c80 100644
--- a/src/backend/replication/pgoutput_repack/pgoutput_repack.c
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -32,7 +32,8 @@ static void plugin_truncate(struct LogicalDecodingContext *ctx,
 							Relation relations[],
 							ReorderBufferChange *change);
 static void store_change(LogicalDecodingContext *ctx,
-						 ConcurrentChangeKind kind, HeapTuple tuple);
+						 ConcurrentChangeKind kind, HeapTuple tuple,
+						 TransactionId xid);
 
 void
 _PG_output_plugin_init(OutputPluginCallbacks *cb)
@@ -124,7 +125,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (newtuple == NULL)
 					elog(ERROR, "Incomplete insert info.");
 
-				store_change(ctx, CHANGE_INSERT, newtuple);
+				store_change(ctx, CHANGE_INSERT, newtuple, change->txn->xid);
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_UPDATE:
@@ -141,9 +142,11 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 					elog(ERROR, "Incomplete update info.");
 
 				if (oldtuple != NULL)
-					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple,
+								 change->txn->xid);
 
-				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple,
+							 change->txn->xid);
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_DELETE:
@@ -156,7 +159,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (oldtuple == NULL)
 					elog(ERROR, "Incomplete delete info.");
 
-				store_change(ctx, CHANGE_DELETE, oldtuple);
+				store_change(ctx, CHANGE_DELETE, oldtuple, change->txn->xid);
 			}
 			break;
 		default:
@@ -190,13 +193,13 @@ plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	if (i == nrelations)
 		return;
 
-	store_change(ctx, CHANGE_TRUNCATE, NULL);
+	store_change(ctx, CHANGE_TRUNCATE, NULL, InvalidTransactionId);
 }
 
 /* Store concurrent data change. */
 static void
 store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
-			 HeapTuple tuple)
+			 HeapTuple tuple, TransactionId xid)
 {
 	RepackDecodingState *dstate;
 	char	   *change_raw;
@@ -266,6 +269,7 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 	dst = dst_start + SizeOfConcurrentChange;
 	memcpy(dst, tuple->t_data, tuple->t_len);
 
+	change.xid = xid;
 	/* The data has been copied. */
 	if (flattened)
 		pfree(tuple);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b82dd17a966..981425f23b6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -316,22 +316,24 @@ extern BulkInsertState GetBulkInsertState(void);
 extern void FreeBulkInsertState(BulkInsertState);
 extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
-extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
-						int options, BulkInsertState bistate);
+extern void heap_insert(Relation relation, HeapTuple tup, TransactionId xid,
+						CommandId cid, int options, BulkInsertState bistate);
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
-							 CommandId cid, Snapshot crosscheck, bool wait,
+							 TransactionId xid, CommandId cid,
+							 Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart,
 							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, ItemPointer tid);
 extern void heap_abort_speculative(Relation relation, ItemPointer tid);
 extern TM_Result heap_update(Relation relation, ItemPointer otid,
-							 HeapTuple newtup,
+							 HeapTuple newtup, TransactionId xid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes, bool wal_logical);
+							 TU_UpdateIndexes *update_indexes,
+							 bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 4a508c57a50..242f8da770a 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -61,6 +61,8 @@ typedef struct ConcurrentChange
 	/* See the enum above. */
 	ConcurrentChangeKind kind;
 
+	/* Transaction that changes the data. */
+	TransactionId xid;
 	/*
 	 * The actual tuple.
 	 *
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
index 75850334986..3711a7c92b9 100644
--- a/src/test/modules/injection_points/specs/repack.spec
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -86,9 +86,6 @@ step change_new
 # When applying concurrent data changes, we should see the effects of an
 # in-progress subtransaction.
 #
-# XXX Not sure this test is useful now - it was designed for the patch that
-# preserves tuple visibility and which therefore modifies
-# TransactionIdIsCurrentTransactionId().
 step change_subxact1
 {
 	BEGIN;
@@ -103,7 +100,6 @@ step change_subxact1
 # When applying concurrent data changes, we should not see the effects of a
 # rolled back subtransaction.
 #
-# XXX Is this test useful? See above.
 step change_subxact2
 {
 	BEGIN;
-- 
2.43.0

v21-0003-Refactor-index_concurrently_create_copy-for-use-.patch (application/octet-stream)
From 896f4fc90d128f0a8625f47b82b08eb0da145be7 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:31:34 +0200
Subject: [PATCH v21 3/6] Refactor index_concurrently_create_copy() for use
 with REPACK (CONCURRENTLY).

This patch moves the code to index_create_copy() and adds a "concurrently"
parameter so it can be used by REPACK (CONCURRENTLY).

With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of indexes
(including system catalog entries) *before* the actual swap, to reduce the
time AccessExclusiveLock needs to be held for.
---
 src/backend/catalog/index.c | 36 ++++++++++++++++++++++++++++--------
 src/include/catalog/index.h |  3 +++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3063abff9a5..0dee1b1a9d8 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1290,15 +1290,31 @@ index_create(Relation heapRelation,
 /*
  * index_concurrently_create_copy
  *
- * Create concurrently an index based on the definition of the one provided by
- * caller.  The index is inserted into catalogs and needs to be built later
- * on.  This is called during concurrent reindex processing.
- *
- * "tablespaceOid" is the tablespace to use for this index.
+ * Variant of index_create_copy(), called during concurrent reindex
+ * processing.
  */
 Oid
 index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							   Oid tablespaceOid, const char *newName)
+{
+	return index_create_copy(heapRelation, oldIndexId, tablespaceOid, newName,
+							 true);
+}
+
+/*
+ * index_create_copy
+ *
+ * Create an index based on the definition of the one provided by caller.  The
+ * index is inserted into catalogs and needs to be built later on.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ *
+ * The actual implementation of index_concurrently_create_copy(), reusable for
+ * other purposes.
+ */
+Oid
+index_create_copy(Relation heapRelation, Oid oldIndexId, Oid tablespaceOid,
+				  const char *newName, bool concurrently)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1317,6 +1333,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	List	   *indexColNames = NIL;
 	List	   *indexExprs = NIL;
 	List	   *indexPreds = NIL;
+	int			flags = 0;
 
 	indexRelation = index_open(oldIndexId, RowExclusiveLock);
 
@@ -1325,9 +1342,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 
 	/*
 	 * Concurrent build of an index with exclusion constraints is not
-	 * supported.
+	 * supported. If !concurrently, ii_ExclusionOps is currently not needed.
 	 */
-	if (oldInfo->ii_ExclusionOps != NULL)
+	if (oldInfo->ii_ExclusionOps != NULL && concurrently)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("concurrent index creation for exclusion constraints is not supported")));
@@ -1435,6 +1452,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 		stattargets[i].isnull = isnull;
 	}
 
+	if (concurrently)
+		flags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+
 	/*
 	 * Now create the new index.
 	 *
@@ -1458,7 +1478,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							  indcoloptions->values,
 							  stattargets,
 							  reloptionsDatum,
-							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT,
+							  flags,
 							  0,
 							  true, /* allow table to be a system catalog? */
 							  false,	/* is_internal? */
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 4daa8bef5ee..063a891351a 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -99,6 +99,9 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid oldIndexId,
 										   Oid tablespaceOid,
 										   const char *newName);
+extern Oid	index_create_copy(Relation heapRelation, Oid oldIndexId,
+							  Oid tablespaceOid, const char *newName,
+							  bool concurrently);
 
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
-- 
2.43.0

v21-0005-Add-CONCURRENTLY-option-to-REPACK-command.patch (application/octet-stream)
From a9411b077bc121215b230556be5a114d5effd847 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 30 Aug 2025 19:13:38 +0200
Subject: [PATCH v21 5/6] Add CONCURRENTLY option to REPACK command.

The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock in the beginning and keep it till the end of the processing.)

This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(Supposedly, the swapping should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.

First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.

Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)

Since the logical decoding system, during its startup, waits until all the
transactions which already have XID assigned have finished, there is a risk of
deadlock if a transaction that already changed anything in the database tries
to acquire a conflicting lock on the table REPACK CONCURRENTLY is working
on. As an example, consider a transaction running the CREATE INDEX command
on the table that is being REPACKed CONCURRENTLY. On the other hand, DML
commands (INSERT, UPDATE, DELETE) are not a problem, as their locks do not
conflict with REPACK CONCURRENTLY.

The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before the logical decoding is set up and lock
it again afterwards. Such temporary unlocking would imply re-checking if the
table still meets all the requirements for REPACK CONCURRENTLY.
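
For illustration, the hazard might play out like this (a sketch; the session
interleaving, table and column names are hypothetical):

    -- session 2:
    BEGIN;
    INSERT INTO other_tab VALUES (1);    -- XID gets assigned
    -- session 1:
    REPACK (CONCURRENTLY) tab;           -- decoding startup waits for session 2
    -- session 2:
    CREATE INDEX ON tab (c);             -- waits for session 1's lock: deadlock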

Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires an extra space for the new relation and index files
(which coexist with the old files for some time). In addition, the
CONCURRENTLY option might introduce a lag in releasing WAL segments for
archiving / recycling. This is due to the decoding of the data changes done by
applications concurrently. When copying the table contents into the new file,
we check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table contents have been copied.) A
background worker might be a better approach for the decoding - let's consider
implementing it in the future.
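
While the copying is in progress, the lag can be watched from another session
via the transient replication slot (named repack_<PID> in the current code),
for example:

    SELECT slot_name, restart_lsn
    FROM pg_replication_slots
    WHERE slot_name ~ '^repack_[0-9]+$';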

The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.
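
A minimal usage sketch (the table and index names are made up); progress can
be followed through the pg_stat_progress_repack view as extended below:

    REPACK (CONCURRENTLY) accounts USING INDEX accounts_pkey;

    -- meanwhile, in another session:
    SELECT phase, heap_tuples_inserted, heap_tuples_updated,
           heap_tuples_deleted
    FROM pg_stat_progress_repack;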

Author: Antonin Houska <ah@cybertec.at>
---
 doc/src/sgml/monitoring.sgml                  |   37 +-
 doc/src/sgml/mvcc.sgml                        |   12 +-
 doc/src/sgml/ref/repack.sgml                  |  129 +-
 src/Makefile                                  |    1 +
 src/backend/access/heap/heapam.c              |   34 +-
 src/backend/access/heap/heapam_handler.c      |  219 ++-
 src/backend/access/heap/rewriteheap.c         |    6 +-
 src/backend/access/transam/xact.c             |   11 +-
 src/backend/catalog/system_views.sql          |   30 +-
 src/backend/commands/cluster.c                | 1677 +++++++++++++++--
 src/backend/commands/matview.c                |    2 +-
 src/backend/commands/tablecmds.c              |    1 +
 src/backend/commands/vacuum.c                 |   12 +-
 src/backend/meson.build                       |    1 +
 src/backend/replication/logical/decode.c      |   83 +
 src/backend/replication/logical/snapbuild.c   |   21 +
 .../replication/pgoutput_repack/Makefile      |   32 +
 .../replication/pgoutput_repack/meson.build   |   18 +
 .../pgoutput_repack/pgoutput_repack.c         |  288 +++
 src/backend/storage/ipc/ipci.c                |    1 +
 .../storage/lmgr/generate-lwlocknames.pl      |    2 +-
 src/backend/utils/cache/relcache.c            |    1 +
 src/backend/utils/time/snapmgr.c              |    3 +-
 src/bin/psql/tab-complete.in.c                |   25 +-
 src/include/access/heapam.h                   |    9 +-
 src/include/access/heapam_xlog.h              |    2 +
 src/include/access/tableam.h                  |   10 +
 src/include/commands/cluster.h                |   91 +-
 src/include/commands/progress.h               |   23 +-
 src/include/replication/snapbuild.h           |    1 +
 src/include/storage/lockdefs.h                |    4 +-
 src/include/utils/snapmgr.h                   |    2 +
 src/test/modules/injection_points/Makefile    |    5 +-
 .../injection_points/expected/repack.out      |  113 ++
 .../modules/injection_points/logical.conf     |    1 +
 src/test/modules/injection_points/meson.build |    4 +
 .../injection_points/specs/repack.spec        |  143 ++
 src/test/regress/expected/rules.out           |   29 +-
 src/tools/pgindent/typedefs.list              |    4 +
 39 files changed, 2816 insertions(+), 271 deletions(-)
 create mode 100644 src/backend/replication/pgoutput_repack/Makefile
 create mode 100644 src/backend/replication/pgoutput_repack/meson.build
 create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
 create mode 100644 src/test/modules/injection_points/expected/repack.out
 create mode 100644 src/test/modules/injection_points/logical.conf
 create mode 100644 src/test/modules/injection_points/specs/repack.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 12e103d319d..61c0197555f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6074,14 +6074,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+       <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of heap tuples written.
+       Number of heap tuples inserted.
        This counter only advances when the phase is
        <literal>seq scanning heap</literal>,
-       <literal>index scanning heap</literal>
-       or <literal>writing new heap</literal>.
+       <literal>index scanning heap</literal>,
+       <literal>writing new heap</literal>
+       or <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples updated.
+       This counter only advances when the phase is <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples deleted.
+       This counter only advances when the phase is <literal>catch-up</literal>.
       </para></entry>
      </row>
 
@@ -6162,6 +6183,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <command>REPACK</command> is currently writing the new heap.
      </entry>
     </row>
+    <row>
+     <entry><literal>catch-up</literal></entry>
+     <entry>
+       <command>REPACK CONCURRENTLY</command> is currently processing the DML
+       commands that other transactions executed during any of the preceding
+       phases.
+     </entry>
+    </row>
     <row>
      <entry><literal>swapping relation files</literal></entry>
      <entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+    TABLE</command></link> and <command>REPACK</command> with
+    the <literal>CONCURRENTLY</literal> option, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the DDL command committed.  This will only be an
+    snapshot taken before the command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the DDL command started &mdash; any transaction that has done so
+    before the command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the DDL command until that transaction completes.
+    which would block the truncating or rewriting command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index fd9d89f8aaa..ff5ce48de55 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -27,6 +27,7 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYSE | ANALYZE
+    CONCURRENTLY
 </synopsis>
  </refsynopsisdiv>
 
@@ -49,7 +50,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    processes every table and materialized view in the current database that
    the current user has the <literal>MAINTAIN</literal> privilege on. This
    form of <command>REPACK</command> cannot be executed inside a transaction
-   block.
+   block.  Also, this form is not allowed if
+   the <literal>CONCURRENTLY</literal> option is used.
   </para>
 
   <para>
@@ -62,7 +64,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
    is acquired on it. This prevents any other database operations (both reads
    and writes) from operating on the table until the <command>REPACK</command>
-   is finished.
+   is finished. If you want to keep the table accessible during the repacking,
+   consider using the <literal>CONCURRENTLY</literal> option.
   </para>
 
   <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -179,6 +182,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>CONCURRENTLY</literal></term>
+    <listitem>
+     <para>
+      Allow other transactions to use the table while it is being repacked.
+     </para>
+
+     <para>
+      Internally, <command>REPACK</command> copies the contents of the table
+      (ignoring dead tuples) into a new file, sorted by the specified index,
+      and also creates a new file for each index. Then it swaps the old and
+      new files for the table and all the indexes, and deletes the old
+      files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+      sure that the old files do not change during the processing because the
+      changes would get lost due to the swap.
+     </para>
+
+     <para>
+      With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+      EXCLUSIVE</literal> lock is only acquired to swap the table and index
+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be pretty short. However, the time might still be noticeable if
+      too many data changes have been done to the table while
+      <command>REPACK</command> was waiting for the lock: those changes must
+      be processed just before the files are swapped, while the
+      <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+     </para>
+
+     <para>
+      Note that <command>REPACK</command> with
+      the <literal>CONCURRENTLY</literal> option does not try to order the
+      rows inserted into the table after the repacking started. Also
+      note that <command>REPACK</command> might fail to complete due to DDL
+      commands executed on the table by other transactions during the
+      repacking.
+     </para>
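+
+     <para>
+      For example (the table and index names below are placeholders):
+<programlisting>
+REPACK (CONCURRENTLY) accounts USING INDEX accounts_pkey;
+</programlisting>
+     </para>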
+
+     <note>
+      <para>
+       In addition to the temporary space requirements explained in
+       <xref linkend="sql-repack-notes-on-resources"/>,
+       the <literal>CONCURRENTLY</literal> option can increase the use of
+       temporary space a bit more. The reason is that other transactions can
+       perform DML operations which cannot be applied to the new file until
+       <command>REPACK</command> has copied all the tuples from the old
+       file. Thus the tuples inserted into the old file during the copying are
+       also stored separately in a temporary file, so they can eventually be
+       applied to the new file.
+      </para>
+
+      <para>
+       Furthermore, the data changes performed during the copying are
+       extracted from <link linkend="wal">write-ahead log</link> (WAL), and
+       this extraction (decoding) only takes place when a certain amount of WAL
+       has been written. Therefore, WAL removal can be delayed by this
+       threshold. Currently the threshold is equal to the value of
+       the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+       configuration parameter.
+      </para>
+     </note>
+
+     <para>
+      The <literal>CONCURRENTLY</literal> option cannot be used in the
+      following cases:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+          The table is <literal>UNLOGGED</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is partitioned.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is a system catalog or a <acronym>TOAST</acronym> table.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REPACK</command> is executed inside a transaction block.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+          configuration parameter is less than <literal>logical</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+         configuration parameter does not allow for creation of an additional
+         replication slot.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <warning>
+      <para>
+       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+       details.
+      </para>
+     </warning>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
 	interfaces \
 	backend/replication/libpqwalreceiver \
 	backend/replication/pgoutput \
+	backend/replication/pgoutput_repack \
 	fe_utils \
 	bin \
 	pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e3e7307ef5f..f9a4fe3faed 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
-								  bool all_visible_cleared, bool new_all_visible_cleared);
+								  bool all_visible_cleared, bool new_all_visible_cleared,
+								  bool wal_logical);
 #ifdef USE_ASSERT_CHECKING
 static void check_lock_if_inplace_updateable_rel(Relation relation,
 												 ItemPointer otid,
@@ -2780,7 +2781,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 TM_Result
 heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -3027,7 +3028,8 @@ l1:
 	 * Compute replica identity tuple before entering the critical section so
 	 * we don't PANIC upon a memory allocation failure.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+	old_key_tuple = wal_logical ?
+		ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
 
 	/*
 	 * If this is the first possibly-multixact-able operation in the current
@@ -3117,6 +3119,15 @@ l1:
 				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
 		}
 
+		/*
+		 * Unlike UPDATE, DELETE is decoded even if there is no old key, so it
+		 * does not help to clear both XLH_DELETE_CONTAINS_OLD_TUPLE and
+		 * XLH_DELETE_CONTAINS_OLD_KEY. Thus we need an extra flag. TODO
+		 * Consider not decoding tuples w/o the old tuple/key instead.
+		 */
+		if (!wal_logical)
+			xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
 		XLogBeginInsert();
 		XLogRegisterData(&xlrec, SizeOfHeapDelete);
 
@@ -3209,7 +3220,8 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false /* changingPart */ );
+						 &tmfd, false,	/* changingPart */
+						 true /* wal_logical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3250,7 +3262,7 @@ TM_Result
 heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
-			TU_UpdateIndexes *update_indexes)
+			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -4143,7 +4155,8 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 wal_logical);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4501,7 +4514,8 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, &lockmode, update_indexes);
+						 &tmfd, &lockmode, update_indexes,
+						 true /* wal_logical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -8842,7 +8856,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool wal_logical)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -8853,7 +8868,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
 	Page		page = BufferGetPage(newbuf);
-	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln) &&
+		wal_logical;
 	bool		init;
 	int			bufflags;
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 79f9de5d760..d03084768e0 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+					   true);
 }
 
 
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
-						 tmfd, lockmode, update_indexes);
+						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	/*
@@ -685,13 +687,15 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
+								 Snapshot snapshot,
+								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
 								 double *tups_recently_dead)
 {
-	RewriteState rwstate;
+	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
 	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
+	bool		concurrent = snapshot != NULL;
+	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	values = (Datum *) palloc(natts * sizeof(Datum));
 	isnull = (bool *) palloc(natts * sizeof(bool));
 
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
-								 *multi_cutoff);
+	/*
+	 * Initialize the rewrite operation.
+	 */
+	if (!concurrent)
+		rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+									 *xid_cutoff, *multi_cutoff);
 
 
 	/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 *
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+	 * the snapshot passed by the caller.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									snapshot ? snapshot : SnapshotAny,
+									NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap,
+									snapshot ? snapshot : SnapshotAny,
+									0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -785,6 +801,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		HeapTuple	tuple;
 		Buffer		buf;
 		bool		isdead;
+		HTSV_Result vis;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -837,70 +854,84 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
 		buf = hslot->buffer;
 
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+		/*
+		 * Regarding CONCURRENTLY, see the comments on MVCC snapshot above.
+		 */
+		if (!concurrent)
 		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				*tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
+			switch ((vis = HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf)))
+			{
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead */
+					isdead = true;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+					*tups_recently_dead += 1;
+					/* fall through */
+				case HEAPTUPLE_LIVE:
+					/* Live or recently dead, must copy it */
+					isdead = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
 
-				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				*tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
+					/*
+					 * As long as we hold exclusive lock on the relation,
+					 * normally the only way to see this is if it was inserted
+					 * earlier in our own transaction.  However, it can happen
+					 * in system catalogs, since we tend to release write lock
+					 * before commit there. Also, there's no exclusive lock
+					 * during concurrent processing. Give a warning if neither
+					 * case applies; but in any case we had better copy it.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+						elog(WARNING, "concurrent insert in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as live */
+					isdead = false;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
 
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+					/*
+					 * Similar situation to INSERT_IN_PROGRESS case.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+						elog(WARNING, "concurrent delete in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as recently dead */
+					*tups_recently_dead += 1;
+					isdead = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					isdead = false; /* keep compiler quiet */
+					break;
+			}
 
-		if (isdead)
-		{
-			*tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
+			if (isdead)
 			{
-				/* A previous recently-dead tuple is now known dead */
 				*tups_vacuumed += 1;
-				*tups_recently_dead -= 1;
+				/* heap rewrite module still needs to see it... */
+				if (rewrite_heap_dead_tuple(rwstate, tuple))
+				{
+					/* A previous recently-dead tuple is now known dead */
+					*tups_vacuumed += 1;
+					*tups_recently_dead -= 1;
+				}
+
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				continue;
 			}
-			continue;
+
+			/*
+			 * In the concurrent case, we have a copy of the tuple, so we
+			 * don't worry whether the source tuple will be deleted / updated
+			 * after we release the lock.
+			 */
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 		}
 
 		*num_tuples += 1;
@@ -919,7 +950,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		{
 			const int	ct_index[] = {
 				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
-				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_INSERTED
 			};
 			int64		ct_val[2];
 
@@ -934,6 +965,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
+
+		/*
+		 * Process the WAL produced by the load, as well as by other
+		 * transactions, so that the replication slot can advance and WAL does
+		 * not pile up. Use wal_segment_size as a threshold so that we do not
+		 * introduce the decoding overhead too often.
+		 *
+		 * Of course, we must not apply the changes until the initial load has
+		 * completed.
+		 *
+		 * Note that our insertions into the new table should not be decoded
+		 * as we (intentionally) do not write the logical decoding specific
+		 * information to WAL.
+		 */
+		if (concurrent)
+		{
+			XLogRecPtr	end_of_wal;
+
+			end_of_wal = GetFlushRecPtr(NULL);
+			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+			{
+				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+				end_of_wal_prev = end_of_wal;
+			}
+		}
 	}
 
 	if (indexScan != NULL)
@@ -977,7 +1033,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
 		}
 
@@ -985,7 +1041,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	}
 
 	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	if (rwstate)
+		end_heap_rewrite(rwstate);
 
 	/* Clean up */
 	pfree(values);
@@ -2376,6 +2433,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
  * SET WITHOUT OIDS.
  *
  * So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate=NULL, use simple_heap_insert() instead of rewriting - in that
+ * case we still need to deform/form the tuple. TODO Shouldn't we rename the
+ * function, as it might not do any rewrite?
  */
 static void
 reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2459,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
 
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	if (rwstate)
+		/* The heap rewrite module does the rest */
+		rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	else
+	{
+		/*
+		 * Insert tuple when processing REPACK CONCURRENTLY.
+		 *
+		 * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+		 * difficult to do the same in the catch-up phase (as the logical
+		 * decoding does not provide us with sufficient visibility
+		 * information). Thus we must use heap_insert() both during the
+		 * catch-up and here.
+		 *
+		 * The following is like simple_heap_insert() except that we pass the
+		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+		 * the relation files, it drops this relation, so no logical
+		 * replication subscription should need the data.
+		 */
+		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+					HEAP_INSERT_NO_LOGICAL, NULL);
+	}
 
 	heap_freetuple(copiedTuple);
 }
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index e6d2b5fced1..6aa2ed214f2 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		int			options = HEAP_INSERT_SKIP_FSM;
 
 		/*
-		 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
-		 * for the TOAST table are not logically decoded.  The main heap is
-		 * WAL-logged as XLOG FPI records, which are not logically decoded.
+		 * While rewriting the heap for REPACK, make sure data for the TOAST
+		 * table are not logically decoded.  The main heap is WAL-logged as
+		 * XLOG FPI records, which are not logically decoded.
 		 */
 		options |= HEAP_INSERT_NO_LOGICAL;
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b46e7e9c2a6..5670f2bfbde 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -215,6 +215,7 @@ typedef struct TransactionStateData
 	bool		parallelChildXact;	/* is any parent transaction parallel? */
 	bool		chain;			/* start a new block after this one */
 	bool		topXidLogged;	/* for a subxact: is top-level XID logged? */
+	bool		internal;		/* for a subxact: launched internally? */
 	struct TransactionStateData *parent;	/* back link to parent */
 } TransactionStateData;
 
@@ -4735,6 +4736,7 @@ BeginInternalSubTransaction(const char *name)
 			/* Normal subtransaction start */
 			PushTransaction();
 			s = CurrentTransactionState;	/* changed by push */
+			s->internal = true;
 
 			/*
 			 * Savepoint names, like the TransactionState block itself, live
@@ -5251,7 +5253,13 @@ AbortSubTransaction(void)
 	LWLockReleaseAll();
 
 	pgstat_report_wait_end();
-	pgstat_progress_end_command();
+
+	/*
+	 * An internal subtransaction might be used by a user command, in which
+	 * case the command outlives the subtransaction.
+	 */
+	if (!s->internal)
+		pgstat_progress_end_command();
 
 	pgaio_error_cleanup();
 
@@ -5468,6 +5476,7 @@ PushTransaction(void)
 	s->parallelModeLevel = 0;
 	s->parallelChildXact = (p->parallelModeLevel != 0 || p->parallelChildXact);
 	s->topXidLogged = false;
+	s->internal = false;
 
 	CurrentTransactionState = s;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b2b7b10c2be..a92ac78ad9e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1266,16 +1266,17 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      -- 5 is 'catch-up', but that should not appear here.
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS cluster_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
@@ -1291,16 +1292,19 @@ CREATE VIEW pg_stat_progress_repack AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      WHEN 5 THEN 'catch-up'
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
-        S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param5 AS heap_tuples_inserted,
+        S.param6 AS heap_tuples_updated,
+        S.param7 AS heap_tuples_deleted,
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8b64f9e6795..61224a3adf2 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -25,6 +25,10 @@
 #include "access/toast_internals.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -32,6 +36,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
@@ -39,15 +44,21 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
+#include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -67,13 +78,45 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * The locators are used to avoid logical decoding of data changes that do
+ * not belong to the table we are processing.
+ */
+RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
+RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+
+	Relation	ident_index;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
-								Oid indexOid, Oid userid, int options);
+								Oid indexOid, Oid userid, LOCKMODE lmode,
+								int options);
+static void check_repack_concurrently_requirements(Relation rel);
 static void rebuild_relation(RepackCommand cmd, bool usingindex,
-							 Relation OldHeap, Relation index, bool verbose);
+							 Relation OldHeap, Relation index, Oid userid,
+							 bool verbose, bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							bool verbose, bool *pSwapToastByContent,
-							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							bool verbose,
+							bool *pSwapToastByContent,
+							TransactionId *pFreezeXid,
+							MultiXactId *pCutoffMulti);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -81,12 +124,61 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 											  Oid relid, bool rel_is_index);
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid,
+													  const char *slotname,
+													  TupleDesc tupdesc);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+									 Relation rel, ScanKey key, int nkeys,
+									 IndexInsertState *iistate);
+static void apply_concurrent_insert(Relation rel, ConcurrentChange *change,
+									HeapTuple tup, IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+									HeapTuple tup_target,
+									ConcurrentChange *change,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+									ConcurrentChange *change);
+static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
+								   HeapTuple tup_key,
+								   IndexInsertState *iistate,
+								   TupleTableSlot *ident_slot,
+								   IndexScanDesc *scan_p);
+static void process_concurrent_changes(LogicalDecodingContext *ctx,
+									   XLogRecPtr end_of_wal,
+									   Relation rel_dst,
+									   Relation rel_src,
+									   ScanKey ident_key,
+									   int ident_key_nentries,
+									   IndexInsertState *iistate);
+static IndexInsertState *get_index_insert_state(Relation relation,
+												Oid ident_index_id);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+								  int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+											   Relation cl_index,
+											   LogicalDecodingContext *ctx,
+											   bool swap_toast_by_content,
+											   TransactionId frozenXid,
+											   MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
+										LOCKMODE lockmode,
+										bool isTopLevel,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
 
 
+#define REPL_PLUGIN_NAME   "pgoutput_repack"
+
 static const char *
 RepackCommandAsString(RepackCommand cmd)
 {
@@ -95,7 +187,7 @@ RepackCommandAsString(RepackCommand cmd)
 		case REPACK_COMMAND_REPACK:
 			return "REPACK";
 		case REPACK_COMMAND_VACUUMFULL:
-			return "VACUUM";
+			return "VACUUM (FULL)";
 		case REPACK_COMMAND_CLUSTER:
 			return "CLUSTER";
 	}
@@ -132,6 +224,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	ClusterParams params = {0};
 	Relation	rel = NULL;
 	MemoryContext repack_context;
+	LOCKMODE	lockmode;
 	List	   *rtcs;
 
 	/* Parse option list */
@@ -142,6 +235,16 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		else if (strcmp(opt->defname, "analyze") == 0 ||
 				 strcmp(opt->defname, "analyse") == 0)
 			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
+		else if (strcmp(opt->defname, "concurrently") == 0 &&
+				 defGetBoolean(opt))
+		{
+			if (stmt->command != REPACK_COMMAND_REPACK)
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("CONCURRENTLY option not supported for %s",
+							   RepackCommandAsString(stmt->command)));
+			params.options |= CLUOPT_CONCURRENT;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -151,13 +254,25 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 					 parser_errposition(pstate, opt->location)));
 	}
 
+	/*
+	 * Determine the lock mode expected by cluster_rel().
+	 *
+	 * In the exclusive case, we obtain AccessExclusiveLock right away to
+	 * avoid lock-upgrade hazard in the single-transaction case. In the
+	 * CONCURRENTLY case, the AccessExclusiveLock will only be used at the end
+	 * of processing, supposedly for very short time. Until then, we'll have
+	 * to unlock the relation temporarily, so there's no lock-upgrade hazard.
+	 */
+	lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+		AccessExclusiveLock : ShareUpdateExclusiveLock;
+
 	/*
 	 * If a single relation is specified, process it and we're done ... unless
 	 * the relation is a partitioned table, in which case we fall through.
 	 */
 	if (stmt->relation != NULL)
 	{
-		rel = process_single_relation(stmt, &params);
+		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
 			return;
 	}
@@ -169,10 +284,29 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 				errmsg("cannot ANALYZE multiple tables"));
 
 	/*
-	 * By here, we know we are in a multi-table situation.  In order to avoid
-	 * holding locks for too long, we want to process each table in its own
-	 * transaction.  This forces us to disallow running inside a user
-	 * transaction block.
+	 * By here, we know we are in a multi-table situation.
+	 *
+	 * Concurrent processing is currently considered rather special (e.g. in
+	 * terms of resources consumed) so it is not performed in bulk.
+	 */
+	if (params.options & CLUOPT_CONCURRENT)
+	{
+		if (rel != NULL)
+		{
+			Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+					errhint("Consider running the command for individual partitions."));
+		}
+		else
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY requires explicit table name"));
+	}
+
+	/*
+	 * In order to avoid holding locks for too long, we want to process each
+	 * table in its own transaction.  This forces us to disallow running
+	 * inside a user transaction block.
 	 */
 	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
@@ -252,7 +386,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		 * Open the target table, coping with the case where it has been
 		 * dropped.
 		 */
-		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		rel = try_table_open(rtc->tableOid, lockmode);
 		if (rel == NULL)
 		{
 			CommitTransactionCommand();
@@ -264,7 +398,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 
 		/* Process this table */
 		cluster_rel(stmt->command, stmt->usingindex,
-					rel, rtc->indexOid, &params);
+					rel, rtc->indexOid, &params, isTopLevel);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
@@ -293,22 +427,55 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
  * If indexOid is InvalidOid, the table will be rewritten in physical order
  * instead of index order.
  *
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
  * 'cmd' indicates which command is being executed, to be used for error
  * messages.
  */
 void
 cluster_rel(RepackCommand cmd, bool usingindex,
-			Relation OldHeap, Oid indexOid, ClusterParams *params)
+			Relation OldHeap, Oid indexOid, ClusterParams *params,
+			bool isTopLevel)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Relation	index;
+	LOCKMODE	lmode;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
-	Relation	index;
+	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+	/*
+	 * Check that the correct lock is held. The lock mode is
+	 * AccessExclusiveLock for normal processing and ShareUpdateExclusiveLock
+	 * for concurrent processing (so that SELECT, INSERT, UPDATE and DELETE
+	 * commands work, but cluster_rel() cannot be called concurrently for the
+	 * same relation).
+	 */
+	lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+
+	/* There are specific requirements on concurrent processing. */
+	if (concurrent)
+	{
+		/*
+		 * Make sure we have no XID assigned, otherwise the call of
+		 * setup_logical_decoding() can cause a deadlock.
+		 *
+		 * The existence of a transaction block does not actually imply that
+		 * an XID has been assigned, but it very likely has. We might want to
+		 * check the result of GetCurrentTransactionIdIfAny() instead, but
+		 * that would be less clear from the user's perspective.
+		 */
+		PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
+
+		check_repack_concurrently_requirements(OldHeap);
+	}
 
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
@@ -351,11 +518,13 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * If this is a single-transaction CLUSTER, we can skip these tests. We
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
+	 *
+	 * XXX move [some of] these comments to where the RECHECK flag is
+	 * determined?
 	 */
-	if (recheck)
-		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
-								 params->options))
-			goto out;
+	if (recheck && !cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+										lmode, params->options))
+		goto out;
 
 	/*
 	 * We allow repacking shared catalogs only when not using an index. It
@@ -369,6 +538,12 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 				 errmsg("cannot run \"%s\" on a shared catalog",
 						RepackCommandAsString(cmd))));
 
+	/*
+	 * The CONCURRENTLY case should have been rejected earlier because it does
+	 * not support system catalogs.
+	 */
+	Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
@@ -404,7 +579,7 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OidIsValid(indexOid))
 	{
 		/* verify the index is good and lock it */
-		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, lmode);
 		/* also open it */
 		index = index_open(indexOid, NoLock);
 	}
@@ -421,7 +596,9 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
 		!RelationIsPopulated(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		if (index)
+			index_close(index, lmode);
+		relation_close(OldHeap, lmode);
 		goto out;
 	}
 
@@ -434,11 +611,35 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * invalid, because we move tuples around.  Promote them to relation
 	 * locks.  Predicate locks on indexes will be promoted when they are
 	 * reindexed.
+	 *
+	 * During concurrent processing, the heap as well as its indexes stay in
+	 * operation, so we postpone this step until they are locked using
+	 * AccessExclusiveLock near the end of the processing.
 	 */
-	TransferPredicateLocksToHeapRelation(OldHeap);
+	if (!concurrent)
+		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
+	PG_TRY();
+	{
+		/*
+		 * For concurrent processing, make sure that our logical decoding
+		 * ignores data changes of tables other than the one we are
+		 * processing.
+		 */
+		if (concurrent)
+			begin_concurrent_repack(OldHeap);
+
+		rebuild_relation(cmd, usingindex, OldHeap, index, save_userid,
+						 verbose, concurrent);
+	}
+	PG_FINALLY();
+	{
+		if (concurrent)
+			end_concurrent_repack();
+	}
+	PG_END_TRY();
+
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -457,14 +658,14 @@ out:
  */
 static bool
 cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-					Oid userid, int options)
+					Oid userid, LOCKMODE lmode, int options)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 
 	/* Check that the user still has privileges for the relation */
 	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -478,7 +679,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -489,7 +690,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		 */
 		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 
@@ -500,7 +701,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
 			!get_index_isclustered(indexOid))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 	}
@@ -641,19 +842,89 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	table_close(pg_index, RowExclusiveLock);
 }
 
+/*
+ * Check if the CONCURRENTLY option is legal for the relation.
+ */
+static void
+check_repack_concurrently_requirements(Relation rel)
+{
+	char		relpersistence,
+				replident;
+	Oid			ident_idx;
+
+	/* Data changes in system relations are not logically decoded. */
+	if (IsCatalogRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+	/*
+	 * reorderbuffer.c does not seem to handle processing of TOAST relation
+	 * alone.
+	 */
+	if (IsToastRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+	relpersistence = rel->rd_rel->relpersistence;
+	if (relpersistence != RELPERSISTENCE_PERMANENT)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+	/* With NOTHING, WAL does not contain the old tuple. */
+	replident = rel->rd_rel->relreplident;
+	if (replident == REPLICA_IDENTITY_NOTHING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("Relation \"%s\" has insufficient replication identity.",
+						 RelationGetRelationName(rel))));
+
+	/*
+	 * Identity index is not set if the replica identity is FULL, but PK might
+	 * exist in such a case.
+	 */
+	ident_idx = RelationGetReplicaIndex(rel);
+	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+		ident_idx = rel->rd_pkindex;
+	if (!OidIsValid(ident_idx))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot process relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 (errhint("Relation \"%s\" has no identity index.",
+						  RelationGetRelationName(rel)))));
+}
+
+
 /*
  * rebuild_relation: rebuild an existing relation in index or physical order
  *
- * OldHeap: table to rebuild.
+ * OldHeap: table to rebuild.  See cluster_rel() for comments on the required
+ * lock strength.
+ *
  * index: index to cluster by, or NULL to rewrite in physical order.
  *
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock.  (The
+ * function handles the lock upgrade if 'concurrent' is true.)
  */
 static void
 rebuild_relation(RepackCommand cmd, bool usingindex,
-				 Relation OldHeap, Relation index, bool verbose)
+				 Relation OldHeap, Relation index, Oid userid,
+				 bool verbose, bool concurrent)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -661,13 +932,55 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	Oid			OIDNewHeap;
 	Relation	NewHeap;
 	char		relpersistence;
-	bool		is_system_catalog;
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
+	NameData	slotname;
+	LogicalDecodingContext *ctx = NULL;
+	Snapshot	snapshot = NULL;
+#ifdef USE_ASSERT_CHECKING
+	LOCKMODE	lmode;
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
-		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+	Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+	Assert(!usingindex || CheckRelationLockedByMe(index, lmode, false));
+#endif
+
+	if (concurrent)
+	{
+		TupleDesc	tupdesc;
+
+		/*
+		 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+		 * should never fire.
+		 */
+		Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+
+		/*
+		 * A single backend should not execute multiple REPACK commands at a
+		 * time, so the PID is enough to make the slot name unique.
+		 */
+		snprintf(NameStr(slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+		tupdesc = CreateTupleDescCopy(RelationGetDescr(OldHeap));
+
+		/*
+		 * Prepare to capture the concurrent data changes.
+		 *
+		 * Note that this call waits for all transactions that already have an
+		 * XID assigned to finish. If any of those transactions is waiting for
+		 * a lock that conflicts with ShareUpdateExclusiveLock on our table
+		 * (e.g. it runs CREATE INDEX), we can end up in a deadlock. It is not
+		 * clear whether avoiding that risk is worth unlocking/relocking the
+		 * table (and its clustering index) and checking again whether it is
+		 * still eligible for REPACK CONCURRENTLY.
+		 */
+		ctx = setup_logical_decoding(tableOid, NameStr(slotname), tupdesc);
+
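+		/*
+		 * The snapshot of the slot defines the boundary between the initial
+		 * data copy and the decoded changes: rows visible to this snapshot
+		 * are copied by copy_table_data() below, anything newer is captured
+		 * by the decoding.
+		 */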
+		snapshot = SnapBuildInitialSnapshotForRepack(ctx->snapshot_builder);
+		PushActiveSnapshot(snapshot);
+	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
 	if (usingindex)
@@ -675,7 +988,6 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
 
 	/*
 	 * Create the transient table that will receive the re-ordered data.
@@ -691,30 +1003,67 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, ctx, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
+	/* The historic snapshot won't be needed anymore. */
+	if (snapshot)
+		PopActiveSnapshot();
 
-	/* Close relcache entries, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
-	if (index)
-		index_close(index, NoLock);
-
-	/*
-	 * Close the new relation so it can be dropped as soon as the storage is
-	 * swapped. The relation is not visible to others, so no need to unlock it
-	 * explicitly.
-	 */
-	table_close(NewHeap, NoLock);
-
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+	if (concurrent)
+	{
+		/*
+		 * Push a snapshot that we will use to find old versions of rows when
+		 * processing concurrent UPDATE and DELETE commands. (That snapshot
+		 * should also be used by index expressions.)
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/*
+		 * Make sure we can find the tuples just inserted when applying DML
+		 * commands on top of those.
+		 */
+		CommandCounterIncrement();
+		UpdateActiveSnapshotCommandId();
+
+		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+										   ctx, swap_toast_by_content,
+										   frozenXid, cutoffMulti);
+		PopActiveSnapshot();
+
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
+
+		/* Done with decoding. */
+		cleanup_logical_decoding(ctx);
+		ReplicationSlotRelease();
+		ReplicationSlotDrop(NameStr(slotname), false);
+	}
+	else
+	{
+		bool		is_system_catalog = IsSystemRelation(OldHeap);
+
+		/* Close relcache entries, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
+		if (index)
+			index_close(index, NoLock);
+
+		/*
+		 * Close the new relation so it can be dropped as soon as the storage
+		 * is swapped. The relation is not visible to others, so no need to
+		 * unlock it explicitly.
+		 */
+		table_close(NewHeap, NoLock);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+	}
 }
 
 
@@ -849,15 +1198,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Pass
+ * them iff concurrent processing is required.
+ *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
-				bool *pSwapToastByContent, TransactionId *pFreezeXid,
-				MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -875,6 +1228,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	PGRUsage	ru0;
 	char	   *nspname;
 
+	bool		concurrent = snapshot != NULL;
+
 	pg_rusage_init(&ru0);
 
 	/* Store a copy of the namespace name for logging purposes */
@@ -977,8 +1332,48 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * provided, else plain seqscan.
 	 */
 	if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
+	{
+		ResourceOwner oldowner = NULL;
+		ResourceOwner resowner = NULL;
+
+		/*
+		 * In the CONCURRENT case, use a dedicated resource owner so we don't
+		 * leave behind any additional locks that we cannot release easily.
+		 */
+		if (concurrent)
+		{
+			Assert(CheckRelationLockedByMe(OldHeap, ShareUpdateExclusiveLock,
+										   false));
+			Assert(CheckRelationLockedByMe(OldIndex, ShareUpdateExclusiveLock,
+										   false));
+
+			resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "plan_cluster_use_sort");
+			oldowner = CurrentResourceOwner;
+			CurrentResourceOwner = resowner;
+		}
+
 		use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
 										 RelationGetRelid(OldIndex));
+
+		if (concurrent)
+		{
+			CurrentResourceOwner = oldowner;
+
+			/*
+			 * We are primarily concerned about locks, but if the planner
+			 * happened to allocate any other resources, we should release
+			 * them too because we're going to delete the whole resowner.
+			 */
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_AFTER_LOCKS,
+								 false, false);
+			ResourceOwnerDelete(resowner);
+		}
+	}
 	else
 		use_sort = false;
 
@@ -1007,7 +1402,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+									cutoffs.OldestXmin, snapshot,
+									decoding_ctx,
+									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
 									&tups_recently_dead);
@@ -1016,7 +1413,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	*pFreezeXid = cutoffs.FreezeLimit;
 	*pCutoffMulti = cutoffs.MultiXactCutoff;
 
-	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+	/*
+	 * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+	 * In the CONCURRENTLY case, we need to set it again before applying the
+	 * concurrent changes.
+	 */
 	NewHeap->rd_toastoid = InvalidOid;
 
 	num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1474,14 +1875,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool swap_toast_by_content,
 				 bool check_constraints,
 				 bool is_internal,
+				 bool reindex,
 				 TransactionId frozenXid,
 				 MultiXactId cutoffMulti,
 				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
-	int			reindex_flags;
-	ReindexParams reindex_params = {0};
 	int			i;
 
 	/* Report that we are now swapping relation files */
@@ -1507,39 +1907,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (is_system_catalog)
 		CacheInvalidateCatalog(OIDOldHeap);
 
-	/*
-	 * Rebuild each index on the relation (but not the toast table, which is
-	 * all-new at this point).  It is important to do this before the DROP
-	 * step because if we are processing a system catalog that will be used
-	 * during DROP, we want to have its indexes available.  There is no
-	 * advantage to the other order anyway because this is all transactional,
-	 * so no chance to reclaim disk space before commit.  We do not need a
-	 * final CommandCounterIncrement() because reindex_relation does it.
-	 *
-	 * Note: because index_build is called via reindex_relation, it will never
-	 * set indcheckxmin true for the indexes.  This is OK even though in some
-	 * sense we are building new indexes rather than rebuilding existing ones,
-	 * because the new heap won't contain any HOT chains at all, let alone
-	 * broken ones, so it can't be necessary to set indcheckxmin.
-	 */
-	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
-	if (check_constraints)
-		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+	if (reindex)
+	{
+		int			reindex_flags;
+		ReindexParams reindex_params = {0};
 
-	/*
-	 * Ensure that the indexes have the same persistence as the parent
-	 * relation.
-	 */
-	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
-	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+		/*
+		 * Rebuild each index on the relation (but not the toast table, which
+		 * is all-new at this point).  It is important to do this before the
+		 * DROP step because if we are processing a system catalog that will
+		 * be used during DROP, we want to have its indexes available.  There
+		 * is no advantage to the other order anyway because this is all
+		 * transactional, so no chance to reclaim disk space before commit. We
+		 * do not need a final CommandCounterIncrement() because
+		 * reindex_relation does it.
+		 *
+		 * Note: because index_build is called via reindex_relation, it will
+		 * never set indcheckxmin true for the indexes.  This is OK even
+		 * though in some sense we are building new indexes rather than
+		 * rebuilding existing ones, because the new heap won't contain any
+		 * HOT chains at all, let alone broken ones, so it can't be necessary
+		 * to set indcheckxmin.
+		 */
+		reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+		if (check_constraints)
+			reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
 
-	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
-								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+		/*
+		 * Ensure that the indexes have the same persistence as the parent
+		 * relation.
+		 */
+		if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+		else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
-	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+		/* Report that we are now reindexing relations */
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+		reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+	}
 
 	/* Report that we are now doing clean up */
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1881,7 +2289,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * resolve in this case.
  */
 static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+						ClusterParams *params)
 {
 	Relation	rel;
 	Oid			tableOid;
@@ -1890,13 +2299,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
-	/*
-	 * Find, lock, and check permissions on the table.  We obtain
-	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-	 * single-transaction case.
-	 */
+	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation,
-										AccessExclusiveLock,
+										lockmode,
 										0,
 										RangeVarCallbackMaintainsTable,
 										NULL);
@@ -1922,26 +2327,17 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 		return rel;
 	else
 	{
-		Oid			indexOid;
+		Oid			indexOid = InvalidOid;
 
-		indexOid = determine_clustered_index(rel, stmt->usingindex,
-											 stmt->indexname);
-		if (OidIsValid(indexOid))
-			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
-		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
-
-		/* Do an analyze, if requested */
-		if (params->options & CLUOPT_ANALYZE)
+		if (stmt->usingindex)
 		{
-			VacuumParams vac_params = {0};
-
-			vac_params.options |= VACOPT_ANALYZE;
-			if (params->options & CLUOPT_VERBOSE)
-				vac_params.options |= VACOPT_VERBOSE;
-			analyze_rel(RelationGetRelid(rel), NULL, vac_params, NIL, true,
-						NULL);
+			indexOid = determine_clustered_index(rel, stmt->usingindex,
+												 stmt->indexname);
+			check_index_is_clusterable(rel, indexOid, lockmode);
 		}
 
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid,
+					params, isTopLevel);
 		return NULL;
 	}
 }
@@ -1998,3 +2394,1048 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
 
 	return indexOid;
 }
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts to setup logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that TOAST table needs no attention here as it's not scanned using
+ * historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+	Oid			toastrelid;
+
+	/* Avoid logical decoding of other relations by this backend. */
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
+
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+}
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+	/*
+	 * Restore normal function of (future) logical decoding for this backend.
+	 */
+	repacked_rel_locator.relNumber = InvalidOid;
+	repacked_rel_toast_locator.relNumber = InvalidOid;
+}
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends make while we copy the
+ * existing data into the temporary table), nor persisted (it's easier to
+ * handle a crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid, const char *slotname, TupleDesc tupdesc)
+{
+	LogicalDecodingContext *ctx;
+	RepackDecodingState *dstate;
+
+	/*
+	 * Check if we can use logical decoding.
+	 */
+	CheckSlotPermissions();
+	CheckLogicalDecodingRequirements();
+
+	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+	ReplicationSlotCreate(slotname, true, RS_TEMPORARY, false, false, false);
+
+	/*
+	 * None of the prepare_write, do_write and update_progress callbacks is
+	 * useful for us.
+	 */
+	ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+									NIL,
+									true,
+									InvalidXLogRecPtr,
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+	/*
+	 * We have no control over fast_forward, so at least check it.
+	 */
+	Assert(!ctx->fast_forward);
+
+	DecodingContextFindStartpoint(ctx);
+
+	/* Some WAL records should have been read. */
+	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
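+	/*
+	 * Remember the WAL segment we start in: repack_decode_concurrent_changes
+	 * confirms the position whenever decoding crosses a segment boundary, so
+	 * that catalog_xmin can advance.
+	 */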
+	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+				wal_segment_size);
+
+	/*
+	 * Set up the structures to store the decoded changes.
+	 */
+	dstate = palloc0(sizeof(RepackDecodingState));
+	dstate->relid = relid;
+	dstate->tstore = tuplestore_begin_heap(false, false,
+										   maintenance_work_mem);
+
+	dstate->tupdesc = tupdesc;
+
+	/* Initialize the descriptor to store the changes ... */
+	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+	/* ... as well as the corresponding slot. */
+	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+											  &TTSOpsMinimalTuple);
+
+	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "logical decoding");
+
+	ctx->output_writer_private = dstate;
+	return ctx;
+}
+
+/*
+ * Retrieve tuple from ConcurrentChange structure.
+ *
+ * The input data starts with the structure but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+	HeapTupleData tup_data;
+	HeapTuple	result;
+	char	   *src;
+
+	/*
+	 * Ensure alignment before accessing the fields. (This is why we can't use
+	 * heap_copytuple() instead of this function.)
+	 */
+	src = change + offsetof(ConcurrentChange, tup_data);
+	memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+	memcpy(result, &tup_data, sizeof(HeapTupleData));
+	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+	src = change + SizeOfConcurrentChange;
+	memcpy(result->t_data, src, result->t_len);
+
+	return result;
+}
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+								 XLogRecPtr end_of_wal)
+{
+	RepackDecodingState *dstate;
+	ResourceOwner resowner_old;
+
+	/*
+	 * Invalidate the "present" cache before moving to "(recent) history".
+	 */
+	InvalidateSystemCaches();
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+	resowner_old = CurrentResourceOwner;
+	CurrentResourceOwner = dstate->resowner;
+
+	PG_TRY();
+	{
+		while (ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			XLogSegNo	segno_new;
+			char	   *errm = NULL;
+			XLogRecPtr	end_lsn;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			/*
+			 * If a WAL segment boundary has been crossed, inform the decoding
+			 * system that the catalog_xmin can advance. (We could confirm
+			 * more often, but filling a single WAL segment should not take
+			 * much time.)
+			 */
+			end_lsn = ctx->reader->EndRecPtr;
+			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+			if (segno_new != repack_current_segment)
+			{
+				LogicalConfirmReceivedLocation(end_lsn);
+				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
+				repack_current_segment = segno_new;
+			}
+
+			CHECK_FOR_INTERRUPTS();
+		}
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
+/*
+ * Apply changes that happened during the initial load.
+ *
+ * The scan key is passed by the caller so that it does not have to be
+ * constructed multiple times. Its entries have all fields initialized except
+ * for sk_argument.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
+						 ScanKey key, int nkeys, IndexInsertState *iistate)
+{
+	TupleTableSlot *index_slot,
+			   *ident_slot;
+	HeapTuple	tup_old = NULL;
+
+	if (dstate->nchanges == 0)
+		return;
+
+	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+	/* A slot to fetch tuples from identity index. */
+	ident_slot = table_slot_create(rel, NULL);
+
+	while (tuplestore_gettupleslot(dstate->tstore, true, false,
+								   dstate->tsslot))
+	{
+		bool		shouldFree;
+		HeapTuple	tup_change,
+					tup,
+					tup_exist;
+		char	   *change_raw,
+				   *src;
+		ConcurrentChange change;
+		bool		isnull[1];
+		Datum		values[1];
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get the change from the single-column tuple. */
+		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+		Assert(!isnull[0]);
+
+		/* Make sure we access aligned data. */
+		change_raw = (char *) DatumGetByteaP(values[0]);
+		src = (char *) VARDATA(change_raw);
+		memcpy(&change, src, SizeOfConcurrentChange);
+
+		/* TRUNCATE change contains no tuple, so process it separately. */
+		if (change.kind == CHANGE_TRUNCATE)
+		{
+			/*
+			 * All the things that ExecuteTruncateGuts() does (such as firing
+			 * triggers or handling the DROP_CASCADE behavior) should have
+			 * taken place on the source relation. Thus we only do the actual
+			 * truncation of the new relation (and its indexes).
+			 */
+			heap_truncate_one_rel(rel);
+
+			pfree(tup_change);
+			continue;
+		}
+
+		/*
+		 * Extract the tuple from the change. The tuple is copied here because
+		 * it might be assigned to 'tup_old', in which case it needs to
+		 * survive into the next iteration.
+		 */
+		tup = get_changed_tuple(src);
+
+		if (change.kind == CHANGE_UPDATE_OLD)
+		{
+			Assert(tup_old == NULL);
+			tup_old = tup;
+		}
+		else if (change.kind == CHANGE_INSERT)
+		{
+			Assert(tup_old == NULL);
+
+			apply_concurrent_insert(rel, &change, tup, iistate, index_slot);
+
+			pfree(tup);
+		}
+		else if (change.kind == CHANGE_UPDATE_NEW ||
+				 change.kind == CHANGE_DELETE)
+		{
+			IndexScanDesc ind_scan = NULL;
+			HeapTuple	tup_key;
+
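+			/*
+			 * Use the old tuple to look up the target row if the decoder
+			 * provided one (it is only present when the WAL recorded the old
+			 * key or the whole old tuple); otherwise the key columns are
+			 * unchanged and the new tuple carries them.
+			 */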
+			if (change.kind == CHANGE_UPDATE_NEW)
+			{
+				tup_key = tup_old != NULL ? tup_old : tup;
+			}
+			else
+			{
+				Assert(tup_old == NULL);
+				tup_key = tup;
+			}
+
+			/*
+			 * Find the tuple to be updated or deleted.
+			 */
+			tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+										  iistate, ident_slot, &ind_scan);
+			if (tup_exist == NULL)
+				elog(ERROR, "failed to find target tuple");
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+				apply_concurrent_update(rel, tup, tup_exist, &change, iistate,
+										index_slot);
+			else
+				apply_concurrent_delete(rel, tup_exist, &change);
+
+			if (tup_old != NULL)
+			{
+				pfree(tup_old);
+				tup_old = NULL;
+			}
+
+			pfree(tup);
+			index_endscan(ind_scan);
+		}
+		else
+			elog(ERROR, "unrecognized kind of change: %d", change.kind);
+
+		/*
+		 * If a change was applied now, increment CID for next writes and
+		 * update the snapshot so it sees the changes we've applied so far.
+		 */
+		if (change.kind != CHANGE_UPDATE_OLD)
+		{
+			CommandCounterIncrement();
+			UpdateActiveSnapshotCommandId();
+		}
+
+		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+		Assert(shouldFree);
+		pfree(tup_change);
+	}
+
+	tuplestore_clear(dstate->tstore);
+	dstate->nchanges = 0;
+
+	/* Cleanup. */
+	ExecDropSingleTupleTableSlot(index_slot);
+	ExecDropSingleTupleTableSlot(ident_slot);
+}
+
+static void
+apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
+						IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+	List	   *recheck;
+
+	/*
+	 * Like simple_heap_insert(), but make sure that the INSERT is not
+	 * logically decoded - see reform_and_rewrite_tuple() for more
+	 * information.
+	 */
+	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+				NULL);
+
+	/*
+	 * Update indexes.
+	 *
+	 * The caller must have an active snapshot set, in case functions used by
+	 * the indexes need one.
+	 */
+	ExecStoreHeapTuple(tup, index_slot, false);
+	recheck = ExecInsertIndexTuples(iistate->rri,
+									index_slot,
+									iistate->estate,
+									false,	/* update */
+									false,	/* noDupErr */
+									NULL,	/* specConflict */
+									NIL,	/* arbiterIndexes */
+									false	/* onlySummarizing */
+		);
+
+	/*
+	 * If recheck is required, it must have been performed on the source
+	 * relation by now. (All the logical changes we process here are already
+	 * committed.)
+	 */
+	list_free(recheck);
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						ConcurrentChange *change, IndexInsertState *iistate,
+						TupleTableSlot *index_slot)
+{
+	LockTupleMode lockmode;
+	TM_FailureData tmfd;
+	TU_UpdateIndexes update_indexes;
+	TM_Result	res;
+	List	   *recheck;
+
+	/*
+	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
+	 * here.)
+	 *
+	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_update(rel, &tup_target->t_self, tup,
+					  GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd, &lockmode, &update_indexes,
+					  false /* wal_logical */ );
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+	ExecStoreHeapTuple(tup, index_slot, false);
+
+	if (update_indexes != TU_None)
+	{
+		recheck = ExecInsertIndexTuples(iistate->rri,
+										index_slot,
+										iistate->estate,
+										true,	/* update */
+										false,	/* noDupErr */
+										NULL,	/* specConflict */
+										NIL,	/* arbiterIndexes */
+		/* onlySummarizing */
+										update_indexes == TU_Summarizing);
+		list_free(recheck);
+	}
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+						ConcurrentChange *change)
+{
+	TM_Result	res;
+	TM_FailureData tmfd;
+
+	/*
+	 * Delete tuple from the new heap.
+	 *
+	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd,
+					  false,	/* changingPart */
+					  false /* wal_logical */ );
+
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'key' is a pre-initialized scan key, into which the function will put the
+ * key values.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * On exit, '*scan_p' contains the scan descriptor used. The caller must
+ * close it when the returned tuple is no longer needed.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
+				  IndexInsertState *iistate,
+				  TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
+{
+	IndexScanDesc scan;
+	Form_pg_index ident_form;
+	int2vector *ident_indkey;
+	HeapTuple	result = NULL;
+
+	/* XXX no instrumentation for now */
+	scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+						   NULL, nkeys, 0);
+	*scan_p = scan;
+	index_rescan(scan, key, nkeys, NULL, 0);
+
+	/* Info needed to retrieve key values from heap tuple. */
+	ident_form = iistate->ident_index->rd_index;
+	ident_indkey = &ident_form->indkey;
+
+	/* Use the incoming tuple to finalize the scan key. */
+	for (int i = 0; i < scan->numberOfKeys; i++)
+	{
+		ScanKey		entry;
+		bool		isnull;
+		int16		attno_heap;
+
+		entry = &scan->keyData[i];
+		attno_heap = ident_indkey->values[i];
+		entry->sk_argument = heap_getattr(tup_key,
+										  attno_heap,
+										  rel->rd_att,
+										  &isnull);
+		Assert(!isnull);
+	}
+	if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+	{
+		bool		shouldFree;
+
+		result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+		/* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+		Assert(!shouldFree);
+	}
+
+	return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ *
+ * Pass rel_src iff its reltoastrelid is needed.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *ctx, XLogRecPtr end_of_wal,
+						   Relation rel_dst, Relation rel_src, ScanKey ident_key,
+						   int ident_key_nentries, IndexInsertState *iistate)
+{
+	RepackDecodingState *dstate;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_CATCH_UP);
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	repack_decode_concurrent_changes(ctx, end_of_wal);
+
+	if (dstate->nchanges == 0)
+		return;
+
+	PG_TRY();
+	{
+		/*
+		 * Make sure that TOAST values can eventually be accessed via the old
+		 * relation - see comment in copy_table_data().
+		 */
+		if (rel_src)
+			rel_dst->rd_toastoid = rel_src->rd_rel->reltoastrelid;
+
+		apply_concurrent_changes(dstate, rel_dst, ident_key,
+								 ident_key_nentries, iistate);
+	}
+	PG_FINALLY();
+	{
+		if (rel_src)
+			rel_dst->rd_toastoid = InvalidOid;
+	}
+	PG_END_TRY();
+}
+
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id)
+{
+	EState	   *estate;
+	int			i;
+	IndexInsertState *result;
+
+	result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+	estate = CreateExecutorState();
+
+	result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+	InitResultRelInfo(result->rri, relation, 0, 0, 0);
+	ExecOpenIndices(result->rri, false);
+
+	/*
+	 * Find the relcache entry of the identity index so that we spend no extra
+	 * effort to open / close it.
+	 */
+	for (i = 0; i < result->rri->ri_NumIndices; i++)
+	{
+		Relation	ind_rel;
+
+		ind_rel = result->rri->ri_IndexRelationDescs[i];
+		if (ind_rel->rd_id == ident_index_id)
+			result->ident_index = ind_rel;
+	}
+	if (result->ident_index == NULL)
+		elog(ERROR, "failed to open identity index");
+
+	/* Only initialize fields needed by ExecInsertIndexTuples(). */
+	result->estate = estate;
+
+	return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+	Relation	ident_idx_rel;
+	Form_pg_index ident_idx;
+	int			n,
+				i;
+	ScanKey		result;
+
+	Assert(OidIsValid(ident_idx_oid));
+	ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+	ident_idx = ident_idx_rel->rd_index;
+	n = ident_idx->indnatts;
+	result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+	for (i = 0; i < n; i++)
+	{
+		ScanKey		entry;
+		int16		relattno;
+		Form_pg_attribute att;
+		Oid			opfamily,
+					opcintype,
+					opno,
+					opcode;
+
+		entry = &result[i];
+		relattno = ident_idx->indkey.values[i];
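+		/*
+		 * An identity index should only contain plain table columns; an
+		 * attribute number below 1 would indicate an expression or a system
+		 * column.
+		 */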
+		if (relattno >= 1)
+		{
+			TupleDesc	desc;
+
+			desc = rel_src->rd_att;
+			att = TupleDescAttr(desc, relattno - 1);
+		}
+		else
+			elog(ERROR, "unexpected attribute number %d in index", relattno);
+
+		opfamily = ident_idx_rel->rd_opfamily[i];
+		opcintype = ident_idx_rel->rd_opcintype[i];
+		opno = get_opfamily_member(opfamily, opcintype, opcintype,
+								   BTEqualStrategyNumber);
+
+		if (!OidIsValid(opno))
+			elog(ERROR, "failed to find = operator for type %u", opcintype);
+
+		opcode = get_opcode(opno);
+		if (!OidIsValid(opcode))
+			elog(ERROR, "missing oprcode for operator %u", opno);
+
+		/* Initialize everything but argument. */
+		ScanKeyInit(entry,
+					i + 1,
+					BTEqualStrategyNumber, opcode,
+					(Datum) NULL);
+		entry->sk_collation = att->attcollation;
+	}
+	index_close(ident_idx_rel, AccessShareLock);
+
+	*nentries = n;
+	return result;
+}
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+	ExecCloseIndices(iistate->rri);
+	FreeExecutorState(iistate->estate);
+	pfree(iistate->rri);
+	pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	ExecDropSingleTupleTableSlot(dstate->tsslot);
+	FreeTupleDesc(dstate->tupdesc_change);
+	FreeTupleDesc(dstate->tupdesc);
+	tuplestore_end(dstate->tstore);
+
+	FreeDecodingContext(ctx);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+								   Relation cl_index,
+								   LogicalDecodingContext *ctx,
+								   bool swap_toast_by_content,
+								   TransactionId frozenXid,
+								   MultiXactId cutoffMulti)
+{
+	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+	List	   *ind_oids_new;
+	Oid			old_table_oid = RelationGetRelid(OldHeap);
+	Oid			new_table_oid = RelationGetRelid(NewHeap);
+	List	   *ind_oids_old = RelationGetIndexList(OldHeap);
+	ListCell   *lc,
+			   *lc2;
+	char		relpersistence;
+	bool		is_system_catalog;
+	Oid			ident_idx_old,
+				ident_idx_new;
+	IndexInsertState *iistate;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+	XLogRecPtr	wal_insert_ptr,
+				end_of_wal;
+	char		dummy_rec_data = '\0';
+	Relation   *ind_refs,
+			   *ind_refs_p;
+	int			nind;
+
+	/* Like in cluster_rel(). */
+	lockmode_old = ShareUpdateExclusiveLock;
+	Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+	Assert(cl_index == NULL ||
+		   CheckRelationLockedByMe(cl_index, lockmode_old, false));
+	/* This is expected from the caller. */
+	Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+	ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+	/*
+	 * Unlike the exclusive case, we build new indexes for the new relation
+	 * rather than swapping the storage and reindexing the old relation. The
+	 * point is that the index build can take some time, so we do it before we
+	 * get AccessExclusiveLock on the old heap and therefore we cannot swap
+	 * the heap storage yet.
+	 *
+	 * index_create() will lock the new indexes using AccessExclusiveLock - no
+	 * need to change that.
+	 *
+	 * We assume that ShareUpdateExclusiveLock on the table prevents anyone
+	 * from dropping the existing indexes or adding new ones, so the lists of
+	 * old and new indexes should match at swap time. On the other hand, we do
+	 * not block ALTER INDEX commands that do not require a table lock (e.g.
+	 * ALTER INDEX ... SET ...).
+	 *
+	 * XXX Should we check at the end of our work whether another transaction
+	 * executed such a command, and issue a NOTICE that we might have
+	 * discarded its effects? (For example, if someone changes a storage
+	 * parameter after we have created the new index, the new value of that
+	 * parameter is lost.) Alternatively, we could lock all the indexes now in
+	 * a mode that blocks all ALTER INDEX commands (ShareUpdateExclusiveLock?)
+	 * and keep them locked till the end of the transaction. That might
+	 * increase the risk of deadlock during the lock upgrade below, however
+	 * SELECT / DML queries should not be involved in such a deadlock.
+	 */
+	ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+	/*
+	 * Processing shouldn't start without a valid identity index.
+	 */
+	Assert(OidIsValid(ident_idx_old));
+
+	/* Find "identity index" on the new relation. */
+	ident_idx_new = InvalidOid;
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+
+		if (ident_idx_old == ind_old)
+		{
+			ident_idx_new = ind_new;
+			break;
+		}
+	}
+	if (!OidIsValid(ident_idx_new))
+	{
+		/* Should not happen, given our lock on the old relation. */
+		ereport(ERROR,
+				(errmsg("identity index missing on the new relation")));
+	}
+
+	/* Executor state to update indexes. */
+	iistate = get_index_insert_state(NewHeap, ident_idx_new);
+
+	/*
+	 * Build scan key that we'll use to look for rows to be updated / deleted
+	 * during logical decoding.
+	 */
+	ident_key = build_identity_key(ident_idx_new, OldHeap, &ident_key_nentries);
+
+	/*
+	 * During testing, wait for another backend to perform concurrent data
+	 * changes which we will process below.
+	 */
+	INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+	/*
+	 * Flush all WAL records inserted so far (possibly except for the last
+	 * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+	 * we need to flush while holding exclusive lock on the source table.
+	 */
+	wal_insert_ptr = GetInsertRecPtr();
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply concurrent changes first time, to minimize the time we need to
+	 * hold AccessExclusiveLock. (Quite some amount of WAL could have been
+	 * written during the data copying and index creation.)
+	 */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/*
+	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+	 * is one), and all its indexes, so that we can swap the files.
+	 *
+	 * Before that, unlock the index temporarily to avoid deadlock in case
+	 * another transaction is trying to lock it while holding the lock on the
+	 * table.
+	 */
+	if (cl_index)
+	{
+		index_close(cl_index, ShareUpdateExclusiveLock);
+		cl_index = NULL;
+	}
+	/* Also lock the TOAST relation (if any) before the table. */
+	if (OldHeap->rd_rel->reltoastrelid)
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+	/* Finally lock the table */
+	LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+	/*
+	 * Lock all indexes now, not only the clustering one: all indexes need to
+	 * have their files swapped. While doing that, store their relation
+	 * references in an array, to handle predicate locks below.
+	 */
+	ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+	nind = 0;
+	foreach(lc, ind_oids_old)
+	{
+		Oid			ind_oid;
+		Relation	index;
+
+		ind_oid = lfirst_oid(lc);
+		index = index_open(ind_oid, AccessExclusiveLock);
+
+		/*
+		 * TODO 1) Do we need to check if ALTER INDEX was executed since the
+		 * new index was created in build_new_indexes()? 2) Specifically for
+		 * the clustering index, should check_index_is_clusterable() be called
+		 * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+		 * table probably blocks all commands that affect the result of
+		 * check_index_is_clusterable().)
+		 */
+		*ind_refs_p = index;
+		ind_refs_p++;
+		nind++;
+	}
+
+	/*
+	 * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+	 * lock is needed to swap the files.
+	 */
+	if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+	/*
+	 * Tuples and pages of the old heap will be gone, but the heap will stay.
+	 */
+	TransferPredicateLocksToHeapRelation(OldHeap);
+	/* The same for indexes. */
+	for (int i = 0; i < nind; i++)
+	{
+		Relation	index = ind_refs[i];
+
+		TransferPredicateLocksToHeapRelation(index);
+
+		/*
+		 * References to indexes on the old relation are not needed anymore,
+		 * however locks stay till the end of the transaction.
+		 */
+		index_close(index, NoLock);
+	}
+	pfree(ind_refs);
+
+	/*
+	 * Flush anything we see in WAL, to make sure that all changes committed
+	 * while we were waiting for the exclusive lock are available for
+	 * decoding. This should not be necessary if all backends had
+	 * synchronous_commit set, but we can't rely on this setting.
+	 *
+	 * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+	 * position, and GetLastImportantRecPtr() points at the start of the last
+	 * record rather than at the end. Thus the simplest way to determine the
+	 * insert position is to insert a dummy record and use its LSN.
+	 *
+	 * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+	 * last record (plus the total size of all the page headers the record
+	 * spans)?
+	 */
+	XLogBeginInsert();
+	XLogRegisterData(&dummy_rec_data, 1);
+	wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply the remaining concurrent changes. As we now hold
+	 * AccessExclusiveLock on the table, no further data changes can occur,
+	 * so this is the final round.
+	 */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/* Remember info about rel before closing OldHeap */
+	relpersistence = OldHeap->rd_rel->relpersistence;
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+	/*
+	 * ShareUpdateExclusiveLock should have prevented others from creating or
+	 * dropping indexes (even with the CONCURRENTLY option), so we do not
+	 * need to check whether the lists match.
+	 */
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+		Oid			mapped_tables[4];
+
+		/* Zero out possible results from swap_relation_files() */
+		memset(mapped_tables, 0, sizeof(mapped_tables));
+
+		swap_relation_files(ind_old, ind_new,
+							(old_table_oid == RelationRelationId),
+							swap_toast_by_content,
+							true,
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Concurrent processing is not supported for system relations, so
+		 * there should be no mapped tables.
+		 */
+		for (int i = 0; i < 4; i++)
+			Assert(mapped_tables[i] == 0);
+#endif
+	}
+
+	/* The new indexes must be visible for deletion. */
+	CommandCounterIncrement();
+
+	/* Close the old heap but keep lock until transaction commit. */
+	table_close(OldHeap, NoLock);
+	/* Close the new heap. (We didn't have to open its indexes.) */
+	table_close(NewHeap, NoLock);
+
+	/* Clean up what we don't need anymore. (And close the identity index.) */
+	pfree(ident_key);
+	free_index_insert_state(iistate);
+
+	/*
+	 * Swap the relations and their TOAST relations and TOAST indexes. This
+	 * also drops the new relation and its indexes.
+	 *
+	 * (System catalogs are currently not supported.)
+	 */
+	Assert(!is_system_catalog);
+	finish_heap_swap(old_table_oid, new_table_oid,
+					 is_system_catalog,
+					 swap_toast_by_content,
+					 false, true, false,
+					 frozenXid, cutoffMulti,
+					 relpersistence);
+}
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. The order of items matches that of OldIndexes, so the two lists
+ * can be used to swap the index storage.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+	ListCell   *lc;
+	List	   *result = NIL;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+	foreach(lc, OldIndexes)
+	{
+		Oid			ind_oid,
+					ind_oid_new;
+		char	   *newName;
+		Relation	ind;
+
+		ind_oid = lfirst_oid(lc);
+		ind = index_open(ind_oid, AccessShareLock);
+
+		newName = ChooseRelationName(get_rel_name(ind_oid),
+									 NULL,
+									 "repacknew",
+									 get_rel_namespace(ind->rd_index->indrelid),
+									 false);
+		ind_oid_new = index_create_copy(NewHeap, ind_oid,
+										ind->rd_rel->reltablespace, newName,
+										false);
+		result = lappend_oid(result, ind_oid_new);
+
+		index_close(ind, AccessShareLock);
+	}
+
+	return result;
+}
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 188e26f0e6e..71b73c21ebf 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -904,7 +904,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
-	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
 					 RecentXmin, ReadNextMultiXactId(), relpersistence);
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 082a3575d62..c79f5b1dc0f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5989,6 +5989,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 			finish_heap_swap(tab->relid, OIDNewHeap,
 							 false, false, true,
 							 !OidIsValid(tab->newTableSpace),
+							 true,
 							 RecentXmin,
 							 ReadNextMultiXactId(),
 							 persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 8863ad0e8bd..6de9d0ba39d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -125,7 +125,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 							  TransactionId lastSaneFrozenXid,
 							  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-					   BufferAccessStrategy bstrategy);
+					   BufferAccessStrategy bstrategy, bool isTopLevel);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -633,7 +633,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 
 			if (params.options & VACOPT_VACUUM)
 			{
-				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+								isTopLevel))
 					continue;
 			}
 
@@ -1997,7 +1998,7 @@ vac_truncate_clog(TransactionId frozenXID,
  */
 static bool
 vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-		   BufferAccessStrategy bstrategy)
+		   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	LOCKMODE	lmode;
 	Relation	rel;
@@ -2288,7 +2289,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
-						&cluster_params);
+						&cluster_params, isTopLevel);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
@@ -2331,7 +2332,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
-		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+				   isTopLevel);
 	}
 
 	/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index b831a541652..5c148131217 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
 subdir('snowball')
 subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..5dc4ae58ffe 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
 #include "catalog/pg_control.h"
+#include "commands/cluster.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/message.h"
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(buf->record);
 	SnapBuild  *builder = ctx->snapshot_builder;
 
+	/*
+	 * If the change is not intended for logical decoding, do not even
+	 * establish a transaction for it - REPACK CONCURRENTLY is the typical use
+	 * case.
+	 *
+	 * First, check if REPACK CONCURRENTLY is being performed by this backend.
+	 * If so, only decode data changes of the table that it is processing, and
+	 * the changes of its TOAST relation.
+	 *
+	 * (The TOAST locator should not be set unless the main one is.)
+	 */
+	Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+		   OidIsValid(repacked_rel_locator.relNumber));
+
+	if (OidIsValid(repacked_rel_locator.relNumber))
+	{
+		XLogReaderState *r = buf->record;
+		RelFileLocator locator;
+
+		/*
+		 * Not all records contain a block reference; for those that do,
+		 * block 0 is enough to identify the relation.
+		 */
+		if (XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL) &&
+			!RelFileLocatorEquals(locator, repacked_rel_locator) &&
+			(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+			 !RelFileLocatorEquals(locator, repacked_rel_toast_locator)))
+			return;
+	}
+
+	/*
+	 * Second, skip records which do not contain sufficient information for
+	 * the decoding.
+	 *
+	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+	 * when making changes to the new table. Those changes are of no use to
+	 * any other consumer (such as a logical replication subscription) because
+	 * the new table will eventually be dropped (after REPACK CONCURRENTLY has
+	 * assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This happens when 1) raw_heap_insert marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, or 2) REPACK CONCURRENTLY
+				 * replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
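+
+				/*
+				 * This flag is set when heap_delete() is called with
+				 * wal_logical = false, i.e. when REPACK CONCURRENTLY itself
+				 * applied a concurrent DELETE to the new heap.
+				 */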
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}
+
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a2f1803622c..d69229905a2 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,6 +486,27 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	return SnapBuildMVCCFromHistoric(snap, true);
 }
 
+/*
+ * Build an MVCC snapshot for the initial data load performed by REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot set up
+ * (so we do not set MyProc->xmin). XXX Do we need to add any restrictions?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+	Snapshot	snap;
+
+	Assert(builder->state == SNAPBUILD_CONSISTENT);
+	Assert(builder->building_full_snapshot);
+
+	snap = SnapBuildBuildSnapshot(builder);
+	return SnapBuildMVCCFromHistoric(snap, false);
+}
+
 /*
  * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
  *
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+#    src/backend/replication/pgoutput_repack
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	$(WIN32RES) \
+	pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+	rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+  'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+  pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pgoutput_repack',
+    '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+  pgoutput_repack_sources,
+  kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..687fbbc59bb
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,288 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ *		Logical Replication output plugin for REPACK command
+ *
+ * Copyright (c) 2012-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+						   OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+							 ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+							  ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+						  Relation rel, ReorderBufferChange *change);
+static void plugin_truncate(struct LogicalDecodingContext *ctx,
+							ReorderBufferTXN *txn, int nrelations,
+							Relation relations[],
+							ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+						 ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+	cb->startup_cb = plugin_startup;
+	cb->begin_cb = plugin_begin_txn;
+	cb->change_cb = plugin_change;
+	cb->truncate_cb = plugin_truncate;
+	cb->commit_cb = plugin_commit_txn;
+	cb->shutdown_cb = plugin_shutdown;
+}
+
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+			   bool is_init)
+{
+	ctx->output_plugin_private = NULL;
+
+	/* Probably unnecessary, as we don't use the SQL interface ... */
+	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+	if (ctx->output_plugin_options != NIL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("this plugin does not expect any options")));
+	}
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * As we don't release the slot while processing a particular table, there is
+ * no room for an SQL interface, even for debugging purposes. Therefore we need
+ * neither OutputPluginPrepareWrite() nor OutputPluginWrite() in the plugin
+ * callbacks. (Although we might want to write custom callbacks, this API
+ * seems to be unnecessarily generic for our purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				  XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+			  Relation relation, ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Only interested in one particular relation. */
+	if (relation->rd_id != dstate->relid)
+		return;
+
+	/* Decode entry depending on its type */
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			{
+				HeapTuple	newtuple;
+
+				newtuple = change->data.tp.newtuple;
+
+				/*
+				 * Identity checks in the main function should have made this
+				 * impossible.
+				 */
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete insert info");
+
+				store_change(ctx, CHANGE_INSERT, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			{
+				HeapTuple	oldtuple,
+							newtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+				newtuple = change->data.tp.newtuple;
+
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete update info");
+
+				if (oldtuple != NULL)
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			{
+				HeapTuple	oldtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+
+				if (oldtuple == NULL)
+					elog(ERROR, "incomplete delete info");
+
+				store_change(ctx, CHANGE_DELETE, oldtuple);
+			}
+			break;
+		default:
+			/* Should not get here */
+			Assert(false);
+			break;
+	}
+}
+
+static void
+plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				int nrelations, Relation relations[],
+				ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+	int			i;
+	Relation	relation = NULL;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Find the relation we are processing. */
+	for (i = 0; i < nrelations; i++)
+	{
+		relation = relations[i];
+
+		if (RelationGetRelid(relation) == dstate->relid)
+			break;
+	}
+
+	/* Nothing to do if only other relations were truncated. */
+	if (i == nrelations)
+		return;
+
+	store_change(ctx, CHANGE_TRUNCATE, NULL);
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+			 HeapTuple tuple)
+{
+	RepackDecodingState *dstate;
+	char	   *change_raw;
+	ConcurrentChange change;
+	bool		flattened = false;
+	Size		size;
+	Datum		values[1];
+	bool		isnull[1];
+	char	   *dst,
+			   *dst_start;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	size = MAXALIGN(VARHDRSZ) + SizeOfConcurrentChange;
+
+	if (tuple)
+	{
+		/*
+		 * ReorderBufferCommit() stores the TOAST chunks in its private memory
+		 * context and frees them after having called apply_change().
+		 * Therefore we need flat copy (including TOAST) that we eventually
+		 * copy into the memory context which is available to
+		 * decode_concurrent_changes().
+		 */
+		if (HeapTupleHasExternal(tuple))
+		{
+			/*
+			 * toast_flatten_tuple_to_datum() might be more convenient but we
+			 * don't want the decompression it does.
+			 */
+			tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+			flattened = true;
+		}
+
+		size += tuple->t_len;
+	}
+
+	/* XXX Shouldn't this use MaxAllocSize, which has the same value? */
+	if (size >= 0x3FFFFFFF)
+		elog(ERROR, "change is too big");
+
+	/* Construct the change. */
+	change_raw = (char *) palloc0(size);
+	SET_VARSIZE(change_raw, size);
+
+	/*
+	 * Since the varlena alignment might not be sufficient for the structure,
+	 * set the fields in a local instance and remember where it should
+	 * eventually be copied.
+	 */
+	change.kind = kind;
+	dst_start = (char *) VARDATA(change_raw);
+
+	/* No other information is needed for TRUNCATE. */
+	if (change.kind == CHANGE_TRUNCATE)
+		goto store;
+
+	/*
+	 * Copy the tuple.
+	 *
+	 * CAUTION: change->tup_data.t_data must be fixed on retrieval!
+	 */
+	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+	dst = dst_start + SizeOfConcurrentChange;
+	memcpy(dst, tuple->t_data, tuple->t_len);
+
+	/* The data has been copied. */
+	if (flattened)
+		pfree(tuple);
+
+store:
+	/* Copy the structure so it can be stored. */
+	memcpy(dst_start, &change, SizeOfConcurrentChange);
+
+	/* Store as tuple of 1 bytea column. */
+	values[0] = PointerGetDatum(change_raw);
+	isnull[0] = false;
+	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+						 values, isnull);
+
+	/* Accounting. */
+	dstate->nchanges++;
+
+	/* Cleanup. */
+	pfree(change_raw);
+}
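
(Aside: to consume one of these changes, the reader has to copy the
structure into aligned memory and then fix tup_data.t_data, per the
CAUTION above.  A sketch of that retrieval, using the RepackDecodingState
fields -- the real code lives elsewhere in the series:)

	if (tuplestore_gettupleslot(dstate->tstore, true, false, dstate->tsslot))
	{
		bool		isnull;
		Datum		d = slot_getattr(dstate->tsslot, 1, &isnull);
		char	   *raw = VARDATA(DatumGetByteaP(d));
		ConcurrentChange change;
		HeapTupleData tup;

		/* Copy into aligned local storage before reading any field. */
		memcpy(&change, raw, SizeOfConcurrentChange);

		if (change.kind != CHANGE_TRUNCATE)
		{
			tup = change.tup_data;
			/* The tuple body was stored right after the structure. */
			tup.t_data = (HeapTupleHeader) (raw + SizeOfConcurrentChange);
		}
		/* ... dispatch on change.kind ... */
	}
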
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..e9ddf39500c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
 #include "access/xlogprefetcher.h"
 #include "access/xlogrecovery.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index cd3e43c448a..519f3953638 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
 
 die
   "$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
-  . " missing from lwlocklist.h"
+  . "missing from lwlocklist.h"
   if $lwlock_count < scalar @wait_event_lwlocks;
 
 die
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6fe268a8eec..d27a4c30548 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -64,6 +64,7 @@
 #include "catalog/pg_type.h"
 #include "catalog/schemapg.h"
 #include "catalog/storage.h"
+#include "commands/cluster.h"
 #include "commands/policy.h"
 #include "commands/publicationcmds.h"
 #include "commands/trigger.h"
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index bc7840052fe..6d46537cbe8 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -657,7 +656,7 @@ CopySnapshot(Snapshot snapshot)
  * FreeSnapshot
  *		Free the memory associated with a snapshot.
  */
-static void
+void
 FreeSnapshot(Snapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 59ff6e0923b..528fb08154a 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -4998,18 +4998,27 @@ match_previous_words(int pattern_id,
 	}
 
 /* REPACK */
-	else if (Matches("REPACK"))
+	else if (Matches("REPACK") || Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"CONCURRENTLY");
+	else if (Matches("REPACK", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	else if (Matches("REPACK", "(*)"))
+	else if (Matches("REPACK", "(*)", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	/* If we have REPACK <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", MatchAnyExcept("(")))
+	/* If we have REPACK [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(|CONCURRENTLY")) ||
+			 Matches("REPACK", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", "(*)", MatchAny))
+	/* If we have REPACK (*) [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAnyExcept("CONCURRENTLY")) ||
+			 Matches("REPACK", "(*)", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK <sth> USING, then add the index as well */
-	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+
+	/*
+	 * Complete ... [ (*) ] [ CONCURRENTLY ] <sth> USING INDEX, with a list of
+	 * indexes for <sth>.
+	 */
+	else if (TailMatches(MatchAnyExcept("(|CONCURRENTLY"), "USING", "INDEX"))
 	{
 		set_completion_reference(prev3_wd);
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..b82dd17a966 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -323,14 +323,15 @@ extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
-							 struct TM_FailureData *tmfd, bool changingPart);
+							 struct TM_FailureData *tmfd, bool changingPart,
+							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, ItemPointer tid);
 extern void heap_abort_speculative(Relation relation, ItemPointer tid);
 extern TM_Result heap_update(Relation relation, ItemPointer otid,
 							 HeapTuple newtup,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes);
+							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
@@ -411,6 +412,10 @@ extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
 								 uint16 infomask, TransactionId xid);
+extern bool HeapTupleMVCCInserted(HeapTuple htup, Snapshot snapshot,
+								  Buffer buffer);
+extern bool HeapTupleMVCCNotDeleted(HeapTuple htup, Snapshot snapshot,
+									Buffer buffer);
 extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
 extern bool HeapTupleIsSurelyDead(HeapTuple htup,
 								  struct GlobalVisState *vistest);
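
(Aside: existing callers presumably just pass true for the new wal_logical
parameter to keep today's decoding behavior; REPACK CONCURRENTLY passes
false for changes that must not reach its output plugin.  Illustrative
call, with placeholder variables:)

	result = heap_delete(relation, &tuple->t_self, cid,
						 InvalidSnapshot, true /* wait */ ,
						 &tmfd, false /* changingPart */ ,
						 true /* wal_logical */ );
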
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..8d4af07f840 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
 #define XLH_DELETE_CONTAINS_OLD_KEY				(1<<2)
 #define XLH_DELETE_IS_SUPER						(1<<3)
 #define XLH_DELETE_IS_PARTITION_MOVE			(1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL					(1<<5)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLH_DELETE_CONTAINS_OLD						\
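
(Aside: I'd expect the new bit to be tested like its XLH_DELETE_* siblings,
e.g. when deciding whether a decoded delete should be emitted; sketch with
"xlrec" pointing to the xl_heap_delete record:)

	/* Skip deletes that were flagged as not logically logged. */
	if ((xlrec->flags & XLH_DELETE_NO_LOGICAL) != 0)
		return;
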
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..289b64edfd9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -623,6 +624,8 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
+											  Snapshot snapshot,
+											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1627,6 +1630,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ *	 (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1639,6 +1646,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
+								Snapshot snapshot,
+								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1647,6 +1656,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
+													snapshot, decoding_ctx,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
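
(Aside: the plain, non-concurrent path simply passes NULL for the two new
arguments, which preserves today's behavior per the comment above; sketch
with placeholder variables:)

	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex,
									use_sort, OldestXmin,
									NULL /* snapshot */ ,
									NULL /* decoding_ctx */ ,
									&xid_cutoff, &multi_cutoff,
									&num_tuples, &tups_vacuumed,
									&tups_recently_dead);
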
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 890998d84bb..4a508c57a50 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
 #ifndef CLUSTER_H
 #define CLUSTER_H
 
+#include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "replication/logical.h"
 #include "storage/lock.h"
+#include "storage/relfilelocator.h"
 #include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -25,6 +30,8 @@
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
 #define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
+#define CLUOPT_CONCURRENT 0x10	/* allow concurrent data changes */
+
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -33,14 +40,95 @@ typedef struct ClusterParams
 } ClusterParams;
 
 
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+extern RelFileLocator repacked_rel_locator;
+extern RelFileLocator repacked_rel_toast_locator;
+
+typedef enum
+{
+	CHANGE_INSERT,
+	CHANGE_UPDATE_OLD,
+	CHANGE_UPDATE_NEW,
+	CHANGE_DELETE,
+	CHANGE_TRUNCATE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+	/* See the enum above. */
+	ConcurrentChangeKind kind;
+
+	/*
+	 * The actual tuple.
+	 *
+	 * The tuple data follows the ConcurrentChange structure. Before use make
+	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
+	 * bytea) and that tuple->t_data is fixed.
+	 */
+	HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+								sizeof(HeapTupleData))
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents are being copied to the new storage, along with the metadata
+ * needed to apply those changes to the table afterwards.
+ */
+typedef struct RepackDecodingState
+{
+	/* The relation whose changes we're decoding. */
+	Oid			relid;
+
+	/*
+	 * Decoded changes are stored here. Although we try to avoid excessive
+	 * batches, it can happen that the changes need to be stored to disk. The
+	 * tuplestore does this transparently.
+	 */
+	Tuplestorestate *tstore;
+
+	/* The current number of changes in tstore. */
+	double		nchanges;
+
+	/*
+	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
+	 * We can't store the tuple directly because tuplestore only supports
+	 * minimal tuples and we may need to transfer the OID system column from
+	 * the output plugin. Also we need to transfer the change kind, so it's
+	 * better to put everything in this structure than to use two tuplestores
+	 * "in parallel".
+	 */
+	TupleDesc	tupdesc_change;
+
+	/* Tuple descriptor needed to update indexes. */
+	TupleDesc	tupdesc;
+
+	/* Slot to retrieve data from tstore. */
+	TupleTableSlot *tsslot;
+
+	ResourceOwner resowner;
+} RepackDecodingState;
+
+
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, bool usingindex,
-						Relation OldHeap, Oid indexOid, ClusterParams *params);
+						Relation OldHeap, Oid indexOid, ClusterParams *params,
+						bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+											 XLogRecPtr end_of_wal);
+
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -48,6 +136,7 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 bool swap_toast_by_content,
 							 bool check_constraints,
 							 bool is_internal,
+							 bool reindex,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
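
(Aside: ClusterParams->options is a bitmask, which is why CLUOPT_CONCURRENT
needs a bit of its own -- 0x10 above -- rather than sharing 0x08 with
CLUOPT_ANALYZE.  Typical flag handling, sketched with "verbose",
"concurrent" and "lockmode" as placeholders:)

	ClusterParams params = {0};

	if (verbose)
		params.options |= CLUOPT_VERBOSE;
	if (concurrent)
		params.options |= CLUOPT_CONCURRENT;

	/* A concurrent repack locks the table much more weakly. */
	lockmode = (params.options & CLUOPT_CONCURRENT) != 0 ?
		ShareUpdateExclusiveLock : AccessExclusiveLock;
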
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 5b6639c114c..93917ad5544 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -59,18 +59,20 @@
 /*
  * Progress parameters for REPACK.
  *
- * Note: Since REPACK shares some code with CLUSTER, these values are also
- * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
- * introduce a separate set of constants.)
+ * Note: Since REPACK shares some code with CLUSTER, (some of) these values
+ * are also used by CLUSTER. (CLUSTER is now deprecated, so it makes little
+ * sense to introduce a separate set of constants.)
  */
 #define PROGRESS_REPACK_COMMAND					0
 #define PROGRESS_REPACK_PHASE					1
 #define PROGRESS_REPACK_INDEX_RELID				2
 #define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED	4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED		5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED		6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		9
 
 /*
  * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -79,9 +81,10 @@
 #define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
 #define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
 #define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+#define PROGRESS_REPACK_PHASE_CATCH_UP			5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		8
 
 /*
  * Commands of PROGRESS_REPACK
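
(Aside: the catch-up phase and the per-kind counters are advertised through
the usual backend-progress calls; sketch, with "ninserted" standing for a
counter maintained while concurrent changes are applied:)

	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
								 PROGRESS_REPACK_PHASE_CATCH_UP);
	pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
								 ninserted);
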
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 6d4d2d1814c..802fc4b0823 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 7f3ba0352f6..2739327b0da 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
 #define AccessShareLock			1	/* SELECT */
 #define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
 #define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL), ANALYZE, CREATE
-									 * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4	/* VACUUM (non-exclusive), ANALYZE, CREATE
+									 * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
 #define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
 #define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
 									 * SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index f65f83c85cd..1f821fd2ccd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,8 @@ extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index fc82cd67f6c..f16422175f8 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -11,10 +11,11 @@ EXTENSION = injection_points
 DATA = injection_points--1.0.sql
 PGFILEDESC = "injection_points - facility for injection points"
 
-REGRESS = injection_points hashagg reindex_conc vacuum
+REGRESS = injection_points hashagg reindex_conc vacuum
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = basic inplace syscache-update-pruned
+ISOLATION = basic inplace syscache-update-pruned repack
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing: 
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+
+step change_new: 
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1: 
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+
+step change_subxact2: 
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    2
+(1 row)
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..c8f264bc6cb
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 20390d6b4bf..29561103bbf 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -47,9 +47,13 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'repack',
       'syscache-update-pruned',
     ],
     'runningcheck': false, # see syscache-update-pruned
+    # 'repack' requires wal_level = 'logical'.
+    'regress_args': ['--temp-config', files('logical.conf')],
+
   },
   'tap': {
     'env': {
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..75850334986
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,143 @@
+# Test application of DML changes performed by other sessions while REPACK
+# (CONCURRENTLY) is running.
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+	INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j int);
+	CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+	SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice to test whether a tuple version generated by this
+# session can be found.
+step change_existing
+{
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key and
+# non-key columns.
+step change_new
+{
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change_existing
+	change_new
+	change_subxact1
+	change_subxact2
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3a1d1d28282..fe227bd8a30 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1999,17 +1999,17 @@ pg_stat_progress_cluster| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS cluster_index_relid,
     s.param4 AS heap_tuples_scanned,
     s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_copy| SELECT s.pid,
@@ -2081,17 +2081,20 @@ pg_stat_progress_repack| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 5 THEN 'catch-up'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS repack_index_relid,
     s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param5 AS heap_tuples_inserted,
+    s.param6 AS heap_tuples_updated,
+    s.param7 AS heap_tuples_deleted,
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 98242e25432..b64ab8dfab4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -485,6 +485,8 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
 ConditionalStack
@@ -1257,6 +1259,7 @@ IndexElem
 IndexFetchHeapData
 IndexFetchTableData
 IndexInfo
+IndexInsertState
 IndexList
 IndexOnlyScan
 IndexOnlyScanState
@@ -2538,6 +2541,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 RepackCommand
+RepackDecodingState
 RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
-- 
2.43.0

Attachment: v21-0002-Add-REPACK-command.patch (application/octet-stream)
From 40965dfef0f26a92249cda7a956bd03c9358a026 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 26 Jul 2025 19:57:26 +0200
Subject: [PATCH v21 2/6] Add REPACK command

REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.

This also adds pg_repackdb, a new utility that can invoke the new
commands.  It is heavily based on vacuumdb.  We may still change the
implementation, depending on how Windows likes this one.

Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: To fill in
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 ++++++-
 doc/src/sgml/ref/allfiles.sgml           |   2 +
 doc/src/sgml/ref/cluster.sgml            |  97 +--
 doc/src/sgml/ref/clusterdb.sgml          |   5 +
 doc/src/sgml/ref/pg_repackdb.sgml        | 479 ++++++++++++++
 doc/src/sgml/ref/repack.sgml             | 284 +++++++++
 doc/src/sgml/ref/vacuum.sgml             |  33 +-
 doc/src/sgml/reference.sgml              |   2 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  26 +
 src/backend/commands/cluster.c           | 758 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   3 +-
 src/backend/parser/gram.y                |  88 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  33 +-
 src/bin/scripts/Makefile                 |   4 +-
 src/bin/scripts/meson.build              |   2 +
 src/bin/scripts/pg_repackdb.c            | 226 +++++++
 src/bin/scripts/t/103_repackdb.pl        |  24 +
 src/bin/scripts/vacuuming.c              |  60 +-
 src/bin/scripts/vacuuming.h              |  11 +-
 src/include/commands/cluster.h           |   8 +-
 src/include/commands/progress.h          |  61 +-
 src/include/nodes/parsenodes.h           |  20 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 125 +++-
 src/test/regress/expected/rules.out      |  23 +
 src/test/regress/sql/cluster.sql         |  59 ++
 src/tools/pgindent/typedefs.list         |   3 +
 33 files changed, 2271 insertions(+), 447 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 doc/src/sgml/ref/repack.sgml
 create mode 100644 src/bin/scripts/pg_repackdb.c
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..12e103d319d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5506,7 +5514,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -5965,6 +5974,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..eabf92e3536 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
@@ -212,6 +213,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..cfcfb65e349 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
   <title>Description</title>
 
   <para>
-   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
-   to cluster the table specified
-   by <replaceable class="parameter">table_name</replaceable>
-   based on the index specified by
-   <replaceable class="parameter">index_name</replaceable>. The index must
-   already have been defined on
-   <replaceable class="parameter">table_name</replaceable>.
+   The <command>CLUSTER</command> command is equivalent to
+   <xref linkend="sql-repack"/> with a <literal>USING INDEX</literal>
+   clause.  See that command for more details.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
-
-  <para>
-   When a table is clustered, <productname>PostgreSQL</productname>
-   remembers which index it was clustered by.  The form
-   <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
-   reclusters the table using the same index as before.  You can also
-   use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
-   forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
-   future cluster operations, or to clear any previous setting.
-  </para>
+<!-- Do we need to describe exactly which options map to what?  They seem obvious to me. -->
 
-  <para>
-   <command>CLUSTER</command> without a
-   <replaceable class="parameter">table_name</replaceable> reclusters all the
-   previously-clustered tables in the current database that the calling user
-   has privileges for.  This form of <command>CLUSTER</command> cannot be
-   executed inside a transaction block.
-  </para>
-
-  <para>
-   When a table is being clustered, an <literal>ACCESS
-   EXCLUSIVE</literal> lock is acquired on it. This prevents any other
-   database operations (both reads and writes) from operating on the
-   table until the <command>CLUSTER</command> is finished.
-  </para>
  </refsect1>
 
  <refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 0d2051bf6f1..546c1289c31 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -64,6 +64,11 @@ PostgreSQL documentation
    this utility and via other methods for accessing the server.
   </para>
 
+  <para>
+   <application>clusterdb</application> has been superseded by
+   <application>pg_repackdb</application>.
+  </para>
+
  </refsect1>
 
 
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..32570d071cb
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,479 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.
+   <application>pg_repackdb</application> will also generate internal
+   statistics used by the <productname>PostgreSQL</productname> query
+   optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link>.
+   There is no effective difference between repacking and analyzing
+   databases via this utility and via other methods for accessing the
+   server.
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When <option>-a</option>/<option>--all</option> is used, connect
+        to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
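+
+   <para>
+    To repack all databases, running up to four repack commands in
+    parallel:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --jobs=4 --all</userinput>
+</screen></para>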
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..fd9d89f8aaa
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,284 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+    ANALYSE | ANALYZE
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If a <literal>USING INDEX</literal> clause is specified, the rows are
+   physically reordered based on information from an index.  Please see the
+   notes on clustering below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    If the <literal>USING INDEX</literal> clause is specified, the rows in
+    the table are physically reordered following an index: if an index name
+    is specified in the command, then that index is used; if no index name
+    is specified, then the index that has been configured as the index to
+    cluster on is used.  If no index has been configured in this way, an
+    error is raised.  The index given in the <literal>USING INDEX</literal>
+    clause is remembered as the index to cluster on, just as an index given
+    to the <command>CLUSTER</command> command is.  The index to cluster on
+    can also be set manually using
+    <command>ALTER TABLE ... CLUSTER ON</command>, and reset with
+    <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+   </para>
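+
+   <para>
+    For example, to set <literal>employees_ind</literal> as the index to
+    cluster the table <literal>employees</literal> on, and to remove that
+    setting again:
+<programlisting>
+ALTER TABLE employees CLUSTER ON employees_ind;
+ALTER TABLE employees SET WITHOUT CLUSTER;
+</programlisting>
+   </para>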
+
+   <para>
+    If no table name is specified in <command>REPACK USING INDEX</command>,
+    all tables that have a clustering index defined and for which the
+    calling user has the required privileges are processed.
+   </para>
+
+   <para>
+    Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using clustering.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index (if the index is a b-tree), or a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan or a sequential scan without sort is used, a temporary
+    copy of the table is created that contains the table data in the new
+    order.  Temporary copies of each index on the table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
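+
+   <para>
+    For example, to keep <command>REPACK</command> from choosing the
+    sequential-scan-and-sort method for a single run:
+<programlisting>
+SET enable_sort = off;
+REPACK employees USING INDEX employees_ind;
+RESET enable_sort;
+</programlisting>
+   </para>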
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>ANALYZE</literal></term>
+    <term><literal>ANALYSE</literal></term>
+    <listitem>
+     <para>
+      Runs <xref linkend="sql-analyze"/> on the table after repacking.  This
+      is currently only supported when a single (non-partitioned) table is
+      specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+   <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view.  See
+    <xref linkend="repack-progress-reporting"/> for details.
+   </para>
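+
+   <para>
+    For example, the phase each running <command>REPACK</command> is in can
+    be checked with a query like:
+<programlisting>
+SELECT pid, relid::regclass AS relation, phase
+  FROM pg_stat_progress_repack;
+</programlisting>
+   </para>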
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions.  If an
+    index is specified, each partition is repacked using the corresponding
+    partition of that index.  <command>REPACK</command> on a partitioned
+    table cannot be executed inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> on the basis of its
+   index <literal>employees_ind</literal> (since an index is used here,
+   this effectively clusters the table):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting></para>
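+
+  <para>
+   Repack the table <literal>employees</literal>, printing progress
+   messages and analyzing it afterwards:
+<programlisting>
+REPACK (VERBOSE, ANALYZE) employees;
+</programlisting></para>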
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..062b658cfcd 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
-    FULL [ <replaceable class="parameter">boolean</replaceable> ]
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+    FULL [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
   <title>Parameters</title>
 
   <variablelist>
-   <varlistentry>
-    <term><literal>FULL</literal></term>
-    <listitem>
-     <para>
-      Selects <quote>full</quote> vacuum, which can reclaim more
-      space, but takes much longer and exclusively locks the table.
-      This method also requires extra disk space, since it writes a
-      new copy of the table and doesn't release the old copy until
-      the operation is complete.  Usually this should only be used when a
-      significant amount of space needs to be reclaimed from within the table.
-     </para>
-    </listitem>
-   </varlistentry>
-
    <varlistentry>
     <term><literal>FREEZE</literal></term>
     <listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FULL</literal></term>
+    <listitem>
+     <para>
+      This option, which is deprecated, makes <command>VACUUM</command>
+      behave like <command>REPACK</command> without a
+      <literal>USING INDEX</literal> clause.
+      This method of compacting the table takes much longer than
+      <command>VACUUM</command> and exclusively locks the table.
+      This method also requires extra disk space, since it writes a
+      new copy of the table and doesn't release the old copy until
+      the operation is complete.  Usually this should only be used when a
+      significant amount of space needs to be reclaimed from within the table.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..2ee08e21f41 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
@@ -257,6 +258,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..79f9de5d760 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c4029a4f3d3..3063abff9a5 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1b3c5a55882..b2b7b10c2be 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1279,6 +1279,32 @@ CREATE VIEW pg_stat_progress_cluster AS
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_repack AS
+    SELECT
+        S.pid AS pid,
+        S.datid AS datid,
+        D.datname AS datname,
+        S.relid AS relid,
+        -- param1 is currently unused
+        CASE S.param2 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'seq scanning heap'
+                      WHEN 2 THEN 'index scanning heap'
+                      WHEN 3 THEN 'sorting tuples'
+                      WHEN 4 THEN 'writing new heap'
+                      WHEN 5 THEN 'swapping relation files'
+                      WHEN 6 THEN 'rebuilding index'
+                      WHEN 7 THEN 'performing final cleanup'
+                      END AS phase,
+        CAST(S.param3 AS oid) AS repack_index_relid,
+        S.param4 AS heap_tuples_scanned,
+        S.param5 AS heap_tuples_written,
+        S.param6 AS heap_blks_total,
+        S.param7 AS heap_blks_scanned,
+        S.param8 AS index_rebuild_count
+    FROM pg_stat_get_progress_info('REPACK') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..8b64f9e6795 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,18 +67,41 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd, bool usingindex,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  MemoryContext cluster_context,
+											  Oid relid, bool rel_is_index);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
 
 
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
+
 /*---------------------------------------------------------------------------
  * This cluster code allows for clustering multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -104,191 +127,155 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *---------------------------------------------------------------------------
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+		else if (strcmp(opt->defname, "analyze") == 0 ||
+				 strcmp(opt->defname, "analyse") == 0)
+			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
+					 errmsg("unrecognized %s option \"%s\"",
+							RepackCommandAsString(stmt->command),
 							opt->defname),
 					 parser_errposition(pstate, opt->location)));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
 			return;
-		}
 	}
 
+	/* Don't allow this for now.  Maybe we can add support for this later */
+	if (params.options & CLUOPT_ANALYZE)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot ANALYZE multiple tables"));
+
 	/*
 	 * By here, we know we are in a multi-table situation.  In order to avoid
 	 * holding locks for too long, we want to process each table in its own
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
 
-	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
-	 */
 	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+
+	/*
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
+	 */
+	if (rel == NULL)
 	{
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
+	}
+	else
+	{
+		Oid			relid;
+		bool		rel_is_index;
+
 		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
 
-		/* close relation, releasing lock on parent table */
+		/*
+		 * If an index name was specified, resolve it now and pass it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * XXX how should this behave?  Passing no index to a partitioned
+			 * table could be useful to have certain partitions clustered by
+			 * some index, and other partitions by a different index.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, true, stmt->indexname);
+			if (!OidIsValid(relid))
+				elog(ERROR, "unable to determine index to cluster on");
+			/* XXX is this the right place for this check? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
+
+		rtcs = get_tables_to_repack_partitioned(stmt->command, repack_context,
+												relid, rel_is_index);
+
+		/* close parent relation, releasing lock on it */
 		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
 	}
-	else
-	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
-	}
-
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
-
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
-
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
-
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
 
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+		{
+			CommitTransactionCommand();
+			continue;
+		}
+
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, stmt->usingindex,
+					rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +291,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, bool usingindex,
+			Relation OldHeap, Oid indexOid, ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +313,25 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
+	else
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_REPACK_COMMAND_REPACK);
+	else if (cmd == REPACK_COMMAND_CLUSTER)
+	{
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	}
 	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+	{
+		Assert(cmd == REPACK_COMMAND_VACUUMFULL);
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	}
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -351,63 +353,21 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * to cluster a not-previously-clustered index.
 	 */
 	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
+		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+								 params->options))
 			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index. It
+	 * would work to use an index in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
-	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
+	if (usingindex && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				 errmsg("cannot run \"%s\" on a shared catalog",
+						RepackCommandAsString(cmd))));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
@@ -415,21 +375,30 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		if (OidIsValid(indexOid))
+		if (cmd == REPACK_COMMAND_CLUSTER)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
+		else if (cmd == REPACK_COMMAND_REPACK)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot repack temporary tables of other sessions")));
+		}
 		else
+		{
+			Assert(cmd == REPACK_COMMAND_VACUUMFULL);
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot vacuum temporary tables of other sessions")));
+		}
 	}
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -469,7 +438,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +451,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide CLUSTER or on a partitioned table), because
+	 * there is another check in ExecRepack() which will stop any attempt to
+	 * cluster remote temp tables by name.  There is another check in
+	 * cluster_rel() which is redundant, but we leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +652,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd, bool usingindex,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +669,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (usingindex)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1458,8 +1485,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1536,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,69 +1659,137 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process, when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given or not, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index to find those that have
+ * indisclustered set; if it was not given, scan pg_class and return all
+ * tables.
+ *
+ * Return it as a list of RelToCluster in the given memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand command, bool usingindex,
+					 MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
+	HeapTuple	tuple;
 	MemoryContext old_context;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/*
+			 * XXX I think the only reason there's no test failure here is
+			 * that we seldom have clustered indexes that would be affected by
+			 * concurrency.  Maybe we should also do the
+			 * ConditionalLockRelationOid+SearchSysCacheExists dance that we
+			 * do below.
+			 */
+			if (!cluster_is_permitted_for_relation(command, index->indrelid,
+												   GetUserId()))
+				continue;
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
 
-		MemoryContextSwitchTo(old_context);
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
 	}
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
+
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.  XXX we could release at the bottom of the
+			 * loop, but for now just hold it until this transaction is
+			 * finished.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
+
+			/* Verify that the table still exists. */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			if (!cluster_is_permitted_for_relation(command, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
+
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
+	}
+
 	table_endscan(scan);
-
-	relation_close(indRelation, AccessShareLock);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
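+
+/*
+ * Illustration (comment only): a bare "REPACK" takes the pg_class path above
+ * and rewrites every table and materialized view the user may maintain,
+ * while "REPACK USING INDEX" takes the pg_index path and is limited to
+ * relations that have an index with indisclustered set.
+ */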
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
 * all the child leaf tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
+								 Oid relid, bool rel_is_index)
 {
 	List	   *inhoids;
 	ListCell   *lc;
@@ -1702,17 +1797,33 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 	MemoryContext old_context;
 
 	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
 
 	foreach(lc, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			inhoid = lfirst_oid(lc);
+		Oid			inhrelid,
+					inhindid;
 		RelToCluster *rtc;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(inhoid) != RELKIND_INDEX)
+				continue;
+
+			inhrelid = IndexGetRelation(inhoid, false);
+			inhindid = inhoid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(inhoid) != RELKIND_RELATION)
+				continue;
+
+			inhrelid = inhoid;
+			inhindid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
@@ -1720,15 +1831,15 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 		 * table.  We skip any partitions which the user is not permitted to
 		 * CLUSTER.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, inhrelid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
 		old_context = MemoryContextSwitchTo(cluster_context);
 
 		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		rtc->tableOid = inhrelid;
+		rtc->indexOid = inhindid;
 		rtcs = lappend(rtcs, rtc);
 
 		MemoryContextSwitchTo(old_context);
@@ -1742,13 +1853,148 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain lock on it, then determine what to do based on the relation
+ * type: if it's not a partitioned table, repack it as indicated (using an
+ * existing clustered index, or following the indicated index), and return
+ * NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened relcache entry, so that the caller can process
+ * the partitions using the multiple-table handling code.  The index name is
+ * not resolved in this case.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+	{
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+	}
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		if (OidIsValid(indexOid))
+			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+
+		/* Do an analyze, if requested */
+		if (params->options & CLUOPT_ANALYZE)
+		{
+			VacuumParams vac_params = {0};
+
+			vac_params.options |= VACOPT_ANALYZE;
+			if (params->options & CLUOPT_VERBOSE)
+				vac_params.options |= VACOPT_VERBOSE;
+			analyze_rel(RelationGetRelid(rel), NULL, vac_params, NIL, true,
+						NULL);
+		}
+
+		return NULL;
+	}
+}
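+
+/*
+ * Illustration (comment only; the option spelling is an assumption): with
+ * CLUOPT_ANALYZE set -- e.g. via something like "REPACK (ANALYZE) tab" --
+ * the table is rewritten and then analyzed in one command, with VERBOSE
+ * carried over when given.
+ */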
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the index to use
+ * for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes doesn't
+ * change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid = InvalidOid;	/* stays invalid if rel has no indexes */
+
+	if (indexname == NULL && usingindex)
+	{
+		ListCell   *lc;
+
+		/* Find an index with indisclustered set, or report error */
+		foreach(lc, RelationGetIndexList(rel))
+		{
+			indexOid = lfirst_oid(lc);
+
+			if (get_index_isclustered(indexOid))
+				break;
+			indexOid = InvalidOid;
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/*
+		 * An index was specified; figure out its OID.  It must be in the same
+		 * namespace as the relation.
+		 */
+		indexOid = get_relname_relid(indexname,
+									 rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
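+
+/*
+ * Resolution sketch (comment only):
+ *
+ *   REPACK tab USING INDEX idx  -> OID of "idx", resolved in tab's schema
+ *   REPACK tab USING INDEX      -> OID of the index with indisclustered
+ *                                  set, or ERROR if there is none
+ *   REPACK tab                  -> InvalidOid (rewrite without reordering)
+ */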
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..8863ad0e8bd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2287,7 +2287,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 				cluster_params.options |= CLUOPT_VERBOSE;
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index db43034b9db..f9152728021 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -280,7 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -297,7 +297,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -316,7 +316,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		utility_option_list
@@ -763,7 +763,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1025,7 +1025,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1099,6 +1098,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1135,6 +1135,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11912,38 +11917,91 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ USING INDEX <index_name> ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list qualified_name USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list qualified_name opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK '(' utility_option_list ')'
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = false;
+					n->params = $3;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $2;
+					n->params = NIL;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -11951,20 +12009,24 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -17960,6 +18022,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18592,6 +18655,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 5f442bc3bd4..cf6db581007 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2851,10 +2851,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2862,6 +2858,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3499,7 +3499,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..a1e10e8c2f6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -268,6 +268,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8b10f2313f3..59ff6e0923b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1247,7 +1247,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -4997,6 +4997,37 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	/* If we have REPACK <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAny))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK <sth> USING, then add the index as well */
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+		else if (TailMatches("VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index 019ca06455d..f0c1bd4175c 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -16,7 +16,7 @@ subdir = src/bin/scripts
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready
+PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready pg_repackdb
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
@@ -31,6 +31,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
 vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
 install: all installdirs
 	$(INSTALL_PROGRAM) createdb$(X)   '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -41,6 +42,7 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+	$(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index a4fed59d1c9..18410fb80dd 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -42,6 +42,7 @@ vacuuming_common = static_library('libvacuuming_common',
 
 binaries = [
   'vacuumdb',
+  'pg_repackdb'
 ]
 foreach binary : binaries
   binary_sources = files('@0@.c'.format(binary))
@@ -80,6 +81,7 @@ tests += {
       't/100_vacuumdb.pl',
       't/101_vacuumdb_all.pl',
       't/102_vacuumdb_stages.pl',
+      't/103_repackdb.pl',
       't/200_connstr.pl',
     ],
   },
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..23326372a77
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ *		A utility to run REPACK
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * FIXME: this is missing a way to specify the index to use when repacking a
+ * single table, or whether to pass a USING INDEX clause when multiple tables
+ * are processed.  Something like --index[=indexname].  Adding that bleeds
+ * into vacuuming.c as well.
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "vacuuming.h"
+
+static void help(const char *progname);
+void		check_objfilter(void);
+
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+		{"echo", no_argument, NULL, 'e'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"dbname", required_argument, NULL, 'd'},
+		{"all", no_argument, NULL, 'a'},
+		{"table", required_argument, NULL, 't'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"jobs", required_argument, NULL, 'j'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
+		{"maintenance-db", required_argument, NULL, 2},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	bool		echo = false;
+	bool		quiet = false;
+	vacuumingOptions vacopts;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+	vacopts.mode = MODE_REPACK;
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwW",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				quiet = true;
+				break;
+			case 't':
+				objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter();
+
+	vacuuming_main(&cparams, dbname, maintenance_db, &vacopts, &objects,
+				   false, tbl_count, concurrentCons,
+				   progname, echo, quiet);
+	exit(0);
+}
+
+/*
+ * Verify that the filters used at command line are compatible.
+ */
+void
+check_objfilter(void)
+{
+	if ((objfilter & OBJFILTER_ALL_DBS) &&
+		(objfilter & OBJFILTER_DATABASE))
+		pg_fatal("cannot repack all databases and a specific one at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA))
+		pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+
+	if ((objfilter & OBJFILTER_SCHEMA) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+}
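+
+/*
+ * For example (illustrative only): "pg_repackdb -a -d mydb" trips the first
+ * check above, and "pg_repackdb -t sometab -n someschema" trips the second.
+ */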
+
+static void
+help(const char *progname)
+{
+	printf(_("%s repacks a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE'             repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..51de4d7ab34
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,24 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
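+# A possible extension (sketch only; the table name and pattern below are
+# hypothetical, not part of this test):
+# $node->safe_psql('postgres', 'CREATE TABLE repack_tab (a int)');
+# $node->issues_sql_like(
+#	[ 'pg_repackdb', '-t', 'repack_tab', 'postgres' ],
+#	qr/statement: REPACK.*repack_tab.*;/,
+#	'REPACK with --table');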
+
+done_testing();
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
index 9be37fcc45a..e07071c38ee 100644
--- a/src/bin/scripts/vacuuming.c
+++ b/src/bin/scripts/vacuuming.c
@@ -1,6 +1,6 @@
 /*-------------------------------------------------------------------------
  * vacuuming.c
- *		Common routines for vacuumdb
+ *		Common routines for vacuumdb and pg_repackdb
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -166,6 +166,14 @@ vacuum_one_database(ConnParams *cparams,
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		/* XXX arguably, here we should use VACUUM FULL instead of failing */
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
 	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
 	{
 		PQfinish(conn);
@@ -258,9 +266,15 @@ vacuum_one_database(ConnParams *cparams,
 		if (stage != ANALYZE_NO_STAGE)
 			printf(_("%s: processing database \"%s\": %s\n"),
 				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
+		else if (vacopts->mode == MODE_VACUUM)
 			printf(_("%s: vacuuming database \"%s\"\n"),
 				   progname, PQdb(conn));
+		else
+		{
+			Assert(vacopts->mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
 		fflush(stdout);
 	}
 
@@ -350,7 +364,7 @@ vacuum_one_database(ConnParams *cparams,
 		 * through ParallelSlotsGetIdle.
 		 */
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
+		run_vacuum_command(free_slot->connection, vacopts, sql.data,
 						   echo, tabname);
 
 		cell = cell->next;
@@ -363,7 +377,7 @@ vacuum_one_database(ConnParams *cparams,
 	}
 
 	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
-	if (vacopts->skip_database_stats &&
+	if (vacopts->mode == MODE_VACUUM && vacopts->skip_database_stats &&
 		stage == ANALYZE_NO_STAGE)
 	{
 		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
@@ -376,7 +390,7 @@ vacuum_one_database(ConnParams *cparams,
 		}
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+		run_vacuum_command(free_slot->connection, vacopts, cmd, echo, NULL);
 
 		if (!ParallelSlotsWaitCompletion(sa))
 			failed = true;
@@ -708,6 +722,12 @@ vacuum_all_databases(ConnParams *cparams,
 	int			i;
 
 	conn = connectMaintenanceDatabase(cparams, progname, echo);
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
 	result = executeQuery(conn,
 						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
 						  echo);
@@ -761,7 +781,7 @@ vacuum_all_databases(ConnParams *cparams,
 }
 
 /*
- * Construct a vacuum/analyze command to run based on the given
+ * Construct a vacuum/analyze/repack command to run based on the given
  * options, in the given string buffer, which may contain previous garbage.
  *
  * The table name used must be already properly quoted.  The command generated
@@ -777,7 +797,13 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
 
 	resetPQExpBuffer(sql);
 
-	if (vacopts->analyze_only)
+	if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(sql, "REPACK");
+		if (vacopts->verbose)
+			appendPQExpBufferStr(sql, " (VERBOSE)");
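+		/* sql now holds e.g. "REPACK" or "REPACK (VERBOSE)" */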
+	}
+	else if (vacopts->analyze_only)
 	{
 		appendPQExpBufferStr(sql, "ANALYZE");
 
@@ -938,8 +964,8 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
  * Any errors during command execution are reported to stderr.
  */
 void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
+run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+				   const char *sql, bool echo, const char *table)
 {
 	bool		status;
 
@@ -952,13 +978,21 @@ run_vacuum_command(PGconn *conn, const char *sql, bool echo,
 	{
 		if (table)
 		{
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
 		}
 		else
 		{
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
 		}
 	}
 }
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
index d3f000840fa..154bc9925c0 100644
--- a/src/bin/scripts/vacuuming.h
+++ b/src/bin/scripts/vacuuming.h
@@ -17,6 +17,12 @@
 #include "fe_utils/connect_utils.h"
 #include "fe_utils/simple_list.h"
 
+typedef enum
+{
+	MODE_VACUUM,				/* VACUUM/ANALYZE processing (vacuumdb) */
+	MODE_REPACK					/* REPACK processing (pg_repackdb) */
+} RunMode;
+
 /* For analyze-in-stages mode */
 #define ANALYZE_NO_STAGE	-1
 #define ANALYZE_NUM_STAGES	3
@@ -24,6 +30,7 @@
 /* vacuum options controlled by user flags */
 typedef struct vacuumingOptions
 {
+	RunMode		mode;
 	bool		analyze_only;
 	bool		verbose;
 	bool		and_analyze;
@@ -87,8 +94,8 @@ extern void vacuum_all_databases(ConnParams *cparams,
 extern void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
 								   vacuumingOptions *vacopts, const char *table);
 
-extern void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+extern void run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+							   const char *sql, bool echo, const char *table);
 
 extern char *escape_quotes(const char *src);
 
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..890998d84bb 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, bool usingindex,
+						Relation OldHeap, Oid indexOid, ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..5b6639c114c 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,24 +56,51 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Note: Since REPACK shares some code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
 
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+
+/*
+ * Commands of PROGRESS_REPACK
+ *
+ * Currently we only have one command, so the PROGRESS_REPACK_COMMAND
+ * parameter is not strictly necessary.  However, it makes cluster.c simpler
+ * if we have the same set of parameters for CLUSTER and REPACK; see the note
+ * on REPACK parameters above.
+ */
+#define PROGRESS_REPACK_COMMAND_REPACK			1
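+
+/*
+ * These parameters are exposed through the pg_stat_progress_repack view,
+ * e.g. (illustrative query):
+ *   SELECT pid, relid::regclass, phase FROM pg_stat_progress_repack;
+ */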
+
+/*
+ * Progress parameters for cluster.
+ *
+ * Although we need to report REPACK and CLUSTER in separate views, the
+ * parameters and phases of CLUSTER are a subset of those of REPACK. Therefore
+ * we just use the appropriate values defined for REPACK above instead of
+ * defining a separate set of constants here.
+ */
 
 /* Commands of PROGRESS_CLUSTER */
 #define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 86a236bd58b..fcc25a0c592 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3949,16 +3949,26 @@ typedef struct AlterSystemStmt
 } AlterSystemStmt;
 
 /* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
+ *		Repack Statement
  * ----------------------
  */
-typedef struct ClusterStmt
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER,		/* spelled CLUSTER (obsolete syntax) */
+	REPACK_COMMAND_REPACK,		/* spelled REPACK */
+	REPACK_COMMAND_VACUUMFULL,	/* spelled VACUUM FULL (obsolete syntax) */
+} RepackCommand;
+
+typedef struct RepackStmt
 {
 	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
+	RepackCommand command;		/* type of command being run */
+	RangeVar   *relation;		/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
 	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
+} RepackStmt;
+
 
 /* ----------------------
  *		Vacuum and Analyze Statements
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index a4af3f717a1..22559369e2c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -374,6 +374,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..5256628b51d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -254,6 +254,63 @@ ORDER BY 1;
  clstr_tst_pkey
 (3 rows)
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -381,6 +438,35 @@ SELECT * FROM clstr_1;
  2
 (2 rows)
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- had their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because no clustering index is involved here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
+SET SESSION AUTHORIZATION regress_clstr_user;
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 CREATE TABLE clustertest (key int PRIMARY KEY);
@@ -495,6 +581,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +636,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..3a1d1d28282 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,6 +2071,29 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..cfcc3dc9761 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,6 +76,19 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
 
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
@@ -159,6 +172,34 @@ INSERT INTO clstr_1 VALUES (1);
 CLUSTER clstr_1;
 SELECT * FROM clstr_1;
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- had their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because no clustering index is involved here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 
@@ -229,6 +270,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..98242e25432 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2537,6 +2537,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
@@ -2603,6 +2605,7 @@ RtlNtStatusToDosError_t
 RuleInfo
 RuleLock
 RuleStmt
+RunMode
 RunningTransactions
 RunningTransactionsData
 SASLStatus
-- 
2.43.0

Attachment: v21-0004-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch (application/octet-stream)
From b9384aa62c96c94d45bb7e97a56acda5590f0c5f Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:23:05 +0200
Subject: [PATCH v21 4/6] Move conversion of a "historic" snapshot to an MVCC
 snapshot into a separate function.

The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
---
 src/backend/replication/logical/snapbuild.c | 51 +++++++++++++++++----
 src/backend/utils/time/snapmgr.c            |  3 +-
 src/include/replication/snapbuild.h         |  1 +
 src/include/utils/snapmgr.h                 |  1 +
 4 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 98ddee20929..a2f1803622c 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
-	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -485,6 +482,31 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
+	/* Convert the historic snapshot to an MVCC snapshot. */
+	return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike in a regular (non-historic) MVCC snapshot, the xip array of this
+ * snapshot contains not only running main transactions but also their
+ * subtransactions.  This difference has no impact on XidInMVCCSnapshot().
+ *
+ * Pass true for 'in_place' if the source snapshot may be modified in place.
+ * Pass false to get a new instance, allocated as a single chunk of memory,
+ * with the source snapshot left intact.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+	TransactionId xid;
+	TransactionId *oldxip = snapshot->xip;
+	uint32		oldxcnt = snapshot->xcnt;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+	Snapshot	result;
+
 	/* allocate in transaction context */
 	newxip = (TransactionId *)
 		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
@@ -495,7 +517,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
 	{
 		void	   *test;
 
@@ -503,7 +525,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -520,11 +542,22 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snapshot->xcnt = newxcnt;
+	snapshot->xip = newxip;
 
-	return snap;
+	if (in_place)
+		result = snapshot;
+	else
+	{
+		result = CopySnapshot(snapshot);
+
+		/* Restore the original values so the source is intact. */
+		snapshot->xip = oldxip;
+		snapshot->xcnt = oldxcnt;
+	}
+	result->snapshot_type = SNAPSHOT_MVCC;
+
+	return result;
 }
 
 /*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 65561cc6bc3..bc7840052fe 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -212,7 +212,6 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -602,7 +601,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
+Snapshot
 CopySnapshot(Snapshot snapshot)
 {
 	Snapshot	newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..6d4d2d1814c 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..f65f83c85cd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -63,6 +63,7 @@ extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
+extern Snapshot CopySnapshot(Snapshot snapshot);
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
-- 
2.43.0

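For illustration, the 'in_place' flag separates two kinds of callers:
SnapBuildInitialSnapshot() keeps passing true because it no longer needs
the historic form of its snapshot, while a caller that must keep the
historic snapshot usable afterwards (presumably what REPACK CONCURRENTLY
will want) passes false to get a standalone, single-chunk copy.  A minimal
sketch, with a hypothetical caller name:

    /* Build an MVCC snapshot while leaving 'historic' intact. */
    static Snapshot
    mvcc_snapshot_for_repack(Snapshot historic)
    {
        /* false: return a fresh copy made via CopySnapshot() */
        return SnapBuildMVCCFromHistoric(historic, false);
    }
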
Attachment: v21-0001-Split-vacuumdb-to-create-vacuuming.c-h.patch (application/octet-stream)
From 2206b215a8855cf8a9c29889f5feab4a0bd8a7e0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 30 Aug 2025 14:39:49 +0200
Subject: [PATCH v21 1/6] Split vacuumdb to create vacuuming.c/h

---
 src/bin/scripts/Makefile    |    4 +-
 src/bin/scripts/meson.build |   28 +-
 src/bin/scripts/vacuumdb.c  | 1048 +----------------------------------
 src/bin/scripts/vacuuming.c |  978 ++++++++++++++++++++++++++++++++
 src/bin/scripts/vacuuming.h |   95 ++++
 5 files changed, 1119 insertions(+), 1034 deletions(-)
 create mode 100644 src/bin/scripts/vacuuming.c
 create mode 100644 src/bin/scripts/vacuuming.h

diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index f6b4d40810b..019ca06455d 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -28,7 +28,7 @@ createuser: createuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport
 dropdb: dropdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 dropuser: dropuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-vacuumdb: vacuumdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
@@ -50,7 +50,7 @@ uninstall:
 
 clean distclean:
 	rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
-	rm -f common.o $(WIN32RES)
+	rm -f common.o vacuuming.o $(WIN32RES)
 	rm -rf tmp_check
 
 export with_icu
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index 80df7c33257..a4fed59d1c9 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -12,7 +12,6 @@ binaries = [
   'createuser',
   'dropuser',
   'clusterdb',
-  'vacuumdb',
   'reindexdb',
   'pg_isready',
 ]
@@ -35,6 +34,33 @@ foreach binary : binaries
   bin_targets += binary
 endforeach
 
+vacuuming_common = static_library('libvacuuming_common',
+  files('common.c', 'vacuuming.c'),
+  dependencies: [frontend_code, libpq],
+  kwargs: internal_lib_args,
+)
+
+binaries = [
+  'vacuumdb',
+]
+foreach binary : binaries
+  binary_sources = files('@0@.c'.format(binary))
+
+  if host_system == 'windows'
+    binary_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
+      '--NAME', binary,
+      '--FILEDESC', '@0@ - PostgreSQL utility'.format(binary),])
+  endif
+
+  binary = executable(binary,
+    binary_sources,
+    link_with: [vacuuming_common],
+    dependencies: [frontend_code, libpq],
+    kwargs: default_bin_args,
+  )
+  bin_targets += binary
+endforeach
+
 tests += {
   'name': 'scripts',
   'sd': meson.current_source_dir(),
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index fd236087e90..b1be61ddf25 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -14,92 +14,13 @@
 
 #include <limits.h>
 
-#include "catalog/pg_attribute_d.h"
-#include "catalog/pg_class_d.h"
 #include "common.h"
-#include "common/connect.h"
 #include "common/logging.h"
-#include "fe_utils/cancel.h"
 #include "fe_utils/option_utils.h"
-#include "fe_utils/parallel_slot.h"
-#include "fe_utils/query_utils.h"
-#include "fe_utils/simple_list.h"
-#include "fe_utils/string_utils.h"
-
-
-/* vacuum options controlled by user flags */
-typedef struct vacuumingOptions
-{
-	bool		analyze_only;
-	bool		verbose;
-	bool		and_analyze;
-	bool		full;
-	bool		freeze;
-	bool		disable_page_skipping;
-	bool		skip_locked;
-	int			min_xid_age;
-	int			min_mxid_age;
-	int			parallel_workers;	/* >= 0 indicates user specified the
-									 * parallel degree, otherwise -1 */
-	bool		no_index_cleanup;
-	bool		force_index_cleanup;
-	bool		do_truncate;
-	bool		process_main;
-	bool		process_toast;
-	bool		skip_database_stats;
-	char	   *buffer_usage_limit;
-	bool		missing_stats_only;
-} vacuumingOptions;
-
-/* object filter options */
-typedef enum
-{
-	OBJFILTER_NONE = 0,			/* no filter used */
-	OBJFILTER_ALL_DBS = (1 << 0),	/* -a | --all */
-	OBJFILTER_DATABASE = (1 << 1),	/* -d | --dbname */
-	OBJFILTER_TABLE = (1 << 2), /* -t | --table */
-	OBJFILTER_SCHEMA = (1 << 3),	/* -n | --schema */
-	OBJFILTER_SCHEMA_EXCLUDE = (1 << 4),	/* -N | --exclude-schema */
-} VacObjFilter;
-
-static VacObjFilter objfilter = OBJFILTER_NONE;
-
-static SimpleStringList *retrieve_objects(PGconn *conn,
-										  vacuumingOptions *vacopts,
-										  SimpleStringList *objects,
-										  bool echo);
-
-static void vacuum_one_database(ConnParams *cparams,
-								vacuumingOptions *vacopts,
-								int stage,
-								SimpleStringList *objects,
-								SimpleStringList **found_objs,
-								int concurrentCons,
-								const char *progname, bool echo, bool quiet);
-
-static void vacuum_all_databases(ConnParams *cparams,
-								 vacuumingOptions *vacopts,
-								 bool analyze_in_stages,
-								 SimpleStringList *objects,
-								 int concurrentCons,
-								 const char *progname, bool echo, bool quiet);
-
-static void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
-								   vacuumingOptions *vacopts, const char *table);
-
-static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+#include "vacuuming.h"
 
 static void help(const char *progname);
-
-void		check_objfilter(void);
-
-static char *escape_quotes(const char *src);
-
-/* For analyze-in-stages mode */
-#define ANALYZE_NO_STAGE	-1
-#define ANALYZE_NUM_STAGES	3
-
+static void check_objfilter(void);
 
 int
 main(int argc, char *argv[])
@@ -145,10 +66,6 @@ main(int argc, char *argv[])
 	int			c;
 	const char *dbname = NULL;
 	const char *maintenance_db = NULL;
-	char	   *host = NULL;
-	char	   *port = NULL;
-	char	   *username = NULL;
-	enum trivalue prompt_password = TRI_DEFAULT;
 	ConnParams	cparams;
 	bool		echo = false;
 	bool		quiet = false;
@@ -168,13 +85,18 @@ main(int argc, char *argv[])
 	vacopts.process_main = true;
 	vacopts.process_toast = true;
 
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
 	pg_logging_init(argv[0]);
 	progname = get_progname(argv[0]);
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
 
-	handle_help_version_opts(argc, argv, "vacuumdb", help);
+	handle_help_version_opts(argc, argv, progname, help);
 
-	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ",
+							long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -195,7 +117,7 @@ main(int argc, char *argv[])
 				vacopts.freeze = true;
 				break;
 			case 'h':
-				host = pg_strdup(optarg);
+				cparams.pghost = pg_strdup(optarg);
 				break;
 			case 'j':
 				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
@@ -211,7 +133,7 @@ main(int argc, char *argv[])
 				simple_string_list_append(&objects, optarg);
 				break;
 			case 'p':
-				port = pg_strdup(optarg);
+				cparams.pgport = pg_strdup(optarg);
 				break;
 			case 'P':
 				if (!option_parse_int(optarg, "-P/--parallel", 0, INT_MAX,
@@ -227,16 +149,16 @@ main(int argc, char *argv[])
 				tbl_count++;
 				break;
 			case 'U':
-				username = pg_strdup(optarg);
+				cparams.pguser = pg_strdup(optarg);
 				break;
 			case 'v':
 				vacopts.verbose = true;
 				break;
 			case 'w':
-				prompt_password = TRI_NO;
+				cparams.prompt_password = TRI_NO;
 				break;
 			case 'W':
-				prompt_password = TRI_YES;
+				cparams.prompt_password = TRI_YES;
 				break;
 			case 'z':
 				vacopts.and_analyze = true;
@@ -380,66 +302,9 @@ main(int argc, char *argv[])
 		pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
 				 "missing-stats-only", "analyze-only", "analyze-in-stages");
 
-	/* fill cparams except for dbname, which is set below */
-	cparams.pghost = host;
-	cparams.pgport = port;
-	cparams.pguser = username;
-	cparams.prompt_password = prompt_password;
-	cparams.override_dbname = NULL;
-
-	setup_cancel_handler(NULL);
-
-	/* Avoid opening extra connections. */
-	if (tbl_count && (concurrentCons > tbl_count))
-		concurrentCons = tbl_count;
-
-	if (objfilter & OBJFILTER_ALL_DBS)
-	{
-		cparams.dbname = maintenance_db;
-
-		vacuum_all_databases(&cparams, &vacopts,
-							 analyze_in_stages,
-							 &objects,
-							 concurrentCons,
-							 progname, echo, quiet);
-	}
-	else
-	{
-		if (dbname == NULL)
-		{
-			if (getenv("PGDATABASE"))
-				dbname = getenv("PGDATABASE");
-			else if (getenv("PGUSER"))
-				dbname = getenv("PGUSER");
-			else
-				dbname = get_user_name_or_exit(progname);
-		}
-
-		cparams.dbname = dbname;
-
-		if (analyze_in_stages)
-		{
-			int			stage;
-			SimpleStringList *found_objs = NULL;
-
-			for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
-			{
-				vacuum_one_database(&cparams, &vacopts,
-									stage,
-									&objects,
-									vacopts.missing_stats_only ? &found_objs : NULL,
-									concurrentCons,
-									progname, echo, quiet);
-			}
-		}
-		else
-			vacuum_one_database(&cparams, &vacopts,
-								ANALYZE_NO_STAGE,
-								&objects, NULL,
-								concurrentCons,
-								progname, echo, quiet);
-	}
-
+	vacuuming_main(&cparams, dbname, maintenance_db, &vacopts, &objects,
+				   analyze_in_stages, tbl_count, concurrentCons,
+				   progname, echo, quiet);
 	exit(0);
 }
 
@@ -466,885 +331,6 @@ check_objfilter(void)
 		pg_fatal("cannot vacuum all tables in schema(s) and exclude schema(s) at the same time");
 }
 
-/*
- * Returns a newly malloc'd version of 'src' with escaped single quotes and
- * backslashes.
- */
-static char *
-escape_quotes(const char *src)
-{
-	char	   *result = escape_single_quotes_ascii(src);
-
-	if (!result)
-		pg_fatal("out of memory");
-	return result;
-}
-
-/*
- * vacuum_one_database
- *
- * Process tables in the given database.
- *
- * There are two ways to specify the list of objects to process:
- *
- * 1) The "found_objs" parameter is a double pointer to a fully qualified list
- *    of objects to process, as returned by a previous call to
- *    vacuum_one_database().
- *
- *     a) If both "found_objs" (the double pointer) and "*found_objs" (the
- *        once-dereferenced double pointer) are not NULL, this list takes
- *        priority, and anything specified in "objects" is ignored.
- *
- *     b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
- *        (the once-dereferenced double pointer) _is_ NULL, the "objects"
- *        parameter takes priority, and the results of the catalog query
- *        described in (2) are stored in "found_objs".
- *
- *     c) If "found_objs" (the double pointer) is NULL, the "objects"
- *        parameter again takes priority, and the results of the catalog query
- *        are not saved.
- *
- * 2) The "objects" parameter is a user-specified list of objects to process.
- *    When (1b) or (1c) applies, this function performs a catalog query to
- *    retrieve a fully qualified list of objects to process, as described
- *    below.
- *
- *     a) If "objects" is not NULL, the catalog query gathers only the objects
- *        listed in "objects".
- *
- *     b) If "objects" is NULL, all tables in the database are gathered.
- *
- * Note that this function is only concerned with running exactly one stage
- * when in analyze-in-stages mode; caller must iterate on us if necessary.
- *
- * If concurrentCons is > 1, multiple connections are used to vacuum tables
- * in parallel.
- */
-static void
-vacuum_one_database(ConnParams *cparams,
-					vacuumingOptions *vacopts,
-					int stage,
-					SimpleStringList *objects,
-					SimpleStringList **found_objs,
-					int concurrentCons,
-					const char *progname, bool echo, bool quiet)
-{
-	PQExpBufferData sql;
-	PGconn	   *conn;
-	SimpleStringListCell *cell;
-	ParallelSlotArray *sa;
-	int			ntups = 0;
-	bool		failed = false;
-	const char *initcmd;
-	SimpleStringList *ret = NULL;
-	const char *stage_commands[] = {
-		"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
-		"SET default_statistics_target=10; RESET vacuum_cost_delay;",
-		"RESET default_statistics_target;"
-	};
-	const char *stage_messages[] = {
-		gettext_noop("Generating minimal optimizer statistics (1 target)"),
-		gettext_noop("Generating medium optimizer statistics (10 targets)"),
-		gettext_noop("Generating default (full) optimizer statistics")
-	};
-
-	Assert(stage == ANALYZE_NO_STAGE ||
-		   (stage >= 0 && stage < ANALYZE_NUM_STAGES));
-
-	conn = connectDatabase(cparams, progname, echo, false, true);
-
-	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "disable-page-skipping", "9.6");
-	}
-
-	if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-index-cleanup", "12");
-	}
-
-	if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "force-index-cleanup", "12");
-	}
-
-	if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-truncate", "12");
-	}
-
-	if (!vacopts->process_main && PQserverVersion(conn) < 160000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-process-main", "16");
-	}
-
-	if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-process-toast", "14");
-	}
-
-	if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "skip-locked", "12");
-	}
-
-	if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--min-xid-age", "9.6");
-	}
-
-	if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--min-mxid-age", "9.6");
-	}
-
-	if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--parallel", "13");
-	}
-
-	if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--buffer-usage-limit", "16");
-	}
-
-	if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--missing-stats-only", "15");
-	}
-
-	/* skip_database_stats is used automatically if server supports it */
-	vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
-
-	if (!quiet)
-	{
-		if (stage != ANALYZE_NO_STAGE)
-			printf(_("%s: processing database \"%s\": %s\n"),
-				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
-			printf(_("%s: vacuuming database \"%s\"\n"),
-				   progname, PQdb(conn));
-		fflush(stdout);
-	}
-
-	/*
-	 * If the caller provided the results of a previous catalog query, just
-	 * use that.  Otherwise, run the catalog query ourselves and set the
-	 * return variable if provided.
-	 */
-	if (found_objs && *found_objs)
-		ret = *found_objs;
-	else
-	{
-		ret = retrieve_objects(conn, vacopts, objects, echo);
-		if (found_objs)
-			*found_objs = ret;
-	}
-
-	/*
-	 * Count the number of objects in the catalog query result.  If there are
-	 * none, we are done.
-	 */
-	for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
-		ntups++;
-
-	if (ntups == 0)
-	{
-		PQfinish(conn);
-		return;
-	}
-
-	/*
-	 * Ensure concurrentCons is sane.  If there are more connections than
-	 * vacuumable relations, we don't need to use them all.
-	 */
-	if (concurrentCons > ntups)
-		concurrentCons = ntups;
-	if (concurrentCons <= 0)
-		concurrentCons = 1;
-
-	/*
-	 * All slots need to be prepared to run the appropriate analyze stage, if
-	 * caller requested that mode.  We have to prepare the initial connection
-	 * ourselves before setting up the slots.
-	 */
-	if (stage == ANALYZE_NO_STAGE)
-		initcmd = NULL;
-	else
-	{
-		initcmd = stage_commands[stage];
-		executeCommand(conn, initcmd, echo);
-	}
-
-	/*
-	 * Setup the database connections. We reuse the connection we already have
-	 * for the first slot.  If not in parallel mode, the first slot in the
-	 * array contains the connection.
-	 */
-	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
-	ParallelSlotsAdoptConn(sa, conn);
-
-	initPQExpBuffer(&sql);
-
-	cell = ret->head;
-	do
-	{
-		const char *tabname = cell->val;
-		ParallelSlot *free_slot;
-
-		if (CancelRequested)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		free_slot = ParallelSlotsGetIdle(sa, NULL);
-		if (!free_slot)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
-							   vacopts, tabname);
-
-		/*
-		 * Execute the vacuum.  All errors are handled in processQueryResult
-		 * through ParallelSlotsGetIdle.
-		 */
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
-						   echo, tabname);
-
-		cell = cell->next;
-	} while (cell != NULL);
-
-	if (!ParallelSlotsWaitCompletion(sa))
-	{
-		failed = true;
-		goto finish;
-	}
-
-	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
-	if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
-	{
-		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
-		ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
-		if (!free_slot)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
-		if (!ParallelSlotsWaitCompletion(sa))
-			failed = true;
-	}
-
-finish:
-	ParallelSlotsTerminate(sa);
-	pg_free(sa);
-
-	termPQExpBuffer(&sql);
-
-	if (failed)
-		exit(1);
-}
-
-/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path (which
- * could be unsafe), everything in this query MUST be fully qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their optional
- * set of columns.  This is used to match any provided column lists with the
- * generated qualified identifiers and to filter for the tables provided via
- * --table.  If a listed table does not exist, the catalog query will fail.
- */
-static SimpleStringList *
-retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
-				 SimpleStringList *objects, bool echo)
-{
-	PQExpBufferData buf;
-	PQExpBufferData catalog_query;
-	PGresult   *res;
-	SimpleStringListCell *cell;
-	SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
-	bool		objects_listed = false;
-
-	initPQExpBuffer(&catalog_query);
-	for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
-	{
-		char	   *just_table = NULL;
-		const char *just_columns = NULL;
-
-		if (!objects_listed)
-		{
-			appendPQExpBufferStr(&catalog_query,
-								 "WITH listed_objects (object_oid, column_list) "
-								 "AS (\n  VALUES (");
-			objects_listed = true;
-		}
-		else
-			appendPQExpBufferStr(&catalog_query, ",\n  (");
-
-		if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
-		{
-			appendStringLiteralConn(&catalog_query, cell->val, conn);
-			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
-		}
-
-		if (objfilter & OBJFILTER_TABLE)
-		{
-			/*
-			 * Split relation and column names given by the user, this is used
-			 * to feed the CTE with values on which are performed pre-run
-			 * validity checks as well.  For now these happen only on the
-			 * relation name.
-			 */
-			splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
-								  &just_table, &just_columns);
-
-			appendStringLiteralConn(&catalog_query, just_table, conn);
-			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
-		}
-
-		if (just_columns && just_columns[0] != '\0')
-			appendStringLiteralConn(&catalog_query, just_columns, conn);
-		else
-			appendPQExpBufferStr(&catalog_query, "NULL");
-
-		appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
-
-		pg_free(just_table);
-	}
-
-	/* Finish formatting the CTE */
-	if (objects_listed)
-		appendPQExpBufferStr(&catalog_query, "\n)\n");
-
-	appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
-
-	if (objects_listed)
-		appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
-
-	appendPQExpBufferStr(&catalog_query,
-						 " FROM pg_catalog.pg_class c\n"
-						 " JOIN pg_catalog.pg_namespace ns"
-						 " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
-						 " CROSS JOIN LATERAL (SELECT c.relkind IN ("
-						 CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
-						 CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
-						 " LEFT JOIN pg_catalog.pg_class t"
-						 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
-
-	/*
-	 * Used to match the tables or schemas listed by the user, completing the
-	 * JOIN clause.
-	 */
-	if (objects_listed)
-	{
-		appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
-							 " ON listed_objects.object_oid"
-							 " OPERATOR(pg_catalog.=) ");
-
-		if (objfilter & OBJFILTER_TABLE)
-			appendPQExpBufferStr(&catalog_query, "c.oid\n");
-		else
-			appendPQExpBufferStr(&catalog_query, "ns.oid\n");
-	}
-
-	/*
-	 * Exclude temporary tables, beginning the WHERE clause.
-	 */
-	appendPQExpBufferStr(&catalog_query,
-						 " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
-						 CppAsString2(RELPERSISTENCE_TEMP) "\n");
-
-	/*
-	 * Used to match the tables or schemas listed by the user, for the WHERE
-	 * clause.
-	 */
-	if (objects_listed)
-	{
-		if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
-			appendPQExpBufferStr(&catalog_query,
-								 " AND listed_objects.object_oid IS NULL\n");
-		else
-			appendPQExpBufferStr(&catalog_query,
-								 " AND listed_objects.object_oid IS NOT NULL\n");
-	}
-
-	/*
-	 * If no tables were listed, filter for the relevant relation types.  If
-	 * tables were given via --table, don't bother filtering by relation type.
-	 * Instead, let the server decide whether a given relation can be
-	 * processed in which case the user will know about it.
-	 */
-	if ((objfilter & OBJFILTER_TABLE) == 0)
-	{
-		/*
-		 * vacuumdb should generally follow the behavior of the underlying
-		 * VACUUM and ANALYZE commands. If analyze_only is true, process
-		 * regular tables, materialized views, and partitioned tables, just
-		 * like ANALYZE (with no specific target tables) does. Otherwise,
-		 * process only regular tables and materialized views, since VACUUM
-		 * skips partitioned tables when no target tables are specified.
-		 */
-		if (vacopts->analyze_only)
-			appendPQExpBufferStr(&catalog_query,
-								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
-								 CppAsString2(RELKIND_RELATION) ", "
-								 CppAsString2(RELKIND_MATVIEW) ", "
-								 CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
-		else
-			appendPQExpBufferStr(&catalog_query,
-								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
-								 CppAsString2(RELKIND_RELATION) ", "
-								 CppAsString2(RELKIND_MATVIEW) "])\n");
-
-	}
-
-	/*
-	 * For --min-xid-age and --min-mxid-age, the age of the relation is the
-	 * greatest of the ages of the main relation and its associated TOAST
-	 * table.  The commands generated by vacuumdb will also process the TOAST
-	 * table for the relation if necessary, so it does not need to be
-	 * considered separately.
-	 */
-	if (vacopts->min_xid_age != 0)
-	{
-		appendPQExpBuffer(&catalog_query,
-						  " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
-						  " pg_catalog.age(t.relfrozenxid)) "
-						  " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
-						  " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
-						  " '0'::pg_catalog.xid\n",
-						  vacopts->min_xid_age);
-	}
-
-	if (vacopts->min_mxid_age != 0)
-	{
-		appendPQExpBuffer(&catalog_query,
-						  " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
-						  " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
-						  " '%d'::pg_catalog.int4\n"
-						  " AND c.relminmxid OPERATOR(pg_catalog.!=)"
-						  " '0'::pg_catalog.xid\n",
-						  vacopts->min_mxid_age);
-	}
-
-	if (vacopts->missing_stats_only)
-	{
-		appendPQExpBufferStr(&catalog_query, " AND (\n");
-
-		/* regular stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
-							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* extended stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
-							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
-							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
-							 " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* expression indexes */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " JOIN pg_catalog.pg_index i"
-							 " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
-							 " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* inheritance and regular stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
-							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
-							 " AND c.relhassubclass\n"
-							 " AND NOT p.inherited\n"
-							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
-							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit))\n");
-
-		/* inheritance and extended stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
-							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND c.relhassubclass\n"
-							 " AND NOT p.inherited\n"
-							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
-							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
-							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
-							 " AND d.stxdinherit))\n");
-
-		appendPQExpBufferStr(&catalog_query, " )\n");
-	}
-
-	/*
-	 * Execute the catalog query.  We use the default search_path for this
-	 * query for consistency with table lookups done elsewhere by the user.
-	 */
-	appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
-	executeCommand(conn, "RESET search_path;", echo);
-	res = executeQuery(conn, catalog_query.data, echo);
-	termPQExpBuffer(&catalog_query);
-	PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
-
-	/*
-	 * Build qualified identifiers for each table, including the column list
-	 * if given.
-	 */
-	initPQExpBuffer(&buf);
-	for (int i = 0; i < PQntuples(res); i++)
-	{
-		appendPQExpBufferStr(&buf,
-							 fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
-											   PQgetvalue(res, i, 0),
-											   PQclientEncoding(conn)));
-
-		if (objects_listed && !PQgetisnull(res, i, 2))
-			appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
-
-		simple_string_list_append(found_objs, buf.data);
-		resetPQExpBuffer(&buf);
-	}
-	termPQExpBuffer(&buf);
-	PQclear(res);
-
-	return found_objs;
-}
-
-/*
- * Vacuum/analyze all connectable databases.
- *
- * In analyze-in-stages mode, we process all databases in one stage before
- * moving on to the next stage.  That ensure minimal stats are available
- * quickly everywhere before generating more detailed ones.
- */
-static void
-vacuum_all_databases(ConnParams *cparams,
-					 vacuumingOptions *vacopts,
-					 bool analyze_in_stages,
-					 SimpleStringList *objects,
-					 int concurrentCons,
-					 const char *progname, bool echo, bool quiet)
-{
-	PGconn	   *conn;
-	PGresult   *result;
-	int			stage;
-	int			i;
-
-	conn = connectMaintenanceDatabase(cparams, progname, echo);
-	result = executeQuery(conn,
-						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
-						  echo);
-	PQfinish(conn);
-
-	if (analyze_in_stages)
-	{
-		SimpleStringList **found_objs = NULL;
-
-		if (vacopts->missing_stats_only)
-			found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
-
-		/*
-		 * When analyzing all databases in stages, we analyze them all in the
-		 * fastest stage first, so that initial statistics become available
-		 * for all of them as soon as possible.
-		 *
-		 * This means we establish several times as many connections, but
-		 * that's a secondary consideration.
-		 */
-		for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
-		{
-			for (i = 0; i < PQntuples(result); i++)
-			{
-				cparams->override_dbname = PQgetvalue(result, i, 0);
-
-				vacuum_one_database(cparams, vacopts,
-									stage,
-									objects,
-									vacopts->missing_stats_only ? &found_objs[i] : NULL,
-									concurrentCons,
-									progname, echo, quiet);
-			}
-		}
-	}
-	else
-	{
-		for (i = 0; i < PQntuples(result); i++)
-		{
-			cparams->override_dbname = PQgetvalue(result, i, 0);
-
-			vacuum_one_database(cparams, vacopts,
-								ANALYZE_NO_STAGE,
-								objects, NULL,
-								concurrentCons,
-								progname, echo, quiet);
-		}
-	}
-
-	PQclear(result);
-}
-
-/*
- * Construct a vacuum/analyze command to run based on the given options, in the
- * given string buffer, which may contain previous garbage.
- *
- * The table name used must be already properly quoted.  The command generated
- * depends on the server version involved and it is semicolon-terminated.
- */
-static void
-prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
-					   vacuumingOptions *vacopts, const char *table)
-{
-	const char *paren = " (";
-	const char *comma = ", ";
-	const char *sep = paren;
-
-	resetPQExpBuffer(sql);
-
-	if (vacopts->analyze_only)
-	{
-		appendPQExpBufferStr(sql, "ANALYZE");
-
-		/* parenthesized grammar of ANALYZE is supported since v11 */
-		if (serverVersion >= 110000)
-		{
-			if (vacopts->skip_locked)
-			{
-				/* SKIP_LOCKED is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
-				sep = comma;
-			}
-			if (vacopts->verbose)
-			{
-				appendPQExpBuffer(sql, "%sVERBOSE", sep);
-				sep = comma;
-			}
-			if (vacopts->buffer_usage_limit)
-			{
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
-								  vacopts->buffer_usage_limit);
-				sep = comma;
-			}
-			if (sep != paren)
-				appendPQExpBufferChar(sql, ')');
-		}
-		else
-		{
-			if (vacopts->verbose)
-				appendPQExpBufferStr(sql, " VERBOSE");
-		}
-	}
-	else
-	{
-		appendPQExpBufferStr(sql, "VACUUM");
-
-		/* parenthesized grammar of VACUUM is supported since v9.0 */
-		if (serverVersion >= 90000)
-		{
-			if (vacopts->disable_page_skipping)
-			{
-				/* DISABLE_PAGE_SKIPPING is supported since v9.6 */
-				Assert(serverVersion >= 90600);
-				appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
-				sep = comma;
-			}
-			if (vacopts->no_index_cleanup)
-			{
-				/* "INDEX_CLEANUP FALSE" has been supported since v12 */
-				Assert(serverVersion >= 120000);
-				Assert(!vacopts->force_index_cleanup);
-				appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
-				sep = comma;
-			}
-			if (vacopts->force_index_cleanup)
-			{
-				/* "INDEX_CLEANUP TRUE" has been supported since v12 */
-				Assert(serverVersion >= 120000);
-				Assert(!vacopts->no_index_cleanup);
-				appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
-				sep = comma;
-			}
-			if (!vacopts->do_truncate)
-			{
-				/* TRUNCATE is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
-				sep = comma;
-			}
-			if (!vacopts->process_main)
-			{
-				/* PROCESS_MAIN is supported since v16 */
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
-				sep = comma;
-			}
-			if (!vacopts->process_toast)
-			{
-				/* PROCESS_TOAST is supported since v14 */
-				Assert(serverVersion >= 140000);
-				appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
-				sep = comma;
-			}
-			if (vacopts->skip_database_stats)
-			{
-				/* SKIP_DATABASE_STATS is supported since v16 */
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
-				sep = comma;
-			}
-			if (vacopts->skip_locked)
-			{
-				/* SKIP_LOCKED is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
-				sep = comma;
-			}
-			if (vacopts->full)
-			{
-				appendPQExpBuffer(sql, "%sFULL", sep);
-				sep = comma;
-			}
-			if (vacopts->freeze)
-			{
-				appendPQExpBuffer(sql, "%sFREEZE", sep);
-				sep = comma;
-			}
-			if (vacopts->verbose)
-			{
-				appendPQExpBuffer(sql, "%sVERBOSE", sep);
-				sep = comma;
-			}
-			if (vacopts->and_analyze)
-			{
-				appendPQExpBuffer(sql, "%sANALYZE", sep);
-				sep = comma;
-			}
-			if (vacopts->parallel_workers >= 0)
-			{
-				/* PARALLEL is supported since v13 */
-				Assert(serverVersion >= 130000);
-				appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
-								  vacopts->parallel_workers);
-				sep = comma;
-			}
-			if (vacopts->buffer_usage_limit)
-			{
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
-								  vacopts->buffer_usage_limit);
-				sep = comma;
-			}
-			if (sep != paren)
-				appendPQExpBufferChar(sql, ')');
-		}
-		else
-		{
-			if (vacopts->full)
-				appendPQExpBufferStr(sql, " FULL");
-			if (vacopts->freeze)
-				appendPQExpBufferStr(sql, " FREEZE");
-			if (vacopts->verbose)
-				appendPQExpBufferStr(sql, " VERBOSE");
-			if (vacopts->and_analyze)
-				appendPQExpBufferStr(sql, " ANALYZE");
-		}
-	}
-
-	appendPQExpBuffer(sql, " %s;", table);
-}
-
-/*
- * Send a vacuum/analyze command to the server, returning after sending the
- * command.
- *
- * Any errors during command execution are reported to stderr.
- */
-static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
-{
-	bool		status;
-
-	if (echo)
-		printf("%s\n", sql);
-
-	status = PQsendQuery(conn, sql) == 1;
-
-	if (!status)
-	{
-		if (table)
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
-		else
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
-	}
-}
 
 static void
 help(const char *progname)
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
new file mode 100644
index 00000000000..9be37fcc45a
--- /dev/null
+++ b/src/bin/scripts/vacuuming.c
@@ -0,0 +1,978 @@
+/*-------------------------------------------------------------------------
+ * vacuuming.c
+ *		Common routines for vacuumdb
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/scripts/vacuuming.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "catalog/pg_attribute_d.h"
+#include "catalog/pg_class_d.h"
+#include "common/connect.h"
+#include "common/logging.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/string_utils.h"
+#include "vacuuming.h"
+
+VacObjFilter objfilter = OBJFILTER_NONE;
+
+
+/*
+ * Executes vacuum/analyze as indicated, or dies in case of failure.
+ */
+void
+vacuuming_main(ConnParams *cparams, const char *dbname,
+			   const char *maintenance_db, vacuumingOptions *vacopts,
+			   SimpleStringList *objects, bool analyze_in_stages,
+			   int tbl_count, int concurrentCons,
+			   const char *progname, bool echo, bool quiet)
+{
+	setup_cancel_handler(NULL);
+
+	/* Avoid opening extra connections. */
+	if (tbl_count && (concurrentCons > tbl_count))
+		concurrentCons = tbl_count;
+
+	if (objfilter & OBJFILTER_ALL_DBS)
+	{
+		cparams->dbname = maintenance_db;
+
+		vacuum_all_databases(cparams, vacopts,
+							 analyze_in_stages,
+							 objects,
+							 concurrentCons,
+							 progname, echo, quiet);
+	}
+	else
+	{
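+		/*
+		 * No database name given: fall back on PGDATABASE, then PGUSER,
+		 * then the OS user name.
+		 */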
+		if (dbname == NULL)
+		{
+			if (getenv("PGDATABASE"))
+				dbname = getenv("PGDATABASE");
+			else if (getenv("PGUSER"))
+				dbname = getenv("PGUSER");
+			else
+				dbname = get_user_name_or_exit(progname);
+		}
+
+		cparams->dbname = dbname;
+
+		if (analyze_in_stages)
+		{
+			int			stage;
+			SimpleStringList *found_objs = NULL;
+
+			for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+			{
+				vacuum_one_database(cparams, vacopts,
+									stage,
+									objects,
+									vacopts->missing_stats_only ? &found_objs : NULL,
+									concurrentCons,
+									progname, echo, quiet);
+			}
+		}
+		else
+			vacuum_one_database(cparams, vacopts,
+								ANALYZE_NO_STAGE,
+								objects, NULL,
+								concurrentCons,
+								progname, echo, quiet);
+	}
+}
+
+
+/*
+ * vacuum_one_database
+ *
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ *    of objects to process, as returned by a previous call to
+ *    vacuum_one_database().
+ *
+ *     a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ *        once-dereferenced double pointer) are not NULL, this list takes
+ *        priority, and anything specified in "objects" is ignored.
+ *
+ *     b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ *        (the once-dereferenced double pointer) _is_ NULL, the "objects"
+ *        parameter takes priority, and the results of the catalog query
+ *        described in (2) are stored in "found_objs".
+ *
+ *     c) If "found_objs" (the double pointer) is NULL, the "objects"
+ *        parameter again takes priority, and the results of the catalog query
+ *        are not saved.
+ *
+ * 2) The "objects" parameter is a user-specified list of objects to process.
+ *    When (1b) or (1c) applies, this function performs a catalog query to
+ *    retrieve a fully qualified list of objects to process, as described
+ *    below.
+ *
+ *     a) If "objects" is not NULL, the catalog query gathers only the objects
+ *        listed in "objects".
+ *
+ *     b) If "objects" is NULL, all tables in the database are gathered.
+ *
+ * Note that this function is only concerned with running exactly one stage
+ * when in analyze-in-stages mode; caller must iterate on us if necessary.
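+ *
+ * For example, an analyze-in-stages caller that wants the catalog query to
+ * run only once can keep the returned list across iterations, roughly as
+ * vacuuming_main() does above (sketch):
+ *
+ *     SimpleStringList *found = NULL;
+ *
+ *     for (int stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+ *         vacuum_one_database(cparams, vacopts, stage, objects,
+ *                             &found, concurrentCons,
+ *                             progname, echo, quiet);
+ *
+ * The first iteration fills in 'found' as per (1b); the rest reuse it as
+ * per (1a).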
+ *
+ * If concurrentCons is > 1, multiple connections are used to vacuum tables
+ * in parallel.
+ */
+void
+vacuum_one_database(ConnParams *cparams,
+					vacuumingOptions *vacopts,
+					int stage,
+					SimpleStringList *objects,
+					SimpleStringList **found_objs,
+					int concurrentCons,
+					const char *progname, bool echo, bool quiet)
+{
+	PQExpBufferData sql;
+	PGconn	   *conn;
+	SimpleStringListCell *cell;
+	ParallelSlotArray *sa;
+	int			ntups = 0;
+	bool		failed = false;
+	const char *initcmd;
+	SimpleStringList *ret = NULL;
+	const char *stage_commands[] = {
+		"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+		"SET default_statistics_target=10; RESET vacuum_cost_delay;",
+		"RESET default_statistics_target;"
+	};
+	const char *stage_messages[] = {
+		gettext_noop("Generating minimal optimizer statistics (1 target)"),
+		gettext_noop("Generating medium optimizer statistics (10 targets)"),
+		gettext_noop("Generating default (full) optimizer statistics")
+	};
+
+	Assert(stage == ANALYZE_NO_STAGE ||
+		   (stage >= 0 && stage < ANALYZE_NUM_STAGES));
+
+	conn = connectDatabase(cparams, progname, echo, false, true);
+
+	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "disable-page-skipping", "9.6");
+	}
+
+	if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-index-cleanup", "12");
+	}
+
+	if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "force-index-cleanup", "12");
+	}
+
+	if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-truncate", "12");
+	}
+
+	if (!vacopts->process_main && PQserverVersion(conn) < 160000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-process-main", "16");
+	}
+
+	if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-process-toast", "14");
+	}
+
+	if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "skip-locked", "12");
+	}
+
+	if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--min-xid-age", "9.6");
+	}
+
+	if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--min-mxid-age", "9.6");
+	}
+
+	if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--parallel", "13");
+	}
+
+	if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--buffer-usage-limit", "16");
+	}
+
+	if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--missing-stats-only", "15");
+	}
+
+	/* skip_database_stats is used automatically if server supports it */
+	vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
+
+	if (!quiet)
+	{
+		if (stage != ANALYZE_NO_STAGE)
+			printf(_("%s: processing database \"%s\": %s\n"),
+				   progname, PQdb(conn), _(stage_messages[stage]));
+		else
+			printf(_("%s: vacuuming database \"%s\"\n"),
+				   progname, PQdb(conn));
+		fflush(stdout);
+	}
+
+	/*
+	 * If the caller provided the results of a previous catalog query, just
+	 * use that.  Otherwise, run the catalog query ourselves and set the
+	 * return variable if provided.
+	 */
+	if (found_objs && *found_objs)
+		ret = *found_objs;
+	else
+	{
+		ret = retrieve_objects(conn, vacopts, objects, echo);
+		if (found_objs)
+			*found_objs = ret;
+	}
+
+	/*
+	 * Count the number of objects in the catalog query result.  If there are
+	 * none, we are done.
+	 */
+	for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+		ntups++;
+
+	if (ntups == 0)
+	{
+		PQfinish(conn);
+		return;
+	}
+
+	/*
+	 * Ensure concurrentCons is sane.  If there are more connections than
+	 * vacuumable relations, we don't need to use them all.
+	 */
+	if (concurrentCons > ntups)
+		concurrentCons = ntups;
+	if (concurrentCons <= 0)
+		concurrentCons = 1;
+
+	/*
+	 * All slots need to be prepared to run the appropriate analyze stage, if
+	 * caller requested that mode.  We have to prepare the initial connection
+	 * ourselves before setting up the slots.
+	 */
+	if (stage == ANALYZE_NO_STAGE)
+		initcmd = NULL;
+	else
+	{
+		initcmd = stage_commands[stage];
+		executeCommand(conn, initcmd, echo);
+	}
+
+	/*
+	 * Setup the database connections. We reuse the connection we already have
+	 * for the first slot.  If not in parallel mode, the first slot in the
+	 * array contains the connection.
+	 */
+	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+	ParallelSlotsAdoptConn(sa, conn);
+
+	initPQExpBuffer(&sql);
+
+	cell = ret->head;
+	do
+	{
+		const char *tabname = cell->val;
+		ParallelSlot *free_slot;
+
+		if (CancelRequested)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		free_slot = ParallelSlotsGetIdle(sa, NULL);
+		if (!free_slot)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+							   vacopts, tabname);
+
+		/*
+		 * Execute the vacuum.  All errors are handled in processQueryResult
+		 * through ParallelSlotsGetIdle.
+		 */
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, sql.data,
+						   echo, tabname);
+
+		cell = cell->next;
+	} while (cell != NULL);
+
+	if (!ParallelSlotsWaitCompletion(sa))
+	{
+		failed = true;
+		goto finish;
+	}
+
+	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+	if (vacopts->skip_database_stats &&
+		stage == ANALYZE_NO_STAGE)
+	{
+		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+		ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+		if (!free_slot)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+		if (!ParallelSlotsWaitCompletion(sa))
+			failed = true;
+	}
+
+finish:
+	ParallelSlotsTerminate(sa);
+	pg_free(sa);
+
+	termPQExpBuffer(&sql);
+
+	if (failed)
+		exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns.  This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table.  If a listed table does not exist, the catalog query will fail.
+ */
+SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+				 SimpleStringList *objects, bool echo)
+{
+	PQExpBufferData buf;
+	PQExpBufferData catalog_query;
+	PGresult   *res;
+	SimpleStringListCell *cell;
+	SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+	bool		objects_listed = false;
+
+	initPQExpBuffer(&catalog_query);
+	for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
+	{
+		char	   *just_table = NULL;
+		const char *just_columns = NULL;
+
+		if (!objects_listed)
+		{
+			appendPQExpBufferStr(&catalog_query,
+								 "WITH listed_objects (object_oid, column_list) AS (\n"
+								 "  VALUES (");
+			objects_listed = true;
+		}
+		else
+			appendPQExpBufferStr(&catalog_query, ",\n  (");
+
+		if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
+		{
+			appendStringLiteralConn(&catalog_query, cell->val, conn);
+			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
+		}
+
+		if (objfilter & OBJFILTER_TABLE)
+		{
+			/*
+			 * Split relation and column names given by the user, this is used
+			 * to feed the CTE with values on which are performed pre-run
+			 * validity checks as well.  For now these happen only on the
+			 * relation name.
+			 */
+			splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
+								  &just_table, &just_columns);
+
+			appendStringLiteralConn(&catalog_query, just_table, conn);
+			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
+		}
+
+		if (just_columns && just_columns[0] != '\0')
+			appendStringLiteralConn(&catalog_query, just_columns, conn);
+		else
+			appendPQExpBufferStr(&catalog_query, "NULL");
+
+		appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
+
+		pg_free(just_table);
+	}
+
+	/* Finish formatting the CTE */
+	if (objects_listed)
+		appendPQExpBufferStr(&catalog_query, "\n)\n");
+
+	appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
+
+	if (objects_listed)
+		appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
+
+	appendPQExpBufferStr(&catalog_query,
+						 " FROM pg_catalog.pg_class c\n"
+						 " JOIN pg_catalog.pg_namespace ns"
+						 " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+						 " CROSS JOIN LATERAL (SELECT c.relkind IN ("
+						 CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
+						 CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
+						 " LEFT JOIN pg_catalog.pg_class t"
+						 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
+
+	/*
+	 * Used to match the tables or schemas listed by the user, completing the
+	 * JOIN clause.
+	 */
+	if (objects_listed)
+	{
+		appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
+							 " ON listed_objects.object_oid"
+							 " OPERATOR(pg_catalog.=) ");
+
+		if (objfilter & OBJFILTER_TABLE)
+			appendPQExpBufferStr(&catalog_query, "c.oid\n");
+		else
+			appendPQExpBufferStr(&catalog_query, "ns.oid\n");
+	}
+
+	/*
+	 * Exclude temporary tables, beginning the WHERE clause.
+	 */
+	appendPQExpBufferStr(&catalog_query,
+						 " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
+						 CppAsString2(RELPERSISTENCE_TEMP) "\n");
+
+	/*
+	 * Used to match the tables or schemas listed by the user, for the WHERE
+	 * clause.
+	 */
+	if (objects_listed)
+	{
+		if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
+			appendPQExpBufferStr(&catalog_query,
+								 " AND listed_objects.object_oid IS NULL\n");
+		else
+			appendPQExpBufferStr(&catalog_query,
+								 " AND listed_objects.object_oid IS NOT NULL\n");
+	}
+
+	/*
+	 * If no tables were listed, filter for the relevant relation types.  If
+	 * tables were given via --table, don't bother filtering by relation type.
+	 * Instead, let the server decide whether a given relation can be
+	 * processed in which case the user will know about it.
+	 */
+	if ((objfilter & OBJFILTER_TABLE) == 0)
+	{
+		/*
+		 * vacuumdb should generally follow the behavior of the underlying
+		 * VACUUM and ANALYZE commands. If analyze_only is true, process
+		 * regular tables, materialized views, and partitioned tables, just
+		 * like ANALYZE (with no specific target tables) does. Otherwise,
+		 * process only regular tables and materialized views, since VACUUM
+		 * skips partitioned tables when no target tables are specified.
+		 */
+		if (vacopts->analyze_only)
+			appendPQExpBufferStr(&catalog_query,
+								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
+								 CppAsString2(RELKIND_RELATION) ", "
+								 CppAsString2(RELKIND_MATVIEW) ", "
+								 CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
+		else
+			appendPQExpBufferStr(&catalog_query,
+								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
+								 CppAsString2(RELKIND_RELATION) ", "
+								 CppAsString2(RELKIND_MATVIEW) "])\n");
+	}
+
+	/*
+	 * For --min-xid-age and --min-mxid-age, the age of the relation is the
+	 * greatest of the ages of the main relation and its associated TOAST
+	 * table.  The commands generated by vacuumdb will also process the TOAST
+	 * table for the relation if necessary, so it does not need to be
+	 * considered separately.
+	 */
+	if (vacopts->min_xid_age != 0)
+	{
+		appendPQExpBuffer(&catalog_query,
+						  " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
+						  " pg_catalog.age(t.relfrozenxid)) "
+						  " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
+						  " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
+						  " '0'::pg_catalog.xid\n",
+						  vacopts->min_xid_age);
+	}
+
+	if (vacopts->min_mxid_age != 0)
+	{
+		appendPQExpBuffer(&catalog_query,
+						  " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
+						  " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
+						  " '%d'::pg_catalog.int4\n"
+						  " AND c.relminmxid OPERATOR(pg_catalog.!=)"
+						  " '0'::pg_catalog.xid\n",
+						  vacopts->min_mxid_age);
+	}
+
+	if (vacopts->missing_stats_only)
+	{
+		appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+		/* regular stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
+							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* extended stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+							 " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* expression indexes */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " JOIN pg_catalog.pg_index i"
+							 " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
+							 " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* inheritance and regular stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
+							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
+							 " AND c.relhassubclass\n"
+							 " AND NOT p.inherited\n"
+							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit))\n");
+
+		/* inheritance and extended stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND c.relhassubclass\n"
+							 " AND NOT p.inherited\n"
+							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+							 " AND d.stxdinherit))\n");
+
+		appendPQExpBufferStr(&catalog_query, " )\n");
+	}
+
+	/*
+	 * Execute the catalog query.  We use the default search_path for this
+	 * query for consistency with table lookups done elsewhere by the user.
+	 */
+	appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
+	executeCommand(conn, "RESET search_path;", echo);
+	res = executeQuery(conn, catalog_query.data, echo);
+	termPQExpBuffer(&catalog_query);
+	PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
+
+	/*
+	 * Build qualified identifiers for each table, including the column list
+	 * if given.
+	 */
+	initPQExpBuffer(&buf);
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		appendPQExpBufferStr(&buf,
+							 fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
+											   PQgetvalue(res, i, 0),
+											   PQclientEncoding(conn)));
+
+		if (objects_listed && !PQgetisnull(res, i, 2))
+			appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
+
+		simple_string_list_append(found_objs, buf.data);
+		resetPQExpBuffer(&buf);
+	}
+	termPQExpBuffer(&buf);
+	PQclear(res);
+
+	return found_objs;
+}
+
+/*
+ * Vacuum/analyze all connectable databases.
+ *
+ * In analyze-in-stages mode, we process all databases in one stage before
+ * moving on to the next stage.  That ensures minimal stats are available
+ * quickly everywhere before generating more detailed ones.
+ */
+void
+vacuum_all_databases(ConnParams *cparams,
+					 vacuumingOptions *vacopts,
+					 bool analyze_in_stages,
+					 SimpleStringList *objects,
+					 int concurrentCons,
+					 const char *progname, bool echo, bool quiet)
+{
+	PGconn	   *conn;
+	PGresult   *result;
+	int			stage;
+	int			i;
+
+	conn = connectMaintenanceDatabase(cparams, progname, echo);
+	result = executeQuery(conn,
+						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
+						  echo);
+	PQfinish(conn);
+
+	if (analyze_in_stages)
+	{
+		SimpleStringList **found_objs = NULL;
+
+		if (vacopts->missing_stats_only)
+			found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
+		/*
+		 * When analyzing all databases in stages, we analyze them all in the
+		 * fastest stage first, so that initial statistics become available
+		 * for all of them as soon as possible.
+		 *
+		 * This means we establish several times as many connections, but
+		 * that's a secondary consideration.
+		 */
+		for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+		{
+			for (i = 0; i < PQntuples(result); i++)
+			{
+				cparams->override_dbname = PQgetvalue(result, i, 0);
+
+				vacuum_one_database(cparams, vacopts,
+									stage,
+									objects,
+									vacopts->missing_stats_only ? &found_objs[i] : NULL,
+									concurrentCons,
+									progname, echo, quiet);
+			}
+		}
+	}
+	else
+	{
+		for (i = 0; i < PQntuples(result); i++)
+		{
+			cparams->override_dbname = PQgetvalue(result, i, 0);
+
+			vacuum_one_database(cparams, vacopts,
+								ANALYZE_NO_STAGE,
+								objects, NULL,
+								concurrentCons,
+								progname, echo, quiet);
+		}
+	}
+
+	PQclear(result);
+}
+
+/*
+ * Construct a vacuum/analyze command to run based on the given
+ * options, in the given string buffer, which may contain previous garbage.
+ *
+ * The table name used must be already properly quoted.  The command generated
+ * depends on the server version involved and it is semicolon-terminated.
+ */
+void
+prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
+					   vacuumingOptions *vacopts, const char *table)
+{
+	const char *paren = " (";
+	const char *comma = ", ";
+	const char *sep = paren;
+
+	resetPQExpBuffer(sql);
+
+	if (vacopts->analyze_only)
+	{
+		appendPQExpBufferStr(sql, "ANALYZE");
+
+		/* parenthesized grammar of ANALYZE is supported since v11 */
+		if (serverVersion >= 110000)
+		{
+			if (vacopts->skip_locked)
+			{
+				/* SKIP_LOCKED is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+				sep = comma;
+			}
+			if (vacopts->verbose)
+			{
+				appendPQExpBuffer(sql, "%sVERBOSE", sep);
+				sep = comma;
+			}
+			if (vacopts->buffer_usage_limit)
+			{
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+								  vacopts->buffer_usage_limit);
+				sep = comma;
+			}
+			if (sep != paren)
+				appendPQExpBufferChar(sql, ')');
+		}
+		else
+		{
+			if (vacopts->verbose)
+				appendPQExpBufferStr(sql, " VERBOSE");
+		}
+	}
+	else
+	{
+		appendPQExpBufferStr(sql, "VACUUM");
+
+		/* parenthesized grammar of VACUUM is supported since v9.0 */
+		if (serverVersion >= 90000)
+		{
+			if (vacopts->disable_page_skipping)
+			{
+				/* DISABLE_PAGE_SKIPPING is supported since v9.6 */
+				Assert(serverVersion >= 90600);
+				appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
+				sep = comma;
+			}
+			if (vacopts->no_index_cleanup)
+			{
+				/* "INDEX_CLEANUP FALSE" has been supported since v12 */
+				Assert(serverVersion >= 120000);
+				Assert(!vacopts->force_index_cleanup);
+				appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
+				sep = comma;
+			}
+			if (vacopts->force_index_cleanup)
+			{
+				/* "INDEX_CLEANUP TRUE" has been supported since v12 */
+				Assert(serverVersion >= 120000);
+				Assert(!vacopts->no_index_cleanup);
+				appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
+				sep = comma;
+			}
+			if (!vacopts->do_truncate)
+			{
+				/* TRUNCATE is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
+				sep = comma;
+			}
+			if (!vacopts->process_main)
+			{
+				/* PROCESS_MAIN is supported since v16 */
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
+				sep = comma;
+			}
+			if (!vacopts->process_toast)
+			{
+				/* PROCESS_TOAST is supported since v14 */
+				Assert(serverVersion >= 140000);
+				appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
+				sep = comma;
+			}
+			if (vacopts->skip_database_stats)
+			{
+				/* SKIP_DATABASE_STATS is supported since v16 */
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
+				sep = comma;
+			}
+			if (vacopts->skip_locked)
+			{
+				/* SKIP_LOCKED is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+				sep = comma;
+			}
+			if (vacopts->full)
+			{
+				appendPQExpBuffer(sql, "%sFULL", sep);
+				sep = comma;
+			}
+			if (vacopts->freeze)
+			{
+				appendPQExpBuffer(sql, "%sFREEZE", sep);
+				sep = comma;
+			}
+			if (vacopts->verbose)
+			{
+				appendPQExpBuffer(sql, "%sVERBOSE", sep);
+				sep = comma;
+			}
+			if (vacopts->and_analyze)
+			{
+				appendPQExpBuffer(sql, "%sANALYZE", sep);
+				sep = comma;
+			}
+			if (vacopts->parallel_workers >= 0)
+			{
+				/* PARALLEL is supported since v13 */
+				Assert(serverVersion >= 130000);
+				appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
+								  vacopts->parallel_workers);
+				sep = comma;
+			}
+			if (vacopts->buffer_usage_limit)
+			{
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+								  vacopts->buffer_usage_limit);
+				sep = comma;
+			}
+			if (sep != paren)
+				appendPQExpBufferChar(sql, ')');
+		}
+		else
+		{
+			if (vacopts->full)
+				appendPQExpBufferStr(sql, " FULL");
+			if (vacopts->freeze)
+				appendPQExpBufferStr(sql, " FREEZE");
+			if (vacopts->verbose)
+				appendPQExpBufferStr(sql, " VERBOSE");
+			if (vacopts->and_analyze)
+				appendPQExpBufferStr(sql, " ANALYZE");
+		}
+	}
+
+	appendPQExpBuffer(sql, " %s;", table);
+}
+
+/*
+ * Send a vacuum/analyze command to the server, returning after sending the
+ * command.
+ *
+ * Any errors during command execution are reported to stderr.
+ */
+void
+run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+				   const char *table)
+{
+	bool		status;
+
+	if (echo)
+		printf("%s\n", sql);
+
+	status = PQsendQuery(conn, sql) == 1;
+
+	if (!status)
+	{
+		if (table)
+		{
+			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+						 table, PQdb(conn), PQerrorMessage(conn));
+		}
+		else
+		{
+			pg_log_error("vacuuming of database \"%s\" failed: %s",
+						 PQdb(conn), PQerrorMessage(conn));
+		}
+	}
+}
+
+/*
+ * Returns a newly malloc'd version of 'src' with escaped single quotes and
+ * backslashes.
+ */
+char *
+escape_quotes(const char *src)
+{
+	char	   *result = escape_single_quotes_ascii(src);
+
+	if (!result)
+		pg_fatal("out of memory");
+	return result;
+}
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
new file mode 100644
index 00000000000..d3f000840fa
--- /dev/null
+++ b/src/bin/scripts/vacuuming.h
@@ -0,0 +1,95 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuuming.h
+ *		Common declarations for vacuuming.c
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/scripts/vacuuming.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef VACUUMING_H
+#define VACUUMING_H
+
+#include "common.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/simple_list.h"
+
+/* For analyze-in-stages mode */
+#define ANALYZE_NO_STAGE	-1
+#define ANALYZE_NUM_STAGES	3
+
+/* vacuum options controlled by user flags */
+typedef struct vacuumingOptions
+{
+	bool		analyze_only;
+	bool		verbose;
+	bool		and_analyze;
+	bool		full;
+	bool		freeze;
+	bool		disable_page_skipping;
+	bool		skip_locked;
+	int			min_xid_age;
+	int			min_mxid_age;
+	int			parallel_workers;	/* >= 0 indicates user specified the
+									 * parallel degree, otherwise -1 */
+	bool		no_index_cleanup;
+	bool		force_index_cleanup;
+	bool		do_truncate;
+	bool		process_main;
+	bool		process_toast;
+	bool		skip_database_stats;
+	char	   *buffer_usage_limit;
+	bool		missing_stats_only;
+} vacuumingOptions;
+
+/* object filter options */
+typedef enum
+{
+	OBJFILTER_NONE = 0,			/* no filter used */
+	OBJFILTER_ALL_DBS = (1 << 0),	/* -a | --all */
+	OBJFILTER_DATABASE = (1 << 1),	/* -d | --dbname */
+	OBJFILTER_TABLE = (1 << 2), /* -t | --table */
+	OBJFILTER_SCHEMA = (1 << 3),	/* -n | --schema */
+	OBJFILTER_SCHEMA_EXCLUDE = (1 << 4),	/* -N | --exclude-schema */
+} VacObjFilter;
+
+extern VacObjFilter objfilter;
+
+extern void vacuuming_main(ConnParams *cparams, const char *dbname,
+						   const char *maintenance_db, vacuumingOptions *vacopts,
+						   SimpleStringList *objects, bool analyze_in_stages,
+						   int tbl_count, int concurrentCons,
+						   const char *progname, bool echo, bool quiet);
+
+extern SimpleStringList *retrieve_objects(PGconn *conn,
+										  vacuumingOptions *vacopts,
+										  SimpleStringList *objects,
+										  bool echo);
+
+extern void vacuum_one_database(ConnParams *cparams,
+								vacuumingOptions *vacopts,
+								int stage,
+								SimpleStringList *objects,
+								SimpleStringList **found_objs,
+								int concurrentCons,
+								const char *progname, bool echo, bool quiet);
+
+extern void vacuum_all_databases(ConnParams *cparams,
+								 vacuumingOptions *vacopts,
+								 bool analyze_in_stages,
+								 SimpleStringList *objects,
+								 int concurrentCons,
+								 const char *progname, bool echo, bool quiet);
+
+extern void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
+								   vacuumingOptions *vacopts, const char *table);
+
+extern void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+							   const char *table);
+
+extern char *escape_quotes(const char *src);
+
+#endif							/* VACUUMING_H */
-- 
2.43.0

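For reference, here is a minimal sketch -- not part of the patch, and the
option values shown are assumptions -- of how a vacuumdb-style frontend
would drive the API that vacuuming.h exports at this point, with all the
getopt handling elided (objfilter is a global in this version, so it is
assumed to be defined in vacuuming.c and left at OBJFILTER_NONE here):

#include "postgres_fe.h"
#include "vacuuming.h"

int
main(int argc, char *argv[])
{
	ConnParams	cparams = {0};
	vacuumingOptions vacopts = {0};
	SimpleStringList objects = {NULL, NULL};

	cparams.prompt_password = TRI_DEFAULT;
	vacopts.parallel_workers = -1;	/* no user-specified parallel degree */
	vacopts.do_truncate = true;		/* the options that default to "on" */
	vacopts.process_main = true;
	vacopts.process_toast = true;

	/* ... getopt_long() loop filling cparams, vacopts and objects ... */

	vacuuming_main(&cparams, NULL /* dbname */, NULL /* maintenance_db */,
				   &vacopts, &objects, false /* analyze_in_stages */,
				   0 /* tbl_count */, 1 /* concurrentCons */,
				   "vacuumdb", false /* echo */, false /* quiet */);
	return 0;
}

Note that in this version vacuuming_main() returns void and the helpers
exit(1) on failure themselves, which is one of the points revisited below.
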
#31Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#30)
Re: Adding REPACK [concurrently]

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

While testing the MVCC-safe version with the stress test
007_repack_concurrently_mvcc.pl, I encountered some random crashes with
logs like this:

2025-09-02 12:24:40.039 CEST client backend[261907]
007_repack_concurrently_mvcc.pl ERROR: relcache reference
0x7715b9f394a8 is not owned by resource owner TopTransaction
...
This time I was clever and tried to reproduce the issue on a
non-MVCC-safe version first - and it is reproducible.

Thanks again for the thorough testing!

I think this should be fixed separately [1].

[1]: /messages/by-id/119497.1756892972@localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#32Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#23)
Re: Adding REPACK [concurrently]

Hello,

Barring further commentary, I intend to get 0001 committed tomorrow, and
0002 some time later -- perhaps by end of this week, or sometime next
week.

Regards

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#33Álvaro Herrera
alvherre@kurilemu.de
In reply to: Alvaro Herrera (#22)
2 attachment(s)
Re: Adding REPACK [concurrently]

After looking at this some more, I realized that 0001 had been written a
bit too hastily and could use some more cleanup -- in particular, we
don't need to export most of the function prototypes other than
vacuuming_main() (and the trivial escape_quotes helper), so I made the
other functions static.  Also, prepare_vacuum_command() needs the
encoding in order to do fmtIdEnc() on a given index name (for
`pg_repackdb -t table --index=foobar`), so I changed it to take the
PGconn instead of just the serverVersion.  I also realized that it makes
no sense for objfilter to be a global variable instead of living inside
`main` and being passed as an argument where needed.  (Heck, maybe it
should be inside vacuumingOpts.)  Lastly, it seemed like weird coding
that the functions would sometimes exit(1) instead of returning a result
code, so I made them return one and had the callers react appropriately.
These are all fairly straightforward changes.
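
To illustrate the prepare_vacuum_command() point, here is a sketch of
what having the PGconn enables -- this is an assumption, not a quote of
0002 (which is not shown here), and it borrows the REPACK ... USING
INDEX form discussed upthread: the connection supplies both
PQserverVersion() and the client encoding that fmtIdEnc() needs.

static void
prepare_repack_command_sketch(PQExpBuffer sql, PGconn *conn,
							  const char *table, const char *index)
{
	int			encoding = PQclientEncoding(conn);

	resetPQExpBuffer(sql);
	/* the table name is assumed to arrive already quoted, as today */
	appendPQExpBuffer(sql, "REPACK %s", table);
	if (index != NULL)
		appendPQExpBuffer(sql, " USING INDEX %s",
						  fmtIdEnc(index, encoding));
	appendPQExpBufferChar(sql, ';');
}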

So here's v22 with those changes, rebased to current sources.  Only the
first two patches this time, which are the ones I would be glad to
receive input on.

I also wonder if analyze_only and analyze_in_stages should be new values
in RunMode rather than separate booleans ... I think that might make the
code simpler. I didn't try though.
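
For what it's worth, a hypothetical sketch of what that folding could
look like -- RunMode itself is referenced above but not shown here, and
these particular values are made up:

typedef enum RunMode
{
	MODE_VACUUM,				/* plain VACUUM */
	MODE_ANALYZE,				/* replaces vacopts->analyze_only */
	MODE_ANALYZE_IN_STAGES,		/* replaces the analyze_in_stages flag */
} RunMode;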

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Los dioses no protegen a los insensatos. Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)

Attachments:

v22-0001-Split-vacuumdb-to-create-vacuuming.c-h.patch (text/x-diff; charset=utf-8)
From 6cbc0124cf341be82a24b9846012f0515a66f663 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Thu, 25 Sep 2025 14:58:38 +0200
Subject: [PATCH v22 1/2] Split vacuumdb to create vacuuming.c/h

This allows these routines to be reused by a future utility heavily
based on vacuumdb.

Discussion: https://postgr.es/m/202508301750.cbohxyy2pcce@alvherre.pgsql
---
 src/bin/scripts/Makefile    |    4 +-
 src/bin/scripts/meson.build |   28 +-
 src/bin/scripts/vacuumdb.c  | 1055 +----------------------------------
 src/bin/scripts/vacuuming.c | 1012 +++++++++++++++++++++++++++++++++
 src/bin/scripts/vacuuming.h |   68 +++
 5 files changed, 1131 insertions(+), 1036 deletions(-)
 create mode 100644 src/bin/scripts/vacuuming.c
 create mode 100644 src/bin/scripts/vacuuming.h

diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index f6b4d40810b..019ca06455d 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -28,7 +28,7 @@ createuser: createuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport
 dropdb: dropdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 dropuser: dropuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-vacuumdb: vacuumdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
@@ -50,7 +50,7 @@ uninstall:
 
 clean distclean:
 	rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
-	rm -f common.o $(WIN32RES)
+	rm -f common.o vacuuming.o $(WIN32RES)
 	rm -rf tmp_check
 
 export with_icu
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index 80df7c33257..a4fed59d1c9 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -12,7 +12,6 @@ binaries = [
   'createuser',
   'dropuser',
   'clusterdb',
-  'vacuumdb',
   'reindexdb',
   'pg_isready',
 ]
@@ -35,6 +34,33 @@ foreach binary : binaries
   bin_targets += binary
 endforeach
 
+vacuuming_common = static_library('libvacuuming_common',
+  files('common.c', 'vacuuming.c'),
+  dependencies: [frontend_code, libpq],
+  kwargs: internal_lib_args,
+)
+
+binaries = [
+  'vacuumdb',
+]
+foreach binary : binaries
+  binary_sources = files('@0@.c'.format(binary))
+
+  if host_system == 'windows'
+    binary_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
+      '--NAME', binary,
+      '--FILEDESC', '@0@ - PostgreSQL utility'.format(binary),])
+  endif
+
+  binary = executable(binary,
+    binary_sources,
+    link_with: [vacuuming_common],
+    dependencies: [frontend_code, libpq],
+    kwargs: default_bin_args,
+  )
+  bin_targets += binary
+endforeach
+
 tests += {
   'name': 'scripts',
   'sd': meson.current_source_dir(),
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 6e30f223efe..00babc5c9c0 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -14,91 +14,13 @@
 
 #include <limits.h>
 
-#include "catalog/pg_attribute_d.h"
-#include "catalog/pg_class_d.h"
 #include "common.h"
-#include "common/connect.h"
 #include "common/logging.h"
-#include "fe_utils/cancel.h"
 #include "fe_utils/option_utils.h"
-#include "fe_utils/parallel_slot.h"
-#include "fe_utils/query_utils.h"
-#include "fe_utils/simple_list.h"
-#include "fe_utils/string_utils.h"
-
-
-/* vacuum options controlled by user flags */
-typedef struct vacuumingOptions
-{
-	bool		analyze_only;
-	bool		verbose;
-	bool		and_analyze;
-	bool		full;
-	bool		freeze;
-	bool		disable_page_skipping;
-	bool		skip_locked;
-	int			min_xid_age;
-	int			min_mxid_age;
-	int			parallel_workers;	/* >= 0 indicates user specified the
-									 * parallel degree, otherwise -1 */
-	bool		no_index_cleanup;
-	bool		force_index_cleanup;
-	bool		do_truncate;
-	bool		process_main;
-	bool		process_toast;
-	bool		skip_database_stats;
-	char	   *buffer_usage_limit;
-	bool		missing_stats_only;
-} vacuumingOptions;
-
-/* object filter options */
-typedef enum
-{
-	OBJFILTER_NONE = 0,			/* no filter used */
-	OBJFILTER_ALL_DBS = (1 << 0),	/* -a | --all */
-	OBJFILTER_DATABASE = (1 << 1),	/* -d | --dbname */
-	OBJFILTER_TABLE = (1 << 2), /* -t | --table */
-	OBJFILTER_SCHEMA = (1 << 3),	/* -n | --schema */
-	OBJFILTER_SCHEMA_EXCLUDE = (1 << 4),	/* -N | --exclude-schema */
-} VacObjFilter;
-
-static VacObjFilter objfilter = OBJFILTER_NONE;
-
-static SimpleStringList *retrieve_objects(PGconn *conn,
-										  vacuumingOptions *vacopts,
-										  SimpleStringList *objects,
-										  bool echo);
-
-static void vacuum_one_database(ConnParams *cparams,
-								vacuumingOptions *vacopts,
-								int stage,
-								SimpleStringList *objects,
-								SimpleStringList **found_objs,
-								int concurrentCons,
-								const char *progname, bool echo, bool quiet);
-
-static void vacuum_all_databases(ConnParams *cparams,
-								 vacuumingOptions *vacopts,
-								 bool analyze_in_stages,
-								 SimpleStringList *objects,
-								 int concurrentCons,
-								 const char *progname, bool echo, bool quiet);
-
-static void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
-								   vacuumingOptions *vacopts, const char *table);
-
-static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+#include "vacuuming.h"
 
 static void help(const char *progname);
-
-void		check_objfilter(void);
-
-static char *escape_quotes(const char *src);
-
-/* For analyze-in-stages mode */
-#define ANALYZE_NO_STAGE	-1
-#define ANALYZE_NUM_STAGES	3
+static void check_objfilter(VacObjFilter objfilter);
 
 
 int
@@ -145,18 +67,16 @@ main(int argc, char *argv[])
 	int			c;
 	const char *dbname = NULL;
 	const char *maintenance_db = NULL;
-	char	   *host = NULL;
-	char	   *port = NULL;
-	char	   *username = NULL;
-	enum trivalue prompt_password = TRI_DEFAULT;
 	ConnParams	cparams;
 	bool		echo = false;
 	bool		quiet = false;
 	vacuumingOptions vacopts;
 	bool		analyze_in_stages = false;
 	SimpleStringList objects = {NULL, NULL};
+	VacObjFilter objfilter = OBJFILTER_NONE;
 	int			concurrentCons = 1;
 	int			tbl_count = 0;
+	int			ret;
 
 	/* initialize options */
 	memset(&vacopts, 0, sizeof(vacopts));
@@ -168,13 +88,18 @@ main(int argc, char *argv[])
 	vacopts.process_main = true;
 	vacopts.process_toast = true;
 
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
 	pg_logging_init(argv[0]);
 	progname = get_progname(argv[0]);
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
 
 	handle_help_version_opts(argc, argv, "vacuumdb", help);
 
-	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ",
+							long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -195,7 +120,7 @@ main(int argc, char *argv[])
 				vacopts.freeze = true;
 				break;
 			case 'h':
-				host = pg_strdup(optarg);
+				cparams.pghost = pg_strdup(optarg);
 				break;
 			case 'j':
 				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
@@ -211,7 +136,7 @@ main(int argc, char *argv[])
 				simple_string_list_append(&objects, optarg);
 				break;
 			case 'p':
-				port = pg_strdup(optarg);
+				cparams.pgport = pg_strdup(optarg);
 				break;
 			case 'P':
 				if (!option_parse_int(optarg, "-P/--parallel", 0, INT_MAX,
@@ -227,16 +152,16 @@ main(int argc, char *argv[])
 				tbl_count++;
 				break;
 			case 'U':
-				username = pg_strdup(optarg);
+				cparams.pguser = pg_strdup(optarg);
 				break;
 			case 'v':
 				vacopts.verbose = true;
 				break;
 			case 'w':
-				prompt_password = TRI_NO;
+				cparams.prompt_password = TRI_NO;
 				break;
 			case 'W':
-				prompt_password = TRI_YES;
+				cparams.prompt_password = TRI_YES;
 				break;
 			case 'z':
 				vacopts.and_analyze = true;
@@ -317,7 +242,7 @@ main(int argc, char *argv[])
 	 * Validate the combination of filters specified in the command-line
 	 * options.
 	 */
-	check_objfilter();
+	check_objfilter(objfilter);
 
 	if (vacopts.analyze_only)
 	{
@@ -380,74 +305,18 @@ main(int argc, char *argv[])
 		pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
 				 "missing-stats-only", "analyze-only", "analyze-in-stages");
 
-	/* fill cparams except for dbname, which is set below */
-	cparams.pghost = host;
-	cparams.pgport = port;
-	cparams.pguser = username;
-	cparams.prompt_password = prompt_password;
-	cparams.override_dbname = NULL;
-
-	setup_cancel_handler(NULL);
-
-	/* Avoid opening extra connections. */
-	if (tbl_count && (concurrentCons > tbl_count))
-		concurrentCons = tbl_count;
-
-	if (objfilter & OBJFILTER_ALL_DBS)
-	{
-		cparams.dbname = maintenance_db;
-
-		vacuum_all_databases(&cparams, &vacopts,
-							 analyze_in_stages,
-							 &objects,
-							 concurrentCons,
-							 progname, echo, quiet);
-	}
-	else
-	{
-		if (dbname == NULL)
-		{
-			if (getenv("PGDATABASE"))
-				dbname = getenv("PGDATABASE");
-			else if (getenv("PGUSER"))
-				dbname = getenv("PGUSER");
-			else
-				dbname = get_user_name_or_exit(progname);
-		}
-
-		cparams.dbname = dbname;
-
-		if (analyze_in_stages)
-		{
-			int			stage;
-			SimpleStringList *found_objs = NULL;
-
-			for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
-			{
-				vacuum_one_database(&cparams, &vacopts,
-									stage,
-									&objects,
-									vacopts.missing_stats_only ? &found_objs : NULL,
-									concurrentCons,
-									progname, echo, quiet);
-			}
-		}
-		else
-			vacuum_one_database(&cparams, &vacopts,
-								ANALYZE_NO_STAGE,
-								&objects, NULL,
-								concurrentCons,
-								progname, echo, quiet);
-	}
-
-	exit(0);
+	ret = vacuuming_main(&cparams, dbname, maintenance_db, &vacopts,
+						 objfilter, &objects, tbl_count, analyze_in_stages,
+						 concurrentCons,
+						 progname, echo, quiet);
+	exit(ret);
 }
 
 /*
  * Verify that the filters used at command line are compatible.
  */
 void
-check_objfilter(void)
+check_objfilter(VacObjFilter objfilter)
 {
 	if ((objfilter & OBJFILTER_ALL_DBS) &&
 		(objfilter & OBJFILTER_DATABASE))
@@ -466,886 +335,6 @@ check_objfilter(void)
 		pg_fatal("cannot vacuum all tables in schema(s) and exclude schema(s) at the same time");
 }
 
-/*
- * Returns a newly malloc'd version of 'src' with escaped single quotes and
- * backslashes.
- */
-static char *
-escape_quotes(const char *src)
-{
-	char	   *result = escape_single_quotes_ascii(src);
-
-	if (!result)
-		pg_fatal("out of memory");
-	return result;
-}
-
-/*
- * vacuum_one_database
- *
- * Process tables in the given database.
- *
- * There are two ways to specify the list of objects to process:
- *
- * 1) The "found_objs" parameter is a double pointer to a fully qualified list
- *    of objects to process, as returned by a previous call to
- *    vacuum_one_database().
- *
- *     a) If both "found_objs" (the double pointer) and "*found_objs" (the
- *        once-dereferenced double pointer) are not NULL, this list takes
- *        priority, and anything specified in "objects" is ignored.
- *
- *     b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
- *        (the once-dereferenced double pointer) _is_ NULL, the "objects"
- *        parameter takes priority, and the results of the catalog query
- *        described in (2) are stored in "found_objs".
- *
- *     c) If "found_objs" (the double pointer) is NULL, the "objects"
- *        parameter again takes priority, and the results of the catalog query
- *        are not saved.
- *
- * 2) The "objects" parameter is a user-specified list of objects to process.
- *    When (1b) or (1c) applies, this function performs a catalog query to
- *    retrieve a fully qualified list of objects to process, as described
- *    below.
- *
- *     a) If "objects" is not NULL, the catalog query gathers only the objects
- *        listed in "objects".
- *
- *     b) If "objects" is NULL, all tables in the database are gathered.
- *
- * Note that this function is only concerned with running exactly one stage
- * when in analyze-in-stages mode; caller must iterate on us if necessary.
- *
- * If concurrentCons is > 1, multiple connections are used to vacuum tables
- * in parallel.
- */
-static void
-vacuum_one_database(ConnParams *cparams,
-					vacuumingOptions *vacopts,
-					int stage,
-					SimpleStringList *objects,
-					SimpleStringList **found_objs,
-					int concurrentCons,
-					const char *progname, bool echo, bool quiet)
-{
-	PQExpBufferData sql;
-	PGconn	   *conn;
-	SimpleStringListCell *cell;
-	ParallelSlotArray *sa;
-	int			ntups = 0;
-	bool		failed = false;
-	const char *initcmd;
-	SimpleStringList *ret = NULL;
-	const char *stage_commands[] = {
-		"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
-		"SET default_statistics_target=10; RESET vacuum_cost_delay;",
-		"RESET default_statistics_target;"
-	};
-	const char *stage_messages[] = {
-		gettext_noop("Generating minimal optimizer statistics (1 target)"),
-		gettext_noop("Generating medium optimizer statistics (10 targets)"),
-		gettext_noop("Generating default (full) optimizer statistics")
-	};
-
-	Assert(stage == ANALYZE_NO_STAGE ||
-		   (stage >= 0 && stage < ANALYZE_NUM_STAGES));
-
-	conn = connectDatabase(cparams, progname, echo, false, true);
-
-	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "disable-page-skipping", "9.6");
-	}
-
-	if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-index-cleanup", "12");
-	}
-
-	if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "force-index-cleanup", "12");
-	}
-
-	if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-truncate", "12");
-	}
-
-	if (!vacopts->process_main && PQserverVersion(conn) < 160000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-process-main", "16");
-	}
-
-	if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-process-toast", "14");
-	}
-
-	if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "skip-locked", "12");
-	}
-
-	if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--min-xid-age", "9.6");
-	}
-
-	if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--min-mxid-age", "9.6");
-	}
-
-	if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--parallel", "13");
-	}
-
-	if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--buffer-usage-limit", "16");
-	}
-
-	if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--missing-stats-only", "15");
-	}
-
-	/* skip_database_stats is used automatically if server supports it */
-	vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
-
-	if (!quiet)
-	{
-		if (stage != ANALYZE_NO_STAGE)
-			printf(_("%s: processing database \"%s\": %s\n"),
-				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
-			printf(_("%s: vacuuming database \"%s\"\n"),
-				   progname, PQdb(conn));
-		fflush(stdout);
-	}
-
-	/*
-	 * If the caller provided the results of a previous catalog query, just
-	 * use that.  Otherwise, run the catalog query ourselves and set the
-	 * return variable if provided.
-	 */
-	if (found_objs && *found_objs)
-		ret = *found_objs;
-	else
-	{
-		ret = retrieve_objects(conn, vacopts, objects, echo);
-		if (found_objs)
-			*found_objs = ret;
-	}
-
-	/*
-	 * Count the number of objects in the catalog query result.  If there are
-	 * none, we are done.
-	 */
-	for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
-		ntups++;
-
-	if (ntups == 0)
-	{
-		PQfinish(conn);
-		return;
-	}
-
-	/*
-	 * Ensure concurrentCons is sane.  If there are more connections than
-	 * vacuumable relations, we don't need to use them all.
-	 */
-	if (concurrentCons > ntups)
-		concurrentCons = ntups;
-	if (concurrentCons <= 0)
-		concurrentCons = 1;
-
-	/*
-	 * All slots need to be prepared to run the appropriate analyze stage, if
-	 * caller requested that mode.  We have to prepare the initial connection
-	 * ourselves before setting up the slots.
-	 */
-	if (stage == ANALYZE_NO_STAGE)
-		initcmd = NULL;
-	else
-	{
-		initcmd = stage_commands[stage];
-		executeCommand(conn, initcmd, echo);
-	}
-
-	/*
-	 * Setup the database connections. We reuse the connection we already have
-	 * for the first slot.  If not in parallel mode, the first slot in the
-	 * array contains the connection.
-	 */
-	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
-	ParallelSlotsAdoptConn(sa, conn);
-
-	initPQExpBuffer(&sql);
-
-	cell = ret->head;
-	do
-	{
-		const char *tabname = cell->val;
-		ParallelSlot *free_slot;
-
-		if (CancelRequested)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		free_slot = ParallelSlotsGetIdle(sa, NULL);
-		if (!free_slot)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
-							   vacopts, tabname);
-
-		/*
-		 * Execute the vacuum.  All errors are handled in processQueryResult
-		 * through ParallelSlotsGetIdle.
-		 */
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
-						   echo, tabname);
-
-		cell = cell->next;
-	} while (cell != NULL);
-
-	if (!ParallelSlotsWaitCompletion(sa))
-	{
-		failed = true;
-		goto finish;
-	}
-
-	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
-	if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE &&
-		!vacopts->analyze_only)
-	{
-		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
-		ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
-		if (!free_slot)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
-		if (!ParallelSlotsWaitCompletion(sa))
-			failed = true;
-	}
-
-finish:
-	ParallelSlotsTerminate(sa);
-	pg_free(sa);
-
-	termPQExpBuffer(&sql);
-
-	if (failed)
-		exit(1);
-}
-
-/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path (which
- * could be unsafe), everything in this query MUST be fully qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their optional
- * set of columns.  This is used to match any provided column lists with the
- * generated qualified identifiers and to filter for the tables provided via
- * --table.  If a listed table does not exist, the catalog query will fail.
- */
-static SimpleStringList *
-retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
-				 SimpleStringList *objects, bool echo)
-{
-	PQExpBufferData buf;
-	PQExpBufferData catalog_query;
-	PGresult   *res;
-	SimpleStringListCell *cell;
-	SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
-	bool		objects_listed = false;
-
-	initPQExpBuffer(&catalog_query);
-	for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
-	{
-		char	   *just_table = NULL;
-		const char *just_columns = NULL;
-
-		if (!objects_listed)
-		{
-			appendPQExpBufferStr(&catalog_query,
-								 "WITH listed_objects (object_oid, column_list) "
-								 "AS (\n  VALUES (");
-			objects_listed = true;
-		}
-		else
-			appendPQExpBufferStr(&catalog_query, ",\n  (");
-
-		if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
-		{
-			appendStringLiteralConn(&catalog_query, cell->val, conn);
-			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
-		}
-
-		if (objfilter & OBJFILTER_TABLE)
-		{
-			/*
-			 * Split relation and column names given by the user, this is used
-			 * to feed the CTE with values on which are performed pre-run
-			 * validity checks as well.  For now these happen only on the
-			 * relation name.
-			 */
-			splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
-								  &just_table, &just_columns);
-
-			appendStringLiteralConn(&catalog_query, just_table, conn);
-			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
-		}
-
-		if (just_columns && just_columns[0] != '\0')
-			appendStringLiteralConn(&catalog_query, just_columns, conn);
-		else
-			appendPQExpBufferStr(&catalog_query, "NULL");
-
-		appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
-
-		pg_free(just_table);
-	}
-
-	/* Finish formatting the CTE */
-	if (objects_listed)
-		appendPQExpBufferStr(&catalog_query, "\n)\n");
-
-	appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
-
-	if (objects_listed)
-		appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
-
-	appendPQExpBufferStr(&catalog_query,
-						 " FROM pg_catalog.pg_class c\n"
-						 " JOIN pg_catalog.pg_namespace ns"
-						 " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
-						 " CROSS JOIN LATERAL (SELECT c.relkind IN ("
-						 CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
-						 CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
-						 " LEFT JOIN pg_catalog.pg_class t"
-						 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
-
-	/*
-	 * Used to match the tables or schemas listed by the user, completing the
-	 * JOIN clause.
-	 */
-	if (objects_listed)
-	{
-		appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
-							 " ON listed_objects.object_oid"
-							 " OPERATOR(pg_catalog.=) ");
-
-		if (objfilter & OBJFILTER_TABLE)
-			appendPQExpBufferStr(&catalog_query, "c.oid\n");
-		else
-			appendPQExpBufferStr(&catalog_query, "ns.oid\n");
-	}
-
-	/*
-	 * Exclude temporary tables, beginning the WHERE clause.
-	 */
-	appendPQExpBufferStr(&catalog_query,
-						 " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
-						 CppAsString2(RELPERSISTENCE_TEMP) "\n");
-
-	/*
-	 * Used to match the tables or schemas listed by the user, for the WHERE
-	 * clause.
-	 */
-	if (objects_listed)
-	{
-		if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
-			appendPQExpBufferStr(&catalog_query,
-								 " AND listed_objects.object_oid IS NULL\n");
-		else
-			appendPQExpBufferStr(&catalog_query,
-								 " AND listed_objects.object_oid IS NOT NULL\n");
-	}
-
-	/*
-	 * If no tables were listed, filter for the relevant relation types.  If
-	 * tables were given via --table, don't bother filtering by relation type.
-	 * Instead, let the server decide whether a given relation can be
-	 * processed in which case the user will know about it.
-	 */
-	if ((objfilter & OBJFILTER_TABLE) == 0)
-	{
-		/*
-		 * vacuumdb should generally follow the behavior of the underlying
-		 * VACUUM and ANALYZE commands. If analyze_only is true, process
-		 * regular tables, materialized views, and partitioned tables, just
-		 * like ANALYZE (with no specific target tables) does. Otherwise,
-		 * process only regular tables and materialized views, since VACUUM
-		 * skips partitioned tables when no target tables are specified.
-		 */
-		if (vacopts->analyze_only)
-			appendPQExpBufferStr(&catalog_query,
-								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
-								 CppAsString2(RELKIND_RELATION) ", "
-								 CppAsString2(RELKIND_MATVIEW) ", "
-								 CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
-		else
-			appendPQExpBufferStr(&catalog_query,
-								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
-								 CppAsString2(RELKIND_RELATION) ", "
-								 CppAsString2(RELKIND_MATVIEW) "])\n");
-
-	}
-
-	/*
-	 * For --min-xid-age and --min-mxid-age, the age of the relation is the
-	 * greatest of the ages of the main relation and its associated TOAST
-	 * table.  The commands generated by vacuumdb will also process the TOAST
-	 * table for the relation if necessary, so it does not need to be
-	 * considered separately.
-	 */
-	if (vacopts->min_xid_age != 0)
-	{
-		appendPQExpBuffer(&catalog_query,
-						  " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
-						  " pg_catalog.age(t.relfrozenxid)) "
-						  " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
-						  " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
-						  " '0'::pg_catalog.xid\n",
-						  vacopts->min_xid_age);
-	}
-
-	if (vacopts->min_mxid_age != 0)
-	{
-		appendPQExpBuffer(&catalog_query,
-						  " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
-						  " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
-						  " '%d'::pg_catalog.int4\n"
-						  " AND c.relminmxid OPERATOR(pg_catalog.!=)"
-						  " '0'::pg_catalog.xid\n",
-						  vacopts->min_mxid_age);
-	}
-
-	if (vacopts->missing_stats_only)
-	{
-		appendPQExpBufferStr(&catalog_query, " AND (\n");
-
-		/* regular stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
-							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* extended stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
-							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
-							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
-							 " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* expression indexes */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " JOIN pg_catalog.pg_index i"
-							 " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
-							 " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* inheritance and regular stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
-							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
-							 " AND c.relhassubclass\n"
-							 " AND NOT p.inherited\n"
-							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
-							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit))\n");
-
-		/* inheritance and extended stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
-							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND c.relhassubclass\n"
-							 " AND NOT p.inherited\n"
-							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
-							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
-							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
-							 " AND d.stxdinherit))\n");
-
-		appendPQExpBufferStr(&catalog_query, " )\n");
-	}
-
-	/*
-	 * Execute the catalog query.  We use the default search_path for this
-	 * query for consistency with table lookups done elsewhere by the user.
-	 */
-	appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
-	executeCommand(conn, "RESET search_path;", echo);
-	res = executeQuery(conn, catalog_query.data, echo);
-	termPQExpBuffer(&catalog_query);
-	PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
-
-	/*
-	 * Build qualified identifiers for each table, including the column list
-	 * if given.
-	 */
-	initPQExpBuffer(&buf);
-	for (int i = 0; i < PQntuples(res); i++)
-	{
-		appendPQExpBufferStr(&buf,
-							 fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
-											   PQgetvalue(res, i, 0),
-											   PQclientEncoding(conn)));
-
-		if (objects_listed && !PQgetisnull(res, i, 2))
-			appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
-
-		simple_string_list_append(found_objs, buf.data);
-		resetPQExpBuffer(&buf);
-	}
-	termPQExpBuffer(&buf);
-	PQclear(res);
-
-	return found_objs;
-}
-
-/*
- * Vacuum/analyze all connectable databases.
- *
- * In analyze-in-stages mode, we process all databases in one stage before
- * moving on to the next stage.  That ensure minimal stats are available
- * quickly everywhere before generating more detailed ones.
- */
-static void
-vacuum_all_databases(ConnParams *cparams,
-					 vacuumingOptions *vacopts,
-					 bool analyze_in_stages,
-					 SimpleStringList *objects,
-					 int concurrentCons,
-					 const char *progname, bool echo, bool quiet)
-{
-	PGconn	   *conn;
-	PGresult   *result;
-	int			stage;
-	int			i;
-
-	conn = connectMaintenanceDatabase(cparams, progname, echo);
-	result = executeQuery(conn,
-						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
-						  echo);
-	PQfinish(conn);
-
-	if (analyze_in_stages)
-	{
-		SimpleStringList **found_objs = NULL;
-
-		if (vacopts->missing_stats_only)
-			found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
-
-		/*
-		 * When analyzing all databases in stages, we analyze them all in the
-		 * fastest stage first, so that initial statistics become available
-		 * for all of them as soon as possible.
-		 *
-		 * This means we establish several times as many connections, but
-		 * that's a secondary consideration.
-		 */
-		for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
-		{
-			for (i = 0; i < PQntuples(result); i++)
-			{
-				cparams->override_dbname = PQgetvalue(result, i, 0);
-
-				vacuum_one_database(cparams, vacopts,
-									stage,
-									objects,
-									vacopts->missing_stats_only ? &found_objs[i] : NULL,
-									concurrentCons,
-									progname, echo, quiet);
-			}
-		}
-	}
-	else
-	{
-		for (i = 0; i < PQntuples(result); i++)
-		{
-			cparams->override_dbname = PQgetvalue(result, i, 0);
-
-			vacuum_one_database(cparams, vacopts,
-								ANALYZE_NO_STAGE,
-								objects, NULL,
-								concurrentCons,
-								progname, echo, quiet);
-		}
-	}
-
-	PQclear(result);
-}
-
-/*
- * Construct a vacuum/analyze command to run based on the given options, in the
- * given string buffer, which may contain previous garbage.
- *
- * The table name used must be already properly quoted.  The command generated
- * depends on the server version involved and it is semicolon-terminated.
- */
-static void
-prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
-					   vacuumingOptions *vacopts, const char *table)
-{
-	const char *paren = " (";
-	const char *comma = ", ";
-	const char *sep = paren;
-
-	resetPQExpBuffer(sql);
-
-	if (vacopts->analyze_only)
-	{
-		appendPQExpBufferStr(sql, "ANALYZE");
-
-		/* parenthesized grammar of ANALYZE is supported since v11 */
-		if (serverVersion >= 110000)
-		{
-			if (vacopts->skip_locked)
-			{
-				/* SKIP_LOCKED is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
-				sep = comma;
-			}
-			if (vacopts->verbose)
-			{
-				appendPQExpBuffer(sql, "%sVERBOSE", sep);
-				sep = comma;
-			}
-			if (vacopts->buffer_usage_limit)
-			{
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
-								  vacopts->buffer_usage_limit);
-				sep = comma;
-			}
-			if (sep != paren)
-				appendPQExpBufferChar(sql, ')');
-		}
-		else
-		{
-			if (vacopts->verbose)
-				appendPQExpBufferStr(sql, " VERBOSE");
-		}
-	}
-	else
-	{
-		appendPQExpBufferStr(sql, "VACUUM");
-
-		/* parenthesized grammar of VACUUM is supported since v9.0 */
-		if (serverVersion >= 90000)
-		{
-			if (vacopts->disable_page_skipping)
-			{
-				/* DISABLE_PAGE_SKIPPING is supported since v9.6 */
-				Assert(serverVersion >= 90600);
-				appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
-				sep = comma;
-			}
-			if (vacopts->no_index_cleanup)
-			{
-				/* "INDEX_CLEANUP FALSE" has been supported since v12 */
-				Assert(serverVersion >= 120000);
-				Assert(!vacopts->force_index_cleanup);
-				appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
-				sep = comma;
-			}
-			if (vacopts->force_index_cleanup)
-			{
-				/* "INDEX_CLEANUP TRUE" has been supported since v12 */
-				Assert(serverVersion >= 120000);
-				Assert(!vacopts->no_index_cleanup);
-				appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
-				sep = comma;
-			}
-			if (!vacopts->do_truncate)
-			{
-				/* TRUNCATE is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
-				sep = comma;
-			}
-			if (!vacopts->process_main)
-			{
-				/* PROCESS_MAIN is supported since v16 */
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
-				sep = comma;
-			}
-			if (!vacopts->process_toast)
-			{
-				/* PROCESS_TOAST is supported since v14 */
-				Assert(serverVersion >= 140000);
-				appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
-				sep = comma;
-			}
-			if (vacopts->skip_database_stats)
-			{
-				/* SKIP_DATABASE_STATS is supported since v16 */
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
-				sep = comma;
-			}
-			if (vacopts->skip_locked)
-			{
-				/* SKIP_LOCKED is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
-				sep = comma;
-			}
-			if (vacopts->full)
-			{
-				appendPQExpBuffer(sql, "%sFULL", sep);
-				sep = comma;
-			}
-			if (vacopts->freeze)
-			{
-				appendPQExpBuffer(sql, "%sFREEZE", sep);
-				sep = comma;
-			}
-			if (vacopts->verbose)
-			{
-				appendPQExpBuffer(sql, "%sVERBOSE", sep);
-				sep = comma;
-			}
-			if (vacopts->and_analyze)
-			{
-				appendPQExpBuffer(sql, "%sANALYZE", sep);
-				sep = comma;
-			}
-			if (vacopts->parallel_workers >= 0)
-			{
-				/* PARALLEL is supported since v13 */
-				Assert(serverVersion >= 130000);
-				appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
-								  vacopts->parallel_workers);
-				sep = comma;
-			}
-			if (vacopts->buffer_usage_limit)
-			{
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
-								  vacopts->buffer_usage_limit);
-				sep = comma;
-			}
-			if (sep != paren)
-				appendPQExpBufferChar(sql, ')');
-		}
-		else
-		{
-			if (vacopts->full)
-				appendPQExpBufferStr(sql, " FULL");
-			if (vacopts->freeze)
-				appendPQExpBufferStr(sql, " FREEZE");
-			if (vacopts->verbose)
-				appendPQExpBufferStr(sql, " VERBOSE");
-			if (vacopts->and_analyze)
-				appendPQExpBufferStr(sql, " ANALYZE");
-		}
-	}
-
-	appendPQExpBuffer(sql, " %s;", table);
-}
-
-/*
- * Send a vacuum/analyze command to the server, returning after sending the
- * command.
- *
- * Any errors during command execution are reported to stderr.
- */
-static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
-{
-	bool		status;
-
-	if (echo)
-		printf("%s\n", sql);
-
-	status = PQsendQuery(conn, sql) == 1;
-
-	if (!status)
-	{
-		if (table)
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
-		else
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
-	}
-}
 
 static void
 help(const char *progname)
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
new file mode 100644
index 00000000000..b9f2e507557
--- /dev/null
+++ b/src/bin/scripts/vacuuming.c
@@ -0,0 +1,1012 @@
+/*-------------------------------------------------------------------------
+ * vacuuming.c
+ *		Helper routines for vacuumdb
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/scripts/vacuuming.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "catalog/pg_attribute_d.h"
+#include "catalog/pg_class_d.h"
+#include "common/connect.h"
+#include "common/logging.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/string_utils.h"
+#include "vacuuming.h"
+
+
+static int	vacuum_one_database(ConnParams *cparams,
+								vacuumingOptions *vacopts,
+								int stage,
+								VacObjFilter objfilter,
+								SimpleStringList *objects,
+								SimpleStringList **found_objs,
+								int concurrentCons,
+								const char *progname, bool echo, bool quiet);
+static int	vacuum_all_databases(ConnParams *cparams,
+								 vacuumingOptions *vacopts,
+								 bool analyze_in_stages,
+								 VacObjFilter objfilter,
+								 SimpleStringList *objects,
+								 int concurrentCons,
+								 const char *progname, bool echo, bool quiet);
+static SimpleStringList *retrieve_objects(PGconn *conn,
+										  vacuumingOptions *vacopts,
+										  VacObjFilter objfilter,
+										  SimpleStringList *objects,
+										  bool echo);
+static void prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
+								   vacuumingOptions *vacopts, const char *table);
+static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+							   const char *table);
+
+/*
+ * Executes vacuum/analyze as indicated, or dies in case of failure.
+ */
+int
+vacuuming_main(ConnParams *cparams, const char *dbname,
+			   const char *maintenance_db, vacuumingOptions *vacopts,
+			   VacObjFilter objfilter, SimpleStringList *objects,
+			   int tbl_count, bool analyze_in_stages, int concurrentCons,
+			   const char *progname, bool echo, bool quiet)
+{
+	setup_cancel_handler(NULL);
+
+	/* Avoid opening extra connections. */
+	if (tbl_count && (concurrentCons > tbl_count))
+		concurrentCons = tbl_count;
+
+	if (objfilter & OBJFILTER_ALL_DBS)
+	{
+		cparams->dbname = maintenance_db;
+
+		return vacuum_all_databases(cparams, vacopts,
+									analyze_in_stages,
+									objfilter, objects,
+									concurrentCons,
+									progname, echo, quiet);
+	}
+	else
+	{
+		if (dbname == NULL)
+		{
+			if (getenv("PGDATABASE"))
+				dbname = getenv("PGDATABASE");
+			else if (getenv("PGUSER"))
+				dbname = getenv("PGUSER");
+			else
+				dbname = get_user_name_or_exit(progname);
+		}
+
+		cparams->dbname = dbname;
+
+		if (analyze_in_stages)
+		{
+			SimpleStringList *found_objs = NULL;
+
+			for (int stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+			{
+				int			ret;
+
+				ret = vacuum_one_database(cparams, vacopts,
+										  stage,
+										  objfilter, objects,
+										  vacopts->missing_stats_only ? &found_objs : NULL,
+										  concurrentCons,
+										  progname, echo, quiet);
+				if (ret != 0)
+					return ret;
+			}
+
+			return EXIT_SUCCESS;
+		}
+		else
+			return vacuum_one_database(cparams, vacopts,
+									   ANALYZE_NO_STAGE,
+									   objfilter, objects, NULL,
+									   concurrentCons,
+									   progname, echo, quiet);
+	}
+}
+
+
+/*
+ * vacuum_one_database
+ *
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ *    of objects to process, as returned by a previous call to
+ *    vacuum_one_database().
+ *
+ *     a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ *        once-dereferenced double pointer) are not NULL, this list takes
+ *        priority, and anything specified in "objects" is ignored.
+ *
+ *     b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ *        (the once-dereferenced double pointer) _is_ NULL, the "objects"
+ *        parameter takes priority, and the results of the catalog query
+ *        described in (2) are stored in "found_objs".
+ *
+ *     c) If "found_objs" (the double pointer) is NULL, the "objects"
+ *        parameter again takes priority, and the results of the catalog query
+ *        are not saved.
+ *
+ * 2) The "objects" parameter is a user-specified list of objects to process.
+ *    When (1b) or (1c) applies, this function performs a catalog query to
+ *    retrieve a fully qualified list of objects to process, as described
+ *    below.
+ *
+ *     a) If "objects" is not NULL, the catalog query gathers only the objects
+ *        listed in "objects".
+ *
+ *     b) If "objects" is NULL, all tables in the database are gathered.
+ *
+ * Note that this function is only concerned with running exactly one stage
+ * when in analyze-in-stages mode; caller must iterate on us if necessary.
+ *
+ * If concurrentCons is > 1, multiple connections are used to vacuum tables
+ * in parallel.
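+ *
+ * As an illustration (a hypothetical caller, not part of this file): to run
+ * several analyze stages without repeating the catalog query, one might do
+ *
+ *     SimpleStringList *cache = NULL;
+ *     vacuum_one_database(..., &cache, ...);   <-- runs query, fills cache (1b)
+ *     vacuum_one_database(..., &cache, ...);   <-- reuses the cache, case (1a)
+ *
+ * while passing NULL for "found_objs" re-runs the query each time, case (1c).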
+ */
+static int
+vacuum_one_database(ConnParams *cparams,
+					vacuumingOptions *vacopts,
+					int stage,
+					VacObjFilter objfilter, SimpleStringList *objects,
+					SimpleStringList **found_objs,
+					int concurrentCons,
+					const char *progname, bool echo, bool quiet)
+{
+	PQExpBufferData sql;
+	PGconn	   *conn;
+	SimpleStringListCell *cell;
+	ParallelSlotArray *sa;
+	int			ntups = 0;
+	const char *initcmd;
+	SimpleStringList *retobjs = NULL;
+	int			ret = EXIT_SUCCESS;
+	const char *stage_commands[] = {
+		"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+		"SET default_statistics_target=10; RESET vacuum_cost_delay;",
+		"RESET default_statistics_target;"
+	};
+	const char *stage_messages[] = {
+		gettext_noop("Generating minimal optimizer statistics (1 target)"),
+		gettext_noop("Generating medium optimizer statistics (10 targets)"),
+		gettext_noop("Generating default (full) optimizer statistics")
+	};
+
+	Assert(stage == ANALYZE_NO_STAGE ||
+		   (stage >= 0 && stage < ANALYZE_NUM_STAGES));
+
+	conn = connectDatabase(cparams, progname, echo, false, true);
+
+	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "disable-page-skipping", "9.6");
+	}
+
+	if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-index-cleanup", "12");
+	}
+
+	if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "force-index-cleanup", "12");
+	}
+
+	if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-truncate", "12");
+	}
+
+	if (!vacopts->process_main && PQserverVersion(conn) < 160000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-process-main", "16");
+	}
+
+	if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-process-toast", "14");
+	}
+
+	if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "skip-locked", "12");
+	}
+
+	if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--min-xid-age", "9.6");
+	}
+
+	if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--min-mxid-age", "9.6");
+	}
+
+	if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--parallel", "13");
+	}
+
+	if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--buffer-usage-limit", "16");
+	}
+
+	if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--missing-stats-only", "15");
+	}
+
+	/* skip_database_stats is used automatically if server supports it */
+	vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
+
+	if (!quiet)
+	{
+		if (stage != ANALYZE_NO_STAGE)
+			printf(_("%s: processing database \"%s\": %s\n"),
+				   progname, PQdb(conn), _(stage_messages[stage]));
+		else
+			printf(_("%s: vacuuming database \"%s\"\n"),
+				   progname, PQdb(conn));
+		fflush(stdout);
+	}
+
+	/*
+	 * If the caller provided the results of a previous catalog query, just
+	 * use that.  Otherwise, run the catalog query ourselves and set the
+	 * return variable if provided.
+	 */
+	if (found_objs && *found_objs)
+		retobjs = *found_objs;
+	else
+	{
+		retobjs = retrieve_objects(conn, vacopts, objfilter, objects, echo);
+		if (found_objs)
+			*found_objs = retobjs;
+	}
+
+	/*
+	 * Count the number of objects in the catalog query result.  If there are
+	 * none, we are done.
+	 */
+	for (cell = retobjs ? retobjs->head : NULL; cell; cell = cell->next)
+		ntups++;
+
+	if (ntups == 0)
+	{
+		PQfinish(conn);
+		return EXIT_SUCCESS;
+	}
+
+	/*
+	 * Ensure concurrentCons is sane.  If there are more connections than
+	 * vacuumable relations, we don't need to use them all.
+	 */
+	if (concurrentCons > ntups)
+		concurrentCons = ntups;
+	if (concurrentCons <= 0)
+		concurrentCons = 1;
+
+	/*
+	 * All slots need to be prepared to run the appropriate analyze stage, if
+	 * caller requested that mode.  We have to prepare the initial connection
+	 * ourselves before setting up the slots.
+	 */
+	if (stage == ANALYZE_NO_STAGE)
+		initcmd = NULL;
+	else
+	{
+		initcmd = stage_commands[stage];
+		executeCommand(conn, initcmd, echo);
+	}
+
+	/*
+	 * Set up the database connections. We reuse the connection we already have
+	 * for the first slot.  If not in parallel mode, the first slot in the
+	 * array contains the connection.
+	 */
+	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+	ParallelSlotsAdoptConn(sa, conn);
+
+	initPQExpBuffer(&sql);
+
+	cell = retobjs->head;
+	do
+	{
+		const char *tabname = cell->val;
+		ParallelSlot *free_slot;
+
+		if (CancelRequested)
+		{
+			ret = EXIT_FAILURE;
+			goto finish;
+		}
+
+		free_slot = ParallelSlotsGetIdle(sa, NULL);
+		if (!free_slot)
+		{
+			ret = EXIT_FAILURE;
+			goto finish;
+		}
+
+		prepare_vacuum_command(free_slot->connection, &sql,
+							   vacopts, tabname);
+
+		/*
+		 * Execute the vacuum.  All errors are handled in processQueryResult
+		 * through ParallelSlotsGetIdle.
+		 */
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, sql.data,
+						   echo, tabname);
+
+		cell = cell->next;
+	} while (cell != NULL);
+
+	if (!ParallelSlotsWaitCompletion(sa))
+	{
+		ret = EXIT_FAILURE;
+		goto finish;
+	}
+
+	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+	if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE &&
+		!vacopts->analyze_only)
+	{
+		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+		ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+		if (!free_slot)
+		{
+			ret = EXIT_FAILURE;
+			goto finish;
+		}
+
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+		if (!ParallelSlotsWaitCompletion(sa))
+			ret = EXIT_FAILURE;
+	}
+
+finish:
+	ParallelSlotsTerminate(sa);
+	pg_free(sa);
+
+	termPQExpBuffer(&sql);
+
+	return ret;
+}
+
+/*
+ * Vacuum/analyze all connectable databases.
+ *
+ * In analyze-in-stages mode, we process all databases in one stage before
+ * moving on to the next stage.  That ensures minimal stats are available
+ * quickly everywhere before generating more detailed ones.
+ */
+static int
+vacuum_all_databases(ConnParams *cparams,
+					 vacuumingOptions *vacopts,
+					 bool analyze_in_stages,
+					 VacObjFilter objfilter,
+					 SimpleStringList *objects,
+					 int concurrentCons,
+					 const char *progname, bool echo, bool quiet)
+{
+	PGconn	   *conn;
+	PGresult   *result;
+
+	conn = connectMaintenanceDatabase(cparams, progname, echo);
+	result = executeQuery(conn,
+						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
+						  echo);
+	PQfinish(conn);
+
+	if (analyze_in_stages)
+	{
+		SimpleStringList **found_objs = NULL;
+
+		if (vacopts->missing_stats_only)
+			found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
+		/*
+		 * When analyzing all databases in stages, we analyze them all in the
+		 * fastest stage first, so that initial statistics become available
+		 * for all of them as soon as possible.
+		 *
+		 * This means we establish several times as many connections, but
+		 * that's a secondary consideration.
+		 */
+		for (int stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+		{
+			for (int i = 0; i < PQntuples(result); i++)
+			{
+				int			ret;
+
+				cparams->override_dbname = PQgetvalue(result, i, 0);
+				ret = vacuum_one_database(cparams, vacopts,
+										  stage,
+										  objfilter, objects,
+										  vacopts->missing_stats_only ? &found_objs[i] : NULL,
+										  concurrentCons,
+										  progname, echo, quiet);
+				if (ret != EXIT_SUCCESS)
+					return ret;
+			}
+		}
+	}
+	else
+	{
+		for (int i = 0; i < PQntuples(result); i++)
+		{
+			int			ret;
+
+			cparams->override_dbname = PQgetvalue(result, i, 0);
+			ret = vacuum_one_database(cparams, vacopts,
+									  ANALYZE_NO_STAGE,
+									  objfilter, objects,
+									  NULL,
+									  concurrentCons,
+									  progname, echo, quiet);
+			if (ret != EXIT_SUCCESS)
+				return ret;
+		}
+	}
+
+	PQclear(result);
+
+	return EXIT_SUCCESS;
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns.  This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table.  If a listed table does not exist, the catalog query will fail.
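+ *
+ * For example, a hypothetical invocation with --table 'foo(a, b)' produces a
+ * CTE value roughly of the form
+ *     VALUES ('foo'::pg_catalog.regclass, '(a, b)'::pg_catalog.text)
+ * where the column list, if any, is carried along as unparsed text.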
+ */
+static SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+				 VacObjFilter objfilter, SimpleStringList *objects, bool echo)
+{
+	PQExpBufferData buf;
+	PQExpBufferData catalog_query;
+	PGresult   *res;
+	SimpleStringListCell *cell;
+	SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+	bool		objects_listed = false;
+
+	initPQExpBuffer(&catalog_query);
+	for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
+	{
+		char	   *just_table = NULL;
+		const char *just_columns = NULL;
+
+		if (!objects_listed)
+		{
+			appendPQExpBufferStr(&catalog_query,
+								 "WITH listed_objects (object_oid, column_list) AS (\n"
+								 "  VALUES (");
+			objects_listed = true;
+		}
+		else
+			appendPQExpBufferStr(&catalog_query, ",\n  (");
+
+		if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
+		{
+			appendStringLiteralConn(&catalog_query, cell->val, conn);
+			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
+		}
+
+		if (objfilter & OBJFILTER_TABLE)
+		{
+			/*
+			 * Split the relation and column names given by the user; the
+			 * split values feed the CTE, and pre-run validity checks are
+			 * performed on them as well.  For now these checks happen only
+			 * on the relation name.
+			 */
+			splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
+								  &just_table, &just_columns);
+
+			appendStringLiteralConn(&catalog_query, just_table, conn);
+			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
+		}
+
+		if (just_columns && just_columns[0] != '\0')
+			appendStringLiteralConn(&catalog_query, just_columns, conn);
+		else
+			appendPQExpBufferStr(&catalog_query, "NULL");
+
+		appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
+
+		pg_free(just_table);
+	}
+
+	/* Finish formatting the CTE */
+	if (objects_listed)
+		appendPQExpBufferStr(&catalog_query, "\n)\n");
+
+	appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
+
+	if (objects_listed)
+		appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
+
+	appendPQExpBufferStr(&catalog_query,
+						 " FROM pg_catalog.pg_class c\n"
+						 " JOIN pg_catalog.pg_namespace ns"
+						 " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+						 " CROSS JOIN LATERAL (SELECT c.relkind IN ("
+						 CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
+						 CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
+						 " LEFT JOIN pg_catalog.pg_class t"
+						 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
+
+	/*
+	 * Used to match the tables or schemas listed by the user, completing the
+	 * JOIN clause.
+	 */
+	if (objects_listed)
+	{
+		appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
+							 " ON listed_objects.object_oid"
+							 " OPERATOR(pg_catalog.=) ");
+
+		if (objfilter & OBJFILTER_TABLE)
+			appendPQExpBufferStr(&catalog_query, "c.oid\n");
+		else
+			appendPQExpBufferStr(&catalog_query, "ns.oid\n");
+	}
+
+	/*
+	 * Exclude temporary tables, beginning the WHERE clause.
+	 */
+	appendPQExpBufferStr(&catalog_query,
+						 " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
+						 CppAsString2(RELPERSISTENCE_TEMP) "\n");
+
+	/*
+	 * Used to match the tables or schemas listed by the user, for the WHERE
+	 * clause.
+	 */
+	if (objects_listed)
+	{
+		if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
+			appendPQExpBufferStr(&catalog_query,
+								 " AND listed_objects.object_oid IS NULL\n");
+		else
+			appendPQExpBufferStr(&catalog_query,
+								 " AND listed_objects.object_oid IS NOT NULL\n");
+	}
+
+	/*
+	 * If no tables were listed, filter for the relevant relation types.  If
+	 * tables were given via --table, don't bother filtering by relation type.
+	 * Instead, let the server decide whether a given relation can be
+	 * processed, so that the user is told when it cannot.
+	 */
+	if ((objfilter & OBJFILTER_TABLE) == 0)
+	{
+		/*
+		 * vacuumdb should generally follow the behavior of the underlying
+		 * VACUUM and ANALYZE commands. If analyze_only is true, process
+		 * regular tables, materialized views, and partitioned tables, just
+		 * like ANALYZE (with no specific target tables) does. Otherwise,
+		 * process only regular tables and materialized views, since VACUUM
+		 * skips partitioned tables when no target tables are specified.
+		 */
+		if (vacopts->analyze_only)
+			appendPQExpBufferStr(&catalog_query,
+								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
+								 CppAsString2(RELKIND_RELATION) ", "
+								 CppAsString2(RELKIND_MATVIEW) ", "
+								 CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
+		else
+			appendPQExpBufferStr(&catalog_query,
+								 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
+								 CppAsString2(RELKIND_RELATION) ", "
+								 CppAsString2(RELKIND_MATVIEW) "])\n");
+	}
+
+	/*
+	 * For --min-xid-age and --min-mxid-age, the age of the relation is the
+	 * greatest of the ages of the main relation and its associated TOAST
+	 * table.  The commands generated by vacuumdb will also process the TOAST
+	 * table for the relation if necessary, so it does not need to be
+	 * considered separately.
+	 */
+	if (vacopts->min_xid_age != 0)
+	{
+		appendPQExpBuffer(&catalog_query,
+						  " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
+						  " pg_catalog.age(t.relfrozenxid)) "
+						  " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
+						  " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
+						  " '0'::pg_catalog.xid\n",
+						  vacopts->min_xid_age);
+	}
+
+	if (vacopts->min_mxid_age != 0)
+	{
+		appendPQExpBuffer(&catalog_query,
+						  " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
+						  " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
+						  " '%d'::pg_catalog.int4\n"
+						  " AND c.relminmxid OPERATOR(pg_catalog.!=)"
+						  " '0'::pg_catalog.xid\n",
+						  vacopts->min_mxid_age);
+	}
+
+	if (vacopts->missing_stats_only)
+	{
+		appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+		/* regular stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
+							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* extended stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+							 " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* expression indexes */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " JOIN pg_catalog.pg_index i"
+							 " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
+							 " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* inheritance and regular stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND a.attgenerated OPERATOR(pg_catalog.<>) "
+							 CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
+							 " AND c.relhassubclass\n"
+							 " AND NOT p.inherited\n"
+							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit))\n");
+
+		/* inheritance and extended stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND c.relhassubclass\n"
+							 " AND NOT p.inherited\n"
+							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+							 " AND d.stxdinherit))\n");
+
+		appendPQExpBufferStr(&catalog_query, " )\n");
+	}
+
+	/*
+	 * Execute the catalog query.  We use the default search_path for this
+	 * query for consistency with table lookups done elsewhere by the user.
+	 */
+	appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
+	executeCommand(conn, "RESET search_path;", echo);
+	res = executeQuery(conn, catalog_query.data, echo);
+	termPQExpBuffer(&catalog_query);
+	PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
+
+	/*
+	 * Build qualified identifiers for each table, including the column list
+	 * if given.
+	 */
+	initPQExpBuffer(&buf);
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		appendPQExpBufferStr(&buf,
+							 fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
+											   PQgetvalue(res, i, 0),
+											   PQclientEncoding(conn)));
+
+		if (objects_listed && !PQgetisnull(res, i, 2))
+			appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
+
+		simple_string_list_append(found_objs, buf.data);
+		resetPQExpBuffer(&buf);
+	}
+	termPQExpBuffer(&buf);
+	PQclear(res);
+
+	return found_objs;
+}
+
+/*
+ * Construct a vacuum/analyze command to run based on the given
+ * options, in the given string buffer, which may contain previous garbage.
+ *
+ * The table name used must be already properly quoted.  The command generated
+ * depends on the server version involved and it is semicolon-terminated.
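+ *
+ * For example, with "full" and "and_analyze" set against a v16-or-later
+ * server, this produces something like
+ * "VACUUM (SKIP_DATABASE_STATS, FULL, ANALYZE) public.foo;".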
+ */
+static void
+prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
+					   vacuumingOptions *vacopts, const char *table)
+{
+	int		serverVersion = PQserverVersion(conn);
+	const char *paren = " (";
+	const char *comma = ", ";
+	const char *sep = paren;
+
+	resetPQExpBuffer(sql);
+
+	if (vacopts->analyze_only)
+	{
+		appendPQExpBufferStr(sql, "ANALYZE");
+
+		/* parenthesized grammar of ANALYZE is supported since v11 */
+		if (serverVersion >= 110000)
+		{
+			if (vacopts->skip_locked)
+			{
+				/* SKIP_LOCKED is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+				sep = comma;
+			}
+			if (vacopts->verbose)
+			{
+				appendPQExpBuffer(sql, "%sVERBOSE", sep);
+				sep = comma;
+			}
+			if (vacopts->buffer_usage_limit)
+			{
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+								  vacopts->buffer_usage_limit);
+				sep = comma;
+			}
+			if (sep != paren)
+				appendPQExpBufferChar(sql, ')');
+		}
+		else
+		{
+			if (vacopts->verbose)
+				appendPQExpBufferStr(sql, " VERBOSE");
+		}
+	}
+	else
+	{
+		appendPQExpBufferStr(sql, "VACUUM");
+
+		/* parenthesized grammar of VACUUM is supported since v9.0 */
+		if (serverVersion >= 90000)
+		{
+			if (vacopts->disable_page_skipping)
+			{
+				/* DISABLE_PAGE_SKIPPING is supported since v9.6 */
+				Assert(serverVersion >= 90600);
+				appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
+				sep = comma;
+			}
+			if (vacopts->no_index_cleanup)
+			{
+				/* "INDEX_CLEANUP FALSE" has been supported since v12 */
+				Assert(serverVersion >= 120000);
+				Assert(!vacopts->force_index_cleanup);
+				appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
+				sep = comma;
+			}
+			if (vacopts->force_index_cleanup)
+			{
+				/* "INDEX_CLEANUP TRUE" has been supported since v12 */
+				Assert(serverVersion >= 120000);
+				Assert(!vacopts->no_index_cleanup);
+				appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
+				sep = comma;
+			}
+			if (!vacopts->do_truncate)
+			{
+				/* TRUNCATE is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
+				sep = comma;
+			}
+			if (!vacopts->process_main)
+			{
+				/* PROCESS_MAIN is supported since v16 */
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
+				sep = comma;
+			}
+			if (!vacopts->process_toast)
+			{
+				/* PROCESS_TOAST is supported since v14 */
+				Assert(serverVersion >= 140000);
+				appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
+				sep = comma;
+			}
+			if (vacopts->skip_database_stats)
+			{
+				/* SKIP_DATABASE_STATS is supported since v16 */
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
+				sep = comma;
+			}
+			if (vacopts->skip_locked)
+			{
+				/* SKIP_LOCKED is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+				sep = comma;
+			}
+			if (vacopts->full)
+			{
+				appendPQExpBuffer(sql, "%sFULL", sep);
+				sep = comma;
+			}
+			if (vacopts->freeze)
+			{
+				appendPQExpBuffer(sql, "%sFREEZE", sep);
+				sep = comma;
+			}
+			if (vacopts->verbose)
+			{
+				appendPQExpBuffer(sql, "%sVERBOSE", sep);
+				sep = comma;
+			}
+			if (vacopts->and_analyze)
+			{
+				appendPQExpBuffer(sql, "%sANALYZE", sep);
+				sep = comma;
+			}
+			if (vacopts->parallel_workers >= 0)
+			{
+				/* PARALLEL is supported since v13 */
+				Assert(serverVersion >= 130000);
+				appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
+								  vacopts->parallel_workers);
+				sep = comma;
+			}
+			if (vacopts->buffer_usage_limit)
+			{
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+								  vacopts->buffer_usage_limit);
+				sep = comma;
+			}
+			if (sep != paren)
+				appendPQExpBufferChar(sql, ')');
+		}
+		else
+		{
+			if (vacopts->full)
+				appendPQExpBufferStr(sql, " FULL");
+			if (vacopts->freeze)
+				appendPQExpBufferStr(sql, " FREEZE");
+			if (vacopts->verbose)
+				appendPQExpBufferStr(sql, " VERBOSE");
+			if (vacopts->and_analyze)
+				appendPQExpBufferStr(sql, " ANALYZE");
+		}
+	}
+
+	appendPQExpBuffer(sql, " %s;", table);
+}
+
+/*
+ * Send a vacuum/analyze command to the server, returning after sending the
+ * command.
+ *
+ * Any errors during command execution are reported to stderr.
+ */
+static void
+run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+				   const char *table)
+{
+	bool		status;
+
+	if (echo)
+		printf("%s\n", sql);
+
+	status = PQsendQuery(conn, sql) == 1;
+
+	if (!status)
+	{
+		if (table)
+		{
+			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+						 table, PQdb(conn), PQerrorMessage(conn));
+		}
+		else
+		{
+			pg_log_error("vacuuming of database \"%s\" failed: %s",
+						 PQdb(conn), PQerrorMessage(conn));
+		}
+	}
+}
+
+/*
+ * Returns a newly malloc'd version of 'src' with escaped single quotes and
+ * backslashes.
+ */
+char *
+escape_quotes(const char *src)
+{
+	char	   *result = escape_single_quotes_ascii(src);
+
+	if (!result)
+		pg_fatal("out of memory");
+	return result;
+}
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
new file mode 100644
index 00000000000..021953e153a
--- /dev/null
+++ b/src/bin/scripts/vacuuming.h
@@ -0,0 +1,68 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuuming.h
+ *		Common declarations for vacuuming.c
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/scripts/vacuuming.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef VACUUMING_H
+#define VACUUMING_H
+
+#include "common.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/simple_list.h"
+
+/* For analyze-in-stages mode */
+#define ANALYZE_NO_STAGE	-1
+#define ANALYZE_NUM_STAGES	3
+
+/* vacuum options controlled by user flags */
+typedef struct vacuumingOptions
+{
+	bool		analyze_only;
+	bool		verbose;
+	bool		and_analyze;
+	bool		full;
+	bool		freeze;
+	bool		disable_page_skipping;
+	bool		skip_locked;
+	int			min_xid_age;
+	int			min_mxid_age;
+	int			parallel_workers;	/* >= 0 indicates user specified the
+									 * parallel degree, otherwise -1 */
+	bool		no_index_cleanup;
+	bool		force_index_cleanup;
+	bool		do_truncate;
+	bool		process_main;
+	bool		process_toast;
+	bool		skip_database_stats;
+	char	   *buffer_usage_limit;
+	bool		missing_stats_only;
+} vacuumingOptions;
+
+/* object filter options */
+typedef enum
+{
+	OBJFILTER_NONE = 0,			/* no filter used */
+	OBJFILTER_ALL_DBS = (1 << 0),	/* -a | --all */
+	OBJFILTER_DATABASE = (1 << 1),	/* -d | --dbname */
+	OBJFILTER_TABLE = (1 << 2), /* -t | --table */
+	OBJFILTER_SCHEMA = (1 << 3),	/* -n | --schema */
+	OBJFILTER_SCHEMA_EXCLUDE = (1 << 4),	/* -N | --exclude-schema */
+} VacObjFilter;
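+
+/*
+ * These values are bit flags and may be combined where the option checks
+ * allow it; for example, "vacuumdb -d mydb -t mytab" ends up with
+ * (OBJFILTER_DATABASE | OBJFILTER_TABLE).
+ */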
+
+extern int	vacuuming_main(ConnParams *cparams, const char *dbname,
+						   const char *maintenance_db, vacuumingOptions *vacopts,
+						   VacObjFilter objfilter, SimpleStringList *objects,
+						   int tbl_count, bool analyze_in_stages,
+						   int concurrentCons,
+						   const char *progname, bool echo, bool quiet);
+
+extern char *escape_quotes(const char *src);
+
+#endif							/* VACUUMING_H */

base-commit: e849bd551c323a384f2b14d20a1b7bfaa6127ed7
-- 
2.47.3

Attachment: v22-0002-Add-REPACK-command.patch (text/x-diff; charset=utf-8)
From b90c472c4efb72c8dae574e2126c08fa3725988a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 26 Jul 2025 19:57:26 +0200
Subject: [PATCH v22 2/2] Add REPACK command

REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.

This also adds pg_repackdb, a new utility that can invoke the new
commands.  This is heavily based on vacuumdb.  We may still change the
implementation, depending on how Windows likes this one.

Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: To fill in
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 ++++++-
 doc/src/sgml/ref/allfiles.sgml           |   2 +
 doc/src/sgml/ref/cluster.sgml            |  97 +--
 doc/src/sgml/ref/clusterdb.sgml          |   5 +
 doc/src/sgml/ref/pg_repackdb.sgml        | 479 ++++++++++++++
 doc/src/sgml/ref/repack.sgml             | 284 +++++++++
 doc/src/sgml/ref/vacuum.sgml             |  33 +-
 doc/src/sgml/reference.sgml              |   2 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  26 +
 src/backend/commands/cluster.c           | 758 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   3 +-
 src/backend/parser/gram.y                |  88 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  33 +-
 src/bin/scripts/Makefile                 |   4 +-
 src/bin/scripts/meson.build              |   2 +
 src/bin/scripts/pg_repackdb.c            | 238 +++++++
 src/bin/scripts/t/103_repackdb.pl        |  24 +
 src/bin/scripts/vacuuming.c              |  97 ++-
 src/bin/scripts/vacuuming.h              |   9 +
 src/include/commands/cluster.h           |   8 +-
 src/include/commands/progress.h          |  61 +-
 src/include/nodes/parsenodes.h           |  20 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 125 +++-
 src/test/regress/expected/rules.out      |  23 +
 src/test/regress/sql/cluster.sql         |  59 ++
 src/tools/pgindent/typedefs.list         |   3 +
 33 files changed, 2318 insertions(+), 447 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 doc/src/sgml/ref/repack.sgml
 create mode 100644 src/bin/scripts/pg_repackdb.c
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..12e103d319d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5506,7 +5514,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -5965,6 +5974,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
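+
+<!-- Illustrative only: one way to watch REPACK progress is
+       SELECT pid, relid::regclass, phase,
+              heap_blks_scanned, heap_blks_total
+       FROM pg_stat_progress_repack;
+-->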
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..eabf92e3536 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
@@ -212,6 +213,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..cfcfb65e349 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
   <title>Description</title>
 
   <para>
-   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
-   to cluster the table specified
-   by <replaceable class="parameter">table_name</replaceable>
-   based on the index specified by
-   <replaceable class="parameter">index_name</replaceable>. The index must
-   already have been defined on
-   <replaceable class="parameter">table_name</replaceable>.
+   The <command>CLUSTER</command> command is equivalent to
+   <xref linkend="sql-repack"/> with an <literal>USING INDEX</literal>
+   clause.  See there for more details.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
+<!-- Do we need to describe exactly which options map to what?  They seem obvious to me. -->
 
-  <para>
-   When a table is clustered, <productname>PostgreSQL</productname>
-   remembers which index it was clustered by.  The form
-   <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
-   reclusters the table using the same index as before.  You can also
-   use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
-   forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
-   future cluster operations, or to clear any previous setting.
-  </para>
-
-  <para>
-   <command>CLUSTER</command> without a
-   <replaceable class="parameter">table_name</replaceable> reclusters all the
-   previously-clustered tables in the current database that the calling user
-   has privileges for.  This form of <command>CLUSTER</command> cannot be
-   executed inside a transaction block.
-  </para>
-
-  <para>
-   When a table is being clustered, an <literal>ACCESS
-   EXCLUSIVE</literal> lock is acquired on it. This prevents any other
-   database operations (both reads and writes) from operating on the
-   table until the <command>CLUSTER</command> is finished.
-  </para>
  </refsect1>
 
  <refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 0d2051bf6f1..546c1289c31 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -64,6 +64,11 @@ PostgreSQL documentation
    this utility and via other methods for accessing the server.
   </para>
 
+  <para>
+   <application>clusterdb</application> has been superseded by
+   <application>pg_repackdb</application>.
+  </para>
+
  </refsect1>
 
 
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..32570d071cb
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,479 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.
+   <application>pg_repackdb</application> will also generate internal
+   statistics used by the <productname>PostgreSQL</productname> query
+   optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link>.  There
+   is no effective difference between repacking and analyzing databases via
+   this utility and via other methods for accessing the server.
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When <option>-a</option>/<option>--all</option> is used, connect
+        to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..fd9d89f8aaa
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,284 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+    ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If a <literal>USING INDEX</literal> clause is specified, the rows are
+   physically reordered based on information from an index.  Please see the
+   notes on clustering below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    If the <literal>USING INDEX</literal> clause is specified, the rows in
+    the table are physically reordered following an index: if an index name
+    is specified in the command, that index is used; if no index name is
+    specified, the index previously configured as the index to cluster on
+    is used.  If no index has been configured in this way, an error is
+    raised.  The index given in the <literal>USING INDEX</literal> clause
+    is recorded as the index to cluster on, just as an index given to the
+    <command>CLUSTER</command> command is.  The index can also be set
+    manually using <command>ALTER TABLE ... CLUSTER ON</command>, and reset
+    with <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+   </para>
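+
+   <para>
+    For example, with a hypothetical table <literal>orders</literal> and
+    index <literal>orders_pkey</literal>, the clustered index can be
+    configured once and then reused by later invocations:
+<programlisting>
+ALTER TABLE orders CLUSTER ON orders_pkey;
+REPACK orders USING INDEX;  -- uses orders_pkey
+ALTER TABLE orders SET WITHOUT CLUSTER;
+</programlisting>
+   </para>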
+
+   <para>
+    If no table name is specified in <command>REPACK USING INDEX</command>,
+    all tables that have a clustering index defined and for which the
+    calling user has privileges are processed.
+   </para>
+
+   <para>
+    Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
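+
+   <para>
+    A minimal sketch, again with a hypothetical table:
+<programlisting>
+ALTER TABLE orders SET (fillfactor = 90);  -- keep 10% free space per page
+REPACK orders USING INDEX orders_pkey;
+</programlisting>
+   </para>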
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using clustering.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
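+
+   <para>
+    For example (hypothetical table and index), a range query such as the
+    following can benefit once the table has been repacked on an index over
+    <literal>order_date</literal>:
+<programlisting>
+REPACK orders USING INDEX orders_date_idx;
+SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';
+</programlisting>
+   </para>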
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan, or a sequential scan without sorting, is used, a
+    temporary copy of the table is created that contains the table data in
+    the new physical order.  Temporary copies of each index on the table are
+    created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
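+
+   <para>
+    For example, assuming a table <literal>employees</literal> with an
+    index <literal>employees_ind</literal>, the sort method can be
+    suppressed for a single session:
+<programlisting>
+SET enable_sort = off;
+REPACK employees USING INDEX employees_ind;
+RESET enable_sort;
+</programlisting>
+   </para>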
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
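+
+   <para>
+    For example (the value shown is purely illustrative):
+<programlisting>
+SET maintenance_work_mem = '1GB';
+REPACK employees;
+RESET maintenance_work_mem;
+</programlisting>
+   </para>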
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>ANALYZE</literal></term>
+    <term><literal>ANALYSE</literal></term>
+    <listitem>
+     <para>
+      Applies <xref linkend="sql-analyze"/> to the table after repacking.
+      This is currently only supported when a single (non-partitioned) table
+      is specified.  See the examples below.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+   <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view.  See
+    <xref linkend="repack-progress-reporting"/> for details.
+   </para>
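+
+   <para>
+    For example, the progress of a running <command>REPACK</command> can be
+    watched from another session (a sketch; any of the view's columns can be
+    selected):
+<programlisting>
+SELECT pid, relid::regclass AS relation, phase,
+       heap_blks_scanned, heap_blks_total
+FROM pg_stat_progress_repack;
+</programlisting>
+   </para>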
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the partition of that
+    index. <command>REPACK</command> on a partitioned table cannot be executed
+    inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> on the basis of its
+   index <literal>employees_ind</literal> (since an index is used here,
+   this effectively clusters the table):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
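+
+  <para>
+   Repack the table <literal>employees</literal> and analyze it afterwards:
+<programlisting>
+REPACK (ANALYZE) employees;
+</programlisting>
+  </para>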
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting></para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..062b658cfcd 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
-    FULL [ <replaceable class="parameter">boolean</replaceable> ]
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+    FULL [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
   <title>Parameters</title>
 
   <variablelist>
-   <varlistentry>
-    <term><literal>FULL</literal></term>
-    <listitem>
-     <para>
-      Selects <quote>full</quote> vacuum, which can reclaim more
-      space, but takes much longer and exclusively locks the table.
-      This method also requires extra disk space, since it writes a
-      new copy of the table and doesn't release the old copy until
-      the operation is complete.  Usually this should only be used when a
-      significant amount of space needs to be reclaimed from within the table.
-     </para>
-    </listitem>
-   </varlistentry>
-
    <varlistentry>
     <term><literal>FREEZE</literal></term>
     <listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FULL</literal></term>
+    <listitem>
+     <para>
+      This option, which is deprecated, makes <command>VACUUM</command>
+      behave like <command>REPACK</command> without a
+      <literal>USING INDEX</literal> clause.
+      This method of compacting the table takes much longer than
+      <command>VACUUM</command> and exclusively locks the table.
+      This method also requires extra disk space, since it writes a
+      new copy of the table and doesn't release the old copy until
+      the operation is complete.  Usually this should only be used when a
+      significant amount of space needs to be reclaimed from within the table.
+     </para>
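+
+     <para>
+      For example, with a hypothetical table, the deprecated form and its
+      replacement:
+<programlisting>
+VACUUM (FULL) mytable;
+REPACK mytable;         -- equivalent
+</programlisting>
+     </para>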
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..2ee08e21f41 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
@@ -257,6 +258,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..79f9de5d760 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5d9db167e59..08d4b8e44d7 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c77fa0234bb..7924922e0fe 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1279,6 +1279,32 @@ CREATE VIEW pg_stat_progress_cluster AS
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_repack AS
+    SELECT
+        S.pid AS pid,
+        S.datid AS datid,
+        D.datname AS datname,
+        S.relid AS relid,
+        -- param1 is currently unused
+        CASE S.param2 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'seq scanning heap'
+                      WHEN 2 THEN 'index scanning heap'
+                      WHEN 3 THEN 'sorting tuples'
+                      WHEN 4 THEN 'writing new heap'
+                      WHEN 5 THEN 'swapping relation files'
+                      WHEN 6 THEN 'rebuilding index'
+                      WHEN 7 THEN 'performing final cleanup'
+                      END AS phase,
+        CAST(S.param3 AS oid) AS repack_index_relid,
+        S.param4 AS heap_tuples_scanned,
+        S.param5 AS heap_tuples_written,
+        S.param6 AS heap_blks_total,
+        S.param7 AS heap_blks_scanned,
+        S.param8 AS index_rebuild_count
+    FROM pg_stat_get_progress_info('REPACK') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..8b64f9e6795 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,18 +67,41 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd, bool usingindex,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  MemoryContext cluster_context,
+											  Oid relid, bool rel_is_index);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
 
 
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
+
 /*---------------------------------------------------------------------------
  * This cluster code allows for clustering multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -104,191 +127,155 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *---------------------------------------------------------------------------
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+		else if (strcmp(opt->defname, "analyze") == 0 ||
+				 strcmp(opt->defname, "analyse") == 0)
+			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
+					 errmsg("unrecognized %s option \"%s\"",
+							RepackCommandAsString(stmt->command),
 							opt->defname),
 					 parser_errposition(pstate, opt->location)));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
 			return;
-		}
 	}
 
+	/* Don't allow this for now.  Maybe we can add support for this later */
+	if (params.options & CLUOPT_ANALYZE)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot ANALYZE multiple tables"));
+
 	/*
 	 * By here, we know we are in a multi-table situation.  In order to avoid
 	 * holding locks for too long, we want to process each table in its own
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
+		Oid			relid;
+		bool		rel_is_index;
+
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+
+		/*
+		 * If an index name was specified, resolve it now and pass it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * XXX how should this behave?  Passing no index to a partitioned
+			 * table could be useful to have certain partitions clustered by
+			 * some index, and other partitions by a different index.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, true, stmt->indexname);
+			if (!OidIsValid(relid))
+				elog(ERROR, "unable to determine index to cluster on");
+			/* XXX is this the right place for this check? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
+
+		rtcs = get_tables_to_repack_partitioned(stmt->command, repack_context,
+												relid, rel_is_index);
+
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
 	}
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
-
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
-
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
-
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
-
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+		{
+			CommitTransactionCommand();
+			continue;
+		}
+
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, stmt->usingindex,
+					rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +291,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, bool usingindex,
+			Relation OldHeap, Oid indexOid, ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +313,25 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
 	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_REPACK_COMMAND_REPACK);
+	else if (cmd == REPACK_COMMAND_CLUSTER)
+	{
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	}
+	else
+	{
+		Assert(cmd == REPACK_COMMAND_VACUUMFULL);
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	}
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -351,63 +353,21 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * to cluster a not-previously-clustered index.
 	 */
 	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
+		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+								 params->options))
 			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index. It
+	 * would work to use an index in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
-	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
+	if (usingindex && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				 errmsg("cannot run \"%s\" on a shared catalog",
+						RepackCommandAsString(cmd))));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
@@ -415,21 +375,30 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		if (OidIsValid(indexOid))
+		if (cmd == REPACK_COMMAND_CLUSTER)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
+		else if (cmd == REPACK_COMMAND_REPACK)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot repack temporary tables of other sessions")));
+		}
 		else
+		{
+			Assert(cmd == REPACK_COMMAND_VACUUMFULL);
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot vacuum temporary tables of other sessions")));
+		}
 	}
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -469,7 +438,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +451,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide CLUSTER or on a partitioned table), because
+	 * there is another check in cluster() which will stop any attempt to
+	 * cluster remote temp tables by name.  There is another check in
+	 * cluster_rel which is redundant, but we leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +652,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd, bool usingindex,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +669,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (usingindex)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1458,8 +1485,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1536,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,69 +1659,137 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact procedure depends on whether
+ * USING INDEX was given, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index for indexes that have
+ * indisclustered set; if it was not given, we scan pg_class and return all
+ * plain tables and materialized views.
+ *
+ * The result is a list of RelToCluster allocated in the given memory
+ * context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand command, bool usingindex,
+					 MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
+	HeapTuple	tuple;
 	MemoryContext old_context;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/*
+			 * XXX I think the only reason there's no test failure here is
+			 * that we seldom have clustered indexes that would be affected by
+			 * concurrency.  Maybe we should also do the
+			 * ConditionalLockRelationOid+SearchSysCacheExists dance that we
+			 * do below.
+			 */
+			if (!cluster_is_permitted_for_relation(command, index->indrelid,
+												   GetUserId()))
+				continue;
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
 
-		MemoryContextSwitchTo(old_context);
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
 	}
-	table_endscan(scan);
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
 
-	relation_close(indRelation, AccessShareLock);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a weak lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.  XXX we could release the lock at the
+			 * bottom of the loop, but for now just hold it until this
+			 * transaction finishes.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
+
+			/* Verify that the table still exists. */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			if (!cluster_is_permitted_for_relation(command, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
+
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
+	}
+
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
- * all the children leaves tables/indexes.
+ * Given a partitioned table or its index, return a list of RelToCluster for
+ * all of its leaf partitions (and their indexes, if an index was given).
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is the OID of an index (true) or of
+ * the table itself (false).
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
+								 Oid relid, bool rel_is_index)
 {
 	List	   *inhoids;
 	ListCell   *lc;
@@ -1702,17 +1797,33 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 	MemoryContext old_context;
 
 	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
 
 	foreach(lc, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			inhoid = lfirst_oid(lc);
+		Oid			inhrelid,
+					inhindid;
 		RelToCluster *rtc;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(inhoid) != RELKIND_INDEX)
+				continue;
+
+			inhrelid = IndexGetRelation(inhoid, false);
+			inhindid = inhoid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(inhoid) != RELKIND_RELATION)
+				continue;
+
+			inhrelid = inhoid;
+			inhindid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
@@ -1720,15 +1831,15 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 		 * table.  We skip any partitions which the user is not permitted to
 		 * CLUSTER.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, inhrelid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
 		old_context = MemoryContextSwitchTo(cluster_context);
 
 		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		rtc->tableOid = inhrelid;
+		rtc->indexOid = inhindid;
 		rtcs = lappend(rtcs, rtc);
 
 		MemoryContextSwitchTo(old_context);
@@ -1742,13 +1853,148 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt that names a relation, resolve the relation name,
+ * obtain a lock on it, and decide what to do based on the relation type:
+ * if it's not a partitioned table, repack it as indicated (using an
+ * existing clustered index, or the explicitly named index) and return
+ * NULL.
+ *
+ * If the table is partitioned, do nothing further and instead return the
+ * opened relcache entry, so that the caller can process the partitions
+ * using the multiple-table handling code.  The index name is not resolved
+ * in this case.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+	{
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+	}
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		if (OidIsValid(indexOid))
+			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+
+		/* Do an analyze, if requested */
+		if (params->options & CLUOPT_ANALYZE)
+		{
+			VacuumParams vac_params = {0};
+
+			vac_params.options |= VACOPT_ANALYZE;
+			if (params->options & CLUOPT_VERBOSE)
+				vac_params.options |= VACOPT_VERBOSE;
+			analyze_rel(RelationGetRelid(rel), NULL, vac_params, NIL, true,
+						NULL);
+		}
+
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the index to use
+ * for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes doesn't
+ * change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid = InvalidOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		ListCell   *lc;
+
+		/* Find an index with indisclustered set, or report error */
+		foreach(lc, RelationGetIndexList(rel))
+		{
+			indexOid = lfirst_oid(lc);
+
+			if (get_index_isclustered(indexOid))
+				break;
+			indexOid = InvalidOid;
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/*
+		 * An index was specified; figure out its OID.  It must be in the same
+		 * namespace as the relation.
+		 */
+		indexOid = get_relname_relid(indexname,
+									 rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
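
To make the index-resolution logic in determine_clustered_index easier to
review, here is the behavior I would expect from it, with made-up names:

    CREATE TABLE tab (a int PRIMARY KEY);

    REPACK tab USING INDEX tab_pkey;  -- rewrite in index order; also marks
                                      -- tab_pkey with indisclustered
    REPACK tab USING INDEX;           -- reuse the indisclustered index
    REPACK tab;                       -- plain rewrite, no ordering

    -- with no previously clustered index, bare USING INDEX fails:
    -- ERROR:  there is no previously clustered index for table "tab"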
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..8863ad0e8bd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2287,7 +2287,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 				cluster_params.options |= CLUOPT_VERBOSE;
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 9fd48acb1f8..ab52c171b84 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -280,7 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -297,7 +297,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -316,7 +316,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		utility_option_list
@@ -763,7 +763,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1025,7 +1025,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1099,6 +1098,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1135,6 +1135,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11913,38 +11918,91 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ USING INDEX [ <index_name> ] ] ]
+ *				REPACK USING INDEX
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list qualified_name USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list qualified_name opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK '(' utility_option_list ')'
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = false;
+					n->params = $3;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $2;
+					n->params = NIL;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -11952,20 +12010,24 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -17961,6 +18023,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18593,6 +18656,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
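
For reference, these are the statement shapes the productions above accept
(table and index names are placeholders):

    REPACK;                                -- all tables and matviews
    REPACK (VERBOSE);                      -- same, with options
    REPACK USING INDEX;                    -- all tables with a clustered index
    REPACK tab;
    REPACK (VERBOSE) tab USING INDEX idx;
    CLUSTER tab USING idx;                 -- legacy spelling, now a RepackStmt

Note that the parenthesized option list can only be combined with USING INDEX
when a table name is also given; that's a limitation of the current rules.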
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 918db53dd5e..1295dc25d02 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2850,10 +2850,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2861,6 +2857,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3498,7 +3498,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..a1e10e8c2f6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -268,6 +268,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 6b20a4404b2..df6c82e6d7b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1247,7 +1247,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -4999,6 +4999,37 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	/* If we have REPACK <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAny))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK <sth> USING INDEX, then add the index name */
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+		else if (TailMatches("VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index 019ca06455d..f0c1bd4175c 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -16,7 +16,7 @@ subdir = src/bin/scripts
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready
+PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready pg_repackdb
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
@@ -31,6 +31,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
 vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
 install: all installdirs
 	$(INSTALL_PROGRAM) createdb$(X)   '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -41,6 +42,7 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+	$(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index a4fed59d1c9..be573cae682 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -42,6 +42,7 @@ vacuuming_common = static_library('libvacuuming_common',
 
 binaries = [
   'vacuumdb',
+  'pg_repackdb',
 ]
 foreach binary : binaries
   binary_sources = files('@0@.c'.format(binary))
@@ -80,6 +81,7 @@ tests += {
       't/100_vacuumdb.pl',
       't/101_vacuumdb_all.pl',
       't/102_vacuumdb_stages.pl',
+      't/103_repackdb.pl',
       't/200_connstr.pl',
     ],
   },
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..8b4cfca7fe1
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,238 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ *		A utility to run REPACK
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * FIXME: this is missing a way to specify the index to use to repack one
+ * table, or whether to pass a USING INDEX clause when multiple tables are
+ * used.  Something like --index[=indexname].  Adding that bleeds into
+ * vacuuming.c as well.
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "vacuuming.h"
+
+static void help(const char *progname);
+void		check_objfilter(VacObjFilter objfilter);
+
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+		{"echo", no_argument, NULL, 'e'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"dbname", required_argument, NULL, 'd'},
+		{"all", no_argument, NULL, 'a'},
+		/* XXX this could be 'i', but short options with optional args are messy */
+		{"index", optional_argument, NULL, 1},
+		{"table", required_argument, NULL, 't'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"jobs", required_argument, NULL, 'j'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
+		{"maintenance-db", required_argument, NULL, 2},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	bool		echo = false;
+	bool		quiet = false;
+	vacuumingOptions vacopts;
+	VacObjFilter objfilter = OBJFILTER_NONE;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+	int			ret;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+	vacopts.mode = MODE_REPACK;
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwW",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				quiet = true;
+				break;
+			case 't':
+				objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 1:
+				vacopts.using_index = true;
+				if (optarg)
+					vacopts.indexname = pg_strdup(optarg);
+				else
+					vacopts.indexname = NULL;
+				break;
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter(objfilter);
+
+	ret = vacuuming_main(&cparams, dbname, maintenance_db, &vacopts,
+						 objfilter, &objects,
+						 tbl_count, false, concurrentCons,
+						 progname, echo, quiet);
+	exit(ret);
+}
+
+/*
+ * Verify that the filters used at command line are compatible.
+ */
+void
+check_objfilter(VacObjFilter objfilter)
+{
+	if ((objfilter & OBJFILTER_ALL_DBS) &&
+		(objfilter & OBJFILTER_DATABASE))
+		pg_fatal("cannot repack all databases and a specific one at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA))
+		pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+
+	if ((objfilter & OBJFILTER_SCHEMA) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+}
+
+static void
+help(const char *progname)
+{
+	printf(_("%s repacks a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE'             repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
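
With the options wired up above, a typical invocation would look something
like

    pg_repackdb --jobs=2 --schema=public mydb

which connects the same way vacuumdb does and should issue one REPACK
statement per qualifying table, spread across the requested number of
connections.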
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..51de4d7ab34
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,24 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
+
+done_testing();
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
index b9f2e507557..f926cf2124a 100644
--- a/src/bin/scripts/vacuuming.c
+++ b/src/bin/scripts/vacuuming.c
@@ -1,6 +1,6 @@
 /*-------------------------------------------------------------------------
  * vacuuming.c
- *		Helper routines for vacuumdb
+ *		Helper routines for vacuumdb and pg_repackdb
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -46,8 +46,8 @@ static SimpleStringList *retrieve_objects(PGconn *conn,
 										  bool echo);
 static void prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 								   vacuumingOptions *vacopts, const char *table);
-static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+static void run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+							   const char *sql, bool echo, const char *table);
 
 /*
  * Executes vacuum/analyze as indicated, or dies in case of failure.
@@ -192,6 +192,14 @@ vacuum_one_database(ConnParams *cparams,
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		/* XXX arguably, here we should use VACUUM FULL instead of failing */
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
 	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
 	{
 		PQfinish(conn);
@@ -281,12 +289,19 @@ vacuum_one_database(ConnParams *cparams,
 
 	if (!quiet)
 	{
+		/* XXX get rid of the assumption that ANALYZE_NO_STAGE means vacuum */
 		if (stage != ANALYZE_NO_STAGE)
 			printf(_("%s: processing database \"%s\": %s\n"),
 				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
+		else if (vacopts->mode == MODE_VACUUM)
 			printf(_("%s: vacuuming database \"%s\"\n"),
 				   progname, PQdb(conn));
+		else
+		{
+			Assert(vacopts->mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
 		fflush(stdout);
 	}
 
@@ -376,7 +391,7 @@ vacuum_one_database(ConnParams *cparams,
 		 * through ParallelSlotsGetIdle.
 		 */
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
+		run_vacuum_command(free_slot->connection, vacopts, sql.data,
 						   echo, tabname);
 
 		cell = cell->next;
@@ -389,7 +404,8 @@ vacuum_one_database(ConnParams *cparams,
 	}
 
 	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
-	if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE &&
+	if (vacopts->mode == MODE_VACUUM &&
+		vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE &&
 		!vacopts->analyze_only)
 	{
 		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
@@ -402,7 +418,7 @@ vacuum_one_database(ConnParams *cparams,
 		}
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+		run_vacuum_command(free_slot->connection, vacopts, cmd, echo, NULL);
 
 		if (!ParallelSlotsWaitCompletion(sa))
 			ret = EXIT_FAILURE;
@@ -624,6 +640,23 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
 								 " AND listed_objects.object_oid IS NOT NULL\n");
 	}
 
+	/*
+	 * In REPACK mode, if the 'using_index' option was given but no index
+	 * name, filter only tables that have an index with indisclustered set.
+	 * (If an index name is given, we trust the user to pass a reasonable list
+	 * of tables.)
+	 *
+	 * XXX it may be worth printing an error if an index name is given with
+	 * no list of tables.
+	 */
+	if (vacopts->mode == MODE_REPACK &&
+		vacopts->using_index && !vacopts->indexname)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND EXISTS (SELECT 1 FROM pg_catalog.pg_index\n"
+							 "    WHERE indrelid = c.oid AND indisclustered)\n");
+	}
+
 	/*
 	 * If no tables were listed, filter for the relevant relation types.  If
 	 * tables were given via --table, don't bother filtering by relation type.
@@ -811,7 +844,30 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 
 	resetPQExpBuffer(sql);
 
-	if (vacopts->analyze_only)
+	if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(sql, "REPACK");
+
+		if (vacopts->verbose)
+		{
+			appendPQExpBuffer(sql, "%sVERBOSE", sep);
+			sep = comma;
+		}
+
+		if (sep != paren)
+			appendPQExpBufferChar(sql, ')');
+
+		appendPQExpBuffer(sql, " %s", table);
+
+		if (vacopts->using_index)
+		{
+			appendPQExpBuffer(sql, " USING INDEX");
+			if (vacopts->indexname)
+				appendPQExpBuffer(sql, " %s", fmtIdEnc(vacopts->indexname,
+													   PQclientEncoding(conn)));
+		}
+	}
+	else if (vacopts->analyze_only)
 	{
 		appendPQExpBufferStr(sql, "ANALYZE");
 
@@ -962,7 +1018,10 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 		}
 	}
 
-	appendPQExpBuffer(sql, " %s;", table);
+	if (vacopts->mode != MODE_REPACK)
+		appendPQExpBuffer(sql, " %s", table);
+
+	appendPQExpBufferChar(sql, ';');
 }
 
 /*
@@ -972,8 +1031,8 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
  * Any errors during command execution are reported to stderr.
  */
 static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
+run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+				   const char *sql, bool echo, const char *table)
 {
 	bool		status;
 
@@ -986,13 +1045,21 @@ run_vacuum_command(PGconn *conn, const char *sql, bool echo,
 	{
 		if (table)
 		{
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
 		}
 		else
 		{
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
 		}
 	}
 }
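
In REPACK mode, prepare_vacuum_command should now emit statements of roughly
this shape (identifiers are illustrative; the index name goes through
fmtIdEnc):

    REPACK "public"."t1";
    REPACK (VERBOSE) "public"."t1";
    REPACK "public"."t1" USING INDEX "t1_idx";   -- with --index=t1_idx
    REPACK "public"."t1" USING INDEX;            -- with bare --index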
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
index 021953e153a..dde68fb4cba 100644
--- a/src/bin/scripts/vacuuming.h
+++ b/src/bin/scripts/vacuuming.h
@@ -17,6 +17,12 @@
 #include "fe_utils/connect_utils.h"
 #include "fe_utils/simple_list.h"
 
+typedef enum
+{
+	MODE_VACUUM,
+	MODE_REPACK
+} RunMode;
+
 /* For analyze-in-stages mode */
 #define ANALYZE_NO_STAGE	-1
 #define ANALYZE_NUM_STAGES	3
@@ -24,11 +30,14 @@
 /* vacuum options controlled by user flags */
 typedef struct vacuumingOptions
 {
+	RunMode		mode;
 	bool		analyze_only;
 	bool		verbose;
 	bool		and_analyze;
 	bool		full;
 	bool		freeze;
+	bool		using_index;
+	char	   *indexname;
 	bool		disable_page_skipping;
 	bool		skip_locked;
 	int			min_xid_age;
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..890998d84bb 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, bool usingindex,
+						Relation OldHeap, Oid indexOid, ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
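
The new CLUOPT_ANALYZE bit is consumed in process_single_relation, which runs
analyze_rel once the rewrite is done.  Assuming ExecRepack maps a
parenthesized ANALYZE option onto this flag (that hunk is not shown here),
the user-visible form would be:

    -- sketch only; assumes an ANALYZE option parsed into CLUOPT_ANALYZE
    REPACK (VERBOSE, ANALYZE) tab;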
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..5b6639c114c 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,24 +56,51 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Note: Since REPACK shares some code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
 
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+
+/*
+ * Commands of PROGRESS_REPACK
+ *
+ * Currently we only have one command, so the PROGRESS_REPACK_COMMAND
+ * parameter is not necessary.  However, it makes cluster.c simpler if we
+ * have the same set of parameters for CLUSTER and REPACK; see the note on
+ * REPACK parameters above.
+ */
+#define PROGRESS_REPACK_COMMAND_REPACK			1
+
+/*
+ * Progress parameters for cluster.
+ *
+ * Although we need to report REPACK and CLUSTER in separate views, the
+ * parameters and phases of CLUSTER are a subset of those of REPACK. Therefore
+ * we just use the appropriate values defined for REPACK above instead of
+ * defining a separate set of constants here.
+ */
 
 /* Commands of PROGRESS_CLUSTER */
 #define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
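
Since pg_stat_progress_repack is driven by the parameter numbers above (see
the rules.out change below), progress can be watched with something like:

    SELECT pid, relid::regclass AS relation, phase,
           heap_blks_scanned, heap_blks_total
    FROM pg_stat_progress_repack;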
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 4ed14fc5b78..e4ba59f6b8f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3949,16 +3949,26 @@ typedef struct AlterSystemStmt
 } AlterSystemStmt;
 
 /* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
+ *		Repack Statement
  * ----------------------
  */
-typedef struct ClusterStmt
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
 {
 	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
+	RepackCommand command;		/* type of command being run */
+	RangeVar   *relation;		/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
 	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
+} RepackStmt;
+
 
 /* ----------------------
  *		Vacuum and Analyze Statements
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index a4af3f717a1..22559369e2c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -374,6 +374,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..5256628b51d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -254,6 +254,63 @@ ORDER BY 1;
  clstr_tst_pkey
 (3 rows)
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -381,6 +438,35 @@ SELECT * FROM clstr_1;
  2
 (2 rows)
 
+-- REPACK w/o argument performs no ordering, so we can only check which
+-- tables got a new relfilenode.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison.  Unlike CLUSTER, clstr_3 should have been
+-- processed too, since REPACK does not require a clustered index.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
+SET SESSION AUTHORIZATION regress_clstr_user;
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 CREATE TABLE clustertest (key int PRIMARY KEY);
@@ -495,6 +581,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +636,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..3a1d1d28282 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,6 +2071,29 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..cfcc3dc9761 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,6 +76,19 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
 
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
@@ -159,6 +172,34 @@ INSERT INTO clstr_1 VALUES (1);
 CLUSTER clstr_1;
 SELECT * FROM clstr_1;
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have had their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because there is no clustering index involved here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 
@@ -229,6 +270,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3c80d49b67e..c17b5c0cadc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2537,6 +2537,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
@@ -2604,6 +2606,7 @@ RtlNtStatusToDosError_t
 RuleInfo
 RuleLock
 RuleStmt
+RunMode
 RunningTransactions
 RunningTransactionsData
 SASLStatus
-- 
2.47.3

#34Marcos Pegoraro
marcos@f10.com.br
In reply to: Álvaro Herrera (#33)
Re: Adding REPACK [concurrently]

On Thu, Sep 25, 2025 at 3:12 PM, Álvaro Herrera <alvherre@kurilemu.de> wrote:

Some typos I've found in the usage text of pg_repackdb.

+ printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+ printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
Both options can point to a single schema, so the "(s)" should be removed:
"in the specified schema(s)" should be "in the specified schema".

The same occurs in this one, which should be "table", not "table(s)":
+ printf(_(" -t, --table='TABLE' repack specific table(s) only\n"));

regards
Marcos

#35Robert Treat
rob@xzilla.net
In reply to: Marcos Pegoraro (#34)
Re: Adding REPACK [concurrently]

On Thu, Sep 25, 2025 at 4:21 PM Marcos Pegoraro <marcos@f10.com.br> wrote:

On Thu, Sep 25, 2025 at 3:12 PM, Álvaro Herrera <alvherre@kurilemu.de> wrote:

Some typos I've found in the usage text of pg_repackdb.

+ printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+ printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
Both options can point to a single schema, so the "(s)" should be removed:
"in the specified schema(s)" should be "in the specified schema".

The same occurs in this one, which should be "table", not "table(s)":
+ printf(_(" -t, --table='TABLE' repack specific table(s) only\n"));

This pattern is used because you can pass more than one argument, for
example, something like

pg_repackdb -d pagila -v -n public -n legacy

While I agree that the wording is a little awkward (I'd prefer "repack
tables only in the specified schema(s)"), this follows the same pattern
as pg_dump and friends.

Robert Treat
https://xzilla.net

#36Marcos Pegoraro
marcos@f10.com.br
In reply to: Robert Treat (#35)
Re: Adding REPACK [concurrently]

On Thu, Sep 25, 2025 at 6:31 PM, Robert Treat <rob@xzilla.net> wrote:

This pattern is used because you can pass more than one argument, for
example, something like

I know that.

While I agree that the wording is a little awkward, this follows the same
pattern as pg_dump and friends.

Well, I think pg_dump looks wrong too: if the docs explain that each
switch takes a single table or single schema, why write the plural in
the usage text?
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
instead of
+ printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
maybe this?
+ printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema, can be used several times\n"));

regards
Marcos

#37Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Álvaro Herrera (#33)
Re: Adding REPACK [concurrently]

Hello!

Álvaro Herrera <alvherre@kurilemu.de>:

So here's v22 with those and rebased to current sources. Only the first
two patches this time, which are the ones I would be glad to receive
input on.

get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
                                 Oid relid, bool rel_is_index)

Should we rename cluster_context to repack_context, to be aligned with the calling side?

---------
'cmd' in

static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
                                  MemoryContext permcxt);

but 'command' in

get_tables_to_repack(RepackCommand command, bool usingindex,
                     MemoryContext permcxt)

---------

cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",

Maybe this could be changed to use RepackCommandAsString.

-----------

if (cmd == REPACK_COMMAND_REPACK)
    pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
                                 PROGRESS_REPACK_COMMAND_REPACK);
else if (cmd == REPACK_COMMAND_CLUSTER)
{
    pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
                                 PROGRESS_CLUSTER_COMMAND_CLUSTER);
} else ....

The '{' and '}' look a little bit weird.

--------
The pg_repackdb documentation contains a lot of "analyze" and even an
"--analyze" parameter, but I can't see anything related in the code.

Best regards,
Mikhail.

#38Robert Treat
rob@xzilla.net
In reply to: Álvaro Herrera (#33)
Re: Adding REPACK [concurrently]

On Thu, Sep 25, 2025 at 2:12 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:

So here's v22 with those and rebased to current sources. Only the first
two patches this time, which are the ones I would be glad to receive
input on.

Here are a number of small issues I noticed. I don't know that they all
need addressing right now, but it seems worth asking the questions...

#1
"pg_repackdb --help" does not mention the --index option, although the
flag is accepted. I'm not sure if this is meant to match clusterdb,
but since we need the index option to invoke the clustering behavior,
I think it needs to be there.

#2
[xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t customer --index=idx_last_name
pg_repackdb: repacking database "pagila"
INFO: clustering "public.customer" using sequential scan and sort

[xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t customer
pg_repackdb: repacking database "pagila"
INFO: vacuuming "public.customer"

This was less confusing once I figured out we could pass the --index
option, but even with that it is a little confusing, I think mostly
because it looks like we are "vacuuming" the table, which in a world
of repack and vacuum (i.e., no vacuum full) doesn't make sense. I think
the right thing to do here would be to modify it to be "repacking %s"
in both cases, with the "using sequential scan and sort" as the means
to understand which version of repack is being executed.

#3
pg_repackdb does not offer an --analyze option, which istm it should,
to match the REPACK command

#4
SQL level REPACK help shows:

where option can be one of:
VERBOSE [ boolean ]
ANALYSE | ANALYZE

but SQL level VACUUM does
VERBOSE [ boolean ]
ANALYZE [ boolean ]

These operate the same way, so I would expect it to match the language
in vacuum.
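
For what it's worth, if REPACK followed VACUUM's option syntax, both of
these would then be valid (a sketch; "mytable" is a placeholder):

REPACK (VERBOSE on, ANALYZE off) mytable;
REPACK (ANALYZE) mytable;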

#5
[xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t film --index
pg_repackdb: repacking database "pagila"

In the above scenario, I am repacking without having previously
specified an index. At the SQL level this would throw an error; at the
command line it gives me a heart attack. :-)
It's actually not that bad, because we don't actually do anything, but
maybe we should throw an error?

#6
On the individual command pages (like sql-repack.html), I think there
should be more cross-linking, i.e., repack should probably say "see also
cluster" and vice versa. Likely similarly with vacuum and repack.

#7
Is there some reason you chose to intermingle the repack regression
tests with the existing tests? I feel like it'd be easier to
differentiate potential regressions and new functionality if these
were separated.

Robert Treat
https://xzilla.net

#39Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#1)
1 attachment(s)
Re: Adding REPACK [concurrently]

Hello,

Here's patch v24. I was hoping to push this today, but I think there
were too many changes from v23 for that. Here's what I did:

- pg_stat_progress_cluster is no longer a view on top of the low-level
pg_stat_get_progress_info() function. Instead, it's a view on top of
pg_stat_progress_repack. The only change it applies on top of that
one is change the command from REPACK to one of VACUUM FULL or
CLUSTER, depending on whether an index is being used or not. This
should keep the behavior identical to previous versions.
Alternatively we could just hide rows where the command is REPACK, but
I don't think that would be any better. This way, we maintain
compatibility with tools reading pg_stat_progress_cluster. Maybe this
is useless and we should just drop the view; I'm not sure, and we can
discuss that separately.
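
To illustrate, a compatibility view along these lines would do it (just
a sketch, assuming the column names from the monitoring docs; the
actual definition in the patch may differ):

CREATE VIEW pg_stat_progress_cluster AS
  SELECT pid, datid, datname, relid,
         CASE WHEN repack_index_relid <> 0 THEN 'CLUSTER'
              ELSE 'VACUUM FULL' END AS command,
         phase,
         repack_index_relid AS cluster_index_relid,
         heap_tuples_scanned, heap_tuples_written,
         heap_blks_total, heap_blks_scanned, index_rebuild_count
  FROM pg_stat_progress_repack;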

- pg_stat_progress_repack itself now shows the command. Also I got rid
of the separate enum values for the command, and instead used the
values from the parse node (RepackCommand); this removes about a dozen
lines of C code. To forestall potentially bogus usage of value 0, I
made the enum start from 1.
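
With that, monitoring a running operation can be as simple as this
(assuming the new command column described above):

SELECT pid, command, phase, heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_repack;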

- I noticed that you can do "CLUSTER pg_class ON some_index" and it will
happily modify pg_index.indisclustered, which is a bit weird
considering that allow_system_table_mods is off -- if you later try
ALTER TABLE .. SET WITHOUT CLUSTER, it won't let you. I think this is
bogus and we should change it so that CLUSTER refuses to change the
clustered index on a system catalog, unless allow_system_table_mods is
on. However, that would be a change from longstanding behavior which
is specifically tested for in regression tests, so I didn't do it.
We can discuss such a change separately. But I did make REPACK refuse
to do that, because we don't need to propagate bogus historical
behavior. So REPACK will fail if you try to change the indisclustered
index, but it will work fine if you repack based on the same index as
before, or repack with no index.
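
In other words, something like this (hypothetical session; the error
wording is made up):

CLUSTER pg_class USING pg_class_oid_index;       -- sets indisclustered
REPACK pg_class USING INDEX pg_class_relname_nsp_index;
ERROR:  cannot change clustered index of a system catalog
REPACK pg_class USING INDEX pg_class_oid_index;  -- same index as before, OK
REPACK pg_class;                                 -- no index, OK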

- pg_repackdb: if you try with a non-superuser without specifying a
table name, it will fail as soon as it hits the first catalog table or
whatever with "ERROR: cannot lock this table". This is sorta fine for
vacuumdb, but only because VACUUM itself will instead say "WARNING:
cannot lock table XYZ, skipping", so it's not an error and vacuumdb
keeps running. IMO this is bogus: vacuumdb should not try to process
tables that it doesn't have privileges for. However, not wanting to
change longstanding behavior, I left that alone. For pg_repackdb, I
added a condition in the WHERE clause there to only fetch tables that
the current user has MAINTAIN privilege over. Then you can do a
"pg_repackdb -U foobar" and it will nicely process the tables that
that user is allowed to process. We can discuss changing the vacuumdb
behavior separately.
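
The new condition amounts to something of this shape (a sketch only;
the query in the patch is more involved):

SELECT c.oid::regclass
FROM pg_class c
WHERE c.relkind IN ('r', 'm')
  AND has_table_privilege(c.oid, 'MAINTAIN');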

- Added some additional tests for pg_repackdb and REPACK.

- Updated the docs.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

Attachments:

v24-0001-Add-REPACK-command.patch (text/x-diff; charset=utf-8)
From 444251dda641d14508e134d5528670bcf7f9d733 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 26 Jul 2025 19:57:26 +0200
Subject: [PATCH v24] Add REPACK command
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.

This also adds pg_repackdb, a new utility that can invoke the new
commands.  This is heavily based on vacuumdb.

Author: Antonin Houska <ah@cybertec.at>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 +++++-
 doc/src/sgml/ref/allfiles.sgml           |   2 +
 doc/src/sgml/ref/cluster.sgml            |  97 +--
 doc/src/sgml/ref/clusterdb.sgml          |   5 +
 doc/src/sgml/ref/pg_repackdb.sgml        | 488 +++++++++++++
 doc/src/sgml/ref/repack.sgml             | 319 +++++++++
 doc/src/sgml/ref/vacuum.sgml             |  33 +-
 doc/src/sgml/reference.sgml              |   2 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  29 +-
 src/backend/commands/cluster.c           | 854 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   6 +-
 src/backend/parser/gram.y                |  97 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  33 +-
 src/bin/scripts/Makefile                 |   4 +-
 src/bin/scripts/meson.build              |   2 +
 src/bin/scripts/pg_repackdb.c            | 242 +++++++
 src/bin/scripts/t/103_repackdb.pl        |  47 ++
 src/bin/scripts/vacuuming.c              | 114 ++-
 src/bin/scripts/vacuuming.h              |   3 +
 src/include/commands/cluster.h           |   8 +-
 src/include/commands/progress.h          |  48 +-
 src/include/nodes/parsenodes.h           |  35 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 134 +++-
 src/test/regress/expected/rules.out      |  72 +-
 src/test/regress/sql/cluster.sql         |  70 +-
 src/tools/pgindent/typedefs.list         |   2 +
 33 files changed, 2483 insertions(+), 545 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 doc/src/sgml/ref/repack.sgml
 create mode 100644 src/bin/scripts/pg_repackdb.c
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6e3aac3d815..7727b0e17e5 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5542,7 +5550,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -6001,6 +6010,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..eabf92e3536 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
@@ -212,6 +213,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..cfcfb65e349 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
   <title>Description</title>
 
   <para>
-   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
-   to cluster the table specified
-   by <replaceable class="parameter">table_name</replaceable>
-   based on the index specified by
-   <replaceable class="parameter">index_name</replaceable>. The index must
-   already have been defined on
-   <replaceable class="parameter">table_name</replaceable>.
+   The <command>CLUSTER</command> command is equivalent to
+   <xref linkend="sql-repack"/> with a <literal>USING INDEX</literal>
+   clause.  See there for more details.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
+<!-- Do we need to describe exactly which options map to what?  They seem obvious to me. -->
 
-  <para>
-   When a table is clustered, <productname>PostgreSQL</productname>
-   remembers which index it was clustered by.  The form
-   <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
-   reclusters the table using the same index as before.  You can also
-   use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
-   forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
-   future cluster operations, or to clear any previous setting.
-  </para>
-
-  <para>
-   <command>CLUSTER</command> without a
-   <replaceable class="parameter">table_name</replaceable> reclusters all the
-   previously-clustered tables in the current database that the calling user
-   has privileges for.  This form of <command>CLUSTER</command> cannot be
-   executed inside a transaction block.
-  </para>
-
-  <para>
-   When a table is being clustered, an <literal>ACCESS
-   EXCLUSIVE</literal> lock is acquired on it. This prevents any other
-   database operations (both reads and writes) from operating on the
-   table until the <command>CLUSTER</command> is finished.
-  </para>
  </refsect1>
 
  <refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 0d2051bf6f1..546c1289c31 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -64,6 +64,11 @@ PostgreSQL documentation
    this utility and via other methods for accessing the server.
   </para>
 
+  <para>
+   <application>clusterdb</application> has been superseded by
+   <application>pg_repackdb</application>.
+  </para>
+
  </refsect1>
 
 
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..b313b54ab63
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,488 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.
+   <application>pg_repackdb</application> will also generate internal
+   statistics used by the <productname>PostgreSQL</productname> query
+   optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link>.  There
+   is no effective difference between repacking and analyzing databases via
+   this utility and via other methods for accessing the server.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--index<optional>=<replaceable class="parameter">index_name</replaceable></optional></option></term>
+      <listitem>
+       <para>
+        Pass the <literal>USING INDEX</literal> clause to <literal>REPACK</literal>,
+        optionally specifying the index name to use.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.  If a column name
+        list is given, only compute statistics for those columns.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When <option>-a</option>/<option>--all</option> is used, connect
+        to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
+
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..0e1116eae85
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,319 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+    ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+
+<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
+
+    <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If a <literal>USING INDEX</literal> clause is specified, the rows are
+   physically reordered based on information from an index.  Please see the
+   notes on clustering below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    If the <literal>USING INDEX</literal> clause is specified, the rows in
+    the table are physically reordered following an index: if an index name
+    is specified in the command, then that index is used; if no index name
+    is specified, then the index that has been configured as the index to
+    cluster on.  If no index has been configured in this way, an error is
+    thrown.  The index given in the <literal>USING INDEX</literal> clause
+    is configured as the index to cluster on, as well as an index given
+    to the <command>CLUSTER</command> command.  An index can be set
+    manually using <command>ALTER TABLE ... CLUSTER ON</command>, and reset
+    with <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+   </para>
+
+   <para>
+    If no table name is specified in <command>REPACK USING INDEX</command>,
+    all tables which have a clustering index defined and which the calling
+    user has privileges for are processed.
+   </para>
+
+   <para>
+    Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using clustering.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index (if the index is a b-tree), or a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
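+   <para>
+    For example, assuming a hypothetical table <literal>orders</literal>:
+<programlisting>
+REPACK orders USING INDEX orders_pkey;
+ANALYZE orders;
+</programlisting>
+    The <literal>ANALYZE</literal> option described below performs both
+    steps in a single command.
+   </para>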
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan or a sequential scan without sort is used, a temporary
+    copy of the table is created that contains the table data in the new
+    order.  Temporary copies of each index on the table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
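+   <para>
+    A rough estimate of the space needed can be obtained with the standard
+    size functions (table name hypothetical):
+<programlisting>
+SELECT pg_size_pretty(pg_table_size('orders') + pg_indexes_size('orders'));
+</programlisting>
+   </para>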
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
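+   <para>
+    For example, to force the index-scan method for a single repack of a
+    hypothetical table:
+<programlisting>
+SET enable_sort = off;
+REPACK orders USING INDEX orders_pkey;
+RESET enable_sort;
+</programlisting>
+   </para>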
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
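+   <para>
+    For example (the value shown is merely illustrative):
+<programlisting>
+SET maintenance_work_mem = '1GB';
+REPACK orders USING INDEX orders_pkey;
+RESET maintenance_work_mem;
+</programlisting>
+   </para>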
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">column_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of a specific column to analyze. Defaults to all columns.
+      If a column list is specified, <literal>ANALYZE</literal> must also
+      be specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>ANALYZE</literal></term>
+    <term><literal>ANALYSE</literal></term>
+    <listitem>
+     <para>
+      Runs <xref linkend="sql-analyze"/> on the table after repacking.  This is
+      currently only supported when a single (non-partitioned) table is specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
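+     <para>
+      For example, the following commands are equivalent:
+<programlisting>
+REPACK (VERBOSE) employees;
+REPACK (VERBOSE TRUE) employees;
+</programlisting>
+     </para>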
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+   <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+   </para>
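+   <para>
+    For example, an in-progress repack can be watched from another session
+    (column list abbreviated):
+<programlisting>
+SELECT pid, relid::regclass, phase, heap_blks_scanned, heap_blks_total
+FROM pg_stat_progress_repack;
+</programlisting>
+   </para>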
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the corresponding
+    partition of that index. <command>REPACK</command> on a partitioned table
+    cannot be executed inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> based on its
+   index <literal>employees_ind</literal> (since an index is used here, this
+   effectively clusters the table):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>cases</literal> in physical order,
+   running <command>ANALYZE</command> on the given columns once
+   repacking is done and showing informational messages:
+<programlisting>
+REPACK (ANALYZE, VERBOSE) cases (district, case_nr);
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables on which you have the <literal>MAINTAIN</literal>
+   privilege and for which a clustering index has previously been
+   configured, showing informational messages:
+<programlisting>
+REPACK (VERBOSE) USING INDEX;
+</programlisting>
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..062b658cfcd 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
-    FULL [ <replaceable class="parameter">boolean</replaceable> ]
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+    FULL [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
   <title>Parameters</title>
 
   <variablelist>
-   <varlistentry>
-    <term><literal>FULL</literal></term>
-    <listitem>
-     <para>
-      Selects <quote>full</quote> vacuum, which can reclaim more
-      space, but takes much longer and exclusively locks the table.
-      This method also requires extra disk space, since it writes a
-      new copy of the table and doesn't release the old copy until
-      the operation is complete.  Usually this should only be used when a
-      significant amount of space needs to be reclaimed from within the table.
-     </para>
-    </listitem>
-   </varlistentry>
-
    <varlistentry>
     <term><literal>FREEZE</literal></term>
     <listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FULL</literal></term>
+    <listitem>
+     <para>
+      This option, which is deprecated, makes <command>VACUUM</command>
+      behave like <command>REPACK</command> without a
+      <literal>USING INDEX</literal> clause.
+      This method of compacting the table takes much longer than
+      <command>VACUUM</command> and exclusively locks the table.
+      This method also requires extra disk space, since it writes a
+      new copy of the table and doesn't release the old copy until
+      the operation is complete.  Usually this should only be used when a
+      significant amount of space needs to be reclaimed from within the table.
+     </para>
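+     <para>
+      For example, the deprecated form and its suggested replacement
+      (table name hypothetical):
+<programlisting>
+VACUUM (FULL) orders;
+REPACK orders;
+</programlisting>
+     </para>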
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..2ee08e21f41 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
@@ -257,6 +258,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..79f9de5d760 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5d9db167e59..08d4b8e44d7 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 884b6a23817..77ebf231eed 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1256,14 +1256,15 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
-CREATE VIEW pg_stat_progress_cluster AS
+CREATE VIEW pg_stat_progress_repack AS
     SELECT
         S.pid AS pid,
         S.datid AS datid,
         D.datname AS datname,
         S.relid AS relid,
         CASE S.param1 WHEN 1 THEN 'CLUSTER'
-                      WHEN 2 THEN 'VACUUM FULL'
+                      WHEN 2 THEN 'REPACK'
+                      WHEN 3 THEN 'VACUUM FULL'
                       END AS command,
         CASE S.param2 WHEN 0 THEN 'initializing'
                       WHEN 1 THEN 'seq scanning heap'
@@ -1274,15 +1275,35 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 6 THEN 'rebuilding index'
                       WHEN 7 THEN 'performing final cleanup'
                       END AS phase,
-        CAST(S.param3 AS oid) AS cluster_index_relid,
+        CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
         S.param6 AS heap_blks_total,
         S.param7 AS heap_blks_scanned,
         S.param8 AS index_rebuild_count
-    FROM pg_stat_get_progress_info('CLUSTER') AS S
+    FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+-- This view is like the one above, except that it renames a column and
+-- avoids reporting 'REPACK' as a command name.
+CREATE VIEW pg_stat_progress_cluster AS
+    SELECT
+        pid,
+        datid,
+        datname,
+        relid,
+        CASE WHEN command IN ('CLUSTER', 'VACUUM FULL') THEN command
+             WHEN repack_index_relid = 0 THEN 'VACUUM FULL'
+             ELSE 'CLUSTER' END AS command,
+        phase,
+        repack_index_relid AS cluster_index_relid,
+        heap_tuples_scanned,
+        heap_tuples_written,
+        heap_blks_total,
+        heap_blks_scanned,
+        index_rebuild_count
+    FROM pg_stat_progress_repack;
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..18bee52a4ee 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,27 +67,36 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  Oid relid, bool rel_is_index,
+											  MemoryContext permcxt);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
+static const char *RepackCommandAsString(RepackCommand cmd);
 
 
-/*---------------------------------------------------------------------------
- * This cluster code allows for clustering multiple tables at once. Because
+/*
+ * The repack code allows for processing multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
  * would be forced to acquire exclusive locks on all the tables being
  * clustered, simultaneously --- very likely leading to deadlock.
  *
- * To solve this we follow a similar strategy to VACUUM code,
- * clustering each relation in a separate transaction. For this to work,
- * we need to:
+ * To solve this we follow a similar strategy to VACUUM code, processing each
+ * relation in a separate transaction. For this to work, we need to:
+ *
  *	- provide a separate memory context so that we can pass information in
  *	  a way that survives across transactions
  *	- start a new transaction every time a new relation is clustered
@@ -98,197 +107,165 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *
  * The single-relation case does not have any such overhead.
  *
- * We also allow a relation to be specified without index.  In that case,
- * the indisclustered bit will be looked up, and an ERROR will be thrown
- * if there is no index with the bit set.
- *---------------------------------------------------------------------------
+ * We also allow a relation to be repacked following an index, but without
+ * naming a specific one.  In that case, the indisclustered bit will be
+ * looked up, and an ERROR will be thrown if no index is so marked.
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+		else if (strcmp(opt->defname, "analyze") == 0 ||
+				 strcmp(opt->defname, "analyse") == 0)
+			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
 		else
 			ereport(ERROR,
-					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
-							opt->defname),
-					 parser_errposition(pstate, opt->location)));
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("unrecognized %s option \"%s\"",
+						   RepackCommandAsString(stmt->command),
+						   opt->defname),
+					parser_errposition(pstate, opt->location));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
-			return;
-		}
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
+			return;				/* all done */
 	}
 
+	/*
+	 * Don't allow ANALYZE in the multiple-relation case for now.  Maybe we
+	 * can add support for this later.
+	 */
+	if (params.options & CLUOPT_ANALYZE)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
+
 	/*
 	 * By here, we know we are in a multi-table situation.  In order to avoid
 	 * holding locks for too long, we want to process each table in its own
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
+		Oid			relid;
+		bool		rel_is_index;
+
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+
+		/*
+		 * If USING INDEX was specified, resolve the index name now and pass
+		 * it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * If no index name was specified when repacking a partitioned
+			 * table, punt for now.  Maybe we can improve this later.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, stmt->usingindex,
+											  stmt->indexname);
+			if (!OidIsValid(relid))
+				elog(ERROR, "unable to determine index to cluster on");
+			/* XXX is this the right place for this check? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
+
+		rtcs = get_tables_to_repack_partitioned(stmt->command,
+												relid, rel_is_index,
+												repack_context);
+
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
 	}
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
-
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
-
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
-
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
-
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+		{
+			CommitTransactionCommand();
+			continue;
+		}
+
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +281,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+			ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +303,8 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
-	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
+	pgstat_progress_update_param(PROGRESS_REPACK_COMMAND, cmd);
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -350,86 +325,38 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
 	 */
-	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
+	if (recheck &&
+		!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+							 params->options))
+		goto out;
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index. It
+	 * would work to use an index in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
 	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on a shared catalog",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
-	{
-		if (OidIsValid(indexOid))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot vacuum temporary tables of other sessions")));
-	}
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on temporary tables of other sessions",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -442,6 +369,24 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	else
 		index = NULL;
 
+	/*
+	 * When allow_system_table_mods is turned off, we disallow repacking a
+	 * catalog on a particular index unless that's already the clustered index
+	 * for that catalog.
+	 *
+	 * XXX We don't check for this in CLUSTER, because it's historically been
+	 * allowed.
+	 */
+	if (cmd != REPACK_COMMAND_CLUSTER &&
+		!allowSystemTableMods && OidIsValid(indexOid) &&
+		IsCatalogRelation(OldHeap) && !index->rd_index->indisclustered)
+		ereport(ERROR,
+				errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				errmsg("permission denied: \"%s\" is a system catalog",
+					   RelationGetRelationName(OldHeap)),
+				errdetail("System catalogs can only be clustered by the index they're already clustered on, if any, unless \"%s\" is enabled.",
+						  "allow_system_table_mods"));
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -469,7 +414,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +427,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide REPACK/CLUSTER or one on a partitioned
+	 * table), because there is another check in ExecRepack() which will stop
+	 * any attempt to process remote temp tables by name.  The check in
+	 * cluster_rel() itself is then redundant, but we keep it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +628,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +645,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (index != NULL)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -958,20 +961,20 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	/* Log what we're doing */
 	if (OldIndex != NULL && !use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using index scan on \"%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap),
-						RelationGetRelationName(OldIndex))));
+				errmsg("repacking \"%s.%s\" using index scan on \"%s\"",
+					   nspname,
+					   RelationGetRelationName(OldHeap),
+					   RelationGetRelationName(OldIndex)));
 	else if (use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using sequential scan and sort",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" using sequential scan and sort",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 	else
 		ereport(elevel,
-				(errmsg("vacuuming \"%s.%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" in physical order",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 
 	/*
 	 * Hand off the actual copying to AM specific function, the generic code
@@ -1458,8 +1461,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1512,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,106 +1635,191 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process, when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given or not, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index to find those that have
+ * indisclustered set; if it was not given, scan pg_class and return all
+ * tables.
+ *
+ * The result is returned as a list of RelToCluster in the given memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand cmd, bool usingindex, MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
-	MemoryContext old_context;
+	HeapTuple	tuple;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
+			MemoryContext oldcxt;
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/*
+			 * Try to obtain a light lock on the index's table, to ensure it
+			 * doesn't go away while we collect the list.  If we cannot, just
+			 * disregard it.
+			 */
+			if (!ConditionalLockRelationOid(index->indrelid, AccessShareLock))
+				continue;
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(index->indrelid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(index->indrelid, AccessShareLock);
+				continue;
+			}
 
-		MemoryContextSwitchTo(old_context);
+			if (!cluster_is_permitted_for_relation(cmd, index->indrelid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
 	}
-	table_endscan(scan);
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
 
-	relation_close(indRelation, AccessShareLock);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+			MemoryContext oldcxt;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
+
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			/* noisily skip rels which the user can't process */
+			if (!cluster_is_permitted_for_relation(cmd, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
+	}
+
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
  * all the children leaves tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, Oid relid,
+								 bool rel_is_index, MemoryContext permcxt)
 {
 	List	   *inhoids;
-	ListCell   *lc;
 	List	   *rtcs = NIL;
-	MemoryContext old_context;
 
-	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
-
-	foreach(lc, inhoids)
+	/*
+	 * Do not lock the children until they're processed.  Note that we do hold
+	 * a lock on the parent partitioned table.
+	 */
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
+	foreach_oid(child_oid, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			table_oid,
+					index_oid;
 		RelToCluster *rtc;
+		MemoryContext oldcxt;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(child_oid) != RELKIND_INDEX)
+				continue;
+
+			table_oid = IndexGetRelation(child_oid, false);
+			index_oid = child_oid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(child_oid) != RELKIND_RELATION)
+				continue;
+
+			table_oid = child_oid;
+			index_oid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
-		 * leaf partition despite having such privileges on the partitioned
-		 * table.  We skip any partitions which the user is not permitted to
-		 * CLUSTER.
+		 * leaf partition despite having them on the partitioned table.  Skip
+		 * if so.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, table_oid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
-
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		oldcxt = MemoryContextSwitchTo(permcxt);
+		rtc = palloc(sizeof(RelToCluster));
+		rtc->tableOid = table_oid;
+		rtc->indexOid = index_oid;
 		rtcs = lappend(rtcs, rtc);
-
-		MemoryContextSwitchTo(old_context);
+		MemoryContextSwitchTo(oldcxt);
 	}
 
 	return rtcs;
@@ -1742,13 +1830,167 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
+
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   RepackCommandAsString(cmd),
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain lock on it, then determine what to do based on the relation
+ * type: if it's a table and not partitioned, repack it as indicated (using an
+ * existing clustered index, or following the given one), and return NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened and locked relcache entry, so that caller can
+ * process the partitions using the multiple-table handling code.  In this
+ * case, if an index name is given, it's up to the caller to resolve it.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+
+	/*
+	 * Make sure ANALYZE is specified if a column list is present.
+	 */
+	if ((params->options & CLUOPT_ANALYZE) == 0 && stmt->relation->va_cols != NIL)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("ANALYZE option must be specified when a column list is provided"));
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		if (OidIsValid(indexOid))
+			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, rel, indexOid, params);
+
+		/* Do an analyze, if requested */
+		if (params->options & CLUOPT_ANALYZE)
+		{
+			VacuumParams vac_params = {0};
+
+			vac_params.options |= VACOPT_ANALYZE;
+			if (params->options & CLUOPT_VERBOSE)
+				vac_params.options |= VACOPT_VERBOSE;
+			analyze_rel(RelationGetRelid(rel), NULL, vac_params,
+						stmt->relation->va_cols, true, NULL);
+		}
+
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the
+ * index to use for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes
+ * doesn't change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		/*
+		 * If USING INDEX with no name is given, find a clustered index, or
+		 * error out if none.
+		 */
+		indexOid = InvalidOid;
+		foreach_oid(idxoid, RelationGetIndexList(rel))
+		{
+			if (get_index_isclustered(idxoid))
+			{
+				indexOid = idxoid;
+				break;
+			}
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/* An index was specified; obtain its OID. */
+		indexOid = get_relname_relid(indexname, rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
+
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..a141f4557dc 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -358,7 +358,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		}
 	}
 
-
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
@@ -2286,8 +2285,9 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 			if ((params.options & VACOPT_VERBOSE) != 0)
 				cluster_params.options |= CLUOPT_VERBOSE;
 
-			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			/* VACUUM FULL is a variant of REPACK; see cluster.c */
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 57bf7a7c7f2..1b905a0d792 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -281,7 +281,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -298,7 +298,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -317,7 +317,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		utility_option_list
@@ -764,7 +764,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESPECT_P RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1026,7 +1026,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1100,6 +1099,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1136,6 +1136,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11914,38 +11919,93 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ <name_list> ] [ USING INDEX <index_name> ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list vacuum_relation USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
 					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list vacuum_relation opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK '(' utility_option_list ')'
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = false;
+					n->params = $3;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $2;
+					n->params = NIL;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
+					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $3;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -11953,20 +12013,25 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -17983,6 +18048,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18616,6 +18682,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 918db53dd5e..1295dc25d02 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2850,10 +2850,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2861,6 +2857,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3498,7 +3498,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 7e89a8048d5..ab55239a6ca 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -271,6 +271,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 6176741d20b..455d145d428 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1257,7 +1257,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -5001,6 +5001,37 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	/* If we have REPACK <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAny))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK <sth> USING INDEX, then add the index name */
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+		else if (TailMatches("VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index 019ca06455d..f0c1bd4175c 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -16,7 +16,7 @@ subdir = src/bin/scripts
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready
+PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready pg_repackdb
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
@@ -31,6 +31,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
 vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
 install: all installdirs
 	$(INSTALL_PROGRAM) createdb$(X)   '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -41,6 +42,7 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+	$(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index a4fed59d1c9..be573cae682 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -42,6 +42,7 @@ vacuuming_common = static_library('libvacuuming_common',
 
 binaries = [
   'vacuumdb',
+  'pg_repackdb',
 ]
 foreach binary : binaries
   binary_sources = files('@0@.c'.format(binary))
@@ -80,6 +81,7 @@ tests += {
       't/100_vacuumdb.pl',
       't/101_vacuumdb_all.pl',
       't/102_vacuumdb_stages.pl',
+      't/103_repackdb.pl',
       't/200_connstr.pl',
     ],
   },
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..328a5baefbc
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,242 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ *		A utility to run REPACK
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * FIXME: this is missing a way to specify the index to use to repack one
+ * table, or whether to pass a USING INDEX clause when multiple tables are
+ * used.  Something like --index[=indexname].  Adding that bleeds into
+ * vacuuming.c as well.
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "vacuuming.h"
+
+static void help(const char *progname);
+static void check_objfilter(bits32 objfilter);
+
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+		{"echo", no_argument, NULL, 'e'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"dbname", required_argument, NULL, 'd'},
+		{"analyze", no_argument, NULL, 'z'},
+		{"all", no_argument, NULL, 'a'},
+		/* XXX this could be 'i', but optional_arg is messy */
+		{"index", optional_argument, NULL, 1},
+		{"table", required_argument, NULL, 't'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"jobs", required_argument, NULL, 'j'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
+		{"maintenance-db", required_argument, NULL, 2},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	bool		echo = false;
+	bool		quiet = false;
+	vacuumingOptions vacopts;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+	int			ret;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+	vacopts.mode = MODE_REPACK;
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwWz",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				vacopts.objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				vacopts.objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				vacopts.objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				vacopts.objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				quiet = true;
+				break;
+			case 't':
+				vacopts.objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 'z':
+				vacopts.and_analyze = true;
+				break;
+			case 1:
+				vacopts.using_index = true;
+				if (optarg)
+					vacopts.indexname = pg_strdup(optarg);
+				else
+					vacopts.indexname = NULL;
+				break;
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		vacopts.objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter(vacopts.objfilter);
+
+	ret = vacuuming_main(&cparams, dbname, maintenance_db, &vacopts,
+						 &objects, tbl_count, concurrentCons,
+						 progname, echo, quiet);
+	exit(ret);
+}
+
+/*
+ * Verify that the filters specified on the command line are compatible.
+ */
+static void
+check_objfilter(bits32 objfilter)
+{
+	if ((objfilter & OBJFILTER_ALL_DBS) &&
+		(objfilter & OBJFILTER_DATABASE))
+		pg_fatal("cannot repack all databases and a specific one at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA))
+		pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+
+	if ((objfilter & OBJFILTER_SCHEMA) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+}
+
+static void
+help(const char *progname)
+{
+	printf(_("%s repacks a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("      --index[=INDEX]             repack following an index\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE'             repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -z, --analyze                   update optimizer statistics\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..cadce9b837c
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,47 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-t', 'pg_class'],
+	qr/statement: REPACK.*pg_class;/,
+	'pg_repackdb processes a single table');
+
+$node->safe_psql('postgres', 'CREATE USER testusr;
+	GRANT CREATE ON SCHEMA public TO testusr');
+$node->safe_psql('postgres',
+	'CREATE TABLE cluster_1 (a int primary key);
+	ALTER TABLE cluster_1 CLUSTER ON cluster_1_pkey;
+	CREATE TABLE cluster_2 (a int unique);
+	ALTER TABLE cluster_2 CLUSTER ON cluster_2_a_key;',
+	extra_params => ['-U' => 'testusr']);
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-U', 'testusr' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--index'],
+	qr/statement: REPACK.*cluster_1 USING INDEX.*statement: REPACK.*cluster_2 USING INDEX/ms,
+	'pg_repackdb --index chooses multiple tables');
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--analyze', '-t', 'cluster_1'],
+	qr/statement: REPACK \(ANALYZE\) public.cluster_1/,
+	'pg_repackdb --analyze works');
+
+done_testing();
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
index e2c6ae1dc7c..47690d32879 100644
--- a/src/bin/scripts/vacuuming.c
+++ b/src/bin/scripts/vacuuming.c
@@ -1,6 +1,6 @@
 /*-------------------------------------------------------------------------
  * vacuuming.c
- *		Helper routines for vacuumdb
+ *		Helper routines for vacuumdb and pg_repackdb
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -42,8 +42,8 @@ static SimpleStringList *retrieve_objects(PGconn *conn,
 										  bool echo);
 static void prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 								   vacuumingOptions *vacopts, const char *table);
-static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+static void run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+							   const char *sql, bool echo, const char *table);
 
 /*
  * Executes vacuum/analyze as indicated.  Returns 0 if the plan is carried
@@ -188,6 +188,14 @@ vacuum_one_database(ConnParams *cparams,
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		/* XXX arguably, here we should use VACUUM FULL instead of failing */
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
 	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
 	{
 		PQfinish(conn);
@@ -280,9 +288,18 @@ vacuum_one_database(ConnParams *cparams,
 		if (vacopts->mode == MODE_ANALYZE_IN_STAGES)
 			printf(_("%s: processing database \"%s\": %s\n"),
 				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
+		else if (vacopts->mode == MODE_ANALYZE)
+			printf(_("%s: analyzing database \"%s\"\n"),
+				   progname, PQdb(conn));
+		else if (vacopts->mode == MODE_VACUUM)
 			printf(_("%s: vacuuming database \"%s\"\n"),
 				   progname, PQdb(conn));
+		else
+		{
+			Assert(vacopts->mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
 		fflush(stdout);
 	}
 
@@ -372,7 +389,7 @@ vacuum_one_database(ConnParams *cparams,
 		 * through ParallelSlotsGetIdle.
 		 */
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
+		run_vacuum_command(free_slot->connection, vacopts, sql.data,
 						   echo, tabname);
 
 		cell = cell->next;
@@ -397,7 +414,7 @@ vacuum_one_database(ConnParams *cparams,
 		}
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+		run_vacuum_command(free_slot->connection, vacopts, cmd, echo, NULL);
 
 		if (!ParallelSlotsWaitCompletion(sa))
 			ret = EXIT_FAILURE; /* error already reported by handler */
@@ -615,6 +632,35 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
 								 " AND listed_objects.object_oid IS NOT NULL\n");
 	}
 
+	/*
+	 * In REPACK mode, if the 'using_index' option was given but no index
+	 * name, filter only tables that have an index with indisclustered set.
+	 * (If an index name is given, we trust the user to pass a reasonable list
+	 * of tables.)
+	 *
+	 * XXX it may be worth printing an error if an index name is given with no
+	 * list of tables.
+	 */
+	if (vacopts->mode == MODE_REPACK &&
+		vacopts->using_index && !vacopts->indexname)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND EXISTS (SELECT 1 FROM pg_catalog.pg_index\n"
+							 "    WHERE indrelid = c.oid AND indisclustered)\n");
+	}
+
+	/*
+	 * In REPACK mode, only consider the tables that the current user has
+	 * MAINTAIN privileges on.  XXX maybe we should do this in all cases, not
+	 * just REPACK.  The vacuumdb output is too noisy for no reason.
+	 */
+	if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND pg_catalog.has_table_privilege(current_user, "
+							 "c.oid, 'MAINTAIN')\n");
+	}
+
 	/*
 	 * If no tables were listed, filter for the relevant relation types.  If
 	 * tables were given via --table, don't bother filtering by relation type.
@@ -837,8 +883,10 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->verbose)
 				appendPQExpBufferStr(sql, " VERBOSE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
 	}
-	else
+	else if (vacopts->mode == MODE_VACUUM)
 	{
 		appendPQExpBufferStr(sql, "VACUUM");
 
@@ -952,9 +1000,39 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->and_analyze)
 				appendPQExpBufferStr(sql, " ANALYZE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
+	}
+	else if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(sql, "REPACK");
+
+		if (vacopts->verbose)
+		{
+			appendPQExpBuffer(sql, "%sVERBOSE", sep);
+			sep = comma;
+		}
+		if (vacopts->and_analyze)
+		{
+			appendPQExpBuffer(sql, "%sANALYZE", sep);
+			sep = comma;
+		}
+
+		if (sep != paren)
+			appendPQExpBufferChar(sql, ')');
+
+		appendPQExpBuffer(sql, " %s", table);
+
+		if (vacopts->using_index)
+		{
+			appendPQExpBuffer(sql, " USING INDEX");
+			if (vacopts->indexname)
+				appendPQExpBuffer(sql, " %s", fmtIdEnc(vacopts->indexname,
+													   PQclientEncoding(conn)));
+		}
 	}
 
-	appendPQExpBuffer(sql, " %s;", table);
+	appendPQExpBufferChar(sql, ';');
 }
 
 /*
@@ -964,8 +1042,8 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
  * Any errors during command execution are reported to stderr.
  */
 static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
+run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+				   const char *sql, bool echo, const char *table)
 {
 	bool		status;
 
@@ -978,13 +1056,21 @@ run_vacuum_command(PGconn *conn, const char *sql, bool echo,
 	{
 		if (table)
 		{
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
 		}
 		else
 		{
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
 		}
 	}
 }
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
index 49f968b32e5..665dbaedfad 100644
--- a/src/bin/scripts/vacuuming.h
+++ b/src/bin/scripts/vacuuming.h
@@ -20,6 +20,7 @@
 typedef enum
 {
 	MODE_VACUUM,
+	MODE_REPACK,
 	MODE_ANALYZE,
 	MODE_ANALYZE_IN_STAGES
 } RunMode;
@@ -37,6 +38,8 @@ typedef struct vacuumingOptions
 	bool		and_analyze;
 	bool		full;
 	bool		freeze;
+	bool		using_index;
+	char	   *indexname;
 	bool		disable_page_skipping;
 	bool		skip_locked;
 	int			min_xid_age;
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..652542e8e65 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
+						ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..ebf004b7aa5 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,28 +56,34 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Values for PROGRESS_REPACK_COMMAND are defined as in RepackCommand.
+ *
+ * Note: Since REPACK shares code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
 
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
-
-/* Commands of PROGRESS_CLUSTER */
-#define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
-#define PROGRESS_CLUSTER_COMMAND_VACUUM_FULL	2
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
 
 /* Progress parameters for CREATE INDEX */
 /* 3, 4 and 5 reserved for "waitfor" metrics */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 87c1086ec99..68b9a777098 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3951,18 +3951,6 @@ typedef struct AlterSystemStmt
 	VariableSetStmt *setstmt;	/* SET subcommand */
 } AlterSystemStmt;
 
-/* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
- * ----------------------
- */
-typedef struct ClusterStmt
-{
-	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
-	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
-
 /* ----------------------
  *		Vacuum and Analyze Statements
  *
@@ -3975,7 +3963,7 @@ typedef struct VacuumStmt
 	NodeTag		type;
 	List	   *options;		/* list of DefElem nodes */
 	List	   *rels;			/* list of VacuumRelation, or NIL for all */
-	bool		is_vacuumcmd;	/* true for VACUUM, false for ANALYZE */
+	bool		is_vacuumcmd;	/* true for VACUUM, false otherwise */
 } VacuumStmt;
 
 /*
@@ -3993,6 +3981,27 @@ typedef struct VacuumRelation
 	List	   *va_cols;		/* list of column names, or NIL for all */
 } VacuumRelation;
 
+/* ----------------------
+ *		Repack Statement
+ * ----------------------
+ */
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER = 1,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
+{
+	NodeTag		type;
+	RepackCommand command;		/* type of command being run */
+	VacuumRelation *relation;	/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
+	List	   *params;			/* list of DefElem nodes */
+} RepackStmt;
+
 /* ----------------------
  *		Explain Statement
  *
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 84182eaaae2..87f6c226c43 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -375,6 +375,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..277854418fa 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +550,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
@@ -665,6 +702,101 @@ SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 (4 rows)
 
 COMMIT;
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+ERROR:  ANALYZE option must be specified when a column list is provided
+-- REPACK without an argument performs no ordering, so we can only check
+-- which tables got a new relfilenode.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed too, since plain REPACK does not require a clustered index.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 7f1cb3bb4af..8a4672a2a2b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1986,34 +1986,23 @@ pg_stat_progress_basebackup| SELECT pid,
             ELSE NULL::text
         END AS backup_type
    FROM pg_stat_get_progress_info('BASEBACKUP'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20);
-pg_stat_progress_cluster| SELECT s.pid,
-    s.datid,
-    d.datname,
-    s.relid,
-        CASE s.param1
-            WHEN 1 THEN 'CLUSTER'::text
-            WHEN 2 THEN 'VACUUM FULL'::text
-            ELSE NULL::text
+pg_stat_progress_cluster| SELECT pid,
+    datid,
+    datname,
+    relid,
+        CASE
+            WHEN (command = ANY (ARRAY['CLUSTER'::text, 'VACUUM FULL'::text])) THEN command
+            WHEN (repack_index_relid = (0)::oid) THEN 'VACUUM FULL'::text
+            ELSE 'CLUSTER'::text
         END AS command,
-        CASE s.param2
-            WHEN 0 THEN 'initializing'::text
-            WHEN 1 THEN 'seq scanning heap'::text
-            WHEN 2 THEN 'index scanning heap'::text
-            WHEN 3 THEN 'sorting tuples'::text
-            WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
-            ELSE NULL::text
-        END AS phase,
-    (s.param3)::oid AS cluster_index_relid,
-    s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
-   FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
-     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+    phase,
+    repack_index_relid AS cluster_index_relid,
+    heap_tuples_scanned,
+    heap_tuples_written,
+    heap_blks_total,
+    heap_blks_scanned,
+    index_rebuild_count
+   FROM pg_stat_progress_repack;
 pg_stat_progress_copy| SELECT s.pid,
     s.datid,
     d.datname,
@@ -2073,6 +2062,35 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 1 THEN 'CLUSTER'::text
+            WHEN 2 THEN 'REPACK'::text
+            WHEN 3 THEN 'VACUUM FULL'::text
+            ELSE NULL::text
+        END AS command,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..c976823a3cb 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,7 +76,6 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
-
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -229,6 +228,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
@@ -313,6 +330,57 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 COMMIT;
 
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+
+-- REPACK without an argument performs no ordering, so we can only check
+-- which tables got a new relfilenode.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed too, since plain REPACK does not require a clustered index.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37f26f6c6b7..536a23f74a5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2537,6 +2537,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
-- 
2.47.3

#40Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#1)
4 attachment(s)
Re: Adding REPACK [concurrently]

Hello,

Here's a new installment of this series, v25, including the CONCURRENTLY
part, which required some conflict fixes on top of the much-changed
v24-0001 patch.

After the talk on this subject at PGConf.EU, there were some
reservations about this whole project; if I understand correctly,
they can be summarized in these three points:

1. Would the spill files for reorderbuffers occupy as much disk space as
it takes to copy the initial contents of the table, for each active
logical decoding replication slot? Antonin claims (I haven't verified
this) that there are some hacks in place to avoid this problem, or that
it is easy to install some -- and if so, then this patch would already
be better than pg_repack. This perhaps merits more testing.

2. Is the concurrent REPACK operation MVCC-safe? At the moment, with
the present implementation, no it is not. There are discussions on
getting this fixed, and Mihail has proposed some patches which at least
are quite short, though their safety is something we need to assess in
more depth.
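To make the hazard concrete, the anomaly I mean looks roughly like
this two-session sketch (table and column names are hypothetical, and
the REPACK syntax is only indicative of the CONCURRENTLY form):

  -- session 1
  BEGIN ISOLATION LEVEL REPEATABLE READ;
  SELECT count(*) FROM t;   -- takes a snapshot, sees N rows

  -- session 2
  DELETE FROM t WHERE id < 1000;
  REPACK t CONCURRENTLY;    -- rewrite may discard the deleted tuples

  -- session 1
  SELECT count(*) FROM t;   -- MVCC-safety requires this to still see N rows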

3. Would the xmin horizon remain stuck at the spot where REPACK started,
thereby preventing VACUUM from cleaning up recently-dead rows in other
tables?  As I understand it, with the current implementation, yes it
would, and we cannot easily apply hacks such as PROC_IN_VACUUM to
prevent it, because that would reintroduce the problems it once caused
for CREATE INDEX CONCURRENTLY, which were fixed in pg14 (commit
042b584c7f7d62).  Mihail and Antonin have discussed possible ways to
ease this, but we don't have code for that yet.  This is, again, no
worse than VACUUM FULL or CLUSTER, so lacking it wouldn't be a killer
for this project, though of course doing better would be desirable.
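For testing, the pinned horizon is easy to watch from another session;
this is just the usual pg_stat_activity query, nothing specific to
this patch:

  SELECT pid, backend_xmin, age(backend_xmin) AS xmin_age, query
  FROM pg_stat_activity
  WHERE backend_xmin IS NOT NULL
  ORDER BY age(backend_xmin) DESC;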

I have not yet addressed Robert Treat's feedback from October 12th.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")

Attachments:

v25-0001-Add-REPACK-command.patch (text/x-diff; charset=utf-8)
From faa8c9908a1897af327a5ec7aa02406fc40b8761 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 26 Jul 2025 19:57:26 +0200
Subject: [PATCH v25 1/4] Add REPACK command
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.

This also adds pg_repackdb, a new utility that can invoke the new
commands.  This is heavily based on vacuumdb.

Author: Antonin Houska <ah@cybertec.at>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 +++++-
 doc/src/sgml/ref/allfiles.sgml           |   2 +
 doc/src/sgml/ref/cluster.sgml            |  97 +--
 doc/src/sgml/ref/clusterdb.sgml          |   5 +
 doc/src/sgml/ref/pg_repackdb.sgml        | 488 +++++++++++++
 doc/src/sgml/ref/repack.sgml             | 319 +++++++++
 doc/src/sgml/ref/vacuum.sgml             |  33 +-
 doc/src/sgml/reference.sgml              |   2 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  29 +-
 src/backend/commands/cluster.c           | 854 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   6 +-
 src/backend/parser/gram.y                |  97 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  33 +-
 src/bin/scripts/Makefile                 |   4 +-
 src/bin/scripts/meson.build              |   2 +
 src/bin/scripts/pg_repackdb.c            | 242 +++++++
 src/bin/scripts/t/103_repackdb.pl        |  47 ++
 src/bin/scripts/vacuuming.c              | 114 ++-
 src/bin/scripts/vacuuming.h              |   3 +
 src/include/commands/cluster.h           |   8 +-
 src/include/commands/progress.h          |  48 +-
 src/include/nodes/parsenodes.h           |  35 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 134 +++-
 src/test/regress/expected/rules.out      |  72 +-
 src/test/regress/sql/cluster.sql         |  70 +-
 src/tools/pgindent/typedefs.list         |   2 +
 33 files changed, 2483 insertions(+), 545 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 doc/src/sgml/ref/repack.sgml
 create mode 100644 src/bin/scripts/pg_repackdb.c
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index f3bf527d5b4..467e081f015 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5575,7 +5583,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -6034,6 +6043,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..eabf92e3536 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
@@ -212,6 +213,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..cfcfb65e349 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
   <title>Description</title>
 
   <para>
-   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
-   to cluster the table specified
-   by <replaceable class="parameter">table_name</replaceable>
-   based on the index specified by
-   <replaceable class="parameter">index_name</replaceable>. The index must
-   already have been defined on
-   <replaceable class="parameter">table_name</replaceable>.
+   The <command>CLUSTER</command> command is equivalent to
+   <xref linkend="sql-repack"/> with an <literal>USING INDEX</literal>
+   clause.  See there for more details.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
+<!-- Do we need to describe exactly which options map to what?  They seem obvious to me. -->
 
-  <para>
-   When a table is clustered, <productname>PostgreSQL</productname>
-   remembers which index it was clustered by.  The form
-   <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
-   reclusters the table using the same index as before.  You can also
-   use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
-   forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
-   future cluster operations, or to clear any previous setting.
-  </para>
-
-  <para>
-   <command>CLUSTER</command> without a
-   <replaceable class="parameter">table_name</replaceable> reclusters all the
-   previously-clustered tables in the current database that the calling user
-   has privileges for.  This form of <command>CLUSTER</command> cannot be
-   executed inside a transaction block.
-  </para>
-
-  <para>
-   When a table is being clustered, an <literal>ACCESS
-   EXCLUSIVE</literal> lock is acquired on it. This prevents any other
-   database operations (both reads and writes) from operating on the
-   table until the <command>CLUSTER</command> is finished.
-  </para>
  </refsect1>
 
  <refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 0d2051bf6f1..546c1289c31 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -64,6 +64,11 @@ PostgreSQL documentation
    this utility and via other methods for accessing the server.
   </para>
 
+  <para>
+   <application>clusterdb</application> has been superseded by
+   <application>pg_repackdb</application>.
+  </para>
+
  </refsect1>
 
 
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..b313b54ab63
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,488 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.
+   <application>pg_repackdb</application> will also generate internal
+   statistics used by the <productname>PostgreSQL</productname> query
+   optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link> There
+   is no effective difference between repacking and analyzing databases via
+   this utility and via other methods for accessing the server.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--index<optional>=<replaceable class="parameter">index_name</replaceable></optional></option></term>
+      <listitem>
+       <para>
+        Pass the <literal>USING INDEX</literal> clause to <literal>REPACK</literal>,
+        optionally specifying the name of the index to use.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.  If a column name
+        list is given, only compute statistics for those columns.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When <option>-a</option>/<option>--all</option> is used, connect
+        to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
+
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..0e1116eae85
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,319 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+    ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+
+<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
+
+    <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If a <literal>USING INDEX</literal> clause is specified, the rows are
+   physically reordered based on information from an index.  Please see the
+   notes on clustering below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    If the <literal>USING INDEX</literal> clause is specified, the rows in
+    the table are physically reordered following an index: if an index name
+    is specified in the command, then that index is used; if no index name
+    is specified, then the index previously configured as the index to
+    cluster on is used.  If no index has been so configured, an error is
+    thrown.  The index given in the <literal>USING INDEX</literal> clause
+    is recorded as the index to cluster on, just as an index given to
+    the <command>CLUSTER</command> command is.  An index can be set
+    manually using <command>ALTER TABLE ... CLUSTER ON</command>, and reset
+    with <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+   </para>
+
+   <para>
+    If no table name is specified in <command>REPACK USING INDEX</command>,
+    all tables which have a clustering index defined and which the calling
+    user has privileges for are processed.
+   </para>
+
+   <para>
+    Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using clustering.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan or a sequential scan without sort is used, a temporary
+    copy of the table is created that contains the table data in the new
+    order.  Temporary copies of each index on the table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">column_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of a specific column to analyze. Defaults to all columns.
+      If a column list is specified, <literal>ANALYZE</literal> must also
+      be specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level
+      as each table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>ANALYZE</literal></term>
+    <term><literal>ANALYSE</literal></term>
+    <listitem>
+     <para>
+      Applies <xref linkend="sql-analyze"/> on the table after repacking.  This is
+      currently only supported when a single (non-partitioned) table is specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+  <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+  </para>
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the corresponding
+    partition of that index. <command>REPACK</command> on a partitioned table cannot be executed
+    inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> on the basis of its
+   index <literal>employees_ind</literal> (since an index is used here,
+   this is effectively clustering):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>cases</literal> keeping its physical ordering,
+   running an <command>ANALYZE</command> on the given columns once
+   repacking is done, showing informational messages:
+<programlisting>
+REPACK (ANALYZE, VERBOSE) cases (district, case_nr);
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables for which a clustering index has previously been
+   configured on which you have the <literal>MAINTAIN</literal> privilege,
+   showing informational messages:
+<programlisting>
+REPACK (VERBOSE) USING INDEX;
+</programlisting>
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..062b658cfcd 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
-    FULL [ <replaceable class="parameter">boolean</replaceable> ]
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+    FULL [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
   <title>Parameters</title>
 
   <variablelist>
-   <varlistentry>
-    <term><literal>FULL</literal></term>
-    <listitem>
-     <para>
-      Selects <quote>full</quote> vacuum, which can reclaim more
-      space, but takes much longer and exclusively locks the table.
-      This method also requires extra disk space, since it writes a
-      new copy of the table and doesn't release the old copy until
-      the operation is complete.  Usually this should only be used when a
-      significant amount of space needs to be reclaimed from within the table.
-     </para>
-    </listitem>
-   </varlistentry>
-
    <varlistentry>
     <term><literal>FREEZE</literal></term>
     <listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FULL</literal></term>
+    <listitem>
+     <para>
+      This option, which is deprecated, makes <command>VACUUM</command>
+      behave like <command>REPACK</command> without a
+      <literal>USING INDEX</literal> clause.
+      This method of compacting the table takes much longer than
+      <command>VACUUM</command> and exclusively locks the table.
+      This method also requires extra disk space, since it writes a
+      new copy of the table and doesn't release the old copy until
+      the operation is complete.  Usually this should only be used when a
+      significant amount of space needs to be reclaimed from within the table.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..2ee08e21f41 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
@@ -257,6 +258,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..79f9de5d760 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5d9db167e59..08d4b8e44d7 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index dec8df4f8ee..1ad30116631 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1269,14 +1269,15 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
-CREATE VIEW pg_stat_progress_cluster AS
+CREATE VIEW pg_stat_progress_repack AS
     SELECT
         S.pid AS pid,
         S.datid AS datid,
         D.datname AS datname,
         S.relid AS relid,
         CASE S.param1 WHEN 1 THEN 'CLUSTER'
-                      WHEN 2 THEN 'VACUUM FULL'
+                      WHEN 2 THEN 'REPACK'
+                      WHEN 3 THEN 'VACUUM FULL'
                       END AS command,
         CASE S.param2 WHEN 0 THEN 'initializing'
                       WHEN 1 THEN 'seq scanning heap'
@@ -1287,15 +1288,35 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 6 THEN 'rebuilding index'
                       WHEN 7 THEN 'performing final cleanup'
                       END AS phase,
-        CAST(S.param3 AS oid) AS cluster_index_relid,
+        CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
         S.param6 AS heap_blks_total,
         S.param7 AS heap_blks_scanned,
         S.param8 AS index_rebuild_count
-    FROM pg_stat_get_progress_info('CLUSTER') AS S
+    FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+-- This view is like the one above, except that one column is renamed and
+-- 'REPACK' is never reported as a command name.
+CREATE VIEW pg_stat_progress_cluster AS
+    SELECT
+        pid,
+        datid,
+        datname,
+        relid,
+        CASE WHEN command IN ('CLUSTER', 'VACUUM FULL') THEN command
+             WHEN repack_index_relid = 0 THEN 'VACUUM FULL'
+             ELSE 'CLUSTER' END AS command,
+        phase,
+        repack_index_relid AS cluster_index_relid,
+        heap_tuples_scanned,
+        heap_tuples_written,
+        heap_blks_total,
+        heap_blks_scanned,
+        index_rebuild_count
+    FROM pg_stat_progress_repack;
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..18bee52a4ee 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,27 +67,36 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  Oid relid, bool rel_is_index,
+											  MemoryContext permcxt);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
+static const char *RepackCommandAsString(RepackCommand cmd);
 
 
-/*---------------------------------------------------------------------------
- * This cluster code allows for clustering multiple tables at once. Because
+/*
+ * The repack code allows for processing multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
  * would be forced to acquire exclusive locks on all the tables being
  * clustered, simultaneously --- very likely leading to deadlock.
  *
- * To solve this we follow a similar strategy to VACUUM code,
- * clustering each relation in a separate transaction. For this to work,
- * we need to:
+ * To solve this we follow a similar strategy to VACUUM code, processing each
+ * relation in a separate transaction. For this to work, we need to:
+ *
  *	- provide a separate memory context so that we can pass information in
  *	  a way that survives across transactions
  *	- start a new transaction every time a new relation is clustered
@@ -98,197 +107,165 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *
  * The single-relation case does not have any such overhead.
  *
- * We also allow a relation to be specified without index.  In that case,
- * the indisclustered bit will be looked up, and an ERROR will be thrown
- * if there is no index with the bit set.
- *---------------------------------------------------------------------------
+ * We also allow a relation to be repacked following an index, but without
+ * naming a specific one.  In that case, the indisclustered bit is looked
+ * up, and an ERROR is thrown if no index has that bit set.
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+		else if (strcmp(opt->defname, "analyze") == 0 ||
+				 strcmp(opt->defname, "analyse") == 0)
+			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
 		else
 			ereport(ERROR,
-					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
-							opt->defname),
-					 parser_errposition(pstate, opt->location)));
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("unrecognized %s option \"%s\"",
+						   RepackCommandAsString(stmt->command),
+						   opt->defname),
+					parser_errposition(pstate, opt->location));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
-			return;
-		}
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
+			return;				/* all done */
 	}
 
+	/*
+	 * Don't allow ANALYZE in the multiple-relation case for now.  Maybe we
+	 * can add support for this later.
+	 */
+	if (params.options & CLUOPT_ANALYZE)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
+
 	/*
 	 * By here, we know we are in a multi-table situation.  In order to avoid
 	 * holding locks for too long, we want to process each table in its own
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
+		Oid			relid;
+		bool		rel_is_index;
+
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+
+		/*
+		 * If USING INDEX was specified, resolve the index name now and pass
+		 * it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * If no index name was specified when repacking a partitioned
+			 * table, punt for now.  Maybe we can improve this later.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, stmt->usingindex,
+											  stmt->indexname);
+			if (!OidIsValid(relid))
+				elog(ERROR, "unable to determine index to cluster on");
+			/* XXX is this the right place for this check? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
+
+		rtcs = get_tables_to_repack_partitioned(stmt->command,
+												relid, rel_is_index,
+												repack_context);
+
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
 	}
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
-
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
-
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
-
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
-
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+		{
+			CommitTransactionCommand();
+			continue;
+		}
+
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +281,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+			ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +303,8 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
-	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
+	pgstat_progress_update_param(PROGRESS_REPACK_COMMAND, cmd);
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -350,86 +325,38 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
 	 */
-	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
+	if (recheck &&
+		!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+							 params->options))
+		goto out;
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index. It
+	 * would work to use an index in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
 	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on a shared catalog",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
-	{
-		if (OidIsValid(indexOid))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot vacuum temporary tables of other sessions")));
-	}
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on temporary tables of other sessions",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -442,6 +369,24 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	else
 		index = NULL;
 
+	/*
+	 * When allow_system_table_mods is turned off, we disallow repacking a
+	 * catalog on a particular index unless that's already the clustered index
+	 * for that catalog.
+	 *
+	 * XXX We don't check for this in CLUSTER, because it's historically been
+	 * allowed.
+	 */
+	if (cmd != REPACK_COMMAND_CLUSTER &&
+		!allowSystemTableMods && OidIsValid(indexOid) &&
+		IsCatalogRelation(OldHeap) && !index->rd_index->indisclustered)
+		ereport(ERROR,
+				errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				errmsg("permission denied: \"%s\" is a system catalog",
+					   RelationGetRelationName(OldHeap)),
+				errdetail("System catalogs can only be clustered by the index they're already clustered on, if any, unless \"%s\" is enabled.",
+						  "allow_system_table_mods"));
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -469,7 +414,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +427,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide REPACK/CLUSTER or one on a partitioned
+	 * table), because there is another check in process_single_relation()
+	 * which will stop any attempt to process remote temp tables by name.
+	 * The further check in cluster_rel() is redundant, but we leave it for
+	 * extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +628,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +645,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (index != NULL)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -958,20 +961,20 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	/* Log what we're doing */
 	if (OldIndex != NULL && !use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using index scan on \"%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap),
-						RelationGetRelationName(OldIndex))));
+				errmsg("repacking \"%s.%s\" using index scan on \"%s\"",
+					   nspname,
+					   RelationGetRelationName(OldHeap),
+					   RelationGetRelationName(OldIndex)));
 	else if (use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using sequential scan and sort",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" using sequential scan and sort",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 	else
 		ereport(elevel,
-				(errmsg("vacuuming \"%s.%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" in physical order",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 
 	/*
 	 * Hand off the actual copying to AM specific function, the generic code
@@ -1458,8 +1461,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1512,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,106 +1635,191 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given or not, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index for indexes that have
+ * indisclustered set; if it was not given, we scan pg_class and return all
+ * tables.
+ *
+ * The result is a list of RelToCluster in the given memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand cmd, bool usingindex, MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
-	MemoryContext old_context;
+	HeapTuple	tuple;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+		catalog = table_open(IndexRelationId, AccessShareLock);
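+		/* Find all indexes that have indisclustered set. */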
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
+			MemoryContext oldcxt;
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/*
+			 * Try to obtain a light lock on the index's table, to ensure it
+			 * doesn't go away while we collect the list.  If we cannot, just
+			 * disregard it.
+			 */
+			if (!ConditionalLockRelationOid(index->indrelid, AccessShareLock))
+				continue;
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(index->indrelid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(index->indrelid, AccessShareLock);
+				continue;
+			}
 
-		MemoryContextSwitchTo(old_context);
+			if (!cluster_is_permitted_for_relation(cmd, index->indrelid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
 	}
-	table_endscan(scan);
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
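+		/* No index given: scan pg_class and consider every relation. */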
+		scan = table_beginscan_catalog(catalog, 0, NULL);
 
-	relation_close(indRelation, AccessShareLock);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+			MemoryContext oldcxt;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
+
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			/* Noisily skip relations that the user can't process. */
+			if (!cluster_is_permitted_for_relation(cmd, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
+	}
+
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
 * all the child leaf tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, Oid relid,
+								 bool rel_is_index, MemoryContext permcxt)
 {
 	List	   *inhoids;
-	ListCell   *lc;
 	List	   *rtcs = NIL;
-	MemoryContext old_context;
 
-	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
-
-	foreach(lc, inhoids)
+	/*
+	 * Do not lock the children until they're processed.  Note that we do hold
+	 * a lock on the parent partitioned table.
+	 */
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
+	foreach_oid(child_oid, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			table_oid,
+					index_oid;
 		RelToCluster *rtc;
+		MemoryContext oldcxt;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(child_oid) != RELKIND_INDEX)
+				continue;
+
+			table_oid = IndexGetRelation(child_oid, false);
+			index_oid = child_oid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(child_oid) != RELKIND_RELATION)
+				continue;
+
+			table_oid = child_oid;
+			index_oid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
-		 * leaf partition despite having such privileges on the partitioned
-		 * table.  We skip any partitions which the user is not permitted to
-		 * CLUSTER.
+		 * leaf partition despite having them on the partitioned table.  Skip
+		 * if so.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, table_oid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
-
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		oldcxt = MemoryContextSwitchTo(permcxt);
+		rtc = palloc(sizeof(RelToCluster));
+		rtc->tableOid = table_oid;
+		rtc->indexOid = index_oid;
 		rtcs = lappend(rtcs, rtc);
-
-		MemoryContextSwitchTo(old_context);
+		MemoryContextSwitchTo(oldcxt);
 	}
 
 	return rtcs;
@@ -1742,13 +1830,167 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
+
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   RepackCommandAsString(cmd),
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain a lock on it, then determine what to do based on the relation
+ * type: if it's a table and not partitioned, repack it as indicated (using an
+ * existing clustered index, or following the given one), and return NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened and locked relcache entry, so that the caller can
+ * process the partitions using the multiple-table handling code.  In this
+ * case, if an index name is given, it's up to the caller to resolve it.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+
+	/*
+	 * Make sure ANALYZE is specified if a column list is present.
+	 */
+	if ((params->options & CLUOPT_ANALYZE) == 0 && stmt->relation->va_cols != NIL)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("ANALYZE option must be specified when a column list is provided"));
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		if (OidIsValid(indexOid))
+			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, rel, indexOid, params);
+
+		/* Do an analyze, if requested */
+		if (params->options & CLUOPT_ANALYZE)
+		{
+			VacuumParams vac_params = {0};
+
+			vac_params.options |= VACOPT_ANALYZE;
+			if (params->options & CLUOPT_VERBOSE)
+				vac_params.options |= VACOPT_VERBOSE;
+			analyze_rel(RelationGetRelid(rel), NULL, vac_params,
+						stmt->relation->va_cols, true, NULL);
+		}
+
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the
+ * index to use for clustering the table, or InvalidOid if none applies.
+ *
+ * Caller must hold a lock on the relation so that the set of indexes
+ * doesn't change, and must call check_index_is_clusterable on the result.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		/*
+		 * If USING INDEX with no name is given, find a clustered index, or
+		 * error out if none.
+		 */
+		indexOid = InvalidOid;
+		foreach_oid(idxoid, RelationGetIndexList(rel))
+		{
+			if (get_index_isclustered(idxoid))
+			{
+				indexOid = idxoid;
+				break;
+			}
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/* An index was specified; obtain its OID. */
+		indexOid = get_relname_relid(indexname, rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
+
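+/*
+ * Return the user-facing name of the given command, for use in error
+ * messages.  Note that VACUUM FULL reports itself as plain "VACUUM".
+ */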
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ed03e3bd50d..62207ceff7e 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -348,7 +348,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		}
 	}
 
-
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
@@ -2280,8 +2279,9 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 			if ((params.options & VACOPT_VERBOSE) != 0)
 				cluster_params.options |= CLUOPT_VERBOSE;
 
-			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			/* VACUUM FULL is a variant of REPACK; see cluster.c */
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a4b29c822e8..c463aa1415d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -286,7 +286,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -303,7 +303,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -322,7 +322,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		utility_option_list
@@ -770,7 +770,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESPECT_P RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1032,7 +1032,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1106,6 +1105,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1142,6 +1142,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11959,38 +11964,93 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ <name_list> ] [ USING INDEX <index_name> ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list vacuum_relation USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
 					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list vacuum_relation opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK '(' utility_option_list ')'
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = false;
+					n->params = $3;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $2;
+					n->params = NIL;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
+					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $3;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -11998,20 +12058,25 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -18028,6 +18093,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18661,6 +18727,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 918db53dd5e..1295dc25d02 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2850,10 +2850,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2861,6 +2857,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3498,7 +3498,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index a710508979e..1a7bb1e4a31 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -289,6 +289,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 36ea6a4d557..b1fe7703296 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1257,7 +1257,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -5008,6 +5008,37 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	/* If we have REPACK <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAny))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK <sth> USING, then add the index as well */
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+		else if (TailMatches("VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index 019ca06455d..f0c1bd4175c 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -16,7 +16,7 @@ subdir = src/bin/scripts
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready
+PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready pg_repackdb
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
@@ -31,6 +31,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
 vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
 install: all installdirs
 	$(INSTALL_PROGRAM) createdb$(X)   '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -41,6 +42,7 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+	$(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index a4fed59d1c9..be573cae682 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -42,6 +42,7 @@ vacuuming_common = static_library('libvacuuming_common',
 
 binaries = [
   'vacuumdb',
+  'pg_repackdb',
 ]
 foreach binary : binaries
   binary_sources = files('@0@.c'.format(binary))
@@ -80,6 +81,7 @@ tests += {
       't/100_vacuumdb.pl',
       't/101_vacuumdb_all.pl',
       't/102_vacuumdb_stages.pl',
+      't/103_repackdb.pl',
       't/200_connstr.pl',
     ],
   },
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..1edfa34ed0f
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,242 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ *		A utility to run REPACK
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * FIXME: this is missing a way to specify the index to use to repack one
+ * table, or whether to pass a USING INDEX clause when multiple tables are
+ * used.  Something like --index[=indexname].  Adding that bleeds into
+ * vacuuming.c as well.
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "vacuuming.h"
+
+static void help(const char *progname);
+static void check_objfilter(bits32 objfilter);
+
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+		{"echo", no_argument, NULL, 'e'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"dbname", required_argument, NULL, 'd'},
+		{"analyze", no_argument, NULL, 'z'},
+		{"all", no_argument, NULL, 'a'},
+		/* XXX this could be 'i', but a short option with an optional argument is messy */
+		{"index", optional_argument, NULL, 1},
+		{"table", required_argument, NULL, 't'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"jobs", required_argument, NULL, 'j'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
+		{"maintenance-db", required_argument, NULL, 2},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	bool		echo = false;
+	bool		quiet = false;
+	vacuumingOptions vacopts;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+	int			ret;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+	vacopts.mode = MODE_REPACK;
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwWz",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				vacopts.objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				vacopts.objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				vacopts.objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				vacopts.objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				quiet = true;
+				break;
+			case 't':
+				vacopts.objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 'z':
+				vacopts.and_analyze = true;
+				break;
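+				/* --index, with an optional index name; no short form yet */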
+			case 1:
+				vacopts.using_index = true;
+				if (optarg)
+					vacopts.indexname = pg_strdup(optarg);
+				else
+					vacopts.indexname = NULL;
+				break;
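+				/* --maintenance-db */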
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		vacopts.objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter(vacopts.objfilter);
+
+	ret = vacuuming_main(&cparams, dbname, maintenance_db, &vacopts,
+						 &objects, tbl_count, concurrentCons,
+						 progname, echo, quiet);
+	exit(ret);
+}
+
+/*
+ * Verify that the filters used at command line are compatible.
+ */
+static void
+check_objfilter(bits32 objfilter)
+{
+	if ((objfilter & OBJFILTER_ALL_DBS) &&
+		(objfilter & OBJFILTER_DATABASE))
+		pg_fatal("cannot repack all databases and a specific one at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA))
+		pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+
+	if ((objfilter & OBJFILTER_SCHEMA) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+}
+
+static void
+help(const char *progname)
+{
+	printf(_("%s repacks a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("      --index[=INDEX]             repack following an index\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE[(COLUMNS)]'  repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -z, --analyze                   update optimizer statistics\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..cadce9b837c
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,47 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-t', 'pg_class'],
+	qr/statement: REPACK.*pg_class;/,
+	'pg_repackdb processes a single table');
+
+$node->safe_psql('postgres', 'CREATE USER testusr;
+	GRANT CREATE ON SCHEMA public TO testusr');
+$node->safe_psql('postgres',
+	'CREATE TABLE cluster_1 (a int primary key);
+	ALTER TABLE cluster_1 CLUSTER ON cluster_1_pkey;
+	CREATE TABLE cluster_2 (a int unique);
+	ALTER TABLE cluster_2 CLUSTER ON cluster_2_a_key;',
+	extra_params => ['-U' => 'testusr']);
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-U', 'testusr' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--index'],
+	qr/statement: REPACK.*cluster_1 USING INDEX.*statement: REPACK.*cluster_2 USING INDEX/ms,
+	'pg_repackdb --index chooses multiple tables');
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--analyze', '-t', 'cluster_1'],
+	qr/statement: REPACK \(ANALYZE\) public.cluster_1/,
+	'pg_repackdb --analyze works');
+
+done_testing();
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
index f836f21fb03..d8d77cabe43 100644
--- a/src/bin/scripts/vacuuming.c
+++ b/src/bin/scripts/vacuuming.c
@@ -1,6 +1,6 @@
 /*-------------------------------------------------------------------------
  * vacuuming.c
- *		Helper routines for vacuumdb
+ *		Helper routines for vacuumdb and pg_repackdb
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -43,8 +43,8 @@ static SimpleStringList *retrieve_objects(PGconn *conn,
 static void free_retrieved_objects(SimpleStringList *list);
 static void prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 								   vacuumingOptions *vacopts, const char *table);
-static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+static void run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+							   const char *sql, bool echo, const char *table);
 
 /*
  * Executes vacuum/analyze as indicated.  Returns 0 if the plan is carried
@@ -194,6 +194,14 @@ vacuum_one_database(ConnParams *cparams,
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		/* XXX arguably, here we should use VACUUM FULL instead of failing */
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
 	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
 	{
 		PQfinish(conn);
@@ -286,9 +294,18 @@ vacuum_one_database(ConnParams *cparams,
 		if (vacopts->mode == MODE_ANALYZE_IN_STAGES)
 			printf(_("%s: processing database \"%s\": %s\n"),
 				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
+		else if (vacopts->mode == MODE_ANALYZE)
+			printf(_("%s: analyzing database \"%s\"\n"),
+				   progname, PQdb(conn));
+		else if (vacopts->mode == MODE_VACUUM)
 			printf(_("%s: vacuuming database \"%s\"\n"),
 				   progname, PQdb(conn));
+		else
+		{
+			Assert(vacopts->mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
 		fflush(stdout);
 	}
 
@@ -383,7 +400,7 @@ vacuum_one_database(ConnParams *cparams,
 		 * through ParallelSlotsGetIdle.
 		 */
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
+		run_vacuum_command(free_slot->connection, vacopts, sql.data,
 						   echo, tabname);
 
 		cell = cell->next;
@@ -408,7 +425,7 @@ vacuum_one_database(ConnParams *cparams,
 		}
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+		run_vacuum_command(free_slot->connection, vacopts, cmd, echo, NULL);
 
 		if (!ParallelSlotsWaitCompletion(sa))
 			ret = EXIT_FAILURE; /* error already reported by handler */
@@ -636,6 +653,35 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
 								 " AND listed_objects.object_oid IS NOT NULL\n");
 	}
 
+	/*
+	 * In REPACK mode, if the 'using_index' option was given but no index
+	 * name, filter only tables that have an index with indisclustered set.
+	 * (If an index name is given, we trust the user to pass a reasonable list
+	 * of tables.)
+	 *
+	 * XXX it may be worth printing an error if an index name is given with no
+	 * list of tables.
+	 */
+	if (vacopts->mode == MODE_REPACK &&
+		vacopts->using_index && !vacopts->indexname)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND EXISTS (SELECT 1 FROM pg_catalog.pg_index\n"
+							 "    WHERE indrelid = c.oid AND indisclustered)\n");
+	}
+
+	/*
+	 * In REPACK mode, only consider the tables that the current user has
+	 * MAINTAIN privilege on.  XXX maybe we should do this in all cases, not
+	 * just REPACK; otherwise the vacuumdb output is needlessly noisy.
+	 */
+	if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND pg_catalog.has_table_privilege(current_user, "
+							 "c.oid, 'MAINTAIN')\n");
+	}
+
 	/*
 	 * If no tables were listed, filter for the relevant relation types.  If
 	 * tables were given via --table, don't bother filtering by relation type.
@@ -874,8 +920,10 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->verbose)
 				appendPQExpBufferStr(sql, " VERBOSE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
 	}
-	else
+	else if (vacopts->mode == MODE_VACUUM)
 	{
 		appendPQExpBufferStr(sql, "VACUUM");
 
@@ -989,9 +1037,39 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->and_analyze)
 				appendPQExpBufferStr(sql, " ANALYZE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
+	}
+	else if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(sql, "REPACK");
+
+		if (vacopts->verbose)
+		{
+			appendPQExpBuffer(sql, "%sVERBOSE", sep);
+			sep = comma;
+		}
+		if (vacopts->and_analyze)
+		{
+			appendPQExpBuffer(sql, "%sANALYZE", sep);
+			sep = comma;
+		}
+
+		if (sep != paren)
+			appendPQExpBufferChar(sql, ')');
+
+		appendPQExpBuffer(sql, " %s", table);
+
+		if (vacopts->using_index)
+		{
+			appendPQExpBuffer(sql, " USING INDEX");
+			if (vacopts->indexname)
+				appendPQExpBuffer(sql, " %s", fmtIdEnc(vacopts->indexname,
+													   PQclientEncoding(conn)));
+		}
 	}
 
-	appendPQExpBuffer(sql, " %s;", table);
+	appendPQExpBufferChar(sql, ';');
 }
 
 /*
@@ -1001,8 +1079,8 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
  * Any errors during command execution are reported to stderr.
  */
 static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
+run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+				   const char *sql, bool echo, const char *table)
 {
 	bool		status;
 
@@ -1015,13 +1093,21 @@ run_vacuum_command(PGconn *conn, const char *sql, bool echo,
 	{
 		if (table)
 		{
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
 		}
 		else
 		{
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
 		}
 	}
 }
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
index 49f968b32e5..665dbaedfad 100644
--- a/src/bin/scripts/vacuuming.h
+++ b/src/bin/scripts/vacuuming.h
@@ -20,6 +20,7 @@
 typedef enum
 {
 	MODE_VACUUM,
+	MODE_REPACK,
 	MODE_ANALYZE,
 	MODE_ANALYZE_IN_STAGES
 } RunMode;
@@ -37,6 +38,8 @@ typedef struct vacuumingOptions
 	bool		and_analyze;
 	bool		full;
 	bool		freeze;
+	bool		using_index;
+	char	   *indexname;
 	bool		disable_page_skipping;
 	bool		skip_locked;
 	int			min_xid_age;
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..652542e8e65 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
+						ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..ebf004b7aa5 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,28 +56,34 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Values for PROGRESS_REPACK_COMMAND are defined as in RepackCommand.
+ *
+ * Note: Since REPACK shares code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
 
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
-
-/* Commands of PROGRESS_CLUSTER */
-#define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
-#define PROGRESS_CLUSTER_COMMAND_VACUUM_FULL	2
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
 
 /* Progress parameters for CREATE INDEX */
 /* 3, 4 and 5 reserved for "waitfor" metrics */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index ecbddd12e1b..525f6e6d6a5 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3951,18 +3951,6 @@ typedef struct AlterSystemStmt
 	VariableSetStmt *setstmt;	/* SET subcommand */
 } AlterSystemStmt;
 
-/* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
- * ----------------------
- */
-typedef struct ClusterStmt
-{
-	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
-	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
-
 /* ----------------------
  *		Vacuum and Analyze Statements
  *
@@ -3975,7 +3963,7 @@ typedef struct VacuumStmt
 	NodeTag		type;
 	List	   *options;		/* list of DefElem nodes */
 	List	   *rels;			/* list of VacuumRelation, or NIL for all */
-	bool		is_vacuumcmd;	/* true for VACUUM, false for ANALYZE */
+	bool		is_vacuumcmd;	/* true for VACUUM, false otherwise */
 } VacuumStmt;
 
 /*
@@ -3993,6 +3981,27 @@ typedef struct VacuumRelation
 	List	   *va_cols;		/* list of column names, or NIL for all */
 } VacuumRelation;
 
+/* ----------------------
+ *		Repack Statement
+ * ----------------------
+ */
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER = 1,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
+{
+	NodeTag		type;
+	RepackCommand command;		/* type of command being run */
+	VacuumRelation *relation;	/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
+	List	   *params;			/* list of DefElem nodes */
+} RepackStmt;
+
 /* ----------------------
  *		Explain Statement
  *
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 84182eaaae2..87f6c226c43 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -375,6 +375,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..277854418fa 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +550,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
@@ -665,6 +702,101 @@ SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 (4 rows)
 
 COMMIT;
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking that it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+ERROR:  ANALYZE option must be specified when a column list is provided
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should also have been
+-- processed, because plain REPACK does not depend on a clustered index.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 77e25ca029e..bd872ebd13a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1994,34 +1994,23 @@ pg_stat_progress_basebackup| SELECT pid,
             ELSE NULL::text
         END AS backup_type
    FROM pg_stat_get_progress_info('BASEBACKUP'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20);
-pg_stat_progress_cluster| SELECT s.pid,
-    s.datid,
-    d.datname,
-    s.relid,
-        CASE s.param1
-            WHEN 1 THEN 'CLUSTER'::text
-            WHEN 2 THEN 'VACUUM FULL'::text
-            ELSE NULL::text
+pg_stat_progress_cluster| SELECT pid,
+    datid,
+    datname,
+    relid,
+        CASE
+            WHEN (command = ANY (ARRAY['CLUSTER'::text, 'VACUUM FULL'::text])) THEN command
+            WHEN (repack_index_relid = (0)::oid) THEN 'VACUUM FULL'::text
+            ELSE 'CLUSTER'::text
         END AS command,
-        CASE s.param2
-            WHEN 0 THEN 'initializing'::text
-            WHEN 1 THEN 'seq scanning heap'::text
-            WHEN 2 THEN 'index scanning heap'::text
-            WHEN 3 THEN 'sorting tuples'::text
-            WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
-            ELSE NULL::text
-        END AS phase,
-    (s.param3)::oid AS cluster_index_relid,
-    s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
-   FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
-     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+    phase,
+    repack_index_relid AS cluster_index_relid,
+    heap_tuples_scanned,
+    heap_tuples_written,
+    heap_blks_total,
+    heap_blks_scanned,
+    index_rebuild_count
+   FROM pg_stat_progress_repack;
 pg_stat_progress_copy| SELECT s.pid,
     s.datid,
     d.datname,
@@ -2081,6 +2070,35 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 1 THEN 'CLUSTER'::text
+            WHEN 2 THEN 'REPACK'::text
+            WHEN 3 THEN 'VACUUM FULL'::text
+            ELSE NULL::text
+        END AS command,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..c976823a3cb 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,7 +76,6 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
-
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -229,6 +228,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
@@ -313,6 +330,57 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 COMMIT;
 
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking that it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should also have been
+-- processed, because plain REPACK does not depend on a clustered index.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 018b5919cf6..0c7dd5d09e8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2544,6 +2544,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
-- 
2.47.3
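
For illustration, the REPACK branch of prepare_vacuum_command() above
assembles command strings of the form
"REPACK (VERBOSE, ANALYZE) table USING INDEX index;".  Here is a minimal
standalone sketch of that assembly -- not the patch's code: buffer handling
is simplified to snprintf() instead of PQExpBuffer, and the fmtIdEnc()
quoting of the index name is omitted.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

static void
build_repack_command(char *buf, size_t len, bool verbose, bool analyze,
					 bool using_index, const char *indexname,
					 const char *table)
{
	const char *sep = " (";

	snprintf(buf, len, "REPACK");
	if (verbose)
	{
		snprintf(buf + strlen(buf), len - strlen(buf), "%sVERBOSE", sep);
		sep = ", ";
	}
	if (analyze)
	{
		snprintf(buf + strlen(buf), len - strlen(buf), "%sANALYZE", sep);
		sep = ", ";
	}
	/* close the option list only if at least one option was emitted */
	if (strcmp(sep, " (") != 0)
		snprintf(buf + strlen(buf), len - strlen(buf), ")");

	snprintf(buf + strlen(buf), len - strlen(buf), " %s", table);
	if (using_index)
	{
		snprintf(buf + strlen(buf), len - strlen(buf), " USING INDEX");
		if (indexname)
			snprintf(buf + strlen(buf), len - strlen(buf), " %s", indexname);
	}
	snprintf(buf + strlen(buf), len - strlen(buf), ";");
}

int
main(void)
{
	char		sql[256];

	build_repack_command(sql, sizeof(sql), true, true, true,
						 "cluster_1_pkey", "public.cluster_1");
	/* prints: REPACK (VERBOSE, ANALYZE) public.cluster_1 USING INDEX cluster_1_pkey; */
	puts(sql);
	return 0;
}

This is also the shape that 103_repackdb.pl greps for in the server log,
e.g. "REPACK (ANALYZE) public.cluster_1" for "pg_repackdb --analyze -t
cluster_1".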

Attachment: v25-0002-Refactor-index_concurrently_create_copy-for-use-.patch (text/x-diff)
From bbc3950766cadd4c90de1c627c8237aa9ecdf44a Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:31:34 +0200
Subject: [PATCH v25 2/4] Refactor index_concurrently_create_copy() for use
 with REPACK (CONCURRENTLY).

This patch moves the code to index_create_copy() and adds a "concurrently"
parameter so it can be used by REPACK (CONCURRENTLY).

With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of indexes
(including system catalog entries) *before* the actual swap, to reduce the
time for which AccessExclusiveLock needs to be held.
---
 src/backend/catalog/index.c | 36 ++++++++++++++++++++++++++++--------
 src/include/catalog/index.h |  3 +++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 08d4b8e44d7..2b33bf04883 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1290,15 +1290,31 @@ index_create(Relation heapRelation,
 /*
  * index_concurrently_create_copy
  *
- * Create concurrently an index based on the definition of the one provided by
- * caller.  The index is inserted into catalogs and needs to be built later
- * on.  This is called during concurrent reindex processing.
- *
- * "tablespaceOid" is the tablespace to use for this index.
+ * Variant of index_create_copy(), called during concurrent reindex
+ * processing.
  */
 Oid
 index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							   Oid tablespaceOid, const char *newName)
+{
+	return index_create_copy(heapRelation, oldIndexId, tablespaceOid, newName,
+							 true);
+}
+
+/*
+ * index_create_copy
+ *
+ * Create an index based on the definition of the one provided by caller.  The
+ * index is inserted into the catalogs; if "concurrently", it needs to be built later on.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ *
+ * The actual implementation of index_concurrently_create_copy(), reusable for
+ * other purposes.
+ */
+Oid
+index_create_copy(Relation heapRelation, Oid oldIndexId, Oid tablespaceOid,
+				  const char *newName, bool concurrently)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1317,6 +1333,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	List	   *indexColNames = NIL;
 	List	   *indexExprs = NIL;
 	List	   *indexPreds = NIL;
+	int			flags = 0;
 
 	indexRelation = index_open(oldIndexId, RowExclusiveLock);
 
@@ -1325,9 +1342,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 
 	/*
 	 * Concurrent build of an index with exclusion constraints is not
-	 * supported.
+	 * supported.  If !concurrently, ii_ExclusionOps is currently not needed.
 	 */
-	if (oldInfo->ii_ExclusionOps != NULL)
+	if (oldInfo->ii_ExclusionOps != NULL && concurrently)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("concurrent index creation for exclusion constraints is not supported")));
@@ -1435,6 +1452,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 		stattargets[i].isnull = isnull;
 	}
 
+	if (concurrently)
+		flags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+
 	/*
 	 * Now create the new index.
 	 *
@@ -1458,7 +1478,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							  indcoloptions->values,
 							  stattargets,
 							  reloptionsDatum,
-							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT,
+							  flags,
 							  0,
 							  true, /* allow table to be a system catalog? */
 							  false,	/* is_internal? */
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index dda95e54903..4bf909078d8 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -99,6 +99,9 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid oldIndexId,
 										   Oid tablespaceOid,
 										   const char *newName);
+extern Oid	index_create_copy(Relation heapRelation, Oid oldIndexId,
+							  Oid tablespaceOid, const char *newName,
+							  bool concurrently);
 
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
-- 
2.47.3
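
To make the intended use of the new entry point concrete, here is a
hypothetical caller -- not part of this patch; the helper name and the
get_rel_tablespace() call are mine, for illustration only:

#include "postgres.h"

#include "catalog/index.h"
#include "utils/lsyscache.h"

/*
 * Create the catalog entry for a copy of 'oldIndexId', on the same
 * tablespace, as REPACK (CONCURRENTLY) might do before swapping the files.
 *
 * Passing concurrently = false means index_create() is called with flags =
 * 0, i.e. without INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT, so
 * the copy is built right away.  That is acceptable here because the new
 * heap file is not yet visible to other sessions; it also bypasses the
 * exclusion-constraint error, which only applies to the concurrent path.
 */
static Oid
repack_create_index_copy(Relation heapRel, Oid oldIndexId,
						 const char *newName)
{
	return index_create_copy(heapRel, oldIndexId,
							 get_rel_tablespace(oldIndexId),
							 newName, false);
}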

Attachment: v25-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch (text/x-diff)
From 695d0b9d2737aafd3fd2dcd581869fad623add14 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:23:05 +0200
Subject: [PATCH v25 3/4] Move conversion of a "historic" to MVCC snapshot to a
 separate function.

The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
---
 src/backend/replication/logical/snapbuild.c | 51 +++++++++++++++++----
 src/backend/utils/time/snapmgr.c            |  3 +-
 src/include/replication/snapbuild.h         |  1 +
 src/include/utils/snapmgr.h                 |  1 +
 4 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 98ddee20929..a2f1803622c 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
-	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -485,6 +482,31 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
+	/* Convert the historic snapshot to MVCC snapshot. */
+	return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike a regular (non-historic) MVCC snapshot, the xip array of this
+ * snapshot contains not only running main transactions, but also their
+ * subtransactions.  This difference has no impact on XidInMVCCSnapshot().
+ *
+ * Pass true for 'in_place' if it is acceptable to modify the source
+ * snapshot.  Pass false to get a new instance, allocated as a single chunk
+ * of memory, with the source snapshot left intact.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+	TransactionId xid;
+	TransactionId *oldxip = snapshot->xip;
+	uint32		oldxcnt = snapshot->xcnt;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+	Snapshot	result;
+
 	/* allocate in transaction context */
 	newxip = (TransactionId *)
 		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
@@ -495,7 +517,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
 	{
 		void	   *test;
 
@@ -503,7 +525,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -520,11 +542,22 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snapshot->xcnt = newxcnt;
+	snapshot->xip = newxip;
 
-	return snap;
+	if (in_place)
+		result = snapshot;
+	else
+	{
+		result = CopySnapshot(snapshot);
+
+		/* Restore the original values so the source is intact. */
+		snapshot->xip = oldxip;
+		snapshot->xcnt = oldxcnt;
+	}
+	result->snapshot_type = SNAPSHOT_MVCC;
+
+	return result;
 }
 
 /*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 65561cc6bc3..bc7840052fe 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -212,7 +212,6 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -602,7 +601,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
+Snapshot
 CopySnapshot(Snapshot snapshot)
 {
 	Snapshot	newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..6d4d2d1814c 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..f65f83c85cd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -63,6 +63,7 @@ extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
+extern Snapshot CopySnapshot(Snapshot snapshot);
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
-- 
2.47.3
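
The effect of the new in_place flag is easiest to see from a caller's
perspective.  The fragment below is hypothetical (not part of the patch):
SnapBuildInitialSnapshot() keeps the in-place conversion it effectively
always did, whereas a caller that must keep the historic snapshot usable,
as REPACK (CONCURRENTLY) will, asks for a copy instead.

#include "postgres.h"

#include "replication/snapbuild.h"
#include "utils/snapmgr.h"

/*
 * Get a regular MVCC snapshot derived from a historic one, without
 * destroying the source.
 */
static Snapshot
mvcc_copy_of_historic(Snapshot historic)
{
	/*
	 * With in_place = false, SnapBuildMVCCFromHistoric() builds the new
	 * xip array, returns a single-chunk copy made with the now-exported
	 * CopySnapshot(), and then restores the source's xip/xcnt, so
	 * 'historic' remains usable for further decoding.
	 */
	return SnapBuildMVCCFromHistoric(historic, false);
}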

Attachment: v25-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch (text/x-diff)
From 84e8e7ae97699e3fe067e868a310ed4a1f55aabd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Sat, 30 Aug 2025 19:13:38 +0200
Subject: [PATCH v25 4/4] Add CONCURRENTLY option to REPACK command.

The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock in the beginning and keep it till the end of the processing.)

This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(Supposedly, the swapping should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.

First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.

Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)

Since the logical decoding system, during its startup, waits until all the
transactions which already have XID assigned have finished, there is a risk of
deadlock if a transaction that already changed anything in the database tries
to acquire a conflicting lock on the table REPACK CONCURRENTLY is working
on. As an example, consider transaction running CREATE INDEX command on the
table that is being REPACKed CONCURRENTLY. On the other hand, DML commands
(INSERT, UPDATE, DELETE) are not a problem as their lock does not conflict
with REPACK CONCURRENTLY.

The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before the logical decoding is setup and lock
it again afterwards. Such temporary unlocking would imply re-checking if the
table still meets all the requirements for REPACK CONCURRENTLY.

Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires an extra space for the new relation and index files
(which coexist with the old files for some time). In addition, the
CONCURRENTLY option might introduce a lag in releasing WAL segments for
archiving / recycling. This is due to the decoding of the data changes done by
applications concurrently. When copying the table contents into the new file,
we check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table has been copied.) A
background worker might be a better approach for the decoding - let's consider
implementing it in the future.

The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.

Author: Antonin Houska <ah@cybertec.at>
---
 doc/src/sgml/monitoring.sgml                  |   37 +-
 doc/src/sgml/mvcc.sgml                        |   12 +-
 doc/src/sgml/ref/repack.sgml                  |  129 +-
 src/Makefile                                  |    1 +
 src/backend/access/heap/heapam.c              |   34 +-
 src/backend/access/heap/heapam_handler.c      |  227 ++-
 src/backend/access/heap/rewriteheap.c         |    6 +-
 src/backend/access/transam/xact.c             |   11 +-
 src/backend/catalog/system_views.sql          |   19 +-
 src/backend/commands/cluster.c                | 1661 +++++++++++++++--
 src/backend/commands/matview.c                |    2 +-
 src/backend/commands/tablecmds.c              |    1 +
 src/backend/commands/vacuum.c                 |   12 +-
 src/backend/meson.build                       |    1 +
 src/backend/replication/logical/decode.c      |   83 +
 src/backend/replication/logical/snapbuild.c   |   21 +
 .../replication/pgoutput_repack/Makefile      |   32 +
 .../replication/pgoutput_repack/meson.build   |   18 +
 .../pgoutput_repack/pgoutput_repack.c         |  288 +++
 src/backend/storage/ipc/ipci.c                |    1 +
 .../storage/lmgr/generate-lwlocknames.pl      |    2 +-
 src/backend/utils/cache/relcache.c            |    1 +
 src/backend/utils/time/snapmgr.c              |    3 +-
 src/bin/psql/tab-complete.in.c                |   25 +-
 src/include/access/heapam.h                   |    9 +-
 src/include/access/heapam_xlog.h              |    2 +
 src/include/access/tableam.h                  |   10 +
 src/include/commands/cluster.h                |   90 +-
 src/include/commands/progress.h               |   17 +-
 src/include/replication/snapbuild.h           |    1 +
 src/include/storage/lockdefs.h                |    4 +-
 src/include/utils/snapmgr.h                   |    2 +
 src/test/modules/injection_points/Makefile    |    5 +-
 .../injection_points/expected/repack.out      |  113 ++
 .../modules/injection_points/logical.conf     |    1 +
 src/test/modules/injection_points/meson.build |    4 +
 .../injection_points/specs/repack.spec        |  143 ++
 src/test/regress/expected/rules.out           |   19 +-
 src/tools/pgindent/typedefs.list              |    4 +
 39 files changed, 2801 insertions(+), 250 deletions(-)
 create mode 100644 src/backend/replication/pgoutput_repack/Makefile
 create mode 100644 src/backend/replication/pgoutput_repack/meson.build
 create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
 create mode 100644 src/test/modules/injection_points/expected/repack.out
 create mode 100644 src/test/modules/injection_points/logical.conf
 create mode 100644 src/test/modules/injection_points/specs/repack.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 467e081f015..14a17541b5e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6143,14 +6143,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+       <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of heap tuples written.
+       Number of heap tuples inserted.
        This counter only advances when the phase is
        <literal>seq scanning heap</literal>,
-       <literal>index scanning heap</literal>
-       or <literal>writing new heap</literal>.
+       <literal>index scanning heap</literal>,
+       <literal>writing new heap</literal>
+       or <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples updated.
+       This counter only advances when the phase is <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples deleted.
+       This counter only advances when the phase is <literal>catch-up</literal>.
       </para></entry>
      </row>
 
@@ -6231,6 +6252,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <command>REPACK</command> is currently writing the new heap.
      </entry>
     </row>
+    <row>
+     <entry><literal>catch-up</literal></entry>
+     <entry>
+       <command>REPACK CONCURRENTLY</command> is currently processing the DML
+       commands that other transactions executed during any of the preceding
+       phases.
+     </entry>
+    </row>
     <row>
      <entry><literal>swapping relation files</literal></entry>
      <entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+    TABLE</command></link> and <command>REPACK</command> with
+    the <literal>CONCURRENTLY</literal> option, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the DDL command committed.  This will only be an
+    snapshot taken before the command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the DDL command started &mdash; any transaction that has done so
+    before the command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the DDL command until that transaction completes.
+    which would block the truncating or rewriting command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index 0e1116eae85..a6b6e6cdaa3 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -27,6 +27,7 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+    CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -53,7 +54,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    processes every table and materialized view in the current database that
    the current user has the <literal>MAINTAIN</literal> privilege on. This
    form of <command>REPACK</command> cannot be executed inside a transaction
-   block.
+   block.  Also, this form is not allowed if
+   the <literal>CONCURRENTLY</literal> option is used.
   </para>
 
   <para>
@@ -66,7 +68,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
    is acquired on it. This prevents any other database operations (both reads
    and writes) from operating on the table until the <command>REPACK</command>
-   is finished.
+   is finished. If you want to keep the table accessible during the repacking,
+   consider using the <literal>CONCURRENTLY</literal> option.
   </para>
 
   <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -194,6 +197,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>CONCURRENTLY</literal></term>
+    <listitem>
+     <para>
+      Allow other transactions to use the table while it is being repacked.
+     </para>
+
+     <para>
+      Internally, <command>REPACK</command> copies the contents of the table
+      (ignoring dead tuples) into a new file, sorted by the specified index,
+      and also creates a new file for each index. Then it swaps the old and
+      new files for the table and all the indexes, and deletes the old
+      files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+      sure that the old files do not change during the processing because the
+      changes would get lost due to the swap.
+     </para>
+
+     <para>
+      With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+      EXCLUSIVE</literal> lock is only acquired to swap the table and index
+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be pretty short. However, the time might still be noticeable if
+      too many data changes have been done to the table while
+      <command>REPACK</command> was waiting for the lock: those changes must
+      be processed just before the files are swapped, while the
+      <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+     </para>
+
+     <para>
+      Note that <command>REPACK</command> with
+      the <literal>CONCURRENTLY</literal> option does not try to order the
+      rows inserted into the table after the repacking started. Also
+      note that <command>REPACK</command> might fail to complete due to DDL
+      commands executed on the table by other transactions during the
+      repacking.
+     </para>
+
+     <note>
+      <para>
+       In addition to the temporary space requirements explained in
+       <xref linkend="sql-repack-notes-on-resources"/>,
+       the <literal>CONCURRENTLY</literal> option can increase the use of
+       temporary space a bit further. The reason is that other transactions
+       can perform DML operations which cannot be applied to the new file
+       until <command>REPACK</command> has copied all the tuples from the old
+       file. Thus the changes made to the old file during the copying are
+       also stored separately in a temporary file, so they can eventually be
+       applied to the new file.
+      </para>
+
+      <para>
+       Furthermore, the data changes performed during the copying are
+       extracted from the <link linkend="wal">write-ahead log</link> (WAL),
+       and this extraction (decoding) only takes place once a certain amount
+       of WAL has been written. Therefore, WAL removal can be delayed by this
+       threshold. Currently the threshold is equal to the value of
+       the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+       configuration parameter.
+      </para>
+     </note>
+
+     <para>
+      The <literal>CONCURRENTLY</literal> option cannot be used in the
+      following cases:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+          The table is <literal>UNLOGGED</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is partitioned.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is a system catalog or a <acronym>TOAST</acronym> table.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REPACK</command> is executed inside a transaction block.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+          configuration parameter is less than <literal>logical</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+         configuration parameter does not allow for creation of an additional
+         replication slot.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <warning>
+      <para>
+       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+       details.
+      </para>
+     </warning>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
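
As a quick illustration of the documented behavior above, a hypothetical
session (table and index names invented; the option syntax follows this
patch's synopsis):

    -- prerequisites for CONCURRENTLY
    SHOW wal_level;              -- must be 'logical'
    SHOW max_replication_slots;  -- one free slot is needed

    -- online repack, ordered by an index
    REPACK (VERBOSE, CONCURRENTLY) orders USING INDEX orders_pkey;

    -- online repack in physical order
    REPACK (CONCURRENTLY) orders;
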
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
 	interfaces \
 	backend/replication/libpqwalreceiver \
 	backend/replication/pgoutput \
+	backend/replication/pgoutput_repack \
 	fe_utils \
 	bin \
 	pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 36fee9c994e..3dfad00f0aa 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
-								  bool all_visible_cleared, bool new_all_visible_cleared);
+								  bool all_visible_cleared, bool new_all_visible_cleared,
+								  bool wal_logical);
 #ifdef USE_ASSERT_CHECKING
 static void check_lock_if_inplace_updateable_rel(Relation relation,
 												 const ItemPointerData *otid,
@@ -2788,7 +2789,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 TM_Result
 heap_delete(Relation relation, const ItemPointerData *tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -3035,7 +3036,8 @@ l1:
 	 * Compute replica identity tuple before entering the critical section so
 	 * we don't PANIC upon a memory allocation failure.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+	old_key_tuple = wal_logical ?
+		ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
 
 	/*
 	 * If this is the first possibly-multixact-able operation in the current
@@ -3125,6 +3127,15 @@ l1:
 				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
 		}
 
+		/*
+		 * Unlike UPDATE, DELETE is decoded even if there is no old key, so
+		 * clearing both XLH_DELETE_CONTAINS_OLD_TUPLE and
+		 * XLH_DELETE_CONTAINS_OLD_KEY does not help. Thus we need an extra
+		 * flag. TODO: Consider not decoding tuples without the old tuple/key
+		 * instead.
+		 */
+		if (!wal_logical)
+			xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
 		XLogBeginInsert();
 		XLogRegisterData(&xlrec, SizeOfHeapDelete);
 
@@ -3217,7 +3228,8 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false /* changingPart */ );
+						 &tmfd, false,	/* changingPart */
+						 true /* wal_logical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3258,7 +3270,7 @@ TM_Result
 heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
-			TU_UpdateIndexes *update_indexes)
+			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -4151,7 +4163,8 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 wal_logical);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4509,7 +4522,8 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, &lockmode, update_indexes);
+						 &tmfd, &lockmode, update_indexes,
+						 true /* wal_logical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -8849,7 +8863,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool wal_logical)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -8860,7 +8875,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
 	Page		page = BufferGetPage(newbuf);
-	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln) &&
+		wal_logical;
 	bool		init;
 	int			bufflags;
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 79f9de5d760..d03084768e0 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+					   true);
 }
 
 
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
-						 tmfd, lockmode, update_indexes);
+						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	/*
@@ -685,13 +687,15 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
+								 Snapshot snapshot,
+								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
 								 double *tups_recently_dead)
 {
-	RewriteState rwstate;
+	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
 	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
+	bool		concurrent = snapshot != NULL;
+	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	values = (Datum *) palloc(natts * sizeof(Datum));
 	isnull = (bool *) palloc(natts * sizeof(bool));
 
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
-								 *multi_cutoff);
+	/*
+	 * Initialize the rewrite operation.
+	 */
+	if (!concurrent)
+		rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+									 *xid_cutoff, *multi_cutoff);
 
 
 	/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 *
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+	 * the snapshot passed by the caller.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									snapshot ? snapshot : SnapshotAny,
+									NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap,
+									snapshot ? snapshot : SnapshotAny,
+									0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -785,6 +801,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		HeapTuple	tuple;
 		Buffer		buf;
 		bool		isdead;
+		HTSV_Result vis;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -837,70 +854,84 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
 		buf = hslot->buffer;
 
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+		/*
+		 * Regarding CONCURRENTLY, see the comments on MVCC snapshot above.
+		 */
+		if (!concurrent)
 		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				*tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
-
-				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				*tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
-
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-
-		if (isdead)
-		{
-			*tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
+			switch ((vis = HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf)))
 			{
-				/* A previous recently-dead tuple is now known dead */
-				*tups_vacuumed += 1;
-				*tups_recently_dead -= 1;
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead */
+					isdead = true;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+					*tups_recently_dead += 1;
+					/* fall through */
+				case HEAPTUPLE_LIVE:
+					/* Live or recently dead, must copy it */
+					isdead = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+					/*
+					 * As long as we hold exclusive lock on the relation,
+					 * normally the only way to see this is if it was inserted
+					 * earlier in our own transaction.  However, it can happen
+					 * in system catalogs, since we tend to release write lock
+					 * before commit there. Also, there's no exclusive lock
+					 * during concurrent processing. Give a warning if neither
+					 * case applies; but in any case we had better copy it.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+						elog(WARNING, "concurrent insert in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as live */
+					isdead = false;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+					/*
+					 * Similar situation to INSERT_IN_PROGRESS case.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+						elog(WARNING, "concurrent delete in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as recently dead */
+					*tups_recently_dead += 1;
+					isdead = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					isdead = false; /* keep compiler quiet */
+					break;
 			}
-			continue;
+
+			if (isdead)
+			{
+				*tups_vacuumed += 1;
+				/* heap rewrite module still needs to see it... */
+				if (rewrite_heap_dead_tuple(rwstate, tuple))
+				{
+					/* A previous recently-dead tuple is now known dead */
+					*tups_vacuumed += 1;
+					*tups_recently_dead -= 1;
+				}
+
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				continue;
+			}
+
+			/*
+			 * In the concurrent case, we have a copy of the tuple, so we
+			 * need not worry whether the source tuple will be deleted or
+			 * updated after we release the lock.
+			 */
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 		}
 
 		*num_tuples += 1;
@@ -919,7 +950,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		{
 			const int	ct_index[] = {
 				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
-				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_INSERTED
 			};
 			int64		ct_val[2];
 
@@ -934,6 +965,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
+
+		/*
+		 * Process the WAL produced by the load, as well as by other
+		 * transactions, so that the replication slot can advance and WAL does
+		 * not pile up. Use wal_segment_size as a threshold so that we do not
+		 * incur the decoding overhead too often.
+		 *
+		 * Of course, we must not apply the changes until the initial load has
+		 * completed.
+		 *
+		 * Note that our insertions into the new table should not be decoded
+		 * as we (intentionally) do not write the logical decoding specific
+		 * information to WAL.
+		 */
+		if (concurrent)
+		{
+			XLogRecPtr	end_of_wal;
+
+			end_of_wal = GetFlushRecPtr(NULL);
+			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+			{
+				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+				end_of_wal_prev = end_of_wal;
+			}
+		}
 	}
 
 	if (indexScan != NULL)
@@ -977,7 +1033,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
 		}
 
@@ -985,7 +1041,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	}
 
 	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	if (rwstate)
+		end_heap_rewrite(rwstate);
 
 	/* Clean up */
 	pfree(values);
@@ -2376,6 +2433,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
  * SET WITHOUT OIDS.
  *
  * So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate=NULL, insert the tuple with heap_insert() instead of using the
+ * rewrite module - in that case we still need to deform/form the tuple.
+ * TODO: Shouldn't we rename the function, as it might not do any rewrite?
  */
 static void
 reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2459,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
 
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	if (rwstate)
+		/* The heap rewrite module does the rest */
+		rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	else
+	{
+		/*
+		 * Insert tuple when processing REPACK CONCURRENTLY.
+		 *
+		 * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+		 * difficult to do the same in the catch-up phase (as the logical
+		 * decoding does not provide us with sufficient visibility
+		 * information). Thus we must use heap_insert() both during the
+		 * catch-up and here.
+		 *
+		 * The following is like simple_heap_insert() except that we pass the
+		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+		 * the relation files, it drops this relation, so no logical
+		 * replication subscription should need the data.
+		 */
+		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+					HEAP_INSERT_NO_LOGICAL, NULL);
+	}
 
 	heap_freetuple(copiedTuple);
 }
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 8061e92f044..dca849fb226 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		int			options = HEAP_INSERT_SKIP_FSM;
 
 		/*
-		 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
-		 * for the TOAST table are not logically decoded.  The main heap is
-		 * WAL-logged as XLOG FPI records, which are not logically decoded.
+		 * While rewriting the heap for REPACK, make sure data for the TOAST
+		 * table are not logically decoded.  The main heap is WAL-logged as
+		 * XLOG FPI records, which are not logically decoded.
 		 */
 		options |= HEAP_INSERT_NO_LOGICAL;
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 2cf3d4e92b7..061ee8c9f87 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -215,6 +215,7 @@ typedef struct TransactionStateData
 	bool		parallelChildXact;	/* is any parent transaction parallel? */
 	bool		chain;			/* start a new block after this one */
 	bool		topXidLogged;	/* for a subxact: is top-level XID logged? */
+	bool		internal;		/* for a subxact: launched internally? */
 	struct TransactionStateData *parent;	/* back link to parent */
 } TransactionStateData;
 
@@ -4735,6 +4736,7 @@ BeginInternalSubTransaction(const char *name)
 			/* Normal subtransaction start */
 			PushTransaction();
 			s = CurrentTransactionState;	/* changed by push */
+			s->internal = true;
 
 			/*
 			 * Savepoint names, like the TransactionState block itself, live
@@ -5251,7 +5253,13 @@ AbortSubTransaction(void)
 	LWLockReleaseAll();
 
 	pgstat_report_wait_end();
-	pgstat_progress_end_command();
+
+	/*
+	 * An internal subtransaction might be used by a user command, in which
+	 * case the command outlives the subtransaction.
+	 */
+	if (!s->internal)
+		pgstat_progress_end_command();
 
 	pgaio_error_cleanup();
 
@@ -5468,6 +5476,7 @@ PushTransaction(void)
 	s->parallelModeLevel = 0;
 	s->parallelChildXact = (p->parallelModeLevel != 0 || p->parallelChildXact);
 	s->topXidLogged = false;
+	s->internal = false;
 
 	CurrentTransactionState = s;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1ad30116631..81cd3760a4d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1284,16 +1284,19 @@ CREATE VIEW pg_stat_progress_repack AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      WHEN 5 THEN 'catch-up'
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
-        S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param5 AS heap_tuples_inserted,
+        S.param6 AS heap_tuples_updated,
+        S.param7 AS heap_tuples_deleted,
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
@@ -1311,7 +1314,7 @@ CREATE VIEW pg_stat_progress_cluster AS
         phase,
         repack_index_relid AS cluster_index_relid,
         heap_tuples_scanned,
-        heap_tuples_written,
+        heap_tuples_inserted + heap_tuples_updated AS heap_tuples_written,
         heap_blks_total,
         heap_blks_scanned,
         index_rebuild_count
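
To see the renamed and added progress columns in action, something like
this should work once the patch is applied (hypothetical session):

    SELECT pid, phase,
           heap_tuples_scanned,
           heap_tuples_inserted,
           heap_tuples_updated,
           heap_tuples_deleted
    FROM pg_stat_progress_repack;

Existing monitoring queries against pg_stat_progress_cluster keep working,
with heap_tuples_written now derived as heap_tuples_inserted +
heap_tuples_updated.
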
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 18bee52a4ee..04b6b905009 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -25,6 +25,10 @@
 #include "access/toast_internals.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -32,6 +36,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
@@ -39,15 +44,21 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
+#include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -67,13 +78,44 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * The locators are used to avoid logical decoding of data that we do not need
+ * for our table.
+ */
+RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
+RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+
+	Relation	ident_index;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
-								Oid indexOid, Oid userid, int options);
-static void rebuild_relation(RepackCommand cmd,
-							 Relation OldHeap, Relation index, bool verbose);
+								Oid indexOid, Oid userid, LOCKMODE lmode,
+								int options);
+static void check_repack_concurrently_requirements(Relation rel);
+static void rebuild_relation(RepackCommand cmd, Relation OldHeap, Relation index,
+							 bool verbose, bool concurrently);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							bool verbose, bool *pSwapToastByContent,
-							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							bool verbose,
+							bool *pSwapToastByContent,
+							TransactionId *pFreezeXid,
+							MultiXactId *pCutoffMulti);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -81,13 +123,62 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 											  MemoryContext permcxt);
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid,
+													  const char *slotname,
+													  TupleDesc tupdesc);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+									 Relation rel, ScanKey key, int nkeys,
+									 IndexInsertState *iistate);
+static void apply_concurrent_insert(Relation rel, ConcurrentChange *change,
+									HeapTuple tup, IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+									HeapTuple tup_target,
+									ConcurrentChange *change,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+									ConcurrentChange *change);
+static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
+								   HeapTuple tup_key,
+								   IndexInsertState *iistate,
+								   TupleTableSlot *ident_slot,
+								   IndexScanDesc *scan_p);
+static void process_concurrent_changes(LogicalDecodingContext *ctx,
+									   XLogRecPtr end_of_wal,
+									   Relation rel_dst,
+									   Relation rel_src,
+									   ScanKey ident_key,
+									   int ident_key_nentries,
+									   IndexInsertState *iistate);
+static IndexInsertState *get_index_insert_state(Relation relation,
+												Oid ident_index_id);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+								  int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+											   Relation cl_index,
+											   LogicalDecodingContext *ctx,
+											   bool swap_toast_by_content,
+											   TransactionId frozenXid,
+											   MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
+										LOCKMODE lockmode,
+										bool isTopLevel,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
 static const char *RepackCommandAsString(RepackCommand cmd);
 
 
+#define REPL_PLUGIN_NAME   "pgoutput_repack"
+
 /*
  * The repack code allows for processing multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -117,6 +208,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	ClusterParams params = {0};
 	Relation	rel = NULL;
 	MemoryContext repack_context;
+	LOCKMODE	lockmode;
 	List	   *rtcs;
 
 	/* Parse option list */
@@ -127,6 +219,16 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		else if (strcmp(opt->defname, "analyze") == 0 ||
 				 strcmp(opt->defname, "analyse") == 0)
 			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
+		else if (strcmp(opt->defname, "concurrently") == 0 &&
+				 defGetBoolean(opt))
+		{
+			if (stmt->command != REPACK_COMMAND_REPACK)
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("CONCURRENTLY option not supported for %s",
+							   RepackCommandAsString(stmt->command)));
+			params.options |= CLUOPT_CONCURRENT;
+		}
 		else
 			ereport(ERROR,
 					errcode(ERRCODE_SYNTAX_ERROR),
@@ -136,13 +238,25 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 					parser_errposition(pstate, opt->location));
 	}
 
+	/*
+	 * Determine the lock mode expected by cluster_rel().
+	 *
+	 * In the exclusive case, we obtain AccessExclusiveLock right away to
+	 * avoid lock-upgrade hazard in the single-transaction case. In the
+	 * CONCURRENTLY case, the AccessExclusiveLock will only be used at the end
+	 * of processing, supposedly for a very short time. Until then, we'll have
+	 * to unlock the relation temporarily, so there's no lock-upgrade hazard.
+	 */
+	lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+		AccessExclusiveLock : ShareUpdateExclusiveLock;
+
 	/*
 	 * If a single relation is specified, process it and we're done ... unless
 	 * the relation is a partitioned table, in which case we fall through.
 	 */
 	if (stmt->relation != NULL)
 	{
-		rel = process_single_relation(stmt, &params);
+		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
 			return;				/* all done */
 	}
@@ -157,10 +271,29 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 				errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
 
 	/*
-	 * By here, we know we are in a multi-table situation.  In order to avoid
-	 * holding locks for too long, we want to process each table in its own
-	 * transaction.  This forces us to disallow running inside a user
-	 * transaction block.
+	 * By here, we know we are in a multi-table situation.
+	 *
+	 * Concurrent processing is currently considered rather special (e.g. in
+	 * terms of resources consumed), so it is not performed in bulk.
+	 */
+	if (params.options & CLUOPT_CONCURRENT)
+	{
+		if (rel != NULL)
+		{
+			Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+					errhint("Consider running the command for individual partitions."));
+		}
+		else
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY requires explicit table name"));
+	}
+
+	/*
+	 * In order to avoid holding locks for too long, we want to process each
+	 * table in its own transaction.  This forces us to disallow running
+	 * inside a user transaction block.
 	 */
 	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
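In other words, these invocations are expected to be rejected (a sketch
with invented names; error wording as in this hunk and in the
option-parsing hunk above):

    REPACK (CONCURRENTLY);                 -- no table name
    REPACK (CONCURRENTLY) parted_table;    -- partitioned table
    CLUSTER (CONCURRENTLY) tab USING idx;  -- CONCURRENTLY is REPACK-only
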
@@ -243,7 +376,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		 * Open the target table, coping with the case where it has been
 		 * dropped.
 		 */
-		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		rel = try_table_open(rtc->tableOid, lockmode);
 		if (rel == NULL)
 		{
 			CommitTransactionCommand();
@@ -254,7 +387,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		/* Process this table */
-		cluster_rel(stmt->command, rel, rtc->indexOid, &params);
+		cluster_rel(stmt->command, rel, rtc->indexOid, &params, isTopLevel);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
@@ -283,22 +416,54 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
  * If indexOid is InvalidOid, the table will be rewritten in physical order
  * instead of index order.
  *
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
  * 'cmd' indicates which command is being executed, to be used for error
  * messages.
  */
 void
 cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-			ClusterParams *params)
+			ClusterParams *params, bool isTopLevel)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Relation	index;
+	LOCKMODE	lmode;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
-	Relation	index;
+	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+	/*
+	 * Determine the lock mode the caller is expected to hold. The lock mode
+	 * is
+	 * AccessExclusiveLock for normal processing and ShareUpdateExclusiveLock
+	 * for concurrent processing (so that SELECT, INSERT, UPDATE and DELETE
+	 * commands work, but cluster_rel() cannot be called concurrently for the
+	 * same relation).
+	 */
+	lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+
+	/* There are specific requirements on concurrent processing. */
+	if (concurrent)
+	{
+		/*
+		 * Make sure we have no XID assigned, otherwise the call of
+		 * setup_logical_decoding() can cause a deadlock.
+		 *
+		 * The existence of a transaction block does not actually imply that
+		 * an XID has been assigned, but it very likely has. We might want to
+		 * check the result of GetCurrentTransactionIdIfAny() instead, but
+		 * that would be less clear from the user's perspective.
+		 */
+		PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
+
+		check_repack_concurrently_requirements(OldHeap);
+	}
 
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
@@ -324,10 +489,13 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 * If this is a single-transaction CLUSTER, we can skip these tests. We
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
+	 *
+	 * XXX move [some of] these comments to where the RECHECK flag is
+	 * determined?
 	 */
 	if (recheck &&
 		!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
-							 params->options))
+							 lmode, params->options))
 		goto out;
 
 	/*
@@ -342,6 +510,12 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 				errmsg("cannot run %s on a shared catalog",
 					   RepackCommandAsString(cmd)));
 
+	/*
+	 * The CONCURRENTLY case should have been rejected earlier because it does
+	 * not support system catalogs.
+	 */
+	Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
@@ -362,7 +536,7 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	if (OidIsValid(indexOid))
 	{
 		/* verify the index is good and lock it */
-		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, lmode);
 		/* also open it */
 		index = index_open(indexOid, NoLock);
 	}
@@ -397,7 +571,9 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
 		!RelationIsPopulated(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		if (index)
+			index_close(index, lmode);
+		relation_close(OldHeap, lmode);
 		goto out;
 	}
 
@@ -410,11 +586,35 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 * invalid, because we move tuples around.  Promote them to relation
 	 * locks.  Predicate locks on indexes will be promoted when they are
 	 * reindexed.
+	 *
+	 * During concurrent processing, the heap as well as its indexes stay in
+	 * operation, so we postpone this step until they are locked using
+	 * AccessExclusiveLock near the end of the processing.
 	 */
-	TransferPredicateLocksToHeapRelation(OldHeap);
+	if (!concurrent)
+		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(cmd, OldHeap, index, verbose);
+	PG_TRY();
+	{
+		/*
+		 * For concurrent processing, make sure that our logical decoding
+		 * ignores data changes of other tables than the one we are
+		 * processing.
+		 */
+		if (concurrent)
+			begin_concurrent_repack(OldHeap);
+
+		rebuild_relation(cmd, OldHeap, index, /* save_userid, */
+						 verbose, concurrent);
+	}
+	PG_FINALLY();
+	{
+		if (concurrent)
+			end_concurrent_repack();
+	}
+	PG_END_TRY();
+
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -433,14 +633,14 @@ out:
  */
 static bool
 cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-					Oid userid, int options)
+					Oid userid, LOCKMODE lmode, int options)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 
 	/* Check that the user still has privileges for the relation */
 	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -454,7 +654,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -465,7 +665,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		 */
 		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 
@@ -476,7 +676,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
 			!get_index_isclustered(indexOid))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 	}
@@ -488,7 +688,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
  * Verify that the specified heap and index are valid to cluster on
  *
  * Side effect: obtains lock on the index.  The caller may
- * in some cases already have AccessExclusiveLock on the table, but
+ * in some cases already have a lock of the same strength on the table, but
  * not in all cases so we can't rely on the table-level lock for
  * protection here.
  */
@@ -618,18 +818,87 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 }
 
 /*
- * rebuild_relation: rebuild an existing relation in index or physical order
- *
- * OldHeap: table to rebuild.
- * index: index to cluster by, or NULL to rewrite in physical order.
- *
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * Check if the CONCURRENTLY option is legal for the relation.
  */
 static void
-rebuild_relation(RepackCommand cmd,
-				 Relation OldHeap, Relation index, bool verbose)
+check_repack_concurrently_requirements(Relation rel)
+{
+	char		relpersistence,
+				replident;
+	Oid			ident_idx;
+
+	/* Data changes in system relations are not logically decoded. */
+	if (IsCatalogRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+	/*
+	 * reorderbuffer.c does not seem to handle processing of TOAST relation
+	 * alone.
+	 */
+	if (IsToastRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+	relpersistence = rel->rd_rel->relpersistence;
+	if (relpersistence != RELPERSISTENCE_PERMANENT)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+	/* With NOTHING, WAL does not contain the old tuple. */
+	replident = rel->rd_rel->relreplident;
+	if (replident == REPLICA_IDENTITY_NOTHING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("Relation \"%s\" has insufficient replica identity.",
+						 RelationGetRelationName(rel))));
+
+	/*
+	 * Identity index is not set if the replica identity is FULL, but PK might
+	 * exist in such a case.
+	 */
+	ident_idx = RelationGetReplicaIndex(rel);
+	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+		ident_idx = rel->rd_pkindex;
+	if (!OidIsValid(ident_idx))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 (errhint("Relation \"%s\" has no identity index.",
+						  RelationGetRelationName(rel)))));
+}
+
+
+/*
+ * rebuild_relation: rebuild an existing relation in index or physical order
+ *
+ * OldHeap: table to rebuild.  See cluster_rel() for comments on the required
+ * lock strength.
+ *
+ * index: index to cluster by, or NULL to rewrite in physical order.
+ *
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock.
+ * (The function handles the lock upgrade if 'concurrent' is true.)
+ */
+static void
+rebuild_relation(RepackCommand cmd, Relation OldHeap, Relation index,
+				 bool verbose, bool concurrent)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
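
For reviewers trying this out, the identity requirement checked above can
be satisfied in the usual ways (hypothetical names; a primary key alone is
sufficient, since the code falls back to rd_pkindex):

    -- no PK: designate a unique index as the replica identity first
    ALTER TABLE events REPLICA IDENTITY USING INDEX events_id_uniq;
    REPACK (CONCURRENTLY) events;

Note that REPLICA IDENTITY FULL without a primary key is still rejected,
because applying the concurrent changes needs an identity index to locate
old row versions.
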
@@ -637,13 +906,55 @@ rebuild_relation(RepackCommand cmd,
 	Oid			OIDNewHeap;
 	Relation	NewHeap;
 	char		relpersistence;
-	bool		is_system_catalog;
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
+	NameData	slotname;
+	LogicalDecodingContext *ctx = NULL;
+	Snapshot	snapshot = NULL;
+#if USE_ASSERT_CHECKING
+	LOCKMODE	lmode;
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
-		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+	Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+	Assert(index == NULL || CheckRelationLockedByMe(index, lmode, false));
+#endif
+
+	if (concurrent)
+	{
+		TupleDesc	tupdesc;
+
+		/*
+		 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+		 * should never fire.
+		 */
+		Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+
+		/*
+		 * A single backend should not execute multiple REPACK commands at a
+		 * time, so use PID to make the slot unique.
+		 */
+		snprintf(NameStr(slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+		tupdesc = CreateTupleDescCopy(RelationGetDescr(OldHeap));
+
+		/*
+		 * Prepare to capture the concurrent data changes.
+		 *
+		 * Note that this call waits for all transactions with an XID already
+		 * assigned to finish. If any of those transactions is waiting for a
+		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
+		 * it runs CREATE INDEX), we can end up in a deadlock. It's not clear
+		 * whether avoiding this risk is worth unlocking/relocking the table
+		 * (and its clustering index) and checking again whether it's still
+		 * eligible for REPACK CONCURRENTLY.
+		 */
+		ctx = setup_logical_decoding(tableOid, NameStr(slotname), tupdesc);
+
+		snapshot = SnapBuildInitialSnapshotForRepack(ctx->snapshot_builder);
+		PushActiveSnapshot(snapshot);
+	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
 	if (index != NULL)
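
While the command runs, its replication slot (named after the backend PID,
per the snprintf above) can be observed with something like this sketch:

    SELECT slot_name, plugin, restart_lsn
    FROM pg_replication_slots
    WHERE slot_name LIKE 'repack_%';
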
@@ -651,7 +962,6 @@ rebuild_relation(RepackCommand cmd,
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
 
 	/*
 	 * Create the transient table that will receive the re-ordered data.
@@ -667,30 +977,67 @@ rebuild_relation(RepackCommand cmd,
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, ctx, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
+	/* The historic snapshot won't be needed anymore. */
+	if (snapshot)
+		PopActiveSnapshot();
 
-	/* Close relcache entries, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
-	if (index)
-		index_close(index, NoLock);
+	if (concurrent)
+	{
+		/*
+		 * Push a snapshot that we will use to find old versions of rows when
+		 * processing concurrent UPDATE and DELETE commands. (That snapshot
+		 * should also be used by index expressions.)
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
 
-	/*
-	 * Close the new relation so it can be dropped as soon as the storage is
-	 * swapped. The relation is not visible to others, so no need to unlock it
-	 * explicitly.
-	 */
-	table_close(NewHeap, NoLock);
+		/*
+		 * Make sure we can find the tuples just inserted when applying DML
+		 * commands on top of those.
+		 */
+		CommandCounterIncrement();
+		UpdateActiveSnapshotCommandId();
 
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+										   ctx, swap_toast_by_content,
+										   frozenXid, cutoffMulti);
+		PopActiveSnapshot();
+
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
+
+		/* Done with decoding. */
+		cleanup_logical_decoding(ctx);
+		ReplicationSlotRelease();
+		ReplicationSlotDrop(NameStr(slotname), false);
+	}
+	else
+	{
+		bool		is_system_catalog = IsSystemRelation(OldHeap);
+
+		/* Close relcache entries, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
+		if (index)
+			index_close(index, NoLock);
+
+		/*
+		 * Close the new relation so it can be dropped as soon as the storage
+		 * is swapped. The relation is not visible to others, so no need to
+		 * unlock it explicitly.
+		 */
+		table_close(NewHeap, NoLock);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+	}
 }
 
 
@@ -825,15 +1172,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Pass
+ * iff concurrent processing is required.
+ *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
-				bool *pSwapToastByContent, TransactionId *pFreezeXid,
-				MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -851,6 +1202,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	PGRUsage	ru0;
 	char	   *nspname;
 
+	bool		concurrent = snapshot != NULL;
+
 	pg_rusage_init(&ru0);
 
 	/* Store a copy of the namespace name for logging purposes */
@@ -953,8 +1306,48 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * provided, else plain seqscan.
 	 */
 	if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
+	{
+		ResourceOwner oldowner = NULL;
+		ResourceOwner resowner = NULL;
+
+		/*
+		 * In the CONCURRENT case, use a dedicated resource owner so that we
+		 * don't leave behind any additional locks that we cannot easily
+		 * release.
+		 */
+		if (concurrent)
+		{
+			Assert(CheckRelationLockedByMe(OldHeap, ShareUpdateExclusiveLock,
+										   false));
+			Assert(CheckRelationLockedByMe(OldIndex, ShareUpdateExclusiveLock,
+										   false));
+
+			resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "plan_cluster_use_sort");
+			oldowner = CurrentResourceOwner;
+			CurrentResourceOwner = resowner;
+		}
+
 		use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
 										 RelationGetRelid(OldIndex));
+
+		if (concurrent)
+		{
+			CurrentResourceOwner = oldowner;
+
+			/*
+			 * We are primarily concerned about locks, but if the planner
+			 * happened to allocate any other resources, we should release
+			 * them too because we're going to delete the whole resowner.
+			 */
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_AFTER_LOCKS,
+								 false, false);
+			ResourceOwnerDelete(resowner);
+		}
+	}
 	else
 		use_sort = false;
 
@@ -983,7 +1376,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+									cutoffs.OldestXmin, snapshot,
+									decoding_ctx,
+									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
 									&tups_recently_dead);
@@ -992,7 +1387,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	*pFreezeXid = cutoffs.FreezeLimit;
 	*pCutoffMulti = cutoffs.MultiXactCutoff;
 
-	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+	/*
+	 * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+	 * In the CONCURRENTLY case, we need to set it again before applying the
+	 * concurrent changes.
+	 */
 	NewHeap->rd_toastoid = InvalidOid;
 
 	num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1450,14 +1849,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool swap_toast_by_content,
 				 bool check_constraints,
 				 bool is_internal,
+				 bool reindex,
 				 TransactionId frozenXid,
 				 MultiXactId cutoffMulti,
 				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
-	int			reindex_flags;
-	ReindexParams reindex_params = {0};
 	int			i;
 
 	/* Report that we are now swapping relation files */
@@ -1483,39 +1881,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (is_system_catalog)
 		CacheInvalidateCatalog(OIDOldHeap);
 
-	/*
-	 * Rebuild each index on the relation (but not the toast table, which is
-	 * all-new at this point).  It is important to do this before the DROP
-	 * step because if we are processing a system catalog that will be used
-	 * during DROP, we want to have its indexes available.  There is no
-	 * advantage to the other order anyway because this is all transactional,
-	 * so no chance to reclaim disk space before commit.  We do not need a
-	 * final CommandCounterIncrement() because reindex_relation does it.
-	 *
-	 * Note: because index_build is called via reindex_relation, it will never
-	 * set indcheckxmin true for the indexes.  This is OK even though in some
-	 * sense we are building new indexes rather than rebuilding existing ones,
-	 * because the new heap won't contain any HOT chains at all, let alone
-	 * broken ones, so it can't be necessary to set indcheckxmin.
-	 */
-	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
-	if (check_constraints)
-		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+	if (reindex)
+	{
+		int			reindex_flags;
+		ReindexParams reindex_params = {0};
 
-	/*
-	 * Ensure that the indexes have the same persistence as the parent
-	 * relation.
-	 */
-	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
-	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+		/*
+		 * Rebuild each index on the relation (but not the toast table, which
+		 * is all-new at this point).  It is important to do this before the
+		 * DROP step because if we are processing a system catalog that will
+		 * be used during DROP, we want to have its indexes available.  There
+		 * is no advantage to the other order anyway because this is all
+		 * transactional, so no chance to reclaim disk space before commit. We
+		 * do not need a final CommandCounterIncrement() because
+		 * reindex_relation does it.
+		 *
+		 * Note: because index_build is called via reindex_relation, it will
+		 * never set indcheckxmin true for the indexes.  This is OK even
+		 * though in some sense we are building new indexes rather than
+		 * rebuilding existing ones, because the new heap won't contain any
+		 * HOT chains at all, let alone broken ones, so it can't be necessary
+		 * to set indcheckxmin.
+		 */
+		reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+		if (check_constraints)
+			reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
 
-	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
-								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+		/*
+		 * Ensure that the indexes have the same persistence as the parent
+		 * relation.
+		 */
+		if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+		else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
-	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+		/* Report that we are now reindexing relations */
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+		reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+	}
 
 	/* Report that we are now doing clean up */
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1858,7 +2264,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * case, if an index name is given, it's up to the caller to resolve it.
  */
 static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+						ClusterParams *params)
 {
 	Relation	rel;
 	Oid			tableOid;
@@ -1867,13 +2274,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
-	/*
-	 * Find, lock, and check permissions on the table.  We obtain
-	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-	 * single-transaction case.
-	 */
+	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
-										AccessExclusiveLock,
+										lockmode,
 										0,
 										RangeVarCallbackMaintainsTable,
 										NULL);
@@ -1905,13 +2308,14 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 		return rel;
 	else
 	{
-		Oid			indexOid;
+		Oid			indexOid = InvalidOid;
 
 		indexOid = determine_clustered_index(rel, stmt->usingindex,
 											 stmt->indexname);
 		if (OidIsValid(indexOid))
-			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
-		cluster_rel(stmt->command, rel, indexOid, params);
+			check_index_is_clusterable(rel, indexOid, lockmode);
+
+		cluster_rel(stmt->command, rel, indexOid, params, isTopLevel);
 
 		/* Do an analyze, if requested */
 		if (params->options & CLUOPT_ANALYZE)
@@ -1994,3 +2398,1048 @@ RepackCommandAsString(RepackCommand cmd)
 	}
 	return "???";
 }
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts, to set up logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that the TOAST table needs no attention here, as it's not scanned
+ * using a historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+	Oid			toastrelid;
+
+	/* Avoid logical decoding of other relations by this backend. */
+	repacked_rel_locator = rel->rd_locator;
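+
+	/*
+	 * (heap_decode() compares this locator, and the TOAST one below, against
+	 * the block reference of each heap WAL record, and skips records that
+	 * belong to unrelated relations.)
+	 */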
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
+
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+}
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+	/*
+	 * Restore normal function of (future) logical decoding for this backend.
+	 */
+	repacked_rel_locator.relNumber = InvalidOid;
+	repacked_rel_toast_locator.relNumber = InvalidOid;
+}
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends make while we copy the
+ * existing data into the temporary table), nor persisted (it's easier to
+ * handle a crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid, const char *slotname, TupleDesc tupdesc)
+{
+	LogicalDecodingContext *ctx;
+	RepackDecodingState *dstate;
+
+	/*
+	 * Check if we can use logical decoding.
+	 */
+	CheckSlotPermissions();
+	CheckLogicalDecodingRequirements();
+
+	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+	ReplicationSlotCreate(slotname, true, RS_TEMPORARY, false, false, false);
+
+	/*
+	 * Neither prepare_write nor do_write callback nor update_progress is
+	 * useful for us.
+	 */
+	ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+									NIL,
+									true,	/* need_full_snapshot */
+									InvalidXLogRecPtr,
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+	/*
+	 * We have no control over the setting of fast_forward, so at least
+	 * check it.
+	 */
+	Assert(!ctx->fast_forward);
+
+	DecodingContextFindStartpoint(ctx);
+
+	/* Some WAL records should have been read. */
+	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
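+	/*
+	 * Remember the current WAL segment: repack_decode_concurrent_changes()
+	 * confirms the receive location whenever a segment boundary is crossed,
+	 * which lets catalog_xmin advance.
+	 */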
+	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+				wal_segment_size);
+
+	/*
+	 * Setup structures to store decoded changes.
+	 */
+	dstate = palloc0(sizeof(RepackDecodingState));
+	dstate->relid = relid;
+	dstate->tstore = tuplestore_begin_heap(false, false,
+										   maintenance_work_mem);
+
+	dstate->tupdesc = tupdesc;
+
+	/* Initialize the descriptor to store the changes ... */
+	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+	/* ... as well as the corresponding slot. */
+	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+											  &TTSOpsMinimalTuple);
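+
+	/*
+	 * (Each change is stored in the tuplestore as a single bytea value; see
+	 * store_change() in pgoutput_repack.c.)
+	 */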
+
+	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "logical decoding");
+
+	ctx->output_writer_private = dstate;
+	return ctx;
+}
+
+/*
+ * Retrieve tuple from ConcurrentChange structure.
+ *
+ * The input data starts with the structure but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+	HeapTupleData tup_data;
+	HeapTuple	result;
+	char	   *src;
+
+	/*
+	 * Ensure alignment before accessing the fields. (This is why we can't use
+	 * heap_copytuple() instead of this function.)
+	 */
+	src = change + offsetof(ConcurrentChange, tup_data);
+	memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+	memcpy(result, &tup_data, sizeof(HeapTupleData));
+	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+	src = change + SizeOfConcurrentChange;
+	memcpy(result->t_data, src, result->t_len);
+
+	return result;
+}
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+								 XLogRecPtr end_of_wal)
+{
+	RepackDecodingState *dstate;
+	ResourceOwner resowner_old;
+
+	/*
+	 * Invalidate the "present" cache before moving to "(recent) history".
+	 */
+	InvalidateSystemCaches();
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+	resowner_old = CurrentResourceOwner;
+	CurrentResourceOwner = dstate->resowner;
+
+	PG_TRY();
+	{
+		while (ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			XLogSegNo	segno_new;
+			char	   *errm = NULL;
+			XLogRecPtr	end_lsn;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			/*
+			 * If a WAL segment boundary has been crossed, inform the decoding
+			 * system that the catalog_xmin can advance. (We could confirm
+			 * more often, but filling a single WAL segment should not take
+			 * much time.)
+			 */
+			end_lsn = ctx->reader->EndRecPtr;
+			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+			if (segno_new != repack_current_segment)
+			{
+				LogicalConfirmReceivedLocation(end_lsn);
+				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+					 LSN_FORMAT_ARGS(end_lsn));
+				repack_current_segment = segno_new;
+			}
+
+			CHECK_FOR_INTERRUPTS();
+		}
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
+/*
+ * Apply changes that happened during the initial load.
+ *
+ * Scan key is passed by caller, so it does not have to be constructed
+ * multiple times. Key entries have all fields initialized, except for
+ * sk_argument.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
+						 ScanKey key, int nkeys, IndexInsertState *iistate)
+{
+	TupleTableSlot *index_slot,
+			   *ident_slot;
+	HeapTuple	tup_old = NULL;
+
+	if (dstate->nchanges == 0)
+		return;
+
+	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+	/* A slot to fetch tuples from identity index. */
+	ident_slot = table_slot_create(rel, NULL);
+
+	while (tuplestore_gettupleslot(dstate->tstore, true, false,
+								   dstate->tsslot))
+	{
+		bool		shouldFree;
+		HeapTuple	tup_change,
+					tup,
+					tup_exist;
+		char	   *change_raw,
+				   *src;
+		ConcurrentChange change;
+		bool		isnull[1];
+		Datum		values[1];
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get the change from the single-column tuple. */
+		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+		Assert(!isnull[0]);
+
+		/* Make sure we access aligned data. */
+		change_raw = (char *) DatumGetByteaP(values[0]);
+		src = (char *) VARDATA(change_raw);
+		memcpy(&change, src, SizeOfConcurrentChange);
+
+		/* TRUNCATE change contains no tuple, so process it separately. */
+		if (change.kind == CHANGE_TRUNCATE)
+		{
+			/*
+			 * All the things that ExecuteTruncateGuts() does (such as firing
+			 * triggers or handling the DROP_CASCADE behavior) should have
+			 * taken place on the source relation. Thus we only do the actual
+			 * truncation of the new relation (and its indexes).
+			 */
+			heap_truncate_one_rel(rel);
+
+			pfree(tup_change);
+			continue;
+		}
+
+		/*
+		 * Extract the tuple from the change. The tuple is copied here because
+		 * it might be assigned to 'tup_old', in which case it needs to
+		 * survive into the next iteration.
+		 */
+		tup = get_changed_tuple(src);
+
+		if (change.kind == CHANGE_UPDATE_OLD)
+		{
+			Assert(tup_old == NULL);
+			tup_old = tup;
+		}
+		else if (change.kind == CHANGE_INSERT)
+		{
+			Assert(tup_old == NULL);
+
+			apply_concurrent_insert(rel, &change, tup, iistate, index_slot);
+
+			pfree(tup);
+		}
+		else if (change.kind == CHANGE_UPDATE_NEW ||
+				 change.kind == CHANGE_DELETE)
+		{
+			IndexScanDesc ind_scan = NULL;
+			HeapTuple	tup_key;
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+			{
+				tup_key = tup_old != NULL ? tup_old : tup;
+			}
+			else
+			{
+				Assert(tup_old == NULL);
+				tup_key = tup;
+			}
+
+			/*
+			 * Find the tuple to be updated or deleted.
+			 */
+			tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+										  iistate, ident_slot, &ind_scan);
+			if (tup_exist == NULL)
+				elog(ERROR, "Failed to find target tuple");
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+				apply_concurrent_update(rel, tup, tup_exist, &change, iistate,
+										index_slot);
+			else
+				apply_concurrent_delete(rel, tup_exist, &change);
+
+			if (tup_old != NULL)
+			{
+				pfree(tup_old);
+				tup_old = NULL;
+			}
+
+			pfree(tup);
+			index_endscan(ind_scan);
+		}
+		else
+			elog(ERROR, "Unrecognized kind of change: %d", change.kind);
+
+		/*
+		 * If a change was applied now, increment CID for next writes and
+		 * update the snapshot so it sees the changes we've applied so far.
+		 */
+		if (change.kind != CHANGE_UPDATE_OLD)
+		{
+			CommandCounterIncrement();
+			UpdateActiveSnapshotCommandId();
+		}
+
+		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+		Assert(shouldFree);
+		pfree(tup_change);
+	}
+
+	tuplestore_clear(dstate->tstore);
+	dstate->nchanges = 0;
+
+	/* Cleanup. */
+	ExecDropSingleTupleTableSlot(index_slot);
+	ExecDropSingleTupleTableSlot(ident_slot);
+}
+
+static void
+apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
+						IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+	List	   *recheck;
+
+	/*
+	 * Like simple_heap_insert(), but make sure that the INSERT is not
+	 * logically decoded - see reform_and_rewrite_tuple() for more
+	 * information.
+	 */
+	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+				NULL);
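+
+	/*
+	 * (HEAP_INSERT_NO_LOGICAL keeps XLH_INSERT_CONTAINS_NEW_TUPLE from being
+	 * set, so heap_decode() skips these inserts when this backend decodes
+	 * the concurrent changes; see the XLOG_HEAP_INSERT case there.)
+	 */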
+
+	/*
+	 * Update indexes.
+	 *
+	 * Functions in the index expressions may need an active snapshot, so the
+	 * caller is expected to have set one.
+	 */
+	ExecStoreHeapTuple(tup, index_slot, false);
+	recheck = ExecInsertIndexTuples(iistate->rri,
+									index_slot,
+									iistate->estate,
+									false,	/* update */
+									false,	/* noDupErr */
+									NULL,	/* specConflict */
+									NIL,	/* arbiterIndexes */
+									false	/* onlySummarizing */
+		);
+
+	/*
+	 * If recheck is required, it must have been performed on the source
+	 * relation by now. (All the logical changes we process here are already
+	 * committed.)
+	 */
+	list_free(recheck);
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						ConcurrentChange *change, IndexInsertState *iistate,
+						TupleTableSlot *index_slot)
+{
+	LockTupleMode lockmode;
+	TM_FailureData tmfd;
+	TU_UpdateIndexes update_indexes;
+	TM_Result	res;
+	List	   *recheck;
+
+	/*
+	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
+	 * here.)
+	 *
+	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_update(rel, &tup_target->t_self, tup,
+					  GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd, &lockmode, &update_indexes,
+					  false /* wal_logical */ );
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+	ExecStoreHeapTuple(tup, index_slot, false);
+
+	if (update_indexes != TU_None)
+	{
+		recheck = ExecInsertIndexTuples(iistate->rri,
+										index_slot,
+										iistate->estate,
+										true,	/* update */
+										false,	/* noDupErr */
+										NULL,	/* specConflict */
+										NIL,	/* arbiterIndexes */
+		/* onlySummarizing */
+										update_indexes == TU_Summarizing);
+		list_free(recheck);
+	}
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+						ConcurrentChange *change)
+{
+	TM_Result	res;
+	TM_FailureData tmfd;
+
+	/*
+	 * Delete tuple from the new heap.
+	 *
+	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+					  InvalidSnapshot, false,
+					  &tmfd,
+					  false,	/* no wait - only we are doing changes */
+					  false /* wal_logical */ );
+
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'key' is a pre-initialized scan key, into which the function will put the
+ * key values.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * On exit, '*scan_p' contains the scan descriptor used. The caller must
+ * close it once the returned tuple is no longer needed.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
+				  IndexInsertState *iistate,
+				  TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
+{
+	IndexScanDesc scan;
+	Form_pg_index ident_form;
+	int2vector *ident_indkey;
+	HeapTuple	result = NULL;
+
+	/* XXX no instrumentation for now */
+	scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+						   NULL, nkeys, 0);
+	*scan_p = scan;
+	index_rescan(scan, key, nkeys, NULL, 0);
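+
+	/*
+	 * index_rescan() has copied 'key' into scan->keyData, so the argument
+	 * values extracted from the incoming tuple below are filled directly
+	 * into that copy.
+	 */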
+
+	/* Info needed to retrieve key values from heap tuple. */
+	ident_form = iistate->ident_index->rd_index;
+	ident_indkey = &ident_form->indkey;
+
+	/* Use the incoming tuple to finalize the scan key. */
+	for (int i = 0; i < scan->numberOfKeys; i++)
+	{
+		ScanKey		entry;
+		bool		isnull;
+		int16		attno_heap;
+
+		entry = &scan->keyData[i];
+		attno_heap = ident_indkey->values[i];
+		entry->sk_argument = heap_getattr(tup_key,
+										  attno_heap,
+										  rel->rd_att,
+										  &isnull);
+		Assert(!isnull);
+	}
+	if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+	{
+		bool		shouldFree;
+
+		result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+		/* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+		Assert(!shouldFree);
+	}
+
+	return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ *
+ * Pass rel_src iff its reltoastrelid is needed.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *ctx, XLogRecPtr end_of_wal,
+						   Relation rel_dst, Relation rel_src, ScanKey ident_key,
+						   int ident_key_nentries, IndexInsertState *iistate)
+{
+	RepackDecodingState *dstate;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_CATCH_UP);
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	repack_decode_concurrent_changes(ctx, end_of_wal);
+
+	if (dstate->nchanges == 0)
+		return;
+
+	PG_TRY();
+	{
+		/*
+		 * Make sure that TOAST values can eventually be accessed via the old
+		 * relation - see comment in copy_table_data().
+		 */
+		if (rel_src)
+			rel_dst->rd_toastoid = rel_src->rd_rel->reltoastrelid;
+
+		apply_concurrent_changes(dstate, rel_dst, ident_key,
+								 ident_key_nentries, iistate);
+	}
+	PG_FINALLY();
+	{
+		if (rel_src)
+			rel_dst->rd_toastoid = InvalidOid;
+	}
+	PG_END_TRY();
+}
+
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id)
+{
+	EState	   *estate;
+	int			i;
+	IndexInsertState *result;
+
+	result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+	estate = CreateExecutorState();
+
+	result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+	InitResultRelInfo(result->rri, relation, 0, 0, 0);
+	ExecOpenIndices(result->rri, false);
+
+	/*
+	 * Find the relcache entry of the identity index so that we spend no extra
+	 * effort to open / close it.
+	 */
+	for (i = 0; i < result->rri->ri_NumIndices; i++)
+	{
+		Relation	ind_rel;
+
+		ind_rel = result->rri->ri_IndexRelationDescs[i];
+		if (ind_rel->rd_id == ident_index_id)
+			result->ident_index = ind_rel;
+	}
+	if (result->ident_index == NULL)
+		elog(ERROR, "Failed to open identity index");
+
+	/* Only initialize fields needed by ExecInsertIndexTuples(). */
+	result->estate = estate;
+
+	return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+	Relation	ident_idx_rel;
+	Form_pg_index ident_idx;
+	int			n,
+				i;
+	ScanKey		result;
+
+	Assert(OidIsValid(ident_idx_oid));
+	ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+	ident_idx = ident_idx_rel->rd_index;
+	n = ident_idx->indnatts;
+	result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+	for (i = 0; i < n; i++)
+	{
+		ScanKey		entry;
+		int16		relattno;
+		Form_pg_attribute att;
+		Oid			opfamily,
+					opcintype,
+					opno,
+					opcode;
+
+		entry = &result[i];
+		relattno = ident_idx->indkey.values[i];
+		if (relattno >= 1)
+		{
+			TupleDesc	desc;
+
+			desc = rel_src->rd_att;
+			att = TupleDescAttr(desc, relattno - 1);
+		}
+		else
+			elog(ERROR, "Unexpected attribute number %d in index", relattno);
+
+		opfamily = ident_idx_rel->rd_opfamily[i];
+		opcintype = ident_idx_rel->rd_opcintype[i];
+		opno = get_opfamily_member(opfamily, opcintype, opcintype,
+								   BTEqualStrategyNumber);
+
+		if (!OidIsValid(opno))
+			elog(ERROR, "Failed to find = operator for type %u", opcintype);
+
+		opcode = get_opcode(opno);
+		if (!OidIsValid(opcode))
+			elog(ERROR, "Failed to find = operator for operator %u", opno);
+
+		/* Initialize everything but argument. */
+		ScanKeyInit(entry,
+					i + 1,
+					BTEqualStrategyNumber, opcode,
+					(Datum) NULL);
+		entry->sk_collation = att->attcollation;
+	}
+	index_close(ident_idx_rel, AccessShareLock);
+
+	*nentries = n;
+	return result;
+}
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+	ExecCloseIndices(iistate->rri);
+	FreeExecutorState(iistate->estate);
+	pfree(iistate->rri);
+	pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	ExecDropSingleTupleTableSlot(dstate->tsslot);
+	FreeTupleDesc(dstate->tupdesc_change);
+	FreeTupleDesc(dstate->tupdesc);
+	tuplestore_end(dstate->tstore);
+
+	FreeDecodingContext(ctx);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+								   Relation cl_index,
+								   LogicalDecodingContext *ctx,
+								   bool swap_toast_by_content,
+								   TransactionId frozenXid,
+								   MultiXactId cutoffMulti)
+{
+	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+	List	   *ind_oids_new;
+	Oid			old_table_oid = RelationGetRelid(OldHeap);
+	Oid			new_table_oid = RelationGetRelid(NewHeap);
+	List	   *ind_oids_old = RelationGetIndexList(OldHeap);
+	ListCell   *lc,
+			   *lc2;
+	char		relpersistence;
+	bool		is_system_catalog;
+	Oid			ident_idx_old,
+				ident_idx_new;
+	IndexInsertState *iistate;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+	XLogRecPtr	wal_insert_ptr,
+				end_of_wal;
+	char		dummy_rec_data = '\0';
+	Relation   *ind_refs,
+			   *ind_refs_p;
+	int			nind;
+
+	/* Like in cluster_rel(). */
+	lockmode_old = ShareUpdateExclusiveLock;
+	Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+	Assert(cl_index == NULL ||
+		   CheckRelationLockedByMe(cl_index, lockmode_old, false));
+	/* This is expected from the caller. */
+	Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+	ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+	/*
+	 * Unlike the exclusive case, we build new indexes for the new relation
+	 * rather than swapping the storage and reindexing the old relation. The
+	 * point is that the index build can take some time, so we do it before we
+	 * get AccessExclusiveLock on the old heap and therefore we cannot swap
+	 * the heap storage yet.
+	 *
+	 * index_create() will lock the new indexes using AccessExclusiveLock - no
+	 * need to change that.
+	 *
+	 * We assume that ShareUpdateExclusiveLock on the table prevents anyone
+	 * from dropping the existing indexes or adding new ones, so the lists of
+	 * old and new indexes should match at the swap time. On the other hand we
+	 * do not block ALTER INDEX commands that do not require table lock (e.g.
+	 * ALTER INDEX ... SET ...).
+	 *
+	 * XXX Should we check at the end of our work whether another transaction
+	 * executed such a command, and issue a NOTICE that we might have
+	 * discarded its effects? (For example, if someone changes a storage
+	 * parameter after we have created the new index, the new value of that
+	 * parameter is lost.)
+	 * Alternatively, we can lock all the indexes now in a mode that blocks
+	 * all the ALTER INDEX commands (ShareUpdateExclusiveLock ?), and keep
+	 * them locked till the end of the transactions. That might increase the
+	 * risk of deadlock during the lock upgrade below, however SELECT / DML
+	 * queries should not be involved in such a deadlock.
+	 */
+	ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+	/*
+	 * Processing shouldn't start without a valid identity index.
+	 */
+	Assert(OidIsValid(ident_idx_old));
+
+	/* Find "identity index" on the new relation. */
+	ident_idx_new = InvalidOid;
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+
+		if (ident_idx_old == ind_old)
+		{
+			ident_idx_new = ind_new;
+			break;
+		}
+	}
+	/* Should not happen, given our lock on the old relation. */
+	if (!OidIsValid(ident_idx_new))
+		ereport(ERROR,
+				(errmsg("identity index missing on the new relation")));
+
+	/* Executor state to update indexes. */
+	iistate = get_index_insert_state(NewHeap, ident_idx_new);
+
+	/*
+	 * Build scan key that we'll use to look for rows to be updated / deleted
+	 * during logical decoding.
+	 */
+	ident_key = build_identity_key(ident_idx_new, OldHeap, &ident_key_nentries);
+
+	/*
+	 * During testing, wait for another backend to perform concurrent data
+	 * changes which we will process below.
+	 */
+	INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+	/*
+	 * Flush all WAL records inserted so far (possibly except for the last
+	 * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+	 * we need to flush while holding exclusive lock on the source table.
+	 */
+	wal_insert_ptr = GetInsertRecPtr();
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply concurrent changes first time, to minimize the time we need to
+	 * hold AccessExclusiveLock. (Quite some amount of WAL could have been
+	 * written during the data copying and index creation.)
+	 */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/*
+	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+	 * is one), all its indexes, so that we can swap the files.
+	 *
+	 * Before that, unlock the index temporarily to avoid deadlock in case
+	 * another transaction is trying to lock it while holding the lock on the
+	 * table.
+	 */
+	if (cl_index)
+	{
+		index_close(cl_index, ShareUpdateExclusiveLock);
+		cl_index = NULL;
+	}
+	/* The TOAST relation must be locked too; do that before the table. */
+	if (OldHeap->rd_rel->reltoastrelid)
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+	/* Finally lock the table */
+	LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+	/*
+	 * Lock all indexes now, not only the clustering one: all indexes need to
+	 * have their files swapped. While doing that, store their relation
+	 * references in an array, to handle predicate locks below.
+	 */
+	ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+	nind = 0;
+	foreach(lc, ind_oids_old)
+	{
+		Oid			ind_oid;
+		Relation	index;
+
+		ind_oid = lfirst_oid(lc);
+		index = index_open(ind_oid, AccessExclusiveLock);
+
+		/*
+		 * TODO 1) Do we need to check if ALTER INDEX was executed since the
+		 * new index was created in build_new_indexes()? 2) Specifically for
+		 * the clustering index, should check_index_is_clusterable() be called
+		 * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+		 * table probably blocks all commands that affect the result of
+		 * check_index_is_clusterable().)
+		 */
+		*ind_refs_p = index;
+		ind_refs_p++;
+		nind++;
+	}
+
+	/*
+	 * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+	 * lock is needed to swap the files.
+	 */
+	if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+	/*
+	 * Tuples and pages of the old heap will be gone, but the heap will stay.
+	 */
+	TransferPredicateLocksToHeapRelation(OldHeap);
+	/* The same for indexes. */
+	for (int i = 0; i < nind; i++)
+	{
+		Relation	index = ind_refs[i];
+
+		TransferPredicateLocksToHeapRelation(index);
+
+		/*
+		 * References to indexes on the old relation are not needed anymore,
+		 * however locks stay till the end of the transaction.
+		 */
+		index_close(index, NoLock);
+	}
+	pfree(ind_refs);
+
+	/*
+	 * Flush anything we see in WAL, to make sure that all changes committed
+	 * while we were waiting for the exclusive lock are available for
+	 * decoding. This should not be necessary if all backends had
+	 * synchronous_commit set, but we can't rely on this setting.
+	 *
+	 * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+	 * position, and GetLastImportantRecPtr() points at the start of the last
+	 * record rather than at the end. Thus the simplest way to determine the
+	 * insert position is to insert a dummy record and use its LSN.
+	 *
+	 * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+	 * last record (plus the total size of all the page headers the record
+	 * spans)?
+	 */
+	XLogBeginInsert();
+	XLogRegisterData(&dummy_rec_data, 1);
+	wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/* Apply the concurrent changes again. */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/* Remember info about rel before closing OldHeap */
+	relpersistence = OldHeap->rd_rel->relpersistence;
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+	/*
+	 * ShareUpdateExclusiveLock alone should have prevented others from
+	 * creating or dropping indexes (even with the CONCURRENTLY option), so we
+	 * do not need to re-check that the lists match.
+	 */
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+		Oid			mapped_tables[4];
+
+		/* Zero out possible results from swap_relation_files() */
+		memset(mapped_tables, 0, sizeof(mapped_tables));
+
+		swap_relation_files(ind_old, ind_new,
+							(old_table_oid == RelationRelationId),
+							swap_toast_by_content,
+							true,	/* is_internal */
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Concurrent processing is not supported for system relations, so
+		 * there should be no mapped tables.
+		 */
+		for (int i = 0; i < 4; i++)
+			Assert(mapped_tables[i] == 0);
+#endif
+	}
+
+	/* The new indexes must be visible for deletion. */
+	CommandCounterIncrement();
+
+	/* Close the old heap but keep lock until transaction commit. */
+	table_close(OldHeap, NoLock);
+	/* Close the new heap. (We didn't have to open its indexes.) */
+	table_close(NewHeap, NoLock);
+
+	/* Clean up what we don't need anymore. (And close the identity index.) */
+	pfree(ident_key);
+	free_index_insert_state(iistate);
+
+	/*
+	 * Swap the relations and their TOAST relations and TOAST indexes. This
+	 * also drops the new relation and its indexes.
+	 *
+	 * (System catalogs are currently not supported.)
+	 */
+	Assert(!is_system_catalog);
+	finish_heap_swap(old_table_oid, new_table_oid,
+					 is_system_catalog,
+					 swap_toast_by_content,
+					 false,	/* check_constraints */
+					 true,	/* is_internal */
+					 false, /* reindex */
+					 frozenXid, cutoffMulti,
+					 relpersistence);
+}
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. The order of items matches, so the two lists can be used to swap
+ * the index storage.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+	ListCell   *lc;
+	List	   *result = NIL;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+	foreach(lc, OldIndexes)
+	{
+		Oid			ind_oid,
+					ind_oid_new;
+		char	   *newName;
+		Relation	ind;
+
+		ind_oid = lfirst_oid(lc);
+		ind = index_open(ind_oid, AccessShareLock);
+
+		newName = ChooseRelationName(get_rel_name(ind_oid),
+									 NULL,
+									 "repacknew",
+									 get_rel_namespace(ind->rd_index->indrelid),
+									 false);
+		ind_oid_new = index_create_copy(NewHeap, ind_oid,
+										ind->rd_rel->reltablespace, newName,
+										false);
+		result = lappend_oid(result, ind_oid_new);
+
+		index_close(ind, AccessShareLock);
+	}
+
+	return result;
+}
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ef7c0d624f1..d12e2d0f2e0 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -892,7 +892,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
-	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
 					 RecentXmin, ReadNextMultiXactId(), relpersistence);
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5fd8b51312c..bd7b8db72ba 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5992,6 +5992,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 			finish_heap_swap(tab->relid, OIDNewHeap,
 							 false, false, true,
 							 !OidIsValid(tab->newTableSpace),
+							 true,
 							 RecentXmin,
 							 ReadNextMultiXactId(),
 							 persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 62207ceff7e..cd5b5049d2b 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -125,7 +125,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 							  TransactionId lastSaneFrozenXid,
 							  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-					   BufferAccessStrategy bstrategy);
+					   BufferAccessStrategy bstrategy, bool isTopLevel);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -626,7 +626,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 
 			if (params.options & VACOPT_VACUUM)
 			{
-				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+								isTopLevel))
 					continue;
 			}
 
@@ -1990,7 +1991,7 @@ vac_truncate_clog(TransactionId frozenXID,
  */
 static bool
 vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-		   BufferAccessStrategy bstrategy)
+		   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	LOCKMODE	lmode;
 	Relation	rel;
@@ -2281,7 +2282,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 
 			/* VACUUM FULL is a variant of REPACK; see cluster.c */
 			cluster_rel(REPACK_COMMAND_VACUUMFULL, rel, InvalidOid,
-						&cluster_params);
+						&cluster_params, isTopLevel);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
@@ -2324,7 +2325,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
-		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+				   isTopLevel);
 	}
 
 	/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index b831a541652..5c148131217 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
 subdir('snowball')
 subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..5dc4ae58ffe 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
 #include "catalog/pg_control.h"
+#include "commands/cluster.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/message.h"
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(buf->record);
 	SnapBuild  *builder = ctx->snapshot_builder;
 
+	/*
+	 * If the change is not intended for logical decoding, do not even
+	 * establish transaction for it - REPACK CONCURRENTLY is the typical use
+	 * case.
+	 *
+	 * First, check if REPACK CONCURRENTLY is being performed by this backend.
+	 * If so, only decode data changes of the table that it is processing, and
+	 * the changes of its TOAST relation.
+	 *
+	 * (The TOAST locator should not be set unless the main one is.)
+	 */
+	Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+		   OidIsValid(repacked_rel_locator.relNumber));
+
+	if (OidIsValid(repacked_rel_locator.relNumber))
+	{
+		XLogReaderState *r = buf->record;
+		RelFileLocator locator;
+
+		/* Not all records contain the block. */
+		if (XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL) &&
+			!RelFileLocatorEquals(locator, repacked_rel_locator) &&
+			(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+			 !RelFileLocatorEquals(locator, repacked_rel_toast_locator)))
+			return;
+	}
+
+	/*
+	 * Second, skip records which do not contain sufficient information for
+	 * the decoding.
+	 *
+	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+	 * when doing changes in the new table. Those changes are of no use to any
+	 * other user (such as a logical replication subscription) because the new
+	 * table will eventually be dropped (after REPACK CONCURRENTLY has
+	 * assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This does happen when 1) raw_heap_insert marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, or 2) REPACK CONCURRENTLY
+				 * replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}
+
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a2f1803622c..d69229905a2 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,6 +486,27 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	return SnapBuildMVCCFromHistoric(snap, true);
 }
 
+/*
+ * Build an MVCC snapshot for the initial data load performed by REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot setup (so
+ * we do not set MyProc->xmin). XXX Do we need to add any further restrictions?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+	Snapshot	snap;
+
+	Assert(builder->state == SNAPBUILD_CONSISTENT);
+	Assert(builder->building_full_snapshot);
+
+	snap = SnapBuildBuildSnapshot(builder);
+	return SnapBuildMVCCFromHistoric(snap, false);
+}
+
 /*
  * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
  *
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+#    src/backend/replication/pgoutput_repack/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	$(WIN32RES) \
+	pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+	rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+  'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+  pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pgoutput_repack',
+    '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+  pgoutput_repack_sources,
+  kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..687fbbc59bb
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,288 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ *		Logical Replication output plugin for REPACK command
+ *
+ * Copyright (c) 2012-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+						   OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+							 ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+							  ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+						  Relation rel, ReorderBufferChange *change);
+static void plugin_truncate(struct LogicalDecodingContext *ctx,
+							ReorderBufferTXN *txn, int nrelations,
+							Relation relations[],
+							ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+						 ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+	cb->startup_cb = plugin_startup;
+	cb->begin_cb = plugin_begin_txn;
+	cb->change_cb = plugin_change;
+	cb->truncate_cb = plugin_truncate;
+	cb->commit_cb = plugin_commit_txn;
+	cb->shutdown_cb = plugin_shutdown;
+}
+
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+			   bool is_init)
+{
+	ctx->output_plugin_private = NULL;
+
+	/* Probably unnecessary, as we don't use the SQL interface ... */
+	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+	if (ctx->output_plugin_options != NIL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("This plugin does not expect any options")));
+	}
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * As we don't release the slot while processing a particular table, there's
+ * no room for the SQL interface, even for debugging purposes. Therefore we
+ * need neither OutputPluginPrepareWrite() nor OutputPluginWrite() in the plugin
+ * callbacks. (Although we might want to write custom callbacks, this API
+ * seems to be unnecessarily generic for our purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				  XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+			  Relation relation, ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Only interested in one particular relation. */
+	if (relation->rd_id != dstate->relid)
+		return;
+
+	/* Decode entry depending on its type */
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			{
+				HeapTuple	newtuple;
+
+				newtuple = change->data.tp.newtuple;
+
+				/*
+				 * Identity checks in the main function should have made this
+				 * impossible.
+				 */
+				if (newtuple == NULL)
+					elog(ERROR, "Incomplete insert info.");
+
+				store_change(ctx, CHANGE_INSERT, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			{
+				HeapTuple	oldtuple,
+							newtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+				newtuple = change->data.tp.newtuple;
+
+				if (newtuple == NULL)
+					elog(ERROR, "Incomplete update info.");
+
+				if (oldtuple != NULL)
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			{
+				HeapTuple	oldtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+
+				if (oldtuple == NULL)
+					elog(ERROR, "Incomplete delete info.");
+
+				store_change(ctx, CHANGE_DELETE, oldtuple);
+			}
+			break;
+		default:
+			/* Should not come here */
+			Assert(false);
+			break;
+	}
+}
+
+static void
+plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				int nrelations, Relation relations[],
+				ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+	int			i;
+	Relation	relation = NULL;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Find the relation we are processing. */
+	for (i = 0; i < nrelations; i++)
+	{
+		relation = relations[i];
+
+		if (RelationGetRelid(relation) == dstate->relid)
+			break;
+	}
+
+	/* Is this a truncation of some other relation? */
+	if (i == nrelations)
+		return;
+
+	store_change(ctx, CHANGE_TRUNCATE, NULL);
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+			 HeapTuple tuple)
+{
+	RepackDecodingState *dstate;
+	char	   *change_raw;
+	ConcurrentChange change;
+	bool		flattened = false;
+	Size		size;
+	Datum		values[1];
+	bool		isnull[1];
+	char	   *dst,
+			   *dst_start;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	size = MAXALIGN(VARHDRSZ) + SizeOfConcurrentChange;
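+
+	/*
+	 * The serialized change is a single bytea value: a ConcurrentChange
+	 * header followed (except for TRUNCATE) by the raw tuple data. See
+	 * apply_concurrent_changes() and get_changed_tuple() for the
+	 * deserialization counterpart.
+	 */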
+
+	if (tuple)
+	{
+		/*
+		 * ReorderBufferCommit() stores the TOAST chunks in its private memory
+		 * context and frees them after having called apply_change().
+		 * Therefore we need flat copy (including TOAST) that we eventually
+		 * copy into the memory context which is available to
+		 * decode_concurrent_changes().
+		 */
+		if (HeapTupleHasExternal(tuple))
+		{
+			/*
+			 * toast_flatten_tuple_to_datum() might be more convenient but we
+			 * don't want the decompression it does.
+			 */
+			tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+			flattened = true;
+		}
+
+		size += tuple->t_len;
+	}
+
+	/* XXX Isn't there any function / macro to do this? */
+	if (size >= 0x3FFFFFFF)
+		elog(ERROR, "Change is too big.");
+
+	/* Construct the change. */
+	change_raw = (char *) palloc0(size);
+	SET_VARSIZE(change_raw, size);
+
+	/*
+	 * Since the varlena alignment might not be sufficient for the structure,
+	 * set the fields in a local instance and remember where it should
+	 * eventually be copied.
+	 */
+	change.kind = kind;
+	dst_start = (char *) VARDATA(change_raw);
+
+	/* No other information is needed for TRUNCATE. */
+	if (change.kind == CHANGE_TRUNCATE)
+		goto store;
+
+	/*
+	 * Copy the tuple.
+	 *
+	 * CAUTION: change->tup_data.t_data must be fixed on retrieval!
+	 */
+	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+	dst = dst_start + SizeOfConcurrentChange;
+	memcpy(dst, tuple->t_data, tuple->t_len);
+
+	/* The data has been copied. */
+	if (flattened)
+		pfree(tuple);
+
+store:
+	/* Copy the structure so it can be stored. */
+	memcpy(dst_start, &change, SizeOfConcurrentChange);
+
+	/* Store as tuple of 1 bytea column. */
+	values[0] = PointerGetDatum(change_raw);
+	isnull[0] = false;
+	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+						 values, isnull);
+
+	/* Accounting. */
+	dstate->nchanges++;
+
+	/* Cleanup. */
+	pfree(change_raw);
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..e9ddf39500c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
 #include "access/xlogprefetcher.h"
 #include "access/xlogrecovery.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index cd3e43c448a..519f3953638 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
 
 die
   "$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
-  . " missing from lwlocklist.h"
+  . "missing from lwlocklist.h"
   if $lwlock_count < scalar @wait_event_lwlocks;
 
 die
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 915d0bc9084..d0f01d85bd3 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -64,6 +64,7 @@
 #include "catalog/pg_type.h"
 #include "catalog/schemapg.h"
 #include "catalog/storage.h"
+#include "commands/cluster.h"
 #include "commands/policy.h"
 #include "commands/publicationcmds.h"
 #include "commands/trigger.h"
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index bc7840052fe..6d46537cbe8 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -657,7 +656,7 @@ CopySnapshot(Snapshot snapshot)
  * FreeSnapshot
  *		Free the memory associated with a snapshot.
  */
-static void
+void
 FreeSnapshot(Snapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index b1fe7703296..8d245939e72 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -5009,18 +5009,27 @@ match_previous_words(int pattern_id,
 	}
 
 /* REPACK */
-	else if (Matches("REPACK"))
+	else if (Matches("REPACK") || Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"CONCURRENTLY");
+	else if (Matches("REPACK", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	else if (Matches("REPACK", "(*)"))
+	else if (Matches("REPACK", "(*)", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	/* If we have REPACK <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", MatchAnyExcept("(")))
+	/* If we have REPACK [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(|CONCURRENTLY")) ||
+			 Matches("REPACK", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", "(*)", MatchAny))
+	/* If we have REPACK (*) [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAnyExcept("CONCURRENTLY")) ||
+			 Matches("REPACK", "(*)", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK <sth> USING, then add the index as well */
-	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+
+	/*
+	 * Complete ... [ (*) ] [ CONCURRENTLY ] <sth> USING INDEX, with a list of
+	 * indexes for <sth>.
+	 */
+	else if (TailMatches(MatchAnyExcept("(|CONCURRENTLY"), "USING", "INDEX"))
 	{
 		set_completion_reference(prev3_wd);
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 909db73b7bb..d5c2e8a9a25 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -318,14 +318,15 @@ extern void heap_multi_insert(Relation relation, TupleTableSlot **slots,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
-							 TM_FailureData *tmfd, bool changingPart);
+							 TM_FailureData *tmfd, bool changingPart,
+							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
 extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
 extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
 							 HeapTuple newtup,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes);
+							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
@@ -406,6 +407,10 @@ extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
 								 uint16 infomask, TransactionId xid);
+extern bool HeapTupleMVCCInserted(HeapTuple htup, Snapshot snapshot,
+								  Buffer buffer);
+extern bool HeapTupleMVCCNotDeleted(HeapTuple htup, Snapshot snapshot,
+									Buffer buffer);
 extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
 extern bool HeapTupleIsSurelyDead(HeapTuple htup,
 								  GlobalVisState *vistest);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..2cc49fd48de 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
 #define XLH_DELETE_CONTAINS_OLD_KEY				(1<<2)
 #define XLH_DELETE_IS_SUPER						(1<<3)
 #define XLH_DELETE_IS_PARTITION_MOVE			(1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL					(1<<5)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLH_DELETE_CONTAINS_OLD						\
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..aa0599e3a7a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -629,6 +630,8 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
+											  Snapshot snapshot,
+											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1636,6 +1639,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ *	 (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1648,6 +1655,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
+								Snapshot snapshot,
+								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1656,6 +1665,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
+													snapshot, decoding_ctx,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 652542e8e65..b34abe9249f 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
 #ifndef CLUSTER_H
 #define CLUSTER_H
 
+#include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "replication/logical.h"
 #include "storage/lock.h"
+#include "storage/relfilelocator.h"
 #include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -25,6 +30,8 @@
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
 #define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
+#define CLUOPT_CONCURRENT 0x10	/* allow concurrent data changes */
+
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -33,14 +40,94 @@ typedef struct ClusterParams
 } ClusterParams;
 
 
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+extern RelFileLocator repacked_rel_locator;
+extern RelFileLocator repacked_rel_toast_locator;
+
+typedef enum
+{
+	CHANGE_INSERT,
+	CHANGE_UPDATE_OLD,
+	CHANGE_UPDATE_NEW,
+	CHANGE_DELETE,
+	CHANGE_TRUNCATE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+	/* See the enum above. */
+	ConcurrentChangeKind kind;
+
+	/*
+	 * The actual tuple.
+	 *
+	 * The tuple data follows the ConcurrentChange structure. Before use make
+	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
+	 * bytea) and that tuple->t_data is fixed.
+	 */
+	HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+								sizeof(HeapTupleData))
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents are being copied to new storage, along with the metadata
+ * needed to apply those changes to the table.
+ */
+typedef struct RepackDecodingState
+{
+	/* The relation whose changes we're decoding. */
+	Oid			relid;
+
+	/*
+	 * Decoded changes are stored here. Although we try to avoid excessive
+	 * batches, it can happen that the changes need to be stored to disk. The
+	 * tuplestore does this transparently.
+	 */
+	Tuplestorestate *tstore;
+
+	/* The current number of changes in tstore. */
+	double		nchanges;
+
+	/*
+	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
+	 * We can't store the tuple directly because tuplestore only supports
+	 * minimal tuples and we may need to transfer the OID system column from
+	 * the output plugin. Also we need to transfer the change kind, so it's
+	 * better to put everything in one structure than to use two tuplestores
+	 * "in parallel".
+	 */
+	TupleDesc	tupdesc_change;
+
+	/* Tuple descriptor needed to update indexes. */
+	TupleDesc	tupdesc;
+
+	/* Slot to retrieve data from tstore. */
+	TupleTableSlot *tsslot;
+
+	ResourceOwner resowner;
+} RepackDecodingState;
+
+
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
-						ClusterParams *params);
+						ClusterParams *params, bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+											 XLogRecPtr end_of_wal);
+
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -48,6 +135,7 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 bool swap_toast_by_content,
 							 bool check_constraints,
 							 bool is_internal,
+							 bool reindex,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index ebf004b7aa5..5024fea5e2e 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -69,10 +69,12 @@
 #define PROGRESS_REPACK_PHASE					1
 #define PROGRESS_REPACK_INDEX_RELID				2
 #define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED	4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED		5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED		6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		9
 
 /*
  * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -81,9 +83,10 @@
 #define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
 #define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
 #define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+#define PROGRESS_REPACK_PHASE_CATCH_UP			5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		8
 
 /* Progress parameters for CREATE INDEX */
 /* 3, 4 and 5 reserved for "waitfor" metrics */
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 6d4d2d1814c..802fc4b0823 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 7f3ba0352f6..2739327b0da 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
 #define AccessShareLock			1	/* SELECT */
 #define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
 #define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL), ANALYZE, CREATE
-									 * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4	/* VACUUM (non-exclusive), ANALYZE, CREATE
+									 * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
 #define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
 #define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
 									 * SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index f65f83c85cd..1f821fd2ccd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,8 @@ extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index fc82cd67f6c..f16422175f8 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -11,10 +11,11 @@ EXTENSION = injection_points
 DATA = injection_points--1.0.sql
 PGFILEDESC = "injection_points - facility for injection points"
 
-REGRESS = injection_points hashagg reindex_conc vacuum
+# REGRESS = injection_points hashagg reindex_conc vacuum
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = basic inplace syscache-update-pruned
+ISOLATION = basic inplace syscache-update-pruned repack
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing: 
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+
+step change_new: 
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1: 
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+
+step change_subxact2: 
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    2
+(1 row)
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..c8f264bc6cb
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
\ No newline at end of file
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 20390d6b4bf..29561103bbf 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -47,9 +47,13 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'repack',
       'syscache-update-pruned',
     ],
     'runningcheck': false, # see syscache-update-pruned
+    # 'repack' requires wal_level = 'logical'.
+    'regress_args': ['--temp-config', files('logical.conf')],
+
   },
   'tap': {
     'env': {
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..75850334986
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,143 @@
+# Prefix the system columns with underscore as they are not allowed as column
+# names.
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+	INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j int);
+	CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+    SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice to test whether tuple version generated by this session
+# can be found.
+step change_existing
+{
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key and
+# non-key column.
+step change_new
+{
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change_existing
+	change_new
+	change_subxact1
+	change_subxact2
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index bd872ebd13a..3d1eb773136 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2006,7 +2006,7 @@ pg_stat_progress_cluster| SELECT pid,
     phase,
     repack_index_relid AS cluster_index_relid,
     heap_tuples_scanned,
-    heap_tuples_written,
+    (heap_tuples_inserted + heap_tuples_updated) AS heap_tuples_written,
     heap_blks_total,
     heap_blks_scanned,
     index_rebuild_count
@@ -2086,17 +2086,20 @@ pg_stat_progress_repack| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 5 THEN 'catch-up'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS repack_index_relid,
     s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param5 AS heap_tuples_inserted,
+    s.param6 AS heap_tuples_updated,
+    s.param7 AS heap_tuples_deleted,
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c7dd5d09e8..eafe61b8e03 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -486,6 +486,8 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
 ConditionalStack
@@ -1260,6 +1262,7 @@ IndexElem
 IndexFetchHeapData
 IndexFetchTableData
 IndexInfo
+IndexInsertState
 IndexList
 IndexOnlyScan
 IndexOnlyScanState
@@ -2545,6 +2548,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 RepackCommand
+RepackDecodingState
 RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
-- 
2.47.3

#41jian he
jian.universality@gmail.com
In reply to: Alvaro Herrera (#40)
Re: Adding REPACK [concurrently]

On Fri, Oct 31, 2025 at 7:17 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Hello,

Here's a new installment of this series, v25, including the CONCURRENTLY
part, which required some conflict fixes on top of the much-changed
v24-0001 patch.

hi.

if (params.options & CLUOPT_ANALYZE)
ereport(ERROR,
errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
For this error case, wouldn't it be better to add a simple test case?

+ /* Do an analyze, if requested */
+ if (params->options & CLUOPT_ANALYZE)
+ {
+ VacuumParams vac_params = {0};
+
+ vac_params.options |= VACOPT_ANALYZE;
+ if (params->options & CLUOPT_VERBOSE)
+ vac_params.options |= VACOPT_VERBOSE;
+ analyze_rel(RelationGetRelid(rel), NULL, vac_params,
+ stmt->relation->va_cols, true, NULL);
+ }

Looking at the comments in struct VacuumParams, some fields have nonzero default
values — for example, log_vacuum_min_duration.
Do we need to explicitly set these fields to their default values?
(see ExecVacuum)
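
Something like this, as an untested sketch modeled on ExecVacuum (field
names as referenced above; please double-check the actual defaults in
vacuum.h):

	VacuumParams vac_params = {0};

	vac_params.options = VACOPT_ANALYZE;
	if (params->options & CLUOPT_VERBOSE)
		vac_params.options |= VACOPT_VERBOSE;
	/* non-zero defaults that {0} does not cover, as in ExecVacuum */
	vac_params.freeze_min_age = -1;
	vac_params.freeze_table_age = -1;
	vac_params.multixact_freeze_min_age = -1;
	vac_params.multixact_freeze_table_age = -1;
	vac_params.log_vacuum_min_duration = -1;	/* logging disabled */
	analyze_rel(RelationGetRelid(rel), NULL, vac_params,
				stmt->relation->va_cols, true, NULL);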

repack.sgml can also add a
<refsect1> <title>See Also</title>
similar to analyze.sgml, vacuum.sgml

doc/src/sgml/ref/repack.sgml
the synopsis section is missing this syntax:
REPACK USING INDEX

I am wondering, can we also support
REPACK opt_utility_option_list USING INDEX

MATERIALIZED VIEW:
create materialized view a_________ as select * from t2;

repack (verbose);
INFO: repacking "public.a_________" in physical order
INFO: "public.a_________": found 0 removable, 10 nonremovable row
versions in 1 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

cluster (verbose);
won't touch materialized view a_________

But materialized views don't have bloat; nothing can be removed.
So aren't we wasting cycles here repacking materialized views?

#42Sergei
In reply to: jian he (#41)
Re: Adding REPACK [concurrently]

Hello!

But materialized views don't have bloat; nothing can be removed.

REFRESH MATERIALIZED VIEW CONCURRENTLY does not replace the relation completely but updates it using INSERT and DELETE queries (refresh_by_match_merge in src/backend/commands/matview.c) - so there may be bloat.

regards, Sergei

#43Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Alvaro Herrera (#40)
Re: Adding REPACK [concurrently]

Hello!

On Fri, Oct 31, 2025 at 12:17 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Here's a new installment of this series, v25, including the CONCURRENTLY
part, which required some conflict fixes on top of the much-changed
v24-0001 patch.

* cluster.c
* CLUSTER a table on an index. This is now also used for VACUUM FULL.

Should we add something about repack here?

ii_ExclusinOps

typo here.

* index is inserted into catalogs and needs to be built later on.

Now that is only true when concurrently == true

* Build the index information for the new index. Note that rebuild of
* indexes with exclusion constraints is not supported, hence there is no
* need to fill all the ii_Exclusion* fields.

Now the function supports it in !concurrently mode. Should we fill
ii_Exclusion? Also, it says

If !concurrently, ii_ExclusinOps is currently not needed.

But it is not clear - why not?

newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
oldInfo->ii_NumIndexKeyAttrs,
oldInfo->ii_Am,
indexExprs,
indexPreds,
oldInfo->ii_Unique,
oldInfo->ii_NullsNotDistinct,
false, /* not ready for inserts */
true,
indexRelation->rd_indam->amsummarizing,
oldInfo->ii_WithoutOverlaps);

Is it ok we pass isready == false if !concurrent?
Also, we pass concurrent == true even if concurrently == false - feels
strange and probably wrong.

This difference does has no impact on XidInMVCCSnapshot().

Should it be "This difference has no impact"?

* pgoutput_cluster.c
* src/backend/replication/pgoutput_cluster/pgoutput_cluster.c

it is pgoutput_trepack.c :)

Best regards,
Mikhail.

#44Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#43)
Re: Adding REPACK [concurrently]

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Hello!

On Fri, Oct 31, 2025 at 12:17 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Here's a new installment of this series, v25, including the CONCURRENTLY
part, which required some conflict fixes on top of the much-changed
v24-0001 patch.

* cluster.c
* CLUSTER a table on an index. This is now also used for VACUUM FULL.

Should we add something about repack here?

ok

ii_ExclusinOps

typo here.

ok

* index is inserted into catalogs and needs to be built later on.

Now that is only true when concurrently == true

ok

* Build the index information for the new index. Note that rebuild of
* indexes with exclusion constraints is not supported, hence there is no
* need to fill all the ii_Exclusion* fields.

Now the function supports it in !concurrently mode. Should we fill
ii_Exclusion? Also, it says

If !concurrently, ii_ExclusinOps is currently not needed.

But it is not clear - why not?

Right, makeIndexInfo() needs to be adjusted.

newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
oldInfo->ii_NumIndexKeyAttrs,
oldInfo->ii_Am,
indexExprs,
indexPreds,
oldInfo->ii_Unique,
oldInfo->ii_NullsNotDistinct,
false, /* not ready for inserts */
true,
indexRelation->rd_indam->amsummarizing,
oldInfo->ii_WithoutOverlaps);

Is it ok we pass isready == false if !concurrent?
Also, we pass concurrent == true even if concurrently == false - feels
strange and probably wrong.

You're right, both arguments are wrong.
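
As a rough sketch of the fix (assuming the 'concurrently' flag is in
scope at the call site), the two arguments would become:

	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
							oldInfo->ii_NumIndexKeyAttrs,
							oldInfo->ii_Am,
							indexExprs,
							indexPreds,
							oldInfo->ii_Unique,
							oldInfo->ii_NullsNotDistinct,
							!concurrently,	/* ready for inserts? */
							concurrently,	/* concurrent build? */
							indexRelation->rd_indam->amsummarizing,
							oldInfo->ii_WithoutOverlaps);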

This difference does has no impact on XidInMVCCSnapshot().

Should it be "This difference has no impact"?

ok

* pgoutput_cluster.c
* src/backend/replication/pgoutput_cluster/pgoutput_cluster.c

it is pgoutput_trepack.c :)

ok

I'll fix all the problems in the next version. Thanks!

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#45jian he
jian.universality@gmail.com
In reply to: Alvaro Herrera (#40)
Re: Adding REPACK [concurrently]

On Fri, Oct 31, 2025 at 7:17 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Hello,

Here's a new installment of this series, v25, including the CONCURRENTLY
part, which required some conflict fixes on top of the much-changed
v24-0001 patch.

<refnamediv>
<refname>pg_repackdb</refname>
<refpurpose>repack and analyze a <productname>PostgreSQL</productname>
database</refpurpose>
</refnamediv>

But with the --all option specified, it repacks the whole cluster
(more than one database).
I am not fully sure this description is OK.

I think pg_repackdb Synopsis section:
pg_repackdb [connection-option...] [option...] [ -t | --table table [(
column [,...] )] ] ... [ dbname | -a | --all ]
pg_repackdb [connection-option...] [option...] [ -n | --schema schema
] ... [ dbname | -a | --all ]
pg_repackdb [connection-option...] [option...] [ -N | --exclude-schema
schema ] ... [ dbname | -a | --all ]

can be simplified the same way as pg_dump:

pg_repackdb [connection-option...] [option...] [ dbname | -a | --all ]

------------------------
[-d] dbname
[--dbname=]dbname

what do you think about expanding it as below:
dbname
-d dbname
--dbname=dbname
--------------------

+ printf(_("      --index[=INDEX]             repack following an index\n"));
should it be
+ printf(_("--index[=INDEX]                   repack following an index\n"));
?

similar to pg_dump:
printf(_("\nIf no database name is supplied, then the PGDATABASE
environment\n"
"variable value is used.\n\n"));

in pg_repackdb help section, we can mention:
printf(_("\nIf no database name is supplied and --all option not
specified then the PGDATABASE environment\n"
"variable value is used.\n\n"));
Do you think it's necessary?

What is the expectation of
pg_repackdb --index=index_name? The doc is not very helpful.

pg_repackdb --analyze --index=zz --verbose
pg_repackdb: repacking database "src3"
pg_repackdb: error: processing of database "src3" failed: ERROR: "zz"
is not an index for table "tenk1"

select pg_get_indexdef ('zz'::regclass);
pg_get_indexdef
---------------------------------------------------
CREATE INDEX zz ON public.tenk2 USING btree (two)

------
jian he
EDB: http://www.enterprisedb.com

#46Robert Treat
rob@xzilla.net
In reply to: jian he (#45)
Re: Adding REPACK [concurrently]

On Tue, Nov 4, 2025 at 9:48 PM jian he <jian.universality@gmail.com> wrote:

On Fri, Oct 31, 2025 at 7:17 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Hello,

Here's a new installment of this series, v25, including the CONCURRENTLY
part, which required some conflict fixes on top of the much-changed
v24-0001 patch.

<refnamediv>
<refname>pg_repackdb</refname>
<refpurpose>repack and analyze a <productname>PostgreSQL</productname>
database</refpurpose>
</refnamediv>

But with the --all option specified, it repacks the whole cluster
(more than one database).
I am not fully sure this description is OK.

This wording came from vacuumdb, which operates the same way, and I
don't think it's led to confusion. And while I don't think we need to
take away the option, I see no reason to encourage the idea that
people should be doing cluster-wide full-database repacks. On that
note, I'd take the "and analyze" out of the refpurpose as well; the
more I look at it, I see pg_repackdb as a replacement for clusterdb,
with selected bells and whistles from vacuum full or external
repack-type tooling. At the end of the day that's a simpler model for
operators, and it helps draw a distinction for which features we DON'T
need to include, like -Z (i.e., analyze only; if you want that, use
vacuumdb, not pg_repackdb).

I think pg_repackdb Synopsis section:
pg_repackdb [connection-option...] [option...] [ -t | --table table [(
column [,...] )] ] ... [ dbname | -a | --all ]
pg_repackdb [connection-option...] [option...] [ -n | --schema schema
] ... [ dbname | -a | --all ]
pg_repackdb [connection-option...] [option...] [ -N | --exclude-schema
schema ] ... [ dbname | -a | --all ]

can be simplified the same way as pg_dump:

pg_repackdb [connection-option...] [option...] [ dbname | -a | --all ]

I think it's laid out that way in vacuumdb to indicate that those
options are exclusive of one another. I'm not sure how convincing that
is, but the above would need to do more to make the switch imo.

------------------------
[-d] dbname
[--dbname=]dbname

what do you think about expanding it as below:
dbname
-d dbname
--dbname=dbname

Not sure I am following this one, but the brackets are the standard
way we show items to be optional, which in either case they are.

--------------------

+ printf(_("      --index[=INDEX]             repack following an index\n"));
should it be
+ printf(_("--index[=INDEX]                   repack following an index\n"));
?

I believe this is included for alignment, since this option has no
shorthand version.

similar to pg_dump:
printf(_("\nIf no database name is supplied, then the PGDATABASE
environment\n"
"variable value is used.\n\n"));

in pg_repackdb help section, we can mention:
printf(_("\nIf no database name is supplied and --all option not
specified then the PGDATABASE environment\n"
"variable value is used.\n\n"));
Do you think it's necessary?

no. (again, looking first at clusterdb, and also vacuumdb, neither of
which have it).

What is the expectation of
pg_repackdb --index=index_name? The doc is not very helpful.

pg_repackdb --analyze --index=zz --verbose
pg_repackdb: repacking database "src3"
pg_repackdb: error: processing of database "src3" failed: ERROR: "zz"
is not an index for table "tenk1"

select pg_get_indexdef ('zz'::regclass);
pg_get_indexdef
---------------------------------------------------
CREATE INDEX zz ON public.tenk2 USING btree (two)

Hmm... yes, this is a bit confusing. I didn't verify it in the code,
but from memory I think the --index option is meant to be used only in
conjunction with --table, in which case it would repack the table
using the specified index. I could be overlooking something though.

Robert Treat
https://xzilla.net

#47Antonin Houska
ah@cybertec.at
In reply to: Robert Treat (#46)
Re: Adding REPACK [concurrently]

Robert Treat <rob@xzilla.net> wrote:

On Tue, Nov 4, 2025 at 9:48 PM jian he <jian.universality@gmail.com> wrote:

What is the expectation of
pg_repackdb --index=index_name? The doc is not very helpful.

pg_repackdb --analyze --index=zz --verbose
pg_repackdb: repacking database "src3"
pg_repackdb: error: processing of database "src3" failed: ERROR: "zz"
is not an index for table "tenk1"

select pg_get_indexdef ('zz'::regclass);
pg_get_indexdef
---------------------------------------------------
CREATE INDEX zz ON public.tenk2 USING btree (two)

Hmm... yes, this is a bit confusing. I didn't verify it in the code,
but from memory I think the --index option is meant to be used only in
conjunction with --table, in which case it would repack the table
using the specified index. I could be overlooking something though.

The corresponding code is:

+	/*
+	 * In REPACK mode, if the 'using_index' option was given but no index
+	 * name, filter only tables that have an index with indisclustered set.
+	 * (If an index name is given, we trust the user to pass a reasonable list
+	 * of tables.)
+	 *
+	 * XXX it may be worth printing an error if an index name is given with no
+	 * list of tables.
+	 */
+	if (vacopts->mode == MODE_REPACK &&
+		vacopts->using_index && !vacopts->indexname)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND EXISTS (SELECT 1 FROM pg_catalog.pg_index\n"
+							 "    WHERE indrelid = c.oid AND indisclustered)\n");
+	}

I'm not sure if it's worth allowing the --index option to have an
argument. Since the user can specify multiple tables, he should also be able
to specify multiple indexes. And then the question would be: what should
happen if the user forgot to specify (or just mistyped) the index name for a
table which does not yet have the clustering index set? Skip that table (and
print out a warning)? Or consider it an error?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#48jian he
jian.universality@gmail.com
In reply to: Robert Treat (#46)
Re: Adding REPACK [concurrently]

On Wed, Nov 5, 2025 at 1:11 PM Robert Treat <rob@xzilla.net> wrote:

--------------------

+ printf(_("      --index[=INDEX]             repack following an index\n"));
should it be
+ printf(_("--index[=INDEX]                   repack following an index\n"));
?

I believe this is included for alignment, since this option has no
shorthand version.

If you compare pg_dump --help with pg_repackdb --help,
you will see the inconsistency.

This is legacy behavior, but can we move some of the error checks in
do_analyze_rel to an earlier point?
We call cluster_rel before analyze_rel, and cluster_rel is way more
time-consuming; a failure in analyze_rel means all the previous work
(cluster_rel) is wasted.

+ else if (HeadMatches("REPACK", "(*") &&
+ !HeadMatches("REPACK", "(*)"))
+ {
+ /*
+ * This fires if we're in an unfinished parenthesized option list.
+ * get_previous_words treats a completed parenthesized option list as
+ * one word, so the above test is correct.
+ */
+ if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+ COMPLETE_WITH("VERBOSE");
+ else if (TailMatches("VERBOSE"))
+ COMPLETE_WITH("ON", "OFF");
+ }
Could this part also support the ANALYZE option?
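
For example (just a sketch of the idea, assuming ANALYZE is a boolean
option here like VERBOSE):

+ if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+ COMPLETE_WITH("ANALYZE", "VERBOSE");
+ else if (TailMatches("ANALYZE|VERBOSE"))
+ COMPLETE_WITH("ON", "OFF");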

Should ClusterStmt be removed from
src/tools/pgindent/typedefs.list?

doc/src/sgml/ref/clusterdb.sgml
<para>
<application>clusterdb</application> has been superceded by
<application>pg_repackdb</application>.
</para>
Google told me "superceded" should be "superseded".

--
jian he
EDB: http://www.enterprisedb.com

#49Robert Treat
rob@xzilla.net
In reply to: Antonin Houska (#47)
Re: Adding REPACK [concurrently]

On Wed, Nov 5, 2025 at 2:12 AM Antonin Houska <ah@cybertec.at> wrote:

Robert Treat <rob@xzilla.net> wrote:

On Tue, Nov 4, 2025 at 9:48 PM jian he <jian.universality@gmail.com> wrote:

what the expectation of
pg_repackdb --index=index_name, the doc is not very helpful.

pg_repackdb --analyze --index=zz --verbose
pg_repackdb: repacking database "src3"
pg_repackdb: error: processing of database "src3" failed: ERROR: "zz"
is not an index for table "tenk1"

select pg_get_indexdef ('zz'::regclass);
pg_get_indexdef
---------------------------------------------------
CREATE INDEX zz ON public.tenk2 USING btree (two)

Hmm... yes, this is a bit confusing. I didn't verify it in the code,
but from memory I think the --index option is meant to be used only in
conjunction with --table, in which case it would repack the table
using the specified index. I could be overlooking something though.

The corresponding code is:

+       /*
+        * In REPACK mode, if the 'using_index' option was given but no index
+        * name, filter only tables that have an index with indisclustered set.
+        * (If an index name is given, we trust the user to pass a reasonable list
+        * of tables.)
+        *
+        * XXX it may be worth printing an error if an index name is given with no
+        * list of tables.
+        */
+       if (vacopts->mode == MODE_REPACK &&
+               vacopts->using_index && !vacopts->indexname)
+       {
+               appendPQExpBufferStr(&catalog_query,
+                                                        " AND EXISTS (SELECT 1 FROM pg_catalog.pg_index\n"
+                                                        "    WHERE indrelid = c.oid AND indisclustered)\n");
+       }

I'm not sure if it's worth allowing the --index option to have an
argument. Since the user can specify multiple tables, he should also be able
to specify multiple indexes. And then the question would be: what should
happen if the user forgot to specify (or just mistyped) the index name for a
table which does not yet have the clustering index set? Skip that table (and
print out a warning)? Or consider it an error?

Ah, yes, this is something completely different. So, we do need a way
to differentiate between "vacuum full" vs "cluster" of all tables, as
well as "vacuum full" vs "cluster" of a specific table (including the
idea of "vacuum full" of a previously clustered table), and the
existing code handles all that (though I might quibble with the option
name).

As for having an --index= option, I'd love to hear the use case;
something like partitions or maybe some client-per-schema situation
comes to mind, but ISTM in all those cases the user would also know
(or be expected to know) the table name, so I agree with Antonin that
the extra complexity doesn't seem worth supporting. (It gets even
worse the more you think about it: what if some table has the index
named above but is clustered on a different index, then what should
we do?)

As for the use case I was thinking of, specifying a table and index in
order to repack using that index (and setting indisclustered if not
already); while I feel like that would be a useful option, if it isn't
currently supported I don't see a strong argument for adding it now.

Robert Treat
https://xzilla.net

#50Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#44)
Re: Adding REPACK [concurrently]

Hello, Antonin!

On Mon, Nov 3, 2025 at 8:56 AM Antonin Houska <ah@cybertec.at> wrote:

I'll fix all the problems in the next version. Thanks!

A few more moments I mentioned:

switch ((vis = HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf)))

vis is unused; note also the doubled parentheses.
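
A minimal sketch of the cleanup I mean (no assignment, no doubled
parentheses):

	switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))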

LockBuffer(buf, BUFFER_LOCK_UNLOCK);
continue;
}

/*
* In the concurrent case, we have a copy of the tuple, so we
* don't worry whether the source tuple will be deleted / updated
* after we release the lock.
*/
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}

I think the locking and comments are a little bit confusing here.
I think we could use a single LockBuffer(buf, BUFFER_LOCK_UNLOCK); before
`if (isdead)`, as it was before.
Also, I am not sure "we have a copy" is the correct point here; I
think the motivation is mostly the same as in
heapam_index_build_range_scan.

Also, I think it is a good idea to add tests for index-based and
sort-based repack.

Also, for sort-based repack I think we also need to call
repack_decode_concurrent_changes during the insertion phase.

is_system_catalog && !concurrent

It appears in 2 places and is always true, which feels strange.

Best regards,
Mikhail.

#51Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#50)
Re: Adding REPACK [concurrently]

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Hello, Antonin!

On Mon, Nov 3, 2025 at 8:56 AM Antonin Houska <ah@cybertec.at> wrote:

I'll fix all the problems in the next version. Thanks!

A few more moments I mentioned:

switch ((vis = HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf)))

vis is unused; note also the doubled parentheses.

LockBuffer(buf, BUFFER_LOCK_UNLOCK);
continue;
}

/*
* In the concurrent case, we have a copy of the tuple, so we
* don't worry whether the source tuple will be deleted / updated
* after we release the lock.
*/
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}

I think the locking and comments are a little bit confusing here.
I think we could use a single LockBuffer(buf, BUFFER_LOCK_UNLOCK); before
`if (isdead)`, as it was before.
Also, I am not sure "we have a copy" is the correct point here; I
think the motivation is mostly the same as in
heapam_index_build_range_scan.

All these problems are due to incorrect separation of the "preserve
visibility" part of the patch series. Will be fixed in the next version.

Also, I think it is a good idea to add tests for index-based and
sort-based repack.

Not sure, cluster.sql already seems to do the same.

Also, for sort-based repack I think we also need to call
repack_decode_concurrent_changes during the insertion phase.

I'm missing the point. The current coding is such that this part

if (concurrent)
{
XLogRecPtr end_of_wal;

end_of_wal = GetFlushRecPtr(NULL);
if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
{
repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
end_of_wal_prev = end_of_wal;
}
}

gets called regardless of the value of 'tuplesort' above.

is_system_catalog && !concurrent

It appears in 2 places and is always true, which feels strange.

ok

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#52Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#51)
Re: Adding REPACK [concurrently]

Hi!

On Tue, Dec 2, 2025 at 5:14 PM Antonin Houska <ah@cybertec.at> wrote:

Not sure, cluster.sql already seems to do the same.

I think in the case of CONCURRENTLY it may behave a little bit
differently, but I'm not sure.

I miss the point. The current coding is such that this part

I mean calling it periodically in both loops: the scan loop and the insertion loop.

Best greetings,
Mikhail.

#53Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#52)
Re: Adding REPACK [concurrently]

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

On Tue, Dec 2, 2025 at 5:14 PM Antonin Houska <ah@cybertec.at> wrote:

Not sure, cluster.sql already seems to do the same.

I think in the case of CONCURRENTLY it may behave a little bit
different, but not sure.

I'm missing the point. The current coding is such that this part

I mean calling it periodically in both loops: the scan loop and the insertion loop.

ok, that makes sense. I'll add that to the next version. Thanks.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#54Antonin Houska
ah@cybertec.at
In reply to: jian he (#41)
Re: Adding REPACK [concurrently]

jian he <jian.universality@gmail.com> wrote:

if (params.options & CLUOPT_ANALYZE)
ereport(ERROR,
errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
for this error case, adding a simple test case would be better?

More options should probably be tested; currently we have only a very basic
regression test for pg_repackdb. TBD

+ /* Do an analyze, if requested */
+ if (params->options & CLUOPT_ANALYZE)
+ {
+ VacuumParams vac_params = {0};
+
+ vac_params.options |= VACOPT_ANALYZE;
+ if (params->options & CLUOPT_VERBOSE)
+ vac_params.options |= VACOPT_VERBOSE;
+ analyze_rel(RelationGetRelid(rel), NULL, vac_params,
+ stmt->relation->va_cols, true, NULL);
+ }

Looking at the comments in struct VacuumParams, some fields have nonzero default
values — for example, log_vacuum_min_duration.
Do we need to explicitly set these fields to their default values?
(see ExecVacuum)

Perhaps, TBD.

repack.sgml can also add a
<refsect1> <title>See Also</title>
similar to analyze.sgml, vacuum.sgml

ok, added this in v26 (to be posted today):

<refsect1>
<title>See Also</title>

<simplelist type="inline">
<member><xref linkend="app-pgrepackdb"/></member>
<member><xref linkend="repack-progress-reporting"/></member>
</simplelist>
</refsect1>

(I intentionally did not add references to VACUUM FULL and CLUSTER: whoever
uses REPACK should not need them.)

doc/src/sgml/ref/repack.sgml
synopsis section missing syntax:
REPACK USING INDEX

ok, added in v26.

I am wondering, can we also support
REPACK opt_utility_option_list USING INDEX

I agree, and added that in v26. (Hopefully I haven't broken anything; the
syntax is not trivial anymore.)

MATERIALIZED VIEW:
create materialized view a_________ as select * from t2;

repack (verbose);
INFO: repacking "public.a_________" in physical order
INFO: "public.a_________": found 0 removable, 10 nonremovable row
versions in 1 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

cluster (verbose);
won't touch materialized view a_________

But materialized views don't have bloat; nothing can be removed.
So aren't we wasting cycles here repacking materialized views?

Answered in /messages/by-id/3436011762001613@a7af8471-b1b8-48c2-9ff7-631187067407

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#55David Klika
david.klika@atlas.cz
In reply to: Antonin Houska (#54)
Re: Adding REPACK [concurrently]

Hello

Great to hear about this feature.

You speak about a table rewrite (I suppose a whole-table rewrite). I
would like to share the idea of an alternative approach that also takes
into account the amount of WAL generated during the operation. It is
applicable to the non-clustered case only.

Let's consider a large table where 80% of the blocks are fine (filled
enough by live tuples). The table could be scanned from the beginning
(left side) to identify insufficiently filled blocks, and also from the
end (right side) to process live tuples by moving them into the blocks
identified by the left-side scan. The work is over when both scans
reach the same position.

Example:

_ stands for filled enough blocks

D stands for blocks with (many) dead tuples

123456789
___DD____

The left scan identifies page #4, and live tuples found by the right
scan (page #9) are moved there. The same happens with tuples moving
from #8 to #5. Two pages are trimmed from the data file, and (only)
pages #4 and #5 are written to WAL; the others are untouched.
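
In rough C, the compaction loop could look like this (page_is_sparse,
move_live_tuples and truncate_after are made-up helpers, not existing
APIs; it also assumes the moved tuples always fit, which a real
implementation would have to track):

	BlockNumber left = 0;
	BlockNumber right = nblocks - 1;

	while (left < right)
	{
		/* skip blocks that are already filled enough */
		if (!page_is_sparse(rel, left))
		{
			left++;
			continue;
		}

		/* move live tuples from the tail block into the sparse block */
		move_live_tuples(rel, right, left);
		right--;
	}

	/* trim the emptied tail pages from the data file */
	truncate_after(rel, right);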

Regards
David

#56Álvaro Herrera
alvherre@alvh.no-ip.org
In reply to: David Klika (#55)
Re: Adding REPACK [concurrently]

Hello David,

Thanks for your interest in this.

On 2025-Dec-04, David Klika wrote:

Let's consider a large table where 80% blocks are fine (filled enough by
live tuples). The table could be scanned from the beginning (left side) to
identify "not enough filled" blocks and also from the end (right side) to
process live tuples by moving them to the blocks identified by the left side
scan. The work is over when both scan reaches the same position.

If you only have a small number of pages that have this problem, then
you don't actually need to do anything -- the pages will be marked free
by regular vacuuming, and future inserts or updates can make use of
those pages. It's not a problem to have a small number of pages in
empty state for some time.

So if you're trying to do this, the number of problematic pages must be
large.

Now, the issue with what you propose is that you need to make either the
old tuples or the new tuples visible to concurrent transactions. If at
any point they are both visible, or none of them is visible, then you
have potentially corrupted the results that would be obtained by a query
that's halfway through scanning the table.

The other point is that you need to keep indexes updated. That is, you
need to make the indexes point to both the old and new, until you remove
the old tuples from the table, then remove those index pointers.
This process bloats the indexes, which is not insignificant, considering
that the number of tuples to process is large. If there are several
indexes, this makes your process take even longer.

You can fix the concurrency problem by holding a lock on the table that
ensures nobody is reading the table until you've finished. But we don't
want to have to hold such a lock for long! And we already established
that the number of pages to check is large, which means you're going to
work for a long time.

So, I'm not really sure that it's practical to implement what you
suggest.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#57Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#40)
4 attachment(s)
Re: Adding REPACK [concurrently]

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Here's a new installment of this series, v25, including the CONCURRENTLY
part, which required some conflict fixes on top of the much-changed
v24-0001 patch.

v26 attached here. It's been rebased and reflects most of the feedback.

A few incomplete items are marked as TBD here [1], and [2] is another thing
that needs discussion.

Besides that, I've done some refactoring in 0004: 1) moved more code to
setup_logical_decoding(), and 2) reduced the number of arguments of
process_concurrent_changes() by using a new structure. Both changes are
preparation for a background worker that will perform the logical decoding,
but they seem useful on their own. (I have a PoC of the worker but will post
it later; it doesn't seem to be the priority for now.)

I've also removed support for decoding TRUNCATE because I realized that this
command uses AccessExclusiveLock, so it cannot be executed on a table that
REPACK (CONCURRENTLY) is just processing.

Also I tried to fix TAB completion in psql.

I have not yet addressed Robert Treat's feedback from October 12th.

These are still pending.

[1]: /messages/by-id/23631.1764855372@localhost
[2]: /messages/by-id/CAJSLCQ2_jX8WmNOC4eu6hL5QyNHceOkgPbGhKHFw2X5onVEKDQ@mail.gmail.com
--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

v26-0001-Add-REPACK-command.patchtext/x-diff; charset=utf-8Download
From 59f1075f4017398d83d399c336f6634700a02006 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 4 Dec 2025 18:20:07 +0100
Subject: [PATCH 1/4] Add REPACK command
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.

This also adds pg_repackdb, a new utility that can invoke the new
commands.  This is heavily based on vacuumdb.

Author: Antonin Houska <ah@cybertec.at>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 +++++-
 doc/src/sgml/ref/allfiles.sgml           |   2 +
 doc/src/sgml/ref/cluster.sgml            |  97 +--
 doc/src/sgml/ref/clusterdb.sgml          |   5 +
 doc/src/sgml/ref/pg_repackdb.sgml        | 488 +++++++++++++
 doc/src/sgml/ref/repack.sgml             | 328 +++++++++
 doc/src/sgml/ref/vacuum.sgml             |  33 +-
 doc/src/sgml/reference.sgml              |   2 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  29 +-
 src/backend/commands/cluster.c           | 851 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   6 +-
 src/backend/parser/gram.y                |  86 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  42 +-
 src/bin/scripts/Makefile                 |   4 +-
 src/bin/scripts/meson.build              |   2 +
 src/bin/scripts/pg_repackdb.c            | 242 +++++++
 src/bin/scripts/t/103_repackdb.pl        |  47 ++
 src/bin/scripts/vacuuming.c              | 114 ++-
 src/bin/scripts/vacuuming.h              |   3 +
 src/include/commands/cluster.h           |   8 +-
 src/include/commands/progress.h          |  50 +-
 src/include/nodes/parsenodes.h           |  35 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 134 +++-
 src/test/regress/expected/rules.out      |  72 +-
 src/test/regress/sql/cluster.sql         |  70 +-
 src/tools/pgindent/typedefs.list         |   2 +
 33 files changed, 2490 insertions(+), 544 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 doc/src/sgml/ref/repack.sgml
 create mode 100644 src/bin/scripts/pg_repackdb.c
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 039d73691be..b8da77b4d89 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5609,7 +5617,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -6068,6 +6077,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index e167406c744..5df944d13ca 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
@@ -213,6 +214,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 0b47460080b..2cda711bc9f 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
   <title>Description</title>
 
   <para>
-   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
-   to cluster the table specified
-   by <replaceable class="parameter">table_name</replaceable>
-   based on the index specified by
-   <replaceable class="parameter">index_name</replaceable>. The index must
-   already have been defined on
-   <replaceable class="parameter">table_name</replaceable>.
+   The <command>CLUSTER</command> command is equivalent to
+   <xref linkend="sql-repack"/> with an <literal>USING INDEX</literal>
+   clause.  See there for more details.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
-
-  <para>
-   When a table is clustered, <productname>PostgreSQL</productname>
-   remembers which index it was clustered by.  The form
-   <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
-   reclusters the table using the same index as before.  You can also
-   use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
-   forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
-   future cluster operations, or to clear any previous setting.
-  </para>
-
-  <para>
-   <command>CLUSTER</command> without a
-   <replaceable class="parameter">table_name</replaceable> reclusters all the
-   previously-clustered tables in the current database that the calling user
-   has privileges for.  This form of <command>CLUSTER</command> cannot be
-   executed inside a transaction block.
-  </para>
+<!-- Do we need to describe exactly which options map to what?  They seem obvious to me. -->
 
-  <para>
-   When a table is being clustered, an <literal>ACCESS
-   EXCLUSIVE</literal> lock is acquired on it. This prevents any other
-   database operations (both reads and writes) from operating on the
-   table until the <command>CLUSTER</command> is finished.
-  </para>
  </refsect1>
 
  <refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
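To make the "remembers which indexes are clustered" behavior concrete,
the workflow looks like this (a sketch; the table and index names are
made up):

    ALTER TABLE measurements CLUSTER ON measurements_ts_idx;  -- set marker
    REPACK measurements USING INDEX;               -- reuses the marked index
    ALTER TABLE measurements SET WITHOUT CLUSTER;  -- clear the marker
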
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 0d2051bf6f1..546c1289c31 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -64,6 +64,11 @@ PostgreSQL documentation
    this utility and via other methods for accessing the server.
   </para>
 
+  <para>
+   <application>clusterdb</application> has been superseded by
+   <application>pg_repackdb</application>.
+  </para>
+
  </refsect1>
 
 
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..b313b54ab63
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,488 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.
+   <application>pg_repackdb</application> will also generate internal
+   statistics used by the <productname>PostgreSQL</productname> query
+   optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link>.
+   There is no effective difference between repacking and analyzing
+   databases via this utility and via other methods for accessing the
+   server.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--index<optional>=<replaceable class="parameter">index_name</replaceable></optional></option></term>
+      <listitem>
+       <para>
+        Pass the <literal>USING INDEX</literal> clause to
+        <command>REPACK</command>, optionally with the given index name.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.  If a column name
+        list is given, only compute statistics for those columns.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When <option>-a</option>/<option>--all</option> is used, connect
+        to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
+
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..61d5c2cdef1
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,328 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING INDEX
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+    ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+
+<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
+
+    <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If a <literal>USING INDEX</literal> clause is specified, the rows are
+   physically reordered based on information from an index.  Please see the
+   notes on clustering below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    If the <literal>USING INDEX</literal> clause is specified, the rows in
+    the table are physically reordered following an index: if an index name
+    is given in the command, then that index is used; if no index name is
+    given, then the index previously configured as the index to cluster on
+    is used.  If no index has been configured in this way, an error is
+    raised.  The index given in the <literal>USING INDEX</literal> clause
+    becomes the index to cluster on, as does an index given to the
+    <command>CLUSTER</command> command.  The index can also be set
+    manually using <command>ALTER TABLE ... CLUSTER ON</command>, and reset
+    with <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+   </para>
+
+   <para>
+    If no table name is specified in <command>REPACK USING INDEX</command>,
+    all tables which have a clustering index defined and which the calling
+    user has privileges for are processed.
+   </para>
+
+   <para>
+    Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from clustering.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan is used, or a sequential scan without sorting, a
+    temporary copy of the table is created (with an index scan, it contains
+    the table data in index order).  Temporary copies of each index on the
+    table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">column_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of a specific column to analyze. Defaults to all columns.
+      If a column list is specified, <literal>ANALYZE</literal> must also
+      be specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>ANALYZE</literal></term>
+    <term><literal>ANALYSE</literal></term>
+    <listitem>
+     <para>
+      Applies <xref linkend="sql-analyze"/> on the table after repacking.  This is
+      currently only supported when a single (non-partitioned) table is specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+  <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+  </para>
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the corresponding
+    partition of that index. <command>REPACK</command> on a partitioned
+    table cannot be executed inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> on the basis of its
+   index <literal>employees_ind</literal> (since an index is used here,
+   this effectively clusters the table):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>cases</literal> in its current physical order,
+   running an <command>ANALYZE</command> on the given columns once
+   repacking is done, showing informational messages:
+<programlisting>
+REPACK (ANALYZE, VERBOSE) cases (district, case_nr);
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables for which a clustering index has previously been
+   configured on which you have the <literal>MAINTAIN</literal> privilege,
+   showing informational messages:
+<programlisting>
+REPACK (VERBOSE) USING INDEX;
+</programlisting>
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="app-pgrepackdb"/></member>
+   <member><xref linkend="repack-progress-reporting"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
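
Incidentally, the index that <literal>REPACK ... USING INDEX</literal>
falls back to when no name is given is simply the one with
pg_index.indisclustered set, so it can be checked beforehand (a sketch;
'mytab' is a placeholder):

    SELECT indexrelid::regclass AS clustered_index
    FROM pg_index
    WHERE indrelid = 'mytab'::regclass AND indisclustered;
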
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 6d0fdd43cfb..ac5d083d468 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
-    FULL [ <replaceable class="parameter">boolean</replaceable> ]
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+    FULL [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
   <title>Parameters</title>
 
   <variablelist>
-   <varlistentry>
-    <term><literal>FULL</literal></term>
-    <listitem>
-     <para>
-      Selects <quote>full</quote> vacuum, which can reclaim more
-      space, but takes much longer and exclusively locks the table.
-      This method also requires extra disk space, since it writes a
-      new copy of the table and doesn't release the old copy until
-      the operation is complete.  Usually this should only be used when a
-      significant amount of space needs to be reclaimed from within the table.
-     </para>
-    </listitem>
-   </varlistentry>
-
    <varlistentry>
     <term><literal>FREEZE</literal></term>
     <listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FULL</literal></term>
+    <listitem>
+     <para>
+      This option, which is deprecated, makes <command>VACUUM</command>
+      behave like <command>REPACK</command> without a
+      <literal>USING INDEX</literal> clause.
+      This method of compacting the table takes much longer than
+      <command>VACUUM</command> and exclusively locks the table.
+      This method also requires extra disk space, since it writes a
+      new copy of the table and doesn't release the old copy until
+      the operation is complete.  Usually this should only be used when a
+      significant amount of space needs to be reclaimed from within the table.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
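
So after this change, the following two commands perform the same table
rewrite, with the first spelling kept only for backward compatibility
(hypothetical table name):

    VACUUM (FULL, VERBOSE) orders;   -- deprecated spelling
    REPACK (VERBOSE) orders;         -- equivalent, preferred
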
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 2cf02c37b17..5d9a8a25a02 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
@@ -258,6 +259,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..79f9de5d760 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
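The renamed PROGRESS_REPACK_* counters above are what feed the view
columns, so while the seq-scan path runs one can watch a single table's
rewrite from another session, e.g. (a sketch; 'mytab' is a placeholder,
and this is convenient under psql's \watch):

    SELECT phase, heap_blks_scanned, heap_blks_total,
           heap_tuples_scanned, heap_tuples_written
    FROM pg_stat_progress_repack
    WHERE relid = 'mytab'::regclass;
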
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5d9db167e59..08d4b8e44d7 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 086c4c8fb6f..024d219016d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1272,14 +1272,15 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
-CREATE VIEW pg_stat_progress_cluster AS
+CREATE VIEW pg_stat_progress_repack AS
     SELECT
         S.pid AS pid,
         S.datid AS datid,
         D.datname AS datname,
         S.relid AS relid,
         CASE S.param1 WHEN 1 THEN 'CLUSTER'
-                      WHEN 2 THEN 'VACUUM FULL'
+                      WHEN 2 THEN 'REPACK'
+                      WHEN 3 THEN 'VACUUM FULL'
                       END AS command,
         CASE S.param2 WHEN 0 THEN 'initializing'
                       WHEN 1 THEN 'seq scanning heap'
@@ -1290,15 +1291,35 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 6 THEN 'rebuilding index'
                       WHEN 7 THEN 'performing final cleanup'
                       END AS phase,
-        CAST(S.param3 AS oid) AS cluster_index_relid,
+        CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
         S.param6 AS heap_blks_total,
         S.param7 AS heap_blks_scanned,
         S.param8 AS index_rebuild_count
-    FROM pg_stat_get_progress_info('CLUSTER') AS S
+    FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+-- This view is the same as the one above, except that one column is renamed
+-- and 'REPACK' is never reported as a command name.
+CREATE VIEW pg_stat_progress_cluster AS
+    SELECT
+        pid,
+        datid,
+        datname,
+        relid,
+        CASE WHEN command IN ('CLUSTER', 'VACUUM FULL') THEN command
+             WHEN repack_index_relid = 0 THEN 'VACUUM FULL'
+             ELSE 'CLUSTER' END AS command,
+        phase,
+        repack_index_relid AS cluster_index_relid,
+        heap_tuples_scanned,
+        heap_tuples_written,
+        heap_blks_total,
+        heap_blks_scanned,
+        index_rebuild_count
+    FROM pg_stat_progress_repack;
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
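
The compatibility mapping above is easy to see in action (sketch): a
REPACK ... USING INDEX in flight reports 'REPACK' in the new view but
'CLUSTER' in the old one, and a plain REPACK shows up there as
'VACUUM FULL', per the CASE expression:

    SELECT command, phase FROM pg_stat_progress_repack;   -- may say 'REPACK'
    SELECT command, phase FROM pg_stat_progress_cluster;  -- never 'REPACK'
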
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..ba3c076ea7d 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1,7 +1,8 @@
 /*-------------------------------------------------------------------------
  *
  * cluster.c
- *	  CLUSTER a table on an index.  This is now also used for VACUUM FULL.
+ *	  CLUSTER a table on an index.  This is now also used for VACUUM FULL and
+ *	  REPACK.
  *
  * There is hardly anything left of Paul Brown's original implementation...
  *
@@ -67,27 +68,36 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  Oid relid, bool rel_is_index,
+											  MemoryContext permcxt);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
+static const char *RepackCommandAsString(RepackCommand cmd);
 
 
-/*---------------------------------------------------------------------------
- * This cluster code allows for clustering multiple tables at once. Because
+/*
+ * The repack code allows for processing multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
  * would be forced to acquire exclusive locks on all the tables being
  * clustered, simultaneously --- very likely leading to deadlock.
  *
- * To solve this we follow a similar strategy to VACUUM code,
- * clustering each relation in a separate transaction. For this to work,
- * we need to:
+ * To solve this we follow a strategy similar to that of the VACUUM code,
+ * processing each relation in a separate transaction.  For this to work,
+ * we need to:
+ *
  *	- provide a separate memory context so that we can pass information in
  *	  a way that survives across transactions
  *	- start a new transaction every time a new relation is clustered
@@ -98,197 +108,165 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *
  * The single-relation case does not have any such overhead.
  *
- * We also allow a relation to be specified without index.  In that case,
- * the indisclustered bit will be looked up, and an ERROR will be thrown
- * if there is no index with the bit set.
- *---------------------------------------------------------------------------
+ * We also allow a relation to be repacked following an index without
+ * naming a specific one.  In that case, the indisclustered bit will be
+ * looked up, and an ERROR will be thrown if no index has that bit set.
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+		else if (strcmp(opt->defname, "analyze") == 0 ||
+				 strcmp(opt->defname, "analyse") == 0)
+			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
 		else
 			ereport(ERROR,
-					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
-							opt->defname),
-					 parser_errposition(pstate, opt->location)));
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("unrecognized %s option \"%s\"",
+						   RepackCommandAsString(stmt->command),
+						   opt->defname),
+					parser_errposition(pstate, opt->location));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
-			return;
-		}
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
+			return;				/* all done */
 	}
 
+	/*
+	 * Don't allow ANALYZE in the multiple-relation case for now.  Maybe we
+	 * can add support for this later.
+	 */
+	if (params.options & CLUOPT_ANALYZE)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
+
 	/*
 	 * By here, we know we are in a multi-table situation.  In order to avoid
 	 * holding locks for too long, we want to process each table in its own
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
-	}
+		Oid			relid;
+		bool		rel_is_index;
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
+		/*
+		 * If USING INDEX was specified, resolve the index name now and pass
+		 * it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * If no index name was specified when repacking a partitioned
+			 * table, punt for now.  Maybe we can improve this later.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, stmt->usingindex,
+											  stmt->indexname);
+			if (!OidIsValid(relid))
+				elog(ERROR, "unable to determine index to cluster on");
+			/* XXX is this the right place for this check? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
 
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
+		rtcs = get_tables_to_repack_partitioned(stmt->command,
+												relid, rel_is_index,
+												repack_context);
 
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
+	}
 
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+		{
+			CommitTransactionCommand();
+			continue;
+		}
+
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +282,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+			ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +304,8 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
-	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
+	pgstat_progress_update_param(PROGRESS_REPACK_COMMAND, cmd);
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -350,86 +326,38 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
 	 */
-	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
+	if (recheck &&
+		!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+							 params->options))
+		goto out;
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index.
+	 * Using an index would work in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
 	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on a shared catalog",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
-	{
-		if (OidIsValid(indexOid))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot vacuum temporary tables of other sessions")));
-	}
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on temporary tables of other sessions",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -442,6 +370,24 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	else
 		index = NULL;
 
+	/*
+	 * When allow_system_table_mods is turned off, we disallow repacking a
+	 * catalog on a particular index unless that's already the clustered index
+	 * for that catalog.
+	 *
+	 * XXX We don't check for this in CLUSTER, because it's historically been
+	 * allowed.
+	 */
+	if (cmd != REPACK_COMMAND_CLUSTER &&
+		!allowSystemTableMods && OidIsValid(indexOid) &&
+		IsCatalogRelation(OldHeap) && !index->rd_index->indisclustered)
+		ereport(ERROR,
+				errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				errmsg("permission denied: \"%s\" is a system catalog",
+					   RelationGetRelationName(OldHeap)),
+				errdetail("System catalogs can only be repacked using the index they are already clustered on, if any, unless \"%s\" is enabled.",
+						  "allow_system_table_mods"));
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -469,7 +415,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +428,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide REPACK/CLUSTER or one on a partitioned
+	 * table), because process_single_relation() already stops any attempt to
+	 * repack a remote temp table by name.  There is another check in
+	 * cluster_rel() which is redundant, but we leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +629,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +646,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (index != NULL)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -958,20 +962,20 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	/* Log what we're doing */
 	if (OldIndex != NULL && !use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using index scan on \"%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap),
-						RelationGetRelationName(OldIndex))));
+				errmsg("repacking \"%s.%s\" using index scan on \"%s\"",
+					   nspname,
+					   RelationGetRelationName(OldHeap),
+					   RelationGetRelationName(OldIndex)));
 	else if (use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using sequential scan and sort",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" using sequential scan and sort",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 	else
 		ereport(elevel,
-				(errmsg("vacuuming \"%s.%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" in physical order",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 
 	/*
 	 * Hand off the actual copying to AM specific function, the generic code
@@ -1458,8 +1462,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1513,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,106 +1636,191 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact procedure depends on whether
+ * USING INDEX was given; in any case we only return tables and materialized
+ * views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index for indexes that have
+ * indisclustered set and return their tables; otherwise we scan pg_class
+ * and return all plain tables and materialized views.
+ *
+ * The result is a list of RelToCluster, allocated in the given memory
+ * context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand cmd, bool usingindex, MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
-	MemoryContext old_context;
+	HeapTuple	tuple;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
+
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
+			MemoryContext oldcxt;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			/*
+			 * Try to obtain a light lock on the index's table, to ensure it
+			 * doesn't go away while we collect the list.  If we cannot, just
+			 * disregard it.
+			 */
+			if (!ConditionalLockRelationOid(index->indrelid, AccessShareLock))
+				continue;
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(index->indrelid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(index->indrelid, AccessShareLock);
+				continue;
+			}
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			if (!cluster_is_permitted_for_relation(cmd, index->indrelid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
+	}
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
+
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+			MemoryContext oldcxt;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
 
-		MemoryContextSwitchTo(old_context);
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			/* noisily skip rels which the user can't process */
+			if (!cluster_is_permitted_for_relation(cmd, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
 	}
-	table_endscan(scan);
 
-	relation_close(indRelation, AccessShareLock);
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
- * all the children leaves tables/indexes.
+ * Given a partitioned table or its index, return a list of RelToCluster for
+ * all the leaf tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, Oid relid,
+								 bool rel_is_index, MemoryContext permcxt)
 {
 	List	   *inhoids;
-	ListCell   *lc;
 	List	   *rtcs = NIL;
-	MemoryContext old_context;
 
-	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
-
-	foreach(lc, inhoids)
+	/*
+	 * Do not lock the children until they're processed.  Note that we do hold
+	 * a lock on the parent partitioned table.
+	 */
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
+	foreach_oid(child_oid, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			table_oid,
+					index_oid;
 		RelToCluster *rtc;
+		MemoryContext oldcxt;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(child_oid) != RELKIND_INDEX)
+				continue;
+
+			table_oid = IndexGetRelation(child_oid, false);
+			index_oid = child_oid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(child_oid) != RELKIND_RELATION)
+				continue;
+
+			table_oid = child_oid;
+			index_oid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
-		 * leaf partition despite having such privileges on the partitioned
-		 * table.  We skip any partitions which the user is not permitted to
-		 * CLUSTER.
+		 * leaf partition despite having them on the partitioned table.  Skip
+		 * if so.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, table_oid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
-
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		oldcxt = MemoryContextSwitchTo(permcxt);
+		rtc = palloc(sizeof(RelToCluster));
+		rtc->tableOid = table_oid;
+		rtc->indexOid = index_oid;
 		rtcs = lappend(rtcs, rtc);
-
-		MemoryContextSwitchTo(old_context);
+		MemoryContextSwitchTo(oldcxt);
 	}
 
 	return rtcs;
@@ -1742,13 +1831,167 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
+
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   RepackCommandAsString(cmd),
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain a lock on the relation, then determine what to do based on
+ * its type: if it's a table and not partitioned, repack it as indicated
+ * (following the given index, or an existing clustered one) and return
+ * NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened and locked relcache entry, so that the caller
+ * can process the partitions using the multiple-table handling code.  In
+ * this case, if an index name is given, it's up to the caller to resolve
+ * it.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+
+	/*
+	 * Make sure ANALYZE is specified if a column list is present.
+	 */
+	if ((params->options & CLUOPT_ANALYZE) == 0 && stmt->relation->va_cols != NIL)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("ANALYZE option must be specified when a column list is provided"));
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		if (OidIsValid(indexOid))
+			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, rel, indexOid, params);
+
+		/* Do an analyze, if requested */
+		if (params->options & CLUOPT_ANALYZE)
+		{
+			VacuumParams vac_params = {0};
+
+			vac_params.options |= VACOPT_ANALYZE;
+			if (params->options & CLUOPT_VERBOSE)
+				vac_params.options |= VACOPT_VERBOSE;
+			analyze_rel(RelationGetRelid(rel), NULL, vac_params,
+						stmt->relation->va_cols, true, NULL);
+		}
+
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the
+ * index to use for clustering the table.
+ *
+ * Caller must hold a lock on the relation so that its set of indexes
+ * doesn't change, and must call check_index_is_clusterable() on any valid
+ * result.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		/*
+		 * If USING INDEX with no name is given, find a clustered index, or
+		 * error out if none.
+		 */
+		indexOid = InvalidOid;
+		foreach_oid(idxoid, RelationGetIndexList(rel))
+		{
+			if (get_index_isclustered(idxoid))
+			{
+				indexOid = idxoid;
+				break;
+			}
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/* An index was specified; obtain its OID. */
+		indexOid = get_relname_relid(indexname, rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
+
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
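
To make the rewritten cluster.c behavior concrete, here is a sketch of
what the code above accepts and rejects at the SQL level (table, column
and index names are made up):

-- single-table form, handled by process_single_relation():
REPACK (VERBOSE) accounts USING INDEX accounts_pkey;

-- rewrite in physical order, then analyze only the listed columns:
REPACK (ANALYZE) accounts (aid, abalance);

-- ExecRepack() rejects ANALYZE in the multiple-table case:
REPACK (ANALYZE);
ERROR:  cannot REPACK (ANALYZE) multiple tables
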
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e785dd55ce5..827e66724b5 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -349,7 +349,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		}
 	}
 
-
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
@@ -2287,8 +2286,9 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 			if ((params.options & VACOPT_VERBOSE) != 0)
 				cluster_params.options |= CLUOPT_VERBOSE;
 
-			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			/* VACUUM FULL is a variant of REPACK; see cluster.c */
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c3a0a354a9c..c314c11e23d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -286,7 +286,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -303,7 +303,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -322,7 +322,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		opt_wait_with_clause
@@ -770,7 +770,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESPECT_P RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1032,7 +1032,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1106,6 +1105,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1143,6 +1143,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11979,38 +11984,82 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ <name_list> ] [ USING INDEX <index_name> ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list vacuum_relation USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
 					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list vacuum_relation opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $3;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
+					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $3;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -12018,20 +12067,25 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -18069,6 +18123,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18704,6 +18759,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
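
For reviewers who prefer not to trace the productions by hand, these are
some statements the new grammar accepts (identifiers made up); note that
all the legacy CLUSTER spellings now build a RepackStmt with
command = REPACK_COMMAND_CLUSTER:

REPACK;                              -- every plain table and matview
REPACK (ANALYZE, VERBOSE) t (a, b);  -- option list plus a column list
REPACK t USING INDEX;                -- follow t's indisclustered index
CLUSTER i ON t;                      -- pre-8.3 spelling, still parsed
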
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index d18a3a60a46..d01895d1cf1 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -279,9 +279,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -856,14 +856,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2864,10 +2864,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2875,6 +2871,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3516,7 +3516,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 7e2ed69138a..1a82534e347 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -289,6 +289,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
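
Until a pg_stat_progress_repack view is in place, the raw progress rows
can already be inspected through the function extended above; the param
columns map to the new PROGRESS_REPACK_* constants, per the progress.h
changes elsewhere in this patch:

SELECT pid, datid, relid, param1, param2
FROM pg_stat_get_progress_info('REPACK');
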
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 20d7a65c614..626d9f1c98b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1267,7 +1267,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES",
@@ -5040,6 +5040,46 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"(", "USING INDEX");
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"USING INDEX");
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	else if (Matches("REPACK", "(*)", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX") ||
+			 Matches("REPACK", "(*)", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	/*
+	 * Complete ... [ (*) ] <sth> USING INDEX, with a list of indexes for
+	 * <sth>.
+	 */
+	else if (TailMatches(MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("ANALYZE", "VERBOSE");
+		else if (TailMatches("ANALYZE", "VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index 019ca06455d..f0c1bd4175c 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -16,7 +16,7 @@ subdir = src/bin/scripts
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready
+PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready pg_repackdb
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
@@ -31,6 +31,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
 vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
 install: all installdirs
 	$(INSTALL_PROGRAM) createdb$(X)   '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -41,6 +42,7 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+	$(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index a4fed59d1c9..be573cae682 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -42,6 +42,7 @@ vacuuming_common = static_library('libvacuuming_common',
 
 binaries = [
   'vacuumdb',
+  'pg_repackdb',
 ]
 foreach binary : binaries
   binary_sources = files('@0@.c'.format(binary))
@@ -80,6 +81,7 @@ tests += {
       't/100_vacuumdb.pl',
       't/101_vacuumdb_all.pl',
       't/102_vacuumdb_stages.pl',
+      't/103_repackdb.pl',
       't/200_connstr.pl',
     ],
   },
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..1edfa34ed0f
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,242 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ *		A utility to run REPACK
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * FIXME: this is missing a way to specify the index to use to repack one
+ * table, or whether to pass a USING INDEX clause when multiple tables are
+ * used.  Something like --index[=indexname].  Adding that bleeds into
+ * vacuuming.c as well.
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "vacuuming.h"
+
+static void help(const char *progname);
+static void check_objfilter(bits32 objfilter);
+
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+		{"echo", no_argument, NULL, 'e'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"dbname", required_argument, NULL, 'd'},
+		{"analyze", no_argument, NULL, 'z'},
+		{"all", no_argument, NULL, 'a'},
+		/* XXX this could be 'i', but optional_argument handling is messy */
+		{"index", optional_argument, NULL, 1},
+		{"table", required_argument, NULL, 't'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"jobs", required_argument, NULL, 'j'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
+		{"maintenance-db", required_argument, NULL, 2},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	bool		echo = false;
+	bool		quiet = false;
+	vacuumingOptions vacopts;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+	int			ret;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+	vacopts.mode = MODE_REPACK;
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwWz",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				vacopts.objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				vacopts.objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				vacopts.objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				vacopts.objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				quiet = true;
+				break;
+			case 't':
+				vacopts.objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 'z':
+				vacopts.and_analyze = true;
+				break;
+			case 1:
+				vacopts.using_index = true;
+				if (optarg)
+					vacopts.indexname = pg_strdup(optarg);
+				else
+					vacopts.indexname = NULL;
+				break;
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		vacopts.objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter(vacopts.objfilter);
+
+	ret = vacuuming_main(&cparams, dbname, maintenance_db, &vacopts,
+						 &objects, tbl_count, concurrentCons,
+						 progname, echo, quiet);
+	exit(ret);
+}
+
+/*
+ * Verify that the filters specified on the command line are compatible.
+ */
+static void
+check_objfilter(bits32 objfilter)
+{
+	if ((objfilter & OBJFILTER_ALL_DBS) &&
+		(objfilter & OBJFILTER_DATABASE))
+		pg_fatal("cannot repack all databases and a specific one at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA))
+		pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+
+	if ((objfilter & OBJFILTER_SCHEMA) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+}
+
+static void
+help(const char *progname)
+{
+	printf(_("%s repacks a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("      --index[=INDEX]             repack following an index\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE[(COLUMNS)]'  repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -z, --analyze                   update optimizer statistics\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..cadce9b837c
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,47 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-t', 'pg_class'],
+	qr/statement: REPACK.*pg_class;/,
+	'pg_repackdb processes a single table');
+
+$node->safe_psql('postgres', 'CREATE USER testusr;
+	GRANT CREATE ON SCHEMA public TO testusr');
+$node->safe_psql('postgres',
+	'CREATE TABLE cluster_1 (a int primary key);
+	ALTER TABLE cluster_1 CLUSTER ON cluster_1_pkey;
+	CREATE TABLE cluster_2 (a int unique);
+	ALTER TABLE cluster_2 CLUSTER ON cluster_2_a_key;',
+	extra_params => ['-U' => 'testusr']);
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-U', 'testusr' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--index'],
+	qr/statement: REPACK.*cluster_1 USING INDEX.*statement: REPACK.*cluster_2 USING INDEX/ms,
+	'pg_repackdb --index chooses multiple tables');
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--analyze', '-t', 'cluster_1'],
+	qr/statement: REPACK \(ANALYZE\) public.cluster_1/,
+	'pg_repackdb --analyze works');
+
+done_testing();
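
Going by the regexes in this test, the statements pg_repackdb sends look
roughly as follows (the exact schema qualification and quoting come from
the shared vacuuming.c machinery):

REPACK public.pg_class;
REPACK public.cluster_1 USING INDEX;   -- bare --index, so no index name
REPACK (ANALYZE) public.cluster_1;
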
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
index f836f21fb03..d8d77cabe43 100644
--- a/src/bin/scripts/vacuuming.c
+++ b/src/bin/scripts/vacuuming.c
@@ -1,6 +1,6 @@
 /*-------------------------------------------------------------------------
  * vacuuming.c
- *		Helper routines for vacuumdb
+ *		Helper routines for vacuumdb and pg_repackdb
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -43,8 +43,8 @@ static SimpleStringList *retrieve_objects(PGconn *conn,
 static void free_retrieved_objects(SimpleStringList *list);
 static void prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 								   vacuumingOptions *vacopts, const char *table);
-static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+static void run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+							   const char *sql, bool echo, const char *table);
 
 /*
  * Executes vacuum/analyze as indicated.  Returns 0 if the plan is carried
@@ -194,6 +194,14 @@ vacuum_one_database(ConnParams *cparams,
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		/* XXX arguably, here we should use VACUUM FULL instead of failing */
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
 	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
 	{
 		PQfinish(conn);
@@ -286,9 +294,18 @@ vacuum_one_database(ConnParams *cparams,
 		if (vacopts->mode == MODE_ANALYZE_IN_STAGES)
 			printf(_("%s: processing database \"%s\": %s\n"),
 				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
+		else if (vacopts->mode == MODE_ANALYZE)
+			printf(_("%s: analyzing database \"%s\"\n"),
+				   progname, PQdb(conn));
+		else if (vacopts->mode == MODE_VACUUM)
 			printf(_("%s: vacuuming database \"%s\"\n"),
 				   progname, PQdb(conn));
+		else
+		{
+			Assert(vacopts->mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
 		fflush(stdout);
 	}
 
@@ -383,7 +400,7 @@ vacuum_one_database(ConnParams *cparams,
 		 * through ParallelSlotsGetIdle.
 		 */
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
+		run_vacuum_command(free_slot->connection, vacopts, sql.data,
 						   echo, tabname);
 
 		cell = cell->next;
@@ -408,7 +425,7 @@ vacuum_one_database(ConnParams *cparams,
 		}
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+		run_vacuum_command(free_slot->connection, vacopts, cmd, echo, NULL);
 
 		if (!ParallelSlotsWaitCompletion(sa))
 			ret = EXIT_FAILURE; /* error already reported by handler */
@@ -636,6 +653,35 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
 								 " AND listed_objects.object_oid IS NOT NULL\n");
 	}
 
+	/*
+	 * In REPACK mode, if the 'using_index' option was given but no index
+	 * name, filter only tables that have an index with indisclustered set.
+	 * (If an index name is given, we trust the user to pass a reasonable list
+	 * of tables.)
+	 *
+	 * XXX it may be worth printing an error if an index name is given with no
+	 * list of tables.
+	 */
+	if (vacopts->mode == MODE_REPACK &&
+		vacopts->using_index && !vacopts->indexname)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND EXISTS (SELECT 1 FROM pg_catalog.pg_index\n"
+							 "    WHERE indrelid = c.oid AND indisclustered)\n");
+	}
+
+	/*
+	 * In REPACK mode, only consider tables on which the current user has
+	 * the MAINTAIN privilege.  XXX maybe we should do this in all cases, not
+	 * just REPACK; it would make the vacuumdb output less noisy.
+	 */
+	if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND pg_catalog.has_table_privilege(current_user, "
+							 "c.oid, 'MAINTAIN')\n");
+	}
+
 	/*
 	 * If no tables were listed, filter for the relevant relation types.  If
 	 * tables were given via --table, don't bother filtering by relation type.
@@ -874,8 +920,10 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->verbose)
 				appendPQExpBufferStr(sql, " VERBOSE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
 	}
-	else
+	else if (vacopts->mode == MODE_VACUUM)
 	{
 		appendPQExpBufferStr(sql, "VACUUM");
 
@@ -989,9 +1037,39 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->and_analyze)
 				appendPQExpBufferStr(sql, " ANALYZE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
 	}
+	else if (vacopts->mode == MODE_REPACK)
+	{
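+		/*
+		 * Example of a command this branch can build (names are
+		 * illustrative):
+		 *   REPACK (VERBOSE, ANALYZE) public.foo USING INDEX foo_idx;
+		 */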
+		appendPQExpBufferStr(sql, "REPACK");
 
-	appendPQExpBuffer(sql, " %s;", table);
+		if (vacopts->verbose)
+		{
+			appendPQExpBuffer(sql, "%sVERBOSE", sep);
+			sep = comma;
+		}
+		if (vacopts->and_analyze)
+		{
+			appendPQExpBuffer(sql, "%sANALYZE", sep);
+			sep = comma;
+		}
+
+		if (sep != paren)
+			appendPQExpBufferChar(sql, ')');
+
+		appendPQExpBuffer(sql, " %s", table);
+
+		if (vacopts->using_index)
+		{
+			appendPQExpBufferStr(sql, " USING INDEX");
+			if (vacopts->indexname)
+				appendPQExpBuffer(sql, " %s", fmtIdEnc(vacopts->indexname,
+													   PQclientEncoding(conn)));
+		}
+	}
+
+	appendPQExpBufferChar(sql, ';');
 }
 
 /*
@@ -1001,8 +1079,8 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
  * Any errors during command execution are reported to stderr.
  */
 static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
+run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+				   const char *sql, bool echo, const char *table)
 {
 	bool		status;
 
@@ -1015,13 +1093,21 @@ run_vacuum_command(PGconn *conn, const char *sql, bool echo,
 	{
 		if (table)
 		{
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
 		}
 		else
 		{
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
 		}
 	}
 }
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
index 49f968b32e5..665dbaedfad 100644
--- a/src/bin/scripts/vacuuming.h
+++ b/src/bin/scripts/vacuuming.h
@@ -20,6 +20,7 @@
 typedef enum
 {
 	MODE_VACUUM,
+	MODE_REPACK,
 	MODE_ANALYZE,
 	MODE_ANALYZE_IN_STAGES
 } RunMode;
@@ -37,6 +38,8 @@ typedef struct vacuumingOptions
 	bool		and_analyze;
 	bool		full;
 	bool		freeze;
+	bool		using_index;
+	char	   *indexname;
 	bool		disable_page_skipping;
 	bool		skip_locked;
 	int			min_xid_age;
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..652542e8e65 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
+						ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..ebf004b7aa5 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,28 +56,34 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
-
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
-
-/* Commands of PROGRESS_CLUSTER */
-#define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
-#define PROGRESS_CLUSTER_COMMAND_VACUUM_FULL	2
+/*
+ * Progress parameters for REPACK.
+ *
+ * Values for PROGRESS_REPACK_COMMAND are defined as in RepackCommand.
+ *
+ * Note: Since REPACK shares code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
 
 /* Progress parameters for CREATE INDEX */
 /* 3, 4 and 5 reserved for "waitfor" metrics */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d14294a4ece..94892042b8d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3951,18 +3951,6 @@ typedef struct AlterSystemStmt
 	VariableSetStmt *setstmt;	/* SET subcommand */
 } AlterSystemStmt;
 
-/* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
- * ----------------------
- */
-typedef struct ClusterStmt
-{
-	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
-	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
-
 /* ----------------------
  *		Vacuum and Analyze Statements
  *
@@ -3975,7 +3963,7 @@ typedef struct VacuumStmt
 	NodeTag		type;
 	List	   *options;		/* list of DefElem nodes */
 	List	   *rels;			/* list of VacuumRelation, or NIL for all */
-	bool		is_vacuumcmd;	/* true for VACUUM, false for ANALYZE */
+	bool		is_vacuumcmd;	/* true for VACUUM, false otherwise */
 } VacuumStmt;
 
 /*
@@ -3993,6 +3981,27 @@ typedef struct VacuumRelation
 	List	   *va_cols;		/* list of column names, or NIL for all */
 } VacuumRelation;
 
+/* ----------------------
+ *		Repack Statement
+ * ----------------------
+ */
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER = 1,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
+{
+	NodeTag		type;
+	RepackCommand command;		/* type of command being run */
+	VacuumRelation *relation;	/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
+	List	   *params;			/* list of DefElem nodes */
+} RepackStmt;
+
 /* ----------------------
  *		Explain Statement
  *
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 5d4fe27ef96..f1a1d5e7a80 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -376,6 +376,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index c4606d65043..66690f1134a 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..277854418fa 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +550,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
@@ -665,6 +702,101 @@ SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 (4 rows)
 
 COMMIT;
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+ERROR:  ANALYZE option must be specified when a column list is provided
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- had their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed too, because no clustering index is required here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 94e45dd4d57..1c957f12d27 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1995,34 +1995,23 @@ pg_stat_progress_basebackup| SELECT pid,
             ELSE NULL::text
         END AS backup_type
    FROM pg_stat_get_progress_info('BASEBACKUP'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20);
-pg_stat_progress_cluster| SELECT s.pid,
-    s.datid,
-    d.datname,
-    s.relid,
-        CASE s.param1
-            WHEN 1 THEN 'CLUSTER'::text
-            WHEN 2 THEN 'VACUUM FULL'::text
-            ELSE NULL::text
+pg_stat_progress_cluster| SELECT pid,
+    datid,
+    datname,
+    relid,
+        CASE
+            WHEN (command = ANY (ARRAY['CLUSTER'::text, 'VACUUM FULL'::text])) THEN command
+            WHEN (repack_index_relid = (0)::oid) THEN 'VACUUM FULL'::text
+            ELSE 'CLUSTER'::text
         END AS command,
-        CASE s.param2
-            WHEN 0 THEN 'initializing'::text
-            WHEN 1 THEN 'seq scanning heap'::text
-            WHEN 2 THEN 'index scanning heap'::text
-            WHEN 3 THEN 'sorting tuples'::text
-            WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
-            ELSE NULL::text
-        END AS phase,
-    (s.param3)::oid AS cluster_index_relid,
-    s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
-   FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
-     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+    phase,
+    repack_index_relid AS cluster_index_relid,
+    heap_tuples_scanned,
+    heap_tuples_written,
+    heap_blks_total,
+    heap_blks_scanned,
+    index_rebuild_count
+   FROM pg_stat_progress_repack;
 pg_stat_progress_copy| SELECT s.pid,
     s.datid,
     d.datname,
@@ -2082,6 +2071,35 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 1 THEN 'CLUSTER'::text
+            WHEN 2 THEN 'REPACK'::text
+            WHEN 3 THEN 'VACUUM FULL'::text
+            ELSE NULL::text
+        END AS command,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..c976823a3cb 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,7 +76,6 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
-
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -229,6 +228,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
@@ -313,6 +330,57 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 COMMIT;
 
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- had their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed too, because no clustering index is required here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c1ad80a418d..4641da9b746 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2555,6 +2555,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
-- 
2.47.3

v26-0002-Refactor-index_concurrently_create_copy-for-use-with.patch
From 89b8c72639a7158726193dc799996ba0d9ddc74e Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 4 Dec 2025 18:20:07 +0100
Subject: [PATCH 2/4] Refactor index_concurrently_create_copy() for use with
 REPACK (CONCURRENTLY).

This patch moves the code to index_create_copy() and adds a "concurrently"
parameter so it can be used by REPACK (CONCURRENTLY).

With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of indexes
(including system catalog entries) *before* the actual swap, to reduce the
time for which AccessExclusiveLock needs to be held.
---
 src/backend/catalog/index.c      | 54 +++++++++++++++++++++++---------
 src/backend/commands/indexcmds.c |  6 ++--
 src/backend/nodes/makefuncs.c    |  9 +++---
 src/include/catalog/index.h      |  3 ++
 src/include/nodes/makefuncs.h    |  4 ++-
 5 files changed, 54 insertions(+), 22 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 08d4b8e44d7..cf2d0abf370 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1290,15 +1290,32 @@ index_create(Relation heapRelation,
 /*
  * index_concurrently_create_copy
  *
- * Create concurrently an index based on the definition of the one provided by
- * caller.  The index is inserted into catalogs and needs to be built later
- * on.  This is called during concurrent reindex processing.
- *
- * "tablespaceOid" is the tablespace to use for this index.
+ * Variant of index_create_copy(), called during concurrent reindex
+ * processing.
  */
 Oid
 index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							   Oid tablespaceOid, const char *newName)
+{
+	return index_create_copy(heapRelation, oldIndexId, tablespaceOid, newName,
+							 true);
+}
+
+/*
+ * index_create_copy
+ *
+ * Create an index based on the definition of the one provided by caller.  The
+ * index is inserted into the catalogs. If 'concurrently' is true, it still
+ * needs to be built later on; otherwise it is built immediately.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ *
+ * The actual implementation of index_concurrently_create_copy(), reusable for
+ * other purposes.
+ */
+Oid
+index_create_copy(Relation heapRelation, Oid oldIndexId, Oid tablespaceOid,
+				  const char *newName, bool concurrently)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1317,6 +1334,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	List	   *indexColNames = NIL;
 	List	   *indexExprs = NIL;
 	List	   *indexPreds = NIL;
+	int			flags = 0;
 
 	indexRelation = index_open(oldIndexId, RowExclusiveLock);
 
@@ -1327,7 +1345,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	 * Concurrent build of an index with exclusion constraints is not
 	 * supported.
 	 */
-	if (oldInfo->ii_ExclusionOps != NULL)
+	if (oldInfo->ii_ExclusionOps != NULL && concurrently)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("concurrent index creation for exclusion constraints is not supported")));
@@ -1383,9 +1401,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	}
 
 	/*
-	 * Build the index information for the new index.  Note that rebuild of
-	 * indexes with exclusion constraints is not supported, hence there is no
-	 * need to fill all the ii_Exclusion* fields.
+	 * Build the index information for the new index.
 	 */
 	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
 							oldInfo->ii_NumIndexKeyAttrs,
@@ -1394,10 +1410,13 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							indexPreds,
 							oldInfo->ii_Unique,
 							oldInfo->ii_NullsNotDistinct,
-							false,	/* not ready for inserts */
-							true,
+							!concurrently,	/* isready */
+							concurrently,	/* concurrent */
 							indexRelation->rd_indam->amsummarizing,
-							oldInfo->ii_WithoutOverlaps);
+							oldInfo->ii_WithoutOverlaps,
+							oldInfo->ii_ExclusionOps,
+							oldInfo->ii_ExclusionProcs,
+							oldInfo->ii_ExclusionStrats);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1435,6 +1454,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 		stattargets[i].isnull = isnull;
 	}
 
+	if (concurrently)
+		flags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+
 	/*
 	 * Now create the new index.
 	 *
@@ -1458,7 +1480,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							  indcoloptions->values,
 							  stattargets,
 							  reloptionsDatum,
-							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT,
+							  flags,
 							  0,
 							  true, /* allow table to be a system catalog? */
 							  false,	/* is_internal? */
@@ -2452,7 +2474,8 @@ BuildIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   NULL, NULL, NULL);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2512,7 +2535,8 @@ BuildDummyIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   NULL, NULL, NULL);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a8033be4bff..9cc94884abc 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -242,7 +242,8 @@ CheckIndexCompatible(Oid oldId,
 	 */
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
-							  false, false, amsummarizing, isWithoutOverlaps);
+							  false, false, amsummarizing, isWithoutOverlaps,
+							  NULL, NULL, NULL);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -927,7 +928,8 @@ DefineIndex(Oid tableId,
 							  !concurrent,
 							  concurrent,
 							  amissummarizing,
-							  stmt->iswithoutoverlaps);
+							  stmt->iswithoutoverlaps,
+							  NULL, NULL, NULL);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index e2d9e9be41a..c5d5a37f514 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,8 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps)
+			  bool withoutoverlaps, Oid *exclusion_ops, Oid *exclusion_procs,
+			  uint16 *exclusion_strats)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -863,9 +864,9 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_PredicateState = NULL;
 
 	/* exclusion constraints */
-	n->ii_ExclusionOps = NULL;
-	n->ii_ExclusionProcs = NULL;
-	n->ii_ExclusionStrats = NULL;
+	n->ii_ExclusionOps = exclusion_ops;
+	n->ii_ExclusionProcs = exclusion_procs;
+	n->ii_ExclusionStrats = exclusion_strats;
 
 	/* speculative inserts */
 	n->ii_UniqueOps = NULL;
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index dda95e54903..4bf909078d8 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -99,6 +99,9 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid oldIndexId,
 										   Oid tablespaceOid,
 										   const char *newName);
+extern Oid	index_create_copy(Relation heapRelation, Oid oldIndexId,
+							  Oid tablespaceOid, const char *newName,
+							  bool concurrently);
 
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 5473ce9a288..9ff7159ff0c 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -99,7 +99,9 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								List *expressions, List *predicates,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
-								bool summarizing, bool withoutoverlaps);
+								bool summarizing, bool withoutoverlaps,
+								Oid *exclusion_ops, Oid *exclusion_procs,
+								uint16 *exclusion_strats);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
-- 
2.47.3

v26-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch
From 79b67f90e1be61945797252ce8c5ffc481ab67f9 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 4 Dec 2025 18:20:07 +0100
Subject: [PATCH 3/4] Move conversion of a "historic" to MVCC snapshot to a
 separate function.

The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
---
 src/backend/replication/logical/snapbuild.c | 51 +++++++++++++++++----
 src/backend/utils/time/snapmgr.c            |  3 +-
 src/include/replication/snapbuild.h         |  1 +
 src/include/utils/snapmgr.h                 |  1 +
 4 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 6e18baa33cb..dcf32101c4c 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
-	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -485,6 +482,31 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
+	/* Convert the historic snapshot to an MVCC snapshot. */
+	return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike a regular (non-historic) MVCC snapshot, the xip array of this
+ * snapshot contains not only running main transactions, but also their
+ * subtransactions. This difference has no impact on XidInMVCCSnapshot().
+ *
+ * Pass true for 'in_place' if it is acceptable to modify the source
+ * snapshot. Pass false if you need a new instance, allocated as a single
+ * chunk of memory, leaving the source snapshot intact.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+	TransactionId xid;
+	TransactionId *oldxip = snapshot->xip;
+	uint32		oldxcnt = snapshot->xcnt;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+	Snapshot	result;
+
 	/* allocate in transaction context */
 	newxip = (TransactionId *)
 		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
@@ -495,7 +517,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
 	{
 		void	   *test;
 
@@ -503,7 +525,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -520,11 +542,22 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snapshot->xcnt = newxcnt;
+	snapshot->xip = newxip;
+
+	if (in_place)
+		result = snapshot;
+	else
+	{
+		result = CopySnapshot(snapshot);
+
+		/* Restore the original values so the source is intact. */
+		snapshot->xip = oldxip;
+		snapshot->xcnt = oldxcnt;
+	}
+	result->snapshot_type = SNAPSHOT_MVCC;
 
-	return snap;
+	return result;
 }
 
 /*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 24f73a49d27..886060305f5 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -604,7 +603,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
+Snapshot
 CopySnapshot(Snapshot snapshot)
 {
 	Snapshot	newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..6d4d2d1814c 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..f65f83c85cd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -63,6 +63,7 @@ extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
+extern Snapshot CopySnapshot(Snapshot snapshot);
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
-- 
2.47.3

v26-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch
From 1a181bb4b334554dafd1d8fd3ba5974effffed78 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 4 Dec 2025 18:20:07 +0100
Subject: [PATCH 4/4] Add CONCURRENTLY option to REPACK command.

The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock at the beginning and keep it until the end of the processing.)

This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(The swapping itself should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.
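
For illustration, a sketch of the intended usage (table and index names are
placeholders; the option syntax follows the repack.sgml synopsis):

    -- Repack while the table stays readable and writable; the ACCESS
    -- EXCLUSIVE lock is only taken for the final file swap.
    REPACK (CONCURRENTLY) orders USING INDEX orders_pkey;

    -- The same without ordering the rows by an index.
    REPACK (CONCURRENTLY) orders;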

First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.

Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)
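
(The catch-up work is observable in the pg_stat_progress_repack view;
assuming the column set documented in the monitoring.sgml hunk below, a
query along these lines shows the 'catch-up' phase and the per-kind
counters while it runs:

    SELECT pid, relid::regclass, command, phase,
           heap_tuples_inserted, heap_tuples_updated, heap_tuples_deleted
    FROM pg_stat_progress_repack;
)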

Since the logical decoding system, during its startup, waits until all the
transactions which already have XID assigned have finished, there is a risk of
deadlock if a transaction that already changed anything in the database tries
to acquire a conflicting lock on the table REPACK CONCURRENTLY is working
on. As an example, consider a transaction running a CREATE INDEX command on the
table that is being REPACKed CONCURRENTLY. On the other hand, DML commands
(INSERT, UPDATE, DELETE) are not a problem as their lock does not conflict
with REPACK CONCURRENTLY.
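
A sketch of that hazardous sequence (session labels, table and column names
are illustrative):

    -- Session 1: gets an XID assigned, then blocks behind REPACK's lock.
    BEGIN;
    INSERT INTO other_table VALUES (1);    -- XID assigned here
    CREATE INDEX ON orders (val);          -- waits for REPACK's lock

    -- Session 2: REPACK (CONCURRENTLY) orders;  -- during logical decoding
    -- startup it waits for session 1's XID to finish: deadlock.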

The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before the logical decoding is set up and lock
it again afterwards. Such temporary unlocking would imply re-checking if the
table still meets all the requirements for REPACK CONCURRENTLY.

Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires extra space for the new relation and index files
(which coexist with the old files for some time). In addition, the
CONCURRENTLY option might introduce a lag in releasing WAL segments for
archiving / recycling. This is due to the decoding of the data changes done by
applications concurrently. When copying the table contents into the new file,
we check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table contents have been copied.) A
background worker might be a better approach for the decoding - let's consider
implementing it in the future.

The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.
---
 doc/src/sgml/monitoring.sgml                  |   37 +-
 doc/src/sgml/mvcc.sgml                        |   12 +-
 doc/src/sgml/ref/repack.sgml                  |  129 +-
 src/Makefile                                  |    1 +
 src/backend/access/heap/heapam.c              |   34 +-
 src/backend/access/heap/heapam_handler.c      |  227 ++-
 src/backend/access/heap/rewriteheap.c         |    6 +-
 src/backend/access/transam/xact.c             |   11 +-
 src/backend/catalog/system_views.sql          |   19 +-
 src/backend/commands/cluster.c                | 1659 ++++++++++++++++-
 src/backend/commands/matview.c                |    2 +-
 src/backend/commands/tablecmds.c              |    1 +
 src/backend/commands/vacuum.c                 |   12 +-
 src/backend/meson.build                       |    1 +
 src/backend/replication/logical/decode.c      |   93 +
 src/backend/replication/logical/snapbuild.c   |   23 +-
 .../replication/pgoutput_repack/Makefile      |   32 +
 .../replication/pgoutput_repack/meson.build   |   18 +
 .../pgoutput_repack/pgoutput_repack.c         |  248 +++
 .../storage/lmgr/generate-lwlocknames.pl      |    2 +-
 src/backend/utils/cache/relcache.c            |    1 +
 src/backend/utils/time/snapmgr.c              |    3 +-
 src/bin/psql/tab-complete.in.c                |    4 +-
 src/include/access/heapam.h                   |    9 +-
 src/include/access/heapam_xlog.h              |    2 +
 src/include/access/tableam.h                  |   10 +
 src/include/commands/cluster.h                |   90 +-
 src/include/commands/progress.h               |   17 +-
 src/include/replication/snapbuild.h           |    1 +
 src/include/storage/lockdefs.h                |    4 +-
 src/include/utils/snapmgr.h                   |    2 +
 src/test/modules/injection_points/Makefile    |    2 +
 .../injection_points/expected/repack.out      |  113 ++
 .../modules/injection_points/logical.conf     |    1 +
 src/test/modules/injection_points/meson.build |    3 +
 .../injection_points/specs/repack.spec        |  143 ++
 src/test/regress/expected/rules.out           |   19 +-
 src/tools/pgindent/typedefs.list              |    5 +
 38 files changed, 2761 insertions(+), 235 deletions(-)
 create mode 100644 src/backend/replication/pgoutput_repack/Makefile
 create mode 100644 src/backend/replication/pgoutput_repack/meson.build
 create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
 create mode 100644 src/test/modules/injection_points/expected/repack.out
 create mode 100644 src/test/modules/injection_points/logical.conf
 create mode 100644 src/test/modules/injection_points/specs/repack.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b8da77b4d89..7d9e60c1b20 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6177,14 +6177,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+       <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of heap tuples written.
+       Number of heap tuples inserted.
        This counter only advances when the phase is
        <literal>seq scanning heap</literal>,
-       <literal>index scanning heap</literal>
-       or <literal>writing new heap</literal>.
+       <literal>index scanning heap</literal>,
+       <literal>writing new heap</literal>
+       or <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples updated.
+       This counter only advances when the phase is <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples deleted.
+       This counter only advances when the phase is <literal>catch-up</literal>.
       </para></entry>
      </row>
 
@@ -6265,6 +6286,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <command>REPACK</command> is currently writing the new heap.
      </entry>
     </row>
+    <row>
+     <entry><literal>catch-up</literal></entry>
+     <entry>
+       <command>REPACK CONCURRENTLY</command> is currently processing the DML
+       commands that other transactions executed during any of the preceding
+       phases.
+     </entry>
+    </row>
     <row>
      <entry><literal>swapping relation files</literal></entry>
      <entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+    TABLE</command></link> and <command>REPACK</command> with
+    the <literal>CONCURRENTLY</literal> option, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the DDL command committed.  This will only be an
+    snapshot taken before the command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the DDL command started &mdash; any transaction that has done so
+    before the command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the DDL command until that transaction completes.
+    which would block the truncating or rewriting command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index 61d5c2cdef1..2765ce4100e 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -28,6 +28,7 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
 
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+    CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -54,7 +55,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
    processes every table and materialized view in the current database that
    the current user has the <literal>MAINTAIN</literal> privilege on. This
    form of <command>REPACK</command> cannot be executed inside a transaction
-   block.
+   block.  Also, this form is not allowed if
+   the <literal>CONCURRENTLY</literal> option is used.
   </para>
 
   <para>
@@ -67,7 +69,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
    When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
    is acquired on it. This prevents any other database operations (both reads
    and writes) from operating on the table until the <command>REPACK</command>
-   is finished.
+   is finished. If you want to keep the table accessible during the repacking,
+   consider using the <literal>CONCURRENTLY</literal> option.
   </para>
 
   <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -195,6 +198,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>CONCURRENTLY</literal></term>
+    <listitem>
+     <para>
+      Allow other transactions to use the table while it is being repacked.
+     </para>
+
+     <para>
+      Internally, <command>REPACK</command> copies the contents of the table
+      (ignoring dead tuples) into a new file, sorted by the specified index,
+      and also creates a new file for each index. Then it swaps the old and
+      new files for the table and all the indexes, and deletes the old
+      files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+      sure that the old files do not change during the processing, because
+      any such changes would be lost after the swap.
+     </para>
+
+     <para>
+      With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+      EXCLUSIVE</literal> lock is only acquired to swap the table and index
+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be pretty short. However, the time might still be noticeable if
+      many data changes were made to the table while
+      <command>REPACK</command> was waiting for the lock: those changes must
+      be processed just before the files are swapped, while the
+      <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+     </para>
+
+     <para>
+      Note that <command>REPACK</command> with
+      the <literal>CONCURRENTLY</literal> option does not try to order the
+      rows inserted into the table after the repacking started. Also note
+      that <command>REPACK</command> might fail to complete due to DDL
+      commands executed on the table by other transactions during the
+      repacking.
+     </para>
+
+     <note>
+      <para>
+       In addition to the temporary space requirements explained in
+       <xref linkend="sql-repack-notes-on-resources"/>,
+       the <literal>CONCURRENTLY</literal> option can increase the usage of
+       temporary space a bit further. The reason is that other transactions can
+       perform DML operations which cannot be applied to the new file until
+       <command>REPACK</command> has copied all the tuples from the old
+       file. Thus the tuples inserted into the old file during the copying are
+       also stored separately in a temporary file, so they can eventually be
+       applied to the new file.
+      </para>
+
+      <para>
+       Furthermore, the data changes performed during the copying are
+       extracted from the <link linkend="wal">write-ahead log</link> (WAL), and
+       this extraction (decoding) only takes place once a certain amount of WAL
+       has been written. Therefore, WAL removal can be delayed by this
+       threshold. Currently the threshold is equal to the value of
+       the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+       configuration parameter.
+      </para>
+     </note>
+
+     <para>
+      The <literal>CONCURRENTLY</literal> option cannot be used in the
+      following cases:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+          The table is <literal>UNLOGGED</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is partitioned.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is a system catalog or a <acronym>TOAST</acronym> table.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REPACK</command> is executed inside a transaction block.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+          configuration parameter is less than <literal>logical</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+         configuration parameter does not allow for creation of an additional
+         replication slot.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <warning>
+      <para>
+       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+       details.
+      </para>
+     </warning>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
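
To make the requirements on CONCURRENTLY concrete, here's an illustrative
session (the table "mytab" and index "mytab_pkey" are made-up names; the
checks mirror the list in the reference page above):

    -- Prerequisites: logical WAL level, a free replication slot,
    -- and a usable replica identity (or primary key).
    SHOW wal_level;                    -- must be 'logical'
    SHOW max_replication_slots;        -- must allow one more slot
    SELECT relreplident FROM pg_class
    WHERE oid = 'mytab'::regclass;     -- must not be 'n' (NOTHING)

    -- Repack online, optionally ordering by an index:
    REPACK (CONCURRENTLY, VERBOSE) mytab USING INDEX mytab_pkey;
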
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
 	interfaces \
 	backend/replication/libpqwalreceiver \
 	backend/replication/pgoutput \
+	backend/replication/pgoutput_repack \
 	fe_utils \
 	bin \
 	pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4d382a04338..f019f93bca6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
-								  bool all_visible_cleared, bool new_all_visible_cleared);
+								  bool all_visible_cleared, bool new_all_visible_cleared,
+								  bool wal_logical);
 #ifdef USE_ASSERT_CHECKING
 static void check_lock_if_inplace_updateable_rel(Relation relation,
 												 const ItemPointerData *otid,
@@ -2803,7 +2804,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 TM_Result
 heap_delete(Relation relation, const ItemPointerData *tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -3050,7 +3051,8 @@ l1:
 	 * Compute replica identity tuple before entering the critical section so
 	 * we don't PANIC upon a memory allocation failure.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+	old_key_tuple = wal_logical ?
+		ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
 
 	/*
 	 * If this is the first possibly-multixact-able operation in the current
@@ -3140,6 +3142,15 @@ l1:
 				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
 		}
 
+		/*
+		 * Unlike UPDATE, DELETE is decoded even if there is no old key, so it
+		 * does not help to clear both XLH_DELETE_CONTAINS_OLD_TUPLE and
+		 * XLH_DELETE_CONTAINS_OLD_KEY. Thus we need an extra flag. TODO
+		 * Consider not decoding tuples w/o the old tuple/key instead.
+		 */
+		if (!wal_logical)
+			xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
 		XLogBeginInsert();
 		XLogRegisterData(&xlrec, SizeOfHeapDelete);
 
@@ -3232,7 +3243,8 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false /* changingPart */ );
+						 &tmfd, false,	/* changingPart */
+						 true /* wal_logical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3273,7 +3285,7 @@ TM_Result
 heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
-			TU_UpdateIndexes *update_indexes)
+			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -4166,7 +4178,8 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 wal_logical);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4524,7 +4537,8 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, &lockmode, update_indexes);
+						 &tmfd, &lockmode, update_indexes,
+						 true /* wal_logical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -8864,7 +8878,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool wal_logical)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -8875,7 +8890,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
 	Page		page = BufferGetPage(newbuf);
-	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln) &&
+		wal_logical;
 	bool		init;
 	int			bufflags;
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 79f9de5d760..01be29eb405 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+					   true);
 }
 
 
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
-						 tmfd, lockmode, update_indexes);
+						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	/*
@@ -685,13 +687,15 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
+								 Snapshot snapshot,
+								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
 								 double *tups_recently_dead)
 {
-	RewriteState rwstate;
+	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
 	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
+	bool		concurrent = snapshot != NULL;
+	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	values = (Datum *) palloc(natts * sizeof(Datum));
 	isnull = (bool *) palloc(natts * sizeof(bool));
 
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
-								 *multi_cutoff);
+	/*
+	 * Initialize the rewrite operation.
+	 */
+	if (!concurrent)
+		rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+									 *xid_cutoff, *multi_cutoff);
 
 
 	/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 *
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+	 * the snapshot passed by the caller.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									snapshot ? snapshot : SnapshotAny,
+									NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap,
+									snapshot ? snapshot : SnapshotAny,
+									0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -837,70 +853,77 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
 		buf = hslot->buffer;
 
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+		/*
+		 * Regarding CONCURRENTLY, see the comments on the MVCC snapshot above.
+		 */
+		if (!concurrent)
 		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				*tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
+			switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+			{
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead */
+					isdead = true;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+					*tups_recently_dead += 1;
+					/* fall through */
+				case HEAPTUPLE_LIVE:
+					/* Live or recently dead, must copy it */
+					isdead = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
 
-				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				*tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
+					/*
+					 * As long as we hold exclusive lock on the relation,
+					 * normally the only way to see this is if it was inserted
+					 * earlier in our own transaction.  However, it can happen
+					 * in system catalogs, since we tend to release write lock
+					 * before commit there. Give a warning if neither case
+					 * applies; but in any case we had better copy it.
+					 */
+					if (!is_system_catalog &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+						elog(WARNING, "concurrent insert in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as live */
+					isdead = false;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+					/*
+					 * Similar situation to INSERT_IN_PROGRESS case.
+					 */
+					if (!is_system_catalog &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+						elog(WARNING, "concurrent delete in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as recently dead */
+					*tups_recently_dead += 1;
+					isdead = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					isdead = false; /* keep compiler quiet */
+					break;
+			}
 
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		if (isdead)
-		{
-			*tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
+			if (isdead)
 			{
-				/* A previous recently-dead tuple is now known dead */
 				*tups_vacuumed += 1;
-				*tups_recently_dead -= 1;
+				/* heap rewrite module still needs to see it... */
+				if (rewrite_heap_dead_tuple(rwstate, tuple))
+				{
+					/* A previous recently-dead tuple is now known dead */
+					*tups_vacuumed += 1;
+					*tups_recently_dead -= 1;
+				}
+
+				continue;
 			}
-			continue;
 		}
 
 		*num_tuples += 1;
@@ -919,7 +942,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		{
 			const int	ct_index[] = {
 				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
-				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_INSERTED
 			};
 			int64		ct_val[2];
 
@@ -934,6 +957,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
+
+		/*
+		 * Process the WAL produced by the load, as well as by other
+		 * transactions, so that the replication slot can advance and WAL does
+		 * not pile up. Use wal_segment_size as a threshold so that we do not
+		 * introduce the decoding overhead too often.
+		 *
+		 * Of course, we must not apply the changes until the initial load has
+		 * completed.
+		 *
+		 * Note that our insertions into the new table should not be decoded
+		 * as we (intentionally) do not write the logical decoding specific
+		 * information to WAL.
+		 */
+		if (concurrent)
+		{
+			XLogRecPtr	end_of_wal;
+
+			end_of_wal = GetFlushRecPtr(NULL);
+			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+			{
+				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+				end_of_wal_prev = end_of_wal;
+			}
+		}
 	}
 
 	if (indexScan != NULL)
@@ -977,15 +1025,32 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
+
+			/*
+			 * Try to keep the amount of not-yet-decoded WAL small, like
+			 * above.
+			 */
+			if (concurrent)
+			{
+				XLogRecPtr	end_of_wal;
+
+				end_of_wal = GetFlushRecPtr(NULL);
+				if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+				{
+					repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+					end_of_wal_prev = end_of_wal;
+				}
+			}
 		}
 
 		tuplesort_end(tuplesort);
 	}
 
 	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	if (rwstate)
+		end_heap_rewrite(rwstate);
 
 	/* Clean up */
 	pfree(values);
@@ -2376,6 +2441,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
  * SET WITHOUT OIDS.
  *
  * So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate=NULL, use plain heap_insert() instead of rewriting - in that
+ * case we still need to deform/form the tuple. TODO Shouldn't we rename the
+ * function, as it might not do any rewrite?
  */
 static void
 reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2467,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
 
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	if (rwstate)
+		/* The heap rewrite module does the rest */
+		rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	else
+	{
+		/*
+		 * Insert tuple when processing REPACK CONCURRENTLY.
+		 *
+		 * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+		 * difficult to do the same in the catch-up phase (as the logical
+		 * decoding does not provide us with sufficient visibility
+		 * information). Thus we must use heap_insert() both during the
+		 * catch-up and here.
+		 *
+		 * The following is like simple_heap_insert() except that we pass the
+		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+		 * the relation files, it drops this relation, so no logical
+		 * replication subscription should need the data.
+		 */
+		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+					HEAP_INSERT_NO_LOGICAL, NULL);
+	}
 
 	heap_freetuple(copiedTuple);
 }
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 66ab48f0fe0..ee83a0fc91d 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		int			options = HEAP_INSERT_SKIP_FSM;
 
 		/*
-		 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
-		 * for the TOAST table are not logically decoded.  The main heap is
-		 * WAL-logged as XLOG FPI records, which are not logically decoded.
+		 * While rewriting the heap for REPACK, make sure data for the TOAST
+		 * table are not logically decoded.  The main heap is WAL-logged as
+		 * XLOG FPI records, which are not logically decoded.
 		 */
 		options |= HEAP_INSERT_NO_LOGICAL;
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 092e197eba3..edf467a9125 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -216,6 +216,7 @@ typedef struct TransactionStateData
 	bool		parallelChildXact;	/* is any parent transaction parallel? */
 	bool		chain;			/* start a new block after this one */
 	bool		topXidLogged;	/* for a subxact: is top-level XID logged? */
+	bool		internal;		/* for a subxact: launched internally? */
 	struct TransactionStateData *parent;	/* back link to parent */
 } TransactionStateData;
 
@@ -4741,6 +4742,7 @@ BeginInternalSubTransaction(const char *name)
 			/* Normal subtransaction start */
 			PushTransaction();
 			s = CurrentTransactionState;	/* changed by push */
+			s->internal = true;
 
 			/*
 			 * Savepoint names, like the TransactionState block itself, live
@@ -5257,7 +5259,13 @@ AbortSubTransaction(void)
 	LWLockReleaseAll();
 
 	pgstat_report_wait_end();
-	pgstat_progress_end_command();
+
+	/*
+	 * An internal subtransaction might be used by a user command, in which
+	 * case the command outlives the subtransaction.
+	 */
+	if (!s->internal)
+		pgstat_progress_end_command();
 
 	pgaio_error_cleanup();
 
@@ -5474,6 +5482,7 @@ PushTransaction(void)
 	s->parallelModeLevel = 0;
 	s->parallelChildXact = (p->parallelModeLevel != 0 || p->parallelChildXact);
 	s->topXidLogged = false;
+	s->internal = false;
 
 	CurrentTransactionState = s;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 024d219016d..941cd46ef67 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1287,16 +1287,19 @@ CREATE VIEW pg_stat_progress_repack AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      WHEN 5 THEN 'catch-up'
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
-        S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param5 AS heap_tuples_inserted,
+        S.param6 AS heap_tuples_updated,
+        S.param7 AS heap_tuples_deleted,
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
@@ -1314,7 +1317,7 @@ CREATE VIEW pg_stat_progress_cluster AS
         phase,
         repack_index_relid AS cluster_index_relid,
         heap_tuples_scanned,
-        heap_tuples_written,
+        heap_tuples_inserted + heap_tuples_updated AS heap_tuples_written,
         heap_blks_total,
         heap_blks_scanned,
         index_rebuild_count
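
For testing, a query like the following (run e.g. with \watch in psql while
REPACK CONCURRENTLY is in progress) exercises the new columns; it assumes
the view keeps the usual pid/relid columns, as in pg_stat_progress_cluster:

    SELECT pid, relid::regclass AS relation, phase,
           heap_tuples_scanned,
           heap_tuples_inserted,
           heap_tuples_updated,    -- advances only in the catch-up phase
           heap_tuples_deleted     -- likewise
    FROM pg_stat_progress_repack;
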
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index ba3c076ea7d..260d7cc07e0 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -26,6 +26,10 @@
 #include "access/toast_internals.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -33,6 +37,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
@@ -40,15 +45,21 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
+#include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -68,13 +79,68 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * The locators are used to avoid logical decoding of data that we do not need
+ * for our table.
+ */
+RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
+RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+/*
+ * Information needed to apply concurrent data changes.
+ */
+typedef struct ChangeDest
+{
+	/* The relation the changes are applied to. */
+	Relation	rel;
+
+	/*
+	 * If valid, set rel->rd_toastoid to this for the time the changes are
+	 * being applied.
+	 */
+	Oid			toastrelid;
+
+	/*
+	 * The following is needed to find the existing tuple if the change is
+	 * UPDATE or DELETE. 'ident_key' should have all the fields except for
+	 * 'sk_argument' initialized.
+	 */
+	Relation	ident_index;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+
+	/* Needed to update the indexes of 'rel'. */
+	IndexInsertState *iistate;
+} ChangeDest;
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
-								Oid indexOid, Oid userid, int options);
-static void rebuild_relation(RepackCommand cmd,
-							 Relation OldHeap, Relation index, bool verbose);
+								Oid indexOid, Oid userid, LOCKMODE lmode,
+								int options);
+static void check_repack_concurrently_requirements(Relation rel);
+static void rebuild_relation(RepackCommand cmd, Relation OldHeap, Relation index,
+							 bool verbose, bool concurrently);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							bool verbose, bool *pSwapToastByContent,
-							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							bool verbose,
+							bool *pSwapToastByContent,
+							TransactionId *pFreezeXid,
+							MultiXactId *pCutoffMulti);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -82,13 +148,54 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 											  MemoryContext permcxt);
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+									 ChangeDest *dest);
+static void apply_concurrent_insert(Relation rel, ConcurrentChange *change,
+									HeapTuple tup, IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+									HeapTuple tup_target,
+									ConcurrentChange *change,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+									ConcurrentChange *change);
+static HeapTuple find_target_tuple(Relation rel, ChangeDest *dest,
+								   HeapTuple tup_key,
+								   TupleTableSlot *ident_slot);
+static void process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
+									   XLogRecPtr end_of_wal,
+									   ChangeDest *dest);
+static IndexInsertState *get_index_insert_state(Relation relation,
+												Oid ident_index_id,
+												Relation *ident_index_p);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+								  int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+											   Relation cl_index,
+											   LogicalDecodingContext *decoding_ctx,
+											   bool swap_toast_by_content,
+											   TransactionId frozenXid,
+											   MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
+										LOCKMODE lockmode,
+										bool isTopLevel,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
 static const char *RepackCommandAsString(RepackCommand cmd);
 
 
+#define REPL_PLUGIN_NAME   "pgoutput_repack"
+
 /*
  * The repack code allows for processing multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -118,6 +225,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	ClusterParams params = {0};
 	Relation	rel = NULL;
 	MemoryContext repack_context;
+	LOCKMODE	lockmode;
 	List	   *rtcs;
 
 	/* Parse option list */
@@ -128,6 +236,16 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		else if (strcmp(opt->defname, "analyze") == 0 ||
 				 strcmp(opt->defname, "analyse") == 0)
 			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
+		else if (strcmp(opt->defname, "concurrently") == 0 &&
+				 defGetBoolean(opt))
+		{
+			if (stmt->command != REPACK_COMMAND_REPACK)
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("CONCURRENTLY option not supported for %s",
+							   RepackCommandAsString(stmt->command)));
+			params.options |= CLUOPT_CONCURRENT;
+		}
 		else
 			ereport(ERROR,
 					errcode(ERRCODE_SYNTAX_ERROR),
@@ -137,13 +255,25 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 					parser_errposition(pstate, opt->location));
 	}
 
+	/*
+	 * Determine the lock mode expected by cluster_rel().
+	 *
+	 * In the exclusive case, we obtain AccessExclusiveLock right away to
+	 * avoid a lock-upgrade hazard in the single-transaction case. In the
+	 * CONCURRENTLY case, the AccessExclusiveLock is only taken at the end of
+	 * processing, supposedly for a very short time. Until then, we'll have to
+	 * unlock the relation temporarily, so there's no lock-upgrade hazard.
+	 */
+	lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+		AccessExclusiveLock : ShareUpdateExclusiveLock;
+
 	/*
 	 * If a single relation is specified, process it and we're done ... unless
 	 * the relation is a partitioned table, in which case we fall through.
 	 */
 	if (stmt->relation != NULL)
 	{
-		rel = process_single_relation(stmt, &params);
+		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
 			return;				/* all done */
 	}
@@ -158,10 +288,29 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 				errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
 
 	/*
-	 * By here, we know we are in a multi-table situation.  In order to avoid
-	 * holding locks for too long, we want to process each table in its own
-	 * transaction.  This forces us to disallow running inside a user
-	 * transaction block.
+	 * By here, we know we are in a multi-table situation.
+	 *
+	 * Concurrent processing is currently considered rather special (e.g. in
+	 * terms of resources consumed) so it is not performed in bulk.
+	 */
+	if (params.options & CLUOPT_CONCURRENT)
+	{
+		if (rel != NULL)
+		{
+			Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+					errhint("Consider running the command for individual partitions."));
+		}
+		else
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY requires explicit table name"));
+	}
+
+	/*
+	 * In order to avoid holding locks for too long, we want to process each
+	 * table in its own transaction.  This forces us to disallow running
+	 * inside a user transaction block.
 	 */
 	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
@@ -244,7 +393,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		 * Open the target table, coping with the case where it has been
 		 * dropped.
 		 */
-		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		rel = try_table_open(rtc->tableOid, lockmode);
 		if (rel == NULL)
 		{
 			CommitTransactionCommand();
@@ -255,7 +404,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		/* Process this table */
-		cluster_rel(stmt->command, rel, rtc->indexOid, &params);
+		cluster_rel(stmt->command, rel, rtc->indexOid, &params, isTopLevel);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
@@ -284,22 +433,53 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
  * If indexOid is InvalidOid, the table will be rewritten in physical order
  * instead of index order.
  *
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
  * 'cmd' indicates which command is being executed, to be used for error
  * messages.
  */
 void
 cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-			ClusterParams *params)
+			ClusterParams *params, bool isTopLevel)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Relation	index;
+	LOCKMODE	lmode;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
-	Relation	index;
+	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
+
+	/*
+	 * The lock mode is AccessExclusiveLock for normal processing and
+	 * ShareUpdateExclusiveLock for concurrent processing (so that SELECT,
+	 * INSERT, UPDATE and DELETE commands work, but cluster_rel() cannot be
+	 * called concurrently for the same relation).
+	 */
+	lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+
+	/* There are specific requirements on concurrent processing. */
+	if (concurrent)
+	{
+		/*
+		 * Make sure we have no XID assigned, otherwise the call of
+		 * setup_logical_decoding() can cause a deadlock.
+		 *
+		 * The existence of a transaction block does not actually imply that
+		 * an XID was already assigned, but it very likely was. We might want
+		 * to check the result of GetCurrentTransactionIdIfAny() instead, but
+		 * that would be less clear from the user's perspective.
+		 */
+		PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+		check_repack_concurrently_requirements(OldHeap);
+	}
 
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
@@ -325,10 +505,13 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 * If this is a single-transaction CLUSTER, we can skip these tests. We
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
+	 *
+	 * XXX move [some of] these comments to where the RECHECK flag is
+	 * determined?
 	 */
 	if (recheck &&
 		!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
-							 params->options))
+							 lmode, params->options))
 		goto out;
 
 	/*
@@ -343,6 +526,12 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 				errmsg("cannot run %s on a shared catalog",
 					   RepackCommandAsString(cmd)));
 
+	/*
+	 * The CONCURRENTLY case should have been rejected earlier because it does
+	 * not support system catalogs.
+	 */
+	Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
@@ -363,7 +552,7 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	if (OidIsValid(indexOid))
 	{
 		/* verify the index is good and lock it */
-		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, lmode);
 		/* also open it */
 		index = index_open(indexOid, NoLock);
 	}
@@ -398,7 +587,9 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
 		!RelationIsPopulated(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		if (index)
+			index_close(index, lmode);
+		relation_close(OldHeap, lmode);
 		goto out;
 	}
 
@@ -411,11 +602,35 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 * invalid, because we move tuples around.  Promote them to relation
 	 * locks.  Predicate locks on indexes will be promoted when they are
 	 * reindexed.
+	 *
+	 * During concurrent processing, the heap as well as its indexes stay in
+	 * operation, so we postpone this step until they are locked using
+	 * AccessExclusiveLock near the end of the processing.
 	 */
-	TransferPredicateLocksToHeapRelation(OldHeap);
+	if (!concurrent)
+		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(cmd, OldHeap, index, verbose);
+	PG_TRY();
+	{
+		/*
+		 * For concurrent processing, make sure that our logical decoding
+		 * ignores data changes of other tables than the one we are
+		 * processing.
+		 */
+		if (concurrent)
+			begin_concurrent_repack(OldHeap);
+
+		rebuild_relation(cmd, OldHeap, index, /* save_userid, */
+						 verbose, concurrent);
+	}
+	PG_FINALLY();
+	{
+		if (concurrent)
+			end_concurrent_repack();
+	}
+	PG_END_TRY();
+
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -434,14 +649,14 @@ out:
  */
 static bool
 cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-					Oid userid, int options)
+					Oid userid, LOCKMODE lmode, int options)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 
 	/* Check that the user still has privileges for the relation */
 	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -455,7 +670,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -466,7 +681,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		 */
 		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 
@@ -477,7 +692,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
 			!get_index_isclustered(indexOid))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 	}
@@ -489,7 +704,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
  * Verify that the specified heap and index are valid to cluster on
  *
  * Side effect: obtains lock on the index.  The caller may
- * in some cases already have AccessExclusiveLock on the table, but
+ * in some cases already have a lock of the same strength on the table, but
  * not in all cases so we can't rely on the table-level lock for
  * protection here.
  */
@@ -618,19 +833,88 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	table_close(pg_index, RowExclusiveLock);
 }
 
+/*
+ * Check if the CONCURRENTLY option is legal for the relation.
+ */
+static void
+check_repack_concurrently_requirements(Relation rel)
+{
+	char		relpersistence,
+				replident;
+	Oid			ident_idx;
+
+	/* Data changes in system relations are not logically decoded. */
+	if (IsCatalogRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+	/*
+	 * reorderbuffer.c does not seem to handle processing of a TOAST relation
+	 * alone.
+	 */
+	if (IsToastRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+	relpersistence = rel->rd_rel->relpersistence;
+	if (relpersistence != RELPERSISTENCE_PERMANENT)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+	/* With NOTHING, WAL does not contain the old tuple. */
+	replident = rel->rd_rel->relreplident;
+	if (replident == REPLICA_IDENTITY_NOTHING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("Relation \"%s\" has insufficient replication identity.",
+						 RelationGetRelationName(rel))));
+
+	/*
+	 * The identity index is not set if the replica identity is FULL, but a
+	 * primary key might still exist in such a case.
+	 */
+	ident_idx = RelationGetReplicaIndex(rel);
+	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+		ident_idx = rel->rd_pkindex;
+	if (!OidIsValid(ident_idx))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot process relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 (errhint("Relation \"%s\" has no identity index.",
+						  RelationGetRelationName(rel)))));
+}
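
As a usage note on the checks above: when the identity check fires, the user
can typically make the table eligible by adding a primary key or pointing the
replica identity at an existing unique index (made-up names):

    ALTER TABLE t ADD PRIMARY KEY (id);
    -- or:
    ALTER TABLE t REPLICA IDENTITY USING INDEX t_id_key;
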
+
+
 /*
  * rebuild_relation: rebuild an existing relation in index or physical order
  *
- * OldHeap: table to rebuild.
+ * OldHeap: table to rebuild.  See cluster_rel() for comments on the required
+ * lock strength.
+ *
  * index: index to cluster by, or NULL to rewrite in physical order.
  *
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock.
+ * (The function handles the lock upgrade if 'concurrent' is true.)
  */
 static void
-rebuild_relation(RepackCommand cmd,
-				 Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd, Relation OldHeap, Relation index,
+				 bool verbose, bool concurrent)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -638,13 +922,38 @@ rebuild_relation(RepackCommand cmd,
 	Oid			OIDNewHeap;
 	Relation	NewHeap;
 	char		relpersistence;
-	bool		is_system_catalog;
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
+	LogicalDecodingContext *decoding_ctx = NULL;
+	Snapshot	snapshot = NULL;
+#if USE_ASSERT_CHECKING
+	LOCKMODE	lmode;
+
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+	Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+	Assert(index == NULL || CheckRelationLockedByMe(index, lmode, false));
+#endif
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
-		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+	if (concurrent)
+	{
+		/*
+		 * Prepare to capture the concurrent data changes.
+		 *
+		 * Note that this call waits for all transactions with an XID already
+		 * assigned to finish. If one of those transactions is waiting for a
+		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
+		 * it runs CREATE INDEX), we can end up in a deadlock. Not sure this
+		 * risk is worth unlocking/locking the table (and its clustering
+		 * index) and checking again whether it's still eligible for REPACK
+		 * CONCURRENTLY.
+		 */
+		decoding_ctx = setup_logical_decoding(tableOid);
+
+		snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
+		PushActiveSnapshot(snapshot);
+	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
 	if (index != NULL)
@@ -652,7 +961,6 @@ rebuild_relation(RepackCommand cmd,
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
 
 	/*
 	 * Create the transient table that will receive the re-ordered data.
@@ -668,30 +976,65 @@ rebuild_relation(RepackCommand cmd,
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, decoding_ctx, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
+	/* The historic snapshot won't be needed anymore. */
+	if (snapshot)
+		PopActiveSnapshot();
 
-	/* Close relcache entries, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
-	if (index)
-		index_close(index, NoLock);
+	if (concurrent)
+	{
+		/*
+		 * Push a snapshot that we will use to find old versions of rows when
+		 * processing concurrent UPDATE and DELETE commands. (That snapshot
+		 * should also be used by index expressions.)
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
 
-	/*
-	 * Close the new relation so it can be dropped as soon as the storage is
-	 * swapped. The relation is not visible to others, so no need to unlock it
-	 * explicitly.
-	 */
-	table_close(NewHeap, NoLock);
+		/*
+		 * Make sure we can find the tuples just inserted when applying DML
+		 * commands on top of those.
+		 */
+		CommandCounterIncrement();
+		UpdateActiveSnapshotCommandId();
 
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+										   decoding_ctx, swap_toast_by_content,
+										   frozenXid, cutoffMulti);
+		PopActiveSnapshot();
+
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
+
+		/* Done with decoding. */
+		cleanup_logical_decoding(decoding_ctx);
+	}
+	else
+	{
+		bool		is_system_catalog = IsSystemRelation(OldHeap);
+
+		/* Close relcache entries, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
+		if (index)
+			index_close(index, NoLock);
+
+		/*
+		 * Close the new relation so it can be dropped as soon as the storage
+		 * is swapped. The relation is not visible to others, so no need to
+		 * unlock it explicitly.
+		 */
+		table_close(NewHeap, NoLock);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+	}
 }
 
 
@@ -826,15 +1169,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Pass
+ * them iff concurrent processing is required.
+ *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
-				bool *pSwapToastByContent, TransactionId *pFreezeXid,
-				MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -852,6 +1199,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	PGRUsage	ru0;
 	char	   *nspname;
 
+	bool		concurrent = snapshot != NULL;
+
 	pg_rusage_init(&ru0);
 
 	/* Store a copy of the namespace name for logging purposes */
@@ -954,8 +1303,48 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * provided, else plain seqscan.
 	 */
 	if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
+	{
+		ResourceOwner oldowner = NULL;
+		ResourceOwner resowner = NULL;
+
+		/*
+		 * In the CONCURRENT case, use a dedicated resource owner so we don't
+		 * leave any additional locks behind us that we cannot release easily.
+		 */
+		if (concurrent)
+		{
+			Assert(CheckRelationLockedByMe(OldHeap, ShareUpdateExclusiveLock,
+										   false));
+			Assert(CheckRelationLockedByMe(OldIndex, ShareUpdateExclusiveLock,
+										   false));
+
+			resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "plan_cluster_use_sort");
+			oldowner = CurrentResourceOwner;
+			CurrentResourceOwner = resowner;
+		}
+
 		use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
 										 RelationGetRelid(OldIndex));
+
+		if (concurrent)
+		{
+			CurrentResourceOwner = oldowner;
+
+			/*
+			 * We are primarily concerned about locks, but if the planner
+			 * happened to allocate any other resources, we should release
+			 * them too because we're going to delete the whole resowner.
+			 */
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_AFTER_LOCKS,
+								 false, false);
+			ResourceOwnerDelete(resowner);
+		}
+	}
 	else
 		use_sort = false;
 
@@ -984,7 +1373,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+									cutoffs.OldestXmin, snapshot,
+									decoding_ctx,
+									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
 									&tups_recently_dead);
@@ -993,7 +1384,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	*pFreezeXid = cutoffs.FreezeLimit;
 	*pCutoffMulti = cutoffs.MultiXactCutoff;
 
-	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+	/*
+	 * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+	 * In the CONCURRENTLY case, we need to set it again before applying the
+	 * concurrent changes.
+	 */
 	NewHeap->rd_toastoid = InvalidOid;
 
 	num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1451,14 +1846,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool swap_toast_by_content,
 				 bool check_constraints,
 				 bool is_internal,
+				 bool reindex,
 				 TransactionId frozenXid,
 				 MultiXactId cutoffMulti,
 				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
-	int			reindex_flags;
-	ReindexParams reindex_params = {0};
 	int			i;
 
 	/* Report that we are now swapping relation files */
@@ -1484,39 +1878,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (is_system_catalog)
 		CacheInvalidateCatalog(OIDOldHeap);
 
-	/*
-	 * Rebuild each index on the relation (but not the toast table, which is
-	 * all-new at this point).  It is important to do this before the DROP
-	 * step because if we are processing a system catalog that will be used
-	 * during DROP, we want to have its indexes available.  There is no
-	 * advantage to the other order anyway because this is all transactional,
-	 * so no chance to reclaim disk space before commit.  We do not need a
-	 * final CommandCounterIncrement() because reindex_relation does it.
-	 *
-	 * Note: because index_build is called via reindex_relation, it will never
-	 * set indcheckxmin true for the indexes.  This is OK even though in some
-	 * sense we are building new indexes rather than rebuilding existing ones,
-	 * because the new heap won't contain any HOT chains at all, let alone
-	 * broken ones, so it can't be necessary to set indcheckxmin.
-	 */
-	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
-	if (check_constraints)
-		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+	if (reindex)
+	{
+		int			reindex_flags;
+		ReindexParams reindex_params = {0};
 
-	/*
-	 * Ensure that the indexes have the same persistence as the parent
-	 * relation.
-	 */
-	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
-	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+		/*
+		 * Rebuild each index on the relation (but not the toast table, which
+		 * is all-new at this point).  It is important to do this before the
+		 * DROP step because if we are processing a system catalog that will
+		 * be used during DROP, we want to have its indexes available.  There
+		 * is no advantage to the other order anyway because this is all
+		 * transactional, so no chance to reclaim disk space before commit. We
+		 * do not need a final CommandCounterIncrement() because
+		 * reindex_relation does it.
+		 *
+		 * Note: because index_build is called via reindex_relation, it will
+		 * never set indcheckxmin true for the indexes.  This is OK even
+		 * though in some sense we are building new indexes rather than
+		 * rebuilding existing ones, because the new heap won't contain any
+		 * HOT chains at all, let alone broken ones, so it can't be necessary
+		 * to set indcheckxmin.
+		 */
+		reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+		if (check_constraints)
+			reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
 
-	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
-								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+		/*
+		 * Ensure that the indexes have the same persistence as the parent
+		 * relation.
+		 */
+		if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+		else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
-	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+		/* Report that we are now reindexing relations */
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+		reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+	}
 
 	/* Report that we are now doing clean up */
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1859,7 +2261,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * case, if an index name is given, it's up to the caller to resolve it.
  */
 static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+						ClusterParams *params)
 {
 	Relation	rel;
 	Oid			tableOid;
@@ -1868,13 +2271,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
-	/*
-	 * Find, lock, and check permissions on the table.  We obtain
-	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-	 * single-transaction case.
-	 */
+	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
-										AccessExclusiveLock,
+										lockmode,
 										0,
 										RangeVarCallbackMaintainsTable,
 										NULL);
@@ -1906,13 +2305,14 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 		return rel;
 	else
 	{
-		Oid			indexOid;
+		Oid			indexOid = InvalidOid;
 
 		indexOid = determine_clustered_index(rel, stmt->usingindex,
 											 stmt->indexname);
 		if (OidIsValid(indexOid))
-			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
-		cluster_rel(stmt->command, rel, indexOid, params);
+			check_index_is_clusterable(rel, indexOid, lockmode);
+
+		cluster_rel(stmt->command, rel, indexOid, params, isTopLevel);
 
 		/* Do an analyze, if requested */
 		if (params->options & CLUOPT_ANALYZE)
@@ -1995,3 +2395,1058 @@ RepackCommandAsString(RepackCommand cmd)
 	}
 	return "???";
 }
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts to setup logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that the TOAST table needs no attention here, as it's not scanned
+ * using a historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+	Oid			toastrelid;
+
+	/*
+	 * Avoid logical decoding of other relations by this backend. The lock we
+	 * have guarantees that the actual locator cannot be changed concurrently:
+	 * TRUNCATE needs AccessExclusiveLock.
+	 */
+	Assert(CheckRelationLockedByMe(rel, ShareUpdateExclusiveLock, false));
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
+
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+}
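
To make the pairing explicit - the actual caller is not part of this
excerpt, so this is only a sketch of the intended usage:

    begin_concurrent_repack(OldHeap);   /* publish locators for decode.c */
    PG_TRY();
    {
        /* copy data, build indexes, decode and apply changes, swap files */
    }
    PG_FINALLY();
    {
        end_concurrent_repack();        /* reset the locators */
    }
    PG_END_TRY();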
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+	/*
+	 * Restore normal function of (future) logical decoding for this backend.
+	 */
+	repacked_rel_locator.relNumber = InvalidOid;
+	repacked_rel_toast_locator.relNumber = InvalidOid;
+}
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends make while we copy the
+ * existing data into the new table), nor persisted (it's easier to handle a
+ * crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid)
+{
+	Relation	rel;
+	TupleDesc	tupdesc;
+	LogicalDecodingContext *ctx;
+	RepackDecodingState *dstate = palloc0_object(RepackDecodingState);
+
+	/*
+	 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+	 * should never fire.
+	 */
+	Assert(!TransactionIdIsValid(GetTopTransactionIdIfAny()));
+
+	/*
+	 * A single backend should not execute multiple REPACK commands at a time,
+	 * so use PID to make the slot unique.
+	 */
+	snprintf(NameStr(dstate->slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+	/*
+	 * Check if we can use logical decoding.
+	 */
+	CheckSlotPermissions();
+	CheckLogicalDecodingRequirements();
+
+	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+	ReplicationSlotCreate(NameStr(dstate->slotname), true, RS_TEMPORARY,
+						  false, false, false);
+
+	/*
+	 * Neither prepare_write nor do_write callback nor update_progress is
+	 * useful for us.
+	 */
+	ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+									NIL,
+									true,
+									InvalidXLogRecPtr,
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+	/*
+	 * We don't have control over setting fast_forward, so at least check it.
+	 */
+	Assert(!ctx->fast_forward);
+
+	DecodingContextFindStartpoint(ctx);
+
+	/* Some WAL records should have been read. */
+	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
+	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+				wal_segment_size);
+
+	/*
+	 * Setup structures to store decoded changes.
+	 */
+	dstate->relid = relid;
+	dstate->tstore = tuplestore_begin_heap(false, false,
+										   maintenance_work_mem);
+
+	/* Caller should already have the table locked. */
+	rel = table_open(relid, NoLock);
+	tupdesc = CreateTupleDescCopy(RelationGetDescr(rel));
+	dstate->tupdesc = tupdesc;
+	table_close(rel, NoLock);
+
+	/* Initialize the descriptor to store the changes ... */
+	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+	/* ... as well as the corresponding slot. */
+	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+											  &TTSOpsMinimalTuple);
+
+	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "logical decoding");
+
+	ctx->output_writer_private = dstate;
+	return ctx;
+}
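
Putting the functions of this file together, the decoding lifecycle is
roughly the following (a simplified outline; the real control flow,
including error handling, lives in the rebuild code further down):

    LogicalDecodingContext *ctx = setup_logical_decoding(relid);

    /* ... perform the initial load under the snapshot built for the slot ... */

    /* catch up, possibly repeatedly, with changes made in the meantime */
    repack_decode_concurrent_changes(ctx, GetFlushRecPtr(NULL));

    /* release and drop the RS_TEMPORARY slot, free the decoding state */
    cleanup_logical_decoding(ctx);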
+
+/*
+ * Retrieve tuple from ConcurrentChange structure.
+ *
+ * The input data starts with the structure but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+	HeapTupleData tup_data;
+	HeapTuple	result;
+	char	   *src;
+
+	/*
+	 * Ensure alignment before accessing the fields. (This is why we can't use
+	 * heap_copytuple() instead of this function.)
+	 */
+	src = change + offsetof(ConcurrentChange, tup_data);
+	memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+	memcpy(result, &tup_data, sizeof(HeapTupleData));
+	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+	src = change + SizeOfConcurrentChange;
+	memcpy(result->t_data, src, result->t_len);
+
+	return result;
+}
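
The byte layout that get_changed_tuple() unpacks is the one produced by
store_change() in pgoutput_repack.c, further down in this patch. As a
comment-style picture:

    /*
     * bytea value stored in the tuplestore:
     *
     *   [varlena header][ConcurrentChange: kind, tup_data][tuple body]
     *                    ^ not necessarily MAXALIGNed, hence the memcpy()s
     *
     * tup_data.t_data is a dangling pointer after the copy; it must be
     * re-pointed at the tuple body on retrieval.
     */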
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+								 XLogRecPtr end_of_wal)
+{
+	RepackDecodingState *dstate;
+	ResourceOwner resowner_old;
+
+	/*
+	 * Invalidate the "present" cache before moving to "(recent) history".
+	 */
+	InvalidateSystemCaches();
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+	resowner_old = CurrentResourceOwner;
+	CurrentResourceOwner = dstate->resowner;
+
+	PG_TRY();
+	{
+		while (ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			XLogSegNo	segno_new;
+			char	   *errm = NULL;
+			XLogRecPtr	end_lsn;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			/*
+			 * If a WAL segment boundary has been crossed, inform the decoding
+			 * system that the catalog_xmin can advance. (We could confirm
+			 * more often, but filling a single WAL segment should not take
+			 * much time.)
+			 */
+			end_lsn = ctx->reader->EndRecPtr;
+			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+			if (segno_new != repack_current_segment)
+			{
+				LogicalConfirmReceivedLocation(end_lsn);
+				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+					 LSN_FORMAT_ARGS(end_lsn));
+				repack_current_segment = segno_new;
+			}
+
+			CHECK_FOR_INTERRUPTS();
+		}
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
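
To make the segment-boundary logic concrete: XLByteToSeg() is a plain
division by the segment size, so with the default 16 MB segments a worked
example (values chosen purely for illustration) looks like this:

    XLogSegNo   segno;

    /* LSN 0/05A00028 with 16 MB segments: 0x05A00028 / 0x01000000 */
    XLByteToSeg((XLogRecPtr) 0x05A00028, segno, 16 * 1024 * 1024);
    Assert(segno == 0x5A);

Each time the reader crosses into a new segment, a single
LogicalConfirmReceivedLocation() call lets the slot's catalog_xmin advance,
which bounds catalog bloat during a long-running repack.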
+
+/*
+ * Apply changes that happened during the initial load.
+ *
+ * Scan key is passed by caller, so it does not have to be constructed
+ * multiple times. Key entries have all fields initialized, except for
+ * sk_argument.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
+{
+	Relation	rel = dest->rel;
+	TupleTableSlot *index_slot,
+			   *ident_slot;
+	HeapTuple	tup_old = NULL;
+
+	if (dstate->nchanges == 0)
+		return;
+
+	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+	/* A slot to fetch tuples from identity index. */
+	ident_slot = table_slot_create(rel, NULL);
+
+	while (tuplestore_gettupleslot(dstate->tstore, true, false,
+								   dstate->tsslot))
+	{
+		bool		shouldFree;
+		HeapTuple	tup_change,
+					tup,
+					tup_exist;
+		char	   *change_raw,
+				   *src;
+		ConcurrentChange change;
+		bool		isnull[1];
+		Datum		values[1];
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get the change from the single-column tuple. */
+		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+		Assert(!isnull[0]);
+
+		/* Make sure we access aligned data. */
+		change_raw = (char *) DatumGetByteaP(values[0]);
+		src = (char *) VARDATA(change_raw);
+		memcpy(&change, src, SizeOfConcurrentChange);
+
+		/*
+		 * Extract the tuple from the change. The tuple is copied here because
+		 * it might be assigned to 'tup_old', in which case it needs to
+		 * survive into the next iteration.
+		 */
+		tup = get_changed_tuple(src);
+
+		if (change.kind == CHANGE_UPDATE_OLD)
+		{
+			Assert(tup_old == NULL);
+			tup_old = tup;
+		}
+		else if (change.kind == CHANGE_INSERT)
+		{
+			Assert(tup_old == NULL);
+
+			apply_concurrent_insert(rel, &change, tup, dest->iistate,
+									index_slot);
+
+			pfree(tup);
+		}
+		else if (change.kind == CHANGE_UPDATE_NEW ||
+				 change.kind == CHANGE_DELETE)
+		{
+			HeapTuple	tup_key;
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+			{
+				tup_key = tup_old != NULL ? tup_old : tup;
+			}
+			else
+			{
+				Assert(tup_old == NULL);
+				tup_key = tup;
+			}
+
+			/*
+			 * Find the tuple to be updated or deleted.
+			 */
+			tup_exist = find_target_tuple(rel, dest, tup_key, ident_slot);
+			if (tup_exist == NULL)
+				elog(ERROR, "could not find target tuple");
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+				apply_concurrent_update(rel, tup, tup_exist, &change,
+										dest->iistate, index_slot);
+			else
+				apply_concurrent_delete(rel, tup_exist, &change);
+
+			if (tup_old != NULL)
+			{
+				pfree(tup_old);
+				tup_old = NULL;
+			}
+
+			pfree(tup);
+		}
+		else
+			elog(ERROR, "unrecognized change kind: %d", change.kind);
+
+		/*
+		 * If a change was applied now, increment CID for next writes and
+		 * update the snapshot so it sees the changes we've applied so far.
+		 */
+		if (change.kind != CHANGE_UPDATE_OLD)
+		{
+			CommandCounterIncrement();
+			UpdateActiveSnapshotCommandId();
+		}
+
+		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+		Assert(shouldFree);
+		pfree(tup_change);
+	}
+
+	tuplestore_clear(dstate->tstore);
+	dstate->nchanges = 0;
+
+	/* Cleanup. */
+	ExecDropSingleTupleTableSlot(index_slot);
+	ExecDropSingleTupleTableSlot(ident_slot);
+}
+
+static void
+apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
+						IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+	List	   *recheck;
+
+	/*
+	 * Like simple_heap_insert(), but make sure that the INSERT is not
+	 * logically decoded - see reform_and_rewrite_tuple() for more
+	 * information.
+	 */
+	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+				NULL);
+
+	/*
+	 * Update indexes.
+	 *
+	 * Functions evaluated by the indexes may need the active snapshot, so
+	 * the caller is expected to have set one.
+	 */
+	ExecStoreHeapTuple(tup, index_slot, false);
+	recheck = ExecInsertIndexTuples(iistate->rri,
+									index_slot,
+									iistate->estate,
+									false,	/* update */
+									false,	/* noDupErr */
+									NULL,	/* specConflict */
+									NIL,	/* arbiterIndexes */
+									false	/* onlySummarizing */
+		);
+
+	/*
+	 * If recheck is required, it must have been performed on the source
+	 * relation by now. (All the logical changes we process here are already
+	 * committed.)
+	 */
+	list_free(recheck);
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						ConcurrentChange *change, IndexInsertState *iistate,
+						TupleTableSlot *index_slot)
+{
+	LockTupleMode lockmode;
+	TM_FailureData tmfd;
+	TU_UpdateIndexes update_indexes;
+	TM_Result	res;
+	List	   *recheck;
+
+	/*
+	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
+	 * here.)
+	 *
+	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_update(rel, &tup_target->t_self, tup,
+					  GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd, &lockmode, &update_indexes,
+					  false /* wal_logical */ );
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+	ExecStoreHeapTuple(tup, index_slot, false);
+
+	if (update_indexes != TU_None)
+	{
+		recheck = ExecInsertIndexTuples(iistate->rri,
+										index_slot,
+										iistate->estate,
+										true,	/* update */
+										false,	/* noDupErr */
+										NULL,	/* specConflict */
+										NIL,	/* arbiterIndexes */
+		/* onlySummarizing */
+										update_indexes == TU_Summarizing);
+		list_free(recheck);
+	}
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+						ConcurrentChange *change)
+{
+	TM_Result	res;
+	TM_FailureData tmfd;
+
+	/*
+	 * Delete tuple from the new heap.
+	 *
+	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd,
+					  false,	/* changingPart */
+					  false /* wal_logical */ );
+
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * The tuple returned, if any, is owned by 'ident_slot' and remains valid
+ * only as long as the slot's contents do. NULL is returned if the tuple
+ * cannot be found.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
+				  TupleTableSlot *ident_slot)
+{
+	Relation	ident_index = dest->ident_index;
+	IndexScanDesc scan;
+	Form_pg_index ident_form;
+	int2vector *ident_indkey;
+	HeapTuple	result = NULL;
+
+	/* XXX no instrumentation for now */
+	scan = index_beginscan(rel, ident_index, GetActiveSnapshot(),
+						   NULL, dest->ident_key_nentries, 0);
+	index_rescan(scan, dest->ident_key, dest->ident_key_nentries, NULL, 0);
+
+	/* Info needed to retrieve key values from heap tuple. */
+	ident_form = ident_index->rd_index;
+	ident_indkey = &ident_form->indkey;
+
+	/* Use the incoming tuple to finalize the scan key. */
+	for (int i = 0; i < scan->numberOfKeys; i++)
+	{
+		ScanKey		entry;
+		bool		isnull;
+		int16		attno_heap;
+
+		entry = &scan->keyData[i];
+		attno_heap = ident_indkey->values[i];
+		entry->sk_argument = heap_getattr(tup_key,
+										  attno_heap,
+										  rel->rd_att,
+										  &isnull);
+		Assert(!isnull);
+	}
+	if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+	{
+		bool		shouldFree;
+
+		result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+		/* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+		Assert(!shouldFree);
+	}
+	index_endscan(scan);
+
+	return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
+						   XLogRecPtr end_of_wal, ChangeDest *dest)
+{
+	RepackDecodingState *dstate;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_CATCH_UP);
+
+	dstate = (RepackDecodingState *) decoding_ctx->output_writer_private;
+
+	repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+
+	if (dstate->nchanges == 0)
+		return;
+
+	PG_TRY();
+	{
+		/*
+		 * Make sure that TOAST values can eventually be accessed via the old
+		 * relation - see comment in copy_table_data().
+		 */
+		if (OidIsValid(dest->toastrelid))
+			dest->rel->rd_toastoid = dest->toastrelid;
+
+		apply_concurrent_changes(dstate, dest);
+	}
+
+	/*
+	 * TODO Consider whether the setting needs to be reverted at all: the new
+	 * relation will eventually be dropped, without other transactions ever
+	 * being able to access it.
+	 */
+	PG_FINALLY();
+	{
+		if (OidIsValid(dest->toastrelid))
+			dest->rel->rd_toastoid = InvalidOid;
+	}
+	PG_END_TRY();
+}
+
+/*
+ * Initialize IndexInsertState for the indexes of 'relation'.
+ *
+ * While doing that, look up the identity index specified by ident_index_id
+ * and return it in *ident_index_p.
+ */
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id,
+					   Relation *ident_index_p)
+{
+	EState	   *estate;
+	int			i;
+	IndexInsertState *result;
+	Relation	ident_index = NULL;
+
+	result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+	estate = CreateExecutorState();
+
+	result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+	InitResultRelInfo(result->rri, relation, 0, 0, 0);
+	ExecOpenIndices(result->rri, false);
+
+	/*
+	 * Find the relcache entry of the identity index so that we spend no extra
+	 * effort to open / close it.
+	 */
+	for (i = 0; i < result->rri->ri_NumIndices; i++)
+	{
+		Relation	ind_rel;
+
+		ind_rel = result->rri->ri_IndexRelationDescs[i];
+		if (ind_rel->rd_id == ident_index_id)
+			ident_index = ind_rel;
+	}
+	if (ident_index == NULL)
+		elog(ERROR, "could not find identity index");
+
+	/* Only initialize fields needed by ExecInsertIndexTuples(). */
+	result->estate = estate;
+
+	*ident_index_p = ident_index;
+	return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+	Relation	ident_idx_rel;
+	Form_pg_index ident_idx;
+	int			n,
+				i;
+	ScanKey		result;
+
+	Assert(OidIsValid(ident_idx_oid));
+	ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+	ident_idx = ident_idx_rel->rd_index;
+	n = ident_idx->indnatts;
+	result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+	for (i = 0; i < n; i++)
+	{
+		ScanKey		entry;
+		int16		relattno;
+		Form_pg_attribute att;
+		Oid			opfamily,
+					opcintype,
+					opno,
+					opcode;
+
+		entry = &result[i];
+		relattno = ident_idx->indkey.values[i];
+		if (relattno >= 1)
+		{
+			TupleDesc	desc;
+
+			desc = rel_src->rd_att;
+			att = TupleDescAttr(desc, relattno - 1);
+		}
+		else
+			elog(ERROR, "unexpected attribute number %d in index", relattno);
+
+		opfamily = ident_idx_rel->rd_opfamily[i];
+		opcintype = ident_idx_rel->rd_opcintype[i];
+		opno = get_opfamily_member(opfamily, opcintype, opcintype,
+								   BTEqualStrategyNumber);
+
+		if (!OidIsValid(opno))
+			elog(ERROR, "could not find equality operator for type %u",
+				 opcintype);
+
+		opcode = get_opcode(opno);
+		if (!OidIsValid(opcode))
+			elog(ERROR, "could not find function for operator %u", opno);
+
+		/* Initialize everything but argument. */
+		ScanKeyInit(entry,
+					i + 1,
+					BTEqualStrategyNumber, opcode,
+					(Datum) 0);
+		entry->sk_collation = att->attcollation;
+	}
+	index_close(ident_idx_rel, AccessShareLock);
+
+	*nentries = n;
+	return result;
+}
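
The operator lookup chain in build_identity_key() is the usual one for
btree equality. For an int8 identity column, for instance, it proceeds as
follows (a sketch; opfamily and opcintype come from the index's relcache
entry, as above):

    Oid          opno;
    RegProcedure opcode;

    opno = get_opfamily_member(opfamily, INT8OID, INT8OID,
                               BTEqualStrategyNumber);
    opcode = get_opcode(opno);      /* the "=" support function to call */

Only sk_argument is left unset; find_target_tuple() fills it in from each
decoded tuple just before rescanning the identity index.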
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+	ExecCloseIndices(iistate->rri);
+	FreeExecutorState(iistate->estate);
+	pfree(iistate->rri);
+	pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	ExecDropSingleTupleTableSlot(dstate->tsslot);
+	FreeTupleDesc(dstate->tupdesc_change);
+	FreeTupleDesc(dstate->tupdesc);
+	tuplestore_end(dstate->tstore);
+
+	FreeDecodingContext(ctx);
+
+	ReplicationSlotRelease();
+	ReplicationSlotDrop(NameStr(dstate->slotname), false);
+	pfree(dstate);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+								   Relation cl_index,
+								   LogicalDecodingContext *decoding_ctx,
+								   bool swap_toast_by_content,
+								   TransactionId frozenXid,
+								   MultiXactId cutoffMulti)
+{
+	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+	List	   *ind_oids_new;
+	Oid			old_table_oid = RelationGetRelid(OldHeap);
+	Oid			new_table_oid = RelationGetRelid(NewHeap);
+	List	   *ind_oids_old = RelationGetIndexList(OldHeap);
+	ListCell   *lc,
+			   *lc2;
+	char		relpersistence;
+	bool		is_system_catalog;
+	Oid			ident_idx_old,
+				ident_idx_new;
+	XLogRecPtr	wal_insert_ptr,
+				end_of_wal;
+	char		dummy_rec_data = '\0';
+	Relation   *ind_refs,
+			   *ind_refs_p;
+	int			nind;
+	ChangeDest	chgdst;
+
+	/* Like in cluster_rel(). */
+	lockmode_old = ShareUpdateExclusiveLock;
+	Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+	Assert(cl_index == NULL ||
+		   CheckRelationLockedByMe(cl_index, lockmode_old, false));
+	/* This is expected from the caller. */
+	Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+	ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+	/*
+	 * Unlike the exclusive case, we build new indexes for the new relation
+	 * rather than swapping the storage and reindexing the old relation. The
+	 * point is that the index build can take some time, so we do it before
+	 * acquiring AccessExclusiveLock on the old heap; consequently, we cannot
+	 * swap the heap storage yet at that point.
+	 *
+	 * index_create() will lock the new indexes using AccessExclusiveLock - no
+	 * need to change that.
+	 *
+	 * We assume that ShareUpdateExclusiveLock on the table prevents anyone
+	 * from dropping the existing indexes or adding new ones, so the lists of
+	 * old and new indexes should match at the swap time. On the other hand we
+	 * do not block ALTER INDEX commands that do not require table lock (e.g.
+	 * ALTER INDEX ... SET ...).
+	 *
+	 * XXX Should we check at the end of our work whether another transaction
+	 * executed such a command, and issue a NOTICE that we might have
+	 * discarded its effects? (For example, if someone changes a storage
+	 * parameter after we have created the new index, the new value of that
+	 * parameter is lost.) Alternatively, we could lock all the indexes now in
+	 * a mode that blocks all ALTER INDEX commands (ShareUpdateExclusiveLock?)
+	 * and keep them locked till the end of the transaction. That might
+	 * increase the risk of deadlock during the lock upgrade below; however,
+	 * SELECT / DML queries should not be involved in such a deadlock.
+	 */
+	ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+	/*
+	 * Processing shouldn't start without a valid identity index.
+	 */
+	Assert(OidIsValid(ident_idx_old));
+
+	/* Find "identity index" on the new relation. */
+	ident_idx_new = InvalidOid;
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+
+		if (ident_idx_old == ind_old)
+		{
+			ident_idx_new = ind_new;
+			break;
+		}
+	}
+	if (!OidIsValid(ident_idx_new))
+
+		/*
+		 * Should not happen, given our lock on the old relation.
+		 */
+		ereport(ERROR,
+				(errmsg("identity index missing on the new relation")));
+
+	/* Gather information to apply concurrent changes. */
+	chgdst.rel = NewHeap;
+	chgdst.toastrelid = swap_toast_by_content ?
+		OldHeap->rd_rel->reltoastrelid : InvalidOid;
+	chgdst.iistate = get_index_insert_state(NewHeap, ident_idx_new,
+											&chgdst.ident_index);
+	chgdst.ident_key = build_identity_key(ident_idx_new, OldHeap,
+										  &chgdst.ident_key_nentries);
+
+	/*
+	 * During testing, wait for another backend to perform concurrent data
+	 * changes which we will process below.
+	 */
+	INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+	/*
+	 * Flush all WAL records inserted so far (possibly except for the last
+	 * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+	 * we need to flush while holding exclusive lock on the source table.
+	 */
+	wal_insert_ptr = GetInsertRecPtr();
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply the concurrent changes a first time, to minimize the time we
+	 * need to hold AccessExclusiveLock. (A significant amount of WAL may
+	 * have been written during the data copying and index creation.)
+	 */
+	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+
+	/*
+	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+	 * is one) and all its indexes, so that we can swap the files.
+	 *
+	 * Before that, temporarily unlock the clustering index, to avoid a
+	 * deadlock in case another transaction is trying to lock it while
+	 * holding the lock on the table.
+	 */
+	if (cl_index)
+	{
+		index_close(cl_index, ShareUpdateExclusiveLock);
+		cl_index = NULL;
+	}
+	/* Lock the TOAST relation before the table, for the same reason. */
+	if (OldHeap->rd_rel->reltoastrelid)
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+	/* Finally lock the table */
+	LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+	/*
+	 * Lock all indexes now, not only the clustering one: all indexes need to
+	 * have their files swapped. While doing that, store their relation
+	 * references in an array, to handle predicate locks below.
+	 */
+	ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+	nind = 0;
+	foreach(lc, ind_oids_old)
+	{
+		Oid			ind_oid;
+		Relation	index;
+
+		ind_oid = lfirst_oid(lc);
+		index = index_open(ind_oid, AccessExclusiveLock);
+
+		/*
+		 * TODO 1) Do we need to check if ALTER INDEX was executed since the
+		 * new index was created in build_new_indexes()? 2) Specifically for
+		 * the clustering index, should check_index_is_clusterable() be called
+		 * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+		 * table probably blocks all commands that affect the result of
+		 * check_index_is_clusterable().)
+		 */
+		*ind_refs_p = index;
+		ind_refs_p++;
+		nind++;
+	}
+
+	/*
+	 * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+	 * lock is needed to swap the files.
+	 */
+	if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+	/*
+	 * Tuples and pages of the old heap will be gone, but the heap will stay.
+	 */
+	TransferPredicateLocksToHeapRelation(OldHeap);
+	/* The same for indexes. */
+	for (int i = 0; i < nind; i++)
+	{
+		Relation	index = ind_refs[i];
+
+		TransferPredicateLocksToHeapRelation(index);
+
+		/*
+		 * References to indexes on the old relation are not needed anymore,
+		 * however locks stay till the end of the transaction.
+		 */
+		index_close(index, NoLock);
+	}
+	pfree(ind_refs);
+
+	/*
+	 * Flush all WAL inserted so far, to make sure that all changes committed
+	 * while we were waiting for the exclusive lock are available for
+	 * decoding. This should not be necessary if all backends had
+	 * synchronous_commit set, but we can't rely on this setting.
+	 *
+	 * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+	 * position, and GetLastImportantRecPtr() points at the start of the last
+	 * record rather than at the end. Thus the simplest way to determine the
+	 * insert position is to insert a dummy record and use its LSN.
+	 *
+	 * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+	 * last record (plus the total size of all the page headers the record
+	 * spans)?
+	 */
+	XLogBeginInsert();
+	XLogRegisterData(&dummy_rec_data, 1);
+	wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/* Apply the concurrent changes again. */
+	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+
+	/* Remember info about rel before closing OldHeap */
+	relpersistence = OldHeap->rd_rel->relpersistence;
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+	/*
+	 * Even ShareUpdateExclusiveLock should have prevented others from
+	 * creating / dropping indexes (even using the CONCURRENTLY option), so we
+	 * do not need to check whether the lists match.
+	 */
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+		Oid			mapped_tables[4];
+
+		/* Zero out possible results from swap_relation_files */
+		memset(mapped_tables, 0, sizeof(mapped_tables));
+
+		swap_relation_files(ind_old, ind_new,
+							(old_table_oid == RelationRelationId),
+							swap_toast_by_content,
+							true,
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Concurrent processing is not supported for system relations, so
+		 * there should be no mapped tables.
+		 */
+		for (int i = 0; i < 4; i++)
+			Assert(mapped_tables[i] == 0);
+#endif
+	}
+
+	/* The new indexes must be visible for deletion. */
+	CommandCounterIncrement();
+
+	/* Close the old heap but keep lock until transaction commit. */
+	table_close(OldHeap, NoLock);
+	/* Close the new heap. (We didn't have to open its indexes). */
+	table_close(NewHeap, NoLock);
+
+	/* Cleanup what we don't need anymore. (And close the identity index.) */
+	pfree(chgdst.ident_key);
+	free_index_insert_state(chgdst.iistate);
+
+	/*
+	 * Swap the relations and their TOAST relations and TOAST indexes. This
+	 * also drops the new relation and its indexes.
+	 *
+	 * (System catalogs are currently not supported.)
+	 */
+	Assert(!is_system_catalog);
+	finish_heap_swap(old_table_oid, new_table_oid,
+					 is_system_catalog,
+					 swap_toast_by_content,
+					 false, true, false,
+					 frozenXid, cutoffMulti,
+					 relpersistence);
+}
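
The dummy-record idiom used before the final catch-up above is subtle
enough to be worth restating in isolation: neither GetInsertRecPtr() nor
GetLastImportantRecPtr() is guaranteed to return an LSN covering everything
already written, so the function buys a safe one with a no-op record:

    char        dummy = '\0';
    XLogRecPtr  lsn;

    XLogBeginInsert();
    XLogRegisterData(&dummy, 1);
    lsn = XLogInsert(RM_XLOG_ID, XLOG_NOOP);    /* LSN past all prior WAL */
    XLogFlush(lsn);
    /* GetFlushRecPtr(NULL) now covers every change committed before this */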
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. Its order matches that of OldIndexes, so the two lists can be
+ * used to swap the index storage.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+	ListCell   *lc;
+	List	   *result = NIL;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+	foreach(lc, OldIndexes)
+	{
+		Oid			ind_oid,
+					ind_oid_new;
+		char	   *newName;
+		Relation	ind;
+
+		ind_oid = lfirst_oid(lc);
+		ind = index_open(ind_oid, AccessShareLock);
+
+		newName = ChooseRelationName(get_rel_name(ind_oid),
+									 NULL,
+									 "repacknew",
+									 get_rel_namespace(ind->rd_index->indrelid),
+									 false);
+		ind_oid_new = index_create_copy(NewHeap, ind_oid,
+										ind->rd_rel->reltablespace, newName,
+										false);
+		result = lappend_oid(result, ind_oid_new);
+
+		index_close(ind, AccessShareLock);
+	}
+
+	return result;
+}
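
A note on the naming in build_new_indexes(): with a NULL second component,
ChooseRelationName() concatenates the first component and the label, so an
index named "orders_pkey" gets a transient sibling called
"orders_pkey_repacknew" (uniquified further if that name is taken). As a
hypothetical call, with nspOid standing in for the table's namespace:

    char *newName = ChooseRelationName("orders_pkey", NULL, "repacknew",
                                       nspOid, false);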
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ef7c0d624f1..d12e2d0f2e0 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -892,7 +892,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
-	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
 					 RecentXmin, ReadNextMultiXactId(), relpersistence);
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 07e5b95782e..1bce85e4232 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5992,6 +5992,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 			finish_heap_swap(tab->relid, OIDNewHeap,
 							 false, false, true,
 							 !OidIsValid(tab->newTableSpace),
+							 true,
 							 RecentXmin,
 							 ReadNextMultiXactId(),
 							 persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 827e66724b5..03478ecd5cc 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -126,7 +126,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 							  TransactionId lastSaneFrozenXid,
 							  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-					   BufferAccessStrategy bstrategy);
+					   BufferAccessStrategy bstrategy, bool isTopLevel);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -627,7 +627,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 
 			if (params.options & VACOPT_VACUUM)
 			{
-				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+								isTopLevel))
 					continue;
 			}
 
@@ -1997,7 +1998,7 @@ vac_truncate_clog(TransactionId frozenXID,
  */
 static bool
 vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-		   BufferAccessStrategy bstrategy)
+		   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	LOCKMODE	lmode;
 	Relation	rel;
@@ -2288,7 +2289,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 
 			/* VACUUM FULL is a variant of REPACK; see cluster.c */
 			cluster_rel(REPACK_COMMAND_VACUUMFULL, rel, InvalidOid,
-						&cluster_params);
+						&cluster_params, isTopLevel);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
@@ -2331,7 +2332,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
-		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+				   isTopLevel);
 	}
 
 	/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index b831a541652..5c148131217 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
 subdir('snowball')
 subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..73fc4d30c67 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
 #include "catalog/pg_control.h"
+#include "commands/cluster.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/message.h"
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(buf->record);
 	SnapBuild  *builder = ctx->snapshot_builder;
 
+	/*
+	 * If the change is not intended for logical decoding, do not even
+	 * establish a transaction for it - REPACK CONCURRENTLY is the typical
+	 * use case.
+	 *
+	 * First, check if REPACK CONCURRENTLY is being performed by this backend.
+	 * If so, only decode data changes of the table that it is processing, and
+	 * the changes of its TOAST relation.
+	 *
+	 * (The TOAST locator should not be set unless the main one is.)
+	 */
+	Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+		   OidIsValid(repacked_rel_locator.relNumber));
+
+	if (OidIsValid(repacked_rel_locator.relNumber))
+	{
+		XLogReaderState *r = buf->record;
+		RelFileLocator locator;
+
+		/* Not all records contain the block. */
+		if (XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL) &&
+			!RelFileLocatorEquals(locator, repacked_rel_locator) &&
+			(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+			 !RelFileLocatorEquals(locator, repacked_rel_toast_locator)))
+			return;
+	}
+
+	/*
+	 * Second, skip records which do not contain sufficient information for
+	 * the decoding.
+	 *
+	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+	 * when making changes in the new table. Those changes are of no use to
+	 * any other consumer (such as a logical replication subscription) because
+	 * the new table will eventually be dropped (after REPACK CONCURRENTLY has
+	 * assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This does happen when 1) raw_heap_insert marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, 2) REPACK CONCURRENTLY
+				 * replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}
+
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
@@ -512,6 +595,16 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			break;
 
 		case XLOG_HEAP_TRUNCATE:
+			/* Is REPACK (CONCURRENTLY) being run by this backend? */
+			if (OidIsValid(repacked_rel_locator.relNumber))
+				/*
+				 * TRUNCATE changes rd_locator of the relation, so it'd break
+				 * REPACK (CONCURRENTLY). In fact it should not happen because
+				 * TRUNCATE needs AccessExclusiveLock on the table. Should we
+				 * only use Assert() here?
+				 */
+				ereport(ERROR,
+						(errmsg("TRUNCATE encountered while doing REPACK (CONCURRENTLY)")));
 			if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
 				!ctx->fast_forward)
 				DecodeTruncate(ctx, buf);
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index dcf32101c4c..63efb577975 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,12 +486,33 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	return SnapBuildMVCCFromHistoric(snap, true);
 }
 
+/*
+ * Build an MVCC snapshot for the initial data load performed by REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot set up
+ * (so we do not set MyProc->xmin). XXX Do we still need to add some
+ * restrictions?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+	Snapshot	snap;
+
+	Assert(builder->state == SNAPBUILD_CONSISTENT);
+	Assert(builder->building_full_snapshot);
+
+	snap = SnapBuildBuildSnapshot(builder);
+	return SnapBuildMVCCFromHistoric(snap, false);
+}
+
 /*
  * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
  *
  * Unlike a regular (non-historic) MVCC snapshot, the xip array of this
  * snapshot contains not only running main transactions, but also their
- * subtransactions. This difference does has no impact on XidInMVCCSnapshot().
+ * subtransactions. This difference has no impact on XidInMVCCSnapshot().
  *
  * Pass true for 'in_place' if you don't care about modifying the source
  * snapshot. If you need a new instance, and one that was allocated as a
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+#    src/backend/replication/pgoutput_repack/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	$(WIN32RES) \
+	pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+	rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+  'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+  pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pgoutput_repack',
+    '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+  pgoutput_repack_sources,
+  kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..178e47bdce4
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,248 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ *		Logical Replication output plugin for REPACK command
+ *
+ * Copyright (c) 2012-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+#include "utils/memutils.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+						   OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+							 ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+							  ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+						  Relation rel, ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+						 ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+	cb->startup_cb = plugin_startup;
+	cb->begin_cb = plugin_begin_txn;
+	cb->change_cb = plugin_change;
+	cb->commit_cb = plugin_commit_txn;
+	cb->shutdown_cb = plugin_shutdown;
+}
+
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+			   bool is_init)
+{
+	ctx->output_plugin_private = NULL;
+
+	/* Probably unnecessary, as we don't use the SQL interface ... */
+	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+	if (ctx->output_plugin_options != NIL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("this plugin does not accept any options")));
+	}
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * As we don't release the slot while processing a particular table, there's
+ * no room for an SQL interface, even for debugging purposes. Therefore we
+ * need neither OutputPluginPrepareWrite() nor OutputPluginWrite() in the
+ * plugin callbacks. (Although we might want to write custom callbacks, this
+ * API seems unnecessarily generic for our purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				  XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+			  Relation relation, ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Only interested in one particular relation. */
+	if (relation->rd_id != dstate->relid)
+		return;
+
+	/* Decode entry depending on its type */
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			{
+				HeapTuple	newtuple;
+
+				newtuple = change->data.tp.newtuple;
+
+				/*
+				 * Identity checks in the main function should have made this
+				 * impossible.
+				 */
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete insert info");
+
+				store_change(ctx, CHANGE_INSERT, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			{
+				HeapTuple	oldtuple,
+							newtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+				newtuple = change->data.tp.newtuple;
+
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete update info");
+
+				if (oldtuple != NULL)
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			{
+				HeapTuple	oldtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+
+				if (oldtuple == NULL)
+					elog(ERROR, "incomplete delete info");
+
+				store_change(ctx, CHANGE_DELETE, oldtuple);
+			}
+			break;
+		default:
+			/* Should not come here */
+			Assert(false);
+			break;
+	}
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+			 HeapTuple tuple)
+{
+	RepackDecodingState *dstate;
+	char	   *change_raw;
+	ConcurrentChange change;
+	bool		flattened = false;
+	Size		size;
+	Datum		values[1];
+	bool		isnull[1];
+	char	   *dst,
+			   *dst_start;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	size = MAXALIGN(VARHDRSZ) + SizeOfConcurrentChange;
+
+	if (tuple)
+	{
+		/*
+		 * ReorderBufferCommit() stores the TOAST chunks in its private memory
+		 * context and frees them after having called apply_change().
+		 * Therefore we need a flat copy (including TOAST) that we eventually
+		 * copy into the memory context which is available to
+		 * repack_decode_concurrent_changes().
+		 */
+		if (HeapTupleHasExternal(tuple))
+		{
+			/*
+			 * toast_flatten_tuple_to_datum() might be more convenient but we
+			 * don't want the decompression it does.
+			 */
+			tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+			flattened = true;
+		}
+
+		size += tuple->t_len;
+	}
+
+	/* The whole varlena value must stay under the 1 GB palloc limit. */
+	if (size >= MaxAllocSize)
+		elog(ERROR, "change is too big");
+
+	/* Construct the change. */
+	change_raw = (char *) palloc0(size);
+	SET_VARSIZE(change_raw, size);
+
+	/*
+	 * Since the varlena alignment might not be sufficient for the structure,
+	 * set the fields in a local instance and remember where it should
+	 * eventually be copied.
+	 */
+	change.kind = kind;
+	dst_start = (char *) VARDATA(change_raw);
+
+	/*
+	 * Copy the tuple.
+	 *
+	 * CAUTION: the stored tup_data.t_data pointer must be fixed on retrieval
+	 * (see get_changed_tuple())!
+	 */
+	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+	dst = dst_start + SizeOfConcurrentChange;
+	memcpy(dst, tuple->t_data, tuple->t_len);
+
+	/* The data has been copied. */
+	if (flattened)
+		pfree(tuple);
+
+	/* Copy the structure so it can be stored. */
+	memcpy(dst_start, &change, SizeOfConcurrentChange);
+
+	/* Store as tuple of 1 bytea column. */
+	values[0] = PointerGetDatum(change_raw);
+	isnull[0] = false;
+	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+						 values, isnull);
+
+	/* Accounting. */
+	dstate->nchanges++;
+
+	/* Cleanup. */
+	pfree(change_raw);
+}
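
Note how little this plugin does: begin and commit are no-ops, and all the
work happens in plugin_change() via store_change(). Nothing is ever streamed
out; setup_logical_decoding() in cluster.c points CreateInitDecodingContext()
at this plugin (through REPL_PLUGIN_NAME, which presumably expands to
"pgoutput_repack") and later reads the accumulated tuplestore back through
the RepackDecodingState hanging off ctx->output_writer_private.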
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index cd3e43c448a..519f3953638 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
 
 die
   "$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
-  . " missing from lwlocklist.h"
+  . "missing from lwlocklist.h"
   if $lwlock_count < scalar @wait_event_lwlocks;
 
 die
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 915d0bc9084..d0f01d85bd3 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -64,6 +64,7 @@
 #include "catalog/pg_type.h"
 #include "catalog/schemapg.h"
 #include "catalog/storage.h"
+#include "commands/cluster.h"
 #include "commands/policy.h"
 #include "commands/publicationcmds.h"
 #include "commands/trigger.h"
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 886060305f5..fbb3d66bbd9 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -214,7 +214,6 @@ static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -659,7 +658,7 @@ CopySnapshot(Snapshot snapshot)
  * FreeSnapshot
  *		Free the memory associated with a snapshot.
  */
-static void
+void
 FreeSnapshot(Snapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 626d9f1c98b..0fcf343d3af 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -5075,8 +5075,8 @@ match_previous_words(int pattern_id,
 		 * one word, so the above test is correct.
 		 */
 		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
-			COMPLETE_WITH("ANALYZE", "VERBOSE");
-		else if (TailMatches("ANALYZE", "VERBOSE"))
+			COMPLETE_WITH("ANALYZE", "CONCURRENTLY", "VERBOSE");
+		else if (TailMatches("ANALYZE|CONCURRENTLY|VERBOSE"))
 			COMPLETE_WITH("ON", "OFF");
 	}
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..b0d6af0474c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -361,14 +361,15 @@ extern void heap_multi_insert(Relation relation, TupleTableSlot **slots,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
-							 TM_FailureData *tmfd, bool changingPart);
+							 TM_FailureData *tmfd, bool changingPart,
+							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
 extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
 extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
 							 HeapTuple newtup,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes);
+							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
@@ -445,6 +446,10 @@ extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
 								 uint16 infomask, TransactionId xid);
+extern bool HeapTupleMVCCInserted(HeapTuple htup, Snapshot snapshot,
+								  Buffer buffer);
+extern bool HeapTupleMVCCNotDeleted(HeapTuple htup, Snapshot snapshot,
+									Buffer buffer);
 extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
 extern bool HeapTupleIsSurelyDead(HeapTuple htup,
 								  GlobalVisState *vistest);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..2cc49fd48de 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
 #define XLH_DELETE_CONTAINS_OLD_KEY				(1<<2)
 #define XLH_DELETE_IS_SUPER						(1<<3)
 #define XLH_DELETE_IS_PARTITION_MOVE			(1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL					(1<<5)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLH_DELETE_CONTAINS_OLD						\
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d8f76d325f9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -629,6 +630,8 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
+											  Snapshot snapshot,
+											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1646,6 +1649,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ *	 (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1658,6 +1665,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
+								Snapshot snapshot,
+								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1666,6 +1675,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
+													snapshot, decoding_ctx,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 652542e8e65..b43a1740053 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
 #ifndef CLUSTER_H
 #define CLUSTER_H
 
+#include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "replication/logical.h"
 #include "storage/lock.h"
+#include "storage/relfilelocator.h"
 #include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -25,6 +30,8 @@
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
 #define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
+#define CLUOPT_CONCURRENT 0x10	/* allow concurrent data changes */
+
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -33,14 +40,94 @@ typedef struct ClusterParams
 } ClusterParams;
 
 
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+extern RelFileLocator repacked_rel_locator;
+extern RelFileLocator repacked_rel_toast_locator;
+
+typedef enum
+{
+	CHANGE_INSERT,
+	CHANGE_UPDATE_OLD,
+	CHANGE_UPDATE_NEW,
+	CHANGE_DELETE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+	/* See the enum above. */
+	ConcurrentChangeKind kind;
+
+	/*
+	 * The actual tuple.
+	 *
+	 * The tuple data follows the ConcurrentChange structure. Before use make
+	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
+	 * bytea) and that tuple->t_data is fixed.
+	 */
+	HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+								sizeof(HeapTupleData))
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents is being copied to a new storage. Also the necessary metadata
+ * needed to apply these changes to the table is stored here.
+ */
+typedef struct RepackDecodingState
+{
+	/* The relation whose changes we're decoding. */
+	Oid			relid;
+
+	/* Replication slot name. */
+	NameData	slotname;
+
+	/*
+	 * Decoded changes are stored here. Although we try to avoid excessive
+	 * batches, it can happen that the changes need to be stored to disk. The
+	 * tuplestore does this transparently.
+	 */
+	Tuplestorestate *tstore;
+
+	/* The current number of changes in tstore. */
+	double		nchanges;
+
+	/*
+	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
+	 * We can't store the tuple directly because tuplestore only supports
+	 * minimal tuples and we may need to transfer the OID system column from the
+	 * output plugin. Also we need to transfer the change kind, so it's better
+	 * to put everything in the structure than to use 2 tuplestores "in
+	 * parallel".
+	 */
+	TupleDesc	tupdesc_change;
+
+	/* Tuple descriptor needed to update indexes. */
+	TupleDesc	tupdesc;
+
+	/* Slot to retrieve data from tstore. */
+	TupleTableSlot *tsslot;
+
+	ResourceOwner resowner;
+} RepackDecodingState;
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
-						ClusterParams *params);
+						ClusterParams *params, bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+											 XLogRecPtr end_of_wal);
+
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -48,6 +135,7 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 bool swap_toast_by_content,
 							 bool check_constraints,
 							 bool is_internal,
+							 bool reindex,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index ebf004b7aa5..5024fea5e2e 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -69,10 +69,12 @@
 #define PROGRESS_REPACK_PHASE					1
 #define PROGRESS_REPACK_INDEX_RELID				2
 #define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED	4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED		5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED		6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		9
 
 /*
  * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -81,9 +83,10 @@
 #define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
 #define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
 #define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+#define PROGRESS_REPACK_PHASE_CATCH_UP			5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		8
 
 /* Progress parameters for CREATE INDEX */
 /* 3, 4 and 5 reserved for "waitfor" metrics */
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 6d4d2d1814c..802fc4b0823 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 7f3ba0352f6..2739327b0da 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
 #define AccessShareLock			1	/* SELECT */
 #define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
 #define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL), ANALYZE, CREATE
-									 * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4	/* VACUUM (non-exclusive), ANALYZE, CREATE
+									 * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
 #define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
 #define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
 									 * SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index f65f83c85cd..1f821fd2ccd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,8 @@ extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index a618e6a9899..e477a1ba5ff 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -16,12 +16,14 @@ REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
 ISOLATION = basic \
 	    inplace \
+	    repack \
 	    syscache-update-pruned \
 	    index-concurrently-upsert \
 	    index-concurrently-upsert-predicate \
 	    reindex-concurrently-upsert \
 	    reindex-concurrently-upsert-on-constraint \
 	    reindex-concurrently-upsert-partitioned
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing: 
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+
+step change_new: 
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1: 
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+
+step change_subxact2: 
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    2
+(1 row)
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..c8f264bc6cb
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
\ No newline at end of file
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 1a2af8a26c4..f8957d82afa 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -47,6 +47,7 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'repack',
       'syscache-update-pruned',
       'index-concurrently-upsert',
       'index-concurrently-upsert-predicate',
@@ -57,6 +58,8 @@ tests += {
     'runningcheck': false, # see syscache-update-pruned
     # Some tests wait for all snapshots, so avoid parallel execution
     'runningcheck-parallel': false,
+    # 'repack' requires wal_level = 'logical'.
+    'regress_args': ['--temp-config', files('logical.conf')],
   },
   'tap': {
     'env': {
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..75850334986
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,143 @@
+# Prefix the system columns with underscore as they are not allowed as column
+# names.
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+	INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j int);
+	CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+    SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice to test whether tuple version generated by this session
+# can be found.
+step change_existing
+{
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key and
+# non-key column.
+step change_new
+{
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change_existing
+	change_new
+	change_subxact1
+	change_subxact2
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 1c957f12d27..e23751afeaf 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2007,7 +2007,7 @@ pg_stat_progress_cluster| SELECT pid,
     phase,
     repack_index_relid AS cluster_index_relid,
     heap_tuples_scanned,
-    heap_tuples_written,
+    (heap_tuples_inserted + heap_tuples_updated) AS heap_tuples_written,
     heap_blks_total,
     heap_blks_scanned,
     index_rebuild_count
@@ -2087,17 +2087,20 @@ pg_stat_progress_repack| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 5 THEN 'catch-up'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS repack_index_relid,
     s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param5 AS heap_tuples_inserted,
+    s.param6 AS heap_tuples_updated,
+    s.param7 AS heap_tuples_deleted,
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4641da9b746..5e546160a2f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -411,6 +411,7 @@ CatCacheHeader
 CatalogId
 CatalogIdMapEntry
 CatalogIndexState
+ChangeDest
 ChangeVarNodes_callback
 ChangeVarNodes_context
 CheckPoint
@@ -487,6 +488,8 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
 ConditionalStack
@@ -1264,6 +1267,7 @@ IndexElem
 IndexFetchHeapData
 IndexFetchTableData
 IndexInfo
+IndexInsertState
 IndexList
 IndexOnlyScan
 IndexOnlyScanState
@@ -2556,6 +2560,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 RepackCommand
+RepackDecodingState
 RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
-- 
2.47.3

#58Marcos Pegoraro
marcos@f10.com.br
In reply to: Álvaro Herrera (#56)
Re: Adding REPACK [concurrently]

On Thu, Dec 4, 2025 at 12:43, Álvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

If you only have a small number of pages that have this problem, then
you don't actually need to do anything -- the pages will be marked free
by regular vacuuming, and future inserts or updates can make use of
those pages. It's not a problem to have a small number of pages in
empty state for some time.

So if you're trying to do this, the number of problematic pages must be
large.

Not necessarily. I have some tables where I like to use CLUSTER
every 2 or 3 months, to reorganize the data based on an index
and consequently load fewer pages with each call. These tables
don't have more than 2 or 3% of dead records, but they are quite
disorganized from the point of view of that index, since the
inserted and updated records don't follow the order I determined.

regards
Marcos

#59Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#57)
Re: Adding REPACK [concurrently]

Hello, Antonin!

On Thu, Dec 4, 2025 at 6:43 PM Antonin Houska <ah@cybertec.at> wrote:

v26 attached here. It's been rebased and reflects most of the feedback.

Some comments on 0001-0002:
1)

cluster_rel(stmt->command, rel, indexOid, params);

cluster_rel closes the relation, which is then dereferenced a few lines later.
Technically it may be correct, but it feels a little bit strange.

2)

if (vacopts->mode == MODE_VACUUM)

I think for better compatibility it is better to handle the new value in
the if, i.e. (vacopts->mode == MODE_REPACK), to keep the old cases unchanged.
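
A minimal sketch of what I mean (assuming the command keyword is emitted
via appendPQExpBufferStr() into the query buffer, as vacuumdb does):

if (vacopts->mode == MODE_REPACK)
	appendPQExpBufferStr(sql, "REPACK");
else
	appendPQExpBufferStr(sql, "VACUUM");	/* old cases stay unchanged */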

3)

case T_RepackStmt:
tag = CMDTAG_REPACK;
break;

should we use instead:

case T_RepackStmt:
if (((RepackStmt *) parsetree)->command == REPACK_COMMAND_CLUSTER)
tag = CMDTAG_CLUSTER;
else
tag = CMDTAG_REPACK;
break;

or delete CMDTAG_CLUSTER, since it is not used anymore

4)
"has been superceded by"
typo

Best regards,
Mikhail.

#60Álvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Marcos Pegoraro (#58)
Re: Adding REPACK [concurrently]

Hello,

On 2025-Dec-04, Marcos Pegoraro wrote:

On Thu, Dec 4, 2025 at 12:43, Álvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

So if you're trying to do this, the number of problematic pages must
be large.

Not necessarily. I have some tables where I like to use CLUSTER every
2 or 3 months, to reorganize the data based on an index and
consequently load fewer pages with each call. These tables don't have
more than 2 or 3% of dead records, but they are quite disorganized
from the point of view of that index, since the inserted and updated
records don't follow the order I determined.

I don't understand what this has to do with what David was
proposing. I mean, you're right: if all you want is to CLUSTER, you may
not have an enormous number of pages to get rid of. But how can you use
the technique he proposes to deal with reordering tuples? If you just
move the tuples from the end of the table to where some random hole has
appeared, you've not clustered the table at all.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"People get annoyed when you try to debug them." (Larry Wall)

#61Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#59)
Re: Adding REPACK [concurrently]

Hello, Antonin!

Some comments for 0003:

/* allocate in transaction context */

It may be any context now, because it is a function.

result = CopySnapshot(snapshot);

/* Restore the original values so the source is intact. */
snapshot->xip = oldxip;
snapshot->xcnt = oldxcnt;

I think it is worth calling pfree(newxip) here.
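
For clarity, the placement I mean (a sketch based on the snippet above):

result = CopySnapshot(snapshot);

/* Restore the original values so the source is intact. */
snapshot->xip = oldxip;
snapshot->xcnt = oldxcnt;

/* Suggested addition: the temporary xip array is no longer needed. */
pfree(newxip);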

"This difference does has no impact"

should be "This difference has no impact"?

Best regards,
Mikhail.

#62Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#61)
Re: Adding REPACK [concurrently]

Hello, comments so far on 0004:

---

ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);

I think the biggest issue we have so far is that
repack_decode_concurrent_changes is not called while the new indexes are
being built (the build itself creates a huge amount of WAL and sometimes
takes days). Looks like a path to catastrophic scenarios :)

Some small parts of this may be related to the snapshot-reset technique in
the CIC case:
1) if we build the new indexes concurrently in the REPACK case
2) and reset snapshots every so often
3) we may use the same callback to also process WAL every so often
4) but it still does not apply to some phases of index building (the batch
insertion phase, for example)

Or should we move repack_decode_concurrent_changes calls into some
kind of worker instead?
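
Something along these lines, perhaps (a very rough sketch; the completion
flag, the sleep interval and the loop's placement are all made up):

/* Keep consuming decoded changes while the long index build runs. */
while (!index_build_done)	/* hypothetical shared flag */
{
	XLogRecPtr	end_of_wal = GetFlushRecPtr(NULL);

	repack_decode_concurrent_changes(ctx, end_of_wal);
	pg_usleep(100000L);	/* 100 ms between decoding rounds */
}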

---

if (OldHeap->rd_rel->reltoastrelid)
LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);

I think we should pass the lock mode from rebuild_relation here, because
AccessExclusiveLock will break "CONCURRENTLY" totally.
And probably also upgrade it before the swap.
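
I.e. roughly this, assuming rebuild_relation's lock mode is available here
as "lockmode":

/*
 * Take the TOAST lock in the caller's mode instead of hard-coding
 * AccessExclusiveLock, which would block concurrent readers.
 */
if (OldHeap->rd_rel->reltoastrelid)
	LockRelationOid(OldHeap->rd_rel->reltoastrelid, lockmode);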

---

cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)

Should we check CheckSlotPermissions() here? Also, maybe it is worth
mentioning in the docs.

---

REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;

Some paths (without index) are not covered in any way in tests at the moment.
Also some TOAST-related scenarios, I think.

* Alternatively, we can lock all the indexes now in a mode that blocks
* all the ALTER INDEX commands (ShareUpdateExclusiveLock ?), and keep

I think it's better to lock.

---

rebuild_relation(RepackCommand cmd, Relation OldHeap, Relation index,

"cmd" is not used.

---

apply_concurrent_update
apply_concurrent_delete
apply_concurrent_insert

"change" is not used, but I think it is intentional for the MVCC-safe case.

---

rebuild_relation(RepackCommand cmd, Relation OldHeap, Relation index,
bool verbose, bool concurrent)

"concurrent" is spelled "concurrently" in the definition.

---

TM_FailureData *tmfd, bool changingPart,
bool wal_logical);

Maybe "walLogical" to keep it aligned with "changingPart"?

---

subtransacion

typo

---

Should we check a the end

"a" is "at"?

---

Note that <command>REPACK</command> with the
the <literal>CONCURRENTLY</literal> option does not try to order the

double "the"

---

if (size >= 0x3FFFFFFF)

if (size >= MaxAllocSize)

---

extern bool HeapTupleMVCCInserted(HeapTuple htup, Snapshot snapshot,
Buffer buffer);
extern bool HeapTupleMVCCNotDeleted(HeapTuple htup, Snapshot snapshot,
Buffer buffer);

Looks like this from another patch.

---
src/backend/utils/cache/relcache.c

#include "commands/cluster.h"

may be removed

---

during any of the preceding
phase.

"phases"

---

# Prefix the system columns with underscore as they are not allowed as column
# names.

Should it be removed?

---

"Failed to find target tuple"

This and multiple other new error messages should start with lowercase

---

Copyright (c) 2012-2024, PostgreSQL Global Development Group

in pgoutput_repack - maybe it is time to adjust.

---
src/test/modules/injection_points/logical.conf

Better to add newline

---

SELECT injection_points_detach('repack-concurrently-before-lock');

Uses spaces; needs to be tabs.

Next step in my plan: rebase the MVCC-safe commit and test it with a
number of stress tests.

Best regards,
Mikhail.

#63Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#59)
Re: Adding REPACK [concurrently]

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

On Thu, Dec 4, 2025 at 6:43 PM Antonin Houska <ah@cybertec.at> wrote:

v26 attached here. It's been rebased and reflects most of the feedback.

Some comments on 0001-0002:
1)

cluster_rel(stmt->command, rel, indexOid, params);

cluster_rel closes the relation, which is then dereferenced a few lines later.
Technically it may be correct, but it feels a little bit strange.

ok, will be fixed in the next version (supposedly later today).

2)

if (vacopts->mode == MODE_VACUUM)

I think for better compatibility it is better to handle the new value in
the if, i.e. (vacopts->mode == MODE_REPACK), to keep the old cases unchanged.

I suppose you mean vacuuming.c. We're considering removal of pg_repackdb from
the patchset, so let's decide on this later.

3)

case T_RepackStmt:
tag = CMDTAG_REPACK;
break;

should we use instead:

case T_RepackStmt:
if (((RepackStmt *) parsetree)->command == REPACK_COMMAND_CLUSTER)
tag = CMDTAG_CLUSTER;
else
tag = CMDTAG_REPACK;
break;

or delete CMDTAG_CLUSTER, since it is not used anymore

LGTM, will include it in the next version.

4)
"has been superceded by"
typo

ok. (This may also be removed, as it's specific to pg_repackdb.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#64David Klika
david.klika@atlas.cz
In reply to: Álvaro Herrera (#56)
Re: Adding REPACK [concurrently]

Hello Alvaro

Thank you for the detailed analysis.

On Dec 4, 2025 at 16:43, Álvaro Herrera wrote:

Hello David,

Thanks for your interest in this.

On 2025-Dec-04, David Klika wrote:

Let's consider a large table where 80% blocks are fine (filled enough by
live tuples). The table could be scanned from the beginning (left side) to
identify "not enough filled" blocks and also from the end (right side) to
process live tuples by moving them to the blocks identified by the left side
scan. The work is over when both scans reach the same position.

If you only have a small number of pages that have this problem, then
you don't actually need to do anything -- the pages will be marked free
by regular vacuuming, and future inserts or updates can make use of
those pages. It's not a problem to have a small number of pages in
empty state for some time.

So if you're trying to do this, the number of problematic pages must be
large.

I agree, I had in mind about 20-40% of the table, which could be tens
of GB.

Now, the issue with what you propose is that you need to make either the
old tuples or the new tuples visible to concurrent transactions. If at
any point they are both visible, or none of them is visible, then you
have potentially corrupted the results that would be obtained by a query
that's scanning the table and halfway through.

When performing a tuple movement from a (right) page to a (left) page,
both pages must be held in shared buffers. I suppose the other
processes scanning the table also access the table data through the
shared buffers, so the movement could be handled at this level. If the
tuple movement does not change its xid, it wouldn't even have to
conflict with other transactions that locked/modified the tuple (again
in the buffer cache, just changing the physical location). Looks like
something dirty...
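
In pseudo-C, the shuffle I imagine looks roughly like this (purely
illustrative: page_is_sparse() and move_live_tuples() are invented
helpers, the table is assumed non-empty, and WAL, visibility map and
index maintenance are ignored entirely):

BlockNumber left = 0;
BlockNumber right = RelationGetNumberOfBlocks(rel) - 1;

while (left < right)
{
	if (!page_is_sparse(rel, left))
	{
		left++;		/* this page is filled enough, skip it */
		continue;
	}

	/* Move live tuples from the rightmost page into the hole, keeping
	 * their xids, then give the emptied tail page back. */
	move_live_tuples(rel, right, left);
	right--;
}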

The other point is that you need to keep indexes updated. That is, you
need to make the indexes point to both the old and new, until you remove
the old tuples from the table, then remove those index pointers.
This process bloats the indexes, which is not insignificant, considering
that the number of tuples to process is large. If there are several
indexes, this makes your process take even longer.

You can fix the concurrency problem by holding a lock on the table that
ensures nobody is reading the table until you've finished. But we don't
want to have to hold such a lock for long! And we already established
that the number of pages to check is large, which means you're going to
work for a long time.
So, I'm not really sure that it's practical to implement what you
suggest.

I agree. The proposed tuple shuffle might work better than the
current VACUUM FULL (i.e. blocking, non-clustered maintenance), but I
understand that you prefer a universal method of data file maintenance
(the concurrent variant will be amazing).

Regards David

#65Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#61)
Re: Adding REPACK [concurrently]

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Some comments for 0003:

/* allocate in transaction context */

It may be any context now, because it is a function.

Inaccuracy not introduced by REPACK, but I think it's o.k. if the next version
of this patch removes the comment.

result = CopySnapshot(snapshot);

/* Restore the original values so the source is intact. */
snapshot->xip = oldxip;
snapshot->xcnt = oldxcnt;

I think it is worth calling pfree(newxip) here.

ok

"This difference does has no impact"

should be "This difference has no impact"?

Right, thanks.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#66Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#62)
5 attachment(s)
Re: Adding REPACK [concurrently]

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Hello, comments so far on 0004:

---

ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);

I think the biggest issue we have so far is that
repack_decode_concurrent_changes is not called while the new indexes are
being built (the build itself creates a huge amount of WAL and sometimes
takes days). Looks like a path to catastrophic scenarios :)

Indeed, that may be a problem.

Some small parts of this may be related to the snapshot-reset technique in
the CIC case:
1) if we build the new indexes concurrently in the REPACK case
2) and reset snapshots every so often
3) we may use the same callback to also process WAL every so often
4) but it still does not apply to some phases of index building (the batch
insertion phase, for example)

I prefer not to depend on other improvements.

Or should we move repack_decode_concurrent_changes calls into some
kind of worker instead?

A worker makes more sense to me; the initial implementation is in 0005.

---

if (OldHeap->rd_rel->reltoastrelid)
LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);

I think we should pass the lock mode from rebuild_relation here, because
AccessExclusiveLock will break "CONCURRENTLY" totally.

Good point, I missed this.

And probably also upgrade it before the swap.

rebuild_relation_finish_concurrent() already does that.

---

cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)

Should we check CheckSlotPermissions() here? Also, maybe it is worth
mentioning in the docs.

setup_logical_decoding() does that, but I'm not sure if we should really
require the REPLICATION user attribute for REPACK. I need to think about this;
perhaps ACL_MAINTAIN is enough.

---

REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;

Some paths (without index) are not covered in any way in tests at the moment.
Also some TOAST-related scenarios, I think.

I added a test for TOAST to "injection_points" and hit a serious problem: when
applying concurrent changes to the new table, REPACK tried to delete rows from
the new one. The point is that the "swap TOAST by content" technique cannot be
used here. Fixed, thanks for this suggestion!

* Alternatively, we can lock all the indexes now in a mode that blocks
* all the ALTER INDEX commands (ShareUpdateExclusiveLock ?), and keep

I think it's better to lock.

ok, changed

---

rebuild_relation(RepackCommand cmd, Relation OldHeap, Relation index,

"cmd" is not used.

Fixed (not specific to 0004).

---

apply_concurrent_update
apply_concurrent_delete
apply_concurrent_insert

"change" is not used, but I think it is intentional for the MVCC-safe case.

Not sure if it's necessary for the MVCC-safe case; I consider it a leftover from
some previous version. Removed.

---

rebuild_relation(RepackCommand cmd, Relation OldHeap, Relation index,
bool verbose, bool concurrent)

"concurrent" is spelled "concurrently" in the definition.

Fixed.

---

TM_FailureData *tmfd, bool changingPart,
bool wal_logical);

Maybe "walLogical" to keep it aligned with "changingPart"?

ok

---

subtransacion

typo

I removed the related code. It was a workaround for plan_cluster_use_sort()
not to leave locks behind. However, as REPACK (CONCURRENTLY) no longer unlocks
the relation, this is not needed either.

---

Should we check a the end

"a" is "at"?

Removed when addressing one of the previous comments.

---

Note that <command>REPACK</command> with the
the <literal>CONCURRENTLY</literal> option does not try to order the

double "the"

Fixed.

---

if (size >= 0x3FFFFFFF)

if (size >= MaxAllocSize)

Fixed.

---

extern bool HeapTupleMVCCInserted(HeapTuple htup, Snapshot snapshot,
Buffer buffer);
extern bool HeapTupleMVCCNotDeleted(HeapTuple htup, Snapshot snapshot,
Buffer buffer);

Looks like this from another patch.

Right, this is from the "MVCC safety part".

---
src/backend/utils/cache/relcache.c

#include "commands/cluster.h"

may be removed

Yes, this belongs to some of the following patches of the series.

---

during any of the preceding
phase.

"phases"

Fixed.

---

# Prefix the system columns with underscore as they are not allowed as column
# names.

Should it be removed?

Done. (Belongs to the "MVCC-safety" part, where the test checks xmin, xmax,
...)

---

"Failed to find target tuple"

This and multiple other new error messages should start with lowercase

Fixed.

---

Copyright (c) 2012-2024, PostgreSQL Global Development Group

in pgoutput_repack - maybe it is time to adjust.

Done.

---
src/test/modules/injection_points/logical.conf

Better to add newline

Done.

---

SELECT injection_points_detach('repack-concurrently-before-lock');

Uses spaces; needs to be tabs.

ok

Thanks for the review!

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

v27-0001-Add-REPACK-command.patch (text/x-diff; charset=utf-8)
From 5bc43504a954c86cfb6722e0030109008564ff4a Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Tue, 9 Dec 2025 19:44:42 +0100
Subject: [PATCH 1/5] Add REPACK command
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.

This also adds pg_repackdb, a new utility that can invoke the new
commands.  This is heavily based on vacuumdb.

Author: Antonin Houska <ah@cybertec.at>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 +++++-
 doc/src/sgml/ref/allfiles.sgml           |   2 +
 doc/src/sgml/ref/cluster.sgml            |  97 +--
 doc/src/sgml/ref/clusterdb.sgml          |   5 +
 doc/src/sgml/ref/pg_repackdb.sgml        | 488 +++++++++++++
 doc/src/sgml/ref/repack.sgml             | 328 +++++++++
 doc/src/sgml/ref/vacuum.sgml             |  33 +-
 doc/src/sgml/reference.sgml              |   2 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  29 +-
 src/backend/commands/cluster.c           | 851 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   6 +-
 src/backend/parser/gram.y                |  86 ++-
 src/backend/tcop/utility.c               |  23 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  42 +-
 src/bin/scripts/Makefile                 |   4 +-
 src/bin/scripts/meson.build              |   2 +
 src/bin/scripts/pg_repackdb.c            | 242 +++++++
 src/bin/scripts/t/103_repackdb.pl        |  47 ++
 src/bin/scripts/vacuuming.c              | 114 ++-
 src/bin/scripts/vacuuming.h              |   3 +
 src/include/commands/cluster.h           |   8 +-
 src/include/commands/progress.h          |  50 +-
 src/include/nodes/parsenodes.h           |  35 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 134 +++-
 src/test/regress/expected/rules.out      |  72 +-
 src/test/regress/sql/cluster.sql         |  70 +-
 src/tools/pgindent/typedefs.list         |   2 +
 33 files changed, 2493 insertions(+), 544 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 doc/src/sgml/ref/repack.sgml
 create mode 100644 src/bin/scripts/pg_repackdb.c
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d2dd5e28365..816b11b7318 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5609,7 +5617,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -6068,6 +6077,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index e167406c744..5df944d13ca 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
@@ -213,6 +214,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 0b47460080b..2cda711bc9f 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
   <title>Description</title>
 
   <para>
-   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
-   to cluster the table specified
-   by <replaceable class="parameter">table_name</replaceable>
-   based on the index specified by
-   <replaceable class="parameter">index_name</replaceable>. The index must
-   already have been defined on
-   <replaceable class="parameter">table_name</replaceable>.
+   The <command>CLUSTER</command> command is equivalent to
+   <xref linkend="sql-repack"/> with a <literal>USING INDEX</literal>
+   clause.  See there for more details.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
-
-  <para>
-   When a table is clustered, <productname>PostgreSQL</productname>
-   remembers which index it was clustered by.  The form
-   <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
-   reclusters the table using the same index as before.  You can also
-   use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
-   forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
-   future cluster operations, or to clear any previous setting.
-  </para>
-
-  <para>
-   <command>CLUSTER</command> without a
-   <replaceable class="parameter">table_name</replaceable> reclusters all the
-   previously-clustered tables in the current database that the calling user
-   has privileges for.  This form of <command>CLUSTER</command> cannot be
-   executed inside a transaction block.
-  </para>
+<!-- Do we need to describe exactly which options map to what?  They seem obvious to me. -->
 
-  <para>
-   When a table is being clustered, an <literal>ACCESS
-   EXCLUSIVE</literal> lock is acquired on it. This prevents any other
-   database operations (both reads and writes) from operating on the
-   table until the <command>CLUSTER</command> is finished.
-  </para>
  </refsect1>
 
  <refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 0d2051bf6f1..b50c9581a98 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -64,6 +64,11 @@ PostgreSQL documentation
    this utility and via other methods for accessing the server.
   </para>
 
+  <para>
+   <application>clusterdb</application> has been superseded by
+   <application>pg_repackdb</application>.
+  </para>
+
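+  <para>
+   For example (assuming that <option>--index</option> without an argument
+   maps to a bare <literal>USING INDEX</literal> clause), the equivalent of
+   <literal>clusterdb --table=employees mydb</literal> is:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --table=employees --index mydb</userinput>
+</screen>
+  </para>
+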
  </refsect1>
 
 
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..b313b54ab63
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,488 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.
+   <application>pg_repackdb</application> will also generate internal
+   statistics used by the <productname>PostgreSQL</productname> query
+   optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link>.  There
+   is no effective difference between repacking and analyzing databases via
+   this utility and via other methods for accessing the server.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--index<optional>=<replaceable class="parameter">index_name</replaceable></optional></option></term>
+      <listitem>
+       <para>
+        Add the <literal>USING INDEX</literal> clause to the generated
+        <command>REPACK</command> commands, optionally naming the index
+        to use.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.  If a column name
+        list is given, only compute statistics for those columns.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When <option>-a</option>/<option>--all</option> is used, connect
+        to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
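+
+   <para>
+    To repack all databases, running up to four commands in parallel:
+<!-- XXX hypothetical example; it assumes the jobs option can be combined
+     with the all option, as in vacuumdb. -->
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --all --jobs=4</userinput>
+</screen>
+   </para>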
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
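+
+   <para>
+    To repack the table <literal>employees</literal> in a database named
+    <literal>test</literal> following its index
+    <literal>employees_ind</literal>:
+<!-- XXX hypothetical example; it assumes the index option can be combined
+     with the table option. -->
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --table='employees' --index='employees_ind' test</userinput>
+</screen></para>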
+
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..61d5c2cdef1
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,328 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING INDEX
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+    ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+
+<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
+
+    <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If a <literal>USING INDEX</literal> clause is specified, the rows are
+   physically reordered based on information from an index.  See
+   <xref linkend="sql-repack-notes-on-clustering"/> below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    If the <literal>USING INDEX</literal> clause is specified, the rows in
+    the table are physically reordered following an index: if an index name
+    is given in the command, then that index is used; if no index name is
+    given, then the index previously configured as the index to cluster on
+    is used, and an error is raised if no index has been configured in this
+    way.  The index named in the <literal>USING INDEX</literal> clause is
+    remembered as the index to cluster on, just as an index given to the
+    <command>CLUSTER</command> command would be.  The index can be set
+    manually using <command>ALTER TABLE ... CLUSTER ON</command>, and reset
+    with <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+   </para>
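+
+   <para>
+    For example, with a table <literal>measurements</literal> and an index
+    <literal>measurements_ts_idx</literal> (both names here are only
+    illustrative), the index to cluster on can be configured once and then
+    reused by <command>REPACK</command>:
+<programlisting>
+ALTER TABLE measurements CLUSTER ON measurements_ts_idx;
+REPACK measurements USING INDEX;    -- uses measurements_ts_idx
+ALTER TABLE measurements SET WITHOUT CLUSTER;
+</programlisting>
+   </para>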
+
+   <para>
+    If no table name is specified in <command>REPACK USING INDEX</command>,
+    all tables which have a clustering index defined and which the calling
+    user has privileges for are processed.
+   </para>
+
+   <para>
+    Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using clustering.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan, or a sequential scan without sort, is used, a
+    temporary copy of the table is created that contains the table data in
+    the new order.  Temporary copies of each index on the table are created
+    as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
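+
+   <para>
+    For example, to repack a hypothetical table <literal>big_table</literal>
+    with a generous memory allowance while forcing the index-scan method:
+<programlisting>
+SET maintenance_work_mem = '1GB';   -- illustrative value
+SET enable_sort = off;              -- disables the scan-and-sort method
+REPACK big_table USING INDEX;
+RESET enable_sort;
+RESET maintenance_work_mem;
+</programlisting>
+   </para>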
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">column_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of a specific column to analyze. Defaults to all columns.
+      If a column list is specified, <literal>ANALYZE</literal> must also
+      be specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>ANALYZE</literal></term>
+    <term><literal>ANALYSE</literal></term>
+    <listitem>
+     <para>
+      Runs <xref linkend="sql-analyze"/> on the table after repacking.  This is
+      currently only supported when a single (non-partitioned) table is specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+  <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+  </para>
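+
+  <para>
+   For example, the progress of a long-running <command>REPACK</command>
+   can be watched from another session (a minimal query; see
+   <xref linkend="repack-progress-reporting"/> for the full column list):
+<programlisting>
+SELECT pid, relid::regclass, command, phase,
+       heap_blks_scanned, heap_blks_total
+  FROM pg_stat_progress_repack;
+</programlisting>
+  </para>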
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the partition of that
+    index. <command>REPACK</command> on a partitioned table cannot be executed
+    inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> based on its
+   index <literal>employees_ind</literal> (since an index is used, this
+   effectively clusters the table):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>cases</literal> in physical order,
+   running <command>ANALYZE</command> on the given columns once
+   repacking is done, and showing informational messages:
+<programlisting>
+REPACK (ANALYZE, VERBOSE) cases (district, case_nr);
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables for which a clustering index has previously been
+   configured on which you have the <literal>MAINTAIN</literal> privilege,
+   showing informational messages:
+<programlisting>
+REPACK (VERBOSE) USING INDEX;
+</programlisting>
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="app-pgrepackdb"/></member>
+   <member><xref linkend="repack-progress-reporting"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 6d0fdd43cfb..ac5d083d468 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
-    FULL [ <replaceable class="parameter">boolean</replaceable> ]
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+    FULL [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
   <title>Parameters</title>
 
   <variablelist>
-   <varlistentry>
-    <term><literal>FULL</literal></term>
-    <listitem>
-     <para>
-      Selects <quote>full</quote> vacuum, which can reclaim more
-      space, but takes much longer and exclusively locks the table.
-      This method also requires extra disk space, since it writes a
-      new copy of the table and doesn't release the old copy until
-      the operation is complete.  Usually this should only be used when a
-      significant amount of space needs to be reclaimed from within the table.
-     </para>
-    </listitem>
-   </varlistentry>
-
    <varlistentry>
     <term><literal>FREEZE</literal></term>
     <listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FULL</literal></term>
+    <listitem>
+     <para>
+      This option, which is deprecated, makes <command>VACUUM</command>
+      behave like <command>REPACK</command> without a
+      <literal>USING INDEX</literal> clause.
+      This method of compacting the table takes much longer than
+      <command>VACUUM</command> and exclusively locks the table.
+      This method also requires extra disk space, since it writes a
+      new copy of the table and doesn't release the old copy until
+      the operation is complete.  Usually this should only be used when a
+      significant amount of space needs to be reclaimed from within the table.
+     </para>
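+     <para>
+      For example, for a table named <literal>employees</literal>, these two
+      commands are equivalent:
+<programlisting>
+VACUUM (FULL) employees;
+REPACK employees;
+</programlisting>
+     </para>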
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 2cf02c37b17..5d9a8a25a02 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
@@ -258,6 +259,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..79f9de5d760 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5d9db167e59..08d4b8e44d7 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 48af8ee90a6..574f1004b9a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1272,14 +1272,15 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
-CREATE VIEW pg_stat_progress_cluster AS
+CREATE VIEW pg_stat_progress_repack AS
     SELECT
         S.pid AS pid,
         S.datid AS datid,
         D.datname AS datname,
         S.relid AS relid,
         CASE S.param1 WHEN 1 THEN 'CLUSTER'
-                      WHEN 2 THEN 'VACUUM FULL'
+                      WHEN 2 THEN 'REPACK'
+                      WHEN 3 THEN 'VACUUM FULL'
                       END AS command,
         CASE S.param2 WHEN 0 THEN 'initializing'
                       WHEN 1 THEN 'seq scanning heap'
@@ -1290,15 +1291,35 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 6 THEN 'rebuilding index'
                       WHEN 7 THEN 'performing final cleanup'
                       END AS phase,
-        CAST(S.param3 AS oid) AS cluster_index_relid,
+        CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
         S.param6 AS heap_blks_total,
         S.param7 AS heap_blks_scanned,
         S.param8 AS index_rebuild_count
-    FROM pg_stat_get_progress_info('CLUSTER') AS S
+    FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+-- This view is the same as the one above, except that one column is renamed
+-- and 'REPACK' is never reported as a command name.
+CREATE VIEW pg_stat_progress_cluster AS
+    SELECT
+        pid,
+        datid,
+        datname,
+        relid,
+        CASE WHEN command IN ('CLUSTER', 'VACUUM FULL') THEN command
+             WHEN repack_index_relid = 0 THEN 'VACUUM FULL'
+             ELSE 'CLUSTER' END AS command,
+        phase,
+        repack_index_relid AS cluster_index_relid,
+        heap_tuples_scanned,
+        heap_tuples_written,
+        heap_blks_total,
+        heap_blks_scanned,
+        index_rebuild_count
+    FROM pg_stat_progress_repack;
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index d1e772efb72..3afab656cd9 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1,7 +1,8 @@
 /*-------------------------------------------------------------------------
  *
  * cluster.c
- *	  CLUSTER a table on an index.  This is now also used for VACUUM FULL.
+ *	  CLUSTER a table on an index.  This is now also used for VACUUM FULL and
+ *	  REPACK.
  *
  * There is hardly anything left of Paul Brown's original implementation...
  *
@@ -67,27 +68,36 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  Oid relid, bool rel_is_index,
+											  MemoryContext permcxt);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
+static const char *RepackCommandAsString(RepackCommand cmd);
 
 
-/*---------------------------------------------------------------------------
- * This cluster code allows for clustering multiple tables at once. Because
+/*
+ * The repack code allows for processing multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
  * would be forced to acquire exclusive locks on all the tables being
  * clustered, simultaneously --- very likely leading to deadlock.
  *
- * To solve this we follow a similar strategy to VACUUM code,
- * clustering each relation in a separate transaction. For this to work,
- * we need to:
+ * To solve this we follow a strategy similar to the VACUUM code, processing
+ * each relation in a separate transaction. For this to work, we need to:
+ *
  *	- provide a separate memory context so that we can pass information in
  *	  a way that survives across transactions
  *	- start a new transaction every time a new relation is clustered
@@ -98,197 +108,165 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *
  * The single-relation case does not have any such overhead.
  *
- * We also allow a relation to be specified without index.  In that case,
- * the indisclustered bit will be looked up, and an ERROR will be thrown
- * if there is no index with the bit set.
- *---------------------------------------------------------------------------
+ * We also allow a relation to be repacked following an index, but without
+ * naming a specific one.  In that case, the indisclustered bit will be
+ * looked up, and an ERROR will be thrown if no index has the bit set.
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+		else if (strcmp(opt->defname, "analyze") == 0 ||
+				 strcmp(opt->defname, "analyse") == 0)
+			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
 		else
 			ereport(ERROR,
-					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized %s option \"%s\"",
-							"CLUSTER", opt->defname),
-					 parser_errposition(pstate, opt->location)));
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("unrecognized %s option \"%s\"",
+						   RepackCommandAsString(stmt->command),
+						   opt->defname),
+					parser_errposition(pstate, opt->location));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
-			return;
-		}
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
+			return;				/* all done */
 	}
 
+	/*
+	 * Don't allow ANALYZE in the multiple-relation case for now.  Maybe we
+	 * can add support for this later.
+	 */
+	if (params.options & CLUOPT_ANALYZE)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
+
 	/*
 	 * By here, we know we are in a multi-table situation.  In order to avoid
 	 * holding locks for too long, we want to process each table in its own
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
-	}
+		Oid			relid;
+		bool		rel_is_index;
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
+		/*
+		 * If USING INDEX was specified, resolve the index name now and pass
+		 * it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * If no index name was specified when repacking a partitioned
+			 * table, punt for now.  Maybe we can improve this later.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, stmt->usingindex,
+											  stmt->indexname);
+			if (!OidIsValid(relid))
+				elog(ERROR, "unable to determine index to cluster on");
+			/* XXX is this the right place for this check? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
 
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
+		rtcs = get_tables_to_repack_partitioned(stmt->command,
+												relid, rel_is_index,
+												repack_context);
 
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
+	}
 
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+		{
+			CommitTransactionCommand();
+			continue;
+		}
+
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +282,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+			ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +304,8 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
-	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
+	pgstat_progress_update_param(PROGRESS_REPACK_COMMAND, cmd);
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -350,86 +326,38 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
 	 */
-	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
+	if (recheck &&
+		!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+							 params->options))
+		goto out;
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index. It
+	 * would work to use an index in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
 	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on a shared catalog",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
-	{
-		if (OidIsValid(indexOid))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot vacuum temporary tables of other sessions")));
-	}
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on temporary tables of other sessions",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -442,6 +370,24 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	else
 		index = NULL;
 
+	/*
+	 * When allow_system_table_mods is turned off, we disallow repacking a
+	 * catalog on a particular index unless that's already the clustered index
+	 * for that catalog.
+	 *
+	 * XXX We don't check for this in CLUSTER, because it's historically been
+	 * allowed.
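+	 *
+	 * For example (illustrative): with allow_system_table_mods off,
+	 * "REPACK pg_class USING INDEX pg_class_oid_index" is rejected unless
+	 * that index is already pg_class's clustered index.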
+	 */
+	if (cmd != REPACK_COMMAND_CLUSTER &&
+		!allowSystemTableMods && OidIsValid(indexOid) &&
+		IsCatalogRelation(OldHeap) && !index->rd_index->indisclustered)
+		ereport(ERROR,
+				errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				errmsg("permission denied: \"%s\" is a system catalog",
+					   RelationGetRelationName(OldHeap)),
+				errdetail("System catalogs can only be clustered by the index they're already clustered on, if any, unless \"%s\" is enabled.",
+						  "allow_system_table_mods"));
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -469,7 +415,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +428,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide CLUSTER or on a partitioned table), because
+	 * there is another check in cluster() which will stop any attempt to
+	 * cluster remote temp tables by name.  There is another check in
+	 * cluster_rel which is redundant, but we leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +629,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +646,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (index != NULL)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -958,20 +962,20 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	/* Log what we're doing */
 	if (OldIndex != NULL && !use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using index scan on \"%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap),
-						RelationGetRelationName(OldIndex))));
+				errmsg("repacking \"%s.%s\" using index scan on \"%s\"",
+					   nspname,
+					   RelationGetRelationName(OldHeap),
+					   RelationGetRelationName(OldIndex)));
 	else if (use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using sequential scan and sort",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" using sequential scan and sort",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 	else
 		ereport(elevel,
-				(errmsg("vacuuming \"%s.%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" in physical order",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 
 	/*
 	 * Hand off the actual copying to AM specific function, the generic code
@@ -1458,8 +1462,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1513,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,106 +1636,191 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given or not, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index to find those that have
+ * indisclustered set; if it was not given, scan pg_class and return all
+ * tables.
+ *
+ * Return the result as a list of RelToCluster in the given memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand cmd, bool usingindex, MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
-	MemoryContext old_context;
+	HeapTuple	tuple;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
+
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
+			MemoryContext oldcxt;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			/*
+			 * Try to obtain a light lock on the index's table, to ensure it
+			 * doesn't go away while we collect the list.  If we cannot, just
+			 * disregard it.
+			 */
+			if (!ConditionalLockRelationOid(index->indrelid, AccessShareLock))
+				continue;
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(index->indrelid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(index->indrelid, AccessShareLock);
+				continue;
+			}
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			if (!cluster_is_permitted_for_relation(cmd, index->indrelid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
+	}
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
+
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+			MemoryContext oldcxt;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
 
-		MemoryContextSwitchTo(old_context);
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			/* noisily skip rels which the user can't process */
+			if (!cluster_is_permitted_for_relation(cmd, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
 	}
-	table_endscan(scan);
 
-	relation_close(indRelation, AccessShareLock);
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
  * all the children leaves tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, Oid relid,
+								 bool rel_is_index, MemoryContext permcxt)
 {
 	List	   *inhoids;
-	ListCell   *lc;
 	List	   *rtcs = NIL;
-	MemoryContext old_context;
 
-	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
-
-	foreach(lc, inhoids)
+	/*
+	 * Do not lock the children until they're processed.  Note that we do hold
+	 * a lock on the parent partitioned table.
+	 */
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
+	foreach_oid(child_oid, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			table_oid,
+					index_oid;
 		RelToCluster *rtc;
+		MemoryContext oldcxt;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(child_oid) != RELKIND_INDEX)
+				continue;
+
+			table_oid = IndexGetRelation(child_oid, false);
+			index_oid = child_oid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(child_oid) != RELKIND_RELATION)
+				continue;
+
+			table_oid = child_oid;
+			index_oid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
-		 * leaf partition despite having such privileges on the partitioned
-		 * table.  We skip any partitions which the user is not permitted to
-		 * CLUSTER.
+		 * leaf partition despite having them on the partitioned table.  Skip
+		 * if so.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, table_oid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
-
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		oldcxt = MemoryContextSwitchTo(permcxt);
+		rtc = palloc(sizeof(RelToCluster));
+		rtc->tableOid = table_oid;
+		rtc->indexOid = index_oid;
 		rtcs = lappend(rtcs, rtc);
-
-		MemoryContextSwitchTo(old_context);
+		MemoryContextSwitchTo(oldcxt);
 	}
 
 	return rtcs;
@@ -1742,13 +1831,167 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
+
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   RepackCommandAsString(cmd),
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain lock on it, then determine what to do based on the relation
+ * type: if it's table and not partitioned, repack it as indicated (using an
+ * existing clustered index, or following the given one), and return NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened and locked relcache entry, so that caller can
+ * process the partitions using the multiple-table handling code.  In this
+ * case, if an index name is given, it's up to the caller to resolve it.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... its local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+
+	/*
+	 * Make sure ANALYZE is specified if a column list is present.
+	 */
+	if ((params->options & CLUOPT_ANALYZE) == 0 && stmt->relation->va_cols != NIL)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("ANALYZE option must be specified when a column list is provided"));
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		if (OidIsValid(indexOid))
+			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, rel, indexOid, params);
+
+		/* Do an analyze, if requested */
+		if (params->options & CLUOPT_ANALYZE)
+		{
+			VacuumParams vac_params = {0};
+
+			vac_params.options |= VACOPT_ANALYZE;
+			if (params->options & CLUOPT_VERBOSE)
+				vac_params.options |= VACOPT_VERBOSE;
+			analyze_rel(tableOid, NULL, vac_params,
+						stmt->relation->va_cols, true, NULL);
+		}
+
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the
+ * index to use for clustering the table.
+ *
+ * Caller must hold a lock on the relation so that the set of indexes
+ * doesn't change, and must call check_index_is_clusterable on the result.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		/*
+		 * If USING INDEX with no name is given, find a clustered index, or
+		 * error out if none.
+		 */
+		indexOid = InvalidOid;
+		foreach_oid(idxoid, RelationGetIndexList(rel))
+		{
+			if (get_index_isclustered(idxoid))
+			{
+				indexOid = idxoid;
+				break;
+			}
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/* An index was specified; obtain its OID. */
+		indexOid = get_relname_relid(indexname, rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
+
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			/* "VACUUM" rather than "VACUUM FULL", matching historical messages */
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 0528d1b6ecb..6afa203983f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -351,7 +351,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		}
 	}
 
-
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
@@ -2289,8 +2288,9 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 			if ((params.options & VACOPT_VERBOSE) != 0)
 				cluster_params.options |= CLUOPT_VERBOSE;
 
-			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			/* VACUUM FULL is a variant of REPACK; see cluster.c */
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c3a0a354a9c..c314c11e23d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -286,7 +286,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -303,7 +303,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -322,7 +322,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		opt_wait_with_clause
@@ -770,7 +770,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESPECT_P RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1032,7 +1032,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1106,6 +1105,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1143,6 +1143,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11979,38 +11984,82 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ <name_list> ] [ USING INDEX [ <index_name> ] ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
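+/*
+ * Illustrative examples of statements the RepackStmt productions below
+ * accept (table/index names are placeholders):
+ *
+ *		REPACK;
+ *		REPACK (VERBOSE) tab;
+ *		REPACK tab USING INDEX;
+ *		REPACK (ANALYZE) tab (col) USING INDEX idx;
+ */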
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list vacuum_relation USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
 					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list vacuum_relation opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $3;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
+					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $3;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -12018,20 +12067,25 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -18069,6 +18123,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18704,6 +18759,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index d18a3a60a46..3e731dc8117 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -279,9 +279,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -856,14 +856,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2864,10 +2864,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2875,6 +2871,13 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			if (((RepackStmt *) parsetree)->command == REPACK_COMMAND_CLUSTER)
+				tag = CMDTAG_CLUSTER;
+			else
+				tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3516,7 +3519,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index ef6fffe60b9..fc86cbb3b88 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -289,6 +289,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 20d7a65c614..626d9f1c98b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1267,7 +1267,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES",
@@ -5040,6 +5040,46 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"(", "USING INDEX");
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"USING INDEX");
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	else if (Matches("REPACK", "(*)", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX") ||
+			 Matches("REPACK", "(*)", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	/*
+	 * Complete ... [ (*) ] <sth> USING INDEX, with a list of indexes for
+	 * <sth>.
+	 */
+	else if (TailMatches(MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("ANALYZE", "VERBOSE");
+		else if (TailMatches("ANALYZE", "VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index 019ca06455d..f0c1bd4175c 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -16,7 +16,7 @@ subdir = src/bin/scripts
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready
+PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready pg_repackdb
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
@@ -31,6 +31,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
 vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
 install: all installdirs
 	$(INSTALL_PROGRAM) createdb$(X)   '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -41,6 +42,7 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+	$(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index a4fed59d1c9..be573cae682 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -42,6 +42,7 @@ vacuuming_common = static_library('libvacuuming_common',
 
 binaries = [
   'vacuumdb',
+  'pg_repackdb',
 ]
 foreach binary : binaries
   binary_sources = files('@0@.c'.format(binary))
@@ -80,6 +81,7 @@ tests += {
       't/100_vacuumdb.pl',
       't/101_vacuumdb_all.pl',
       't/102_vacuumdb_stages.pl',
+      't/103_repackdb.pl',
       't/200_connstr.pl',
     ],
   },
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..1edfa34ed0f
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,242 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ *		A utility to run REPACK
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * FIXME: this is missing a way to specify the index to use to repack one
+ * table, or whether to pass a USING INDEX clause when multiple tables are
+ * used.  Something like --index[=indexname].  Adding that bleeds into
+ * vacuuming.c as well.
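+ *
+ * Example invocations (illustrative; names are placeholders):
+ *		pg_repackdb -t mytable mydb
+ *		pg_repackdb --index --jobs=4 mydb
+ *		pg_repackdb --all --analyze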
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "vacuuming.h"
+
+static void help(const char *progname);
+static void check_objfilter(bits32 objfilter);
+
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+		{"echo", no_argument, NULL, 'e'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"dbname", required_argument, NULL, 'd'},
+		{"analyze", no_argument, NULL, 'z'},
+		{"all", no_argument, NULL, 'a'},
+		/* XXX this could be 'i', but optional_argument is messy for short options */
+		{"index", optional_argument, NULL, 1},
+		{"table", required_argument, NULL, 't'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"jobs", required_argument, NULL, 'j'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
+		{"maintenance-db", required_argument, NULL, 2},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	bool		echo = false;
+	bool		quiet = false;
+	vacuumingOptions vacopts;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+	int			ret;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+	vacopts.mode = MODE_REPACK;
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwWz",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				vacopts.objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				vacopts.objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				vacopts.objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				vacopts.objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				quiet = true;
+				break;
+			case 't':
+				vacopts.objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 'z':
+				vacopts.and_analyze = true;
+				break;
+			case 1:
+				vacopts.using_index = true;
+				if (optarg)
+					vacopts.indexname = pg_strdup(optarg);
+				else
+					vacopts.indexname = NULL;
+				break;
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		vacopts.objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter(vacopts.objfilter);
+
+	ret = vacuuming_main(&cparams, dbname, maintenance_db, &vacopts,
+						 &objects, tbl_count, concurrentCons,
+						 progname, echo, quiet);
+	exit(ret);
+}
+
+/*
+ * Verify that the filters used at command line are compatible.
+ */
+static void
+check_objfilter(bits32 objfilter)
+{
+	if ((objfilter & OBJFILTER_ALL_DBS) &&
+		(objfilter & OBJFILTER_DATABASE))
+		pg_fatal("cannot repack all databases and a specific one at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA))
+		pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+
+	if ((objfilter & OBJFILTER_SCHEMA) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+}
+
+static void
+help(const char *progname)
+{
+	printf(_("%s repacks a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("      --index[=INDEX]             repack following an index\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE[(COLUMNS)]'  repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -z, --analyze                   update optimizer statistics\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..cadce9b837c
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,47 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-t', 'pg_class'],
+	qr/statement: REPACK.*pg_class;/,
+	'pg_repackdb processes a single table');
+
+$node->safe_psql('postgres', 'CREATE USER testusr;
+	GRANT CREATE ON SCHEMA public TO testusr');
+$node->safe_psql('postgres',
+	'CREATE TABLE cluster_1 (a int primary key);
+	ALTER TABLE cluster_1 CLUSTER ON cluster_1_pkey;
+	CREATE TABLE cluster_2 (a int unique);
+	ALTER TABLE cluster_2 CLUSTER ON cluster_2_a_key;',
+	extra_params => ['-U' => 'testusr']);
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-U', 'testusr' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
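+# With --index and no index name, pg_repackdb should pick only tables that
+# have an index with indisclustered set (see the filter in vacuuming.c);
+# cluster_1 and cluster_2 created above both qualify.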
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--index'],
+	qr/statement: REPACK.*cluster_1 USING INDEX.*statement: REPACK.*cluster_2 USING INDEX/ms,
+	'pg_repackdb --index chooses multiple tables');
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--analyze', '-t', 'cluster_1'],
+	qr/statement: REPACK \(ANALYZE\) public.cluster_1/,
+	'pg_repackdb --analyze works');
+
+done_testing();
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
index 9f44cae02ae..d4c1cba325a 100644
--- a/src/bin/scripts/vacuuming.c
+++ b/src/bin/scripts/vacuuming.c
@@ -1,6 +1,6 @@
 /*-------------------------------------------------------------------------
  * vacuuming.c
- *		Helper routines for vacuumdb
+ *		Helper routines for vacuumdb and pg_repackdb
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -43,8 +43,8 @@ static SimpleStringList *retrieve_objects(PGconn *conn,
 static void free_retrieved_objects(SimpleStringList *list);
 static void prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 								   vacuumingOptions *vacopts, const char *table);
-static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
+static void run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+							   const char *sql, bool echo, const char *table);
 
 /*
  * Executes vacuum/analyze as indicated.  Returns 0 if the plan is carried
@@ -194,6 +194,14 @@ vacuum_one_database(ConnParams *cparams,
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		/* XXX arguably, here we should use VACUUM FULL instead of failing */
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
 	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
 	{
 		PQfinish(conn);
@@ -286,9 +294,18 @@ vacuum_one_database(ConnParams *cparams,
 		if (vacopts->mode == MODE_ANALYZE_IN_STAGES)
 			printf(_("%s: processing database \"%s\": %s\n"),
 				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
+		else if (vacopts->mode == MODE_ANALYZE)
+			printf(_("%s: analyzing database \"%s\"\n"),
+				   progname, PQdb(conn));
+		else if (vacopts->mode == MODE_VACUUM)
 			printf(_("%s: vacuuming database \"%s\"\n"),
 				   progname, PQdb(conn));
+		else
+		{
+			Assert(vacopts->mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
 		fflush(stdout);
 	}
 
@@ -383,7 +400,7 @@ vacuum_one_database(ConnParams *cparams,
 		 * through ParallelSlotsGetIdle.
 		 */
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
+		run_vacuum_command(free_slot->connection, vacopts, sql.data,
 						   echo, tabname);
 
 		cell = cell->next;
@@ -408,7 +425,7 @@ vacuum_one_database(ConnParams *cparams,
 		}
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+		run_vacuum_command(free_slot->connection, vacopts, cmd, echo, NULL);
 
 		if (!ParallelSlotsWaitCompletion(sa))
 			ret = EXIT_FAILURE; /* error already reported by handler */
@@ -636,6 +653,35 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
 								 " AND listed_objects.object_oid IS NOT NULL\n");
 	}
 
+	/*
+	 * In REPACK mode, if the 'using_index' option was given but no index
+	 * name, filter only tables that have an index with indisclustered set.
+	 * (If an index name is given, we trust the user to pass a reasonable list
+	 * of tables.)
+	 *
+	 * XXX it may be worth printing an error if an index name is given with no
+	 * list of tables.
+	 */
+	if (vacopts->mode == MODE_REPACK &&
+		vacopts->using_index && !vacopts->indexname)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND EXISTS (SELECT 1 FROM pg_catalog.pg_index\n"
+							 "    WHERE indrelid = c.oid AND indisclustered)\n");
+	}
+
+	/*
+	 * In REPACK mode, only consider the tables that the current user has
+	 * MAINTAIN privileges on.  XXX maybe we should do this in all cases, not
+	 * just REPACK.  The vacuumdb output is too noisy for no reason.
+	 */
+	if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND pg_catalog.has_table_privilege(current_user, "
+							 "c.oid, 'MAINTAIN')\n");
+	}
+
 	/*
 	 * If no tables were listed, filter for the relevant relation types.  If
 	 * tables were given via --table, don't bother filtering by relation type.
@@ -874,8 +920,10 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->verbose)
 				appendPQExpBufferStr(sql, " VERBOSE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
 	}
-	else
+	else if (vacopts->mode == MODE_VACUUM)
 	{
 		appendPQExpBufferStr(sql, "VACUUM");
 
@@ -989,9 +1037,39 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->and_analyze)
 				appendPQExpBufferStr(sql, " ANALYZE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
 	}
+	else if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(sql, "REPACK");
 
-	appendPQExpBuffer(sql, " %s;", table);
+		if (vacopts->verbose)
+		{
+			appendPQExpBuffer(sql, "%sVERBOSE", sep);
+			sep = comma;
+		}
+		if (vacopts->and_analyze)
+		{
+			appendPQExpBuffer(sql, "%sANALYZE", sep);
+			sep = comma;
+		}
+
+		if (sep != paren)
+			appendPQExpBufferChar(sql, ')');
+
+		appendPQExpBuffer(sql, " %s", table);
+
+		if (vacopts->using_index)
+		{
+			appendPQExpBuffer(sql, " USING INDEX");
+			if (vacopts->indexname)
+				appendPQExpBuffer(sql, " %s", fmtIdEnc(vacopts->indexname,
+													   PQclientEncoding(conn)));
+		}
+	}
+
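+	/*
+	 * In MODE_REPACK the command built above looks, for example, like
+	 * "REPACK (VERBOSE, ANALYZE) public.foo USING INDEX foo_pkey" (names
+	 * here are illustrative); the semicolon is appended next for all modes.
+	 */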
+	appendPQExpBufferChar(sql, ';');
 }
 
 /*
@@ -1001,8 +1079,8 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
  * Any errors during command execution are reported to stderr.
  */
 static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
+run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+				   const char *sql, bool echo, const char *table)
 {
 	bool		status;
 
@@ -1015,13 +1093,21 @@ run_vacuum_command(PGconn *conn, const char *sql, bool echo,
 	{
 		if (table)
 		{
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
 		}
 		else
 		{
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
 		}
 	}
 }
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
index 49f968b32e5..665dbaedfad 100644
--- a/src/bin/scripts/vacuuming.h
+++ b/src/bin/scripts/vacuuming.h
@@ -20,6 +20,7 @@
 typedef enum
 {
 	MODE_VACUUM,
+	MODE_REPACK,
 	MODE_ANALYZE,
 	MODE_ANALYZE_IN_STAGES
 } RunMode;
@@ -37,6 +38,8 @@ typedef struct vacuumingOptions
 	bool		and_analyze;
 	bool		full;
 	bool		freeze;
+	bool		using_index;	/* REPACK: append a USING INDEX clause */
+	char	   *indexname;		/* REPACK: index name from --index, if any */
 	bool		disable_page_skipping;
 	bool		skip_locked;
 	int			min_xid_age;
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..652542e8e65 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
+						ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..ebf004b7aa5 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,28 +56,34 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
-
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
-
-/* Commands of PROGRESS_CLUSTER */
-#define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
-#define PROGRESS_CLUSTER_COMMAND_VACUUM_FULL	2
+/*
+ * Progress parameters for REPACK.
+ *
+ * Values for PROGRESS_REPACK_COMMAND are defined as in RepackCommand.
+ *
+ * Note: Since REPACK shares code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
 
 /* Progress parameters for CREATE INDEX */
 /* 3, 4 and 5 reserved for "waitfor" metrics */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d14294a4ece..94892042b8d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3951,18 +3951,6 @@ typedef struct AlterSystemStmt
 	VariableSetStmt *setstmt;	/* SET subcommand */
 } AlterSystemStmt;
 
-/* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
- * ----------------------
- */
-typedef struct ClusterStmt
-{
-	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
-	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
-
 /* ----------------------
  *		Vacuum and Analyze Statements
  *
@@ -3975,7 +3963,7 @@ typedef struct VacuumStmt
 	NodeTag		type;
 	List	   *options;		/* list of DefElem nodes */
 	List	   *rels;			/* list of VacuumRelation, or NIL for all */
-	bool		is_vacuumcmd;	/* true for VACUUM, false for ANALYZE */
+	bool		is_vacuumcmd;	/* true for VACUUM, false otherwise */
 } VacuumStmt;
 
 /*
@@ -3993,6 +3981,27 @@ typedef struct VacuumRelation
 	List	   *va_cols;		/* list of column names, or NIL for all */
 } VacuumRelation;
 
+/* ----------------------
+ *		Repack Statement
+ * ----------------------
+ */
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER = 1, /* obsolete CLUSTER syntax */
+	REPACK_COMMAND_REPACK,		/* REPACK proper */
+	REPACK_COMMAND_VACUUMFULL,	/* VACUUM FULL, invoked from vacuum.c */
+} RepackCommand;
+
+typedef struct RepackStmt
+{
+	NodeTag		type;
+	RepackCommand command;		/* type of command being run */
+	VacuumRelation *relation;	/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
+	List	   *params;			/* list of DefElem nodes */
+} RepackStmt;
+
 /* ----------------------
  *		Explain Statement
  *
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 5d4fe27ef96..f1a1d5e7a80 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -376,6 +376,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index c4606d65043..66690f1134a 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..277854418fa 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +550,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
@@ -665,6 +702,101 @@ SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 (4 rows)
 
 COMMIT;
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+ERROR:  ANALYZE option must be specified when a column list is provided
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because no clustering index is involved here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 85d795dbd63..69bf6b1baf6 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1995,34 +1995,23 @@ pg_stat_progress_basebackup| SELECT pid,
             ELSE NULL::text
         END AS backup_type
    FROM pg_stat_get_progress_info('BASEBACKUP'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20);
-pg_stat_progress_cluster| SELECT s.pid,
-    s.datid,
-    d.datname,
-    s.relid,
-        CASE s.param1
-            WHEN 1 THEN 'CLUSTER'::text
-            WHEN 2 THEN 'VACUUM FULL'::text
-            ELSE NULL::text
+pg_stat_progress_cluster| SELECT pid,
+    datid,
+    datname,
+    relid,
+        CASE
+            WHEN (command = ANY (ARRAY['CLUSTER'::text, 'VACUUM FULL'::text])) THEN command
+            WHEN (repack_index_relid = (0)::oid) THEN 'VACUUM FULL'::text
+            ELSE 'CLUSTER'::text
         END AS command,
-        CASE s.param2
-            WHEN 0 THEN 'initializing'::text
-            WHEN 1 THEN 'seq scanning heap'::text
-            WHEN 2 THEN 'index scanning heap'::text
-            WHEN 3 THEN 'sorting tuples'::text
-            WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
-            ELSE NULL::text
-        END AS phase,
-    (s.param3)::oid AS cluster_index_relid,
-    s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
-   FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
-     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+    phase,
+    repack_index_relid AS cluster_index_relid,
+    heap_tuples_scanned,
+    heap_tuples_written,
+    heap_blks_total,
+    heap_blks_scanned,
+    index_rebuild_count
+   FROM pg_stat_progress_repack;
 pg_stat_progress_copy| SELECT s.pid,
     s.datid,
     d.datname,
@@ -2082,6 +2071,35 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 1 THEN 'CLUSTER'::text
+            WHEN 2 THEN 'REPACK'::text
+            WHEN 3 THEN 'VACUUM FULL'::text
+            ELSE NULL::text
+        END AS command,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..c976823a3cb 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,7 +76,6 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
-
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -229,6 +228,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
@@ -313,6 +330,57 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 COMMIT;
 
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because no clustering index is involved here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9dd65b10254..4f3c7c160a6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2557,6 +2557,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
-- 
2.47.3
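
As a usage note: the new pg_stat_progress_repack view shown in the rules.out
hunk above can be queried from another session like any other progress view.
A minimal sketch, using only column names from that view definition (what you
see for pid/datname naturally depends on your session):

    -- Watch a running REPACK; "command" distinguishes REPACK from the
    -- deprecated CLUSTER and VACUUM FULL forms that share the machinery.
    SELECT pid, datname, relid::regclass AS relation,
           command, phase,
           heap_blks_scanned, heap_blks_total,
           index_rebuild_count
    FROM pg_stat_progress_repack;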

v27-0002-Refactor-index_concurrently_create_copy-for-use-with.patch
From 5f8676451d30587fada6eac478d716a717d0fdb3 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Tue, 9 Dec 2025 19:44:42 +0100
Subject: [PATCH 2/5] Refactor index_concurrently_create_copy() for use with
 REPACK (CONCURRENTLY).

This patch moves the code to index_create_copy() and adds a "concurrently"
parameter so it can be used by REPACK (CONCURRENTLY).

With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of indexes
(including system catalog entries) *before* the actual swap, to shorten the
time for which AccessExclusiveLock must be held.
---
 src/backend/catalog/index.c      | 54 +++++++++++++++++++++++---------
 src/backend/commands/indexcmds.c |  6 ++--
 src/backend/nodes/makefuncs.c    |  9 +++---
 src/include/catalog/index.h      |  3 ++
 src/include/nodes/makefuncs.h    |  4 ++-
 5 files changed, 54 insertions(+), 22 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 08d4b8e44d7..cf2d0abf370 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1290,15 +1290,32 @@ index_create(Relation heapRelation,
 /*
  * index_concurrently_create_copy
  *
- * Create concurrently an index based on the definition of the one provided by
- * caller.  The index is inserted into catalogs and needs to be built later
- * on.  This is called during concurrent reindex processing.
- *
- * "tablespaceOid" is the tablespace to use for this index.
+ * Variant of index_create_copy(), called during concurrent reindex
+ * processing.
  */
 Oid
 index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							   Oid tablespaceOid, const char *newName)
+{
+	return index_create_copy(heapRelation, oldIndexId, tablespaceOid, newName,
+							 true);
+}
+
+/*
+ * index_create_copy
+ *
+ * Create an index based on the definition of the one provided by the caller.
+ * The index is inserted into the catalogs. If 'concurrently' is true, the
+ * index still needs to be built later on; otherwise it is built immediately.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ *
+ * The actual implementation of index_concurrently_create_copy(), reusable for
+ * other purposes.
+ */
+Oid
+index_create_copy(Relation heapRelation, Oid oldIndexId, Oid tablespaceOid,
+				  const char *newName, bool concurrently)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1317,6 +1334,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	List	   *indexColNames = NIL;
 	List	   *indexExprs = NIL;
 	List	   *indexPreds = NIL;
+	int			flags = 0;
 
 	indexRelation = index_open(oldIndexId, RowExclusiveLock);
 
@@ -1327,7 +1345,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	 * Concurrent build of an index with exclusion constraints is not
 	 * supported.
 	 */
-	if (oldInfo->ii_ExclusionOps != NULL)
+	if (oldInfo->ii_ExclusionOps != NULL && concurrently)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("concurrent index creation for exclusion constraints is not supported")));
@@ -1383,9 +1401,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	}
 
 	/*
-	 * Build the index information for the new index.  Note that rebuild of
-	 * indexes with exclusion constraints is not supported, hence there is no
-	 * need to fill all the ii_Exclusion* fields.
+	 * Build the index information for the new index.
 	 */
 	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
 							oldInfo->ii_NumIndexKeyAttrs,
@@ -1394,10 +1410,13 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							indexPreds,
 							oldInfo->ii_Unique,
 							oldInfo->ii_NullsNotDistinct,
-							false,	/* not ready for inserts */
-							true,
+							!concurrently,	/* isready */
+							concurrently,	/* concurrent */
 							indexRelation->rd_indam->amsummarizing,
-							oldInfo->ii_WithoutOverlaps);
+							oldInfo->ii_WithoutOverlaps,
+							oldInfo->ii_ExclusionOps,
+							oldInfo->ii_ExclusionProcs,
+							oldInfo->ii_ExclusionStrats);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1435,6 +1454,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 		stattargets[i].isnull = isnull;
 	}
 
+	if (concurrently)
+		flags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+
 	/*
 	 * Now create the new index.
 	 *
@@ -1458,7 +1480,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							  indcoloptions->values,
 							  stattargets,
 							  reloptionsDatum,
-							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT,
+							  flags,
 							  0,
 							  true, /* allow table to be a system catalog? */
 							  false,	/* is_internal? */
@@ -2452,7 +2474,8 @@ BuildIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   NULL, NULL, NULL);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2512,7 +2535,8 @@ BuildDummyIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   NULL, NULL, NULL);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index d9cccb6ac18..d8d8f72a875 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -242,7 +242,8 @@ CheckIndexCompatible(Oid oldId,
 	 */
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
-							  false, false, amsummarizing, isWithoutOverlaps);
+							  false, false, amsummarizing, isWithoutOverlaps,
+							  NULL, NULL, NULL);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -927,7 +928,8 @@ DefineIndex(Oid tableId,
 							  !concurrent,
 							  concurrent,
 							  amissummarizing,
-							  stmt->iswithoutoverlaps);
+							  stmt->iswithoutoverlaps,
+							  NULL, NULL, NULL);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index e2d9e9be41a..c5d5a37f514 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,8 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps)
+			  bool withoutoverlaps, Oid *exclusion_ops, Oid *exclusion_procs,
+			  uint16 *exclusion_strats)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -863,9 +864,9 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_PredicateState = NULL;
 
 	/* exclusion constraints */
-	n->ii_ExclusionOps = NULL;
-	n->ii_ExclusionProcs = NULL;
-	n->ii_ExclusionStrats = NULL;
+	n->ii_ExclusionOps = exclusion_ops;
+	n->ii_ExclusionProcs = exclusion_procs;
+	n->ii_ExclusionStrats = exclusion_strats;
 
 	/* speculative inserts */
 	n->ii_UniqueOps = NULL;
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index dda95e54903..4bf909078d8 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -99,6 +99,9 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid oldIndexId,
 										   Oid tablespaceOid,
 										   const char *newName);
+extern Oid	index_create_copy(Relation heapRelation, Oid oldIndexId,
+							  Oid tablespaceOid, const char *newName,
+							  bool concurrently);
 
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 5473ce9a288..9ff7159ff0c 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -99,7 +99,9 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								List *expressions, List *predicates,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
-								bool summarizing, bool withoutoverlaps);
+								bool summarizing, bool withoutoverlaps,
+								Oid *exclusion_ops, Oid *exclusion_procs,
+								uint16 *exclusion_strats);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
-- 
2.47.3
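
To make the intended call pattern of this refactoring concrete, here is a
minimal sketch (not code from the patch; the wrapper name is invented) of how
a caller would use the new entry point, based on the declaration added to
src/include/catalog/index.h:

    /*
     * Create a catalog copy of an existing index.  With concurrently = true,
     * the copy is created with INDEX_CREATE_SKIP_BUILD |
     * INDEX_CREATE_CONCURRENT and must be built separately later (cf.
     * index_concurrently_build()); with false, it is built immediately.
     */
    static Oid
    copy_index_for_repack(Relation heapRel, Oid oldIndexId,
                          Oid tablespaceOid, const char *newName,
                          bool concurrently)
    {
        return index_create_copy(heapRel, oldIndexId, tablespaceOid,
                                 newName, concurrently);
    }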

v27-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch
From fbeb5aecb602b989b1a7b5f457aba5943942e7f6 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Tue, 9 Dec 2025 19:44:42 +0100
Subject: [PATCH 3/5] Move conversion of a "historic" to MVCC snapshot to a
 separate function.

The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
---
 src/backend/commands/cluster.c              |  8 ++-
 src/backend/replication/logical/snapbuild.c | 57 +++++++++++++++++----
 src/backend/utils/time/snapmgr.c            |  3 +-
 src/include/replication/snapbuild.h         |  1 +
 src/include/utils/snapmgr.h                 |  1 +
 5 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 3afab656cd9..89f0b03a31c 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -70,8 +70,7 @@ typedef struct
 
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
 								Oid indexOid, Oid userid, int options);
-static void rebuild_relation(RepackCommand cmd,
-							 Relation OldHeap, Relation index, bool verbose);
+static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
@@ -415,7 +414,7 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(cmd, OldHeap, index, verbose);
+	rebuild_relation(OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -629,8 +628,7 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(RepackCommand cmd,
-				 Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 6e18baa33cb..34bdd987478 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
-	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -485,7 +482,33 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
-	/* allocate in transaction context */
+	/* Convert the historic snapshot to MVCC snapshot. */
+	return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike a regular (non-historic) MVCC snapshot, the 'xip' array of this
+ * snapshot contains not only running main transactions, but also their
+ * subtransactions. On the other hand, 'subxip' will usually be empty. This
+ * difference does not affect the result of XidInMVCCSnapshot() because it
+ * searches both in 'xip' and 'subxip'.
+ *
+ * Pass true for 'in_place' if you don't care about modifying the source
+ * snapshot. If you need a new instance, and one that was allocated as a
+ * single chunk of memory, pass false.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+	TransactionId xid;
+	TransactionId *oldxip = snapshot->xip;
+	uint32		oldxcnt = snapshot->xcnt;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+	Snapshot	result;
+
 	newxip = (TransactionId *)
 		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
 
@@ -495,7 +518,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
 	{
 		void	   *test;
 
@@ -503,7 +526,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -520,11 +543,25 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snapshot->xcnt = newxcnt;
+	snapshot->xip = newxip;
+
+	if (in_place)
+		result = snapshot;
+	else
+	{
+		result = CopySnapshot(snapshot);
+
+		/* Restore the original values so the source is intact. */
+		snapshot->xip = oldxip;
+		snapshot->xcnt = oldxcnt;
+
+		/* newxip has been copied */
+		pfree(newxip);
+	}
+	result->snapshot_type = SNAPSHOT_MVCC;
 
-	return snap;
+	return result;
 }
 
 /*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 24f73a49d27..886060305f5 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -604,7 +603,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
+Snapshot
 CopySnapshot(Snapshot snapshot)
 {
 	Snapshot	newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..6d4d2d1814c 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..f65f83c85cd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -63,6 +63,7 @@ extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
+extern Snapshot CopySnapshot(Snapshot snapshot);
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
-- 
2.47.3
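
For clarity, a small sketch (hypothetical caller, not code from the patch) of
how the two 'in_place' modes of the new function are meant to be used.
SnapBuildInitialSnapshot() keeps its old behavior by converting in place:

    snap = SnapBuildMVCCFromHistoric(snap, true);

whereas a caller such as REPACK (CONCURRENTLY), which must keep the historic
snapshot usable for decoding, would ask for a standalone copy instead:

    static Snapshot
    mvcc_snapshot_for_scan(Snapshot historic_snap)
    {
        /*
         * in_place = false: the source snapshot's xip/xcnt are restored and
         * the result is a single palloc'd chunk, obtained via the
         * now-exported CopySnapshot().
         */
        return SnapBuildMVCCFromHistoric(historic_snap, false);
    }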

v27-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch
From ac222c7cff8365b87f02edaa2622955eef7f2bbb Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Tue, 9 Dec 2025 19:44:43 +0100
Subject: [PATCH 4/5] Add CONCURRENTLY option to REPACK command.

The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock at the beginning and keep it until the end of the processing.)

This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(Supposedly, the swapping should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.

First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.

Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)

Since the logical decoding system, during its startup, waits until all the
transactions which already have XID assigned have finished, there is a risk of
deadlock if a transaction that already changed anything in the database tries
to acquire a conflicting lock on the table REPACK CONCURRENTLY is working
on. As an example, consider transaction running CREATE INDEX command on the
table that is being REPACKed CONCURRENTLY. On the other hand, DML commands
(INSERT, UPDATE, DELETE) are not a problem as their lock does not conflict
with REPACK CONCURRENTLY.

The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before the logical decoding is setup and lock
it again afterwards. Such temporary unlocking would imply re-checking if the
table still meets all the requirements for REPACK CONCURRENTLY.

Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires an extra space for the new relation and index files
(which coexist with the old files for some time). In addition, the
CONCURRENTLY option might introduce a lag in releasing WAL segments for
archiving / recycling. This is due to the decoding of the data changes done by
applications concurrently. When copying the table contents into the new file,
we check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table contents have been copied.) A
background worker might be a better approach for the decoding - let's consider
implementing it in the future.

The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.
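
To illustrate the intended usage (a sketch based on the grammar in
repack.sgml below; the table and index names are invented, and this is not
output from an actual run):

    -- Rewrite the table ordered by the given index while other sessions
    -- keep reading and writing it; requires wal_level = logical and a
    -- free replication slot.
    REPACK (CONCURRENTLY) mytable USING INDEX mytable_pkey;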
---
 doc/src/sgml/monitoring.sgml                  |   37 +-
 doc/src/sgml/mvcc.sgml                        |   12 +-
 doc/src/sgml/ref/repack.sgml                  |  129 +-
 src/Makefile                                  |    1 +
 src/backend/access/heap/heapam.c              |   34 +-
 src/backend/access/heap/heapam_handler.c      |  227 ++-
 src/backend/access/heap/rewriteheap.c         |    6 +-
 src/backend/catalog/system_views.sql          |   19 +-
 src/backend/commands/cluster.c                | 1593 +++++++++++++++--
 src/backend/commands/matview.c                |    2 +-
 src/backend/commands/tablecmds.c              |    1 +
 src/backend/commands/vacuum.c                 |   12 +-
 src/backend/meson.build                       |    1 +
 src/backend/replication/logical/decode.c      |   93 +
 src/backend/replication/logical/snapbuild.c   |   21 +
 .../replication/pgoutput_repack/Makefile      |   32 +
 .../replication/pgoutput_repack/meson.build   |   18 +
 .../pgoutput_repack/pgoutput_repack.c         |  240 +++
 .../storage/lmgr/generate-lwlocknames.pl      |    2 +-
 src/backend/utils/time/snapmgr.c              |    3 +-
 src/bin/psql/tab-complete.in.c                |    4 +-
 src/include/access/heapam.h                   |    5 +-
 src/include/access/heapam_xlog.h              |    2 +
 src/include/access/tableam.h                  |   10 +
 src/include/commands/cluster.h                |   90 +-
 src/include/commands/progress.h               |   17 +-
 src/include/replication/snapbuild.h           |    1 +
 src/include/storage/lockdefs.h                |    4 +-
 src/include/utils/snapmgr.h                   |    2 +
 src/test/modules/injection_points/Makefile    |    3 +
 .../injection_points/expected/repack.out      |  113 ++
 .../expected/repack_toast.out                 |   64 +
 .../modules/injection_points/logical.conf     |    1 +
 src/test/modules/injection_points/meson.build |    4 +
 .../injection_points/specs/repack.spec        |  142 ++
 .../injection_points/specs/repack_toast.spec  |  105 ++
 src/test/regress/expected/rules.out           |   19 +-
 src/tools/pgindent/typedefs.list              |    5 +
 38 files changed, 2840 insertions(+), 234 deletions(-)
 create mode 100644 src/backend/replication/pgoutput_repack/Makefile
 create mode 100644 src/backend/replication/pgoutput_repack/meson.build
 create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
 create mode 100644 src/test/modules/injection_points/expected/repack.out
 create mode 100644 src/test/modules/injection_points/expected/repack_toast.out
 create mode 100644 src/test/modules/injection_points/logical.conf
 create mode 100644 src/test/modules/injection_points/specs/repack.spec
 create mode 100644 src/test/modules/injection_points/specs/repack_toast.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 816b11b7318..bf17bee8e1d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6177,14 +6177,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+       <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of heap tuples written.
+       Number of heap tuples inserted.
        This counter only advances when the phase is
        <literal>seq scanning heap</literal>,
-       <literal>index scanning heap</literal>
-       or <literal>writing new heap</literal>.
+       <literal>index scanning heap</literal>,
+       <literal>writing new heap</literal>
+       or <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples updated.
+       This counter only advances when the phase is <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples deleted.
+       This counter only advances when the phase is <literal>catch-up</literal>.
       </para></entry>
      </row>
 
@@ -6265,6 +6286,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <command>REPACK</command> is currently writing the new heap.
      </entry>
     </row>
+    <row>
+     <entry><literal>catch-up</literal></entry>
+     <entry>
+       <command>REPACK CONCURRENTLY</command> is currently processing the DML
+       commands that other transactions executed during any of the preceding
+       phases.
+     </entry>
+    </row>
     <row>
      <entry><literal>swapping relation files</literal></entry>
      <entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+    TABLE</command></link> and <command>REPACK</command> with
+    the <literal>CONCURRENTLY</literal> option, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the DDL command committed.  This will only be an
+    snapshot taken before the command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the DDL command started &mdash; any transaction that has done so
+    before the command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the DDL command until that transaction completes.
+    which would block the truncating or rewriting command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index 61d5c2cdef1..30c43c49069 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -28,6 +28,7 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
 
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+    CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -54,7 +55,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
    processes every table and materialized view in the current database that
    the current user has the <literal>MAINTAIN</literal> privilege on. This
    form of <command>REPACK</command> cannot be executed inside a transaction
-   block.
+   block.  Also, this form is not allowed if
+   the <literal>CONCURRENTLY</literal> option is used.
   </para>
 
   <para>
@@ -67,7 +69,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
    When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
    is acquired on it. This prevents any other database operations (both reads
    and writes) from operating on the table until the <command>REPACK</command>
-   is finished.
+   is finished. If you want to keep the table accessible during the repacking,
+   consider using the <literal>CONCURRENTLY</literal> option.
   </para>
 
   <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -195,6 +198,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>CONCURRENTLY</literal></term>
+    <listitem>
+     <para>
+      Allow other transactions to use the table while it is being repacked.
+     </para>
+
+     <para>
+      Internally, <command>REPACK</command> copies the contents of the table
+      (ignoring dead tuples) into a new file, sorted by the specified index,
+      and also creates a new file for each index. Then it swaps the old and
+      new files for the table and all the indexes, and deletes the old
+      files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+      sure that the old files do not change during the processing, because any
+      such changes would be lost when the files are swapped.
+     </para>
+
+     <para>
+      With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+      EXCLUSIVE</literal> lock is only acquired to swap the table and index
+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be pretty short. However, the time might still be noticeable if
+      too many data changes have been done to the table while
+      <command>REPACK</command> was waiting for the lock: those changes must
+      be processed just before the files are swapped, while the
+      <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+     </para>
+
+     <para>
+      Note that <command>REPACK</command> with the
+      <literal>CONCURRENTLY</literal> option does not try to order the rows
+      inserted into the table after the repacking started. Also
+      note that <command>REPACK</command> might fail to complete due to DDL
+      commands executed on the table by other transactions during the
+      repacking.
+     </para>
+
+     <note>
+      <para>
+       In addition to the temporary space requirements explained in
+       <xref linkend="sql-repack-notes-on-resources"/>,
+       the <literal>CONCURRENTLY</literal> option can increase the use of
+       temporary space somewhat. The reason is that other transactions can
+       perform DML operations which cannot be applied to the new file until
+       <command>REPACK</command> has copied all the tuples from the old
+       file. Thus the tuples inserted into the old file during the copying are
+       also stored separately in a temporary file, so they can eventually be
+       applied to the new file.
+      </para>
+
+      <para>
+       Furthermore, the data changes performed during the copying are
+       extracted from <link linkend="wal">write-ahead log</link> (WAL), and
+       this extraction (decoding) only takes place when certain amount of WAL
+       has been written. Therefore, WAL removal can be delayed by this
+       threshold. Currently the threshold is equal to the value of
+       the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+       configuration parameter.
+      </para>
+     </note>
+
+     <para>
+      The <literal>CONCURRENTLY</literal> option cannot be used in the
+      following cases:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+          The table is <literal>UNLOGGED</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is partitioned.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is a system catalog or a <acronym>TOAST</acronym> table.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REPACK</command> is executed inside a transaction block.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+          configuration parameter is less than <literal>logical</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+         configuration parameter does not allow for creation of an additional
+         replication slot.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <warning>
+      <para>
+       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+       details.
+      </para>
+     </warning>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
 	interfaces \
 	backend/replication/libpqwalreceiver \
 	backend/replication/pgoutput \
+	backend/replication/pgoutput_repack \
 	fe_utils \
 	bin \
 	pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4d382a04338..e11833f01b4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
-								  bool all_visible_cleared, bool new_all_visible_cleared);
+								  bool all_visible_cleared, bool new_all_visible_cleared,
+								  bool walLogical);
 #ifdef USE_ASSERT_CHECKING
 static void check_lock_if_inplace_updateable_rel(Relation relation,
 												 const ItemPointerData *otid,
@@ -2803,7 +2804,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 TM_Result
 heap_delete(Relation relation, const ItemPointerData *tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart, bool walLogical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -3050,7 +3051,8 @@ l1:
 	 * Compute replica identity tuple before entering the critical section so
 	 * we don't PANIC upon a memory allocation failure.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+	old_key_tuple = walLogical ?
+		ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
 
 	/*
 	 * If this is the first possibly-multixact-able operation in the current
@@ -3140,6 +3142,15 @@ l1:
 				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
 		}
 
+		/*
+		 * Unlike UPDATE, DELETE is decoded even if there is no old key, so it
+		 * does not help to clear both XLH_DELETE_CONTAINS_OLD_TUPLE and
+		 * XLH_DELETE_CONTAINS_OLD_KEY. Thus we need an extra flag. TODO
+		 * Consider not decoding tuples w/o the old tuple/key instead.
+		 */
+		if (!walLogical)
+			xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
 		XLogBeginInsert();
 		XLogRegisterData(&xlrec, SizeOfHeapDelete);
 
@@ -3232,7 +3243,8 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false /* changingPart */ );
+						 &tmfd, false,	/* changingPart */
+						 true /* walLogical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3273,7 +3285,7 @@ TM_Result
 heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
-			TU_UpdateIndexes *update_indexes)
+			TU_UpdateIndexes *update_indexes, bool walLogical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -4166,7 +4178,8 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 walLogical);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4524,7 +4537,8 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, &lockmode, update_indexes);
+						 &tmfd, &lockmode, update_indexes,
+						 true /* walLogical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -8864,7 +8878,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool walLogical)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -8875,7 +8890,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
 	Page		page = BufferGetPage(newbuf);
-	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln) &&
+		walLogical;
 	bool		init;
 	int			bufflags;
 
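A note for readers of this excerpt: the decode-side handling of the new
XLH_DELETE_NO_LOGICAL flag is elsewhere in the patch. Conceptually it has to
boil down to a check like the following sketch (the helper name is made up,
only the flag test itself is what the patch introduces):

    #include "access/heapam_xlog.h"
    #include "access/xlogreader.h"

    /*
     * Illustrative only: returns true if this heap DELETE record was
     * written with walLogical=false (i.e. by REPACK CONCURRENTLY itself)
     * and should therefore not be decoded.
     */
    static bool
    heap_delete_skip_logical(XLogReaderState *record)
    {
        xl_heap_delete *xlrec = (xl_heap_delete *) XLogRecGetData(record);

        return (xlrec->flags & XLH_DELETE_NO_LOGICAL) != 0;
    }
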
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 79f9de5d760..01be29eb405 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+					   true);
 }
 
 
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
-						 tmfd, lockmode, update_indexes);
+						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	/*
@@ -685,13 +687,15 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
+								 Snapshot snapshot,
+								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
 								 double *tups_recently_dead)
 {
-	RewriteState rwstate;
+	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
 	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
+	bool		concurrent = snapshot != NULL;
+	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	values = (Datum *) palloc(natts * sizeof(Datum));
 	isnull = (bool *) palloc(natts * sizeof(bool));
 
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
-								 *multi_cutoff);
+	/*
+	 * Initialize the rewrite operation (not needed in the CONCURRENTLY case,
+	 * where tuples are inserted via heap_insert() instead).
+	 */
+	if (!concurrent)
+		rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+									 *xid_cutoff, *multi_cutoff);
 
 
 	/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 *
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+	 * the snapshot passed by the caller.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									snapshot ? snapshot : SnapshotAny,
+									NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap,
+									snapshot ? snapshot : SnapshotAny,
+									0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -837,70 +853,77 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
 		buf = hslot->buffer;
 
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+		/*
+		 * In the CONCURRENTLY case, the MVCC snapshot has already done this
+		 * filtering for us; see the comments on the snapshot above.
+		 */
+		if (!concurrent)
 		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				*tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
+			switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+			{
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead */
+					isdead = true;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+					*tups_recently_dead += 1;
+					/* fall through */
+				case HEAPTUPLE_LIVE:
+					/* Live or recently dead, must copy it */
+					isdead = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
 
-				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				*tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
+					/*
+					 * Since we hold exclusive lock on the relation in this
+					 * branch, normally the only way to see this is if it was
+					 * inserted earlier in our own transaction.  However, it
+					 * can happen in system catalogs, since we tend to release
+					 * write lock before commit there.  Give a warning if
+					 * neither case applies; but in any case we had better
+					 * copy it.
+					 */
+					if (!is_system_catalog &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+						elog(WARNING, "concurrent insert in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as live */
+					isdead = false;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+					/*
+					 * Similar situation to INSERT_IN_PROGRESS case.
+					 */
+					if (!is_system_catalog &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+						elog(WARNING, "concurrent delete in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as recently dead */
+					*tups_recently_dead += 1;
+					isdead = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					isdead = false; /* keep compiler quiet */
+					break;
+			}
 
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		if (isdead)
-		{
-			*tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
+			if (isdead)
 			{
-				/* A previous recently-dead tuple is now known dead */
 				*tups_vacuumed += 1;
-				*tups_recently_dead -= 1;
+				/* heap rewrite module still needs to see it... */
+				if (rewrite_heap_dead_tuple(rwstate, tuple))
+				{
+					/* A previous recently-dead tuple is now known dead */
+					*tups_vacuumed += 1;
+					*tups_recently_dead -= 1;
+				}
+
+				continue;
 			}
-			continue;
 		}
 
 		*num_tuples += 1;
@@ -919,7 +942,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		{
 			const int	ct_index[] = {
 				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
-				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_INSERTED
 			};
 			int64		ct_val[2];
 
@@ -934,6 +957,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
+
+		/*
+		 * Process the WAL produced by the load, as well as by other
+		 * transactions, so that the replication slot can advance and WAL does
+		 * not pile up. Use wal_segment_size as a threshold so that we do not
+		 * introduce the decoding overhead too often.
+		 *
+		 * Of course, we must not apply the changes until the initial load has
+		 * completed.
+		 *
+		 * Note that our insertions into the new table should not be decoded
+		 * as we (intentionally) do not write the logical decoding specific
+		 * information to WAL.
+		 */
+		if (concurrent)
+		{
+			XLogRecPtr	end_of_wal;
+
+			end_of_wal = GetFlushRecPtr(NULL);
+			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+			{
+				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+				end_of_wal_prev = end_of_wal;
+			}
+		}
 	}
 
 	if (indexScan != NULL)
@@ -977,15 +1025,32 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
+
+			/*
+			 * Try to keep the amount of not-yet-decoded WAL small, as above.
+			 */
+			if (concurrent)
+			{
+				XLogRecPtr	end_of_wal;
+
+				end_of_wal = GetFlushRecPtr(NULL);
+				if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+				{
+					repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+					end_of_wal_prev = end_of_wal;
+				}
+			}
 		}
 
 		tuplesort_end(tuplesort);
 	}
 
 	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	if (rwstate)
+		end_heap_rewrite(rwstate);
 
 	/* Clean up */
 	pfree(values);
@@ -2376,6 +2441,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
  * SET WITHOUT OIDS.
  *
  * So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate is NULL, insert the tuple via heap_insert() instead of the
+ * rewrite module - we still need to deform/form the tuple in that case.
+ * TODO: Consider renaming the function, as it might not do any rewrite.
  */
 static void
 reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2467,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
 
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	if (rwstate)
+		/* The heap rewrite module does the rest */
+		rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	else
+	{
+		/*
+		 * Insert tuple when processing REPACK CONCURRENTLY.
+		 *
+		 * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+		 * difficult to do the same in the catch-up phase (as the logical
+		 * decoding does not provide us with sufficient visibility
+		 * information). Thus we must use heap_insert() both during the
+		 * catch-up and here.
+		 *
+		 * The following is like simple_heap_insert() except that we pass the
+		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+		 * the relation files, it drops this relation, so no logical
+		 * replication subscription should need the data.
+		 */
+		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+					HEAP_INSERT_NO_LOGICAL, NULL);
+	}
 
 	heap_freetuple(copiedTuple);
 }
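
The wal_segment_size throttling above now appears twice in this function
(once for the unsorted path, once after each sorted batch). If that
duplication bothers anyone, both call sites could share a small helper along
these lines (a sketch, not something the patch contains):

    /*
     * Decode pending WAL once more than one segment has accumulated
     * since the last call, so that the replication slot can advance and
     * WAL does not pile up.  'prev_end' carries the position up to which
     * we have already decoded.
     */
    static void
    decode_wal_if_needed(LogicalDecodingContext *ctx, XLogRecPtr *prev_end)
    {
        XLogRecPtr  end_of_wal = GetFlushRecPtr(NULL);

        if ((end_of_wal - *prev_end) > wal_segment_size)
        {
            repack_decode_concurrent_changes(ctx, end_of_wal);
            *prev_end = end_of_wal;
        }
    }
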
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 66ab48f0fe0..ee83a0fc91d 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		int			options = HEAP_INSERT_SKIP_FSM;
 
 		/*
-		 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
-		 * for the TOAST table are not logically decoded.  The main heap is
-		 * WAL-logged as XLOG FPI records, which are not logically decoded.
+		 * While rewriting the heap for REPACK, make sure data for the TOAST
+		 * table are not logically decoded.  The main heap is WAL-logged as
+		 * XLOG FPI records, which are not logically decoded.
 		 */
 		options |= HEAP_INSERT_NO_LOGICAL;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 574f1004b9a..5ceb574ebc9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1287,16 +1287,19 @@ CREATE VIEW pg_stat_progress_repack AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      WHEN 5 THEN 'catch-up'
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
-        S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param5 AS heap_tuples_inserted,
+        S.param6 AS heap_tuples_updated,
+        S.param7 AS heap_tuples_deleted,
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
@@ -1314,7 +1317,7 @@ CREATE VIEW pg_stat_progress_cluster AS
         phase,
         repack_index_relid AS cluster_index_relid,
         heap_tuples_scanned,
-        heap_tuples_written,
+        heap_tuples_inserted + heap_tuples_updated AS heap_tuples_written,
         heap_blks_total,
         heap_blks_scanned,
         index_rebuild_count
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 89f0b03a31c..501bd36c23e 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -26,6 +26,10 @@
 #include "access/toast_internals.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -33,6 +37,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
@@ -40,15 +45,21 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
+#include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -68,12 +79,62 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * The locators are used to restrict logical decoding to the table being
+ * repacked (and to its TOAST table).
+ */
+RelFileLocator repacked_rel_locator = {.relNumber = InvalidRelFileNumber};
+RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidRelFileNumber};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+/*
+ * Information needed to apply concurrent data changes.
+ */
+typedef struct ChangeDest
+{
+	/* The relation the changes are applied to. */
+	Relation	rel;
+
+	/*
+	 * The following is needed to find the existing tuple if the change is
+	 * UPDATE or DELETE. 'ident_key' should have all the fields except for
+	 * 'sk_argument' initialized.
+	 */
+	Relation	ident_index;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+
+	/* Needed to update the indexes of 'rel'. */
+	IndexInsertState *iistate;
+} ChangeDest;
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
-								Oid indexOid, Oid userid, int options);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+								Oid indexOid, Oid userid, LOCKMODE lmode,
+								int options);
+static void check_repack_concurrently_requirements(Relation rel);
+static void rebuild_relation(Relation OldHeap, Relation index, bool verbose,
+							 bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							bool verbose, bool *pSwapToastByContent,
-							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							bool verbose,
+							bool *pSwapToastByContent,
+							TransactionId *pFreezeXid,
+							MultiXactId *pCutoffMulti);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -81,13 +142,51 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 											  MemoryContext permcxt);
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+									 ChangeDest *dest);
+static void apply_concurrent_insert(Relation rel, HeapTuple tup,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+									HeapTuple tup_target,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target);
+static HeapTuple find_target_tuple(Relation rel, ChangeDest *dest,
+								   HeapTuple tup_key,
+								   TupleTableSlot *ident_slot);
+static void process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
+									   XLogRecPtr end_of_wal,
+									   ChangeDest *dest);
+static IndexInsertState *get_index_insert_state(Relation relation,
+												Oid ident_index_id,
+												Relation *ident_index_p);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+								  int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+											   Relation cl_index,
+											   LogicalDecodingContext *decoding_ctx,
+											   TransactionId frozenXid,
+											   MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
+										LOCKMODE lockmode,
+										bool isTopLevel,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
 static const char *RepackCommandAsString(RepackCommand cmd);
 
 
+#define REPL_PLUGIN_NAME   "pgoutput_repack"
+
 /*
  * The repack code allows for processing multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -117,6 +216,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	ClusterParams params = {0};
 	Relation	rel = NULL;
 	MemoryContext repack_context;
+	LOCKMODE	lockmode;
 	List	   *rtcs;
 
 	/* Parse option list */
@@ -127,6 +227,16 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		else if (strcmp(opt->defname, "analyze") == 0 ||
 				 strcmp(opt->defname, "analyse") == 0)
 			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
+		else if (strcmp(opt->defname, "concurrently") == 0 &&
+				 defGetBoolean(opt))
+		{
+			if (stmt->command != REPACK_COMMAND_REPACK)
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("CONCURRENTLY option not supported for %s",
+							   RepackCommandAsString(stmt->command)));
+			params.options |= CLUOPT_CONCURRENT;
+		}
 		else
 			ereport(ERROR,
 					errcode(ERRCODE_SYNTAX_ERROR),
@@ -136,13 +246,25 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 					parser_errposition(pstate, opt->location));
 	}
 
+	/*
+	 * Determine the lock mode expected by cluster_rel().
+	 *
+	 * In the exclusive case, we obtain AccessExclusiveLock right away to
+	 * avoid a lock-upgrade hazard in the single-transaction case.  In the
+	 * CONCURRENTLY case, AccessExclusiveLock is only taken at the end of
+	 * processing, presumably for a very short time, and the relation gets
+	 * unlocked temporarily before that, so there's no lock-upgrade hazard.
+	 */
+	lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+		AccessExclusiveLock : ShareUpdateExclusiveLock;
+
 	/*
 	 * If a single relation is specified, process it and we're done ... unless
 	 * the relation is a partitioned table, in which case we fall through.
 	 */
 	if (stmt->relation != NULL)
 	{
-		rel = process_single_relation(stmt, &params);
+		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
 			return;				/* all done */
 	}
@@ -157,10 +279,29 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 				errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
 
 	/*
-	 * By here, we know we are in a multi-table situation.  In order to avoid
-	 * holding locks for too long, we want to process each table in its own
-	 * transaction.  This forces us to disallow running inside a user
-	 * transaction block.
+	 * By here, we know we are in a multi-table situation.
+	 *
+	 * Concurrent processing is currently considered rather special (e.g. in
+	 * terms of resources consumed) so it is not performed in bulk.
+	 */
+	if (params.options & CLUOPT_CONCURRENT)
+	{
+		if (rel != NULL)
+		{
+			Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+					errhint("Consider running the command for individual partitions."));
+		}
+		else
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("REPACK CONCURRENTLY requires an explicit table name"));
+	}
+
+	/*
+	 * In order to avoid holding locks for too long, we want to process each
+	 * table in its own transaction.  This forces us to disallow running
+	 * inside a user transaction block.
 	 */
 	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
@@ -243,7 +384,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		 * Open the target table, coping with the case where it has been
 		 * dropped.
 		 */
-		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		rel = try_table_open(rtc->tableOid, lockmode);
 		if (rel == NULL)
 		{
 			CommitTransactionCommand();
@@ -254,7 +395,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		/* Process this table */
-		cluster_rel(stmt->command, rel, rtc->indexOid, &params);
+		cluster_rel(stmt->command, rel, rtc->indexOid, &params, isTopLevel);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
@@ -283,22 +424,53 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
  * If indexOid is InvalidOid, the table will be rewritten in physical order
  * instead of index order.
  *
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
  * 'cmd' indicates which command is being executed, to be used for error
  * messages.
  */
 void
 cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-			ClusterParams *params)
+			ClusterParams *params, bool isTopLevel)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Relation	index;
+	LOCKMODE	lmode;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
-	Relation	index;
+	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
+
+	/*
+	 * The lock mode is AccessExclusiveLock for normal processing and
+	 * ShareUpdateExclusiveLock for concurrent processing (so that SELECT,
+	 * INSERT, UPDATE and DELETE commands work, but cluster_rel() cannot be
+	 * called concurrently for the same relation).
+	 */
+	lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+
+	/* There are specific requirements on concurrent processing. */
+	if (concurrent)
+	{
+		/*
+		 * Make sure we have no XID assigned, otherwise the call of
+		 * setup_logical_decoding() can cause a deadlock.
+		 *
+		 * Being inside a transaction block does not actually imply that an
+		 * XID has been assigned, but it very likely has. We might want to
+		 * check the result of GetCurrentTransactionIdIfAny() instead, but
+		 * that would be less clear from the user's perspective.
+		 */
+		PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+		check_repack_concurrently_requirements(OldHeap);
+	}
 
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
@@ -324,10 +496,13 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 * If this is a single-transaction CLUSTER, we can skip these tests. We
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
+	 *
+	 * XXX move [some of] these comments to where the RECHECK flag is
+	 * determined?
 	 */
 	if (recheck &&
 		!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
-							 params->options))
+							 lmode, params->options))
 		goto out;
 
 	/*
@@ -342,6 +517,12 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 				errmsg("cannot run %s on a shared catalog",
 					   RepackCommandAsString(cmd)));
 
+	/*
+	 * The CONCURRENTLY case should have been rejected earlier, since it does
+	 * not support catalog relations (shared or otherwise).
+	 */
+	Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
@@ -362,7 +543,7 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	if (OidIsValid(indexOid))
 	{
 		/* verify the index is good and lock it */
-		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, lmode);
 		/* also open it */
 		index = index_open(indexOid, NoLock);
 	}
@@ -397,7 +578,9 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
 		!RelationIsPopulated(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		if (index)
+			index_close(index, lmode);
+		relation_close(OldHeap, lmode);
 		goto out;
 	}
 
@@ -410,11 +593,34 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 * invalid, because we move tuples around.  Promote them to relation
 	 * locks.  Predicate locks on indexes will be promoted when they are
 	 * reindexed.
+	 *
+	 * During concurrent processing, the heap as well as its indexes stay in
+	 * operation, so we postpone this step until they are locked using
+	 * AccessExclusiveLock near the end of the processing.
 	 */
-	TransferPredicateLocksToHeapRelation(OldHeap);
+	if (!concurrent)
+		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	PG_TRY();
+	{
+		/*
+		 * For concurrent processing, make sure that our logical decoding
+		 * ignores data changes of other tables than the one we are
+		 * processing.
+		 */
+		if (concurrent)
+			begin_concurrent_repack(OldHeap);
+
+		rebuild_relation(OldHeap, index, verbose, concurrent);
+	}
+	PG_FINALLY();
+	{
+		if (concurrent)
+			end_concurrent_repack();
+	}
+	PG_END_TRY();
+
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -433,14 +639,14 @@ out:
  */
 static bool
 cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-					Oid userid, int options)
+					Oid userid, LOCKMODE lmode, int options)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 
 	/* Check that the user still has privileges for the relation */
 	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -454,7 +660,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -465,7 +671,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		 */
 		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 
@@ -476,7 +682,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
 			!get_index_isclustered(indexOid))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 	}
@@ -488,7 +694,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
  * Verify that the specified heap and index are valid to cluster on
  *
  * Side effect: obtains lock on the index.  The caller may
- * in some cases already have AccessExclusiveLock on the table, but
+ * in some cases already have a lock of the same strength on the table, but
  * not in all cases so we can't rely on the table-level lock for
  * protection here.
  */
@@ -617,18 +823,87 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	table_close(pg_index, RowExclusiveLock);
 }
 
+/*
+ * Check if the CONCURRENTLY option is legal for the relation.
+ */
+static void
+check_repack_concurrently_requirements(Relation rel)
+{
+	char		relpersistence,
+				replident;
+	Oid			ident_idx;
+
+	/* Data changes in system relations are not logically decoded. */
+	if (IsCatalogRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+	/*
+	 * reorderbuffer.c does not seem to handle processing of TOAST relation
+	 * alone.
+	 */
+	if (IsToastRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+	relpersistence = rel->rd_rel->relpersistence;
+	if (relpersistence != RELPERSISTENCE_PERMANENT)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+	/* With NOTHING, WAL does not contain the old tuple. */
+	replident = rel->rd_rel->relreplident;
+	if (replident == REPLICA_IDENTITY_NOTHING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("Relation \"%s\" has insufficient replication identity.",
+						 RelationGetRelationName(rel))));
+
+	/*
+	 * Identity index is not set if the replica identity is FULL, but PK might
+	 * exist in such a case.
+	 */
+	ident_idx = RelationGetReplicaIndex(rel);
+	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+		ident_idx = rel->rd_pkindex;
+	if (!OidIsValid(ident_idx))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot process relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 (errhint("Relation \"%s\" has no identity index.",
+						  RelationGetRelationName(rel)))));
+}
+
+
 /*
  * rebuild_relation: rebuild an existing relation in index or physical order
  *
- * OldHeap: table to rebuild.
+ * OldHeap: table to rebuild.  See cluster_rel() for comments on the required
+ * lock strength.
+ *
  * index: index to cluster by, or NULL to rewrite in physical order.
  *
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock.
+ * (The function handles the lock upgrade if 'concurrent' is true.)
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -636,13 +911,38 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Oid			OIDNewHeap;
 	Relation	NewHeap;
 	char		relpersistence;
-	bool		is_system_catalog;
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
+	LogicalDecodingContext *decoding_ctx = NULL;
+	Snapshot	snapshot = NULL;
+#ifdef USE_ASSERT_CHECKING
+	LOCKMODE	lmode;
+
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+	Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+	Assert(index == NULL || CheckRelationLockedByMe(index, lmode, false));
+#endif
+
+	if (concurrent)
+	{
+		/*
+		 * Prepare to capture the concurrent data changes.
+		 *
+		 * Note that this call waits for all transactions with an XID already
+		 * assigned to finish. If any of those transactions is waiting for a
+		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
+		 * it runs CREATE INDEX), we can end up in a deadlock. It is not clear
+		 * whether avoiding that risk is worth unlocking/relocking the table
+		 * (and its clustering index) and then re-checking whether it is still
+		 * eligible for REPACK CONCURRENTLY.
+		 */
+		decoding_ctx = setup_logical_decoding(tableOid);
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
-		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+		snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
+		PushActiveSnapshot(snapshot);
+	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
 	if (index != NULL)
@@ -650,7 +950,6 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
 
 	/*
 	 * Create the transient table that will receive the re-ordered data.
@@ -666,30 +965,65 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, decoding_ctx, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
+	/* The historic snapshot won't be needed anymore. */
+	if (snapshot)
+		PopActiveSnapshot();
 
-	/* Close relcache entries, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
-	if (index)
-		index_close(index, NoLock);
+	if (concurrent)
+	{
+		/*
+		 * Push a snapshot that we will use to find old versions of rows when
+		 * processing concurrent UPDATE and DELETE commands. (That snapshot
+		 * should also be used by index expressions.)
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
 
-	/*
-	 * Close the new relation so it can be dropped as soon as the storage is
-	 * swapped. The relation is not visible to others, so no need to unlock it
-	 * explicitly.
-	 */
-	table_close(NewHeap, NoLock);
+		/*
+		 * Make sure we can find the tuples just inserted when applying DML
+		 * commands on top of those.
+		 */
+		CommandCounterIncrement();
 
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+		Assert(!swap_toast_by_content);
+		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+										   decoding_ctx,
+										   frozenXid, cutoffMulti);
+		PopActiveSnapshot();
+
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
+
+		/* Done with decoding. */
+		cleanup_logical_decoding(decoding_ctx);
+	}
+	else
+	{
+		bool		is_system_catalog = IsSystemRelation(OldHeap);
+
+		/* Close relcache entries, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
+		if (index)
+			index_close(index, NoLock);
+
+		/*
+		 * Close the new relation so it can be dropped as soon as the storage
+		 * is swapped. The relation is not visible to others, so no need to
+		 * unlock it explicitly.
+		 */
+		table_close(NewHeap, NoLock);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+	}
 }
 
 
@@ -824,15 +1158,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Both
+ * must be passed iff concurrent processing is required.
+ *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
-				bool *pSwapToastByContent, TransactionId *pFreezeXid,
-				MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -849,6 +1187,10 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	int			elevel = verbose ? INFO : DEBUG2;
 	PGRUsage	ru0;
 	char	   *nspname;
+	bool		concurrent = snapshot != NULL;
+	LOCKMODE	lmode;
+
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
 
 	pg_rusage_init(&ru0);
 
@@ -877,7 +1219,7 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * will be held till end of transaction.
 	 */
 	if (OldHeap->rd_rel->reltoastrelid)
-		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, lmode);
 
 	/*
 	 * If both tables have TOAST tables, perform toast swap by content.  It is
@@ -886,7 +1228,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * swap by links.  This is okay because swap by content is only essential
 	 * for system catalogs, and we don't support schema changes for them.
 	 */
-	if (OldHeap->rd_rel->reltoastrelid && NewHeap->rd_rel->reltoastrelid)
+	if (OldHeap->rd_rel->reltoastrelid && NewHeap->rd_rel->reltoastrelid &&
+		!concurrent)
 	{
 		*pSwapToastByContent = true;
 
@@ -907,6 +1250,10 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 		 * follow the toast pointers to the wrong place.  (It would actually
 		 * work for values copied over from the old toast table, but not for
 		 * any values that we toast which were previously not toasted.)
+		 *
+		 * This would not work with CONCURRENTLY because we may need to delete
+		 * TOASTed tuples from the new heap. With this hack, we'd delete them
+		 * from the old heap.
 		 */
 		NewHeap->rd_toastoid = OldHeap->rd_rel->reltoastrelid;
 	}
@@ -982,7 +1329,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+									cutoffs.OldestXmin, snapshot,
+									decoding_ctx,
+									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
 									&tups_recently_dead);
@@ -991,7 +1340,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	*pFreezeXid = cutoffs.FreezeLimit;
 	*pCutoffMulti = cutoffs.MultiXactCutoff;
 
-	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+	/*
+	 * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+	 * In the CONCURRENTLY case, we need to set it again before applying the
+	 * concurrent changes.
+	 */
 	NewHeap->rd_toastoid = InvalidOid;
 
 	num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1449,14 +1802,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool swap_toast_by_content,
 				 bool check_constraints,
 				 bool is_internal,
+				 bool reindex,
 				 TransactionId frozenXid,
 				 MultiXactId cutoffMulti,
 				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
-	int			reindex_flags;
-	ReindexParams reindex_params = {0};
 	int			i;
 
 	/* Report that we are now swapping relation files */
@@ -1482,39 +1834,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (is_system_catalog)
 		CacheInvalidateCatalog(OIDOldHeap);
 
-	/*
-	 * Rebuild each index on the relation (but not the toast table, which is
-	 * all-new at this point).  It is important to do this before the DROP
-	 * step because if we are processing a system catalog that will be used
-	 * during DROP, we want to have its indexes available.  There is no
-	 * advantage to the other order anyway because this is all transactional,
-	 * so no chance to reclaim disk space before commit.  We do not need a
-	 * final CommandCounterIncrement() because reindex_relation does it.
-	 *
-	 * Note: because index_build is called via reindex_relation, it will never
-	 * set indcheckxmin true for the indexes.  This is OK even though in some
-	 * sense we are building new indexes rather than rebuilding existing ones,
-	 * because the new heap won't contain any HOT chains at all, let alone
-	 * broken ones, so it can't be necessary to set indcheckxmin.
-	 */
-	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
-	if (check_constraints)
-		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+	if (reindex)
+	{
+		int			reindex_flags;
+		ReindexParams reindex_params = {0};
 
-	/*
-	 * Ensure that the indexes have the same persistence as the parent
-	 * relation.
-	 */
-	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
-	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+		/*
+		 * Rebuild each index on the relation (but not the toast table, which
+		 * is all-new at this point).  It is important to do this before the
+		 * DROP step because if we are processing a system catalog that will
+		 * be used during DROP, we want to have its indexes available.  There
+		 * is no advantage to the other order anyway because this is all
+		 * transactional, so no chance to reclaim disk space before commit. We
+		 * do not need a final CommandCounterIncrement() because
+		 * reindex_relation does it.
+		 *
+		 * Note: because index_build is called via reindex_relation, it will
+		 * never set indcheckxmin true for the indexes.  This is OK even
+		 * though in some sense we are building new indexes rather than
+		 * rebuilding existing ones, because the new heap won't contain any
+		 * HOT chains at all, let alone broken ones, so it can't be necessary
+		 * to set indcheckxmin.
+		 */
+		reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+		if (check_constraints)
+			reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
 
-	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
-								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+		/*
+		 * Ensure that the indexes have the same persistence as the parent
+		 * relation.
+		 */
+		if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+		else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+
+		/* Report that we are now reindexing relations */
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
-	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+		reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+	}
 
 	/* Report that we are now doing clean up */
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1558,6 +1918,17 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	object.objectId = OIDNewHeap;
 	object.objectSubId = 0;
 
+	if (!reindex)
+	{
+		/*
+		 * Make sure the changes in pg_class are visible. This is especially
+		 * important if !swap_toast_by_content, so that the correct TOAST
+		 * relation is dropped. (reindex_relation() above did not help in this
+		 * case))
+		 */
+		CommandCounterIncrement();
+	}
+
 	/*
 	 * The new relation is local to our transaction and we know nothing
 	 * depends on it, so DROP_RESTRICT should be OK.
@@ -1597,7 +1968,7 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 
 			/* Get the associated valid index to be renamed */
 			toastidx = toast_get_valid_index(newrel->rd_rel->reltoastrelid,
-											 NoLock);
+											 AccessExclusiveLock);
 
 			/* rename the toast table ... */
 			snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
@@ -1857,7 +2228,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * case, if an index name is given, it's up to the caller to resolve it.
  */
 static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+						ClusterParams *params)
 {
 	Relation	rel;
 	Oid			tableOid;
@@ -1866,13 +2238,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
-	/*
-	 * Find, lock, and check permissions on the table.  We obtain
-	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-	 * single-transaction case.
-	 */
+	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
-										AccessExclusiveLock,
+										lockmode,
 										0,
 										RangeVarCallbackMaintainsTable,
 										NULL);
@@ -1904,13 +2272,14 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 		return rel;
 	else
 	{
-		Oid			indexOid;
+		Oid			indexOid = InvalidOid;
 
 		indexOid = determine_clustered_index(rel, stmt->usingindex,
 											 stmt->indexname);
 		if (OidIsValid(indexOid))
-			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
-		cluster_rel(stmt->command, rel, indexOid, params);
+			check_index_is_clusterable(rel, indexOid, lockmode);
+
+		cluster_rel(stmt->command, rel, indexOid, params, isTopLevel);
 
 		/* Do an analyze, if requested */
 		if (params->options & CLUOPT_ANALYZE)
@@ -1993,3 +2362,1021 @@ RepackCommandAsString(RepackCommand cmd)
 	}
 	return "???";
 }
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts, to set up logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that the TOAST table needs no attention here, as it is not scanned
+ * using a historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+	Oid			toastrelid;
+
+	/*
+	 * Avoid logical decoding of other relations by this backend. The lock we
+	 * have guarantees that the actual locator cannot be changed concurrently:
+	 * TRUNCATE needs AccessExclusiveLock.
+	 */
+	Assert(CheckRelationLockedByMe(rel, ShareUpdateExclusiveLock, false));
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
+
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+}
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+	/*
+	 * Restore normal function of (future) logical decoding for this backend.
+	 */
+	repacked_rel_locator.relNumber = InvalidRelFileNumber;
+	repacked_rel_toast_locator.relNumber = InvalidRelFileNumber;
+}
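
For readers of this excerpt: the consumer of these locators lives on the
decoding path elsewhere in the patch. The intended use is simply an early
filter, conceptually like the following sketch (hypothetical function name):

    #include "storage/relfilelocator.h"

    /*
     * With the locators set by begin_concurrent_repack(), changes of any
     * other relation can be discarded before they are queued.
     */
    static bool
    repack_relation_is_interesting(const RelFileLocator *locator)
    {
        return RelFileLocatorEquals(*locator, repacked_rel_locator) ||
            RelFileLocatorEquals(*locator, repacked_rel_toast_locator);
    }
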
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends make while we copy the
+ * existing data into the temporary table), nor persisted (it's easier to
+ * handle a crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid)
+{
+	Relation	rel;
+	TupleDesc	tupdesc;
+	LogicalDecodingContext *ctx;
+	RepackDecodingState *dstate = palloc0_object(RepackDecodingState);
+
+	/*
+	 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+	 * should never fire.
+	 */
+	Assert(!TransactionIdIsValid(GetTopTransactionIdIfAny()));
+
+	/*
+	 * A single backend should not execute multiple REPACK commands at a time,
+	 * so use PID to make the slot unique.
+	 */
+	snprintf(NameStr(dstate->slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+	/*
+	 * Check if we can use logical decoding.
+	 */
+	CheckSlotPermissions();
+	CheckLogicalDecodingRequirements();
+
+	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+	ReplicationSlotCreate(NameStr(dstate->slotname), true, RS_TEMPORARY,
+						  false, false, false);
+
+	/*
+	 * None of the prepare_write, do_write and update_progress callbacks is
+	 * useful for us.
+	 */
+	ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+									NIL,
+									true,
+									InvalidXLogRecPtr,
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+	/*
+	 * We don't have control on setting fast_forward, so at least check it.
+	 */
+	Assert(!ctx->fast_forward);
+
+	DecodingContextFindStartpoint(ctx);
+
+	/* Some WAL records should have been read. */
+	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
+	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+				wal_segment_size);
+
+	/*
+	 * Set up the structures to store the decoded changes.
+	 */
+	dstate->relid = relid;
+	dstate->tstore = tuplestore_begin_heap(false, false,
+										   maintenance_work_mem);
+
+	/* Caller should already have the table locked. */
+	rel = table_open(relid, NoLock);
+	tupdesc = CreateTupleDescCopy(RelationGetDescr(rel));
+	dstate->tupdesc = tupdesc;
+	table_close(rel, NoLock);
+
+	/* Initialize the descriptor to store the changes ... */
+	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+	/* ... as well as the corresponding slot. */
+	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+											  &TTSOpsMinimalTuple);
+
+	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "logical decoding");
+
+	ctx->output_writer_private = dstate;
+	return ctx;
+}
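
The plugin named by REPL_PLUGIN_NAME is outside this excerpt, but under the
standard output-plugin API its registration has to look roughly as follows.
This is a hedged skeleton: the callback names are made up, and the real
change callback certainly does more (the serialization side is sketched
further below):

    #include "replication/output_plugin.h"

    static void
    repack_pg_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
    {
    }

    static void
    repack_pg_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                     Relation relation, ReorderBufferChange *change)
    {
        RepackDecodingState *dstate;

        dstate = (RepackDecodingState *) ctx->output_writer_private;

        /* Only the table being repacked is of interest. */
        if (RelationGetRelid(relation) != dstate->relid)
            return;

        /* ... flatten the change into dstate->tstore ... */
    }

    static void
    repack_pg_commit(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                     XLogRecPtr commit_lsn)
    {
    }

    void
    _PG_output_plugin_init(OutputPluginCallbacks *cb)
    {
        cb->begin_cb = repack_pg_begin;
        cb->change_cb = repack_pg_change;
        cb->commit_cb = repack_pg_commit;
    }
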
+
+/*
+ * Retrieve tuple from ConcurrentChange structure.
+ *
+ * The input data starts with the structure but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+	HeapTupleData tup_data;
+	HeapTuple	result;
+	char	   *src;
+
+	/*
+	 * Ensure alignment before accessing the fields. (This is why we can't use
+	 * heap_copytuple() instead of this function.)
+	 */
+	src = change + offsetof(ConcurrentChange, tup_data);
+	memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+	memcpy(result, &tup_data, sizeof(HeapTupleData));
+	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+	src = change + SizeOfConcurrentChange;
+	memcpy(result->t_data, src, result->t_len);
+
+	return result;
+}
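
The writing side of this layout (in the output plugin, outside this excerpt)
presumably flattens each change roughly as below. This is a sketch that
assumes ConcurrentChange embeds the HeapTupleData header as its 'tup_data'
field, which is what the offsetof() arithmetic above implies; the function
name is hypothetical and the caller is expected to have set 'change->kind':

    /*
     * Hypothetical counterpart of get_changed_tuple(): flatten one change
     * into a bytea -- the ConcurrentChange header (with the embedded
     * HeapTupleData) followed by the raw tuple contents.
     */
    static void
    store_change_sketch(RepackDecodingState *dstate,
                        ConcurrentChange *change, HeapTuple tuple)
    {
        Size        size = VARHDRSZ + SizeOfConcurrentChange + tuple->t_len;
        bytea      *raw = (bytea *) palloc(size);
        char       *dst = VARDATA(raw);
        Datum       values[1];
        bool        isnull[1] = {false};

        SET_VARSIZE(raw, size);
        change->tup_data = *tuple;  /* the tuple header, not its contents */
        memcpy(dst, change, SizeOfConcurrentChange);
        memcpy(dst + SizeOfConcurrentChange, tuple->t_data, tuple->t_len);

        values[0] = PointerGetDatum(raw);
        tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
                             values, isnull);
        dstate->nchanges++;
        pfree(raw);
    }
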
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+								 XLogRecPtr end_of_wal)
+{
+	RepackDecodingState *dstate;
+	ResourceOwner resowner_old;
+
+	/*
+	 * Invalidate the "present" cache before moving to "(recent) history".
+	 */
+	InvalidateSystemCaches();
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+	resowner_old = CurrentResourceOwner;
+	CurrentResourceOwner = dstate->resowner;
+
+	PG_TRY();
+	{
+		while (ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			XLogSegNo	segno_new;
+			char	   *errm = NULL;
+			XLogRecPtr	end_lsn;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			/*
+			 * If a WAL segment boundary has been crossed, inform the decoding
+			 * system that the catalog_xmin can advance. (We could confirm
+			 * more often, but filling a single WAL segment should not take
+			 * much time.)
+			 */
+			end_lsn = ctx->reader->EndRecPtr;
+			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+			if (segno_new != repack_current_segment)
+			{
+				LogicalConfirmReceivedLocation(end_lsn);
+				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
+				repack_current_segment = segno_new;
+			}
+
+			CHECK_FOR_INTERRUPTS();
+		}
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
+/*
+ * Apply changes that happened during the initial load.
+ *
+ * The scan key is passed by the caller so that it does not have to be
+ * constructed multiple times. The key entries have all fields initialized
+ * except for sk_argument.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
+{
+	Relation	rel = dest->rel;
+	TupleTableSlot *index_slot,
+			   *ident_slot;
+	HeapTuple	tup_old = NULL;
+
+	if (dstate->nchanges == 0)
+		return;
+
+	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+	/* A slot to fetch tuples from identity index. */
+	ident_slot = table_slot_create(rel, NULL);
+
+	while (tuplestore_gettupleslot(dstate->tstore, true, false,
+								   dstate->tsslot))
+	{
+		bool		shouldFree;
+		HeapTuple	tup_change,
+					tup,
+					tup_exist;
+		char	   *change_raw,
+				   *src;
+		ConcurrentChange change;
+		bool		isnull[1];
+		Datum		values[1];
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get the change from the single-column tuple. */
+		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+		Assert(!isnull[0]);
+
+		/* Make sure we access aligned data. */
+		change_raw = (char *) DatumGetByteaP(values[0]);
+		src = (char *) VARDATA(change_raw);
+		memcpy(&change, src, SizeOfConcurrentChange);
+
+		/*
+		 * Extract the tuple from the change. The tuple is copied here because
+		 * it might be assigned to 'tup_old', in which case it needs to
+		 * survive into the next iteration.
+		 */
+		tup = get_changed_tuple(src);
+
+		if (change.kind == CHANGE_UPDATE_OLD)
+		{
+			Assert(tup_old == NULL);
+			tup_old = tup;
+		}
+		else if (change.kind == CHANGE_INSERT)
+		{
+			Assert(tup_old == NULL);
+
+			apply_concurrent_insert(rel, tup, dest->iistate, index_slot);
+
+			pfree(tup);
+		}
+		else if (change.kind == CHANGE_UPDATE_NEW ||
+				 change.kind == CHANGE_DELETE)
+		{
+			HeapTuple	tup_key;
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+			{
+				tup_key = tup_old != NULL ? tup_old : tup;
+			}
+			else
+			{
+				Assert(tup_old == NULL);
+				tup_key = tup;
+			}
+
+			/*
+			 * Find the tuple to be updated or deleted.
+			 */
+			tup_exist = find_target_tuple(rel, dest, tup_key, ident_slot);
+			if (tup_exist == NULL)
+				elog(ERROR, "failed to find target tuple");
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+				apply_concurrent_update(rel, tup, tup_exist, dest->iistate,
+										index_slot);
+			else
+				apply_concurrent_delete(rel, tup_exist);
+
+			if (tup_old != NULL)
+			{
+				pfree(tup_old);
+				tup_old = NULL;
+			}
+
+			pfree(tup);
+		}
+		else
+			elog(ERROR, "unrecognized kind of change: %d", change.kind);
+
+		/*
+		 * If a change was applied now, increment CID for next writes and
+		 * update the snapshot so it sees the changes we've applied so far.
+		 */
+		if (change.kind != CHANGE_UPDATE_OLD)
+		{
+			CommandCounterIncrement();
+			UpdateActiveSnapshotCommandId();
+		}
+
+		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+		Assert(shouldFree);
+		pfree(tup_change);
+	}
+
+	tuplestore_clear(dstate->tstore);
+	dstate->nchanges = 0;
+
+	/* Cleanup. */
+	ExecDropSingleTupleTableSlot(index_slot);
+	ExecDropSingleTupleTableSlot(ident_slot);
+}
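+
+/*
+ * Note on the change stream consumed above: pgoutput_repack emits an UPDATE
+ * as CHANGE_UPDATE_OLD (present only if the decoded change carries the old
+ * tuple, typically because the replica identity key changed) immediately
+ * followed by CHANGE_UPDATE_NEW. That is why 'tup_old' is carried across
+ * exactly one loop iteration.
+ */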
+
+static void
+apply_concurrent_insert(Relation rel, HeapTuple tup, IndexInsertState *iistate,
+						TupleTableSlot *index_slot)
+{
+	List	   *recheck;
+
+	/*
+	 * Like simple_heap_insert(), but make sure that the INSERT is not
+	 * logically decoded - see reform_and_rewrite_tuple() for more
+	 * information.
+	 */
+	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+				NULL);
+
+	/*
+	 * Update indexes.
+	 *
+	 * (Functions used by the indexes may need an active snapshot; the caller
+	 * is expected to have set one.)
+	 */
+	ExecStoreHeapTuple(tup, index_slot, false);
+	recheck = ExecInsertIndexTuples(iistate->rri,
+									index_slot,
+									iistate->estate,
+									false,	/* update */
+									false,	/* noDupErr */
+									NULL,	/* specConflict */
+									NIL,	/* arbiterIndexes */
+									false	/* onlySummarizing */
+		);
+
+	/*
+	 * If recheck is required, it must have been performed on the source
+	 * relation by now. (All the logical changes we process here are already
+	 * committed.)
+	 */
+	list_free(recheck);
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+	LockTupleMode lockmode;
+	TM_FailureData tmfd;
+	TU_UpdateIndexes update_indexes;
+	TM_Result	res;
+	List	   *recheck;
+
+	/*
+	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
+	 * here.)
+	 *
+	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_update(rel, &tup_target->t_self, tup,
+					  GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd, &lockmode, &update_indexes,
+					  false /* wal_logical */ );
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+	ExecStoreHeapTuple(tup, index_slot, false);
+
+	if (update_indexes != TU_None)
+	{
+		recheck = ExecInsertIndexTuples(iistate->rri,
+										index_slot,
+										iistate->estate,
+										true,	/* update */
+										false,	/* noDupErr */
+										NULL,	/* specConflict */
+										NIL,	/* arbiterIndexes */
+		/* onlySummarizing */
+										update_indexes == TU_Summarizing);
+		list_free(recheck);
+	}
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target)
+{
+	TM_Result	res;
+	TM_FailureData tmfd;
+
+	/*
+	 * Delete tuple from the new heap.
+	 *
+	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+					  InvalidSnapshot, false,
+					  &tmfd,
+					  false,	/* no wait - only we are doing changes */
+					  false /* wal_logical */ );
+
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * The tuple found, if any, is returned in 'ident_slot' and stays valid only
+ * as long as the slot's contents do; NULL is returned if no tuple matches.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
+				  TupleTableSlot *ident_slot)
+{
+	Relation	ident_index = dest->ident_index;
+	IndexScanDesc scan;
+	Form_pg_index ident_form;
+	int2vector *ident_indkey;
+	HeapTuple	result = NULL;
+
+	/* XXX no instrumentation for now */
+	scan = index_beginscan(rel, ident_index, GetActiveSnapshot(),
+						   NULL, dest->ident_key_nentries, 0);
+	index_rescan(scan, dest->ident_key, dest->ident_key_nentries, NULL, 0);
+
+	/* Info needed to retrieve key values from heap tuple. */
+	ident_form = ident_index->rd_index;
+	ident_indkey = &ident_form->indkey;
+
+	/* Use the incoming tuple to finalize the scan key. */
+	for (int i = 0; i < scan->numberOfKeys; i++)
+	{
+		ScanKey		entry;
+		bool		isnull;
+		int16		attno_heap;
+
+		entry = &scan->keyData[i];
+		attno_heap = ident_indkey->values[i];
+		entry->sk_argument = heap_getattr(tup_key,
+										  attno_heap,
+										  rel->rd_att,
+										  &isnull);
+		Assert(!isnull);
+	}
+	if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+	{
+		bool		shouldFree;
+
+		result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+		/* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+		Assert(!shouldFree);
+	}
+	index_endscan(scan);
+
+	return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
+						   XLogRecPtr end_of_wal, ChangeDest *dest)
+{
+	RepackDecodingState *dstate;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_CATCH_UP);
+
+	dstate = (RepackDecodingState *) decoding_ctx->output_writer_private;
+
+	repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+
+	if (dstate->nchanges == 0)
+		return;
+
+	apply_concurrent_changes(dstate, dest);
+}
+
+/*
+ * Initialize IndexInsertState for insertions into the indexes of 'relation'.
+ *
+ * While doing that, also return (in *ident_index_p) the relcache entry of
+ * the identity index specified by 'ident_index_id'.
+ */
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id,
+					   Relation *ident_index_p)
+{
+	EState	   *estate;
+	int			i;
+	IndexInsertState *result;
+	Relation	ident_index = NULL;
+
+	result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+	estate = CreateExecutorState();
+
+	result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+	InitResultRelInfo(result->rri, relation, 0, 0, 0);
+	ExecOpenIndices(result->rri, false);
+
+	/*
+	 * Find the relcache entry of the identity index so that we spend no extra
+	 * effort to open / close it.
+	 */
+	for (i = 0; i < result->rri->ri_NumIndices; i++)
+	{
+		Relation	ind_rel;
+
+		ind_rel = result->rri->ri_IndexRelationDescs[i];
+		if (ind_rel->rd_id == ident_index_id)
+			ident_index = ind_rel;
+	}
+	if (ident_index == NULL)
+		elog(ERROR, "failed to open identity index");
+
+	/* Only initialize fields needed by ExecInsertIndexTuples(). */
+	result->estate = estate;
+
+	*ident_index_p = ident_index;
+	return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+	Relation	ident_idx_rel;
+	Form_pg_index ident_idx;
+	int			n,
+				i;
+	ScanKey		result;
+
+	Assert(OidIsValid(ident_idx_oid));
+	ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+	ident_idx = ident_idx_rel->rd_index;
+	n = ident_idx->indnatts;
+	result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+	for (i = 0; i < n; i++)
+	{
+		ScanKey		entry;
+		int16		relattno;
+		Form_pg_attribute att;
+		Oid			opfamily,
+					opcintype,
+					opno,
+					opcode;
+
+		entry = &result[i];
+		relattno = ident_idx->indkey.values[i];
+		if (relattno >= 1)
+		{
+			TupleDesc	desc;
+
+			desc = rel_src->rd_att;
+			att = TupleDescAttr(desc, relattno - 1);
+		}
+		else
+			elog(ERROR, "unexpected attribute number %d in index", relattno);
+
+		opfamily = ident_idx_rel->rd_opfamily[i];
+		opcintype = ident_idx_rel->rd_opcintype[i];
+		opno = get_opfamily_member(opfamily, opcintype, opcintype,
+								   BTEqualStrategyNumber);
+
+		if (!OidIsValid(opno))
+			elog(ERROR, "failed to find = operator for type %u", opcintype);
+
+		opcode = get_opcode(opno);
+		if (!OidIsValid(opcode))
+			elog(ERROR, "failed to find = operator for operator %u", opno);
+
+		/* Initialize everything but argument. */
+		ScanKeyInit(entry,
+					i + 1,
+					BTEqualStrategyNumber, opcode,
+					(Datum) NULL);
+		entry->sk_collation = att->attcollation;
+	}
+	index_close(ident_idx_rel, AccessShareLock);
+
+	*nentries = n;
+	return result;
+}
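+
+/*
+ * Example: for an identity index on (id, ts), build_identity_key() returns
+ * two entries describing "id = $1 AND ts = $2" with sk_argument left NULL;
+ * find_target_tuple() fills the arguments in from the decoded tuple before
+ * each scan. (The column names are illustrative only.)
+ */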
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+	ExecCloseIndices(iistate->rri);
+	FreeExecutorState(iistate->estate);
+	pfree(iistate->rri);
+	pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	ExecDropSingleTupleTableSlot(dstate->tsslot);
+	FreeTupleDesc(dstate->tupdesc_change);
+	FreeTupleDesc(dstate->tupdesc);
+	tuplestore_end(dstate->tstore);
+
+	FreeDecodingContext(ctx);
+
+	ReplicationSlotRelease();
+	ReplicationSlotDrop(NameStr(dstate->slotname), false);
+	pfree(dstate);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
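+ *
+ * In outline: 1) build the new indexes while holding ShareUpdateExclusiveLock
+ * on OldHeap, 2) apply the concurrent changes decoded so far, 3) upgrade to
+ * AccessExclusiveLock, 4) apply the remaining changes, 5) swap the relation
+ * files and have finish_heap_swap() drop the transient relation.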
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+								   Relation cl_index,
+								   LogicalDecodingContext *decoding_ctx,
+								   TransactionId frozenXid,
+								   MultiXactId cutoffMulti)
+{
+	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+	List	   *ind_oids_new;
+	Oid			old_table_oid = RelationGetRelid(OldHeap);
+	Oid			new_table_oid = RelationGetRelid(NewHeap);
+	List	   *ind_oids_old = RelationGetIndexList(OldHeap);
+	ListCell   *lc,
+			   *lc2;
+	char		relpersistence;
+	bool		is_system_catalog;
+	Oid			ident_idx_old,
+				ident_idx_new;
+	XLogRecPtr	wal_insert_ptr,
+				end_of_wal;
+	char		dummy_rec_data = '\0';
+	Relation   *ind_refs,
+			   *ind_refs_p;
+	int			nind;
+	ChangeDest	chgdst;
+
+	/* Like in cluster_rel(). */
+	lockmode_old = ShareUpdateExclusiveLock;
+	Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+	Assert(cl_index == NULL ||
+		   CheckRelationLockedByMe(cl_index, lockmode_old, false));
+	/* This is expected from the caller. */
+	Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+	ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+	/*
+	 * Unlike the exclusive case, we build new indexes for the new relation
+	 * rather than swapping the storage and reindexing the old relation. The
+	 * point is that the index build can take some time, so we do it before
+	 * acquiring AccessExclusiveLock on the old heap - which is why the heap
+	 * storage cannot be swapped yet.
+	 *
+	 * index_create() will lock the new indexes using AccessExclusiveLock - no
+	 * need to change that. At the same time, we use ShareUpdateExclusiveLock
+	 * to lock the existing indexes - that should be enough to prevent others
+	 * from changing them while we're repacking the relation. The lock on the
+	 * table should prevent others from changing the index column list, but
+	 * might not be enough for commands like ALTER INDEX ... SET ... (Those
+	 * are not necessarily dangerous, but users may be confused if changes
+	 * they made get lost due to REPACK.)
+	 */
+	ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+	/*
+	 * Processing shouldn't start without a valid identity index.
+	 */
+	Assert(OidIsValid(ident_idx_old));
+
+	/* Find "identity index" on the new relation. */
+	ident_idx_new = InvalidOid;
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+
+		if (ident_idx_old == ind_old)
+		{
+			ident_idx_new = ind_new;
+			break;
+		}
+	}
+	if (!OidIsValid(ident_idx_new))
+		/*
+		 * Should not happen, given our lock on the old relation.
+		 */
+		ereport(ERROR,
+				(errmsg("identity index missing on the new relation")));
+
+	/* Gather information to apply concurrent changes. */
+	chgdst.rel = NewHeap;
+	chgdst.iistate = get_index_insert_state(NewHeap, ident_idx_new,
+											&chgdst.ident_index);
+	chgdst.ident_key = build_identity_key(ident_idx_new, OldHeap,
+										  &chgdst.ident_key_nentries);
+
+	/*
+	 * During testing, wait for another backend to perform concurrent data
+	 * changes which we will process below.
+	 */
+	INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+	/*
+	 * Flush all WAL records inserted so far (possibly except for the last
+	 * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+	 * we need to flush while holding exclusive lock on the source table.
+	 */
+	wal_insert_ptr = GetInsertRecPtr();
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply concurrent changes first time, to minimize the time we need to
+	 * hold AccessExclusiveLock. (Quite some amount of WAL could have been
+	 * written during the data copying and index creation.)
+	 */
+	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+
+	/*
+	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+	 * is one), and all its indexes, so that we can swap the files.
+	 *
+	 * Before that, unlock the index temporarily to avoid deadlock in case
+	 * another transaction is trying to lock it while holding the lock on the
+	 * table.
+	 */
+	if (cl_index)
+	{
+		index_close(cl_index, ShareUpdateExclusiveLock);
+		cl_index = NULL;
+	}
+	/* The TOAST relation (if any) must be locked exclusively as well. */
+	if (OldHeap->rd_rel->reltoastrelid)
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+	/* Finally lock the table */
+	LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+	/*
+	 * Lock all indexes now, not only the clustering one: all indexes need to
+	 * have their files swapped. While doing that, store their relation
+	 * references in an array, to handle predicate locks below.
+	 */
+	ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+	nind = 0;
+	foreach(lc, ind_oids_old)
+	{
+		Oid			ind_oid;
+		Relation	index;
+
+		ind_oid = lfirst_oid(lc);
+		index = index_open(ind_oid, AccessExclusiveLock);
+
+		/*
+		 * TODO 1) Do we need to check if ALTER INDEX was executed since the
+		 * new index was created in build_new_indexes()? 2) Specifically for
+		 * the clustering index, should check_index_is_clusterable() be called
+		 * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+		 * table probably blocks all commands that affect the result of
+		 * check_index_is_clusterable().)
+		 */
+		*ind_refs_p = index;
+		ind_refs_p++;
+		nind++;
+	}
+
+	/*
+	 * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+	 * lock is needed to swap the files.
+	 */
+	if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+	/*
+	 * Tuples and pages of the old heap will be gone, but the heap will stay.
+	 */
+	TransferPredicateLocksToHeapRelation(OldHeap);
+	/* The same for indexes. */
+	for (int i = 0; i < nind; i++)
+	{
+		Relation	index = ind_refs[i];
+
+		TransferPredicateLocksToHeapRelation(index);
+
+		/*
+		 * References to indexes on the old relation are not needed anymore,
+		 * however the locks stay until the end of the transaction.
+		 */
+		index_close(index, NoLock);
+	}
+	pfree(ind_refs);
+
+	/*
+	 * Flush anything we see in WAL, to make sure that all changes committed
+	 * while we were waiting for the exclusive lock are available for
+	 * decoding. This should not be necessary if all backends had
+	 * synchronous_commit set, but we can't rely on this setting.
+	 *
+	 * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+	 * position, and GetLastImportantRecPtr() points at the start of the last
+	 * record rather than at the end. Thus the simplest way to determine the
+	 * insert position is to insert a dummy record and use its LSN.
+	 *
+	 * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+	 * last record (plus the total size of all the page headers the record
+	 * spans)?
+	 */
+	XLogBeginInsert();
+	XLogRegisterData(&dummy_rec_data, 1);
+	wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/* Apply the concurrent changes again. */
+	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+
+	/* Remember info about rel before closing OldHeap */
+	relpersistence = OldHeap->rd_rel->relpersistence;
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+	/*
+	 * Even ShareUpdateExclusiveLock should have prevented others from
+	 * creating / dropping indexes (even using the CONCURRENTLY option), so we
+	 * do not need to check whether the lists match.
+	 */
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+		Oid			mapped_tables[4];
+
+		/* Zero out possible results from swap_relation_files() */
+		memset(mapped_tables, 0, sizeof(mapped_tables));
+
+		swap_relation_files(ind_old, ind_new,
+							(old_table_oid == RelationRelationId),
+							false, /* swap_toast_by_content */
+							true,
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Concurrent processing is not supported for system relations, so
+		 * there should be no mapped tables.
+		 */
+		for (int i = 0; i < 4; i++)
+			Assert(mapped_tables[i] == 0);
+#endif
+	}
+
+	/* The new indexes must be visible for deletion. */
+	CommandCounterIncrement();
+
+	/* Close the old heap but keep lock until transaction commit. */
+	table_close(OldHeap, NoLock);
+	/* Close the new heap. (We didn't have to open its indexes). */
+	table_close(NewHeap, NoLock);
+
+	/* Cleanup what we don't need anymore. (And close the identity index.) */
+	pfree(chgdst.ident_key);
+	free_index_insert_state(chgdst.iistate);
+
+	/*
+	 * Swap the relations and their TOAST relations and TOAST indexes. This
+	 * also drops the new relation and its indexes.
+	 *
+	 * (System catalogs are currently not supported.)
+	 */
+	Assert(!is_system_catalog);
+	finish_heap_swap(old_table_oid, new_table_oid,
+					 is_system_catalog,
+					 false, /* swap_toast_by_content */
+					 false, true, false,
+					 frozenXid, cutoffMulti,
+					 relpersistence);
+}
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap. The contained indexes end
+ * up locked using ShareUpdateExclusiveLock.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. Its order matches that of OldIndexes, so the two lists can be
+ * used to swap the index storage.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+	ListCell   *lc;
+	List	   *result = NIL;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+	foreach(lc, OldIndexes)
+	{
+		Oid			ind_oid,
+					ind_oid_new;
+		char	   *newName;
+		Relation	ind;
+
+		ind_oid = lfirst_oid(lc);
+		ind = index_open(ind_oid, ShareUpdateExclusiveLock);
+
+		newName = ChooseRelationName(get_rel_name(ind_oid),
+									 NULL,
+									 "repacknew",
+									 get_rel_namespace(ind->rd_index->indrelid),
+									 false);
+		ind_oid_new = index_create_copy(NewHeap, ind_oid,
+										ind->rd_rel->reltablespace, newName,
+										false);
+		result = lappend_oid(result, ind_oid_new);
+
+		index_close(ind, NoLock);
+	}
+
+	return result;
+}
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index a5c579ce112..f223b27c76f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -892,7 +892,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
-	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
 					 RecentXmin, ReadNextMultiXactId(), relpersistence);
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 07e5b95782e..1bce85e4232 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5992,6 +5992,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 			finish_heap_swap(tab->relid, OIDNewHeap,
 							 false, false, true,
 							 !OidIsValid(tab->newTableSpace),
+							 true,
 							 RecentXmin,
 							 ReadNextMultiXactId(),
 							 persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 6afa203983f..ae8b5d4066f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -126,7 +126,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 							  TransactionId lastSaneFrozenXid,
 							  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-					   BufferAccessStrategy bstrategy);
+					   BufferAccessStrategy bstrategy, bool isTopLevel);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -629,7 +629,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 
 			if (params.options & VACOPT_VACUUM)
 			{
-				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+								isTopLevel))
 					continue;
 			}
 
@@ -1999,7 +2000,7 @@ vac_truncate_clog(TransactionId frozenXID,
  */
 static bool
 vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-		   BufferAccessStrategy bstrategy)
+		   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	LOCKMODE	lmode;
 	Relation	rel;
@@ -2290,7 +2291,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 
 			/* VACUUM FULL is a variant of REPACK; see cluster.c */
 			cluster_rel(REPACK_COMMAND_VACUUMFULL, rel, InvalidOid,
-						&cluster_params);
+						&cluster_params, isTopLevel);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
@@ -2333,7 +2334,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
-		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+				   isTopLevel);
 	}
 
 	/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index b831a541652..5c148131217 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
 subdir('snowball')
 subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..73fc4d30c67 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
 #include "catalog/pg_control.h"
+#include "commands/cluster.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/message.h"
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(buf->record);
 	SnapBuild  *builder = ctx->snapshot_builder;
 
+	/*
+	 * If the change is not intended for logical decoding, do not even
+	 * establish a transaction for it - REPACK CONCURRENTLY is the typical use
+	 * case.
+	 *
+	 * First, check if REPACK CONCURRENTLY is being performed by this backend.
+	 * If so, only decode data changes of the table that it is processing, and
+	 * the changes of its TOAST relation.
+	 *
+	 * (The TOAST locator should not be set unless the main one is.)
+	 */
+	Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+		   OidIsValid(repacked_rel_locator.relNumber));
+
+	if (OidIsValid(repacked_rel_locator.relNumber))
+	{
+		XLogReaderState *r = buf->record;
+		RelFileLocator locator;
+
+		/* Not all records contain the block. */
+		if (XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL) &&
+			!RelFileLocatorEquals(locator, repacked_rel_locator) &&
+			(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+			 !RelFileLocatorEquals(locator, repacked_rel_toast_locator)))
+			return;
+	}
+
+	/*
+	 * Second, skip records which do not contain sufficient information for
+	 * decoding.
+	 *
+	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+	 * when making changes to the new table. Those changes are of no use to
+	 * any other consumer (such as a logical replication subscription)
+	 * because the new table will eventually be dropped (after REPACK
+	 * CONCURRENTLY has assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This does happen when 1) raw_heap_insert marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, 2) REPACK CONCURRENTLY
+				 * replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}
+
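+	/*
+	 * From here on, the record is decoded as usual; in the REPACK
+	 * CONCURRENTLY case, only changes of the repacked table (and its TOAST
+	 * relation) that carry tuple data get this far.
+	 */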
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
@@ -512,6 +595,16 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			break;
 
 		case XLOG_HEAP_TRUNCATE:
+			/* Is REPACK (CONCURRENTLY) being run by this backend? */
+			if (OidIsValid(repacked_rel_locator.relNumber))
+				/*
+				 * TRUNCATE changes the rd_locator of the relation, so it
+				 * would break REPACK (CONCURRENTLY). In fact it should not
+				 * happen, because TRUNCATE needs AccessExclusiveLock on the
+				 * table. Should this just be an Assert()?
+				 */
+				ereport(ERROR,
+						(errmsg("TRUNCATE encountered while doing REPACK (CONCURRENTLY)")));
 			if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
 				!ctx->fast_forward)
 				DecodeTruncate(ctx, buf);
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 34bdd987478..c1edc02c3fb 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,6 +486,27 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	return SnapBuildMVCCFromHistoric(snap, true);
 }
 
+/*
+ * Build an MVCC snapshot for the initial data load performed by REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot setup (so
+ * we do not set MyProc->xmin). XXX Do we need any further restrictions?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+	Snapshot	snap;
+
+	Assert(builder->state == SNAPBUILD_CONSISTENT);
+	Assert(builder->building_full_snapshot);
+
+	snap = SnapBuildBuildSnapshot(builder);
+	return SnapBuildMVCCFromHistoric(snap, false);
+}
+
 /*
  * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
  *
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+#    src/backend/replication/pgoutput_repack/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	$(WIN32RES) \
+	pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+	rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2025, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+  'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+  pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pgoutput_repack',
+    '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+  pgoutput_repack_sources,
+  kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..c8930640a0d
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,240 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ *		Logical Replication output plugin for REPACK command
+ *
+ * Copyright (c) 2012-2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+#include "utils/memutils.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+						   OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+							 ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+							  ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+						  Relation rel, ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+						 ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+	cb->startup_cb = plugin_startup;
+	cb->begin_cb = plugin_begin_txn;
+	cb->change_cb = plugin_change;
+	cb->commit_cb = plugin_commit_txn;
+	cb->shutdown_cb = plugin_shutdown;
+}
+
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+			   bool is_init)
+{
+	ctx->output_plugin_private = NULL;
+
+	/* Probably unnecessary, as we don't use the SQL interface ... */
+	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+	if (ctx->output_plugin_options != NIL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("This plugin does not expect any options")));
+	}
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * As we don't release the slot while processing a particular table, there is
+ * no room for an SQL interface, even for debugging purposes. Therefore we need
+ * neither OutputPluginPrepareWrite() nor OutputPluginWrite() in the plugin
+ * callbacks. (Although we might want to write custom callbacks, this API
+ * seems to be unnecessarily generic for our purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				  XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+			  Relation relation, ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Only interested in one particular relation. */
+	if (relation->rd_id != dstate->relid)
+		return;
+
+	/* Decode entry depending on its type */
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			{
+				HeapTuple	newtuple;
+
+				newtuple = change->data.tp.newtuple;
+
+				/*
+				 * Identity checks in the main function should have made this
+				 * impossible.
+				 */
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete insert info");
+
+				store_change(ctx, CHANGE_INSERT, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			{
+				HeapTuple	oldtuple,
+							newtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+				newtuple = change->data.tp.newtuple;
+
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete update info");
+
+				if (oldtuple != NULL)
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			{
+				HeapTuple	oldtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+
+				if (oldtuple == NULL)
+					elog(ERROR, "incomplete delete info");
+
+				store_change(ctx, CHANGE_DELETE, oldtuple);
+			}
+			break;
+		default:
+			/* Should not come here */
+			Assert(false);
+			break;
+	}
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+			 HeapTuple tuple)
+{
+	RepackDecodingState *dstate;
+	char	   *change_raw;
+	ConcurrentChange change;
+	bool		flattened = false;
+	Size		size;
+	Datum		values[1];
+	bool		isnull[1];
+	char	   *dst;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	size = VARHDRSZ + SizeOfConcurrentChange;
+
+	/*
+	 * ReorderBufferCommit() stores the TOAST chunks in its private memory
+	 * context and frees them after having called apply_change().  Therefore
+	 * we need a flat copy (including TOAST) that we eventually copy into the
+	 * memory context which is available to decode_concurrent_changes().
+	 */
+	if (HeapTupleHasExternal(tuple))
+	{
+		/*
+		 * toast_flatten_tuple_to_datum() might be more convenient but we
+		 * don't want the decompression it does.
+		 */
+		tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+		flattened = true;
+	}
+
+	size += tuple->t_len;
+	if (size >= MaxAllocSize)
+		elog(ERROR, "Change is too big.");
+
+	/* Construct the change. */
+	change_raw = (char *) palloc0(size);
+	SET_VARSIZE(change_raw, size);
+
+	/*
+	 * Since the varlena alignment might not be sufficient for the structure,
+	 * set the fields in a local instance and remember where it should
+	 * eventually be copied.
+	 */
+	change.kind = kind;
+	dst = (char *) VARDATA(change_raw);
+
+	/*
+	 * Copy the tuple.
+	 *
+	 * Note: change->tup_data.t_data must be fixed on retrieval!
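+	 * (get_changed_tuple() in cluster.c later re-aligns the structure and
+	 * fixes the pointer.)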
+	 */
+	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+	memcpy(dst, &change, SizeOfConcurrentChange);
+	dst += SizeOfConcurrentChange;
+	memcpy(dst, tuple->t_data, tuple->t_len);
+
+	/* The data has been copied. */
+	if (flattened)
+		pfree(tuple);
+
+	/* Store as tuple of 1 bytea column. */
+	values[0] = PointerGetDatum(change_raw);
+	isnull[0] = false;
+	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+						 values, isnull);
+
+	/* Accounting. */
+	dstate->nchanges++;
+
+	/* Cleanup. */
+	pfree(change_raw);
+}
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index cd3e43c448a..519f3953638 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
 
 die
   "$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
-  . " missing from lwlocklist.h"
+  . "missing from lwlocklist.h"
   if $lwlock_count < scalar @wait_event_lwlocks;
 
 die
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 886060305f5..fbb3d66bbd9 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -214,7 +214,6 @@ static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -659,7 +658,7 @@ CopySnapshot(Snapshot snapshot)
  * FreeSnapshot
  *		Free the memory associated with a snapshot.
  */
-static void
+void
 FreeSnapshot(Snapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 626d9f1c98b..0fcf343d3af 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -5075,8 +5075,8 @@ match_previous_words(int pattern_id,
 		 * one word, so the above test is correct.
 		 */
 		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
-			COMPLETE_WITH("ANALYZE", "VERBOSE");
-		else if (TailMatches("ANALYZE", "VERBOSE"))
+			COMPLETE_WITH("ANALYZE", "CONCURRENTLY", "VERBOSE");
+		else if (TailMatches("ANALYZE", "CONCURRENTLY", "VERBOSE"))
 			COMPLETE_WITH("ON", "OFF");
 	}
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..b7cd25896f6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -361,14 +361,15 @@ extern void heap_multi_insert(Relation relation, TupleTableSlot **slots,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
-							 TM_FailureData *tmfd, bool changingPart);
+							 TM_FailureData *tmfd, bool changingPart,
+							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
 extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
 extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
 							 HeapTuple newtup,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes);
+							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..2cc49fd48de 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
 #define XLH_DELETE_CONTAINS_OLD_KEY				(1<<2)
 #define XLH_DELETE_IS_SUPER						(1<<3)
 #define XLH_DELETE_IS_PARTITION_MOVE			(1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL					(1<<5)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLH_DELETE_CONTAINS_OLD						\
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d8f76d325f9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -629,6 +630,8 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
+											  Snapshot snapshot,
+											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1646,6 +1649,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ *	 (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1658,6 +1665,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
+								Snapshot snapshot,
+								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1666,6 +1675,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
+													snapshot, decoding_ctx,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 652542e8e65..b43a1740053 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
 #ifndef CLUSTER_H
 #define CLUSTER_H
 
+#include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "replication/logical.h"
 #include "storage/lock.h"
+#include "storage/relfilelocator.h"
 #include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -25,6 +30,8 @@
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
 #define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
+#define CLUOPT_CONCURRENT 0x10	/* allow concurrent data changes */
+
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -33,14 +40,94 @@ typedef struct ClusterParams
 } ClusterParams;
 
 
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+extern RelFileLocator repacked_rel_locator;
+extern RelFileLocator repacked_rel_toast_locator;
+
+typedef enum
+{
+	CHANGE_INSERT,
+	CHANGE_UPDATE_OLD,
+	CHANGE_UPDATE_NEW,
+	CHANGE_DELETE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+	/* See the enum above. */
+	ConcurrentChangeKind kind;
+
+	/*
+	 * The actual tuple.
+	 *
+	 * The tuple data follows the ConcurrentChange structure. Before use, make
+	 * sure the structure is correctly aligned (ConcurrentChange can be stored
+	 * as bytea) and that tup_data.t_data is fixed.
+	 */
+	HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+								sizeof(HeapTupleData))
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents is being copied to a new storage. Also the necessary metadata
+ * needed to apply these changes to the table is stored here.
+ */
+typedef struct RepackDecodingState
+{
+	/* The relation whose changes we're decoding. */
+	Oid			relid;
+
+	/* Replication slot name. */
+	NameData	slotname;
+
+	/*
+	 * Decoded changes are stored here. Although we try to avoid excessive
+	 * batches, it can happen that the changes need to spill to disk. The
+	 * tuplestore does this transparently.
+	 */
+	Tuplestorestate *tstore;
+
+	/* The current number of changes in tstore. */
+	double		nchanges;
+
+	/*
+	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
+	 * We can't store the tuple directly because tuplestore only supports
+	 * minimum tuple and we may need to transfer OID system column from the
+	 * output plugin. Also we need to transfer the change kind, so it's better
+	 * to put everything in the structure than to use 2 tuplestores "in
+	 * parallel".
+	 */
+	TupleDesc	tupdesc_change;
+
+	/* Tuple descriptor needed to update indexes. */
+	TupleDesc	tupdesc;
+
+	/* Slot to retrieve data from tstore. */
+	TupleTableSlot *tsslot;
+
+	ResourceOwner resowner;
+} RepackDecodingState;
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
-						ClusterParams *params);
+						ClusterParams *params, bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+											 XLogRecPtr end_of_wal);
+
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -48,6 +135,7 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 bool swap_toast_by_content,
 							 bool check_constraints,
 							 bool is_internal,
+							 bool reindex,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index ebf004b7aa5..5024fea5e2e 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -69,10 +69,12 @@
 #define PROGRESS_REPACK_PHASE					1
 #define PROGRESS_REPACK_INDEX_RELID				2
 #define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED	4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED		5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED		6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		9
 
 /*
  * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -81,9 +83,10 @@
 #define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
 #define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
 #define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+#define PROGRESS_REPACK_PHASE_CATCH_UP			5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		8
 
 /* Progress parameters for CREATE INDEX */
 /* 3, 4 and 5 reserved for "waitfor" metrics */
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 6d4d2d1814c..802fc4b0823 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 7f3ba0352f6..2739327b0da 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
 #define AccessShareLock			1	/* SELECT */
 #define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
 #define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL), ANALYZE, CREATE
-									 * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4	/* VACUUM (non-exclusive), ANALYZE, CREATE
+									 * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
 #define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
 #define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
 									 * SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index f65f83c85cd..1f821fd2ccd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,8 @@ extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index c85034eb8cc..a9769f1d99f 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -14,12 +14,15 @@ REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
 ISOLATION = basic \
 	    inplace \
+	    repack \
+	    repack_toast \
 	    syscache-update-pruned \
 	    index-concurrently-upsert \
 	    index-concurrently-upsert-predicate \
 	    reindex-concurrently-upsert \
 	    reindex-concurrently-upsert-on-constraint \
 	    reindex-concurrently-upsert-partitioned
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
 
 # The injection points are cluster-wide, so disable installcheck
 NO_INSTALLCHECK = 1
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing: 
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+
+step change_new: 
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1: 
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+
+step change_subxact2: 
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    2
+(1 row)
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/expected/repack_toast.out b/src/test/modules/injection_points/expected/repack_toast.out
new file mode 100644
index 00000000000..4f866a74e32
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack_toast.out
@@ -0,0 +1,64 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test;
+ <waiting ...>
+step change: 
+	UPDATE repack_test SET j=get_long_string() where i=2;
+	DELETE FROM repack_test WHERE i=3;
+	INSERT INTO repack_test(i, j) VALUES (4, get_long_string());
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    4
+(1 row)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..e3d257315fa
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 8d6f662040d..b72bfb8ff06 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -45,6 +45,8 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'repack',
+      'repack_toast',
       'syscache-update-pruned',
       'index-concurrently-upsert',
       'index-concurrently-upsert-predicate',
@@ -55,5 +57,7 @@ tests += {
     'runningcheck': false, # see syscache-update-pruned
     # Some tests wait for all snapshots, so avoid parallel execution
     'runningcheck-parallel': false,
+    # 'repack' requires wal_level = 'logical'.
+    'regress_args': ['--temp-config', files('logical.conf')],
   },
 }
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..d727a9b056b
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,142 @@
+# REPACK (CONCURRENTLY) ... USING INDEX ...;
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+	INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j int);
+	CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+	SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice, to test whether a tuple version generated by this
+# session can be found.
+step change_existing
+{
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key
+# and non-key columns.
+step change_new
+{
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change_existing
+	change_new
+	change_subxact1
+	change_subxact2
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/modules/injection_points/specs/repack_toast.spec b/src/test/modules/injection_points/specs/repack_toast.spec
new file mode 100644
index 00000000000..b48abf21450
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack_toast.spec
@@ -0,0 +1,105 @@
+# REPACK (CONCURRENTLY);
+#
+# Test handling of TOAST. At the same time, exercise the code path that does
+# not use tuplesort.
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	-- Return a string that needs to be TOASTed.
+	CREATE FUNCTION get_long_string()
+	RETURNS text
+	LANGUAGE sql as $$
+		SELECT string_agg(chr(65 + trunc(25 * random())::int), '')
+		FROM generate_series(1, 2048) s(x);
+	$$;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j text);
+	INSERT INTO repack_test(i, j) VALUES (1, get_long_string()),
+		(2, get_long_string()), (3, get_long_string());
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j text);
+	CREATE TABLE data_s2(i int, j text);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+	DROP FUNCTION get_long_string();
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+	SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+step change
+{
+	UPDATE repack_test SET j=get_long_string() where i=2;
+	DELETE FROM repack_test WHERE i=3;
+	INSERT INTO repack_test(i, j) VALUES (4, get_long_string());
+}
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 69bf6b1baf6..44042832c79 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2007,7 +2007,7 @@ pg_stat_progress_cluster| SELECT pid,
     phase,
     repack_index_relid AS cluster_index_relid,
     heap_tuples_scanned,
-    heap_tuples_written,
+    (heap_tuples_inserted + heap_tuples_updated) AS heap_tuples_written,
     heap_blks_total,
     heap_blks_scanned,
     index_rebuild_count
@@ -2087,17 +2087,20 @@ pg_stat_progress_repack| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 5 THEN 'catch-up'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS repack_index_relid,
     s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param5 AS heap_tuples_inserted,
+    s.param6 AS heap_tuples_updated,
+    s.param7 AS heap_tuples_deleted,
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4f3c7c160a6..3139b14e85f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -411,6 +411,7 @@ CatCacheHeader
 CatalogId
 CatalogIdMapEntry
 CatalogIndexState
+ChangeDest
 ChangeVarNodes_callback
 ChangeVarNodes_context
 CheckPoint
@@ -487,6 +488,8 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
 ConditionalStack
@@ -1264,6 +1267,7 @@ IndexElem
 IndexFetchHeapData
 IndexFetchTableData
 IndexInfo
+IndexInsertState
 IndexList
 IndexOnlyScan
 IndexOnlyScanState
@@ -2558,6 +2562,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 RepackCommand
+RepackDecodingState
 RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
-- 
2.47.3

v27-0005-Use-background-worker-to-do-logical-decoding.patch (text/x-diff)
From 3763e71b4b08bef32c799e69a9f142df887095bc Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Tue, 9 Dec 2025 19:44:43 +0100
Subject: [PATCH 5/5] Use background worker to do logical decoding.

If the backend performing REPACK (CONCURRENTLY) does both data copying and
logical decoding, it has to "travel in time" back and forth and therefore it
has to invalidate system caches quite a few times. (The copying and the
decoding work with different catalog snapshots.) As the decoding worker has
separate caches, the switching is not necessary.

Without the worker, it'd also be difficult to switch between potentially
long-running tasks like index build and WAL decoding. (If no decoding takes
place during that time, archiving / recycling of WAL segments can be held
back for a while, which in turn may fill the disk.)

Another problem is that, after having acquired AccessExclusiveLock (in order
to swap the files), the backend needs to both decode and apply the data
changes that took place while it was waiting for the lock. With the decoding
worker, the decoding runs all the time, so the backend only needs to apply the
changes. This can reduce the time the exclusive lock is held for.

Note that the code added in order to handle ERRORs in the background worker
almost duplicates the existing code that does the same for other types of
workers (see ProcessParallelMessages() and
ProcessParallelApplyMessages()). Refactoring the existing code might be
useful to reduce the duplication.
---
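In short, the handshake between the backend and the decoding worker works
like this (a condensed sketch of the code below; error paths, progress
reporting and the initial-snapshot exchange are omitted; file_is_valid() is
hypothetical shorthand for the spinlock-protected check of sfs_valid, and
last_batch stands for the caller's 'done' flag):

	/* backend: request all changes up to 'end_of_wal', then wait */
	SpinLockAcquire(&shared->mutex);
	shared->lsn_upto = end_of_wal;
	shared->done = last_batch;	/* worker may exit afterwards */
	SpinLockRelease(&shared->mutex);

	ConditionVariablePrepareToSleep(&shared->cv);
	while (!file_is_valid(shared))	/* reads shared->sfs_valid under the mutex */
		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
	ConditionVariableCancelSleep();

	/* worker: once decoding has passed lsn_upto, publish the output file */
	BufFileClose(dstate->file);
	SpinLockAcquire(&shared->mutex);
	shared->sfs_valid = true;
	shared->last_exported++;
	SpinLockRelease(&shared->mutex);
	ConditionVariableSignal(&shared->cv);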
 src/backend/access/heap/heapam_handler.c      |   44 -
 src/backend/commands/cluster.c                | 1139 +++++++++++++----
 src/backend/libpq/pqmq.c                      |    5 +
 src/backend/postmaster/bgworker.c             |    4 +
 src/backend/replication/logical/logical.c     |    6 +-
 .../pgoutput_repack/pgoutput_repack.c         |   54 +-
 src/backend/storage/ipc/procsignal.c          |    4 +
 src/backend/tcop/postgres.c                   |    4 +
 .../utils/activity/wait_event_names.txt       |    2 +
 src/include/access/tableam.h                  |    7 +-
 src/include/commands/cluster.h                |   68 +-
 src/include/storage/procsignal.h              |    1 +
 src/tools/pgindent/typedefs.list              |    4 +-
 13 files changed, 932 insertions(+), 410 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 01be29eb405..e6d630fa2f7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,7 +33,6 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
-#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -688,7 +687,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
 								 Snapshot snapshot,
-								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
@@ -710,7 +708,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
 	bool		concurrent = snapshot != NULL;
-	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -957,31 +954,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
-
-		/*
-		 * Process the WAL produced by the load, as well as by other
-		 * transactions, so that the replication slot can advance and WAL does
-		 * not pile up. Use wal_segment_size as a threshold so that we do not
-		 * introduce the decoding overhead too often.
-		 *
-		 * Of course, we must not apply the changes until the initial load has
-		 * completed.
-		 *
-		 * Note that our insertions into the new table should not be decoded
-		 * as we (intentionally) do not write the logical decoding specific
-		 * information to WAL.
-		 */
-		if (concurrent)
-		{
-			XLogRecPtr	end_of_wal;
-
-			end_of_wal = GetFlushRecPtr(NULL);
-			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
-			{
-				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
-				end_of_wal_prev = end_of_wal;
-			}
-		}
 	}
 
 	if (indexScan != NULL)
@@ -1027,22 +999,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			/* Report n_tuples */
 			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
-
-			/*
-			 * Try to keep the amount of not-yet-decoded WAL small, like
-			 * above.
-			 */
-			if (concurrent)
-			{
-				XLogRecPtr	end_of_wal;
-
-				end_of_wal = GetFlushRecPtr(NULL);
-				if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
-				{
-					repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
-					end_of_wal_prev = end_of_wal;
-				}
-			}
 		}
 
 		tuplesort_end(tuplesort);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 501bd36c23e..f2a2ec6d3e5 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -46,6 +46,8 @@
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
@@ -56,6 +58,8 @@
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/procsignal.h"
+#include "tcop/tcopprot.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
@@ -123,6 +127,103 @@ typedef struct ChangeDest
 	IndexInsertState *iistate;
 } ChangeDest;
 
+/*
+ * Layout of shared memory used for communication between backend and the
+ * worker that performs logical decoding of data changes
+ */
+typedef struct DecodingWorkerShared
+{
+	/*
+	 * Once the worker has reached this LSN, it should close the current
+	 * output file and either create a new one or exit, according to the field
+	 * 'done'. If the value is InvalidXLogRecPtr, the worker should decode all
+	 * the WAL available and keep checking this field. It is OK if the worker
+	 * has already decoded records whose LSN is >= lsn_upto by the time this
+	 * field is set.
+	 */
+	XLogRecPtr	lsn_upto;
+
+	/* Exit after closing the current file? */
+	bool		done;
+
+	/* The output is stored here. */
+	SharedFileSet sfs;
+
+	/* Can backend read the file contents? */
+	bool		sfs_valid;
+
+	/* Number of the last file exported by the worker. */
+	int			last_exported;
+
+	/* Synchronize access to the fields above. */
+	slock_t		mutex;
+
+	/* Database to connect to. */
+	Oid			dbid;
+
+	/* Role to connect as. */
+	Oid			roleid;
+
+	/* Decode data changes of this relation. */
+	Oid			relid;
+
+	/* The backend uses this to wait for the worker. */
+	ConditionVariable cv;
+
+	/* Info to signal the backend. */
+	PGPROC	   *backend_proc;
+	pid_t		backend_pid;
+	ProcNumber	backend_proc_number;
+
+	/* Error queue. */
+	shm_mq	   *error_mq;
+
+	/*
+	 * Memory the queue is located in.
+	 *
+	 * For considerations on the size, see the comments for
+	 * PARALLEL_ERROR_QUEUE_SIZE.
+	 */
+#define REPACK_ERROR_QUEUE_SIZE			16384
+	char		error_queue[FLEXIBLE_ARRAY_MEMBER];
+} DecodingWorkerShared;
+
+/*
+ * Generate the output file name. If relations with the same 'relid' happen
+ * to be processed at the same time, they must be from different databases,
+ * and therefore different backends must be involved. (The PID is already
+ * part of the fileset name.)
+ */
+static inline void
+DecodingWorkerFileName(char *fname, Oid relid, uint32 seq)
+{
+	snprintf(fname, MAXPGPATH, "%u-%u", relid, seq);
+}
+
+/*
+ * Backend-local information to control the decoding worker.
+ */
+typedef struct DecodingWorker
+{
+	/* The worker. */
+	BackgroundWorkerHandle *handle;
+
+	/* DecodingWorkerShared is in this segment. */
+	dsm_segment *seg;
+
+	/* Handle of the error queue. */
+	shm_mq_handle *error_mqh;
+} DecodingWorker;
+
+/* Pointer to currently running decoding worker. */
+static DecodingWorker *decoding_worker = NULL;
+
+/*
+ * Is there a message sent by a repack worker that the backend needs to
+ * receive?
+ */
+volatile sig_atomic_t RepackMessagePending = false;
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
 								Oid indexOid, Oid userid, LOCKMODE lmode,
 								int options);
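An aside on the fileset sequencing implied by 'last_exported' (initialized
to -1): the worker's first export, sequence 0, carries the serialized
initial snapshot, and each subsequent file carries one batch of decoded
changes. A minimal sketch, using a hypothetical relid of 16384:

	char		fname[MAXPGPATH];

	/* first export -> "16384-0", holding the initial snapshot */
	DecodingWorkerFileName(fname, 16384, shared->last_exported + 1);

	/* later exports -> "16384-1", "16384-2", ... with decoded changes */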
@@ -130,7 +231,7 @@ static void check_repack_concurrently_requirements(Relation rel);
 static void rebuild_relation(Relation OldHeap, Relation index, bool verbose,
 							 bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							Snapshot snapshot,
 							bool verbose,
 							bool *pSwapToastByContent,
 							TransactionId *pFreezeXid,
@@ -143,12 +244,10 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
 
-static void begin_concurrent_repack(Relation rel);
-static void end_concurrent_repack(void);
 static LogicalDecodingContext *setup_logical_decoding(Oid relid);
-static HeapTuple get_changed_tuple(char *change);
-static void apply_concurrent_changes(RepackDecodingState *dstate,
-									 ChangeDest *dest);
+static bool decode_concurrent_changes(LogicalDecodingContext *ctx,
+									  DecodingWorkerShared *shared);
+static void apply_concurrent_changes(BufFile *file, ChangeDest *dest);
 static void apply_concurrent_insert(Relation rel, HeapTuple tup,
 									IndexInsertState *iistate,
 									TupleTableSlot *index_slot);
@@ -160,9 +259,9 @@ static void apply_concurrent_delete(Relation rel, HeapTuple tup_target);
 static HeapTuple find_target_tuple(Relation rel, ChangeDest *dest,
 								   HeapTuple tup_key,
 								   TupleTableSlot *ident_slot);
-static void process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
-									   XLogRecPtr end_of_wal,
-									   ChangeDest *dest);
+static void process_concurrent_changes(XLogRecPtr end_of_wal,
+									   ChangeDest *dest,
+									   bool done);
 static IndexInsertState *get_index_insert_state(Relation relation,
 												Oid ident_index_id,
 												Relation *ident_index_p);
@@ -172,7 +271,6 @@ static void free_index_insert_state(IndexInsertState *iistate);
 static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
 static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 											   Relation cl_index,
-											   LogicalDecodingContext *decoding_ctx,
 											   TransactionId frozenXid,
 											   MultiXactId cutoffMulti);
 static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
@@ -182,6 +280,13 @@ static Relation process_single_relation(RepackStmt *stmt,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
+static void start_decoding_worker(Oid relid);
+static void stop_decoding_worker(void);
+static void repack_worker_internal(dsm_segment *seg);
+static void export_initial_snapshot(Snapshot snapshot,
+									DecodingWorkerShared *shared);
+static Snapshot get_initial_snapshot(DecodingWorker *worker);
+static void ProcessRepackMessage(StringInfo msg);
 static const char *RepackCommandAsString(RepackCommand cmd);
 
 
@@ -604,20 +709,20 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	/* rebuild_relation does all the dirty work */
 	PG_TRY();
 	{
-		/*
-		 * For concurrent processing, make sure that our logical decoding
-		 * ignores data changes of other tables than the one we are
-		 * processing.
-		 */
-		if (concurrent)
-			begin_concurrent_repack(OldHeap);
-
 		rebuild_relation(OldHeap, index, verbose, concurrent);
 	}
 	PG_FINALLY();
 	{
 		if (concurrent)
-			end_concurrent_repack();
+		{
+			/*
+			 * During normal operation the worker has already been asked to
+			 * exit, so stopping it explicitly matters mostly on ERROR.
+			 * However, it still seems good practice to make sure that the
+			 * worker never survives the REPACK command.
+			 */
+			stop_decoding_worker();
+		}
 	}
 	PG_END_TRY();
 
@@ -914,7 +1019,6 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
-	LogicalDecodingContext *decoding_ctx = NULL;
 	Snapshot	snapshot = NULL;
 #if USE_ASSERT_CHECKING
 	LOCKMODE	lmode;
@@ -928,19 +1032,36 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	if (concurrent)
 	{
 		/*
-		 * Prepare to capture the concurrent data changes.
+		 * The worker needs to be member of the locking group we're the leader
+		 * of. We ought to become the leader before the worker starts. The
+		 * worker will join the group as soon as it starts.
+		 *
+		 * This is to make sure that the deadlock described below is
+		 * detectable by deadlock.c: if the worker waits for a transaction to
+		 * complete and we are waiting for the worker output, then effectively
+		 * we (i.e. this backend) are waiting for that transaction.
+		 */
+		BecomeLockGroupLeader();
+
+		/*
+		 * Start the worker that decodes data changes applied while we're
+		 * copying the table contents.
 		 *
-		 * Note that this call waits for all transactions with XID already
-		 * assigned to finish. If some of those transactions is waiting for a
-		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
-		 * it runs CREATE INDEX), we can end up in a deadlock. Not sure this
-		 * risk is worth unlocking/locking the table (and its clustering
-		 * index) and checking again if its still eligible for REPACK
-		 * CONCURRENTLY.
+		 * Note that the worker has to wait for all transactions with an XID
+		 * already assigned to finish. If one of those transactions is
+		 * waiting for a lock conflicting with ShareUpdateExclusiveLock on
+		 * our table (e.g. it runs CREATE INDEX), we can end up in a
+		 * deadlock. Not sure this risk is worth unlocking/locking the table
+		 * (and its clustering index) and checking again if it's still
+		 * eligible for REPACK CONCURRENTLY.
+		 */
+		start_decoding_worker(tableOid);
+
+		/*
+		 * Wait until the worker has the initial snapshot and retrieve it.
 		 */
-		decoding_ctx = setup_logical_decoding(tableOid);
+		snapshot = get_initial_snapshot(decoding_worker);
 
-		snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
 		PushActiveSnapshot(snapshot);
 	}
 
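The lock-group pairing referred to in the comment above, condensed into a
sketch; the worker-side counterpart appears later in this patch, in
RepackWorkerMain():

	/* backend, before registering the worker */
	BecomeLockGroupLeader();

	/* worker, early in RepackWorkerMain() */
	if (!BecomeLockGroupMember(shared->backend_proc, shared->backend_pid))
		return;			/* the leader is not running anymore */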
@@ -965,7 +1086,7 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, snapshot, decoding_ctx, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
 	/* The historic snapshot won't be needed anymore. */
@@ -989,15 +1110,11 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 
 		Assert(!swap_toast_by_content);
 		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
-										   decoding_ctx,
 										   frozenXid, cutoffMulti);
 		PopActiveSnapshot();
 
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
-
-		/* Done with decoding. */
-		cleanup_logical_decoding(decoding_ctx);
 	}
 	else
 	{
@@ -1168,8 +1285,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
  */
 static void
 copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
-				bool verbose, bool *pSwapToastByContent,
+				Snapshot snapshot, bool verbose, bool *pSwapToastByContent,
 				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
@@ -1330,7 +1446,6 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
 									cutoffs.OldestXmin, snapshot,
-									decoding_ctx,
 									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
@@ -2363,59 +2478,6 @@ RepackCommandAsString(RepackCommand cmd)
 	return "???";
 }
 
-
-/*
- * Call this function before REPACK CONCURRENTLY starts to setup logical
- * decoding. It makes sure that other users of the table put enough
- * information into WAL.
- *
- * The point is that at various places we expect that the table we're
- * processing is treated like a system catalog. For example, we need to be
- * able to scan it using a "historic snapshot" anytime during the processing
- * (as opposed to scanning only at the start point of the decoding, as logical
- * replication does during initial table synchronization), in order to apply
- * concurrent UPDATE / DELETE commands.
- *
- * Note that TOAST table needs no attention here as it's not scanned using
- * historic snapshot.
- */
-static void
-begin_concurrent_repack(Relation rel)
-{
-	Oid			toastrelid;
-
-	/*
-	 * Avoid logical decoding of other relations by this backend. The lock we
-	 * have guarantees that the actual locator cannot be changed concurrently:
-	 * TRUNCATE needs AccessExclusiveLock.
-	 */
-	Assert(CheckRelationLockedByMe(rel, ShareUpdateExclusiveLock, false));
-	repacked_rel_locator = rel->rd_locator;
-	toastrelid = rel->rd_rel->reltoastrelid;
-	if (OidIsValid(toastrelid))
-	{
-		Relation	toastrel;
-
-		/* Avoid logical decoding of other TOAST relations. */
-		toastrel = table_open(toastrelid, AccessShareLock);
-		repacked_rel_toast_locator = toastrel->rd_locator;
-		table_close(toastrel, AccessShareLock);
-	}
-}
-
-/*
- * Call this when done with REPACK CONCURRENTLY.
- */
-static void
-end_concurrent_repack(void)
-{
-	/*
-	 * Restore normal function of (future) logical decoding for this backend.
-	 */
-	repacked_rel_locator.relNumber = InvalidOid;
-	repacked_rel_toast_locator.relNumber = InvalidOid;
-}
-
 /*
  * This function is much like pg_create_logical_replication_slot() except that
  * the new slot is neither released (if anyone else could read changes from
@@ -2427,9 +2489,10 @@ static LogicalDecodingContext *
 setup_logical_decoding(Oid relid)
 {
 	Relation	rel;
-	TupleDesc	tupdesc;
+	Oid			toastrelid;
 	LogicalDecodingContext *ctx;
-	RepackDecodingState *dstate = palloc0_object(RepackDecodingState);
+	NameData	slotname;
+	RepackDecodingState *dstate;
 
 	/*
 	 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
@@ -2437,21 +2500,21 @@ setup_logical_decoding(Oid relid)
 	 */
 	Assert(!TransactionIdIsValid(GetTopTransactionIdIfAny()));
 
-	/*
-	 * A single backend should not execute multiple REPACK commands at a time,
-	 * so use PID to make the slot unique.
-	 */
-	snprintf(NameStr(dstate->slotname), NAMEDATALEN, "repack_%d", MyProcPid);
-
 	/*
 	 * Check if we can use logical decoding.
 	 */
 	CheckSlotPermissions();
 	CheckLogicalDecodingRequirements();
 
-	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
-	ReplicationSlotCreate(NameStr(dstate->slotname), true, RS_TEMPORARY,
-						  false, false, false);
+	/*
+	 * A single backend should not execute multiple REPACK commands at a time,
+	 * so use PID to make the slot unique.
+	 *
+	 * RS_TEMPORARY so that the slot gets cleaned up on ERROR.
+	 */
+	snprintf(NameStr(slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+	ReplicationSlotCreate(NameStr(slotname), true, RS_TEMPORARY, false, false,
+						  false);
 
 	/*
 	 * Neither prepare_write nor do_write callback nor update_progress is
@@ -2473,197 +2536,238 @@ setup_logical_decoding(Oid relid)
 
 	DecodingContextFindStartpoint(ctx);
 
+	/*
+	 * decode_concurrent_changes() needs a non-blocking read callback.
+	 */
+	ctx->reader->routine.page_read = read_local_xlog_page_no_wait;
+
+	/*
+	 * read_local_xlog_page_no_wait() needs to be able to indicate the end of
+	 * WAL.
+	 */
+	ctx->reader->private_data = MemoryContextAllocZero(ctx->context,
+													  sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+
 	/* Some WAL records should have been read. */
 	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
 
+	/*
+	 * Initialize repack_current_segment so that we can notice WAL segment
+	 * boundaries.
+	 */
 	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
 				wal_segment_size);
 
-	/*
-	 * Setup structures to store decoded changes.
-	 */
+	dstate = palloc0_object(RepackDecodingState);
 	dstate->relid = relid;
-	dstate->tstore = tuplestore_begin_heap(false, false,
-										   maintenance_work_mem);
 
-	/* Caller should already have the table locked. */
-	rel = table_open(relid, NoLock);
-	tupdesc = CreateTupleDescCopy(RelationGetDescr(rel));
-	dstate->tupdesc = tupdesc;
-	table_close(rel, NoLock);
+	/*
+	 * The tuple descriptor may be needed to flatten a tuple before writing
+	 * it to a file. A copy is needed because the decoding worker invalidates
+	 * system caches before it starts to do the actual work.
+	 */
+	rel = table_open(relid, AccessShareLock);
+	dstate->tupdesc = CreateTupleDescCopy(RelationGetDescr(rel));
 
-	/* Initialize the descriptor to store the changes ... */
-	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+	/* Avoid logical decoding of other relations. */
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
 
-	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
-	/* ... as well as the corresponding slot. */
-	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
-											  &TTSOpsMinimalTuple);
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+	table_close(rel, AccessShareLock);
 
-	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
-										   "logical decoding");
+	/* The file will be set as soon as we have it opened. */
+	dstate->file = NULL;
 
 	ctx->output_writer_private = dstate;
+
 	return ctx;
 }
 
 /*
- * Retrieve tuple from ConcurrentChange structure.
+ * Decode logical changes from the WAL and write them to a file.
  *
- * The input data starts with the structure but it might not be appropriately
- * aligned.
+ * If true is returned, there is no more work for the worker.
  */
-static HeapTuple
-get_changed_tuple(char *change)
+static bool
+decode_concurrent_changes(LogicalDecodingContext *ctx,
+						  DecodingWorkerShared *shared)
 {
-	HeapTupleData tup_data;
-	HeapTuple	result;
-	char	   *src;
+	RepackDecodingState *dstate;
+	XLogRecPtr	lsn_upto;
+	bool		done;
+	char		fname[MAXPGPATH];
 
-	/*
-	 * Ensure alignment before accessing the fields. (This is why we can't use
-	 * heap_copytuple() instead of this function.)
-	 */
-	src = change + offsetof(ConcurrentChange, tup_data);
-	memcpy(&tup_data, src, sizeof(HeapTupleData));
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
-	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
-	memcpy(result, &tup_data, sizeof(HeapTupleData));
-	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
-	src = change + SizeOfConcurrentChange;
-	memcpy(result->t_data, src, result->t_len);
+	/* Open the output file. */
+	DecodingWorkerFileName(fname, shared->relid, shared->last_exported + 1);
+	dstate->file = BufFileCreateFileSet(&shared->sfs.fs, fname);
 
-	return result;
-}
+	SpinLockAcquire(&shared->mutex);
+	lsn_upto = shared->lsn_upto;
+	done = shared->done;
+	SpinLockRelease(&shared->mutex);
 
-/*
- * Decode logical changes from the WAL sequence up to end_of_wal.
- */
-void
-repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
-								 XLogRecPtr end_of_wal)
-{
-	RepackDecodingState *dstate;
-	ResourceOwner resowner_old;
-
-	/*
-	 * Invalidate the "present" cache before moving to "(recent) history".
-	 */
-	InvalidateSystemCaches();
+	while (XLogRecPtrIsInvalid(lsn_upto) || ctx->reader->EndRecPtr < lsn_upto)
+	{
+		XLogRecord *record;
+		XLogSegNo	segno_new;
+		char	   *errm = NULL;
+		XLogRecPtr	end_lsn;
 
-	dstate = (RepackDecodingState *) ctx->output_writer_private;
-	resowner_old = CurrentResourceOwner;
-	CurrentResourceOwner = dstate->resowner;
+		CHECK_FOR_INTERRUPTS();
 
-	PG_TRY();
-	{
-		while (ctx->reader->EndRecPtr < end_of_wal)
+		record = XLogReadRecord(ctx->reader, &errm);
+		if (record == NULL)
 		{
-			XLogRecord *record;
-			XLogSegNo	segno_new;
-			char	   *errm = NULL;
-			XLogRecPtr	end_lsn;
+			ReadLocalXLogPageNoWaitPrivate *priv;
 
-			record = XLogReadRecord(ctx->reader, &errm);
 			if (errm)
-				elog(ERROR, "%s", errm);
-
-			if (record != NULL)
-				LogicalDecodingProcessRecord(ctx, ctx->reader);
+				ereport(ERROR, (errmsg("%s", errm)));
 
 			/*
-			 * If WAL segment boundary has been crossed, inform the decoding
-			 * system that the catalog_xmin can advance. (We can confirm more
-			 * often, but a filling a single WAL segment should not take much
-			 * time.)
+			 * In the decoding loop we do not want to get blocked when there
+			 * is no more WAL available, otherwise the loop would become
+			 * uninterruptible. The point is that the worker is only useful
+			 * if it starts decoding before lsn_upto is set; thus it can
+			 * reach the end of the available WAL and only learn later that
+			 * it did not have to go that far.
 			 */
-			end_lsn = ctx->reader->EndRecPtr;
-			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
-			if (segno_new != repack_current_segment)
+			priv = (ReadLocalXLogPageNoWaitPrivate *)
+				ctx->reader->private_data;
+			if (priv->end_of_wal)
 			{
-				LogicalConfirmReceivedLocation(end_lsn);
-				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
-					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
-				repack_current_segment = segno_new;
+				priv->end_of_wal = false;
+
+				/* Do we know how far we should get? */
+				if (XLogRecPtrIsInvalid(lsn_upto))
+				{
+					SpinLockAcquire(&shared->mutex);
+					lsn_upto = shared->lsn_upto;
+					/* 'done' should be set at the same time as 'lsn_upto' */
+					done = shared->done;
+					SpinLockRelease(&shared->mutex);
+
+					/* Check if the work happens to be complete. */
+					continue;
+				}
+
+				/* Wait a bit before we retry reading WAL. */
+				(void) WaitLatch(MyLatch,
+								 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+								 1000L,
+								 WAIT_EVENT_REPACK_WORKER_MAIN);
+
+				continue;
 			}
+			else
+				ereport(ERROR, (errmsg("could not read WAL record")));
+		}
 
-			CHECK_FOR_INTERRUPTS();
+		LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+		/*
+		 * If a WAL segment boundary has been crossed, inform the decoding
+		 * system that catalog_xmin can advance.
+		 *
+		 * TODO Does it make sense to confirm more often? Segment size seems
+		 * appropriate for restart_lsn (because less than a segment cannot be
+		 * recycled anyway); however, more frequent checks might be
+		 * beneficial for catalog_xmin.
+		 */
+		end_lsn = ctx->reader->EndRecPtr;
+		XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+		if (segno_new != repack_current_segment)
+		{
+			LogicalConfirmReceivedLocation(end_lsn);
+			elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+				 (uint32) (end_lsn >> 32), (uint32) end_lsn);
+			repack_current_segment = segno_new;
+		}
+
+		/* Keep checking whether 'lsn_upto' has been set. */
+		if (XLogRecPtrIsInvalid(lsn_upto))
+		{
+			SpinLockAcquire(&shared->mutex);
+			lsn_upto = shared->lsn_upto;
+			/* 'done' should be set at the same time as 'lsn_upto' */
+			done = shared->done;
+			SpinLockRelease(&shared->mutex);
 		}
-		InvalidateSystemCaches();
-		CurrentResourceOwner = resowner_old;
-	}
-	PG_CATCH();
-	{
-		/* clear all timetravel entries */
-		InvalidateSystemCaches();
-		CurrentResourceOwner = resowner_old;
-		PG_RE_THROW();
 	}
-	PG_END_TRY();
+
+	/*
+	 * Close the file and make it available to the backend.
+	 */
+	BufFileClose(dstate->file);
+	dstate->file = NULL;
+	SpinLockAcquire(&shared->mutex);
+	shared->lsn_upto = InvalidXLogRecPtr;
+	shared->sfs_valid = true;
+	shared->last_exported++;
+	SpinLockRelease(&shared->mutex);
+	ConditionVariableSignal(&shared->cv);
+
+	return done;
 }
 
 /*
- * Apply changes that happened during the initial load.
- *
- * Scan key is passed by caller, so it does not have to be constructed
- * multiple times. Key entries have all fields initialized, except for
- * sk_argument.
+ * Apply changes stored in 'file'.
  */
 static void
-apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
+apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 {
+	char		kind;
+	uint32		t_len;
 	Relation	rel = dest->rel;
 	TupleTableSlot *index_slot,
 			   *ident_slot;
 	HeapTuple	tup_old = NULL;
 
-	if (dstate->nchanges == 0)
-		return;
-
 	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
-	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+	index_slot = MakeSingleTupleTableSlot(RelationGetDescr(rel),
+										  &TTSOpsHeapTuple);
 
 	/* A slot to fetch tuples from identity index. */
 	ident_slot = table_slot_create(rel, NULL);
 
-	while (tuplestore_gettupleslot(dstate->tstore, true, false,
-								   dstate->tsslot))
+	while (true)
 	{
-		bool		shouldFree;
-		HeapTuple	tup_change,
-					tup,
+		size_t		nread;
+		HeapTuple	tup,
 					tup_exist;
-		char	   *change_raw,
-				   *src;
-		ConcurrentChange change;
-		bool		isnull[1];
-		Datum		values[1];
 
 		CHECK_FOR_INTERRUPTS();
 
-		/* Get the change from the single-column tuple. */
-		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
-		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
-		Assert(!isnull[0]);
-
-		/* Make sure we access aligned data. */
-		change_raw = (char *) DatumGetByteaP(values[0]);
-		src = (char *) VARDATA(change_raw);
-		memcpy(&change, src, SizeOfConcurrentChange);
+		nread = BufFileReadMaybeEOF(file, &kind, 1, true);
+		/* Are we done with the file? */
+		if (nread == 0)
+			break;
 
-		/*
-		 * Extract the tuple from the change. The tuple is copied here because
-		 * it might be assigned to 'tup_old', in which case it needs to
-		 * survive into the next iteration.
-		 */
-		tup = get_changed_tuple(src);
+		/* Read the tuple. */
+		BufFileReadExact(file, &t_len, sizeof(t_len));
+		tup = (HeapTuple) palloc(HEAPTUPLESIZE + t_len);
+		tup->t_data = (HeapTupleHeader) ((char *) tup + HEAPTUPLESIZE);
+		BufFileReadExact(file, tup->t_data, t_len);
+		tup->t_len = t_len;
+		ItemPointerSetInvalid(&tup->t_self);
+		tup->t_tableOid = RelationGetRelid(dest->rel);
 
-		if (change.kind == CHANGE_UPDATE_OLD)
+		if (kind == CHANGE_UPDATE_OLD)
 		{
 			Assert(tup_old == NULL);
 			tup_old = tup;
 		}
-		else if (change.kind == CHANGE_INSERT)
+		else if (kind == CHANGE_INSERT)
 		{
 			Assert(tup_old == NULL);
 
@@ -2671,12 +2775,11 @@ apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
 
 			pfree(tup);
 		}
-		else if (change.kind == CHANGE_UPDATE_NEW ||
-				 change.kind == CHANGE_DELETE)
+		else if (kind == CHANGE_UPDATE_NEW || kind == CHANGE_DELETE)
 		{
 			HeapTuple	tup_key;
 
-			if (change.kind == CHANGE_UPDATE_NEW)
+			if (kind == CHANGE_UPDATE_NEW)
 			{
 				tup_key = tup_old != NULL ? tup_old : tup;
 			}
@@ -2693,7 +2796,7 @@ apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
 			if (tup_exist == NULL)
 				elog(ERROR, "failed to find target tuple");
 
-			if (change.kind == CHANGE_UPDATE_NEW)
+			if (kind == CHANGE_UPDATE_NEW)
 				apply_concurrent_update(rel, tup, tup_exist, dest->iistate,
 										index_slot);
 			else
@@ -2708,26 +2811,19 @@ apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
 			pfree(tup);
 		}
 		else
-			elog(ERROR, "unrecognized kind of change: %d", change.kind);
+			elog(ERROR, "unrecognized kind of change: %d", kind);
 
 		/*
 		 * If a change was applied now, increment CID for next writes and
 		 * update the snapshot so it sees the changes we've applied so far.
 		 */
-		if (change.kind != CHANGE_UPDATE_OLD)
+		if (kind != CHANGE_UPDATE_OLD)
 		{
 			CommandCounterIncrement();
 			UpdateActiveSnapshotCommandId();
 		}
-
-		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
-		Assert(shouldFree);
-		pfree(tup_change);
 	}
 
-	tuplestore_clear(dstate->tstore);
-	dstate->nchanges = 0;
-
 	/* Cleanup. */
 	ExecDropSingleTupleTableSlot(index_slot);
 	ExecDropSingleTupleTableSlot(ident_slot);
@@ -2865,6 +2961,12 @@ find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
 	/* XXX no instrumentation for now */
 	scan = index_beginscan(rel, ident_index, GetActiveSnapshot(),
 						   NULL, dest->ident_key_nentries, 0);
+
+	/*
+	 * Scan key is passed by caller, so it does not have to be constructed
+	 * multiple times. Key entries have all fields initialized, except for
+	 * sk_argument.
+	 */
 	index_rescan(scan, dest->ident_key, dest->ident_key_nentries, NULL, 0);
 
 	/* Info needed to retrieve key values from heap tuple. */
@@ -2900,25 +3002,58 @@ find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
 }
 
 /*
- * Decode and apply concurrent changes.
+ * Decode and apply concurrent changes, up to (and including) the record whose
+ * LSN is 'end_of_wal'.
  */
 static void
-process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
-						   XLogRecPtr end_of_wal, ChangeDest *dest)
+process_concurrent_changes(XLogRecPtr end_of_wal, ChangeDest *dest, bool done)
 {
-	RepackDecodingState *dstate;
+	DecodingWorkerShared *shared;
+	char		fname[MAXPGPATH];
+	BufFile    *file;
 
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 								 PROGRESS_REPACK_PHASE_CATCH_UP);
 
-	dstate = (RepackDecodingState *) decoding_ctx->output_writer_private;
+	/* Ask the worker for the file. */
+	shared = (DecodingWorkerShared *) dsm_segment_address(decoding_worker->seg);
+	SpinLockAcquire(&shared->mutex);
+	shared->lsn_upto = end_of_wal;
+	shared->done = done;
+	SpinLockRelease(&shared->mutex);
 
-	repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+	/*
+	 * The worker needs to finish processing the current WAL record. Even
+	 * if it's idle, it still needs to close the output file. Thus we're
+	 * likely to wait, so prepare to sleep.
+	 */
+	ConditionVariablePrepareToSleep(&shared->cv);
+	for (;;)
+	{
+		bool		valid;
 
-	if (dstate->nchanges == 0)
-		return;
+		SpinLockAcquire(&shared->mutex);
+		valid = shared->sfs_valid;
+		SpinLockRelease(&shared->mutex);
+
+		if (valid)
+			break;
+
+		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
+	}
+	ConditionVariableCancelSleep();
 
-	apply_concurrent_changes(dstate, dest);
+	/* Open the file. */
+	DecodingWorkerFileName(fname, shared->relid, shared->last_exported);
+	file = BufFileOpenFileSet(&shared->sfs.fs, fname, O_RDONLY, false);
+	apply_concurrent_changes(file, dest);
+
+	/* No file is exported until the worker exports the next one. */
+	SpinLockAcquire(&shared->mutex);
+	shared->sfs_valid = false;
+	SpinLockRelease(&shared->mutex);
+
+	BufFileClose(file);
 }
 
 /*
@@ -3044,15 +3179,10 @@ cleanup_logical_decoding(LogicalDecodingContext *ctx)
 
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
-	ExecDropSingleTupleTableSlot(dstate->tsslot);
-	FreeTupleDesc(dstate->tupdesc_change);
 	FreeTupleDesc(dstate->tupdesc);
-	tuplestore_end(dstate->tstore);
-
 	FreeDecodingContext(ctx);
 
-	ReplicationSlotRelease();
-	ReplicationSlotDrop(NameStr(dstate->slotname), false);
+	ReplicationSlotDropAcquired();
 	pfree(dstate);
 }
 
@@ -3067,7 +3197,6 @@ cleanup_logical_decoding(LogicalDecodingContext *ctx)
 static void
 rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 								   Relation cl_index,
-								   LogicalDecodingContext *decoding_ctx,
 								   TransactionId frozenXid,
 								   MultiXactId cutoffMulti)
 {
@@ -3170,7 +3299,7 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	 * hold AccessExclusiveLock. (Quite some amount of WAL could have been
 	 * written during the data copying and index creation.)
 	 */
-	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+	process_concurrent_changes(end_of_wal, &chgdst, false);
 
 	/*
 	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
@@ -3266,8 +3395,11 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	XLogFlush(wal_insert_ptr);
 	end_of_wal = GetFlushRecPtr(NULL);
 
-	/* Apply the concurrent changes again. */
-	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+	/*
+	 * Apply the concurrent changes again. Indicate that the decoding worker
+	 * won't be needed anymore.
+	 */
+	process_concurrent_changes(end_of_wal, &chgdst, true);
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
@@ -3380,3 +3512,488 @@ build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
 
 	return result;
 }
+
+/*
+ * Try to start a background worker to perform logical decoding of data
+ * changes applied to relation while REPACK CONCURRENTLY is copying its
+ * contents to a new table.
+ */
+static void
+start_decoding_worker(Oid relid)
+{
+	Size		size;
+	dsm_segment *seg;
+	DecodingWorkerShared *shared;
+	shm_mq	   *mq;
+	shm_mq_handle *mqh;
+	BackgroundWorker bgw;
+
+	/* Setup shared memory. */
+	size = BUFFERALIGN(offsetof(DecodingWorkerShared, error_queue)) +
+		BUFFERALIGN(REPACK_ERROR_QUEUE_SIZE);
+	seg = dsm_create(size, 0);
+	shared = (DecodingWorkerShared *) dsm_segment_address(seg);
+	shared->lsn_upto = InvalidXLogRecPtr;
+	shared->done = false;
+	SharedFileSetInit(&shared->sfs, seg);
+	shared->sfs_valid = false;
+	shared->last_exported = -1;
+	SpinLockInit(&shared->mutex);
+	shared->dbid = MyDatabaseId;
+
+	/*
+	 * This is the UserId set in cluster_rel(). Security context shouldn't be
+	 * needed for the decoding worker.
+	 */
+	shared->roleid = GetUserId();
+	shared->relid = relid;
+	ConditionVariableInit(&shared->cv);
+	shared->backend_proc = MyProc;
+	shared->backend_pid = MyProcPid;
+	shared->backend_proc_number = MyProcNumber;
+
+	mq = shm_mq_create((char *) BUFFERALIGN(shared->error_queue),
+					   REPACK_ERROR_QUEUE_SIZE);
+	shm_mq_set_receiver(mq, MyProc);
+	mqh = shm_mq_attach(mq, seg, NULL);
+
+	memset(&bgw, 0, sizeof(bgw));
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "REPACK decoding worker for relation \"%s\"",
+			 get_rel_name(relid));
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "REPACK decoding worker");
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	snprintf(bgw.bgw_library_name, MAXPGPATH, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "RepackWorkerMain");
+	bgw.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
+	bgw.bgw_notify_pid = MyProcPid;
+
+	decoding_worker = palloc0_object(DecodingWorker);
+	if (!RegisterDynamicBackgroundWorker(&bgw, &decoding_worker->handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase \"%s\".", "max_worker_processes")));
+
+	decoding_worker->seg = seg;
+	decoding_worker->error_mqh = mqh;
+}
+
+/*
+ * Stop the decoding worker and clean up the related resources.
+ *
+ * The worker stops on its own when it knows there is no more work to do, but
+ * we need to stop it explicitly at least on ERROR in the launching backend.
+ */
+static void
+stop_decoding_worker(void)
+{
+	BgwHandleStatus status;
+
+	/* Haven't reached the worker startup? */
+	if (decoding_worker == NULL)
+		return;
+
+	/* Could not register the worker? */
+	if (decoding_worker->handle == NULL)
+		return;
+
+	TerminateBackgroundWorker(decoding_worker->handle);
+	/* The worker should really exit before the REPACK command does. */
+	HOLD_INTERRUPTS();
+	status = WaitForBackgroundWorkerShutdown(decoding_worker->handle);
+	RESUME_INTERRUPTS();
+
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errcode(ERRCODE_ADMIN_SHUTDOWN),
+				 errmsg("postmaster exited during REPACK command")));
+
+	shm_mq_detach(decoding_worker->error_mqh);
+
+	/*
+	 * If an ERROR prevented us from canceling the current sleep, do it now,
+	 * before we detach from the shared memory the condition variable is
+	 * located in. Otherwise the bgworker ERROR handling code would try to
+	 * do so and fail badly.
+	 */
+	ConditionVariableCancelSleep();
+
+	dsm_detach(decoding_worker->seg);
+	pfree(decoding_worker);
+	decoding_worker = NULL;
+}
+
+/* Is this process a REPACK worker? */
+static bool is_repack_worker = false;
+
+static pid_t backend_pid;
+static ProcNumber backend_proc_number;
+
+/*
+ * See ParallelWorkerShutdown for details.
+ */
+static void
+RepackWorkerShutdown(int code, Datum arg)
+{
+	SendProcSignal(backend_pid,
+				   PROCSIG_REPACK_MESSAGE,
+				   backend_proc_number);
+
+	dsm_detach((dsm_segment *) DatumGetPointer(arg));
+}
+
+/* REPACK decoding worker entry point */
+void
+RepackWorkerMain(Datum main_arg)
+{
+	dsm_segment *seg;
+	DecodingWorkerShared *shared;
+	shm_mq	   *mq;
+	shm_mq_handle *mqh;
+
+	is_repack_worker = true;
+
+	/*
+	 * Override the default bgworker_die() with die() so we can use
+	 * CHECK_FOR_INTERRUPTS().
+	 */
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	seg = dsm_attach(DatumGetUInt32(main_arg));
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	shared = (DecodingWorkerShared *) dsm_segment_address(seg);
+
+	/* Arrange to signal the leader if we exit. */
+	backend_pid = shared->backend_pid;
+	backend_proc_number = shared->backend_proc_number;
+	before_shmem_exit(RepackWorkerShutdown, PointerGetDatum(seg));
+
+	/*
+	 * Join locking group - see the comments around the call of
+	 * start_decoding_worker().
+	 */
+	if (!BecomeLockGroupMember(shared->backend_proc, backend_pid))
+		/* The leader is not running anymore. */
+		return;
+
+	/*
+	 * Setup a queue to send error messages to the backend that launched this
+	 * worker.
+	 */
+	mq = (shm_mq *) (char *) BUFFERALIGN(shared->error_queue);
+	shm_mq_set_sender(mq, MyProc);
+	mqh = shm_mq_attach(mq, seg, NULL);
+	pq_redirect_to_shm_mq(seg, mqh);
+	pq_set_parallel_leader(shared->backend_pid,
+						   shared->backend_proc_number);
+
+	/* Connect to the database. */
+	BackgroundWorkerInitializeConnectionByOid(shared->dbid, shared->roleid, 0);
+
+	repack_worker_internal(seg);
+}
+
+static void
+repack_worker_internal(dsm_segment *seg)
+{
+	DecodingWorkerShared *shared;
+	LogicalDecodingContext *decoding_ctx;
+	SharedFileSet *sfs;
+	Snapshot	snapshot;
+
+	/*
+	 * Transaction is needed to open relation, and it also provides us with a
+	 * resource owner.
+	 */
+	StartTransactionCommand();
+
+	shared = (DecodingWorkerShared *) dsm_segment_address(seg);
+
+	/*
+	 * Not sure the spinlock is needed here - the backend should not change
+	 * anything in the shared memory until we have serialized the snapshot.
+	 */
+	SpinLockAcquire(&shared->mutex);
+	Assert(XLogRecPtrIsInvalid(shared->lsn_upto));
+	Assert(!shared->sfs_valid);
+	sfs = &shared->sfs;
+	SpinLockRelease(&shared->mutex);
+
+	SharedFileSetAttach(sfs, seg);
+
+	/*
+	 * Prepare to capture the concurrent data changes ourselves.
+	 */
+	decoding_ctx = setup_logical_decoding(shared->relid);
+
+	/* Build the initial snapshot and export it. */
+	snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
+	export_initial_snapshot(snapshot, shared);
+
+	/*
+	 * The worker already had to access some system catalogs during startup,
+	 * and we even had to open the relation we are processing. Now that we're
+	 * going to work with historic snapshots, the system caches must be
+	 * invalidated.
+	 */
+	InvalidateSystemCaches();
+
+	while (!decode_concurrent_changes(decoding_ctx, shared))
+		;
+
+	/* Cleanup. */
+	cleanup_logical_decoding(decoding_ctx);
+	CommitTransactionCommand();
+}
+
+/*
+ * Make snapshot available to the backend that launched the decoding worker.
+ */
+static void
+export_initial_snapshot(Snapshot snapshot, DecodingWorkerShared *shared)
+{
+	char		fname[MAXPGPATH];
+	BufFile    *file;
+	Size		snap_size;
+	char	   *snap_space;
+
+	snap_size = EstimateSnapshotSpace(snapshot);
+	snap_space = (char *) palloc(snap_size);
+	SerializeSnapshot(snapshot, snap_space);
+	FreeSnapshot(snapshot);
+
+	DecodingWorkerFileName(fname, shared->relid, shared->last_exported + 1);
+	file = BufFileCreateFileSet(&shared->sfs.fs, fname);
+	/* To make restoration easier, write the snapshot size first. */
+	BufFileWrite(file, &snap_size, sizeof(snap_size));
+	BufFileWrite(file, snap_space, snap_size);
+	pfree(snap_space);
+	BufFileClose(file);
+
+	/* Tell the backend that the file is available. */
+	SpinLockAcquire(&shared->mutex);
+	shared->sfs_valid = true;
+	shared->last_exported++;
+	SpinLockRelease(&shared->mutex);
+	ConditionVariableSignal(&shared->cv);
+}
+
+/*
+ * Get the initial snapshot from the decoding worker.
+ */
+static Snapshot
+get_initial_snapshot(DecodingWorker *worker)
+{
+	DecodingWorkerShared *shared;
+	char		fname[MAXPGPATH];
+	BufFile    *file;
+	Size		snap_size;
+	char	   *snap_space;
+	Snapshot	snapshot;
+
+	shared = (DecodingWorkerShared *) dsm_segment_address(worker->seg);
+
+	/*
+	 * The worker needs to initialize the logical decoding, which usually
+	 * takes some time. Therefore it makes sense to prepare for the sleep
+	 * first.
+	 */
+	ConditionVariablePrepareToSleep(&shared->cv);
+	for (;;)
+	{
+		bool		valid;
+
+		SpinLockAcquire(&shared->mutex);
+		valid = shared->sfs_valid;
+		SpinLockRelease(&shared->mutex);
+
+		if (valid)
+			break;
+
+		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
+	}
+	ConditionVariableCancelSleep();
+
+	/* Read the snapshot from a file. */
+	DecodingWorkerFileName(fname, shared->relid, shared->last_exported);
+	file = BufFileOpenFileSet(&shared->sfs.fs, fname, O_RDONLY, false);
+	BufFileReadExact(file, &snap_size, sizeof(snap_size));
+	snap_space = (char *) palloc(snap_size);
+	BufFileReadExact(file, snap_space, snap_size);
+	BufFileClose(file);
+
+	SpinLockAcquire(&shared->mutex);
+	shared->sfs_valid = false;
+	SpinLockRelease(&shared->mutex);
+
+	/* Restore it. */
+	snapshot = RestoreSnapshot(snap_space);
+	pfree(snap_space);
+
+	return snapshot;
+}
+
+bool
+IsRepackWorker(void)
+{
+	return is_repack_worker;
+}
+
+/*
+ * Handle receipt of an interrupt indicating a repack worker message.
+ *
+ * Note: this is called within a signal handler!  All we can do is set
+ * a flag that will cause the next CHECK_FOR_INTERRUPTS() to invoke
+ * ProcessRepackMessages().
+ */
+void
+HandleRepackMessageInterrupt(void)
+{
+	InterruptPending = true;
+	RepackMessagePending = true;
+	SetLatch(MyLatch);
+}
+
+/*
+ * Process any queued protocol messages received from the decoding worker.
+ */
+void
+ProcessRepackMessages(void)
+{
+	MemoryContext oldcontext;
+
+	static MemoryContext hpm_context = NULL;
+
+	/*
+	 * Nothing to do if we haven't launched the worker yet or have already
+	 * terminated it.
+	 */
+	if (decoding_worker == NULL)
+		return;
+
+	/*
+	 * This is invoked from ProcessInterrupts(), and since some of the
+	 * functions it calls contain CHECK_FOR_INTERRUPTS(), there is a potential
+	 * for recursive calls if more signals are received while this runs.  It's
+	 * unclear that recursive entry would be safe, and it doesn't seem useful
+	 * even if it is safe, so let's block interrupts until done.
+	 */
+	HOLD_INTERRUPTS();
+
+	/*
+	 * Moreover, CurrentMemoryContext might be pointing almost anywhere.  We
+	 * don't want to risk leaking data into long-lived contexts, so let's do
+	 * our work here in a private context that we can reset on each use.
+	 */
+	if (hpm_context == NULL)	/* first time through? */
+		hpm_context = AllocSetContextCreate(TopMemoryContext,
+											"ProcessParallelMessages",
+											ALLOCSET_DEFAULT_SIZES);
+	else
+		MemoryContextReset(hpm_context);
+
+	oldcontext = MemoryContextSwitchTo(hpm_context);
+
+	/* OK to process messages.  Reset the flag saying there are more to do. */
+	RepackMessagePending = false;
+
+	/*
+	 * Read as many messages as we can from the decoding worker, but stop
+	 * when no more can be read without blocking.
+	 */
+	while (true)
+	{
+		shm_mq_result res;
+		Size		nbytes;
+		void	   *data;
+
+		res = shm_mq_receive(decoding_worker->error_mqh, &nbytes,
+							 &data, true);
+		if (res == SHM_MQ_WOULD_BLOCK)
+			break;
+		else if (res == SHM_MQ_SUCCESS)
+		{
+			StringInfoData msg;
+
+			initStringInfo(&msg);
+			appendBinaryStringInfo(&msg, data, nbytes);
+			ProcessRepackMessage(&msg);
+			pfree(msg.data);
+		}
+		else
+		{
+			/*
+			 * The decoding worker is special in that it exits as soon as it
+			 * has its work done. Thus the DETACHED result code is fine.
+			 */
+			Assert(res = SHM_MQ_DETACHED);
+
+			break;
+		}
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/* Might as well clear the context on our way out */
+	MemoryContextReset(hpm_context);
+
+	RESUME_INTERRUPTS();
+}
+
+/*
+ * Process a single protocol message received from the decoding worker.
+ */
+static void
+ProcessRepackMessage(StringInfo msg)
+{
+	char		msgtype;
+
+	msgtype = pq_getmsgbyte(msg);
+
+	switch (msgtype)
+	{
+		case PqMsg_ErrorResponse:
+		case PqMsg_NoticeResponse:
+			{
+				ErrorData	edata;
+
+				/* Parse ErrorResponse or NoticeResponse. */
+				pq_parse_errornotice(msg, &edata);
+
+				/* Death of a worker isn't enough justification for suicide. */
+				edata.elevel = Min(edata.elevel, ERROR);
+
+				/*
+				 * If desired, add a context line to show that this is a
+				 * message propagated from a parallel worker.  Otherwise, it
+				 * can sometimes be confusing to understand what actually
+				 * happened.
+				 */
+				if (edata.context)
+					edata.context = psprintf("%s\n%s", edata.context,
+											 _("decoding worker"));
+				else
+					edata.context = pstrdup(_("decoding worker"));
+
+				/* Rethrow error or print notice. */
+				ThrowErrorData(&edata);
+
+				break;
+			}
+
+		default:
+			{
+				elog(ERROR, "unrecognized message type received from decoding worker: %c (message length %d bytes)",
+					 msgtype, msg->len);
+			}
+	}
+}
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 2b75de0ddef..28a5a400fb1 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "commands/cluster.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqmq.h"
@@ -175,6 +176,10 @@ mq_putmessage(char msgtype, const char *s, size_t len)
 				SendProcSignal(pq_mq_parallel_leader_pid,
 							   PROCSIG_PARALLEL_APPLY_MESSAGE,
 							   pq_mq_parallel_leader_proc_number);
+			else if (IsRepackWorker())
+				SendProcSignal(pq_mq_parallel_leader_pid,
+							   PROCSIG_REPACK_MESSAGE,
+							   pq_mq_parallel_leader_proc_number);
 			else
 			{
 				Assert(IsParallelWorker());
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 142a02eb5e9..b368990e90b 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,7 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "commands/cluster.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -135,6 +136,9 @@ static const struct
 	},
 	{
 		"SequenceSyncWorkerMain", SequenceSyncWorkerMain
+	},
+	{
+		"RepackWorkerMain", RepackWorkerMain
 	}
 };
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 866f92cf799..85323ba61e2 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -205,7 +205,11 @@ StartupDecodingContext(List *output_plugin_options,
 
 	ctx->slot = slot;
 
-	ctx->reader = XLogReaderAllocate(wal_segment_size, NULL, xl_routine, ctx);
+	/*
+	 * TODO A separate patch for PG core, unless there's really a reason to
+	 * pass ctx for private_data (might extensions expect ctx?).
+	 */
+	ctx->reader = XLogReaderAllocate(wal_segment_size, NULL, xl_routine, NULL);
 	if (!ctx->reader)
 		ereport(ERROR,
 				(errcode(ERRCODE_OUT_OF_MEMORY),
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
index c8930640a0d..fb9956d392d 100644
--- a/src/backend/replication/pgoutput_repack/pgoutput_repack.c
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -168,17 +168,13 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 			 HeapTuple tuple)
 {
 	RepackDecodingState *dstate;
-	char	   *change_raw;
-	ConcurrentChange change;
+	char		kind_byte = (char) kind;
 	bool		flattened = false;
-	Size		size;
-	Datum		values[1];
-	bool		isnull[1];
-	char	   *dst;
 
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
-	size = VARHDRSZ + SizeOfConcurrentChange;
+	/* Store the change kind. */
+	BufFileWrite(dstate->file, &kind_byte, 1);
 
 	/*
 	 * ReorderBufferCommit() stores the TOAST chunks in its private memory
@@ -195,46 +191,12 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 		tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
 		flattened = true;
 	}
+	/* Store the tuple size ... */
+	BufFileWrite(dstate->file, &tuple->t_len, sizeof(tuple->t_len));
+	/* ... and the tuple itself. */
+	BufFileWrite(dstate->file, tuple->t_data, tuple->t_len);
 
-	size += tuple->t_len;
-	if (size >= MaxAllocSize)
-		elog(ERROR, "Change is too big.");
-
-	/* Construct the change. */
-	change_raw = (char *) palloc0(size);
-	SET_VARSIZE(change_raw, size);
-
-	/*
-	 * Since the varlena alignment might not be sufficient for the structure,
-	 * set the fields in a local instance and remember where it should
-	 * eventually be copied.
-	 */
-	change.kind = kind;
-	dst = (char *) VARDATA(change_raw);
-
-	/*
-	 * Copy the tuple.
-	 *
-	 * Note: change->tup_data.t_data must be fixed on retrieval!
-	 */
-	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
-	memcpy(dst, &change, SizeOfConcurrentChange);
-	dst += SizeOfConcurrentChange;
-	memcpy(dst, tuple->t_data, tuple->t_len);
-
-	/* The data has been copied. */
+	/* Free the flat copy if created above. */
 	if (flattened)
 		pfree(tuple);
-
-	/* Store as tuple of 1 bytea column. */
-	values[0] = PointerGetDatum(change_raw);
-	isnull[0] = false;
-	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
-						 values, isnull);
-
-	/* Accounting. */
-	dstate->nchanges++;
-
-	/* Cleanup. */
-	pfree(change_raw);
 }
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..af12144795b 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -19,6 +19,7 @@
 
 #include "access/parallel.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bitutils.h"
@@ -694,6 +695,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
 		HandleParallelApplyMessageInterrupt();
 
+	if (CheckProcSignal(PROCSIG_REPACK_MESSAGE))
+		HandleRepackMessageInterrupt();
+
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
 		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7dd75a490aa..4a4858882f0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,7 @@
 #include "access/xact.h"
 #include "catalog/pg_type.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "commands/event_trigger.h"
 #include "commands/explain_state.h"
 #include "commands/prepare.h"
@@ -3541,6 +3542,9 @@ ProcessInterrupts(void)
 
 	if (ParallelApplyMessagePending)
 		ProcessParallelApplyMessages();
+
+	if (RepackMessagePending)
+		ProcessRepackMessages();
 }
 
 /*
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index f39830dbb34..cbcc8550960 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,6 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
+REPACK_WORKER_MAIN	"Waiting in main loop of REPACK decoding worker process."
 REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
@@ -153,6 +154,7 @@ RECOVERY_CONFLICT_SNAPSHOT	"Waiting for recovery conflict resolution for a vacuu
 RECOVERY_CONFLICT_TABLESPACE	"Waiting for recovery conflict resolution for dropping a tablespace."
 RECOVERY_END_COMMAND	"Waiting for <xref linkend="guc-recovery-end-command"/> to complete."
 RECOVERY_PAUSE	"Waiting for recovery to be resumed."
+REPACK_WORKER_EXPORT	"Waiting for decoding worker to export a new output file."
 REPLICATION_ORIGIN_DROP	"Waiting for a replication origin to become inactive so it can be dropped."
 REPLICATION_SLOT_DROP	"Waiting for a replication slot to become inactive so it can be dropped."
 RESTORE_COMMAND	"Waiting for <xref linkend="guc-restore-command"/> to complete."
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index d8f76d325f9..2b983abce3e 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,7 +22,6 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
-#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -631,7 +630,6 @@ typedef struct TableAmRoutine
 											  bool use_sort,
 											  TransactionId OldestXmin,
 											  Snapshot snapshot,
-											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1652,7 +1650,7 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  * - snapshot - if != NULL, ignore data changes done by transactions that this
  *	 (MVCC) snapshot considers still in-progress or in the future.
  * - decoding_ctx - logical decoding context, to capture concurrent data
- *   changes.
+ *   changes. NULL if background worker takes care of the decoding.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1666,7 +1664,6 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								bool use_sort,
 								TransactionId OldestXmin,
 								Snapshot snapshot,
-								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1675,7 +1672,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
-													snapshot, decoding_ctx,
+													snapshot,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index b43a1740053..0ac70ec30d7 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -17,6 +17,7 @@
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
 #include "replication/logical.h"
+#include "storage/buffile.h"
 #include "storage/lock.h"
 #include "storage/relfilelocator.h"
 #include "utils/relcache.h"
@@ -47,6 +48,9 @@ typedef struct ClusterParams
 extern RelFileLocator repacked_rel_locator;
 extern RelFileLocator repacked_rel_toast_locator;
 
+/*
+ * Stored as a single byte in the output file.
+ */
 typedef enum
 {
 	CHANGE_INSERT,
@@ -55,68 +59,30 @@ typedef enum
 	CHANGE_DELETE
 } ConcurrentChangeKind;
 
-typedef struct ConcurrentChange
-{
-	/* See the enum above. */
-	ConcurrentChangeKind kind;
-
-	/*
-	 * The actual tuple.
-	 *
-	 * The tuple data follows the ConcurrentChange structure. Before use make
-	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
-	 * bytea) and that tuple->t_data is fixed.
-	 */
-	HeapTupleData tup_data;
-} ConcurrentChange;
-
-#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
-								sizeof(HeapTupleData))
-
 /*
  * Logical decoding state.
  *
- * Here we store the data changes that we decode from WAL while the table
- * contents is being copied to a new storage. Also the necessary metadata
- * needed to apply these changes to the table is stored here.
+ * The output plugin uses it to store the data changes that it decodes from
+ * WAL while the table contents are being copied to new storage.
  */
 typedef struct RepackDecodingState
 {
 	/* The relation whose changes we're decoding. */
 	Oid			relid;
 
-	/* Replication slot name. */
-	NameData	slotname;
-
-	/*
-	 * Decoded changes are stored here. Although we try to avoid excessive
-	 * batches, it can happen that the changes need to be stored to disk. The
-	 * tuplestore does this transparently.
-	 */
-	Tuplestorestate *tstore;
-
-	/* The current number of changes in tstore. */
-	double		nchanges;
-
-	/*
-	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
-	 * We can't store the tuple directly because tuplestore only supports
-	 * minimum tuple and we may need to transfer OID system column from the
-	 * output plugin. Also we need to transfer the change kind, so it's better
-	 * to put everything in the structure than to use 2 tuplestores "in
-	 * parallel".
-	 */
-	TupleDesc	tupdesc_change;
-
-	/* Tuple descriptor needed to update indexes. */
+	/* Tuple descriptor of the relation being processed. */
 	TupleDesc	tupdesc;
 
-	/* Slot to retrieve data from tstore. */
-	TupleTableSlot *tsslot;
-
-	ResourceOwner resowner;
+	/* The current output file. */
+	BufFile    *file;
 } RepackDecodingState;
 
+extern PGDLLIMPORT volatile sig_atomic_t RepackMessagePending;
+
+extern bool IsRepackWorker(void);
+extern void HandleRepackMessageInterrupt(void);
+extern void ProcessRepackMessages(void);
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
@@ -125,9 +91,6 @@ extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
-extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
-											 XLogRecPtr end_of_wal);
-
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -140,4 +103,5 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
 
+extern void RepackWorkerMain(Datum main_arg);
 #endif							/* CLUSTER_H */
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..c0a66516b66 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -36,6 +36,7 @@ typedef enum
 	PROCSIG_BARRIER,			/* global barrier interrupt  */
 	PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
 	PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
+	PROCSIG_REPACK_MESSAGE,		/* Message from repack worker */
 
 	/* Recovery conflict reasons */
 	PROCSIG_RECOVERY_CONFLICT_FIRST,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3139b14e85f..35344910f65 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -488,7 +488,6 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
-ConcurrentChange
 ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
@@ -629,6 +628,9 @@ DeclareCursorStmt
 DecodedBkpBlock
 DecodedXLogRecord
 DecodingOutputState
+DecodingWorker
+DecodingWorkerShared
+DecodingWorkerState
 DefElem
 DefElemAction
 DefaultACLInfo
-- 
2.47.3

#67Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Antonin Houska (#66)
Re: Adding REPACK [concurrently]

Hello, many thanks for the new version. Here's a very quick proposal
for a new top-of-file comment on cluster.c,

* cluster.c
* Implementation of REPACK [CONCURRENTLY], also known as CLUSTER and
* VACUUM FULL.
*
* There are two somewhat different ways to rewrite a table. In non-
* concurrent mode, it's easy: take AccessExclusiveLock, create a new
* transient relation, copy the tuples over to the relfilenode of the
* new relation, swap the relfilenodes, then drop the old relation.
*
* In concurrent mode, we lock the table with only ShareUpdateExclusiveLock,
* then do an initial copy as above. However, while the tuples are being
* copied, concurrent transactions could modify the table, and to cope
* with those changes, we rely on logical decoding to obtain them from WAL.
* A bgworker consumes WAL while the initial copy is ongoing (to prevent
* excessive WAL from being reserved), and accumulates the changes in
* a tuplestore. Once the initial copy is complete, we read the changes
* from the tuplestore and re-apply them on the new heap. Then we
* upgrade our ShareUpdateExclusiveLock to AccessExclusiveLock and swap
* the relfilenodes. This way, the time we hold a strong lock on the
* table is much reduced, and so is the bloat.
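
For readers following along, that sequence as a rough code sketch (the
helper names are illustrative only, not the patch's actual API):

	/* Illustrative sketch of REPACK CONCURRENTLY; helpers are made up. */
	void
	repack_concurrently_sketch(Relation rel)
	{
		LockRelationOid(RelationGetRelid(rel), ShareUpdateExclusiveLock);

		/* Start logical decoding so concurrent changes are captured. */
		start_decoding(rel);					/* hypothetical */

		/* Initial copy, using a snapshot from the decoding startup. */
		copy_table_to_new_heap(rel);			/* hypothetical */

		/* Apply the changes decoded while the copy was running. */
		apply_decoded_changes(rel);				/* hypothetical */

		/* Short exclusive phase: catch up on the rest, swap the files. */
		LockRelationOid(RelationGetRelid(rel), AccessExclusiveLock);
		apply_decoded_changes(rel);
		swap_relation_files_and_drop_old(rel);	/* hypothetical */
	}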

I haven't read build_relation_finish_concurrent() yet to understand how
exactly we do the lock upgrade, which I think is an important point
we should address in this comment. Also not addressed is how exactly we
handle indexes. Feel free to correct this, reword it or include any
additional details that you think are important.

(At this point we could just as well rename the file to repack.c, since
very little of the original remains. But let's discuss that later.)

Thanks,

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem." (Tom Lane)

#68Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#66)
Re: Adding REPACK [concurrently]

Hello, Antonin!

On Tue, Dec 9, 2025 at 7:52 PM Antonin Houska <ah@cybertec.at> wrote:

Worker makes more sense to me - the initial implementation is in 0005.

Comments for 0005, so far:

---

export_initial_snapshot

Hm, should we use ExportSnapshot() instead? And ImportSnapshot() to import it.
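
For reference, a minimal sketch of how that core API would be used (the
caveat being that the exported snapshot is only importable while the
exporting transaction is still open):

	/* In the decoding worker, inside a live transaction: */
	char	   *snapid = ExportSnapshot(snapshot);	/* writes a pg_snapshots file */

	/* ... hand snapid to the backend, e.g. through shared memory ... */

	/* In the backend, inside a transaction, before it takes a snapshot: */
	ImportSnapshot(snapid);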

---

get_initial_snapshot

Should we check whether the worker is still alive while waiting? The
same applies to "process_concurrent_changes".

And AFAIU RegisterDynamicBackgroundWorker does not guarantee that a new
worker actually starts (e.g. in case of fork-related failures).
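
The usual way to deal with that is to wait for startup explicitly,
something like:

	BackgroundWorkerHandle *handle;
	pid_t		pid;

	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
		ereport(ERROR,
				(errmsg("could not register repack decoding worker")));
	if (WaitForBackgroundWorkerStartup(handle, &pid) != BGWH_STARTED)
		ereport(ERROR,
				(errmsg("repack decoding worker failed to start")));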

---

Assert(res = SHM_MQ_DETACHED);

This should be "==" rather than "=" - as written, the assertion assigns
to res and always passes.

---

/* Wait a bit before we retry reading WAL. */
(void) WaitLatch(MyLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
1000L,
WAIT_EVENT_REPACK_WORKER_MAIN);

Looks like we need ResetLatch(MyLatch); here.
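
I.e. the canonical latch-wait loop, something like:

	for (;;)
	{
		/* ... try to read more WAL ... */

		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 1000L,
						 WAIT_EVENT_REPACK_WORKER_MAIN);
		ResetLatch(MyLatch);
		CHECK_FOR_INTERRUPTS();
	}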

---

* - decoding_ctx - logical decoding context, to capture concurrent data

Needs to be removed together with the parameter.

---

hpm_context = AllocSetContextCreate(TopMemoryContext,
"ProcessParallelMessages",
ALLOCSET_DEFAULT_SIZES);

"ProcessRepacklMessages"

---

if (XLogRecPtrIsInvalid(lsn_upto))
{
SpinLockAcquire(&shared->mutex);
lsn_upto = shared->lsn_upto;
/* 'done' should be set at the same time as 'lsn_upto' */
done = shared->done;
SpinLockRelease(&shared->mutex);

/* Check if the work happens to be complete. */
continue;
}

This could be moved to the start of the loop to avoid the duplication.

---

SpinLockAcquire(&shared->mutex);
valid = shared->sfs_valid;
SpinLockRelease(&shared->mutex);

It would be better to also read last_exported here, to avoid any races
or missed files.

---

shared->lsn_upto = InvalidXLogRecPtr;

I think it is better to clear it as soon as it is read (after removing
the duplication).
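
Combining this with the deduplication point above, a sketch (field names
as in the current patch; just to illustrate the idea):

	for (;;)
	{
		XLogRecPtr	lsn_upto;
		bool		done;

		SpinLockAcquire(&shared->mutex);
		lsn_upto = shared->lsn_upto;
		done = shared->done;
		shared->lsn_upto = InvalidXLogRecPtr;	/* clear once read */
		SpinLockRelease(&shared->mutex);

		if (done)
			break;

		/* ... decode WAL up to lsn_upto ... */
	}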

---

bool done;

Perhaps rename it to something like exit_after_lsn_upto?

---

bool sfs_valid;

Do we really need it? I think it would be better to keep only
last_exported, add a last_processed_file argument to
process_concurrent_changes, and wait for last_exported to become higher
than that.
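
That is, wait on the counter rather than on a flag - a sketch, assuming
last_exported stays protected by the spinlock:

	ConditionVariablePrepareToSleep(&shared->cv);
	for (;;)
	{
		int			exported;

		SpinLockAcquire(&shared->mutex);
		exported = shared->last_exported;
		SpinLockRelease(&shared->mutex);

		if (exported > last_processed_file)
			break;

		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
	}
	ConditionVariableCancelSleep();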

---
What if we reverse the leader/worker roles?

The leader takes a snapshot and transfers it to the workers (probably
multiple, for a parallel scan) using the existing mechanics - the
workers scan the table in parallel while the leader decodes the WAL.

Also, each worker could be assigned a list of indexes it needs to build.

This feels like it would reuse more of the current infrastructure and
need less bespoke synchronization logic. But I'm not sure about the
index-build phase - maybe it is not so easy to do.

---
Also, should we add some kind of back pressure between building the
indexes/new heap and the amount of WAL accumulated so far?
But that is probably out of scope for this patch.

---
To build N indexes we need to scan the table N times. What about
building multiple indexes during a single heap scan?
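
For example, one pass over the heap could feed one tuplesort per index -
a rough sketch (setup omitted; the obvious concern is the memory budget
for N simultaneous tuplesorts):

	while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
	{
		CHECK_FOR_INTERRUPTS();
		for (int i = 0; i < nindexes; i++)
		{
			FormIndexDatum(indexInfos[i], slot, estate, values, isnull);
			tuplesort_putindextuplevalues(sortstates[i], indexRels[i],
										  &slot->tts_tid, values, isnull);
		}
	}
	for (int i = 0; i < nindexes; i++)
	{
		tuplesort_performsort(sortstates[i]);
		/* ... load the sorted tuples into index i ... */
	}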

---
Just a gentle reminder about the XMIN_COMMITTED flag and the WAL storm
after the switch.

Best regards,
Mikhail.

#69Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#68)
3 attachment(s)
Re: Adding REPACK [concurrently]

Hello, everyone.

Stress tests for REPACK CONCURRENTLY are attached.
So far I can't break anything (except MVCC, of course).

A rebased version of the MVCC-safe "light" patch, with its own stress
test, is also attached.

Best regards,
Mikhail.

Attachments:

nocfbot-0006-Preserve-visibility-information-of-the-conc.patchtext/plain; charset=US-ASCII; name=nocfbot-0006-Preserve-visibility-information-of-the-conc.patchDownload
From 457235c743a2dec2c1917fbdfa7f5a48d305c63e Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
Date: Sat, 13 Dec 2025 19:42:52 +0100
Subject: [PATCH vnocfbot] Preserve visibility information of the concurrent 
 data  changes.

As explained in the commit message of the preceding patch of the series, the
data changes done by applications while REPACK CONCURRENTLY is copying the
table contents to a new file are decoded from WAL and eventually also applied
to the new file. To reduce the complexity a little bit, the preceding patch
uses the current transaction (i.e. transaction opened by the REPACK command)
to execute those INSERT, UPDATE and DELETE commands.

However, REPACK is not expected to change visibility of tuples. Therefore,
this patch fixes the handling of the "concurrent data changes". It ensures
that tuples written into the new table have the same XID and command ID (CID)
as they had in the old table.

To "replay" an UPDATE or DELETE command on the new table, we use SnapshotSelf
to find the last live version of the tuple and stamp the change with the XID
of the original transaction. This is safe because:
* all transactions we are replaying have already committed
* the apply worker runs without any concurrent modifications of the table

As long as we preserve the tuple visibility information (which includes XID),
it's important to avoid logical decoding of the WAL generated by DMLs on the
new table: the logical decoding subsystem probably does not expect that the
incoming WAL records contain XIDs of an already decoded transactions. (And of
course, repeated decoding would be wasted effort.)

Author: Antonin Houska <ah@cybertec.at> with changes from Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
---
 contrib/amcheck/meson.build                   |   1 +
 .../amcheck/t/009_repack_concurrently_mvcc.pl | 113 ++++++++++++++++++
 doc/src/sgml/mvcc.sgml                        |  12 +-
 doc/src/sgml/ref/repack.sgml                  |   9 --
 src/backend/access/common/toast_internals.c   |   3 +-
 src/backend/access/heap/heapam.c              |  29 +++--
 src/backend/access/heap/heapam_handler.c      |  24 ++--
 src/backend/commands/cluster.c                | 107 ++++++++++++-----
 .../pgoutput_repack/pgoutput_repack.c         |  16 ++-
 src/include/access/heapam.h                   |   6 +-
 .../injection_points/specs/repack.spec        |   4 -
 11 files changed, 243 insertions(+), 81 deletions(-)
 create mode 100644 contrib/amcheck/t/009_repack_concurrently_mvcc.pl

diff --git a/contrib/amcheck/meson.build b/contrib/amcheck/meson.build
index f7c70735989..6946c684259 100644
--- a/contrib/amcheck/meson.build
+++ b/contrib/amcheck/meson.build
@@ -52,6 +52,7 @@ tests += {
       't/006_verify_gin.pl',
       't/007_repack_concurrently.pl',
       't/008_repack_concurrently.pl',
+      't/009_repack_concurrently_mvcc.pl',
     ],
   },
 }
diff --git a/contrib/amcheck/t/009_repack_concurrently_mvcc.pl b/contrib/amcheck/t/009_repack_concurrently_mvcc.pl
new file mode 100644
index 00000000000..a83fd5b8141
--- /dev/null
+++ b/contrib/amcheck/t/009_repack_concurrently_mvcc.pl
@@ -0,0 +1,113 @@
+
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+# Test REPACK CONCURRENTLY with concurrent modifications
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+my $node;
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('repack_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf(
+	'postgresql.conf', qq(
+wal_level = logical
+));
+$node->start;
+$node->safe_psql('postgres', q(CREATE TABLE tbl1(i int PRIMARY KEY, j int)));
+$node->safe_psql('postgres', q(CREATE TABLE tbl2(i int PRIMARY KEY, j int)));
+
+
+# Insert 100 rows into tbl1
+$node->safe_psql('postgres', q(
+    INSERT INTO tbl1 SELECT i, i % 100 FROM generate_series(1,100) i
+));
+
+# Insert 100 rows into tbl2
+$node->safe_psql('postgres', q(
+    INSERT INTO tbl2 SELECT i, i % 100 FROM generate_series(1,100) i
+));
+
+
+# Create a helper function to log mismatched rows
+$node->safe_psql('postgres', q(
+	CREATE OR REPLACE FUNCTION log_raise(i int, j1 int, j2 int) RETURNS VOID AS $$
+	BEGIN
+	  RAISE NOTICE 'ERROR i=% j1=% j2=%', i, j1, j2;
+	END;$$ LANGUAGE plpgsql;
+));
+
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+
+$node->pgbench(
+'--no-vacuum --client=10 --jobs=4 --exit-on-abort --transactions=2500',
+0,
+[qr{actually processed}],
+[qr{^$}],
+'concurrent operations with REPACK CONCURRENTLY',
+{
+	'concurrent_ops' => q(
+		SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+		\if :gotlock
+			SELECT nextval('in_row_rebuild') AS last_value \gset
+			\if :last_value = 2
+				REPACK (CONCURRENTLY) tbl1 USING INDEX tbl1_pkey;
+				\sleep 10 ms
+				REPACK (CONCURRENTLY) tbl2 USING INDEX tbl2_pkey;
+				\sleep 10 ms
+			\endif
+			SELECT pg_advisory_unlock(42);
+		\else
+			\set num random(1, 100)
+			BEGIN;
+			UPDATE tbl1 SET j = j + 1 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl1 SET j = j + 2 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl1 SET j = j + 3 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl1 SET j = j + 4 WHERE i = :num;
+			\sleep 1 ms
+
+			UPDATE tbl2 SET j = j + 1 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl2 SET j = j + 2 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl2 SET j = j + 3 WHERE i = :num;
+			\sleep 1 ms
+			UPDATE tbl2 SET j = j + 4 WHERE i = :num;
+
+			COMMIT;
+			SELECT setval('in_row_rebuild', 1);
+
+			BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+			SELECT COALESCE(SUM(j), 0) AS t1 FROM tbl1 WHERE i = :num \gset p_
+			\sleep 10 ms
+			SELECT COALESCE(SUM(j), 0) AS t2 FROM tbl2 WHERE i = :num \gset p_
+			\if :p_t1 != :p_t2
+				COMMIT;
+				SELECT log_raise(tbl1.i, tbl1.j, tbl2.j) FROM tbl1 LEFT OUTER JOIN tbl2 ON tbl1.i = tbl2.i WHERE tbl1.j != tbl2.j;
+				\sleep 10 ms
+				SELECT log_raise(tbl1.i, tbl1.j, tbl2.j) FROM tbl1 LEFT OUTER JOIN tbl2 ON tbl1.i = tbl2.i WHERE tbl1.j != tbl2.j;
+				SELECT (:p_t1 + :p_t2) / 0;
+			\endif
+
+			COMMIT;
+		\endif
+	)
+});
+
+$node->stop;
+done_testing();
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 0f5c34af542..049ee75a4ba 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,17 +1833,15 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
-    TABLE</command></link> and <command>REPACK</command> with
-    the <literal>CONCURRENTLY</literal> option, are not
+    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the command committed.  This will only be an
+    snapshot taken before the DDL command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the command started &mdash; any transaction that has done so
+    before the DDL command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the truncating or rewriting command until that transaction completes.
+    which would block the DDL command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index 30c43c49069..9796a923597 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -308,15 +308,6 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
        </listitem>
       </itemizedlist>
      </para>
-
-     <warning>
-      <para>
-       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
-       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
-       details.
-      </para>
-     </warning>
-
     </listitem>
    </varlistentry>
 
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 63b848473f8..91119da5cd5 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -311,7 +311,8 @@ toast_save_datum(Relation rel, Datum value,
 
 		toasttup = heap_form_tuple(toasttupDesc, t_values, t_isnull);
 
-		heap_insert(toastrel, toasttup, mycid, options, NULL);
+		heap_insert(toastrel, toasttup, GetCurrentTransactionId(), mycid,
+					options, NULL);
 
 		/*
 		 * Create the index entry.  We cheat a little here by not using
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e11833f01b4..94ca07e4b55 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2085,7 +2085,7 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
 /*
  *	heap_insert		- insert tuple into a heap
  *
- * The new tuple is stamped with current transaction ID and the specified
+ * The new tuple is stamped with the specified transaction ID and the specified
  * command ID.
  *
  * See table_tuple_insert for comments about most of the input flags, except
@@ -2101,15 +2101,16 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * reflected into *tup.
  */
 void
-heap_insert(Relation relation, HeapTuple tup, CommandId cid,
-			int options, BulkInsertState bistate)
+heap_insert(Relation relation, HeapTuple tup, TransactionId xid,
+			CommandId cid, int options, BulkInsertState bistate)
 {
-	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
+	Assert(TransactionIdIsValid(xid));
+
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(tup->t_data) <=
 		   RelationGetNumberOfAttributes(relation));
@@ -2375,7 +2376,6 @@ void
 heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 				  CommandId cid, int options, BulkInsertState bistate)
 {
-	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple  *heaptuples;
 	int			i;
 	int			ndone;
@@ -2408,7 +2408,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		tuple = ExecFetchSlotHeapTuple(slots[i], true, NULL);
 		slots[i]->tts_tableOid = RelationGetRelid(relation);
 		tuple->t_tableOid = slots[i]->tts_tableOid;
-		heaptuples[i] = heap_prepare_insert(relation, tuple, xid, cid,
+		heaptuples[i] = heap_prepare_insert(relation, tuple, GetCurrentTransactionId(), cid,
 											options);
 	}
 
@@ -2746,7 +2746,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 void
 simple_heap_insert(Relation relation, HeapTuple tup)
 {
-	heap_insert(relation, tup, GetCurrentCommandId(true), 0, NULL);
+	heap_insert(relation, tup, GetCurrentTransactionId(),
+				GetCurrentCommandId(true), 0, NULL);
 }
 
 /*
@@ -2803,11 +2804,10 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
  */
 TM_Result
 heap_delete(Relation relation, const ItemPointerData *tid,
-			CommandId cid, Snapshot crosscheck, bool wait,
+			TransactionId xid, CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, bool changingPart, bool walLogical)
 {
 	TM_Result	result;
-	TransactionId xid = GetCurrentTransactionId();
 	ItemId		lp;
 	HeapTupleData tp;
 	Page		page;
@@ -2824,6 +2824,7 @@ heap_delete(Relation relation, const ItemPointerData *tid,
 	bool		old_key_copied = false;
 
 	Assert(ItemPointerIsValid(tid));
+	Assert(TransactionIdIsValid(xid));
 
 	AssertHasSnapshotForToast(relation);
 
@@ -3240,7 +3241,7 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
 	TM_Result	result;
 	TM_FailureData tmfd;
 
-	result = heap_delete(relation, tid,
+	result = heap_delete(relation, tid, GetCurrentTransactionId(),
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
 						 &tmfd, false,	/* changingPart */
@@ -3283,12 +3284,11 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
  */
 TM_Result
 heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
-			CommandId cid, Snapshot crosscheck, bool wait,
+			TransactionId xid, CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
 			TU_UpdateIndexes *update_indexes, bool walLogical)
 {
 	TM_Result	result;
-	TransactionId xid = GetCurrentTransactionId();
 	Bitmapset  *hot_attrs;
 	Bitmapset  *sum_attrs;
 	Bitmapset  *key_attrs;
@@ -3328,6 +3328,7 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
 				infomask2_new_tuple;
 
 	Assert(ItemPointerIsValid(otid));
+	Assert(TransactionIdIsValid(xid));
 
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
@@ -4534,7 +4535,7 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
 	TM_FailureData tmfd;
 	LockTupleMode lockmode;
 
-	result = heap_update(relation, otid, tup,
+	result = heap_update(relation, otid, tup, GetCurrentTransactionId(),
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
 						 &tmfd, &lockmode, update_indexes,
@@ -5373,8 +5374,6 @@ compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 	uint16		new_infomask,
 				new_infomask2;
 
-	Assert(TransactionIdIsCurrentTransactionId(add_to_xmax));
-
 l5:
 	new_infomask = 0;
 	new_infomask2 = 0;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e6d630fa2f7..b49f9add5bb 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -252,7 +252,8 @@ heapam_tuple_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
-	heap_insert(relation, tuple, cid, options, bistate);
+	heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+				bistate);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	if (shouldFree)
@@ -275,7 +276,8 @@ heapam_tuple_insert_speculative(Relation relation, TupleTableSlot *slot,
 	options |= HEAP_INSERT_SPECULATIVE;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
-	heap_insert(relation, tuple, cid, options, bistate);
+	heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+				bistate);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	if (shouldFree)
@@ -309,8 +311,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
-					   true);
+	return heap_delete(relation, tid, GetCurrentTransactionId(), cid,
+					   crosscheck, wait, tmfd, changingPart, true);
 }
 
 
@@ -328,7 +330,8 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	slot->tts_tableOid = RelationGetRelid(relation);
 	tuple->t_tableOid = slot->tts_tableOid;
 
-	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
+	result = heap_update(relation, otid, tuple, GetCurrentTransactionId(),
+						 cid, crosscheck, wait,
 						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
@@ -2441,9 +2444,16 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
 		 * the relation files, it drops this relation, so no logical
 		 * replication subscription should need the data.
+		 *
+		 * It is also crucial to stamp the new record with the exact same xid
+		 * and cid, because the tuple must be visible to the snapshots of the
+		 * concurrent transactions later.
 		 */
-		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
-					HEAP_INSERT_NO_LOGICAL, NULL);
+		/* TODO: looks like the cid is not required */
+		CommandId	cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
+		TransactionId xid = HeapTupleHeaderGetXmin(tuple->t_data);
+
+		heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);
 	}
 
 	heap_freetuple(copiedTuple);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index f2a2ec6d3e5..1b1928ce300 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -58,6 +58,7 @@
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/procarray.h"
 #include "storage/procsignal.h"
 #include "tcop/tcopprot.h"
 #include "utils/acl.h"
@@ -249,15 +250,20 @@ static bool decode_concurrent_changes(LogicalDecodingContext *ctx,
 									  DecodingWorkerShared *shared);
 static void apply_concurrent_changes(BufFile *file, ChangeDest *dest);
 static void apply_concurrent_insert(Relation rel, HeapTuple tup,
+									TransactionId xid,
 									IndexInsertState *iistate,
 									TupleTableSlot *index_slot);
 static void apply_concurrent_update(Relation rel, HeapTuple tup,
 									HeapTuple tup_target,
+									TransactionId xid,
 									IndexInsertState *iistate,
 									TupleTableSlot *index_slot);
-static void apply_concurrent_delete(Relation rel, HeapTuple tup_target);
+static void apply_concurrent_delete(Relation rel,
+									TransactionId xid,
+									HeapTuple tup_target);
 static HeapTuple find_target_tuple(Relation rel, ChangeDest *dest,
 								   HeapTuple tup_key,
+								   Snapshot snapshot,
 								   TupleTableSlot *ident_slot);
 static void process_concurrent_changes(XLogRecPtr end_of_wal,
 									   ChangeDest *dest,
@@ -1091,7 +1097,14 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 
 	/* The historic snapshot won't be needed anymore. */
 	if (snapshot)
+	{
+		TransactionId xmin = snapshot->xmin;
 		PopActiveSnapshot();
+		Assert(concurrent);
+		/* TODO: seems like this is not required; check SnapBuildInitialSnapshotForRepack */
+		WaitForOlderSnapshots(xmin, false);
+	}
+
 
 	if (concurrent)
 	{
@@ -1382,30 +1395,35 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 	 * not to be aggressive about this.
 	 */
 	memset(&params, 0, sizeof(VacuumParams));
-	vacuum_get_cutoffs(OldHeap, params, &cutoffs);
-
-	/*
-	 * FreezeXid will become the table's new relfrozenxid, and that mustn't go
-	 * backwards, so take the max.
-	 */
+	if (!concurrent)
 	{
 		TransactionId relfrozenxid = OldHeap->rd_rel->relfrozenxid;
+		MultiXactId relminmxid = OldHeap->rd_rel->relminmxid;
 
+		vacuum_get_cutoffs(OldHeap, params, &cutoffs);
+		/*
+		 * FreezeXid will become the table's new relfrozenxid, and that mustn't go
+		 * backwards, so take the max.
+		 */
 		if (TransactionIdIsValid(relfrozenxid) &&
 			TransactionIdPrecedes(cutoffs.FreezeLimit, relfrozenxid))
 			cutoffs.FreezeLimit = relfrozenxid;
-	}
-
-	/*
-	 * MultiXactCutoff, similarly, shouldn't go backwards either.
-	 */
-	{
-		MultiXactId relminmxid = OldHeap->rd_rel->relminmxid;
-
+		/*
+		 * MultiXactCutoff, similarly, shouldn't go backwards either.
+		 */
 		if (MultiXactIdIsValid(relminmxid) &&
 			MultiXactIdPrecedes(cutoffs.MultiXactCutoff, relminmxid))
 			cutoffs.MultiXactCutoff = relminmxid;
 	}
+	else
+	{
+		/*
+		 * In concurrent mode we preserve the original xmin/xmax values,
+		 * so just keep the relation's current cutoffs for simplicity.
+		 */
+		cutoffs.FreezeLimit = OldHeap->rd_rel->relfrozenxid;
+		cutoffs.MultiXactCutoff = OldHeap->rd_rel->relminmxid;
+	}
 
 	/*
 	 * Decide whether to use an indexscan or seqscan-and-optional-sort to scan
@@ -2745,6 +2763,7 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 		size_t		nread;
 		HeapTuple	tup,
 					tup_exist;
+		TransactionId xid;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -2761,6 +2780,17 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 		tup->t_len = t_len;
 		ItemPointerSetInvalid(&tup->t_self);
 		tup->t_tableOid = RelationGetRelid(dest->rel);
+		BufFileReadExact(file, &xid, sizeof(TransactionId));
+
+		if (TransactionIdIsValid(xid) && TransactionIdIsInProgress(xid))
+		{
+			/* The xmin has certainly committed - we got this change from the
+			 * reorderbuffer. But the procarray might not be updated yet, so
+			 * the current backend can still see it as in-progress; wait. */
+			XactLockTableWait(xid, NULL, NULL, XLTW_None);
+			Assert(!TransactionIdIsInProgress(xid));
+			Assert(TransactionIdDidCommit(xid));
+		}
 
 		if (kind == CHANGE_UPDATE_OLD)
 		{
@@ -2771,7 +2801,7 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 		{
 			Assert(tup_old == NULL);
 
-			apply_concurrent_insert(rel, tup, dest->iistate, index_slot);
+			apply_concurrent_insert(rel, tup, xid, dest->iistate, index_slot);
 
 			pfree(tup);
 		}
@@ -2790,17 +2820,21 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 			}
 
 			/*
-			 * Find the tuple to be updated or deleted.
+			 * Find the tuple to be updated or deleted using SnapshotSelf.
+			 * That way we get the last live version in case of a HOT chain.
+			 * It is guaranteed that there is no not-yet-committed updated
+			 * version, because we are replaying all-committed transactions
+			 * with no concurrency involved.
 			 */
-			tup_exist = find_target_tuple(rel, dest, tup_key, ident_slot);
+			tup_exist = find_target_tuple(rel, dest, tup_key, SnapshotSelf, ident_slot);
 			if (tup_exist == NULL)
 				elog(ERROR, "failed to find target tuple");
 
 			if (kind == CHANGE_UPDATE_NEW)
-				apply_concurrent_update(rel, tup, tup_exist, dest->iistate,
+				apply_concurrent_update(rel, tup, tup_exist, xid, dest->iistate,
 										index_slot);
 			else
-				apply_concurrent_delete(rel, tup_exist);
+				apply_concurrent_delete(rel, xid, tup_exist);
 
 			if (tup_old != NULL)
 			{
@@ -2819,6 +2853,7 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 		 */
 		if (kind != CHANGE_UPDATE_OLD)
 		{
+			/* TODO: possibly not required at all - we replay committed transactions, stamping them with an already-committed XID */
 			CommandCounterIncrement();
 			UpdateActiveSnapshotCommandId();
 		}
@@ -2830,7 +2865,7 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 }
 
 static void
-apply_concurrent_insert(Relation rel, HeapTuple tup, IndexInsertState *iistate,
+apply_concurrent_insert(Relation rel, HeapTuple tup, TransactionId xid, IndexInsertState *iistate,
 						TupleTableSlot *index_slot)
 {
 	List	   *recheck;
@@ -2840,9 +2875,12 @@ apply_concurrent_insert(Relation rel, HeapTuple tup, IndexInsertState *iistate,
 	 * Like simple_heap_insert(), but make sure that the INSERT is not
 	 * logically decoded - see reform_and_rewrite_tuple() for more
 	 * information.
+	 *
+	 * Use already committed xid to stamp the tuple.
 	 */
-	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
-				NULL);
+	Assert(TransactionIdIsValid(xid));
+	heap_insert(rel, tup, xid, GetCurrentCommandId(true),
+				HEAP_INSERT_NO_LOGICAL, NULL);
 
 	/*
 	 * Update indexes.
@@ -2850,6 +2888,7 @@ apply_concurrent_insert(Relation rel, HeapTuple tup, IndexInsertState *iistate,
 	 * In case functions in the index need the active snapshot and caller
 	 * hasn't set one.
 	 */
+	PushActiveSnapshot(GetLatestSnapshot());
 	ExecStoreHeapTuple(tup, index_slot, false);
 	recheck = ExecInsertIndexTuples(iistate->rri,
 									index_slot,
@@ -2860,6 +2899,7 @@ apply_concurrent_insert(Relation rel, HeapTuple tup, IndexInsertState *iistate,
 									NIL,	/* arbiterIndexes */
 									false	/* onlySummarizing */
 		);
+	PopActiveSnapshot();
 
 	/*
 	 * If recheck is required, it must have been preformed on the source
@@ -2873,6 +2913,7 @@ apply_concurrent_insert(Relation rel, HeapTuple tup, IndexInsertState *iistate,
 
 static void
 apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						TransactionId xid,
 						IndexInsertState *iistate, TupleTableSlot *index_slot)
 {
 	LockTupleMode lockmode;
@@ -2887,9 +2928,12 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 	 *
 	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
 	 * except for 'wait').
+	 *
+	 * Use already committed xid to stamp the tuple.
 	 */
+	Assert(TransactionIdIsValid(xid));
 	res = heap_update(rel, &tup_target->t_self, tup,
-					  GetCurrentCommandId(true),
+					  xid, GetCurrentCommandId(true),
 					  InvalidSnapshot,
 					  false,	/* no wait - only we are doing changes */
 					  &tmfd, &lockmode, &update_indexes,
@@ -2901,6 +2945,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 
 	if (update_indexes != TU_None)
 	{
+		PushActiveSnapshot(GetLatestSnapshot());
 		recheck = ExecInsertIndexTuples(iistate->rri,
 										index_slot,
 										iistate->estate,
@@ -2910,6 +2955,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 										NIL,	/* arbiterIndexes */
 		/* onlySummarizing */
 										update_indexes == TU_Summarizing);
+		PopActiveSnapshot();
 		list_free(recheck);
 	}
 
@@ -2917,7 +2963,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 }
 
 static void
-apply_concurrent_delete(Relation rel, HeapTuple tup_target)
+apply_concurrent_delete(Relation rel, TransactionId xid, HeapTuple tup_target)
 {
 	TM_Result	res;
 	TM_FailureData tmfd;
@@ -2927,9 +2973,12 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target)
 	 *
 	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
 	 * except for 'wait').
+	 *
+	 * Use already committed xid to stamp the tuple.
 	 */
-	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
-					  InvalidSnapshot, false,
+	Assert(TransactionIdIsValid(xid));
+	res = heap_delete(rel, &tup_target->t_self, xid,
+					  GetCurrentCommandId(true), InvalidSnapshot, false,
 					  &tmfd,
 					  false,	/* no wait - only we are doing changes */
 					  false /* wal_logical */ );
@@ -2950,7 +2999,7 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target)
  */
 static HeapTuple
 find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
-				  TupleTableSlot *ident_slot)
+				  Snapshot snapshot, TupleTableSlot *ident_slot)
 {
 	Relation	ident_index = dest->ident_index;
 	IndexScanDesc scan;
@@ -2959,7 +3008,7 @@ find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
 	HeapTuple	result = NULL;
 
 	/* XXX no instrumentation for now */
-	scan = index_beginscan(rel, ident_index, GetActiveSnapshot(),
+	scan = index_beginscan(rel, ident_index, snapshot,
 						   NULL, dest->ident_key_nentries, 0);
 
 	/*
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
index fb9956d392d..8d796e0a684 100644
--- a/src/backend/replication/pgoutput_repack/pgoutput_repack.c
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -29,7 +29,8 @@ static void plugin_commit_txn(LogicalDecodingContext *ctx,
 static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 						  Relation rel, ReorderBufferChange *change);
 static void store_change(LogicalDecodingContext *ctx,
-						 ConcurrentChangeKind kind, HeapTuple tuple);
+						 ConcurrentChangeKind kind, HeapTuple tuple,
+						 TransactionId xid);
 
 void
 _PG_output_plugin_init(OutputPluginCallbacks *cb)
@@ -120,7 +121,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (newtuple == NULL)
 					elog(ERROR, "Incomplete insert info.");
 
-				store_change(ctx, CHANGE_INSERT, newtuple);
+				store_change(ctx, CHANGE_INSERT, newtuple, change->txn->xid);
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_UPDATE:
@@ -137,9 +138,11 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 					elog(ERROR, "Incomplete update info.");
 
 				if (oldtuple != NULL)
-					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple,
+								 change->txn->xid);
 
-				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple,
+							 change->txn->xid);
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_DELETE:
@@ -152,7 +155,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (oldtuple == NULL)
 					elog(ERROR, "Incomplete delete info.");
 
-				store_change(ctx, CHANGE_DELETE, oldtuple);
+				store_change(ctx, CHANGE_DELETE, oldtuple, change->txn->xid);
 			}
 			break;
 		default:
@@ -165,7 +168,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 /* Store concurrent data change. */
 static void
 store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
-			 HeapTuple tuple)
+			 HeapTuple tuple, TransactionId xid)
 {
 	RepackDecodingState *dstate;
 	char		kind_byte = (char) kind;
@@ -195,6 +198,7 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 	BufFileWrite(dstate->file, &tuple->t_len, sizeof(tuple->t_len));
 	/* ... and the tuple itself. */
 	BufFileWrite(dstate->file, tuple->t_data, tuple->t_len);
+	BufFileWrite(dstate->file, &xid, sizeof(TransactionId));
 
 	/* Free the flat copy if created above. */
 	if (flattened)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b7cd25896f6..d9776f61a0d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -354,20 +354,20 @@ extern BulkInsertState GetBulkInsertState(void);
 extern void FreeBulkInsertState(BulkInsertState);
 extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
-extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
+extern void heap_insert(Relation relation, HeapTuple tup, TransactionId xid, CommandId cid,
 						int options, BulkInsertState bistate);
 extern void heap_multi_insert(Relation relation, TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
-							 CommandId cid, Snapshot crosscheck, bool wait,
+							 TransactionId xid, CommandId cid, Snapshot crosscheck, bool wait,
 							 TM_FailureData *tmfd, bool changingPart,
 							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
 extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
 extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
 							 HeapTuple newtup,
-							 CommandId cid, Snapshot crosscheck, bool wait,
+							 TransactionId xid, CommandId cid, Snapshot crosscheck, bool wait,
 							 TM_FailureData *tmfd, LockTupleMode *lockmode,
 							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
index d727a9b056b..accd42d78aa 100644
--- a/src/test/modules/injection_points/specs/repack.spec
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -85,9 +85,6 @@ step change_new
 # When applying concurrent data changes, we should see the effects of an
 # in-progress subtransaction.
 #
-# XXX Not sure this test is useful now - it was designed for the patch that
-# preserves tuple visibility and which therefore modifies
-# TransactionIdIsCurrentTransactionId().
 step change_subxact1
 {
 	BEGIN;
@@ -102,7 +99,6 @@ step change_subxact1
 # When applying concurrent data changes, we should not see the effects of a
 # rolled back subtransaction.
 #
-# XXX Is this test useful? See above.
 step change_subxact2
 {
 	BEGIN;
-- 
2.43.0
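
(For reference: with this change, every record that store_change() appends
to the decoding state's BufFile now carries the XID of the originating
transaction at the end.  Reconstructed from the hunks above -- the kind
byte is written earlier in the same function -- the serialized layout of
one change record is, in effect:

    /*
     * char          kind;          -- ConcurrentChangeKind, as one byte
     * uint32        t_len;         -- tuple->t_len
     * char          data[t_len];   -- flattened tuple (tuple->t_data)
     * TransactionId xid;           -- transaction that made the change
     */

This is a description, not an actual struct; the reader has to consume the
fields in exactly this order.)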

Attachment: nocfbot-0002-one-more-stress-test-for-repack-concurrentl.patch (application/x-patch)
From 25fc848068b28e5b2ae099bdecae35fdf8cb6240 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
Date: Sat, 13 Dec 2025 18:46:46 +0100
Subject: [PATCH vnocfbot 2/2] one more stress test for repack concurrently

---
 contrib/amcheck/meson.build                  |   1 +
 contrib/amcheck/t/008_repack_concurrently.pl | 101 +++++++++++++++++++
 2 files changed, 102 insertions(+)
 create mode 100644 contrib/amcheck/t/008_repack_concurrently.pl

diff --git a/contrib/amcheck/meson.build b/contrib/amcheck/meson.build
index 2b69081d3bf..f7c70735989 100644
--- a/contrib/amcheck/meson.build
+++ b/contrib/amcheck/meson.build
@@ -51,6 +51,7 @@ tests += {
       't/005_pitr.pl',
       't/006_verify_gin.pl',
       't/007_repack_concurrently.pl',
+      't/008_repack_concurrently.pl',
     ],
   },
 }
diff --git a/contrib/amcheck/t/008_repack_concurrently.pl b/contrib/amcheck/t/008_repack_concurrently.pl
new file mode 100644
index 00000000000..220524d41b3
--- /dev/null
+++ b/contrib/amcheck/t/008_repack_concurrently.pl
@@ -0,0 +1,101 @@
+
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+# Test REPACK CONCURRENTLY with concurrent modifications
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+my $node;
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('CIC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf(
+	'postgresql.conf', qq(
+wal_level = logical
+));
+
+my $no_hot = int(rand(2));
+
+$node->start;
+$node->safe_psql('postgres', q(CREATE TABLE tbl(i SERIAL PRIMARY KEY, j int)));
+if ($no_hot)
+{
+	$node->safe_psql('postgres', q(CREATE INDEX test_idx ON tbl(j);));
+}
+else
+{
+	$node->safe_psql('postgres', q(CREATE INDEX test_idx ON tbl(i);));
+}
+
+# Load amcheck
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+my $sum = $node->safe_psql('postgres', q(
+	SELECT SUM(j) AS sum FROM tbl
+));
+
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE last_j START 1 INCREMENT 1;));
+
+
+$node->pgbench(
+'--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=1000',
+0,
+[qr{actually processed}],
+[qr{^$}],
+'concurrent operations with REPACK CONCURRENTLY',
+{
+	'concurrent_ops' => qq(
+		SELECT pg_try_advisory_lock(42)::integer AS gotlock \\gset
+		\\if :gotlock
+			REPACK (CONCURRENTLY) tbl USING INDEX tbl_pkey;
+			SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+			SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+			\\sleep 10 ms
+
+			REPACK (CONCURRENTLY) tbl USING INDEX test_idx;
+			SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+			SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+			\\sleep 10 ms
+
+			REPACK (CONCURRENTLY) tbl;
+			SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+			SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+			\\sleep 10 ms
+
+			SELECT pg_advisory_unlock(42);
+		\\else
+			SELECT pg_advisory_lock(43);
+				BEGIN;
+				INSERT INTO tbl(j) VALUES (nextval('last_j')) RETURNING j \\gset p_
+				COMMIT;
+			SELECT pg_advisory_unlock(43);
+			\\sleep 1 ms
+
+			BEGIN
+			--TRANSACTION ISOLATION LEVEL REPEATABLE READ
+			;
+			SELECT 1;
+			\\sleep 1 ms
+			SELECT COUNT(*) AS count FROM tbl WHERE j <= :p_j \\gset p_
+			\\if :p_count != :p_j
+				COMMIT;
+				SELECT (:p_count) / 0;
+			\\endif
+
+			COMMIT;
+		\\endif
+	)
+});
+
+$node->stop;
+done_testing();
-- 
2.43.0

Attachment: nocfbot-0001-stress-test-for-repack-concurrently.patch (application/x-patch)
From db84bbad9d10ffacffc763dbf0ed4bb481f42399 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
Date: Sat, 13 Dec 2025 18:13:37 +0100
Subject: [PATCH vnocfbot 1/2] stress test for repack concurrently

---
 contrib/amcheck/meson.build                  |   1 +
 contrib/amcheck/t/007_repack_concurrently.pl | 110 +++++++++++++++++++
 2 files changed, 111 insertions(+)
 create mode 100644 contrib/amcheck/t/007_repack_concurrently.pl

diff --git a/contrib/amcheck/meson.build b/contrib/amcheck/meson.build
index 1f0c347ed54..2b69081d3bf 100644
--- a/contrib/amcheck/meson.build
+++ b/contrib/amcheck/meson.build
@@ -50,6 +50,7 @@ tests += {
       't/004_verify_nbtree_unique.pl',
       't/005_pitr.pl',
       't/006_verify_gin.pl',
+      't/007_repack_concurrently.pl',
     ],
   },
 }
diff --git a/contrib/amcheck/t/007_repack_concurrently.pl b/contrib/amcheck/t/007_repack_concurrently.pl
new file mode 100644
index 00000000000..a47cebb347b
--- /dev/null
+++ b/contrib/amcheck/t/007_repack_concurrently.pl
@@ -0,0 +1,110 @@
+
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+# Test REPACK CONCURRENTLY with concurrent modifications
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+my $node;
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('CIC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf(
+	'postgresql.conf', qq(
+wal_level = logical
+));
+
+my $n=1000;
+my $no_hot = int(rand(2));
+
+$node->start;
+$node->safe_psql('postgres', q(CREATE TABLE tbl(i int PRIMARY KEY, j int)));
+
+if ($no_hot)
+{
+	$node->safe_psql('postgres', q(CREATE INDEX test_idx ON tbl(j);));
+}
+else
+{
+	$node->safe_psql('postgres', q(CREATE INDEX test_idx ON tbl(i);));
+}
+
+
+# Load amcheck
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Insert $n rows into tbl
+$node->safe_psql('postgres', qq(
+	INSERT INTO tbl SELECT i, i FROM generate_series(1,$n) i
+));
+
+my $sum = $node->safe_psql('postgres', q(
+	SELECT SUM(j) AS sum FROM tbl
+));
+
+
+$node->pgbench(
+'--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=5000',
+0,
+[qr{actually processed}],
+[qr{^$}],
+'concurrent operations with REPACK CONCURRENTLY',
+{
+	'concurrent_ops' => qq(
+		SELECT pg_try_advisory_lock(42)::integer AS gotlock \\gset
+		\\if :gotlock
+			REPACK (CONCURRENTLY) tbl USING INDEX tbl_pkey;
+			SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+			SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+			\\sleep 10 ms
+
+			REPACK (CONCURRENTLY) tbl USING INDEX test_idx;
+			SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+			SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+			\\sleep 10 ms
+
+			REPACK (CONCURRENTLY) tbl;
+			SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+			SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+			\\sleep 10 ms
+
+			SELECT pg_advisory_unlock(42);
+		\\else
+			\\set num_a random(1, $n)
+			\\set num_b random(1, $n)
+			\\set diff random(1, 10000)
+			BEGIN;
+			UPDATE tbl SET j = j + :diff WHERE i = :num_a;
+			\\sleep 1 ms
+			UPDATE tbl SET j = j - :diff WHERE i = :num_b;
+			\\sleep 1 ms
+			COMMIT;
+
+			BEGIN
+			--TRANSACTION ISOLATION LEVEL REPEATABLE READ
+			;
+			SELECT 1;
+			\\sleep 1 ms
+			SELECT COALESCE(SUM(j), 0) AS sum FROM tbl \\gset p_
+			\\if :p_sum != $sum
+				COMMIT;
+				SELECT (:p_sum) / 0;
+			\\endif
+
+			COMMIT;
+		\\endif
+	)
+});
+
+$node->stop;
+done_testing();
-- 
2.43.0

#70Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#67)
6 attachment(s)
Re: Adding REPACK [concurrently]

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Hello, many thanks for the new version. Here's a very quick proposal
for a new top-of-file comment on cluster.c,

The comment matches 0005, but I had to adjust it for 0004 (no background
worker there). Also, the worker writes the changes to a file rather than to
a tuplestore; storage/sharedfileset.h seems to me an easier way to pass the
data from one process to another (a minimal sketch follows the quoted
changes below). Besides that, I made the following changes:

"bloat is greatly reduced" -> "bloat is eliminated"

and

"table, and to cope with" -> "table. To cope with"

I haven't read build_relation_finish_concurrent() yet to understand how
exactly we do the lock upgrade, which I think is an important point
we should address in this comment. Also not addressed is how exactly we
handle indexes. Feel free to correct this, reword it or include any
additional details that you think are important.

ok, I'll get back to the earlier parts of the set, including this, at the
beginning of January. Regarding indexes, one thing I've noticed recently is
that they get locked in build_new_indexes(), but maybe that should happen
earlier.

(At this point we could just as well rename the file to repack.c, since
very little of the original remains. But let's discuss that later.)

ok. Do you mean only the file or the functions as well? (I'm not going to do
that now, w/o that discussion.)

Attached here is a new version of the patch set. It's rebased and extended one
more time: 0006 is a PoC of the "snapshot resetting" technique, as discussed
elsewhere with Mihail Nikalayeu and Matthias van de Meent. The way snapshots
are generated here is different, though: we need the snapshots from logical
replication's snapbuild.c, not those from procarray.c. More information is in
the commit message.
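
To give an idea of the direction (a hypothetical sketch, not the actual
0006 code; "ctx" is the LogicalDecodingContext):

    Snapshot	snap;

    /* build a fresh MVCC snapshot from the decoding machinery */
    snap = SnapBuildInitialSnapshot(ctx->snapshot_builder);
    PushActiveSnapshot(snap);
    /* ... copy the next batch of the old table under that snapshot ... */
    PopActiveSnapshot();
    /* ... and repeat, so no single snapshot is held for the whole copy */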

I do not insist that this should go into PG 19; I just needed some confidence
that it's doable, as well as some feedback. There are no tests for this yet,
but I've played with it for a while and checked the behavior using a debugger.
I'm curious to hear whether the design is sound.

While working on that, I fixed some problems in 0004 and 0005 too. It
shouldn't be difficult to identify them using git, if needed.

Even if 0005 and 0006 don't land in PG 19, these parts show that some
refactoring may be needed around the AM callback
table_relation_copy_for_cluster(): the parts 0004, 0005 and 0006 each change
its argument list, and it wouldn't be ideal if both PG 19 and PG 20 changed
the API. I think we should reconsider which arguments are generic and which
are rather AM-specific. Maybe we should then add an opaque pointer (void *)
for the AM-specific information; REPACK could then use it to pass the
CONCURRENTLY-specific information.
I'm now going to prioritize the parts <= 0004.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

Attachment: v28-0001-Add-REPACK-command.patch (text/x-diff; charset=utf-8)
From 46da847ee07d471134badc362add95ed321269b1 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Sat, 13 Dec 2025 19:27:17 +0100
Subject: [PATCH 1/6] Add REPACK command
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.

This also adds pg_repackdb, a new utility that can invoke the new
commands.  This is heavily based on vacuumdb.

Author: Antonin Houska <ah@cybertec.at>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 +++++-
 doc/src/sgml/ref/allfiles.sgml           |   2 +
 doc/src/sgml/ref/cluster.sgml            |  97 +--
 doc/src/sgml/ref/clusterdb.sgml          |   5 +
 doc/src/sgml/ref/pg_repackdb.sgml        | 488 +++++++++++++
 doc/src/sgml/ref/repack.sgml             | 328 +++++++++
 doc/src/sgml/ref/vacuum.sgml             |  33 +-
 doc/src/sgml/reference.sgml              |   2 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  29 +-
 src/backend/commands/cluster.c           | 849 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   6 +-
 src/backend/parser/gram.y                |  86 ++-
 src/backend/tcop/utility.c               |  23 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  42 +-
 src/bin/scripts/Makefile                 |   4 +-
 src/bin/scripts/meson.build              |   2 +
 src/bin/scripts/pg_repackdb.c            | 240 +++++++
 src/bin/scripts/t/103_repackdb.pl        |  47 ++
 src/bin/scripts/vacuuming.c              | 102 ++-
 src/bin/scripts/vacuuming.h              |   3 +
 src/include/commands/cluster.h           |   8 +-
 src/include/commands/progress.h          |  50 +-
 src/include/nodes/parsenodes.h           |  35 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 134 +++-
 src/test/regress/expected/rules.out      |  72 +-
 src/test/regress/sql/cluster.sql         |  70 +-
 src/tools/pgindent/typedefs.list         |   2 +
 33 files changed, 2484 insertions(+), 537 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 doc/src/sgml/ref/repack.sgml
 create mode 100644 src/bin/scripts/pg_repackdb.c
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 817fd9f4ca7..b07fe3294cd 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5609,7 +5617,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -6093,6 +6102,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index e167406c744..5df944d13ca 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
@@ -213,6 +214,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 0b47460080b..2cda711bc9f 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
   <title>Description</title>
 
   <para>
-   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
-   to cluster the table specified
-   by <replaceable class="parameter">table_name</replaceable>
-   based on the index specified by
-   <replaceable class="parameter">index_name</replaceable>. The index must
-   already have been defined on
-   <replaceable class="parameter">table_name</replaceable>.
+   The <command>CLUSTER</command> command is equivalent to
+   <xref linkend="sql-repack"/> with a <literal>USING INDEX</literal>
+   clause.  See there for more details.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
-
-  <para>
-   When a table is clustered, <productname>PostgreSQL</productname>
-   remembers which index it was clustered by.  The form
-   <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
-   reclusters the table using the same index as before.  You can also
-   use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
-   forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
-   future cluster operations, or to clear any previous setting.
-  </para>
-
-  <para>
-   <command>CLUSTER</command> without a
-   <replaceable class="parameter">table_name</replaceable> reclusters all the
-   previously-clustered tables in the current database that the calling user
-   has privileges for.  This form of <command>CLUSTER</command> cannot be
-   executed inside a transaction block.
-  </para>
+<!-- Do we need to describe exactly which options map to what?  They seem obvious to me. -->
 
-  <para>
-   When a table is being clustered, an <literal>ACCESS
-   EXCLUSIVE</literal> lock is acquired on it. This prevents any other
-   database operations (both reads and writes) from operating on the
-   table until the <command>CLUSTER</command> is finished.
-  </para>
  </refsect1>
 
  <refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 0d2051bf6f1..b50c9581a98 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -64,6 +64,11 @@ PostgreSQL documentation
    this utility and via other methods for accessing the server.
   </para>
 
+  <para>
+   <application>clusterdb</application> has been superseded by
+   <application>pg_repackdb</application>.
+  </para>
+
  </refsect1>
 
 
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..b313b54ab63
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,488 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.
+   <application>pg_repackdb</application> will also generate internal
+   statistics used by the <productname>PostgreSQL</productname> query
+   optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link>.  There
+   is no effective difference between repacking and analyzing databases via
+   this utility and via other methods for accessing the server.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--index<optional>=<replaceable class="parameter">index_name</replaceable></optional></option></term>
+      <listitem>
+       <para>
+        Pass the <literal>USING INDEX</literal> clause to <literal>REPACK</literal>,
+        and optionally the index name to specify.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.  If a column name
+        list is given, only compute statistics for those columns.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When <option>-a</option>/<option>--all</option> is used, connect
+        to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
+
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..61d5c2cdef1
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,328 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING INDEX
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+    ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+
+<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
+
+    <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If a <literal>USING INDEX</literal> clause is specified, the rows are
+   physically reordered based on information from an index.  Please see the
+   notes on clustering below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    If the <literal>USING INDEX</literal> clause is specified, the rows in
+    the table are physically reordered following an index: if an index name
+    is specified in the command, then that index is used; if no index name
+    is specified, then the index previously configured as the index to
+    cluster on is used.  If no index has been configured in this way, an
+    error is thrown.  The index given in the <literal>USING INDEX</literal>
+    clause is configured as the index to cluster on, just as an index given
+    to the <command>CLUSTER</command> command is.  An index can be set
+    manually using <command>ALTER TABLE ... CLUSTER ON</command>, and reset
+    with <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+   </para>
+
+   <para>
+    If no table name is specified in <command>REPACK USING INDEX</command>,
+    all tables which have a clustering index defined and which the calling
+    user has privileges for are processed.
+   </para>
+
+   <para>
+    Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using clustering.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan or a sequential scan without sort is used, a temporary
+    copy of the table is created that contains the table data in the new
+    order.  Temporary copies of each index on the table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">column_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of a specific column to analyze. Defaults to all columns.
+      If a column list is specified, <literal>ANALYZE</literal> must also
+      be specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report as each table is repacked
+      at <literal>INFO</literal> level.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>ANALYZE</literal></term>
+    <term><literal>ANALYSE</literal></term>
+    <listitem>
+     <para>
+      Applies <xref linkend="sql-analyze"/> on the table after repacking.  This is
+      currently only supported when a single (non-partitioned) table is specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
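+     <para>
+      For example, <literal>REPACK (VERBOSE FALSE) employees;</literal> is
+      equivalent to <literal>REPACK employees;</literal>.
+     </para>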
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+   <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view.  See
+    <xref linkend="repack-progress-reporting"/> for details.
+   </para>
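+
+   <para>
+    For example, progress can be watched from another session with a query
+    like the following (column list abbreviated):
+<programlisting>
+SELECT pid, relid::regclass AS relation, phase,
+       heap_blks_scanned, heap_blks_total
+  FROM pg_stat_progress_repack;
+</programlisting>
+   </para>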
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the partition of that
+    index. <command>REPACK</command> on a partitioned table cannot be executed
+    inside a transaction block.
+   </para>
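+
+   <para>
+    For example, with a partitioned table and a partitioned index (the names
+    here are illustrative), each leaf partition is repacked in its own
+    transaction:
+<programlisting>
+REPACK measurement USING INDEX measurement_logdate_idx;
+</programlisting>
+   </para>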
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> using its
+   index <literal>employees_ind</literal> (since an index is used, this
+   effectively clusters the table):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>cases</literal> in physical order,
+   running an <command>ANALYZE</command> on the given columns once
+   repacking is done, and showing informational messages:
+<programlisting>
+REPACK (ANALYZE, VERBOSE) cases (district, case_nr);
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables on which you have the <literal>MAINTAIN</literal>
+   privilege and for which a clustering index has previously been
+   configured, showing informational messages:
+<programlisting>
+REPACK (VERBOSE) USING INDEX;
+</programlisting>
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="app-pgrepackdb"/></member>
+   <member><xref linkend="repack-progress-reporting"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 6d0fdd43cfb..ac5d083d468 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
-    FULL [ <replaceable class="parameter">boolean</replaceable> ]
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+    FULL [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
   <title>Parameters</title>
 
   <variablelist>
-   <varlistentry>
-    <term><literal>FULL</literal></term>
-    <listitem>
-     <para>
-      Selects <quote>full</quote> vacuum, which can reclaim more
-      space, but takes much longer and exclusively locks the table.
-      This method also requires extra disk space, since it writes a
-      new copy of the table and doesn't release the old copy until
-      the operation is complete.  Usually this should only be used when a
-      significant amount of space needs to be reclaimed from within the table.
-     </para>
-    </listitem>
-   </varlistentry>
-
    <varlistentry>
     <term><literal>FREEZE</literal></term>
     <listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FULL</literal></term>
+    <listitem>
+     <para>
+      This option, which is deprecated, makes <command>VACUUM</command>
+      behave like <command>REPACK</command> without a
+      <literal>USING INDEX</literal> clause.
+      This method of compacting the table takes much longer than plain
+      <command>VACUUM</command> and exclusively locks the table.
+      This method also requires extra disk space, since it writes a
+      new copy of the table and doesn't release the old copy until
+      the operation is complete.  Usually this should only be used when a
+      significant amount of space needs to be reclaimed from within the table.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 2cf02c37b17..5d9a8a25a02 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
@@ -258,6 +259,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..b3a19003cdd 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8dea58ad96b..bd77584bc99 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0a0f95f6bb9..6c1461c9ef6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1283,14 +1283,15 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
-CREATE VIEW pg_stat_progress_cluster AS
+CREATE VIEW pg_stat_progress_repack AS
     SELECT
         S.pid AS pid,
         S.datid AS datid,
         D.datname AS datname,
         S.relid AS relid,
         CASE S.param1 WHEN 1 THEN 'CLUSTER'
-                      WHEN 2 THEN 'VACUUM FULL'
+                      WHEN 2 THEN 'REPACK'
+                      WHEN 3 THEN 'VACUUM FULL'
                       END AS command,
         CASE S.param2 WHEN 0 THEN 'initializing'
                       WHEN 1 THEN 'seq scanning heap'
@@ -1301,15 +1302,35 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 6 THEN 'rebuilding index'
                       WHEN 7 THEN 'performing final cleanup'
                       END AS phase,
-        CAST(S.param3 AS oid) AS cluster_index_relid,
+        CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
         S.param6 AS heap_blks_total,
         S.param7 AS heap_blks_scanned,
         S.param8 AS index_rebuild_count
-    FROM pg_stat_get_progress_info('CLUSTER') AS S
+    FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+-- This view is the same as the one above, except that a column is renamed
+-- and 'REPACK' is never reported as a command name.
+CREATE VIEW pg_stat_progress_cluster AS
+    SELECT
+        pid,
+        datid,
+        datname,
+        relid,
+        CASE WHEN command IN ('CLUSTER', 'VACUUM FULL') THEN command
+             WHEN repack_index_relid = 0 THEN 'VACUUM FULL'
+             ELSE 'CLUSTER' END AS command,
+        phase,
+        repack_index_relid AS cluster_index_relid,
+        heap_tuples_scanned,
+        heap_tuples_written,
+        heap_blks_total,
+        heap_blks_scanned,
+        index_rebuild_count
+    FROM pg_stat_progress_repack;
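+
+-- For example (an illustrative query, not part of the view definition),
+-- existing monitoring based on the old view keeps working:
+--   SELECT pid, command, phase FROM pg_stat_progress_cluster;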
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 2120c85ccb4..7f772c5c4f8 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1,7 +1,8 @@
 /*-------------------------------------------------------------------------
  *
  * cluster.c
- *	  CLUSTER a table on an index.  This is now also used for VACUUM FULL.
+ *	  CLUSTER a table on an index.  This is now also used for VACUUM FULL and
+ *	  REPACK.
  *
  * There is hardly anything left of Paul Brown's original implementation...
  *
@@ -67,27 +68,36 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  Oid relid, bool rel_is_index,
+											  MemoryContext permcxt);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
+static const char *RepackCommandAsString(RepackCommand cmd);
 
 
-/*---------------------------------------------------------------------------
- * This cluster code allows for clustering multiple tables at once. Because
+/*
+ * The repack code allows for processing multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
  * would be forced to acquire exclusive locks on all the tables being
  * clustered, simultaneously --- very likely leading to deadlock.
  *
- * To solve this we follow a similar strategy to VACUUM code,
- * clustering each relation in a separate transaction. For this to work,
- * we need to:
+ * To solve this we follow a similar strategy to VACUUM code, processing each
+ * relation in a separate transaction. For this to work, we need to:
+ *
  *	- provide a separate memory context so that we can pass information in
  *	  a way that survives across transactions
  *	- start a new transaction every time a new relation is clustered
@@ -98,197 +108,165 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *
  * The single-relation case does not have any such overhead.
  *
- * We also allow a relation to be specified without index.  In that case,
- * the indisclustered bit will be looked up, and an ERROR will be thrown
- * if there is no index with the bit set.
- *---------------------------------------------------------------------------
+ * We also allow a relation to be repacked following an index, but without
+ * naming a specific one.  In that case, the indisclustered bit will be
+ * looked up, and an ERROR will be thrown if no index has that bit set.
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+		else if (strcmp(opt->defname, "analyze") == 0 ||
+				 strcmp(opt->defname, "analyse") == 0)
+			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
 		else
 			ereport(ERROR,
-					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized %s option \"%s\"",
-							"CLUSTER", opt->defname),
-					 parser_errposition(pstate, opt->location)));
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("unrecognized %s option \"%s\"",
+						   RepackCommandAsString(stmt->command),
+						   opt->defname),
+					parser_errposition(pstate, opt->location));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
-			return;
-		}
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
+			return;				/* all done */
 	}
 
+	/*
+	 * Don't allow ANALYZE in the multiple-relation case for now.  Maybe we
+	 * can add support for this later.
+	 */
+	if (params.options & CLUOPT_ANALYZE)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
+
 	/*
 	 * By here, we know we are in a multi-table situation.  In order to avoid
 	 * holding locks for too long, we want to process each table in its own
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
-	}
+		Oid			relid;
+		bool		rel_is_index;
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
+		/*
+		 * If USING INDEX was specified, resolve the index name now and pass
+		 * it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * If no index name was specified when repacking a partitioned
+			 * table, punt for now.  Maybe we can improve this later.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, stmt->usingindex,
+											  stmt->indexname);
+			if (!OidIsValid(relid))
+				elog(ERROR, "unable to determine index to cluster on");
+			/* XXX is this the right place for this check? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
 
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
+		rtcs = get_tables_to_repack_partitioned(stmt->command,
+												relid, rel_is_index,
+												repack_context);
 
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
+	}
 
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+		{
+			CommitTransactionCommand();
+			continue;
+		}
+
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +282,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+			ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +304,8 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
-	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
+	pgstat_progress_update_param(PROGRESS_REPACK_COMMAND, cmd);
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -350,86 +326,38 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
 	 */
-	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
+	if (recheck &&
+		!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+							 params->options))
+		goto out;
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index.
+	 * Using an index would work in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
 	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on a shared catalog",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
-	{
-		if (OidIsValid(indexOid))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot vacuum temporary tables of other sessions")));
-	}
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on temporary tables of other sessions",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -442,6 +370,24 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	else
 		index = NULL;
 
+	/*
+	 * When allow_system_table_mods is turned off, we disallow repacking a
+	 * catalog on a particular index unless that's already the clustered index
+	 * for that catalog.
+	 *
+	 * XXX We don't check for this in CLUSTER, because it's historically been
+	 * allowed.
+	 */
+	if (cmd != REPACK_COMMAND_CLUSTER &&
+		!allowSystemTableMods && OidIsValid(indexOid) &&
+		IsCatalogRelation(OldHeap) && !index->rd_index->indisclustered)
+		ereport(ERROR,
+				errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				errmsg("permission denied: \"%s\" is a system catalog",
+					   RelationGetRelationName(OldHeap)),
+				errdetail("System catalogs can only be clustered by the index they're already clustered on, if any, unless \"%s\" is enabled.",
+						  "allow_system_table_mods"));
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -469,7 +415,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +428,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide REPACK/CLUSTER, or one on a partitioned
+	 * table), because there is another check in process_single_relation()
+	 * which will stop any attempt to process remote temp tables by name.
+	 * There is another check in cluster_rel() which is redundant, but we
+	 * leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +629,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +646,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (index != NULL)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -958,20 +962,20 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	/* Log what we're doing */
 	if (OldIndex != NULL && !use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using index scan on \"%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap),
-						RelationGetRelationName(OldIndex))));
+				errmsg("repacking \"%s.%s\" using index scan on \"%s\"",
+					   nspname,
+					   RelationGetRelationName(OldHeap),
+					   RelationGetRelationName(OldIndex)));
 	else if (use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using sequential scan and sort",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" using sequential scan and sort",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 	else
 		ereport(elevel,
-				(errmsg("vacuuming \"%s.%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" in physical order",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 
 	/*
 	 * Hand off the actual copying to AM specific function, the generic code
@@ -1458,8 +1462,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1513,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,106 +1636,191 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process, when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given or not, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index to find those that have
+ * indisclustered set; if it was not given, scan pg_class and return all
+ * tables.
+ *
+ * The result is a list of RelToCluster entries stored in the given memory
+ * context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand cmd, bool usingindex, MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
-	MemoryContext old_context;
+	HeapTuple	tuple;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
+
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
+			MemoryContext oldcxt;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			/*
+			 * Try to obtain a light lock on the index's table, to ensure it
+			 * doesn't go away while we collect the list.  If we cannot, just
+			 * disregard it.
+			 */
+			if (!ConditionalLockRelationOid(index->indrelid, AccessShareLock))
+				continue;
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(index->indrelid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(index->indrelid, AccessShareLock);
+				continue;
+			}
 
-		rtc = palloc_object(RelToCluster);
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			if (!cluster_is_permitted_for_relation(cmd, index->indrelid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc_object(RelToCluster);
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
+	}
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
+
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+			MemoryContext oldcxt;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
 
-		MemoryContextSwitchTo(old_context);
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			/* noisily skip rels which the user can't process */
+			if (!cluster_is_permitted_for_relation(cmd, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc_object(RelToCluster);
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
 	}
-	table_endscan(scan);
 
-	relation_close(indRelation, AccessShareLock);
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
  * all the children leaves tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, Oid relid,
+								 bool rel_is_index, MemoryContext permcxt)
 {
 	List	   *inhoids;
-	ListCell   *lc;
 	List	   *rtcs = NIL;
-	MemoryContext old_context;
 
-	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
-
-	foreach(lc, inhoids)
+	/*
+	 * Do not lock the children until they're processed.  Note that we do hold
+	 * a lock on the parent partitioned table.
+	 */
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
+	foreach_oid(child_oid, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			table_oid,
+					index_oid;
 		RelToCluster *rtc;
+		MemoryContext oldcxt;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(child_oid) != RELKIND_INDEX)
+				continue;
+
+			table_oid = IndexGetRelation(child_oid, false);
+			index_oid = child_oid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(child_oid) != RELKIND_RELATION)
+				continue;
+
+			table_oid = child_oid;
+			index_oid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
-		 * leaf partition despite having such privileges on the partitioned
-		 * table.  We skip any partitions which the user is not permitted to
-		 * CLUSTER.
+		 * leaf partition despite having them on the partitioned table.  Skip
+		 * if so.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, table_oid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
-
+		oldcxt = MemoryContextSwitchTo(permcxt);
 		rtc = palloc_object(RelToCluster);
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		rtc->tableOid = table_oid;
+		rtc->indexOid = index_oid;
 		rtcs = lappend(rtcs, rtc);
-
-		MemoryContextSwitchTo(old_context);
+		MemoryContextSwitchTo(oldcxt);
 	}
 
 	return rtcs;
@@ -1742,13 +1831,167 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
+
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   RepackCommandAsString(cmd),
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain lock on it, then determine what to do based on the relation
+ * type: if it's table and not partitioned, repack it as indicated (using an
+ * existing clustered index, or following the given one), and return NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened and locked relcache entry, so that caller can
+ * process the partitions using the multiple-table handling code.  In this
+ * case, if an index name is given, it's up to the caller to resolve it.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+
+	/*
+	 * Make sure ANALYZE is specified if a column list is present.
+	 */
+	if ((params->options & CLUOPT_ANALYZE) == 0 && stmt->relation->va_cols != NIL)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("ANALYZE option must be specified when a column list is provided"));
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		if (OidIsValid(indexOid))
+			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, rel, indexOid, params);
+
+		/* Do an analyze, if requested */
+		if (params->options & CLUOPT_ANALYZE)
+		{
+			VacuumParams vac_params = {0};
+
+			vac_params.options |= VACOPT_ANALYZE;
+			if (params->options & CLUOPT_VERBOSE)
+				vac_params.options |= VACOPT_VERBOSE;
+			analyze_rel(tableOid, NULL, vac_params,
+						stmt->relation->va_cols, true, NULL);
+		}
+
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the
+ * index to use for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes
+ * doesn't change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		/*
+		 * If USING INDEX with no name is given, find a clustered index, or
+		 * error out if none.
+		 */
+		indexOid = InvalidOid;
+		foreach_oid(idxoid, RelationGetIndexList(rel))
+		{
+			if (get_index_isclustered(idxoid))
+			{
+				indexOid = idxoid;
+				break;
+			}
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/* An index was specified; obtain its OID. */
+		indexOid = get_relname_relid(indexname, rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
+
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 0528d1b6ecb..6afa203983f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -351,7 +351,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		}
 	}
 
-
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
@@ -2289,8 +2288,9 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 			if ((params.options & VACOPT_VERBOSE) != 0)
 				cluster_params.options |= CLUOPT_VERBOSE;
 
-			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			/* VACUUM FULL is a variant of REPACK; see cluster.c */
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7856ce9d78f..ff336a1adf8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -286,7 +286,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -303,7 +303,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -322,7 +322,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		opt_wait_with_clause
@@ -770,7 +770,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESPECT_P RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1032,7 +1032,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1106,6 +1105,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1143,6 +1143,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11979,38 +11984,82 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ <name_list> ] [ USING INDEX <index_name> ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list vacuum_relation USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
 					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list vacuum_relation opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $3;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
+					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $3;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -12018,20 +12067,25 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -18069,6 +18123,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18704,6 +18759,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index d18a3a60a46..3e731dc8117 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -279,9 +279,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -856,14 +856,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2864,10 +2864,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2875,6 +2871,13 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			if (((RepackStmt *) parsetree)->command == REPACK_COMMAND_CLUSTER)
+				tag = CMDTAG_CLUSTER;
+			else
+				tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3516,7 +3519,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index ef6fffe60b9..fc86cbb3b88 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -289,6 +289,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 20d7a65c614..626d9f1c98b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1267,7 +1267,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES",
@@ -5040,6 +5040,46 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"(", "USING INDEX");
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"USING INDEX");
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	else if (Matches("REPACK", "(*)", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX") ||
+			 Matches("REPACK", "(*)", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	/*
+	 * Complete ... [ (*) ] <sth> USING INDEX, with a list of indexes for
+	 * <sth>.
+	 */
+	else if (TailMatches(MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("ANALYZE", "VERBOSE");
+		else if (TailMatches("ANALYZE", "VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index 019ca06455d..f0c1bd4175c 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -16,7 +16,7 @@ subdir = src/bin/scripts
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready
+PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready pg_repackdb
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
@@ -31,6 +31,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
 vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
 install: all installdirs
 	$(INSTALL_PROGRAM) createdb$(X)   '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -41,6 +42,7 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+	$(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index a4fed59d1c9..be573cae682 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -42,6 +42,7 @@ vacuuming_common = static_library('libvacuuming_common',
 
 binaries = [
   'vacuumdb',
+  'pg_repackdb',
 ]
 foreach binary : binaries
   binary_sources = files('@0@.c'.format(binary))
@@ -80,6 +81,7 @@ tests += {
       't/100_vacuumdb.pl',
       't/101_vacuumdb_all.pl',
       't/102_vacuumdb_stages.pl',
+      't/103_repackdb.pl',
       't/200_connstr.pl',
     ],
   },
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..2765d1e97b8
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,240 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ *		A utility to run REPACK
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * FIXME: this is missing a way to specify the index to use to repack one
+ * table, or whether to pass a USING INDEX clause when multiple tables are
+ * used.  Something like --index[=indexname].  Adding that bleeds into
+ * vacuuming.c as well.
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "vacuuming.h"
+
+static void help(const char *progname);
+static void check_objfilter(bits32 objfilter);
+
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+		{"echo", no_argument, NULL, 'e'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"dbname", required_argument, NULL, 'd'},
+		{"analyze", no_argument, NULL, 'z'},
+		{"all", no_argument, NULL, 'a'},
+		/* XXX this could be 'i', but optional_argument handling is messy */
+		{"index", optional_argument, NULL, 1},
+		{"table", required_argument, NULL, 't'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"jobs", required_argument, NULL, 'j'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
+		{"maintenance-db", required_argument, NULL, 2},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	vacuumingOptions vacopts;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+	int			ret;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+	vacopts.mode = MODE_REPACK;
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwWz",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				vacopts.objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				vacopts.objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				vacopts.echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				vacopts.objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				vacopts.objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				vacopts.quiet = true;
+				break;
+			case 't':
+				vacopts.objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 'z':
+				vacopts.and_analyze = true;
+				break;
+			case 1:
+				vacopts.using_index = true;
+				if (optarg)
+					vacopts.indexname = pg_strdup(optarg);
+				else
+					vacopts.indexname = NULL;
+				break;
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		vacopts.objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter(vacopts.objfilter);
+
+	ret = vacuuming_main(&cparams, dbname, maintenance_db, &vacopts,
+						 &objects, tbl_count, concurrentCons,
+						 progname);
+	exit(ret);
+}
+
+/*
+ * Verify that the filters used at command line are compatible.
+ */
+static void
+check_objfilter(bits32 objfilter)
+{
+	if ((objfilter & OBJFILTER_ALL_DBS) &&
+		(objfilter & OBJFILTER_DATABASE))
+		pg_fatal("cannot repack all databases and a specific one at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA))
+		pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+
+	if ((objfilter & OBJFILTER_SCHEMA) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+}
+
+static void
+help(const char *progname)
+{
+	printf(_("%s repacks a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("      --index[=INDEX]             repack using given index, or each table's clustered index\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE[(COLUMNS)]'  repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -z, --analyze                   update optimizer statistics\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..cadce9b837c
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,47 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-t', 'pg_class'],
+	qr/statement: REPACK.*pg_class;/,
+	'pg_repackdb processes a single table');
+
+$node->safe_psql('postgres', 'CREATE USER testusr;
+	GRANT CREATE ON SCHEMA public TO testusr');
+$node->safe_psql('postgres',
+	'CREATE TABLE cluster_1 (a int primary key);
+	ALTER TABLE cluster_1 CLUSTER ON cluster_1_pkey;
+	CREATE TABLE cluster_2 (a int unique);
+	ALTER TABLE cluster_2 CLUSTER ON cluster_2_a_key;',
+	extra_params => ['-U' => 'testusr']);
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-U', 'testusr' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--index'],
+	qr/statement: REPACK.*cluster_1 USING INDEX.*statement: REPACK.*cluster_2 USING INDEX/ms,
+	'pg_repackdb --index chooses multiple tables');
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--analyze', '-t', 'cluster_1'],
+	qr/statement: REPACK \(ANALYZE\) public.cluster_1/,
+	'pg_repackdb --analyze works');
+
+done_testing();
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
index 5d2d8a64961..2e3d6d2ad73 100644
--- a/src/bin/scripts/vacuuming.c
+++ b/src/bin/scripts/vacuuming.c
@@ -1,6 +1,6 @@
 /*-------------------------------------------------------------------------
  * vacuuming.c
- *		Helper routines for vacuumdb
+ *		Helper routines for vacuumdb and pg_repackdb
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -194,6 +194,14 @@ vacuum_one_database(ConnParams *cparams,
 
 	conn = connectDatabase(cparams, progname, vacopts->echo, false, true);
 
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		/* XXX arguably, here we should use VACUUM FULL instead of failing */
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
 	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
 	{
 		PQfinish(conn);
@@ -286,9 +294,18 @@ vacuum_one_database(ConnParams *cparams,
 		if (vacopts->mode == MODE_ANALYZE_IN_STAGES)
 			printf(_("%s: processing database \"%s\": %s\n"),
 				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
+		else if (vacopts->mode == MODE_ANALYZE)
+			printf(_("%s: analyzing database \"%s\"\n"),
+				   progname, PQdb(conn));
+		else if (vacopts->mode == MODE_VACUUM)
 			printf(_("%s: vacuuming database \"%s\"\n"),
 				   progname, PQdb(conn));
+		else
+		{
+			Assert(vacopts->mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
 		fflush(stdout);
 	}
 
@@ -640,6 +657,35 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
 								 " AND listed_objects.object_oid IS NOT NULL\n");
 	}
 
+	/*
+	 * In REPACK mode, if the 'using_index' option was given but no index
+	 * name, filter only tables that have an index with indisclustered set.
+	 * (If an index name is given, we trust the user to pass a reasonable list
+	 * of tables.)
+	 *
+	 * XXX it may be worth printing an error if an index name is given with no
+	 * list of tables.
+	 */
+	if (vacopts->mode == MODE_REPACK &&
+		vacopts->using_index && !vacopts->indexname)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND EXISTS (SELECT 1 FROM pg_catalog.pg_index\n"
+							 "    WHERE indrelid = c.oid AND indisclustered)\n");
+	}
+
+	/*
+	 * In REPACK mode, only consider the tables that the current user has
+	 * MAINTAIN privilege on.  XXX maybe we should do this in all cases, not
+	 * just REPACK; the vacuumdb output is otherwise needlessly noisy.
+	 */
+	if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND pg_catalog.has_table_privilege(current_user, "
+							 "c.oid, 'MAINTAIN')\n");
+	}
+
 	/*
 	 * If no tables were listed, filter for the relevant relation types.  If
 	 * tables were given via --table, don't bother filtering by relation type.
@@ -878,8 +924,10 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->verbose)
 				appendPQExpBufferStr(sql, " VERBOSE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
 	}
-	else
+	else if (vacopts->mode == MODE_VACUUM)
 	{
 		appendPQExpBufferStr(sql, "VACUUM");
 
@@ -993,9 +1041,39 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->and_analyze)
 				appendPQExpBufferStr(sql, " ANALYZE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
 	}
+	else if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(sql, "REPACK");
 
-	appendPQExpBuffer(sql, " %s;", table);
+		if (vacopts->verbose)
+		{
+			appendPQExpBuffer(sql, "%sVERBOSE", sep);
+			sep = comma;
+		}
+		if (vacopts->and_analyze)
+		{
+			appendPQExpBuffer(sql, "%sANALYZE", sep);
+			sep = comma;
+		}
+
+		if (sep != paren)
+			appendPQExpBufferChar(sql, ')');
+
+		appendPQExpBuffer(sql, " %s", table);
+
+		if (vacopts->using_index)
+		{
+			appendPQExpBufferStr(sql, " USING INDEX");
+			if (vacopts->indexname)
+				appendPQExpBuffer(sql, " %s", fmtIdEnc(vacopts->indexname,
+													   PQclientEncoding(conn)));
+		}
+	}
+
+	appendPQExpBufferChar(sql, ';');
 }
 
 /*
@@ -1024,13 +1102,21 @@ run_vacuum_command(ParallelSlot *free_slot, vacuumingOptions *vacopts,
 	{
 		if (table)
 		{
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
 		}
 		else
 		{
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
 		}
 	}
 }
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
index 586b6caa3d6..dba6ac6f6e0 100644
--- a/src/bin/scripts/vacuuming.h
+++ b/src/bin/scripts/vacuuming.h
@@ -20,6 +20,7 @@
 typedef enum
 {
 	MODE_VACUUM,
+	MODE_REPACK,
 	MODE_ANALYZE,
 	MODE_ANALYZE_IN_STAGES
 } RunMode;
@@ -37,6 +38,8 @@ typedef struct vacuumingOptions
 	bool		and_analyze;
 	bool		full;
 	bool		freeze;
+	bool		using_index;
+	char	   *indexname;
 	bool		disable_page_skipping;
 	bool		skip_locked;
 	int			min_xid_age;
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..652542e8e65 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
+						ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 9dc63a5a5bd..8d68bcbef95 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -73,28 +73,34 @@
 #define PROGRESS_ANALYZE_STARTED_BY_MANUAL			1
 #define PROGRESS_ANALYZE_STARTED_BY_AUTOVACUUM		2
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
-
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
-
-/* Commands of PROGRESS_CLUSTER */
-#define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
-#define PROGRESS_CLUSTER_COMMAND_VACUUM_FULL	2
+/*
+ * Progress parameters for REPACK.
+ *
+ * Values for PROGRESS_REPACK_COMMAND are defined as in RepackCommand.
+ *
+ * Note: Since REPACK shares code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
 
 /* Progress parameters for CREATE INDEX */
 /* 3, 4 and 5 reserved for "waitfor" metrics */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d14294a4ece..94892042b8d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3951,18 +3951,6 @@ typedef struct AlterSystemStmt
 	VariableSetStmt *setstmt;	/* SET subcommand */
 } AlterSystemStmt;
 
-/* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
- * ----------------------
- */
-typedef struct ClusterStmt
-{
-	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
-	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
-
 /* ----------------------
  *		Vacuum and Analyze Statements
  *
@@ -3975,7 +3963,7 @@ typedef struct VacuumStmt
 	NodeTag		type;
 	List	   *options;		/* list of DefElem nodes */
 	List	   *rels;			/* list of VacuumRelation, or NIL for all */
-	bool		is_vacuumcmd;	/* true for VACUUM, false for ANALYZE */
+	bool		is_vacuumcmd;	/* true for VACUUM, false otherwise */
 } VacuumStmt;
 
 /*
@@ -3993,6 +3981,27 @@ typedef struct VacuumRelation
 	List	   *va_cols;		/* list of column names, or NIL for all */
 } VacuumRelation;
 
+/* ----------------------
+ *		Repack Statement
+ * ----------------------
+ */
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER = 1,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
+{
+	NodeTag		type;
+	RepackCommand command;		/* type of command being run */
+	VacuumRelation *relation;	/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
+	List	   *params;			/* list of DefElem nodes */
+} RepackStmt;
+
 /* ----------------------
  *		Explain Statement
  *
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 5d4fe27ef96..f1a1d5e7a80 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -376,6 +376,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index c4606d65043..66690f1134a 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..277854418fa 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +550,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
@@ -665,6 +702,101 @@ SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 (4 rows)
 
 COMMIT;
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+ERROR:  ANALYZE option must be specified when a column list is provided
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- had their relfilenodes changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison.  Unlike CLUSTER, clstr_3 should also have been
+-- processed, since REPACK without USING INDEX requires no clustered index.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 4286c266e17..eb90fd3de5f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2000,34 +2000,23 @@ pg_stat_progress_basebackup| SELECT pid,
             ELSE NULL::text
         END AS backup_type
    FROM pg_stat_get_progress_info('BASEBACKUP'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20);
-pg_stat_progress_cluster| SELECT s.pid,
-    s.datid,
-    d.datname,
-    s.relid,
-        CASE s.param1
-            WHEN 1 THEN 'CLUSTER'::text
-            WHEN 2 THEN 'VACUUM FULL'::text
-            ELSE NULL::text
+pg_stat_progress_cluster| SELECT pid,
+    datid,
+    datname,
+    relid,
+        CASE
+            WHEN (command = ANY (ARRAY['CLUSTER'::text, 'VACUUM FULL'::text])) THEN command
+            WHEN (repack_index_relid = (0)::oid) THEN 'VACUUM FULL'::text
+            ELSE 'CLUSTER'::text
         END AS command,
-        CASE s.param2
-            WHEN 0 THEN 'initializing'::text
-            WHEN 1 THEN 'seq scanning heap'::text
-            WHEN 2 THEN 'index scanning heap'::text
-            WHEN 3 THEN 'sorting tuples'::text
-            WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
-            ELSE NULL::text
-        END AS phase,
-    (s.param3)::oid AS cluster_index_relid,
-    s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
-   FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
-     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+    phase,
+    repack_index_relid AS cluster_index_relid,
+    heap_tuples_scanned,
+    heap_tuples_written,
+    heap_blks_total,
+    heap_blks_scanned,
+    index_rebuild_count
+   FROM pg_stat_progress_repack;
 pg_stat_progress_copy| SELECT s.pid,
     s.datid,
     d.datname,
@@ -2087,6 +2076,35 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 1 THEN 'CLUSTER'::text
+            WHEN 2 THEN 'REPACK'::text
+            WHEN 3 THEN 'VACUUM FULL'::text
+            ELSE NULL::text
+        END AS command,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..c976823a3cb 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,7 +76,6 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
-
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -229,6 +228,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
@@ -313,6 +330,57 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 COMMIT;
 
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- had their relfilenodes changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison.  Unlike CLUSTER, clstr_3 should also have been
+-- processed, since REPACK without USING INDEX requires no clustered index.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9dd65b10254..4f3c7c160a6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2557,6 +2557,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
-- 
2.47.3

v28-0002-Refactor-index_concurrently_create_copy-for-use-with.patch (text/x-diff)
From c44fc84b28c599a9c3e484f0b8d5f482305ac5d6 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Sat, 13 Dec 2025 19:27:18 +0100
Subject: [PATCH 2/6] Refactor index_concurrently_create_copy() for use with
 REPACK (CONCURRENTLY).

This patch moves the code to index_create_copy() and adds a "concurrently"
parameter so it can be used by REPACK (CONCURRENTLY).

With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of indexes
(including system catalog entries) *before* the actual swap, to reduce the
time for which AccessExclusiveLock needs to be held.
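
For illustration only (not part of this patch): with the new signature, a
caller could register a catalog-only copy of an index before the swap
roughly as sketched below. copy_one_index() is a hypothetical name; the
locking and naming scheme are assumptions, not what the REPACK code does.

    /*
     * Hypothetical helper: insert a copy of an existing index into the
     * catalogs without building it, so it can be built before the files
     * are swapped.  Assumes the caller holds an adequate lock on "heap".
     */
    static Oid
    copy_one_index(Relation heap, Oid oldIndexId, const char *newName)
    {
        /* InvalidOid keeps the copy in the old index's tablespace */
        return index_create_copy(heap, oldIndexId, InvalidOid, newName,
                                 true);  /* concurrently: skip the build */
    }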
---
 src/backend/catalog/index.c      | 54 +++++++++++++++++++++++---------
 src/backend/commands/cluster.c   |  8 ++---
 src/backend/commands/indexcmds.c |  6 ++--
 src/backend/nodes/makefuncs.c    |  9 +++---
 src/include/catalog/index.h      |  3 ++
 src/include/nodes/makefuncs.h    |  4 ++-
 6 files changed, 57 insertions(+), 27 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index bd77584bc99..60efa77d34f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1290,15 +1290,32 @@ index_create(Relation heapRelation,
 /*
  * index_concurrently_create_copy
  *
- * Create concurrently an index based on the definition of the one provided by
- * caller.  The index is inserted into catalogs and needs to be built later
- * on.  This is called during concurrent reindex processing.
- *
- * "tablespaceOid" is the tablespace to use for this index.
+ * Variant of index_create_copy(), called during concurrent reindex
+ * processing.
  */
 Oid
 index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							   Oid tablespaceOid, const char *newName)
+{
+	return index_create_copy(heapRelation, oldIndexId, tablespaceOid, newName,
+							 true);
+}
+
+/*
+ * index_create_copy
+ *
+ * Create an index based on the definition of the one provided by the caller.
+ * The index is inserted into the catalogs.  If 'concurrently' is true, it
+ * must be built later on; otherwise it is built immediately.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ *
+ * The actual implementation of index_concurrently_create_copy(), reusable for
+ * other purposes.
+ */
+Oid
+index_create_copy(Relation heapRelation, Oid oldIndexId, Oid tablespaceOid,
+				  const char *newName, bool concurrently)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1317,6 +1334,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	List	   *indexColNames = NIL;
 	List	   *indexExprs = NIL;
 	List	   *indexPreds = NIL;
+	int			flags = 0;
 
 	indexRelation = index_open(oldIndexId, RowExclusiveLock);
 
@@ -1327,7 +1345,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	 * Concurrent build of an index with exclusion constraints is not
 	 * supported.
 	 */
-	if (oldInfo->ii_ExclusionOps != NULL)
+	if (oldInfo->ii_ExclusionOps != NULL && concurrently)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("concurrent index creation for exclusion constraints is not supported")));
@@ -1383,9 +1401,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	}
 
 	/*
-	 * Build the index information for the new index.  Note that rebuild of
-	 * indexes with exclusion constraints is not supported, hence there is no
-	 * need to fill all the ii_Exclusion* fields.
+	 * Build the index information for the new index.
 	 */
 	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
 							oldInfo->ii_NumIndexKeyAttrs,
@@ -1394,10 +1410,13 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							indexPreds,
 							oldInfo->ii_Unique,
 							oldInfo->ii_NullsNotDistinct,
-							false,	/* not ready for inserts */
-							true,
+							!concurrently,	/* isready */
+							concurrently,	/* concurrent */
 							indexRelation->rd_indam->amsummarizing,
-							oldInfo->ii_WithoutOverlaps);
+							oldInfo->ii_WithoutOverlaps,
+							oldInfo->ii_ExclusionOps,
+							oldInfo->ii_ExclusionProcs,
+							oldInfo->ii_ExclusionStrats);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1435,6 +1454,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 		stattargets[i].isnull = isnull;
 	}
 
+	if (concurrently)
+		flags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+
 	/*
 	 * Now create the new index.
 	 *
@@ -1458,7 +1480,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							  indcoloptions->values,
 							  stattargets,
 							  reloptionsDatum,
-							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT,
+							  flags,
 							  0,
 							  true, /* allow table to be a system catalog? */
 							  false,	/* is_internal? */
@@ -2452,7 +2474,8 @@ BuildIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   NULL, NULL, NULL);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2512,7 +2535,8 @@ BuildDummyIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   NULL, NULL, NULL);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 7f772c5c4f8..a2d72ce494d 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -70,8 +70,7 @@ typedef struct
 
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
 								Oid indexOid, Oid userid, int options);
-static void rebuild_relation(RepackCommand cmd,
-							 Relation OldHeap, Relation index, bool verbose);
+static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
@@ -415,7 +414,7 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(cmd, OldHeap, index, verbose);
+	rebuild_relation(OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -629,8 +628,7 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(RepackCommand cmd,
-				 Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index d9cccb6ac18..d8d8f72a875 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -242,7 +242,8 @@ CheckIndexCompatible(Oid oldId,
 	 */
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
-							  false, false, amsummarizing, isWithoutOverlaps);
+							  false, false, amsummarizing, isWithoutOverlaps,
+							  NULL, NULL, NULL);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -927,7 +928,8 @@ DefineIndex(Oid tableId,
 							  !concurrent,
 							  concurrent,
 							  amissummarizing,
-							  stmt->iswithoutoverlaps);
+							  stmt->iswithoutoverlaps,
+							  NULL, NULL, NULL);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index e2d9e9be41a..c5d5a37f514 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,8 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps)
+			  bool withoutoverlaps, Oid *exclusion_ops, Oid *exclusion_procs,
+			  uint16 *exclusion_strats)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -863,9 +864,9 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_PredicateState = NULL;
 
 	/* exclusion constraints */
-	n->ii_ExclusionOps = NULL;
-	n->ii_ExclusionProcs = NULL;
-	n->ii_ExclusionStrats = NULL;
+	n->ii_ExclusionOps = exclusion_ops;
+	n->ii_ExclusionProcs = exclusion_procs;
+	n->ii_ExclusionStrats = exclusion_strats;
 
 	/* speculative inserts */
 	n->ii_UniqueOps = NULL;
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index dda95e54903..4bf909078d8 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -99,6 +99,9 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid oldIndexId,
 										   Oid tablespaceOid,
 										   const char *newName);
+extern Oid	index_create_copy(Relation heapRelation, Oid oldIndexId,
+							  Oid tablespaceOid, const char *newName,
+							  bool concurrently);
 
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 5473ce9a288..9ff7159ff0c 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -99,7 +99,9 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								List *expressions, List *predicates,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
-								bool summarizing, bool withoutoverlaps);
+								bool summarizing, bool withoutoverlaps,
+								Oid *exclusion_ops, Oid *exclusion_procs,
+								uint16 *exclusion_strats);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
-- 
2.47.3
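
With the widened makeIndexInfo() signature in the patch above, callers that
have no exclusion constraint to record simply pass NULL for the three new
arguments, as both call sites in indexcmds.c do. A minimal sketch of the new
call shape (variable names as in CheckIndexCompatible):

    indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
                              accessMethodId, NIL, NIL,
                              false,    /* unique */
                              false,    /* nulls_not_distinct */
                              false,    /* isready */
                              false,    /* concurrent */
                              amsummarizing, isWithoutOverlaps,
                              NULL, NULL, NULL);    /* no exclusion constraint */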

v28-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch
From 34a13e3fabc89d2952393f37d2b1388e96c9c7c2 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Sat, 13 Dec 2025 19:27:18 +0100
Subject: [PATCH 3/6] Move conversion of a "historic" to MVCC snapshot to a
 separate function.

The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
---
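For reference, a minimal sketch of the two calling modes, following the new
function's own comment (snapshot variable names are illustrative):

    /* Convert in place; the source snapshot is modified. */
    snap = SnapBuildMVCCFromHistoric(snap, true);

    /*
     * Keep the source snapshot intact and get a new instance, allocated as
     * a single chunk of memory.
     */
    mvcc_snap = SnapBuildMVCCFromHistoric(historic_snap, false);
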
 src/backend/replication/logical/snapbuild.c | 57 +++++++++++++++++----
 src/backend/utils/time/snapmgr.c            |  3 +-
 src/include/replication/snapbuild.h         |  1 +
 src/include/utils/snapmgr.h                 |  1 +
 4 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index d6ab1e017eb..a3730804428 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
-	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -485,7 +482,33 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
-	/* allocate in transaction context */
+	/* Convert the historic snapshot to MVCC snapshot. */
+	return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike a regular (non-historic) MVCC snapshot, the 'xip' array of this
+ * snapshot contains not only running main transactions, but also their
+ * subtransactions. On the other hand, 'subxip' will usually be empty. This
+ * difference does not affect the result of XidInMVCCSnapshot() because it
+ * searches both in 'xip' and 'subxip'.
+ *
+ * Pass true for 'in_place' if it is okay to modify the source snapshot in
+ * place.  If you need a new instance, allocated as a single chunk of memory,
+ * pass false.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+	TransactionId xid;
+	TransactionId *oldxip = snapshot->xip;
+	uint32		oldxcnt = snapshot->xcnt;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+	Snapshot	result;
+
 	newxip = palloc_array(TransactionId, GetMaxSnapshotXidCount());
 
 	/*
@@ -494,7 +517,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
 	{
 		void	   *test;
 
@@ -502,7 +525,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -519,11 +542,25 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snapshot->xcnt = newxcnt;
+	snapshot->xip = newxip;
+
+	if (in_place)
+		result = snapshot;
+	else
+	{
+		result = CopySnapshot(snapshot);
+
+		/* Restore the original values so the source is intact. */
+		snapshot->xip = oldxip;
+		snapshot->xcnt = oldxcnt;
+
+		/* newxip has been copied */
+		pfree(newxip);
+	}
+	result->snapshot_type = SNAPSHOT_MVCC;
 
-	return snap;
+	return result;
 }
 
 /*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 40a2e90e071..4cf32ffe833 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -604,7 +603,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
+Snapshot
 CopySnapshot(Snapshot snapshot)
 {
 	Snapshot	newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..6d4d2d1814c 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..f65f83c85cd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -63,6 +63,7 @@ extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
+extern Snapshot CopySnapshot(Snapshot snapshot);
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
-- 
2.47.3

v28-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch
From 6279394135f2b693b6fffd174822509e0a067cbf Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Sat, 13 Dec 2025 19:27:18 +0100
Subject: [PATCH 4/6] Add CONCURRENTLY option to REPACK command.

The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock at the beginning and keep it until the end of the processing.)

This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(Supposedly, the swapping should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.

First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.

Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)

Since the logical decoding system, during its startup, waits until all the
transactions which already have XID assigned have finished, there is a risk of
deadlock if a transaction that already changed anything in the database tries
to acquire a conflicting lock on the table REPACK CONCURRENTLY is working
on. As an example, consider a transaction running CREATE INDEX on the table
that is being REPACKed CONCURRENTLY. On the other hand, DML commands
(INSERT, UPDATE, DELETE) are not a problem as their lock does not conflict
with REPACK CONCURRENTLY.

The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before the logical decoding is set up and lock
it again afterwards. Such temporary unlocking would imply re-checking if the
table still meets all the requirements for REPACK CONCURRENTLY.

Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires extra space for the new relation and index files
(which coexist with the old files for some time). In addition, the
CONCURRENTLY option might introduce a lag in releasing WAL segments for
archiving / recycling. This is due to the decoding of the data changes done by
applications concurrently. When copying the table contents into the new file,
we check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table contents have been copied.) A
background worker might be a better approach for the decoding - let's consider
implementing it in the future.

The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.
---
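For quick testing, a usage sketch (table and index names are placeholders;
wal_level must be set to 'logical' and a free replication slot must be
available):

    REPACK (VERBOSE, CONCURRENTLY) mytable USING INDEX mytable_pkey;

    -- meanwhile, from another session, watch the phases (including the
    -- new 'catch-up' one):
    SELECT phase, heap_tuples_inserted, heap_tuples_updated,
           heap_tuples_deleted
    FROM pg_stat_progress_repack;
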
 doc/src/sgml/monitoring.sgml                  |   37 +-
 doc/src/sgml/mvcc.sgml                        |   12 +-
 doc/src/sgml/ref/repack.sgml                  |  129 +-
 src/Makefile                                  |    1 +
 src/backend/access/heap/heapam.c              |   34 +-
 src/backend/access/heap/heapam_handler.c      |  227 ++-
 src/backend/access/heap/rewriteheap.c         |    6 +-
 src/backend/catalog/system_views.sql          |   19 +-
 src/backend/commands/cluster.c                | 1599 +++++++++++++++--
 src/backend/commands/matview.c                |    2 +-
 src/backend/commands/tablecmds.c              |    1 +
 src/backend/commands/vacuum.c                 |   12 +-
 src/backend/meson.build                       |    1 +
 src/backend/replication/logical/decode.c      |   94 +
 src/backend/replication/logical/snapbuild.c   |   21 +
 .../replication/pgoutput_repack/Makefile      |   32 +
 .../replication/pgoutput_repack/meson.build   |   18 +
 .../pgoutput_repack/pgoutput_repack.c         |  240 +++
 .../storage/lmgr/generate-lwlocknames.pl      |    2 +-
 src/backend/utils/time/snapmgr.c              |    3 +-
 src/bin/psql/tab-complete.in.c                |    4 +-
 src/include/access/heapam.h                   |    5 +-
 src/include/access/heapam_xlog.h              |    2 +
 src/include/access/tableam.h                  |   10 +
 src/include/commands/cluster.h                |   90 +-
 src/include/commands/progress.h               |   17 +-
 src/include/replication/snapbuild.h           |    1 +
 src/include/storage/lockdefs.h                |    4 +-
 src/include/utils/snapmgr.h                   |    2 +
 src/test/modules/injection_points/Makefile    |    3 +
 .../injection_points/expected/repack.out      |  113 ++
 .../expected/repack_toast.out                 |   64 +
 .../modules/injection_points/logical.conf     |    1 +
 src/test/modules/injection_points/meson.build |    4 +
 .../injection_points/specs/repack.spec        |  142 ++
 .../injection_points/specs/repack_toast.spec  |  105 ++
 src/test/regress/expected/rules.out           |   19 +-
 src/tools/pgindent/typedefs.list              |    5 +
 38 files changed, 2845 insertions(+), 236 deletions(-)
 create mode 100644 src/backend/replication/pgoutput_repack/Makefile
 create mode 100644 src/backend/replication/pgoutput_repack/meson.build
 create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
 create mode 100644 src/test/modules/injection_points/expected/repack.out
 create mode 100644 src/test/modules/injection_points/expected/repack_toast.out
 create mode 100644 src/test/modules/injection_points/logical.conf
 create mode 100644 src/test/modules/injection_points/specs/repack.spec
 create mode 100644 src/test/modules/injection_points/specs/repack_toast.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b07fe3294cd..ae56b09aeba 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6202,14 +6202,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+       <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of heap tuples written.
+       Number of heap tuples inserted.
        This counter only advances when the phase is
        <literal>seq scanning heap</literal>,
-       <literal>index scanning heap</literal>
-       or <literal>writing new heap</literal>.
+       <literal>index scanning heap</literal>,
+       <literal>writing new heap</literal>
+       or <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples updated.
+       This counter only advances when the phase is <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples deleted.
+       This counter only advances when the phase is <literal>catch-up</literal>.
       </para></entry>
      </row>
 
@@ -6290,6 +6311,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <command>REPACK</command> is currently writing the new heap.
      </entry>
     </row>
+    <row>
+     <entry><literal>catch-up</literal></entry>
+     <entry>
+       <command>REPACK CONCURRENTLY</command> is currently processing the DML
+       commands that other transactions executed during any of the preceding
+       phases.
+     </entry>
+    </row>
     <row>
      <entry><literal>swapping relation files</literal></entry>
      <entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+    TABLE</command></link> and <command>REPACK</command> with
+    the <literal>CONCURRENTLY</literal> option, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the DDL command committed.  This will only be an
+    snapshot taken before the command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the DDL command started &mdash; any transaction that has done so
+    before the command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the DDL command until that transaction completes.
+    which would block the truncating or rewriting command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index 61d5c2cdef1..30c43c49069 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -28,6 +28,7 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
 
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+    CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -54,7 +55,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
    processes every table and materialized view in the current database that
    the current user has the <literal>MAINTAIN</literal> privilege on. This
    form of <command>REPACK</command> cannot be executed inside a transaction
-   block.
+   block.  Also, this form is not allowed if
+   the <literal>CONCURRENTLY</literal> option is used.
   </para>
 
   <para>
@@ -67,7 +69,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
    When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
    is acquired on it. This prevents any other database operations (both reads
    and writes) from operating on the table until the <command>REPACK</command>
-   is finished.
+   is finished. If you want to keep the table accessible during the repacking,
+   consider using the <literal>CONCURRENTLY</literal> option.
   </para>
 
   <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -195,6 +198,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>CONCURRENTLY</literal></term>
+    <listitem>
+     <para>
+      Allow other transactions to use the table while it is being repacked.
+     </para>
+
+     <para>
+      Internally, <command>REPACK</command> copies the contents of the table
+      (ignoring dead tuples) into a new file, sorted by the specified index,
+      and also creates a new file for each index. Then it swaps the old and
+      new files for the table and all the indexes, and deletes the old
+      files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+      sure that the old files do not change during the processing because the
+      changes would get lost due to the swap.
+     </para>
+
+     <para>
+      With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+      EXCLUSIVE</literal> lock is only acquired to swap the table and index
+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be pretty short. However, the time might still be noticeable if
+      too many data changes were made to the table while
+      <command>REPACK</command> was waiting for the lock: those changes must
+      be processed just before the files are swapped, while the
+      <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+     </para>
+
+     <para>
+      Note that <command>REPACK</command> with the
+      <literal>CONCURRENTLY</literal> option does not try to order the rows
+      inserted into the table after the repacking started. Also
+      note that <command>REPACK</command> might fail to complete due to DDL
+      commands executed on the table by other transactions during the
+      repacking.
+     </para>
+
+     <note>
+      <para>
+       In addition to the temporary space requirements explained in
+       <xref linkend="sql-repack-notes-on-resources"/>,
+       the <literal>CONCURRENTLY</literal> option can increase the use of
+       temporary space somewhat. The reason is that other transactions can
+       perform DML operations which cannot be applied to the new file until
+       <command>REPACK</command> has copied all the tuples from the old
+       file. Thus the tuples inserted into the old file during the copying are
+       also stored separately in a temporary file, so they can eventually be
+       applied to the new file.
+      </para>
+
+      <para>
+       Furthermore, the data changes performed during the copying are
+       extracted from <link linkend="wal">write-ahead log</link> (WAL), and
+       this extraction (decoding) only takes place when a certain amount of WAL
+       has been written. Therefore, WAL removal can be delayed by this
+       threshold. Currently the threshold is equal to the value of
+       the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+       configuration parameter.
+      </para>
+     </note>
+
+     <para>
+      The <literal>CONCURRENTLY</literal> option cannot be used in the
+      following cases:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+          The table is <literal>UNLOGGED</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is partitioned.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is a system catalog or a <acronym>TOAST</acronym> table.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REPACK</command> is executed inside a transaction block.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+          configuration parameter is less than <literal>logical</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+         configuration parameter does not allow for creation of an additional
+         replication slot.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <warning>
+      <para>
+       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+       details.
+      </para>
+     </warning>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
 	interfaces \
 	backend/replication/libpqwalreceiver \
 	backend/replication/pgoutput \
+	backend/replication/pgoutput_repack \
 	fe_utils \
 	bin \
 	pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 225f9829f22..aa2b4ca3f0a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
-								  bool all_visible_cleared, bool new_all_visible_cleared);
+								  bool all_visible_cleared, bool new_all_visible_cleared,
+								  bool walLogical);
 #ifdef USE_ASSERT_CHECKING
 static void check_lock_if_inplace_updateable_rel(Relation relation,
 												 const ItemPointerData *otid,
@@ -2803,7 +2804,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 TM_Result
 heap_delete(Relation relation, const ItemPointerData *tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart, bool walLogical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -3050,7 +3051,8 @@ l1:
 	 * Compute replica identity tuple before entering the critical section so
 	 * we don't PANIC upon a memory allocation failure.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+	old_key_tuple = walLogical ?
+		ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
 
 	/*
 	 * If this is the first possibly-multixact-able operation in the current
@@ -3140,6 +3142,15 @@ l1:
 				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
 		}
 
+		/*
+		 * Unlike UPDATE, DELETE is decoded even if there is no old key, so it
+		 * does not help to clear both XLH_DELETE_CONTAINS_OLD_TUPLE and
+		 * XLH_DELETE_CONTAINS_OLD_KEY. Thus we need an extra flag. TODO
+		 * Consider not decoding tuples without the old tuple/key instead.
+		 */
+		if (!walLogical)
+			xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
 		XLogBeginInsert();
 		XLogRegisterData(&xlrec, SizeOfHeapDelete);
 
@@ -3232,7 +3243,8 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false /* changingPart */ );
+						 &tmfd, false /* changingPart */ ,
+						 true /* walLogical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3273,7 +3285,7 @@ TM_Result
 heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
-			TU_UpdateIndexes *update_indexes)
+			TU_UpdateIndexes *update_indexes, bool walLogical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -4166,7 +4178,8 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 walLogical);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4524,7 +4537,8 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, &lockmode, update_indexes);
+						 &tmfd, &lockmode, update_indexes,
+						 true /* walLogical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -8864,7 +8878,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool walLogical)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -8875,7 +8890,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
 	Page		page = BufferGetPage(newbuf);
-	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln) &&
+		walLogical;
 	bool		init;
 	int			bufflags;
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index b3a19003cdd..6063e037edb 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+					   true);
 }
 
 
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
-						 tmfd, lockmode, update_indexes);
+						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	/*
@@ -685,13 +687,15 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
+								 Snapshot snapshot,
+								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
 								 double *tups_recently_dead)
 {
-	RewriteState rwstate;
+	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
 	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
+	bool		concurrent = snapshot != NULL;
+	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	values = palloc_array(Datum, natts);
 	isnull = palloc_array(bool, natts);
 
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
-								 *multi_cutoff);
+	/*
+	 * Initialize the rewrite operation.
+	 */
+	if (!concurrent)
+		rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+									 *xid_cutoff, *multi_cutoff);
 
 
 	/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 *
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+	 * the snapshot passed by the caller.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									snapshot ? snapshot : SnapshotAny,
+									NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap,
+									snapshot ? snapshot : SnapshotAny,
+									0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -837,70 +853,77 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
 		buf = hslot->buffer;
 
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+		/*
+		 * Regarding CONCURRENTLY, see the comments on MVCC snapshot above.
+		 */
+		if (!concurrent)
 		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				*tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
+			switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+			{
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead */
+					isdead = true;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+					*tups_recently_dead += 1;
+					/* fall through */
+				case HEAPTUPLE_LIVE:
+					/* Live or recently dead, must copy it */
+					isdead = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
 
-				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				*tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
+					/*
+					 * As long as we hold exclusive lock on the relation,
+					 * normally the only way to see this is if it was inserted
+					 * earlier in our own transaction.  However, it can happen
+					 * in system catalogs, since we tend to release write lock
+					 * before commit there. Give a warning if neither case
+					 * applies; but in any case we had better copy it.
+					 */
+					if (!is_system_catalog &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+						elog(WARNING, "concurrent insert in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as live */
+					isdead = false;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+					/*
+					 * Similar situation to INSERT_IN_PROGRESS case.
+					 */
+					if (!is_system_catalog &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+						elog(WARNING, "concurrent delete in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as recently dead */
+					*tups_recently_dead += 1;
+					isdead = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					isdead = false; /* keep compiler quiet */
+					break;
+			}
 
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		if (isdead)
-		{
-			*tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
+			if (isdead)
 			{
-				/* A previous recently-dead tuple is now known dead */
 				*tups_vacuumed += 1;
-				*tups_recently_dead -= 1;
+				/* heap rewrite module still needs to see it... */
+				if (rewrite_heap_dead_tuple(rwstate, tuple))
+				{
+					/* A previous recently-dead tuple is now known dead */
+					*tups_vacuumed += 1;
+					*tups_recently_dead -= 1;
+				}
+
+				continue;
 			}
-			continue;
 		}
 
 		*num_tuples += 1;
@@ -919,7 +942,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		{
 			const int	ct_index[] = {
 				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
-				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_INSERTED
 			};
 			int64		ct_val[2];
 
@@ -934,6 +957,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
+
+		/*
+		 * Process the WAL produced by the load, as well as by other
+		 * transactions, so that the replication slot can advance and WAL does
+		 * not pile up. Use wal_segment_size as a threshold so that we do not
+		 * introduce the decoding overhead too often.
+		 *
+		 * Of course, we must not apply the changes until the initial load has
+		 * completed.
+		 *
+		 * Note that our insertions into the new table should not be decoded
+		 * as we (intentionally) do not write the logical decoding specific
+		 * information to WAL.
+		 */
+		if (concurrent)
+		{
+			XLogRecPtr	end_of_wal;
+
+			end_of_wal = GetFlushRecPtr(NULL);
+			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+			{
+				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+				end_of_wal_prev = end_of_wal;
+			}
+		}
 	}
 
 	if (indexScan != NULL)
@@ -977,15 +1025,32 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
+
+			/*
+			 * Try to keep the amount of not-yet-decoded WAL small, like
+			 * above.
+			 */
+			if (concurrent)
+			{
+				XLogRecPtr	end_of_wal;
+
+				end_of_wal = GetFlushRecPtr(NULL);
+				if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+				{
+					repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+					end_of_wal_prev = end_of_wal;
+				}
+			}
 		}
 
 		tuplesort_end(tuplesort);
 	}
 
 	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	if (rwstate)
+		end_heap_rewrite(rwstate);
 
 	/* Clean up */
 	pfree(values);
@@ -2376,6 +2441,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
  * SET WITHOUT OIDS.
  *
  * So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate=NULL, insert the tuple using heap_insert() instead of the
+ * rewrite module - we still need to deform/form the tuple though. TODO
+ * Shouldn't we rename the function, as it might not do any rewrite?
  */
 static void
 reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2467,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
 
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	if (rwstate)
+		/* The heap rewrite module does the rest */
+		rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	else
+	{
+		/*
+		 * Insert tuple when processing REPACK CONCURRENTLY.
+		 *
+		 * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+		 * difficult to do the same in the catch-up phase (as the logical
+		 * decoding does not provide us with sufficient visibility
+		 * information). Thus we must use heap_insert() both during the
+		 * catch-up and here.
+		 *
+		 * The following is like simple_heap_insert() except that we pass the
+		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+		 * the relation files, it drops this relation, so no logical
+		 * replication subscription should need the data.
+		 */
+		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+					HEAP_INSERT_NO_LOGICAL, NULL);
+	}
 
 	heap_freetuple(copiedTuple);
 }
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 7ce3c5e2685..ef428486030 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		int			options = HEAP_INSERT_SKIP_FSM;
 
 		/*
-		 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
-		 * for the TOAST table are not logically decoded.  The main heap is
-		 * WAL-logged as XLOG FPI records, which are not logically decoded.
+		 * While rewriting the heap for REPACK, make sure data for the TOAST
+		 * table are not logically decoded.  The main heap is WAL-logged as
+		 * XLOG FPI records, which are not logically decoded.
 		 */
 		options |= HEAP_INSERT_NO_LOGICAL;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6c1461c9ef6..9b7f8cf8497 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1298,16 +1298,19 @@ CREATE VIEW pg_stat_progress_repack AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      WHEN 5 THEN 'catch-up'
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
-        S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param5 AS heap_tuples_inserted,
+        S.param6 AS heap_tuples_updated,
+        S.param7 AS heap_tuples_deleted,
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
@@ -1325,7 +1328,7 @@ CREATE VIEW pg_stat_progress_cluster AS
         phase,
         repack_index_relid AS cluster_index_relid,
         heap_tuples_scanned,
-        heap_tuples_written,
+        heap_tuples_inserted + heap_tuples_updated AS heap_tuples_written,
         heap_blks_total,
         heap_blks_scanned,
         index_rebuild_count
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index a2d72ce494d..569705abc81 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1,8 +1,23 @@
 /*-------------------------------------------------------------------------
  *
  * cluster.c
- *	  CLUSTER a table on an index.  This is now also used for VACUUM FULL and
- *	  REPACK.
+ *		Implementation of REPACK [CONCURRENTLY], also known as CLUSTER and
+ *		VACUUM FULL.
+ *
+ * There are two somewhat different ways to rewrite a table.  In non-
+ * concurrent mode, it's easy: take AccessExclusiveLock, create a new
+ * transient relation, copy the tuples over to the relfilenode of the new
+ * relation, swap the relfilenodes, then drop the old relation.
+ *
+ * In concurrent mode, we lock the table with only ShareUpdateExclusiveLock,
+ * then do an initial copy as above.  However, while the tuples are being
+ * copied, concurrent transactions could modify the table. To cope with those
+ * changes, we rely on logical decoding to obtain them from WAL.  The changes
+ * are accumulated in a tuplestore.  Once the initial copy is complete, we
+ * read the changes from the tuplestore and re-apply them on the new heap.
+ * Then we upgrade our ShareUpdateExclusiveLock to AccessExclusiveLock and
+ * swap the relfilenodes.  This way, the time we hold a strong lock on the
+ * table is much reduced, and the bloat is eliminated.
  *
  * There is hardly anything left of Paul Brown's original implementation...
  *
@@ -26,6 +41,10 @@
 #include "access/toast_internals.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -33,6 +52,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
@@ -40,15 +60,21 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
+#include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -68,12 +94,62 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * The locators are used so that logical decoding can skip data changes that
+ * do not belong to the table being repacked.
+ */
+RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
+RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+/*
+ * Information needed to apply concurrent data changes.
+ */
+typedef struct ChangeDest
+{
+	/* The relation the changes are applied to. */
+	Relation	rel;
+
+	/*
+	 * The following is needed to find the existing tuple if the change is
+	 * UPDATE or DELETE. 'ident_key' should have all the fields except for
+	 * 'sk_argument' initialized.
+	 */
+	Relation	ident_index;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+
+	/* Needed to update the indexes of 'rel'. */
+	IndexInsertState *iistate;
+} ChangeDest;
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
-								Oid indexOid, Oid userid, int options);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+								Oid indexOid, Oid userid, LOCKMODE lmode,
+								int options);
+static void check_repack_concurrently_requirements(Relation rel);
+static void rebuild_relation(Relation OldHeap, Relation index, bool verbose,
+							 bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							bool verbose, bool *pSwapToastByContent,
-							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							bool verbose,
+							bool *pSwapToastByContent,
+							TransactionId *pFreezeXid,
+							MultiXactId *pCutoffMulti);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -81,13 +157,51 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 											  MemoryContext permcxt);
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+									 ChangeDest *dest);
+static void apply_concurrent_insert(Relation rel, HeapTuple tup,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+									HeapTuple tup_target,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target);
+static HeapTuple find_target_tuple(Relation rel, ChangeDest *dest,
+								   HeapTuple tup_key,
+								   TupleTableSlot *ident_slot);
+static void process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
+									   XLogRecPtr end_of_wal,
+									   ChangeDest *dest);
+static IndexInsertState *get_index_insert_state(Relation relation,
+												Oid ident_index_id,
+												Relation *ident_index_p);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+								  int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+											   Relation cl_index,
+											   LogicalDecodingContext *decoding_ctx,
+											   TransactionId frozenXid,
+											   MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
+										LOCKMODE lockmode,
+										bool isTopLevel,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
 static const char *RepackCommandAsString(RepackCommand cmd);
 
 
+#define REPL_PLUGIN_NAME   "pgoutput_repack"
+
 /*
  * The repack code allows for processing multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -117,6 +231,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	ClusterParams params = {0};
 	Relation	rel = NULL;
 	MemoryContext repack_context;
+	LOCKMODE	lockmode;
 	List	   *rtcs;
 
 	/* Parse option list */
@@ -127,6 +242,16 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		else if (strcmp(opt->defname, "analyze") == 0 ||
 				 strcmp(opt->defname, "analyse") == 0)
 			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
+		else if (strcmp(opt->defname, "concurrently") == 0)
+		{
+			/* Note: an explicit "concurrently false" means non-concurrent. */
+			if (stmt->command != REPACK_COMMAND_REPACK)
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("CONCURRENTLY option not supported for %s",
+							   RepackCommandAsString(stmt->command)));
+			params.options |= defGetBoolean(opt) ? CLUOPT_CONCURRENT : 0;
+		}
 		else
 			ereport(ERROR,
 					errcode(ERRCODE_SYNTAX_ERROR),
@@ -136,13 +261,25 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 					parser_errposition(pstate, opt->location));
 	}
 
+	/*
+	 * Determine the lock mode expected by cluster_rel().
+	 *
+	 * In the exclusive case, we obtain AccessExclusiveLock right away to
+	 * avoid lock-upgrade hazard in the single-transaction case. In the
+	 * CONCURRENTLY case, the AccessExclusiveLock will only be used at the end
+	 * of processing, supposedly for a very short time. Until then, we'll have
+	 * to unlock the relation temporarily, so there's no lock-upgrade hazard.
+	 */
+	lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+		AccessExclusiveLock : ShareUpdateExclusiveLock;
+
 	/*
 	 * If a single relation is specified, process it and we're done ... unless
 	 * the relation is a partitioned table, in which case we fall through.
 	 */
 	if (stmt->relation != NULL)
 	{
-		rel = process_single_relation(stmt, &params);
+		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
 			return;				/* all done */
 	}
@@ -157,10 +294,29 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 				errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
 
 	/*
-	 * By here, we know we are in a multi-table situation.  In order to avoid
-	 * holding locks for too long, we want to process each table in its own
-	 * transaction.  This forces us to disallow running inside a user
-	 * transaction block.
+	 * By here, we know we are in a multi-table situation.
+	 *
+	 * Concurrent processing is currently considered rather special (e.g. in
+	 * terms of resources consumed), so it is not performed in bulk.
+	 */
+	if (params.options & CLUOPT_CONCURRENT)
+	{
+		if (rel != NULL)
+		{
+			Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+					errhint("Consider running the command for individual partitions."));
+		}
+		else
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY requires an explicit table name"));
+	}
+
+	/*
+	 * In order to avoid holding locks for too long, we want to process each
+	 * table in its own transaction.  This forces us to disallow running
+	 * inside a user transaction block.
 	 */
 	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
@@ -243,7 +399,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		 * Open the target table, coping with the case where it has been
 		 * dropped.
 		 */
-		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		rel = try_table_open(rtc->tableOid, lockmode);
 		if (rel == NULL)
 		{
 			CommitTransactionCommand();
@@ -254,7 +410,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		/* Process this table */
-		cluster_rel(stmt->command, rel, rtc->indexOid, &params);
+		cluster_rel(stmt->command, rel, rtc->indexOid, &params, isTopLevel);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
@@ -283,22 +439,53 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
  * If indexOid is InvalidOid, the table will be rewritten in physical order
  * instead of index order.
  *
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
  * 'cmd' indicates which command is being executed, to be used for error
  * messages.
  */
 void
 cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-			ClusterParams *params)
+			ClusterParams *params, bool isTopLevel)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Relation	index;
+	LOCKMODE	lmode;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
-	Relation	index;
+	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
+
+	/*
+	 * The lock mode is AccessExclusiveLock for normal processing and
+	 * ShareUpdateExclusiveLock for concurrent processing (so that SELECT,
+	 * INSERT, UPDATE and DELETE commands work, but cluster_rel() cannot be
+	 * called concurrently for the same relation).
+	 */
+	lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+	/* There are specific requirements on concurrent processing. */
+	if (concurrent)
+	{
+		/*
+		 * Make sure we have no XID assigned, otherwise the call to
+		 * setup_logical_decoding() can cause a deadlock.
+		 *
+		 * The existence of a transaction block does not actually imply that
+		 * an XID was already assigned, but it very likely is. We might want
+		 * to check the result of GetCurrentTransactionIdIfAny() instead, but
+		 * that would be less clear from the user's perspective.
+		 */
+		PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
+
+		check_repack_concurrently_requirements(OldHeap);
+	}
 
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
@@ -324,10 +511,13 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 * If this is a single-transaction CLUSTER, we can skip these tests. We
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
+	 *
+	 * XXX move [some of] these comments to where the RECHECK flag is
+	 * determined?
 	 */
 	if (recheck &&
 		!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
-							 params->options))
+							 lmode, params->options))
 		goto out;
 
 	/*
@@ -342,6 +532,12 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 				errmsg("cannot run %s on a shared catalog",
 					   RepackCommandAsString(cmd)));
 
+	/*
+	 * The CONCURRENTLY case should have been rejected earlier because it does
+	 * not support system catalogs.
+	 */
+	Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
@@ -362,7 +558,7 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	if (OidIsValid(indexOid))
 	{
 		/* verify the index is good and lock it */
-		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, lmode);
 		/* also open it */
 		index = index_open(indexOid, NoLock);
 	}
@@ -397,7 +593,9 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
 		!RelationIsPopulated(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		if (index)
+			index_close(index, lmode);
+		relation_close(OldHeap, lmode);
 		goto out;
 	}
 
@@ -410,11 +608,34 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 * invalid, because we move tuples around.  Promote them to relation
 	 * locks.  Predicate locks on indexes will be promoted when they are
 	 * reindexed.
+	 *
+	 * During concurrent processing, the heap as well as its indexes stay in
+	 * operation, so we postpone this step until they are locked using
+	 * AccessExclusiveLock near the end of the processing.
 	 */
-	TransferPredicateLocksToHeapRelation(OldHeap);
+	if (!concurrent)
+		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	PG_TRY();
+	{
+		/*
+		 * For concurrent processing, make sure that our logical decoding
+		 * ignores data changes of other tables than the one we are
+		 * processing.
+		 */
+		if (concurrent)
+			begin_concurrent_repack(OldHeap);
+
+		rebuild_relation(OldHeap, index, verbose, concurrent);
+	}
+	PG_FINALLY();
+	{
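+		/* For concurrent processing, resume normal logical decoding. */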
+		if (concurrent)
+			end_concurrent_repack();
+	}
+	PG_END_TRY();
+
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -433,14 +654,14 @@ out:
  */
 static bool
 cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-					Oid userid, int options)
+					Oid userid, LOCKMODE lmode, int options)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 
 	/* Check that the user still has privileges for the relation */
 	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -454,7 +675,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -465,7 +686,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		 */
 		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 
@@ -476,7 +697,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
 			!get_index_isclustered(indexOid))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 	}
@@ -488,7 +709,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
  * Verify that the specified heap and index are valid to cluster on
  *
  * Side effect: obtains lock on the index.  The caller may
- * in some cases already have AccessExclusiveLock on the table, but
+ * in some cases already have a lock of the same strength on the table, but
  * not in all cases so we can't rely on the table-level lock for
  * protection here.
  */
@@ -617,18 +838,87 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	table_close(pg_index, RowExclusiveLock);
 }
 
+/*
+ * Check if the CONCURRENTLY option is legal for the relation.
+ */
+static void
+check_repack_concurrently_requirements(Relation rel)
+{
+	char		relpersistence,
+				replident;
+	Oid			ident_idx;
+
+	/* Data changes in system relations are not logically decoded. */
+	if (IsCatalogRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+	/*
+	 * reorderbuffer.c does not seem to handle processing of a TOAST relation
+	 * alone.
+	 */
+	if (IsToastRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+	relpersistence = rel->rd_rel->relpersistence;
+	if (relpersistence != RELPERSISTENCE_PERMANENT)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+	/* With NOTHING, WAL does not contain the old tuple. */
+	replident = rel->rd_rel->relreplident;
+	if (replident == REPLICA_IDENTITY_NOTHING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("Relation \"%s\" has insufficient replication identity.",
+						 RelationGetRelationName(rel))));
+
+	/*
+	 * Identity index is not set if the replica identity is FULL, but PK might
+	 * exist in such a case.
+	 */
+	ident_idx = RelationGetReplicaIndex(rel);
+	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+		ident_idx = rel->rd_pkindex;
+	if (!OidIsValid(ident_idx))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot process relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 (errhint("Relation \"%s\" has no identity index.",
+						  RelationGetRelationName(rel)))));
+}
+
 /*
  * rebuild_relation: rebuild an existing relation in index or physical order
  *
- * OldHeap: table to rebuild.
+ * OldHeap: table to rebuild.  See cluster_rel() for comments on the required
+ * lock strength.
+ *
  * index: index to cluster by, or NULL to rewrite in physical order.
  *
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock.
+ * (The function handles the lock upgrade if 'concurrent' is true.)
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -636,13 +926,38 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Oid			OIDNewHeap;
 	Relation	NewHeap;
 	char		relpersistence;
-	bool		is_system_catalog;
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
+	LogicalDecodingContext *decoding_ctx = NULL;
+	Snapshot	snapshot = NULL;
+#if USE_ASSERT_CHECKING
+	LOCKMODE	lmode;
+
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+	Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+	Assert(index == NULL || CheckRelationLockedByMe(index, lmode, false));
+#endif
+
+	if (concurrent)
+	{
+		/*
+		 * Prepare to capture the concurrent data changes.
+		 *
+		 * Note that this call waits for all transactions with XID already
+		 * assigned to finish. If any of those transactions is waiting for a
+		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
+		 * it runs CREATE INDEX), we can end up in a deadlock. It is not
+		 * clear whether this risk is worth unlocking/locking the table (and
+		 * its clustering index) and checking again whether it is still
+		 * eligible for REPACK CONCURRENTLY.
+		 */
+		decoding_ctx = setup_logical_decoding(tableOid);
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
-		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
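+		/*
+		 * Scan the table with a historic snapshot during the initial load,
+		 * so that data changes committed after the decoding start point are
+		 * not copied: those will be applied later, from the decoded WAL.
+		 */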
+		snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
+		PushActiveSnapshot(snapshot);
+	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
 	if (index != NULL)
@@ -650,7 +965,6 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
 
 	/*
 	 * Create the transient table that will receive the re-ordered data.
@@ -666,30 +980,54 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, decoding_ctx, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
+	/* The historic snapshot won't be needed anymore. */
+	if (snapshot)
+	{
+		PopActiveSnapshot();
+		UpdateActiveSnapshotCommandId();
+	}
 
-	/* Close relcache entries, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
-	if (index)
-		index_close(index, NoLock);
+	if (concurrent)
+	{
+		Assert(!swap_toast_by_content);
+		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+										   decoding_ctx,
+										   frozenXid, cutoffMulti);
 
-	/*
-	 * Close the new relation so it can be dropped as soon as the storage is
-	 * swapped. The relation is not visible to others, so no need to unlock it
-	 * explicitly.
-	 */
-	table_close(NewHeap, NoLock);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+		/* Done with decoding. */
+		cleanup_logical_decoding(decoding_ctx);
+	}
+	else
+	{
+		bool		is_system_catalog = IsSystemRelation(OldHeap);
+
+		/* Close relcache entries, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
+		if (index)
+			index_close(index, NoLock);
+
+		/*
+		 * Close the new relation so it can be dropped as soon as the storage
+		 * is swapped. The relation is not visible to others, so no need to
+		 * unlock it explicitly.
+		 */
+		table_close(NewHeap, NoLock);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+	}
 }
 
 
@@ -824,15 +1162,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Pass
+ * iff concurrent processing is required.
+ *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
-				bool *pSwapToastByContent, TransactionId *pFreezeXid,
-				MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -849,6 +1191,10 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	int			elevel = verbose ? INFO : DEBUG2;
 	PGRUsage	ru0;
 	char	   *nspname;
+	bool		concurrent = snapshot != NULL;
+	LOCKMODE	lmode;
+
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
 
 	pg_rusage_init(&ru0);
 
@@ -877,7 +1223,7 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * will be held till end of transaction.
 	 */
 	if (OldHeap->rd_rel->reltoastrelid)
-		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, lmode);
 
 	/*
 	 * If both tables have TOAST tables, perform toast swap by content.  It is
@@ -886,7 +1232,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * swap by links.  This is okay because swap by content is only essential
 	 * for system catalogs, and we don't support schema changes for them.
 	 */
-	if (OldHeap->rd_rel->reltoastrelid && NewHeap->rd_rel->reltoastrelid)
+	if (OldHeap->rd_rel->reltoastrelid && NewHeap->rd_rel->reltoastrelid &&
+		!concurrent)
 	{
 		*pSwapToastByContent = true;
 
@@ -907,6 +1254,10 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 		 * follow the toast pointers to the wrong place.  (It would actually
 		 * work for values copied over from the old toast table, but not for
 		 * any values that we toast which were previously not toasted.)
+		 *
+		 * This would not work with CONCURRENTLY because we may need to delete
+		 * TOASTed tuples from the new heap. With this hack, we'd delete them
+		 * from the old heap.
 		 */
 		NewHeap->rd_toastoid = OldHeap->rd_rel->reltoastrelid;
 	}
@@ -982,7 +1333,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+									cutoffs.OldestXmin, snapshot,
+									decoding_ctx,
+									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
 									&tups_recently_dead);
@@ -991,7 +1344,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	*pFreezeXid = cutoffs.FreezeLimit;
 	*pCutoffMulti = cutoffs.MultiXactCutoff;
 
-	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+	/*
+	 * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+	 * In the CONCURRENTLY case, we need to set it again before applying the
+	 * concurrent changes.
+	 */
 	NewHeap->rd_toastoid = InvalidOid;
 
 	num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1449,14 +1806,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool swap_toast_by_content,
 				 bool check_constraints,
 				 bool is_internal,
+				 bool reindex,
 				 TransactionId frozenXid,
 				 MultiXactId cutoffMulti,
 				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
-	int			reindex_flags;
-	ReindexParams reindex_params = {0};
 	int			i;
 
 	/* Report that we are now swapping relation files */
@@ -1482,39 +1838,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (is_system_catalog)
 		CacheInvalidateCatalog(OIDOldHeap);
 
-	/*
-	 * Rebuild each index on the relation (but not the toast table, which is
-	 * all-new at this point).  It is important to do this before the DROP
-	 * step because if we are processing a system catalog that will be used
-	 * during DROP, we want to have its indexes available.  There is no
-	 * advantage to the other order anyway because this is all transactional,
-	 * so no chance to reclaim disk space before commit.  We do not need a
-	 * final CommandCounterIncrement() because reindex_relation does it.
-	 *
-	 * Note: because index_build is called via reindex_relation, it will never
-	 * set indcheckxmin true for the indexes.  This is OK even though in some
-	 * sense we are building new indexes rather than rebuilding existing ones,
-	 * because the new heap won't contain any HOT chains at all, let alone
-	 * broken ones, so it can't be necessary to set indcheckxmin.
-	 */
-	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
-	if (check_constraints)
-		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+	if (reindex)
+	{
+		int			reindex_flags;
+		ReindexParams reindex_params = {0};
 
-	/*
-	 * Ensure that the indexes have the same persistence as the parent
-	 * relation.
-	 */
-	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
-	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+		/*
+		 * Rebuild each index on the relation (but not the toast table, which
+		 * is all-new at this point).  It is important to do this before the
+		 * DROP step because if we are processing a system catalog that will
+		 * be used during DROP, we want to have its indexes available.  There
+		 * is no advantage to the other order anyway because this is all
+		 * transactional, so no chance to reclaim disk space before commit. We
+		 * do not need a final CommandCounterIncrement() because
+		 * reindex_relation does it.
+		 *
+		 * Note: because index_build is called via reindex_relation, it will
+		 * never set indcheckxmin true for the indexes.  This is OK even
+		 * though in some sense we are building new indexes rather than
+		 * rebuilding existing ones, because the new heap won't contain any
+		 * HOT chains at all, let alone broken ones, so it can't be necessary
+		 * to set indcheckxmin.
+		 */
+		reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+		if (check_constraints)
+			reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
 
-	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
-								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+		/*
+		 * Ensure that the indexes have the same persistence as the parent
+		 * relation.
+		 */
+		if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+		else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+
+		/* Report that we are now reindexing relations */
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
-	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+		reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+	}
 
 	/* Report that we are now doing clean up */
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1558,6 +1922,17 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	object.objectId = OIDNewHeap;
 	object.objectSubId = 0;
 
+	if (!reindex)
+	{
+		/*
+		 * Make sure the changes in pg_class are visible. This is especially
+		 * important if !swap_toast_by_content, so that the correct TOAST
+		 * relation is dropped. (When reindexing, reindex_relation() takes
+		 * care of this.)
+		 */
+		CommandCounterIncrement();
+	}
+
 	/*
 	 * The new relation is local to our transaction and we know nothing
 	 * depends on it, so DROP_RESTRICT should be OK.
@@ -1597,7 +1972,7 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 
 			/* Get the associated valid index to be renamed */
 			toastidx = toast_get_valid_index(newrel->rd_rel->reltoastrelid,
-											 NoLock);
+											 AccessExclusiveLock);
 
 			/* rename the toast table ... */
 			snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
@@ -1857,7 +2232,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * case, if an index name is given, it's up to the caller to resolve it.
  */
 static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+						ClusterParams *params)
 {
 	Relation	rel;
 	Oid			tableOid;
@@ -1866,13 +2242,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
-	/*
-	 * Find, lock, and check permissions on the table.  We obtain
-	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-	 * single-transaction case.
-	 */
+	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
-										AccessExclusiveLock,
+										lockmode,
 										0,
 										RangeVarCallbackMaintainsTable,
 										NULL);
@@ -1904,13 +2276,14 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 		return rel;
 	else
 	{
-		Oid			indexOid;
+		Oid			indexOid = InvalidOid;
 
 		indexOid = determine_clustered_index(rel, stmt->usingindex,
 											 stmt->indexname);
 		if (OidIsValid(indexOid))
-			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
-		cluster_rel(stmt->command, rel, indexOid, params);
+			check_index_is_clusterable(rel, indexOid, lockmode);
+
+		cluster_rel(stmt->command, rel, indexOid, params, isTopLevel);
 
 		/* Do an analyze, if requested */
 		if (params->options & CLUOPT_ANALYZE)
@@ -1993,3 +2366,1019 @@ RepackCommandAsString(RepackCommand cmd)
 	}
 	return "???";
 }
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts, to set up logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that the TOAST table needs no attention here, as it is not scanned
+ * using a historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+	Oid			toastrelid;
+
+	/*
+	 * Avoid logical decoding of other relations by this backend. The lock we
+	 * have guarantees that the actual locator cannot be changed concurrently:
+	 * TRUNCATE needs AccessExclusiveLock.
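+	 *
+	 * (heap_decode() checks repacked_rel_locator and
+	 * repacked_rel_toast_locator in order to skip data changes of all other
+	 * relations.)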
+	 */
+	Assert(CheckRelationLockedByMe(rel, ShareUpdateExclusiveLock, false));
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
+
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+}
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+	/*
+	 * Restore normal function of (future) logical decoding for this backend.
+	 */
+	repacked_rel_locator.relNumber = InvalidOid;
+	repacked_rel_toast_locator.relNumber = InvalidOid;
+}
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends make while we copy the
+ * existing data into the temporary table), nor persisted (it's easier to
+ * handle a crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid)
+{
+	Relation	rel;
+	TupleDesc	tupdesc;
+	LogicalDecodingContext *ctx;
+	RepackDecodingState *dstate = palloc0_object(RepackDecodingState);
+
+	/*
+	 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+	 * should never fire.
+	 */
+	Assert(!TransactionIdIsValid(GetTopTransactionIdIfAny()));
+
+	/*
+	 * A single backend should not execute multiple REPACK commands at a time,
+	 * so use PID to make the slot unique.
+	 */
+	snprintf(NameStr(dstate->slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+	/*
+	 * Check if we can use logical decoding.
+	 */
+	CheckSlotPermissions();
+	CheckLogicalDecodingRequirements();
+
+	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+	ReplicationSlotCreate(NameStr(dstate->slotname), true, RS_TEMPORARY,
+						  false, false, false);
+
+	/*
+	 * Neither prepare_write nor do_write callback nor update_progress is
+	 * useful for us.
+	 */
+	ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+									NIL,
+									true,
+									InvalidXLogRecPtr,
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+	/*
+	 * We have no control over the fast_forward setting, so at least check it.
+	 */
+	Assert(!ctx->fast_forward);
+
+	DecodingContextFindStartpoint(ctx);
+
+	/* Some WAL records should have been read. */
+	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
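+	/*
+	 * Remember the segment the decoding starts in, so that
+	 * repack_decode_concurrent_changes() can confirm the receive location
+	 * whenever a segment boundary is crossed.
+	 */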
+	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+				wal_segment_size);
+
+	/*
+	 * Setup structures to store decoded changes.
+	 */
+	dstate->relid = relid;
+	dstate->tstore = tuplestore_begin_heap(false, false,
+										   maintenance_work_mem);
+
+	/* Caller should already have the table locked. */
+	rel = table_open(relid, NoLock);
+	tupdesc = CreateTupleDescCopy(RelationGetDescr(rel));
+	dstate->tupdesc = tupdesc;
+	table_close(rel, NoLock);
+
+	/* Initialize the descriptor to store the changes ... */
+	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+	/* ... as well as the corresponding slot. */
+	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+											  &TTSOpsMinimalTuple);
+
+	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "logical decoding");
+
+	ctx->output_writer_private = dstate;
+	return ctx;
+}
+
+/*
+ * Retrieve tuple from ConcurrentChange structure.
+ *
+ * The input data starts with the structure but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+	HeapTupleData tup_data;
+	HeapTuple	result;
+	char	   *src;
+
+	/*
+	 * Ensure alignment before accessing the fields. (This is why we can't use
+	 * heap_copytuple() instead of this function.)
+	 */
+	src = change + offsetof(ConcurrentChange, tup_data);
+	memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+	memcpy(result, &tup_data, sizeof(HeapTupleData));
+	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+	src = change + SizeOfConcurrentChange;
+	memcpy(result->t_data, src, result->t_len);
+
+	return result;
+}
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+								 XLogRecPtr end_of_wal)
+{
+	RepackDecodingState *dstate;
+	ResourceOwner resowner_old;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+	resowner_old = CurrentResourceOwner;
+	CurrentResourceOwner = dstate->resowner;
+
+	PG_TRY();
+	{
+		while (ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			XLogSegNo	segno_new;
+			char	   *errm = NULL;
+			XLogRecPtr	end_lsn;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			/*
+			 * If a WAL segment boundary has been crossed, inform the
+			 * decoding system that catalog_xmin can advance. (We could
+			 * confirm more often, but filling a single WAL segment should
+			 * not take much time.)
+			 */
+			end_lsn = ctx->reader->EndRecPtr;
+			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+			if (segno_new != repack_current_segment)
+			{
+				LogicalConfirmReceivedLocation(end_lsn);
+				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
+				repack_current_segment = segno_new;
+			}
+
+			CHECK_FOR_INTERRUPTS();
+		}
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
+/*
+ * Apply changes stored in 'file'.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
+{
+	Relation	rel = dest->rel;
+	TupleTableSlot *index_slot,
+			   *ident_slot;
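+	/* Holds the old tuple of an UPDATE until its new version arrives. */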
+	HeapTuple	tup_old = NULL;
+
+	if (dstate->nchanges == 0)
+		return;
+
+	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+	/* A slot to fetch tuples from identity index. */
+	ident_slot = table_slot_create(rel, NULL);
+
+	while (tuplestore_gettupleslot(dstate->tstore, true, false,
+								   dstate->tsslot))
+	{
+		bool		shouldFree;
+		HeapTuple	tup_change,
+					tup,
+					tup_exist;
+		char	   *change_raw,
+				   *src;
+		ConcurrentChange change;
+		bool		isnull[1];
+		Datum		values[1];
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get the change from the single-column tuple. */
+		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+		Assert(!isnull[0]);
+
+		/* Make sure we access aligned data. */
+		change_raw = (char *) DatumGetByteaP(values[0]);
+		src = (char *) VARDATA(change_raw);
+		memcpy(&change, src, SizeOfConcurrentChange);
+
+		/*
+		 * Extract the tuple from the change. The tuple is copied here because
+		 * it might be assigned to 'tup_old', in which case it needs to
+		 * survive into the next iteration.
+		 */
+		tup = get_changed_tuple(src);
+
+		if (change.kind == CHANGE_UPDATE_OLD)
+		{
+			Assert(tup_old == NULL);
+			tup_old = tup;
+		}
+		else if (change.kind == CHANGE_INSERT)
+		{
+			Assert(tup_old == NULL);
+
+			apply_concurrent_insert(rel, tup, dest->iistate, index_slot);
+
+			pfree(tup);
+		}
+		else if (change.kind == CHANGE_UPDATE_NEW ||
+				 change.kind == CHANGE_DELETE)
+		{
+			HeapTuple	tup_key;
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+			{
+				tup_key = tup_old != NULL ? tup_old : tup;
+			}
+			else
+			{
+				Assert(tup_old == NULL);
+				tup_key = tup;
+			}
+
+			/*
+			 * Find the tuple to be updated or deleted.
+			 */
+			tup_exist = find_target_tuple(rel, dest, tup_key, ident_slot);
+			if (tup_exist == NULL)
+				elog(ERROR, "failed to find target tuple");
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+				apply_concurrent_update(rel, tup, tup_exist, dest->iistate,
+										index_slot);
+			else
+				apply_concurrent_delete(rel, tup_exist);
+
+			if (tup_old != NULL)
+			{
+				pfree(tup_old);
+				tup_old = NULL;
+			}
+
+			pfree(tup);
+		}
+		else
+			elog(ERROR, "unrecognized kind of change: %d", change.kind);
+
+		/*
+		 * If a change was applied now, increment CID for next writes and
+		 * update the snapshot so it sees the changes we've applied so far.
+		 */
+		if (change.kind != CHANGE_UPDATE_OLD)
+		{
+			CommandCounterIncrement();
+			UpdateActiveSnapshotCommandId();
+		}
+
+		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+		Assert(shouldFree);
+		pfree(tup_change);
+	}
+
+	tuplestore_clear(dstate->tstore);
+	dstate->nchanges = 0;
+
+	/* Cleanup. */
+	ExecDropSingleTupleTableSlot(index_slot);
+	ExecDropSingleTupleTableSlot(ident_slot);
+}
+
+static void
+apply_concurrent_insert(Relation rel, HeapTuple tup, IndexInsertState *iistate,
+						TupleTableSlot *index_slot)
+{
+	List	   *recheck;
+
+	/*
+	 * Like simple_heap_insert(), but make sure that the INSERT is not
+	 * logically decoded - see reform_and_rewrite_tuple() for more
+	 * information.
+	 */
+	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+				NULL);
+
+	/*
+	 * Update indexes.
+	 *
+	 * The caller is expected to have an active snapshot set, in case
+	 * functions in the indexes need one.
+	 */
+	ExecStoreHeapTuple(tup, index_slot, false);
+	recheck = ExecInsertIndexTuples(iistate->rri,
+									index_slot,
+									iistate->estate,
+									false,	/* update */
+									false,	/* noDupErr */
+									NULL,	/* specConflict */
+									NIL,	/* arbiterIndexes */
+									false	/* onlySummarizing */
+		);
+
+	/*
+	 * If recheck is required, it must have been performed on the source
+	 * relation by now. (All the logical changes we process here are already
+	 * committed.)
+	 */
+	list_free(recheck);
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+	LockTupleMode lockmode;
+	TM_FailureData tmfd;
+	TU_UpdateIndexes update_indexes;
+	TM_Result	res;
+	List	   *recheck;
+
+	/*
+	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
+	 * here.)
+	 *
+	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_update(rel, &tup_target->t_self, tup,
+					  GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd, &lockmode, &update_indexes,
+					  false /* wal_logical */ );
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+	ExecStoreHeapTuple(tup, index_slot, false);
+
+	if (update_indexes != TU_None)
+	{
+		recheck = ExecInsertIndexTuples(iistate->rri,
+										index_slot,
+										iistate->estate,
+										true,	/* update */
+										false,	/* noDupErr */
+										NULL,	/* specConflict */
+										NIL,	/* arbiterIndexes */
+		/* onlySummarizing */
+										update_indexes == TU_Summarizing);
+		list_free(recheck);
+	}
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target)
+{
+	TM_Result	res;
+	TM_FailureData tmfd;
+
+	/*
+	 * Delete tuple from the new heap.
+	 *
+	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+					  InvalidSnapshot, false,
+					  &tmfd,
+					  false,	/* no wait - only we are doing changes */
+					  false /* wal_logical */ );
+
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * The matching tuple, if found, is returned with its contents stored in
+ * 'ident_slot'; NULL is returned if there is no match.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
+				  TupleTableSlot *ident_slot)
+{
+	Relation	ident_index = dest->ident_index;
+	IndexScanDesc scan;
+	Form_pg_index ident_form;
+	int2vector *ident_indkey;
+	HeapTuple	result = NULL;
+
+	/* XXX no instrumentation for now */
+	scan = index_beginscan(rel, ident_index, GetActiveSnapshot(),
+						   NULL, dest->ident_key_nentries, 0);
+
+	/*
+	 * Scan key is passed by caller, so it does not have to be constructed
+	 * multiple times. Key entries have all fields initialized, except for
+	 * sk_argument.
+	 */
+	index_rescan(scan, dest->ident_key, dest->ident_key_nentries, NULL, 0);
+
+	/* Info needed to retrieve key values from heap tuple. */
+	ident_form = ident_index->rd_index;
+	ident_indkey = &ident_form->indkey;
+
+	/* Use the incoming tuple to finalize the scan key. */
+	for (int i = 0; i < scan->numberOfKeys; i++)
+	{
+		ScanKey		entry;
+		bool		isnull;
+		int16		attno_heap;
+
+		entry = &scan->keyData[i];
+		attno_heap = ident_indkey->values[i];
+		entry->sk_argument = heap_getattr(tup_key,
+										  attno_heap,
+										  rel->rd_att,
+										  &isnull);
+		Assert(!isnull);
+	}
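+
+	/* The identity index is unique, so one fetch is sufficient. */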
+	if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+	{
+		bool		shouldFree;
+
+		result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+		/* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+		Assert(!shouldFree);
+	}
+	index_endscan(scan);
+
+	return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
+						   XLogRecPtr end_of_wal, ChangeDest *dest)
+{
+	RepackDecodingState *dstate;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_CATCH_UP);
+
+	dstate = (RepackDecodingState *) decoding_ctx->output_writer_private;
+
+	repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+
+	if (dstate->nchanges == 0)
+		return;
+
+	apply_concurrent_changes(dstate, dest);
+}
+
+/*
+ * Initialize IndexInsertState for index specified by ident_index_id.
+ *
+ * While doing that, also return the identity index in *ident_index_p.
+ */
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id,
+					   Relation *ident_index_p)
+{
+	EState	   *estate;
+	int			i;
+	IndexInsertState *result;
+	Relation	ident_index = NULL;
+
+	result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+	estate = CreateExecutorState();
+
+	result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+	InitResultRelInfo(result->rri, relation, 0, 0, 0);
+	ExecOpenIndices(result->rri, false);
+
+	/*
+	 * Find the relcache entry of the identity index so that we spend no extra
+	 * effort to open / close it.
+	 */
+	for (i = 0; i < result->rri->ri_NumIndices; i++)
+	{
+		Relation	ind_rel;
+
+		ind_rel = result->rri->ri_IndexRelationDescs[i];
+		if (ind_rel->rd_id == ident_index_id)
+			ident_index = ind_rel;
+	}
+	if (ident_index == NULL)
+		elog(ERROR, "failed to open identity index");
+
+	/* Only initialize fields needed by ExecInsertIndexTuples(). */
+	result->estate = estate;
+
+	*ident_index_p = ident_index;
+	return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+	Relation	ident_idx_rel;
+	Form_pg_index ident_idx;
+	int			n,
+				i;
+	ScanKey		result;
+
+	Assert(OidIsValid(ident_idx_oid));
+	ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+	ident_idx = ident_idx_rel->rd_index;
+	n = ident_idx->indnatts;
+	result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+	for (i = 0; i < n; i++)
+	{
+		ScanKey		entry;
+		int16		relattno;
+		Form_pg_attribute att;
+		Oid			opfamily,
+					opcintype,
+					opno,
+					opcode;
+
+		entry = &result[i];
+		relattno = ident_idx->indkey.values[i];
+		if (relattno >= 1)
+		{
+			TupleDesc	desc;
+
+			desc = rel_src->rd_att;
+			att = TupleDescAttr(desc, relattno - 1);
+		}
+		else
+			elog(ERROR, "unexpected attribute number %d in index", relattno);
+
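+		/*
+		 * Find the btree equality operator for the column's operator family
+		 * and type, and the function implementing it.
+		 */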
+		opfamily = ident_idx_rel->rd_opfamily[i];
+		opcintype = ident_idx_rel->rd_opcintype[i];
+		opno = get_opfamily_member(opfamily, opcintype, opcintype,
+								   BTEqualStrategyNumber);
+
+		if (!OidIsValid(opno))
+			elog(ERROR, "failed to find = operator for type %u", opcintype);
+
+		opcode = get_opcode(opno);
+		if (!OidIsValid(opcode))
+			elog(ERROR, "failed to find = operator for operator %u", opno);
+
+		/* Initialize everything but argument. */
+		ScanKeyInit(entry,
+					i + 1,
+					BTEqualStrategyNumber, opcode,
+					(Datum) NULL);
+		entry->sk_collation = att->attcollation;
+	}
+	index_close(ident_idx_rel, AccessShareLock);
+
+	*nentries = n;
+	return result;
+}
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+	ExecCloseIndices(iistate->rri);
+	FreeExecutorState(iistate->estate);
+	pfree(iistate->rri);
+	pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	ExecDropSingleTupleTableSlot(dstate->tsslot);
+	FreeTupleDesc(dstate->tupdesc_change);
+	FreeTupleDesc(dstate->tupdesc);
+	tuplestore_end(dstate->tstore);
+
+	FreeDecodingContext(ctx);
+
+	ReplicationSlotRelease();
+	ReplicationSlotDrop(NameStr(dstate->slotname), false);
+	pfree(dstate);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+								   Relation cl_index,
+								   LogicalDecodingContext *decoding_ctx,
+								   TransactionId frozenXid,
+								   MultiXactId cutoffMulti)
+{
+	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+	List	   *ind_oids_new;
+	Oid			old_table_oid = RelationGetRelid(OldHeap);
+	Oid			new_table_oid = RelationGetRelid(NewHeap);
+	List	   *ind_oids_old = RelationGetIndexList(OldHeap);
+	ListCell   *lc,
+			   *lc2;
+	char		relpersistence;
+	bool		is_system_catalog;
+	Oid			ident_idx_old,
+				ident_idx_new;
+	XLogRecPtr	wal_insert_ptr,
+				end_of_wal;
+	char		dummy_rec_data = '\0';
+	Relation   *ind_refs,
+			   *ind_refs_p;
+	int			nind;
+	ChangeDest	chgdst;
+
+	/* Like in cluster_rel(). */
+	lockmode_old = ShareUpdateExclusiveLock;
+	Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+	Assert(cl_index == NULL ||
+		   CheckRelationLockedByMe(cl_index, lockmode_old, false));
+	/* This is expected from the caller. */
+	Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+	ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+	/*
+	 * Unlike the exclusive case, we build new indexes for the new relation
+	 * rather than swapping the storage and reindexing the old relation. The
+	 * point is that the index build can take some time, so we do it before we
+	 * get AccessExclusiveLock on the old heap and therefore we cannot swap
+	 * the heap storage yet.
+	 *
+	 * index_create() will lock the new indexes using AccessExclusiveLock - no
+	 * need to change that. At the same time, we use ShareUpdateExclusiveLock
+	 * to lock the existing indexes - that should be enough to prevent others
+	 * from changing them while we're repacking the relation. The lock on the
+	 * table should prevent others from changing the index column list, but
+	 * might not be enough for commands like ALTER INDEX ... SET ... (Those
+	 * are not necessarily dangerous, but can confuse users if the changes
+	 * they make get lost due to REPACK.)
+	 */
+	ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+	/*
+	 * Processing shouldn't start without a valid identity index.
+	 */
+	Assert(OidIsValid(ident_idx_old));
+
+	/* Find "identity index" on the new relation. */
+	ident_idx_new = InvalidOid;
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+
+		if (ident_idx_old == ind_old)
+		{
+			ident_idx_new = ind_new;
+			break;
+		}
+	}
+	/* Should not happen, given our lock on the old relation. */
+	if (!OidIsValid(ident_idx_new))
+		ereport(ERROR,
+				(errmsg("identity index missing on the new relation")));
+
+	/* Gather information to apply concurrent changes. */
+	chgdst.rel = NewHeap;
+	chgdst.iistate = get_index_insert_state(NewHeap, ident_idx_new,
+											&chgdst.ident_index);
+	chgdst.ident_key = build_identity_key(ident_idx_new, OldHeap,
+										  &chgdst.ident_key_nentries);
+
+	/*
+	 * During testing, wait for another backend to perform concurrent data
+	 * changes which we will process below.
+	 */
+	INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+	/*
+	 * Flush all WAL records inserted so far (possibly except for the last
+	 * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+	 * we need to flush while holding the exclusive lock on the source table.
+	 */
+	wal_insert_ptr = GetInsertRecPtr();
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply the concurrent changes a first time, to minimize the time we
+	 * need to hold AccessExclusiveLock. (Quite some amount of WAL could
+	 * have been written during the data copying and index creation.)
+	 */
+	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+
+	/*
+	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+	 * is one), all its indexes, so that we can swap the files.
+	 *
+	 * Before that, unlock the index temporarily to avoid deadlock in case
+	 * another transaction is trying to lock it while holding the lock on the
+	 * table.
+	 */
+	if (cl_index)
+	{
+		index_close(cl_index, ShareUpdateExclusiveLock);
+		cl_index = NULL;
+	}
+	/* For the same reason, unlock the TOAST relation. */
+	if (OldHeap->rd_rel->reltoastrelid)
+		UnlockRelationOid(OldHeap->rd_rel->reltoastrelid,
+						  ShareUpdateExclusiveLock);
+	/* Finally lock the table */
+	LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+	/*
+	 * Lock all indexes now, not only the clustering one: all indexes need to
+	 * have their files swapped. While doing that, store their relation
+	 * references in an array, to handle predicate locks below.
+	 */
+	ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+	nind = 0;
+	foreach(lc, ind_oids_old)
+	{
+		Oid			ind_oid;
+		Relation	index;
+
+		ind_oid = lfirst_oid(lc);
+		index = index_open(ind_oid, AccessExclusiveLock);
+
+		/*
+		 * TODO 1) Do we need to check if ALTER INDEX was executed since the
+		 * new index was created in build_new_indexes()? 2) Specifically for
+		 * the clustering index, should check_index_is_clusterable() be called
+		 * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+		 * table probably blocks all commands that affect the result of
+		 * check_index_is_clusterable().)
+		 */
+		*ind_refs_p = index;
+		ind_refs_p++;
+		nind++;
+	}
+
+	/*
+	 * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+	 * lock is needed to swap the files.
+	 */
+	if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+	/*
+	 * Tuples and pages of the old heap will be gone, but the heap will stay.
+	 */
+	TransferPredicateLocksToHeapRelation(OldHeap);
+	/* The same for indexes. */
+	for (int i = 0; i < nind; i++)
+	{
+		Relation	index = ind_refs[i];
+
+		TransferPredicateLocksToHeapRelation(index);
+
+		/*
+		 * References to indexes on the old relation are not needed anymore,
+		 * however locks stay till the end of the transaction.
+		 */
+		index_close(index, NoLock);
+	}
+	pfree(ind_refs);
+
+	/*
+	 * Flush anything we see in WAL, to make sure that all changes committed
+	 * while we were waiting for the exclusive lock are available for
+	 * decoding. This should not be necessary if all backends had
+	 * synchronous_commit set, but we can't rely on this setting.
+	 *
+	 * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+	 * position, and GetLastImportantRecPtr() points at the start of the last
+	 * record rather than at the end. Thus the simplest way to determine the
+	 * insert position is to insert a dummy record and use its LSN.
+	 *
+	 * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+	 * last record (plus the total size of all the page headers the record
+	 * spans)?
+	 */
+	XLogBeginInsert();
+	XLogRegisterData(&dummy_rec_data, 1);
+	wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/* Apply the concurrent changes again. */
+	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+
+	/* Remember info about rel before closing OldHeap */
+	relpersistence = OldHeap->rd_rel->relpersistence;
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+	/*
+	 * Even ShareUpdateExclusiveLock should have prevented others from
+	 * creating / dropping indexes (even using the CONCURRENTLY option), so we
+	 * do not need to check whether the lists match.
+	 */
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+		Oid			mapped_tables[4];
+
+		/* Zero out possible results from swap_relation_files() */
+		memset(mapped_tables, 0, sizeof(mapped_tables));
+
+		swap_relation_files(ind_old, ind_new,
+							(old_table_oid == RelationRelationId),
+							false,	/* swap_toast_by_content */
+							true,
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Concurrent processing is not supported for system relations, so
+		 * there should be no mapped tables.
+		 */
+		for (int i = 0; i < 4; i++)
+			Assert(mapped_tables[i] == 0);
+#endif
+	}
+
+	/* The new indexes must be visible for deletion. */
+	CommandCounterIncrement();
+
+	/* Close the old heap but keep lock until transaction commit. */
+	table_close(OldHeap, NoLock);
+	/* Close the new heap. (We didn't have to open its indexes). */
+	table_close(NewHeap, NoLock);
+
+	/* Cleanup what we don't need anymore. (And close the identity index.) */
+	pfree(chgdst.ident_key);
+	free_index_insert_state(chgdst.iistate);
+
+	/*
+	 * Swap the relations and their TOAST relations and TOAST indexes. This
+	 * also drops the new relation and its indexes.
+	 *
+	 * (System catalogs are currently not supported.)
+	 */
+	Assert(!is_system_catalog);
+	finish_heap_swap(old_table_oid, new_table_oid,
+					 is_system_catalog,
+					 false,		/* swap_toast_by_content */
+					 false, true, false,
+					 frozenXid, cutoffMulti,
+					 relpersistence);
+}
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap. The contained indexes end
+ * up locked using ShareUpdateExclusiveLock.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. The order of items matches, so the two lists can be used to
+ * swap index storage.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+	ListCell   *lc;
+	List	   *result = NIL;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+	foreach(lc, OldIndexes)
+	{
+		Oid			ind_oid,
+					ind_oid_new;
+		char	   *newName;
+		Relation	ind;
+
+		ind_oid = lfirst_oid(lc);
+		ind = index_open(ind_oid, ShareUpdateExclusiveLock);
+
+		newName = ChooseRelationName(get_rel_name(ind_oid),
+									 NULL,
+									 "repacknew",
+									 get_rel_namespace(ind->rd_index->indrelid),
+									 false);
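+		/* Create a matching index on the new heap. */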
+		ind_oid_new = index_create_copy(NewHeap, ind_oid,
+										ind->rd_rel->reltablespace, newName,
+										false);
+		result = lappend_oid(result, ind_oid_new);
+
+		index_close(ind, NoLock);
+	}
+
+	return result;
+}
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index a1fd4cab35b..85d289ca22b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -892,7 +892,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
-	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
 					 RecentXmin, ReadNextMultiXactId(), relpersistence);
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1c9ef53be20..d3235df6c11 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5992,6 +5992,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 			finish_heap_swap(tab->relid, OIDNewHeap,
 							 false, false, true,
 							 !OidIsValid(tab->newTableSpace),
+							 true,
 							 RecentXmin,
 							 ReadNextMultiXactId(),
 							 persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 6afa203983f..ae8b5d4066f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -126,7 +126,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 							  TransactionId lastSaneFrozenXid,
 							  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-					   BufferAccessStrategy bstrategy);
+					   BufferAccessStrategy bstrategy, bool isTopLevel);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -629,7 +629,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 
 			if (params.options & VACOPT_VACUUM)
 			{
-				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+								isTopLevel))
 					continue;
 			}
 
@@ -1999,7 +2000,7 @@ vac_truncate_clog(TransactionId frozenXID,
  */
 static bool
 vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-		   BufferAccessStrategy bstrategy)
+		   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	LOCKMODE	lmode;
 	Relation	rel;
@@ -2290,7 +2291,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 
 			/* VACUUM FULL is a variant of REPACK; see cluster.c */
 			cluster_rel(REPACK_COMMAND_VACUUMFULL, rel, InvalidOid,
-						&cluster_params);
+						&cluster_params, isTopLevel);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
@@ -2333,7 +2334,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
-		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+				   isTopLevel);
 	}
 
 	/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index b831a541652..5c148131217 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
 subdir('snowball')
 subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..a956892f42f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
 #include "catalog/pg_control.h"
+#include "commands/cluster.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/message.h"
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(buf->record);
 	SnapBuild  *builder = ctx->snapshot_builder;
 
+	/*
+	 * If the change is not intended for logical decoding, do not even
+	 * establish a transaction for it - REPACK CONCURRENTLY is the typical
+	 * use case.
+	 *
+	 * First, check if REPACK CONCURRENTLY is being performed by this backend.
+	 * If so, only decode data changes of the table being processed and of
+	 * its TOAST relation.
+	 *
+	 * (The TOAST locator should not be set unless the main one is.)
+	 */
+	Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+		   OidIsValid(repacked_rel_locator.relNumber));
+
+	if (OidIsValid(repacked_rel_locator.relNumber))
+	{
+		XLogReaderState *r = buf->record;
+		RelFileLocator locator;
+
+		/* Not all records contain the block. */
+		if (XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL) &&
+			!RelFileLocatorEquals(locator, repacked_rel_locator) &&
+			(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+			 !RelFileLocatorEquals(locator, repacked_rel_toast_locator)))
+			return;
+	}
+
+	/*
+	 * Second, skip records which do not contain sufficient information for
+	 * decoding.
+	 *
+	 * The problem solved here is that REPACK CONCURRENTLY generates WAL when
+	 * making changes in the new table. Those changes are of no use to any
+	 * other consumer (such as a logical replication subscription) because
+	 * the new table will eventually be dropped (after REPACK CONCURRENTLY
+	 * has assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This can happen when 1) raw_heap_insert marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, or 2) REPACK CONCURRENTLY
+				 * replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}
+
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
@@ -512,6 +595,17 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			break;
 
 		case XLOG_HEAP_TRUNCATE:
+			/* Is REPACK (CONCURRENTLY) being run by this backend? */
+			if (OidIsValid(repacked_rel_locator.relNumber))
+			{
+				/*
+				 * TRUNCATE changes rd_locator of the relation, so it would
+				 * break REPACK (CONCURRENTLY). In fact it should not happen,
+				 * because TRUNCATE needs AccessExclusiveLock on the table.
+				 * Should we only use Assert() here?
+				 */
+				ereport(ERROR,
+						(errmsg("TRUNCATE encountered while doing REPACK (CONCURRENTLY)")));
+			}
 			if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
 				!ctx->fast_forward)
 				DecodeTruncate(ctx, buf);
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a3730804428..7643dfe31bb 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,6 +486,27 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	return SnapBuildMVCCFromHistoric(snap, true);
 }
 
+/*
+ * Build an MVCC snapshot for the initial data load performed by the REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot set up
+ * (so we do not set MyProc->xmin). XXX Do we still need to add restrictions?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+	Snapshot	snap;
+
+	Assert(builder->state == SNAPBUILD_CONSISTENT);
+	Assert(builder->building_full_snapshot);
+
+	snap = SnapBuildBuildSnapshot(builder);
+	return SnapBuildMVCCFromHistoric(snap, false);
+}
+
 /*
  * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
  *
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+#    src/backend/replication/pgoutput_repack/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	$(WIN32RES) \
+	pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+	rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2025, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+  'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+  pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pgoutput_repack',
+    '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+  pgoutput_repack_sources,
+  kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..c8930640a0d
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,240 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ *		Logical Replication output plugin for REPACK command
+ *
+ * Copyright (c) 2012-2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+#include "utils/memutils.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+						   OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+							 ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+							  ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+						  Relation rel, ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+						 ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+	cb->startup_cb = plugin_startup;
+	cb->begin_cb = plugin_begin_txn;
+	cb->change_cb = plugin_change;
+	cb->commit_cb = plugin_commit_txn;
+	cb->shutdown_cb = plugin_shutdown;
+}
+
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+			   bool is_init)
+{
+	ctx->output_plugin_private = NULL;
+
+	/* Probably unnecessary, as we don't use the SQL interface ... */
+	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+	if (ctx->output_plugin_options != NIL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("This plugin does not expect any options")));
+	}
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * As we don't release the slot while processing a particular table, there's
+ * no room for an SQL interface, even for debugging purposes. Therefore we
+ * need neither OutputPluginPrepareWrite() nor OutputPluginWrite() in the
+ * plugin callbacks. (Although we might want to write custom callbacks, this
+ * API seems unnecessarily generic for our purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				  XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+			  Relation relation, ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Only interested in one particular relation. */
+	if (relation->rd_id != dstate->relid)
+		return;
+
+	/* Decode entry depending on its type */
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			{
+				HeapTuple	newtuple;
+
+				newtuple = change->data.tp.newtuple;
+
+				/*
+				 * Identity checks in the main function should have made this
+				 * impossible.
+				 */
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete insert info");
+
+				store_change(ctx, CHANGE_INSERT, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			{
+				HeapTuple	oldtuple,
+							newtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+				newtuple = change->data.tp.newtuple;
+
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete update info");
+
+				if (oldtuple != NULL)
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			{
+				HeapTuple	oldtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+
+				if (oldtuple == NULL)
+					elog(ERROR, "incomplete delete info");
+
+				store_change(ctx, CHANGE_DELETE, oldtuple);
+			}
+			break;
+		default:
+			/* Should never get here. */
+			Assert(false);
+			break;
+	}
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+			 HeapTuple tuple)
+{
+	RepackDecodingState *dstate;
+	char	   *change_raw;
+	ConcurrentChange change;
+	bool		flattened = false;
+	Size		size;
+	Datum		values[1];
+	bool		isnull[1];
+	char	   *dst;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	size = VARHDRSZ + SizeOfConcurrentChange;
+
+	/*
+	 * ReorderBufferCommit() stores the TOAST chunks in its private memory
+	 * context and frees them after having called apply_change().  Therefore
+	 * we need a flat copy (including TOAST) that we eventually copy into the
+	 * memory context which is available to decode_concurrent_changes().
+	 */
+	if (HeapTupleHasExternal(tuple))
+	{
+		/*
+		 * toast_flatten_tuple_to_datum() might be more convenient but we
+		 * don't want the decompression it does.
+		 */
+		tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+		flattened = true;
+	}
+
+	size += tuple->t_len;
+	if (size >= MaxAllocSize)
+		elog(ERROR, "Change is too big.");
+
+	/* Construct the change. */
+	change_raw = (char *) palloc0(size);
+	SET_VARSIZE(change_raw, size);
+
+	/*
+	 * Since the varlena alignment might not be sufficient for the structure,
+	 * set the fields in a local instance and remember where the instance
+	 * should eventually be copied.
+	 */
+	change.kind = kind;
+	dst = (char *) VARDATA(change_raw);
+
+	/*
+	 * Copy the tuple.
+	 *
+	 * Note: change->tup_data.t_data must be fixed on retrieval!
+	 */
+	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+	memcpy(dst, &change, SizeOfConcurrentChange);
+	dst += SizeOfConcurrentChange;
+	memcpy(dst, tuple->t_data, tuple->t_len);
+
+	/* The data has been copied. */
+	if (flattened)
+		pfree(tuple);
+
+	/* Store as tuple of 1 bytea column. */
+	values[0] = PointerGetDatum(change_raw);
+	isnull[0] = false;
+	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+						 values, isnull);
+
+	/* Accounting. */
+	dstate->nchanges++;
+
+	/* Cleanup. */
+	pfree(change_raw);
+}
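
The note above that change->tup_data.t_data "must be fixed on retrieval" is
worth illustrating. Here is a minimal sketch of the reading side, assuming
only the RepackDecodingState fields declared in cluster.h; the function name
and details are illustrative, not the patch's actual apply code:

    /*
     * Illustrative sketch (not part of the patch): read one change back
     * from dstate->tstore.  Assumes commands/cluster.h,
     * executor/tuptable.h and fmgr.h.
     */
    static bool
    fetch_concurrent_change(RepackDecodingState *dstate,
                            ConcurrentChange *change_p)
    {
        Datum       d;
        bool        isnull;
        char       *raw;
        char       *tup_body;

        if (!tuplestore_gettupleslot(dstate->tstore, true, false,
                                     dstate->tsslot))
            return false;       /* no more changes */

        d = slot_getattr(dstate->tsslot, 1, &isnull);
        Assert(!isnull);

        /* Unpack the bytea; it may carry a short (unaligned) header. */
        raw = VARDATA_ANY(PG_DETOAST_DATUM(d));

        /* The header may be unaligned inside the bytea, hence the copy. */
        memcpy(change_p, raw, SizeOfConcurrentChange);

        /* Copy the tuple body to aligned memory and fix t_data. */
        tup_body = palloc(change_p->tup_data.t_len);
        memcpy(tup_body, raw + SizeOfConcurrentChange,
               change_p->tup_data.t_len);
        change_p->tup_data.t_data = (HeapTupleHeader) tup_body;

        return true;
    }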
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index cd3e43c448a..519f3953638 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
 
 die
   "$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
-  . " missing from lwlocklist.h"
+  . "missing from lwlocklist.h"
   if $lwlock_count < scalar @wait_event_lwlocks;
 
 die
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 4cf32ffe833..93fc3037110 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -214,7 +214,6 @@ static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -659,7 +658,7 @@ CopySnapshot(Snapshot snapshot)
  * FreeSnapshot
  *		Free the memory associated with a snapshot.
  */
-static void
+void
 FreeSnapshot(Snapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 626d9f1c98b..0fcf343d3af 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -5075,8 +5075,8 @@ match_previous_words(int pattern_id,
 		 * one word, so the above test is correct.
 		 */
 		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
-			COMPLETE_WITH("ANALYZE", "VERBOSE");
-		else if (TailMatches("ANALYZE", "VERBOSE"))
+			COMPLETE_WITH("ANALYZE", "CONCURRENTLY", "VERBOSE");
+		else if (TailMatches("ANALYZE", "CONCURRENTLY", "VERBOSE"))
 			COMPLETE_WITH("ON", "OFF");
 	}
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..b7cd25896f6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -361,14 +361,15 @@ extern void heap_multi_insert(Relation relation, TupleTableSlot **slots,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
-							 TM_FailureData *tmfd, bool changingPart);
+							 TM_FailureData *tmfd, bool changingPart,
+							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
 extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
 extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
 							 HeapTuple newtup,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes);
+							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
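
The new trailing wal_logical argument lets REPACK CONCURRENTLY apply decoded
changes to the new heap without emitting the logical-decoding payload again
(cf. XLH_DELETE_NO_LOGICAL below and the filtering added to decode.c).
A hedged sketch of such a caller - the function name is invented, and regular
callers simply pass true:

    /*
     * Illustrative sketch, not the patch's apply code: delete from the
     * new heap while applying concurrent changes.  wal_logical = false
     * keeps this WAL record out of our own decoding stream.
     */
    static TM_Result
    delete_for_repack_apply(Relation new_heap, ItemPointer tid)
    {
        TM_FailureData tmfd;

        return heap_delete(new_heap, tid,
                           GetCurrentCommandId(true),
                           InvalidSnapshot, /* no crosscheck snapshot */
                           true,            /* wait */
                           &tmfd,
                           false,           /* changingPart */
                           false);          /* wal_logical */
    }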
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..2cc49fd48de 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
 #define XLH_DELETE_CONTAINS_OLD_KEY				(1<<2)
 #define XLH_DELETE_IS_SUPER						(1<<3)
 #define XLH_DELETE_IS_PARTITION_MOVE			(1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL					(1<<5)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLH_DELETE_CONTAINS_OLD						\
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d8f76d325f9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -629,6 +630,8 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
+											  Snapshot snapshot,
+											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1646,6 +1649,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ *	 (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1658,6 +1665,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
+								Snapshot snapshot,
+								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1666,6 +1675,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
+													snapshot, decoding_ctx,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
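
With the expanded signature, the wrapper supports both calling modes; roughly
(mirroring what copy_table_data() does in the cluster.c part of this series,
with the surrounding variables assumed):

    /* Exclusive-lock path: no historic snapshot, no decoding context. */
    table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
                                    OldestXmin,
                                    NULL,   /* snapshot */
                                    NULL,   /* decoding_ctx */
                                    &xid_cutoff, &multi_cutoff,
                                    &num_tuples, &tups_vacuumed,
                                    &tups_recently_dead);

    /*
     * REPACK CONCURRENTLY: scan under the historic MVCC snapshot and let
     * the AM consume concurrent WAL via the decoding context.
     */
    table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
                                    OldestXmin, snapshot, decoding_ctx,
                                    &xid_cutoff, &multi_cutoff,
                                    &num_tuples, &tups_vacuumed,
                                    &tups_recently_dead);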
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 652542e8e65..b43a1740053 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
 #ifndef CLUSTER_H
 #define CLUSTER_H
 
+#include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "replication/logical.h"
 #include "storage/lock.h"
+#include "storage/relfilelocator.h"
 #include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -25,6 +30,8 @@
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
 #define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
+#define CLUOPT_CONCURRENT 0x10	/* allow concurrent data changes */
+
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -33,14 +40,94 @@ typedef struct ClusterParams
 } ClusterParams;
 
 
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+extern RelFileLocator repacked_rel_locator;
+extern RelFileLocator repacked_rel_toast_locator;
+
+typedef enum
+{
+	CHANGE_INSERT,
+	CHANGE_UPDATE_OLD,
+	CHANGE_UPDATE_NEW,
+	CHANGE_DELETE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+	/* See the enum above. */
+	ConcurrentChangeKind kind;
+
+	/*
+	 * The actual tuple.
+	 *
+	 * The tuple data follows the ConcurrentChange structure. Before use, make
+	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
+	 * bytea) and that tuple->t_data is fixed.
+	 */
+	HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+								sizeof(HeapTupleData))
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents are being copied to new storage, along with the metadata needed
+ * to apply these changes to the table.
+ */
+typedef struct RepackDecodingState
+{
+	/* The relation whose changes we're decoding. */
+	Oid			relid;
+
+	/* Replication slot name. */
+	NameData	slotname;
+
+	/*
+	 * Decoded changes are stored here. Although we try to avoid excessively
+	 * large batches, the changes may occasionally need to spill to disk. The
+	 * tuplestore does this transparently.
+	 */
+	Tuplestorestate *tstore;
+
+	/* The current number of changes in tstore. */
+	double		nchanges;
+
+	/*
+	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
+	 * We can't store the tuple directly because tuplestore only supports
+	 * minimal tuples and we may need to transfer the OID system column
+	 * from the output plugin. Also we need to transfer the change kind, so
+	 * it's better to put everything in one structure than to use two
+	 * tuplestores "in parallel".
+	 */
+	TupleDesc	tupdesc_change;
+
+	/* Tuple descriptor needed to update indexes. */
+	TupleDesc	tupdesc;
+
+	/* Slot to retrieve data from tstore. */
+	TupleTableSlot *tsslot;
+
+	ResourceOwner resowner;
+} RepackDecodingState;
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
-						ClusterParams *params);
+						ClusterParams *params, bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+											 XLogRecPtr end_of_wal);
+
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -48,6 +135,7 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 bool swap_toast_by_content,
 							 bool check_constraints,
 							 bool is_internal,
+							 bool reindex,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 8d68bcbef95..18cb482ac26 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -86,10 +86,12 @@
 #define PROGRESS_REPACK_PHASE					1
 #define PROGRESS_REPACK_INDEX_RELID				2
 #define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED	4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED		5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED		6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		9
 
 /*
  * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -98,9 +100,10 @@
 #define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
 #define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
 #define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+#define PROGRESS_REPACK_PHASE_CATCH_UP			5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		8
 
 /* Progress parameters for CREATE INDEX */
 /* 3, 4 and 5 reserved for "waitfor" metrics */
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 6d4d2d1814c..802fc4b0823 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 7f3ba0352f6..2739327b0da 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
 #define AccessShareLock			1	/* SELECT */
 #define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
 #define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL), ANALYZE, CREATE
-									 * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4	/* VACUUM (non-exclusive), ANALYZE, CREATE
+									 * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
 #define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
 #define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
 									 * SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index f65f83c85cd..1f821fd2ccd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,8 @@ extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
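
Exporting FreeSnapshot() lets the repack code release the snapshot it
obtained for the initial load once the copy is done. A minimal usage sketch,
assuming a copied snapshot that is neither registered nor marked active:

    Snapshot    snap = CopySnapshot(GetActiveSnapshot());

    /* ... scan the old heap under 'snap' ... */

    FreeSnapshot(snap);     /* note the Assert(regd_count == 0) above */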
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index c85034eb8cc..a9769f1d99f 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -14,12 +14,15 @@ REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
 ISOLATION = basic \
 	    inplace \
+	    repack \
+	    repack_toast \
 	    syscache-update-pruned \
 	    index-concurrently-upsert \
 	    index-concurrently-upsert-predicate \
 	    reindex-concurrently-upsert \
 	    reindex-concurrently-upsert-on-constraint \
 	    reindex-concurrently-upsert-partitioned
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
 
 # The injection points are cluster-wide, so disable installcheck
 NO_INSTALLCHECK = 1
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing: 
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+
+step change_new: 
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1: 
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+
+step change_subxact2: 
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    2
+(1 row)
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/expected/repack_toast.out b/src/test/modules/injection_points/expected/repack_toast.out
new file mode 100644
index 00000000000..4f866a74e32
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack_toast.out
@@ -0,0 +1,64 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test;
+ <waiting ...>
+step change: 
+	UPDATE repack_test SET j=get_long_string() where i=2;
+	DELETE FROM repack_test WHERE i=3;
+	INSERT INTO repack_test(i, j) VALUES (4, get_long_string());
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    4
+(1 row)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..e3d257315fa
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 8d6f662040d..b72bfb8ff06 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -45,6 +45,8 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'repack',
+      'repack_toast',
       'syscache-update-pruned',
       'index-concurrently-upsert',
       'index-concurrently-upsert-predicate',
@@ -55,5 +57,7 @@ tests += {
     'runningcheck': false, # see syscache-update-pruned
     # Some tests wait for all snapshots, so avoid parallel execution
     'runningcheck-parallel': false,
+    # 'repack' requires wal_level = 'logical'.
+    'regress_args': ['--temp-config', files('logical.conf')],
   },
 }
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..d727a9b056b
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,142 @@
+# REPACK (CONCURRENTLY) ... USING INDEX ...;
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+	INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j int);
+	CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+	SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice to test whether tuple version generated by this session
+# can be found.
+step change_existing
+{
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key and
+# non-key column.
+step change_new
+{
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change_existing
+	change_new
+	change_subxact1
+	change_subxact2
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/modules/injection_points/specs/repack_toast.spec b/src/test/modules/injection_points/specs/repack_toast.spec
new file mode 100644
index 00000000000..b48abf21450
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack_toast.spec
@@ -0,0 +1,105 @@
+# REPACK (CONCURRENTLY);
+#
+# Test handling of TOAST. At the same time, no tuplesort.
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	-- Return a string that needs to be TOASTed.
+	CREATE FUNCTION get_long_string()
+	RETURNS text
+	LANGUAGE sql as $$
+		SELECT string_agg(chr(65 + trunc(25 * random())::int), '')
+		FROM generate_series(1, 2048) s(x);
+	$$;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j text);
+	INSERT INTO repack_test(i, j) VALUES (1, get_long_string()),
+		(2, get_long_string()), (3, get_long_string());
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j text);
+	CREATE TABLE data_s2(i int, j text);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+	DROP FUNCTION get_long_string();
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+	SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+step change
+{
+	UPDATE repack_test SET j=get_long_string() where i=2;
+	DELETE FROM repack_test WHERE i=3;
+	INSERT INTO repack_test(i, j) VALUES (4, get_long_string());
+}
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index eb90fd3de5f..88da9ef75b4 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2012,7 +2012,7 @@ pg_stat_progress_cluster| SELECT pid,
     phase,
     repack_index_relid AS cluster_index_relid,
     heap_tuples_scanned,
-    heap_tuples_written,
+    (heap_tuples_inserted + heap_tuples_updated) AS heap_tuples_written,
     heap_blks_total,
     heap_blks_scanned,
     index_rebuild_count
@@ -2092,17 +2092,20 @@ pg_stat_progress_repack| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 5 THEN 'catch-up'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS repack_index_relid,
     s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param5 AS heap_tuples_inserted,
+    s.param6 AS heap_tuples_updated,
+    s.param7 AS heap_tuples_deleted,
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4f3c7c160a6..3139b14e85f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -411,6 +411,7 @@ CatCacheHeader
 CatalogId
 CatalogIdMapEntry
 CatalogIndexState
+ChangeDest
 ChangeVarNodes_callback
 ChangeVarNodes_context
 CheckPoint
@@ -487,6 +488,8 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
 ConditionalStack
@@ -1264,6 +1267,7 @@ IndexElem
 IndexFetchHeapData
 IndexFetchTableData
 IndexInfo
+IndexInsertState
 IndexList
 IndexOnlyScan
 IndexOnlyScanState
@@ -2558,6 +2562,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 RepackCommand
+RepackDecodingState
 RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
-- 
2.47.3

v28-0005-Use-background-worker-to-do-logical-decoding.patchtext/x-diffDownload
From 08c77f5c2ecc1890477fa460e4131b53538f6c42 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Sat, 13 Dec 2025 19:27:18 +0100
Subject: [PATCH 5/6] Use background worker to do logical decoding.

If the backend performing REPACK (CONCURRENTLY) does both data copying and
logical decoding, it has to "travel in time" back and forth, and therefore
must invalidate system caches quite a few times. (The copying and the
decoding work with different catalog snapshots.) As the decoding worker has
separate caches, the switching is not necessary.

Without the worker, it'd also be difficult to switch between potentially
long-running tasks like index build and WAL decoding. (If no decoding took
place during such a task, archiving / recycling of WAL segments could be held
back for that long, which in turn may result in a full disk.)

Another problem is that, after having acquired AccessExclusiveLock (in order
to swap the files), the backend needs to both decode and apply the data
changes that took place while it was waiting for the lock. With the decoding
worker, the decoding runs all the time, so the backend only needs to apply the
changes. This can reduce the time the exclusive lock is held for.

Note that the code added to handle ERRORs in the background worker almost
duplicates the existing code that does the same for other types of workers
(see ProcessParallelMessages() and ProcessParallelApplyMessages()).
Refactoring the existing code might be useful to reduce the duplication.
---
 src/backend/access/heap/heapam_handler.c      |   44 -
 src/backend/commands/cluster.c                | 1144 +++++++++++++----
 src/backend/libpq/pqmq.c                      |    5 +
 src/backend/postmaster/bgworker.c             |    4 +
 src/backend/replication/logical/logical.c     |    6 +-
 .../pgoutput_repack/pgoutput_repack.c         |   54 +-
 src/backend/storage/ipc/procsignal.c          |    4 +
 src/backend/tcop/postgres.c                   |    4 +
 .../utils/activity/wait_event_names.txt       |    2 +
 src/include/access/tableam.h                  |    7 +-
 src/include/commands/cluster.h                |   68 +-
 src/include/storage/procsignal.h              |    1 +
 src/tools/pgindent/typedefs.list              |    4 +-
 13 files changed, 946 insertions(+), 401 deletions(-)
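
Before the diff, a hedged sketch of the backend side of the handshake the
commit message describes, based on the DecodingWorkerShared fields introduced
in cluster.c below. The control flow is illustrative only, and the real patch
defines proper wait events (see wait_event_names.txt in the diffstat):

    /*
     * Illustrative sketch: ask the worker to decode up to 'upto' and wait
     * until it has exported the resulting file.
     */
    static void
    wait_for_decoding_upto(DecodingWorkerShared *shared, XLogRecPtr upto)
    {
        SpinLockAcquire(&shared->mutex);
        shared->lsn_upto = upto;
        SpinLockRelease(&shared->mutex);

        /* Wait until the worker marks the exported fileset as valid. */
        ConditionVariablePrepareToSleep(&shared->cv);
        for (;;)
        {
            bool        ready;

            SpinLockAcquire(&shared->mutex);
            ready = shared->sfs_valid;
            SpinLockRelease(&shared->mutex);

            if (ready)
                break;

            /* Real code would pass one of the new repack wait events. */
            ConditionVariableSleep(&shared->cv, 0);
        }
        ConditionVariableCancelSleep();
    }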

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6063e037edb..cb09e6fd1dc 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,7 +33,6 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
-#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -688,7 +687,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
 								 Snapshot snapshot,
-								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
@@ -710,7 +708,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
 	bool		concurrent = snapshot != NULL;
-	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -957,31 +954,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
-
-		/*
-		 * Process the WAL produced by the load, as well as by other
-		 * transactions, so that the replication slot can advance and WAL does
-		 * not pile up. Use wal_segment_size as a threshold so that we do not
-		 * introduce the decoding overhead too often.
-		 *
-		 * Of course, we must not apply the changes until the initial load has
-		 * completed.
-		 *
-		 * Note that our insertions into the new table should not be decoded
-		 * as we (intentionally) do not write the logical decoding specific
-		 * information to WAL.
-		 */
-		if (concurrent)
-		{
-			XLogRecPtr	end_of_wal;
-
-			end_of_wal = GetFlushRecPtr(NULL);
-			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
-			{
-				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
-				end_of_wal_prev = end_of_wal;
-			}
-		}
 	}
 
 	if (indexScan != NULL)
@@ -1027,22 +999,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			/* Report n_tuples */
 			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
-
-			/*
-			 * Try to keep the amount of not-yet-decoded WAL small, like
-			 * above.
-			 */
-			if (concurrent)
-			{
-				XLogRecPtr	end_of_wal;
-
-				end_of_wal = GetFlushRecPtr(NULL);
-				if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
-				{
-					repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
-					end_of_wal_prev = end_of_wal;
-				}
-			}
 		}
 
 		tuplesort_end(tuplesort);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 569705abc81..b0383c1375f 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -12,12 +12,13 @@
  * In concurrent mode, we lock the table with only ShareUpdateExclusiveLock,
  * then do an initial copy as above.  However, while the tuples are being
  * copied, concurrent transactions could modify the table. To cope with those
- * changes, we rely on logical decoding to obtain them from WAL.  The changes
- * are accumulated in a tuplestore.  Once the initial copy is complete, we
- * read the changes from the tuplestore and re-apply them on the new heap.
- * Then we upgrade our ShareUpdateExclusiveLock to AccessExclusiveLock and
- * swap the relfilenodes.  This way, the time we hold a strong lock on the
- * table is much reduced, and the bloat is eliminated.
+ * changes, we rely on logical decoding to obtain them from WAL.  A bgworker
+ * consumes WAL while the initial copy is ongoing (to prevent excessive WAL
+ * from being retained), and accumulates the changes in a file.  Once the
+ * initial copy is complete, we read the changes from the file and re-apply
+ * them on the new heap.  Then we upgrade our ShareUpdateExclusiveLock to
+ * AccessExclusiveLock and swap the relfilenodes.  This way, the time we hold
+ * a strong lock on the table is much reduced, and the bloat is eliminated.
  *
  * There is hardly anything left of Paul Brown's original implementation...
  *
@@ -61,6 +62,8 @@
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
@@ -71,6 +74,8 @@
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/procsignal.h"
+#include "tcop/tcopprot.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
@@ -138,6 +143,106 @@ typedef struct ChangeDest
 	IndexInsertState *iistate;
 } ChangeDest;
 
+/*
+ * Layout of shared memory used for communication between backend and the
+ * worker that performs logical decoding of data changes
+ */
+typedef struct DecodingWorkerShared
+{
+	/* Is the decoding initialized? */
+	bool		initialized;
+
+	/*
+	 * Once the worker has reached this LSN, it should close the current
+	 * output file and either create a new one or exit, according to the field
+	 * 'done'. If the value is InvalidXLogRecPtr, the worker should decode all
+	 * the WAL available and keep checking this field. It is OK if the
+	 * worker has already decoded records whose LSN is >= lsn_upto by the
+	 * time this field is set.
+	 */
+	XLogRecPtr	lsn_upto;
+
+	/* Exit after closing the current file? */
+	bool		done;
+
+	/* The output is stored here. */
+	SharedFileSet sfs;
+
+	/* Can backend read the file contents? */
+	bool		sfs_valid;
+
+	/* Number of the last file exported by the worker. */
+	int			last_exported;
+
+	/* Synchronize access to the fields above. */
+	slock_t		mutex;
+
+	/* Database to connect to. */
+	Oid			dbid;
+
+	/* Role to connect as. */
+	Oid			roleid;
+
+	/* Decode data changes of this relation. */
+	Oid			relid;
+
+	/* The backend uses this to wait for the worker. */
+	ConditionVariable cv;
+
+	/* Info to signal the backend. */
+	PGPROC	   *backend_proc;
+	pid_t		backend_pid;
+	ProcNumber	backend_proc_number;
+
+	/* Error queue. */
+	shm_mq	   *error_mq;
+
+	/*
+	 * Memory the queue is located int.
+	 *
+	 * For considerations on the value see the comments of
+	 * PARALLEL_ERROR_QUEUE_SIZE.
+	 */
+#define REPACK_ERROR_QUEUE_SIZE			16384
+	char		error_queue[FLEXIBLE_ARRAY_MEMBER];
+} DecodingWorkerShared;
+
+/*
+ * Generate worker's output file name. If relations of the same 'relid' happen
+ * to be processed at the same time, they must be from different databases and
+ * therefore different backends must be involved. (PID is already present in
+ * the fileset name.)
+ */
+static inline void
+DecodingWorkerFileName(char *fname, Oid relid, uint32 seq)
+{
+	snprintf(fname, MAXPGPATH, "%u-%u", relid, seq);
+}
+
+/*
+ * Backend-local information to control the decoding worker.
+ */
+typedef struct DecodingWorker
+{
+	/* The worker. */
+	BackgroundWorkerHandle *handle;
+
+	/* DecodingWorkerShared is in this segment. */
+	dsm_segment *seg;
+
+	/* Handle of the error queue. */
+	shm_mq_handle *error_mqh;
+} DecodingWorker;
+
+/* Pointer to currently running decoding worker. */
+static DecodingWorker *decoding_worker = NULL;
+
+/*
+ * Is there a message sent by a repack worker that the backend needs to
+ * receive?
+ */
+volatile sig_atomic_t RepackMessagePending = false;
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
 								Oid indexOid, Oid userid, LOCKMODE lmode,
 								int options);
@@ -145,7 +250,7 @@ static void check_repack_concurrently_requirements(Relation rel);
 static void rebuild_relation(Relation OldHeap, Relation index, bool verbose,
 							 bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							Snapshot snapshot,
 							bool verbose,
 							bool *pSwapToastByContent,
 							TransactionId *pFreezeXid,
@@ -158,12 +263,10 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
 
-static void begin_concurrent_repack(Relation rel);
-static void end_concurrent_repack(void);
 static LogicalDecodingContext *setup_logical_decoding(Oid relid);
-static HeapTuple get_changed_tuple(char *change);
-static void apply_concurrent_changes(RepackDecodingState *dstate,
-									 ChangeDest *dest);
+static bool decode_concurrent_changes(LogicalDecodingContext *ctx,
+									  DecodingWorkerShared *shared);
+static void apply_concurrent_changes(BufFile *file, ChangeDest *dest);
 static void apply_concurrent_insert(Relation rel, HeapTuple tup,
 									IndexInsertState *iistate,
 									TupleTableSlot *index_slot);
@@ -175,9 +278,9 @@ static void apply_concurrent_delete(Relation rel, HeapTuple tup_target);
 static HeapTuple find_target_tuple(Relation rel, ChangeDest *dest,
 								   HeapTuple tup_key,
 								   TupleTableSlot *ident_slot);
-static void process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
-									   XLogRecPtr end_of_wal,
-									   ChangeDest *dest);
+static void process_concurrent_changes(XLogRecPtr end_of_wal,
+									   ChangeDest *dest,
+									   bool done);
 static IndexInsertState *get_index_insert_state(Relation relation,
 												Oid ident_index_id,
 												Relation *ident_index_p);
@@ -187,7 +290,6 @@ static void free_index_insert_state(IndexInsertState *iistate);
 static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
 static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 											   Relation cl_index,
-											   LogicalDecodingContext *decoding_ctx,
 											   TransactionId frozenXid,
 											   MultiXactId cutoffMulti);
 static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
@@ -197,6 +299,13 @@ static Relation process_single_relation(RepackStmt *stmt,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
+static void start_decoding_worker(Oid relid);
+static void stop_decoding_worker(void);
+static void repack_worker_internal(dsm_segment *seg);
+static void export_initial_snapshot(Snapshot snapshot,
+									DecodingWorkerShared *shared);
+static Snapshot get_initial_snapshot(DecodingWorker *worker);
+static void ProcessRepackMessage(StringInfo msg);
 static const char *RepackCommandAsString(RepackCommand cmd);
 
 
@@ -619,20 +728,20 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	/* rebuild_relation does all the dirty work */
 	PG_TRY();
 	{
-		/*
-		 * For concurrent processing, make sure that our logical decoding
-		 * ignores data changes of other tables than the one we are
-		 * processing.
-		 */
-		if (concurrent)
-			begin_concurrent_repack(OldHeap);
-
 		rebuild_relation(OldHeap, index, verbose, concurrent);
 	}
 	PG_FINALLY();
 	{
 		if (concurrent)
-			end_concurrent_repack();
+		{
+			/*
+			 * During normal operation the worker has already been asked to
+			 * exit, so stopping it explicitly matters mostly on ERROR.
+			 * Still, it seems good practice to make sure that the worker
+			 * never survives the REPACK command.
+			 */
+			stop_decoding_worker();
+		}
 	}
 	PG_END_TRY();
 
@@ -929,7 +1038,6 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
-	LogicalDecodingContext *decoding_ctx = NULL;
 	Snapshot	snapshot = NULL;
 #if USE_ASSERT_CHECKING
 	LOCKMODE	lmode;
@@ -943,19 +1051,36 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	if (concurrent)
 	{
 		/*
-		 * Prepare to capture the concurrent data changes.
+		 * The worker needs to be a member of the locking group we're the
+		 * leader of, so we must become the leader before the worker starts.
+		 * The worker will join the group as soon as it starts.
+		 *
+		 * This is to make sure that the deadlock described below is
+		 * detectable by deadlock.c: if the worker waits for a transaction to
+		 * complete and we are waiting for the worker output, then effectively
+		 * we (i.e. this backend) are waiting for that transaction.
+		 */
+		BecomeLockGroupLeader();
+
+		/*
+		 * Start the worker that decodes data changes applied while we're
+		 * copying the table contents.
 		 *
-		 * Note that this call waits for all transactions with XID already
-		 * assigned to finish. If some of those transactions is waiting for a
-		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
-		 * it runs CREATE INDEX), we can end up in a deadlock. Not sure this
-		 * risk is worth unlocking/locking the table (and its clustering
-		 * index) and checking again if its still eligible for REPACK
-		 * CONCURRENTLY.
+		 * Note that the worker has to wait for all transactions with XID
+		 * already assigned to finish. If any of those transactions is
+		 * waiting for a lock conflicting with ShareUpdateExclusiveLock on our
+		 * table (e.g. it runs CREATE INDEX), we can end up in a deadlock.
+		 * Not sure this risk is worth unlocking/locking the table (and its
+		 * clustering index) and checking again whether it's still eligible
+		 * for REPACK CONCURRENTLY.
+		 */
+		start_decoding_worker(tableOid);
+
+		/*
+		 * Wait until the worker has the initial snapshot and retrieve it.
 		 */
-		decoding_ctx = setup_logical_decoding(tableOid);
+		snapshot = get_initial_snapshot(decoding_worker);
 
-		snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
 		PushActiveSnapshot(snapshot);
 	}
 
@@ -980,7 +1105,7 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, snapshot, decoding_ctx, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
 	/* The historic snapshot won't be needed anymore. */
@@ -994,14 +1119,10 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	{
 		Assert(!swap_toast_by_content);
 		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
-										   decoding_ctx,
 										   frozenXid, cutoffMulti);
 
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
-
-		/* Done with decoding. */
-		cleanup_logical_decoding(decoding_ctx);
 	}
 	else
 	{
@@ -1172,8 +1293,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
  */
 static void
 copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
-				bool verbose, bool *pSwapToastByContent,
+				Snapshot snapshot, bool verbose, bool *pSwapToastByContent,
 				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
@@ -1334,7 +1454,6 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
 									cutoffs.OldestXmin, snapshot,
-									decoding_ctx,
 									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
@@ -2367,59 +2486,6 @@ RepackCommandAsString(RepackCommand cmd)
 	return "???";
 }
 
-
-/*
- * Call this function before REPACK CONCURRENTLY starts to setup logical
- * decoding. It makes sure that other users of the table put enough
- * information into WAL.
- *
- * The point is that at various places we expect that the table we're
- * processing is treated like a system catalog. For example, we need to be
- * able to scan it using a "historic snapshot" anytime during the processing
- * (as opposed to scanning only at the start point of the decoding, as logical
- * replication does during initial table synchronization), in order to apply
- * concurrent UPDATE / DELETE commands.
- *
- * Note that TOAST table needs no attention here as it's not scanned using
- * historic snapshot.
- */
-static void
-begin_concurrent_repack(Relation rel)
-{
-	Oid			toastrelid;
-
-	/*
-	 * Avoid logical decoding of other relations by this backend. The lock we
-	 * have guarantees that the actual locator cannot be changed concurrently:
-	 * TRUNCATE needs AccessExclusiveLock.
-	 */
-	Assert(CheckRelationLockedByMe(rel, ShareUpdateExclusiveLock, false));
-	repacked_rel_locator = rel->rd_locator;
-	toastrelid = rel->rd_rel->reltoastrelid;
-	if (OidIsValid(toastrelid))
-	{
-		Relation	toastrel;
-
-		/* Avoid logical decoding of other TOAST relations. */
-		toastrel = table_open(toastrelid, AccessShareLock);
-		repacked_rel_toast_locator = toastrel->rd_locator;
-		table_close(toastrel, AccessShareLock);
-	}
-}
-
-/*
- * Call this when done with REPACK CONCURRENTLY.
- */
-static void
-end_concurrent_repack(void)
-{
-	/*
-	 * Restore normal function of (future) logical decoding for this backend.
-	 */
-	repacked_rel_locator.relNumber = InvalidOid;
-	repacked_rel_toast_locator.relNumber = InvalidOid;
-}
-
 /*
  * This function is much like pg_create_logical_replication_slot() except that
  * the new slot is neither released (if anyone else could read changes from
@@ -2431,9 +2497,10 @@ static LogicalDecodingContext *
 setup_logical_decoding(Oid relid)
 {
 	Relation	rel;
-	TupleDesc	tupdesc;
+	Oid			toastrelid;
 	LogicalDecodingContext *ctx;
-	RepackDecodingState *dstate = palloc0_object(RepackDecodingState);
+	NameData	slotname;
+	RepackDecodingState *dstate;
 
 	/*
 	 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
@@ -2441,21 +2508,21 @@ setup_logical_decoding(Oid relid)
 	 */
 	Assert(!TransactionIdIsValid(GetTopTransactionIdIfAny()));
 
-	/*
-	 * A single backend should not execute multiple REPACK commands at a time,
-	 * so use PID to make the slot unique.
-	 */
-	snprintf(NameStr(dstate->slotname), NAMEDATALEN, "repack_%d", MyProcPid);
-
 	/*
 	 * Check if we can use logical decoding.
 	 */
 	CheckSlotPermissions();
 	CheckLogicalDecodingRequirements();
 
-	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
-	ReplicationSlotCreate(NameStr(dstate->slotname), true, RS_TEMPORARY,
-						  false, false, false);
+	/*
+	 * A single backend should not execute multiple REPACK commands at a time,
+	 * so use PID to make the slot unique.
+	 *
+	 * RS_TEMPORARY so that the slot gets cleaned up on ERROR.
+	 */
+	snprintf(NameStr(slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+	ReplicationSlotCreate(NameStr(slotname), true, RS_TEMPORARY, false, false,
+						  false);
 
 	/*
 	 * Neither prepare_write nor do_write callback nor update_progress is
@@ -2477,104 +2544,109 @@ setup_logical_decoding(Oid relid)
 
 	DecodingContextFindStartpoint(ctx);
 
+	/*
+	 * decode_concurrent_changes() needs a non-blocking page-read callback.
+	 */
+	ctx->reader->routine.page_read = read_local_xlog_page_no_wait;
+
+	/*
+	 * read_local_xlog_page_no_wait() needs to be able to indicate the end of
+	 * WAL.
+	 */
+	ctx->reader->private_data = MemoryContextAllocZero(ctx->context,
+													   sizeof(ReadLocalXLogPageNoWaitPrivate));
+
 	/* Some WAL records should have been read. */
 	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
 
+	/*
+	 * Initialize repack_current_segment so that we can notice WAL segment
+	 * boundaries.
+	 */
 	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
 				wal_segment_size);
 
-	/*
-	 * Setup structures to store decoded changes.
-	 */
+	dstate = palloc0_object(RepackDecodingState);
 	dstate->relid = relid;
-	dstate->tstore = tuplestore_begin_heap(false, false,
-										   maintenance_work_mem);
 
-	/* Caller should already have the table locked. */
-	rel = table_open(relid, NoLock);
-	tupdesc = CreateTupleDescCopy(RelationGetDescr(rel));
-	dstate->tupdesc = tupdesc;
-	table_close(rel, NoLock);
+	/*
+	 * The tuple descriptor may be needed to flatten a tuple before we write
+	 * it to a file. A copy is needed because the decoding worker invalidates
+	 * system caches before it starts to do the actual work.
+	 */
+	rel = table_open(relid, AccessShareLock);
+	dstate->tupdesc = CreateTupleDescCopy(RelationGetDescr(rel));
 
-	/* Initialize the descriptor to store the changes ... */
-	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+	/* Avoid logical decoding of other relations. */
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
 
-	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
-	/* ... as well as the corresponding slot. */
-	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
-											  &TTSOpsMinimalTuple);
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+	table_close(rel, AccessShareLock);
 
-	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
-										   "logical decoding");
+	/* The file will be set as soon as we have it opened. */
+	dstate->file = NULL;
 
 	ctx->output_writer_private = dstate;
+
 	return ctx;
 }
 
 /*
- * Retrieve tuple from ConcurrentChange structure.
+ * Decode logical changes from the WAL sequence and store them to a file.
  *
- * The input data starts with the structure but it might not be appropriately
- * aligned.
+ * If true is returned, there is no more work for the worker.
  */
-static HeapTuple
-get_changed_tuple(char *change)
-{
-	HeapTupleData tup_data;
-	HeapTuple	result;
-	char	   *src;
-
-	/*
-	 * Ensure alignment before accessing the fields. (This is why we can't use
-	 * heap_copytuple() instead of this function.)
-	 */
-	src = change + offsetof(ConcurrentChange, tup_data);
-	memcpy(&tup_data, src, sizeof(HeapTupleData));
-
-	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
-	memcpy(result, &tup_data, sizeof(HeapTupleData));
-	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
-	src = change + SizeOfConcurrentChange;
-	memcpy(result->t_data, src, result->t_len);
-
-	return result;
-}
-
-/*
- * Decode logical changes from the WAL sequence up to end_of_wal.
- */
-void
-repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
-								 XLogRecPtr end_of_wal)
+static bool
+decode_concurrent_changes(LogicalDecodingContext *ctx,
+						  DecodingWorkerShared *shared)
 {
 	RepackDecodingState *dstate;
-	ResourceOwner resowner_old;
+	XLogRecPtr	lsn_upto;
+	bool		done;
+	char		fname[MAXPGPATH];
 
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
-	resowner_old = CurrentResourceOwner;
-	CurrentResourceOwner = dstate->resowner;
 
-	PG_TRY();
+	/* Open the output file. */
+	DecodingWorkerFileName(fname, shared->relid, shared->last_exported + 1);
+	dstate->file = BufFileCreateFileSet(&shared->sfs.fs, fname);
+
+	SpinLockAcquire(&shared->mutex);
+	lsn_upto = shared->lsn_upto;
+	done = shared->done;
+	SpinLockRelease(&shared->mutex);
+
+	while (true)
 	{
-		while (ctx->reader->EndRecPtr < end_of_wal)
-		{
-			XLogRecord *record;
-			XLogSegNo	segno_new;
-			char	   *errm = NULL;
-			XLogRecPtr	end_lsn;
+		XLogRecord *record;
+		XLogSegNo	segno_new;
+		char	   *errm = NULL;
+		XLogRecPtr	end_lsn;
 
-			record = XLogReadRecord(ctx->reader, &errm);
-			if (errm)
-				elog(ERROR, "%s", errm);
+		CHECK_FOR_INTERRUPTS();
 
-			if (record != NULL)
-				LogicalDecodingProcessRecord(ctx, ctx->reader);
+		record = XLogReadRecord(ctx->reader, &errm);
+		if (record)
+		{
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
 
 			/*
 			 * If WAL segment boundary has been crossed, inform the decoding
-			 * system that the catalog_xmin can advance. (We can confirm more
-			 * often, but a filling a single WAL segment should not take much
-			 * time.)
+			 * system that the catalog_xmin can advance.
+			 *
+			 * TODO Does it make sense to confirm more often? Segment size
+			 * seems appropriate for restart_lsn (because less than a segment
+			 * cannot be recycled anyway), however more frequent checks might
+			 * be beneficial for catalog_xmin.
 			 */
 			end_lsn = ctx->reader->EndRecPtr;
 			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
@@ -2585,80 +2657,117 @@ repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
 					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
 				repack_current_segment = segno_new;
 			}
+		}
+		else
+		{
+			ReadLocalXLogPageNoWaitPrivate *priv;
+
+			if (errm)
+				ereport(ERROR, (errmsg("%s", errm)));
 
-			CHECK_FOR_INTERRUPTS();
+			/*
+			 * In the decoding loop we do not want to block when there is no
+			 * more WAL available, otherwise the loop would become
+			 * uninterruptible. The worker is only useful if it starts
+			 * decoding before lsn_upto is set, so it can reach the end of
+			 * WAL and only find out later that it did not need to read that
+			 * far.
+			 */
+			priv = (ReadLocalXLogPageNoWaitPrivate *)
+				ctx->reader->private_data;
+			if (priv->end_of_wal)
+				priv->end_of_wal = false;
+			else
+				ereport(ERROR, (errmsg("could not read WAL record")));
 		}
-		InvalidateSystemCaches();
-		CurrentResourceOwner = resowner_old;
-	}
-	PG_CATCH();
-	{
-		/* clear all timetravel entries */
-		InvalidateSystemCaches();
-		CurrentResourceOwner = resowner_old;
-		PG_RE_THROW();
+
+		/*
+		 * Whether or not we could read a new record, keep checking whether
+		 * 'lsn_upto' has been specified.
+		 */
+		if (XLogRecPtrIsInvalid(lsn_upto))
+		{
+			SpinLockAcquire(&shared->mutex);
+			lsn_upto = shared->lsn_upto;
+			/* 'done' should be set at the same time as 'lsn_upto' */
+			done = shared->done;
+			SpinLockRelease(&shared->mutex);
+		}
+		if (!XLogRecPtrIsInvalid(lsn_upto) &&
+			ctx->reader->EndRecPtr >= lsn_upto)
+			break;
+
+		if (record == NULL)
+			/* Wait a bit before we retry reading WAL. */
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+							 100000L,	/* XXX Tune the delay. */
+							 WAIT_EVENT_REPACK_WORKER_MAIN);
 	}
-	PG_END_TRY();
+
+	/*
+	 * Close the file so we can make it available to the backend.
+	 */
+	BufFileClose(dstate->file);
+	dstate->file = NULL;
+	SpinLockAcquire(&shared->mutex);
+	shared->lsn_upto = InvalidXLogRecPtr;
+	shared->sfs_valid = true;
+	shared->last_exported++;
+	SpinLockRelease(&shared->mutex);
+	ConditionVariableSignal(&shared->cv);
+
+	return done;
 }
 
 /*
  * Apply changes stored in 'file'.
  */
 static void
-apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
+apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 {
+	char		kind;
+	uint32		t_len;
 	Relation	rel = dest->rel;
 	TupleTableSlot *index_slot,
 			   *ident_slot;
 	HeapTuple	tup_old = NULL;
 
-	if (dstate->nchanges == 0)
-		return;
-
 	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
-	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+	index_slot = MakeSingleTupleTableSlot(RelationGetDescr(rel),
+										  &TTSOpsHeapTuple);
 
 	/* A slot to fetch tuples from identity index. */
 	ident_slot = table_slot_create(rel, NULL);
 
-	while (tuplestore_gettupleslot(dstate->tstore, true, false,
-								   dstate->tsslot))
+	while (true)
 	{
-		bool		shouldFree;
-		HeapTuple	tup_change,
-					tup,
+		size_t		nread;
+		HeapTuple	tup,
 					tup_exist;
-		char	   *change_raw,
-				   *src;
-		ConcurrentChange change;
-		bool		isnull[1];
-		Datum		values[1];
 
 		CHECK_FOR_INTERRUPTS();
 
-		/* Get the change from the single-column tuple. */
-		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
-		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
-		Assert(!isnull[0]);
-
-		/* Make sure we access aligned data. */
-		change_raw = (char *) DatumGetByteaP(values[0]);
-		src = (char *) VARDATA(change_raw);
-		memcpy(&change, src, SizeOfConcurrentChange);
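+		/*
+		 * Each change is stored in the file as a single byte indicating the
+		 * change kind, followed by the tuple length (t_len) and the raw
+		 * tuple data -- the format written by store_change() in
+		 * pgoutput_repack.c.
+		 */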
+		nread = BufFileReadMaybeEOF(file, &kind, 1, true);
+		/* Are we done with the file? */
+		if (nread == 0)
+			break;
 
-		/*
-		 * Extract the tuple from the change. The tuple is copied here because
-		 * it might be assigned to 'tup_old', in which case it needs to
-		 * survive into the next iteration.
-		 */
-		tup = get_changed_tuple(src);
+		/* Read the tuple. */
+		BufFileReadExact(file, &t_len, sizeof(t_len));
+		tup = (HeapTuple) palloc(HEAPTUPLESIZE + t_len);
+		tup->t_data = (HeapTupleHeader) ((char *) tup + HEAPTUPLESIZE);
+		BufFileReadExact(file, tup->t_data, t_len);
+		tup->t_len = t_len;
+		ItemPointerSetInvalid(&tup->t_self);
+		tup->t_tableOid = RelationGetRelid(dest->rel);
 
-		if (change.kind == CHANGE_UPDATE_OLD)
+		if (kind == CHANGE_UPDATE_OLD)
 		{
 			Assert(tup_old == NULL);
 			tup_old = tup;
 		}
-		else if (change.kind == CHANGE_INSERT)
+		else if (kind == CHANGE_INSERT)
 		{
 			Assert(tup_old == NULL);
 
@@ -2666,12 +2775,11 @@ apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
 
 			pfree(tup);
 		}
-		else if (change.kind == CHANGE_UPDATE_NEW ||
-				 change.kind == CHANGE_DELETE)
+		else if (kind == CHANGE_UPDATE_NEW || kind == CHANGE_DELETE)
 		{
 			HeapTuple	tup_key;
 
-			if (change.kind == CHANGE_UPDATE_NEW)
+			if (kind == CHANGE_UPDATE_NEW)
 			{
 				tup_key = tup_old != NULL ? tup_old : tup;
 			}
@@ -2688,7 +2796,7 @@ apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
 			if (tup_exist == NULL)
 				elog(ERROR, "failed to find target tuple");
 
-			if (change.kind == CHANGE_UPDATE_NEW)
+			if (kind == CHANGE_UPDATE_NEW)
 				apply_concurrent_update(rel, tup, tup_exist, dest->iistate,
 										index_slot);
 			else
@@ -2703,26 +2811,19 @@ apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
 			pfree(tup);
 		}
 		else
-			elog(ERROR, "unrecognized kind of change: %d", change.kind);
+			elog(ERROR, "unrecognized kind of change: %d", kind);
 
 		/*
 		 * If a change was applied now, increment CID for next writes and
 		 * update the snapshot so it sees the changes we've applied so far.
 		 */
-		if (change.kind != CHANGE_UPDATE_OLD)
+		if (kind != CHANGE_UPDATE_OLD)
 		{
 			CommandCounterIncrement();
 			UpdateActiveSnapshotCommandId();
 		}
-
-		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
-		Assert(shouldFree);
-		pfree(tup_change);
 	}
 
-	tuplestore_clear(dstate->tstore);
-	dstate->nchanges = 0;
-
 	/* Cleanup. */
 	ExecDropSingleTupleTableSlot(index_slot);
 	ExecDropSingleTupleTableSlot(ident_slot);
@@ -2901,25 +3002,58 @@ find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
 }
 
 /*
- * Decode and apply concurrent changes.
+ * Decode and apply concurrent changes, up to (and including) the record whose
+ * LSN is 'end_of_wal'.
  */
 static void
-process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
-						   XLogRecPtr end_of_wal, ChangeDest *dest)
+process_concurrent_changes(XLogRecPtr end_of_wal, ChangeDest *dest, bool done)
 {
-	RepackDecodingState *dstate;
+	DecodingWorkerShared *shared;
+	char		fname[MAXPGPATH];
+	BufFile    *file;
 
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 								 PROGRESS_REPACK_PHASE_CATCH_UP);
 
-	dstate = (RepackDecodingState *) decoding_ctx->output_writer_private;
+	/* Ask the worker for the file. */
+	shared = (DecodingWorkerShared *) dsm_segment_address(decoding_worker->seg);
+	SpinLockAcquire(&shared->mutex);
+	shared->lsn_upto = end_of_wal;
+	shared->done = done;
+	SpinLockRelease(&shared->mutex);
 
-	repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+	/*
+	 * The worker needs to finish processing the current WAL record. Even if
+	 * it's idle, it'll need to close the output file. Thus we're likely to
+	 * wait, so prepare for sleep.
+	 */
+	ConditionVariablePrepareToSleep(&shared->cv);
+	for (;;)
+	{
+		bool		valid;
 
-	if (dstate->nchanges == 0)
-		return;
+		SpinLockAcquire(&shared->mutex);
+		valid = shared->sfs_valid;
+		SpinLockRelease(&shared->mutex);
+
+		if (valid)
+			break;
+
+		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
+	}
+	ConditionVariableCancelSleep();
 
-	apply_concurrent_changes(dstate, dest);
+	/* Open the file. */
+	DecodingWorkerFileName(fname, shared->relid, shared->last_exported);
+	file = BufFileOpenFileSet(&shared->sfs.fs, fname, O_RDONLY, false);
+	apply_concurrent_changes(file, dest);
+
+	/* No file is exported until the worker exports the next one. */
+	SpinLockAcquire(&shared->mutex);
+	shared->sfs_valid = false;
+	SpinLockRelease(&shared->mutex);
+
+	BufFileClose(file);
 }
 
 /*
@@ -3045,15 +3179,10 @@ cleanup_logical_decoding(LogicalDecodingContext *ctx)
 
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
-	ExecDropSingleTupleTableSlot(dstate->tsslot);
-	FreeTupleDesc(dstate->tupdesc_change);
 	FreeTupleDesc(dstate->tupdesc);
-	tuplestore_end(dstate->tstore);
-
 	FreeDecodingContext(ctx);
 
-	ReplicationSlotRelease();
-	ReplicationSlotDrop(NameStr(dstate->slotname), false);
+	ReplicationSlotDropAcquired();
 	pfree(dstate);
 }
 
@@ -3068,7 +3197,6 @@ cleanup_logical_decoding(LogicalDecodingContext *ctx)
 static void
 rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 								   Relation cl_index,
-								   LogicalDecodingContext *decoding_ctx,
 								   TransactionId frozenXid,
 								   MultiXactId cutoffMulti)
 {
@@ -3172,7 +3300,7 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	 * hold AccessExclusiveLock. (Quite some amount of WAL could have been
 	 * written during the data copying and index creation.)
 	 */
-	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+	process_concurrent_changes(end_of_wal, &chgdst, false);
 
 	/*
 	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
@@ -3268,8 +3396,11 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	XLogFlush(wal_insert_ptr);
 	end_of_wal = GetFlushRecPtr(NULL);
 
-	/* Apply the concurrent changes again. */
-	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+	/*
+	 * Apply the concurrent changes again. Indicate that the decoding worker
+	 * won't be needed anymore.
+	 */
+	process_concurrent_changes(end_of_wal, &chgdst, true);
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
@@ -3382,3 +3513,514 @@ build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
 
 	return result;
 }
+
+/*
+ * Try to start a background worker to perform logical decoding of data
+ * changes applied to the relation while REPACK CONCURRENTLY is copying its
+ * contents to a new table.
+ */
+static void
+start_decoding_worker(Oid relid)
+{
+	Size		size;
+	dsm_segment *seg;
+	DecodingWorkerShared *shared;
+	shm_mq	   *mq;
+	shm_mq_handle *mqh;
+	BackgroundWorker bgw;
+
+	/* Setup shared memory. */
+	size = BUFFERALIGN(offsetof(DecodingWorkerShared, error_queue)) +
+		BUFFERALIGN(REPACK_ERROR_QUEUE_SIZE);
+	seg = dsm_create(size, 0);
+	shared = (DecodingWorkerShared *) dsm_segment_address(seg);
+	shared->lsn_upto = InvalidXLogRecPtr;
+	shared->done = false;
+	SharedFileSetInit(&shared->sfs, seg);
+	shared->sfs_valid = false;
+	shared->last_exported = -1;
+	SpinLockInit(&shared->mutex);
+	shared->dbid = MyDatabaseId;
+
+	/*
+	 * This is the UserId set in cluster_rel(). Security context shouldn't be
+	 * This is the UserId set in cluster_rel(). A security context shouldn't
+	 * be needed for the decoding worker.
+	shared->roleid = GetUserId();
+	shared->relid = relid;
+	ConditionVariableInit(&shared->cv);
+	shared->backend_proc = MyProc;
+	shared->backend_pid = MyProcPid;
+	shared->backend_proc_number = MyProcNumber;
+
+	mq = shm_mq_create((char *) BUFFERALIGN(shared->error_queue),
+					   REPACK_ERROR_QUEUE_SIZE);
+	shm_mq_set_receiver(mq, MyProc);
+	mqh = shm_mq_attach(mq, seg, NULL);
+
+	memset(&bgw, 0, sizeof(bgw));
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "REPACK decoding worker for relation \"%s\"",
+			 get_rel_name(relid));
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "REPACK decoding worker");
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	snprintf(bgw.bgw_library_name, MAXPGPATH, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "RepackWorkerMain");
+	bgw.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
+	bgw.bgw_notify_pid = MyProcPid;
+
+	decoding_worker = palloc0_object(DecodingWorker);
+	if (!RegisterDynamicBackgroundWorker(&bgw, &decoding_worker->handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase \"%s\".", "max_worker_processes")));
+
+	decoding_worker->seg = seg;
+	decoding_worker->error_mqh = mqh;
+
+	/*
+	 * The decoding setup must be done before the caller can have an XID
+	 * assigned for any reason, otherwise the worker might end up in a
+	 * deadlock, waiting for the caller's transaction to end. Therefore wait
+	 * here until the worker indicates that it has logical decoding
+	 * initialized.
+	 */
+	ConditionVariablePrepareToSleep(&shared->cv);
+	for (;;)
+	{
+		int			initialized;
+
+		SpinLockAcquire(&shared->mutex);
+		initialized = shared->initialized;
+		SpinLockRelease(&shared->mutex);
+
+		if (initialized)
+			break;
+
+		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
+	}
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Stop the decoding worker and clean up the related resources.
+ *
+ * The worker stops on its own when it knows there is no more work to do, but
+ * we need to stop it explicitly at least on ERROR in the launching backend.
+ */
+static void
+stop_decoding_worker(void)
+{
+	BgwHandleStatus status;
+
+	/* Haven't reached the worker startup? */
+	if (decoding_worker == NULL)
+		return;
+
+	/* Could not register the worker? */
+	if (decoding_worker->handle == NULL)
+		return;
+
+	TerminateBackgroundWorker(decoding_worker->handle);
+	/* The worker should really exit before the REPACK command does. */
+	HOLD_INTERRUPTS();
+	status = WaitForBackgroundWorkerShutdown(decoding_worker->handle);
+	RESUME_INTERRUPTS();
+
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errcode(ERRCODE_ADMIN_SHUTDOWN),
+				 errmsg("postmaster exited during REPACK command")));
+
+	shm_mq_detach(decoding_worker->error_mqh);
+
+	/*
+	 * If an ERROR prevented us from canceling the current sleep, do that
+	 * now, before we detach from the shared memory segment that contains the
+	 * condition variable. Otherwise a later attempt to cancel the sleep
+	 * would touch detached memory and fail badly.
+	 */
+	ConditionVariableCancelSleep();
+
+	dsm_detach(decoding_worker->seg);
+	pfree(decoding_worker);
+	decoding_worker = NULL;
+}
+
+/* Is this process a REPACK worker? */
+static bool is_repack_worker = false;
+
+static pid_t backend_pid;
+static ProcNumber backend_proc_number;
+
+/*
+ * See ParallelWorkerShutdown for details.
+ */
+static void
+RepackWorkerShutdown(int code, Datum arg)
+{
+	SendProcSignal(backend_pid,
+				   PROCSIG_REPACK_MESSAGE,
+				   backend_proc_number);
+
+	dsm_detach((dsm_segment *) DatumGetPointer(arg));
+}
+
+/* REPACK decoding worker entry point */
+void
+RepackWorkerMain(Datum main_arg)
+{
+	dsm_segment *seg;
+	DecodingWorkerShared *shared;
+	shm_mq	   *mq;
+	shm_mq_handle *mqh;
+
+	is_repack_worker = true;
+
+	/*
+	 * Override the default bgworker_die() with die() so we can use
+	 * CHECK_FOR_INTERRUPTS().
+	 */
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	seg = dsm_attach(DatumGetUInt32(main_arg));
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	shared = (DecodingWorkerShared *) dsm_segment_address(seg);
+
+	/* Arrange to signal the leader if we exit. */
+	backend_pid = shared->backend_pid;
+	backend_proc_number = shared->backend_proc_number;
+	before_shmem_exit(RepackWorkerShutdown, PointerGetDatum(seg));
+
+	/*
+	 * Join locking group - see the comments around the call of
+	 * start_decoding_worker().
+	 */
+	if (!BecomeLockGroupMember(shared->backend_proc, backend_pid))
+		/* The leader is not running anymore. */
+		return;
+
+	/*
+	 * Setup a queue to send error messages to the backend that launched this
+	 * worker.
+	 */
+	mq = (shm_mq *) (char *) BUFFERALIGN(shared->error_queue);
+	shm_mq_set_sender(mq, MyProc);
+	mqh = shm_mq_attach(mq, seg, NULL);
+	pq_redirect_to_shm_mq(seg, mqh);
+	pq_set_parallel_leader(shared->backend_pid,
+						   shared->backend_proc_number);
+
+	/* Connect to the database. */
+	BackgroundWorkerInitializeConnectionByOid(shared->dbid, shared->roleid, 0);
+
+	repack_worker_internal(seg);
+}
+
+static void
+repack_worker_internal(dsm_segment *seg)
+{
+	DecodingWorkerShared *shared;
+	LogicalDecodingContext *decoding_ctx;
+	SharedFileSet *sfs;
+	Snapshot	snapshot;
+
+	/*
+	 * A transaction is needed to open the relation, and it also provides us
+	 * with a resource owner.
+	 */
+	StartTransactionCommand();
+
+	shared = (DecodingWorkerShared *) dsm_segment_address(seg);
+
+	/*
+	 * Not sure the spinlock is needed here - the backend should not change
+	 * anything in the shared memory until we have serialized the snapshot.
+	 */
+	SpinLockAcquire(&shared->mutex);
+	Assert(XLogRecPtrIsInvalid(shared->lsn_upto));
+	Assert(!shared->sfs_valid);
+	sfs = &shared->sfs;
+	SpinLockRelease(&shared->mutex);
+
+	SharedFileSetAttach(sfs, seg);
+
+	/*
+	 * Prepare to capture the concurrent data changes ourselves.
+	 */
+	decoding_ctx = setup_logical_decoding(shared->relid);
+
+	/* Announce that we're ready. */
+	SpinLockAcquire(&shared->mutex);
+	shared->initialized = true;
+	SpinLockRelease(&shared->mutex);
+	ConditionVariableSignal(&shared->cv);
+
+	/* Build the initial snapshot and export it. */
+	snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
+	export_initial_snapshot(snapshot, shared);
+
+	/*
+	 * Only historic snapshots should be used now. Do not let this backend
+	 * restrict the progress of the xmin horizon.
+	 */
+	InvalidateCatalogSnapshot();
+
+	while (!decode_concurrent_changes(decoding_ctx, shared))
+		;
+
+	/* Cleanup. */
+	cleanup_logical_decoding(decoding_ctx);
+	CommitTransactionCommand();
+}
+
+/*
+ * Make snapshot available to the backend that launched the decoding worker.
+ */
+static void
+export_initial_snapshot(Snapshot snapshot, DecodingWorkerShared *shared)
+{
+	char		fname[MAXPGPATH];
+	BufFile    *file;
+	Size		snap_size;
+	char	   *snap_space;
+
+	snap_size = EstimateSnapshotSpace(snapshot);
+	snap_space = (char *) palloc(snap_size);
+	SerializeSnapshot(snapshot, snap_space);
+	FreeSnapshot(snapshot);
+
+	DecodingWorkerFileName(fname, shared->relid, shared->last_exported + 1);
+	file = BufFileCreateFileSet(&shared->sfs.fs, fname);
+	/* To make restoration easier, write the snapshot size first. */
+	BufFileWrite(file, &snap_size, sizeof(snap_size));
+	BufFileWrite(file, snap_space, snap_size);
+	pfree(snap_space);
+	BufFileClose(file);
+
+	/* Tell the backend that the file is available. */
+	SpinLockAcquire(&shared->mutex);
+	shared->sfs_valid = true;
+	shared->last_exported++;
+	SpinLockRelease(&shared->mutex);
+	ConditionVariableSignal(&shared->cv);
+}
+
+/*
+ * Get the initial snapshot from the decoding worker.
+ */
+static Snapshot
+get_initial_snapshot(DecodingWorker *worker)
+{
+	DecodingWorkerShared *shared;
+	char		fname[MAXPGPATH];
+	BufFile    *file;
+	Size		snap_size;
+	char	   *snap_space;
+	Snapshot	snapshot;
+
+	shared = (DecodingWorkerShared *) dsm_segment_address(worker->seg);
+
+	/*
+	 * The worker needs to initialize the logical decoding, which usually
+	 * takes some time. Therefore it makes sense to prepare for the sleep
+	 * first.
+	 */
+	ConditionVariablePrepareToSleep(&shared->cv);
+	for (;;)
+	{
+		bool		valid;
+
+		SpinLockAcquire(&shared->mutex);
+		valid = shared->sfs_valid;
+		SpinLockRelease(&shared->mutex);
+
+		if (valid)
+			break;
+
+		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
+	}
+	ConditionVariableCancelSleep();
+
+	/* Read the snapshot from a file. */
+	DecodingWorkerFileName(fname, shared->relid, shared->last_exported);
+	file = BufFileOpenFileSet(&shared->sfs.fs, fname, O_RDONLY, false);
+	BufFileReadExact(file, &snap_size, sizeof(snap_size));
+	snap_space = (char *) palloc(snap_size);
+	BufFileReadExact(file, snap_space, snap_size);
+	BufFileClose(file);
+
+	SpinLockAcquire(&shared->mutex);
+	shared->sfs_valid = false;
+	SpinLockRelease(&shared->mutex);
+
+	/* Restore it. */
+	snapshot = RestoreSnapshot(snap_space);
+	pfree(snap_space);
+
+	return snapshot;
+}
+
+bool
+IsRepackWorker(void)
+{
+	return is_repack_worker;
+}
+
+/*
+ * Handle receipt of an interrupt indicating a repack worker message.
+ *
+ * Note: this is called within a signal handler!  All we can do is set
+ * a flag that will cause the next CHECK_FOR_INTERRUPTS() to invoke
+ * ProcessRepackMessages().
+ */
+void
+HandleRepackMessageInterrupt(void)
+{
+	InterruptPending = true;
+	RepackMessagePending = true;
+	SetLatch(MyLatch);
+}
+
+/*
+ * Process any queued protocol messages received from the decoding worker.
+ */
+void
+ProcessRepackMessages(void)
+{
+	MemoryContext oldcontext;
+
+	static MemoryContext hpm_context = NULL;
+
+	/*
+	 * Nothing to do if we haven't launched the worker yet or have already
+	 * terminated it.
+	 */
+	if (decoding_worker == NULL)
+		return;
+
+	/*
+	 * This is invoked from ProcessInterrupts(), and since some of the
+	 * functions it calls contain CHECK_FOR_INTERRUPTS(), there is a potential
+	 * for recursive calls if more signals are received while this runs.  It's
+	 * unclear that recursive entry would be safe, and it doesn't seem useful
+	 * even if it is safe, so let's block interrupts until done.
+	 */
+	HOLD_INTERRUPTS();
+
+	/*
+	 * Moreover, CurrentMemoryContext might be pointing almost anywhere.  We
+	 * don't want to risk leaking data into long-lived contexts, so let's do
+	 * our work here in a private context that we can reset on each use.
+	 */
+	if (hpm_context == NULL)	/* first time through? */
+		hpm_context = AllocSetContextCreate(TopMemoryContext,
+											"ProcessRepackMessages",
+											ALLOCSET_DEFAULT_SIZES);
+	else
+		MemoryContextReset(hpm_context);
+
+	oldcontext = MemoryContextSwitchTo(hpm_context);
+
+	/* OK to process messages.  Reset the flag saying there are more to do. */
+	RepackMessagePending = false;
+
+	/*
+	 * Read as many messages as we can from the decoding worker, but stop
+	 * when no more can be read without blocking.
+	 */
+	while (true)
+	{
+		shm_mq_result res;
+		Size		nbytes;
+		void	   *data;
+
+		res = shm_mq_receive(decoding_worker->error_mqh, &nbytes,
+							 &data, true);
+		if (res == SHM_MQ_WOULD_BLOCK)
+			break;
+		else if (res == SHM_MQ_SUCCESS)
+		{
+			StringInfoData msg;
+
+			initStringInfo(&msg);
+			appendBinaryStringInfo(&msg, data, nbytes);
+			ProcessRepackMessage(&msg);
+			pfree(msg.data);
+		}
+		else
+		{
+			/*
+			 * The decoding worker is special in that it exits as soon as it
+			 * has its work done. Thus the DETACHED result code is fine.
+			 */
+			Assert(res == SHM_MQ_DETACHED);
+
+			break;
+		}
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/* Might as well clear the context on our way out */
+	MemoryContextReset(hpm_context);
+
+	RESUME_INTERRUPTS();
+}
+
+/*
+ * Process a single protocol message received from the decoding worker.
+ */
+static void
+ProcessRepackMessage(StringInfo msg)
+{
+	char		msgtype;
+
+	msgtype = pq_getmsgbyte(msg);
+
+	switch (msgtype)
+	{
+		case PqMsg_ErrorResponse:
+		case PqMsg_NoticeResponse:
+			{
+				ErrorData	edata;
+
+				/* Parse ErrorResponse or NoticeResponse. */
+				pq_parse_errornotice(msg, &edata);
+
+				/* Death of a worker isn't enough justification for suicide. */
+				edata.elevel = Min(edata.elevel, ERROR);
+
+				/*
+				 * If desired, add a context line to show that this is a
+				 * message propagated from a parallel worker.  Otherwise, it
+				 * can sometimes be confusing to understand what actually
+				 * happened.
+				 */
+				if (edata.context)
+					edata.context = psprintf("%s\n%s", edata.context,
+											 _("decoding worker"));
+				else
+					edata.context = pstrdup(_("decoding worker"));
+
+				/* Rethrow error or print notice. */
+				ThrowErrorData(&edata);
+
+				break;
+			}
+
+		default:
+			{
+				elog(ERROR, "unrecognized message type received from decoding worker: %c (message length %d bytes)",
+					 msgtype, msg->len);
+			}
+	}
+}
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 2b75de0ddef..28a5a400fb1 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "commands/cluster.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqmq.h"
@@ -175,6 +176,10 @@ mq_putmessage(char msgtype, const char *s, size_t len)
 				SendProcSignal(pq_mq_parallel_leader_pid,
 							   PROCSIG_PARALLEL_APPLY_MESSAGE,
 							   pq_mq_parallel_leader_proc_number);
+			else if (IsRepackWorker())
+				SendProcSignal(pq_mq_parallel_leader_pid,
+							   PROCSIG_REPACK_MESSAGE,
+							   pq_mq_parallel_leader_proc_number);
 			else
 			{
 				Assert(IsParallelWorker());
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 8e1068969ae..2d1ba94a45f 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,7 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "commands/cluster.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -135,6 +136,9 @@ static const struct
 	},
 	{
 		"SequenceSyncWorkerMain", SequenceSyncWorkerMain
+	},
+	{
+		"RepackWorkerMain", RepackWorkerMain
 	}
 };
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 1b11ed63dc6..b3fd7fec392 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -205,7 +205,11 @@ StartupDecodingContext(List *output_plugin_options,
 
 	ctx->slot = slot;
 
-	ctx->reader = XLogReaderAllocate(wal_segment_size, NULL, xl_routine, ctx);
+	/*
+	 * TODO A separate patch for PG core, unless there's really a reason to
+	 * pass ctx for private_data (might extensions expect ctx?).
+	 */
+	ctx->reader = XLogReaderAllocate(wal_segment_size, NULL, xl_routine, NULL);
 	if (!ctx->reader)
 		ereport(ERROR,
 				(errcode(ERRCODE_OUT_OF_MEMORY),
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
index c8930640a0d..fb9956d392d 100644
--- a/src/backend/replication/pgoutput_repack/pgoutput_repack.c
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -168,17 +168,13 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 			 HeapTuple tuple)
 {
 	RepackDecodingState *dstate;
-	char	   *change_raw;
-	ConcurrentChange change;
+	char		kind_byte = (char) kind;
 	bool		flattened = false;
-	Size		size;
-	Datum		values[1];
-	bool		isnull[1];
-	char	   *dst;
 
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
-	size = VARHDRSZ + SizeOfConcurrentChange;
+	/* Store the change kind. */
+	BufFileWrite(dstate->file, &kind_byte, 1);
 
 	/*
 	 * ReorderBufferCommit() stores the TOAST chunks in its private memory
@@ -195,46 +191,12 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 		tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
 		flattened = true;
 	}
+	/* Store the tuple size ... */
+	BufFileWrite(dstate->file, &tuple->t_len, sizeof(tuple->t_len));
+	/* ... and the tuple itself. */
+	BufFileWrite(dstate->file, tuple->t_data, tuple->t_len);
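+
+	/*
+	 * Note that apply_concurrent_changes() in cluster.c reads the changes
+	 * back and expects exactly this layout: the kind byte, then t_len, then
+	 * t_len bytes of tuple data.
+	 */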
 
-	size += tuple->t_len;
-	if (size >= MaxAllocSize)
-		elog(ERROR, "Change is too big.");
-
-	/* Construct the change. */
-	change_raw = (char *) palloc0(size);
-	SET_VARSIZE(change_raw, size);
-
-	/*
-	 * Since the varlena alignment might not be sufficient for the structure,
-	 * set the fields in a local instance and remember where it should
-	 * eventually be copied.
-	 */
-	change.kind = kind;
-	dst = (char *) VARDATA(change_raw);
-
-	/*
-	 * Copy the tuple.
-	 *
-	 * Note: change->tup_data.t_data must be fixed on retrieval!
-	 */
-	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
-	memcpy(dst, &change, SizeOfConcurrentChange);
-	dst += SizeOfConcurrentChange;
-	memcpy(dst, tuple->t_data, tuple->t_len);
-
-	/* The data has been copied. */
+	/* Free the flat copy if created above. */
 	if (flattened)
 		pfree(tuple);
-
-	/* Store as tuple of 1 bytea column. */
-	values[0] = PointerGetDatum(change_raw);
-	isnull[0] = false;
-	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
-						 values, isnull);
-
-	/* Accounting. */
-	dstate->nchanges++;
-
-	/* Cleanup. */
-	pfree(change_raw);
 }
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..af12144795b 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -19,6 +19,7 @@
 
 #include "access/parallel.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bitutils.h"
@@ -694,6 +695,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
 		HandleParallelApplyMessageInterrupt();
 
+	if (CheckProcSignal(PROCSIG_REPACK_MESSAGE))
+		HandleRepackMessageInterrupt();
+
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
 		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7dd75a490aa..4a4858882f0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,7 @@
 #include "access/xact.h"
 #include "catalog/pg_type.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "commands/event_trigger.h"
 #include "commands/explain_state.h"
 #include "commands/prepare.h"
@@ -3541,6 +3542,9 @@ ProcessInterrupts(void)
 
 	if (ParallelApplyMessagePending)
 		ProcessParallelApplyMessages();
+
+	if (RepackMessagePending)
+		ProcessRepackMessages();
 }
 
 /*
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index f39830dbb34..cbcc8550960 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,6 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
+REPACK_WORKER_MAIN	"Waiting in main loop of REPACK decoding worker process."
 REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
@@ -153,6 +154,7 @@ RECOVERY_CONFLICT_SNAPSHOT	"Waiting for recovery conflict resolution for a vacuu
 RECOVERY_CONFLICT_TABLESPACE	"Waiting for recovery conflict resolution for dropping a tablespace."
 RECOVERY_END_COMMAND	"Waiting for <xref linkend="guc-recovery-end-command"/> to complete."
 RECOVERY_PAUSE	"Waiting for recovery to be resumed."
+REPACK_WORKER_EXPORT	"Waiting for decoding worker to export a new output file."
 REPLICATION_ORIGIN_DROP	"Waiting for a replication origin to become inactive so it can be dropped."
 REPLICATION_SLOT_DROP	"Waiting for a replication slot to become inactive so it can be dropped."
 RESTORE_COMMAND	"Waiting for <xref linkend="guc-restore-command"/> to complete."
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index d8f76d325f9..321c00682ec 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,7 +22,6 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
-#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -631,7 +630,6 @@ typedef struct TableAmRoutine
 											  bool use_sort,
 											  TransactionId OldestXmin,
 											  Snapshot snapshot,
-											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1651,8 +1649,6 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  * - *multi_cutoff - ditto
  * - snapshot - if != NULL, ignore data changes done by transactions that this
  *	 (MVCC) snapshot considers still in-progress or in the future.
- * - decoding_ctx - logical decoding context, to capture concurrent data
- *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1666,7 +1662,6 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								bool use_sort,
 								TransactionId OldestXmin,
 								Snapshot snapshot,
-								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1675,7 +1670,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
-													snapshot, decoding_ctx,
+													snapshot,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index b43a1740053..0ac70ec30d7 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -17,6 +17,7 @@
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
 #include "replication/logical.h"
+#include "storage/buffile.h"
 #include "storage/lock.h"
 #include "storage/relfilelocator.h"
 #include "utils/relcache.h"
@@ -47,6 +48,9 @@ typedef struct ClusterParams
 extern RelFileLocator repacked_rel_locator;
 extern RelFileLocator repacked_rel_toast_locator;
 
+/*
+ * Stored as a single byte in the output file.
+ */
 typedef enum
 {
 	CHANGE_INSERT,
@@ -55,68 +59,30 @@ typedef enum
 	CHANGE_DELETE
 } ConcurrentChangeKind;
 
-typedef struct ConcurrentChange
-{
-	/* See the enum above. */
-	ConcurrentChangeKind kind;
-
-	/*
-	 * The actual tuple.
-	 *
-	 * The tuple data follows the ConcurrentChange structure. Before use make
-	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
-	 * bytea) and that tuple->t_data is fixed.
-	 */
-	HeapTupleData tup_data;
-} ConcurrentChange;
-
-#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
-								sizeof(HeapTupleData))
-
 /*
  * Logical decoding state.
  *
- * Here we store the data changes that we decode from WAL while the table
- * contents is being copied to a new storage. Also the necessary metadata
- * needed to apply these changes to the table is stored here.
+ * The output plugin uses it to store the data changes that it decodes from
+ * WAL while the table contents are being copied to new storage.
  */
 typedef struct RepackDecodingState
 {
 	/* The relation whose changes we're decoding. */
 	Oid			relid;
 
-	/* Replication slot name. */
-	NameData	slotname;
-
-	/*
-	 * Decoded changes are stored here. Although we try to avoid excessive
-	 * batches, it can happen that the changes need to be stored to disk. The
-	 * tuplestore does this transparently.
-	 */
-	Tuplestorestate *tstore;
-
-	/* The current number of changes in tstore. */
-	double		nchanges;
-
-	/*
-	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
-	 * We can't store the tuple directly because tuplestore only supports
-	 * minimum tuple and we may need to transfer OID system column from the
-	 * output plugin. Also we need to transfer the change kind, so it's better
-	 * to put everything in the structure than to use 2 tuplestores "in
-	 * parallel".
-	 */
-	TupleDesc	tupdesc_change;
-
-	/* Tuple descriptor needed to update indexes. */
+	/* Tuple descriptor of the relation being processed. */
 	TupleDesc	tupdesc;
 
-	/* Slot to retrieve data from tstore. */
-	TupleTableSlot *tsslot;
-
-	ResourceOwner resowner;
+	/* The current output file. */
+	BufFile    *file;
 } RepackDecodingState;
 
+extern PGDLLIMPORT volatile sig_atomic_t RepackMessagePending;
+
+extern bool IsRepackWorker(void);
+extern void HandleRepackMessageInterrupt(void);
+extern void ProcessRepackMessages(void);
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
@@ -125,9 +91,6 @@ extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
-extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
-											 XLogRecPtr end_of_wal);
-
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -140,4 +103,5 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
 
+extern void RepackWorkerMain(Datum main_arg);
 #endif							/* CLUSTER_H */
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..c0a66516b66 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -36,6 +36,7 @@ typedef enum
 	PROCSIG_BARRIER,			/* global barrier interrupt  */
 	PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
 	PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
+	PROCSIG_REPACK_MESSAGE,		/* Message from repack worker */
 
 	/* Recovery conflict reasons */
 	PROCSIG_RECOVERY_CONFLICT_FIRST,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3139b14e85f..35344910f65 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -488,7 +488,6 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
-ConcurrentChange
 ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
@@ -629,6 +628,9 @@ DeclareCursorStmt
 DecodedBkpBlock
 DecodedXLogRecord
 DecodingOutputState
+DecodingWorker
+DecodingWorkerShared
+DecodingWorkerState
 DefElem
 DefElemAction
 DefaultACLInfo
-- 
2.47.3

Attachment: v28-0006-Use-multiple-snapshots-to-copy-the-data.patch (text/plain)
From 32accb96480ddf42847ae23f30f23636d64eeb50 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Sat, 13 Dec 2025 19:27:18 +0100
Subject: [PATCH 6/6] Use multiple snapshots to copy the data.

REPACK (CONCURRENTLY) does not prevent applications from using the table that
is being processed; however, it can prevent the xmin horizon from advancing
and thus restrict VACUUM for the whole database. This patch adds the ability
to use a particular snapshot only for a certain range of pages. Each time that
number of pages has been processed, a new snapshot is built, which should have
a higher xmin than the previous one.

The data copying works as follows:

  1. Have the logical decoding system build a snapshot S0 for range R0 at
     LSN0. This snapshot sees all the data changes whose commit records have
     LSN < LSN0.

  2. Copy the pages in that range to the new relation. The changes not visible
     to the snapshot (because their transactions are still running) will
     appear in the output of the logical decoding system as soon as their
     commit records appear in WAL.

  3. Perform logical decoding of all changes we find in WAL for the table
     we're repacking, put them aside and remember that out of these we can
     only apply those that affect the range R0 in the old
     relation. (Naturally, we cannot apply ones that belong to other pages
     because it's impossible to UPDATE / DELETE a row in the new relation if
     it hasn't been copied yet.) Once the decoding is done, consider LSN1 to
     be the position of the end of the last WAL record decoded.

  4. Build a new snapshot S1 at position LSN1, i.e. one that sees all the data
     whose commit records are at WAL positions < LSN1. Use this snapshot to
     copy the range of pages R1.

  5. Perform logical decoding like in step 3, but remember that, out of this
     next set, only changes belonging to ranges R0 *and* R1 in the old table
     can be applied.

  6. And so on, until the whole table has been copied (see the sketch below).
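
In pseudo-C, the per-range copy loop looks roughly like this. This is only a
sketch of the algorithm described above; the helper names
(build_range_snapshot(), copy_page_range(), decode_changes_upto()) are
illustrative, not the functions the patch actually uses:

    BlockNumber blkno = 0;

    while (blkno < nblocks)
    {
        /* Steps 1/4: snapshot S_i, built at the current end of WAL (LSN_i). */
        Snapshot    snap = build_range_snapshot(decoding_ctx);

        /* Step 2: copy the next range of pages using that snapshot. */
        copy_page_range(OldHeap, NewHeap, blkno,
                        Min(blkno + repack_snapshot_after, nblocks), snap);
        blkno += repack_snapshot_after;

        /*
         * Steps 3/5: decode the WAL produced so far and put the changes
         * aside; of these, only the changes affecting ranges that have
         * already been copied may be applied later.
         */
        decode_changes_upto(decoding_ctx, GetFlushRecPtr(NULL));
    }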

Note that the changes decoded above should not be applied to the new relation
until the whole relation has been copied. The point is that we need an
"identity index" to apply UPDATE and DELETE statements, and bulk creation of
indexes on the already-copied heap is probably better than retail insertions
during the copying.

Special attention needs to be paid to UPDATEs that span page ranges. For
example, if the old tuple is in range R0, but the new tuple is in R1, and R1
hasn't been copied yet, we only DELETE the old version from the new
relation. The new version will be handled during processing of range R1. The
snapshot S1 will be based on a WAL position following that UPDATE, so it'll
see the new tuple if its transaction's commit record is at a WAL position
lower than the position where we built the snapshot. On the other hand, if
the commit record appears at a higher position than that of the snapshot, the
corresponding INSERT will be decoded and replayed sometime later: once the
scan of R1 has started, changes of tuples belonging to it are no longer
filtered out.

Likewise, if the old tuple is in range R1 (not yet copied) but the new tuple
is in R0, we only perform INSERT on the new relation. The deletion of the old
version will either be visible to the snapshot S1 (i.e. the snapshot won't see
the old version), or replayed later.
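
Under the rules above, the decision for a decoded cross-range UPDATE can be
summarized by a sketch like this (range_already_copied() is a hypothetical
helper, not the patch's code):

    /*
     * Apply only the halves of the UPDATE that fall into ranges that have
     * already been copied; the rest is covered by a later snapshot or by
     * later decoding.
     */
    if (range_already_copied(old_blkno))
        apply_delete(new_rel, old_tuple);
    if (range_already_copied(new_blkno))
        apply_insert(new_rel, new_tuple);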

This approach introduces one limitation though: if the USING INDEX clause is
specified, an explicit sort is always used. An index scan wouldn't work
because it does not return the tuples sorted by CTID, so we wouldn't be able
to split the copying into ranges of pages. I'm not sure it's serious. If REPACK
runs concurrently and does not restrict VACUUM, the execution time should not
be critical.

A new GUC repack_snapshot_after can be used to set the number of pages per
snapshot. It's currently classified as DEVELOPER_OPTIONS and may be replaced
by a constant after enough evaluation is done.
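
For example, to build a fresh snapshot every 1024 heap pages (the value is
purely illustrative, and the CONCURRENTLY syntax is the parenthesized-option
form proposed earlier in this thread):

    SET repack_snapshot_after = 1024;
    REPACK (CONCURRENTLY) mytable;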
---
 src/backend/access/heap/heapam_handler.c      | 144 ++++-
 src/backend/commands/cluster.c                | 561 +++++++++++++-----
 src/backend/replication/logical/decode.c      |  47 +-
 src/backend/replication/logical/logical.c     |  30 +-
 .../replication/logical/reorderbuffer.c       |  50 ++
 src/backend/replication/logical/snapbuild.c   |  27 +-
 .../pgoutput_repack/pgoutput_repack.c         |   2 +
 src/backend/utils/misc/guc_parameters.dat     |  10 +
 src/backend/utils/misc/guc_tables.c           |   1 +
 src/include/access/tableam.h                  |  14 +-
 src/include/commands/cluster.h                |  65 ++
 src/include/replication/logical.h             |   2 +-
 src/include/replication/reorderbuffer.h       |   1 +
 src/include/replication/snapbuild.h           |   2 +-
 src/tools/pgindent/typedefs.list              |   3 +-
 15 files changed, 771 insertions(+), 188 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb09e6fd1dc..b87fad605e4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -686,12 +687,12 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
-								 Snapshot snapshot,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
-								 double *tups_recently_dead)
+								 double *tups_recently_dead,
+								 void *tableam_data)
 {
 	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
@@ -707,7 +708,10 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
-	bool		concurrent = snapshot != NULL;
+	ConcurrentChangeContext *ctx = (ConcurrentChangeContext *) tableam_data;
+	bool		concurrent = ctx != NULL;
+	Snapshot	snapshot = NULL;
+	BlockNumber range_end = InvalidBlockNumber;
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -744,8 +748,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
 	 *
-	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
-	 * the snapshot passed by the caller.
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests. The
+	 * snapshot changes several times during the scan so that we do not block
+	 * the progress of the xmin horizon for VACUUM too much.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -773,10 +778,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap,
-									snapshot ? snapshot : SnapshotAny,
-									0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
+
+		/*
+		 * In CONCURRENTLY mode we scan the table by ranges of blocks and the
+		 * algorithm below expects forward direction. (No other direction
+		 * should be set here anyway, regardless of CONCURRENTLY.)
+		 */
+		Assert(heapScan->rs_dir == ForwardScanDirection || !concurrent);
 		indexScan = NULL;
 
 		/* Set total heap blocks */
@@ -787,6 +797,24 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	slot = table_slot_create(OldHeap, NULL);
 	hslot = (BufferHeapTupleTableSlot *) slot;
 
+	if (concurrent)
+	{
+		/*
+		 * Do not block the progress of xmin horizons.
+		 *
+		 * TODO Analyze thoroughly if this might have bad consequences.
+		 */
+		PopActiveSnapshot();
+		InvalidateCatalogSnapshot();
+
+		/*
+		 * Wait until the worker has the initial snapshot and retrieve it.
+		 */
+		snapshot = repack_get_snapshot(ctx);
+
+		PushActiveSnapshot(snapshot);
+	}
+
 	/*
 	 * Scan through the OldHeap, either in OldIndex order or sequentially;
 	 * copy each tuple into the NewHeap, or transiently to the tuplesort
@@ -803,6 +831,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		if (indexScan != NULL)
 		{
+			/*
+			 * Index scan should not be used in the CONCURRENTLY case because
+			 * it returns tuples in random order, so we could not split the
+			 * scan into a series of page ranges.
+			 */
+			Assert(!concurrent);
+
 			if (!index_getnext_slot(indexScan, ForwardScanDirection, slot))
 				break;
 
@@ -824,6 +859,18 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 */
 				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
+
+				if (concurrent)
+				{
+					PopActiveSnapshot();
+
+					/*
+					 * For the last range, there are no restrictions on block
+					 * numbers, so the concurrent data changes pertaining to
+					 * this range can be decoded (and applied) anytime after
+					 * this loop.
+					 */
+				}
 				break;
 			}
 
@@ -922,6 +969,75 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				continue;
 			}
 		}
+		else
+		{
+			BlockNumber blkno;
+			bool		visible;
+
+			/*
+			 * With CONCURRENTLY, we use each snapshot only for a certain range
+			 * of pages, so that VACUUM does not get blocked for too long. So
+			 * first check whether the tuple falls into the current range.
+			 */
+			blkno = BufferGetBlockNumber(buf);
+
+			/* The first block of the scan? */
+			if (!BlockNumberIsValid(ctx->first_block))
+			{
+				Assert(!BlockNumberIsValid(range_end));
+
+				ctx->first_block = blkno;
+				range_end = repack_blocks_per_snapshot;
+			}
+			else
+			{
+				Assert(BlockNumberIsValid(range_end));
+
+				/* End of the current range? */
+				if (blkno >= range_end)
+				{
+					XLogRecPtr	end_of_wal;
+
+					PopActiveSnapshot();
+
+					/*
+					 * XXX It might be worth Assert(CatalogSnapshot == NULL)
+					 * here, however that symbol is not external.
+					 */
+
+					/*
+					 * Decode all the concurrent data changes committed so far
+					 * - these will be applicable to the current range.
+					 */
+					end_of_wal = GetFlushRecPtr(NULL);
+					repack_get_concurrent_changes(ctx, end_of_wal, range_end,
+												  true, false);
+
+					/*
+					 * Define the next range.
+					 */
+					range_end = blkno + repack_blocks_per_snapshot;
+
+					/*
+					 * Get the snapshot for the next range - it should have
+					 * been built at the position right after the last change
+					 * decoded. Data present in the next range of blocks will
+					 * either be visible to the snapshot or appear in the next
+					 * batch of decoded changes.
+					 */
+					snapshot = repack_get_snapshot(ctx);
+					PushActiveSnapshot(snapshot);
+				}
+			}
+
+			/* Finally check the tuple visibility. */
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
+			visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buf);
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+			if (!visible)
+				continue;
+		}
 
 		*num_tuples += 1;
 		if (tuplesort != NULL)
@@ -956,6 +1072,18 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		}
 	}
 
+	if (concurrent)
+	{
+		XLogRecPtr	end_of_wal;
+
+		/* Decode the changes belonging to the last range. */
+		end_of_wal = GetFlushRecPtr(NULL);
+		repack_get_concurrent_changes(ctx, end_of_wal, InvalidBlockNumber,
+									  false, false);
+
+		PushActiveSnapshot(GetTransactionSnapshot());
+	}
+
 	if (indexScan != NULL)
 		index_endscan(indexScan);
 	if (tableScan != NULL)
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b0383c1375f..3eb642b996d 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -110,38 +110,27 @@ typedef struct
 RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
 RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
 
-/*
- * Everything we need to call ExecInsertIndexTuples().
- */
-typedef struct IndexInsertState
-{
-	ResultRelInfo *rri;
-	EState	   *estate;
-} IndexInsertState;
-
 /* The WAL segment being decoded. */
 static XLogSegNo repack_current_segment = 0;
 
 /*
- * Information needed to apply concurrent data changes.
+ * When REPACK (CONCURRENTLY) copies data to the new heap, a new snapshot is
+ * built after processing this many pages.
  */
-typedef struct ChangeDest
-{
-	/* The relation the changes are applied to. */
-	Relation	rel;
+int			repack_blocks_per_snapshot = 1024;
 
-	/*
-	 * The following is needed to find the existing tuple if the change is
-	 * UPDATE or DELETE. 'ident_key' should have all the fields except for
-	 * 'sk_argument' initialized.
-	 */
-	Relation	ident_index;
-	ScanKey		ident_key;
-	int			ident_key_nentries;
+/*
+ * Remember to which pages the changes recorded in a given file should be
+ * applied.
+ */
+typedef struct RepackApplyRange
+{
+	/* The first block of the next range. */
+	BlockNumber end;
 
-	/* Needed to update indexes of rel_dst. */
-	IndexInsertState *iistate;
-} ChangeDest;
+	/* File containing the changes to be applied to blocks in this range. */
+	char	   *fname;
+} RepackApplyRange;
 
 /*
  * Layout of shared memory used for communication between backend and the
@@ -152,6 +141,9 @@ typedef struct DecodingWorkerShared
 	/* Is the decoding initialized? */
 	bool		initialized;
 
+	/* Set to request a snapshot. */
+	bool		snapshot_requested;
+
 	/*
 	 * Once the worker has reached this LSN, it should close the current
 	 * output file and either create a new one or exit, according to the field
@@ -159,20 +151,25 @@ typedef struct DecodingWorkerShared
 	 * the WAL available and keep checking this field. It is ok if the worker
 	 * had already decoded records whose LSN is >= lsn_upto before this field
 	 * has been set.
+	 *
+	 * Set a valid LSN to request data changes.
 	 */
 	XLogRecPtr	lsn_upto;
 
+#define	WORKER_RESPONSE_SNAPSHOT	0x1
+#define	WORKER_RESPONSE_CHANGES		0x2
+	/* Which kind of data is ready? */
+	int			response;
+
 	/* Exit after closing the current file? */
 	bool		done;
 
 	/* The output is stored here. */
 	SharedFileSet sfs;
 
-	/* Can backend read the file contents? */
-	bool		sfs_valid;
-
 	/* Number of the last file exported by the worker. */
-	int			last_exported;
+	int			last_exported_changes;
+	int			last_exported_snapshot;
 
 	/* Synchronize access to the fields above. */
 	slock_t		mutex;
@@ -214,26 +211,14 @@ typedef struct DecodingWorkerShared
  * the fileset name.)
  */
 static inline void
-DecodingWorkerFileName(char *fname, Oid relid, uint32 seq)
+DecodingWorkerFileName(char *fname, Oid relid, uint32 seq, bool snapshot)
 {
-	snprintf(fname, MAXPGPATH, "%u-%u", relid, seq);
+	if (!snapshot)
+		snprintf(fname, MAXPGPATH, "%u-%u", relid, seq);
+	else
+		snprintf(fname, MAXPGPATH, "%u-%u-snapshot", relid, seq);
 }
 
-/*
- * Backend-local information to control the decoding worker.
- */
-typedef struct DecodingWorker
-{
-	/* The worker. */
-	BackgroundWorkerHandle *handle;
-
-	/* DecodingWorkerShared is in this segment. */
-	dsm_segment *seg;
-
-	/* Handle of the error queue. */
-	shm_mq_handle *error_mqh;
-} DecodingWorker;
-
 /* Pointer to currently running decoding worker. */
 static DecodingWorker *decoding_worker = NULL;
 
@@ -250,11 +235,11 @@ static void check_repack_concurrently_requirements(Relation rel);
 static void rebuild_relation(Relation OldHeap, Relation index, bool verbose,
 							 bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							Snapshot snapshot,
 							bool verbose,
 							bool *pSwapToastByContent,
 							TransactionId *pFreezeXid,
-							MultiXactId *pCutoffMulti);
+							MultiXactId *pCutoffMulti,
+							ConcurrentChangeContext *ctx);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -264,9 +249,12 @@ static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
 
 static LogicalDecodingContext *setup_logical_decoding(Oid relid);
-static bool decode_concurrent_changes(LogicalDecodingContext *ctx,
+static bool decode_concurrent_changes(LogicalDecodingContext *decoding_ctx,
 									  DecodingWorkerShared *shared);
-static void apply_concurrent_changes(BufFile *file, ChangeDest *dest);
+static void apply_concurrent_changes(ConcurrentChangeContext *ctx);
+static void apply_concurrent_changes_file(ConcurrentChangeContext *ctx,
+										  BufFile *file,
+										  BlockNumber range_end);
 static void apply_concurrent_insert(Relation rel, HeapTuple tup,
 									IndexInsertState *iistate,
 									TupleTableSlot *index_slot);
@@ -275,12 +263,14 @@ static void apply_concurrent_update(Relation rel, HeapTuple tup,
 									IndexInsertState *iistate,
 									TupleTableSlot *index_slot);
 static void apply_concurrent_delete(Relation rel, HeapTuple tup_target);
-static HeapTuple find_target_tuple(Relation rel, ChangeDest *dest,
+static bool is_tuple_in_block_range(HeapTuple tup, BlockNumber start,
+									BlockNumber end);
+static HeapTuple find_target_tuple(Relation rel,
+								   ConcurrentChangeContext *ctx,
 								   HeapTuple tup_key,
 								   TupleTableSlot *ident_slot);
-static void process_concurrent_changes(XLogRecPtr end_of_wal,
-									   ChangeDest *dest,
-									   bool done);
+static void repack_add_block_range(ConcurrentChangeContext *ctx,
+								   BlockNumber end, char *fname);
 static IndexInsertState *get_index_insert_state(Relation relation,
 												Oid ident_index_id,
 												Relation *ident_index_p);
@@ -291,7 +281,8 @@ static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
 static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 											   Relation cl_index,
 											   TransactionId frozenXid,
-											   MultiXactId cutoffMulti);
+											   MultiXactId cutoffMulti,
+											   ConcurrentChangeContext *ctx);
 static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
 										LOCKMODE lockmode,
@@ -302,9 +293,8 @@ static Oid	determine_clustered_index(Relation rel, bool usingindex,
 static void start_decoding_worker(Oid relid);
 static void stop_decoding_worker(void);
 static void repack_worker_internal(dsm_segment *seg);
-static void export_initial_snapshot(Snapshot snapshot,
-									DecodingWorkerShared *shared);
-static Snapshot get_initial_snapshot(DecodingWorker *worker);
+static void export_snapshot(Snapshot snapshot,
+							DecodingWorkerShared *shared);
 static void ProcessRepackMessage(StringInfo msg);
 static const char *RepackCommandAsString(RepackCommand cmd);
 
@@ -1008,6 +998,15 @@ check_repack_concurrently_requirements(Relation rel)
 						RelationGetRelationName(rel)),
 				 (errhint("Relation \"%s\" has no identity index.",
 						  RelationGetRelationName(rel)))));
+
+	/*
+	 * In the CONCURRENTLY mode we don't want to use the same snapshot
+	 * throughout the whole processing, as it could block the progress of the
+	 * xmin horizon.
+	 */
+	if (IsolationUsesXactSnapshot())
+		ereport(ERROR,
+				(errmsg("REPACK (CONCURRENTLY) does not support transaction isolation higher than READ COMMITTED")));
 }
 
 
@@ -1038,7 +1037,7 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
-	Snapshot	snapshot = NULL;
+	ConcurrentChangeContext *ctx = NULL;
 #if USE_ASSERT_CHECKING
 	LOCKMODE	lmode;
 
@@ -1050,6 +1049,13 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 
 	if (concurrent)
 	{
+		/*
+		 * This is only needed here to gather the data changes and range
+		 * information during the copying. The fields needed to apply the
+		 * changes will be filled in later.
+		 */
+		ctx = palloc0_object(ConcurrentChangeContext);
+
 		/*
 		 * The worker needs to be member of the locking group we're the leader
 		 * of. We ought to become the leader before the worker starts. The
@@ -1075,13 +1081,7 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 		 * REPACK CONCURRENTLY.
 		 */
 		start_decoding_worker(tableOid);
-
-		/*
-		 * Wait until the worker has the initial snapshot and retrieve it.
-		 */
-		snapshot = get_initial_snapshot(decoding_worker);
-
-		PushActiveSnapshot(snapshot);
+		ctx->worker = decoding_worker;
 	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
@@ -1105,21 +1105,25 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, snapshot, verbose,
-					&swap_toast_by_content, &frozenXid, &cutoffMulti);
-
-	/* The historic snapshot won't be needed anymore. */
-	if (snapshot)
+	if (concurrent)
 	{
-		PopActiveSnapshot();
-		UpdateActiveSnapshotCommandId();
+		ctx->first_block = InvalidBlockNumber;
+		ctx->block_ranges = NIL;
 	}
+	copy_table_data(NewHeap, OldHeap, index, verbose, &swap_toast_by_content,
+					&frozenXid, &cutoffMulti, ctx);
 
 	if (concurrent)
 	{
+		/*
+		 * Make sure the active snapshot can see the data copied, so the rows
+		 * can be updated / deleted.
+		 */
+		UpdateActiveSnapshotCommandId();
+
 		Assert(!swap_toast_by_content);
 		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
-										   frozenXid, cutoffMulti);
+										   frozenXid, cutoffMulti, ctx);
 
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
@@ -1283,9 +1287,6 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
- * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Pass
- * iff concurrent processing is required.
- *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
@@ -1293,8 +1294,9 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
  */
 static void
 copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-				Snapshot snapshot, bool verbose, bool *pSwapToastByContent,
-				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti,
+				ConcurrentChangeContext *ctx)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -1311,7 +1313,7 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 	int			elevel = verbose ? INFO : DEBUG2;
 	PGRUsage	ru0;
 	char	   *nspname;
-	bool		concurrent = snapshot != NULL;
+	bool		concurrent = ctx != NULL;
 	LOCKMODE	lmode;
 
 	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
@@ -1423,8 +1425,18 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 	 * provided, else plain seqscan.
 	 */
 	if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
-		use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
-										 RelationGetRelid(OldIndex));
+	{
+		if (!concurrent)
+			use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
+											 RelationGetRelid(OldIndex));
+		else
+
+			/*
+			 * To use multiple snapshots, we need to process the table
+			 * sequentially.
+			 */
+			use_sort = true;
+	}
 	else
 		use_sort = false;
 
@@ -1453,11 +1465,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, snapshot,
+									cutoffs.OldestXmin,
 									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
-									&tups_recently_dead);
+									&tups_recently_dead, ctx);
 
 	/* return selected values to caller, get set as relfrozenxid/minmxid */
 	*pFreezeXid = cutoffs.FreezeLimit;
@@ -2610,6 +2622,7 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 						  DecodingWorkerShared *shared)
 {
 	RepackDecodingState *dstate;
+	bool		snapshot_requested;
 	XLogRecPtr	lsn_upto;
 	bool		done;
 	char		fname[MAXPGPATH];
@@ -2617,11 +2630,14 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
 	/* Open the output file. */
-	DecodingWorkerFileName(fname, shared->relid, shared->last_exported + 1);
+	DecodingWorkerFileName(fname, shared->relid,
+						   shared->last_exported_changes + 1,
+						   false);
 	dstate->file = BufFileCreateFileSet(&shared->sfs.fs, fname);
 
 	SpinLockAcquire(&shared->mutex);
 	lsn_upto = shared->lsn_upto;
+	snapshot_requested = shared->snapshot_requested;
 	done = shared->done;
 	SpinLockRelease(&shared->mutex);
 
@@ -2689,6 +2705,7 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 		{
 			SpinLockAcquire(&shared->mutex);
 			lsn_upto = shared->lsn_upto;
+			snapshot_requested = shared->snapshot_requested;
 			/* 'done' should be set at the same time as 'lsn_upto' */
 			done = shared->done;
 			SpinLockRelease(&shared->mutex);
@@ -2710,28 +2727,108 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 	 */
 	BufFileClose(dstate->file);
 	dstate->file = NULL;
+
+	/*
+	 * Before publishing the data changes, export the snapshot too if
+	 * requested. Publishing both at once makes sense because both are needed
+	 * at the same time, and it's simpler.
+	 */
+	if (snapshot_requested)
+	{
+		Snapshot	snapshot;
+
+		snapshot = SnapBuildSnapshotForRepack(ctx->snapshot_builder);
+		export_snapshot(snapshot, shared);
+
+		/*
+		 * Adjust the replication slot's xmin so that VACUUM can do more work.
+		 */
+		LogicalIncreaseXminForSlot(InvalidXLogRecPtr, snapshot->xmin, false);
+		FreeSnapshot(snapshot);
+	}
+	else
+	{
+		/*
+		 * If data changes were requested without a following snapshot, we
+		 * don't care about the xmin horizon because the heap copying should
+		 * be done by now.
+		 */
+		LogicalIncreaseXminForSlot(InvalidXLogRecPtr, InvalidTransactionId,
+								   false);
+
+	}
+
+	/* Now announce that the output is available. */
 	SpinLockAcquire(&shared->mutex);
 	shared->lsn_upto = InvalidXLogRecPtr;
-	shared->sfs_valid = true;
-	shared->last_exported++;
+	shared->response |= WORKER_RESPONSE_CHANGES;
+	shared->last_exported_changes++;
+	if (snapshot_requested)
+	{
+		shared->snapshot_requested = false;
+		shared->response |= WORKER_RESPONSE_SNAPSHOT;
+		shared->last_exported_snapshot++;
+	}
 	SpinLockRelease(&shared->mutex);
+
 	ConditionVariableSignal(&shared->cv);
 
 	return done;
 }
 
 /*
- * Apply changes stored in 'file'.
+ * Apply all concurrent changes.
  */
 static void
-apply_concurrent_changes(BufFile *file, ChangeDest *dest)
+apply_concurrent_changes(ConcurrentChangeContext *ctx)
+{
+	DecodingWorkerShared *shared;
+	ListCell   *lc;
+
+	shared = (DecodingWorkerShared *) dsm_segment_address(decoding_worker->seg);
+
+	foreach(lc, ctx->block_ranges)
+	{
+		RepackApplyRange *range;
+		BufFile    *file;
+
+		range = (RepackApplyRange *) lfirst(lc);
+
+		file = BufFileOpenFileSet(&shared->sfs.fs, range->fname, O_RDONLY,
+								  false);
+
+		/*
+		 * If range end is valid, the start should be as well.
+		 */
+		Assert(!BlockNumberIsValid(range->end) ||
+			   BlockNumberIsValid(ctx->first_block));
+
+		apply_concurrent_changes_file(ctx, file, range->end);
+		BufFileClose(file);
+
+		pfree(range->fname);
+		pfree(range);
+	}
+
+	/* Get ready for the next decoding. */
+	ctx->block_ranges = NIL;
+	ctx->first_block = InvalidBlockNumber;
+}
+
+/*
+ * Apply concurrent changes stored in 'file'.
+ */
+static void
+apply_concurrent_changes_file(ConcurrentChangeContext *ctx, BufFile *file,
+							  BlockNumber range_end)
 {
 	char		kind;
 	uint32		t_len;
-	Relation	rel = dest->rel;
+	Relation	rel = ctx->rel;
 	TupleTableSlot *index_slot,
 			   *ident_slot;
 	HeapTuple	tup_old = NULL;
+	bool		check_range = BlockNumberIsValid(range_end);
 
 	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
 	index_slot = MakeSingleTupleTableSlot(RelationGetDescr(rel),
@@ -2759,8 +2856,8 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 		tup->t_data = (HeapTupleHeader) ((char *) tup + HEAPTUPLESIZE);
 		BufFileReadExact(file, tup->t_data, t_len);
 		tup->t_len = t_len;
-		ItemPointerSetInvalid(&tup->t_self);
-		tup->t_tableOid = RelationGetRelid(dest->rel);
+		tup->t_tableOid = RelationGetRelid(ctx->rel);
+		BufFileReadExact(file, &tup->t_self, sizeof(tup->t_self));
 
 		if (kind == CHANGE_UPDATE_OLD)
 		{
@@ -2771,7 +2868,10 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 		{
 			Assert(tup_old == NULL);
 
-			apply_concurrent_insert(rel, tup, dest->iistate, index_slot);
+			if (!check_range ||
+				is_tuple_in_block_range(tup, ctx->first_block, range_end))
+				apply_concurrent_insert(rel, tup, ctx->iistate,
+										index_slot);
 
 			pfree(tup);
 		}
@@ -2792,16 +2892,52 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 			/*
 			 * Find the tuple to be updated or deleted.
 			 */
-			tup_exist = find_target_tuple(rel, dest, tup_key, ident_slot);
-			if (tup_exist == NULL)
-				elog(ERROR, "failed to find target tuple");
+			if (!check_range ||
+				(is_tuple_in_block_range(tup_key, ctx->first_block,
+										 range_end)))
+			{
+				/* The change needs to be applied to this tuple. */
+				tup_exist = find_target_tuple(rel, ctx, tup_key, ident_slot);
+				if (tup_exist == NULL)
+					elog(ERROR, "failed to find target tuple");
 
-			if (kind == CHANGE_UPDATE_NEW)
-				apply_concurrent_update(rel, tup, tup_exist, dest->iistate,
-										index_slot);
+				if (kind == CHANGE_DELETE)
+					apply_concurrent_delete(rel, tup_exist);
+				else
+				{
+					/* UPDATE */
+					if (!check_range || tup == tup_key ||
+						is_tuple_in_block_range(tup, ctx->first_block,
+												range_end))
+						/* The new tuple is in the same range. */
+						apply_concurrent_update(rel, tup, tup_exist,
+												ctx->iistate, index_slot);
+					else
+
+						/*
+						 * The new tuple is in the other range, so only
+						 * delete the old version from the current one. The
+						 * new version should be visible to the snapshot that
+						 * we'll use to copy the other range.
+						 */
+						apply_concurrent_delete(rel, tup_exist);
+				}
+			}
 			else
-				apply_concurrent_delete(rel, tup_exist);
-
+			{
+				/*
+				 * The change belongs to another range, so we don't need to
+				 * bother with the old tuple: the snapshot used for the other
+				 * range won't see it, so it won't be copied. However, the new
+				 * tuple still may need to go to the range we are checking. In
+				 * that case, simply insert it there.
+				 */
+				if (kind == CHANGE_UPDATE_NEW && tup != tup_key &&
+					is_tuple_in_block_range(tup, ctx->first_block,
+											range_end))
+					apply_concurrent_insert(rel, tup, ctx->iistate,
+											index_slot);
+			}
 			if (tup_old != NULL)
 			{
 				pfree(tup_old);
@@ -2940,6 +3076,33 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target)
 	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
 }
 
+/*
+ * Check whether the tuple originates from the given range of blocks that
+ * have already been copied.
+ */
+static bool
+is_tuple_in_block_range(HeapTuple tup, BlockNumber start, BlockNumber end)
+{
+	BlockNumber blknum;
+
+	Assert(BlockNumberIsValid(start) && BlockNumberIsValid(end));
+
+	blknum = ItemPointerGetBlockNumber(&tup->t_self);
+	Assert(BlockNumberIsValid(blknum));
+
+	if (start < end)
+	{
+		return blknum >= start && blknum < end;
+	}
+	else
+	{
+		/* Has the scan position wrapped around? */
+		Assert(start > end);
+
+		return blknum >= start || blknum < end;
+	}
+}
+
 /*
  * Find the tuple to be updated or deleted.
  *
@@ -2949,10 +3112,10 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target)
  * it when he no longer needs the tuple returned.
  */
 static HeapTuple
-find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
-				  TupleTableSlot *ident_slot)
+find_target_tuple(Relation rel, ConcurrentChangeContext *ctx,
+				  HeapTuple tup_key, TupleTableSlot *ident_slot)
 {
-	Relation	ident_index = dest->ident_index;
+	Relation	ident_index = ctx->ident_index;
 	IndexScanDesc scan;
 	Form_pg_index ident_form;
 	int2vector *ident_indkey;
@@ -2960,14 +3123,14 @@ find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
 
 	/* XXX no instrumentation for now */
 	scan = index_beginscan(rel, ident_index, GetActiveSnapshot(),
-						   NULL, dest->ident_key_nentries, 0);
+						   NULL, ctx->ident_key_nentries, 0);
 
 	/*
 	 * Scan key is passed by caller, so it does not have to be constructed
 	 * multiple times. Key entries have all fields initialized, except for
 	 * sk_argument.
 	 */
-	index_rescan(scan, dest->ident_key, dest->ident_key_nentries, NULL, 0);
+	index_rescan(scan, ctx->ident_key, ctx->ident_key_nentries, NULL, 0);
 
 	/* Info needed to retrieve key values from heap tuple. */
 	ident_form = ident_index->rd_index;
@@ -3002,15 +3165,22 @@ find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
 }
 
 /*
- * Decode and apply concurrent changes, up to (and including) the record whose
- * LSN is 'end_of_wal'.
+ * Get concurrent changes, up to (and including) the record whose LSN is
+ * 'end_of_wal', from the decoding worker. If 'range_end' is a valid block
+ * number, the changes should only be applied to blocks greater than or equal
+ * to ctx->first_block and lower than range_end.
+ *
+ * If 'request_snapshot' is true, the snapshot built at LSN following the last
+ * data change needs to be exported too.
  */
-static void
-process_concurrent_changes(XLogRecPtr end_of_wal, ChangeDest *dest, bool done)
+void
+repack_get_concurrent_changes(ConcurrentChangeContext *ctx,
+							  XLogRecPtr end_of_wal,
+							  BlockNumber range_end,
+							  bool request_snapshot, bool done)
 {
 	DecodingWorkerShared *shared;
 	char		fname[MAXPGPATH];
-	BufFile    *file;
 
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 								 PROGRESS_REPACK_PHASE_CATCH_UP);
@@ -3019,6 +3189,8 @@ process_concurrent_changes(XLogRecPtr end_of_wal, ChangeDest *dest, bool done)
 	shared = (DecodingWorkerShared *) dsm_segment_address(decoding_worker->seg);
 	SpinLockAcquire(&shared->mutex);
 	shared->lsn_upto = end_of_wal;
+	Assert(!shared->snapshot_requested);
+	shared->snapshot_requested = request_snapshot;
 	shared->done = done;
 	SpinLockRelease(&shared->mutex);
 
@@ -3030,32 +3202,49 @@ process_concurrent_changes(XLogRecPtr end_of_wal, ChangeDest *dest, bool done)
 	ConditionVariablePrepareToSleep(&shared->cv);
 	for (;;)
 	{
-		bool		valid;
+		int			response;
 
 		SpinLockAcquire(&shared->mutex);
-		valid = shared->sfs_valid;
+		response = shared->response;
 		SpinLockRelease(&shared->mutex);
 
-		if (valid)
+		if (response & WORKER_RESPONSE_CHANGES)
 			break;
 
 		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
 	}
 	ConditionVariableCancelSleep();
 
-	/* Open the file. */
-	DecodingWorkerFileName(fname, shared->relid, shared->last_exported);
-	file = BufFileOpenFileSet(&shared->sfs.fs, fname, O_RDONLY, false);
-	apply_concurrent_changes(file, dest);
+	/*
+	 * Remember the file name so we can apply the changes when appropriate.
+	 * One particular reason to postpone the replay is that indexes haven't
+	 * been built yet on the new heap.
+	 */
+	DecodingWorkerFileName(fname, shared->relid,
+						   shared->last_exported_changes,
+						   false);
+	repack_add_block_range(ctx, range_end, fname);
 
 	/* No file is exported until the worker exports the next one. */
 	SpinLockAcquire(&shared->mutex);
-	shared->sfs_valid = false;
+	shared->response &= ~WORKER_RESPONSE_CHANGES;
+	Assert(XLogRecPtrIsInvalid(shared->lsn_upto));
 	SpinLockRelease(&shared->mutex);
+}
 
-	BufFileClose(file);
+static void
+repack_add_block_range(ConcurrentChangeContext *ctx, BlockNumber end,
+					   char *fname)
+{
+	RepackApplyRange *range;
+
+	range = palloc_object(RepackApplyRange);
+	range->end = end;
+	range->fname = pstrdup(fname);
+	ctx->block_ranges = lappend(ctx->block_ranges, range);
 }
 
+
 /*
  * Initialize IndexInsertState for index specified by ident_index_id.
  *
@@ -3198,7 +3387,8 @@ static void
 rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 								   Relation cl_index,
 								   TransactionId frozenXid,
-								   MultiXactId cutoffMulti)
+								   MultiXactId cutoffMulti,
+								   ConcurrentChangeContext *ctx)
 {
 	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
 	List	   *ind_oids_new;
@@ -3217,7 +3407,6 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	Relation   *ind_refs,
 			   *ind_refs_p;
 	int			nind;
-	ChangeDest	chgdst;
 
 	/* Like in cluster_rel(). */
 	lockmode_old = ShareUpdateExclusiveLock;
@@ -3274,11 +3463,18 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 				(errmsg("identity index missing on the new relation")));
 
 	/* Gather information to apply concurrent changes. */
-	chgdst.rel = NewHeap;
-	chgdst.iistate = get_index_insert_state(NewHeap, ident_idx_new,
-											&chgdst.ident_index);
-	chgdst.ident_key = build_identity_key(ident_idx_new, OldHeap,
-										  &chgdst.ident_key_nentries);
+	ctx->rel = NewHeap;
+	ctx->iistate = get_index_insert_state(NewHeap, ident_idx_new,
+										  &ctx->ident_index);
+	ctx->ident_key = build_identity_key(ident_idx_new, OldHeap,
+										&ctx->ident_key_nentries);
+
+	/*
+	 * Replay the concurrent data changes gathered during heap copying. This
+	 * had to wait until after the index build because the identity index is
+	 * needed to apply UPDATE and DELETE changes.
+	 */
+	apply_concurrent_changes(ctx);
 
 	/*
 	 * During testing, wait for another backend to perform concurrent data
@@ -3296,11 +3492,13 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	end_of_wal = GetFlushRecPtr(NULL);
 
 	/*
-	 * Apply concurrent changes first time, to minimize the time we need to
-	 * hold AccessExclusiveLock. (Quite some amount of WAL could have been
+	 * Decode and apply concurrent changes again, to minimize the time we need
+	 * to hold AccessExclusiveLock. (Quite some amount of WAL could have been
 	 * written during the data copying and index creation.)
 	 */
-	process_concurrent_changes(end_of_wal, &chgdst, false);
+	repack_get_concurrent_changes(ctx, end_of_wal, InvalidBlockNumber, false,
+								  false);
+	apply_concurrent_changes(ctx);
 
 	/*
 	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
@@ -3397,10 +3595,13 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	end_of_wal = GetFlushRecPtr(NULL);
 
 	/*
-	 * Apply the concurrent changes again. Indicate that the decoding worker
-	 * won't be needed anymore.
+	 * Decode and apply the concurrent changes again. Indicate that the
+	 * decoding worker won't be needed anymore.
 	 */
-	process_concurrent_changes(end_of_wal, &chgdst, true);
+	repack_get_concurrent_changes(ctx, end_of_wal, InvalidBlockNumber, false,
+								  true);
+	apply_concurrent_changes(ctx);
+
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
@@ -3451,8 +3652,8 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	table_close(NewHeap, NoLock);
 
 	/* Cleanup what we don't need anymore. (And close the identity index.) */
-	pfree(chgdst.ident_key);
-	free_index_insert_state(chgdst.iistate);
+	pfree(ctx->ident_key);
+	free_index_insert_state(ctx->iistate);
 
 	/*
 	 * Swap the relations and their TOAST relations and TOAST indexes. This
@@ -3495,6 +3696,23 @@ build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
 		char	   *newName;
 		Relation	ind;
 
+		/*
+		 * Try to reduce the impact on VACUUM.
+		 *
+		 * The individual builds might still be a problem, but that's a
+		 * separate issue.
+		 *
+		 * TODO Can we somehow use the fact that the new heap is not yet
+		 * visible to other transactions, and thus cannot be vacuumed? Perhaps
+		 * by preventing snapshots from setting MyProc->xmin temporarily. (All
+		 * the snapshots that might have participated in the build, including
+		 * the catalog snapshots, must not be used for other tables of
+		 * course.)
+		 */
+		PopActiveSnapshot();
+		InvalidateCatalogSnapshot();
+		PushActiveSnapshot(GetTransactionSnapshot());
+
 		ind_oid = lfirst_oid(lc);
 		ind = index_open(ind_oid, ShareUpdateExclusiveLock);
 
@@ -3534,11 +3752,15 @@ start_decoding_worker(Oid relid)
 		BUFFERALIGN(REPACK_ERROR_QUEUE_SIZE);
 	seg = dsm_create(size, 0);
 	shared = (DecodingWorkerShared *) dsm_segment_address(seg);
+	shared->initialized = false;
 	shared->lsn_upto = InvalidXLogRecPtr;
 	shared->done = false;
+	/* Snapshot is the first thing we need from the worker. */
+	shared->snapshot_requested = true;
+	shared->response = 0;
 	SharedFileSetInit(&shared->sfs, seg);
-	shared->sfs_valid = false;
-	shared->last_exported = -1;
+	shared->last_exported_changes = -1;
+	shared->last_exported_snapshot = -1;
 	SpinLockInit(&shared->mutex);
 	shared->dbid = MyDatabaseId;
 
@@ -3747,7 +3969,10 @@ repack_worker_internal(dsm_segment *seg)
 	 */
 	SpinLockAcquire(&shared->mutex);
 	Assert(XLogRecPtrIsInvalid(shared->lsn_upto));
-	Assert(!shared->sfs_valid);
+	Assert(shared->response == 0);
+	/* Initially we're expected to provide a snapshot and only that. */
+	Assert(shared->snapshot_requested &&
+		   XLogRecPtrIsInvalid(shared->lsn_upto));
 	sfs = &shared->sfs;
 	SpinLockRelease(&shared->mutex);
 
@@ -3765,8 +3990,23 @@ repack_worker_internal(dsm_segment *seg)
 	ConditionVariableSignal(&shared->cv);
 
 	/* Build the initial snapshot and export it. */
-	snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
-	export_initial_snapshot(snapshot, shared);
+	snapshot = SnapBuildSnapshotForRepack(decoding_ctx->snapshot_builder);
+	export_snapshot(snapshot, shared);
+
+	/*
+	 * Adjust the replication slot's xmin so that VACUUM can do more work.
+	 */
+	LogicalIncreaseXminForSlot(InvalidXLogRecPtr, snapshot->xmin, false);
+	FreeSnapshot(snapshot);
+
+	/* Tell the backend that the file is available. */
+	SpinLockAcquire(&shared->mutex);
+	Assert(shared->snapshot_requested);
+	shared->snapshot_requested = false;
+	shared->response |= WORKER_RESPONSE_SNAPSHOT;
+	shared->last_exported_snapshot++;
+	SpinLockRelease(&shared->mutex);
+	ConditionVariableSignal(&shared->cv);
 
 	/*
 	 * Only historic snapshots should be used now. Do not let us restrict the
@@ -3786,7 +4026,7 @@ repack_worker_internal(dsm_segment *seg)
  * Make snapshot available to the backend that launched the decoding worker.
  */
 static void
-export_initial_snapshot(Snapshot snapshot, DecodingWorkerShared *shared)
+export_snapshot(Snapshot snapshot, DecodingWorkerShared *shared)
 {
 	char		fname[MAXPGPATH];
 	BufFile    *file;
@@ -3796,29 +4036,23 @@ export_initial_snapshot(Snapshot snapshot, DecodingWorkerShared *shared)
 	snap_size = EstimateSnapshotSpace(snapshot);
 	snap_space = (char *) palloc(snap_size);
 	SerializeSnapshot(snapshot, snap_space);
-	FreeSnapshot(snapshot);
 
-	DecodingWorkerFileName(fname, shared->relid, shared->last_exported + 1);
+	DecodingWorkerFileName(fname, shared->relid,
+						   shared->last_exported_snapshot + 1,
+						   true);
 	file = BufFileCreateFileSet(&shared->sfs.fs, fname);
 	/* To make restoration easier, write the snapshot size first. */
 	BufFileWrite(file, &snap_size, sizeof(snap_size));
 	BufFileWrite(file, snap_space, snap_size);
 	pfree(snap_space);
 	BufFileClose(file);
-
-	/* Tell the backend that the file is available. */
-	SpinLockAcquire(&shared->mutex);
-	shared->sfs_valid = true;
-	shared->last_exported++;
-	SpinLockRelease(&shared->mutex);
-	ConditionVariableSignal(&shared->cv);
 }
 
 /*
- * Get the initial snapshot from the decoding worker.
+ * Get snapshot from the decoding worker.
  */
-static Snapshot
-get_initial_snapshot(DecodingWorker *worker)
+Snapshot
+repack_get_snapshot(ConcurrentChangeContext *ctx)
 {
 	DecodingWorkerShared *shared;
 	char		fname[MAXPGPATH];
@@ -3826,24 +4060,26 @@ get_initial_snapshot(DecodingWorker *worker)
 	Size		snap_size;
 	char	   *snap_space;
 	Snapshot	snapshot;
+	DecodingWorker *worker = ctx->worker;
 
 	shared = (DecodingWorkerShared *) dsm_segment_address(worker->seg);
 
 	/*
-	 * The worker needs to initialize the logical decoding, which usually
-	 * takes some time. Therefore it makes sense to prepare for the sleep
-	 * first.
+	 * For the first snapshot request, the worker needs to initialize the
+	 * logical decoding, which usually takes some time. Therefore it makes
+	 * sense to prepare for the sleep first. Does it make sense to skip the
+	 * preparation on subsequent requests?
 	 */
 	ConditionVariablePrepareToSleep(&shared->cv);
 	for (;;)
 	{
-		bool		valid;
+		int			response;
 
 		SpinLockAcquire(&shared->mutex);
-		valid = shared->sfs_valid;
+		response = shared->response;
 		SpinLockRelease(&shared->mutex);
 
-		if (valid)
+		if (response & WORKER_RESPONSE_SNAPSHOT)
 			break;
 
 		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
@@ -3851,7 +4087,9 @@ get_initial_snapshot(DecodingWorker *worker)
 	ConditionVariableCancelSleep();
 
 	/* Read the snapshot from a file. */
-	DecodingWorkerFileName(fname, shared->relid, shared->last_exported);
+	DecodingWorkerFileName(fname, shared->relid,
+						   shared->last_exported_snapshot,
+						   true);
 	file = BufFileOpenFileSet(&shared->sfs.fs, fname, O_RDONLY, false);
 	BufFileReadExact(file, &snap_size, sizeof(snap_size));
 	snap_space = (char *) palloc(snap_size);
@@ -3859,7 +4097,8 @@ get_initial_snapshot(DecodingWorker *worker)
 	BufFileClose(file);
 
 	SpinLockAcquire(&shared->mutex);
-	shared->sfs_valid = false;
+	shared->response &= ~WORKER_RESPONSE_SNAPSHOT;
+	Assert(!shared->snapshot_requested);
 	SpinLockRelease(&shared->mutex);
 
 	/* Restore it. */
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index a956892f42f..c8bc85d8bcc 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -1003,6 +1003,7 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	xl_heap_insert *xlrec;
 	ReorderBufferChange *change;
 	RelFileLocator target_locator;
+	BlockNumber blknum;
 
 	xlrec = (xl_heap_insert *) XLogRecGetData(r);
 
@@ -1014,7 +1015,7 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		return;
 
 	/* only interested in our database */
-	XLogRecGetBlockTag(r, 0, &target_locator, NULL, NULL);
+	XLogRecGetBlockTag(r, 0, &target_locator, NULL, &blknum);
 	if (target_locator.dbOid != ctx->slot->data.database)
 		return;
 
@@ -1039,6 +1040,15 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	DecodeXLogTuple(tupledata, datalen, change->data.tp.newtuple);
 
+	/*
+	 * REPACK (CONCURRENTLY) needs the block number to check whether the
+	 * corresponding part of the table was already copied.
+	 */
+	if (OidIsValid(repacked_rel_locator.relNumber))
+		/* offnum is not really needed, but let's set a valid pointer. */
+		ItemPointerSet(&change->data.tp.newtuple->t_self, blknum,
+					   xlrec->offnum);
+
 	change->data.tp.clear_toast_afterwards = true;
 
 	ReorderBufferQueueChange(ctx->reorder, XLogRecGetXid(r), buf->origptr,
@@ -1060,11 +1070,12 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	ReorderBufferChange *change;
 	char	   *data;
 	RelFileLocator target_locator;
+	BlockNumber new_blknum;
 
 	xlrec = (xl_heap_update *) XLogRecGetData(r);
 
 	/* only interested in our database */
-	XLogRecGetBlockTag(r, 0, &target_locator, NULL, NULL);
+	XLogRecGetBlockTag(r, 0, &target_locator, NULL, &new_blknum);
 	if (target_locator.dbOid != ctx->slot->data.database)
 		return;
 
@@ -1090,12 +1101,27 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen);
 
 		DecodeXLogTuple(data, datalen, change->data.tp.newtuple);
+
+		/*
+		 * REPACK (CONCURRENTLY) needs the block number to check whether the
+		 * corresponding part of the table was already copied.
+		 */
+		if (OidIsValid(repacked_rel_locator.relNumber))
+			/* offnum is not really needed, but let's set a valid pointer. */
+			ItemPointerSet(&change->data.tp.newtuple->t_self,
+						   new_blknum, xlrec->new_offnum);
 	}
 
 	if (xlrec->flags & XLH_UPDATE_CONTAINS_OLD)
 	{
 		Size		datalen;
 		Size		tuplelen;
+		BlockNumber old_blknum;
+
+		if (XLogRecHasBlockRef(r, 1))
+			XLogRecGetBlockTag(r, 1, NULL, NULL, &old_blknum);
+		else
+			XLogRecGetBlockTag(r, 0, NULL, NULL, &old_blknum);
 
 		/* caution, remaining data in record is not aligned */
 		data = XLogRecGetData(r) + SizeOfHeapUpdate;
@@ -1106,6 +1132,11 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen);
 
 		DecodeXLogTuple(data, datalen, change->data.tp.oldtuple);
+		/* See above. */
+		if (OidIsValid(repacked_rel_locator.relNumber))
+			ItemPointerSet(&change->data.tp.oldtuple->t_self,
+						   old_blknum, xlrec->old_offnum);
+
 	}
 
 	change->data.tp.clear_toast_afterwards = true;
@@ -1126,11 +1157,12 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	xl_heap_delete *xlrec;
 	ReorderBufferChange *change;
 	RelFileLocator target_locator;
+	BlockNumber blknum;
 
 	xlrec = (xl_heap_delete *) XLogRecGetData(r);
 
 	/* only interested in our database */
-	XLogRecGetBlockTag(r, 0, &target_locator, NULL, NULL);
+	XLogRecGetBlockTag(r, 0, &target_locator, NULL, &blknum);
 	if (target_locator.dbOid != ctx->slot->data.database)
 		return;
 
@@ -1162,6 +1194,15 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 		DecodeXLogTuple((char *) xlrec + SizeOfHeapDelete,
 						datalen, change->data.tp.oldtuple);
+
+		/*
+		 * REPACK (CONCURRENTLY) needs the block number to check whether the
+		 * corresponding part of the table was already copied.
+		 */
+		if (OidIsValid(repacked_rel_locator.relNumber))
+			/* offnum is not really needed, but let's set a valid pointer. */
+			ItemPointerSet(&change->data.tp.oldtuple->t_self, blknum,
+						   xlrec->offnum);
 	}
 
 	change->data.tp.clear_toast_afterwards = true;
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index b3fd7fec392..1e445704a1b 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1670,14 +1670,17 @@ update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 
 /*
  * Set the required catalog xmin horizon for historic snapshots in the current
- * replication slot.
+ * replication slot if catalog is TRUE, or the data xmin horizon if FALSE.
  *
  * Note that in the most cases, we won't be able to immediately use the xmin
  * to increase the xmin horizon: we need to wait till the client has confirmed
- * receiving current_lsn with LogicalConfirmReceivedLocation().
+ * receiving current_lsn with LogicalConfirmReceivedLocation(). However,
+ * catalog=FALSE is only allowed for temporary replication slots, so the
+ * horizon is applied immediately.
  */
 void
-LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
+LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin,
+						   bool catalog)
 {
 	bool		updated_xmin = false;
 	ReplicationSlot *slot;
@@ -1688,6 +1691,27 @@ LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
 	Assert(slot != NULL);
 
 	SpinLockAcquire(&slot->mutex);
+	if (!catalog)
+	{
+		/*
+		 * The non-catalog horizon can only advance in temporary slots, so
+		 * update it in the shared memory immediately (w/o requiring prior
+		 * saving to disk).
+		 */
+		Assert(slot->data.persistency == RS_TEMPORARY);
+
+		/*
+		 * The horizon must not go backwards; however, it is OK for it to
+		 * become invalid.
+		 */
+		Assert(!TransactionIdIsValid(slot->effective_xmin) ||
+			   !TransactionIdIsValid(xmin) ||
+			   TransactionIdFollowsOrEquals(xmin, slot->effective_xmin));
+
+		slot->effective_xmin = xmin;
+		SpinLockRelease(&slot->mutex);
+		return;
+	}
 
 	/*
 	 * don't overwrite if we already have a newer xmin. This can happen if we
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index f18c6fb52b5..273f65d6cc8 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3734,6 +3734,56 @@ ReorderBufferXidHasCatalogChanges(ReorderBuffer *rb, TransactionId xid)
 	return rbtxn_has_catalog_changes(txn);
 }
 
+/*
+ * Check if a transaction (or its subtransaction) contains a heap change.
+ */
+bool
+ReorderBufferXidHasHeapChanges(ReorderBuffer *rb, TransactionId xid)
+{
+	ReorderBufferTXN *txn;
+	dlist_iter	iter;
+
+	txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+								false);
+	if (txn == NULL)
+		return false;
+
+	dlist_foreach(iter, &txn->changes)
+	{
+		ReorderBufferChange *change;
+
+		change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+		switch (change->action)
+		{
+			case REORDER_BUFFER_CHANGE_INSERT:
+			case REORDER_BUFFER_CHANGE_UPDATE:
+			case REORDER_BUFFER_CHANGE_DELETE:
+				return true;
+			default:
+				break;
+		}
+	}
+
+	/* Check subtransactions. */
+
+	/*
+	 * TODO Verify that subtransactions have been assigned to their top-level
+	 * transactions by now.
+	 */
+	dlist_foreach(iter, &txn->subtxns)
+	{
+		ReorderBufferTXN *subtxn;
+
+		subtxn = dlist_container(ReorderBufferTXN, node, iter.cur);
+
+		if (ReorderBufferXidHasHeapChanges(rb, subtxn->xid))
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * ReorderBufferXidHasBaseSnapshot
  *		Have we already set the base snapshot for the given txn/subtxn?
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 7643dfe31bb..4bc6cd22496 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -128,6 +128,7 @@
 #include "access/heapam_xlog.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "commands/cluster.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -496,7 +497,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
  * we do not set MyProc->xmin). XXX Do we yet need to add some restrictions?
  */
 Snapshot
-SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+SnapBuildSnapshotForRepack(SnapBuild *builder)
 {
 	Snapshot	snap;
 
@@ -1035,6 +1036,28 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		}
 	}
 
+	/*
+	 * Is REPACK (CONCURRENTLY) being run by this backend?
+	 */
+	else if (OidIsValid(repacked_rel_locator.relNumber))
+	{
+		Assert(builder->building_full_snapshot);
+
+		/*
+		 * In this special mode, heap changes of other relations should not be
+		 * decoded at all - see heap_decode(). Thus if we find a single heap
+		 * change in this transaction (or its subtransaction), we know that
+		 * this transaction changes the relation being repacked.
+		 */
+		if (ReorderBufferXidHasHeapChanges(builder->reorder, xid))
+
+			/*
+			 * Record the commit so we can build snapshots for the relation
+			 * being repacked.
+			 */
+			needs_timetravel = true;
+	}
+
 	for (nxact = 0; nxact < nsubxacts; nxact++)
 	{
 		TransactionId subxid = subxacts[nxact];
@@ -1240,7 +1263,7 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
 		xmin = running->oldestRunningXid;
 	elog(DEBUG3, "xmin: %u, xmax: %u, oldest running: %u, oldest xmin: %u",
 		 builder->xmin, builder->xmax, running->oldestRunningXid, xmin);
-	LogicalIncreaseXminForSlot(lsn, xmin);
+	LogicalIncreaseXminForSlot(lsn, xmin, true);
 
 	/*
 	 * Also tell the slot where we can restart decoding from. We don't want to
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
index fb9956d392d..be1c3ec9626 100644
--- a/src/backend/replication/pgoutput_repack/pgoutput_repack.c
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -195,6 +195,8 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 	BufFileWrite(dstate->file, &tuple->t_len, sizeof(tuple->t_len));
 	/* ... and the tuple itself. */
 	BufFileWrite(dstate->file, tuple->t_data, tuple->t_len);
+	/* CTID is needed as well, to check block ranges. */
+	BufFileWrite(dstate->file, &tuple->t_self, sizeof(tuple->t_self));
 
 	/* Free the flat copy if created above. */
 	if (flattened)
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 3b9d8349078..a82d284c44c 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -2415,6 +2415,16 @@
   boot_val => 'true',
 },
 
+# TODO Tune boot_val, 1024 is probably too low.
+{ name => 'repack_snapshot_after', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
+  short_desc => 'Number of pages after which REPACK (CONCURRENTLY) builds a new snapshot.',
+  flags => 'GUC_UNIT_BLOCKS | GUC_NOT_IN_SAMPLE',
+  variable => 'repack_blocks_per_snapshot',
+  boot_val => '1024',
+  min => '1',
+  max => 'INT_MAX',
+},
+
 { name => 'reserved_connections', type => 'int', context => 'PGC_POSTMASTER', group => 'CONN_AUTH_SETTINGS',
   short_desc => 'Sets the number of connection slots reserved for roles with privileges of pg_use_reserved_connections.',
   variable => 'ReservedConnections',
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f87b558c2c6..07f7cfdd6cc 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -42,6 +42,7 @@
 #include "catalog/namespace.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "commands/extension.h"
 #include "commands/event_trigger.h"
 #include "commands/tablespace.h"
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 321c00682ec..6fe3fc760bc 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -629,12 +629,12 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
-											  Snapshot snapshot,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
 											  double *tups_vacuumed,
-											  double *tups_recently_dead);
+											  double *tups_recently_dead,
+											  void *tableam_data);
 
 	/*
 	 * React to VACUUM command on the relation. The VACUUM can be triggered by
@@ -1647,8 +1647,6 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
- * - snapshot - if != NULL, ignore data changes done by transactions that this
- *	 (MVCC) snapshot considers still in-progress or in the future.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1661,19 +1659,19 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
-								Snapshot snapshot,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
 								double *tups_vacuumed,
-								double *tups_recently_dead)
+								double *tups_recently_dead,
+								void *tableam_data)
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
-													snapshot,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
-													tups_recently_dead);
+													tups_recently_dead,
+													tableam_data);
 }
 
 /*
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 0ac70ec30d7..2b61dce92dc 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -16,10 +16,12 @@
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "postmaster/bgworker.h"
 #include "replication/logical.h"
 #include "storage/buffile.h"
 #include "storage/lock.h"
 #include "storage/relfilelocator.h"
+#include "storage/shm_mq.h"
 #include "utils/relcache.h"
 #include "utils/resowner.h"
 #include "utils/tuplestore.h"
@@ -47,6 +49,63 @@ typedef struct ClusterParams
 
 extern RelFileLocator repacked_rel_locator;
 extern RelFileLocator repacked_rel_toast_locator;
+extern PGDLLIMPORT int repack_blocks_per_snapshot;
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+} IndexInsertState;
+
+/*
+ * Backend-local information to control the decoding worker.
+ */
+typedef struct DecodingWorker
+{
+	/* The worker. */
+	BackgroundWorkerHandle *handle;
+
+	/* DecodingWorkerShared is in this segment. */
+	dsm_segment *seg;
+
+	/* Handle of the error queue. */
+	shm_mq_handle *error_mqh;
+} DecodingWorker;
+
+/*
+ * Information needed to handle concurrent data changes.
+ */
+typedef struct ConcurrentChangeContext
+{
+	/* The relation the changes are applied to. */
+	Relation	rel;
+
+	/*
+	 * Background worker performing logical decoding of concurrent data
+	 * changes.
+	 */
+	DecodingWorker *worker;
+
+	/*
+	 * The following is needed to find the existing tuple if the change is
+	 * UPDATE or DELETE. 'ident_key' should have all the fields except for
+	 * 'sk_argument' initialized.
+	 */
+	Relation	ident_index;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+
+	/* Needed to update indexes of rel_dst. */
+	IndexInsertState *iistate;
+
+	/* The first block of the scan used to copy the heap. */
+	BlockNumber first_block;
+	/* List of RepackApplyRange objects. */
+	List	   *block_ranges;
+} ConcurrentChangeContext;
 
 /*
  * Stored as a single byte in the output file.
@@ -102,6 +161,12 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
+extern void repack_get_concurrent_changes(struct ConcurrentChangeContext *ctx,
+										  XLogRecPtr end_of_wal,
+										  BlockNumber range_end,
+										  bool request_snapshot,
+										  bool done);
+extern Snapshot repack_get_snapshot(struct ConcurrentChangeContext *ctx);
 
 extern void RepackWorkerMain(Datum main_arg);
 #endif							/* CLUSTER_H */
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 2e562bee5a9..9924f706f20 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -137,7 +137,7 @@ extern bool DecodingContextReady(LogicalDecodingContext *ctx);
 extern void FreeDecodingContext(LogicalDecodingContext *ctx);
 
 extern void LogicalIncreaseXminForSlot(XLogRecPtr current_lsn,
-									   TransactionId xmin);
+									   TransactionId xmin, bool catalog);
 extern void LogicalIncreaseRestartDecodingForSlot(XLogRecPtr current_lsn,
 												  XLogRecPtr restart_lsn);
 extern void LogicalConfirmReceivedLocation(XLogRecPtr lsn);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 3cbe106a3c7..b6b739f29f4 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -763,6 +763,7 @@ extern void ReorderBufferProcessXid(ReorderBuffer *rb, TransactionId xid, XLogRe
 
 extern void ReorderBufferXidSetCatalogChanges(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn);
 extern bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *rb, TransactionId xid);
+extern bool ReorderBufferXidHasHeapChanges(ReorderBuffer *rb, TransactionId xid);
 extern bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid);
 
 extern bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 802fc4b0823..d1f9037fcfb 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,7 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
-extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
+extern Snapshot SnapBuildSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 35344910f65..54115135732 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -411,7 +411,6 @@ CatCacheHeader
 CatalogId
 CatalogIdMapEntry
 CatalogIndexState
-ChangeDest
 ChangeVarNodes_callback
 ChangeVarNodes_context
 CheckPoint
@@ -488,6 +487,7 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChangeContext
 ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
@@ -2563,6 +2563,7 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackApplyRange
 RepackCommand
 RepackDecodingState
 RepackStmt
-- 
2.47.3

#71Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#69)
Re: Adding REPACK [concurrently]

Right after sending that, I realized the MVCC-safe version fails
007_repack_concurrently.pl with TRANSACTION ISOLATION LEVEL REPEATABLE
READ uncommented.

I don't know why it fails - but I'm happy it fails :)

On Sat, Dec 13, 2025 at 7:45 PM Mihail Nikalayeu
<mihailnikalayeu@gmail.com> wrote:

Hello, everyone.

Stress tests for REPACK concurrently in attachment.
So far I can't break anything (except MVCC of course).

A rebased version of the MVCC-safe "light" version with its own stress
test is attached also.

Best regards,
Mikhail.

#72Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#70)
Re: Adding REPACK [concurrently]

Hello, Antonin!

On Sat, Dec 13, 2025 at 7:48 PM Antonin Houska <ah@cybertec.at> wrote:

Attached here is a new version of the patch set. It's rebased and extended one
more time: 0006 is a PoC of the "snapshot resetting" technique, as discussed
elsewhere with Mihail Nikalayeu and Matthias van de Meent. The way snapshots
are generated here is different though: we need the snapshots from logical
replication's snapbuild.c, not those from procarray.c. More information is in
the commit message.

Have you seen my feedback for 0004? Do you plan to check it? I'm asking
to understand whether it is worth reviewing now or later.

Best regards,
Mikhail.

#73Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#68)
Re: Adding REPACK [concurrently]

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

On Tue, Dec 9, 2025 at 7:52 PM Antonin Houska <ah@cybertec.at> wrote:

Worker makes more sense to me - the initial implementation is in 0005.

Comments for 0005, so far:

Thanks!

---

export_initial_snapshot

Hm, should we use ExportSnapshot instead? And ImportSnapshot to import it.

There is at least one thing that I don't want: ImportSnapshot calls
SetTransactionSnapshot() at the end. I chose the approach the leader process
uses to serialize and pass a snapshot to its background workers.
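
(For reference, that is the SerializeSnapshot/RestoreSnapshot path from
snapmgr.c - roughly, and ignoring the shared-memory plumbing:

/* leader: flatten the snapshot into a buffer */
Size len = EstimateSnapshotSpace(snap);
char *buf = palloc(len);

SerializeSnapshot(snap, buf);
/* ... copy 'buf' into the DSM segment ... */

/* worker: rebuild it; unlike ImportSnapshot, no SetTransactionSnapshot() */
Snapshot snap = RestoreSnapshot(ptr_into_dsm);

with 'ptr_into_dsm' standing for wherever the worker finds those bytes.)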

---

get_initial_snapshot

Should we check if a worker is still alive while waiting? Also in
"process_concurrent_changes".

ConditionVariableSleep() should handle that - see the WL_EXIT_ON_PM_DEATH flag
in ConditionVariableTimedSleep().

And AFAIU RegisterDynamicBackgroundWorker does not guarantee that new
workers will be started (in case of some fork-related issues).

Yes, the user will get an ERROR in such a case. This is different from parallel
workers in query processing: if a parallel worker cannot be started, the leader
(AFAICS) still executes the query. I'm not sure though if we should implement
REPACK (CONCURRENTLY) in such a way that it works even w/o the worker. The
code would be more complex and the behaviour quite different (I mean the
possibly huge amount of unprocessed WAL that you pointed out earlier.)

---

Assert(res = SHM_MQ_DETACHED);

== (this should be a comparison, not an assignment)

Thanks!

---

/* Wait a bit before we retry reading WAL. */
(void) WaitLatch(MyLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
1000L,
WAIT_EVENT_REPACK_WORKER_MAIN);

Looks like we need ResetLatch(MyLatch); here.

You seem to be right.
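
(I.e. the usual latch loop shape - a sketch only:

for (;;)
{
    /* ... try to read and decode more WAL ... */

    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     1000L,
                     WAIT_EVENT_REPACK_WORKER_MAIN);
    ResetLatch(MyLatch);
    CHECK_FOR_INTERRUPTS();
}

Without the ResetLatch() call, WaitLatch() returns immediately on every
iteration once the latch has been set, so the loop degrades into busy
waiting.)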

---

* - decoding_ctx - logical decoding context, to capture concurrent data

Needs to be removed together with the parameters.

Do you mean in 0005? (It'd help if you pasted the hunk headers.) This should
be fixed in v28 [1]/messages/by-id/210036.1765651719@localhost

---

hpm_context = AllocSetContextCreate(TopMemoryContext,
"ProcessParallelMessages",
ALLOCSET_DEFAULT_SIZES);

"ProcessRepackMessages"

ok, the copy and pasting is a problem that needs to be addressed (mentioned in
the last paragraph of the commit message of 0005).

---

if (XLogRecPtrIsInvalid(lsn_upto))
{
SpinLockAcquire(&shared->mutex);
lsn_upto = shared->lsn_upto;
/* 'done' should be set at the same time as 'lsn_upto' */
done = shared->done;
SpinLockRelease(&shared->mutex);

/* Check if the work happens to be complete. */
continue;
}

May be moved to the start of the loop to avoid duplication.

I found more problems in this part when working on v28, maybe check that.

---

SpinLockAcquire(&shared->mutex);
valid = shared->sfs_valid;
SpinLockRelease(&shared->mutex);

Better to remember last_exported here to avoid any races/misses.

What races/misses exactly?

---

shared->lsn_upto = InvalidXLogRecPtr;

I think it is better to clear it once it is read (after removing duplication).

Maybe, I'll think about it.

---

bool done;

bool exit_after_lsn_upto?

Not sure.

---

bool sfs_valid;

Do we really need it? I think it is better to leave only last_exported,
add an argument (last_processed_file) to process_concurrent_changes,
and make it wait for last_exported to become higher.

I'll consider that (The variable is replaced in the 0006 part of v28, but the
idea should still be applicable.)

---
What if we reverse the roles of leader and worker?

The leader gets a snapshot and transfers it to workers (probably multiple, for
a parallel scan) using the already-existing mechanics - the workers process
the scan of the table in parallel, while the leader decodes the WAL.

Insertion into a table by multiple workers is a special thing, though maybe
it'd be doable in this case, but ...

Also, workers could be assigned a list of indexes they need to build.

Feels like it reuses more of the current infrastructure and also needs
less custom synchronization logic. But I'm not sure about the
index-building phase - maybe it is not so easy to do.

... my feelings were the opposite, i.e. I thought it would require a larger
amount of code rearrangement. Moreover, part 0006 of v28 (snapshot switching)
would be trickier: it processes one range of blocks after another, and
parallelism would make that more difficult.

---
Also, should we add some kind of back pressure between building the
indexes/new heap and the amount of WAL we have accumulated?
But probably it is out of scope for the patch.

Do you mean that the decoding worker should be less active if the amount of
WAL doesn't grow too fast?

---
To build N indexes we need to scan the table N times. What about
building multiple indexes during a single heap scan?

That sounds like a separate feature, and about as difficult as enhancing
CREATE INDEX so it can create multiple indexes at a time.

--
Just a gentle reminder about the XMIN_COMMITTED flag and WAL storm
after the switch.

ok, I have it in my notes, moved it more to the top :-)

[1]: /messages/by-id/210036.1765651719@localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#74Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Antonin Houska (#70)
Re: Adding REPACK [concurrently]

On 2025-Dec-13, Antonin Houska wrote:

From 6279394135f2b693b6fffd174822509e0a067cbf Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Sat, 13 Dec 2025 19:27:18 +0100
Subject: [PATCH 4/6] Add CONCURRENTLY option to REPACK command.

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..a956892f42f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+	/*
+	 * Second, skip records which do not contain sufficient information for
+	 * the decoding.
+	 *
+	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+	 * when doing changes in the new table. Those changes should not be useful
+	 * for any other user (such as logical replication subscription) because
+	 * the new table will eventually be dropped (after REPACK CONCURRENTLY has
+	 * assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This does happen when 1) raw_heap_insert marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, 2) REPACK CONCURRENTLY
+				 * replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}

I'm confused as to the purpose of this addition. I took this whole
block out, and no tests seem to fail. Moreover, some of the cases that
are being skipped because of this would already be skipped by code in
DecodeInsert / DecodeUpdate anyway. The case for XLOG_HEAP_DELETE seems
to have no effect (that is, the "return" there is never hit by any test
as far as I can tell.)

The reason I ask is that the line immediately below does this:

ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);

which means the Xid is tracked for snapshot building purposes. Which is
probably important, because of what the comment right below it says:

/*
* If we don't have snapshot or we are just fast-forwarding, there is no
* point in decoding data changes. However, it's crucial to build the base
* snapshot during fast-forward mode (as is done in
* SnapBuildProcessChange()) because we require the snapshot's xmin when
* determining the candidate catalog_xmin for the replication slot. See
* SnapBuildProcessRunningXacts().
*/

So what happens here is that we would skip processing the Xid of an xlog
record during snapshot-building, on the grounds that it doesn't contain
logical changes. I'm not sure this is okay. If we do indeed need this,
then perhaps it should be done after ReorderBufferProcessXid().

Or did you intend to make this conditional on the backend running
REPACK?

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#75Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#69)
Re: Adding REPACK [concurrently]

Hello!

On Sat, Dec 13, 2025 at 7:45 PM Mihail Nikalayeu
<mihailnikalayeu@gmail.com> wrote:

Stress tests for REPACK concurrently in attachment.

To run:
ninja && meson test --suite setup && meson test --print-errorlogs
--suite amcheck *007*
ninja && meson test --suite setup && meson test --print-errorlogs
--suite amcheck *008*

Results for v28:

Up to "v28-0005-Use-background-worker-to-do-logical-decoding.patch":

Technically it passes, but sometimes I saw 0% CPU usage for long
periods with stacks like these (it seems to happen more often for 0008):

epoll_wait 0x000078b99512a037
WaitEventSetWaitBlock waiteventset.c:1192
WaitEventSetWait waiteventset.c:1140
WaitLatch latch.c:196
decode_concurrent_changes cluster.c:2702
repack_worker_internal cluster.c:3777
RepackWorkerMain cluster.c:3725
BackgroundWorkerMain bgworker.c:850
postmaster_child_launch launch_backend.c:268
StartBackgroundWorker postmaster.c:4168
maybe_start_bgworkers postmaster.c:4334
LaunchMissingBackgroundProcesses postmaster.c:3408
ServerLoop postmaster.c:1728
PostmasterMain postmaster.c:1403
main main.c:231

epoll_wait 0x000078b99512a037
WaitEventSetWaitBlock waiteventset.c:1192
WaitEventSetWait waiteventset.c:1140
WaitLatch latch.c:196
ConditionVariableTimedSleep condition_variable.c:165
ConditionVariableSleep condition_variable.c:100
process_concurrent_changes cluster.c:3042
rebuild_relation_finish_concurrent cluster.c:3303
rebuild_relation cluster.c:1121
cluster_rel cluster.c:731
process_single_relation cluster.c:2405
ExecRepack cluster.c:391
standard_ProcessUtility utility.c:864
ProcessUtility utility.c:525
PortalRunUtility pquery.c:1148
PortalRunMulti pquery.c:1306
PortalRun pquery.c:783
exec_simple_query postgres.c:1280
PostgresMain postgres.c:4779
BackendMain backend_startup.c:124
postmaster_child_launch launch_backend.c:268
BackendStartup postmaster.c:3598
ServerLoop postmaster.c:1713
PostmasterMain postmaster.c:1403
main main.c:231

Probably it is because

100000L, /* XXX Tune the delay. */

100 seconds is clearly too much.

For "v28-0006-Use-multiple-snapshots-to-copy-the-data.patch":

0007: crash with

TRAP: failed Assert("portal->portalSnapshot == GetActiveSnapshot()"),
File: "../src/backend/tcop/pquery.c", Line: 1169, PID: 178414
postgres: CIC_test: nkey postgres [local]
REPACK(ExceptionalCondition+0xbe)[0x5743f9a955bb]
postgres: CIC_test: nkey postgres [local] REPACK(+0x67fac4)[0x5743f98a7ac4]
postgres: CIC_test: nkey postgres [local] REPACK(+0x67fced)[0x5743f98a7ced]
postgres: CIC_test: nkey postgres [local]
REPACK(PortalRun+0x346)[0x5743f98a7107]
postgres: CIC_test: nkey postgres [local] REPACK(+0x6773bb)[0x5743f989f3bb]
postgres: CIC_test: nkey postgres [local]
REPACK(PostgresMain+0xc1c)[0x5743f98a4f58]
postgres: CIC_test: nkey postgres [local] REPACK(+0x6726c6)[0x5743f989a6c6]
postgres: CIC_test: nkey postgres [local]
REPACK(postmaster_child_launch+0x191)[0x5743f979678c]
postgres: CIC_test: nkey postgres [local] REPACK(+0x5755ca)[0x5743f979d5ca]
postgres: CIC_test: nkey postgres [local] REPACK(+0x572972)[0x5743f979a972]
postgres: CIC_test: nkey postgres [local]
REPACK(PostmasterMain+0x168a)[0x5743f979a225]
postgres: CIC_test: nkey postgres [local] REPACK(main+0x3a1)[0x5743f9662176]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x77f80402a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x77f80402a28b]
postgres: CIC_test: nkey postgres [local] REPACK(_start+0x25)[0x5743f9311eb5]

0008: pass

Best regards,
Mikhail.

#76Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#73)
Re: Adding REPACK [concurrently]

Hello, Antonin!

On Sat, Dec 13, 2025 at 8:39 PM Antonin Houska <ah@cybertec.at> wrote:

---

SpinLockAcquire(&shared->mutex);
valid = shared->sfs_valid;
SpinLockRelease(&shared->mutex);

Better to remember last_exported here to avoid any races/misses.

What races/misses exactly?

Just as a way to reduce the number of potential scenarios/states
between the parallel actors.

---

bool done;

bool exit_after_lsn_upto?

Not sure.

I think it should be named in some way to signal it is a request, not a report.

Also, should we add some kind of back pressure between building the
indexes/new heap and the amount of WAL we have accumulated?
But probably it is out of scope for the patch.

Do you mean that the decoding worker should be less active if the amount of
WAL doesn't grow too fast?

In the previous version (without the background worker) we had some kind of
back-pressure during the scan part (if too much WAL was delayed
because of us - we processed it).
But that is no longer true with a background worker. At the same time -
it never was true during the index building phase...

Best regards,
Mikhail.

#77Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#74)
Re: Adding REPACK [concurrently]

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2025-Dec-13, Antonin Houska wrote:

From 6279394135f2b693b6fffd174822509e0a067cbf Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Sat, 13 Dec 2025 19:27:18 +0100
Subject: [PATCH 4/6] Add CONCURRENTLY option to REPACK command.

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..a956892f42f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+	/*
+	 * Second, skip records which do not contain sufficient information for
+	 * the decoding.
+	 *
+	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+	 * when doing changes in the new table. Those changes should not be useful
+	 * for any other user (such as logical replication subscription) because
+	 * the new table will eventually be dropped (after REPACK CONCURRENTLY has
+	 * assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This does happen when 1) raw_heap_insert marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, 2) REPACK CONCURRENTLY
+				 * replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}

I'm confused as to the purpose of this addition. I took this whole
block out, and no tests seem to fail.

This is just an optimization, to avoid unnecessary decoding of data changes
that the output plugin would ignore anyway. Note that REPACK (CONCURRENTLY)
can generate a huge amount of WAL itself.

Moreover, some of the cases that
are being skipped because of this would already be skipped by code in
DecodeInsert / DecodeUpdate anyway.

By checking earlier I tried to avoid calling ReorderBufferProcessXid()
unnecessarily.

The case for XLOG_HEAP_DELETE seems
to have no effect (that is, the "return" there is never hit by any test
as far as I can tell.)

The current tests do not cover this, but it should be hit by backends
performing logical decoding unrelated to REPACK. The typical case is that a
WAL sender involved in logical replication reads a DELETE record that REPACK
(CONCURRENTLY) generated when replaying a DELETE statement on the new relation.

The reason I ask is that the line immediately below does this:

ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);

which means the Xid is tracked for snapshot building purposes. Which is
probably important, because of what the comment right below it says:

/*
* If we don't have snapshot or we are just fast-forwarding, there is no
* point in decoding data changes. However, it's crucial to build the base
* snapshot during fast-forward mode (as is done in
* SnapBuildProcessChange()) because we require the snapshot's xmin when
* determining the candidate catalog_xmin for the replication slot. See
* SnapBuildProcessRunningXacts().
*/

So what happens here is that we would skip processing the Xid of an xlog
record during snapshot-building, on the grounds that it doesn't contain
logical changes. I'm not sure this is okay.

I think I missed the fact that SnapBuildProcessChange() relies on
ReorderBufferProcessXid() having been called.

If we do indeed need this,
then perhaps it should be done after ReorderBufferProcessXid().

... and after SnapBuildProcessChange(). Thus the changes being discussed here
should be removed from the patch. I'll do that in the next version. Thanks.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#78Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#77)
Re: Adding REPACK [concurrently]

Antonin Houska <ah@cybertec.at> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

If we do indeed need this,
then perhaps it should be done after ReorderBufferProcessXid().

... and after SnapBuildProcessChange(). Thus the changes being discussed here
should be removed from the patch. I'll do that in the next version. Thanks.

Actually the check of XLH_DELETE_NO_LOGICAL should not be discarded. I think
it should be added to DecodeDelete().
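
Sketch of what I have in mind, XLH_DELETE_NO_LOGICAL being the flag that
0004 introduces:

static void
DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
{
    xl_heap_delete *xlrec = (xl_heap_delete *) XLogRecGetData(buf->record);

    /* Skip DELETEs that REPACK (CONCURRENTLY) flagged as not logical. */
    if (xlrec->flags & XLH_DELETE_NO_LOGICAL)
        return;

    /* ... existing decoding logic ... */
}

That way the skip only happens after heap_decode has already called
ReorderBufferProcessXid() and SnapBuildProcessChange().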

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#79Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#75)
Re: Adding REPACK [concurrently]

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Hello!

On Sat, Dec 13, 2025 at 7:45 PM Mihail Nikalayeu
<mihailnikalayeu@gmail.com> wrote:

Stress tests for REPACK concurrently in attachment.

To run:
ninja && meson test --suite setup && meson test --print-errorlogs
--suite amcheck *007*
ninja && meson test --suite setup && meson test --print-errorlogs
--suite amcheck *008*

Thanks for running the tests!

Results for v28:

Up to "v28-0005-Use-background-worker-to-do-logical-decoding.patch":

Technically it passes, but sometimes I saw 0% CPU usage for long
periods with stacks like these (it seems to happen more often for 0008):

...

Probably it is because

100000L, /* XXX Tune the delay. */

100 seconds is clearly too much.

I confused milliseconds with microseconds. Since I was only running the code
under a debugger, the long delays didn't appear to be a problem.

Instead of tuning the timeout, I'm thinking of introducing a condition
variable that signals WAL flushing.
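
Roughly this - a sketch only, with invented names:

/* in the shared state, initialized once: */
ConditionVariableInit(&shared->wal_flushed_cv);

/* decoding worker, instead of the fixed timeout: */
ConditionVariablePrepareToSleep(&shared->wal_flushed_cv);
while (GetFlushRecPtr(NULL) <= last_read_lsn)
    ConditionVariableSleep(&shared->wal_flushed_cv,
                           WAIT_EVENT_REPACK_WORKER_MAIN);
ConditionVariableCancelSleep();

plus a ConditionVariableBroadcast() wherever the WAL gets flushed.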

For "v28-0006-Use-multiple-snapshots-to-copy-the-data.patch":

0007: crash with

TRAP: failed Assert("portal->portalSnapshot == GetActiveSnapshot()"),
File: "../src/backend/tcop/pquery.c", Line: 1169, PID: 178414

Thanks. I'll check when I have time for this part.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#80Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Antonin Houska (#79)
Re: Adding REPACK [concurrently]

On 2026-Jan-05, Antonin Houska wrote:

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Probably it is because

100000L, /* XXX Tune the delay. */

100 seconds is clearly too much.

I confused milliseconds with microseconds. Since I was only running the code
under a debugger, the long delays didn't appear to be a problem.

Instead of tuning the timeout, I'm thinking of introducing a condition
variable that signals WAL flushing.

I think there is a patch that adds support for this in the queue already
-- see this message:
/messages/by-id/CAPpHfds-KiZRuCruc0jHxLSxLqzKcHJGwOFFA0b_RgaJvtUOEQ@mail.gmail.com

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"¿Qué importan los años? Lo que realmente importa es comprobar que
a fin de cuentas la mejor edad de la vida es estar vivo" (Mafalda)

#81Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#80)
Re: Adding REPACK [concurrently]

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2026-Jan-05, Antonin Houska wrote:

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Probably it is because

100000L, /* XXX Tune the delay. */

100 seconds is clearly too much.

I confused milliseconds with microseconds. Since I was only running the code
under a debugger, the long delays didn't appear to be a problem.

Instead of tuning the timeout, I'm thinking of introducing a condition
variable that signals WAL flushing.

I think there is a patch that adds support for this in the queue already
-- see this message:
/messages/by-id/CAPpHfds-KiZRuCruc0jHxLSxLqzKcHJGwOFFA0b_RgaJvtUOEQ@mail.gmail.com

Thanks for the hint! It seems that the already committed WaitForLSN() function
does what I need.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#82Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#78)
6 attachment(s)
Re: Adding REPACK [concurrently]

Antonin Houska <ah@cybertec.at> wrote:

Antonin Houska <ah@cybertec.at> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

If we do indeed need this,
then perhaps it should be done after ReorderBufferProcessXid().

... and after SnapBuildProcessChange(). Thus the changes being discussed here
should be removed from the patch. I'll do that in the next version. Thanks.

Actually the check of XLH_DELETE_NO_LOGICAL should not be discarded. I think
it should be added to DecodeDelete().

v29 tries to fix the problem.

Besides that, it reflects Mihail's recent comments [1]/messages/by-id/CADzfLwXp4c-MJx7yVDxAGNNxPbX4o9dqyivxavtHvmUsdXYqBQ@mail.gmail.com, [2]/messages/by-id/CADzfLwWNz_jwi7KVOmJ9D97+zwxsiwDSqSUUJ9oqUCOqkbGnRA@mail.gmail.com.

[1]: /messages/by-id/CADzfLwXp4c-MJx7yVDxAGNNxPbX4o9dqyivxavtHvmUsdXYqBQ@mail.gmail.com
[2]: /messages/by-id/CADzfLwWNz_jwi7KVOmJ9D97+zwxsiwDSqSUUJ9oqUCOqkbGnRA@mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

v29-0001-Add-REPACK-command.patchtext/x-diff; charset=utf-8Download
From 501f6365caef2f5b67b68715cf84245066933b85 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 8 Jan 2026 17:47:49 +0100
Subject: [PATCH 1/6] Add REPACK command
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.

This also adds pg_repackdb, a new utility that can invoke the new
commands.  This is heavily based on vacuumdb.

Author: Antonin Houska <ah@cybertec.at>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 +++++-
 doc/src/sgml/ref/allfiles.sgml           |   2 +
 doc/src/sgml/ref/cluster.sgml            |  97 +--
 doc/src/sgml/ref/clusterdb.sgml          |   5 +
 doc/src/sgml/ref/pg_repackdb.sgml        | 488 +++++++++++++
 doc/src/sgml/ref/repack.sgml             | 328 +++++++++
 doc/src/sgml/ref/vacuum.sgml             |  33 +-
 doc/src/sgml/reference.sgml              |   2 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  29 +-
 src/backend/commands/cluster.c           | 849 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   6 +-
 src/backend/parser/gram.y                |  86 ++-
 src/backend/tcop/utility.c               |  23 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  42 +-
 src/bin/scripts/Makefile                 |   3 +
 src/bin/scripts/meson.build              |   2 +
 src/bin/scripts/pg_repackdb.c            | 240 +++++++
 src/bin/scripts/t/103_repackdb.pl        |  47 ++
 src/bin/scripts/vacuuming.c              | 102 ++-
 src/bin/scripts/vacuuming.h              |   3 +
 src/include/commands/cluster.h           |   8 +-
 src/include/commands/progress.h          |  50 +-
 src/include/nodes/parsenodes.h           |  35 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 134 +++-
 src/test/regress/expected/rules.out      |  72 +-
 src/test/regress/sql/cluster.sql         |  70 +-
 src/tools/pgindent/typedefs.list         |   2 +
 33 files changed, 2484 insertions(+), 536 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 doc/src/sgml/ref/repack.sgml
 create mode 100644 src/bin/scripts/pg_repackdb.c
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 817fd9f4ca7..b07fe3294cd 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5609,7 +5617,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -6093,6 +6102,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index e167406c744..5df944d13ca 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
@@ -213,6 +214,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 0b47460080b..2cda711bc9f 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
   <title>Description</title>
 
   <para>
-   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
-   to cluster the table specified
-   by <replaceable class="parameter">table_name</replaceable>
-   based on the index specified by
-   <replaceable class="parameter">index_name</replaceable>. The index must
-   already have been defined on
-   <replaceable class="parameter">table_name</replaceable>.
+   The <command>CLUSTER</command> command is equivalent to
+   <xref linkend="sql-repack"/> with a <literal>USING INDEX</literal>
+   clause.  See there for more details.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
-
-  <para>
-   When a table is clustered, <productname>PostgreSQL</productname>
-   remembers which index it was clustered by.  The form
-   <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
-   reclusters the table using the same index as before.  You can also
-   use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
-   forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
-   future cluster operations, or to clear any previous setting.
-  </para>
-
-  <para>
-   <command>CLUSTER</command> without a
-   <replaceable class="parameter">table_name</replaceable> reclusters all the
-   previously-clustered tables in the current database that the calling user
-   has privileges for.  This form of <command>CLUSTER</command> cannot be
-   executed inside a transaction block.
-  </para>
+<!-- Do we need to describe exactly which options map to what?  They seem obvious to me. -->
 
-  <para>
-   When a table is being clustered, an <literal>ACCESS
-   EXCLUSIVE</literal> lock is acquired on it. This prevents any other
-   database operations (both reads and writes) from operating on the
-   table until the <command>CLUSTER</command> is finished.
-  </para>
  </refsect1>
 
  <refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 0d2051bf6f1..b50c9581a98 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -64,6 +64,11 @@ PostgreSQL documentation
    this utility and via other methods for accessing the server.
   </para>
 
+  <para>
+   <application>clusterdb</application> has been superseded by
+   <application>pg_repackdb</application>.
+  </para>
+
  </refsect1>
 
 
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..b313b54ab63
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,488 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.
+   <application>pg_repackdb</application> will also generate internal
+   statistics used by the <productname>PostgreSQL</productname> query
+   optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link> There
+   is no effective difference between repacking and analyzing databases via
+   this utility and via other methods for accessing the server.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--index<optional>=<replaceable class="parameter">index_name</replaceable></optional></option></term>
+      <listitem>
+       <para>
+        Pass the <literal>USING INDEX</literal> clause to <literal>REPACK</literal>,
+        and optionally the name of the index to use.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.  If a column name
+        list is given, only compute statistics for those columns.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When the <option>-a</option>/<option>--all</option> option is used, connect
+        to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
+
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..61d5c2cdef1
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,328 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING INDEX
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+    ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+
+<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
+
+    <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If a <literal>USING INDEX</literal> clause is specified, the rows are
+   physically reordered based on information from an index.  See
+   <xref linkend="sql-repack-notes-on-clustering"/> below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    If the <literal>USING INDEX</literal> clause is specified, the rows in
+    the table are physically reordered following an index: if an index name
+    is specified in the command, then that index is used; if no index name
+    is specified, then the index previously configured as the index to
+    cluster on is used.  If no index has been configured in this way, an
+    error is raised.  The index given in the <literal>USING INDEX</literal>
+    clause becomes the index to cluster on, just as an index given to
+    the <command>CLUSTER</command> command does.  An index can be set
+    manually using <command>ALTER TABLE ... CLUSTER ON</command>, and reset
+    with <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+   </para>
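+
+   <para>
+    As a minimal sketch (the table and index names here are purely
+    illustrative), one could configure the clustering index manually and
+    then repack following it:
+   </para>
+
+<programlisting>
+ALTER TABLE tab CLUSTER ON tab_idx;
+REPACK tab USING INDEX;
+-- stop treating the table as clustered
+ALTER TABLE tab SET WITHOUT CLUSTER;
+</programlisting>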
+
+   <para>
+    If no table name is specified in <command>REPACK USING INDEX</command>,
+    all tables that have a clustering index defined and for which the
+    calling user has the required privileges are processed.
+   </para>
+
+   <para>
+    Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
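+
+   <para>
+    For example (table and index names illustrative), leaving free space
+    on each page helps subsequent updates stay in cluster order:
+   </para>
+
+<programlisting>
+ALTER TABLE tab SET (fillfactor = 70);
+REPACK tab USING INDEX tab_idx;
+</programlisting>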
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using clustering.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index (if the index is a b-tree), or a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan or a sequential scan without sort is used, a temporary
+    copy of the table is created that contains the table data in the index
+    order.  Temporary copies of each index on the table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
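+
+   <para>
+    A sketch of that workaround, for a hypothetical table
+    <literal>tab</literal>:
+   </para>
+
+<programlisting>
+SET enable_sort = off;
+REPACK tab USING INDEX tab_idx;
+RESET enable_sort;
+</programlisting>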
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
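+
+   <para>
+    For instance, in the session that will run the command (the value
+    shown is only an example):
+   </para>
+
+<programlisting>
+SET maintenance_work_mem = '1GB';
+REPACK tab;
+</programlisting>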
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">column_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of a specific column to analyze. Defaults to all columns.
+      If a column list is specified, <literal>ANALYZE</literal> must also
+      be specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>ANALYZE</literal></term>
+    <term><literal>ANALYSE</literal></term>
+    <listitem>
+     <para>
+      Runs <xref linkend="sql-analyze"/> on the table after repacking.  This is
+      currently only supported when a single (non-partitioned) table is specified.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+  <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+  </para>
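+
+  <para>
+   For example, the progress of a running <command>REPACK</command> could
+   be watched with a query along these lines (only a few of the view's
+   columns are shown):
+  </para>
+
+<programlisting>
+SELECT pid, relid::regclass, phase, heap_blks_scanned, heap_blks_total
+FROM pg_stat_progress_repack;
+</programlisting>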
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the partition of that
+    index. <command>REPACK</command> on a partitioned table cannot be executed
+    inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> based on its
+   index <literal>employees_ind</literal> (since an index is used,
+   this is effectively clustering):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>cases</literal> in physical order, run
+   <command>ANALYZE</command> on the given columns once repacking is
+   done, and show informational messages:
+<programlisting>
+REPACK (ANALYZE, VERBOSE) cases (district, case_nr);
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables for which a clustering index has previously been
+   configured on which you have the <literal>MAINTAIN</literal> privilege,
+   showing informational messages:
+<programlisting>
+REPACK (VERBOSE) USING INDEX;
+</programlisting>
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="app-pgrepackdb"/></member>
+   <member><xref linkend="repack-progress-reporting"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 6d0fdd43cfb..ac5d083d468 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
-    FULL [ <replaceable class="parameter">boolean</replaceable> ]
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
     BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+    FULL [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
   <title>Parameters</title>
 
   <variablelist>
-   <varlistentry>
-    <term><literal>FULL</literal></term>
-    <listitem>
-     <para>
-      Selects <quote>full</quote> vacuum, which can reclaim more
-      space, but takes much longer and exclusively locks the table.
-      This method also requires extra disk space, since it writes a
-      new copy of the table and doesn't release the old copy until
-      the operation is complete.  Usually this should only be used when a
-      significant amount of space needs to be reclaimed from within the table.
-     </para>
-    </listitem>
-   </varlistentry>
-
    <varlistentry>
     <term><literal>FREEZE</literal></term>
     <listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FULL</literal></term>
+    <listitem>
+     <para>
+      This option, which is deprecated, makes <command>VACUUM</command>
+      behave like <command>REPACK</command> without a
+      <literal>USING INDEX</literal> clause.
+      This method of compacting the table takes much longer than
+      <command>VACUUM</command> and exclusively locks the table.
+      This method also requires extra disk space, since it writes a
+      new copy of the table and doesn't release the old copy until
+      the operation is complete.  Usually this should only be used when a
+      significant amount of space needs to be reclaimed from within the table.
+     </para>
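+
+     <para>
+      For example, under this option the following two commands behave
+      alike (the table name is illustrative):
+<programlisting>
+VACUUM (FULL) tab;
+REPACK tab;
+</programlisting>
+     </para>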
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 2cf02c37b17..5d9a8a25a02 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
@@ -258,6 +259,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 09a456e9966..778377b9866 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 43de42ce39e..5ee6389d39c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4077,7 +4077,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7553f31fef0..3f05ba3083a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1283,14 +1283,15 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
-CREATE VIEW pg_stat_progress_cluster AS
+CREATE VIEW pg_stat_progress_repack AS
     SELECT
         S.pid AS pid,
         S.datid AS datid,
         D.datname AS datname,
         S.relid AS relid,
         CASE S.param1 WHEN 1 THEN 'CLUSTER'
-                      WHEN 2 THEN 'VACUUM FULL'
+                      WHEN 2 THEN 'REPACK'
+                      WHEN 3 THEN 'VACUUM FULL'
                       END AS command,
         CASE S.param2 WHEN 0 THEN 'initializing'
                       WHEN 1 THEN 'seq scanning heap'
@@ -1301,15 +1302,35 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 6 THEN 'rebuilding index'
                       WHEN 7 THEN 'performing final cleanup'
                       END AS phase,
-        CAST(S.param3 AS oid) AS cluster_index_relid,
+        CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
         S.param6 AS heap_blks_total,
         S.param7 AS heap_blks_scanned,
         S.param8 AS index_rebuild_count
-    FROM pg_stat_get_progress_info('CLUSTER') AS S
+    FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+-- This view is the same as the one above, except that one column is renamed
+-- and 'REPACK' is never reported as a command name.
+CREATE VIEW pg_stat_progress_cluster AS
+    SELECT
+        pid,
+        datid,
+        datname,
+        relid,
+        CASE WHEN command IN ('CLUSTER', 'VACUUM FULL') THEN command
+             WHEN repack_index_relid = 0 THEN 'VACUUM FULL'
+             ELSE 'CLUSTER' END AS command,
+        phase,
+        repack_index_relid AS cluster_index_relid,
+        heap_tuples_scanned,
+        heap_tuples_written,
+        heap_blks_total,
+        heap_blks_scanned,
+        index_rebuild_count
+    FROM pg_stat_progress_repack;
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 60a4617a585..06f6dfc37a5 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1,7 +1,8 @@
 /*-------------------------------------------------------------------------
  *
  * cluster.c
- *	  CLUSTER a table on an index.  This is now also used for VACUUM FULL.
+ *	  CLUSTER a table on an index.  This is now also used for VACUUM FULL and
+ *	  REPACK.
  *
  * There is hardly anything left of Paul Brown's original implementation...
  *
@@ -67,27 +68,36 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  Oid relid, bool rel_is_index,
+											  MemoryContext permcxt);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
+static const char *RepackCommandAsString(RepackCommand cmd);
 
 
-/*---------------------------------------------------------------------------
- * This cluster code allows for clustering multiple tables at once. Because
+/*
+ * The repack code allows for processing multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
  * would be forced to acquire exclusive locks on all the tables being
  * clustered, simultaneously --- very likely leading to deadlock.
  *
- * To solve this we follow a similar strategy to VACUUM code,
- * clustering each relation in a separate transaction. For this to work,
- * we need to:
+ * To solve this we follow a similar strategy to VACUUM code, processing each
+ * relation in a separate transaction. For this to work, we need to:
+ *
  *	- provide a separate memory context so that we can pass information in
  *	  a way that survives across transactions
  *	- start a new transaction every time a new relation is clustered
@@ -98,197 +108,165 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *
  * The single-relation case does not have any such overhead.
  *
- * We also allow a relation to be specified without index.  In that case,
- * the indisclustered bit will be looked up, and an ERROR will be thrown
- * if there is no index with the bit set.
- *---------------------------------------------------------------------------
+ * We also allow a relation to be repacked following an index, but without
+ * naming a specific one.  In that case, the indisclustered bit will be
+ * looked up, and an ERROR will be thrown if no index has the bit set.
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+		else if (strcmp(opt->defname, "analyze") == 0 ||
+				 strcmp(opt->defname, "analyse") == 0)
+			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
 		else
 			ereport(ERROR,
-					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized %s option \"%s\"",
-							"CLUSTER", opt->defname),
-					 parser_errposition(pstate, opt->location)));
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("unrecognized %s option \"%s\"",
+						   RepackCommandAsString(stmt->command),
+						   opt->defname),
+					parser_errposition(pstate, opt->location));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
-			return;
-		}
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
+			return;				/* all done */
 	}
 
+	/*
+	 * Don't allow ANALYZE in the multiple-relation case for now.  Maybe we
+	 * can add support for this later.
+	 */
+	if (params.options & CLUOPT_ANALYZE)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
+
 	/*
 	 * By here, we know we are in a multi-table situation.  In order to avoid
 	 * holding locks for too long, we want to process each table in its own
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
-	}
+		Oid			relid;
+		bool		rel_is_index;
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
+		/*
+		 * If USING INDEX was specified, resolve the index name now and pass
+		 * it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * If no index name was specified when repacking a partitioned
+			 * table, punt for now.  Maybe we can improve this later.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, stmt->usingindex,
+											  stmt->indexname);
+			if (!OidIsValid(relid))
+				elog(ERROR, "unable to determine index to cluster on");
+			/* XXX is this the right place for this check? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
 
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
+		rtcs = get_tables_to_repack_partitioned(stmt->command,
+												relid, rel_is_index,
+												repack_context);
 
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
+	}
 
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+		{
+			CommitTransactionCommand();
+			continue;
+		}
+
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +282,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+			ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +304,8 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
-	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
+	pgstat_progress_update_param(PROGRESS_REPACK_COMMAND, cmd);
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -350,86 +326,38 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
 	 */
-	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
+	if (recheck &&
+		!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+							 params->options))
+		goto out;
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index. It
+	 * would work to use an index in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
 	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on a shared catalog",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
-	{
-		if (OidIsValid(indexOid))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot vacuum temporary tables of other sessions")));
-	}
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot run %s on temporary tables of other sessions",
+					   RepackCommandAsString(cmd)));
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -442,6 +370,24 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	else
 		index = NULL;
 
+	/*
+	 * When allow_system_table_mods is turned off, we disallow repacking a
+	 * catalog on a particular index unless that's already the clustered index
+	 * for that catalog.
+	 *
+	 * XXX We don't check for this in CLUSTER, because it's historically been
+	 * allowed.
+	 */
+	if (cmd != REPACK_COMMAND_CLUSTER &&
+		!allowSystemTableMods && OidIsValid(indexOid) &&
+		IsCatalogRelation(OldHeap) && !index->rd_index->indisclustered)
+		ereport(ERROR,
+				errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				errmsg("permission denied: \"%s\" is a system catalog",
+					   RelationGetRelationName(OldHeap)),
+				errdetail("System catalogs can only be clustered by the index they're already clustered on, if any, unless \"%s\" is enabled.",
+						  "allow_system_table_mods"));
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -469,7 +415,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +428,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide CLUSTER or on a partitioned table), because
+	 * there is another check in cluster() which will stop any attempt to
+	 * cluster remote temp tables by name.  There is another check in
+	 * cluster_rel which is redundant, but we leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +629,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +646,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (index != NULL)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -958,20 +962,20 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	/* Log what we're doing */
 	if (OldIndex != NULL && !use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using index scan on \"%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap),
-						RelationGetRelationName(OldIndex))));
+				errmsg("repacking \"%s.%s\" using index scan on \"%s\"",
+					   nspname,
+					   RelationGetRelationName(OldHeap),
+					   RelationGetRelationName(OldIndex)));
 	else if (use_sort)
 		ereport(elevel,
-				(errmsg("clustering \"%s.%s\" using sequential scan and sort",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" using sequential scan and sort",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 	else
 		ereport(elevel,
-				(errmsg("vacuuming \"%s.%s\"",
-						nspname,
-						RelationGetRelationName(OldHeap))));
+				errmsg("repacking \"%s.%s\" in physical order",
+					   nspname,
+					   RelationGetRelationName(OldHeap)));
 
 	/*
 	 * Hand off the actual copying to AM specific function, the generic code
@@ -1458,8 +1462,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1513,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,106 +1636,191 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process, when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given or not, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index to find those that have
+ * indisclustered set; if it was not given, scan pg_class and return all
+ * tables.
+ *
+ * Return it as a list of RelToCluster in the given memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand cmd, bool usingindex, MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
-	MemoryContext old_context;
+	HeapTuple	tuple;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
+
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
+			MemoryContext oldcxt;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			/*
+			 * Try to obtain a light lock on the index's table, to ensure it
+			 * doesn't go away while we collect the list.  If we cannot, just
+			 * disregard it.
+			 */
+			if (!ConditionalLockRelationOid(index->indrelid, AccessShareLock))
+				continue;
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(index->indrelid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(index->indrelid, AccessShareLock);
+				continue;
+			}
 
-		rtc = palloc_object(RelToCluster);
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			if (!cluster_is_permitted_for_relation(cmd, index->indrelid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc_object(RelToCluster);
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
+	}
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
+
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+			MemoryContext oldcxt;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
 
-		MemoryContextSwitchTo(old_context);
+			/* Verify that the table still exists */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			/* noisily skip rels which the user can't process */
+			if (!cluster_is_permitted_for_relation(cmd, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			oldcxt = MemoryContextSwitchTo(permcxt);
+			rtc = palloc_object(RelToCluster);
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+			MemoryContextSwitchTo(oldcxt);
+		}
 	}
-	table_endscan(scan);
 
-	relation_close(indRelation, AccessShareLock);
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
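
To make the two collection paths above concrete — a sketch only, with hypothetical
table names t1 (which has a clustered index) and t2 (which doesn't) — the
database-wide forms select different sets of relations:

    ALTER TABLE t1 CLUSTER ON t1_pkey;
    REPACK USING INDEX;   -- pg_index scan: only t1 (indisclustered) is picked up
    REPACK;               -- pg_class scan: t1, t2, and any matviews are picked up
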
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
- * all the children leaves tables/indexes.
+ * Given a partitioned table or its index, return a list of RelToCluster for
+ * all of its leaf tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
- * on the table containing the index.
+ * on the partitioned table itself.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation (false).
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, Oid relid,
+								 bool rel_is_index, MemoryContext permcxt)
 {
 	List	   *inhoids;
-	ListCell   *lc;
 	List	   *rtcs = NIL;
-	MemoryContext old_context;
 
-	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
-
-	foreach(lc, inhoids)
+	/*
+	 * Do not lock the children until they're processed.  Note that we do hold
+	 * a lock on the parent partitioned table.
+	 */
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
+	foreach_oid(child_oid, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			table_oid,
+					index_oid;
 		RelToCluster *rtc;
+		MemoryContext oldcxt;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(child_oid) != RELKIND_INDEX)
+				continue;
+
+			table_oid = IndexGetRelation(child_oid, false);
+			index_oid = child_oid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(child_oid) != RELKIND_RELATION)
+				continue;
+
+			table_oid = child_oid;
+			index_oid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
-		 * leaf partition despite having such privileges on the partitioned
-		 * table.  We skip any partitions which the user is not permitted to
-		 * CLUSTER.
+		 * leaf partition despite having them on the partitioned table.  Skip
+		 * if so.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, table_oid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
-
+		oldcxt = MemoryContextSwitchTo(permcxt);
 		rtc = palloc_object(RelToCluster);
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		rtc->tableOid = table_oid;
+		rtc->indexOid = index_oid;
 		rtcs = lappend(rtcs, rtc);
-
-		MemoryContextSwitchTo(old_context);
+		MemoryContextSwitchTo(oldcxt);
 	}
 
 	return rtcs;
@@ -1742,13 +1831,167 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
+
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   RepackCommandAsString(cmd),
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt that names a relation, resolve the name, obtain a lock
+ * on the relation, and then decide what to do based on its type: if it's a
+ * non-partitioned table, repack it as indicated (using an existing clustered
+ * index, or the one given), and return NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened and locked relcache entry, so that the caller
+ * can process the partitions using the multiple-table handling code.  In
+ * that case, if an index name was given, it's up to the caller to resolve
+ * it.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+
+	/*
+	 * Make sure ANALYZE is specified if a column list is present.
+	 */
+	if ((params->options & CLUOPT_ANALYZE) == 0 && stmt->relation->va_cols != NIL)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("ANALYZE option must be specified when a column list is provided"));
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		if (OidIsValid(indexOid))
+			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, rel, indexOid, params);
+
+		/* Do an analyze, if requested */
+		if (params->options & CLUOPT_ANALYZE)
+		{
+			VacuumParams vac_params = {0};
+
+			vac_params.options |= VACOPT_ANALYZE;
+			if (params->options & CLUOPT_VERBOSE)
+				vac_params.options |= VACOPT_VERBOSE;
+			analyze_rel(tableOid, NULL, vac_params,
+						stmt->relation->va_cols, true, NULL);
+		}
+
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the
+ * index to use for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes
+ * doesn't change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		/*
+		 * If USING INDEX with no name is given, find a clustered index, or
+		 * error out if none.
+		 */
+		indexOid = InvalidOid;
+		foreach_oid(idxoid, RelationGetIndexList(rel))
+		{
+			if (get_index_isclustered(idxoid))
+			{
+				indexOid = idxoid;
+				break;
+			}
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/* An index was specified; obtain its OID. */
+		indexOid = get_relname_relid(indexname, rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
+
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM FULL";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
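
As a usage sketch of determine_clustered_index() — object names here are
hypothetical — the three outcomes map to SQL as follows:

    REPACK t USING INDEX t_idx;  -- explicit name: resolve t_idx in t's schema, else error
    REPACK t USING INDEX;        -- no name: use t's indisclustered index, else error
    REPACK t;                    -- neither: InvalidOid, i.e. rewrite without ordering
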
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index aa4fbec143f..c363467a9cc 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -351,7 +351,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		}
 	}
 
-
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
@@ -2289,8 +2288,9 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 			if ((params.options & VACOPT_VERBOSE) != 0)
 				cluster_params.options |= CLUOPT_VERBOSE;
 
-			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			/* VACUUM FULL is a variant of REPACK; see cluster.c */
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 713ee5c10a2..54d37c10447 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -287,7 +287,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -304,7 +304,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -323,7 +323,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		opt_wait_with_clause
@@ -773,7 +773,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESPECT_P RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1035,7 +1035,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1109,6 +1108,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1146,6 +1146,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -12036,38 +12041,82 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ <name_list> ] [ USING INDEX [ <index_name> ] ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list vacuum_relation USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
 					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list vacuum_relation opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = (VacuumRelation *) $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $3;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
+					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $3;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -12075,20 +12124,25 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
-					n->relation = $5;
+					n->command = REPACK_COMMAND_CLUSTER;
+					n->relation = makeNode(VacuumRelation);
+					n->relation->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
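
For reference, a few statements the rules above accept (hypothetical object
names); every legacy CLUSTER spelling now builds a RepackStmt with
command = REPACK_COMMAND_CLUSTER:

    REPACK;
    REPACK (VERBOSE) tab;
    REPACK (ANALYZE) tab (a, b) USING INDEX idx;
    CLUSTER tab USING idx;       -- kept for compatibility
    CLUSTER idx ON tab;          -- pre-8.3 spelling, also kept
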
@@ -18127,6 +18181,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18764,6 +18819,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 34dd6e18df5..ca737b05115 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -279,9 +279,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -856,14 +856,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2865,10 +2865,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2876,6 +2872,13 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			if (((RepackStmt *) parsetree)->command == REPACK_COMMAND_CLUSTER)
+				tag = CMDTAG_CLUSTER;
+			else
+				tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
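
One observable consequence of the tag selection above, as a hedged example
(hypothetical names): the spelling the user chose determines the completion
tag, even though both paths run the same code:

    REPACK tab;              -- psql reports: REPACK
    CLUSTER tab USING idx;   -- psql reports: CLUSTER
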
@@ -3517,7 +3520,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 73ca0bb0b7f..ae7e5601b43 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -289,6 +289,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8b91bc00062..2a1bb47ff03 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1267,7 +1267,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES",
@@ -5086,6 +5086,46 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"(", "USING INDEX");
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"USING INDEX");
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	else if (Matches("REPACK", "(*)", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX") ||
+			 Matches("REPACK", "(*)", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	/*
+	 * Complete ... [ (*) ] <sth> USING INDEX, with a list of indexes for
+	 * <sth>.
+	 */
+	else if (TailMatches(MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("ANALYZE", "VERBOSE");
+		else if (TailMatches("ANALYZE", "VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index e6cd9ef4af5..f964a5803ca 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -23,6 +23,7 @@ PROGRAMS = \
 	dropdb \
 	dropuser \
 	pg_isready \
+	pg_repackdb \
 	reindexdb \
 	vacuumdb
 
@@ -39,6 +40,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
 vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
 install: all installdirs
 	$(INSTALL_PROGRAM) createdb$(X)   '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -49,6 +51,7 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+	$(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index c083ec38099..1e88bacebba 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -42,6 +42,7 @@ vacuuming_common = static_library('libvacuuming_common',
 
 binaries = [
   'vacuumdb',
+  'pg_repackdb',
 ]
 foreach binary : binaries
   binary_sources = files('@0@.c'.format(binary))
@@ -80,6 +81,7 @@ tests += {
       't/100_vacuumdb.pl',
       't/101_vacuumdb_all.pl',
       't/102_vacuumdb_stages.pl',
+      't/103_repackdb.pl',
       't/200_connstr.pl',
     ],
   },
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..2765d1e97b8
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,240 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ *		A utility to run REPACK
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * FIXME: this is missing a way to specify the index to use to repack one
+ * table, or whether to pass a USING INDEX clause when multiple tables are
+ * used.  Something like --index[=indexname].  Adding that bleeds into
+ * vacuuming.c as well.
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "vacuuming.h"
+
+static void help(const char *progname);
+static void check_objfilter(bits32 objfilter);
+
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"host", required_argument, NULL, 'h'},
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"no-password", no_argument, NULL, 'w'},
+		{"password", no_argument, NULL, 'W'},
+		{"echo", no_argument, NULL, 'e'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"dbname", required_argument, NULL, 'd'},
+		{"analyze", no_argument, NULL, 'z'},
+		{"all", no_argument, NULL, 'a'},
+		/* XXX this could be 'i', but optional_arg is messy */
+		{"index", optional_argument, NULL, 1},
+		{"table", required_argument, NULL, 't'},
+		{"verbose", no_argument, NULL, 'v'},
+		{"jobs", required_argument, NULL, 'j'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
+		{"maintenance-db", required_argument, NULL, 2},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	vacuumingOptions vacopts;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+	int			ret;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+	vacopts.mode = MODE_REPACK;
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwWz",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				vacopts.objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				vacopts.objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				vacopts.echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				vacopts.objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				vacopts.objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				vacopts.quiet = true;
+				break;
+			case 't':
+				vacopts.objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 'z':
+				vacopts.and_analyze = true;
+				break;
+			case 1:
+				vacopts.using_index = true;
+				if (optarg)
+					vacopts.indexname = pg_strdup(optarg);
+				else
+					vacopts.indexname = NULL;
+				break;
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		vacopts.objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter(vacopts.objfilter);
+
+	ret = vacuuming_main(&cparams, dbname, maintenance_db, &vacopts,
+						 &objects, tbl_count, concurrentCons,
+						 progname);
+	exit(ret);
+}
+
+/*
+ * Verify that the filters specified on the command line are compatible.
+ */
+static void
+check_objfilter(bits32 objfilter)
+{
+	if ((objfilter & OBJFILTER_ALL_DBS) &&
+		(objfilter & OBJFILTER_DATABASE))
+		pg_fatal("cannot repack all databases and a specific one at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA))
+		pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+
+	if ((objfilter & OBJFILTER_SCHEMA) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+		pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+}
+
+static void
+help(const char *progname)
+{
+	printf(_("%s repacks a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("      --index[=INDEX]             repack following the clustered index, or INDEX\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE[(COLUMNS)]'  repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -z, --analyze                   update optimizer statistics\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..cadce9b837c
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,47 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-t', 'pg_class'],
+	qr/statement: REPACK.*pg_class;/,
+	'pg_repackdb processes a single table');
+
+$node->safe_psql('postgres', 'CREATE USER testusr;
+	GRANT CREATE ON SCHEMA public TO testusr');
+$node->safe_psql('postgres',
+	'CREATE TABLE cluster_1 (a int primary key);
+	ALTER TABLE cluster_1 CLUSTER ON cluster_1_pkey;
+	CREATE TABLE cluster_2 (a int unique);
+	ALTER TABLE cluster_2 CLUSTER ON cluster_2_a_key;',
+	extra_params => ['-U' => 'testusr']);
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '-U', 'testusr' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--index'],
+	qr/statement: REPACK.*cluster_1 USING INDEX.*statement: REPACK.*cluster_2 USING INDEX/ms,
+	'pg_repackdb --index chooses multiple tables');
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres', '--analyze', '-t', 'cluster_1'],
+	qr/statement: REPACK \(ANALYZE\) public.cluster_1/,
+	'pg_repackdb --analyze works');
+
+done_testing();
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
index faac9089a01..0fa7be3c0c3 100644
--- a/src/bin/scripts/vacuuming.c
+++ b/src/bin/scripts/vacuuming.c
@@ -1,6 +1,6 @@
 /*-------------------------------------------------------------------------
  * vacuuming.c
- *		Helper routines for vacuumdb
+ *		Helper routines for vacuumdb and pg_repackdb
  *
  * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -194,6 +194,14 @@ vacuum_one_database(ConnParams *cparams,
 
 	conn = connectDatabase(cparams, progname, vacopts->echo, false, true);
 
+	if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		/* XXX arguably, here we should use VACUUM FULL instead of failing */
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
 	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
 	{
 		PQfinish(conn);
@@ -286,9 +294,18 @@ vacuum_one_database(ConnParams *cparams,
 		if (vacopts->mode == MODE_ANALYZE_IN_STAGES)
 			printf(_("%s: processing database \"%s\": %s\n"),
 				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
+		else if (vacopts->mode == MODE_ANALYZE)
+			printf(_("%s: analyzing database \"%s\"\n"),
+				   progname, PQdb(conn));
+		else if (vacopts->mode == MODE_VACUUM)
 			printf(_("%s: vacuuming database \"%s\"\n"),
 				   progname, PQdb(conn));
+		else
+		{
+			Assert(vacopts->mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
 		fflush(stdout);
 	}
 
@@ -640,6 +657,35 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
 								 " AND listed_objects.object_oid IS NOT NULL\n");
 	}
 
+	/*
+	 * In REPACK mode, if the 'using_index' option was given but no index
+	 * name, filter only tables that have an index with indisclustered set.
+	 * (If an index name is given, we trust the user to pass a reasonable list
+	 * of tables.)
+	 *
+	 * XXX it may be worth printing an error if an index name is given with no
+	 * list of tables.
+	 */
+	if (vacopts->mode == MODE_REPACK &&
+		vacopts->using_index && !vacopts->indexname)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND EXISTS (SELECT 1 FROM pg_catalog.pg_index\n"
+							 "    WHERE indrelid = c.oid AND indisclustered)\n");
+	}
+
+	/*
+	 * In REPACK mode, only consider the tables that the current user has
+	 * MAINTAIN privileges on.  XXX maybe we should do this in all cases, not
+	 * just REPACK; otherwise the vacuumdb output is needlessly noisy.
+	 */
+	if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND pg_catalog.has_table_privilege(current_user, "
+							 "c.oid, 'MAINTAIN')\n");
+	}
+
 	/*
 	 * If no tables were listed, filter for the relevant relation types.  If
 	 * tables were given via --table, don't bother filtering by relation type.
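
A sketch of the extra predicates that end up in the generated catalog query
(abbreviated; c is the pg_class alias the query builder uses):

    -- --index with no index name: only pre-clustered tables qualify
    AND EXISTS (SELECT 1 FROM pg_catalog.pg_index
                WHERE indrelid = c.oid AND indisclustered)
    -- every REPACK run: silently skip tables we couldn't process anyway
    AND pg_catalog.has_table_privilege(current_user, c.oid, 'MAINTAIN')
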
@@ -878,8 +924,10 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->verbose)
 				appendPQExpBufferStr(sql, " VERBOSE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
 	}
-	else
+	else if (vacopts->mode == MODE_VACUUM)
 	{
 		appendPQExpBufferStr(sql, "VACUUM");
 
@@ -993,9 +1041,39 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
 			if (vacopts->and_analyze)
 				appendPQExpBufferStr(sql, " ANALYZE");
 		}
+
+		appendPQExpBuffer(sql, " %s", table);
 	}
+	else if (vacopts->mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(sql, "REPACK");
 
-	appendPQExpBuffer(sql, " %s;", table);
+		if (vacopts->verbose)
+		{
+			appendPQExpBuffer(sql, "%sVERBOSE", sep);
+			sep = comma;
+		}
+		if (vacopts->and_analyze)
+		{
+			appendPQExpBuffer(sql, "%sANALYZE", sep);
+			sep = comma;
+		}
+
+		if (sep != paren)
+			appendPQExpBufferChar(sql, ')');
+
+		appendPQExpBuffer(sql, " %s", table);
+
+		if (vacopts->using_index)
+		{
+			appendPQExpBufferStr(sql, " USING INDEX");
+			if (vacopts->indexname)
+				appendPQExpBuffer(sql, " %s", fmtIdEnc(vacopts->indexname,
+													   PQclientEncoding(conn)));
+		}
+	}
+
+	appendPQExpBufferChar(sql, ';');
 }
 
 /*
@@ -1024,13 +1102,21 @@ run_vacuum_command(ParallelSlot *free_slot, vacuumingOptions *vacopts,
 	{
 		if (table)
 		{
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
 		}
 		else
 		{
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
+			if (vacopts->mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
 		}
 	}
 }
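
For reference, the REPACK branch of prepare_vacuum_command() above emits
statements of this shape (hypothetical options and names):

    REPACK (VERBOSE, ANALYZE) public.t USING INDEX t_idx;
    REPACK public.t USING INDEX;   -- --index given without a name
    REPACK public.t;               -- plain repack
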
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
index be4a75ef2f4..6e8abe00a4f 100644
--- a/src/bin/scripts/vacuuming.h
+++ b/src/bin/scripts/vacuuming.h
@@ -20,6 +20,7 @@
 typedef enum
 {
 	MODE_VACUUM,
+	MODE_REPACK,
 	MODE_ANALYZE,
 	MODE_ANALYZE_IN_STAGES
 } RunMode;
@@ -37,6 +38,8 @@ typedef struct vacuumingOptions
 	bool		and_analyze;
 	bool		full;
 	bool		freeze;
+	bool		using_index;
+	char	   *indexname;
 	bool		disable_page_skipping;
 	bool		skip_locked;
 	int			min_xid_age;
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 8ea81622f9d..28741988478 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
+						ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 359221dc296..f00e39b937d 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -73,28 +73,34 @@
 #define PROGRESS_ANALYZE_STARTED_BY_MANUAL			1
 #define PROGRESS_ANALYZE_STARTED_BY_AUTOVACUUM		2
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
-
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
-
-/* Commands of PROGRESS_CLUSTER */
-#define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
-#define PROGRESS_CLUSTER_COMMAND_VACUUM_FULL	2
+/*
+ * Progress parameters for REPACK.
+ *
+ * Values for PROGRESS_REPACK_COMMAND match those of the RepackCommand enum.
+ *
+ * Note: Since REPACK shares code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
 
 /* Progress parameters for CREATE INDEX */
 /* 3, 4 and 5 reserved for "waitfor" metrics */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index aac4bfc70d9..e651ed3a1ff 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3980,18 +3980,6 @@ typedef struct AlterSystemStmt
 	VariableSetStmt *setstmt;	/* SET subcommand */
 } AlterSystemStmt;
 
-/* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
- * ----------------------
- */
-typedef struct ClusterStmt
-{
-	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
-	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
-
 /* ----------------------
  *		Vacuum and Analyze Statements
  *
@@ -4004,7 +3992,7 @@ typedef struct VacuumStmt
 	NodeTag		type;
 	List	   *options;		/* list of DefElem nodes */
 	List	   *rels;			/* list of VacuumRelation, or NIL for all */
-	bool		is_vacuumcmd;	/* true for VACUUM, false for ANALYZE */
+	bool		is_vacuumcmd;	/* true for VACUUM, false otherwise */
 } VacuumStmt;
 
 /*
@@ -4022,6 +4010,27 @@ typedef struct VacuumRelation
 	List	   *va_cols;		/* list of column names, or NIL for all */
 } VacuumRelation;
 
+/* ----------------------
+ *		Repack Statement
+ * ----------------------
+ */
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER = 1,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
+{
+	NodeTag		type;
+	RepackCommand command;		/* type of command being run */
+	VacuumRelation *relation;	/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
+	List	   *params;			/* list of DefElem nodes */
+} RepackStmt;
+
 /* ----------------------
  *		Explain Statement
  *
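
How the surviving syntaxes populate the RepackStmt node above, as a quick
sketch with hypothetical names:

    REPACK t USING INDEX t_idx;  -- command=REPACK_COMMAND_REPACK, usingindex=true, indexname="t_idx"
    CLUSTER t USING t_idx;       -- command=REPACK_COMMAND_CLUSTER, usingindex=true
    VACUUM (FULL) t;             -- no RepackStmt at all: ExecVacuum calls
                                 -- cluster_rel(REPACK_COMMAND_VACUUMFULL, ...)
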
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index f7753c5c8a8..6f74a8c05c7 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -377,6 +377,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index 1290c9bab68..652dc61b834 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index 19f63b41431..b5f19dff52e 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..277854418fa 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +550,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
@@ -665,6 +702,101 @@ SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 (4 rows)
 
 COMMIT;
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+ERROR:  ANALYZE option must be specified when a column list is provided
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- had their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because there is nothing like clustering index here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index f4ee2bd7459..48461550636 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2002,34 +2002,23 @@ pg_stat_progress_basebackup| SELECT pid,
             ELSE NULL::text
         END AS backup_type
    FROM pg_stat_get_progress_info('BASEBACKUP'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20);
-pg_stat_progress_cluster| SELECT s.pid,
-    s.datid,
-    d.datname,
-    s.relid,
-        CASE s.param1
-            WHEN 1 THEN 'CLUSTER'::text
-            WHEN 2 THEN 'VACUUM FULL'::text
-            ELSE NULL::text
+pg_stat_progress_cluster| SELECT pid,
+    datid,
+    datname,
+    relid,
+        CASE
+            WHEN (command = ANY (ARRAY['CLUSTER'::text, 'VACUUM FULL'::text])) THEN command
+            WHEN (repack_index_relid = (0)::oid) THEN 'VACUUM FULL'::text
+            ELSE 'CLUSTER'::text
         END AS command,
-        CASE s.param2
-            WHEN 0 THEN 'initializing'::text
-            WHEN 1 THEN 'seq scanning heap'::text
-            WHEN 2 THEN 'index scanning heap'::text
-            WHEN 3 THEN 'sorting tuples'::text
-            WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
-            ELSE NULL::text
-        END AS phase,
-    (s.param3)::oid AS cluster_index_relid,
-    s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
-   FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
-     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+    phase,
+    repack_index_relid AS cluster_index_relid,
+    heap_tuples_scanned,
+    heap_tuples_written,
+    heap_blks_total,
+    heap_blks_scanned,
+    index_rebuild_count
+   FROM pg_stat_progress_repack;
 pg_stat_progress_copy| SELECT s.pid,
     s.datid,
     d.datname,
@@ -2089,6 +2078,35 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 1 THEN 'CLUSTER'::text
+            WHEN 2 THEN 'REPACK'::text
+            WHEN 3 THEN 'VACUUM FULL'::text
+            ELSE NULL::text
+        END AS command,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..c976823a3cb 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,7 +76,6 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
-
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -229,6 +228,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
@@ -313,6 +330,57 @@ EXPLAIN (COSTS OFF) SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 SELECT * FROM clstr_expression WHERE -a = -3 ORDER BY -a, b;
 COMMIT;
 
+----------------------------------------------------------------------
+--
+-- REPACK
+--
+----------------------------------------------------------------------
+
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking that it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+
+-- Verify partial analyze works
+REPACK (ANALYZE) clstr_tst (a);
+REPACK (ANALYZE) clstr_tst;
+REPACK (VERBOSE) clstr_tst (a);
+
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed too, because plain REPACK does not rely on a clustering index.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
 -- clean up
 DROP TABLE clustertest;
 DROP TABLE clstr_1;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 09e7f1d420e..c81d93d0e5a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2569,6 +2569,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
-- 
2.47.3

v29-0002-Refactor-index_concurrently_create_copy-for-use-with.patchtext/x-diffDownload
From b81b57dc89575223557a5c7d110d46f7b71e5d50 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 8 Jan 2026 17:47:50 +0100
Subject: [PATCH 2/6] Refactor index_concurrently_create_copy() for use with
 REPACK (CONCURRENTLY).

This patch moves the code to index_create_copy() and adds a "concurrently"
parameter so it can be used by REPACK (CONCURRENTLY).

With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of indexes
(including system catalog entries) *before* the actual swap, to reduce the
time for which AccessExclusiveLock needs to be held.
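
For illustration (not part of the patch), a sketch of how the two entry
points relate after this refactoring; the relation, tablespace and name
arguments are assumed to be at hand:

	/* catalog entries only, index to be built later (the old behavior,
	 * still used by REINDEX CONCURRENTLY) */
	newIndexId = index_concurrently_create_copy(heapRel, oldIndexId,
												tablespaceOid, newName);

	/* the same, but the index is also built immediately; REPACK
	 * (CONCURRENTLY) can use this for the new heap's indexes */
	newIndexId = index_create_copy(heapRel, oldIndexId, tablespaceOid,
								   newName, false);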
---
 src/backend/catalog/index.c      | 54 +++++++++++++++++++++++---------
 src/backend/commands/cluster.c   |  8 ++---
 src/backend/commands/indexcmds.c |  6 ++--
 src/backend/nodes/makefuncs.c    |  9 +++---
 src/include/catalog/index.h      |  3 ++
 src/include/nodes/makefuncs.h    |  4 ++-
 6 files changed, 57 insertions(+), 27 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5ee6389d39c..f8e6c3d804e 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1288,15 +1288,32 @@ index_create(Relation heapRelation,
 /*
  * index_concurrently_create_copy
  *
- * Create concurrently an index based on the definition of the one provided by
- * caller.  The index is inserted into catalogs and needs to be built later
- * on.  This is called during concurrent reindex processing.
- *
- * "tablespaceOid" is the tablespace to use for this index.
+ * Variant of index_create_copy(), called during concurrent reindex
+ * processing.
  */
 Oid
 index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							   Oid tablespaceOid, const char *newName)
+{
+	return index_create_copy(heapRelation, oldIndexId, tablespaceOid, newName,
+							 true);
+}
+
+/*
+ * index_create_copy
+ *
+ * Create an index based on the definition of the one provided by the caller.
+ * The index is inserted into catalogs.  If 'concurrently' is true, the index
+ * still needs to be built later on; otherwise it is built immediately.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ *
+ * The actual implementation of index_concurrently_create_copy(), reusable for
+ * other purposes.
+ */
+Oid
+index_create_copy(Relation heapRelation, Oid oldIndexId, Oid tablespaceOid,
+				  const char *newName, bool concurrently)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1315,6 +1332,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	List	   *indexColNames = NIL;
 	List	   *indexExprs = NIL;
 	List	   *indexPreds = NIL;
+	int			flags = 0;
 
 	indexRelation = index_open(oldIndexId, RowExclusiveLock);
 
@@ -1325,7 +1343,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	 * Concurrent build of an index with exclusion constraints is not
 	 * supported.
 	 */
-	if (oldInfo->ii_ExclusionOps != NULL)
+	if (oldInfo->ii_ExclusionOps != NULL && concurrently)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("concurrent index creation for exclusion constraints is not supported")));
@@ -1381,9 +1399,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	}
 
 	/*
-	 * Build the index information for the new index.  Note that rebuild of
-	 * indexes with exclusion constraints is not supported, hence there is no
-	 * need to fill all the ii_Exclusion* fields.
+	 * Build the index information for the new index.
 	 */
 	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
 							oldInfo->ii_NumIndexKeyAttrs,
@@ -1392,10 +1408,13 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							indexPreds,
 							oldInfo->ii_Unique,
 							oldInfo->ii_NullsNotDistinct,
-							false,	/* not ready for inserts */
-							true,
+							!concurrently,	/* isready */
+							concurrently,	/* concurrent */
 							indexRelation->rd_indam->amsummarizing,
-							oldInfo->ii_WithoutOverlaps);
+							oldInfo->ii_WithoutOverlaps,
+							oldInfo->ii_ExclusionOps,
+							oldInfo->ii_ExclusionProcs,
+							oldInfo->ii_ExclusionStrats);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1433,6 +1452,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 		stattargets[i].isnull = isnull;
 	}
 
+	if (concurrently)
+		flags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+
 	/*
 	 * Now create the new index.
 	 *
@@ -1456,7 +1478,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							  indcoloptions->values,
 							  stattargets,
 							  reloptionsDatum,
-							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT,
+							  flags,
 							  0,
 							  true, /* allow table to be a system catalog? */
 							  false,	/* is_internal? */
@@ -2450,7 +2472,8 @@ BuildIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   NULL, NULL, NULL);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2510,7 +2533,8 @@ BuildDummyIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   NULL, NULL, NULL);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 06f6dfc37a5..094f3d36047 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -70,8 +70,7 @@ typedef struct
 
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
 								Oid indexOid, Oid userid, int options);
-static void rebuild_relation(RepackCommand cmd,
-							 Relation OldHeap, Relation index, bool verbose);
+static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
@@ -415,7 +414,7 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(cmd, OldHeap, index, verbose);
+	rebuild_relation(OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -629,8 +628,7 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(RepackCommand cmd,
-				 Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 08c86cc163c..e837c308e27 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -243,7 +243,8 @@ CheckIndexCompatible(Oid oldId,
 	 */
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
-							  false, false, amsummarizing, isWithoutOverlaps);
+							  false, false, amsummarizing, isWithoutOverlaps,
+							  NULL, NULL, NULL);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -930,7 +931,8 @@ DefineIndex(ParseState *pstate,
 							  !concurrent,
 							  concurrent,
 							  amissummarizing,
-							  stmt->iswithoutoverlaps);
+							  stmt->iswithoutoverlaps,
+							  NULL, NULL, NULL);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 2caec621d73..ca7e21e8349 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,8 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps)
+			  bool withoutoverlaps, Oid *exclusion_ops, Oid *exclusion_procs,
+			  uint16 *exclusion_strats)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -863,9 +864,9 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_PredicateState = NULL;
 
 	/* exclusion constraints */
-	n->ii_ExclusionOps = NULL;
-	n->ii_ExclusionProcs = NULL;
-	n->ii_ExclusionStrats = NULL;
+	n->ii_ExclusionOps = exclusion_ops;
+	n->ii_ExclusionProcs = exclusion_procs;
+	n->ii_ExclusionStrats = exclusion_strats;
 
 	/* speculative inserts */
 	n->ii_UniqueOps = NULL;
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index b259c4141ed..3426087b445 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -99,6 +99,9 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid oldIndexId,
 										   Oid tablespaceOid,
 										   const char *newName);
+extern Oid	index_create_copy(Relation heapRelation, Oid oldIndexId,
+							  Oid tablespaceOid, const char *newName,
+							  bool concurrently);
 
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 982ec25ae14..dcea148ae1a 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -99,7 +99,9 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								List *expressions, List *predicates,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
-								bool summarizing, bool withoutoverlaps);
+								bool summarizing, bool withoutoverlaps,
+								Oid *exclusion_ops, Oid *exclusion_procs,
+								uint16 *exclusion_strats);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
-- 
2.47.3

v29-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patchtext/x-diffDownload
From 550abe2ab6bb76dd53bb8bd23ad2944956c7c8a1 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 8 Jan 2026 17:47:50 +0100
Subject: [PATCH 3/6] Move conversion of a "historic" to MVCC snapshot to a
 separate function.

The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
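
For illustration (not part of the patch), the intended call patterns; the
'snap' and 'historic' variables are assumed to hold snapshots obtained from
the snapshot builder:

	/* convert in place, as SnapBuildInitialSnapshot() now does */
	snap = SnapBuildMVCCFromHistoric(snap, true);

	/* or leave the source intact and get a standalone copy, allocated
	 * as a single chunk of memory */
	mvcc = SnapBuildMVCCFromHistoric(historic, false);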
---
 src/backend/replication/logical/snapbuild.c | 57 +++++++++++++++++----
 src/backend/utils/time/snapmgr.c            |  3 +-
 src/include/replication/snapbuild.h         |  1 +
 src/include/utils/snapmgr.h                 |  1 +
 4 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 7f79621b57e..95f230f8e9b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
-	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -485,7 +482,33 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
-	/* allocate in transaction context */
+	/* Convert the historic snapshot to an MVCC snapshot. */
+	return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike a regular (non-historic) MVCC snapshot, the 'xip' array of this
+ * snapshot contains not only running main transactions, but also their
+ * subtransactions. On the other hand, 'subxip' will usually be empty. This
+ * difference does not affect the result of XidInMVCCSnapshot() because it
+ * searches both in 'xip' and 'subxip'.
+ *
+ * Pass true for 'in_place' if modifying the source snapshot is acceptable.
+ * Pass false to get a new instance, allocated as a single chunk of memory,
+ * while the source snapshot is left intact.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+	TransactionId xid;
+	TransactionId *oldxip = snapshot->xip;
+	uint32		oldxcnt = snapshot->xcnt;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+	Snapshot	result;
+
 	newxip = palloc_array(TransactionId, GetMaxSnapshotXidCount());
 
 	/*
@@ -494,7 +517,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
 	{
 		void	   *test;
 
@@ -502,7 +525,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -519,11 +542,25 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snapshot->xcnt = newxcnt;
+	snapshot->xip = newxip;
+
+	if (in_place)
+		result = snapshot;
+	else
+	{
+		result = CopySnapshot(snapshot);
+
+		/* Restore the original values so the source is intact. */
+		snapshot->xip = oldxip;
+		snapshot->xcnt = oldxcnt;
+
+		/* newxip has been copied */
+		pfree(newxip);
+	}
+	result->snapshot_type = SNAPSHOT_MVCC;
 
-	return snap;
+	return result;
 }
 
 /*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 2e6197f5f35..3af1b366adf 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -604,7 +603,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
+Snapshot
 CopySnapshot(Snapshot snapshot)
 {
 	Snapshot	newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index ccded021433..34383dea776 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index b8c01a291a1..de824945f0b 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -63,6 +63,7 @@ extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
+extern Snapshot CopySnapshot(Snapshot snapshot);
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
-- 
2.47.3

v29-0004-Add-CONCURRENTLY-option-to-REPACK-command.patchtext/plainDownload
From d249930f3a44603ee03fff8be1f202061164c493 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 8 Jan 2026 17:47:50 +0100
Subject: [PATCH 4/6] Add CONCURRENTLY option to REPACK command.

The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock at the beginning and keep it until the end of the processing.)

This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(Supposedly, the swapping should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.

First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.

Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)

Since the logical decoding system, during its startup, waits until all the
transactions that already have an XID assigned have finished, there is a risk
of deadlock if a transaction that has already changed anything in the
database tries to acquire a conflicting lock on the table REPACK CONCURRENTLY
is working on. As an example, consider a transaction running a CREATE INDEX
command on the table that is being REPACKed CONCURRENTLY. On the other hand,
DML commands (INSERT, UPDATE, DELETE) are not a problem, as their locks do
not conflict with REPACK CONCURRENTLY.

The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before the logical decoding is set up and
lock it again afterwards. Such temporary unlocking would imply re-checking
whether the table still meets all the requirements for REPACK CONCURRENTLY.

Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires extra space for the new relation and index files (which
coexist with the old files for some time). In addition, the CONCURRENTLY
option might introduce a lag in releasing WAL segments for archiving /
recycling. This is due to the decoding of the data changes that applications
perform concurrently. When copying the table contents into the new file, we
check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table contents has been copied.) A
background worker might be a better approach for the decoding - let's
consider implementing it in the future.

The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.
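
In outline, the concurrent path looks roughly like this (a sketch only:
apart from repack_decode_concurrent_changes() and GetFlushRecPtr(), the
function names here are made up for illustration):

	/* initial copy, scanning the old heap with a historic snapshot */
	while (copy_next_tuple_to_new_heap())
	{
		/* decode periodically so the slot advances and WAL can be freed */
		end_of_wal = GetFlushRecPtr(NULL);
		if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
		{
			repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
			end_of_wal_prev = end_of_wal;
		}
	}

	/* catch-up: apply the captured changes, still holding only
	 * ShareUpdateExclusiveLock */
	apply_concurrent_changes();

	/* upgrade to AccessExclusiveLock, apply the remaining changes that
	 * arrived while we were waiting, then swap the files */
	apply_concurrent_changes();
	swap_relation_files();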
---
 doc/src/sgml/monitoring.sgml                  |   37 +-
 doc/src/sgml/mvcc.sgml                        |   12 +-
 doc/src/sgml/ref/repack.sgml                  |  129 +-
 src/Makefile                                  |    1 +
 src/backend/access/heap/heapam.c              |   34 +-
 src/backend/access/heap/heapam_handler.c      |  227 ++-
 src/backend/access/heap/rewriteheap.c         |    6 +-
 src/backend/catalog/system_views.sql          |   19 +-
 src/backend/commands/cluster.c                | 1647 +++++++++++++++--
 src/backend/commands/matview.c                |    2 +-
 src/backend/commands/tablecmds.c              |    1 +
 src/backend/commands/vacuum.c                 |   12 +-
 src/backend/meson.build                       |    1 +
 src/backend/replication/logical/decode.c      |   40 +-
 src/backend/replication/logical/snapbuild.c   |   21 +
 .../replication/pgoutput_repack/Makefile      |   32 +
 .../replication/pgoutput_repack/meson.build   |   18 +
 .../pgoutput_repack/pgoutput_repack.c         |  240 +++
 .../storage/lmgr/generate-lwlocknames.pl      |    2 +-
 src/backend/utils/time/snapmgr.c              |    3 +-
 src/bin/psql/tab-complete.in.c                |    4 +-
 src/include/access/heapam.h                   |    5 +-
 src/include/access/heapam_xlog.h              |    2 +
 src/include/access/tableam.h                  |   10 +
 src/include/commands/cluster.h                |   88 +-
 src/include/commands/progress.h               |   17 +-
 src/include/replication/snapbuild.h           |    1 +
 src/include/storage/lockdefs.h                |    4 +-
 src/include/utils/snapmgr.h                   |    2 +
 src/test/modules/injection_points/Makefile    |    3 +
 .../injection_points/expected/repack.out      |  113 ++
 .../expected/repack_toast.out                 |   64 +
 .../modules/injection_points/logical.conf     |    1 +
 src/test/modules/injection_points/meson.build |    4 +
 .../injection_points/specs/repack.spec        |  142 ++
 .../injection_points/specs/repack_toast.spec  |  105 ++
 src/test/regress/expected/rules.out           |   19 +-
 src/tools/pgindent/typedefs.list              |    5 +
 38 files changed, 2834 insertions(+), 239 deletions(-)
 create mode 100644 src/backend/replication/pgoutput_repack/Makefile
 create mode 100644 src/backend/replication/pgoutput_repack/meson.build
 create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
 create mode 100644 src/test/modules/injection_points/expected/repack.out
 create mode 100644 src/test/modules/injection_points/expected/repack_toast.out
 create mode 100644 src/test/modules/injection_points/logical.conf
 create mode 100644 src/test/modules/injection_points/specs/repack.spec
 create mode 100644 src/test/modules/injection_points/specs/repack_toast.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b07fe3294cd..ae56b09aeba 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6202,14 +6202,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+       <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of heap tuples written.
+       Number of heap tuples inserted.
        This counter only advances when the phase is
        <literal>seq scanning heap</literal>,
-       <literal>index scanning heap</literal>
-       or <literal>writing new heap</literal>.
+       <literal>index scanning heap</literal>,
+       <literal>writing new heap</literal>
+       or <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples updated.
+       This counter only advances when the phase is <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples deleted.
+       This counter only advances when the phase is <literal>catch-up</literal>.
       </para></entry>
      </row>
 
@@ -6290,6 +6311,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <command>REPACK</command> is currently writing the new heap.
      </entry>
     </row>
+    <row>
+     <entry><literal>catch-up</literal></entry>
+     <entry>
+       <command>REPACK CONCURRENTLY</command> is currently processing the DML
+       commands that other transactions executed during any of the preceding
+       phases.
+     </entry>
+    </row>
     <row>
      <entry><literal>swapping relation files</literal></entry>
      <entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+    TABLE</command></link>, and <command>REPACK</command> with
+    the <literal>CONCURRENTLY</literal> option, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the DDL command committed.  This will only be an
+    snapshot taken before the command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the DDL command started &mdash; any transaction that has done so
+    before the command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the DDL command until that transaction completes.
+    which would block the truncating or rewriting command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index 61d5c2cdef1..30c43c49069 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -28,6 +28,7 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
 
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
+    CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
 
@@ -54,7 +55,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
    processes every table and materialized view in the current database that
    the current user has the <literal>MAINTAIN</literal> privilege on. This
    form of <command>REPACK</command> cannot be executed inside a transaction
-   block.
+   block.  Also, this form is not allowed if
+   the <literal>CONCURRENTLY</literal> option is used.
   </para>
 
   <para>
@@ -67,7 +69,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
    When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
    is acquired on it. This prevents any other database operations (both reads
    and writes) from operating on the table until the <command>REPACK</command>
-   is finished.
+   is finished. If you want to keep the table accessible during the repacking,
+   consider using the <literal>CONCURRENTLY</literal> option.
   </para>
 
   <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -195,6 +198,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] USING
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>CONCURRENTLY</literal></term>
+    <listitem>
+     <para>
+      Allow other transactions to use the table while it is being repacked.
+     </para>
+
+     <para>
+      Internally, <command>REPACK</command> copies the contents of the table
+      (ignoring dead tuples) into a new file, sorted by the specified index,
+      and also creates a new file for each index. Then it swaps the old and
+      new files for the table and all the indexes, and deletes the old
+      files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+      sure that the old files do not change during the processing because the
+      changes would get lost due to the swap.
+     </para>
+
+     <para>
+      With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+      EXCLUSIVE</literal> lock is only acquired to swap the table and index
+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be brief. However, the time might still be noticeable if
+      many data changes have been made to the table while
+      <command>REPACK</command> was waiting for the lock: those changes must
+      be processed just before the files are swapped, while the
+      <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+     </para>
+
+     <para>
+      Note that <command>REPACK</command> with the
+      <literal>CONCURRENTLY</literal> option does not try to order the rows
+      inserted into the table after the repacking started. Also note
+      that <command>REPACK</command> might fail to complete due to DDL
+      commands executed on the table by other transactions during the
+      repacking.
+     </para>
+
+     <note>
+      <para>
+       In addition to the temporary space requirements explained in
+       <xref linkend="sql-repack-notes-on-resources"/>,
+       the <literal>CONCURRENTLY</literal> option can increase the use of
+       temporary space a bit further. The reason is that other transactions can
+       perform DML operations which cannot be applied to the new file until
+       <command>REPACK</command> has copied all the tuples from the old
+       file. Thus the tuples inserted into the old file during the copying are
+       also stored separately in a temporary file, so they can eventually be
+       applied to the new file.
+      </para>
+
+      <para>
+       Furthermore, the data changes performed during the copying are
+       extracted from the <link linkend="wal">write-ahead log</link> (WAL), and
+       this extraction (decoding) only takes place once a certain amount of WAL
+       has been written. Therefore, WAL removal can be delayed by this
+       threshold. Currently the threshold is equal to the value of
+       the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+       configuration parameter.
+      </para>
+     </note>
+
+     <para>
+      The <literal>CONCURRENTLY</literal> option cannot be used in the
+      following cases:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+          The table is <literal>UNLOGGED</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is partitioned.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is a system catalog or a <acronym>TOAST</acronym> table.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REPACK</command> is executed inside a transaction block.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+          configuration parameter is less than <literal>logical</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+         configuration parameter does not allow for creation of an additional
+         replication slot.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <warning>
+      <para>
+       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+       details.
+      </para>
+     </warning>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
 	interfaces \
 	backend/replication/libpqwalreceiver \
 	backend/replication/pgoutput \
+	backend/replication/pgoutput_repack \
 	fe_utils \
 	bin \
 	pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ad9d6338ec2..f1cdd5d1d87 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
-								  bool all_visible_cleared, bool new_all_visible_cleared);
+								  bool all_visible_cleared, bool new_all_visible_cleared,
+								  bool walLogical);
 #ifdef USE_ASSERT_CHECKING
 static void check_lock_if_inplace_updateable_rel(Relation relation,
 												 const ItemPointerData *otid,
@@ -2806,7 +2807,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 TM_Result
 heap_delete(Relation relation, const ItemPointerData *tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart, bool walLogical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -3053,7 +3054,8 @@ l1:
 	 * Compute replica identity tuple before entering the critical section so
 	 * we don't PANIC upon a memory allocation failure.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+	old_key_tuple = walLogical ?
+		ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
 
 	/*
 	 * If this is the first possibly-multixact-able operation in the current
@@ -3143,6 +3145,15 @@ l1:
 				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
 		}
 
+		/*
+		 * Unlike UPDATE, DELETE is decoded even if there is no old key, so it
+		 * does not help to clear both XLH_DELETE_CONTAINS_OLD_TUPLE and
+		 * XLH_DELETE_CONTAINS_OLD_KEY. Thus we need an extra flag.
+		 * TODO: Consider not decoding tuples w/o the old tuple/key instead.
+		 */
+		if (!walLogical)
+			xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
 		XLogBeginInsert();
 		XLogRegisterData(&xlrec, SizeOfHeapDelete);
 
@@ -3235,7 +3246,8 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false /* changingPart */ );
+						 &tmfd, false,	/* changingPart */
+						 true /* walLogical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3276,7 +3288,7 @@ TM_Result
 heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
-			TU_UpdateIndexes *update_indexes)
+			TU_UpdateIndexes *update_indexes, bool walLogical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -4169,7 +4181,8 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 walLogical);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4527,7 +4540,8 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, &lockmode, update_indexes);
+						 &tmfd, &lockmode, update_indexes,
+						 true /* walLogical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -8883,7 +8897,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool walLogical)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -8894,7 +8909,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
 	Page		page = BufferGetPage(newbuf);
-	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln) &&
+		walLogical;
 	bool		init;
 	int			bufflags;
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 778377b9866..3526b6adcb5 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+					   true);
 }
 
 
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
-						 tmfd, lockmode, update_indexes);
+						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	/*
@@ -685,13 +687,15 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
+								 Snapshot snapshot,
+								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
 								 double *tups_recently_dead)
 {
-	RewriteState rwstate;
+	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
 	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
+	bool		concurrent = snapshot != NULL;
+	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	values = palloc_array(Datum, natts);
 	isnull = palloc_array(bool, natts);
 
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
-								 *multi_cutoff);
+	/*
+	 * Initialize the rewrite operation.
+	 */
+	if (!concurrent)
+		rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+									 *xid_cutoff, *multi_cutoff);
 
 
 	/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 *
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+	 * the snapshot passed by the caller.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									snapshot ? snapshot : SnapshotAny,
+									NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap,
+									snapshot ? snapshot : SnapshotAny,
+									0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -837,70 +853,77 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
 		buf = hslot->buffer;
 
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+		/*
+		 * Regarding CONCURRENTLY, see the comments on MVCC snapshot above.
+		 */
+		if (!concurrent)
 		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				*tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
+			switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+			{
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead */
+					isdead = true;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+					*tups_recently_dead += 1;
+					/* fall through */
+				case HEAPTUPLE_LIVE:
+					/* Live or recently dead, must copy it */
+					isdead = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
 
-				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				*tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
+					/*
+					 * As long as we hold exclusive lock on the relation,
+					 * normally the only way to see this is if it was inserted
+					 * earlier in our own transaction.  However, it can happen
+					 * in system catalogs, since we tend to release write lock
+					 * before commit there. Give a warning if neither case
+					 * applies; but in any case we had better copy it.
+					 */
+					if (!is_system_catalog &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+						elog(WARNING, "concurrent insert in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as live */
+					isdead = false;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+					/*
+					 * Similar situation to INSERT_IN_PROGRESS case.
+					 */
+					if (!is_system_catalog &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+						elog(WARNING, "concurrent delete in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as recently dead */
+					*tups_recently_dead += 1;
+					isdead = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					isdead = false; /* keep compiler quiet */
+					break;
+			}
 
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		if (isdead)
-		{
-			*tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
+			if (isdead)
 			{
-				/* A previous recently-dead tuple is now known dead */
 				*tups_vacuumed += 1;
-				*tups_recently_dead -= 1;
+				/* heap rewrite module still needs to see it... */
+				if (rewrite_heap_dead_tuple(rwstate, tuple))
+				{
+					/* A previous recently-dead tuple is now known dead */
+					*tups_vacuumed += 1;
+					*tups_recently_dead -= 1;
+				}
+
+				continue;
 			}
-			continue;
 		}
 
 		*num_tuples += 1;
@@ -919,7 +942,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		{
 			const int	ct_index[] = {
 				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
-				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_INSERTED
 			};
 			int64		ct_val[2];
 
@@ -934,6 +957,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
+
+		/*
+		 * Process the WAL produced by the load, as well as by other
+		 * transactions, so that the replication slot can advance and WAL does
+		 * not pile up. Use wal_segment_size as a threshold so that we do not
+		 * introduce the decoding overhead too often.
+		 *
+		 * Of course, we must not apply the changes until the initial load has
+		 * completed.
+		 *
+		 * Note that our insertions into the new table should not be decoded
+		 * as we (intentionally) do not write the logical decoding specific
+		 * information to WAL.
+		 */
+		if (concurrent)
+		{
+			XLogRecPtr	end_of_wal;
+
+			end_of_wal = GetFlushRecPtr(NULL);
+			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+			{
+				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+				end_of_wal_prev = end_of_wal;
+			}
+		}
 	}
 
 	if (indexScan != NULL)
@@ -977,15 +1025,32 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
+
+			/*
+			 * Try to keep the amount of not-yet-decoded WAL small, like
+			 * above.
+			 */
+			if (concurrent)
+			{
+				XLogRecPtr	end_of_wal;
+
+				end_of_wal = GetFlushRecPtr(NULL);
+				if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+				{
+					repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+					end_of_wal_prev = end_of_wal;
+				}
+			}
 		}
 
 		tuplesort_end(tuplesort);
 	}
 
 	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	if (rwstate)
+		end_heap_rewrite(rwstate);
 
 	/* Clean up */
 	pfree(values);
@@ -2376,6 +2441,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
  * SET WITHOUT OIDS.
  *
  * So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate=NULL, use simple_heap_insert() instead of rewriting - in that
+ * case we still need to deform/form the tuple. TODO: Shouldn't we rename the
+ * function, as it might not do any rewrite?
  */
 static void
 reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2467,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
 
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	if (rwstate)
+		/* The heap rewrite module does the rest */
+		rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	else
+	{
+		/*
+		 * Insert tuple when processing REPACK CONCURRENTLY.
+		 *
+		 * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+		 * difficult to do the same in the catch-up phase (as the logical
+		 * decoding does not provide us with sufficient visibility
+		 * information). Thus we must use heap_insert() both during the
+		 * catch-up and here.
+		 *
+		 * The following is like simple_heap_insert() except that we pass the
+		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+		 * the relation files, it drops this relation, so no logical
+		 * replication subscription should need the data.
+		 */
+		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+					HEAP_INSERT_NO_LOGICAL, NULL);
+	}
 
 	heap_freetuple(copiedTuple);
 }
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index bae3a2da77a..45892da1e14 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		int			options = HEAP_INSERT_SKIP_FSM;
 
 		/*
-		 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
-		 * for the TOAST table are not logically decoded.  The main heap is
-		 * WAL-logged as XLOG FPI records, which are not logically decoded.
+		 * While rewriting the heap for REPACK, make sure data for the TOAST
+		 * table are not logically decoded.  The main heap is WAL-logged as
+		 * XLOG FPI records, which are not logically decoded.
 		 */
 		options |= HEAP_INSERT_NO_LOGICAL;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3f05ba3083a..d79eab5670c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1298,16 +1298,19 @@ CREATE VIEW pg_stat_progress_repack AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      WHEN 5 THEN 'catch-up'
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
-        S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param5 AS heap_tuples_inserted,
+        S.param6 AS heap_tuples_updated,
+        S.param7 AS heap_tuples_deleted,
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
@@ -1325,7 +1328,7 @@ CREATE VIEW pg_stat_progress_cluster AS
         phase,
         repack_index_relid AS cluster_index_relid,
         heap_tuples_scanned,
-        heap_tuples_written,
+        heap_tuples_inserted + heap_tuples_updated AS heap_tuples_written,
         heap_blks_total,
         heap_blks_scanned,
         index_rebuild_count
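
To illustrate the view changes above, here is a minimal libpq sketch (not
part of the patch) that polls the new counters from a second session; the
connection string and output format are placeholders:

    /* Sketch only: watch pg_stat_progress_repack while REPACK runs. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres"); /* placeholder */
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            exit(1);
        }

        res = PQexec(conn,
                     "SELECT phase, heap_tuples_inserted, heap_tuples_updated, "
                     "heap_tuples_deleted FROM pg_stat_progress_repack");
        for (int i = 0; i < PQntuples(res); i++)
            printf("phase=%s ins=%s upd=%s del=%s\n",
                   PQgetvalue(res, i, 0), PQgetvalue(res, i, 1),
                   PQgetvalue(res, i, 2), PQgetvalue(res, i, 3));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }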
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 094f3d36047..c3feb0c3de4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1,8 +1,23 @@
 /*-------------------------------------------------------------------------
  *
  * cluster.c
- *	  CLUSTER a table on an index.  This is now also used for VACUUM FULL and
- *	  REPACK.
+ *		Implementation of REPACK [CONCURRENTLY], also known as CLUSTER and
+ *		VACUUM FULL.
+ *
+ * There are two somewhat different ways to rewrite a table.  In non-
+ * concurrent mode, it's easy: take AccessExclusiveLock, create a new
+ * transient relation, copy the tuples over to the relfilenode of the new
+ * relation, swap the relfilenodes, then drop the old relation.
+ *
+ * In concurrent mode, we lock the table with only ShareUpdateExclusiveLock,
+ * then do an initial copy as above.  However, while the tuples are being
+ * copied, concurrent transactions could modify the table. To cope with those
+ * changes, we rely on logical decoding to obtain them from WAL.  The changes
+ * are accumulated in a tuplestore.  Once the initial copy is complete, we
+ * read the changes from the tuplestore and re-apply them on the new heap.
+ * Then we upgrade our ShareUpdateExclusiveLock to AccessExclusiveLock and
+ * swap the relfilenodes.  This way, the time we hold a strong lock on the
+ * table is much reduced, and the bloat is eliminated.
  *
  * There is hardly anything left of Paul Brown's original implementation...
  *
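
For orientation, here is a condensed outline of the concurrent path sketched
in the comment above, using only the function names introduced by this patch:

    /*
     * REPACK CONCURRENTLY, as driven by rebuild_relation():
     *
     *  1. setup_logical_decoding()             -- temporary replication slot
     *  2. SnapBuildInitialSnapshotForRepack()  -- historic snapshot for the copy
     *  3. copy_table_data()                    -- initial copy into the new heap
     *  4. rebuild_relation_finish_concurrent() -- build indexes, apply decoded
     *                                             changes, upgrade the lock,
     *                                             swap the relation files
     *  5. cleanup_logical_decoding()           -- drop the slot
     */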
@@ -26,6 +41,10 @@
 #include "access/toast_internals.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -33,6 +52,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
@@ -40,15 +60,21 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
+#include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -68,12 +94,62 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * The locators are used to avoid logical decoding of data that does not
+ * belong to the table we are repacking.
+ */
+static RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
+static RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+/*
+ * Information needed to apply concurrent data changes.
+ */
+typedef struct ChangeDest
+{
+	/* The relation the changes are applied to. */
+	Relation	rel;
+
+	/*
+	 * The following is needed to find the existing tuple if the change is
+	 * UPDATE or DELETE. 'ident_key' should have all the fields except for
+	 * 'sk_argument' initialized.
+	 */
+	Relation	ident_index;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+
+	/* Needed to update indexes of 'rel'. */
+	IndexInsertState *iistate;
+} ChangeDest;
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
-								Oid indexOid, Oid userid, int options);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+								Oid indexOid, Oid userid, LOCKMODE lmode,
+								int options);
+static void check_repack_concurrently_requirements(Relation rel);
+static void rebuild_relation(Relation OldHeap, Relation index, bool verbose,
+							 bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							bool verbose, bool *pSwapToastByContent,
-							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							bool verbose,
+							bool *pSwapToastByContent,
+							TransactionId *pFreezeXid,
+							MultiXactId *pCutoffMulti);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -81,13 +157,51 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 											  MemoryContext permcxt);
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+									 ChangeDest *dest);
+static void apply_concurrent_insert(Relation rel, HeapTuple tup,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+									HeapTuple tup_target,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target);
+static HeapTuple find_target_tuple(Relation rel, ChangeDest *dest,
+								   HeapTuple tup_key,
+								   TupleTableSlot *ident_slot);
+static void process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
+									   XLogRecPtr end_of_wal,
+									   ChangeDest *dest);
+static IndexInsertState *get_index_insert_state(Relation relation,
+												Oid ident_index_id,
+												Relation *ident_index_p);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+								  int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+											   Relation cl_index,
+											   LogicalDecodingContext *decoding_ctx,
+											   TransactionId frozenXid,
+											   MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
+										LOCKMODE lockmode,
+										bool isTopLevel,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
 static const char *RepackCommandAsString(RepackCommand cmd);
 
 
+#define REPL_PLUGIN_NAME   "pgoutput_repack"
+
 /*
  * The repack code allows for processing multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -117,6 +231,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	ClusterParams params = {0};
 	Relation	rel = NULL;
 	MemoryContext repack_context;
+	LOCKMODE	lockmode;
 	List	   *rtcs;
 
 	/* Parse option list */
@@ -127,6 +242,16 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		else if (strcmp(opt->defname, "analyze") == 0 ||
 				 strcmp(opt->defname, "analyse") == 0)
 			params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
+		else if (strcmp(opt->defname, "concurrently") == 0 &&
+				 defGetBoolean(opt))
+		{
+			if (stmt->command != REPACK_COMMAND_REPACK)
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("CONCURRENTLY option not supported for %s",
+							   RepackCommandAsString(stmt->command)));
+			params.options |= CLUOPT_CONCURRENT;
+		}
 		else
 			ereport(ERROR,
 					errcode(ERRCODE_SYNTAX_ERROR),
@@ -136,13 +261,25 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 					parser_errposition(pstate, opt->location));
 	}
 
+	/*
+	 * Determine the lock mode expected by cluster_rel().
+	 *
+	 * In the exclusive case, we obtain AccessExclusiveLock right away to
+	 * avoid lock-upgrade hazard in the single-transaction case. In the
+	 * CONCURRENTLY case, the AccessExclusiveLock will only be used at the end
+	 * CONCURRENTLY case, the AccessExclusiveLock is only taken at the end of
+	 * processing, supposedly for a very short time. Before taking it, we
+	 * unlock the relation temporarily, so there's no lock-upgrade hazard.
+	lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+		AccessExclusiveLock : ShareUpdateExclusiveLock;
+
 	/*
 	 * If a single relation is specified, process it and we're done ... unless
 	 * the relation is a partitioned table, in which case we fall through.
 	 */
 	if (stmt->relation != NULL)
 	{
-		rel = process_single_relation(stmt, &params);
+		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
 			return;				/* all done */
 	}
@@ -157,10 +294,29 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 				errmsg("cannot %s multiple tables", "REPACK (ANALYZE)"));
 
 	/*
-	 * By here, we know we are in a multi-table situation.  In order to avoid
-	 * holding locks for too long, we want to process each table in its own
-	 * transaction.  This forces us to disallow running inside a user
-	 * transaction block.
+	 * By here, we know we are in a multi-table situation.
+	 *
+	 * Concurrent processing is currently considered rather special (e.g. in
+	 * terms of resources consumed), so it is not performed in bulk.
+	 */
+	if (params.options & CLUOPT_CONCURRENT)
+	{
+		if (rel != NULL)
+		{
+			Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+					errhint("Consider running the command for individual partitions."));
+		}
+		else
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY requires an explicit table name"));
+	}
+
+	/*
+	 * In order to avoid holding locks for too long, we want to process each
+	 * table in its own transaction.  This forces us to disallow running
+	 * inside a user transaction block.
 	 */
 	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
@@ -243,7 +399,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		 * Open the target table, coping with the case where it has been
 		 * dropped.
 		 */
-		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		rel = try_table_open(rtc->tableOid, lockmode);
 		if (rel == NULL)
 		{
 			CommitTransactionCommand();
@@ -254,7 +410,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		/* Process this table */
-		cluster_rel(stmt->command, rel, rtc->indexOid, &params);
+		cluster_rel(stmt->command, rel, rtc->indexOid, &params, isTopLevel);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
@@ -283,22 +439,53 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
  * If indexOid is InvalidOid, the table will be rewritten in physical order
  * instead of index order.
  *
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
  * 'cmd' indicates which command is being executed, to be used for error
  * messages.
  */
 void
 cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-			ClusterParams *params)
+			ClusterParams *params, bool isTopLevel)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Relation	index;
+	LOCKMODE	lmode;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
-	Relation	index;
+	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
+
+	/*
+	 * The lock mode is AccessExclusiveLock for normal processing and
+	 * ShareUpdateExclusiveLock for concurrent processing (so that SELECT,
+	 * INSERT, UPDATE and DELETE commands work, but cluster_rel() cannot be
+	 * called concurrently for the same relation).
+	 */
+	lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+	/* There are specific requirements on concurrent processing. */
+	if (concurrent)
+	{
+		/*
+		 * Make sure we have no XID assigned, otherwise the call of
+		 * setup_logical_decoding() can cause a deadlock.
+		 *
+		 * The existence of a transaction block does not actually imply that
+		 * an XID has been assigned, but it very likely has. We might want to
+		 * check the result of GetCurrentTransactionIdIfAny() instead, but
+		 * that would be less clear from the user's perspective.
+		 */
+		PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
+
+		check_repack_concurrently_requirements(OldHeap);
+	}
 
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
@@ -324,10 +511,13 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 * If this is a single-transaction CLUSTER, we can skip these tests. We
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
+	 *
+	 * XXX move [some of] these comments to where the RECHECK flag is
+	 * determined?
 	 */
 	if (recheck &&
 		!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
-							 params->options))
+							 lmode, params->options))
 		goto out;
 
 	/*
@@ -342,6 +532,12 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 				errmsg("cannot run %s on a shared catalog",
 					   RepackCommandAsString(cmd)));
 
+	/*
+	 * The CONCURRENTLY case should have been rejected earlier because it does
+	 * not support system catalogs.
+	 */
+	Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
@@ -362,7 +558,7 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	if (OidIsValid(indexOid))
 	{
 		/* verify the index is good and lock it */
-		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, lmode);
 		/* also open it */
 		index = index_open(indexOid, NoLock);
 	}
@@ -397,7 +593,9 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
 		!RelationIsPopulated(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		if (index)
+			index_close(index, lmode);
+		relation_close(OldHeap, lmode);
 		goto out;
 	}
 
@@ -410,11 +608,34 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 * invalid, because we move tuples around.  Promote them to relation
 	 * locks.  Predicate locks on indexes will be promoted when they are
 	 * reindexed.
+	 *
+	 * During concurrent processing, the heap as well as its indexes stay in
+	 * operation, so we postpone this step until they are locked using
+	 * AccessExclusiveLock near the end of the processing.
 	 */
-	TransferPredicateLocksToHeapRelation(OldHeap);
+	if (!concurrent)
+		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	PG_TRY();
+	{
+		/*
+		 * For concurrent processing, make sure that our logical decoding
+		 * ignores data changes of other tables than the one we are
+		 * processing.
+		 */
+		if (concurrent)
+			begin_concurrent_repack(OldHeap);
+
+		rebuild_relation(OldHeap, index, verbose, concurrent);
+	}
+	PG_FINALLY();
+	{
+		if (concurrent)
+			end_concurrent_repack();
+	}
+	PG_END_TRY();
+
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -433,14 +654,14 @@ out:
  */
 static bool
 cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-					Oid userid, int options)
+					Oid userid, LOCKMODE lmode, int options)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 
 	/* Check that the user still has privileges for the relation */
 	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -454,7 +675,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -465,7 +686,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		 */
 		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 
@@ -476,7 +697,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
 			!get_index_isclustered(indexOid))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 	}
@@ -488,7 +709,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
  * Verify that the specified heap and index are valid to cluster on
  *
  * Side effect: obtains lock on the index.  The caller may
- * in some cases already have AccessExclusiveLock on the table, but
+ * in some cases already have a lock of the same strength on the table, but
  * not in all cases so we can't rely on the table-level lock for
  * protection here.
  */
@@ -617,18 +838,87 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	table_close(pg_index, RowExclusiveLock);
 }
 
+/*
+ * Check if the CONCURRENTLY option is legal for the relation.
+ */
+static void
+check_repack_concurrently_requirements(Relation rel)
+{
+	char		relpersistence,
+				replident;
+	Oid			ident_idx;
+
+	/* Data changes in system relations are not logically decoded. */
+	if (IsCatalogRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+	/*
+	 * reorderbuffer.c does not seem to handle processing of a TOAST relation
+	 * on its own.
+	 */
+	if (IsToastRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+	relpersistence = rel->rd_rel->relpersistence;
+	if (relpersistence != RELPERSISTENCE_PERMANENT)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+	/* With NOTHING, WAL does not contain the old tuple. */
+	replident = rel->rd_rel->relreplident;
+	if (replident == REPLICA_IDENTITY_NOTHING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("Relation \"%s\" has insufficient replication identity.",
+						 RelationGetRelationName(rel))));
+
+	/*
+	 * The identity index is not set if the replica identity is FULL, but a
+	 * PK might still exist in that case.
+	 */
+	ident_idx = RelationGetReplicaIndex(rel);
+	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+		ident_idx = rel->rd_pkindex;
+	if (!OidIsValid(ident_idx))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 (errhint("Relation \"%s\" has no identity index.",
+						  RelationGetRelationName(rel)))));
+}
+
+
 /*
  * rebuild_relation: rebuild an existing relation in index or physical order
  *
- * OldHeap: table to rebuild.
+ * OldHeap: table to rebuild.  See cluster_rel() for comments on the required
+ * lock strength.
+ *
  * index: index to cluster by, or NULL to rewrite in physical order.
  *
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock.
+ * (The function handles the lock upgrade if 'concurrent' is true.)
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -636,13 +926,38 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Oid			OIDNewHeap;
 	Relation	NewHeap;
 	char		relpersistence;
-	bool		is_system_catalog;
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
+	LogicalDecodingContext *decoding_ctx = NULL;
+	Snapshot	snapshot = NULL;
+#if USE_ASSERT_CHECKING
+	LOCKMODE	lmode;
+
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+	Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+	Assert(index == NULL || CheckRelationLockedByMe(index, lmode, false));
+#endif
+
+	if (concurrent)
+	{
+		/*
+		 * Prepare to capture the concurrent data changes.
+		 *
+		 * Note that this call waits for all transactions with an XID already
+		 * assigned to finish. If one of those transactions is waiting for a
+		 * lock that conflicts with ShareUpdateExclusiveLock on our table
+		 * (e.g. it runs CREATE INDEX), we can end up in a deadlock. It is not
+		 * clear whether avoiding that risk is worth unlocking/locking the
+		 * table (and its clustering index) and checking again whether it's
+		 * still eligible for REPACK CONCURRENTLY.
+		 */
+		decoding_ctx = setup_logical_decoding(tableOid);
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
-		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+		snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
+		PushActiveSnapshot(snapshot);
+	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
 	if (index != NULL)
@@ -650,7 +965,6 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
 
 	/*
 	 * Create the transient table that will receive the re-ordered data.
@@ -666,30 +980,54 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, decoding_ctx, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
+	/* The historic snapshot won't be needed anymore. */
+	if (snapshot)
+	{
+		PopActiveSnapshot();
+		UpdateActiveSnapshotCommandId();
+	}
 
-	/* Close relcache entries, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
-	if (index)
-		index_close(index, NoLock);
+	if (concurrent)
+	{
+		Assert(!swap_toast_by_content);
+		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+										   decoding_ctx,
+										   frozenXid, cutoffMulti);
 
-	/*
-	 * Close the new relation so it can be dropped as soon as the storage is
-	 * swapped. The relation is not visible to others, so no need to unlock it
-	 * explicitly.
-	 */
-	table_close(NewHeap, NoLock);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+		/* Done with decoding. */
+		cleanup_logical_decoding(decoding_ctx);
+	}
+	else
+	{
+		bool		is_system_catalog = IsSystemRelation(OldHeap);
+
+		/* Close relcache entries, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
+		if (index)
+			index_close(index, NoLock);
+
+		/*
+		 * Close the new relation so it can be dropped as soon as the storage
+		 * is swapped. The relation is not visible to others, so no need to
+		 * unlock it explicitly.
+		 */
+		table_close(NewHeap, NoLock);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+	}
 }
 
 
@@ -824,15 +1162,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Pass
+ * them iff concurrent processing is required.
+ *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
-				bool *pSwapToastByContent, TransactionId *pFreezeXid,
-				MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -849,6 +1191,10 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	int			elevel = verbose ? INFO : DEBUG2;
 	PGRUsage	ru0;
 	char	   *nspname;
+	bool		concurrent = snapshot != NULL;
+	LOCKMODE	lmode;
+
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
 
 	pg_rusage_init(&ru0);
 
@@ -877,7 +1223,7 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * will be held till end of transaction.
 	 */
 	if (OldHeap->rd_rel->reltoastrelid)
-		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, lmode);
 
 	/*
 	 * If both tables have TOAST tables, perform toast swap by content.  It is
@@ -886,7 +1232,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * swap by links.  This is okay because swap by content is only essential
 	 * for system catalogs, and we don't support schema changes for them.
 	 */
-	if (OldHeap->rd_rel->reltoastrelid && NewHeap->rd_rel->reltoastrelid)
+	if (OldHeap->rd_rel->reltoastrelid && NewHeap->rd_rel->reltoastrelid &&
+		!concurrent)
 	{
 		*pSwapToastByContent = true;
 
@@ -907,6 +1254,10 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 		 * follow the toast pointers to the wrong place.  (It would actually
 		 * work for values copied over from the old toast table, but not for
 		 * any values that we toast which were previously not toasted.)
+		 *
+		 * This would not work with CONCURRENTLY because we may need to delete
+		 * TOASTed tuples from the new heap. With this hack, we'd delete them
+		 * from the old heap.
 		 */
 		NewHeap->rd_toastoid = OldHeap->rd_rel->reltoastrelid;
 	}
@@ -982,7 +1333,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+									cutoffs.OldestXmin, snapshot,
+									decoding_ctx,
+									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
 									&tups_recently_dead);
@@ -991,7 +1344,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	*pFreezeXid = cutoffs.FreezeLimit;
 	*pCutoffMulti = cutoffs.MultiXactCutoff;
 
-	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+	/*
+	 * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+	 * In the CONCURRENTLY case, we need to set it again before applying the
+	 * concurrent changes.
+	 */
 	NewHeap->rd_toastoid = InvalidOid;
 
 	num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1449,14 +1806,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool swap_toast_by_content,
 				 bool check_constraints,
 				 bool is_internal,
+				 bool reindex,
 				 TransactionId frozenXid,
 				 MultiXactId cutoffMulti,
 				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
-	int			reindex_flags;
-	ReindexParams reindex_params = {0};
 	int			i;
 
 	/* Report that we are now swapping relation files */
@@ -1482,39 +1838,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (is_system_catalog)
 		CacheInvalidateCatalog(OIDOldHeap);
 
-	/*
-	 * Rebuild each index on the relation (but not the toast table, which is
-	 * all-new at this point).  It is important to do this before the DROP
-	 * step because if we are processing a system catalog that will be used
-	 * during DROP, we want to have its indexes available.  There is no
-	 * advantage to the other order anyway because this is all transactional,
-	 * so no chance to reclaim disk space before commit.  We do not need a
-	 * final CommandCounterIncrement() because reindex_relation does it.
-	 *
-	 * Note: because index_build is called via reindex_relation, it will never
-	 * set indcheckxmin true for the indexes.  This is OK even though in some
-	 * sense we are building new indexes rather than rebuilding existing ones,
-	 * because the new heap won't contain any HOT chains at all, let alone
-	 * broken ones, so it can't be necessary to set indcheckxmin.
-	 */
-	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
-	if (check_constraints)
-		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+	if (reindex)
+	{
+		int			reindex_flags;
+		ReindexParams reindex_params = {0};
 
-	/*
-	 * Ensure that the indexes have the same persistence as the parent
-	 * relation.
-	 */
-	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
-	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+		/*
+		 * Rebuild each index on the relation (but not the toast table, which
+		 * is all-new at this point).  It is important to do this before the
+		 * DROP step because if we are processing a system catalog that will
+		 * be used during DROP, we want to have its indexes available.  There
+		 * is no advantage to the other order anyway because this is all
+		 * transactional, so no chance to reclaim disk space before commit. We
+		 * do not need a final CommandCounterIncrement() because
+		 * reindex_relation does it.
+		 *
+		 * Note: because index_build is called via reindex_relation, it will
+		 * never set indcheckxmin true for the indexes.  This is OK even
+		 * though in some sense we are building new indexes rather than
+		 * rebuilding existing ones, because the new heap won't contain any
+		 * HOT chains at all, let alone broken ones, so it can't be necessary
+		 * to set indcheckxmin.
+		 */
+		reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+		if (check_constraints)
+			reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
 
-	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
-								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+		/*
+		 * Ensure that the indexes have the same persistence as the parent
+		 * relation.
+		 */
+		if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+		else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+
+		/* Report that we are now reindexing relations */
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
-	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+		reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+	}
 
 	/* Report that we are now doing clean up */
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1558,6 +1922,17 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	object.objectId = OIDNewHeap;
 	object.objectSubId = 0;
 
+	if (!reindex)
+	{
+		/*
+		 * Make sure the changes in pg_class are visible. This is especially
+		 * important if !swap_toast_by_content, so that the correct TOAST
+		 * relation is dropped. (reindex_relation(), which would otherwise do
+		 * the CommandCounterIncrement() for us, was not called in this case.)
+		 */
+		CommandCounterIncrement();
+	}
+
 	/*
 	 * The new relation is local to our transaction and we know nothing
 	 * depends on it, so DROP_RESTRICT should be OK.
@@ -1597,7 +1972,7 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 
 			/* Get the associated valid index to be renamed */
 			toastidx = toast_get_valid_index(newrel->rd_rel->reltoastrelid,
-											 NoLock);
+											 AccessExclusiveLock);
 
 			/* rename the toast table ... */
 			snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
@@ -1857,7 +2232,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * case, if an index name is given, it's up to the caller to resolve it.
  */
 static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+						ClusterParams *params)
 {
 	Relation	rel;
 	Oid			tableOid;
@@ -1866,13 +2242,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
-	/*
-	 * Find, lock, and check permissions on the table.  We obtain
-	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-	 * single-transaction case.
-	 */
+	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
-										AccessExclusiveLock,
+										lockmode,
 										0,
 										RangeVarCallbackMaintainsTable,
 										NULL);
@@ -1904,13 +2276,14 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 		return rel;
 	else
 	{
-		Oid			indexOid;
+		Oid			indexOid = InvalidOid;
 
 		indexOid = determine_clustered_index(rel, stmt->usingindex,
 											 stmt->indexname);
 		if (OidIsValid(indexOid))
-			check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
-		cluster_rel(stmt->command, rel, indexOid, params);
+			check_index_is_clusterable(rel, indexOid, lockmode);
+
+		cluster_rel(stmt->command, rel, indexOid, params, isTopLevel);
 
 		/* Do an analyze, if requested */
 		if (params->options & CLUOPT_ANALYZE)
@@ -1993,3 +2366,1067 @@ RepackCommandAsString(RepackCommand cmd)
 	}
 	return "???";
 }
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts, to set up logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that the TOAST table needs no attention here, as it's not scanned
+ * using a historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+	Oid			toastrelid;
+
+	/*
+	 * Avoid logical decoding of other relations by this backend. The lock we
+	 * have guarantees that the actual locator cannot be changed concurrently:
+	 * TRUNCATE needs AccessExclusiveLock.
+	 */
+	Assert(CheckRelationLockedByMe(rel, ShareUpdateExclusiveLock, false));
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
+
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+}
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+	/*
+	 * Restore normal function of (future) logical decoding for this backend.
+	 */
+	repacked_rel_locator.relNumber = InvalidOid;
+	repacked_rel_toast_locator.relNumber = InvalidOid;
+}
+
+/*
+ * Is this backend performing logical decoding on behalf of REPACK
+ * (CONCURRENTLY) ?
+ */
+bool
+am_decoding_for_repack(void)
+{
+	return OidIsValid(repacked_rel_locator.relNumber);
+}
+
+/*
+ * Does the WAL record contain a data change that this backend does not need
+ * to decode on behalf of REPACK (CONCURRENTLY)?
+ */
+bool
+change_useless_for_repack(XLogRecordBuffer *buf)
+{
+	XLogReaderState *r = buf->record;
+	RelFileLocator locator;
+
+	/* The TOAST locator should not be set unless the main one is. */
+	Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+		   OidIsValid(repacked_rel_locator.relNumber));
+
+	/*
+	 * Backends not involved in REPACK (CONCURRENTLY) should not do the
+	 * filtering.
+	 */
+	if (!am_decoding_for_repack())
+		return false;
+
+	/*
+	 * If the record does not contain block 0, it's probably not INSERT /
+	 * UPDATE / DELETE. In any case, we do not have enough information to
+	 * filter the change out.
+	 */
+	if (!XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL))
+		return false;
+
+	/*
+	 * Decode the change if it belongs to the table we are repacking, or if it
+	 * belongs to its TOAST relation.
+	 */
+	if (RelFileLocatorEquals(locator, repacked_rel_locator))
+		return false;
+	if (OidIsValid(repacked_rel_toast_locator.relNumber) &&
+		RelFileLocatorEquals(locator, repacked_rel_toast_locator))
+		return false;
+
+	/* Filter out changes of other tables. */
+	return true;
+}
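
The filter above is meant to be consulted by the decode layer before a change
is queued. A hypothetical sketch of such a call site (the actual decode.c
hunk is not shown here):

    /* Hypothetical caller in decode.c -- not the actual hunk. */
    static void
    DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
    {
        /* REPACK CONCURRENTLY only cares about its table and its TOAST. */
        if (change_useless_for_repack(buf))
            return;

        /* ... normal decoding of the insert follows ... */
    }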
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends do while we copy the
+ * existing data into the temporary table), nor persisted (it's easier to
+ * handle a crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid)
+{
+	Relation	rel;
+	TupleDesc	tupdesc;
+	LogicalDecodingContext *ctx;
+	RepackDecodingState *dstate = palloc0_object(RepackDecodingState);
+
+	/*
+	 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+	 * should never fire.
+	 */
+	Assert(!TransactionIdIsValid(GetTopTransactionIdIfAny()));
+
+	/*
+	 * A single backend should not execute multiple REPACK commands at a time,
+	 * so use PID to make the slot unique.
+	 */
+	snprintf(NameStr(dstate->slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+	/*
+	 * Check if we can use logical decoding.
+	 */
+	CheckSlotPermissions();
+	CheckLogicalDecodingRequirements();
+
+	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+	ReplicationSlotCreate(NameStr(dstate->slotname), true, RS_TEMPORARY,
+						  false, false, false);
+
+	/*
+	 * None of the prepare_write, do_write and update_progress callbacks is
+	 * useful for us.
+	 */
+	ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+									NIL,
+									true,
+									InvalidXLogRecPtr,
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+	/*
+	 * We have no control over setting fast_forward, so at least check it.
+	 */
+	Assert(!ctx->fast_forward);
+
+	DecodingContextFindStartpoint(ctx);
+
+	/* Some WAL records should have been read. */
+	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
+	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+				wal_segment_size);
+
+	/*
+	 * Setup structures to store decoded changes.
+	 */
+	dstate->relid = relid;
+	dstate->tstore = tuplestore_begin_heap(false, false,
+										   maintenance_work_mem);
+
+	/* Caller should already have the table locked. */
+	rel = table_open(relid, NoLock);
+	tupdesc = CreateTupleDescCopy(RelationGetDescr(rel));
+	dstate->tupdesc = tupdesc;
+	table_close(rel, NoLock);
+
+	/* Initialize the descriptor to store the changes ... */
+	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+	/* ... as well as the corresponding slot. */
+	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+											  &TTSOpsMinimalTuple);
+
+	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "logical decoding");
+
+	ctx->output_writer_private = dstate;
+	return ctx;
+}
+
+/*
+ * Retrieve tuple from ConcurrentChange structure.
+ *
+ * The input data starts with the structure but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+	HeapTupleData tup_data;
+	HeapTuple	result;
+	char	   *src;
+
+	/*
+	 * Ensure alignment before accessing the fields. (This is why we can't use
+	 * heap_copytuple() instead of this function.)
+	 */
+	src = change + offsetof(ConcurrentChange, tup_data);
+	memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+	memcpy(result, &tup_data, sizeof(HeapTupleData));
+	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+	src = change + SizeOfConcurrentChange;
+	memcpy(result->t_data, src, result->t_len);
+
+	return result;
+}
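
The memcpy() dance exists because the bytea payload is not guaranteed to be
suitably aligned for HeapTupleData. A minimal illustration of the same idiom,
with a hypothetical struct:

    /* Hypothetical example: read a header out of an unaligned buffer. */
    typedef struct DemoHeader
    {
        uint32      length;
        uint32      flags;
    } DemoHeader;

    static uint32
    demo_get_length(const char *unaligned_buf)
    {
        DemoHeader  hdr;

        /*
         * Dereferencing (DemoHeader *) unaligned_buf could fault on
         * strict-alignment platforms; copy the bytes out instead.
         */
        memcpy(&hdr, unaligned_buf, sizeof(DemoHeader));
        return hdr.length;
    }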
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+								 XLogRecPtr end_of_wal)
+{
+	RepackDecodingState *dstate;
+	ResourceOwner resowner_old;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+	resowner_old = CurrentResourceOwner;
+	CurrentResourceOwner = dstate->resowner;
+
+	PG_TRY();
+	{
+		while (ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			XLogSegNo	segno_new;
+			char	   *errm = NULL;
+			XLogRecPtr	end_lsn;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			/*
+			 * If a WAL segment boundary has been crossed, inform the decoding
+			 * system that the catalog_xmin can advance. (We could confirm
+			 * more often, but filling a single WAL segment should not take
+			 * much time.)
+			 */
+			end_lsn = ctx->reader->EndRecPtr;
+			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+			if (segno_new != repack_current_segment)
+			{
+				LogicalConfirmReceivedLocation(end_lsn);
+				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
+				repack_current_segment = segno_new;
+			}
+
+			CHECK_FOR_INTERRUPTS();
+		}
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
+/*
+ * Apply the changes accumulated in the tuplestore of 'dstate'.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
+{
+	Relation	rel = dest->rel;
+	TupleTableSlot *index_slot,
+			   *ident_slot;
+	HeapTuple	tup_old = NULL;
+
+	if (dstate->nchanges == 0)
+		return;
+
+	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+	/* A slot to fetch tuples from identity index. */
+	ident_slot = table_slot_create(rel, NULL);
+
+	while (tuplestore_gettupleslot(dstate->tstore, true, false,
+								   dstate->tsslot))
+	{
+		bool		shouldFree;
+		HeapTuple	tup_change,
+					tup,
+					tup_exist;
+		char	   *change_raw,
+				   *src;
+		ConcurrentChange change;
+		bool		isnull[1];
+		Datum		values[1];
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get the change from the single-column tuple. */
+		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+		Assert(!isnull[0]);
+
+		/* Make sure we access aligned data. */
+		change_raw = (char *) DatumGetByteaP(values[0]);
+		src = (char *) VARDATA(change_raw);
+		memcpy(&change, src, SizeOfConcurrentChange);
+
+		/*
+		 * Extract the tuple from the change. The tuple is copied here because
+		 * it might be assigned to 'tup_old', in which case it needs to
+		 * survive into the next iteration.
+		 */
+		tup = get_changed_tuple(src);
+
+		if (change.kind == CHANGE_UPDATE_OLD)
+		{
+			Assert(tup_old == NULL);
+			tup_old = tup;
+		}
+		else if (change.kind == CHANGE_INSERT)
+		{
+			Assert(tup_old == NULL);
+
+			apply_concurrent_insert(rel, tup, dest->iistate, index_slot);
+
+			pfree(tup);
+		}
+		else if (change.kind == CHANGE_UPDATE_NEW ||
+				 change.kind == CHANGE_DELETE)
+		{
+			HeapTuple	tup_key;
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+			{
+				tup_key = tup_old != NULL ? tup_old : tup;
+			}
+			else
+			{
+				Assert(tup_old == NULL);
+				tup_key = tup;
+			}
+
+			/*
+			 * Find the tuple to be updated or deleted.
+			 */
+			tup_exist = find_target_tuple(rel, dest, tup_key, ident_slot);
+			if (tup_exist == NULL)
+				elog(ERROR, "failed to find target tuple");
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+				apply_concurrent_update(rel, tup, tup_exist, dest->iistate,
+										index_slot);
+			else
+				apply_concurrent_delete(rel, tup_exist);
+
+			if (tup_old != NULL)
+			{
+				pfree(tup_old);
+				tup_old = NULL;
+			}
+
+			pfree(tup);
+		}
+		else
+			elog(ERROR, "unrecognized kind of change: %d", change.kind);
+
+		/*
+		 * If a change was applied now, increment CID for next writes and
+		 * update the snapshot so it sees the changes we've applied so far.
+		 */
+		if (change.kind != CHANGE_UPDATE_OLD)
+		{
+			CommandCounterIncrement();
+			UpdateActiveSnapshotCommandId();
+		}
+
+		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+		Assert(shouldFree);
+		pfree(tup_change);
+	}
+
+	tuplestore_clear(dstate->tstore);
+	dstate->nchanges = 0;
+
+	/* Cleanup. */
+	ExecDropSingleTupleTableSlot(index_slot);
+	ExecDropSingleTupleTableSlot(ident_slot);
+}
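
To make the pairing logic concrete, consider a hypothetical stream read from
the tuplestore (the change kinds are from this patch):

    /*
     *  CHANGE_INSERT     t1     -> apply_concurrent_insert(t1)
     *  CHANGE_UPDATE_OLD t2_old -> stashed in tup_old, nothing applied yet
     *  CHANGE_UPDATE_NEW t2_new -> key taken from t2_old,
     *                              apply_concurrent_update(t2_new)
     *  CHANGE_UPDATE_NEW t3_new -> no preceding UPDATE_OLD, so t3_new itself
     *                              provides the key
     *  CHANGE_DELETE     t4     -> key taken from t4, apply_concurrent_delete()
     *
     * After each applied change, CommandCounterIncrement() plus
     * UpdateActiveSnapshotCommandId() make it visible to the next lookup.
     */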
+
+static void
+apply_concurrent_insert(Relation rel, HeapTuple tup, IndexInsertState *iistate,
+						TupleTableSlot *index_slot)
+{
+	List	   *recheck;
+
+	/*
+	 * Like simple_heap_insert(), but make sure that the INSERT is not
+	 * logically decoded - see reform_and_rewrite_tuple() for more
+	 * information.
+	 */
+	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+				NULL);
+
+	/*
+	 * Update indexes. Functions used by the indexes may need an active
+	 * snapshot, which our caller should have set.
+	 */
+	ExecStoreHeapTuple(tup, index_slot, false);
+	recheck = ExecInsertIndexTuples(iistate->rri,
+									index_slot,
+									iistate->estate,
+									false,	/* update */
+									false,	/* noDupErr */
+									NULL,	/* specConflict */
+									NIL,	/* arbiterIndexes */
+									false	/* onlySummarizing */
+		);
+
+	/*
+	 * If recheck is required, it must have been performed on the source
+	 * relation by now. (All the logical changes we process here are already
+	 * committed.)
+	 */
+	list_free(recheck);
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+	LockTupleMode lockmode;
+	TM_FailureData tmfd;
+	TU_UpdateIndexes update_indexes;
+	TM_Result	res;
+	List	   *recheck;
+
+	/*
+	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
+	 * here.)
+	 *
+	 * Do it as in simple_heap_update(), except for 'wal_logical' and
+	 * 'wait'.
+	 */
+	res = heap_update(rel, &tup_target->t_self, tup,
+					  GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd, &lockmode, &update_indexes,
+					  false /* wal_logical */ );
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+	ExecStoreHeapTuple(tup, index_slot, false);
+
+	if (update_indexes != TU_None)
+	{
+		recheck = ExecInsertIndexTuples(iistate->rri,
+										index_slot,
+										iistate->estate,
+										true,	/* update */
+										false,	/* noDupErr */
+										NULL,	/* specConflict */
+										NIL,	/* arbiterIndexes */
+		/* onlySummarizing */
+										update_indexes == TU_Summarizing);
+		list_free(recheck);
+	}
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target)
+{
+	TM_Result	res;
+	TM_FailureData tmfd;
+
+	/*
+	 * Delete tuple from the new heap.
+	 *
+	 * Do it as in simple_heap_delete(), except for 'wal_logical' and
+	 * 'wait'.
+	 */
+	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+					  InvalidSnapshot, false,
+					  &tmfd,
+					  false,	/* no wait - only we are doing changes */
+					  false /* wal_logical */ );
+
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * The returned tuple, if any, is stored in 'ident_slot' and stays valid as
+ * long as the slot holds it; NULL is returned if no match is found.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
+				  TupleTableSlot *ident_slot)
+{
+	Relation	ident_index = dest->ident_index;
+	IndexScanDesc scan;
+	Form_pg_index ident_form;
+	int2vector *ident_indkey;
+	HeapTuple	result = NULL;
+
+	/* XXX no instrumentation for now */
+	scan = index_beginscan(rel, ident_index, GetActiveSnapshot(),
+						   NULL, dest->ident_key_nentries, 0);
+
+	/*
+	 * The scan key is passed in by the caller so it does not have to be
+	 * constructed multiple times. Its entries have all fields initialized,
+	 * except for sk_argument.
+	 */
+	index_rescan(scan, dest->ident_key, dest->ident_key_nentries, NULL, 0);
+
+	/* Info needed to retrieve key values from heap tuple. */
+	ident_form = ident_index->rd_index;
+	ident_indkey = &ident_form->indkey;
+
+	/* Use the incoming tuple to finalize the scan key. */
+	for (int i = 0; i < scan->numberOfKeys; i++)
+	{
+		ScanKey		entry;
+		bool		isnull;
+		int16		attno_heap;
+
+		entry = &scan->keyData[i];
+		attno_heap = ident_indkey->values[i];
+		entry->sk_argument = heap_getattr(tup_key,
+										  attno_heap,
+										  rel->rd_att,
+										  &isnull);
+		Assert(!isnull);
+	}
+	if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+	{
+		bool		shouldFree;
+
+		result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+		/* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+		Assert(!shouldFree);
+	}
+	index_endscan(scan);
+
+	return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
+						   XLogRecPtr end_of_wal, ChangeDest *dest)
+{
+	RepackDecodingState *dstate;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_CATCH_UP);
+
+	dstate = (RepackDecodingState *) decoding_ctx->output_writer_private;
+
+	repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+
+	if (dstate->nchanges == 0)
+		return;
+
+	apply_concurrent_changes(dstate, dest);
+}
+
+/*
+ * Initialize IndexInsertState for index specified by ident_index_id.
+ *
+ * While doing that, also return the identity index in *ident_index_p.
+ */
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id,
+					   Relation *ident_index_p)
+{
+	EState	   *estate;
+	int			i;
+	IndexInsertState *result;
+	Relation	ident_index = NULL;
+
+	result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+	estate = CreateExecutorState();
+
+	result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+	InitResultRelInfo(result->rri, relation, 0, 0, 0);
+	ExecOpenIndices(result->rri, false);
+
+	/*
+	 * Find the relcache entry of the identity index so that we spend no extra
+	 * effort to open / close it.
+	 */
+	for (i = 0; i < result->rri->ri_NumIndices; i++)
+	{
+		Relation	ind_rel;
+
+		ind_rel = result->rri->ri_IndexRelationDescs[i];
+		if (ind_rel->rd_id == ident_index_id)
+			ident_index = ind_rel;
+	}
+	if (ident_index == NULL)
+		elog(ERROR, "failed to open identity index");
+
+	/* Only initialize fields needed by ExecInsertIndexTuples(). */
+	result->estate = estate;
+
+	*ident_index_p = ident_index;
+	return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+	Relation	ident_idx_rel;
+	Form_pg_index ident_idx;
+	int			n,
+				i;
+	ScanKey		result;
+
+	Assert(OidIsValid(ident_idx_oid));
+	ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+	ident_idx = ident_idx_rel->rd_index;
+	n = ident_idx->indnatts;
+	result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+	for (i = 0; i < n; i++)
+	{
+		ScanKey		entry;
+		int16		relattno;
+		Form_pg_attribute att;
+		Oid			opfamily,
+					opcintype,
+					opno,
+					opcode;
+
+		entry = &result[i];
+		relattno = ident_idx->indkey.values[i];
+		if (relattno >= 1)
+		{
+			TupleDesc	desc;
+
+			desc = rel_src->rd_att;
+			att = TupleDescAttr(desc, relattno - 1);
+		}
+		else
+			elog(ERROR, "unexpected attribute number %d in index", relattno);
+
+		opfamily = ident_idx_rel->rd_opfamily[i];
+		opcintype = ident_idx_rel->rd_opcintype[i];
+		opno = get_opfamily_member(opfamily, opcintype, opcintype,
+								   BTEqualStrategyNumber);
+
+		if (!OidIsValid(opno))
+			elog(ERROR, "failed to find = operator for type %u", opcintype);
+
+		opcode = get_opcode(opno);
+		if (!OidIsValid(opcode))
+			elog(ERROR, "failed to find = operator for operator %u", opno);
+
+		/* Initialize everything but argument. */
+		ScanKeyInit(entry,
+					i + 1,
+					BTEqualStrategyNumber, opcode,
+					(Datum) NULL);
+		entry->sk_collation = att->attcollation;
+	}
+	index_close(ident_idx_rel, AccessShareLock);
+
+	*nentries = n;
+	return result;
+}
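
As an example of what this produces, assume a hypothetical identity index on
(id int8, ts timestamptz); the btree equality procedures named below follow
from the default opclasses:

    /*
     *  result[0]: sk_attno = 1, strategy = BTEqualStrategyNumber,
     *             sk_func = int8eq,         sk_argument = <unset>
     *  result[1]: sk_attno = 2, strategy = BTEqualStrategyNumber,
     *             sk_func = timestamptz_eq, sk_argument = <unset>
     *
     * find_target_tuple() fills in sk_argument from the decoded tuple before
     * each identity-index scan.
     */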
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+	ExecCloseIndices(iistate->rri);
+	FreeExecutorState(iistate->estate);
+	pfree(iistate->rri);
+	pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	ExecDropSingleTupleTableSlot(dstate->tsslot);
+	FreeTupleDesc(dstate->tupdesc_change);
+	FreeTupleDesc(dstate->tupdesc);
+	tuplestore_end(dstate->tstore);
+
+	FreeDecodingContext(ctx);
+
+	ReplicationSlotRelease();
+	ReplicationSlotDrop(NameStr(dstate->slotname), false);
+	pfree(dstate);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+								   Relation cl_index,
+								   LogicalDecodingContext *decoding_ctx,
+								   TransactionId frozenXid,
+								   MultiXactId cutoffMulti)
+{
+	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+	List	   *ind_oids_new;
+	Oid			old_table_oid = RelationGetRelid(OldHeap);
+	Oid			new_table_oid = RelationGetRelid(NewHeap);
+	List	   *ind_oids_old = RelationGetIndexList(OldHeap);
+	ListCell   *lc,
+			   *lc2;
+	char		relpersistence;
+	bool		is_system_catalog;
+	Oid			ident_idx_old,
+				ident_idx_new;
+	XLogRecPtr	wal_insert_ptr,
+				end_of_wal;
+	char		dummy_rec_data = '\0';
+	Relation   *ind_refs,
+			   *ind_refs_p;
+	int			nind;
+	ChangeDest	chgdst;
+
+	/* Like in cluster_rel(). */
+	lockmode_old = ShareUpdateExclusiveLock;
+	Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+	Assert(cl_index == NULL ||
+		   CheckRelationLockedByMe(cl_index, lockmode_old, false));
+	/* This is expected from the caller. */
+	Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+	ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+	/*
+	 * Unlike the exclusive case, we build new indexes for the new relation
+	 * rather than swapping the storage and reindexing the old relation. The
+	 * point is that the index build can take some time, so we do it before we
+	 * get AccessExclusiveLock on the old heap and therefore we cannot swap
+	 * the heap storage yet.
+	 *
+	 * index_create() will lock the new indexes using AccessExclusiveLock - no
+	 * need to change that. At the same time, we use ShareUpdateExclusiveLock
+	 * to lock the existing indexes - that should be enough to prevent others
+	 * from changing them while we're repacking the relation. The lock on the
+	 * table should prevent others from changing the index column list, but
+	 * might not be enough for commands like ALTER INDEX ... SET ... (Those
+	 * are not necessarily dangerous, but users may be confused if changes
+	 * they made get lost due to REPACK.)
+	 */
+	ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+	/*
+	 * Processing shouldn't start without a valid identity index.
+	 */
+	Assert(OidIsValid(ident_idx_old));
+
+	/* Find "identity index" on the new relation. */
+	ident_idx_new = InvalidOid;
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+
+		if (ident_idx_old == ind_old)
+		{
+			ident_idx_new = ind_new;
+			break;
+		}
+	}
+	if (!OidIsValid(ident_idx_new))
+
+		/*
+		 * Should not happen, given our lock on the old relation.
+		 */
+		ereport(ERROR,
+				(errmsg("identity index missing on the new relation")));
+
+	/* Gather information to apply concurrent changes. */
+	chgdst.rel = NewHeap;
+	chgdst.iistate = get_index_insert_state(NewHeap, ident_idx_new,
+											&chgdst.ident_index);
+	chgdst.ident_key = build_identity_key(ident_idx_new, OldHeap,
+										  &chgdst.ident_key_nentries);
+
+	/*
+	 * During testing, wait for another backend to perform concurrent data
+	 * changes which we will process below.
+	 */
+	INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+	/*
+	 * Flush all WAL records inserted so far (possibly except for the last
+	 * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+	 * we need to flush while holding exclusive lock on the source table.
+	 */
+	wal_insert_ptr = GetInsertRecPtr();
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply the concurrent changes for the first time, to minimize the time
+	 * we need to hold AccessExclusiveLock. (A substantial amount of WAL may
+	 * have been written during the data copying and index creation.)
+	 */
+	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+
+	/*
+	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+	 * is one), and all its indexes, so that we can swap the files.
+	 *
+	 * Before that, unlock the index temporarily to avoid deadlock in case
+	 * another transaction is trying to lock it while holding the lock on the
+	 * table.
+	 */
+	if (cl_index)
+	{
+		index_close(cl_index, ShareUpdateExclusiveLock);
+		cl_index = NULL;
+	}
+	/* For the same reason, unlock the TOAST relation. */
+	if (OldHeap->rd_rel->reltoastrelid)
+		UnlockRelationOid(OldHeap->rd_rel->reltoastrelid,
+						  ShareUpdateExclusiveLock);
+	/* Finally lock the table */
+	LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+	/*
+	 * Lock all indexes now, not only the clustering one: all indexes need to
+	 * have their files swapped. While doing that, store their relation
+	 * references in an array, to handle predicate locks below.
+	 */
+	ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+	nind = 0;
+	foreach_oid(ind_oid, ind_oids_old)
+	{
+		Relation	index;
+
+		index = index_open(ind_oid, AccessExclusiveLock);
+
+		/*
+		 * TODO 1) Do we need to check if ALTER INDEX was executed since the
+		 * new index was created in build_new_indexes()? 2) Specifically for
+		 * the clustering index, should check_index_is_clusterable() be called
+		 * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+		 * table probably blocks all commands that affect the result of
+		 * check_index_is_clusterable().)
+		 */
+		*ind_refs_p = index;
+		ind_refs_p++;
+		nind++;
+	}
+
+	/*
+	 * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+	 * lock is needed to swap the files.
+	 */
+	if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+	/*
+	 * Tuples and pages of the old heap will be gone, but the heap will stay.
+	 */
+	TransferPredicateLocksToHeapRelation(OldHeap);
+	/* The same for indexes. */
+	for (int i = 0; i < nind; i++)
+	{
+		Relation	index = ind_refs[i];
+
+		TransferPredicateLocksToHeapRelation(index);
+
+		/*
+		 * References to indexes on the old relation are not needed anymore;
+		 * however, the locks stay until the end of the transaction.
+		 */
+		index_close(index, NoLock);
+	}
+	pfree(ind_refs);
+
+	/*
+	 * Flush anything we see in WAL, to make sure that all changes committed
+	 * while we were waiting for the exclusive lock are available for
+	 * decoding. This should not be necessary if all backends had
+	 * synchronous_commit set, but we can't rely on this setting.
+	 *
+	 * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+	 * position, and GetLastImportantRecPtr() points at the start of the last
+	 * record rather than at the end. Thus the simplest way to determine the
+	 * insert position is to insert a dummy record and use its LSN.
+	 *
+	 * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+	 * last record (plus the total size of all the page headers the record
+	 * spans)?
+	 */
+	XLogBeginInsert();
+	XLogRegisterData(&dummy_rec_data, 1);
+	wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/* Apply the concurrent changes again. */
+	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+
+	/* Remember info about rel before closing OldHeap */
+	relpersistence = OldHeap->rd_rel->relpersistence;
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+	/*
+	 * Even ShareUpdateExclusiveLock should have prevented others from
+	 * creating / dropping indexes (even using the CONCURRENTLY option), so we
+	 * do not need to check whether the lists match.
+	 */
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+		Oid			mapped_tables[4];
+
+		/* Zero out possible results from swap_relation_files */
+		memset(mapped_tables, 0, sizeof(mapped_tables));
+
+		swap_relation_files(ind_old, ind_new,
+							(old_table_oid == RelationRelationId),
+							false,	/* swap_toast_by_content */
+							true,
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Concurrent processing is not supported for system relations, so
+		 * there should be no mapped tables.
+		 */
+		for (int i = 0; i < 4; i++)
+			Assert(mapped_tables[i] == 0);
+#endif
+	}
+
+	/* The new indexes must be visible for deletion. */
+	CommandCounterIncrement();
+
+	/* Close the old heap but keep lock until transaction commit. */
+	table_close(OldHeap, NoLock);
+	/* Close the new heap. (We didn't have to open its indexes). */
+	table_close(NewHeap, NoLock);
+
+	/* Clean up what we don't need anymore. (And close the identity index.) */
+	pfree(chgdst.ident_key);
+	free_index_insert_state(chgdst.iistate);
+
+	/*
+	 * Swap the relations and their TOAST relations and TOAST indexes. This
+	 * also drops the new relation and its indexes.
+	 *
+	 * (System catalogs are currently not supported.)
+	 */
+	Assert(!is_system_catalog);
+	finish_heap_swap(old_table_oid, new_table_oid,
+					 is_system_catalog,
+					 false,		/* swap_toast_by_content */
+					 false, true, false,
+					 frozenXid, cutoffMulti,
+					 relpersistence);
+}
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap. The contained indexes end
+ * up locked using ShareUpdateExclusiveLock.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. The order of items matches that of OldIndexes, so the two lists
+ * can be used to swap index storage.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+	List	   *result = NIL;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+	foreach_oid(ind_oid, OldIndexes)
+	{
+		Oid			ind_oid_new;
+		char	   *newName;
+		Relation	ind;
+
+		ind = index_open(ind_oid, ShareUpdateExclusiveLock);
+
+		newName = ChooseRelationName(get_rel_name(ind_oid),
+									 NULL,
+									 "repacknew",
+									 get_rel_namespace(ind->rd_index->indrelid),
+									 false);
+		ind_oid_new = index_create_copy(NewHeap, ind_oid,
+										ind->rd_rel->reltablespace, newName,
+										false);
+		result = lappend_oid(result, ind_oid_new);
+
+		index_close(ind, NoLock);
+	}
+
+	return result;
+}
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..ebc70f5bead 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -892,7 +892,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
-	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
 					 RecentXmin, ReadNextMultiXactId(), relpersistence);
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f976c0e5c7e..296387c7889 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6025,6 +6025,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 			finish_heap_swap(tab->relid, OIDNewHeap,
 							 false, false, true,
 							 !OidIsValid(tab->newTableSpace),
+							 true,
 							 RecentXmin,
 							 ReadNextMultiXactId(),
 							 persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index c363467a9cc..30a2b2578f7 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -126,7 +126,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 							  TransactionId lastSaneFrozenXid,
 							  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-					   BufferAccessStrategy bstrategy);
+					   BufferAccessStrategy bstrategy, bool isTopLevel);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -629,7 +629,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 
 			if (params.options & VACOPT_VACUUM)
 			{
-				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+								isTopLevel))
 					continue;
 			}
 
@@ -1999,7 +2000,7 @@ vac_truncate_clog(TransactionId frozenXID,
  */
 static bool
 vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-		   BufferAccessStrategy bstrategy)
+		   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	LOCKMODE	lmode;
 	Relation	rel;
@@ -2290,7 +2291,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 
 			/* VACUUM FULL is a variant of REPACK; see cluster.c */
 			cluster_rel(REPACK_COMMAND_VACUUMFULL, rel, InvalidOid,
-						&cluster_params);
+						&cluster_params, isTopLevel);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
@@ -2333,7 +2334,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
-		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+				   isTopLevel);
 	}
 
 	/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index de0a5fa1f7d..41f30a69dbd 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
 subdir('snowball')
 subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index e25dd6bc366..dc8c7be2aca 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
 #include "catalog/pg_control.h"
+#include "commands/cluster.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/message.h"
@@ -467,6 +468,15 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(buf->record);
 	SnapBuild  *builder = ctx->snapshot_builder;
 
+	/*
+	 * XXX Should we return here if change_useless_for_repack() returns true,
+	 * instead of calling the function below? Unlike the fast-forward case, we
+	 * shouldn't need the base snapshot for the containing transaction until
+	 * we receive a change that belongs to the table being REPACKed. Thus it
+	 * should be fine to skip SnapBuildProcessChange(), and therefore
+	 * reorderbuffer.c can create the transaction later.
+	 */
+
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
@@ -484,7 +494,8 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	{
 		case XLOG_HEAP_INSERT:
 			if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
-				!ctx->fast_forward)
+				!ctx->fast_forward &&
+				!change_useless_for_repack(buf))
 				DecodeInsert(ctx, buf);
 			break;
 
@@ -496,17 +507,31 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP_HOT_UPDATE:
 		case XLOG_HEAP_UPDATE:
 			if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
-				!ctx->fast_forward)
+				!ctx->fast_forward &&
+				!change_useless_for_repack(buf))
 				DecodeUpdate(ctx, buf);
 			break;
 
 		case XLOG_HEAP_DELETE:
 			if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
-				!ctx->fast_forward)
+				!ctx->fast_forward &&
+				!change_useless_for_repack(buf))
 				DecodeDelete(ctx, buf);
 			break;
 
 		case XLOG_HEAP_TRUNCATE:
+			/* Is REPACK (CONCURRENTLY) being run by this backend? */
+			if (am_decoding_for_repack())
+			{
+				/*
+				 * TRUNCATE changes rd_locator of the relation, so it'd break
+				 * REPACK (CONCURRENTLY). In fact it should not happen because
+				 * TRUNCATE needs AccessExclusiveLock on the table. Should we
+				 * only use Assert() here?
+				 */
+				ereport(ERROR,
+						(errmsg("TRUNCATE encountered while doing REPACK (CONCURRENTLY)")));
+			}
 			if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
 				!ctx->fast_forward)
 				DecodeTruncate(ctx, buf);
@@ -1021,6 +1046,15 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	xlrec = (xl_heap_delete *) XLogRecGetData(r);
 
+	/*
+	 * Ignore changes which are considered useless for logical decoding.
+	 * Currently such changes are created by REPACK (CONCURRENTLY) when it
+	 * replays DELETE commands on the new table (which is not yet visible to
+	 * other transactions).
+	 */
+	if (xlrec->flags & XLH_DELETE_NO_LOGICAL)
+		return;
+
 	/* only interested in our database */
 	XLogRecGetBlockTag(r, 0, &target_locator, NULL, NULL);
 	if (target_locator.dbOid != ctx->slot->data.database)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 95f230f8e9b..e238bcd73cd 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,6 +486,27 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	return SnapBuildMVCCFromHistoric(snap, true);
 }
 
+/*
+ * Build an MVCC snapshot for the initial data load performed by the REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot set up
+ * (so we do not set MyProc->xmin). XXX Do we need to add any further
+ * restrictions?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+	Snapshot	snap;
+
+	Assert(builder->state == SNAPBUILD_CONSISTENT);
+	Assert(builder->building_full_snapshot);
+
+	snap = SnapBuildBuildSnapshot(builder);
+	return SnapBuildMVCCFromHistoric(snap, false);
+}
+
 /*
  * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
  *
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+#    src/backend/replication/pgoutput_repack/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	$(WIN32RES) \
+	pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+	rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2025, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+  'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+  pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pgoutput_repack',
+    '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+  pgoutput_repack_sources,
+  kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..c8930640a0d
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,240 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ *		Logical Replication output plugin for REPACK command
+ *
+ * Copyright (c) 2012-2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+#include "utils/memutils.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+						   OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+							 ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+							  ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+						  Relation rel, ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+						 ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+	cb->startup_cb = plugin_startup;
+	cb->begin_cb = plugin_begin_txn;
+	cb->change_cb = plugin_change;
+	cb->commit_cb = plugin_commit_txn;
+	cb->shutdown_cb = plugin_shutdown;
+}
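+
+/*
+ * Usage sketch (an assumption, not enforced here): the REPACK code is
+ * expected to load this plugin by name when setting up logical decoding,
+ * e.g.
+ *
+ *   ctx = CreateInitDecodingContext("pgoutput_repack", NIL, true, ...);
+ *
+ * and to pass its RepackDecodingState via ctx->output_writer_private, which
+ * is what the callbacks below rely on.
+ */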
+
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+			   bool is_init)
+{
+	ctx->output_plugin_private = NULL;
+
+	/* Probably unnecessary, as we don't use the SQL interface ... */
+	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+	if (ctx->output_plugin_options != NIL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("this plugin does not expect any options")));
+	}
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * As we don't release the slot during the processing of a particular table,
+ * there's no room for an SQL interface, even for debugging purposes. Therefore
+ * we need neither OutputPluginPrepareWrite() nor OutputPluginWrite() in the
+ * plugin callbacks. (Although we might want to write custom callbacks, this
+ * API seems to be unnecessarily generic for our purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				  XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+			  Relation relation, ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Only interested in one particular relation. */
+	if (relation->rd_id != dstate->relid)
+		return;
+
+	/* Decode entry depending on its type */
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			{
+				HeapTuple	newtuple;
+
+				newtuple = change->data.tp.newtuple;
+
+				/*
+				 * Identity checks in the main function should have made this
+				 * impossible.
+				 */
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete insert info");
+
+				store_change(ctx, CHANGE_INSERT, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			{
+				HeapTuple	oldtuple,
+							newtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+				newtuple = change->data.tp.newtuple;
+
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete update info");
+
+				if (oldtuple != NULL)
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			{
+				HeapTuple	oldtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+
+				if (oldtuple == NULL)
+					elog(ERROR, "incomplete delete info");
+
+				store_change(ctx, CHANGE_DELETE, oldtuple);
+			}
+			break;
+		default:
+			/* Should not come here */
+			Assert(false);
+			break;
+	}
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+			 HeapTuple tuple)
+{
+	RepackDecodingState *dstate;
+	char	   *change_raw;
+	ConcurrentChange change;
+	bool		flattened = false;
+	Size		size;
+	Datum		values[1];
+	bool		isnull[1];
+	char	   *dst;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	size = VARHDRSZ + SizeOfConcurrentChange;
+
+	/*
+	 * ReorderBufferCommit() stores the TOAST chunks in its private memory
+	 * context and frees them after having called apply_change().  Therefore
+	 * we need a flat copy (including TOAST) that we eventually copy into the
+	 * memory context which is available to decode_concurrent_changes().
+	 */
+	if (HeapTupleHasExternal(tuple))
+	{
+		/*
+		 * toast_flatten_tuple_to_datum() might be more convenient but we
+		 * don't want the decompression it does.
+		 */
+		tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+		flattened = true;
+	}
+
+	size += tuple->t_len;
+	if (size >= MaxAllocSize)
+		elog(ERROR, "concurrent change is too big");
+
+	/* Construct the change. */
+	change_raw = (char *) palloc0(size);
+	SET_VARSIZE(change_raw, size);
+
+	/*
+	 * Since the varlena alignment might not be sufficient for the structure,
+	 * set the fields in a local instance and remember where it should
+	 * eventually be copied.
+	 */
+	change.kind = kind;
+	dst = (char *) VARDATA(change_raw);
+
+	/*
+	 * Copy the tuple.
+	 *
+	 * Note: change->tup_data.t_data must be fixed on retrieval!
+	 */
+	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+	memcpy(dst, &change, SizeOfConcurrentChange);
+	dst += SizeOfConcurrentChange;
+	memcpy(dst, tuple->t_data, tuple->t_len);
+
+	/* The data has been copied. */
+	if (flattened)
+		pfree(tuple);
+
+	/* Store as tuple of 1 bytea column. */
+	values[0] = PointerGetDatum(change_raw);
+	isnull[0] = false;
+	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+						 values, isnull);
+
+	/* Accounting. */
+	dstate->nchanges++;
+
+	/* Cleanup. */
+	pfree(change_raw);
+}
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index b49007167b0..2e7f1054e62 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
 
 die
   "$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
-  . " missing from lwlocklist.h"
+  . "missing from lwlocklist.h"
   if $lwlock_count < scalar @wait_event_lwlocks;
 
 die
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 3af1b366adf..fdf3427b43f 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -214,7 +214,6 @@ static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -659,7 +658,7 @@ CopySnapshot(Snapshot snapshot)
  * FreeSnapshot
  *		Free the memory associated with a snapshot.
  */
-static void
+void
 FreeSnapshot(Snapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 2a1bb47ff03..0ec0f4c4790 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -5121,8 +5121,8 @@ match_previous_words(int pattern_id,
 		 * one word, so the above test is correct.
 		 */
 		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
-			COMPLETE_WITH("ANALYZE", "VERBOSE");
-		else if (TailMatches("ANALYZE", "VERBOSE"))
+			COMPLETE_WITH("ANALYZE", "CONCURRENTLY", "VERBOSE");
+		else if (TailMatches("ANALYZE|CONCURRENTLY|VERBOSE"))
 			COMPLETE_WITH("ON", "OFF");
 	}
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ce48fac42ba..90cccea5bb5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -361,14 +361,15 @@ extern void heap_multi_insert(Relation relation, TupleTableSlot **slots,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
-							 TM_FailureData *tmfd, bool changingPart);
+							 TM_FailureData *tmfd, bool changingPart,
+							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
 extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
 extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
 							 HeapTuple newtup,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes);
+							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..f1f5495556b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
 #define XLH_DELETE_CONTAINS_OLD_KEY				(1<<2)
 #define XLH_DELETE_IS_SUPER						(1<<3)
 #define XLH_DELETE_IS_PARTITION_MOVE			(1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL					(1<<5)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLH_DELETE_CONTAINS_OLD						\
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e2ec5289d4d..76aa993009a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -629,6 +630,8 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
+											  Snapshot snapshot,
+											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1646,6 +1649,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ *	 (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1658,6 +1665,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
+								Snapshot snapshot,
+								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1666,6 +1675,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
+													snapshot, decoding_ctx,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 28741988478..6a5c476294a 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
 #ifndef CLUSTER_H
 #define CLUSTER_H
 
+#include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
 #include "storage/lock.h"
 #include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -25,6 +30,8 @@
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
 #define CLUOPT_ANALYZE 0x08		/* do an ANALYZE */
+#define CLUOPT_CONCURRENT 0x10	/* allow concurrent data changes */
+
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -33,10 +40,84 @@ typedef struct ClusterParams
 } ClusterParams;
 
 
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+typedef enum
+{
+	CHANGE_INSERT,
+	CHANGE_UPDATE_OLD,
+	CHANGE_UPDATE_NEW,
+	CHANGE_DELETE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+	/* See the enum above. */
+	ConcurrentChangeKind kind;
+
+	/*
+	 * The actual tuple.
+	 *
+	 * The tuple data follows the ConcurrentChange structure. Before use make
+	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
+	 * bytea) and that tuple->t_data is fixed.
+	 */
+	HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+								sizeof(HeapTupleData))
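+
+/*
+ * Illustration (a sketch of the retrieval side): after copying the bytea
+ * contents into properly aligned memory, a reader fixes the tuple pointer
+ * like this:
+ *
+ *   ConcurrentChange *change = (ConcurrentChange *) buf;
+ *
+ *   change->tup_data.t_data =
+ *       (HeapTupleHeader) ((char *) change + SizeOfConcurrentChange);
+ */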
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents is being copied to a new storage. Also the necessary metadata
+ * needed to apply these changes to the table is stored here.
+ */
+typedef struct RepackDecodingState
+{
+	/* The relation whose changes we're decoding. */
+	Oid			relid;
+
+	/* Replication slot name. */
+	NameData	slotname;
+
+	/*
+	 * Decoded changes are stored here. Although we try to avoid excessive
+	 * Decoded changes are stored here. Although we try to avoid excessively
+	 * large batches, it can happen that the changes need to be stored to
+	 * disk; the tuplestore does this transparently.
+	Tuplestorestate *tstore;
+
+	/* The current number of changes in tstore. */
+	double		nchanges;
+
+	/*
+	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
+	 * We can't store the tuple directly because tuplestore only supports
+	 * We can't store the tuple directly because tuplestore only supports
+	 * minimal tuples and we may need to transfer the OID system column from
+	 * the output plugin. Also we need to transfer the change kind, so it's
+	 * better
+	 * parallel".
+	 */
+	TupleDesc	tupdesc_change;
+
+	/* Tuple descriptor needed to update indexes. */
+	TupleDesc	tupdesc;
+
+	/* Slot to retrieve data from tstore. */
+	TupleTableSlot *tsslot;
+
+	ResourceOwner resowner;
+} RepackDecodingState;
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
-						ClusterParams *params);
+						ClusterParams *params, bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
@@ -48,8 +129,13 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 bool swap_toast_by_content,
 							 bool check_constraints,
 							 bool is_internal,
+							 bool reindex,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
 
+extern bool am_decoding_for_repack(void);
+extern bool change_useless_for_repack(XLogRecordBuffer *buf);
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+											 XLogRecPtr end_of_wal);
 #endif							/* CLUSTER_H */
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index f00e39b937d..4445724a463 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -86,10 +86,12 @@
 #define PROGRESS_REPACK_PHASE					1
 #define PROGRESS_REPACK_INDEX_RELID				2
 #define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED	4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED		5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED		6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		9
 
 /*
  * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -98,9 +100,10 @@
 #define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
 #define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
 #define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+#define PROGRESS_REPACK_PHASE_CATCH_UP			5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		8
 
 /* Progress parameters for CREATE INDEX */
 /* 3, 4 and 5 reserved for "waitfor" metrics */
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 34383dea776..5ee267d1c90 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index b73bb5618e6..3785b009808 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
 #define AccessShareLock			1	/* SELECT */
 #define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
 #define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL), ANALYZE, CREATE
-									 * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4	/* VACUUM (non-exclusive), ANALYZE, CREATE
+									 * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
 #define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
 #define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
 									 * SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index de824945f0b..0eb8ced76d3 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,8 @@ extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index a41d781f8c9..6b6a3b84a57 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -14,8 +14,11 @@ REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
 ISOLATION = basic \
 	    inplace \
+	    repack \
+	    repack_toast \
 	    syscache-update-pruned \
 	    heap_lock_update
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
 
 # The injection points are cluster-wide, so disable installcheck
 NO_INSTALLCHECK = 1
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing: 
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+
+step change_new: 
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1: 
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+
+step change_subxact2: 
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    2
+(1 row)
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/expected/repack_toast.out b/src/test/modules/injection_points/expected/repack_toast.out
new file mode 100644
index 00000000000..4f866a74e32
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack_toast.out
@@ -0,0 +1,64 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test;
+ <waiting ...>
+step change: 
+	UPDATE repack_test SET j=get_long_string() where i=2;
+	DELETE FROM repack_test WHERE i=3;
+	INSERT INTO repack_test(i, j) VALUES (4, get_long_string());
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    4
+(1 row)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..e3d257315fa
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index fcc85414515..bbf1f6e665c 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -45,11 +45,15 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'repack',
+      'repack_toast',
       'syscache-update-pruned',
       'heap_lock_update',
     ],
     'runningcheck': false, # see syscache-update-pruned
     # Some tests wait for all snapshots, so avoid parallel execution
     'runningcheck-parallel': false,
+    # 'repack' requires wal_level = 'logical'.
+    'regress_args': ['--temp-config', files('logical.conf')],
   },
 }
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..d727a9b056b
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,142 @@
+# REPACK (CONCURRENTLY) ... USING INDEX ...;
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+	INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j int);
+	CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that the relfilenode has changed.
+#
+# Have each session write the contents into a table and use FULL JOIN to
+# check whether the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+	SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice to test whether a tuple version generated by this
+# session can be found.
+step change_existing
+{
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key and
+# non-key columns.
+step change_new
+{
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change_existing
+	change_new
+	change_subxact1
+	change_subxact2
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/modules/injection_points/specs/repack_toast.spec b/src/test/modules/injection_points/specs/repack_toast.spec
new file mode 100644
index 00000000000..b48abf21450
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack_toast.spec
@@ -0,0 +1,105 @@
+# REPACK (CONCURRENTLY);
+#
+# Test handling of TOAST. This also exercises the code path that does not
+# use tuplesort.
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	-- Return a string that needs to be TOASTed.
+	CREATE FUNCTION get_long_string()
+	RETURNS text
+	LANGUAGE sql as $$
+		SELECT string_agg(chr(65 + trunc(25 * random())::int), '')
+		FROM generate_series(1, 2048) s(x);
+	$$;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j text);
+	INSERT INTO repack_test(i, j) VALUES (1, get_long_string()),
+		(2, get_long_string()), (3, get_long_string());
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j text);
+	CREATE TABLE data_s2(i int, j text);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+	DROP FUNCTION get_long_string();
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that the relfilenode has changed.
+#
+# Have each session write the contents into a table and use FULL JOIN to
+# check whether the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+	SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+step change
+{
+	UPDATE repack_test SET j=get_long_string() where i=2;
+	DELETE FROM repack_test WHERE i=3;
+	INSERT INTO repack_test(i, j) VALUES (4, get_long_string());
+}
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT c2.relfilenode
+	FROM pg_class c1 JOIN pg_class c2 ON c2.oid = c1.oid OR c2.oid = c1.reltoastrelid
+	WHERE c1.relname='repack_test';
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 48461550636..470920f0d16 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2014,7 +2014,7 @@ pg_stat_progress_cluster| SELECT pid,
     phase,
     repack_index_relid AS cluster_index_relid,
     heap_tuples_scanned,
-    heap_tuples_written,
+    (heap_tuples_inserted + heap_tuples_updated) AS heap_tuples_written,
     heap_blks_total,
     heap_blks_scanned,
     index_rebuild_count
@@ -2094,17 +2094,20 @@ pg_stat_progress_repack| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 5 THEN 'catch-up'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS repack_index_relid,
     s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param5 AS heap_tuples_inserted,
+    s.param6 AS heap_tuples_updated,
+    s.param7 AS heap_tuples_deleted,
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c81d93d0e5a..a0b7b38a5e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -419,6 +419,7 @@ CatCacheHeader
 CatalogId
 CatalogIdMapEntry
 CatalogIndexState
+ChangeDest
 ChangeVarNodes_callback
 ChangeVarNodes_context
 CheckPoint
@@ -495,6 +496,8 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
 ConditionalStack
@@ -1274,6 +1277,7 @@ IndexElem
 IndexFetchHeapData
 IndexFetchTableData
 IndexInfo
+IndexInsertState
 IndexList
 IndexOnlyScan
 IndexOnlyScanState
@@ -2570,6 +2574,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 RepackCommand
+RepackDecodingState
 RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
-- 
2.47.3

v29-0005-Use-background-worker-to-do-logical-decoding.patch
From 9a98478a375fcb73f4e057883480fbfe37b3b219 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 8 Jan 2026 17:47:50 +0100
Subject: [PATCH 5/6] Use background worker to do logical decoding.

If the backend performing REPACK (CONCURRENTLY) does both data copying and
logical decoding, it has to "travel in time" back and forth and therefore it
has to invalidate system caches quite a few times. (The copying and the
decoding work with different catalog snapshots.) As the decoding worker has
separate caches, the switching is not necessary.

Without the worker, it'd also be difficult to switch between potentially
long-running tasks like index build and WAL decoding. (If no decoding happens
during that time, archiving / recycling of WAL segments can be held back,
which in turn may result in a full disk.)

Another problem is that, after having acquired AccessExclusiveLock (in order
to swap the files), the backend needs to both decode and apply the data
changes that took place while it was waiting for the lock. With the decoding
worker, the decoding runs all the time, so the backend only needs to apply the
changes. This can reduce the time the exclusive lock is held.

Note that the code added to handle ERRORs in the background worker almost
duplicates the existing code that does the same for other types of workers
(see ProcessParallelMessages() and ProcessParallelApplyMessages()).
Refactoring the existing code might be worthwhile to reduce the duplication.
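
To illustrate the handoff between the backend and the decoding worker, here
is a minimal sketch of the synchronization protocol, reduced to plain
pthreads. (The patch itself uses a spinlock plus a ConditionVariable in a
DSM segment, and the worker waits for new WAL instead of spinning; the names
and the LSN bookkeeping are simplified, so treat this as a model of the
protocol rather than the patch's code.)

/*
 * Model of the backend/worker handoff: the backend publishes 'lsn_upto'
 * and 'done', the worker exports files and bumps 'last_exported'.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr	((XLogRecPtr) 0)

typedef struct
{
	pthread_mutex_t mutex;
	pthread_cond_t cv;
	XLogRecPtr	lsn_upto;		/* decode up to here, then close the file */
	bool		done;			/* exit after closing the current file? */
	int			last_exported;	/* number of the last file exported */
} Shared;

static XLogRecPtr current_lsn = 0;	/* stand-in for the WAL reader position */

/* Worker: decode records, export a file whenever lsn_upto is reached. */
static void *
worker(void *arg)
{
	Shared	   *sh = (Shared *) arg;

	for (;;)
	{
		XLogRecPtr	lsn_upto;
		bool		done;

		current_lsn++;			/* "decode" one WAL record */

		pthread_mutex_lock(&sh->mutex);
		lsn_upto = sh->lsn_upto;
		done = sh->done;
		pthread_mutex_unlock(&sh->mutex);

		if (lsn_upto != InvalidXLogRecPtr && current_lsn >= lsn_upto)
		{
			/* Close the output file, announce it, reset the request. */
			pthread_mutex_lock(&sh->mutex);
			sh->lsn_upto = InvalidXLogRecPtr;
			sh->last_exported++;
			pthread_cond_signal(&sh->cv);
			pthread_mutex_unlock(&sh->mutex);

			if (done)
				break;			/* the backend won't ask for more files */
			/* otherwise create the next output file and keep decoding */
		}
	}
	return NULL;
}

/* Backend: request a file up to end_of_wal, then wait until it exists. */
static void
request_file(Shared *sh, XLogRecPtr end_of_wal, bool done, int file_seq)
{
	pthread_mutex_lock(&sh->mutex);
	sh->lsn_upto = end_of_wal;
	sh->done = done;
	while (sh->last_exported != file_seq)
		pthread_cond_wait(&sh->cv, &sh->mutex);
	pthread_mutex_unlock(&sh->mutex);
	/* ... now open file 'file_seq' and apply the changes stored in it ... */
}

int
main(void)
{
	/* Pretend the initial snapshot (file 0) has already been exported. */
	Shared		sh = {PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
					  InvalidXLogRecPtr, false, 0};
	pthread_t	t;

	pthread_create(&t, NULL, worker, &sh);
	request_file(&sh, 1000, false, 1);	/* catch-up before the lock upgrade */
	request_file(&sh, 2000, true, 2);	/* final catch-up; worker then exits */
	pthread_join(t, NULL);
	return 0;
}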
---
 src/backend/access/heap/heapam_handler.c      |   44 -
 src/backend/commands/cluster.c                | 1174 +++++++++++++----
 src/backend/libpq/pqmq.c                      |    5 +
 src/backend/postmaster/bgworker.c             |    4 +
 src/backend/replication/logical/logical.c     |    6 +-
 .../pgoutput_repack/pgoutput_repack.c         |   54 +-
 src/backend/storage/ipc/procsignal.c          |    4 +
 src/backend/tcop/postgres.c                   |    4 +
 .../utils/activity/wait_event_names.txt       |    2 +
 src/include/access/tableam.h                  |    7 +-
 src/include/commands/cluster.h                |   71 +-
 src/include/storage/procsignal.h              |    1 +
 src/tools/pgindent/typedefs.list              |    4 +-
 13 files changed, 979 insertions(+), 401 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3526b6adcb5..475c536ce43 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,7 +33,6 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
-#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -688,7 +687,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
 								 Snapshot snapshot,
-								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
@@ -710,7 +708,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
 	bool		concurrent = snapshot != NULL;
-	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -957,31 +954,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
-
-		/*
-		 * Process the WAL produced by the load, as well as by other
-		 * transactions, so that the replication slot can advance and WAL does
-		 * not pile up. Use wal_segment_size as a threshold so that we do not
-		 * introduce the decoding overhead too often.
-		 *
-		 * Of course, we must not apply the changes until the initial load has
-		 * completed.
-		 *
-		 * Note that our insertions into the new table should not be decoded
-		 * as we (intentionally) do not write the logical decoding specific
-		 * information to WAL.
-		 */
-		if (concurrent)
-		{
-			XLogRecPtr	end_of_wal;
-
-			end_of_wal = GetFlushRecPtr(NULL);
-			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
-			{
-				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
-				end_of_wal_prev = end_of_wal;
-			}
-		}
 	}
 
 	if (indexScan != NULL)
@@ -1027,22 +999,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			/* Report n_tuples */
 			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
-
-			/*
-			 * Try to keep the amount of not-yet-decoded WAL small, like
-			 * above.
-			 */
-			if (concurrent)
-			{
-				XLogRecPtr	end_of_wal;
-
-				end_of_wal = GetFlushRecPtr(NULL);
-				if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
-				{
-					repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
-					end_of_wal_prev = end_of_wal;
-				}
-			}
 		}
 
 		tuplesort_end(tuplesort);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index c3feb0c3de4..5232fbfb57d 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -12,12 +12,13 @@
  * In concurrent mode, we lock the table with only ShareUpdateExclusiveLock,
  * then do an initial copy as above.  However, while the tuples are being
  * copied, concurrent transactions could modify the table. To cope with those
- * changes, we rely on logical decoding to obtain them from WAL.  The changes
- * are accumulated in a tuplestore.  Once the initial copy is complete, we
- * read the changes from the tuplestore and re-apply them on the new heap.
- * Then we upgrade our ShareUpdateExclusiveLock to AccessExclusiveLock and
- * swap the relfilenodes.  This way, the time we hold a strong lock on the
- * table is much reduced, and the bloat is eliminated.
+ * changes, we rely on logical decoding to obtain them from WAL.  A bgworker
+ * consumes WAL while the initial copy is ongoing (to prevent excessive WAL
+ * from being reserved), and accumulates the changes in a file.  Once the
+ * initial copy is complete, we read the changes from the file and re-apply
+ * them on the new heap.  Then we upgrade our ShareUpdateExclusiveLock to
+ * AccessExclusiveLock and swap the relfilenodes.  This way, the time we hold
+ * a strong lock on the table is much reduced, and the bloat is eliminated.
  *
  * There is hardly anything left of Paul Brown's original implementation...
  *
@@ -45,6 +46,7 @@
 #include "access/xlog_internal.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
+#include "access/xlogwait.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -61,6 +63,8 @@
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
+#include "libpq/pqformat.h"
+#include "libpq/pqmq.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
@@ -71,6 +75,8 @@
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/procsignal.h"
+#include "tcop/tcopprot.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
@@ -117,6 +123,12 @@ typedef struct IndexInsertState
 /* The WAL segment being decoded. */
 static XLogSegNo repack_current_segment = 0;
 
+/*
+ * The first file exported by the decoding worker contains the initial
+ * snapshot; the following ones contain the data changes.
+ */
+#define WORKER_FILE_SNAPSHOT	0
+
 /*
  * Information needed to apply concurrent data changes.
  */
@@ -136,8 +148,113 @@ typedef struct ChangeDest
 
 	/* Needed to update indexes of rel_dst. */
 	IndexInsertState *iistate;
+
+	/*
+	 * Sequential number of the file containing the changes.
+	 *
+	 * TODO This field makes the structure name less descriptive. Should we
+	 * rename it, e.g. to ChangeApplyInfo?
+	 */
+	int		file_seq;
 } ChangeDest;
 
+/*
+ * Layout of shared memory used for communication between backend and the
+ * worker that performs logical decoding of data changes
+ */
+typedef struct DecodingWorkerShared
+{
+	/* Is the decoding initialized? */
+	bool		initialized;
+
+	/*
+	 * Once the worker has reached this LSN, it should close the current
+	 * output file and either create a new one or exit, according to the field
+	 * 'done'. If the value is InvalidXLogRecPtr, the worker should decode all
+	 * the WAL available and keep checking this field. It is ok if the worker
+	 * had already decoded records whose LSN is >= lsn_upto before this field
+	 * has been set.
+	 */
+	XLogRecPtr	lsn_upto;
+
+	/* Exit after closing the current file? */
+	bool		done;
+
+	/* The output is stored here. */
+	SharedFileSet sfs;
+
+	/* Number of the last file exported by the worker. */
+	int			last_exported;
+
+	/* Synchronize access to the fields above. */
+	slock_t		mutex;
+
+	/* Database to connect to. */
+	Oid			dbid;
+
+	/* Role to connect as. */
+	Oid			roleid;
+
+	/* Decode data changes of this relation. */
+	Oid			relid;
+
+	/* The backend uses this to wait for the worker. */
+	ConditionVariable cv;
+
+	/* Info to signal the backend. */
+	PGPROC	   *backend_proc;
+	pid_t		backend_pid;
+	ProcNumber	backend_proc_number;
+
+	/* Error queue. */
+	shm_mq	   *error_mq;
+
+	/*
+	 * Memory the queue is located in.
+	 *
+	 * For considerations on the value see the comments of
+	 * PARALLEL_ERROR_QUEUE_SIZE.
+	 */
+#define REPACK_ERROR_QUEUE_SIZE			16384
+	char		error_queue[FLEXIBLE_ARRAY_MEMBER];
+} DecodingWorkerShared;
+
+/*
+ * Generate worker's output file name. If relations of the same 'relid' happen
+ * to be processed at the same time, they must be from different databases and
+ * therefore different backends must be involved. (PID is already present in
+ * the fileset name.)
+ */
+static inline void
+DecodingWorkerFileName(char *fname, Oid relid, uint32 seq)
+{
+	snprintf(fname, MAXPGPATH, "%u-%u", relid, seq);
+}
+
+/*
+ * Backend-local information to control the decoding worker.
+ */
+typedef struct DecodingWorker
+{
+	/* The worker. */
+	BackgroundWorkerHandle *handle;
+
+	/* DecodingWorkerShared is in this segment. */
+	dsm_segment *seg;
+
+	/* Handle of the error queue. */
+	shm_mq_handle *error_mqh;
+} DecodingWorker;
+
+/* Pointer to currently running decoding worker. */
+static DecodingWorker *decoding_worker = NULL;
+
+/*
+ * Is there a message sent by a repack worker that the backend needs to
+ * receive?
+ */
+volatile sig_atomic_t RepackMessagePending = false;
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
 								Oid indexOid, Oid userid, LOCKMODE lmode,
 								int options);
@@ -145,7 +262,7 @@ static void check_repack_concurrently_requirements(Relation rel);
 static void rebuild_relation(Relation OldHeap, Relation index, bool verbose,
 							 bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							Snapshot snapshot,
 							bool verbose,
 							bool *pSwapToastByContent,
 							TransactionId *pFreezeXid,
@@ -158,12 +275,10 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
 
-static void begin_concurrent_repack(Relation rel);
-static void end_concurrent_repack(void);
 static LogicalDecodingContext *setup_logical_decoding(Oid relid);
-static HeapTuple get_changed_tuple(char *change);
-static void apply_concurrent_changes(RepackDecodingState *dstate,
-									 ChangeDest *dest);
+static bool decode_concurrent_changes(LogicalDecodingContext *ctx,
+									  DecodingWorkerShared *shared);
+static void apply_concurrent_changes(BufFile *file, ChangeDest *dest);
 static void apply_concurrent_insert(Relation rel, HeapTuple tup,
 									IndexInsertState *iistate,
 									TupleTableSlot *index_slot);
@@ -175,9 +290,9 @@ static void apply_concurrent_delete(Relation rel, HeapTuple tup_target);
 static HeapTuple find_target_tuple(Relation rel, ChangeDest *dest,
 								   HeapTuple tup_key,
 								   TupleTableSlot *ident_slot);
-static void process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
-									   XLogRecPtr end_of_wal,
-									   ChangeDest *dest);
+static void process_concurrent_changes(XLogRecPtr end_of_wal,
+									   ChangeDest *dest,
+									   bool done);
 static IndexInsertState *get_index_insert_state(Relation relation,
 												Oid ident_index_id,
 												Relation *ident_index_p);
@@ -187,7 +302,6 @@ static void free_index_insert_state(IndexInsertState *iistate);
 static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
 static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 											   Relation cl_index,
-											   LogicalDecodingContext *decoding_ctx,
 											   TransactionId frozenXid,
 											   MultiXactId cutoffMulti);
 static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
@@ -197,6 +311,13 @@ static Relation process_single_relation(RepackStmt *stmt,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
+static void start_decoding_worker(Oid relid);
+static void stop_decoding_worker(void);
+static void repack_worker_internal(dsm_segment *seg);
+static void export_initial_snapshot(Snapshot snapshot,
+									DecodingWorkerShared *shared);
+static Snapshot get_initial_snapshot(DecodingWorker *worker);
+static void ProcessRepackMessage(StringInfo msg);
 static const char *RepackCommandAsString(RepackCommand cmd);
 
 
@@ -619,20 +740,20 @@ cluster_rel(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	/* rebuild_relation does all the dirty work */
 	PG_TRY();
 	{
-		/*
-		 * For concurrent processing, make sure that our logical decoding
-		 * ignores data changes of other tables than the one we are
-		 * processing.
-		 */
-		if (concurrent)
-			begin_concurrent_repack(OldHeap);
-
 		rebuild_relation(OldHeap, index, verbose, concurrent);
 	}
 	PG_FINALLY();
 	{
 		if (concurrent)
-			end_concurrent_repack();
+		{
+			/*
+			 * During normal operation the worker has already been asked to
+			 * exit, so stopping it explicitly here matters mostly on ERROR.
+			 * Still, it seems good practice to make sure that the worker
+			 * never survives the REPACK command.
+			 */
+			stop_decoding_worker();
+		}
 	}
 	PG_END_TRY();
 
@@ -929,7 +1050,6 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
-	LogicalDecodingContext *decoding_ctx = NULL;
 	Snapshot	snapshot = NULL;
 #if USE_ASSERT_CHECKING
 	LOCKMODE	lmode;
@@ -943,19 +1063,36 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	if (concurrent)
 	{
 		/*
-		 * Prepare to capture the concurrent data changes.
+		 * The worker needs to be a member of the locking group we're the
+		 * leader of, so we must become the leader before the worker starts.
+		 * The worker will join the group as soon as it starts.
 		 *
-		 * Note that this call waits for all transactions with XID already
-		 * assigned to finish. If some of those transactions is waiting for a
-		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
-		 * it runs CREATE INDEX), we can end up in a deadlock. Not sure this
-		 * risk is worth unlocking/locking the table (and its clustering
-		 * index) and checking again if its still eligible for REPACK
-		 * CONCURRENTLY.
+		 * This is to make sure that the deadlock described below is
+		 * detectable by deadlock.c: if the worker waits for a transaction to
+		 * complete and we are waiting for the worker output, then effectively
+		 * we (i.e. this backend) are waiting for that transaction.
 		 */
-		decoding_ctx = setup_logical_decoding(tableOid);
+		BecomeLockGroupLeader();
+
+		/*
+		 * Start the worker that decodes data changes applied while we're
+		 * copying the table contents.
+		 *
+		 * Note that the worker has to wait for all transactions with XID
+		 * already assigned to finish. If any of those transactions is
+		 * waiting for a lock conflicting with ShareUpdateExclusiveLock on our
+		 * table (e.g. it runs CREATE INDEX), we can end up in a deadlock.
+		 * Not sure this risk is worth unlocking/locking the table (and its
+		 * clustering index) and checking again if it's still eligible for
+		 * REPACK CONCURRENTLY.
+		 */
+		start_decoding_worker(tableOid);
+
+		/*
+		 * Wait until the worker has the initial snapshot and retrieve it.
+		 */
+		snapshot = get_initial_snapshot(decoding_worker);
 
-		snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
 		PushActiveSnapshot(snapshot);
 	}
 
@@ -980,7 +1117,7 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, snapshot, decoding_ctx, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
 	/* The historic snapshot won't be needed anymore. */
@@ -994,14 +1131,10 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	{
 		Assert(!swap_toast_by_content);
 		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
-										   decoding_ctx,
 										   frozenXid, cutoffMulti);
 
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
-
-		/* Done with decoding. */
-		cleanup_logical_decoding(decoding_ctx);
 	}
 	else
 	{
@@ -1172,8 +1305,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
  */
 static void
 copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
-				bool verbose, bool *pSwapToastByContent,
+				Snapshot snapshot, bool verbose, bool *pSwapToastByContent,
 				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
@@ -1334,7 +1466,6 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
 									cutoffs.OldestXmin, snapshot,
-									decoding_ctx,
 									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
@@ -2367,59 +2498,6 @@ RepackCommandAsString(RepackCommand cmd)
 	return "???";
 }
 
-
-/*
- * Call this function before REPACK CONCURRENTLY starts to setup logical
- * decoding. It makes sure that other users of the table put enough
- * information into WAL.
- *
- * The point is that at various places we expect that the table we're
- * processing is treated like a system catalog. For example, we need to be
- * able to scan it using a "historic snapshot" anytime during the processing
- * (as opposed to scanning only at the start point of the decoding, as logical
- * replication does during initial table synchronization), in order to apply
- * concurrent UPDATE / DELETE commands.
- *
- * Note that TOAST table needs no attention here as it's not scanned using
- * historic snapshot.
- */
-static void
-begin_concurrent_repack(Relation rel)
-{
-	Oid			toastrelid;
-
-	/*
-	 * Avoid logical decoding of other relations by this backend. The lock we
-	 * have guarantees that the actual locator cannot be changed concurrently:
-	 * TRUNCATE needs AccessExclusiveLock.
-	 */
-	Assert(CheckRelationLockedByMe(rel, ShareUpdateExclusiveLock, false));
-	repacked_rel_locator = rel->rd_locator;
-	toastrelid = rel->rd_rel->reltoastrelid;
-	if (OidIsValid(toastrelid))
-	{
-		Relation	toastrel;
-
-		/* Avoid logical decoding of other TOAST relations. */
-		toastrel = table_open(toastrelid, AccessShareLock);
-		repacked_rel_toast_locator = toastrel->rd_locator;
-		table_close(toastrel, AccessShareLock);
-	}
-}
-
-/*
- * Call this when done with REPACK CONCURRENTLY.
- */
-static void
-end_concurrent_repack(void)
-{
-	/*
-	 * Restore normal function of (future) logical decoding for this backend.
-	 */
-	repacked_rel_locator.relNumber = InvalidOid;
-	repacked_rel_toast_locator.relNumber = InvalidOid;
-}
-
 /*
  * Is this backend performing logical decoding on behalf of REPACK
  * (CONCURRENTLY) ?
@@ -2484,9 +2562,10 @@ static LogicalDecodingContext *
 setup_logical_decoding(Oid relid)
 {
 	Relation	rel;
-	TupleDesc	tupdesc;
+	Oid			toastrelid;
 	LogicalDecodingContext *ctx;
-	RepackDecodingState *dstate = palloc0_object(RepackDecodingState);
+	NameData	slotname;
+	RepackDecodingState *dstate;
 
 	/*
 	 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
@@ -2494,21 +2573,21 @@ setup_logical_decoding(Oid relid)
 	 */
 	Assert(!TransactionIdIsValid(GetTopTransactionIdIfAny()));
 
-	/*
-	 * A single backend should not execute multiple REPACK commands at a time,
-	 * so use PID to make the slot unique.
-	 */
-	snprintf(NameStr(dstate->slotname), NAMEDATALEN, "repack_%d", MyProcPid);
-
 	/*
 	 * Check if we can use logical decoding.
 	 */
 	CheckSlotPermissions();
 	CheckLogicalDecodingRequirements();
 
-	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
-	ReplicationSlotCreate(NameStr(dstate->slotname), true, RS_TEMPORARY,
-						  false, false, false);
+	/*
+	 * A single backend should not execute multiple REPACK commands at a time,
+	 * so use PID to make the slot unique.
+	 *
+	 * RS_TEMPORARY so that the slot gets cleaned up on ERROR.
+	 */
+	snprintf(NameStr(slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+	ReplicationSlotCreate(NameStr(slotname), true, RS_TEMPORARY, false, false,
+						  false);
 
 	/*
 	 * Neither prepare_write nor do_write callback nor update_progress is
@@ -2530,104 +2609,109 @@ setup_logical_decoding(Oid relid)
 
 	DecodingContextFindStartpoint(ctx);
 
+	/*
+	 * decode_concurrent_changes() needs a non-blocking callback.
+	 */
+	ctx->reader->routine.page_read = read_local_xlog_page_no_wait;
+
+	/*
+	 * read_local_xlog_page_no_wait() needs to be able to indicate the end of
+	 * WAL.
+	 */
+	ctx->reader->private_data = MemoryContextAllocZero(ctx->context,
+													   sizeof(ReadLocalXLogPageNoWaitPrivate));
+
 	/* Some WAL records should have been read. */
 	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
 
+	/*
+	 * Initialize repack_current_segment so that we can notice WAL segment
+	 * boundaries.
+	 */
 	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
 				wal_segment_size);
 
-	/*
-	 * Setup structures to store decoded changes.
-	 */
+	dstate = palloc0_object(RepackDecodingState);
 	dstate->relid = relid;
-	dstate->tstore = tuplestore_begin_heap(false, false,
-										   maintenance_work_mem);
 
-	/* Caller should already have the table locked. */
-	rel = table_open(relid, NoLock);
-	tupdesc = CreateTupleDescCopy(RelationGetDescr(rel));
-	dstate->tupdesc = tupdesc;
-	table_close(rel, NoLock);
+	/*
+	 * The tuple descriptor may be needed to flatten a tuple before we write
+	 * it to a file. A copy is needed because the decoding worker invalidates
+	 * system caches before it starts to do the actual work.
+	 */
+	rel = table_open(relid, AccessShareLock);
+	dstate->tupdesc = CreateTupleDescCopy(RelationGetDescr(rel));
 
-	/* Initialize the descriptor to store the changes ... */
-	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+	/* Avoid logical decoding of other relations. */
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
 
-	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
-	/* ... as well as the corresponding slot. */
-	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
-											  &TTSOpsMinimalTuple);
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+	table_close(rel, AccessShareLock);
 
-	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
-										   "logical decoding");
+	/* The file will be set as soon as we open it. */
+	dstate->file = NULL;
 
 	ctx->output_writer_private = dstate;
+
 	return ctx;
 }
 
 /*
- * Retrieve tuple from ConcurrentChange structure.
+ * Decode logical changes from the WAL sequence and store them in a file.
  *
- * The input data starts with the structure but it might not be appropriately
- * aligned.
- */
-static HeapTuple
-get_changed_tuple(char *change)
-{
-	HeapTupleData tup_data;
-	HeapTuple	result;
-	char	   *src;
-
-	/*
-	 * Ensure alignment before accessing the fields. (This is why we can't use
-	 * heap_copytuple() instead of this function.)
-	 */
-	src = change + offsetof(ConcurrentChange, tup_data);
-	memcpy(&tup_data, src, sizeof(HeapTupleData));
-
-	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
-	memcpy(result, &tup_data, sizeof(HeapTupleData));
-	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
-	src = change + SizeOfConcurrentChange;
-	memcpy(result->t_data, src, result->t_len);
-
-	return result;
-}
-
-/*
- * Decode logical changes from the WAL sequence up to end_of_wal.
+ * If true is returned, there is no more work for the worker.
  */
-void
-repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
-								 XLogRecPtr end_of_wal)
+static bool
+decode_concurrent_changes(LogicalDecodingContext *ctx,
+						  DecodingWorkerShared *shared)
 {
 	RepackDecodingState *dstate;
-	ResourceOwner resowner_old;
+	XLogRecPtr	lsn_upto;
+	bool		done;
+	char		fname[MAXPGPATH];
 
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
-	resowner_old = CurrentResourceOwner;
-	CurrentResourceOwner = dstate->resowner;
 
-	PG_TRY();
+	/* Open the output file. */
+	DecodingWorkerFileName(fname, shared->relid, shared->last_exported + 1);
+	dstate->file = BufFileCreateFileSet(&shared->sfs.fs, fname);
+
+	SpinLockAcquire(&shared->mutex);
+	lsn_upto = shared->lsn_upto;
+	done = shared->done;
+	SpinLockRelease(&shared->mutex);
+
+	while (true)
 	{
-		while (ctx->reader->EndRecPtr < end_of_wal)
-		{
-			XLogRecord *record;
-			XLogSegNo	segno_new;
-			char	   *errm = NULL;
-			XLogRecPtr	end_lsn;
+		XLogRecord *record;
+		XLogSegNo	segno_new;
+		char	   *errm = NULL;
+		XLogRecPtr	end_lsn;
 
-			record = XLogReadRecord(ctx->reader, &errm);
-			if (errm)
-				elog(ERROR, "%s", errm);
+		CHECK_FOR_INTERRUPTS();
 
-			if (record != NULL)
-				LogicalDecodingProcessRecord(ctx, ctx->reader);
+		record = XLogReadRecord(ctx->reader, &errm);
+		if (record)
+		{
+			LogicalDecodingProcessRecord(ctx, ctx->reader);
 
 			/*
 			 * If WAL segment boundary has been crossed, inform the decoding
-			 * system that the catalog_xmin can advance. (We can confirm more
-			 * often, but a filling a single WAL segment should not take much
-			 * time.)
+			 * system that the catalog_xmin can advance.
+			 *
+			 * TODO Does it make sense to confirm more often? Segment size
+			 * seems appropriate for restart_lsn (because less than a segment
+			 * cannot be recycled anyway), however more frequent checks might
+			 * be beneficial for catalog_xmin.
 			 */
 			end_lsn = ctx->reader->EndRecPtr;
 			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
@@ -2638,80 +2722,137 @@ repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
 					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
 				repack_current_segment = segno_new;
 			}
+		}
+		else
+		{
+			ReadLocalXLogPageNoWaitPrivate *priv;
 
-			CHECK_FOR_INTERRUPTS();
+			if (errm)
+				ereport(ERROR, (errmsg("%s", errm)));
+
+			/*
+			 * In the decoding loop we do not want to get blocked when there
+			 * is no more WAL available, otherwise the loop would become
+			 * uninterruptible.
+			 */
+			priv = (ReadLocalXLogPageNoWaitPrivate *)
+				ctx->reader->private_data;
+			if (priv->end_of_wal)
+				/* Do not miss the end of WAL condition next time. */
+				priv->end_of_wal = false;
+			else
+				ereport(ERROR, (errmsg("could not read WAL record")));
+		}
+
+		/*
+		 * Whether we could read a new record or not, keep checking if
+		 * 'lsn_upto' was specified.
+		 */
+		if (XLogRecPtrIsInvalid(lsn_upto))
+		{
+			SpinLockAcquire(&shared->mutex);
+			lsn_upto = shared->lsn_upto;
+			/* 'done' should be set at the same time as 'lsn_upto' */
+			done = shared->done;
+			SpinLockRelease(&shared->mutex);
+		}
+		if (!XLogRecPtrIsInvalid(lsn_upto) &&
+			ctx->reader->EndRecPtr >= lsn_upto)
+			break;
+
+		if (record == NULL)
+		{
+			int64		timeout = 0;
+			WaitLSNResult	res;
+
+			/*
+			 * Before we retry reading, wait until new WAL is flushed.
+			 *
+			 * There is a race condition such that the backend executing
+			 * REPACK determines 'lsn_upto', but before it sets the shared
+			 * variable, we reach the end of WAL. In that case we'd need to
+			 * wait until the next WAL flush (unrelated to REPACK). Although
+			 * that should not be a problem in a busy system, it might be
+			 * noticeable in other cases, including regression tests (which
+			 * are not necessarily executed in parallel). Therefore it makes
+			 * sense to use a timeout.
+			 *
+			 * If lsn_upto is valid, WAL records having LSN lower than that
+			 * should already have been flushed to disk.
+			 */
+			if (XLogRecPtrIsInvalid(lsn_upto))
+				timeout = 100L;
+			res = WaitForLSN(WAIT_LSN_TYPE_PRIMARY_FLUSH,
+							 ctx->reader->EndRecPtr + 1,
+							 timeout);
+			if (res != WAIT_LSN_RESULT_SUCCESS &&
+				res != WAIT_LSN_RESULT_TIMEOUT)
+				ereport(ERROR, (errmsg("waiting for WAL failed")));
 		}
-		InvalidateSystemCaches();
-		CurrentResourceOwner = resowner_old;
-	}
-	PG_CATCH();
-	{
-		/* clear all timetravel entries */
-		InvalidateSystemCaches();
-		CurrentResourceOwner = resowner_old;
-		PG_RE_THROW();
 	}
-	PG_END_TRY();
+
+	/*
+	 * Close the file so we can make it available to the backend.
+	 */
+	BufFileClose(dstate->file);
+	dstate->file = NULL;
+	SpinLockAcquire(&shared->mutex);
+	shared->lsn_upto = InvalidXLogRecPtr;
+	shared->last_exported++;
+	SpinLockRelease(&shared->mutex);
+	ConditionVariableSignal(&shared->cv);
+
+	return done;
 }
 
 /*
  * Apply changes stored in 'file'.
  */
 static void
-apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
+apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 {
+	char		kind;
+	uint32		t_len;
 	Relation	rel = dest->rel;
 	TupleTableSlot *index_slot,
 			   *ident_slot;
 	HeapTuple	tup_old = NULL;
 
-	if (dstate->nchanges == 0)
-		return;
-
 	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
-	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+	index_slot = MakeSingleTupleTableSlot(RelationGetDescr(rel),
+										  &TTSOpsHeapTuple);
 
 	/* A slot to fetch tuples from identity index. */
 	ident_slot = table_slot_create(rel, NULL);
 
-	while (tuplestore_gettupleslot(dstate->tstore, true, false,
-								   dstate->tsslot))
+	while (true)
 	{
-		bool		shouldFree;
-		HeapTuple	tup_change,
-					tup,
+		size_t		nread;
+		HeapTuple	tup,
 					tup_exist;
-		char	   *change_raw,
-				   *src;
-		ConcurrentChange change;
-		bool		isnull[1];
-		Datum		values[1];
 
 		CHECK_FOR_INTERRUPTS();
 
-		/* Get the change from the single-column tuple. */
-		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
-		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
-		Assert(!isnull[0]);
-
-		/* Make sure we access aligned data. */
-		change_raw = (char *) DatumGetByteaP(values[0]);
-		src = (char *) VARDATA(change_raw);
-		memcpy(&change, src, SizeOfConcurrentChange);
+		nread = BufFileReadMaybeEOF(file, &kind, 1, true);
+		/* Are we done with the file? */
+		if (nread == 0)
+			break;
 
-		/*
-		 * Extract the tuple from the change. The tuple is copied here because
-		 * it might be assigned to 'tup_old', in which case it needs to
-		 * survive into the next iteration.
-		 */
-		tup = get_changed_tuple(src);
+		/* Read the tuple. */
+		BufFileReadExact(file, &t_len, sizeof(t_len));
+		tup = (HeapTuple) palloc(HEAPTUPLESIZE + t_len);
+		tup->t_data = (HeapTupleHeader) ((char *) tup + HEAPTUPLESIZE);
+		BufFileReadExact(file, tup->t_data, t_len);
+		tup->t_len = t_len;
+		ItemPointerSetInvalid(&tup->t_self);
+		tup->t_tableOid = RelationGetRelid(dest->rel);
 
-		if (change.kind == CHANGE_UPDATE_OLD)
+		if (kind == CHANGE_UPDATE_OLD)
 		{
 			Assert(tup_old == NULL);
 			tup_old = tup;
 		}
-		else if (change.kind == CHANGE_INSERT)
+		else if (kind == CHANGE_INSERT)
 		{
 			Assert(tup_old == NULL);
 
@@ -2719,12 +2860,11 @@ apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
 
 			pfree(tup);
 		}
-		else if (change.kind == CHANGE_UPDATE_NEW ||
-				 change.kind == CHANGE_DELETE)
+		else if (kind == CHANGE_UPDATE_NEW || kind == CHANGE_DELETE)
 		{
 			HeapTuple	tup_key;
 
-			if (change.kind == CHANGE_UPDATE_NEW)
+			if (kind == CHANGE_UPDATE_NEW)
 			{
 				tup_key = tup_old != NULL ? tup_old : tup;
 			}
@@ -2741,7 +2881,7 @@ apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
 			if (tup_exist == NULL)
 				elog(ERROR, "failed to find target tuple");
 
-			if (change.kind == CHANGE_UPDATE_NEW)
+			if (kind == CHANGE_UPDATE_NEW)
 				apply_concurrent_update(rel, tup, tup_exist, dest->iistate,
 										index_slot);
 			else
@@ -2756,26 +2896,19 @@ apply_concurrent_changes(RepackDecodingState *dstate, ChangeDest *dest)
 			pfree(tup);
 		}
 		else
-			elog(ERROR, "unrecognized kind of change: %d", change.kind);
+			elog(ERROR, "unrecognized kind of change: %d", kind);
 
 		/*
 		 * If a change was applied now, increment CID for next writes and
 		 * update the snapshot so it sees the changes we've applied so far.
 		 */
-		if (change.kind != CHANGE_UPDATE_OLD)
+		if (kind != CHANGE_UPDATE_OLD)
 		{
 			CommandCounterIncrement();
 			UpdateActiveSnapshotCommandId();
 		}
-
-		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
-		Assert(shouldFree);
-		pfree(tup_change);
 	}
 
-	tuplestore_clear(dstate->tstore);
-	dstate->nchanges = 0;
-
 	/* Cleanup. */
 	ExecDropSingleTupleTableSlot(index_slot);
 	ExecDropSingleTupleTableSlot(ident_slot);
@@ -2954,25 +3087,59 @@ find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
 }
 
 /*
- * Decode and apply concurrent changes.
+ * Decode and apply concurrent changes, up to (and including) the record whose
+ * LSN is 'end_of_wal'.
  */
 static void
-process_concurrent_changes(LogicalDecodingContext *decoding_ctx,
-						   XLogRecPtr end_of_wal, ChangeDest *dest)
+process_concurrent_changes(XLogRecPtr end_of_wal, ChangeDest *dest, bool done)
 {
-	RepackDecodingState *dstate;
+	DecodingWorkerShared *shared;
+	char		fname[MAXPGPATH];
+	BufFile    *file;
 
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 								 PROGRESS_REPACK_PHASE_CATCH_UP);
 
-	dstate = (RepackDecodingState *) decoding_ctx->output_writer_private;
+	/* Ask the worker for the file. */
+	shared = (DecodingWorkerShared *) dsm_segment_address(decoding_worker->seg);
+	SpinLockAcquire(&shared->mutex);
+	shared->lsn_upto = end_of_wal;
+	shared->done = done;
+	SpinLockRelease(&shared->mutex);
 
-	repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+	/*
+	 * The worker needs to finish processing the current WAL record. Even
+	 * if it's idle, it'll need to close the output file. Thus we're likely to
+	 * wait, so prepare for sleep.
+	 */
+	ConditionVariablePrepareToSleep(&shared->cv);
+	for (;;)
+	{
+		int		last_exported;
 
-	if (dstate->nchanges == 0)
-		return;
+		SpinLockAcquire(&shared->mutex);
+		last_exported = shared->last_exported;
+		SpinLockRelease(&shared->mutex);
+
+		/*
+		 * Has the worker exported the file we are waiting for?
+		 */
+		if (last_exported == dest->file_seq)
+			break;
+
+		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
+	}
+	ConditionVariableCancelSleep();
 
-	apply_concurrent_changes(dstate, dest);
+	/* Open the file. */
+	DecodingWorkerFileName(fname, shared->relid, dest->file_seq);
+	file = BufFileOpenFileSet(&shared->sfs.fs, fname, O_RDONLY, false);
+	apply_concurrent_changes(file, dest);
+
+	BufFileClose(file);
+
+	/* Get ready for the next file. */
+	dest->file_seq++;
 }
 
 /*
@@ -3098,15 +3265,10 @@ cleanup_logical_decoding(LogicalDecodingContext *ctx)
 
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
-	ExecDropSingleTupleTableSlot(dstate->tsslot);
-	FreeTupleDesc(dstate->tupdesc_change);
 	FreeTupleDesc(dstate->tupdesc);
-	tuplestore_end(dstate->tstore);
-
 	FreeDecodingContext(ctx);
 
-	ReplicationSlotRelease();
-	ReplicationSlotDrop(NameStr(dstate->slotname), false);
+	ReplicationSlotDropAcquired();
 	pfree(dstate);
 }
 
@@ -3121,7 +3283,6 @@ cleanup_logical_decoding(LogicalDecodingContext *ctx)
 static void
 rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 								   Relation cl_index,
-								   LogicalDecodingContext *decoding_ctx,
 								   TransactionId frozenXid,
 								   MultiXactId cutoffMulti)
 {
@@ -3204,6 +3365,7 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 											&chgdst.ident_index);
 	chgdst.ident_key = build_identity_key(ident_idx_new, OldHeap,
 										  &chgdst.ident_key_nentries);
+	chgdst.file_seq = WORKER_FILE_SNAPSHOT + 1;
 
 	/*
 	 * During testing, wait for another backend to perform concurrent data
@@ -3225,7 +3387,7 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	 * hold AccessExclusiveLock. (Quite some amount of WAL could have been
 	 * written during the data copying and index creation.)
 	 */
-	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+	process_concurrent_changes(end_of_wal, &chgdst, false);
 
 	/*
 	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
@@ -3319,8 +3481,11 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	XLogFlush(wal_insert_ptr);
 	end_of_wal = GetFlushRecPtr(NULL);
 
-	/* Apply the concurrent changes again. */
-	process_concurrent_changes(decoding_ctx, end_of_wal, &chgdst);
+	/*
+	 * Apply the concurrent changes again. Indicate that the decoding worker
+	 * won't be needed anymore.
+	 */
+	process_concurrent_changes(end_of_wal, &chgdst, true);
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
@@ -3430,3 +3595,510 @@ build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
 
 	return result;
 }
+
+/*
+ * Try to start a background worker to perform logical decoding of data
+ * changes applied to the relation while REPACK CONCURRENTLY is copying its
+ * contents to a new table.
+ */
+static void
+start_decoding_worker(Oid relid)
+{
+	Size		size;
+	dsm_segment *seg;
+	DecodingWorkerShared *shared;
+	shm_mq	   *mq;
+	shm_mq_handle *mqh;
+	BackgroundWorker bgw;
+
+	/* Setup shared memory. */
+	size = BUFFERALIGN(offsetof(DecodingWorkerShared, error_queue)) +
+		BUFFERALIGN(REPACK_ERROR_QUEUE_SIZE);
+	seg = dsm_create(size, 0);
+	shared = (DecodingWorkerShared *) dsm_segment_address(seg);
+	shared->lsn_upto = InvalidXLogRecPtr;
+	shared->done = false;
+	SharedFileSetInit(&shared->sfs, seg);
+	shared->last_exported = -1;
+	SpinLockInit(&shared->mutex);
+	shared->dbid = MyDatabaseId;
+
+	/*
+	 * This is the UserId set in cluster_rel(). Security context shouldn't be
+	 * needed for the decoding worker.
+	 */
+	shared->roleid = GetUserId();
+	shared->relid = relid;
+	ConditionVariableInit(&shared->cv);
+	shared->backend_proc = MyProc;
+	shared->backend_pid = MyProcPid;
+	shared->backend_proc_number = MyProcNumber;
+
+	mq = shm_mq_create((char *) BUFFERALIGN(shared->error_queue),
+					   REPACK_ERROR_QUEUE_SIZE);
+	shm_mq_set_receiver(mq, MyProc);
+	mqh = shm_mq_attach(mq, seg, NULL);
+
+	memset(&bgw, 0, sizeof(bgw));
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "REPACK decoding worker for relation \"%s\"",
+			 get_rel_name(relid));
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "REPACK decoding worker");
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	snprintf(bgw.bgw_library_name, MAXPGPATH, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "RepackWorkerMain");
+	bgw.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
+	bgw.bgw_notify_pid = MyProcPid;
+
+	decoding_worker = palloc0_object(DecodingWorker);
+	if (!RegisterDynamicBackgroundWorker(&bgw, &decoding_worker->handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase \"%s\".", "max_worker_processes")));
+
+	decoding_worker->seg = seg;
+	decoding_worker->error_mqh = mqh;
+
+	/*
+	 * The decoding setup must be done before the caller can have an XID
+	 * assigned for any reason, otherwise the worker might end up in a
+	 * deadlock, waiting for the caller's transaction to end. Therefore wait
+	 * here until the worker indicates that it has initialized the logical
+	 * decoding.
+	 */
+	ConditionVariablePrepareToSleep(&shared->cv);
+	for (;;)
+	{
+		bool		initialized;
+
+		SpinLockAcquire(&shared->mutex);
+		initialized = shared->initialized;
+		SpinLockRelease(&shared->mutex);
+
+		if (initialized)
+			break;
+
+		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
+	}
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Stop the decoding worker and clean up the related resources.
+ *
+ * The worker stops on its own when it knows there is no more work to do, but
+ * we need to stop it explicitly at least on ERROR in the launching backend.
+ */
+static void
+stop_decoding_worker(void)
+{
+	BgwHandleStatus status;
+
+	/* Haven't reached the worker startup? */
+	if (decoding_worker == NULL)
+		return;
+
+	/* Could not register the worker? */
+	if (decoding_worker->handle == NULL)
+		return;
+
+	TerminateBackgroundWorker(decoding_worker->handle);
+	/* The worker should really exit before the REPACK command does. */
+	HOLD_INTERRUPTS();
+	status = WaitForBackgroundWorkerShutdown(decoding_worker->handle);
+	RESUME_INTERRUPTS();
+
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errcode(ERRCODE_ADMIN_SHUTDOWN),
+				 errmsg("postmaster exited during REPACK command")));
+
+	shm_mq_detach(decoding_worker->error_mqh);
+
+	/*
+	 * If we could not cancel the current sleep due to ERROR, do that before
+	 * we detach from the shared memory the condition variable is located in.
+	 * If we did not, the bgworker ERROR handling code would try to do so
+	 * and fail badly.
+	 */
+	ConditionVariableCancelSleep();
+
+	dsm_detach(decoding_worker->seg);
+	pfree(decoding_worker);
+	decoding_worker = NULL;
+}
+
+/* Is this process a REPACK worker? */
+static bool is_repack_worker = false;
+
+static pid_t backend_pid;
+static ProcNumber backend_proc_number;
+
+/*
+ * See ParallelWorkerShutdown for details.
+ */
+static void
+RepackWorkerShutdown(int code, Datum arg)
+{
+	SendProcSignal(backend_pid,
+				   PROCSIG_REPACK_MESSAGE,
+				   backend_proc_number);
+
+	dsm_detach((dsm_segment *) DatumGetPointer(arg));
+}
+
+/* REPACK decoding worker entry point */
+void
+RepackWorkerMain(Datum main_arg)
+{
+	dsm_segment *seg;
+	DecodingWorkerShared *shared;
+	shm_mq	   *mq;
+	shm_mq_handle *mqh;
+
+	is_repack_worker = true;
+
+	/*
+	 * Override the default bgworker_die() with die() so we can use
+	 * CHECK_FOR_INTERRUPTS().
+	 */
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	seg = dsm_attach(DatumGetUInt32(main_arg));
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	shared = (DecodingWorkerShared *) dsm_segment_address(seg);
+
+	/* Arrange to signal the leader if we exit. */
+	backend_pid = shared->backend_pid;
+	backend_proc_number = shared->backend_proc_number;
+	before_shmem_exit(RepackWorkerShutdown, PointerGetDatum(seg));
+
+	/*
+	 * Join the locking group - see the comments near the call of
+	 * start_decoding_worker().
+	 */
+	if (!BecomeLockGroupMember(shared->backend_proc, backend_pid))
+		/* The leader is not running anymore. */
+		return;
+
+	/*
+	 * Set up a queue to send error messages to the backend that launched
+	 * this worker.
+	 */
+	mq = (shm_mq *) (char *) BUFFERALIGN(shared->error_queue);
+	shm_mq_set_sender(mq, MyProc);
+	mqh = shm_mq_attach(mq, seg, NULL);
+	pq_redirect_to_shm_mq(seg, mqh);
+	pq_set_parallel_leader(shared->backend_pid,
+						   shared->backend_proc_number);
+
+	/* Connect to the database. */
+	BackgroundWorkerInitializeConnectionByOid(shared->dbid, shared->roleid, 0);
+
+	repack_worker_internal(seg);
+}
+
+static void
+repack_worker_internal(dsm_segment *seg)
+{
+	DecodingWorkerShared *shared;
+	LogicalDecodingContext *decoding_ctx;
+	SharedFileSet *sfs;
+	Snapshot	snapshot;
+
+	/*
+	 * A transaction is needed to open the relation, and it also provides
+	 * us with a resource owner.
+	 */
+	StartTransactionCommand();
+
+	shared = (DecodingWorkerShared *) dsm_segment_address(seg);
+
+	/*
+	 * Not sure the spinlock is needed here - the backend should not change
+	 * anything in the shared memory until we have serialized the snapshot.
+	 */
+	SpinLockAcquire(&shared->mutex);
+	Assert(XLogRecPtrIsInvalid(shared->lsn_upto));
+	sfs = &shared->sfs;
+	SpinLockRelease(&shared->mutex);
+
+	SharedFileSetAttach(sfs, seg);
+
+	/*
+	 * Prepare to capture the concurrent data changes ourselves.
+	 */
+	decoding_ctx = setup_logical_decoding(shared->relid);
+
+	/* Announce that we're ready. */
+	SpinLockAcquire(&shared->mutex);
+	shared->initialized = true;
+	SpinLockRelease(&shared->mutex);
+	ConditionVariableSignal(&shared->cv);
+
+	/* Build the initial snapshot and export it. */
+	snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
+	export_initial_snapshot(snapshot, shared);
+
+	/*
+	 * Only historic snapshots should be used from now on. Make sure we do
+	 * not hold back the progress of the xmin horizon.
+	 */
+	InvalidateCatalogSnapshot();
+
+	while (!decode_concurrent_changes(decoding_ctx, shared))
+		;
+
+	/* Cleanup. */
+	cleanup_logical_decoding(decoding_ctx);
+	CommitTransactionCommand();
+}
+
+/*
+ * Make snapshot available to the backend that launched the decoding worker.
+ */
+static void
+export_initial_snapshot(Snapshot snapshot, DecodingWorkerShared *shared)
+{
+	char		fname[MAXPGPATH];
+	BufFile    *file;
+	Size		snap_size;
+	char	   *snap_space;
+
+	snap_size = EstimateSnapshotSpace(snapshot);
+	snap_space = (char *) palloc(snap_size);
+	SerializeSnapshot(snapshot, snap_space);
+	FreeSnapshot(snapshot);
+
+	DecodingWorkerFileName(fname, shared->relid, shared->last_exported + 1);
+	file = BufFileCreateFileSet(&shared->sfs.fs, fname);
+	/* To make restoration easier, write the snapshot size first. */
+	BufFileWrite(file, &snap_size, sizeof(snap_size));
+	BufFileWrite(file, snap_space, snap_size);
+	pfree(snap_space);
+	BufFileClose(file);
+
+	/* Increase the counter to tell the backend that the file is available. */
+	SpinLockAcquire(&shared->mutex);
+	shared->last_exported++;
+	SpinLockRelease(&shared->mutex);
+	ConditionVariableSignal(&shared->cv);
+}
+
+/*
+ * Get the initial snapshot from the decoding worker.
+ */
+static Snapshot
+get_initial_snapshot(DecodingWorker *worker)
+{
+	DecodingWorkerShared *shared;
+	char		fname[MAXPGPATH];
+	BufFile    *file;
+	Size		snap_size;
+	char	   *snap_space;
+	Snapshot	snapshot;
+
+	shared = (DecodingWorkerShared *) dsm_segment_address(worker->seg);
+
+	/*
+	 * The worker needs to initialize the logical decoding, which usually
+	 * takes some time, so it makes sense to prepare for sleep first.
+	 */
+	ConditionVariablePrepareToSleep(&shared->cv);
+	for (;;)
+	{
+		int		last_exported;
+
+		SpinLockAcquire(&shared->mutex);
+		last_exported = shared->last_exported;
+		SpinLockRelease(&shared->mutex);
+
+		/*
+		 * Has the worker exported the file we are waiting for?
+		 */
+		if (last_exported == WORKER_FILE_SNAPSHOT)
+			break;
+
+		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
+	}
+	ConditionVariableCancelSleep();
+
+	/* Read the snapshot from a file. */
+	DecodingWorkerFileName(fname, shared->relid, WORKER_FILE_SNAPSHOT);
+	file = BufFileOpenFileSet(&shared->sfs.fs, fname, O_RDONLY, false);
+	BufFileReadExact(file, &snap_size, sizeof(snap_size));
+	snap_space = (char *) palloc(snap_size);
+	BufFileReadExact(file, snap_space, snap_size);
+	BufFileClose(file);
+
+	/* Restore it. */
+	snapshot = RestoreSnapshot(snap_space);
+	pfree(snap_space);
+
+	return snapshot;
+}
+
+bool
+IsRepackWorker(void)
+{
+	return is_repack_worker;
+}
+
+/*
+ * Handle receipt of an interrupt indicating a repack worker message.
+ *
+ * Note: this is called within a signal handler!  All we can do is set
+ * a flag that will cause the next CHECK_FOR_INTERRUPTS() to invoke
+ * ProcessRepackMessages().
+ */
+void
+HandleRepackMessageInterrupt(void)
+{
+	InterruptPending = true;
+	RepackMessagePending = true;
+	SetLatch(MyLatch);
+}
+
+/*
+ * Process any queued protocol messages received from the decoding worker.
+ */
+void
+ProcessRepackMessages(void)
+{
+	MemoryContext oldcontext;
+
+	static MemoryContext hpm_context = NULL;
+
+	/*
+	 * Nothing to do if we haven't launched the worker yet or have already
+	 * terminated it.
+	 */
+	if (decoding_worker == NULL)
+		return;
+
+	/*
+	 * This is invoked from ProcessInterrupts(), and since some of the
+	 * functions it calls contain CHECK_FOR_INTERRUPTS(), there is a potential
+	 * for recursive calls if more signals are received while this runs.  It's
+	 * unclear that recursive entry would be safe, and it doesn't seem useful
+	 * even if it is safe, so let's block interrupts until done.
+	 */
+	HOLD_INTERRUPTS();
+
+	/*
+	 * Moreover, CurrentMemoryContext might be pointing almost anywhere.  We
+	 * don't want to risk leaking data into long-lived contexts, so let's do
+	 * our work here in a private context that we can reset on each use.
+	 */
+	if (hpm_context == NULL)	/* first time through? */
+		hpm_context = AllocSetContextCreate(TopMemoryContext,
+											"ProcessRepackMessages",
+											ALLOCSET_DEFAULT_SIZES);
+	else
+		MemoryContextReset(hpm_context);
+
+	oldcontext = MemoryContextSwitchTo(hpm_context);
+
+	/* OK to process messages.  Reset the flag saying there are more to do. */
+	RepackMessagePending = false;
+
+	/*
+	 * Read as many messages as we can from the worker, but stop when no more
+	 * can be read without blocking.
+	 */
+	while (true)
+	{
+		shm_mq_result res;
+		Size		nbytes;
+		void	   *data;
+
+		res = shm_mq_receive(decoding_worker->error_mqh, &nbytes,
+							 &data, true);
+		if (res == SHM_MQ_WOULD_BLOCK)
+			break;
+		else if (res == SHM_MQ_SUCCESS)
+		{
+			StringInfoData msg;
+
+			initStringInfo(&msg);
+			appendBinaryStringInfo(&msg, data, nbytes);
+			ProcessRepackMessage(&msg);
+			pfree(msg.data);
+		}
+		else
+		{
+			/*
+			 * The decoding worker is special in that it exits as soon as it
+			 * has its work done. Thus the DETACHED result code is fine.
+			 */
+			Assert(res == SHM_MQ_DETACHED);
+
+			break;
+		}
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/* Might as well clear the context on our way out */
+	MemoryContextReset(hpm_context);
+
+	RESUME_INTERRUPTS();
+}
+
+/*
+ * Process a single protocol message received from the decoding worker.
+ */
+static void
+ProcessRepackMessage(StringInfo msg)
+{
+	char		msgtype;
+
+	msgtype = pq_getmsgbyte(msg);
+
+	switch (msgtype)
+	{
+		case PqMsg_ErrorResponse:
+		case PqMsg_NoticeResponse:
+			{
+				ErrorData	edata;
+
+				/* Parse ErrorResponse or NoticeResponse. */
+				pq_parse_errornotice(msg, &edata);
+
+				/* Death of a worker isn't enough justification for suicide. */
+				edata.elevel = Min(edata.elevel, ERROR);
+
+				/*
+				 * If desired, add a context line to show that this is a
+				 * message propagated from a parallel worker.  Otherwise, it
+				 * can sometimes be confusing to understand what actually
+				 * happened.
+				 */
+				if (edata.context)
+					edata.context = psprintf("%s\n%s", edata.context,
+											 _("decoding worker"));
+				else
+					edata.context = pstrdup(_("decoding worker"));
+
+				/* Rethrow error or print notice. */
+				ThrowErrorData(&edata);
+
+				break;
+			}
+
+		default:
+			{
+				elog(ERROR, "unrecognized message type received from decoding worker: %c (message length %d bytes)",
+					 msgtype, msg->len);
+			}
+	}
+}
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 6e4bbfb5aa1..42f6fa472c5 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "commands/cluster.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqmq.h"
@@ -175,6 +176,10 @@ mq_putmessage(char msgtype, const char *s, size_t len)
 				SendProcSignal(pq_mq_parallel_leader_pid,
 							   PROCSIG_PARALLEL_APPLY_MESSAGE,
 							   pq_mq_parallel_leader_proc_number);
+			else if (IsRepackWorker())
+				SendProcSignal(pq_mq_parallel_leader_pid,
+							   PROCSIG_REPACK_MESSAGE,
+							   pq_mq_parallel_leader_proc_number);
 			else
 			{
 				Assert(IsParallelWorker());
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 65deabe91a7..334bb708c5b 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,7 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "commands/cluster.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -136,6 +137,9 @@ static const struct
 	},
 	{
 		"SequenceSyncWorkerMain", SequenceSyncWorkerMain
+	},
+	{
+		"RepackWorkerMain", RepackWorkerMain
 	}
 };
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index b0ef1a12520..35a46988285 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -194,7 +194,11 @@ StartupDecodingContext(List *output_plugin_options,
 
 	ctx->slot = slot;
 
-	ctx->reader = XLogReaderAllocate(wal_segment_size, NULL, xl_routine, ctx);
+	/*
+	 * TODO A separate patch for PG core, unless there's really a reason to
+	 * pass ctx for private_data (Might extensions expect ctx?).
+	 */
+	ctx->reader = XLogReaderAllocate(wal_segment_size, NULL, xl_routine, NULL);
 	if (!ctx->reader)
 		ereport(ERROR,
 				(errcode(ERRCODE_OUT_OF_MEMORY),
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
index c8930640a0d..fb9956d392d 100644
--- a/src/backend/replication/pgoutput_repack/pgoutput_repack.c
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -168,17 +168,13 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 			 HeapTuple tuple)
 {
 	RepackDecodingState *dstate;
-	char	   *change_raw;
-	ConcurrentChange change;
+	char		kind_byte = (char) kind;
 	bool		flattened = false;
-	Size		size;
-	Datum		values[1];
-	bool		isnull[1];
-	char	   *dst;
 
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
-	size = VARHDRSZ + SizeOfConcurrentChange;
+	/* Store the change kind. */
+	BufFileWrite(dstate->file, &kind_byte, 1);
 
 	/*
 	 * ReorderBufferCommit() stores the TOAST chunks in its private memory
@@ -195,46 +191,12 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 		tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
 		flattened = true;
 	}
+	/* Store the tuple size ... */
+	BufFileWrite(dstate->file, &tuple->t_len, sizeof(tuple->t_len));
+	/* ... and the tuple itself. */
+	BufFileWrite(dstate->file, tuple->t_data, tuple->t_len);
 
-	size += tuple->t_len;
-	if (size >= MaxAllocSize)
-		elog(ERROR, "Change is too big.");
-
-	/* Construct the change. */
-	change_raw = (char *) palloc0(size);
-	SET_VARSIZE(change_raw, size);
-
-	/*
-	 * Since the varlena alignment might not be sufficient for the structure,
-	 * set the fields in a local instance and remember where it should
-	 * eventually be copied.
-	 */
-	change.kind = kind;
-	dst = (char *) VARDATA(change_raw);
-
-	/*
-	 * Copy the tuple.
-	 *
-	 * Note: change->tup_data.t_data must be fixed on retrieval!
-	 */
-	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
-	memcpy(dst, &change, SizeOfConcurrentChange);
-	dst += SizeOfConcurrentChange;
-	memcpy(dst, tuple->t_data, tuple->t_len);
-
-	/* The data has been copied. */
+	/* Free the flat copy if created above. */
 	if (flattened)
 		pfree(tuple);
-
-	/* Store as tuple of 1 bytea column. */
-	values[0] = PointerGetDatum(change_raw);
-	isnull[0] = false;
-	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
-						 values, isnull);
-
-	/* Accounting. */
-	dstate->nchanges++;
-
-	/* Cleanup. */
-	pfree(change_raw);
 }
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 8e56922dcea..6f9e7a7aab7 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -19,6 +19,7 @@
 
 #include "access/parallel.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bitutils.h"
@@ -697,6 +698,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
 		HandleParallelApplyMessageInterrupt();
 
+	if (CheckProcSignal(PROCSIG_REPACK_MESSAGE))
+		HandleRepackMessageInterrupt();
+
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
 		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 015c67bbeba..566e5a50c30 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,7 @@
 #include "access/xact.h"
 #include "catalog/pg_type.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "commands/event_trigger.h"
 #include "commands/explain_state.h"
 #include "commands/prepare.h"
@@ -3541,6 +3542,9 @@ ProcessInterrupts(void)
 
 	if (ParallelApplyMessagePending)
 		ProcessParallelApplyMessages();
+
+	if (RepackMessagePending)
+		ProcessRepackMessages();
 }
 
 /*
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 3299de23bb3..73a3def69bc 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,6 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
+REPACK_WORKER_MAIN	"Waiting in main loop of REPACK decoding worker process."
 REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
@@ -154,6 +155,7 @@ RECOVERY_CONFLICT_SNAPSHOT	"Waiting for recovery conflict resolution for a vacuu
 RECOVERY_CONFLICT_TABLESPACE	"Waiting for recovery conflict resolution for dropping a tablespace."
 RECOVERY_END_COMMAND	"Waiting for <xref linkend="guc-recovery-end-command"/> to complete."
 RECOVERY_PAUSE	"Waiting for recovery to be resumed."
+REPACK_WORKER_EXPORT	"Waiting for decoding worker to export a new output file."
 REPLICATION_ORIGIN_DROP	"Waiting for a replication origin to become inactive so it can be dropped."
 REPLICATION_SLOT_DROP	"Waiting for a replication slot to become inactive so it can be dropped."
 RESTORE_COMMAND	"Waiting for <xref linkend="guc-restore-command"/> to complete."
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 76aa993009a..15760363a1a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,7 +22,6 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
-#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -631,7 +630,6 @@ typedef struct TableAmRoutine
 											  bool use_sort,
 											  TransactionId OldestXmin,
 											  Snapshot snapshot,
-											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1651,8 +1649,6 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  * - *multi_cutoff - ditto
  * - snapshot - if != NULL, ignore data changes done by transactions that this
  *	 (MVCC) snapshot considers still in-progress or in the future.
- * - decoding_ctx - logical decoding context, to capture concurrent data
- *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1666,7 +1662,6 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								bool use_sort,
 								TransactionId OldestXmin,
 								Snapshot snapshot,
-								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1675,7 +1670,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
-													snapshot, decoding_ctx,
+													snapshot,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 6a5c476294a..1b05d5d418b 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -17,11 +17,13 @@
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
 #include "replication/decode.h"
+#include "postmaster/bgworker.h"
 #include "replication/logical.h"
+#include "storage/buffile.h"
 #include "storage/lock.h"
+#include "storage/shm_mq.h"
 #include "utils/relcache.h"
 #include "utils/resowner.h"
-#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -44,6 +46,9 @@ typedef struct ClusterParams
  * The following definitions are used by REPACK CONCURRENTLY.
  */
 
+/*
+ * Stored as a single byte in the output file.
+ */
 typedef enum
 {
 	CHANGE_INSERT,
@@ -52,68 +57,30 @@ typedef enum
 	CHANGE_DELETE
 } ConcurrentChangeKind;
 
-typedef struct ConcurrentChange
-{
-	/* See the enum above. */
-	ConcurrentChangeKind kind;
-
-	/*
-	 * The actual tuple.
-	 *
-	 * The tuple data follows the ConcurrentChange structure. Before use make
-	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
-	 * bytea) and that tuple->t_data is fixed.
-	 */
-	HeapTupleData tup_data;
-} ConcurrentChange;
-
-#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
-								sizeof(HeapTupleData))
-
 /*
  * Logical decoding state.
  *
- * Here we store the data changes that we decode from WAL while the table
- * contents is being copied to a new storage. Also the necessary metadata
- * needed to apply these changes to the table is stored here.
+ * The output plugin uses it to store the data changes that it decodes from
+ * WAL while the table contents are being copied to the new storage.
  */
 typedef struct RepackDecodingState
 {
 	/* The relation whose changes we're decoding. */
 	Oid			relid;
 
-	/* Replication slot name. */
-	NameData	slotname;
-
-	/*
-	 * Decoded changes are stored here. Although we try to avoid excessive
-	 * batches, it can happen that the changes need to be stored to disk. The
-	 * tuplestore does this transparently.
-	 */
-	Tuplestorestate *tstore;
-
-	/* The current number of changes in tstore. */
-	double		nchanges;
-
-	/*
-	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
-	 * We can't store the tuple directly because tuplestore only supports
-	 * minimum tuple and we may need to transfer OID system column from the
-	 * output plugin. Also we need to transfer the change kind, so it's better
-	 * to put everything in the structure than to use 2 tuplestores "in
-	 * parallel".
-	 */
-	TupleDesc	tupdesc_change;
-
-	/* Tuple descriptor needed to update indexes. */
+	/* Tuple descriptor of the relation being processed. */
 	TupleDesc	tupdesc;
 
-	/* Slot to retrieve data from tstore. */
-	TupleTableSlot *tsslot;
-
-	ResourceOwner resowner;
+	/* The current output file. */
+	BufFile    *file;
 } RepackDecodingState;
 
+extern PGDLLIMPORT volatile sig_atomic_t RepackMessagePending;
+
+extern bool IsRepackWorker(void);
+extern void HandleRepackMessageInterrupt(void);
+extern void ProcessRepackMessages(void);
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, Relation OldHeap, Oid indexOid,
@@ -136,6 +103,6 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 
 extern bool am_decoding_for_repack(void);
 extern bool change_useless_for_repack(XLogRecordBuffer *buf);
-extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
-											 XLogRecPtr end_of_wal);
+
+extern void RepackWorkerMain(Datum main_arg);
 #endif							/* CLUSTER_H */
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index e52b8eb7697..3ef35ca6b80 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -36,6 +36,7 @@ typedef enum
 	PROCSIG_BARRIER,			/* global barrier interrupt  */
 	PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
 	PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
+	PROCSIG_REPACK_MESSAGE,		/* Message from repack worker */
 
 	/* Recovery conflict reasons */
 	PROCSIG_RECOVERY_CONFLICT_FIRST,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a0b7b38a5e2..d1a694f9008 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -496,7 +496,6 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
-ConcurrentChange
 ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
@@ -636,6 +635,9 @@ DeclareCursorStmt
 DecodedBkpBlock
 DecodedXLogRecord
 DecodingOutputState
+DecodingWorker
+DecodingWorkerShared
+DecodingWorkerState
 DefElem
 DefElemAction
 DefaultACLInfo
-- 
2.47.3

Attachment: v29-0006-Use-multiple-snapshots-to-copy-the-data.patch (text/plain)
From e317848bac4a20ac331239672315d78143c80605 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 8 Jan 2026 17:47:50 +0100
Subject: [PATCH 6/6] Use multiple snapshots to copy the data.

REPACK (CONCURRENTLY) does not prevent applications from using the table that
is being processed; however, it can prevent the xmin horizon from advancing
and thus restrict VACUUM for the whole database. This patch adds the ability
to use a particular snapshot only for a certain range of pages. Each time that
number of pages has been processed, a new snapshot is built, which presumably
has a higher xmin than the previous one.

The data copying works as follows (a standalone sketch in C follows the list):

  1. Have the logical decoding system build a snapshot S0 for range R0 at
     LSN0. This snapshot sees all the data changes whose commit records have
     LSN < LSN0.

  2. Copy the pages in that range to the new relation. The changes not visible
     to the snapshot (because their transactions are still running) will
     appear in the output of the logical decoding system as soon as their
     commit records appear in WAL.

  3. Perform logical decoding of all changes we find in WAL for the table
     we're repacking, put them aside and remember that out of these we can
     only apply those that affect the range R0 in the old
     relation. (Naturally, we cannot apply ones that belong to other pages
     because it's impossible to UPDATE / DELETE a row in the new relation if
     it hasn't been copied yet.) Once the decoding is done, consider LSN1 to
     be the position of the end of the last WAL record decoded.

  4. Build a new snapshot S1 at position LSN1, i.e. one that sees all the data
     whose commit records are at WAL positions < LSN1. Use this snapshot to
     copy the range of pages R1.

  5. Perform logical decoding like in step 3, but remember that out of this
     next set, only changes belonging to ranges R0 *and* R1 in the old table
     can be applied.

  6. etc
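
To make the control flow concrete, here is a minimal standalone sketch of the
loop above. This is not the patch's code: snapshot_t, current_end_of_wal(),
get_snapshot_at() and the other helpers are hypothetical stand-ins for what
the decoding worker and heapam_relation_copy_for_cluster() actually do, and
the sketch assumes the scan starts at block 0 and proceeds in block order.

#include <stdint.h>
#include <stdio.h>

#define BLOCKS_PER_SNAPSHOT 1024	/* cf. the repack_snapshot_after GUC */

typedef uint64_t lsn_t;

typedef struct snapshot_t
{
	lsn_t		built_at;
} snapshot_t;

static lsn_t
current_end_of_wal(void)
{
	static lsn_t lsn = 0;

	return lsn += 100;			/* pretend WAL keeps advancing */
}

static snapshot_t
get_snapshot_at(lsn_t lsn)
{
	/* Steps 1/4: S_i sees all commits whose records are below 'lsn'. */
	snapshot_t	snap = {lsn};

	return snap;
}

static void
copy_range(unsigned from, unsigned to, snapshot_t snap)
{
	/* Step 2: copy one range of pages under that range's snapshot. */
	printf("copy blocks [%u, %u) with snapshot built at LSN %llu\n",
		   from, to, (unsigned long long) snap.built_at);
}

static void
decode_changes_upto(lsn_t lsn, unsigned copied_below)
{
	/*
	 * Steps 3/5: decode and set aside the changes committed so far; only
	 * those touching blocks below 'copied_below' may later be applied.
	 * (The application itself waits until the new heap has its indexes.)
	 */
	printf("decode WAL up to LSN %llu; changes for blocks < %u applicable\n",
		   (unsigned long long) lsn, copied_below);
}

int
main(void)
{
	unsigned	nblocks = 3000; /* size of the old relation */
	unsigned	start = 0;

	while (start < nblocks)
	{
		unsigned	end = start + BLOCKS_PER_SNAPSHOT;

		/* New snapshot for this range ... */
		snapshot_t	snap = get_snapshot_at(current_end_of_wal());

		/* ... copy the range under it ... */
		copy_range(start, end, snap);

		/* ... then decode what was committed meanwhile (step 6: repeat). */
		decode_changes_upto(current_end_of_wal(), end);

		start = end;
	}
	return 0;
}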

Note that the changes decoded above should not be applied to the new relation
until the whole relation has been copied. The point is that we need the
identity index to apply UPDATE and DELETE statements, and bulk creation of
indexes on the already copied heap is probably better than retail insertions
during the copying.

Special attention needs to be paid to UPDATEs that span page ranges. For
example, if the old tuple is in range R0, but the new tuple is in R1, and R1
hasn't been copied yet, we only DELETE the old version from the new
relation. The new version will be handled during processing of range R1. The
snapshot S1 will be based on a WAL position following that UPDATE, so it'll
see the new tuple if its transaction's commit record is at a WAL position
lower than the position where we built the snapshot. On the other hand, if
the commit record appears at a higher position than that of the snapshot, the
corresponding INSERT will be decoded and replayed sometime later: once the
scan of R1 has started, changes of tuples belonging to it are no longer
filtered out.
Likewise, if the old tuple is in range R1 (not yet copied) but the new tuple
is in R0, we only perform an INSERT on the new relation. The deletion of the
old version will either be visible to the snapshot S1 (i.e. the snapshot
won't see the old version) or be replayed later. The decision logic is
sketched in code below.
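
A minimal sketch of that decision follows. The apply_* stubs and
replay_update() are hypothetical, not the patch's functions, and the sketch
assumes ranges are copied in block order from block 0, so "already copied"
reduces to a block-number comparison; the patch's is_tuple_in_block_range()
additionally handles a scan that started mid-table and wrapped around.

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the patch's apply_concurrent_* routines; they only log. */
static void apply_insert(void) { puts("INSERT new version"); }
static void apply_delete(void) { puts("DELETE old version"); }
static void apply_update(void) { puts("UPDATE in place"); }

/*
 * Replay one decoded UPDATE whose old and new tuple versions may fall into
 * different page ranges of the old relation.
 */
static void
replay_update(unsigned old_blk, unsigned new_blk, unsigned copied_below)
{
	bool		old_copied = old_blk < copied_below;
	bool		new_copied = new_blk < copied_below;

	if (old_copied && new_copied)
		apply_update();			/* both versions in already-copied ranges */
	else if (old_copied)
		apply_delete();			/* new version arrives with its own range */
	else if (new_copied)
		apply_insert();			/* old version was never copied */
	/* else: neither side copied yet, so the change stays queued */
}

int
main(void)
{
	replay_update(10, 20, 1024);	/* both in R0: ordinary UPDATE */
	replay_update(10, 2000, 1024);	/* old in R0, new in R1: DELETE only */
	replay_update(2000, 10, 1024);	/* old in R1, new in R0: INSERT only */
	return 0;
}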

This approach introduces one limitation though: if the USING INDEX clause is
specified, an explicit sort is always used. An index scan wouldn't work
because it does not return the tuples sorted by CTID, so we wouldn't be able
to split the copying into ranges of pages. I'm not sure this is serious: if
REPACK runs concurrently and does not restrict VACUUM, the execution time
should not be critical.

A new GUC repack_snapshot_after can be used to set the number of pages per
snapshot. It's currently classified as DEVELOPER_OPTIONS and may be replaced
by a constant after enough evaluation is done.
---
 src/backend/access/heap/heapam_handler.c      | 144 ++++-
 src/backend/commands/cluster.c                | 589 +++++++++++++-----
 src/backend/replication/logical/decode.c      |  47 +-
 src/backend/replication/logical/logical.c     |  30 +-
 .../replication/logical/reorderbuffer.c       |  50 ++
 src/backend/replication/logical/snapbuild.c   |  27 +-
 .../pgoutput_repack/pgoutput_repack.c         |   2 +
 src/backend/utils/misc/guc_parameters.dat     |  10 +
 src/backend/utils/misc/guc_tables.c           |   1 +
 src/include/access/tableam.h                  |  14 +-
 src/include/commands/cluster.h                |  72 +++
 src/include/replication/logical.h             |   2 +-
 src/include/replication/reorderbuffer.h       |   1 +
 src/include/replication/snapbuild.h           |   2 +-
 src/tools/pgindent/typedefs.list              |   3 +-
 15 files changed, 803 insertions(+), 191 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 475c536ce43..9c02d91d327 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -686,12 +687,12 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
-								 Snapshot snapshot,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
-								 double *tups_recently_dead)
+								 double *tups_recently_dead,
+								 void *tableam_data)
 {
 	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
@@ -707,7 +708,10 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
-	bool		concurrent = snapshot != NULL;
+	ConcurrentChangeContext *ctx = (ConcurrentChangeContext *) tableam_data;
+	bool		concurrent = ctx != NULL;
+	Snapshot	snapshot = NULL;
+	BlockNumber range_end = InvalidBlockNumber;
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -744,8 +748,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
 	 *
-	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
-	 * the snapshot passed by the caller.
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests. The
+	 * snapshot changes several times during the scan so that we do not block
+	 * the progress of the xmin horizon for VACUUM too much.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -773,10 +778,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap,
-									snapshot ? snapshot : SnapshotAny,
-									0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
+
+		/*
+		 * In CONCURRENTLY mode we scan the table by ranges of blocks and the
+		 * algorithm below expects forward direction. (No other direction
+		 * should be set here anyway, CONCURRENTLY or not.)
+		 */
+		Assert(heapScan->rs_dir == ForwardScanDirection || !concurrent);
 		indexScan = NULL;
 
 		/* Set total heap blocks */
@@ -787,6 +797,24 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	slot = table_slot_create(OldHeap, NULL);
 	hslot = (BufferHeapTupleTableSlot *) slot;
 
+	if (concurrent)
+	{
+		/*
+		 * Do not block the progress of xmin horizons.
+		 *
+		 * TODO Analyze thoroughly if this might have bad consequences.
+		 */
+		PopActiveSnapshot();
+		InvalidateCatalogSnapshot();
+
+		/*
+		 * Wait until the worker has the initial snapshot and retrieve it.
+		 */
+		snapshot = repack_get_snapshot(ctx);
+
+		PushActiveSnapshot(snapshot);
+	}
+
 	/*
 	 * Scan through the OldHeap, either in OldIndex order or sequentially;
 	 * copy each tuple into the NewHeap, or transiently to the tuplesort
@@ -803,6 +831,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		if (indexScan != NULL)
 		{
+			/*
+			 * Index scan should not be used in the CONCURRENTLY case because
+			 * it returns tuples in random order, so we could not split the
+			 * scan into a series of page ranges.
+			 */
+			Assert(!concurrent);
+
 			if (!index_getnext_slot(indexScan, ForwardScanDirection, slot))
 				break;
 
@@ -824,6 +859,18 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 */
 				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
+
+				if (concurrent)
+				{
+					PopActiveSnapshot();
+
+					/*
+					 * For the last range, there are no restrictions on block
+					 * numbers, so the concurrent data changes pertaining to
+					 * this range can be decoded (and applied) anytime after
+					 * this loop.
+					 */
+				}
 				break;
 			}
 
@@ -922,6 +969,75 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				continue;
 			}
 		}
+		else
+		{
+			BlockNumber blkno;
+			bool		visible;
+
+			/*
+			 * With CONCURRENTLY, we use each snapshot only for a certain
+			 * range of pages, so that VACUUM does not get blocked for too
+			 * long. So first check if the tuple falls into the current range.
+			 */
+			blkno = BufferGetBlockNumber(buf);
+
+			/* The first block of the scan? */
+			if (!BlockNumberIsValid(ctx->first_block))
+			{
+				Assert(!BlockNumberIsValid(range_end));
+
+				ctx->first_block = blkno;
+				range_end = repack_blocks_per_snapshot;
+			}
+			else
+			{
+				Assert(BlockNumberIsValid(range_end));
+
+				/* End of the current range? */
+				if (blkno >= range_end)
+				{
+					XLogRecPtr	end_of_wal;
+
+					PopActiveSnapshot();
+
+					/*
+					 * XXX It might be worth Assert(CatalogSnapshot == NULL)
+					 * here, however that symbol is not external.
+					 */
+
+					/*
+					 * Decode all the concurrent data changes committed so far
+					 * - these will be applicable to the current range.
+					 */
+					end_of_wal = GetFlushRecPtr(NULL);
+					repack_get_concurrent_changes(ctx, end_of_wal, range_end,
+												  true, false);
+
+					/*
+					 * Define the next range.
+					 */
+					range_end = blkno + repack_blocks_per_snapshot;
+
+					/*
+					 * Get the snapshot for the next range - it should have
+					 * been built at the position right after the last change
+					 * decoded. Data present in the next range of blocks will
+					 * either be visible to the snapshot or appear in the next
+					 * batch of decoded changes.
+					 */
+					snapshot = repack_get_snapshot(ctx);
+					PushActiveSnapshot(snapshot);
+				}
+			}
+
+			/* Finally check the tuple visibility. */
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
+			visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buf);
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+			if (!visible)
+				continue;
+		}
 
 		*num_tuples += 1;
 		if (tuplesort != NULL)
@@ -956,6 +1072,18 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		}
 	}
 
+	if (concurrent)
+	{
+		XLogRecPtr	end_of_wal;
+
+		/* Decode the changes belonging to the last range. */
+		end_of_wal = GetFlushRecPtr(NULL);
+		repack_get_concurrent_changes(ctx, end_of_wal, InvalidBlockNumber,
+									  false, false);
+
+		PushActiveSnapshot(GetTransactionSnapshot());
+	}
+
 	if (indexScan != NULL)
 		index_endscan(indexScan);
 	if (tableScan != NULL)
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 5232fbfb57d..8affa859abc 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -111,52 +111,27 @@ typedef struct
 static RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
 static RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
 
-/*
- * Everything we need to call ExecInsertIndexTuples().
- */
-typedef struct IndexInsertState
-{
-	ResultRelInfo *rri;
-	EState	   *estate;
-} IndexInsertState;
-
 /* The WAL segment being decoded. */
 static XLogSegNo repack_current_segment = 0;
 
 /*
- * The first file exported by the decoding worker must contain a snapshot, the
- * following ones contain the data changes.
+ * When REPACK (CONCURRENTLY) copies data to the new heap, a new snapshot is
+ * built after processing this many pages.
  */
-#define WORKER_FILE_SNAPSHOT	0
+int			repack_blocks_per_snapshot = 1024;
 
 /*
- * Information needed to apply concurrent data changes.
+ * Remember the range of pages to which the changes recorded in a given
+ * file should be applied.
  */
-typedef struct ChangeDest
+typedef struct RepackApplyRange
 {
-	/* The relation the changes are applied to. */
-	Relation	rel;
+	/* The first block of the next range. */
+	BlockNumber end;
 
-	/*
-	 * The following is needed to find the existing tuple if the change is
-	 * UPDATE or DELETE. 'ident_key' should have all the fields except for
-	 * 'sk_argument' initialized.
-	 */
-	Relation	ident_index;
-	ScanKey		ident_key;
-	int			ident_key_nentries;
-
-	/* Needed to update indexes of rel_dst. */
-	IndexInsertState *iistate;
-
-	/*
-	 * Sequential number of the file containing the changes.
-	 *
-	 * TODO This field makes the structure name less descriptive. Should we
-	 * rename it, e.g. to ChangeApplyInfo?
-	 */
-	int		file_seq;
-} ChangeDest;
+	/* File containing the changes to be applied to blocks in this range. */
+	char	   *fname;
+} RepackApplyRange;
 
 /*
  * Layout of shared memory used for communication between backend and the
@@ -167,6 +142,9 @@ typedef struct DecodingWorkerShared
 	/* Is the decoding initialized? */
 	bool		initialized;
 
+	/* Set to request a snapshot. */
+	bool		snapshot_requested;
+
 	/*
 	 * Once the worker has reached this LSN, it should close the current
 	 * output file and either create a new one or exit, according to the field
@@ -174,6 +152,8 @@ typedef struct DecodingWorkerShared
 	 * the WAL available and keep checking this field. It is ok if the worker
 	 * had already decoded records whose LSN is >= lsn_upto before this field
 	 * has been set.
+	 *
+	 * Set a valid LSN to request data changes.
 	 */
 	XLogRecPtr	lsn_upto;
 
@@ -184,7 +164,8 @@ typedef struct DecodingWorkerShared
 	SharedFileSet sfs;
 
 	/* Number of the last file exported by the worker. */
-	int			last_exported;
+	int			last_exported_snapshot;
+	int			last_exported_changes;
 
 	/* Synchronize access to the fields above. */
 	slock_t		mutex;
@@ -226,26 +207,14 @@ typedef struct DecodingWorkerShared
  * the fileset name.)
  */
 static inline void
-DecodingWorkerFileName(char *fname, Oid relid, uint32 seq)
+DecodingWorkerFileName(char *fname, Oid relid, uint32 seq, bool snapshot)
 {
-	snprintf(fname, MAXPGPATH, "%u-%u", relid, seq);
+	if (!snapshot)
+		snprintf(fname, MAXPGPATH, "%u-%u", relid, seq);
+	else
+		snprintf(fname, MAXPGPATH, "%u-%u-snapshot", relid, seq);
 }
 
-/*
- * Backend-local information to control the decoding worker.
- */
-typedef struct DecodingWorker
-{
-	/* The worker. */
-	BackgroundWorkerHandle *handle;
-
-	/* DecodingWorkerShared is in this segment. */
-	dsm_segment *seg;
-
-	/* Handle of the error queue. */
-	shm_mq_handle *error_mqh;
-} DecodingWorker;
-
 /* Pointer to currently running decoding worker. */
 static DecodingWorker *decoding_worker = NULL;
 
@@ -262,11 +231,11 @@ static void check_repack_concurrently_requirements(Relation rel);
 static void rebuild_relation(Relation OldHeap, Relation index, bool verbose,
 							 bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							Snapshot snapshot,
 							bool verbose,
 							bool *pSwapToastByContent,
 							TransactionId *pFreezeXid,
-							MultiXactId *pCutoffMulti);
+							MultiXactId *pCutoffMulti,
+							ConcurrentChangeContext *ctx);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -276,9 +245,12 @@ static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
 
 static LogicalDecodingContext *setup_logical_decoding(Oid relid);
-static bool decode_concurrent_changes(LogicalDecodingContext *ctx,
+static bool decode_concurrent_changes(LogicalDecodingContext *decoding_ctx,
 									  DecodingWorkerShared *shared);
-static void apply_concurrent_changes(BufFile *file, ChangeDest *dest);
+static void apply_concurrent_changes(ConcurrentChangeContext *ctx);
+static void apply_concurrent_changes_file(ConcurrentChangeContext *ctx,
+										  BufFile *file,
+										  BlockNumber range_end);
 static void apply_concurrent_insert(Relation rel, HeapTuple tup,
 									IndexInsertState *iistate,
 									TupleTableSlot *index_slot);
@@ -287,12 +259,14 @@ static void apply_concurrent_update(Relation rel, HeapTuple tup,
 									IndexInsertState *iistate,
 									TupleTableSlot *index_slot);
 static void apply_concurrent_delete(Relation rel, HeapTuple tup_target);
-static HeapTuple find_target_tuple(Relation rel, ChangeDest *dest,
+static bool is_tuple_in_block_range(HeapTuple tup, BlockNumber start,
+									BlockNumber end);
+static HeapTuple find_target_tuple(Relation rel,
+								   ConcurrentChangeContext *ctx,
 								   HeapTuple tup_key,
 								   TupleTableSlot *ident_slot);
-static void process_concurrent_changes(XLogRecPtr end_of_wal,
-									   ChangeDest *dest,
-									   bool done);
+static void repack_add_block_range(ConcurrentChangeContext *ctx,
+								   BlockNumber end, char *fname);
 static IndexInsertState *get_index_insert_state(Relation relation,
 												Oid ident_index_id,
 												Relation *ident_index_p);
@@ -303,7 +277,8 @@ static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
 static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 											   Relation cl_index,
 											   TransactionId frozenXid,
-											   MultiXactId cutoffMulti);
+											   MultiXactId cutoffMulti,
+											   ConcurrentChangeContext *ctx);
 static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
 										LOCKMODE lockmode,
@@ -314,9 +289,8 @@ static Oid	determine_clustered_index(Relation rel, bool usingindex,
 static void start_decoding_worker(Oid relid);
 static void stop_decoding_worker(void);
 static void repack_worker_internal(dsm_segment *seg);
-static void export_initial_snapshot(Snapshot snapshot,
-									DecodingWorkerShared *shared);
-static Snapshot get_initial_snapshot(DecodingWorker *worker);
+static void export_snapshot(Snapshot snapshot,
+							DecodingWorkerShared *shared);
 static void ProcessRepackMessage(StringInfo msg);
 static const char *RepackCommandAsString(RepackCommand cmd);
 
@@ -402,7 +376,15 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	{
 		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
+		{
+			/*
+			 * The original transaction was committed, so the current
+			 * portal will not pop the active snapshot.
+			 */
+			PopActiveSnapshot();
+
 			return;				/* all done */
+		}
 	}
 
 	/*
@@ -1020,6 +1002,15 @@ check_repack_concurrently_requirements(Relation rel)
 						RelationGetRelationName(rel)),
 				 (errhint("Relation \"%s\" has no identity index.",
 						  RelationGetRelationName(rel)))));
+
+	/*
+	 * In the CONCURRENTLY mode we don't want to use the same snapshot
+	 * throughout the whole processing, as it could block the progress of the
+	 * xmin horizon.
+	 */
+	if (IsolationUsesXactSnapshot())
+		ereport(ERROR,
+				(errmsg("REPACK (CONCURRENTLY) does not support transaction isolation higher than READ COMMITTED")));
 }
 
 
@@ -1050,7 +1041,7 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
-	Snapshot	snapshot = NULL;
+	ConcurrentChangeContext *ctx = NULL;
 #if USE_ASSERT_CHECKING
 	LOCKMODE	lmode;
 
@@ -1062,6 +1053,13 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 
 	if (concurrent)
 	{
+		/*
+		 * This is only needed here to gather the data changes and range
+		 * information during the copying. The fields needed to apply the
+		 * changes will be filled later.
+		 */
+		ctx = palloc0_object(ConcurrentChangeContext);
+
 		/*
 		 * The worker needs to be member of the locking group we're the leader
 		 * of. We ought to become the leader before the worker starts. The
@@ -1087,13 +1085,7 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 		 * REPACK CONCURRENTLY.
 		 */
 		start_decoding_worker(tableOid);
-
-		/*
-		 * Wait until the worker has the initial snapshot and retrieve it.
-		 */
-		snapshot = get_initial_snapshot(decoding_worker);
-
-		PushActiveSnapshot(snapshot);
+		ctx->worker = decoding_worker;
 	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
@@ -1117,21 +1109,25 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose, bool concurrent
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, snapshot, verbose,
-					&swap_toast_by_content, &frozenXid, &cutoffMulti);
-
-	/* The historic snapshot won't be needed anymore. */
-	if (snapshot)
+	if (concurrent)
 	{
-		PopActiveSnapshot();
-		UpdateActiveSnapshotCommandId();
+		ctx->first_block = InvalidBlockNumber;
+		ctx->block_ranges = NIL;
 	}
+	copy_table_data(NewHeap, OldHeap, index, verbose, &swap_toast_by_content,
+					&frozenXid, &cutoffMulti, ctx);
 
 	if (concurrent)
 	{
+		/*
+		 * Make sure the active snapshot can see the data copied, so the rows
+		 * can be updated / deleted.
+		 */
+		UpdateActiveSnapshotCommandId();
+
 		Assert(!swap_toast_by_content);
 		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
-										   frozenXid, cutoffMulti);
+										   frozenXid, cutoffMulti, ctx);
 
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
@@ -1295,9 +1291,6 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
- * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Pass
- * iff concurrent processing is required.
- *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
@@ -1305,8 +1298,9 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
  */
 static void
 copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-				Snapshot snapshot, bool verbose, bool *pSwapToastByContent,
-				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti,
+				ConcurrentChangeContext *ctx)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -1323,7 +1317,7 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 	int			elevel = verbose ? INFO : DEBUG2;
 	PGRUsage	ru0;
 	char	   *nspname;
-	bool		concurrent = snapshot != NULL;
+	bool		concurrent = ctx != NULL;
 	LOCKMODE	lmode;
 
 	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
@@ -1435,8 +1429,18 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 	 * provided, else plain seqscan.
 	 */
 	if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
-		use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
-										 RelationGetRelid(OldIndex));
+	{
+		if (!concurrent)
+			use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
+											 RelationGetRelid(OldIndex));
+		else
+
+			/*
+			 * To use multiple snapshots, we need to process the table
+			 * sequentially.
+			 */
+			use_sort = true;
+	}
 	else
 		use_sort = false;
 
@@ -1465,11 +1469,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, snapshot,
+									cutoffs.OldestXmin,
 									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
-									&tups_recently_dead);
+									&tups_recently_dead, ctx);
 
 	/* return selected values to caller, get set as relfrozenxid/minmxid */
 	*pFreezeXid = cutoffs.FreezeLimit;
@@ -2361,6 +2365,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * instead return the opened and locked relcache entry, so that caller can
  * process the partitions using the multiple-table handling code.  In this
  * case, if an index name is given, it's up to the caller to resolve it.
+ *
+ * A new transaction is started in either case.
  */
 static Relation
 process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
@@ -2373,6 +2379,25 @@ process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
+	/*
+	 * Since REPACK (CONCURRENTLY) pops the active snapshot during the
+	 * processing (it creates and pushes snapshots on its own), and since that
+	 * snapshot can be referenced by the current portal, we need to make sure
+	 * that the portal has no dangling pointer to the snapshot. Starting a new
+	 * transaction seems to be the simplest way.
+	 */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Start a new transaction. */
+	StartTransactionCommand();
+
+	/*
+	 * Functions in indexes may want a snapshot set. Note that the portal is
+	 * not aware of this one, so the caller needs to pop it explicitly.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
 	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation->relation,
 										lockmode,
@@ -2675,6 +2700,7 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 						  DecodingWorkerShared *shared)
 {
 	RepackDecodingState *dstate;
+	bool		snapshot_requested;
 	XLogRecPtr	lsn_upto;
 	bool		done;
 	char		fname[MAXPGPATH];
@@ -2682,11 +2708,14 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
 	/* Open the output file. */
-	DecodingWorkerFileName(fname, shared->relid, shared->last_exported + 1);
+	DecodingWorkerFileName(fname, shared->relid,
+						   shared->last_exported_changes + 1,
+						   false);
 	dstate->file = BufFileCreateFileSet(&shared->sfs.fs, fname);
 
 	SpinLockAcquire(&shared->mutex);
 	lsn_upto = shared->lsn_upto;
+	snapshot_requested = shared->snapshot_requested;
 	done = shared->done;
 	SpinLockRelease(&shared->mutex);
 
@@ -2752,6 +2781,7 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 		{
 			SpinLockAcquire(&shared->mutex);
 			lsn_upto = shared->lsn_upto;
+			snapshot_requested = shared->snapshot_requested;
 			/* 'done' should be set at the same time as 'lsn_upto' */
 			done = shared->done;
 			SpinLockRelease(&shared->mutex);
@@ -2796,27 +2826,105 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 	 */
 	BufFileClose(dstate->file);
 	dstate->file = NULL;
+
+	/*
+	 * Before publishing the data changes, export the snapshot too if
+	 * requested. Publishing both at once makes sense because both are needed
+	 * at the same time, and it's simpler.
+	 */
+	if (snapshot_requested)
+	{
+		Snapshot	snapshot;
+
+		snapshot = SnapBuildSnapshotForRepack(ctx->snapshot_builder);
+		export_snapshot(snapshot, shared);
+
+		/*
+		 * Adjust the replication slot's xmin so that VACUUM can do more work.
+		 */
+		LogicalIncreaseXminForSlot(InvalidXLogRecPtr, snapshot->xmin, false);
+		FreeSnapshot(snapshot);
+	}
+	else
+	{
+		/*
+		 * If data changes were requested but no following snapshot, we don't
+		 * care about xmin horizon because the heap copying should be done by
+		 * care about the xmin horizon because the heap copying should be
+		 * done by now.
+		LogicalIncreaseXminForSlot(InvalidXLogRecPtr, InvalidTransactionId,
+								   false);
+
+	}
+
+	/*
+	 * Now increase the counter(s) to announce that the output is
+	 * available.
+	 */
 	SpinLockAcquire(&shared->mutex);
+	shared->last_exported_changes++;
 	shared->lsn_upto = InvalidXLogRecPtr;
-	shared->last_exported++;
+	if (snapshot_requested)
+	{
+		shared->last_exported_snapshot++;
+		shared->snapshot_requested = false;
+	}
 	SpinLockRelease(&shared->mutex);
+
 	ConditionVariableSignal(&shared->cv);
 
 	return done;
 }
 
 /*
- * Apply changes stored in 'file'.
+ * Apply all concurrent changes.
  */
 static void
-apply_concurrent_changes(BufFile *file, ChangeDest *dest)
+apply_concurrent_changes(ConcurrentChangeContext *ctx)
+{
+	DecodingWorkerShared *shared;
+
+	shared = (DecodingWorkerShared *) dsm_segment_address(decoding_worker->seg);
+
+	foreach_ptr(RepackApplyRange, range, ctx->block_ranges)
+	{
+		BufFile    *file;
+
+		file = BufFileOpenFileSet(&shared->sfs.fs, range->fname, O_RDONLY,
+								  false);
+
+		/*
+		 * If range end is valid, the start should be as well.
+		 */
+		Assert(!BlockNumberIsValid(range->end) ||
+			   BlockNumberIsValid(ctx->first_block));
+
+		apply_concurrent_changes_file(ctx, file, range->end);
+		BufFileClose(file);
+
+		pfree(range->fname);
+		pfree(range);
+	}
+
+	/* Get ready for the next decoding. */
+	ctx->block_ranges = NIL;
+	ctx->first_block = InvalidBlockNumber;
+}
+
+/*
+ * Apply concurrent changes stored in 'file'.
+ */
+static void
+apply_concurrent_changes_file(ConcurrentChangeContext *ctx, BufFile *file,
+							  BlockNumber range_end)
 {
 	char		kind;
 	uint32		t_len;
-	Relation	rel = dest->rel;
+	Relation	rel = ctx->rel;
 	TupleTableSlot *index_slot,
 			   *ident_slot;
 	HeapTuple	tup_old = NULL;
+	bool		check_range = BlockNumberIsValid(range_end);
 
 	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
 	index_slot = MakeSingleTupleTableSlot(RelationGetDescr(rel),
@@ -2844,8 +2952,8 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 		tup->t_data = (HeapTupleHeader) ((char *) tup + HEAPTUPLESIZE);
 		BufFileReadExact(file, tup->t_data, t_len);
 		tup->t_len = t_len;
-		ItemPointerSetInvalid(&tup->t_self);
-		tup->t_tableOid = RelationGetRelid(dest->rel);
+		tup->t_tableOid = RelationGetRelid(ctx->rel);
+		BufFileReadExact(file, &tup->t_self, sizeof(tup->t_self));
 
 		if (kind == CHANGE_UPDATE_OLD)
 		{
@@ -2856,7 +2964,10 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 		{
 			Assert(tup_old == NULL);
 
-			apply_concurrent_insert(rel, tup, dest->iistate, index_slot);
+			if (!check_range ||
+				is_tuple_in_block_range(tup, ctx->first_block, range_end))
+				apply_concurrent_insert(rel, tup, ctx->iistate,
+										index_slot);
 
 			pfree(tup);
 		}
@@ -2877,16 +2988,52 @@ apply_concurrent_changes(BufFile *file, ChangeDest *dest)
 			/*
 			 * Find the tuple to be updated or deleted.
 			 */
-			tup_exist = find_target_tuple(rel, dest, tup_key, ident_slot);
-			if (tup_exist == NULL)
-				elog(ERROR, "failed to find target tuple");
+			if (!check_range ||
+				(is_tuple_in_block_range(tup_key, ctx->first_block,
+										 range_end)))
+			{
+				/* The change needs to be applied to this tuple. */
+				tup_exist = find_target_tuple(rel, ctx, tup_key, ident_slot);
+				if (tup_exist == NULL)
+					elog(ERROR, "failed to find target tuple");
 
-			if (kind == CHANGE_UPDATE_NEW)
-				apply_concurrent_update(rel, tup, tup_exist, dest->iistate,
-										index_slot);
+				if (kind == CHANGE_DELETE)
+					apply_concurrent_delete(rel, tup_exist);
+				else
+				{
+					/* UPDATE */
+					if (!check_range || tup == tup_key ||
+						is_tuple_in_block_range(tup, ctx->first_block,
+												range_end))
+						/* The new tuple is in the same range. */
+						apply_concurrent_update(rel, tup, tup_exist,
+												ctx->iistate, index_slot);
+					else
+
+						/*
+						 * The new tuple is in another range, so only delete
+						 * the old version from the current one. The new
+						 * version should be visible to the snapshot that
+						 * we'll use to copy the other range.
+						 */
+						apply_concurrent_delete(rel, tup_exist);
+				}
+			}
 			else
-				apply_concurrent_delete(rel, tup_exist);
-
+			{
+				/*
+				 * The change belongs to another range, so we don't need to
+				 * bother with the old tuple: the snapshot used for the other
+				 * range won't see it, so it won't be copied. However, the new
+				 * tuple may still need to go to the range we are checking. In
+				 * that case, simply insert it there.
+				 */
+				if (kind == CHANGE_UPDATE_NEW && tup != tup_key &&
+					is_tuple_in_block_range(tup, ctx->first_block,
+											range_end))
+					apply_concurrent_insert(rel, tup, ctx->iistate,
+											index_slot);
+			}
 			if (tup_old != NULL)
 			{
 				pfree(tup_old);
@@ -3025,6 +3172,33 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target)
 	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
 }
 
+/*
+ * Check whether the tuple originates from a given range of blocks that have
+ * already been copied.
+ */
+static bool
+is_tuple_in_block_range(HeapTuple tup, BlockNumber start, BlockNumber end)
+{
+	BlockNumber blknum;
+
+	Assert(BlockNumberIsValid(start) && BlockNumberIsValid(end));
+
+	blknum = ItemPointerGetBlockNumber(&tup->t_self);
+	Assert(BlockNumberIsValid(blknum));
+
+	if (start < end)
+	{
+		return blknum >= start && blknum < end;
+	}
+	else
+	{
+		/* Has the scan position wrapped around? */
+		Assert(start > end);
+
+		return blknum >= start || blknum < end;
+	}
+}
+
 /*
  * Find the tuple to be updated or deleted.
  *
@@ -3034,10 +3208,10 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target)
  * it when he no longer needs the tuple returned.
  */
 static HeapTuple
-find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
-				  TupleTableSlot *ident_slot)
+find_target_tuple(Relation rel, ConcurrentChangeContext *ctx,
+				  HeapTuple tup_key, TupleTableSlot *ident_slot)
 {
-	Relation	ident_index = dest->ident_index;
+	Relation	ident_index = ctx->ident_index;
 	IndexScanDesc scan;
 	Form_pg_index ident_form;
 	int2vector *ident_indkey;
@@ -3045,14 +3219,14 @@ find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
 
 	/* XXX no instrumentation for now */
 	scan = index_beginscan(rel, ident_index, GetActiveSnapshot(),
-						   NULL, dest->ident_key_nentries, 0);
+						   NULL, ctx->ident_key_nentries, 0);
 
 	/*
 	 * Scan key is passed by caller, so it does not have to be constructed
 	 * multiple times. Key entries have all fields initialized, except for
 	 * sk_argument.
 	 */
-	index_rescan(scan, dest->ident_key, dest->ident_key_nentries, NULL, 0);
+	index_rescan(scan, ctx->ident_key, ctx->ident_key_nentries, NULL, 0);
 
 	/* Info needed to retrieve key values from heap tuple. */
 	ident_form = ident_index->rd_index;
@@ -3087,15 +3261,22 @@ find_target_tuple(Relation rel, ChangeDest *dest, HeapTuple tup_key,
 }
 
 /*
- * Decode and apply concurrent changes, up to (and including) the record whose
- * LSN is 'end_of_wal'.
+ * Get concurrent changes, up to (and including) the record whose LSN is
+ * 'end_of_wal', from the decoding worker. If 'range_end' is a valid block
+ * number, the changes should only be applied to blocks greater than or equal
+ * to ctx->first_block and lower than range_end.
+ *
+ * If 'request_snapshot' is true, the snapshot built at the LSN following the
+ * last data change needs to be exported too.
  */
-static void
-process_concurrent_changes(XLogRecPtr end_of_wal, ChangeDest *dest, bool done)
+extern void
+repack_get_concurrent_changes(ConcurrentChangeContext *ctx,
+							  XLogRecPtr end_of_wal,
+							  BlockNumber range_end,
+							  bool request_snapshot, bool done)
 {
 	DecodingWorkerShared *shared;
 	char		fname[MAXPGPATH];
-	BufFile    *file;
 
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 								 PROGRESS_REPACK_PHASE_CATCH_UP);
@@ -3104,6 +3285,8 @@ process_concurrent_changes(XLogRecPtr end_of_wal, ChangeDest *dest, bool done)
 	shared = (DecodingWorkerShared *) dsm_segment_address(decoding_worker->seg);
 	SpinLockAcquire(&shared->mutex);
 	shared->lsn_upto = end_of_wal;
+	Assert(!shared->snapshot_requested);
+	shared->snapshot_requested = request_snapshot;
 	shared->done = done;
 	SpinLockRelease(&shared->mutex);
 
@@ -3118,30 +3301,52 @@ process_concurrent_changes(XLogRecPtr end_of_wal, ChangeDest *dest, bool done)
 		int		last_exported;
 
 		SpinLockAcquire(&shared->mutex);
-		last_exported = shared->last_exported;
+		last_exported = shared->last_exported_changes;
 		SpinLockRelease(&shared->mutex);
 
 		/*
 		 * Has the worker exported the file we are waiting for?
 		 */
-		if (last_exported == dest->file_seq)
+		if (last_exported == ctx->file_seq_changes)
 			break;
 
 		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
 	}
 	ConditionVariableCancelSleep();
 
-	/* Open the file. */
-	DecodingWorkerFileName(fname, shared->relid, dest->file_seq);
-	file = BufFileOpenFileSet(&shared->sfs.fs, fname, O_RDONLY, false);
-	apply_concurrent_changes(file, dest);
+	/*
+	 * Remember the file name so we can apply the changes when appropriate.
+	 * One particular reason to postpone the replay is that indexes haven't
+	 * been built yet on the new heap.
+	 */
+	DecodingWorkerFileName(fname, shared->relid, ctx->file_seq_changes,
+						   false);
+	repack_add_block_range(ctx, range_end, fname);
 
-	BufFileClose(file);
+#ifdef USE_ASSERT_CHECKING
+	/* The worker will not export another file until we request it. */
+	SpinLockAcquire(&shared->mutex);
+	Assert(XLogRecPtrIsInvalid(shared->lsn_upto));
+	SpinLockRelease(&shared->mutex);
+#endif
+
+	/* Get ready for the next set of changes. */
+	ctx->file_seq_changes++;
+}
+
+static void
+repack_add_block_range(ConcurrentChangeContext *ctx, BlockNumber end,
+					   char *fname)
+{
+	RepackApplyRange *range;
 
-	/* Get ready for the next file. */
-	dest->file_seq++;
+	range = palloc_object(RepackApplyRange);
+	range->end = end;
+	range->fname = pstrdup(fname);
+	ctx->block_ranges = lappend(ctx->block_ranges, range);
 }
 
+
 /*
  * Initialize IndexInsertState for index specified by ident_index_id.
  *
@@ -3284,7 +3489,8 @@ static void
 rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 								   Relation cl_index,
 								   TransactionId frozenXid,
-								   MultiXactId cutoffMulti)
+								   MultiXactId cutoffMulti,
+								   ConcurrentChangeContext *ctx)
 {
 	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
 	List	   *ind_oids_new;
@@ -3303,7 +3509,6 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	Relation   *ind_refs,
 			   *ind_refs_p;
 	int			nind;
-	ChangeDest	chgdst;
 
 	/* Like in cluster_rel(). */
 	lockmode_old = ShareUpdateExclusiveLock;
@@ -3360,12 +3565,18 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 				(errmsg("identity index missing on the new relation")));
 
 	/* Gather information to apply concurrent changes. */
-	chgdst.rel = NewHeap;
-	chgdst.iistate = get_index_insert_state(NewHeap, ident_idx_new,
-											&chgdst.ident_index);
-	chgdst.ident_key = build_identity_key(ident_idx_new, OldHeap,
-										  &chgdst.ident_key_nentries);
-	chgdst.file_seq = WORKER_FILE_SNAPSHOT + 1;
+	ctx->rel = NewHeap;
+	ctx->iistate = get_index_insert_state(NewHeap, ident_idx_new,
+										  &ctx->ident_index);
+	ctx->ident_key = build_identity_key(ident_idx_new, OldHeap,
+										&ctx->ident_key_nentries);
+
+	/*
+	 * Replay the concurrent data changes gathered during heap copying. This
+	 * had to wait until after the index build because the identity index is
+	 * needed to apply UPDATE and DELETE changes.
+	 */
+	apply_concurrent_changes(ctx);
 
 	/*
 	 * During testing, wait for another backend to perform concurrent data
@@ -3383,11 +3594,13 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	end_of_wal = GetFlushRecPtr(NULL);
 
 	/*
-	 * Apply concurrent changes first time, to minimize the time we need to
-	 * hold AccessExclusiveLock. (Quite some amount of WAL could have been
+	 * Decode and apply concurrent changes again, to minimize the time we need
+	 * to hold AccessExclusiveLock. (Quite some amount of WAL could have been
 	 * written during the data copying and index creation.)
 	 */
-	process_concurrent_changes(end_of_wal, &chgdst, false);
+	repack_get_concurrent_changes(ctx, end_of_wal, InvalidBlockNumber, false,
+								  false);
+	apply_concurrent_changes(ctx);
 
 	/*
 	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
@@ -3482,10 +3695,13 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	end_of_wal = GetFlushRecPtr(NULL);
 
 	/*
-	 * Apply the concurrent changes again. Indicate that the decoding worker
-	 * won't be needed anymore.
+	 * Decode and apply the concurrent changes again. Indicate that the
+	 * decoding worker won't be needed anymore.
 	 */
-	process_concurrent_changes(end_of_wal, &chgdst, true);
+	repack_get_concurrent_changes(ctx, end_of_wal, InvalidBlockNumber, false,
+								  true);
+	apply_concurrent_changes(ctx);
+
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
@@ -3536,8 +3752,8 @@ rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
 	table_close(NewHeap, NoLock);
 
 	/* Cleanup what we don't need anymore. (And close the identity index.) */
-	pfree(chgdst.ident_key);
-	free_index_insert_state(chgdst.iistate);
+	pfree(ctx->ident_key);
+	free_index_insert_state(ctx->iistate);
 
 	/*
 	 * Swap the relations and their TOAST relations and TOAST indexes. This
@@ -3578,6 +3794,23 @@ build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
 		char	   *newName;
 		Relation	ind;
 
+		/*
+		 * Try to reduce the impact on VACUUM.
+		 *
+		 * The individual builds might still be a problem, but that's a
+		 * separate issue.
+		 *
+		 * TODO Can we somehow use the fact that the new heap is not yet
+		 * visible to other transactions, and thus cannot be vacuumed? Perhaps
+		 * by preventing snapshots from setting MyProc->xmin temporarily. (All
+		 * the snapshots that might have participated in the build, including
+		 * the catalog snapshots, must not be used for other tables of
+		 * course.)
+		 */
+		PopActiveSnapshot();
+		InvalidateCatalogSnapshot();
+		PushActiveSnapshot(GetTransactionSnapshot());
+
 		ind = index_open(ind_oid, ShareUpdateExclusiveLock);
 
 		newName = ChooseRelationName(get_rel_name(ind_oid),
@@ -3616,10 +3849,14 @@ start_decoding_worker(Oid relid)
 		BUFFERALIGN(REPACK_ERROR_QUEUE_SIZE);
 	seg = dsm_create(size, 0);
 	shared = (DecodingWorkerShared *) dsm_segment_address(seg);
+	shared->initialized = false;
 	shared->lsn_upto = InvalidXLogRecPtr;
 	shared->done = false;
+	/* Snapshot is the first thing we need from the worker. */
+	shared->snapshot_requested = true;
 	SharedFileSetInit(&shared->sfs, seg);
-	shared->last_exported = -1;
+	shared->last_exported_changes = -1;
+	shared->last_exported_snapshot = -1;
 	SpinLockInit(&shared->mutex);
 	shared->dbid = MyDatabaseId;
 
@@ -3828,6 +4065,9 @@ repack_worker_internal(dsm_segment *seg)
 	 */
 	SpinLockAcquire(&shared->mutex);
 	Assert(XLogRecPtrIsInvalid(shared->lsn_upto));
+	/* Initially we're expected to provide a snapshot and only that. */
+	Assert(shared->snapshot_requested &&
+		   XLogRecPtrIsInvalid(shared->lsn_upto));
 	sfs = &shared->sfs;
 	SpinLockRelease(&shared->mutex);
 
@@ -3845,8 +4085,22 @@ repack_worker_internal(dsm_segment *seg)
 	ConditionVariableSignal(&shared->cv);
 
 	/* Build the initial snapshot and export it. */
-	snapshot = SnapBuildInitialSnapshotForRepack(decoding_ctx->snapshot_builder);
-	export_initial_snapshot(snapshot, shared);
+	snapshot = SnapBuildSnapshotForRepack(decoding_ctx->snapshot_builder);
+	export_snapshot(snapshot, shared);
+
+	/*
+	 * Adjust the replication slot's xmin so that VACUUM can do more work.
+	 */
+	LogicalIncreaseXminForSlot(InvalidXLogRecPtr, snapshot->xmin, false);
+	FreeSnapshot(snapshot);
+
+	/* Increase the counter to tell the backend that the file is available. */
+	SpinLockAcquire(&shared->mutex);
+	Assert(shared->snapshot_requested);
+	shared->last_exported_snapshot++;
+	shared->snapshot_requested = false;
+	SpinLockRelease(&shared->mutex);
+	ConditionVariableSignal(&shared->cv);
 
 	/*
 	 * Only historic snapshots should be used now. Do not let us restrict the
@@ -3866,7 +4120,7 @@ repack_worker_internal(dsm_segment *seg)
  * Make snapshot available to the backend that launched the decoding worker.
  */
 static void
-export_initial_snapshot(Snapshot snapshot, DecodingWorkerShared *shared)
+export_snapshot(Snapshot snapshot, DecodingWorkerShared *shared)
 {
 	char		fname[MAXPGPATH];
 	BufFile    *file;
@@ -3876,28 +4130,23 @@ export_initial_snapshot(Snapshot snapshot, DecodingWorkerShared *shared)
 	snap_size = EstimateSnapshotSpace(snapshot);
 	snap_space = (char *) palloc(snap_size);
 	SerializeSnapshot(snapshot, snap_space);
-	FreeSnapshot(snapshot);
 
-	DecodingWorkerFileName(fname, shared->relid, shared->last_exported + 1);
+	DecodingWorkerFileName(fname, shared->relid,
+						   shared->last_exported_snapshot + 1,
+						   true);
 	file = BufFileCreateFileSet(&shared->sfs.fs, fname);
 	/* To make restoration easier, write the snapshot size first. */
 	BufFileWrite(file, &snap_size, sizeof(snap_size));
 	BufFileWrite(file, snap_space, snap_size);
 	pfree(snap_space);
 	BufFileClose(file);
-
-	/* Increase the counter to tell the backend that the file is available. */
-	SpinLockAcquire(&shared->mutex);
-	shared->last_exported++;
-	SpinLockRelease(&shared->mutex);
-	ConditionVariableSignal(&shared->cv);
 }
 
 /*
- * Get the initial snapshot from the decoding worker.
+ * Get snapshot from the decoding worker.
  */
-static Snapshot
-get_initial_snapshot(DecodingWorker *worker)
+extern Snapshot
+repack_get_snapshot(ConcurrentChangeContext *ctx)
 {
 	DecodingWorkerShared *shared;
 	char		fname[MAXPGPATH];
@@ -3905,13 +4154,15 @@ get_initial_snapshot(DecodingWorker *worker)
 	Size		snap_size;
 	char	   *snap_space;
 	Snapshot	snapshot;
+	DecodingWorker *worker = ctx->worker;
 
 	shared = (DecodingWorkerShared *) dsm_segment_address(worker->seg);
 
 	/*
-	 * The worker needs to initialize the logical decoding, which usually
-	 * takes some time. Therefore it makes sense to prepare for the sleep
-	 * first.
+	 * For the first snapshot request, the worker needs to initialize the
+	 * logical decoding, which usually takes some time. Therefore it makes
+	 * sense to prepare for the sleep first. XXX Would it make sense to skip
+	 * the preparation for subsequent requests?
 	 */
 	ConditionVariablePrepareToSleep(&shared->cv);
 	for (;;)
@@ -3919,13 +4170,13 @@ get_initial_snapshot(DecodingWorker *worker)
 		int		last_exported;
 
 		SpinLockAcquire(&shared->mutex);
-		last_exported = shared->last_exported;
+		last_exported = shared->last_exported_snapshot;
 		SpinLockRelease(&shared->mutex);
 
 		/*
 		 * Has the worker exported the file we are waiting for?
 		 */
-		if (last_exported == WORKER_FILE_SNAPSHOT)
+		if (last_exported == ctx->file_seq_snapshot)
 			break;
 
 		ConditionVariableSleep(&shared->cv, WAIT_EVENT_REPACK_WORKER_EXPORT);
@@ -3933,17 +4184,27 @@ get_initial_snapshot(DecodingWorker *worker)
 	ConditionVariableCancelSleep();
 
 	/* Read the snapshot from a file. */
-	DecodingWorkerFileName(fname, shared->relid, WORKER_FILE_SNAPSHOT);
+	DecodingWorkerFileName(fname, shared->relid, ctx->file_seq_snapshot,
+						   true);
 	file = BufFileOpenFileSet(&shared->sfs.fs, fname, O_RDONLY, false);
 	BufFileReadExact(file, &snap_size, sizeof(snap_size));
 	snap_space = (char *) palloc(snap_size);
 	BufFileReadExact(file, snap_space, snap_size);
 	BufFileClose(file);
 
+#ifdef USE_ASSERT_CHECKING
+	SpinLockAcquire(&shared->mutex);
+	Assert(!shared->snapshot_requested);
+	SpinLockRelease(&shared->mutex);
+#endif
+
 	/* Restore it. */
 	snapshot = RestoreSnapshot(snap_space);
 	pfree(snap_space);
 
+	/* Get ready for the next snapshot. */
+	ctx->file_seq_snapshot++;
+
 	return snapshot;
 }
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index dc8c7be2aca..8f42238ab21 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -920,6 +920,7 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	xl_heap_insert *xlrec;
 	ReorderBufferChange *change;
 	RelFileLocator target_locator;
+	BlockNumber blknum;
 
 	xlrec = (xl_heap_insert *) XLogRecGetData(r);
 
@@ -931,7 +932,7 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		return;
 
 	/* only interested in our database */
-	XLogRecGetBlockTag(r, 0, &target_locator, NULL, NULL);
+	XLogRecGetBlockTag(r, 0, &target_locator, NULL, &blknum);
 	if (target_locator.dbOid != ctx->slot->data.database)
 		return;
 
@@ -956,6 +957,15 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	DecodeXLogTuple(tupledata, datalen, change->data.tp.newtuple);
 
+	/*
+	 * REPACK (CONCURRENTLY) needs the block number to check if the
+	 * part of the table was already copied.
+	 */
+	if (am_decoding_for_repack())
+		/* offnum is not really needed, but let's set a valid pointer. */
+		ItemPointerSet(&change->data.tp.newtuple->t_self, blknum,
+					   xlrec->offnum);
+
 	change->data.tp.clear_toast_afterwards = true;
 
 	ReorderBufferQueueChange(ctx->reorder, XLogRecGetXid(r), buf->origptr,
@@ -977,11 +987,12 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	ReorderBufferChange *change;
 	char	   *data;
 	RelFileLocator target_locator;
+	BlockNumber new_blknum;
 
 	xlrec = (xl_heap_update *) XLogRecGetData(r);
 
 	/* only interested in our database */
-	XLogRecGetBlockTag(r, 0, &target_locator, NULL, NULL);
+	XLogRecGetBlockTag(r, 0, &target_locator, NULL, &new_blknum);
 	if (target_locator.dbOid != ctx->slot->data.database)
 		return;
 
@@ -1007,12 +1018,27 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen);
 
 		DecodeXLogTuple(data, datalen, change->data.tp.newtuple);
+
+		/*
+		 * REPACK (CONCURRENTLY) needs the block number to check if the
+		 * corresponding part of the table was already copied.
+		 */
+		if (am_decoding_for_repack())
+			/* offnum is not really needed, but let's set a valid pointer. */
+			ItemPointerSet(&change->data.tp.newtuple->t_self,
+						   new_blknum, xlrec->new_offnum);
 	}
 
 	if (xlrec->flags & XLH_UPDATE_CONTAINS_OLD)
 	{
 		Size		datalen;
 		Size		tuplelen;
+		BlockNumber old_blknum;
+
+		if (XLogRecHasBlockRef(r, 1))
+			XLogRecGetBlockTag(r, 1, NULL, NULL, &old_blknum);
+		else
+			XLogRecGetBlockTag(r, 0, NULL, NULL, &old_blknum);
 
 		/* caution, remaining data in record is not aligned */
 		data = XLogRecGetData(r) + SizeOfHeapUpdate;
@@ -1023,6 +1049,11 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen);
 
 		DecodeXLogTuple(data, datalen, change->data.tp.oldtuple);
+		/* See above. */
+		if (am_decoding_for_repack())
+			ItemPointerSet(&change->data.tp.oldtuple->t_self,
+						   old_blknum, xlrec->old_offnum);
+
 	}
 
 	change->data.tp.clear_toast_afterwards = true;
@@ -1043,6 +1074,7 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	xl_heap_delete *xlrec;
 	ReorderBufferChange *change;
 	RelFileLocator target_locator;
+	BlockNumber blknum;
 
 	xlrec = (xl_heap_delete *) XLogRecGetData(r);
 
@@ -1056,7 +1088,7 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		return;
 
 	/* only interested in our database */
-	XLogRecGetBlockTag(r, 0, &target_locator, NULL, NULL);
+	XLogRecGetBlockTag(r, 0, &target_locator, NULL, &blknum);
 	if (target_locator.dbOid != ctx->slot->data.database)
 		return;
 
@@ -1088,6 +1120,15 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 		DecodeXLogTuple((char *) xlrec + SizeOfHeapDelete,
 						datalen, change->data.tp.oldtuple);
+
+		/*
+		 * REPACK (CONCURRENTLY) needs the block number to check if the
+		 * corresponding part of the table was already copied.
+		 */
+		if (am_decoding_for_repack())
+			/* offnum is not really needed, but let's set a valid pointer. */
+			ItemPointerSet(&change->data.tp.oldtuple->t_self, blknum,
+						   xlrec->offnum);
 	}
 
 	change->data.tp.clear_toast_afterwards = true;
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 35a46988285..76119c5ecaa 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1661,14 +1661,17 @@ update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
 
 /*
  * Set the required catalog xmin horizon for historic snapshots in the current
- * replication slot.
+ * replication slot if catalog is TRUE, or the data xmin if catalog is FALSE.
  *
  * Note that in the most cases, we won't be able to immediately use the xmin
  * to increase the xmin horizon: we need to wait till the client has confirmed
- * receiving current_lsn with LogicalConfirmReceivedLocation().
+ * receiving current_lsn with LogicalConfirmReceivedLocation(). However,
+ * catalog=FALSE is only allowed for temporary replication slots, so the
+ * horizon is applied immediately.
  */
 void
-LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
+LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin,
+						   bool catalog)
 {
 	bool		updated_xmin = false;
 	ReplicationSlot *slot;
@@ -1679,6 +1682,27 @@ LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
 	Assert(slot != NULL);
 
 	SpinLockAcquire(&slot->mutex);
+	if (!catalog)
+	{
+		/*
+		 * The non-catalog horizon can only advance in temporary slots, so
+		 * update it in shared memory immediately (without first saving the
+		 * slot to disk).
+		 */
+		Assert(slot->data.persistency == RS_TEMPORARY);
+
+		/*
+		 * The horizon must not go backwards; however, it is OK for it to
+		 * become invalid.
+		 */
+		Assert(!TransactionIdIsValid(slot->effective_xmin) ||
+			   !TransactionIdIsValid(xmin) ||
+			   TransactionIdFollowsOrEquals(xmin, slot->effective_xmin));
+
+		slot->effective_xmin = xmin;
+		SpinLockRelease(&slot->mutex);
+		return;
+	}
 
 	/*
 	 * don't overwrite if we already have a newer xmin. This can happen if we
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index a0293f6ec7c..3003cadd76e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3734,6 +3734,56 @@ ReorderBufferXidHasCatalogChanges(ReorderBuffer *rb, TransactionId xid)
 	return rbtxn_has_catalog_changes(txn);
 }
 
+/*
+ * Check if a transaction (or its subtransaction) contains a heap change.
+ */
+bool
+ReorderBufferXidHasHeapChanges(ReorderBuffer *rb, TransactionId xid)
+{
+	ReorderBufferTXN *txn;
+	dlist_iter	iter;
+
+	txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+								false);
+	if (txn == NULL)
+		return false;
+
+	dlist_foreach(iter, &txn->changes)
+	{
+		ReorderBufferChange *change;
+
+		change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+		switch (change->action)
+		{
+			case REORDER_BUFFER_CHANGE_INSERT:
+			case REORDER_BUFFER_CHANGE_UPDATE:
+			case REORDER_BUFFER_CHANGE_DELETE:
+				return true;
+			default:
+				break;
+		}
+	}
+
+	/* Check subtransactions. */
+
+	/*
+	 * TODO Verify that subtransactions must be assigned to the top-level
+	 * transactions by now.
+	 */
+	dlist_foreach(iter, &txn->subtxns)
+	{
+		ReorderBufferTXN *subtxn;
+
+		subtxn = dlist_container(ReorderBufferTXN, node, iter.cur);
+
+		if (ReorderBufferXidHasHeapChanges(rb, subtxn->xid))
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * ReorderBufferXidHasBaseSnapshot
  *		Have we already set the base snapshot for the given txn/subtxn?
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index e238bcd73cd..fbc24de6e24 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -128,6 +128,7 @@
 #include "access/heapam_xlog.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "commands/cluster.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -496,7 +497,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
  * we do not set MyProc->xmin). XXX Do we yet need to add some restrictions?
  */
 Snapshot
-SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+SnapBuildSnapshotForRepack(SnapBuild *builder)
 {
 	Snapshot	snap;
 
@@ -1035,6 +1036,28 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		}
 	}
 
+	/*
+	 * Is REPACK (CONCURRENTLY) being run by this backend?
+	 */
+	else if (am_decoding_for_repack())
+	{
+		Assert(builder->building_full_snapshot);
+
+		/*
+		 * In this special mode, heap changes of other relations should not be
+		 * decoded at all - see heap_decode(). Thus if we find a single heap
+		 * change in this transaction (or its subtransaction), we know that
+		 * this transaction changes the relation being repacked.
+		 */
+		if (ReorderBufferXidHasHeapChanges(builder->reorder, xid))
+
+			/*
+			 * Record the commit so we can build snapshots for the relation
+			 * being repacked.
+			 */
+			needs_timetravel = true;
+	}
+
 	for (nxact = 0; nxact < nsubxacts; nxact++)
 	{
 		TransactionId subxid = subxacts[nxact];
@@ -1240,7 +1263,7 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
 		xmin = running->oldestRunningXid;
 	elog(DEBUG3, "xmin: %u, xmax: %u, oldest running: %u, oldest xmin: %u",
 		 builder->xmin, builder->xmax, running->oldestRunningXid, xmin);
-	LogicalIncreaseXminForSlot(lsn, xmin);
+	LogicalIncreaseXminForSlot(lsn, xmin, true);
 
 	/*
 	 * Also tell the slot where we can restart decoding from. We don't want to
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
index fb9956d392d..be1c3ec9626 100644
--- a/src/backend/replication/pgoutput_repack/pgoutput_repack.c
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -195,6 +195,8 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 	BufFileWrite(dstate->file, &tuple->t_len, sizeof(tuple->t_len));
 	/* ... and the tuple itself. */
 	BufFileWrite(dstate->file, tuple->t_data, tuple->t_len);
+	/* CTID is needed as well, to check block ranges. */
+	BufFileWrite(dstate->file, &tuple->t_self, sizeof(tuple->t_self));
 
 	/* Free the flat copy if created above. */
 	if (flattened)
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 7c60b125564..24f29f0016e 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -2424,6 +2424,16 @@
   boot_val => 'true',
 },
 
+# TODO Tune boot_val, 1024 is probably too low.
+{ name => 'repack_snapshot_after', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
+  short_desc => 'Number of pages after which REPACK (CONCURRENTLY) builds a new snapshot.',
+  flags => 'GUC_UNIT_BLOCKS | GUC_NOT_IN_SAMPLE',
+  variable => 'repack_blocks_per_snapshot',
+  boot_val => '1024',
+  min => '1',
+  max => 'INT_MAX',
+},
+
 { name => 'reserved_connections', type => 'int', context => 'PGC_POSTMASTER', group => 'CONN_AUTH_SETTINGS',
   short_desc => 'Sets the number of connection slots reserved for roles with privileges of pg_use_reserved_connections.',
   variable => 'ReservedConnections',
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 73ff6ad0a32..55c761de759 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -42,6 +42,7 @@
 #include "catalog/namespace.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "commands/extension.h"
 #include "commands/event_trigger.h"
 #include "commands/tablespace.h"
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 15760363a1a..03ba89e6989 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -629,12 +629,12 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
-											  Snapshot snapshot,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
 											  double *tups_vacuumed,
-											  double *tups_recently_dead);
+											  double *tups_recently_dead,
+											  void *tableam_data);
 
 	/*
 	 * React to VACUUM command on the relation. The VACUUM can be triggered by
@@ -1647,8 +1647,6 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
- * - snapshot - if != NULL, ignore data changes done by transactions that this
- *	 (MVCC) snapshot considers still in-progress or in the future.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1661,19 +1659,19 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
-								Snapshot snapshot,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
 								double *tups_vacuumed,
-								double *tups_recently_dead)
+								double *tups_recently_dead,
+								void *tableam_data)
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
-													snapshot,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
-													tups_recently_dead);
+													tups_recently_dead,
+													tableam_data);
 }
 
 /*
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 1b05d5d418b..438ee0d751e 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -46,6 +46,72 @@ typedef struct ClusterParams
  * The following definitions are used by REPACK CONCURRENTLY.
  */
 
+extern PGDLLIMPORT int repack_blocks_per_snapshot;
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+} IndexInsertState;
+
+/*
+ * Backend-local information to control the decoding worker.
+ */
+typedef struct DecodingWorker
+{
+	/* The worker. */
+	BackgroundWorkerHandle *handle;
+
+	/* DecodingWorkerShared is in this segment. */
+	dsm_segment *seg;
+
+	/* Handle of the error queue. */
+	shm_mq_handle *error_mqh;
+} DecodingWorker;
+
+/*
+ * Information needed to handle concurrent data changes.
+ */
+typedef struct ConcurrentChangeContext
+{
+	/* The relation the changes are applied to. */
+	Relation	rel;
+
+	/*
+	 * Background worker performing logical decoding of concurrent data
+	 * changes.
+	 */
+	DecodingWorker *worker;
+
+	/*
+	 * Sequential numbers of the most recent files containing snapshots and
+	 * data changes respectively. These files are created by the decoding
+	 * worker.
+	 */
+	int		file_seq_snapshot;
+	int		file_seq_changes;
+
+	/*
+	 * The following is needed to find the existing tuple if the change is
+	 * UPDATE or DELETE. 'ident_key' should have all the fields except for
+	 * 'sk_argument' initialized.
+	 */
+	Relation	ident_index;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+
+	/* Needed to update indexes of rel_dst. */
+	IndexInsertState *iistate;
+
+	/* The first block of the scan used to copy the heap. */
+	BlockNumber first_block;
+	/* List of RepackApplyRange objects. */
+	List	   *block_ranges;
+} ConcurrentChangeContext;
+
 /*
  * Stored as a single byte in the output file.
  */
@@ -103,6 +169,12 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 
 extern bool am_decoding_for_repack(void);
 extern bool change_useless_for_repack(XLogRecordBuffer *buf);
+extern void repack_get_concurrent_changes(struct ConcurrentChangeContext *ctx,
+										  XLogRecPtr end_of_wal,
+										  BlockNumber range_end,
+										  bool request_snapshot,
+										  bool done);
+extern Snapshot repack_get_snapshot(struct ConcurrentChangeContext *ctx);
 
 extern void RepackWorkerMain(Datum main_arg);
 #endif							/* CLUSTER_H */
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5b43e181135..432dca928e3 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -137,7 +137,7 @@ extern bool DecodingContextReady(LogicalDecodingContext *ctx);
 extern void FreeDecodingContext(LogicalDecodingContext *ctx);
 
 extern void LogicalIncreaseXminForSlot(XLogRecPtr current_lsn,
-									   TransactionId xmin);
+									   TransactionId xmin, bool catalog);
 extern void LogicalIncreaseRestartDecodingForSlot(XLogRecPtr current_lsn,
 												  XLogRecPtr restart_lsn);
 extern void LogicalConfirmReceivedLocation(XLogRecPtr lsn);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 314e35592c0..19df5f4a9ee 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -763,6 +763,7 @@ extern void ReorderBufferProcessXid(ReorderBuffer *rb, TransactionId xid, XLogRe
 
 extern void ReorderBufferXidSetCatalogChanges(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn);
 extern bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *rb, TransactionId xid);
+extern bool ReorderBufferXidHasHeapChanges(ReorderBuffer *rb, TransactionId xid);
 extern bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid);
 
 extern bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 5ee267d1c90..b20a4d1a93d 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,7 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
-extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
+extern Snapshot SnapBuildSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d1a694f9008..220a2b43aa1 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -419,7 +419,6 @@ CatCacheHeader
 CatalogId
 CatalogIdMapEntry
 CatalogIndexState
-ChangeDest
 ChangeVarNodes_callback
 ChangeVarNodes_context
 CheckPoint
@@ -496,6 +495,7 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChangeContext
 ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
@@ -2575,6 +2575,7 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackApplyRange
 RepackCommand
 RepackDecodingState
 RepackStmt
-- 
2.47.3

#83Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#82)
Re: Adding REPACK [concurrently]

Hello, Antonin!

On Thu, Jan 8, 2026 at 7:59 PM Antonin Houska <ah@cybertec.at> wrote:

v29 tries to fix the problem.

Some comments for 0001-0004.

------ 0001 -----

src/bin/scripts/t/103_repackdb.pl:1:
# Copyright (c) 2021-2025

Update the copyright year for 2026.

* FIXME: this is missing a way to specify the index to use to repack one
* table, or whether to pass a WITH INDEX clause when multiple tables are
* used. Something like --index[=indexname]. Adding that bleeds into
* vacuuming.c as well.

Comments look stale.

return "???";

I think it is better to add Assert(false); before it (this is done that
way in a few places).
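
For example (just a sketch; I'm assuming the "return" sits in the
default branch of a switch over the command type):

	default:
		Assert(false);
		return "???";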

command <link linkend="sql-repack"><command>REPACK</command></link> There

A "." is needed before "There".

“An utility”

Should be “A utility”

else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
cmdtype = PROGRESS_COMMAND_CLUSTER;

Should we set PROGRESS_COMMAND_REPACK here, since PROGRESS_COMMAND_CLUSTER
is not used anywhere? Perhaps we can even delete PROGRESS_COMMAND_CLUSTER.

CLUOPT_RECHECK_ISCLUSTERED

It is not set anymore... Probably something is wrong here, or we should
just remove that constant and the check for it.

------ 0002 -----

rebuild_relation(Relation OldHeap, Relation index, bool verbose)

It removes the unused cmd parameter, but I think it would be better not
to add it in the previous commit in the first place.

------ 0003 -----

int newxcnt = 0;

I think it is better to use uint32 for consistency here.

Also, I think it is worth adding Assert(snapshot->snapshot_type ==
SNAPSHOT_HISTORIC_MVCC).

------ 0004 -----

/* Is REPACK (CONCURRENTLY) being run by this backend? */
if (am_decoding_for_repack())

We should check change_useless_for_repack here too, to avoid looking at
TRUNCATE of unrelated tables.
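
Something like this sketch (the exact placement is hypothetical):

	if (am_decoding_for_repack() && change_useless_for_repack(buf))
		return;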

/* For the same reason, unlock TOAST relation. */
if (OldHeap->rd_rel->reltoastrelid)
LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);

Hm, we are locking here instead of unlocking ;)
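
If unlocking is really the intent, I would expect something like this
(a sketch, using the existing UnlockRelationOid()):

	if (OldHeap->rd_rel->reltoastrelid)
		UnlockRelationOid(OldHeap->rd_rel->reltoastrelid,
						  AccessExclusiveLock);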

(errhint("Relation \"%s\" has no identity index.",
RelationGetRelationName(rel)))));

One level of parentheses may be removed.

* to decode on behalf of REPACK (CONCURRENT)?

CONCURRENTLY

* If recheck is required, it must have been preformed on the source

"performed"

* On exit,'*scan_p' contains the scan descriptor used. The caller must close
* it when he no longer needs the tuple returned.

There is no scan_p argument here.

* Copyright (c) 2012-2025, PostgreSQL Global Development Group

2026

newtuple = change->data.tp.newtuple != NULL ?
change->data.tp.newtuple : NULL;

oldtuple = change->data.tp.oldtuple != NULL ?
change->data.tp.oldtuple : NULL;
newtuple = change->data.tp.newtuple != NULL ?
change->data.tp.newtuple : NULL;

Hm, should it be just x = y ?
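
That is, simply:

	oldtuple = change->data.tp.oldtuple;
	newtuple = change->data.tp.newtuple;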

apply_concurrent_insert

There is a double newline at the start of the function.

heap2_decode

Should we check change_useless_for_repack here as well (for
multi-insert, for example)?

Best regards,
Mikhail.