Re: Adding REPACK [concurrently]

Started by Alvaro Herrera · 5 months ago · 31 messages
#1 Alvaro Herrera
alvherre@alvh.no-ip.org
3 attachment(s)

On 2025-Jul-26, Robert Treat wrote:

> For clarity, are you intending to commit this patch before having the
> other parts ready? (If that sounds like an objection, it isn't) After
> a first pass, I think there are some confusing bits in the new docs that
> could use straightening out, but they're likely to overlap with changes
> once CONCURRENTLY is brought in, so it might make sense to hold off on
> those.

I'm aiming at getting 0001 committed during the September commitfest,
and the CONCURRENTLY flag addition later in the pg19 cycle. But I'd
rather have good-enough docs at every step of the way. They don't have
to be *perfect* if we want to get everything in pg19, but I'd rather not
leave anything openly confusing even transiently.

That said, I did not review the docs this time around, so they're
included here unchanged from the previous post. But if you want to
suggest changes to the docs in 0001, please do. Just don't get too
carried away.

> speaking of, for this bit in src/backend/commands/cluster.c
>
> +    switch (cmd)
> +    {
> +        case REPACK_COMMAND_REPACK:
> +            return "REPACK";
> +        case REPACK_COMMAND_VACUUMFULL:
> +            return "VACUUM";
> +        case REPACK_COMMAND_CLUSTER:
> +            return "VACUUM";
> +    }
> +    return "???";
>
> The last one should return "CLUSTER", no?

Absolutely -- my blunder.
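For the record, a sketch of the corrected helper. The enum values are
copied from the hunk quoted above; the function name
repack_command_name and the enum tag are assumptions on my part, not
necessarily what the patch calls them:

```c
/* Command enumeration as it appears in the quoted hunk (tag assumed). */
typedef enum ClusterCommand
{
	REPACK_COMMAND_REPACK,
	REPACK_COMMAND_VACUUMFULL,
	REPACK_COMMAND_CLUSTER
} ClusterCommand;

/* Return the command tag to use in log and progress messages. */
static const char *
repack_command_name(ClusterCommand cmd)
{
	switch (cmd)
	{
		case REPACK_COMMAND_REPACK:
			return "REPACK";
		case REPACK_COMMAND_VACUUMFULL:
			return "VACUUM";
		case REPACK_COMMAND_CLUSTER:
			return "CLUSTER";	/* was "VACUUM" in the quoted hunk */
	}
	return "???";				/* not reached; keeps compilers quiet */
}
```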

On 2025-Jul-27, Fujii Masao wrote:

> > The patch submitted here, largely by Antonin Houska with some
> > changes by me, is based on the pg_squeeze code which he
> > authored, and first introduces a new command called REPACK to absorb
> > both VACUUM FULL and CLUSTER, followed by addition of a CONCURRENTLY
> > flag to allow some forms of REPACK to operate online using logical
> > decoding.

> Does this mean REPACK CONCURRENTLY requires wal_level = logical, while
> plain REPACK (without CONCURRENTLY) works with any wal_level setting?
> If we eventually deprecate VACUUM FULL and CLUSTER, I think plain
> REPACK should still be allowed with wal_level = minimal or replica, so
> users with those settings can perform equivalent processing.

Absolutely.

One of the later patches in the series, which I have not included yet,
intends to implement the idea of transiently enabling wal_level=logical
for the table being repacked concurrently, so that you can still use
the concurrent mode if you have a non-logical-wal_level instance.

> + if (!cluster_is_permitted_for_relation(tableOid, userid,
> +    CLUSTER_COMMAND_CLUSTER))
>
> As for the patch you attached, it seems to be an early WIP and
> might not be ready for review yet?  BTW, I got a compilation
> failure on CLUSTER_COMMAND_CLUSTER, and probably userid in the
> above should be GetUserId().

This was a silly merge mistake, caused by my squashing Antonin's 0004
(trivial code restructuring) into 0001 at the last minute and failing to
"git add" the compile fixes before doing git-format-patch.

Here's v17. (I decided that calling my previous one "v1" after Antonin
had gone all the way to v15 was stupid on my part.) The important part
here is that I rebased Antonin's 0004, that is, the addition of the
CONCURRENTLY flag, plus the 0005 regression tests.

The only interesting change here is that I decided to not mess with the
grammar by allowing an unparenthesized CONCURRENTLY keyword; if you want
concurrent, you have to say "REPACK (CONCURRENTLY)". This is at odds
with the way we use the keyword in other commands, but ISTM we don't
_need_ to support that legacy syntax. Anyway, this is easy to put back
afterwards, if enough people find it not useless.
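Concretely, the grammar decision described above means (sketch):

```sql
REPACK (CONCURRENTLY) employees;    -- accepted: parenthesized option list
REPACK CONCURRENTLY employees;      -- rejected: the unparenthesized
                                    -- legacy-style form is not supported
```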

I've not reviewed 0003 in depth yet, just rebased it. But it works to
the point that CI is happy with it.

I've not yet included Antonin's 0006 and 0007.

TODO list for 0001:

- addition of src/bin/scripts/repackdb
- clean up the progress report infrastructure
- doc review

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Thou shalt check the array bounds of all strings (indeed, all arrays), for
surely where thou typest "foo" someone someday shall type
"supercalifragilisticexpialidocious" (5th Commandment for C programmers)

Attachments:

v17-0001-Add-REPACK-command.patch (text/x-diff; charset=utf-8)
From 365543b6392027a75c3a16161186e2187e70fba5 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 30 Jun 2025 19:41:42 +0200
Subject: [PATCH v17 1/3] Add REPACK command.

The existing CLUSTER command as well as VACUUM with the FULL option both
reclaim unused space by rewriting the table. Now that we want to enhance this
functionality (in particular, by adding a new option CONCURRENTLY), we would
have to enhance both commands, because they are both implemented by the same
function (cluster.c:cluster_rel). However, adding the same option to two
different commands is not very user-friendly. Therefore it was decided to
create a new command and to declare both the CLUSTER command and the FULL
option of VACUUM deprecated. Future enhancements to this rewriting code will
only affect the new command.

Like CLUSTER, the REPACK command reorders the table according to the specified
index. Unlike CLUSTER, REPACK does not require an index: if only a table is
specified, the command acts like VACUUM FULL. As we don't want to remove CLUSTER
and VACUUM FULL yet, there are three callers of the cluster_rel() function
now: REPACK, CLUSTER and VACUUM FULL. When we need to distinguish who is
calling this function (mostly for logging, but also for progress reporting),
we can no longer use the OID of the clustering index: both REPACK and VACUUM
FULL can pass InvalidOid. Therefore, this patch introduces a new enumeration
type ClusterCommand, and adds an argument of this type to the cluster_rel()
function and to all the functions that need to distinguish the caller.

Like CLUSTER and VACUUM FULL, the REPACK command without arguments processes
all the tables on which the current user has the MAINTAIN privilege.

A new view, pg_stat_progress_repack, is added to monitor the progress of
REPACK. Currently it displays the same information as pg_stat_progress_cluster
(except that column names might differ), but it'll also display the status of
the REPACK CONCURRENTLY command in the future, so the view definitions will
eventually diverge.

Regarding user documentation, the patch moves the information on clustering
from cluster.sgml to the new file repack.sgml. cluster.sgml now contains a
link that points to the related section of repack.sgml. A note on deprecation
and a link to repack.sgml are added to both cluster.sgml and vacuum.sgml.

Author: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml             | 223 ++++++-
 doc/src/sgml/ref/allfiles.sgml           |   1 +
 doc/src/sgml/ref/cluster.sgml            |  82 +--
 doc/src/sgml/ref/repack.sgml             | 254 ++++++++
 doc/src/sgml/ref/vacuum.sgml             |   9 +
 doc/src/sgml/reference.sgml              |   1 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  26 +
 src/backend/commands/cluster.c           | 723 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   3 +-
 src/backend/parser/gram.y                |  77 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  33 +-
 src/include/commands/cluster.h           |   7 +-
 src/include/commands/progress.h          |  61 +-
 src/include/nodes/parsenodes.h           |  20 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 125 +++-
 src/test/regress/expected/rules.out      |  23 +
 src/test/regress/sql/cluster.sql         |  59 ++
 src/tools/pgindent/typedefs.list         |   3 +
 25 files changed, 1408 insertions(+), 381 deletions(-)
 create mode 100644 doc/src/sgml/ref/repack.sgml

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 823afe1b30b..924e1b1fa99 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5495,7 +5503,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -5954,6 +5963,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..c0ef654fcb4 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..ee4fd965928 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -42,18 +42,6 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
    <replaceable class="parameter">table_name</replaceable>.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
-
   <para>
    When a table is clustered, <productname>PostgreSQL</productname>
    remembers which index it was clustered by.  The form
@@ -78,6 +66,25 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
    database operations (both reads and writes) from operating on the
    table until the <command>CLUSTER</command> is finished.
   </para>
+
+  <warning>
+   <para>
+    The <command>CLUSTER</command> command is deprecated in favor of
+    <xref linkend="sql-repack"/>.
+   </para>
+  </warning>
+
+  <note>
+   <para>
+    <xref linkend="sql-repack-notes-on-clustering"/> explains how clustering
+    works, whether it is initiated by <command>CLUSTER</command> or
+    by <command>REPACK</command>. The notable difference between the two is
+    that <command>REPACK</command> does not remember the index used last
+    time. Thus if you don't specify an index, <command>REPACK</command>
+    rewrites the table but does not try to cluster it.
+   </para>
+  </note>
+
  </refsect1>
 
  <refsect1>
@@ -136,63 +143,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..a612c72d971
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,254 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX <replaceable class="parameter">index_name</replaceable> ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If <replaceable class="parameter">index_name</replaceable> is specified,
+   the table is clustered by this index. Please see the notes on clustering
+   below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    When a table is clustered, it is physically reordered based on the index
+    information.  Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using <command>REPACK</command>.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan or a sequential scan without sort is used, a temporary
+    copy of the table is created that contains the table data in the index
+    order.  Temporary copies of each index on the table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+  <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+  </para>
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the partition of that
+    index. <command>REPACK</command> on a partitioned table cannot be executed
+    inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> on the basis of its
+   index <literal>employees_ind</literal> (since an index is used here,
+   this effectively clusters the table):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting></para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..cee1cf3926c 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -98,6 +98,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    <varlistentry>
     <term><literal>FULL</literal></term>
     <listitem>
+
      <para>
       Selects <quote>full</quote> vacuum, which can reclaim more
       space, but takes much longer and exclusively locks the table.
@@ -106,6 +107,14 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
       the operation is complete.  Usually this should only be used when a
       significant amount of space needs to be reclaimed from within the table.
      </para>
+
+     <warning>
+      <para>
+       The <option>FULL</option> parameter is deprecated in favor of
+       <xref linkend="sql-repack"/>.
+      </para>
+     </warning>
+
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..229912d35b7 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..0b03070d394 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c4029a4f3d3..3063abff9a5 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f6eca09ee15..02f091c3ed6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1279,6 +1279,32 @@ CREATE VIEW pg_stat_progress_cluster AS
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_repack AS
+    SELECT
+        S.pid AS pid,
+        S.datid AS datid,
+        D.datname AS datname,
+        S.relid AS relid,
+        -- param1 is currently unused
+        CASE S.param2 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'seq scanning heap'
+                      WHEN 2 THEN 'index scanning heap'
+                      WHEN 3 THEN 'sorting tuples'
+                      WHEN 4 THEN 'writing new heap'
+                      WHEN 5 THEN 'swapping relation files'
+                      WHEN 6 THEN 'rebuilding index'
+                      WHEN 7 THEN 'performing final cleanup'
+                      END AS phase,
+        CAST(S.param3 AS oid) AS repack_index_relid,
+        S.param4 AS heap_tuples_scanned,
+        S.param5 AS heap_tuples_written,
+        S.param6 AS heap_blks_total,
+        S.param7 AS heap_blks_scanned,
+        S.param8 AS index_rebuild_count
+    FROM pg_stat_get_progress_info('REPACK') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..af14a230f94 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,18 +67,41 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd, bool usingindex,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  MemoryContext cluster_context,
+											  Oid relid, bool rel_is_index);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
 
 
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
+
 /*---------------------------------------------------------------------------
  * This cluster code allows for clustering multiple tables at once. Because
  * of this, we cannot just run everything on a single transaction, or we
@@ -104,101 +127,39 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *---------------------------------------------------------------------------
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
 	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
 			verbose = defGetBoolean(opt);
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
+					 errmsg("unrecognized %s option \"%s\"",
+							RepackCommandAsString(stmt->command),
 							opt->defname),
 					 parser_errposition(pstate, opt->location)));
 	}
 
 	params.options = (verbose ? CLUOPT_VERBOSE : 0);
 
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
 			return;
-		}
 	}
 
 	/*
@@ -207,88 +168,103 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
+		Oid			relid;
+		bool		rel_is_index;
+
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+
+		/*
+		 * If an index name was specified, resolve it now and pass it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * XXX how should this behave?  Passing no index to a partitioned
+			 * table could be useful to have certain partitions clustered by
+			 * some index, and other partitions by a different index.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
+
+			relid = determine_clustered_index(rel, true, stmt->indexname);
+			/* XXX is this the right place for this? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
+
+		rtcs = get_tables_to_repack_partitioned(stmt->command, repack_context,
+												relid, rel_is_index);
+
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
 	}
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
-
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
-
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
-
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
-
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+			continue;
 
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, stmt->usingindex,
+					rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +280,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, bool usingindex,
+			Relation OldHeap, Oid indexOid, ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +302,25 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
 	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_REPACK_COMMAND_REPACK);
+	else if (cmd == REPACK_COMMAND_CLUSTER)
+	{
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	}
+	else
+	{
+		Assert(cmd == REPACK_COMMAND_VACUUMFULL);
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	}
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -351,63 +342,21 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * to cluster a not-previously-clustered index.
 	 */
 	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
+		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+								 params->options))
 			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index.
+	 * Using an index would work in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
-	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
+	if (usingindex && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				 errmsg("cannot run \"%s\" on a shared catalog",
+						RepackCommandAsString(cmd))));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
@@ -415,21 +364,30 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		if (OidIsValid(indexOid))
+		if (cmd == REPACK_COMMAND_CLUSTER)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
+		else if (cmd == REPACK_COMMAND_REPACK)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot repack temporary tables of other sessions")));
+		}
 		else
+		{
+			Assert(cmd == REPACK_COMMAND_VACUUMFULL);
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot vacuum temporary tables of other sessions")));
+		}
 	}
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -469,7 +427,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +440,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide REPACK/CLUSTER or one on a partitioned table),
+	 * because there is another check in ExecRepack() which will stop any
+	 * attempt to process remote temp tables by name.  There is another check
+	 * in cluster_rel() which is redundant, but we leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +641,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd, bool usingindex,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +658,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (usingindex)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1458,8 +1474,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1525,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,69 +1648,137 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given or not; in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index for indexes that have
+ * indisclustered set; if it was not given, we scan pg_class and return all
+ * tables.
+ *
+ * Return the result as a list of RelToCluster in the given memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand command, bool usingindex,
+					 MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
+	HeapTuple	tuple;
 	MemoryContext old_context;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/*
+			 * XXX I think the only reason there's no test failure here is
+			 * that we seldom have clustered indexes that would be affected by
+			 * concurrency.  Maybe we should also do the
+			 * ConditionalLockRelationOid+SearchSysCacheExists dance that we
+			 * do below.
+			 */
+			if (!cluster_is_permitted_for_relation(command, index->indrelid,
+												   GetUserId()))
+				continue;
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
 
-		MemoryContextSwitchTo(old_context);
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
 	}
-	table_endscan(scan);
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
 
-	relation_close(indRelation, AccessShareLock);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.  XXX we could release at the bottom of the
+			 * loop, but for now just hold it until this transaction is
+			 * finished.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
+
+			/* Verify that the table still exists. */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			if (!cluster_is_permitted_for_relation(command, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
+
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
+	}
+
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
  * all the children leaves tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
+								 Oid relid, bool rel_is_index)
 {
 	List	   *inhoids;
 	ListCell   *lc;
@@ -1702,17 +1786,33 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 	MemoryContext old_context;
 
 	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
 
 	foreach(lc, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			inhoid = lfirst_oid(lc);
+		Oid			inhrelid,
+					inhindid;
 		RelToCluster *rtc;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(inhoid) != RELKIND_INDEX)
+				continue;
+
+			inhrelid = IndexGetRelation(inhoid, false);
+			inhindid = inhoid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(inhoid) != RELKIND_RELATION)
+				continue;
+
+			inhrelid = inhoid;
+			inhindid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
@@ -1720,15 +1820,15 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 		 * table.  We skip any partitions which the user is not permitted to
 		 * CLUSTER.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, inhrelid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
 		old_context = MemoryContextSwitchTo(cluster_context);
 
 		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		rtc->tableOid = inhrelid;
+		rtc->indexOid = inhindid;
 		rtcs = lappend(rtcs, rtc);
 
 		MemoryContextSwitchTo(old_context);
@@ -1742,13 +1842,134 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain lock on it, then determine what to do based on the relation
+ * type: if it's not a partitioned table, repack it as indicated (using an
+ * existing clustered index, or following the indicated index), and return
+ * NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened relcache entry, so that caller can process the
+ * partitions using the multiple-table handling code.  The index name is not
+ * resolved in this case.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+	{
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+	}
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the index to use
+ * for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes doesn't
+ * change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid = InvalidOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		ListCell   *lc;
+
+		/* Find an index with indisclustered set, or report error */
+		foreach(lc, RelationGetIndexList(rel))
+		{
+			indexOid = lfirst_oid(lc);
+
+			if (get_index_isclustered(indexOid))
+				break;
+			indexOid = InvalidOid;
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/*
+		 * An index name was specified; resolve it to an OID.  The index must
+		 * be in the same namespace as the relation.
+		 */
+		indexOid = get_relname_relid(indexname,
+									 rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..8863ad0e8bd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2287,7 +2287,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 				cluster_params.options |= CLUOPT_VERBOSE;
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index db43034b9db..062235817f4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -280,7 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -297,7 +297,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -316,7 +316,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		utility_option_list
@@ -763,7 +763,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1025,7 +1025,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1099,6 +1098,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1135,6 +1135,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11912,38 +11917,80 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ USING INDEX <index_name> ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list qualified_name USING INDEX name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list qualified_name opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $2;
+					n->params = NIL;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -11951,20 +11998,24 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -17960,6 +18011,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18592,6 +18644,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 4f4191b0ea6..8a30e081614 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2850,10 +2850,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2861,6 +2857,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3498,7 +3498,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 1c12ddbae49..b2ad8ba45cd 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -268,6 +268,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1f2ca946fc5..31c1ce5dc09 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1247,7 +1247,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -4997,6 +4997,37 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	/* If we have REPACK <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAny))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK <sth> USING, then add the index as well */
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+		else if (TailMatches("VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..7f4138c7b36 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -31,8 +31,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, bool usingindex,
+						Relation OldHeap, Oid indexOid, ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 7c736e7b03b..5f102743af4 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,24 +56,51 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Note: Since REPACK shares some code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
 
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+
+/*
+ * Commands of PROGRESS_REPACK
+ *
+ * Currently we only have one command, so the PROGRESS_REPACK_COMMAND
+ * parameter is not necessary.  However, it makes cluster.c simpler if we have
+ * the same set of parameters for CLUSTER and REPACK - see the note on REPACK
+ * parameters above.
+ */
+#define PROGRESS_REPACK_COMMAND_REPACK			1
+
+/*
+ * Progress parameters for cluster.
+ *
+ * Although we need to report REPACK and CLUSTER in separate views, the
+ * parameters and phases of CLUSTER are a subset of those of REPACK. Therefore
+ * we just use the appropriate values defined for REPACK above instead of
+ * defining a separate set of constants here.
+ */
 
 /* Commands of PROGRESS_CLUSTER */
 #define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 86a236bd58b..fcc25a0c592 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3949,16 +3949,26 @@ typedef struct AlterSystemStmt
 } AlterSystemStmt;
 
 /* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
+ *		Repack Statement
  * ----------------------
  */
-typedef struct ClusterStmt
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
 {
 	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
+	RepackCommand command;		/* type of command being run */
+	RangeVar   *relation;		/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
 	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
+} RepackStmt;
+
 
 /* ----------------------
  *		Vacuum and Analyze Statements
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index a4af3f717a1..22559369e2c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -374,6 +374,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..5256628b51d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -254,6 +254,63 @@ ORDER BY 1;
  clstr_tst_pkey
 (3 rows)
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking whether it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -381,6 +438,35 @@ SELECT * FROM clstr_1;
  2
 (2 rows)
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have had their relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison.  Unlike CLUSTER, clstr_3 should have been
+-- processed, because plain REPACK does not depend on a clustered index.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
+SET SESSION AUTHORIZATION regress_clstr_user;
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 CREATE TABLE clustertest (key int PRIMARY KEY);
@@ -495,6 +581,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +636,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index dce8c672b40..84ae47080f5 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2066,6 +2066,29 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..cfcc3dc9761 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,6 +76,19 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
 
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
@@ -159,6 +172,34 @@ INSERT INTO clstr_1 VALUES (1);
 CLUSTER clstr_1;
 SELECT * FROM clstr_1;
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because there is nothing like clustering index here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 
@@ -229,6 +270,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..24f98ed1e9e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -425,6 +425,7 @@ ClientCertName
 ClientConnectionInfo
 ClientData
 ClientSocket
+ClusterCommand
 ClonePtrType
 ClosePortalStmt
 ClosePtrType
@@ -2536,6 +2537,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
-- 
2.39.5

v17-0002-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch (text/x-diff; charset=utf-8)
From 66f41ccd5cb75c46cf7b367127aae328c7ce3480 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 30 Jun 2025 19:41:42 +0200
Subject: [PATCH v17 2/3] Move conversion of a "historic" to MVCC snapshot to a
 separate function.

The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
---
 src/backend/replication/logical/snapbuild.c | 51 +++++++++++++++++----
 src/backend/utils/time/snapmgr.c            |  3 +-
 src/include/replication/snapbuild.h         |  1 +
 src/include/utils/snapmgr.h                 |  1 +
 4 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8532bfd27e5..6eaa40f6acf 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
-	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -485,6 +482,31 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
+	/* Convert the historic snapshot to MVCC snapshot. */
+	return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike a regular (non-historic) MVCC snapshot, the xip array of this
+ * snapshot contains not only running main transactions, but also their
+ * subtransactions. This difference has no impact on XidInMVCCSnapshot().
+ *
+ * Pass true for 'in_place' if you don't care about modifying the source
+ * snapshot. If you need a new instance, and one that was allocated as a
+ * single chunk of memory, pass false.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+	TransactionId xid;
+	TransactionId *oldxip = snapshot->xip;
+	uint32		oldxcnt = snapshot->xcnt;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+	Snapshot	result;
+
 	/* allocate in transaction context */
 	newxip = (TransactionId *)
 		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
@@ -495,7 +517,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
 	{
 		void	   *test;
 
@@ -503,7 +525,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -520,11 +542,22 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snapshot->xcnt = newxcnt;
+	snapshot->xip = newxip;
 
-	return snap;
+	if (in_place)
+		result = snapshot;
+	else
+	{
+		result = CopySnapshot(snapshot);
+
+		/* Restore the original values so the source is intact. */
+		snapshot->xip = oldxip;
+		snapshot->xcnt = oldxcnt;
+	}
+	result->snapshot_type = SNAPSHOT_MVCC;
+
+	return result;
 }
 
 /*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index ea35f30f494..70a6b8902d1 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -212,7 +212,6 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -591,7 +590,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
+Snapshot
 CopySnapshot(Snapshot snapshot)
 {
 	Snapshot	newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..6d4d2d1814c 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..147b190210a 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -60,6 +60,7 @@ extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
+extern Snapshot CopySnapshot(Snapshot snapshot);
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
-- 
2.39.5

v17-0003-Add-CONCURRENTLY-option-to-REPACK-command.patch (text/x-diff; charset=utf-8)
From 16d06847870a16b91a86faf17c1b661cf8a25e96 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de>
Date: Tue, 29 Jul 2025 19:00:46 +0200
Subject: [PATCH v17 3/3] Add CONCURRENTLY option to REPACK command.

The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock in the beginning and keep it till the end of the processing.)

This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(Supposedly, the swapping should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.

First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.

Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)

Since the logical decoding system, during its startup, waits until all the
transactions which already have XID assigned have finished, there is a risk of
deadlock if a transaction that already changed anything in the database tries
to acquire a conflicting lock on the table REPACK CONCURRENTLY is working
on. As an example, consider a transaction running a CREATE INDEX command on the
table that is being REPACKed CONCURRENTLY. On the other hand, DML commands
(INSERT, UPDATE, DELETE) are not a problem as their lock does not conflict
with REPACK CONCURRENTLY.

The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before the logical decoding is set up and lock
it again afterwards. Such temporary unlocking would imply re-checking if the
table still meets all the requirements for REPACK CONCURRENTLY.

Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires extra space for the new relation and index files
(which coexist with the old files for some time). In addition, the
CONCURRENTLY option might introduce a lag in releasing WAL segments for
archiving / recycling. This is due to the decoding of the data changes done by
applications concurrently. When copying the table contents into the new file,
we check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table has been copied.) A
background worker might be a better approach for the decoding - let's consider
implementing it in the future.

The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.

Author: Antonin Houska <ah@cybertec.de>
Discussion: https://postgr.es/m/82651.1720540558@antos
---
 doc/src/sgml/monitoring.sgml                  |   37 +-
 doc/src/sgml/mvcc.sgml                        |   12 +-
 doc/src/sgml/ref/repack.sgml                  |  129 +-
 src/Makefile                                  |    1 +
 src/backend/access/heap/heapam.c              |   34 +-
 src/backend/access/heap/heapam_handler.c      |  227 +-
 src/backend/access/heap/rewriteheap.c         |    6 +-
 src/backend/access/transam/xact.c             |   11 +-
 src/backend/catalog/index.c                   |   43 +-
 src/backend/catalog/system_views.sql          |   30 +-
 src/backend/commands/cluster.c                | 1822 ++++++++++++++++-
 src/backend/commands/matview.c                |    2 +-
 src/backend/commands/tablecmds.c              |    1 +
 src/backend/commands/vacuum.c                 |   12 +-
 src/backend/meson.build                       |    1 +
 src/backend/replication/logical/decode.c      |   83 +
 src/backend/replication/logical/snapbuild.c   |   20 +
 .../replication/pgoutput_repack/Makefile      |   32 +
 .../replication/pgoutput_repack/meson.build   |   18 +
 .../pgoutput_repack/pgoutput_repack.c         |  288 +++
 src/backend/storage/ipc/ipci.c                |    1 +
 .../storage/lmgr/generate-lwlocknames.pl      |    2 +-
 src/backend/utils/cache/relcache.c            |    1 +
 src/backend/utils/time/snapmgr.c              |    3 +-
 src/bin/psql/tab-complete.in.c                |   25 +-
 src/include/access/heapam.h                   |    9 +-
 src/include/access/heapam_xlog.h              |    2 +
 src/include/access/tableam.h                  |   10 +
 src/include/catalog/index.h                   |    3 +
 src/include/commands/cluster.h                |   90 +-
 src/include/commands/progress.h               |   23 +-
 src/include/replication/snapbuild.h           |    1 +
 src/include/storage/lockdefs.h                |    4 +-
 src/include/utils/snapmgr.h                   |    2 +
 src/test/modules/injection_points/Makefile    |    5 +-
 .../injection_points/expected/repack.out      |  113 +
 .../modules/injection_points/logical.conf     |    1 +
 src/test/modules/injection_points/meson.build |    4 +
 .../injection_points/specs/repack.spec        |  143 ++
 src/test/regress/expected/rules.out           |   29 +-
 src/tools/pgindent/typedefs.list              |    4 +
 41 files changed, 3001 insertions(+), 283 deletions(-)
 create mode 100644 src/backend/replication/pgoutput_repack/Makefile
 create mode 100644 src/backend/replication/pgoutput_repack/meson.build
 create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
 create mode 100644 src/test/modules/injection_points/expected/repack.out
 create mode 100644 src/test/modules/injection_points/logical.conf
 create mode 100644 src/test/modules/injection_points/specs/repack.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 924e1b1fa99..35160198b64 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6063,14 +6063,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+       <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of heap tuples written.
+       Number of heap tuples inserted.
        This counter only advances when the phase is
        <literal>seq scanning heap</literal>,
-       <literal>index scanning heap</literal>
-       or <literal>writing new heap</literal>.
+       <literal>index scanning heap</literal>,
+       <literal>writing new heap</literal>
+       or <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples updated.
+       This counter only advances when the phase is <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples deleted.
+       This counter only advances when the phase is <literal>catch-up</literal>.
       </para></entry>
      </row>
 
@@ -6151,6 +6172,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <command>REPACK</command> is currently writing the new heap.
      </entry>
     </row>
+    <row>
+     <entry><literal>catch-up</literal></entry>
+     <entry>
+       <command>REPACK CONCURRENTLY</command> is currently processing the DML
+       commands that other transactions executed during any of the preceding
+       phases.
+     </entry>
+    </row>
     <row>
      <entry><literal>swapping relation files</literal></entry>
      <entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+    TABLE</command></link> and <command>REPACK</command> with
+    the <literal>CONCURRENTLY</literal> option, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the DDL command committed.  This will only be an
+    snapshot taken before the command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the DDL command started &mdash; any transaction that has done so
+    before the command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the DDL command until that transaction completes.
+    which would block the truncating or rewriting command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index a612c72d971..9c089a6b3d7 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  <refsynopsisdiv>
 <synopsis>
 REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX <replaceable class="parameter">index_name</replaceable> ] ]
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] CONCURRENTLY <replaceable class="parameter">table_name</replaceable> [ USING INDEX <replaceable class="parameter">index_name</replaceable> ]
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
@@ -48,7 +49,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    processes every table and materialized view in the current database that
    the current user has the <literal>MAINTAIN</literal> privilege on. This
    form of <command>REPACK</command> cannot be executed inside a transaction
-   block.
+   block.  Also, this form is not allowed if
+   the <literal>CONCURRENTLY</literal> option is used.
   </para>
 
   <para>
@@ -61,7 +63,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
    is acquired on it. This prevents any other database operations (both reads
    and writes) from operating on the table until the <command>REPACK</command>
-   is finished.
+   is finished. If you want to keep the table accessible during the repacking,
+   consider using the <literal>CONCURRENTLY</literal> option.
   </para>
 
   <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -160,6 +163,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>CONCURRENTLY</literal></term>
+    <listitem>
+     <para>
+      Allow other transactions to use the table while it is being repacked.
+     </para>
+
+     <para>
+      Internally, <command>REPACK</command> copies the contents of the table
+      (ignoring dead tuples) into a new file, sorted by the specified index,
+      and also creates a new file for each index. Then it swaps the old and
+      new files for the table and all the indexes, and deletes the old
+      files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+      sure that the old files do not change during the processing, because any
+      such changes would be lost when the files are swapped.
+     </para>
+
+     <para>
+      With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+      EXCLUSIVE</literal> lock is only acquired to swap the table and index
+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be pretty short. However, the time might still be noticeable if
+      too many data changes have been done to the table while
+      <command>REPACK</command> was waiting for the lock: those changes must
+      be processed just before the files are swapped, while the
+      <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+     </para>
+
+     <para>
+      Note that <command>REPACK</command> with
+      the <literal>CONCURRENTLY</literal> option does not try to order the
+      rows inserted into the table after the repacking started. Also
+      note that <command>REPACK</command> might fail to complete due to DDL
+      commands executed on the table by other transactions during the
+      repacking.
+     </para>
+
+     <note>
+      <para>
+       In addition to the temporary space requirements explained in
+       <xref linkend="sql-repack-notes-on-resources"/>,
+       the <literal>CONCURRENTLY</literal> option can add to the usage of
+       temporary space a bit more. The reason is that other transactions can
+       perform DML operations which cannot be applied to the new file until
+       <command>REPACK</command> has copied all the tuples from the old
+       file. Thus the tuples inserted into the old file during the copying are
+       also stored separately in a temporary file, so they can eventually be
+       applied to the new file.
+      </para>
+
+      <para>
+       Furthermore, the data changes performed during the copying are
+       extracted from the <link linkend="wal">write-ahead log</link> (WAL), and
+       this extraction (decoding) only takes place when a certain amount of WAL
+       has been written. Therefore, WAL removal can be delayed by this
+       threshold. Currently the threshold is equal to the value of
+       the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+       configuration parameter.
+      </para>
+     </note>
+
+     <para>
+      The <literal>CONCURRENTLY</literal> option cannot be used in the
+      following cases:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+          The table is <literal>UNLOGGED</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is partitioned.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is a system catalog or a <acronym>TOAST</acronym> table.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REPACK</command> is executed inside a transaction block.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+          configuration parameter is less than <literal>logical</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+         configuration parameter does not allow for creation of an additional
+         replication slot.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <warning>
+      <para>
+       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+       details.
+      </para>
+     </warning>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
 	interfaces \
 	backend/replication/libpqwalreceiver \
 	backend/replication/pgoutput \
+	backend/replication/pgoutput_repack \
 	fe_utils \
 	bin \
 	pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..4fdb3e880e4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
-								  bool all_visible_cleared, bool new_all_visible_cleared);
+								  bool all_visible_cleared, bool new_all_visible_cleared,
+								  bool wal_logical);
 #ifdef USE_ASSERT_CHECKING
 static void check_lock_if_inplace_updateable_rel(Relation relation,
 												 ItemPointer otid,
@@ -2769,7 +2770,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 TM_Result
 heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -3016,7 +3017,8 @@ l1:
 	 * Compute replica identity tuple before entering the critical section so
 	 * we don't PANIC upon a memory allocation failure.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+	old_key_tuple = wal_logical ?
+		ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
 
 	/*
 	 * If this is the first possibly-multixact-able operation in the current
@@ -3106,6 +3108,15 @@ l1:
 				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
 		}
 
+		/*
+		 * Unlike UPDATE, DELETE is decoded even if there is no old key, so it
+		 * does not help to clear both XLH_DELETE_CONTAINS_OLD_TUPLE and
+		 * XLH_DELETE_CONTAINS_OLD_KEY. Thus we need an extra flag. TODO
+		 * Consider not decoding tuples w/o the old tuple/key instead.
+		 */
+		if (!wal_logical)
+			xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
 		XLogBeginInsert();
 		XLogRegisterData(&xlrec, SizeOfHeapDelete);
 
@@ -3198,7 +3209,8 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false /* changingPart */ );
+						 &tmfd, false, /* changingPart */
+						 true /* wal_logical */);
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3239,7 +3251,7 @@ TM_Result
 heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
-			TU_UpdateIndexes *update_indexes)
+			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -4132,7 +4144,8 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 wal_logical);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4490,7 +4503,8 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, &lockmode, update_indexes);
+						 &tmfd, &lockmode, update_indexes,
+						 true	/* wal_logical */);
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -8831,7 +8845,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool wal_logical)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -8842,7 +8857,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
 	Page		page = BufferGetPage(newbuf);
-	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln) &&
+		wal_logical;
 	bool		init;
 	int			bufflags;
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 0b03070d394..c829c06f769 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+					   true);
 }
 
 
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
-						 tmfd, lockmode, update_indexes);
+						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	/*
@@ -685,13 +687,15 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
+								 Snapshot snapshot,
+								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
 								 double *tups_recently_dead)
 {
-	RewriteState rwstate;
+	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
 	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
+	bool		concurrent = snapshot != NULL;
+	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	values = (Datum *) palloc(natts * sizeof(Datum));
 	isnull = (bool *) palloc(natts * sizeof(bool));
 
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
-								 *multi_cutoff);
+	/*
+	 * Initialize the rewrite operation.
+	 */
+	if (!concurrent)
+		rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+									 *xid_cutoff, *multi_cutoff);
 
 
 	/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 *
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+	 * the snapshot passed by the caller.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									snapshot ? snapshot : SnapshotAny,
+									NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap,
+									snapshot ? snapshot : SnapshotAny,
+									0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -785,6 +801,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		HeapTuple	tuple;
 		Buffer		buf;
 		bool		isdead;
+		HTSV_Result vis;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -837,70 +854,84 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
 		buf = hslot->buffer;
 
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+		/*
+		 * Regarding CONCURRENTLY, see the comments on MVCC snapshot above.
+		 */
+		if (!concurrent)
 		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				*tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
-
-				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				*tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
-
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-
-		if (isdead)
-		{
-			*tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
+			switch ((vis = HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf)))
 			{
-				/* A previous recently-dead tuple is now known dead */
-				*tups_vacuumed += 1;
-				*tups_recently_dead -= 1;
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead */
+					isdead = true;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+					*tups_recently_dead += 1;
+					/* fall through */
+				case HEAPTUPLE_LIVE:
+					/* Live or recently dead, must copy it */
+					isdead = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+					/*
+					 * As long as we hold exclusive lock on the relation,
+					 * normally the only way to see this is if it was inserted
+					 * earlier in our own transaction.  However, it can happen
+					 * in system catalogs, since we tend to release write lock
+					 * before commit there.  Also, there's no exclusive lock
+					 * during concurrent processing.  Give a warning if neither
+					 * case applies; but in any case we had better copy it.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+						elog(WARNING, "concurrent insert in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as live */
+					isdead = false;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+					/*
+					 * Similar situation to INSERT_IN_PROGRESS case.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+						elog(WARNING, "concurrent delete in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as recently dead */
+					*tups_recently_dead += 1;
+					isdead = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					isdead = false; /* keep compiler quiet */
+					break;
 			}
-			continue;
+
+			if (isdead)
+			{
+				*tups_vacuumed += 1;
+				/* heap rewrite module still needs to see it... */
+				if (rewrite_heap_dead_tuple(rwstate, tuple))
+				{
+					/* A previous recently-dead tuple is now known dead */
+					*tups_vacuumed += 1;
+					*tups_recently_dead -= 1;
+				}
+
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				continue;
+			}
+
+			/*
+			 * In the concurrent case, we have a copy of the tuple, so we
+			 * need not worry whether the source tuple will be deleted/updated
+			 * after we release the lock.
+			 */
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 		}
 
 		*num_tuples += 1;
@@ -919,7 +950,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		{
 			const int	ct_index[] = {
 				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
-				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_INSERTED
 			};
 			int64		ct_val[2];
 
@@ -934,6 +965,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
+
+		/*
+		 * Process the WAL produced by the load, as well as by other
+		 * transactions, so that the replication slot can advance and WAL does
+		 * not pile up. Use wal_segment_size as a threshold so that we do not
+		 * introduce the decoding overhead too often.
+		 *
+		 * Of course, we must not apply the changes until the initial load has
+		 * completed.
+		 *
+		 * Note that our insertions into the new table should not be decoded
+		 * as we (intentionally) do not write the logical-decoding-specific
+		 * information to WAL.
+		 */
+		if (concurrent)
+		{
+			XLogRecPtr	end_of_wal;
+
+			end_of_wal = GetFlushRecPtr(NULL);
+			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+			{
+				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+				end_of_wal_prev = end_of_wal;
+			}
+		}
 	}
 
 	if (indexScan != NULL)
@@ -977,7 +1033,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
 		}
 
@@ -985,7 +1041,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	}
 
 	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	if (rwstate)
+		end_heap_rewrite(rwstate);
 
 	/* Clean up */
 	pfree(values);
@@ -2376,6 +2433,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
  * SET WITHOUT OIDS.
  *
  * So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate=NULL, use simple_heap_insert() instead of rewriting; in that
+ * case we still need to deform/form the tuple.  TODO: Shouldn't we rename
+ * the function, as it might not do any rewrite?
  */
 static void
 reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2459,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
 
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	if (rwstate)
+		/* The heap rewrite module does the rest */
+		rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	else
+	{
+		/*
+		 * Insert tuple when processing REPACK CONCURRENTLY.
+		 *
+		 * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+		 * difficult to do the same in the catch-up phase (as the logical
+		 * decoding does not provide us with sufficient visibility
+		 * information). Thus we must use heap_insert() both during the
+		 * catch-up and here.
+		 *
+		 * The following is like simple_heap_insert() except that we pass the
+		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+		 * the relation files, it drops this relation, so no logical
+		 * replication subscription should need the data.
+		 */
+		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+					HEAP_INSERT_NO_LOGICAL, NULL);
+	}
 
 	heap_freetuple(copiedTuple);
 }
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index e6d2b5fced1..6aa2ed214f2 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		int			options = HEAP_INSERT_SKIP_FSM;
 
 		/*
-		 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
-		 * for the TOAST table are not logically decoded.  The main heap is
-		 * WAL-logged as XLOG FPI records, which are not logically decoded.
+		 * While rewriting the heap for REPACK, make sure data for the TOAST
+		 * table are not logically decoded.  The main heap is WAL-logged as
+		 * XLOG FPI records, which are not logically decoded.
 		 */
 		options |= HEAP_INSERT_NO_LOGICAL;
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b46e7e9c2a6..5670f2bfbde 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -215,6 +215,7 @@ typedef struct TransactionStateData
 	bool		parallelChildXact;	/* is any parent transaction parallel? */
 	bool		chain;			/* start a new block after this one */
 	bool		topXidLogged;	/* for a subxact: is top-level XID logged? */
+	bool		internal;		/* for a subxact: launched internally? */
 	struct TransactionStateData *parent;	/* back link to parent */
 } TransactionStateData;
 
@@ -4735,6 +4736,7 @@ BeginInternalSubTransaction(const char *name)
 			/* Normal subtransaction start */
 			PushTransaction();
 			s = CurrentTransactionState;	/* changed by push */
+			s->internal = true;
 
 			/*
 			 * Savepoint names, like the TransactionState block itself, live
@@ -5251,7 +5253,13 @@ AbortSubTransaction(void)
 	LWLockReleaseAll();
 
 	pgstat_report_wait_end();
-	pgstat_progress_end_command();
+
+	/*
+	 * An internal subtransaction might be used by a user command, in which
+	 * case the command outlives the subtransaction.
+	 */
+	if (!s->internal)
+		pgstat_progress_end_command();
 
 	pgaio_error_cleanup();
 
@@ -5468,6 +5476,7 @@ PushTransaction(void)
 	s->parallelModeLevel = 0;
 	s->parallelChildXact = (p->parallelModeLevel != 0 || p->parallelChildXact);
 	s->topXidLogged = false;
+	s->internal = false;
 
 	CurrentTransactionState = s;
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3063abff9a5..241cec50111 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1418,22 +1418,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	for (int i = 0; i < newInfo->ii_NumIndexAttrs; i++)
 		opclassOptions[i] = get_attoptions(oldIndexId, i + 1);
 
-	/* Extract statistic targets for each attribute */
-	stattargets = palloc0_array(NullableDatum, newInfo->ii_NumIndexAttrs);
-	for (int i = 0; i < newInfo->ii_NumIndexAttrs; i++)
-	{
-		HeapTuple	tp;
-		Datum		dat;
-
-		tp = SearchSysCache2(ATTNUM, ObjectIdGetDatum(oldIndexId), Int16GetDatum(i + 1));
-		if (!HeapTupleIsValid(tp))
-			elog(ERROR, "cache lookup failed for attribute %d of relation %u",
-				 i + 1, oldIndexId);
-		dat = SysCacheGetAttr(ATTNUM, tp, Anum_pg_attribute_attstattarget, &isnull);
-		ReleaseSysCache(tp);
-		stattargets[i].value = dat;
-		stattargets[i].isnull = isnull;
-	}
+	stattargets = get_index_stattargets(oldIndexId, newInfo);
 
 	/*
 	 * Now create the new index.
@@ -1472,6 +1457,32 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	return newIndexId;
 }
 
+NullableDatum *
+get_index_stattargets(Oid indexid, IndexInfo *indInfo)
+{
+	NullableDatum *stattargets;
+
+	/* Extract statistic targets for each attribute */
+	stattargets = palloc0_array(NullableDatum, indInfo->ii_NumIndexAttrs);
+	for (int i = 0; i < indInfo->ii_NumIndexAttrs; i++)
+	{
+		HeapTuple	tp;
+		Datum		dat;
+		bool		isnull;
+
+		tp = SearchSysCache2(ATTNUM, ObjectIdGetDatum(indexid), Int16GetDatum(i + 1));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for attribute %d of relation %u",
+				 i + 1, indexid);
+		dat = SysCacheGetAttr(ATTNUM, tp, Anum_pg_attribute_attstattarget, &isnull);
+		ReleaseSysCache(tp);
+		stattargets[i].value = dat;
+		stattargets[i].isnull = isnull;
+	}
+
+	return stattargets;
+}
+
 /*
  * index_concurrently_build
  *
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 02f091c3ed6..85fa3fcf687 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1266,16 +1266,17 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      -- 5 is 'catch-up', but that should not appear here.
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS cluster_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
@@ -1291,16 +1292,19 @@ CREATE VIEW pg_stat_progress_repack AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      WHEN 5 THEN 'catch-up'
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
-        S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param5 AS heap_tuples_inserted,
+        S.param6 AS heap_tuples_updated,
+        S.param7 AS heap_tuples_deleted,
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index af14a230f94..5854067c4ff 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -25,6 +25,10 @@
 #include "access/toast_internals.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -32,6 +36,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
@@ -39,15 +44,21 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
+#include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -67,13 +78,45 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * The locators are used to avoid logical decoding of data that does not
+ * belong to the table being repacked.
+ */
+RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
+RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+
+	Relation	ident_index;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
-								Oid indexOid, Oid userid, int options);
+								Oid indexOid, Oid userid, LOCKMODE lmode,
+								int options);
+static void check_repack_concurrently_requirements(Relation rel);
 static void rebuild_relation(RepackCommand cmd, bool usingindex,
-							 Relation OldHeap, Relation index, bool verbose);
+							 Relation OldHeap, Relation index, Oid userid,
+							 bool verbose, bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							bool verbose, bool *pSwapToastByContent,
-							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							bool verbose,
+							bool *pSwapToastByContent,
+							TransactionId *pFreezeXid,
+							MultiXactId *pCutoffMulti);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -81,12 +124,61 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 											  Oid relid, bool rel_is_index);
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid,
+													  const char *slotname,
+													  TupleDesc tupdesc);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+									 Relation rel, ScanKey key, int nkeys,
+									 IndexInsertState *iistate);
+static void apply_concurrent_insert(Relation rel, ConcurrentChange *change,
+									HeapTuple tup, IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+									HeapTuple tup_target,
+									ConcurrentChange *change,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+									ConcurrentChange *change);
+static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
+								   HeapTuple tup_key,
+								   IndexInsertState *iistate,
+								   TupleTableSlot *ident_slot,
+								   IndexScanDesc *scan_p);
+static void process_concurrent_changes(LogicalDecodingContext *ctx,
+									   XLogRecPtr end_of_wal,
+									   Relation rel_dst,
+									   Relation rel_src,
+									   ScanKey ident_key,
+									   int ident_key_nentries,
+									   IndexInsertState *iistate);
+static IndexInsertState *get_index_insert_state(Relation relation,
+												Oid ident_index_id);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+								  int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+											   Relation cl_index,
+											   LogicalDecodingContext *ctx,
+											   bool swap_toast_by_content,
+											   TransactionId frozenXid,
+											   MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
+										LOCKMODE lockmode,
+										bool isTopLevel,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
 
 
+#define REPL_PLUGIN_NAME   "pgoutput_repack"
+
 static const char *
 RepackCommandAsString(RepackCommand cmd)
 {
@@ -95,9 +187,9 @@ RepackCommandAsString(RepackCommand cmd)
 		case REPACK_COMMAND_REPACK:
 			return "REPACK";
 		case REPACK_COMMAND_VACUUMFULL:
-			return "VACUUM";
+			return "VACUUM (FULL)";
 		case REPACK_COMMAND_CLUSTER:
-			return "VACUUM";
+			return "CLUSTER";
 	}
 	return "???";
 }
@@ -130,16 +222,27 @@ void
 ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
 	MemoryContext repack_context;
+	LOCKMODE	lockmode;
 	List	   *rtcs;
 
 	/* Parse option list */
 	foreach_node(DefElem, opt, stmt->params)
 	{
-		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+		if (strcmp(opt->defname, "verbose") == 0 &&
+			defGetBoolean(opt))
+			params.options |= CLUOPT_VERBOSE;
+		else if (strcmp(opt->defname, "concurrently") == 0 &&
+				 defGetBoolean(opt))
+		{
+			if (stmt->command != REPACK_COMMAND_REPACK)
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("CONCURRENTLY option not supported for %s",
+							   RepackCommandAsString(stmt->command)));
+			params.options |= CLUOPT_CONCURRENT;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -149,7 +252,17 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 					 parser_errposition(pstate, opt->location)));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
+	/*
+	 * Determine the lock mode expected by cluster_rel().
+	 *
+	 * In the exclusive case, we obtain AccessExclusiveLock right away to
+	 * avoid lock-upgrade hazard in the single-transaction case. In the
+	 * CONCURRENTLY case, the AccessExclusiveLock will only be used at the end
+	 * of processing, presumably for a very short time. Until then, we'll have
+	 * to unlock the relation temporarily, so there's no lock-upgrade hazard.
+	 */
+	lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+		AccessExclusiveLock : ShareUpdateExclusiveLock;
 
 	/*
 	 * If a single relation is specified, process it and we're done ... unless
@@ -157,16 +270,35 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	 */
 	if (stmt->relation != NULL)
 	{
-		rel = process_single_relation(stmt, &params);
+		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
 			return;
 	}
 
 	/*
-	 * By here, we know we are in a multi-table situation.  In order to avoid
-	 * holding locks for too long, we want to process each table in its own
-	 * transaction.  This forces us to disallow running inside a user
-	 * transaction block.
+	 * By here, we know we are in a multi-table situation.
+	 *
+	 * Concurrent processing is currently considered rather special (e.g. in
+	 * terms of resources consumed), so it is not performed in bulk.
+	 */
+	if (params.options & CLUOPT_CONCURRENT)
+	{
+		if (rel != NULL)
+		{
+			Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+					errhint("Consider running the command for individual partitions."));
+		}
+		else
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY requires explicit table name"));
+	}
+
+	/*
+	 * In order to avoid holding locks for too long, we want to process each
+	 * table in its own transaction.  This forces us to disallow running
+	 * inside a user transaction block.
 	 */
 	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
@@ -247,13 +379,13 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		 * Open the target table, coping with the case where it has been
 		 * dropped.
 		 */
-		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		rel = try_table_open(rtc->tableOid, lockmode);
 		if (rel == NULL)
 			continue;
 
 		/* Process this table */
 		cluster_rel(stmt->command, stmt->usingindex,
-					rel, rtc->indexOid, &params);
+					rel, rtc->indexOid, &params, isTopLevel);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
@@ -282,22 +414,55 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
  * If indexOid is InvalidOid, the table will be rewritten in physical order
  * instead of index order.
  *
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
  * 'cmd' indicates which command is being executed, to be used for error
  * messages.
  */
 void
 cluster_rel(RepackCommand cmd, bool usingindex,
-			Relation OldHeap, Oid indexOid, ClusterParams *params)
+			Relation OldHeap, Oid indexOid, ClusterParams *params,
+			bool isTopLevel)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Relation	index;
+	LOCKMODE	lmode;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
-	Relation	index;
+	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+	/*
+	 * Check that the correct lock is held. The lock mode is
+	 * AccessExclusiveLock for normal processing and ShareUpdateExclusiveLock
+	 * for concurrent processing (so that SELECT, INSERT, UPDATE and DELETE
+	 * commands work, but cluster_rel() cannot be called concurrently for the
+	 * same relation).
+	 */
+	lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+
+	/* There are specific requirements on concurrent processing. */
+	if (concurrent)
+	{
+		/*
+		 * Make sure we have no XID assigned, otherwise the call of
+		 * setup_logical_decoding() can cause a deadlock.
+		 *
+		 * Being in a transaction block does not actually imply that an XID
+		 * has been assigned, but it very likely has. We might want to check
+		 * the result of GetCurrentTransactionIdIfAny() instead, but that
+		 * would be less clear from the user's perspective.
+		 */
+		PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
+
+		check_repack_concurrently_requirements(OldHeap);
+	}
 
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
@@ -340,11 +505,13 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * If this is a single-transaction CLUSTER, we can skip these tests. We
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
+	 *
+	 * XXX move [some of] these comments to where the RECHECK flag is
+	 * determined?
 	 */
-	if (recheck)
-		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
-								 params->options))
-			goto out;
+	if (recheck && !cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+										lmode, params->options))
+		goto out;
 
 	/*
 	 * We allow repacking shared catalogs only when not using an index. It
@@ -358,6 +525,12 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 				 errmsg("cannot run \"%s\" on a shared catalog",
 						RepackCommandAsString(cmd))));
 
+	/*
+	 * The CONCURRENTLY case should have been rejected earlier because it does
+	 * not support system catalogs.
+	 */
+	Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
@@ -393,7 +566,7 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OidIsValid(indexOid))
 	{
 		/* verify the index is good and lock it */
-		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, lmode);
 		/* also open it */
 		index = index_open(indexOid, NoLock);
 	}
@@ -410,7 +583,9 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
 		!RelationIsPopulated(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		if (index)
+			index_close(index, lmode);
+		relation_close(OldHeap, lmode);
 		goto out;
 	}
 
@@ -423,11 +598,35 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * invalid, because we move tuples around.  Promote them to relation
 	 * locks.  Predicate locks on indexes will be promoted when they are
 	 * reindexed.
+	 *
+	 * During concurrent processing, the heap as well as its indexes stay in
+	 * operation, so we postpone this step until they are locked using
+	 * AccessExclusiveLock near the end of the processing.
 	 */
-	TransferPredicateLocksToHeapRelation(OldHeap);
+	if (!concurrent)
+		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
+	PG_TRY();
+	{
+		/*
+		 * For concurrent processing, make sure that our logical decoding
+		 * ignores data changes of other tables than the one we are
+		 * processing.
+		 */
+		if (concurrent)
+			begin_concurrent_repack(OldHeap);
+
+		rebuild_relation(cmd, usingindex, OldHeap, index, save_userid,
+						 verbose, concurrent);
+	}
+	PG_FINALLY();
+	{
+		if (concurrent)
+			end_concurrent_repack();
+	}
+	PG_END_TRY();
+
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -446,14 +645,14 @@ out:
  */
 static bool
 cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-					Oid userid, int options)
+					Oid userid, LOCKMODE lmode, int options)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 
 	/* Check that the user still has privileges for the relation */
 	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -467,7 +666,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -478,7 +677,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		 */
 		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 
@@ -489,7 +688,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
 			!get_index_isclustered(indexOid))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 	}
@@ -630,19 +829,89 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	table_close(pg_index, RowExclusiveLock);
 }
 
+/*
+ * Check if the CONCURRENTLY option is legal for the relation.
+ */
+static void
+check_repack_concurrently_requirements(Relation rel)
+{
+	char		relpersistence,
+				replident;
+	Oid			ident_idx;
+
+	/* Data changes in system relations are not logically decoded. */
+	if (IsCatalogRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+	/*
+	 * reorderbuffer.c does not seem to handle processing of a TOAST relation
+	 * alone.
+	 */
+	if (IsToastRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+	relpersistence = rel->rd_rel->relpersistence;
+	if (relpersistence != RELPERSISTENCE_PERMANENT)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+	/* With NOTHING, WAL does not contain the old tuple. */
+	replident = rel->rd_rel->relreplident;
+	if (replident == REPLICA_IDENTITY_NOTHING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("Relation \"%s\" has insufficient replication identity.",
+						 RelationGetRelationName(rel))));
+
+	/*
+	 * Identity index is not set if the replica identity is FULL, but PK might
+	 * exist in such a case.
+	 */
+	ident_idx = RelationGetReplicaIndex(rel);
+	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+		ident_idx = rel->rd_pkindex;
+	if (!OidIsValid(ident_idx))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 (errhint("Relation \"%s\" has no identity index.",
+						  RelationGetRelationName(rel)))));
+}
+
+
 /*
  * rebuild_relation: rebuild an existing relation in index or physical order
  *
- * OldHeap: table to rebuild.
+ * OldHeap: table to rebuild.  See cluster_rel() for comments on the required
+ * lock strength.
+ *
  * index: index to cluster by, or NULL to rewrite in physical order.
  *
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock.  (The
+ * function handles the lock upgrade if 'concurrent' is true.)
  */
 static void
 rebuild_relation(RepackCommand cmd, bool usingindex,
-				 Relation OldHeap, Relation index, bool verbose)
+				 Relation OldHeap, Relation index, Oid userid,
+				 bool verbose, bool concurrent)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -650,13 +919,55 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	Oid			OIDNewHeap;
 	Relation	NewHeap;
 	char		relpersistence;
-	bool		is_system_catalog;
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
+	NameData	slotname;
+	LogicalDecodingContext *ctx = NULL;
+	Snapshot	snapshot = NULL;
+#ifdef USE_ASSERT_CHECKING
+	LOCKMODE	lmode;
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
-		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+	Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+	Assert(!usingindex || CheckRelationLockedByMe(index, lmode, false));
+#endif
+
+	if (concurrent)
+	{
+		TupleDesc	tupdesc;
+
+		/*
+		 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+		 * should never fire.
+		 */
+		Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+
+		/*
+		 * A single backend should not execute multiple REPACK commands at a
+		 * time, so the PID is enough to make the slot name unique.
+		 */
+		snprintf(NameStr(slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+		tupdesc = CreateTupleDescCopy(RelationGetDescr(OldHeap));
+
+		/*
+		 * Prepare to capture the concurrent data changes.
+		 *
+		 * Note that this call waits for all transactions with an XID already
+		 * assigned to finish. If one of those transactions is waiting for a
+		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
+		 * it runs CREATE INDEX), we can end up in a deadlock. It's not clear
+		 * whether avoiding this risk is worth unlocking/relocking the table
+		 * (and its clustering index) and checking again whether it's still
+		 * eligible for REPACK CONCURRENTLY.
+		 */
+		ctx = setup_logical_decoding(tableOid, NameStr(slotname), tupdesc);
+
+		snapshot = SnapBuildInitialSnapshotForRepack(ctx->snapshot_builder);
+		PushActiveSnapshot(snapshot);
+	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
 	if (usingindex)
@@ -664,7 +975,6 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
 
 	/*
 	 * Create the transient table that will receive the re-ordered data.
@@ -680,30 +990,67 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, ctx, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
+	/* The historic snapshot won't be needed anymore. */
+	if (snapshot)
+		PopActiveSnapshot();
 
-	/* Close relcache entries, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
-	if (index)
-		index_close(index, NoLock);
+	if (concurrent)
+	{
+		/*
+		 * Push a snapshot that we will use to find old versions of rows when
+		 * processing concurrent UPDATE and DELETE commands. (That snapshot
+		 * should also be used by index expressions.)
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
 
-	/*
-	 * Close the new relation so it can be dropped as soon as the storage is
-	 * swapped. The relation is not visible to others, so no need to unlock it
-	 * explicitly.
-	 */
-	table_close(NewHeap, NoLock);
+		/*
+		 * Make sure we can find the tuples just inserted when applying DML
+		 * commands on top of those.
+		 */
+		CommandCounterIncrement();
+		UpdateActiveSnapshotCommandId();
 
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+										   ctx, swap_toast_by_content,
+										   frozenXid, cutoffMulti);
+		PopActiveSnapshot();
+
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
+
+		/* Done with decoding. */
+		cleanup_logical_decoding(ctx);
+		ReplicationSlotRelease();
+		ReplicationSlotDrop(NameStr(slotname), false);
+	}
+	else
+	{
+		bool		is_system_catalog = IsSystemRelation(OldHeap);
+
+		/* Close relcache entries, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
+		if (index)
+			index_close(index, NoLock);
+
+		/*
+		 * Close the new relation so it can be dropped as soon as the storage
+		 * is swapped. The relation is not visible to others, so no need to
+		 * unlock it explicitly.
+		 */
+		table_close(NewHeap, NoLock);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+	}
 }
 
 
@@ -838,15 +1185,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Pass
+ * them iff concurrent processing is required.
+ *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
-				bool *pSwapToastByContent, TransactionId *pFreezeXid,
-				MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -864,6 +1215,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	PGRUsage	ru0;
 	char	   *nspname;
 
+	bool		concurrent = snapshot != NULL;
+
 	pg_rusage_init(&ru0);
 
 	/* Store a copy of the namespace name for logging purposes */
@@ -966,8 +1319,48 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * provided, else plain seqscan.
 	 */
 	if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
+	{
+		ResourceOwner oldowner = NULL;
+		ResourceOwner resowner = NULL;
+
+		/*
+		 * In the CONCURRENT case, use a dedicated resource owner so we don't
+		 * leave any additional locks behind us that we cannot release easily.
+		 */
+		if (concurrent)
+		{
+			Assert(CheckRelationLockedByMe(OldHeap, ShareUpdateExclusiveLock,
+										   false));
+			Assert(CheckRelationLockedByMe(OldIndex, ShareUpdateExclusiveLock,
+										   false));
+
+			resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "plan_cluster_use_sort");
+			oldowner = CurrentResourceOwner;
+			CurrentResourceOwner = resowner;
+		}
+
 		use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
 										 RelationGetRelid(OldIndex));
+
+		if (concurrent)
+		{
+			CurrentResourceOwner = oldowner;
+
+			/*
+			 * We are primarily concerned about locks, but if the planner
+			 * happened to allocate any other resources, we should release
+			 * them too because we're going to delete the whole resowner.
+			 */
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_AFTER_LOCKS,
+								 false, false);
+			ResourceOwnerDelete(resowner);
+		}
+	}
 	else
 		use_sort = false;
 
@@ -996,7 +1389,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+									cutoffs.OldestXmin, snapshot,
+									decoding_ctx,
+									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
 									&tups_recently_dead);
@@ -1005,7 +1400,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	*pFreezeXid = cutoffs.FreezeLimit;
 	*pCutoffMulti = cutoffs.MultiXactCutoff;
 
-	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+	/*
+	 * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+	 * In the CONCURRENTLY case, we need to set it again before applying the
+	 * concurrent changes.
+	 */
 	NewHeap->rd_toastoid = InvalidOid;
 
 	num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1463,14 +1862,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool swap_toast_by_content,
 				 bool check_constraints,
 				 bool is_internal,
+				 bool reindex,
 				 TransactionId frozenXid,
 				 MultiXactId cutoffMulti,
 				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
-	int			reindex_flags;
-	ReindexParams reindex_params = {0};
 	int			i;
 
 	/* Report that we are now swapping relation files */
@@ -1496,39 +1894,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (is_system_catalog)
 		CacheInvalidateCatalog(OIDOldHeap);
 
-	/*
-	 * Rebuild each index on the relation (but not the toast table, which is
-	 * all-new at this point).  It is important to do this before the DROP
-	 * step because if we are processing a system catalog that will be used
-	 * during DROP, we want to have its indexes available.  There is no
-	 * advantage to the other order anyway because this is all transactional,
-	 * so no chance to reclaim disk space before commit.  We do not need a
-	 * final CommandCounterIncrement() because reindex_relation does it.
-	 *
-	 * Note: because index_build is called via reindex_relation, it will never
-	 * set indcheckxmin true for the indexes.  This is OK even though in some
-	 * sense we are building new indexes rather than rebuilding existing ones,
-	 * because the new heap won't contain any HOT chains at all, let alone
-	 * broken ones, so it can't be necessary to set indcheckxmin.
-	 */
-	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
-	if (check_constraints)
-		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+	if (reindex)
+	{
+		int			reindex_flags;
+		ReindexParams reindex_params = {0};
 
-	/*
-	 * Ensure that the indexes have the same persistence as the parent
-	 * relation.
-	 */
-	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
-	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+		/*
+		 * Rebuild each index on the relation (but not the toast table, which
+		 * is all-new at this point).  It is important to do this before the
+		 * DROP step because if we are processing a system catalog that will
+		 * be used during DROP, we want to have its indexes available.  There
+		 * is no advantage to the other order anyway because this is all
+		 * transactional, so no chance to reclaim disk space before commit. We
+		 * do not need a final CommandCounterIncrement() because
+		 * reindex_relation does it.
+		 *
+		 * Note: because index_build is called via reindex_relation, it will
+		 * never set indcheckxmin true for the indexes.  This is OK even
+		 * though in some sense we are building new indexes rather than
+		 * rebuilding existing ones, because the new heap won't contain any
+		 * HOT chains at all, let alone broken ones, so it can't be necessary
+		 * to set indcheckxmin.
+		 */
+		reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+		if (check_constraints)
+			reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
 
-	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
-								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+		/*
+		 * Ensure that the indexes have the same persistence as the parent
+		 * relation.
+		 */
+		if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+		else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
-	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+		/* Report that we are now reindexing relations */
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+		reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+	}
 
 	/* Report that we are now doing clean up */
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1870,7 +2276,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * resolve in this case.
  */
 static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+						ClusterParams *params)
 {
 	Relation	rel;
 	Oid			tableOid;
@@ -1879,13 +2286,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
-	/*
-	 * Find, lock, and check permissions on the table.  We obtain
-	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-	 * single-transaction case.
-	 */
+	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation,
-										AccessExclusiveLock,
+										lockmode,
 										0,
 										RangeVarCallbackMaintainsTable,
 										NULL);
@@ -1911,12 +2314,17 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 		return rel;
 	else
 	{
-		Oid			indexOid;
+		Oid			indexOid = InvalidOid;
 
-		indexOid = determine_clustered_index(rel, stmt->usingindex,
-											 stmt->indexname);
-		check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
-		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+		if (stmt->usingindex)
+		{
+			indexOid = determine_clustered_index(rel, stmt->usingindex,
+												 stmt->indexname);
+			check_index_is_clusterable(rel, indexOid, lockmode);
+		}
+
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid,
+					params, isTopLevel);
 		return NULL;
 	}
 }
@@ -1973,3 +2381,1197 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
 
 	return indexOid;
 }
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts, to set up logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that the TOAST table needs no attention here, as it's not scanned
+ * using a historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+	Oid			toastrelid;
+
+	/* Avoid logical decoding of other relations by this backend. */
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
+
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+}
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+	/*
+	 * Restore normal function of (future) logical decoding for this backend.
+	 */
+	repacked_rel_locator.relNumber = InvalidOid;
+	repacked_rel_toast_locator.relNumber = InvalidOid;
+}
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends make while we copy the
+ * existing data into the temporary table), nor persisted (it's easier to
+ * handle a crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid, const char *slotname, TupleDesc tupdesc)
+{
+	LogicalDecodingContext *ctx;
+	RepackDecodingState *dstate;
+
+	/*
+	 * Check if we can use logical decoding.
+	 */
+	CheckSlotPermissions();
+	CheckLogicalDecodingRequirements();
+
+	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+	ReplicationSlotCreate(slotname, true, RS_TEMPORARY, false, false, false);
+
+	/*
+	 * Neither prepare_write nor do_write callback nor update_progress is
+	 * useful for us.
+	 *
+	 * As for need_full_snapshot, we pass false because the table we are
+	 * processing is present in RepackedRelsHash and therefore, as far as
+	 * logical decoding is concerned, treated like a catalog.
+	 */
+	ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+									NIL,
+									false,
+									InvalidXLogRecPtr,
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+	/*
+	 * We have no control over the fast_forward setting, so at least check it.
+	 */
+	Assert(!ctx->fast_forward);
+
+	DecodingContextFindStartpoint(ctx);
+
+	/* Some WAL records should have been read. */
+	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
+	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+				wal_segment_size);
+
+	/*
+	 * Setup structures to store decoded changes.
+	 */
+	dstate = palloc0(sizeof(RepackDecodingState));
+	dstate->relid = relid;
+	dstate->tstore = tuplestore_begin_heap(false, false,
+										   maintenance_work_mem);
+
+	dstate->tupdesc = tupdesc;
+
+	/* Initialize the descriptor to store the changes ... */
+	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+	/* ... as well as the corresponding slot. */
+	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+											  &TTSOpsMinimalTuple);
+
+	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "logical decoding");
+
+	ctx->output_writer_private = dstate;
+	return ctx;
+}
+
+/*
+ * Retrieve tuple from ConcurrentChange structure.
+ *
+ * The input data starts with the structure but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+	HeapTupleData tup_data;
+	HeapTuple	result;
+	char	   *src;
+
+	/*
+	 * Ensure alignment before accessing the fields. (This is why we can't use
+	 * heap_copytuple() instead of this function.)
+	 */
+	src = change + offsetof(ConcurrentChange, tup_data);
+	memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+	memcpy(result, &tup_data, sizeof(HeapTupleData));
+	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+	src = change + SizeOfConcurrentChange;
+	memcpy(result->t_data, src, result->t_len);
+
+	return result;
+}
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+								 XLogRecPtr end_of_wal)
+{
+	RepackDecodingState *dstate;
+	ResourceOwner resowner_old;
+
+	/*
+	 * Invalidate the "present" cache before moving to "(recent) history".
+	 */
+	InvalidateSystemCaches();
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+	resowner_old = CurrentResourceOwner;
+	CurrentResourceOwner = dstate->resowner;
+
+	PG_TRY();
+	{
+		while (ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			XLogSegNo	segno_new;
+			char	   *errm = NULL;
+			XLogRecPtr	end_lsn;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			/*
+			 * If a WAL segment boundary has been crossed, inform the decoding
+			 * system that the catalog_xmin can advance. (We could confirm
+			 * more often, but filling a single WAL segment should not take
+			 * much time.)
+			 */
+			end_lsn = ctx->reader->EndRecPtr;
+			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+			if (segno_new != repack_current_segment)
+			{
+				LogicalConfirmReceivedLocation(end_lsn);
+				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
+				repack_current_segment = segno_new;
+			}
+
+			CHECK_FOR_INTERRUPTS();
+		}
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
+/*
+ * Apply changes that happened during the initial load.
+ *
+ * Scan key is passed by caller, so it does not have to be constructed
+ * multiple times. Key entries have all fields initialized, except for
+ * sk_argument.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
+						 ScanKey key, int nkeys, IndexInsertState *iistate)
+{
+	TupleTableSlot *index_slot,
+			   *ident_slot;
+	HeapTuple	tup_old = NULL;
+
+	if (dstate->nchanges == 0)
+		return;
+
+	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+	/* A slot to fetch tuples from identity index. */
+	ident_slot = table_slot_create(rel, NULL);
+
+	while (tuplestore_gettupleslot(dstate->tstore, true, false,
+								   dstate->tsslot))
+	{
+		bool		shouldFree;
+		HeapTuple	tup_change,
+					tup,
+					tup_exist;
+		char	   *change_raw,
+				   *src;
+		ConcurrentChange change;
+		bool		isnull[1];
+		Datum		values[1];
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get the change from the single-column tuple. */
+		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+		Assert(!isnull[0]);
+
+		/* Make sure we access aligned data. */
+		change_raw = (char *) DatumGetByteaP(values[0]);
+		src = (char *) VARDATA(change_raw);
+		memcpy(&change, src, SizeOfConcurrentChange);
+
+		/* TRUNCATE change contains no tuple, so process it separately. */
+		if (change.kind == CHANGE_TRUNCATE)
+		{
+			/*
+			 * All the things that ExecuteTruncateGuts() does (such as firing
+			 * triggers or handling the DROP_CASCADE behavior) should have
+			 * taken place on the source relation. Thus we only do the actual
+			 * truncation of the new relation (and its indexes).
+			 */
+			heap_truncate_one_rel(rel);
+
+			pfree(tup_change);
+			continue;
+		}
+
+		/*
+		 * Extract the tuple from the change. The tuple is copied here because
+		 * it might be assigned to 'tup_old', in which case it needs to
+		 * survive into the next iteration.
+		 */
+		tup = get_changed_tuple(src);
+
+		if (change.kind == CHANGE_UPDATE_OLD)
+		{
+			Assert(tup_old == NULL);
+			tup_old = tup;
+		}
+		else if (change.kind == CHANGE_INSERT)
+		{
+			Assert(tup_old == NULL);
+
+			apply_concurrent_insert(rel, &change, tup, iistate, index_slot);
+
+			pfree(tup);
+		}
+		else if (change.kind == CHANGE_UPDATE_NEW ||
+				 change.kind == CHANGE_DELETE)
+		{
+			IndexScanDesc ind_scan = NULL;
+			HeapTuple	tup_key;
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+			{
+				tup_key = tup_old != NULL ? tup_old : tup;
+			}
+			else
+			{
+				Assert(tup_old == NULL);
+				tup_key = tup;
+			}
+
+			/*
+			 * Find the tuple to be updated or deleted.
+			 */
+			tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+										  iistate, ident_slot, &ind_scan);
+			if (tup_exist == NULL)
+				elog(ERROR, "failed to find target tuple");
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+				apply_concurrent_update(rel, tup, tup_exist, &change, iistate,
+										index_slot);
+			else
+				apply_concurrent_delete(rel, tup_exist, &change);
+
+			if (tup_old != NULL)
+			{
+				pfree(tup_old);
+				tup_old = NULL;
+			}
+
+			pfree(tup);
+			index_endscan(ind_scan);
+		}
+		else
+			elog(ERROR, "unrecognized change kind: %d", change.kind);
+
+		/*
+		 * If a change was applied now, increment CID for next writes and
+		 * update the snapshot so it sees the changes we've applied so far.
+		 */
+		if (change.kind != CHANGE_UPDATE_OLD)
+		{
+			CommandCounterIncrement();
+			UpdateActiveSnapshotCommandId();
+		}
+
+		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+		Assert(shouldFree);
+		pfree(tup_change);
+	}
+
+	tuplestore_clear(dstate->tstore);
+	dstate->nchanges = 0;
+
+	/* Cleanup. */
+	ExecDropSingleTupleTableSlot(index_slot);
+	ExecDropSingleTupleTableSlot(ident_slot);
+}
+
+static void
+apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
+						IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+	List	   *recheck;
+
+	/*
+	 * Like simple_heap_insert(), but make sure that the INSERT is not
+	 * logically decoded - see reform_and_rewrite_tuple() for more
+	 * information.
+	 */
+	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+				NULL);
+
+	/*
+	 * Update indexes. (Functions evaluated for the index entries may need an
+	 * active snapshot, which the caller is expected to have set.)
+	 */
+	ExecStoreHeapTuple(tup, index_slot, false);
+	recheck = ExecInsertIndexTuples(iistate->rri,
+									index_slot,
+									iistate->estate,
+									false,	/* update */
+									false,	/* noDupErr */
+									NULL,	/* specConflict */
+									NIL,	/* arbiterIndexes */
+									false	/* onlySummarizing */
+		);
+
+	/*
+	 * If recheck is required, it must have been performed on the source
+	 * relation by now. (All the logical changes we process here are already
+	 * committed.)
+	 */
+	list_free(recheck);
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						ConcurrentChange *change, IndexInsertState *iistate,
+						TupleTableSlot *index_slot)
+{
+	LockTupleMode lockmode;
+	TM_FailureData tmfd;
+	TU_UpdateIndexes update_indexes;
+	TM_Result	res;
+	List	   *recheck;
+
+	/*
+	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
+	 * here.)
+	 *
+	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_update(rel, &tup_target->t_self, tup,
+					  GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd, &lockmode, &update_indexes,
+					  false /* wal_logical */ );
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+	ExecStoreHeapTuple(tup, index_slot, false);
+
+	if (update_indexes != TU_None)
+	{
+		recheck = ExecInsertIndexTuples(iistate->rri,
+										index_slot,
+										iistate->estate,
+										true,	/* update */
+										false,	/* noDupErr */
+										NULL,	/* specConflict */
+										NIL,	/* arbiterIndexes */
+		/* onlySummarizing */
+										update_indexes == TU_Summarizing);
+		list_free(recheck);
+	}
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+						ConcurrentChange *change)
+{
+	TM_Result	res;
+	TM_FailureData tmfd;
+
+	/*
+	 * Delete tuple from the new heap.
+	 *
+	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+					  InvalidSnapshot, false,
+					  &tmfd,
+					  false,	/* no wait - only we are doing changes */
+					  false /* wal_logical */ );
+
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'key' is a pre-initialized scan key, into which the function will put the
+ * key values.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * On exit, '*scan_p' contains the scan descriptor used. The caller must close
+ * it when the returned tuple is no longer needed.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
+				  IndexInsertState *iistate,
+				  TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
+{
+	IndexScanDesc scan;
+	Form_pg_index ident_form;
+	int2vector *ident_indkey;
+	HeapTuple	result = NULL;
+
+	/* XXX no instrumentation for now */
+	scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+						   NULL, nkeys, 0);
+	*scan_p = scan;
+	index_rescan(scan, key, nkeys, NULL, 0);
+
+	/* Info needed to retrieve key values from heap tuple. */
+	ident_form = iistate->ident_index->rd_index;
+	ident_indkey = &ident_form->indkey;
+
+	/* Use the incoming tuple to finalize the scan key. */
+	for (int i = 0; i < scan->numberOfKeys; i++)
+	{
+		ScanKey		entry;
+		bool		isnull;
+		int16		attno_heap;
+
+		entry = &scan->keyData[i];
+		attno_heap = ident_indkey->values[i];
+		entry->sk_argument = heap_getattr(tup_key,
+										  attno_heap,
+										  rel->rd_att,
+										  &isnull);
+		Assert(!isnull);
+	}
+	if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+	{
+		bool		shouldFree;
+
+		result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+		/* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+		Assert(!shouldFree);
+	}
+
+	return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ *
+ * Pass rel_src iff its reltoastrelid is needed.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *ctx, XLogRecPtr end_of_wal,
+						   Relation rel_dst, Relation rel_src, ScanKey ident_key,
+						   int ident_key_nentries, IndexInsertState *iistate)
+{
+	RepackDecodingState *dstate;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_CATCH_UP);
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	repack_decode_concurrent_changes(ctx, end_of_wal);
+
+	if (dstate->nchanges == 0)
+		return;
+
+	PG_TRY();
+	{
+		/*
+		 * Make sure that TOAST values can eventually be accessed via the old
+		 * relation - see comment in copy_table_data().
+		 */
+		if (rel_src)
+			rel_dst->rd_toastoid = rel_src->rd_rel->reltoastrelid;
+
+		apply_concurrent_changes(dstate, rel_dst, ident_key,
+								 ident_key_nentries, iistate);
+	}
+	PG_FINALLY();
+	{
+		if (rel_src)
+			rel_dst->rd_toastoid = InvalidOid;
+	}
+	PG_END_TRY();
+}
+
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id)
+{
+	EState	   *estate;
+	int			i;
+	IndexInsertState *result;
+
+	result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+	estate = CreateExecutorState();
+
+	result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+	InitResultRelInfo(result->rri, relation, 0, 0, 0);
+	ExecOpenIndices(result->rri, false);
+
+	/*
+	 * Find the relcache entry of the identity index so that we spend no extra
+	 * effort to open / close it.
+	 */
+	for (i = 0; i < result->rri->ri_NumIndices; i++)
+	{
+		Relation	ind_rel;
+
+		ind_rel = result->rri->ri_IndexRelationDescs[i];
+		if (ind_rel->rd_id == ident_index_id)
+			result->ident_index = ind_rel;
+	}
+	if (result->ident_index == NULL)
+		elog(ERROR, "failed to open identity index");
+
+	/* Only initialize fields needed by ExecInsertIndexTuples(). */
+	result->estate = estate;
+
+	return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+	Relation	ident_idx_rel;
+	Form_pg_index ident_idx;
+	int			n,
+				i;
+	ScanKey		result;
+
+	Assert(OidIsValid(ident_idx_oid));
+	ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+	ident_idx = ident_idx_rel->rd_index;
+	n = ident_idx->indnatts;
+	result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+	for (i = 0; i < n; i++)
+	{
+		ScanKey		entry;
+		int16		relattno;
+		Form_pg_attribute att;
+		Oid			opfamily,
+					opcintype,
+					opno,
+					opcode;
+
+		entry = &result[i];
+		relattno = ident_idx->indkey.values[i];
+		if (relattno >= 1)
+		{
+			TupleDesc	desc;
+
+			desc = rel_src->rd_att;
+			att = TupleDescAttr(desc, relattno - 1);
+		}
+		else
+			elog(ERROR, "unexpected attribute number %d in index", relattno);
+
+		opfamily = ident_idx_rel->rd_opfamily[i];
+		opcintype = ident_idx_rel->rd_opcintype[i];
+		opno = get_opfamily_member(opfamily, opcintype, opcintype,
+								   BTEqualStrategyNumber);
+
+		if (!OidIsValid(opno))
+			elog(ERROR, "failed to find = operator for type %u", opcintype);
+
+		opcode = get_opcode(opno);
+		if (!OidIsValid(opcode))
+			elog(ERROR, "failed to find opcode for operator %u", opno);
+
+		/* Initialize everything but argument. */
+		ScanKeyInit(entry,
+					i + 1,
+					BTEqualStrategyNumber, opcode,
+					(Datum) NULL);
+		entry->sk_collation = att->attcollation;
+	}
+	index_close(ident_idx_rel, AccessShareLock);
+
+	*nentries = n;
+	return result;
+}
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+	ExecCloseIndices(iistate->rri);
+	FreeExecutorState(iistate->estate);
+	pfree(iistate->rri);
+	pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	ExecDropSingleTupleTableSlot(dstate->tsslot);
+	FreeTupleDesc(dstate->tupdesc_change);
+	FreeTupleDesc(dstate->tupdesc);
+	tuplestore_end(dstate->tstore);
+
+	FreeDecodingContext(ctx);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+								   Relation cl_index,
+								   LogicalDecodingContext *ctx,
+								   bool swap_toast_by_content,
+								   TransactionId frozenXid,
+								   MultiXactId cutoffMulti)
+{
+	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+	List	   *ind_oids_new;
+	Oid			old_table_oid = RelationGetRelid(OldHeap);
+	Oid			new_table_oid = RelationGetRelid(NewHeap);
+	List	   *ind_oids_old = RelationGetIndexList(OldHeap);
+	ListCell   *lc,
+			   *lc2;
+	char		relpersistence;
+	bool		is_system_catalog;
+	Oid			ident_idx_old,
+				ident_idx_new;
+	IndexInsertState *iistate;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+	XLogRecPtr	wal_insert_ptr,
+				end_of_wal;
+	char		dummy_rec_data = '\0';
+	Relation   *ind_refs,
+			   *ind_refs_p;
+	int			nind;
+
+	/* Like in cluster_rel(). */
+	lockmode_old = ShareUpdateExclusiveLock;
+	Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+	Assert(cl_index == NULL ||
+		   CheckRelationLockedByMe(cl_index, lockmode_old, false));
+	/* This is expected from the caller. */
+	Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+	ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+	/*
+	 * Unlike the exclusive case, we build new indexes for the new relation
+	 * rather than swapping the storage and reindexing the old relation. The
+	 * point is that the index build can take some time, so we do it before
+	 * acquiring AccessExclusiveLock on the old heap, at which point we cannot
+	 * swap the heap storage yet.
+	 *
+	 * index_create() will lock the new indexes using AccessExclusiveLock - no
+	 * need to change that.
+	 *
+	 * We assume that ShareUpdateExclusiveLock on the table prevents anyone
+	 * from dropping the existing indexes or adding new ones, so the lists of
+	 * old and new indexes should match at the swap time. On the other hand we
+	 * do not block ALTER INDEX commands that do not require table lock (e.g.
+	 * ALTER INDEX ... SET ...).
+	 *
+	 * XXX Should we check at the end of our work whether another transaction
+	 * executed such a command, and issue a NOTICE that we might have discarded
+	 * its effects? (For example, if someone changes a storage parameter after
+	 * we have created the new index, the new value of that parameter is
+	 * lost.) Alternatively, we could lock all the indexes now in a mode that
+	 * blocks all ALTER INDEX commands (ShareUpdateExclusiveLock?), and keep
+	 * them locked till the end of the transaction. That might increase the
+	 * risk of deadlock during the lock upgrade below, however SELECT / DML
+	 * queries should not be involved in such a deadlock.
+	 */
+	ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+	/*
+	 * Processing shouldn't start without a valid identity index.
+	 */
+	Assert(OidIsValid(ident_idx_old));
+
+	/* Find "identity index" on the new relation. */
+	ident_idx_new = InvalidOid;
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+
+		if (ident_idx_old == ind_old)
+		{
+			ident_idx_new = ind_new;
+			break;
+		}
+	}
+	if (!OidIsValid(ident_idx_new))
+	{
+		/* Should not happen, given our lock on the old relation. */
+		ereport(ERROR,
+				(errmsg("identity index missing on the new relation")));
+	}
+
+	/* Executor state to update indexes. */
+	iistate = get_index_insert_state(NewHeap, ident_idx_new);
+
+	/*
+	 * Build scan key that we'll use to look for rows to be updated / deleted
+	 * during logical decoding.
+	 */
+	ident_key = build_identity_key(ident_idx_new, OldHeap, &ident_key_nentries);
+
+	/*
+	 * During testing, wait for another backend to perform concurrent data
+	 * changes which we will process below.
+	 */
+	INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+	/*
+	 * Flush all WAL records inserted so far (possibly except for the last
+	 * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+	 * we need to flush while holding exclusive lock on the source table.
+	 */
+	wal_insert_ptr = GetInsertRecPtr();
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply the concurrent changes for the first time, to minimize the time
+	 * we need to hold AccessExclusiveLock. (A fair amount of WAL could have
+	 * been written during the data copying and index creation.)
+	 */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/*
+	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+	 * is one), and all its indexes, so that we can swap the files.
+	 *
+	 * Before that, unlock the index temporarily to avoid deadlock in case
+	 * another transaction is trying to lock it while holding the lock on the
+	 * table.
+	 */
+	if (cl_index)
+	{
+		index_close(cl_index, ShareUpdateExclusiveLock);
+		cl_index = NULL;
+	}
+	/* Finally lock the table */
+	LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+	/*
+	 * Lock all indexes now, not only the clustering one: all indexes need to
+	 * have their files swapped. While doing that, store their relation
+	 * references in an array, to handle predicate locks below.
+	 */
+	ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+	nind = 0;
+	foreach(lc, ind_oids_old)
+	{
+		Oid			ind_oid;
+		Relation	index;
+
+		ind_oid = lfirst_oid(lc);
+		index = index_open(ind_oid, AccessExclusiveLock);
+
+		/*
+		 * TODO 1) Do we need to check if ALTER INDEX was executed since the
+		 * new index was created in build_new_indexes()? 2) Specifically for
+		 * the clustering index, should check_index_is_clusterable() be called
+		 * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+		 * table probably blocks all commands that affect the result of
+		 * check_index_is_clusterable().)
+		 */
+		*ind_refs_p = index;
+		ind_refs_p++;
+		nind++;
+	}
+
+	/*
+	 * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+	 * lock is needed to swap the files.
+	 */
+	if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+	/*
+	 * Tuples and pages of the old heap will be gone, but the heap will stay.
+	 */
+	TransferPredicateLocksToHeapRelation(OldHeap);
+	/* The same for indexes. */
+	for (int i = 0; i < nind; i++)
+	{
+		Relation	index = ind_refs[i];
+
+		TransferPredicateLocksToHeapRelation(index);
+
+		/*
+		 * References to indexes on the old relation are not needed anymore,
+		 * however locks stay till the end of the transaction.
+		 */
+		index_close(index, NoLock);
+	}
+	pfree(ind_refs);
+
+	/*
+	 * Flush anything we see in WAL, to make sure that all changes committed
+	 * while we were waiting for the exclusive lock are available for
+	 * decoding. This should not be necessary if all backends had
+	 * synchronous_commit set, but we can't rely on this setting.
+	 *
+	 * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+	 * position, and GetLastImportantRecPtr() points at the start of the last
+	 * record rather than at the end. Thus the simplest way to determine the
+	 * insert position is to insert a dummy record and use its LSN.
+	 *
+	 * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+	 * last record (plus the total size of all the page headers the record
+	 * spans)?
+	 */
+	XLogBeginInsert();
+	XLogRegisterData(&dummy_rec_data, 1);
+	wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/* Apply the concurrent changes again. */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/* Remember info about rel before closing OldHeap */
+	relpersistence = OldHeap->rd_rel->relpersistence;
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+	/*
+	 * Even ShareUpdateExclusiveLock should have prevented others from
+	 * creating / dropping indexes (even using the CONCURRENTLY option), so we
+	 * do not need to check whether the lists match.
+	 */
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+		Oid			mapped_tables[4];
+
+		/* Zero out possible results from swap_relation_files */
+		memset(mapped_tables, 0, sizeof(mapped_tables));
+
+		swap_relation_files(ind_old, ind_new,
+							(old_table_oid == RelationRelationId),
+							swap_toast_by_content,
+							true,
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Concurrent processing is not supported for system relations, so
+		 * there should be no mapped tables.
+		 */
+		for (int i = 0; i < 4; i++)
+			Assert(mapped_tables[i] == 0);
+#endif
+	}
+
+	/* The new indexes must be visible for deletion. */
+	CommandCounterIncrement();
+
+	/* Close the old heap but keep lock until transaction commit. */
+	table_close(OldHeap, NoLock);
+	/* Close the new heap. (We didn't have to open its indexes). */
+	table_close(NewHeap, NoLock);
+
+	/* Cleanup what we don't need anymore. (And close the identity index.) */
+	pfree(ident_key);
+	free_index_insert_state(iistate);
+
+	/*
+	 * Swap the relations and their TOAST relations and TOAST indexes. This
+	 * also drops the new relation and its indexes.
+	 *
+	 * (System catalogs are currently not supported.)
+	 */
+	Assert(!is_system_catalog);
+	finish_heap_swap(old_table_oid, new_table_oid,
+					 is_system_catalog,
+					 swap_toast_by_content,
+					 false, true, false,
+					 frozenXid, cutoffMulti,
+					 relpersistence);
+}
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. The order of items matches, so these lists can be used to swap
+ * the index storage.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+	StringInfo	ind_name;
+	ListCell   *lc;
+	List	   *result = NIL;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+	ind_name = makeStringInfo();
+
+	foreach(lc, OldIndexes)
+	{
+		Oid			ind_oid,
+					ind_oid_new,
+					tbsp_oid;
+		Relation	ind;
+		IndexInfo  *ind_info;
+		int			i,
+					heap_col_id;
+		List	   *colnames;
+		int16		indnatts;
+		Oid		   *collations,
+				   *opclasses;
+		HeapTuple	tup;
+		bool		isnull;
+		Datum		d;
+		oidvector  *oidvec;
+		int2vector *int2vec;
+		size_t		oid_arr_size;
+		size_t		int2_arr_size;
+		int16	   *indoptions;
+		text	   *reloptions = NULL;
+		bits16		flags;
+		Datum	   *opclassOptions;
+		NullableDatum *stattargets;
+
+		ind_oid = lfirst_oid(lc);
+		ind = index_open(ind_oid, AccessShareLock);
+		ind_info = BuildIndexInfo(ind);
+
+		tbsp_oid = ind->rd_rel->reltablespace;
+
+		/*
+		 * Index names don't really matter, as we'll eventually use only
+		 * their storage. Just make them unique within the table.
+		 */
+		resetStringInfo(ind_name);
+		appendStringInfo(ind_name, "ind_%d",
+						 list_cell_number(OldIndexes, lc));
+
+		flags = 0;
+		if (ind->rd_index->indisprimary)
+			flags |= INDEX_CREATE_IS_PRIMARY;
+
+		colnames = NIL;
+		indnatts = ind->rd_index->indnatts;
+		oid_arr_size = sizeof(Oid) * indnatts;
+		int2_arr_size = sizeof(int16) * indnatts;
+
+		collations = (Oid *) palloc(oid_arr_size);
+		for (i = 0; i < indnatts; i++)
+		{
+			char	   *colname;
+
+			heap_col_id = ind->rd_index->indkey.values[i];
+			if (heap_col_id > 0)
+			{
+				Form_pg_attribute att;
+
+				/* Normal attribute. */
+				att = TupleDescAttr(OldHeap->rd_att, heap_col_id - 1);
+				colname = pstrdup(NameStr(att->attname));
+				collations[i] = att->attcollation;
+			}
+			else if (heap_col_id == 0)
+			{
+				HeapTuple	tuple;
+				Form_pg_attribute att;
+
+				/*
+				 * Expression column is not present in relcache. What we need
+				 * here is an attribute of the *index* relation.
+				 */
+				tuple = SearchSysCache2(ATTNUM,
+										ObjectIdGetDatum(ind_oid),
+										Int16GetDatum(i + 1));
+				if (!HeapTupleIsValid(tuple))
+					elog(ERROR,
+						 "cache lookup failed for attribute %d of relation %u",
+						 i + 1, ind_oid);
+				att = (Form_pg_attribute) GETSTRUCT(tuple);
+				colname = pstrdup(NameStr(att->attname));
+				collations[i] = att->attcollation;
+				ReleaseSysCache(tuple);
+			}
+			else
+				elog(ERROR, "unexpected column number: %d",
+					 heap_col_id);
+
+			colnames = lappend(colnames, colname);
+		}
+
+		/*
+		 * Special effort needed for variable length attributes of
+		 * Form_pg_index.
+		 */
+		tup = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(ind_oid));
+		if (!HeapTupleIsValid(tup))
+			elog(ERROR, "cache lookup failed for index %u", ind_oid);
+		d = SysCacheGetAttr(INDEXRELID, tup, Anum_pg_index_indclass, &isnull);
+		Assert(!isnull);
+		oidvec = (oidvector *) DatumGetPointer(d);
+		opclasses = (Oid *) palloc(oid_arr_size);
+		memcpy(opclasses, oidvec->values, oid_arr_size);
+
+		d = SysCacheGetAttr(INDEXRELID, tup, Anum_pg_index_indoption,
+							&isnull);
+		Assert(!isnull);
+		int2vec = (int2vector *) DatumGetPointer(d);
+		indoptions = (int16 *) palloc(int2_arr_size);
+		memcpy(indoptions, int2vec->values, int2_arr_size);
+		ReleaseSysCache(tup);
+
+		tup = SearchSysCache1(RELOID, ObjectIdGetDatum(ind_oid));
+		if (!HeapTupleIsValid(tup))
+			elog(ERROR, "cache lookup failed for index relation %u", ind_oid);
+		d = SysCacheGetAttr(RELOID, tup, Anum_pg_class_reloptions, &isnull);
+		reloptions = !isnull ? DatumGetTextPCopy(d) : NULL;
+		ReleaseSysCache(tup);
+
+		opclassOptions = palloc0(sizeof(Datum) * ind_info->ii_NumIndexAttrs);
+		for (i = 0; i < ind_info->ii_NumIndexAttrs; i++)
+			opclassOptions[i] = get_attoptions(ind_oid, i + 1);
+
+		stattargets = get_index_stattargets(ind_oid, ind_info);
+
+		/*
+		 * Neither parentIndexRelid nor parentConstraintId needs to be passed
+		 * since the new catalog entries (pg_constraint, pg_inherits) would
+		 * eventually be dropped. Therefore there's no need to record a valid
+		 * dependency on the parents.
+		 */
+		ind_oid_new = index_create(NewHeap,
+								   ind_name->data,
+								   InvalidOid,
+								   InvalidOid,	/* parentIndexRelid */
+								   InvalidOid,	/* parentConstraintId */
+								   InvalidOid,
+								   ind_info,
+								   colnames,
+								   ind->rd_rel->relam,
+								   tbsp_oid,
+								   collations,
+								   opclasses,
+								   opclassOptions,
+								   indoptions,
+								   stattargets,
+								   PointerGetDatum(reloptions),
+								   flags,	/* flags */
+								   0,	/* constr_flags */
+								   false,	/* allow_system_table_mods */
+								   false,	/* is_internal */
+								   NULL /* constraintId */
+			);
+		result = lappend_oid(result, ind_oid_new);
+
+		index_close(ind, AccessShareLock);
+		list_free_deep(colnames);
+		pfree(collations);
+		pfree(opclasses);
+		pfree(indoptions);
+		if (reloptions)
+			pfree(reloptions);
+	}
+
+	return result;
+}
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 188e26f0e6e..71b73c21ebf 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -904,7 +904,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
-	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
 					 RecentXmin, ReadNextMultiXactId(), relpersistence);
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index cb811520c29..b5fe4e0e7ed 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5989,6 +5989,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 			finish_heap_swap(tab->relid, OIDNewHeap,
 							 false, false, true,
 							 !OidIsValid(tab->newTableSpace),
+							 true,
 							 RecentXmin,
 							 ReadNextMultiXactId(),
 							 persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 8863ad0e8bd..6de9d0ba39d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -125,7 +125,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 							  TransactionId lastSaneFrozenXid,
 							  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-					   BufferAccessStrategy bstrategy);
+					   BufferAccessStrategy bstrategy, bool isTopLevel);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -633,7 +633,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 
 			if (params.options & VACOPT_VACUUM)
 			{
-				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+								isTopLevel))
 					continue;
 			}
 
@@ -1997,7 +1998,7 @@ vac_truncate_clog(TransactionId frozenXID,
  */
 static bool
 vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-		   BufferAccessStrategy bstrategy)
+		   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	LOCKMODE	lmode;
 	Relation	rel;
@@ -2288,7 +2289,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
-						&cluster_params);
+						&cluster_params, isTopLevel);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
@@ -2331,7 +2332,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
-		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+				   isTopLevel);
 	}
 
 	/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index 2b0db214804..50aa385a581 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
 subdir('snowball')
 subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..5dc4ae58ffe 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
 #include "catalog/pg_control.h"
+#include "commands/cluster.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/message.h"
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(buf->record);
 	SnapBuild  *builder = ctx->snapshot_builder;
 
+	/*
+	 * If the change is not intended for logical decoding, do not even
+	 * establish a transaction for it; REPACK CONCURRENTLY is the typical use
+	 * case.
+	 *
+	 * First, check whether REPACK CONCURRENTLY is being performed by this
+	 * backend. If so, only decode data changes of the table it is
+	 * processing, and changes of that table's TOAST relation.
+	 *
+	 * (The TOAST locator should not be set unless the main one is.)
+	 */
+	Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+		   OidIsValid(repacked_rel_locator.relNumber));
+
+	if (OidIsValid(repacked_rel_locator.relNumber))
+	{
+		XLogReaderState *r = buf->record;
+		RelFileLocator locator;
+
+		/* Not all records contain the block. */
+		if (XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL) &&
+			!RelFileLocatorEquals(locator, repacked_rel_locator) &&
+			(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+			 !RelFileLocatorEquals(locator, repacked_rel_toast_locator)))
+			return;
+	}
+
+	/*
+	 * Second, skip records that do not contain sufficient information for
+	 * decoding.
+	 *
+	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+	 * when making changes to the new table. Those changes are of no use to
+	 * any other consumer (such as a logical replication subscription)
+	 * because the new table will eventually be dropped (after REPACK
+	 * CONCURRENTLY has assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This can happen when 1) raw_heap_insert marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, or 2) REPACK
+				 * CONCURRENTLY replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}
+
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 6eaa40f6acf..56ea521e8c6 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,6 +486,26 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	return SnapBuildMVCCFromHistoric(snap, true);
 }
 
+/*
+ * Build an MVCC snapshot for the initial data load performed by the REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot set up
+ * (so we do not set MyProc->xmin). XXX Do we need to add some restrictions
+ * here?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+	Snapshot	snap;
+
+	Assert(builder->state == SNAPBUILD_CONSISTENT);
+
+	snap = SnapBuildBuildSnapshot(builder);
+	return SnapBuildMVCCFromHistoric(snap, false);
+}
+
 /*
  * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
  *
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+#    src/backend/replication/pgoutput_repack
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	$(WIN32RES) \
+	pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+	rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+  'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+  pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pgoutput_repack',
+    '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+  pgoutput_repack_sources,
+  kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..687fbbc59bb
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,288 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ *		Logical Replication output plugin for the REPACK command
+ *
+ * Copyright (c) 2012-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+						   OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+							 ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+							  ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+						  Relation rel, ReorderBufferChange *change);
+static void plugin_truncate(struct LogicalDecodingContext *ctx,
+							ReorderBufferTXN *txn, int nrelations,
+							Relation relations[],
+							ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+						 ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+	cb->startup_cb = plugin_startup;
+	cb->begin_cb = plugin_begin_txn;
+	cb->change_cb = plugin_change;
+	cb->truncate_cb = plugin_truncate;
+	cb->commit_cb = plugin_commit_txn;
+	cb->shutdown_cb = plugin_shutdown;
+}
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+			   bool is_init)
+{
+	ctx->output_plugin_private = NULL;
+
+	/* Probably unnecessary, as we don't use the SQL interface ... */
+	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+	if (ctx->output_plugin_options != NIL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("this plugin does not expect any options")));
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * Since we do not release the slot while processing a particular table,
+ * there is no room for an SQL interface, even for debugging purposes.
+ * Therefore we need neither OutputPluginPrepareWrite() nor
+ * OutputPluginWrite() in the plugin callbacks. (Although we might want to
+ * write custom callbacks, this API seems unnecessarily generic for our
+ * purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				  XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+			  Relation relation, ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Only interested in one particular relation. */
+	if (relation->rd_id != dstate->relid)
+		return;
+
+	/* Decode entry depending on its type */
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			{
+				HeapTuple	newtuple = change->data.tp.newtuple;
+
+				/*
+				 * Identity checks in the main function should have made this
+				 * impossible.
+				 */
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete insert info");
+
+				store_change(ctx, CHANGE_INSERT, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			{
+				HeapTuple	oldtuple = change->data.tp.oldtuple;
+				HeapTuple	newtuple = change->data.tp.newtuple;
+
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete update info");
+
+				if (oldtuple != NULL)
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			{
+				HeapTuple	oldtuple = change->data.tp.oldtuple;
+
+				if (oldtuple == NULL)
+					elog(ERROR, "incomplete delete info");
+
+				store_change(ctx, CHANGE_DELETE, oldtuple);
+			}
+			break;
+		default:
+			/* Should not come here */
+			Assert(false);
+			break;
+	}
+}
+
+static void
+plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				int nrelations, Relation relations[],
+				ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+	int			i;
+	Relation	relation = NULL;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Find the relation we are processing. */
+	for (i = 0; i < nrelations; i++)
+	{
+		relation = relations[i];
+
+		if (RelationGetRelid(relation) == dstate->relid)
+			break;
+	}
+
+	/* Is this a truncation of some other relation? */
+	if (i == nrelations)
+		return;
+
+	store_change(ctx, CHANGE_TRUNCATE, NULL);
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+			 HeapTuple tuple)
+{
+	RepackDecodingState *dstate;
+	char	   *change_raw;
+	ConcurrentChange change;
+	bool		flattened = false;
+	Size		size;
+	Datum		values[1];
+	bool		isnull[1];
+	char	   *dst,
+			   *dst_start;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	size = MAXALIGN(VARHDRSZ) + SizeOfConcurrentChange;
+
+	if (tuple)
+	{
+		/*
+		 * ReorderBufferCommit() stores the TOAST chunks in its private
+		 * memory context and frees them after having called apply_change().
+		 * Therefore we need a flat copy (including TOAST) that we eventually
+		 * copy into the memory context which is available to
+		 * decode_concurrent_changes().
+		 */
+		if (HeapTupleHasExternal(tuple))
+		{
+			/*
+			 * toast_flatten_tuple_to_datum() might be more convenient but we
+			 * don't want the decompression it does.
+			 */
+			tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+			flattened = true;
+		}
+
+		size += tuple->t_len;
+	}
+
+	/* XXX Isn't there a function / macro for this? (cf. MaxAllocSize) */
+	if (size >= 0x3FFFFFFF)
+		elog(ERROR, "change is too big");
+
+	/* Construct the change. */
+	change_raw = (char *) palloc0(size);
+	SET_VARSIZE(change_raw, size);
+
+	/*
+	 * Since the varlena alignment might not be sufficient for the structure,
+	 * set the fields in a local instance and remember where it should
+	 * eventually be copied.
+	 */
+	change.kind = kind;
+	dst_start = (char *) VARDATA(change_raw);
+
+	/* No other information is needed for TRUNCATE. */
+	if (change.kind == CHANGE_TRUNCATE)
+	{
+		memcpy(dst_start, &change, SizeOfConcurrentChange);
+		goto store;
+	}
+
+	/*
+	 * Copy the tuple.
+	 *
+	 * CAUTION: change->tup_data.t_data must be fixed on retrieval!
+	 */
+	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+	dst = dst_start + SizeOfConcurrentChange;
+	memcpy(dst, tuple->t_data, tuple->t_len);
+
+	/* The data has been copied. */
+	if (flattened)
+		pfree(tuple);
+
+store:
+	/* Copy the structure so it can be stored. */
+	memcpy(dst_start, &change, SizeOfConcurrentChange);
+
+	/* Store as tuple of 1 bytea column. */
+	values[0] = PointerGetDatum(change_raw);
+	isnull[0] = false;
+	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+						 values, isnull);
+
+	/* Accounting. */
+	dstate->nchanges++;
+
+	/* Cleanup. */
+	pfree(change_raw);
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..e9ddf39500c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
 #include "access/xlogprefetcher.h"
 #include "access/xlogrecovery.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index cd3e43c448a..519f3953638 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
 
 die
   "$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
-  . " missing from lwlocklist.h"
+  . "missing from lwlocklist.h"
   if $lwlock_count < scalar @wait_event_lwlocks;
 
 die
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 559ba9cdb2c..4911642fb3c 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -64,6 +64,7 @@
 #include "catalog/pg_type.h"
 #include "catalog/schemapg.h"
 #include "catalog/storage.h"
+#include "commands/cluster.h"
 #include "commands/policy.h"
 #include "commands/publicationcmds.h"
 #include "commands/trigger.h"
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 70a6b8902d1..7f1c220e00b 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -646,7 +645,7 @@ CopySnapshot(Snapshot snapshot)
  * FreeSnapshot
  *		Free the memory associated with a snapshot.
  */
-static void
+void
 FreeSnapshot(Snapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 31c1ce5dc09..f23efa12d69 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -4998,18 +4998,27 @@ match_previous_words(int pattern_id,
 	}
 
 /* REPACK */
-	else if (Matches("REPACK"))
+	else if (Matches("REPACK") || Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"CONCURRENTLY");
+	else if (Matches("REPACK", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	else if (Matches("REPACK", "(*)"))
+	else if (Matches("REPACK", "(*)", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	/* If we have REPACK <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", MatchAnyExcept("(")))
+	/* If we have REPACK [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(|CONCURRENTLY")) ||
+			 Matches("REPACK", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", "(*)", MatchAny))
+	/* If we have REPACK (*) [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAnyExcept("CONCURRENTLY")) ||
+			 Matches("REPACK", "(*)", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK <sth> USING, then add the index as well */
-	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+
+	/*
+	 * Complete ... [ (*) ] [ CONCURRENTLY ] <sth> USING INDEX, with a list of
+	 * indexes for <sth>.
+	 */
+	else if (TailMatches(MatchAnyExcept("(|CONCURRENTLY"), "USING", "INDEX"))
 	{
 		set_completion_reference(prev3_wd);
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..b82dd17a966 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -323,14 +323,15 @@ extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
-							 struct TM_FailureData *tmfd, bool changingPart);
+							 struct TM_FailureData *tmfd, bool changingPart,
+							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, ItemPointer tid);
 extern void heap_abort_speculative(Relation relation, ItemPointer tid);
 extern TM_Result heap_update(Relation relation, ItemPointer otid,
 							 HeapTuple newtup,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes);
+							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
@@ -411,6 +412,10 @@ extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
 								 uint16 infomask, TransactionId xid);
+extern bool HeapTupleMVCCInserted(HeapTuple htup, Snapshot snapshot,
+								  Buffer buffer);
+extern bool HeapTupleMVCCNotDeleted(HeapTuple htup, Snapshot snapshot,
+									Buffer buffer);
 extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
 extern bool HeapTupleIsSurelyDead(HeapTuple htup,
 								  struct GlobalVisState *vistest);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..8d4af07f840 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
 #define XLH_DELETE_CONTAINS_OLD_KEY				(1<<2)
 #define XLH_DELETE_IS_SUPER						(1<<3)
 #define XLH_DELETE_IS_PARTITION_MOVE			(1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL					(1<<5)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLH_DELETE_CONTAINS_OLD						\
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..289b64edfd9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -623,6 +624,8 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
+											  Snapshot snapshot,
+											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1627,6 +1630,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ *	 (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1639,6 +1646,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
+								Snapshot snapshot,
+								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1647,6 +1656,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
+													snapshot, decoding_ctx,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 4daa8bef5ee..66431cc19e5 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -100,6 +100,9 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid tablespaceOid,
 										   const char *newName);
 
+extern NullableDatum *get_index_stattargets(Oid indexid,
+											IndexInfo *indInfo);
+
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
 
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 7f4138c7b36..532ffa7208d 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
 #ifndef CLUSTER_H
 #define CLUSTER_H
 
+#include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "replication/logical.h"
 #include "storage/lock.h"
+#include "storage/relfilelocator.h"
 #include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -24,6 +29,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_CONCURRENT 0x08	/* allow concurrent data changes */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -32,14 +38,95 @@ typedef struct ClusterParams
 } ClusterParams;
 
 
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+extern RelFileLocator repacked_rel_locator;
+extern RelFileLocator repacked_rel_toast_locator;
+
+typedef enum
+{
+	CHANGE_INSERT,
+	CHANGE_UPDATE_OLD,
+	CHANGE_UPDATE_NEW,
+	CHANGE_DELETE,
+	CHANGE_TRUNCATE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+	/* See the enum above. */
+	ConcurrentChangeKind kind;
+
+	/*
+	 * The actual tuple.
+	 *
+	 * The tuple data follows the ConcurrentChange structure. Before use make
+	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
+	 * bytea) and that tuple->t_data is fixed.
+	 */
+	HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+								sizeof(HeapTupleData))
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents are being copied to the new storage, along with the metadata
+ * needed to apply those changes to the table.
+ */
+typedef struct RepackDecodingState
+{
+	/* The relation whose changes we're decoding. */
+	Oid			relid;
+
+	/*
+	 * Decoded changes are stored here. Although we try to avoid excessive
+	 * batches, the changes may need to be spilled to disk; the tuplestore
+	 * handles that transparently.
+	 */
+	Tuplestorestate *tstore;
+
+	/* The current number of changes in tstore. */
+	double		nchanges;
+
+	/*
+	 * Descriptor to store the ConcurrentChange structure serialized (as
+	 * bytea). We can't store the tuple directly because the tuplestore only
+	 * supports minimal tuples and we may need to transfer the OID system
+	 * column from the output plugin. Also we need to transfer the change
+	 * kind, so it's better to put everything in one structure than to use
+	 * two tuplestores "in parallel".
+	 */
+	 */
+	TupleDesc	tupdesc_change;
+
+	/* Tuple descriptor needed to update indexes. */
+	TupleDesc	tupdesc;
+
+	/* Slot to retrieve data from tstore. */
+	TupleTableSlot *tsslot;
+
+	ResourceOwner resowner;
+} RepackDecodingState;
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, bool usingindex,
-						Relation OldHeap, Oid indexOid, ClusterParams *params);
+						Relation OldHeap, Oid indexOid, ClusterParams *params,
+						bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+											 XLogRecPtr end_of_wal);
+
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -47,6 +134,7 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 bool swap_toast_by_content,
 							 bool check_constraints,
 							 bool is_internal,
+							 bool reindex,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 5f102743af4..eb507bd7c3c 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -59,18 +59,20 @@
 /*
  * Progress parameters for REPACK.
  *
- * Note: Since REPACK shares some code with CLUSTER, these values are also
- * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
- * introduce a separate set of constants.)
+ * Note: Since REPACK shares some code with CLUSTER, (some of) these values
+ * are also used by CLUSTER. (CLUSTER is now deprecated, so it makes little
+ * sense to introduce a separate set of constants.)
  */
 #define PROGRESS_REPACK_COMMAND					0
 #define PROGRESS_REPACK_PHASE					1
 #define PROGRESS_REPACK_INDEX_RELID				2
 #define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED	4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED		5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED		6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		9
 
 /*
  * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -79,9 +81,10 @@
 #define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
 #define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
 #define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+#define PROGRESS_REPACK_PHASE_CATCH_UP			5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		8
 
 /*
  * Commands of PROGRESS_REPACK
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 6d4d2d1814c..802fc4b0823 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 7f3ba0352f6..2739327b0da 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
 #define AccessShareLock			1	/* SELECT */
 #define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
 #define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL), ANALYZE, CREATE
-									 * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4	/* VACUUM (non-exclusive), ANALYZE, CREATE
+									 * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
 #define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
 #define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
 									 * SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 147b190210a..5eeabdc6c4f 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -61,6 +61,8 @@ extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index fc82cd67f6c..f16422175f8 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -11,10 +11,11 @@ EXTENSION = injection_points
 DATA = injection_points--1.0.sql
 PGFILEDESC = "injection_points - facility for injection points"
 
-REGRESS = injection_points hashagg reindex_conc vacuum
+REGRESS = injection_points hashagg reindex_conc vacuum
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = basic inplace syscache-update-pruned
+ISOLATION = basic inplace syscache-update-pruned repack
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing: 
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+
+step change_new: 
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1: 
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+
+step change_subxact2: 
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    2
+(1 row)
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..c8f264bc6cb
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
\ No newline at end of file
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 20390d6b4bf..29561103bbf 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -47,9 +47,13 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'repack',
       'syscache-update-pruned',
     ],
     'runningcheck': false, # see syscache-update-pruned
+    # 'repack' requires wal_level = 'logical'.
+    'regress_args': ['--temp-config', files('logical.conf')],
+
   },
   'tap': {
     'env': {
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..75850334986
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,143 @@
+# Prefix the system columns with underscore as they are not allowed as column
+# names.
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+	INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j int);
+	CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+    SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice to test whether a tuple version generated by this
+# session can be found.
+step change_existing
+{
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key and
+# non-key columns.
+step change_new
+{
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change_existing
+	change_new
+	change_subxact1
+	change_subxact2
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 84ae47080f5..d2d863e3da5 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1994,17 +1994,17 @@ pg_stat_progress_cluster| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS cluster_index_relid,
     s.param4 AS heap_tuples_scanned,
     s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_copy| SELECT s.pid,
@@ -2076,17 +2076,20 @@ pg_stat_progress_repack| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 5 THEN 'catch-up'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS repack_index_relid,
     s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param5 AS heap_tuples_inserted,
+    s.param6 AS heap_tuples_updated,
+    s.param7 AS heap_tuples_deleted,
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 24f98ed1e9e..af967625181 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -486,6 +486,8 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
 ConditionalStack
@@ -1258,6 +1260,7 @@ IndexElem
 IndexFetchHeapData
 IndexFetchTableData
 IndexInfo
+IndexInsertState
 IndexList
 IndexOnlyScan
 IndexOnlyScanState
@@ -2538,6 +2541,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 RepackCommand
+RepackDecodingState
 RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
-- 
2.39.5

#2Fujii Masao
masao.fujii@gmail.com
In reply to: Alvaro Herrera (#1)

On Fri, Aug 1, 2025 at 1:50 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

One of the later patches in the series, which I have not included yet,
intends to implement the idea of transiently enabling wal_level=logical
for the table being repacked concurrently, so that you can still use
the concurrent mode if you have a non-logical-wal_level instance.

Sounds good to me!

Here's v17.

I just tried the REPACK command and observed a few things:

When I repeatedly ran REPACK on the regression database
while make installcheck was running, I got the following error:

ERROR: StartTransactionCommand: unexpected state STARTED

"REPACK (VERBOSE);" failed with the following error.

ERROR: syntax error at or near ";"

REPACK (CONCURRENTLY) USING INDEX failed with the following error,
while the same command without CONCURRENTLY completed successfully:

=# REPACK (CONCURRENTLY) parallel_vacuum_table using index
regular_sized_index ;
ERROR: cannot process relation "parallel_vacuum_table"
HINT: Relation "parallel_vacuum_table" has no identity index.

When I ran REPACK (CONCURRENTLY) on a table that's also a logical
replication target, I saw the following log messages. Is this expected?

=# REPACK (CONCURRENTLY) t;
LOG: logical decoding found consistent point at 1/00021F20
DETAIL: There are no running transactions.
STATEMENT: REPACK (CONCURRENTLY) t;

Regards,

--
Fujii Masao

#3Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Fujii Masao (#2)

Hello Álvaro,

Should we skip non-ready indexes in build_new_indexes?

Also, in the current implementation, concurrent mode is marked as
non-MVCC safe. From my point of view, this is a significant limitation
for practical use.
Should we consider an option that trades the non-MVCC issues for a short
exclusive lock on ProcArrayLock plus cancellation of some transactions
with an older xmin?

Best regards,
Mikhail

#4Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Mihail Nikalayeu (#3)

Hello!

One more thing - I think build_new_indexes and
index_concurrently_create_copy are very close in semantics, so it
might be a good idea to refactor them a bit.

I’m still concerned about MVCC-related issues. For multiple
applications, this is a dealbreaker, because in some cases correctness
is a higher priority than availability.

Possible options:

1) Terminate connections with old snapshots.

Add a flag to terminate all connections with snapshots during the
ExclusiveLock period for the swap. From the application’s perspective,
this is not a big deal - it's similar to a primary switch. We would
also need to prevent new snapshots from being taken during the swap
transaction, so a short exclusive lock on ProcArrayLock would also be
required.
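To make the rule in option 1 concrete, here is a toy sketch (all names are hypothetical, not PostgreSQL internals) of which backends would be candidates for termination: those still holding a snapshot whose xmin predates the swap, since only they could observe the MVCC anomaly.

```python
# Hypothetical sketch of option 1's cancellation rule: terminate backends
# whose oldest registered snapshot xmin predates the repack swap.
# Names and the xmin comparison are illustrative, not PostgreSQL code.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    pid: int
    snapshot_xmin: Optional[int]  # None means no snapshot is held


def backends_to_terminate(backends: List[Backend], swap_xid: int) -> List[int]:
    """Return pids of backends that could still see pre-swap tuple
    versions, i.e. whose snapshot xmin is not newer than the swap xid."""
    return [b.pid for b in backends
            if b.snapshot_xmin is not None and b.snapshot_xmin <= swap_xid]


procs = [Backend(101, 940), Backend(102, None), Backend(103, 1005)]
# With the swap committed at xid 1000, only pid 101 still holds an
# older snapshot and would be cancelled; pid 102 holds no snapshot.
print(backends_to_terminate(procs, 1000))
```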

2) MVCC-safe two-phase approach (inspired by CREATE INDEX).

- copy the data from T1 to the new table T2.
- apply the log.
- take a table-exclusive lock on T1
- apply the log again.
- instead of swapping, mark T2 as a kind of shadow table - any
transaction applying changes to T1 must also apply them to T2, while
reads still use T1 as the source of truth.
- commit (and record the transaction ID as XID1).
- at this point, all changes are applied to both tables with the same
XIDs because of the "shadow table" mechanism.
- wait until older snapshots no longer treat XID1 as uncommitted.
- now the tables are identical from the MVCC perspective.
- take an exclusive lock on both T1 and T2.
- perform the swap and drop T1.
- commit.

This is more complex and would require implementing some sort of
"shadow table" mechanism, so it might not be worth the effort. Option
1 feels more appealing to me.

If others think this is a good idea, I might try implementing a proof
of concept.
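The dual-write phase of option 2 can be modeled in a few lines. This is a toy sketch, not PostgreSQL code: every write after catch-up is applied to both T1 and T2 under the same transaction, reads keep using T1, and the swap is a no-op for visible contents because the copies are already identical.

```python
# Toy model of the hypothetical "shadow table" mechanism from option 2.
# T1 is the source of truth for reads; T2 is the repacked copy kept in
# sync by dual-writing, so the final swap changes nothing visible.
class ShadowedTable:
    def __init__(self, rows):
        self.t1 = dict(rows)   # original table: key -> value
        self.t2 = dict(rows)   # repacked copy, after initial load + catch-up
        self.swapped = False

    def write(self, key, value):
        # Dual-write: the change hits both copies with the same "xid".
        self.t1[key] = value
        self.t2[key] = value

    def read(self):
        # Reads use T1 until the swap has happened.
        return dict(self.t2 if self.swapped else self.t1)

    def swap(self):
        # Both copies must be identical before the files are exchanged.
        assert self.t1 == self.t2
        self.t1, self.t2 = self.t2, self.t1
        self.swapped = True


tbl = ShadowedTable({1: 'a', 2: 'b'})
tbl.write(3, 'c')          # applied to both T1 and T2
before = tbl.read()
tbl.swap()                 # invisible to readers: contents are unchanged
assert tbl.read() == before
```

The interesting engineering cost is hidden in `write`: in a real implementation, every concurrent transaction touching T1 would have to locate and modify the matching tuple in T2, which is what makes this option substantially more complex than option 1.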

Best regards,
Mikhail

#5Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Mihail Nikalayeu (#4)

On 2025-Aug-09, Mihail Nikalayeu wrote:

Hello!

One more thing - I think build_new_indexes and
index_concurrently_create_copy are very close in semantics, so it
might be a good idea to refactor them a bit.

I’m still concerned about MVCC-related issues. For multiple
applications, this is a dealbreaker, because in some cases correctness
is a higher priority than availability.

Please note that Antonin already implemented this. See his patches
here:
/messages/by-id/77690.1725610115@antos
I proposed to leave this part out initially, which is why it hasn't been
reposted. We can review and discuss after the initial patches are in.
Because having an MVCC-safe mode has drawbacks, IMO we should make it
optional.

But you're welcome to review that part specifically if you're so
inclined, and offer feedback on it. (I suggest to rewind back your
checked-out tree to branch master at the time that patch was posted, for
easy application. We can deal with a rebase later.)

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")

#6Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#4)
4 attachment(s)

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

One more thing - I think build_new_indexes and
index_concurrently_create_copy are very close in semantics, so it
might be a good idea to refactor them a bit.

You're right. I think I even used the latter as a reference when writing the
former.

0002 in the attached series tries to fix that. build_new_indexes() (in 0004)
is simpler now.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

v18-0001-Add-REPACK-command.patchtext/x-diffDownload
From eadf2666b1e92fd52372fa859ef177f3a0d11f77 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:22:05 +0200
Subject: [PATCH 1/4] Add REPACK command.

The existing CLUSTER command as well as VACUUM with the FULL option both
reclaim unused space by rewriting the table. Now that we want to enhance this
functionality (in particular, by adding a new option CONCURRENTLY), we should
enhance both commands because they are both implemented by the same function
(cluster.c:cluster_rel). However, adding the same option to two different
commands is not very user-friendly. Therefore it was decided to create a new
command and to declare both CLUSTER command and the FULL option of VACUUM
deprecated. Future enhancements to this rewriting code will only affect the
new command.

Like CLUSTER, the REPACK command reorders the table according to the specified
index. Unlike CLUSTER, REPACK does not require the index: if only table is
specified, the command acts as VACUUM FULL. As we don't want to remove CLUSTER
and VACUUM FULL yet, there are three callers of the cluster_rel() function
now: REPACK, CLUSTER and VACUUM FULL. When we need to distinguish who is
calling this function (mostly for logging, but also for progress reporting),
we can no longer use the OID of the clustering index: both REPACK and VACUUM
FULL can pass InvalidOid. Therefore, this patch introduces a new enumeration
type ClusterCommand, and adds an argument of this type to the cluster_rel()
function and to all the functions that need to distinguish the caller.

Like CLUSTER and VACUUM FULL, the REPACK command without arguments processes
all the tables on which the current user has the MAINTAIN privilege.

A new view, pg_stat_progress_repack, is added to monitor the progress of
REPACK. Currently it displays the same information as pg_stat_progress_cluster
(except that column names might differ), but it'll also display the status of
the REPACK CONCURRENTLY command in the future, so the view definitions will
eventually diverge.

Regarding user documentation, the patch moves the information on clustering
from cluster.sgml to the new file repack.sgml. cluster.sgml now contains a
link that points to the related section of repack.sgml. A note on deprecation
and a link to repack.sgml are added to both cluster.sgml and vacuum.sgml.
---
 doc/src/sgml/monitoring.sgml             | 223 ++++++-
 doc/src/sgml/ref/allfiles.sgml           |   1 +
 doc/src/sgml/ref/cluster.sgml            |  82 +--
 doc/src/sgml/ref/repack.sgml             | 254 ++++++++
 doc/src/sgml/ref/vacuum.sgml             |   9 +
 doc/src/sgml/reference.sgml              |   1 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  26 +
 src/backend/commands/cluster.c           | 717 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   3 +-
 src/backend/parser/gram.y                |  77 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  33 +-
 src/include/commands/cluster.h           |   7 +-
 src/include/commands/progress.h          |  63 +-
 src/include/nodes/parsenodes.h           |  20 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 125 +++-
 src/test/regress/expected/rules.out      |  23 +
 src/test/regress/sql/cluster.sql         |  59 ++
 src/tools/pgindent/typedefs.list         |   3 +
 25 files changed, 1406 insertions(+), 379 deletions(-)
 create mode 100644 doc/src/sgml/ref/repack.sgml

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..12e103d319d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5506,7 +5514,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -5965,6 +5974,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..c0ef654fcb4 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..ee4fd965928 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -42,18 +42,6 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
    <replaceable class="parameter">table_name</replaceable>.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
-
   <para>
    When a table is clustered, <productname>PostgreSQL</productname>
    remembers which index it was clustered by.  The form
@@ -78,6 +66,25 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
    database operations (both reads and writes) from operating on the
    table until the <command>CLUSTER</command> is finished.
   </para>
+
+  <warning>
+   <para>
+    The <command>CLUSTER</command> command is deprecated in favor of
+    <xref linkend="sql-repack"/>.
+   </para>
+  </warning>
+
+  <note>
+   <para>
+    <xref linkend="sql-repack-notes-on-clustering"/> explain how clustering
+    works, whether it is initiated by <command>CLUSTER</command> or
+    by <command>REPACK</command>. The notable difference between the two is
+    that <command>REPACK</command> does not remember the index used last
+    time. Thus if you don't specify an index, <command>REPACK</command>
+    rewrites the table but does not try to cluster it.
+   </para>
+  </note>
+
  </refsect1>
 
  <refsect1>
@@ -136,63 +143,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..a612c72d971
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,254 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX <replaceable class="parameter">index_name</replaceable> ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If <replaceable class="parameter">index_name</replaceable> is specified,
+   the table is additionally clustered using this index; see
+   <xref linkend="sql-repack-notes-on-clustering"/> below.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This blocks all other database operations (both reads
+   and writes) on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    When a table is clustered, it is physically reordered based on the index
+    information.  Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using <command>REPACK</command>.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan, or a sequential scan without sort, is used, a
+    temporary copy of the table is created that contains the table data in the
+    new order.  Temporary copies of each index on the table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table or materialized view.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
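A side note on the boolean-option semantics documented above (TRUE/ON/1 enable, FALSE/OFF/0 disable, an omitted value means true). A minimal standalone sketch of that rule, purely for illustration; this is not the server's actual parser (defGetBoolean() in commands/define.c), and error handling is not modeled:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/*
 * Illustrative sketch only: map an option value string to a boolean,
 * treating a missing value as true, per the documented semantics.
 */
static bool
parse_option_boolean(const char *value)
{
	if (value == NULL)			/* value omitted: TRUE is assumed */
		return true;
	if (strcasecmp(value, "true") == 0 ||
		strcasecmp(value, "on") == 0 ||
		strcmp(value, "1") == 0)
		return true;
	return false;				/* FALSE, OFF, 0 */
}
```
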
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+   <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+   </para>
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions. If an index
+    is specified, each partition is repacked using the corresponding partition
+    of that index. <command>REPACK</command> on a partitioned table cannot be executed
+    inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> based on its
+   index <literal>employees_ind</literal> (since an index is used here, this
+   effectively clusters the table):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting></para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..cee1cf3926c 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -98,6 +98,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    <varlistentry>
     <term><literal>FULL</literal></term>
     <listitem>
+
      <para>
       Selects <quote>full</quote> vacuum, which can reclaim more
       space, but takes much longer and exclusively locks the table.
@@ -106,6 +107,14 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
       the operation is complete.  Usually this should only be used when a
       significant amount of space needs to be reclaimed from within the table.
      </para>
+
+     <warning>
+      <para>
+       The <option>FULL</option> parameter is deprecated in favor of
+       <xref linkend="sql-repack"/>.
+      </para>
+     </warning>
+
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..229912d35b7 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..0b03070d394 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c4029a4f3d3..3063abff9a5 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1b3c5a55882..b2b7b10c2be 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1279,6 +1279,32 @@ CREATE VIEW pg_stat_progress_cluster AS
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_repack AS
+    SELECT
+        S.pid AS pid,
+        S.datid AS datid,
+        D.datname AS datname,
+        S.relid AS relid,
+        -- param1 is currently unused
+        CASE S.param2 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'seq scanning heap'
+                      WHEN 2 THEN 'index scanning heap'
+                      WHEN 3 THEN 'sorting tuples'
+                      WHEN 4 THEN 'writing new heap'
+                      WHEN 5 THEN 'swapping relation files'
+                      WHEN 6 THEN 'rebuilding index'
+                      WHEN 7 THEN 'performing final cleanup'
+                      END AS phase,
+        CAST(S.param3 AS oid) AS repack_index_relid,
+        S.param4 AS heap_tuples_scanned,
+        S.param5 AS heap_tuples_written,
+        S.param6 AS heap_blks_total,
+        S.param7 AS heap_blks_scanned,
+        S.param8 AS index_rebuild_count
+    FROM pg_stat_get_progress_info('REPACK') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
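For reviewing the new pg_stat_progress_repack view above, here is the param2-to-phase mapping restated as a self-contained C sketch. The SQL view definition is the authoritative mapping; this is just a compact cross-check:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Mirror of the view's CASE expression on param2.  Out-of-range values
 * return NULL, matching what the CASE yields with no ELSE branch.
 */
static const char *
repack_phase_name(int phase)
{
	static const char *const names[] = {
		"initializing",				/* 0 */
		"seq scanning heap",		/* 1 */
		"index scanning heap",		/* 2 */
		"sorting tuples",			/* 3 */
		"writing new heap",			/* 4 */
		"swapping relation files",	/* 5 */
		"rebuilding index",			/* 6 */
		"performing final cleanup"	/* 7 */
	};

	if (phase < 0 || phase > 7)
		return NULL;
	return names[phase];
}
```
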
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..af14a230f94 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,17 +67,40 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd, bool usingindex,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
-
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  MemoryContext cluster_context,
+											  Oid relid, bool rel_is_index);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
+
+
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
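As agreed upthread, the REPACK_COMMAND_CLUSTER case must return "CLUSTER". For the archives, the corrected mapping as a self-contained sketch; the enum is mirrored here only so the snippet compiles on its own (member order is assumed, not taken from the patch):

```c
#include <assert.h>
#include <string.h>

/* Mirrored for illustration; the real definition lives in the patch. */
typedef enum RepackCommand
{
	REPACK_COMMAND_REPACK,
	REPACK_COMMAND_CLUSTER,
	REPACK_COMMAND_VACUUMFULL,
} RepackCommand;

/* Map each command to the keyword used in its error messages. */
static const char *
repack_command_as_string(RepackCommand cmd)
{
	switch (cmd)
	{
		case REPACK_COMMAND_REPACK:
			return "REPACK";
		case REPACK_COMMAND_VACUUMFULL:
			return "VACUUM";
		case REPACK_COMMAND_CLUSTER:
			return "CLUSTER";	/* not "VACUUM" */
	}
	return "???";
}
```
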
 
 /*---------------------------------------------------------------------------
  * This cluster code allows for clustering multiple tables at once. Because
@@ -104,101 +127,39 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *---------------------------------------------------------------------------
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
 	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
 			verbose = defGetBoolean(opt);
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
+					 errmsg("unrecognized %s option \"%s\"",
+							RepackCommandAsString(stmt->command),
 							opt->defname),
 					 parser_errposition(pstate, opt->location)));
 	}
 
 	params.options = (verbose ? CLUOPT_VERBOSE : 0);
 
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
 			return;
-		}
 	}
 
 	/*
@@ -207,88 +168,103 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
-	}
+		Oid			relid;
+		bool		rel_is_index;
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
+		/*
+		 * If an index name was specified, resolve it now and pass it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * XXX how should this behave?  Passing no index to a partitioned
+			 * table could be useful to have certain partitions clustered by
+			 * some index, and other partitions by a different index.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
 
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
+			relid = determine_clustered_index(rel, true, stmt->indexname);
+			/* XXX is this the right place for this? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
 
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
+		rtcs = get_tables_to_repack_partitioned(stmt->command, repack_context,
+												relid, rel_is_index);
+
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
+	}
 
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+			continue;
 
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, stmt->usingindex,
+					rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +280,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, bool usingindex,
+			Relation OldHeap, Oid indexOid, ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +302,25 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
+	else
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_REPACK_COMMAND_REPACK);
+	else if (cmd == REPACK_COMMAND_CLUSTER)
+	{
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	}
 	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+	{
+		Assert(cmd == REPACK_COMMAND_VACUUMFULL);
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	}
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -351,63 +342,21 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * to cluster a not-previously-clustered index.
 	 */
 	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
+		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+								 params->options))
 			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index.
+	 * Using an index would work in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
-	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
+	if (usingindex && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				 errmsg("cannot run \"%s\" on a shared catalog",
+						RepackCommandAsString(cmd))));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
@@ -415,21 +364,30 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		if (OidIsValid(indexOid))
+		if (cmd == REPACK_COMMAND_CLUSTER)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
+		else if (cmd == REPACK_COMMAND_REPACK)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot repack temporary tables of other sessions")));
+		}
 		else
+		{
+			Assert(cmd == REPACK_COMMAND_VACUUMFULL);
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot vacuum temporary tables of other sessions")));
+		}
 	}
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -469,7 +427,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +440,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip temp tables of other sessions.  Doing this check only in
+	 * the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide CLUSTER/REPACK or processing a partitioned
+	 * table), because another check in ExecRepack() stops any attempt to
+	 * cluster remote temp tables by name.  The corresponding check in
+	 * cluster_rel is then redundant, but we leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +641,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd, bool usingindex,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +658,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (usingindex)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1458,8 +1474,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1525,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,69 +1648,137 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process when REPACK or CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given; in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, scan pg_index for indexes that have
+ * indisclustered set; otherwise, scan pg_class and return all tables.
+ *
+ * Return the result as a list of RelToCluster allocated in the given
+ * memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand command, bool usingindex,
+					 MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
+	HeapTuple	tuple;
 	MemoryContext old_context;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
+
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			/*
+			 * XXX I think the only reason there's no test failure here is
+			 * that we seldom have clustered indexes that would be affected by
+			 * concurrency.  Maybe we should also do the
+			 * ConditionalLockRelationOid+SearchSysCacheExists dance that we
+			 * do below.
+			 */
+			if (!cluster_is_permitted_for_relation(command, index->indrelid,
+												   GetUserId()))
+				continue;
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
 
-		MemoryContextSwitchTo(old_context);
+			MemoryContextSwitchTo(old_context);
+		}
+	}
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
+
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a weak lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.  XXX we could release the lock at the
+			 * bottom of the loop, but for now just hold it until this
+			 * transaction is finished.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
+
+			/* Verify that the table still exists. */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			if (!cluster_is_permitted_for_relation(command, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
+
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
 	}
-	table_endscan(scan);
 
-	relation_close(indRelation, AccessShareLock);
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
  * all the children leaves tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is the OID of an index (true) or of
+ * the table itself (false).
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
+								 Oid relid, bool rel_is_index)
 {
 	List	   *inhoids;
 	ListCell   *lc;
@@ -1702,17 +1786,33 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 	MemoryContext old_context;
 
 	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
 
 	foreach(lc, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			inhoid = lfirst_oid(lc);
+		Oid			inhrelid,
+					inhindid;
 		RelToCluster *rtc;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(inhoid) != RELKIND_INDEX)
+				continue;
+
+			inhrelid = IndexGetRelation(inhoid, false);
+			inhindid = inhoid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(inhoid) != RELKIND_RELATION)
+				continue;
+
+			inhrelid = inhoid;
+			inhindid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
@@ -1720,15 +1820,15 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 		 * table.  We skip any partitions which the user is not permitted to
 		 * CLUSTER.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, inhrelid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
 		old_context = MemoryContextSwitchTo(cluster_context);
 
 		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		rtc->tableOid = inhrelid;
+		rtc->indexOid = inhindid;
 		rtcs = lappend(rtcs, rtc);
 
 		MemoryContextSwitchTo(old_context);
@@ -1742,13 +1842,134 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain lock on it, then determine what to do based on the relation
+ * type: if it's not a partitioned table, repack it as indicated (using an
+ * existing clustered index, or following the indicated index), and return
+ * NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened relcache entry, so that caller can process the
+ * partitions using the multiple-table handling code.  The index name is not
+ * resolved in this case.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+	{
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+	}
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the index to use
+ * for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes doesn't
+ * change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid = InvalidOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		ListCell   *lc;
+
+		/* Find an index with indisclustered set, or report error */
+		foreach(lc, RelationGetIndexList(rel))
+		{
+			indexOid = lfirst_oid(lc);
+
+			if (get_index_isclustered(indexOid))
+				break;
+			indexOid = InvalidOid;
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/*
+		 * An index name was specified; look it up to get its OID.  The index
+		 * must be in the same namespace as the relation.
+		 */
+		indexOid = get_relname_relid(indexname,
+									 rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..8863ad0e8bd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2287,7 +2287,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 				cluster_params.options |= CLUOPT_VERBOSE;
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index db43034b9db..062235817f4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -280,7 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -297,7 +297,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -316,7 +316,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		utility_option_list
@@ -763,7 +763,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1025,7 +1025,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1099,6 +1098,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1135,6 +1135,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11912,38 +11917,80 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ USING INDEX <index_name> ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list qualified_name USING INDEX name
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list qualified_name opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $2;
+					n->params = NIL;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -11951,20 +11998,24 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -17960,6 +18011,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18592,6 +18644,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 4f4191b0ea6..8a30e081614 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2850,10 +2850,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2861,6 +2857,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3498,7 +3498,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..a1e10e8c2f6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -268,6 +268,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1f2ca946fc5..31c1ce5dc09 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1247,7 +1247,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -4997,6 +4997,37 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	/* If we have REPACK <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAny))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK <sth> USING, then add the index as well */
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+		else if (TailMatches("VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..7f4138c7b36 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -31,8 +31,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, bool usingindex,
+						Relation OldHeap, Oid indexOid, ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..5b6639c114c 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,24 +56,51 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
-
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Note: Since REPACK shares some code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+
+/*
+ * Commands of PROGRESS_REPACK
+ *
+ * Currently we only have one command, so the PROGRESS_REPACK_COMMAND
+ * parameter is not strictly necessary.  However, it makes cluster.c simpler
+ * if CLUSTER and REPACK have the same set of parameters; see the note on
+ * REPACK parameters above.
+ */
+#define PROGRESS_REPACK_COMMAND_REPACK			1
+
+/*
+ * Progress parameters for cluster.
+ *
+ * Although we need to report REPACK and CLUSTER in separate views, the
+ * parameters and phases of CLUSTER are a subset of those of REPACK. Therefore
+ * we just use the appropriate values defined for REPACK above instead of
+ * defining a separate set of constants here.
+ */
 
 /* Commands of PROGRESS_CLUSTER */
 #define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 86a236bd58b..fcc25a0c592 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3949,16 +3949,26 @@ typedef struct AlterSystemStmt
 } AlterSystemStmt;
 
 /* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
+ *		Repack Statement
  * ----------------------
  */
-typedef struct ClusterStmt
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
 {
 	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
+	RepackCommand command;		/* type of command being run */
+	RangeVar   *relation;		/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
 	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
+} RepackStmt;
+
 
 /* ----------------------
  *		Vacuum and Analyze Statements
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index a4af3f717a1..22559369e2c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -374,6 +374,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..5256628b51d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -254,6 +254,63 @@ ORDER BY 1;
  clstr_tst_pkey
 (3 rows)
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -381,6 +438,35 @@ SELECT * FROM clstr_1;
  2
 (2 rows)
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because no clustering index is involved here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
+SET SESSION AUTHORIZATION regress_clstr_user;
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 CREATE TABLE clustertest (key int PRIMARY KEY);
@@ -495,6 +581,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +636,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..3a1d1d28282 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,6 +2071,29 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..cfcc3dc9761 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,6 +76,19 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
 
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
@@ -159,6 +172,34 @@ INSERT INTO clstr_1 VALUES (1);
 CLUSTER clstr_1;
 SELECT * FROM clstr_1;
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because no clustering index is involved here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 
@@ -229,6 +270,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..24f98ed1e9e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -425,6 +425,7 @@ ClientCertName
 ClientConnectionInfo
 ClientData
 ClientSocket
+ClusterCommand
 ClonePtrType
 ClosePortalStmt
 ClosePtrType
@@ -2536,6 +2537,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
-- 
2.47.1

v18-0002-Refactor-index_concurrently_create_copy-for-use-with.patchtext/x-diffDownload
From 71d825ef5a61236a6ff288da3251379580e6d368 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:31:34 +0200
Subject: [PATCH 2/4] Refactor index_concurrently_create_copy() for use with
 REPACK (CONCURRENTLY).

This patch moves the code to index_create_copy() and adds a "concurrently"
parameter so it can be used by REPACK (CONCURRENTLY).

With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of indexes
(including system catalog entries) *before* the actual swap, to reduce the
time for which AccessExclusiveLock needs to be held.
---
 src/backend/catalog/index.c | 36 ++++++++++++++++++++++++++++--------
 src/include/catalog/index.h |  3 +++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3063abff9a5..ff34b1b4269 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1290,15 +1290,31 @@ index_create(Relation heapRelation,
 /*
  * index_concurrently_create_copy
  *
- * Create concurrently an index based on the definition of the one provided by
- * caller.  The index is inserted into catalogs and needs to be built later
- * on.  This is called during concurrent reindex processing.
- *
- * "tablespaceOid" is the tablespace to use for this index.
+ * Variant of index_create_copy(), called during concurrent reindex
+ * processing.
  */
 Oid
 index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							   Oid tablespaceOid, const char *newName)
+{
+	return index_create_copy(heapRelation, oldIndexId, tablespaceOid, newName,
+							 true);
+}
+
+/*
+ * index_create_copy
+ *
+ * Create an index based on the definition of the one provided by caller.  The
+ * index is inserted into catalogs and needs to be built later on.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ *
+ * The actual implementation of index_concurrently_create_copy(), reusable for
+ * other purposes.
+ */
+Oid
+index_create_copy(Relation heapRelation, Oid oldIndexId, Oid tablespaceOid,
+				  const char *newName, bool concurrently)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1317,6 +1333,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	List	   *indexColNames = NIL;
 	List	   *indexExprs = NIL;
 	List	   *indexPreds = NIL;
+	int		flags = 0;
 
 	indexRelation = index_open(oldIndexId, RowExclusiveLock);
 
@@ -1325,9 +1342,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 
 	/*
 	 * Concurrent build of an index with exclusion constraints is not
-	 * supported.
+	 * supported. If !concurrently, ii_ExclusionOps is currently not needed.
 	 */
-	if (oldInfo->ii_ExclusionOps != NULL)
+	if (oldInfo->ii_ExclusionOps != NULL && concurrently)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("concurrent index creation for exclusion constraints is not supported")));
@@ -1435,6 +1452,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 		stattargets[i].isnull = isnull;
 	}
 
+	if (concurrently)
+		flags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+
 	/*
 	 * Now create the new index.
 	 *
@@ -1458,7 +1478,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							  indcoloptions->values,
 							  stattargets,
 							  reloptionsDatum,
-							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT,
+							  flags,
 							  0,
 							  true, /* allow table to be a system catalog? */
 							  false,	/* is_internal? */
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 4daa8bef5ee..fc3e42e5f70 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -99,6 +99,9 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid oldIndexId,
 										   Oid tablespaceOid,
 										   const char *newName);
+extern Oid index_create_copy(Relation heapRelation, Oid oldIndexId,
+							 Oid tablespaceOid, const char *newName,
+							 bool concurrently);
 
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
-- 
2.47.1

v18-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patchtext/x-diffDownload
From be9c9691330bf71d3d5a3a895c072959d8302270 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:23:05 +0200
Subject: [PATCH 3/4] Move conversion of a "historic" to MVCC snapshot to a
 separate function.

The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
---
 src/backend/replication/logical/snapbuild.c | 51 +++++++++++++++++----
 src/backend/utils/time/snapmgr.c            |  3 +-
 src/include/replication/snapbuild.h         |  1 +
 src/include/utils/snapmgr.h                 |  1 +
 4 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8532bfd27e5..6eaa40f6acf 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
-	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -485,6 +482,31 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
+	/* Convert the historic snapshot to MVCC snapshot. */
+	return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike a regular (non-historic) MVCC snapshot, the xip array of this
+ * snapshot contains not only running main transactions, but also their
+ * subtransactions. This difference has no impact on XidInMVCCSnapshot().
+ *
+ * Pass true for 'in_place' if you don't care about modifying the source
+ * snapshot. If you need a new copy, allocated as a single chunk of memory,
+ * pass false.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+	TransactionId xid;
+	TransactionId *oldxip = snapshot->xip;
+	uint32		oldxcnt = snapshot->xcnt;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+	Snapshot	result;
+
 	/* allocate in transaction context */
 	newxip = (TransactionId *)
 		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
@@ -495,7 +517,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
 	{
 		void	   *test;
 
@@ -503,7 +525,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -520,11 +542,22 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snapshot->xcnt = newxcnt;
+	snapshot->xip = newxip;
+
+	if (in_place)
+		result = snapshot;
+	else
+	{
+		result = CopySnapshot(snapshot);
+
+		/* Restore the original values so the source is intact. */
+		snapshot->xip = oldxip;
+		snapshot->xcnt = oldxcnt;
+	}
+	result->snapshot_type = SNAPSHOT_MVCC;
 
-	return snap;
+	return result;
 }
 
 /*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index ea35f30f494..70a6b8902d1 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -212,7 +212,6 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -591,7 +590,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
+Snapshot
 CopySnapshot(Snapshot snapshot)
 {
 	Snapshot	newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..6d4d2d1814c 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..147b190210a 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -60,6 +60,7 @@ extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
+extern Snapshot CopySnapshot(Snapshot snapshot);
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
-- 
2.47.1

v18-0004-Add-CONCURRENTLY-option-to-REPACK-command.patchtext/plainDownload
From 1f3f0c762b63a7d92b782fcbbdd1e977954c3406 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 16:12:51 +0200
Subject: [PATCH 4/4] Add CONCURRENTLY option to REPACK command.

The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock at the beginning and keep it until the end of processing.)

This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(Supposedly, the swapping should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.

First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.

Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)

Since the logical decoding system, during its startup, waits until all the
transactions which already have XID assigned have finished, there is a risk of
deadlock if a transaction that already changed anything in the database tries
to acquire a conflicting lock on the table REPACK CONCURRENTLY is working
on. As an example, consider transaction running CREATE INDEX command on the
table that is being REPACKed CONCURRENTLY. On the other hand, DML commands
(INSERT, UPDATE, DELETE) are not a problem as their lock does not conflict
with REPACK CONCURRENTLY.

The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before the logical decoding is setup and lock
it again afterwards. Such temporary unlocking would imply re-checking if the
table still meets all the requirements for REPACK CONCURRENTLY.

Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires an extra space for the new relation and index files
(which coexist with the old files for some time). In addition, the
CONCURRENTLY option might introduce a lag in releasing WAL segments for
archiving / recycling. This is due to the decoding of the data changes done by
applications concurrently. When copying the table contents into the new file,
we check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table contents is copied.) A
background worker might be a better approach for the decoding - let's consider
implementing it in the future.

The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.
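
As a usage sketch for reviewers: progress of a long-running REPACK can be
watched from another session via the pg_stat_progress_repack view added
earlier in the series. (The table name below is hypothetical, and the exact
placement of the CONCURRENTLY keyword follows the patched repack.sgml; it is
shown here informally.)

```sql
-- Session 1 (the CONCURRENTLY form requires wal_level = logical):
REPACK CONCURRENTLY mytable;

-- Session 2: observe the phases advertised via PROGRESS_REPACK_PHASE.
SELECT pid, relid::regclass, phase,
       heap_tuples_scanned, heap_tuples_written, index_rebuild_count
FROM pg_stat_progress_repack;
```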
---
 doc/src/sgml/monitoring.sgml                  |   37 +-
 doc/src/sgml/mvcc.sgml                        |   12 +-
 doc/src/sgml/ref/repack.sgml                  |  129 +-
 src/Makefile                                  |    1 +
 src/backend/access/heap/heapam.c              |   34 +-
 src/backend/access/heap/heapam_handler.c      |  215 ++-
 src/backend/access/heap/rewriteheap.c         |    6 +-
 src/backend/access/transam/xact.c             |   11 +-
 src/backend/catalog/system_views.sql          |   30 +-
 src/backend/commands/cluster.c                | 1677 +++++++++++++++--
 src/backend/commands/matview.c                |    2 +-
 src/backend/commands/tablecmds.c              |    1 +
 src/backend/commands/vacuum.c                 |   12 +-
 src/backend/meson.build                       |    1 +
 src/backend/replication/logical/decode.c      |   83 +
 src/backend/replication/logical/snapbuild.c   |   20 +
 .../replication/pgoutput_repack/Makefile      |   32 +
 .../replication/pgoutput_repack/meson.build   |   18 +
 .../pgoutput_repack/pgoutput_repack.c         |  288 +++
 src/backend/storage/ipc/ipci.c                |    1 +
 .../storage/lmgr/generate-lwlocknames.pl      |    2 +-
 src/backend/utils/cache/relcache.c            |    1 +
 src/backend/utils/time/snapmgr.c              |    3 +-
 src/bin/psql/tab-complete.in.c                |   25 +-
 src/include/access/heapam.h                   |    9 +-
 src/include/access/heapam_xlog.h              |    2 +
 src/include/access/tableam.h                  |   10 +
 src/include/commands/cluster.h                |   90 +-
 src/include/commands/progress.h               |   23 +-
 src/include/replication/snapbuild.h           |    1 +
 src/include/storage/lockdefs.h                |    4 +-
 src/include/utils/snapmgr.h                   |    2 +
 src/test/modules/injection_points/Makefile    |    5 +-
 .../injection_points/expected/repack.out      |  113 ++
 .../modules/injection_points/logical.conf     |    1 +
 src/test/modules/injection_points/meson.build |    4 +
 .../injection_points/specs/repack.spec        |  143 ++
 src/test/regress/expected/rules.out           |   29 +-
 src/tools/pgindent/typedefs.list              |    4 +
 39 files changed, 2820 insertions(+), 261 deletions(-)
 create mode 100644 src/backend/replication/pgoutput_repack/Makefile
 create mode 100644 src/backend/replication/pgoutput_repack/meson.build
 create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
 create mode 100644 src/test/modules/injection_points/expected/repack.out
 create mode 100644 src/test/modules/injection_points/logical.conf
 create mode 100644 src/test/modules/injection_points/specs/repack.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 12e103d319d..61c0197555f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6074,14 +6074,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+       <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of heap tuples written.
+       Number of heap tuples inserted.
        This counter only advances when the phase is
        <literal>seq scanning heap</literal>,
-       <literal>index scanning heap</literal>
-       or <literal>writing new heap</literal>.
+       <literal>index scanning heap</literal>,
+       <literal>writing new heap</literal>
+       or <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples updated.
+       This counter only advances when the phase is <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples deleted.
+       This counter only advances when the phase is <literal>catch-up</literal>.
       </para></entry>
      </row>
 
@@ -6162,6 +6183,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <command>REPACK</command> is currently writing the new heap.
      </entry>
     </row>
+    <row>
+     <entry><literal>catch-up</literal></entry>
+     <entry>
+       <command>REPACK CONCURRENTLY</command> is currently processing the DML
+       commands that other transactions executed during any of the preceding
+       phases.
+     </entry>
+    </row>
     <row>
      <entry><literal>swapping relation files</literal></entry>
      <entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+    TABLE</command></link> and <command>REPACK</command> with
+    the <literal>CONCURRENTLY</literal> option, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the DDL command committed.  This will only be an
+    snapshot taken before the command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the DDL command started &mdash; any transaction that has done so
+    before the command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the DDL command until that transaction completes.
+    which would block the truncating or rewriting command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index a612c72d971..9c089a6b3d7 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  <refsynopsisdiv>
 <synopsis>
 REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX <replaceable class="parameter">index_name</replaceable> ] ]
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] CONCURRENTLY <replaceable class="parameter">table_name</replaceable> [ USING INDEX <replaceable class="parameter">index_name</replaceable> ]
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
@@ -48,7 +49,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    processes every table and materialized view in the current database that
    the current user has the <literal>MAINTAIN</literal> privilege on. This
    form of <command>REPACK</command> cannot be executed inside a transaction
-   block.
+   block.  Also, this form is not allowed if
+   the <literal>CONCURRENTLY</literal> option is used.
   </para>
 
   <para>
@@ -61,7 +63,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
    is acquired on it. This prevents any other database operations (both reads
    and writes) from operating on the table until the <command>REPACK</command>
-   is finished.
+   is finished. If you want to keep the table accessible during the repacking,
+   consider using the <literal>CONCURRENTLY</literal> option.
   </para>
 
   <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -160,6 +163,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>CONCURRENTLY</literal></term>
+    <listitem>
+     <para>
+      Allow other transactions to use the table while it is being repacked.
+     </para>
+
+     <para>
+      Internally, <command>REPACK</command> copies the contents of the table
+      (ignoring dead tuples) into a new file, sorted by the specified index,
+      and also creates a new file for each index. Then it swaps the old and
+      new files for the table and all the indexes, and deletes the old
+      files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+      sure that the old files do not change during the processing because the
+      changes would get lost due to the swap.
+     </para>
+
+     <para>
+      With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+      EXCLUSIVE</literal> lock is only acquired to swap the table and index
+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be pretty short. However, the time might still be noticeable if
+      too many data changes have been done to the table while
+      <command>REPACK</command> was waiting for the lock: those changes must
+      be processed just before the files are swapped, while the
+      <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+     </para>
+
+     <para>
+      Note that <command>REPACK</command> with
+      the <literal>CONCURRENTLY</literal> option does not try to order the
+      rows inserted into the table after the repacking started. Also
+      note that <command>REPACK</command> might fail to complete due to DDL
+      commands executed on the table by other transactions during the
+      repacking.
+     </para>
+
+     <note>
+      <para>
+       In addition to the temporary space requirements explained in
+       <xref linkend="sql-repack-notes-on-resources"/>,
+       the <literal>CONCURRENTLY</literal> option can increase the use of
+       temporary space somewhat. The reason is that other transactions can
+       perform DML operations which cannot be applied to the new file until
+       <command>REPACK</command> has copied all the tuples from the old
+       file. Thus the tuples inserted into the old file during the copying are
+       also stored separately in a temporary file, so they can eventually be
+       applied to the new file.
+      </para>
+
+      <para>
+       Furthermore, the data changes performed during the copying are
+       extracted from the <link linkend="wal">write-ahead log</link> (WAL), and
+       this extraction (decoding) only takes place when a certain amount of WAL
+       has been written. Therefore, WAL removal can be delayed by this
+       threshold. Currently the threshold is equal to the value of
+       the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+       configuration parameter.
+      </para>
+     </note>
+
+     <para>
+      The <literal>CONCURRENTLY</literal> option cannot be used in the
+      following cases:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+          The table is <literal>UNLOGGED</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is partitioned.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is a system catalog or a <acronym>TOAST</acronym> table.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REPACK</command> is executed inside a transaction block.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+          configuration parameter is less than <literal>logical</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+         configuration parameter does not allow for creation of an additional
+         replication slot.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <warning>
+      <para>
+       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+       details.
+      </para>
+     </warning>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
 	interfaces \
 	backend/replication/libpqwalreceiver \
 	backend/replication/pgoutput \
+	backend/replication/pgoutput_repack \
 	fe_utils \
 	bin \
 	pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..4fdb3e880e4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
-								  bool all_visible_cleared, bool new_all_visible_cleared);
+								  bool all_visible_cleared, bool new_all_visible_cleared,
+								  bool wal_logical);
 #ifdef USE_ASSERT_CHECKING
 static void check_lock_if_inplace_updateable_rel(Relation relation,
 												 ItemPointer otid,
@@ -2769,7 +2770,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 TM_Result
 heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -3016,7 +3017,8 @@ l1:
 	 * Compute replica identity tuple before entering the critical section so
 	 * we don't PANIC upon a memory allocation failure.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+	old_key_tuple = wal_logical ?
+		ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
 
 	/*
 	 * If this is the first possibly-multixact-able operation in the current
@@ -3106,6 +3108,15 @@ l1:
 				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
 		}
 
+		/*
+		 * Unlike UPDATE, DELETE is decoded even if there is no old key, so it
+		 * does not help to clear both XLH_DELETE_CONTAINS_OLD_TUPLE and
+		 * XLH_DELETE_CONTAINS_OLD_KEY. Thus we need an extra flag. TODO
+		 * Consider not decoding tuples w/o the old tuple/key instead.
+		 */
+		if (!wal_logical)
+			xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
 		XLogBeginInsert();
 		XLogRegisterData(&xlrec, SizeOfHeapDelete);
 
@@ -3198,7 +3209,8 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false /* changingPart */ );
+						 &tmfd, false /* changingPart */ ,
+						 true /* wal_logical */);
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3239,7 +3251,7 @@ TM_Result
 heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
-			TU_UpdateIndexes *update_indexes)
+			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -4132,7 +4144,8 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 wal_logical);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4490,7 +4503,8 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, &lockmode, update_indexes);
+						 &tmfd, &lockmode, update_indexes,
+						 true	/* wal_logical */);
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -8831,7 +8845,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool wal_logical)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -8842,7 +8857,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
 	Page		page = BufferGetPage(newbuf);
-	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln) &&
+		wal_logical;
 	bool		init;
 	int			bufflags;
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 0b03070d394..c829c06f769 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+					   true);
 }
 
 
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
-						 tmfd, lockmode, update_indexes);
+						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	/*
@@ -685,13 +687,15 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
+								 Snapshot snapshot,
+								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
 								 double *tups_recently_dead)
 {
-	RewriteState rwstate;
+	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
 	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
+	bool		concurrent = snapshot != NULL;
+	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	values = (Datum *) palloc(natts * sizeof(Datum));
 	isnull = (bool *) palloc(natts * sizeof(bool));
 
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
-								 *multi_cutoff);
+	/*
+	 * Initialize the rewrite operation.
+	 */
+	if (!concurrent)
+		rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+									 *xid_cutoff, *multi_cutoff);
 
 
 	/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 *
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+	 * the snapshot passed by the caller.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									snapshot ? snapshot : SnapshotAny,
+									NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap,
+									snapshot ? snapshot : SnapshotAny,
+									0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -785,6 +801,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		HeapTuple	tuple;
 		Buffer		buf;
 		bool		isdead;
+		HTSV_Result vis;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -837,70 +854,84 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
 		buf = hslot->buffer;
 
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+		/*
+		 * Regarding CONCURRENTLY, see the comments on MVCC snapshot above.
+		 */
+		if (!concurrent)
 		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				*tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
+			switch ((vis = HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf)))
+			{
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead */
+					isdead = true;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+					*tups_recently_dead += 1;
+					/* fall through */
+				case HEAPTUPLE_LIVE:
+					/* Live or recently dead, must copy it */
+					isdead = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
 
 				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
+				 * As long as we hold exclusive lock on the relation, normally
+				 * the only way to see this is if it was inserted earlier in
+				 * our own transaction.  However, it can happen in system
+				 * catalogs, since we tend to release write lock before commit
+				 * there. Also, there's no exclusive lock during concurrent
+				 * processing. Give a warning if neither case applies; but in
+				 * any case we had better copy it.
 				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				*tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+						elog(WARNING, "concurrent insert in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as live */
+					isdead = false;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
 
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+					/*
+					 * Similar situation to INSERT_IN_PROGRESS case.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+						elog(WARNING, "concurrent delete in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as recently dead */
+					*tups_recently_dead += 1;
+					isdead = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					isdead = false; /* keep compiler quiet */
+					break;
+			}
 
-		if (isdead)
-		{
-			*tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
+			if (isdead)
 			{
-				/* A previous recently-dead tuple is now known dead */
 				*tups_vacuumed += 1;
-				*tups_recently_dead -= 1;
+				/* heap rewrite module still needs to see it... */
+				if (rewrite_heap_dead_tuple(rwstate, tuple))
+				{
+					/* A previous recently-dead tuple is now known dead */
+					*tups_vacuumed += 1;
+					*tups_recently_dead -= 1;
+				}
+
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				continue;
 			}
-			continue;
+
+			/*
+			 * In the concurrent case, we have a copy of the tuple, so we
+			 * don't worry whether the source tuple will be deleted / updated
+			 * after we release the lock.
+			 */
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 		}
 
 		*num_tuples += 1;
@@ -919,7 +950,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		{
 			const int	ct_index[] = {
 				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
-				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_INSERTED
 			};
 			int64		ct_val[2];
 
@@ -934,6 +965,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
+
+		/*
+		 * Process the WAL produced by the load, as well as by other
+		 * transactions, so that the replication slot can advance and WAL does
+		 * not pile up. Use wal_segment_size as a threshold so that we do not
+		 * introduce the decoding overhead too often.
+		 *
+		 * Of course, we must not apply the changes until the initial load has
+		 * completed.
+		 *
+		 * Note that our insertions into the new table should not be decoded
+		 * as we (intentionally) do not write the logical decoding specific
+		 * information to WAL.
+		 */
+		if (concurrent)
+		{
+			XLogRecPtr	end_of_wal;
+
+			end_of_wal = GetFlushRecPtr(NULL);
+			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+			{
+				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+				end_of_wal_prev = end_of_wal;
+			}
+		}
 	}
 
 	if (indexScan != NULL)
@@ -977,7 +1033,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
 		}
 
@@ -985,7 +1041,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	}
 
 	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	if (rwstate)
+		end_heap_rewrite(rwstate);
 
 	/* Clean up */
 	pfree(values);
@@ -2376,6 +2433,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
  * SET WITHOUT OIDS.
  *
  * So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate=NULL, use simple_heap_insert() instead of rewriting - in that
+ * case we still need to deform/form the tuple. TODO Shouldn't we rename the
+ * function, as it might not do any rewrite?
  */
 static void
 reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2459,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
 
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	if (rwstate)
+		/* The heap rewrite module does the rest */
+		rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	else
+	{
+		/*
+		 * Insert tuple when processing REPACK CONCURRENTLY.
+		 *
+		 * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+		 * difficult to do the same in the catch-up phase (as the logical
+		 * decoding does not provide us with sufficient visibility
+		 * information). Thus we must use heap_insert() both during the
+		 * catch-up and here.
+		 *
+		 * The following is like simple_heap_insert() except that we pass the
+		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+		 * the relation files, it drops this relation, so no logical
+		 * replication subscription should need the data.
+		 */
+		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+					HEAP_INSERT_NO_LOGICAL, NULL);
+	}
 
 	heap_freetuple(copiedTuple);
 }
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index e6d2b5fced1..6aa2ed214f2 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		int			options = HEAP_INSERT_SKIP_FSM;
 
 		/*
-		 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
-		 * for the TOAST table are not logically decoded.  The main heap is
-		 * WAL-logged as XLOG FPI records, which are not logically decoded.
+		 * While rewriting the heap for REPACK, make sure data for the TOAST
+		 * table are not logically decoded.  The main heap is WAL-logged as
+		 * XLOG FPI records, which are not logically decoded.
 		 */
 		options |= HEAP_INSERT_NO_LOGICAL;
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b46e7e9c2a6..5670f2bfbde 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -215,6 +215,7 @@ typedef struct TransactionStateData
 	bool		parallelChildXact;	/* is any parent transaction parallel? */
 	bool		chain;			/* start a new block after this one */
 	bool		topXidLogged;	/* for a subxact: is top-level XID logged? */
+	bool		internal;		/* for a subxact: launched internally? */
 	struct TransactionStateData *parent;	/* back link to parent */
 } TransactionStateData;
 
@@ -4735,6 +4736,7 @@ BeginInternalSubTransaction(const char *name)
 			/* Normal subtransaction start */
 			PushTransaction();
 			s = CurrentTransactionState;	/* changed by push */
+			s->internal = true;
 
 			/*
 			 * Savepoint names, like the TransactionState block itself, live
@@ -5251,7 +5253,13 @@ AbortSubTransaction(void)
 	LWLockReleaseAll();
 
 	pgstat_report_wait_end();
-	pgstat_progress_end_command();
+
+	/*
+	 * An internal subtransaction might be used by a user command, in which
+	 * case the command outlives the subtransaction.
+	 */
+	if (!s->internal)
+		pgstat_progress_end_command();
 
 	pgaio_error_cleanup();
 
@@ -5468,6 +5476,7 @@ PushTransaction(void)
 	s->parallelModeLevel = 0;
 	s->parallelChildXact = (p->parallelModeLevel != 0 || p->parallelChildXact);
 	s->topXidLogged = false;
+	s->internal = false;
 
 	CurrentTransactionState = s;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b2b7b10c2be..a92ac78ad9e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1266,16 +1266,17 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      -- 5 is 'catch-up', but that should not appear here.
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS cluster_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
@@ -1291,16 +1292,19 @@ CREATE VIEW pg_stat_progress_repack AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      WHEN 5 THEN 'catch-up'
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
-        S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param5 AS heap_tuples_inserted,
+        S.param6 AS heap_tuples_updated,
+        S.param7 AS heap_tuples_deleted,
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index af14a230f94..aa3ae85bcee 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -25,6 +25,10 @@
 #include "access/toast_internals.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -32,6 +36,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
@@ -39,15 +44,21 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
+#include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -67,13 +78,45 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * These locators are used to avoid logical decoding of data that does not
+ * belong to the table being repacked.
+ */
+RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
+RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+
+	Relation	ident_index;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
-								Oid indexOid, Oid userid, int options);
+								Oid indexOid, Oid userid, LOCKMODE lmode,
+								int options);
+static void check_repack_concurrently_requirements(Relation rel);
 static void rebuild_relation(RepackCommand cmd, bool usingindex,
-							 Relation OldHeap, Relation index, bool verbose);
+							 Relation OldHeap, Relation index, Oid userid,
+							 bool verbose, bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							bool verbose, bool *pSwapToastByContent,
-							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							bool verbose,
+							bool *pSwapToastByContent,
+							TransactionId *pFreezeXid,
+							MultiXactId *pCutoffMulti);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -81,12 +124,61 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 											  Oid relid, bool rel_is_index);
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid,
+													  const char *slotname,
+													  TupleDesc tupdesc);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+									 Relation rel, ScanKey key, int nkeys,
+									 IndexInsertState *iistate);
+static void apply_concurrent_insert(Relation rel, ConcurrentChange *change,
+									HeapTuple tup, IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+									HeapTuple tup_target,
+									ConcurrentChange *change,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+									ConcurrentChange *change);
+static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
+								   HeapTuple tup_key,
+								   IndexInsertState *iistate,
+								   TupleTableSlot *ident_slot,
+								   IndexScanDesc *scan_p);
+static void process_concurrent_changes(LogicalDecodingContext *ctx,
+									   XLogRecPtr end_of_wal,
+									   Relation rel_dst,
+									   Relation rel_src,
+									   ScanKey ident_key,
+									   int ident_key_nentries,
+									   IndexInsertState *iistate);
+static IndexInsertState *get_index_insert_state(Relation relation,
+												Oid ident_index_id);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+								  int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+											   Relation cl_index,
+											   LogicalDecodingContext *ctx,
+											   bool swap_toast_by_content,
+											   TransactionId frozenXid,
+											   MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
+										LOCKMODE lockmode,
+										bool isTopLevel,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
 
 
+#define REPL_PLUGIN_NAME   "pgoutput_repack"
+
 static const char *
 RepackCommandAsString(RepackCommand cmd)
 {
@@ -95,9 +187,9 @@ RepackCommandAsString(RepackCommand cmd)
 		case REPACK_COMMAND_REPACK:
 			return "REPACK";
 		case REPACK_COMMAND_VACUUMFULL:
-			return "VACUUM";
+			return "VACUUM (FULL)";
 		case REPACK_COMMAND_CLUSTER:
-			return "VACUUM";
+			return "CLUSTER";
 	}
 	return "???";
 }
@@ -130,16 +222,27 @@ void
 ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
 	MemoryContext repack_context;
+	LOCKMODE	lockmode;
 	List	   *rtcs;
 
 	/* Parse option list */
 	foreach_node(DefElem, opt, stmt->params)
 	{
-		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+		if (strcmp(opt->defname, "verbose") == 0 &&
+			defGetBoolean(opt))
+			params.options |= CLUOPT_VERBOSE;
+		else if (strcmp(opt->defname, "concurrently") == 0 &&
+				 defGetBoolean(opt))
+		{
+			if (stmt->command != REPACK_COMMAND_REPACK)
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("CONCURRENTLY option not supported for %s",
+							   RepackCommandAsString(stmt->command)));
+			params.options |= CLUOPT_CONCURRENT;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -149,7 +252,17 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 					 parser_errposition(pstate, opt->location)));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
+	/*
+	 * Determine the lock mode expected by cluster_rel().
+	 *
+	 * In the exclusive case, we obtain AccessExclusiveLock right away to
+	 * avoid a lock-upgrade hazard in the single-transaction case. In the
+	 * CONCURRENTLY case, AccessExclusiveLock is only taken at the end of
+	 * processing, presumably for a very short time. Until then, we'll have
+	 * to unlock the relation temporarily, so there's no lock-upgrade hazard.
+	 */
+	lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+		AccessExclusiveLock : ShareUpdateExclusiveLock;
 
 	/*
 	 * If a single relation is specified, process it and we're done ... unless
@@ -157,16 +270,35 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	 */
 	if (stmt->relation != NULL)
 	{
-		rel = process_single_relation(stmt, &params);
+		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
 			return;
 	}
 
 	/*
-	 * By here, we know we are in a multi-table situation.  In order to avoid
-	 * holding locks for too long, we want to process each table in its own
-	 * transaction.  This forces us to disallow running inside a user
-	 * transaction block.
+	 * By here, we know we are in a multi-table situation.
+	 *
+	 * Concurrent processing is currently considered rather special (e.g. in
+	 * terms of resources consumed), so it is not performed in bulk.
+	 */
+	if (params.options & CLUOPT_CONCURRENT)
+	{
+		if (rel != NULL)
+		{
+			Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+					errhint("Consider running the command for individual partitions."));
+		}
+		else
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY requires an explicit table name"));
+	}
+
+	/*
+	 * In order to avoid holding locks for too long, we want to process each
+	 * table in its own transaction.  This forces us to disallow running
+	 * inside a user transaction block.
 	 */
 	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
@@ -247,13 +379,13 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		 * Open the target table, coping with the case where it has been
 		 * dropped.
 		 */
-		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		rel = try_table_open(rtc->tableOid, lockmode);
 		if (rel == NULL)
 			continue;
 
 		/* Process this table */
 		cluster_rel(stmt->command, stmt->usingindex,
-					rel, rtc->indexOid, &params);
+					rel, rtc->indexOid, &params, isTopLevel);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
@@ -282,22 +414,55 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
  * If indexOid is InvalidOid, the table will be rewritten in physical order
  * instead of index order.
  *
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
  * 'cmd' indicates which command is being executed, to be used for error
  * messages.
  */
 void
 cluster_rel(RepackCommand cmd, bool usingindex,
-			Relation OldHeap, Oid indexOid, ClusterParams *params)
+			Relation OldHeap, Oid indexOid, ClusterParams *params,
+			bool isTopLevel)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Relation	index;
+	LOCKMODE	lmode;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
-	Relation	index;
+	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
+
+	/*
+	 * Check that the correct lock is held. The lock mode is
+	 * AccessExclusiveLock for normal processing and ShareUpdateExclusiveLock
+	 * for concurrent processing (so that SELECT, INSERT, UPDATE and DELETE
+	 * commands work, but cluster_rel() cannot be called concurrently for the
+	 * same relation).
+	 */
+	lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+
+	/* There are specific requirements on concurrent processing. */
+	if (concurrent)
+	{
+		/*
+		 * Make sure we have no XID assigned, otherwise the call of
+		 * setup_logical_decoding() can cause a deadlock.
+		 *
+		 * The existence of a transaction block does not actually imply that
+		 * an XID was already assigned, but it very likely was. We might want
+		 * to check the result of GetCurrentTransactionIdIfAny() instead, but
+		 * that would be less clear from the user's perspective.
+		 */
+		PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+		check_repack_concurrently_requirements(OldHeap);
+	}
 
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
@@ -340,11 +505,13 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * If this is a single-transaction CLUSTER, we can skip these tests. We
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
+	 *
+	 * XXX move [some of] these comments to where the RECHECK flag is
+	 * determined?
 	 */
-	if (recheck)
-		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
-								 params->options))
-			goto out;
+	if (recheck && !cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+										lmode, params->options))
+		goto out;
 
 	/*
 	 * We allow repacking shared catalogs only when not using an index. It
@@ -358,6 +525,12 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 				 errmsg("cannot run \"%s\" on a shared catalog",
 						RepackCommandAsString(cmd))));
 
+	/*
+	 * The CONCURRENTLY case should have been rejected earlier because it does
+	 * not support system catalogs.
+	 */
+	Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
@@ -393,7 +566,7 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OidIsValid(indexOid))
 	{
 		/* verify the index is good and lock it */
-		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, lmode);
 		/* also open it */
 		index = index_open(indexOid, NoLock);
 	}
@@ -410,7 +583,9 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
 		!RelationIsPopulated(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		if (index)
+			index_close(index, lmode);
+		relation_close(OldHeap, lmode);
 		goto out;
 	}
 
@@ -423,11 +598,35 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * invalid, because we move tuples around.  Promote them to relation
 	 * locks.  Predicate locks on indexes will be promoted when they are
 	 * reindexed.
+	 *
+	 * During concurrent processing, the heap as well as its indexes stay in
+	 * operation, so we postpone this step until they are locked using
+	 * AccessExclusiveLock near the end of the processing.
 	 */
-	TransferPredicateLocksToHeapRelation(OldHeap);
+	if (!concurrent)
+		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
+	PG_TRY();
+	{
+		/*
+		 * For concurrent processing, make sure that our logical decoding
+		 * ignores data changes of other tables than the one we are
+		 * processing.
+		 */
+		if (concurrent)
+			begin_concurrent_repack(OldHeap);
+
+		rebuild_relation(cmd, usingindex, OldHeap, index, save_userid,
+						 verbose, concurrent);
+	}
+	PG_FINALLY();
+	{
+		if (concurrent)
+			end_concurrent_repack();
+	}
+	PG_END_TRY();
+
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -446,14 +645,14 @@ out:
  */
 static bool
 cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-					Oid userid, int options)
+					Oid userid, LOCKMODE lmode, int options)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 
 	/* Check that the user still has privileges for the relation */
 	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -467,7 +666,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -478,7 +677,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		 */
 		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 
@@ -489,7 +688,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
 			!get_index_isclustered(indexOid))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 	}
@@ -630,19 +829,89 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	table_close(pg_index, RowExclusiveLock);
 }
 
+/*
+ * Check if the CONCURRENTLY option is legal for the relation.
+ */
+static void
+check_repack_concurrently_requirements(Relation rel)
+{
+	char		relpersistence,
+				replident;
+	Oid			ident_idx;
+
+	/* Data changes in system relations are not logically decoded. */
+	if (IsCatalogRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+	/*
+	 * reorderbuffer.c does not seem to handle processing of a TOAST relation
+	 * on its own.
+	 */
+	if (IsToastRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+	relpersistence = rel->rd_rel->relpersistence;
+	if (relpersistence != RELPERSISTENCE_PERMANENT)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+	/* With NOTHING, WAL does not contain the old tuple. */
+	replident = rel->rd_rel->relreplident;
+	if (replident == REPLICA_IDENTITY_NOTHING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("Relation \"%s\" has insufficient replication identity.",
+						 RelationGetRelationName(rel))));
+
+	/*
+	 * Identity index is not set if the replica identity is FULL, but PK might
+	 * exist in such a case.
+	 */
+	ident_idx = RelationGetReplicaIndex(rel);
+	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+		ident_idx = rel->rd_pkindex;
+	if (!OidIsValid(ident_idx))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 (errhint("Relation \"%s\" has no identity index.",
+						  RelationGetRelationName(rel)))));
+}
+
+
 /*
  * rebuild_relation: rebuild an existing relation in index or physical order
  *
- * OldHeap: table to rebuild.
+ * OldHeap: table to rebuild.  See cluster_rel() for comments on the required
+ * lock strength.
+ *
  * index: index to cluster by, or NULL to rewrite in physical order.
  *
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock.  (The
+ * function handles the lock upgrade if 'concurrent' is true.)
  */
 static void
 rebuild_relation(RepackCommand cmd, bool usingindex,
-				 Relation OldHeap, Relation index, bool verbose)
+				 Relation OldHeap, Relation index, Oid userid,
+				 bool verbose, bool concurrent)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -650,13 +919,55 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	Oid			OIDNewHeap;
 	Relation	NewHeap;
 	char		relpersistence;
-	bool		is_system_catalog;
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
+	NameData	slotname;
+	LogicalDecodingContext *ctx = NULL;
+	Snapshot	snapshot = NULL;
+#ifdef USE_ASSERT_CHECKING
+	LOCKMODE	lmode;
+
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+	Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+	Assert(!usingindex || CheckRelationLockedByMe(index, lmode, false));
+#endif
+
+	if (concurrent)
+	{
+		TupleDesc	tupdesc;
+
+		/*
+		 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+		 * should never fire.
+		 */
+		Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+
+		/*
+		 * A single backend should not execute multiple REPACK commands at a
+		 * time, so use PID to make the slot unique.
+		 */
+		snprintf(NameStr(slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+		tupdesc = CreateTupleDescCopy(RelationGetDescr(OldHeap));
+
+		/*
+		 * Prepare to capture the concurrent data changes.
+		 *
+		 * Note that this call waits for all transactions with an XID already
+		 * assigned to finish. If any of those transactions is waiting for a
+		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
+		 * it runs CREATE INDEX), we can end up in a deadlock. Not sure this
+		 * risk is worth unlocking/locking the table (and its clustering
+		 * index) and checking again whether it's still eligible for REPACK
+		 * CONCURRENTLY.
+		 */
+		ctx = setup_logical_decoding(tableOid, NameStr(slotname), tupdesc);
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
-		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+		snapshot = SnapBuildInitialSnapshotForRepack(ctx->snapshot_builder);
+		PushActiveSnapshot(snapshot);
+	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
 	if (usingindex)
@@ -664,7 +975,6 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
 
 	/*
 	 * Create the transient table that will receive the re-ordered data.
@@ -680,30 +990,67 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, ctx, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
+	/* The historic snapshot won't be needed anymore. */
+	if (snapshot)
+		PopActiveSnapshot();
 
-	/* Close relcache entries, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
-	if (index)
-		index_close(index, NoLock);
+	if (concurrent)
+	{
+		/*
+		 * Push a snapshot that we will use to find old versions of rows when
+		 * processing concurrent UPDATE and DELETE commands. (That snapshot
+		 * should also be used by index expressions.)
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
 
-	/*
-	 * Close the new relation so it can be dropped as soon as the storage is
-	 * swapped. The relation is not visible to others, so no need to unlock it
-	 * explicitly.
-	 */
-	table_close(NewHeap, NoLock);
+		/*
+		 * Make sure we can find the tuples just inserted when applying DML
+		 * commands on top of those.
+		 */
+		CommandCounterIncrement();
+		UpdateActiveSnapshotCommandId();
 
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+										   ctx, swap_toast_by_content,
+										   frozenXid, cutoffMulti);
+		PopActiveSnapshot();
+
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
+
+		/* Done with decoding. */
+		cleanup_logical_decoding(ctx);
+		ReplicationSlotRelease();
+		ReplicationSlotDrop(NameStr(slotname), false);
+	}
+	else
+	{
+		bool		is_system_catalog = IsSystemRelation(OldHeap);
+
+		/* Close relcache entries, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
+		if (index)
+			index_close(index, NoLock);
+
+		/*
+		 * Close the new relation so it can be dropped as soon as the storage
+		 * is swapped. The relation is not visible to others, so no need to
+		 * unlock it explicitly.
+		 */
+		table_close(NewHeap, NoLock);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+	}
 }
 
 
@@ -838,15 +1185,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Both
+ * are passed iff concurrent processing is required.
+ *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
-				bool *pSwapToastByContent, TransactionId *pFreezeXid,
-				MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -864,6 +1215,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	PGRUsage	ru0;
 	char	   *nspname;
 
+	bool		concurrent = snapshot != NULL;
+
 	pg_rusage_init(&ru0);
 
 	/* Store a copy of the namespace name for logging purposes */
@@ -966,8 +1319,48 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * provided, else plain seqscan.
 	 */
 	if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
+	{
+		ResourceOwner oldowner = NULL;
+		ResourceOwner resowner = NULL;
+
+		/*
+		 * In the CONCURRENT case, use a dedicated resource owner so we don't
+		 * leave behind any additional locks that we cannot easily release.
+		 */
+		if (concurrent)
+		{
+			Assert(CheckRelationLockedByMe(OldHeap, ShareUpdateExclusiveLock,
+										   false));
+			Assert(CheckRelationLockedByMe(OldIndex, ShareUpdateExclusiveLock,
+										   false));
+
+			resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "plan_cluster_use_sort");
+			oldowner = CurrentResourceOwner;
+			CurrentResourceOwner = resowner;
+		}
+
 		use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
 										 RelationGetRelid(OldIndex));
+
+		if (concurrent)
+		{
+			CurrentResourceOwner = oldowner;
+
+			/*
+			 * We are primarily concerned about locks, but if the planner
+			 * happened to allocate any other resources, we should release
+			 * them too because we're going to delete the whole resowner.
+			 */
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_AFTER_LOCKS,
+								 false, false);
+			ResourceOwnerDelete(resowner);
+		}
+	}
 	else
 		use_sort = false;
 
@@ -996,7 +1389,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+									cutoffs.OldestXmin, snapshot,
+									decoding_ctx,
+									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
 									&tups_recently_dead);
@@ -1005,7 +1400,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	*pFreezeXid = cutoffs.FreezeLimit;
 	*pCutoffMulti = cutoffs.MultiXactCutoff;
 
-	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+	/*
+	 * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+	 * In the CONCURRENTLY case, we need to set it again before applying the
+	 * concurrent changes.
+	 */
 	NewHeap->rd_toastoid = InvalidOid;
 
 	num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1463,14 +1862,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool swap_toast_by_content,
 				 bool check_constraints,
 				 bool is_internal,
+				 bool reindex,
 				 TransactionId frozenXid,
 				 MultiXactId cutoffMulti,
 				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
-	int			reindex_flags;
-	ReindexParams reindex_params = {0};
 	int			i;
 
 	/* Report that we are now swapping relation files */
@@ -1496,39 +1894,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (is_system_catalog)
 		CacheInvalidateCatalog(OIDOldHeap);
 
-	/*
-	 * Rebuild each index on the relation (but not the toast table, which is
-	 * all-new at this point).  It is important to do this before the DROP
-	 * step because if we are processing a system catalog that will be used
-	 * during DROP, we want to have its indexes available.  There is no
-	 * advantage to the other order anyway because this is all transactional,
-	 * so no chance to reclaim disk space before commit.  We do not need a
-	 * final CommandCounterIncrement() because reindex_relation does it.
-	 *
-	 * Note: because index_build is called via reindex_relation, it will never
-	 * set indcheckxmin true for the indexes.  This is OK even though in some
-	 * sense we are building new indexes rather than rebuilding existing ones,
-	 * because the new heap won't contain any HOT chains at all, let alone
-	 * broken ones, so it can't be necessary to set indcheckxmin.
-	 */
-	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
-	if (check_constraints)
-		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+	if (reindex)
+	{
+		int			reindex_flags;
+		ReindexParams reindex_params = {0};
 
-	/*
-	 * Ensure that the indexes have the same persistence as the parent
-	 * relation.
-	 */
-	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
-	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+		/*
+		 * Rebuild each index on the relation (but not the toast table, which
+		 * is all-new at this point).  It is important to do this before the
+		 * DROP step because if we are processing a system catalog that will
+		 * be used during DROP, we want to have its indexes available.  There
+		 * is no advantage to the other order anyway because this is all
+		 * transactional, so no chance to reclaim disk space before commit. We
+		 * do not need a final CommandCounterIncrement() because
+		 * reindex_relation does it.
+		 *
+		 * Note: because index_build is called via reindex_relation, it will
+		 * never set indcheckxmin true for the indexes.  This is OK even
+		 * though in some sense we are building new indexes rather than
+		 * rebuilding existing ones, because the new heap won't contain any
+		 * HOT chains at all, let alone broken ones, so it can't be necessary
+		 * to set indcheckxmin.
+		 */
+		reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+		if (check_constraints)
+			reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
 
-	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
-								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+		/*
+		 * Ensure that the indexes have the same persistence as the parent
+		 * relation.
+		 */
+		if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+		else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
-	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+		/* Report that we are now reindexing relations */
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+		reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+	}
 
 	/* Report that we are now doing clean up */
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1870,7 +2276,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * resolve in this case.
  */
 static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+						ClusterParams *params)
 {
 	Relation	rel;
 	Oid			tableOid;
@@ -1879,13 +2286,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
-	/*
-	 * Find, lock, and check permissions on the table.  We obtain
-	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-	 * single-transaction case.
-	 */
+	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation,
-										AccessExclusiveLock,
+										lockmode,
 										0,
 										RangeVarCallbackMaintainsTable,
 										NULL);
@@ -1911,12 +2314,17 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 		return rel;
 	else
 	{
-		Oid			indexOid;
+		Oid		indexOid = InvalidOid;
+
+		if (stmt->usingindex)
+		{
+			indexOid = determine_clustered_index(rel, stmt->usingindex,
+												 stmt->indexname);
+			check_index_is_clusterable(rel, indexOid, lockmode);
+		}
 
-		indexOid = determine_clustered_index(rel, stmt->usingindex,
-											 stmt->indexname);
-		check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
-		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid,
+					params, isTopLevel);
 		return NULL;
 	}
 }
@@ -1973,3 +2381,1052 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
 
 	return indexOid;
 }
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts, to set up logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that the TOAST table needs no attention here as it's not scanned
+ * using a historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+	Oid			toastrelid;
+
+	/* Avoid logical decoding of other relations by this backend. */
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
+
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+}
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+	/*
+	 * Restore normal function of (future) logical decoding for this backend.
+	 */
+	repacked_rel_locator.relNumber = InvalidOid;
+	repacked_rel_toast_locator.relNumber = InvalidOid;
+}
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends make while we copy the
+ * existing data into the temporary table), nor persisted (it's easier to
+ * handle a crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid, const char *slotname, TupleDesc tupdesc)
+{
+	LogicalDecodingContext *ctx;
+	RepackDecodingState *dstate;
+
+	/*
+	 * Check if we can use logical decoding.
+	 */
+	CheckSlotPermissions();
+	CheckLogicalDecodingRequirements();
+
+	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+	ReplicationSlotCreate(slotname, true, RS_TEMPORARY, false, false, false);
+
+	/*
+	 * Neither the prepare_write nor the do_write callback, nor
+	 * update_progress, is useful for us.
+	 *
+	 * Regarding the value of need_full_snapshot, we pass false because the
+	 * table we are processing is present in RepackedRelsHash and is
+	 * therefore, as far as logical decoding is concerned, treated like a
+	 * catalog.
+	 */
+	ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+									NIL,
+									false,
+									InvalidXLogRecPtr,
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+	/*
+	 * We don't have control over setting fast_forward, so at least check it.
+	 */
+	Assert(!ctx->fast_forward);
+
+	DecodingContextFindStartpoint(ctx);
+
+	/* Some WAL records should have been read. */
+	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
+	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+				wal_segment_size);
+
+	/*
+	 * Set up structures to store decoded changes.
+	 */
+	dstate = palloc0(sizeof(RepackDecodingState));
+	dstate->relid = relid;
+	dstate->tstore = tuplestore_begin_heap(false, false,
+										   maintenance_work_mem);
+
+	dstate->tupdesc = tupdesc;
+
+	/* Initialize the descriptor to store the changes ... */
+	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+	/* ... as well as the corresponding slot. */
+	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+											  &TTSOpsMinimalTuple);
+
+	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "logical decoding");
+
+	ctx->output_writer_private = dstate;
+	return ctx;
+}
+
+/*
+ * Retrieve tuple from ConcurrentChange structure.
+ *
+ * The input data starts with the structure but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+	HeapTupleData tup_data;
+	HeapTuple	result;
+	char	   *src;
+
+	/*
+	 * Ensure alignment before accessing the fields. (This is why we can't use
+	 * heap_copytuple() instead of this function.)
+	 */
+	src = change + offsetof(ConcurrentChange, tup_data);
+	memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+	memcpy(result, &tup_data, sizeof(HeapTupleData));
+	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+	src = change + SizeOfConcurrentChange;
+	memcpy(result->t_data, src, result->t_len);
+
+	return result;
+}
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+								 XLogRecPtr end_of_wal)
+{
+	RepackDecodingState *dstate;
+	ResourceOwner resowner_old;
+
+	/*
+	 * Invalidate the "present" cache before moving to "(recent) history".
+	 */
+	InvalidateSystemCaches();
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+	resowner_old = CurrentResourceOwner;
+	CurrentResourceOwner = dstate->resowner;
+
+	PG_TRY();
+	{
+		while (ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			XLogSegNo	segno_new;
+			char	   *errm = NULL;
+			XLogRecPtr	end_lsn;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			/*
+			 * If a WAL segment boundary has been crossed, inform the decoding
+			 * system that the catalog_xmin can advance. (We could confirm
+			 * more often, but filling a single WAL segment should not take
+			 * much time.)
+			 */
+			end_lsn = ctx->reader->EndRecPtr;
+			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+			if (segno_new != repack_current_segment)
+			{
+				LogicalConfirmReceivedLocation(end_lsn);
+				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
+				repack_current_segment = segno_new;
+			}
+
+			CHECK_FOR_INTERRUPTS();
+		}
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
+/*
+ * Apply changes that happened during the initial load.
+ *
+ * Scan key is passed by caller, so it does not have to be constructed
+ * multiple times. Key entries have all fields initialized, except for
+ * sk_argument.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
+						 ScanKey key, int nkeys, IndexInsertState *iistate)
+{
+	TupleTableSlot *index_slot,
+			   *ident_slot;
+	HeapTuple	tup_old = NULL;
+
+	if (dstate->nchanges == 0)
+		return;
+
+	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+	/* A slot to fetch tuples from identity index. */
+	ident_slot = table_slot_create(rel, NULL);
+
+	while (tuplestore_gettupleslot(dstate->tstore, true, false,
+								   dstate->tsslot))
+	{
+		bool		shouldFree;
+		HeapTuple	tup_change,
+					tup,
+					tup_exist;
+		char	   *change_raw,
+				   *src;
+		ConcurrentChange change;
+		bool		isnull[1];
+		Datum		values[1];
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get the change from the single-column tuple. */
+		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+		Assert(!isnull[0]);
+
+		/* Make sure we access aligned data. */
+		change_raw = (char *) DatumGetByteaP(values[0]);
+		src = (char *) VARDATA(change_raw);
+		memcpy(&change, src, SizeOfConcurrentChange);
+
+		/* TRUNCATE change contains no tuple, so process it separately. */
+		if (change.kind == CHANGE_TRUNCATE)
+		{
+			/*
+			 * All the things that ExecuteTruncateGuts() does (such as firing
+			 * triggers or handling the DROP_CASCADE behavior) should have
+			 * taken place on the source relation. Thus we only do the actual
+			 * truncation of the new relation (and its indexes).
+			 */
+			heap_truncate_one_rel(rel);
+
+			pfree(tup_change);
+			continue;
+		}
+
+		/*
+		 * Extract the tuple from the change. The tuple is copied here because
+		 * it might be assigned to 'tup_old', in which case it needs to
+		 * survive into the next iteration.
+		 */
+		tup = get_changed_tuple(src);
+
+		if (change.kind == CHANGE_UPDATE_OLD)
+		{
+			Assert(tup_old == NULL);
+			tup_old = tup;
+		}
+		else if (change.kind == CHANGE_INSERT)
+		{
+			Assert(tup_old == NULL);
+
+			apply_concurrent_insert(rel, &change, tup, iistate, index_slot);
+
+			pfree(tup);
+		}
+		else if (change.kind == CHANGE_UPDATE_NEW ||
+				 change.kind == CHANGE_DELETE)
+		{
+			IndexScanDesc ind_scan = NULL;
+			HeapTuple	tup_key;
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+			{
+				tup_key = tup_old != NULL ? tup_old : tup;
+			}
+			else
+			{
+				Assert(tup_old == NULL);
+				tup_key = tup;
+			}
+
+			/*
+			 * Find the tuple to be updated or deleted.
+			 */
+			tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+										  iistate, ident_slot, &ind_scan);
+			if (tup_exist == NULL)
+				elog(ERROR, "could not find target tuple");
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+				apply_concurrent_update(rel, tup, tup_exist, &change, iistate,
+										index_slot);
+			else
+				apply_concurrent_delete(rel, tup_exist, &change);
+
+			if (tup_old != NULL)
+			{
+				pfree(tup_old);
+				tup_old = NULL;
+			}
+
+			pfree(tup);
+			index_endscan(ind_scan);
+		}
+		else
+			elog(ERROR, "unrecognized kind of change: %d", change.kind);
+
+		/*
+		 * If a change was applied now, increment CID for next writes and
+		 * update the snapshot so it sees the changes we've applied so far.
+		 */
+		if (change.kind != CHANGE_UPDATE_OLD)
+		{
+			CommandCounterIncrement();
+			UpdateActiveSnapshotCommandId();
+		}
+
+		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+		Assert(shouldFree);
+		pfree(tup_change);
+	}
+
+	tuplestore_clear(dstate->tstore);
+	dstate->nchanges = 0;
+
+	/* Cleanup. */
+	ExecDropSingleTupleTableSlot(index_slot);
+	ExecDropSingleTupleTableSlot(ident_slot);
+}
+
+static void
+apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
+						IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+	List	   *recheck;
+
+	/*
+	 * Like simple_heap_insert(), but make sure that the INSERT is not
+	 * logically decoded - see reform_and_rewrite_tuple() for more
+	 * information.
+	 */
+	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+				NULL);
+
+	/*
+	 * Update indexes.
+	 *
+	 * In case functions in the index need the active snapshot and caller
+	 * hasn't set one.
+	 */
+	ExecStoreHeapTuple(tup, index_slot, false);
+	recheck = ExecInsertIndexTuples(iistate->rri,
+									index_slot,
+									iistate->estate,
+									false,	/* update */
+									false,	/* noDupErr */
+									NULL,	/* specConflict */
+									NIL,	/* arbiterIndexes */
+									false	/* onlySummarizing */
+		);
+
+	/*
+	 * If recheck is required, it must have been performed on the source
+	 * relation by now. (All the logical changes we process here are already
+	 * committed.)
+	 */
+	list_free(recheck);
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						ConcurrentChange *change, IndexInsertState *iistate,
+						TupleTableSlot *index_slot)
+{
+	LockTupleMode lockmode;
+	TM_FailureData tmfd;
+	TU_UpdateIndexes update_indexes;
+	TM_Result	res;
+	List	   *recheck;
+
+	/*
+	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
+	 * here.)
+	 *
+	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_update(rel, &tup_target->t_self, tup,
+					  GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd, &lockmode, &update_indexes,
+					  false /* wal_logical */ );
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+	ExecStoreHeapTuple(tup, index_slot, false);
+
+	if (update_indexes != TU_None)
+	{
+		recheck = ExecInsertIndexTuples(iistate->rri,
+										index_slot,
+										iistate->estate,
+										true,	/* update */
+										false,	/* noDupErr */
+										NULL,	/* specConflict */
+										NIL,	/* arbiterIndexes */
+		/* onlySummarizing */
+										update_indexes == TU_Summarizing);
+		list_free(recheck);
+	}
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+						ConcurrentChange *change)
+{
+	TM_Result	res;
+	TM_FailureData tmfd;
+
+	/*
+	 * Delete tuple from the new heap.
+	 *
+	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+					  InvalidSnapshot, false,
+					  &tmfd,
+					  false,	/* no wait - only we are doing changes */
+					  false /* wal_logical */ );
+
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'key' is a pre-initialized scan key, into which the function will put the
+ * key values.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * On exit, '*scan_p' contains the scan descriptor used. The caller must
+ * close it when the returned tuple is no longer needed.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
+				  IndexInsertState *iistate,
+				  TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
+{
+	IndexScanDesc scan;
+	Form_pg_index ident_form;
+	int2vector *ident_indkey;
+	HeapTuple	result = NULL;
+
+	/* XXX no instrumentation for now */
+	scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+						   NULL, nkeys, 0);
+	*scan_p = scan;
+	index_rescan(scan, key, nkeys, NULL, 0);
+
+	/* Info needed to retrieve key values from heap tuple. */
+	ident_form = iistate->ident_index->rd_index;
+	ident_indkey = &ident_form->indkey;
+
+	/* Use the incoming tuple to finalize the scan key. */
+	for (int i = 0; i < scan->numberOfKeys; i++)
+	{
+		ScanKey		entry;
+		bool		isnull;
+		int16		attno_heap;
+
+		entry = &scan->keyData[i];
+		attno_heap = ident_indkey->values[i];
+		entry->sk_argument = heap_getattr(tup_key,
+										  attno_heap,
+										  rel->rd_att,
+										  &isnull);
+		Assert(!isnull);
+	}
+	if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+	{
+		bool		shouldFree;
+
+		result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+		/* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+		Assert(!shouldFree);
+	}
+
+	return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ *
+ * Pass rel_src iff its reltoastrelid is needed.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *ctx, XLogRecPtr end_of_wal,
+						   Relation rel_dst, Relation rel_src, ScanKey ident_key,
+						   int ident_key_nentries, IndexInsertState *iistate)
+{
+	RepackDecodingState *dstate;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_CATCH_UP);
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	repack_decode_concurrent_changes(ctx, end_of_wal);
+
+	if (dstate->nchanges == 0)
+		return;
+
+	PG_TRY();
+	{
+		/*
+		 * Make sure that TOAST values can eventually be accessed via the old
+		 * relation - see comment in copy_table_data().
+		 */
+		if (rel_src)
+			rel_dst->rd_toastoid = rel_src->rd_rel->reltoastrelid;
+
+		apply_concurrent_changes(dstate, rel_dst, ident_key,
+								 ident_key_nentries, iistate);
+	}
+	PG_FINALLY();
+	{
+		if (rel_src)
+			rel_dst->rd_toastoid = InvalidOid;
+	}
+	PG_END_TRY();
+}
+
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id)
+{
+	EState	   *estate;
+	int			i;
+	IndexInsertState *result;
+
+	result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+	estate = CreateExecutorState();
+
+	result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+	InitResultRelInfo(result->rri, relation, 0, 0, 0);
+	ExecOpenIndices(result->rri, false);
+
+	/*
+	 * Find the relcache entry of the identity index so that we spend no extra
+	 * effort to open / close it.
+	 */
+	for (i = 0; i < result->rri->ri_NumIndices; i++)
+	{
+		Relation	ind_rel;
+
+		ind_rel = result->rri->ri_IndexRelationDescs[i];
+		if (ind_rel->rd_id == ident_index_id)
+			result->ident_index = ind_rel;
+	}
+	if (result->ident_index == NULL)
+		elog(ERROR, "failed to open identity index");
+
+	/* Only initialize fields needed by ExecInsertIndexTuples(). */
+	result->estate = estate;
+
+	return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+	Relation	ident_idx_rel;
+	Form_pg_index ident_idx;
+	int			n,
+				i;
+	ScanKey		result;
+
+	Assert(OidIsValid(ident_idx_oid));
+	ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+	ident_idx = ident_idx_rel->rd_index;
+	n = ident_idx->indnatts;
+	result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+	for (i = 0; i < n; i++)
+	{
+		ScanKey		entry;
+		int16		relattno;
+		Form_pg_attribute att;
+		Oid			opfamily,
+					opcintype,
+					opno,
+					opcode;
+
+		entry = &result[i];
+		relattno = ident_idx->indkey.values[i];
+		if (relattno >= 1)
+		{
+			TupleDesc	desc;
+
+			desc = rel_src->rd_att;
+			att = TupleDescAttr(desc, relattno - 1);
+		}
+		else
+			elog(ERROR, "unexpected attribute number %d in index", relattno);
+
+		opfamily = ident_idx_rel->rd_opfamily[i];
+		opcintype = ident_idx_rel->rd_opcintype[i];
+		opno = get_opfamily_member(opfamily, opcintype, opcintype,
+								   BTEqualStrategyNumber);
+
+		if (!OidIsValid(opno))
+			elog(ERROR, "failed to find = operator for type %u", opcintype);
+
+		opcode = get_opcode(opno);
+		if (!OidIsValid(opcode))
+			elog(ERROR, "failed to find = operator for operator %u", opno);
+
+		/* Initialize everything but argument. */
+		ScanKeyInit(entry,
+					i + 1,
+					BTEqualStrategyNumber, opcode,
+					(Datum) NULL);
+		entry->sk_collation = att->attcollation;
+	}
+	index_close(ident_idx_rel, AccessShareLock);
+
+	*nentries = n;
+	return result;
+}
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+	ExecCloseIndices(iistate->rri);
+	FreeExecutorState(iistate->estate);
+	pfree(iistate->rri);
+	pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	ExecDropSingleTupleTableSlot(dstate->tsslot);
+	FreeTupleDesc(dstate->tupdesc_change);
+	FreeTupleDesc(dstate->tupdesc);
+	tuplestore_end(dstate->tstore);
+
+	FreeDecodingContext(ctx);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+								   Relation cl_index,
+								   LogicalDecodingContext *ctx,
+								   bool swap_toast_by_content,
+								   TransactionId frozenXid,
+								   MultiXactId cutoffMulti)
+{
+	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+	List	   *ind_oids_new;
+	Oid			old_table_oid = RelationGetRelid(OldHeap);
+	Oid			new_table_oid = RelationGetRelid(NewHeap);
+	List	   *ind_oids_old = RelationGetIndexList(OldHeap);
+	ListCell   *lc,
+			   *lc2;
+	char		relpersistence;
+	bool		is_system_catalog;
+	Oid			ident_idx_old,
+				ident_idx_new;
+	IndexInsertState *iistate;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+	XLogRecPtr	wal_insert_ptr,
+				end_of_wal;
+	char		dummy_rec_data = '\0';
+	Relation   *ind_refs,
+			   *ind_refs_p;
+	int			nind;
+
+	/* Like in cluster_rel(). */
+	lockmode_old = ShareUpdateExclusiveLock;
+	Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+	Assert(cl_index == NULL ||
+		   CheckRelationLockedByMe(cl_index, lockmode_old, false));
+	/* This is expected from the caller. */
+	Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+	ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+	/*
+	 * Unlike the exclusive case, we build new indexes for the new relation
+	 * rather than swapping the storage and reindexing the old relation. The
+	 * point is that the index build can take some time, so we do it before
+	 * acquiring AccessExclusiveLock on the old heap; at that point we cannot
+	 * swap the heap storage yet.
+	 *
+	 * index_create() will lock the new indexes using AccessExclusiveLock - no
+	 * need to change that.
+	 *
+	 * We assume that ShareUpdateExclusiveLock on the table prevents anyone
+	 * from dropping the existing indexes or adding new ones, so the lists of
+	 * old and new indexes should match at the swap time. On the other hand we
+	 * do not block ALTER INDEX commands that do not require table lock (e.g.
+	 * ALTER INDEX ... SET ...).
+	 *
+	 * XXX Should we check at the end of our work whether another transaction
+	 * executed such a command, and issue a NOTICE that we might have
+	 * discarded its effects? (For example, if someone changes a storage
+	 * parameter after we have created the new index, the new value of that
+	 * parameter is lost.) Alternatively, we could lock all the indexes now in
+	 * a mode that blocks all the ALTER INDEX commands
+	 * (ShareUpdateExclusiveLock?), and keep them locked till the end of the
+	 * transaction. That might increase the risk of deadlock during the lock
+	 * upgrade below, however SELECT / DML queries should not be involved in
+	 * such a deadlock.
+	 */
+	ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+	/*
+	 * Processing shouldn't start w/o valid identity index.
+	 * Processing shouldn't start without a valid identity index.
+	Assert(OidIsValid(ident_idx_old));
+
+	/* Find "identity index" on the new relation. */
+	ident_idx_new = InvalidOid;
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+
+		if (ident_idx_old == ind_old)
+		{
+			ident_idx_new = ind_new;
+			break;
+		}
+	}
+	if (!OidIsValid(ident_idx_new))
+	{
+		/* Should not happen, given our lock on the old relation. */
+		ereport(ERROR,
+				(errmsg("identity index missing on the new relation")));
+	}
+
+	/* Executor state to update indexes. */
+	iistate = get_index_insert_state(NewHeap, ident_idx_new);
+
+	/*
+	 * Build scan key that we'll use to look for rows to be updated / deleted
+	 * during logical decoding.
+	 */
+	ident_key = build_identity_key(ident_idx_new, OldHeap, &ident_key_nentries);
+
+	/*
+	 * During testing, wait for another backend to perform concurrent data
+	 * changes which we will process below.
+	 */
+	INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+	/*
+	 * Flush all WAL records inserted so far (possibly except for the last
+	 * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+	 * we need to flush while holding exclusive lock on the source table.
+	 */
+	wal_insert_ptr = GetInsertRecPtr();
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply concurrent changes first time, to minimize the time we need to
+	 * hold AccessExclusiveLock. (Quite some amount of WAL could have been
+	 * written during the data copying and index creation.)
+	 */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/*
+	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+	 * is one), all its indexes, so that we can swap the files.
+	 *
+	 * Before that, unlock the index temporarily to avoid deadlock in case
+	 * another transaction is trying to lock it while holding the lock on the
+	 * table.
+	 */
+	if (cl_index)
+	{
+		index_close(cl_index, ShareUpdateExclusiveLock);
+		cl_index = NULL;
+	}
+	/* For the same reason, unlock the TOAST relation. */
+	if (OldHeap->rd_rel->reltoastrelid)
+		UnlockRelationOid(OldHeap->rd_rel->reltoastrelid,
+						  ShareUpdateExclusiveLock);
+	/* Finally lock the table */
+	LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+	/*
+	 * Lock all indexes now, not only the clustering one: all indexes need to
+	 * have their files swapped. While doing that, store their relation
+	 * references in an array, to handle predicate locks below.
+	 */
+	ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+	nind = 0;
+	foreach(lc, ind_oids_old)
+	{
+		Oid			ind_oid;
+		Relation	index;
+
+		ind_oid = lfirst_oid(lc);
+		index = index_open(ind_oid, AccessExclusiveLock);
+
+		/*
+		 * TODO 1) Do we need to check if ALTER INDEX was executed since the
+		 * new index was created in build_new_indexes()? 2) Specifically for
+		 * the clustering index, should check_index_is_clusterable() be called
+		 * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+		 * table probably blocks all commands that affect the result of
+		 * check_index_is_clusterable().)
+		 */
+		*ind_refs_p = index;
+		ind_refs_p++;
+		nind++;
+	}
+
+	/*
+	 * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+	 * lock is needed to swap the files.
+	 */
+	if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+	/*
+	 * Tuples and pages of the old heap will be gone, but the heap will stay.
+	 */
+	TransferPredicateLocksToHeapRelation(OldHeap);
+	/* The same for indexes. */
+	for (int i = 0; i < nind; i++)
+	{
+		Relation	index = ind_refs[i];
+
+		TransferPredicateLocksToHeapRelation(index);
+
+		/*
+		 * References to indexes on the old relation are not needed anymore,
+		 * however locks stay till the end of the transaction.
+		 */
+		index_close(index, NoLock);
+	}
+	pfree(ind_refs);
+
+	/*
+	 * Flush anything we see in WAL, to make sure that all changes committed
+	 * while we were waiting for the exclusive lock are available for
+	 * decoding. This should not be necessary if all backends had
+	 * synchronous_commit set, but we can't rely on this setting.
+	 *
+	 * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+	 * position, and GetLastImportantRecPtr() points at the start of the last
+	 * record rather than at the end. Thus the simplest way to determine the
+	 * insert position is to insert a dummy record and use its LSN.
+	 *
+	 * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+	 * last record (plus the total size of all the page headers the record
+	 * spans)?
+	 */
+	XLogBeginInsert();
+	XLogRegisterData(&dummy_rec_data, 1);
+	wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/* Apply the concurrent changes again. */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/* Remember info about rel before closing OldHeap */
+	relpersistence = OldHeap->rd_rel->relpersistence;
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+	/*
+	 * Even ShareUpdateExclusiveLock should have prevented others from
+	 * creating / dropping indexes (even using the CONCURRENTLY option), so we
+	 * do not need to check whether the lists match.
+	 */
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+		Oid			mapped_tables[4];
+
+		/* Zero out possible results from swap_relation_files */
+		memset(mapped_tables, 0, sizeof(mapped_tables));
+
+		swap_relation_files(ind_old, ind_new,
+							(old_table_oid == RelationRelationId),
+							swap_toast_by_content,
+							true,
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Concurrent processing is not supported for system relations, so
+		 * there should be no mapped tables.
+		 */
+		for (int i = 0; i < 4; i++)
+			Assert(mapped_tables[i] == 0);
+#endif
+	}
+
+	/* The new indexes must be visible for deletion. */
+	CommandCounterIncrement();
+
+	/* Close the old heap but keep lock until transaction commit. */
+	table_close(OldHeap, NoLock);
+	/* Close the new heap. (We didn't have to open its indexes). */
+	table_close(NewHeap, NoLock);
+
+	/* Clean up what we don't need anymore. (And close the identity index.) */
+	pfree(ident_key);
+	free_index_insert_state(iistate);
+
+	/*
+	 * Swap the relations and their TOAST relations and TOAST indexes. This
+	 * also drops the new relation and its indexes.
+	 *
+	 * (System catalogs are currently not supported.)
+	 */
+	Assert(!is_system_catalog);
+	finish_heap_swap(old_table_oid, new_table_oid,
+					 is_system_catalog,
+					 swap_toast_by_content,
+					 false, true, false,
+					 frozenXid, cutoffMulti,
+					 relpersistence);
+}
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. The order matches that of OldIndexes, so the two lists can be
+ * used to pair up indexes when swapping their storage.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+	ListCell   *lc;
+	List	   *result = NIL;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+	foreach(lc, OldIndexes)
+	{
+		Oid			ind_oid,
+					ind_oid_new;
+		char	   *newName;
+		Relation	ind;
+
+		ind_oid = lfirst_oid(lc);
+		ind = index_open(ind_oid, AccessShareLock);
+
+		newName = ChooseRelationName(get_rel_name(ind_oid),
+									 NULL,
+									 "repacknew",
+									 get_rel_namespace(ind->rd_index->indrelid),
+									 false);
+		ind_oid_new = index_create_copy(NewHeap, ind_oid,
+										ind->rd_rel->reltablespace, newName,
+										false);
+		result = lappend_oid(result, ind_oid_new);
+
+		index_close(ind, AccessShareLock);
+	}
+
+	return result;
+}
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 188e26f0e6e..71b73c21ebf 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -904,7 +904,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
-	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
 					 RecentXmin, ReadNextMultiXactId(), relpersistence);
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c6dd2e020da..d83136a2657 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5989,6 +5989,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 			finish_heap_swap(tab->relid, OIDNewHeap,
 							 false, false, true,
 							 !OidIsValid(tab->newTableSpace),
+							 true,
 							 RecentXmin,
 							 ReadNextMultiXactId(),
 							 persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 8863ad0e8bd..6de9d0ba39d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -125,7 +125,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 							  TransactionId lastSaneFrozenXid,
 							  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-					   BufferAccessStrategy bstrategy);
+					   BufferAccessStrategy bstrategy, bool isTopLevel);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -633,7 +633,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 
 			if (params.options & VACOPT_VACUUM)
 			{
-				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+								isTopLevel))
 					continue;
 			}
 
@@ -1997,7 +1998,7 @@ vac_truncate_clog(TransactionId frozenXID,
  */
 static bool
 vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-		   BufferAccessStrategy bstrategy)
+		   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	LOCKMODE	lmode;
 	Relation	rel;
@@ -2288,7 +2289,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
-						&cluster_params);
+						&cluster_params, isTopLevel);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
@@ -2331,7 +2332,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
-		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+				   isTopLevel);
 	}
 
 	/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index 2b0db214804..50aa385a581 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
 subdir('snowball')
 subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..5dc4ae58ffe 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
 #include "catalog/pg_control.h"
+#include "commands/cluster.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/message.h"
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(buf->record);
 	SnapBuild  *builder = ctx->snapshot_builder;
 
+	/*
+	 * If the change is not intended for logical decoding, do not even
+	 * establish a transaction for it - REPACK CONCURRENTLY is the typical
+	 * use case.
+	 *
+	 * First, check whether REPACK CONCURRENTLY is being performed by this
+	 * backend. If so, only decode data changes of the table it is
+	 * processing, and the changes of that table's TOAST relation.
+	 *
+	 * (The TOAST locator should not be set unless the main one is.)
+	 */
+	Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+		   OidIsValid(repacked_rel_locator.relNumber));
+
+	if (OidIsValid(repacked_rel_locator.relNumber))
+	{
+		XLogReaderState *r = buf->record;
+		RelFileLocator locator;
+
+		/* Not all records contain the block. */
+		if (XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL) &&
+			!RelFileLocatorEquals(locator, repacked_rel_locator) &&
+			(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+			 !RelFileLocatorEquals(locator, repacked_rel_toast_locator)))
+			return;
+	}
+
+	/*
+	 * Second, skip records that do not contain sufficient information for
+	 * decoding.
+	 *
+	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+	 * when applying changes to the new table. Those changes are not useful
+	 * to any other consumer (such as a logical replication subscription)
+	 * because the new table will eventually be dropped (after REPACK
+	 * CONCURRENTLY has assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This happens when 1) raw_heap_insert marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, or 2) REPACK
+				 * CONCURRENTLY replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}
+
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 6eaa40f6acf..56ea521e8c6 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,6 +486,26 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	return SnapBuildMVCCFromHistoric(snap, true);
 }
 
+/*
+ * Build an MVCC snapshot for the initial data load performed by REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot set up
+ * (so we do not set MyProc->xmin). XXX Do we still need to add restrictions?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+	Snapshot	snap;
+
+	Assert(builder->state == SNAPBUILD_CONSISTENT);
+
+	snap = SnapBuildBuildSnapshot(builder);
+	return SnapBuildMVCCFromHistoric(snap, false);
+}
+
 /*
  * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
  *
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+#    src/backend/replication/pgoutput_repack/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	$(WIN32RES) \
+	pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+	rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+  'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+  pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pgoutput_repack',
+    '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+  pgoutput_repack_sources,
+  kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..687fbbc59bb
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,288 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ *		Logical replication output plugin for the REPACK command
+ *
+ * Copyright (c) 2012-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+						   OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+							 ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+							  ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+						  Relation rel, ReorderBufferChange *change);
+static void plugin_truncate(struct LogicalDecodingContext *ctx,
+							ReorderBufferTXN *txn, int nrelations,
+							Relation relations[],
+							ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+						 ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+	cb->startup_cb = plugin_startup;
+	cb->begin_cb = plugin_begin_txn;
+	cb->change_cb = plugin_change;
+	cb->truncate_cb = plugin_truncate;
+	cb->commit_cb = plugin_commit_txn;
+	cb->shutdown_cb = plugin_shutdown;
+}
+
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+			   bool is_init)
+{
+	ctx->output_plugin_private = NULL;
+
+	/* Probably unnecessary, as we don't use the SQL interface ... */
+	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+	if (ctx->output_plugin_options != NIL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("this plugin does not expect any options")));
+	}
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * As we don't release the slot during processing of a particular table,
+ * there's no room for a SQL interface, even for debugging. Therefore we need
+ * neither OutputPluginPrepareWrite() nor OutputPluginWrite() in the plugin
+ * callbacks. (Although we might want to write custom callbacks, this API
+ * seems to be unnecessarily generic for our purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				  XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+			  Relation relation, ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Only interested in one particular relation. */
+	if (relation->rd_id != dstate->relid)
+		return;
+
+	/* Decode entry depending on its type */
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			{
+				HeapTuple	newtuple;
+
+				newtuple = change->data.tp.newtuple;
+
+				/*
+				 * Identity checks in the main function should have made this
+				 * impossible.
+				 */
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete insert info");
+
+				store_change(ctx, CHANGE_INSERT, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			{
+				HeapTuple	oldtuple,
+							newtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+				newtuple = change->data.tp.newtuple;
+
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete update info");
+
+				if (oldtuple != NULL)
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			{
+				HeapTuple	oldtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+
+				if (oldtuple == NULL)
+					elog(ERROR, "incomplete delete info");
+
+				store_change(ctx, CHANGE_DELETE, oldtuple);
+			}
+			break;
+		default:
+			/* Should not come here */
+			Assert(false);
+			break;
+	}
+}
+
+static void
+plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				int nrelations, Relation relations[],
+				ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+	int			i;
+	Relation	relation = NULL;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Find the relation we are processing. */
+	for (i = 0; i < nrelations; i++)
+	{
+		relation = relations[i];
+
+		if (RelationGetRelid(relation) == dstate->relid)
+			break;
+	}
+
+	/* Is this truncation of another relation? */
+	if (i == nrelations)
+		return;
+
+	store_change(ctx, CHANGE_TRUNCATE, NULL);
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+			 HeapTuple tuple)
+{
+	RepackDecodingState *dstate;
+	char	   *change_raw;
+	ConcurrentChange change;
+	bool		flattened = false;
+	Size		size;
+	Datum		values[1];
+	bool		isnull[1];
+	char	   *dst,
+			   *dst_start;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	size = MAXALIGN(VARHDRSZ) + SizeOfConcurrentChange;
+
+	if (tuple)
+	{
+		/*
+		 * ReorderBufferCommit() stores the TOAST chunks in its private memory
+		 * context and frees them after having called apply_change().
+		 * Therefore we need flat copy (including TOAST) that we eventually
+		 * copy into the memory context which is available to
+		 * decode_concurrent_changes().
+		 */
+		if (HeapTupleHasExternal(tuple))
+		{
+			/*
+			 * toast_flatten_tuple_to_datum() might be more convenient but we
+			 * don't want the decompression it does.
+			 */
+			tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+			flattened = true;
+		}
+
+		size += tuple->t_len;
+	}
+
+	/* XXX Isn't there any function / macro to do this? */
+	if (size >= 0x3FFFFFFF)
+		elog(ERROR, "change is too big");
+
+	/* Construct the change. */
+	change_raw = (char *) palloc0(size);
+	SET_VARSIZE(change_raw, size);
+
+	/*
+	 * Since the varlena alignment might not be sufficient for the structure,
+	 * set the fields in a local instance and remember where it should
+	 * eventually be copied.
+	 */
+	change.kind = kind;
+	dst_start = (char *) VARDATA(change_raw);
+
+	/* No other information is needed for TRUNCATE. */
+	if (change.kind == CHANGE_TRUNCATE)
+	{
+		memcpy(dst_start, &change, SizeOfConcurrentChange);
+		goto store;
+	}
+
+	/*
+	 * Copy the tuple.
+	 *
+	 * CAUTION: change->tup_data.t_data must be fixed on retrieval!
+	 */
+	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+	dst = dst_start + SizeOfConcurrentChange;
+	memcpy(dst, tuple->t_data, tuple->t_len);
+
+	/* The data has been copied. */
+	if (flattened)
+		pfree(tuple);
+
+store:
+	/* Copy the structure so it can be stored. */
+	memcpy(dst_start, &change, SizeOfConcurrentChange);
+
+	/* Store as tuple of 1 bytea column. */
+	values[0] = PointerGetDatum(change_raw);
+	isnull[0] = false;
+	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+						 values, isnull);
+
+	/* Accounting. */
+	dstate->nchanges++;
+
+	/* Cleanup. */
+	pfree(change_raw);
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..e9ddf39500c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
 #include "access/xlogprefetcher.h"
 #include "access/xlogrecovery.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index cd3e43c448a..519f3953638 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
 
 die
   "$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
-  . " missing from lwlocklist.h"
+  . "missing from lwlocklist.h"
   if $lwlock_count < scalar @wait_event_lwlocks;
 
 die
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6fe268a8eec..d27a4c30548 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -64,6 +64,7 @@
 #include "catalog/pg_type.h"
 #include "catalog/schemapg.h"
 #include "catalog/storage.h"
+#include "commands/cluster.h"
 #include "commands/policy.h"
 #include "commands/publicationcmds.h"
 #include "commands/trigger.h"
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 70a6b8902d1..7f1c220e00b 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -646,7 +645,7 @@ CopySnapshot(Snapshot snapshot)
  * FreeSnapshot
  *		Free the memory associated with a snapshot.
  */
-static void
+void
 FreeSnapshot(Snapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 31c1ce5dc09..f23efa12d69 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -4998,18 +4998,27 @@ match_previous_words(int pattern_id,
 	}
 
 /* REPACK */
-	else if (Matches("REPACK"))
+	else if (Matches("REPACK") || Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"CONCURRENTLY");
+	else if (Matches("REPACK", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	else if (Matches("REPACK", "(*)"))
+	else if (Matches("REPACK", "(*)", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	/* If we have REPACK <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", MatchAnyExcept("(")))
+	/* If we have REPACK [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(|CONCURRENTLY")) ||
+			 Matches("REPACK", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", "(*)", MatchAny))
+	/* If we have REPACK (*) [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAnyExcept("CONCURRENTLY")) ||
+			 Matches("REPACK", "(*)", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK <sth> USING, then add the index as well */
-	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+
+	/*
+	 * Complete ... [ (*) ] [ CONCURRENTLY ] <sth> USING INDEX, with a list of
+	 * indexes for <sth>.
+	 */
+	else if (TailMatches(MatchAnyExcept("(|CONCURRENTLY"), "USING", "INDEX"))
 	{
 		set_completion_reference(prev3_wd);
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..b82dd17a966 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -323,14 +323,15 @@ extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
-							 struct TM_FailureData *tmfd, bool changingPart);
+							 struct TM_FailureData *tmfd, bool changingPart,
+							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, ItemPointer tid);
 extern void heap_abort_speculative(Relation relation, ItemPointer tid);
 extern TM_Result heap_update(Relation relation, ItemPointer otid,
 							 HeapTuple newtup,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes);
+							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
@@ -411,6 +412,10 @@ extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
 								 uint16 infomask, TransactionId xid);
+extern bool HeapTupleMVCCInserted(HeapTuple htup, Snapshot snapshot,
+								  Buffer buffer);
+extern bool HeapTupleMVCCNotDeleted(HeapTuple htup, Snapshot snapshot,
+									Buffer buffer);
 extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
 extern bool HeapTupleIsSurelyDead(HeapTuple htup,
 								  struct GlobalVisState *vistest);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..8d4af07f840 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
 #define XLH_DELETE_CONTAINS_OLD_KEY				(1<<2)
 #define XLH_DELETE_IS_SUPER						(1<<3)
 #define XLH_DELETE_IS_PARTITION_MOVE			(1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL					(1<<5)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLH_DELETE_CONTAINS_OLD						\
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..289b64edfd9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -623,6 +624,8 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
+											  Snapshot snapshot,
+											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1627,6 +1630,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ *	 (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1639,6 +1646,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
+								Snapshot snapshot,
+								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1647,6 +1656,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
+													snapshot, decoding_ctx,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 7f4138c7b36..532ffa7208d 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
 #ifndef CLUSTER_H
 #define CLUSTER_H
 
+#include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "replication/logical.h"
 #include "storage/lock.h"
+#include "storage/relfilelocator.h"
 #include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -24,6 +29,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_CONCURRENT 0x08	/* allow concurrent data changes */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -32,14 +38,95 @@ typedef struct ClusterParams
 } ClusterParams;
 
 
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+extern RelFileLocator repacked_rel_locator;
+extern RelFileLocator repacked_rel_toast_locator;
+
+typedef enum
+{
+	CHANGE_INSERT,
+	CHANGE_UPDATE_OLD,
+	CHANGE_UPDATE_NEW,
+	CHANGE_DELETE,
+	CHANGE_TRUNCATE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+	/* See the enum above. */
+	ConcurrentChangeKind kind;
+
+	/*
+	 * The actual tuple.
+	 *
+	 * The tuple data follows the ConcurrentChange structure. Before use make
+	 * sure the tuple is correctly aligned (ConcurrentChange can be stored as
+	 * bytea) and that tuple->t_data is fixed.
+	 */
+	HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+								sizeof(HeapTupleData))
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents are being copied to the new storage, along with the metadata
+ * needed to apply these changes to the table.
+ */
+typedef struct RepackDecodingState
+{
+	/* The relation whose changes we're decoding. */
+	Oid			relid;
+
+	/*
+	 * Decoded changes are stored here. Although we try to avoid excessively
+	 * large batches, the changes may need to be spilled to disk. The
+	 * tuplestore does this transparently.
+	 */
+	Tuplestorestate *tstore;
+
+	/* The current number of changes in tstore. */
+	double		nchanges;
+
+	/*
+	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
+	 * We can't store the tuple directly because tuplestore only supports
+	 * minimum tuple and we may need to transfer OID system column from the
+	 * output plugin. Also we need to transfer the change kind, so it's better
+	 * to put everything in the structure than to use 2 tuplestores "in
+	 * parallel".
+	 */
+	TupleDesc	tupdesc_change;
+
+	/* Tuple descriptor needed to update indexes. */
+	TupleDesc	tupdesc;
+
+	/* Slot to retrieve data from tstore. */
+	TupleTableSlot *tsslot;
+
+	ResourceOwner resowner;
+} RepackDecodingState;
+
+
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, bool usingindex,
-						Relation OldHeap, Oid indexOid, ClusterParams *params);
+						Relation OldHeap, Oid indexOid, ClusterParams *params,
+						bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+											 XLogRecPtr end_of_wal);
+
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -47,6 +134,7 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 bool swap_toast_by_content,
 							 bool check_constraints,
 							 bool is_internal,
+							 bool reindex,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 5b6639c114c..93917ad5544 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -59,18 +59,20 @@
 /*
  * Progress parameters for REPACK.
  *
- * Note: Since REPACK shares some code with CLUSTER, these values are also
- * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
- * introduce a separate set of constants.)
+ * Note: Since REPACK shares some code with CLUSTER, (some of) these values
+ * are also used by CLUSTER. (CLUSTER is now deprecated, so it makes little
+ * sense to introduce a separate set of constants.)
  */
 #define PROGRESS_REPACK_COMMAND					0
 #define PROGRESS_REPACK_PHASE					1
 #define PROGRESS_REPACK_INDEX_RELID				2
 #define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED	4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED		5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED		6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		9
 
 /*
  * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -79,9 +81,10 @@
 #define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
 #define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
 #define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+#define PROGRESS_REPACK_PHASE_CATCH_UP			5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		8
 
 /*
  * Commands of PROGRESS_REPACK
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 6d4d2d1814c..802fc4b0823 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 7f3ba0352f6..2739327b0da 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
 #define AccessShareLock			1	/* SELECT */
 #define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
 #define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL), ANALYZE, CREATE
-									 * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4	/* VACUUM (non-exclusive), ANALYZE, CREATE
+									 * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
 #define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
 #define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
 									 * SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 147b190210a..5eeabdc6c4f 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -61,6 +61,8 @@ extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index fc82cd67f6c..f16422175f8 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -11,10 +11,11 @@ EXTENSION = injection_points
 DATA = injection_points--1.0.sql
 PGFILEDESC = "injection_points - facility for injection points"
 
-REGRESS = injection_points hashagg reindex_conc vacuum
+# REGRESS = injection_points hashagg reindex_conc vacuum
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = basic inplace syscache-update-pruned
+ISOLATION = basic inplace syscache-update-pruned repack
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing: 
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+
+step change_new: 
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1: 
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+
+step change_subxact2: 
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    2
+(1 row)
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..c8f264bc6cb
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
\ No newline at end of file
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 20390d6b4bf..29561103bbf 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -47,9 +47,13 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'repack',
       'syscache-update-pruned',
     ],
     'runningcheck': false, # see syscache-update-pruned
+    # 'repack' requires wal_level = 'logical'.
+    'regress_args': ['--temp-config', files('logical.conf')],
+
   },
   'tap': {
     'env': {
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..75850334986
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,143 @@
+# Prefix the system columns with underscore as they are not allowed as column
+# names.
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+	INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j int);
+	CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+    SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice to test whether tuple version generated by this session
+# can be found.
+step change_existing
+{
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key and
+# non-key column.
+step change_new
+{
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change_existing
+	change_new
+	change_subxact1
+	change_subxact2
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3a1d1d28282..fe227bd8a30 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1999,17 +1999,17 @@ pg_stat_progress_cluster| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS cluster_index_relid,
     s.param4 AS heap_tuples_scanned,
     s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_copy| SELECT s.pid,
@@ -2081,17 +2081,20 @@ pg_stat_progress_repack| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 5 THEN 'catch-up'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS repack_index_relid,
     s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param5 AS heap_tuples_inserted,
+    s.param6 AS heap_tuples_updated,
+    s.param7 AS heap_tuples_deleted,
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 24f98ed1e9e..af967625181 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -486,6 +486,8 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
 ConditionalStack
@@ -1258,6 +1260,7 @@ IndexElem
 IndexFetchHeapData
 IndexFetchTableData
 IndexInfo
+IndexInsertState
 IndexList
 IndexOnlyScan
 IndexOnlyScanState
@@ -2538,6 +2541,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 RepackCommand
+RepackDecodingState
 RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
-- 
2.47.1

#7Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#6)
5 attachment(s)

Antonin Houska <ah@cybertec.at> wrote:

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

One more thing - I think build_new_indexes and
index_concurrently_create_copy are very close in semantics, so it
might be a good idea to refactor them a bit.

You're right. I think I even used the latter for reference when writing the
first.

0002 in the attached series tries to fix that. build_new_indexes() (in 0004)
is simpler now.

This is v18 again. Parts 0001 through 0004 are unchanged, however 0005 is
added. It implements a new client application pg_repackdb. (If I posted 0005
alone its regression tests would not work. I wonder if the cfbot handles the
repeated occurrence of the 'v18-' prefix correctly.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

v18-0001-Add-REPACK-command.patchtext/x-diffDownload
From eadf2666b1e92fd52372fa859ef177f3a0d11f77 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:22:05 +0200
Subject: [PATCH 1/4] Add REPACK command.

The existing CLUSTER command as well as VACUUM with the FULL option both
reclaim unused space by rewriting table. Now that we want to enhance this
functionality (in particular, by adding a new option CONCURRENTLY), we should
enhance both commands because they are both implemented by the same function
(cluster.c:cluster_rel). However, adding the same option to two different
commands is not very user-friendly. Therefore it was decided to create a new
command and to declare both the CLUSTER command and the FULL option of VACUUM
deprecated. Future enhancements to this rewriting code will only affect the
new command.

Like CLUSTER, the REPACK command reorders the table according to the specified
index. Unlike CLUSTER, REPACK does not require an index: if only a table is
specified, the command acts as VACUUM FULL. As we don't want to remove CLUSTER
and VACUUM FULL yet, there are three callers of the cluster_rel() function
now: REPACK, CLUSTER and VACUUM FULL. When we need to distinguish who is
calling this function (mostly for logging, but also for progress reporting),
we can no longer use the OID of the clustering index: both REPACK and VACUUM
FULL can pass InvalidOid. Therefore, this patch introduces a new enumeration
type RepackCommand, and adds an argument of this type to the cluster_rel()
function and to all the functions that need to distinguish the caller.

Like CLUSTER and VACUUM FULL, the REPACK command without arguments processes
all the tables on which the current user has the MAINTAIN privilege.

A new pg_stat_progress_repack view is added to monitor the progress of
REPACK. Currently it displays the same information as pg_stat_progress_cluster
(except that column names might differ), but it'll also display the status of
the REPACK CONCURRENTLY command in the future, so the view definitions will
eventually diverge.

Regarding user documentation, the patch moves the information on clustering
from cluster.sgml to the new file repack.sgml. cluster.sgml now contains a
link that points to the related section of repack.sgml. A note on deprecation
and a link to repack.sgml are added to both cluster.sgml and vacuum.sgml.
---
 doc/src/sgml/monitoring.sgml             | 223 ++++++-
 doc/src/sgml/ref/allfiles.sgml           |   1 +
 doc/src/sgml/ref/cluster.sgml            |  82 +--
 doc/src/sgml/ref/repack.sgml             | 254 ++++++++
 doc/src/sgml/ref/vacuum.sgml             |   9 +
 doc/src/sgml/reference.sgml              |   1 +
 src/backend/access/heap/heapam_handler.c |  32 +-
 src/backend/catalog/index.c              |   2 +-
 src/backend/catalog/system_views.sql     |  26 +
 src/backend/commands/cluster.c           | 717 +++++++++++++++--------
 src/backend/commands/vacuum.c            |   3 +-
 src/backend/parser/gram.y                |  77 ++-
 src/backend/tcop/utility.c               |  20 +-
 src/backend/utils/adt/pgstatfuncs.c      |   2 +
 src/bin/psql/tab-complete.in.c           |  33 +-
 src/include/commands/cluster.h           |   7 +-
 src/include/commands/progress.h          |  63 +-
 src/include/nodes/parsenodes.h           |  20 +-
 src/include/parser/kwlist.h              |   1 +
 src/include/tcop/cmdtaglist.h            |   1 +
 src/include/utils/backend_progress.h     |   1 +
 src/test/regress/expected/cluster.out    | 125 +++-
 src/test/regress/expected/rules.out      |  23 +
 src/test/regress/sql/cluster.sql         |  59 ++
 src/tools/pgindent/typedefs.list         |   3 +
 25 files changed, 1406 insertions(+), 379 deletions(-)
 create mode 100644 doc/src/sgml/ref/repack.sgml

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..12e103d319d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+      <entry>One row for each backend running
+       <command>REPACK</command>, showing current progress.  See
+       <xref linkend="repack-progress-reporting"/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
       <entry>One row for each WAL sender process streaming a base backup,
@@ -5506,7 +5514,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>ANALYZE</command>,
    <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   <command>CREATE INDEX</command>, <command>REPACK</command>,
+   <command>VACUUM</command>,
    <command>COPY</command>,
    and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
    command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -5965,6 +5974,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
   </table>
  </sect2>
 
+ <sect2 id="repack-progress-reporting">
+  <title>REPACK Progress Reporting</title>
+
+  <indexterm>
+   <primary>pg_stat_progress_repack</primary>
+  </indexterm>
+
+  <para>
+   Whenever <command>REPACK</command> is running,
+   the <structname>pg_stat_progress_repack</structname> view will contain a
+   row for each backend that is currently running the command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+   <title><structname>pg_stat_progress_repack</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>pid</structfield> <type>integer</type>
+      </para>
+      <para>
+       Process ID of backend.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>datname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the database to which this backend is connected.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being repacked.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>phase</structfield> <type>text</type>
+      </para>
+      <para>
+       Current processing phase. See <xref linkend="repack-phases"/>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>repack_index_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       If the table is being scanned using an index, this is the OID of the
+       index being used; otherwise, it is zero.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples scanned.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples written.
+       This counter only advances when the phase is
+       <literal>seq scanning heap</literal>,
+       <literal>index scanning heap</literal>
+       or <literal>writing new heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of heap blocks in the table.  This number is reported
+       as of the beginning of <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap blocks scanned.  This counter only advances when the
+       phase is <literal>seq scanning heap</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_rebuild_count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes rebuilt.  This counter only advances when the phase
+       is <literal>rebuilding index</literal>.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="repack-phases">
+   <title>REPACK Phases</title>
+   <tgroup cols="2">
+    <colspec colname="col1" colwidth="1*"/>
+    <colspec colname="col2" colwidth="2*"/>
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>seq scanning heap</literal></entry>
+     <entry>
+       The command is currently scanning the table using a sequential scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>index scanning heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently scanning the table using an index scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>sorting tuples</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently sorting tuples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>writing new heap</literal></entry>
+     <entry>
+       <command>REPACK</command> is currently writing the new heap.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>swapping relation files</literal></entry>
+     <entry>
+       The command is currently swapping newly-built files into place.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>rebuilding index</literal></entry>
+     <entry>
+       The command is currently rebuilding an index.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>performing final cleanup</literal></entry>
+     <entry>
+       The command is performing final cleanup.  When this phase is
+       completed, <command>REPACK</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="copy-progress-reporting">
   <title>COPY Progress Reporting</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..c0ef654fcb4 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex            SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint   SYSTEM "release_savepoint.sgml">
+<!ENTITY repack             SYSTEM "repack.sgml">
 <!ENTITY reset              SYSTEM "reset.sgml">
 <!ENTITY revoke             SYSTEM "revoke.sgml">
 <!ENTITY rollback           SYSTEM "rollback.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..ee4fd965928 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -42,18 +42,6 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
    <replaceable class="parameter">table_name</replaceable>.
   </para>
 
-  <para>
-   When a table is clustered, it is physically reordered
-   based on the index information. Clustering is a one-time operation:
-   when the table is subsequently updated, the changes are
-   not clustered.  That is, no attempt is made to store new or
-   updated rows according to their index order.  (If one wishes, one can
-   periodically recluster by issuing the command again.  Also, setting
-   the table's <literal>fillfactor</literal> storage parameter to less than
-   100% can aid in preserving cluster ordering during updates, since updated
-   rows are kept on the same page if enough space is available there.)
-  </para>
-
   <para>
    When a table is clustered, <productname>PostgreSQL</productname>
    remembers which index it was clustered by.  The form
@@ -78,6 +66,25 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
    database operations (both reads and writes) from operating on the
    table until the <command>CLUSTER</command> is finished.
   </para>
+
+  <warning>
+   <para>
+    The <command>CLUSTER</command> command is deprecated in favor of
+    <xref linkend="sql-repack"/>.
+   </para>
+  </warning>
+
+  <note>
+   <para>
+    <xref linkend="sql-repack-notes-on-clustering"/> explains how clustering
+    works, whether it is initiated by <command>CLUSTER</command> or
+    by <command>REPACK</command>. The notable difference between the two is
+    that <command>REPACK</command> does not remember the index used last
+    time. Thus, if you don't specify an index, <command>REPACK</command>
+    rewrites the table but does not attempt to cluster it.
+   </para>
+  </note>
+
  </refsect1>
 
  <refsect1>
@@ -136,63 +143,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
     on the table.
    </para>
 
-   <para>
-    In cases where you are accessing single rows randomly
-    within a table, the actual order of the data in the
-    table is unimportant. However, if you tend to access some
-    data more than others, and there is an index that groups
-    them together, you will benefit from using <command>CLUSTER</command>.
-    If you are requesting a range of indexed values from a table, or a
-    single indexed value that has multiple rows that match,
-    <command>CLUSTER</command> will help because once the index identifies the
-    table page for the first row that matches, all other rows
-    that match are probably already on the same table page,
-    and so you save disk accesses and speed up the query.
-   </para>
-
-   <para>
-    <command>CLUSTER</command> can re-sort the table using either an index scan
-    on the specified index, or (if the index is a b-tree) a sequential
-    scan followed by sorting.  It will attempt to choose the method that
-    will be faster, based on planner cost parameters and available statistical
-    information.
-   </para>
-
    <para>
     While <command>CLUSTER</command> is running, the <xref
     linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
     pg_temp</literal>.
    </para>
 
-   <para>
-    When an index scan is used, a temporary copy of the table is created that
-    contains the table data in the index order.  Temporary copies of each
-    index on the table are created as well.  Therefore, you need free space on
-    disk at least equal to the sum of the table size and the index sizes.
-   </para>
-
-   <para>
-    When a sequential scan and sort is used, a temporary sort file is
-    also created, so that the peak temporary space requirement is as much
-    as double the table size, plus the index sizes.  This method is often
-    faster than the index scan method, but if the disk space requirement is
-    intolerable, you can disable this choice by temporarily setting <xref
-    linkend="guc-enable-sort"/> to <literal>off</literal>.
-   </para>
-
-   <para>
-    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-    a reasonably large value (but not more than the amount of RAM you can
-    dedicate to the <command>CLUSTER</command> operation) before clustering.
-   </para>
-
-   <para>
-    Because the planner records statistics about the ordering of
-    tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-    on the newly clustered table.
-    Otherwise, the planner might make poor choices of query plans.
-   </para>
-
    <para>
     Because <command>CLUSTER</command> remembers which indexes are clustered,
     one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..a612c72d971
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,254 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+  <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle>REPACK</refentrytitle>
+  <manvolnum>7</manvolnum>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>REPACK</refname>
+  <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX <replaceable class="parameter">index_name</replaceable> ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+    VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>REPACK</command> reclaims storage occupied by dead
+   tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+   entire contents of the table specified
+   by <replaceable class="parameter">table_name</replaceable> into a new disk
+   file with no extra space (except for the space guaranteed by
+   the <literal>fillfactor</literal> storage parameter), allowing unused space
+   to be returned to the operating system.
+  </para>
+
+  <para>
+   Without
+   a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+   processes every table and materialized view in the current database that
+   the current user has the <literal>MAINTAIN</literal> privilege on. This
+   form of <command>REPACK</command> cannot be executed inside a transaction
+   block.
+  </para>
+
+  <para>
+   If <replaceable class="parameter">index_name</replaceable> is specified,
+   the table is clustered using this index.  See
+   <xref linkend="sql-repack-notes-on-clustering"/> for details.
+  </para>
+
+  <para>
+   When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+   is acquired on it. This prevents any other database operations (both reads
+   and writes) from operating on the table until the <command>REPACK</command>
+   is finished.
+  </para>
+
+  <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+   <title>Notes on Clustering</title>
+
+   <para>
+    When a table is clustered, it is physically reordered based on the index
+    information.  Clustering is a one-time operation: when the table is
+    subsequently updated, the changes are not clustered.  That is, no attempt
+    is made to store new or updated rows according to their index order.  (If
+    one wishes, one can periodically recluster by issuing the command again.
+    Also, setting the table's <literal>fillfactor</literal> storage parameter
+    to less than 100% can aid in preserving cluster ordering during updates,
+    since updated rows are kept on the same page if enough space is available
+    there.)
+   </para>
+
+   <para>
+    In cases where you are accessing single rows randomly within a table, the
+    actual order of the data in the table is unimportant. However, if you tend
+    to access some data more than others, and there is an index that groups
+    them together, you will benefit from using <command>REPACK</command>.  If
+    you are requesting a range of indexed values from a table, or a single
+    indexed value that has multiple rows that match,
+    <command>REPACK</command> will help because once the index identifies the
+    table page for the first row that matches, all other rows that match are
+    probably already on the same table page, and so you save disk accesses and
+    speed up the query.
+   </para>
+
+   <para>
+    <command>REPACK</command> can re-sort the table using either an index scan
+    on the specified index, or (if the index is a b-tree) a sequential scan
+    followed by sorting.  It will attempt to choose the method that will be
+    faster, based on planner cost parameters and available statistical
+    information.
+   </para>
+
+   <para>
+    Because the planner records statistics about the ordering of tables, it is
+    advisable to
+    run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+    newly repacked table.  Otherwise, the planner might make poor choices of
+    query plans.
+   </para>
+  </refsect2>
+
+  <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+   <title>Notes on Resources</title>
+
+   <para>
+    When an index scan or a sequential scan without sort is used, a temporary
+    copy of the table is created, containing the table data in the new order.
+    Temporary copies of each index on the table are created as well.
+    Therefore, you need free space on disk at least equal to the sum of the
+    table size and the index sizes.
+   </para>
+
+   <para>
+    When a sequential scan and sort is used, a temporary sort file is also
+    created, so that the peak temporary space requirement is as much as double
+    the table size, plus the index sizes.  This method is often faster than
+    the index scan method, but if the disk space requirement is intolerable,
+    you can disable this choice by temporarily setting
+    <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+   </para>
+
+   <para>
+    It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+    reasonably large value (but not more than the amount of RAM you can
+    dedicate to the <command>REPACK</command> operation) before repacking.
+   </para>
+  </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="parameter">table_name</replaceable></term>
+    <listitem>
+     <para>
+      The name (possibly schema-qualified) of a table to repack.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">index_name</replaceable></term>
+    <listitem>
+     <para>
+      The name of an index to cluster the table by.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report at <literal>INFO</literal> level as each
+      table is repacked.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+   <para>
+    To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+    on the table.
+   </para>
+
+   <para>
+    While <command>REPACK</command> is running, the <xref
+    linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+    pg_temp</literal>.
+   </para>
+
+   <para>
+    Each backend running <command>REPACK</command> will report its progress
+    in the <structname>pg_stat_progress_repack</structname> view. See
+    <xref linkend="repack-progress-reporting"/> for details.
+   </para>
+
+   <para>
+    Repacking a partitioned table repacks each of its partitions.  If an
+    index is specified, each partition is repacked using the corresponding
+    partition of that index.  <command>REPACK</command> on a partitioned
+    table cannot be executed inside a transaction block.
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+  </para>
+
+  <para>
+   Repack the table <literal>employees</literal> using its
+   index <literal>employees_ind</literal> (since an index is used, this
+   effectively clusters the table):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+  </para>
+
+  <para>
+   Repack all tables in the database on which you have
+   the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting></para>
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   There is no <command>REPACK</command> statement in the SQL standard.
+  </para>
+
+ </refsect1>
+
+</refentry>
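
[Editor's aside on the page above: the parenthesized option list from the
synopsis combines with both forms of the command. A minimal sketch, reusing
the hypothetical table and index names from the Examples section:]

```sql
-- Rewrite the table, reporting progress for each table at INFO level
REPACK (VERBOSE) employees;

-- Explicit boolean form; TRUE is assumed when the value is omitted
REPACK (VERBOSE TRUE) employees USING INDEX employees_ind;
```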
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..cee1cf3926c 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -98,6 +98,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    <varlistentry>
     <term><literal>FULL</literal></term>
     <listitem>
+
      <para>
       Selects <quote>full</quote> vacuum, which can reclaim more
       space, but takes much longer and exclusively locks the table.
@@ -106,6 +107,14 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
       the operation is complete.  Usually this should only be used when a
       significant amount of space needs to be reclaimed from within the table.
      </para>
+
+     <warning>
+      <para>
+       The <option>FULL</option> parameter is deprecated in favor of
+       <xref linkend="sql-repack"/>.
+      </para>
+     </warning>
+
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..229912d35b7 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
    &refreshMaterializedView;
    &reindex;
    &releaseSavepoint;
+   &repack;
    &reset;
    &revoke;
    &rollback;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..0b03070d394 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	if (OldIndex != NULL && !use_sort)
 	{
 		const int	ci_index[] = {
-			PROGRESS_CLUSTER_PHASE,
-			PROGRESS_CLUSTER_INDEX_RELID
+			PROGRESS_REPACK_PHASE,
+			PROGRESS_REPACK_INDEX_RELID
 		};
 		int64		ci_val[2];
 
 		/* Set phase and OIDOldIndex to columns */
-		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
 		ci_val[1] = RelationGetRelid(OldIndex);
 		pgstat_progress_update_multi_param(2, ci_index, ci_val);
 
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	else
 	{
 		/* In scan-and-sort mode and also VACUUM FULL, set phase */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
 		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
 		/* Set total heap blocks */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+		pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
 									 heapScan->rs_nblocks);
 	}
 
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 				 * is manually updated to the correct value when the table
 				 * scan finishes.
 				 */
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 heapScan->rs_nblocks);
 				break;
 			}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 */
 			if (prev_cblock != heapScan->rs_cblock)
 			{
-				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+				pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
 											 (heapScan->rs_cblock +
 											  heapScan->rs_nblocks -
 											  heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			 * In scan-and-sort mode, report increase in number of tuples
 			 * scanned
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
 										 *num_tuples);
 		}
 		else
 		{
 			const int	ct_index[] = {
-				PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
-				PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
 			};
 			int64		ct_val[2];
 
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		double		n_tuples = 0;
 
 		/* Report that we are now sorting tuples */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_SORT_TUPLES);
 
 		tuplesort_performsort(tuplesort);
 
 		/* Report that we are now writing new heap */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-									 PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
 
 		for (;;)
 		{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
 										 n_tuples);
 		}
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c4029a4f3d3..3063abff9a5 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 		Assert(!ReindexIsProcessingIndex(indexOid));
 
 		/* Set index rebuild count */
-		pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
 									 i);
 		i++;
 	}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1b3c5a55882..b2b7b10c2be 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1279,6 +1279,32 @@ CREATE VIEW pg_stat_progress_cluster AS
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_repack AS
+    SELECT
+        S.pid AS pid,
+        S.datid AS datid,
+        D.datname AS datname,
+        S.relid AS relid,
+	-- param1 is currently unused
+        CASE S.param2 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'seq scanning heap'
+                      WHEN 2 THEN 'index scanning heap'
+                      WHEN 3 THEN 'sorting tuples'
+                      WHEN 4 THEN 'writing new heap'
+                      WHEN 5 THEN 'swapping relation files'
+                      WHEN 6 THEN 'rebuilding index'
+                      WHEN 7 THEN 'performing final cleanup'
+                      END AS phase,
+        CAST(S.param3 AS oid) AS repack_index_relid,
+        S.param4 AS heap_tuples_scanned,
+        S.param5 AS heap_tuples_written,
+        S.param6 AS heap_blks_total,
+        S.param7 AS heap_blks_scanned,
+        S.param8 AS index_rebuild_count
+    FROM pg_stat_get_progress_info('REPACK') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
+
 CREATE VIEW pg_stat_progress_create_index AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
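
[Editor's aside: since <structname>pg_stat_progress_repack</structname>
mirrors <structname>pg_stat_progress_cluster</structname>, a second session
can watch a long-running REPACK with a query along these lines. This is a
sketch against the view definition above; the values reported depend on
which phase the backend is in:]

```sql
-- Poll from another session while REPACK runs; phase advances through
-- 'seq scanning heap' / 'sorting tuples' / 'writing new heap', etc.
SELECT pid, relid::regclass AS table_name, phase,
       heap_blks_scanned, heap_blks_total,
       index_rebuild_count
FROM pg_stat_progress_repack;
```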
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..af14a230f94 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,17 +67,40 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+								Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd, bool usingindex,
+							 Relation OldHeap, Relation index, bool verbose);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
-											   Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
-
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+								  MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+											  MemoryContext cluster_context,
+											  Oid relid, bool rel_is_index);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+											  Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+										ClusterParams *params);
+static Oid	determine_clustered_index(Relation rel, bool usingindex,
+									  const char *indexname);
+
+
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "CLUSTER";
+	}
+	return "???";
+}
 
 /*---------------------------------------------------------------------------
  * This cluster code allows for clustering multiple tables at once. Because
@@ -104,101 +127,39 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
  *---------------------------------------------------------------------------
  */
 void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
 	ClusterParams params = {0};
 	bool		verbose = false;
 	Relation	rel = NULL;
-	Oid			indexOid = InvalidOid;
-	MemoryContext cluster_context;
+	MemoryContext repack_context;
 	List	   *rtcs;
 
 	/* Parse option list */
-	foreach(lc, stmt->params)
+	foreach_node(DefElem, opt, stmt->params)
 	{
-		DefElem    *opt = (DefElem *) lfirst(lc);
-
 		if (strcmp(opt->defname, "verbose") == 0)
 			verbose = defGetBoolean(opt);
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("unrecognized CLUSTER option \"%s\"",
+					 errmsg("unrecognized %s option \"%s\"",
+							RepackCommandAsString(stmt->command),
 							opt->defname),
 					 parser_errposition(pstate, opt->location)));
 	}
 
 	params.options = (verbose ? CLUOPT_VERBOSE : 0);
 
+	/*
+	 * If a single relation is specified, process it and we're done ... unless
+	 * the relation is a partitioned table, in which case we fall through.
+	 */
 	if (stmt->relation != NULL)
 	{
-		/* This is the single-relation case. */
-		Oid			tableOid;
-
-		/*
-		 * Find, lock, and check permissions on the table.  We obtain
-		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-		 * single-transaction case.
-		 */
-		tableOid = RangeVarGetRelidExtended(stmt->relation,
-											AccessExclusiveLock,
-											0,
-											RangeVarCallbackMaintainsTable,
-											NULL);
-		rel = table_open(tableOid, NoLock);
-
-		/*
-		 * Reject clustering a remote temp table ... their local buffer
-		 * manager is not going to cope.
-		 */
-		if (RELATION_IS_OTHER_TEMP(rel))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster temporary tables of other sessions")));
-
-		if (stmt->indexname == NULL)
-		{
-			ListCell   *index;
-
-			/* We need to find the index that has indisclustered set. */
-			foreach(index, RelationGetIndexList(rel))
-			{
-				indexOid = lfirst_oid(index);
-				if (get_index_isclustered(indexOid))
-					break;
-				indexOid = InvalidOid;
-			}
-
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("there is no previously clustered index for table \"%s\"",
-								stmt->relation->relname)));
-		}
-		else
-		{
-			/*
-			 * The index is expected to be in the same namespace as the
-			 * relation.
-			 */
-			indexOid = get_relname_relid(stmt->indexname,
-										 rel->rd_rel->relnamespace);
-			if (!OidIsValid(indexOid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_OBJECT),
-						 errmsg("index \"%s\" for table \"%s\" does not exist",
-								stmt->indexname, stmt->relation->relname)));
-		}
-
-		/* For non-partitioned tables, do what we came here to do. */
-		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			cluster_rel(rel, indexOid, &params);
-			/* cluster_rel closes the relation, but keeps lock */
-
+		rel = process_single_relation(stmt, &params);
+		if (rel == NULL)
 			return;
-		}
 	}
 
 	/*
@@ -207,88 +168,103 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	 * transaction.  This forces us to disallow running inside a user
 	 * transaction block.
 	 */
-	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
 	/* Also, we need a memory context to hold our list of relations */
-	cluster_context = AllocSetContextCreate(PortalContext,
-											"Cluster",
-											ALLOCSET_DEFAULT_SIZES);
+	repack_context = AllocSetContextCreate(PortalContext,
+										   "Repack",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	params.options |= CLUOPT_RECHECK;
 
 	/*
-	 * Either we're processing a partitioned table, or we were not given any
-	 * table name at all.  In either case, obtain a list of relations to
-	 * process.
-	 *
-	 * In the former case, an index name must have been given, so we don't
-	 * need to recheck its "indisclustered" bit, but we have to check that it
-	 * is an index that we can cluster on.  In the latter case, we set the
-	 * option bit to have indisclustered verified.
-	 *
-	 * Rechecking the relation itself is necessary here in all cases.
+	 * If we don't have a relation yet, determine a relation list.  If we do,
+	 * then it must be a partitioned table, and we want to process its
+	 * partitions.
 	 */
-	params.options |= CLUOPT_RECHECK;
-	if (rel != NULL)
+	if (rel == NULL)
 	{
-		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, AccessShareLock);
-		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-
-		/* close relation, releasing lock on parent table */
-		table_close(rel, AccessExclusiveLock);
+		Assert(stmt->indexname == NULL);
+		rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+									repack_context);
 	}
 	else
 	{
-		rtcs = get_tables_to_cluster(cluster_context);
-		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
-	}
+		Oid			relid;
+		bool		rel_is_index;
 
-	/* Do the job. */
-	cluster_multiple_rels(rtcs, &params);
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 
-	/* Start a new transaction for the cleanup work. */
-	StartTransactionCommand();
+		/*
+		 * If an index name was specified, resolve it now and pass it down.
+		 */
+		if (stmt->usingindex)
+		{
+			/*
+			 * XXX how should this behave?  Passing no index to a partitioned
+			 * table could be useful to have certain partitions clustered by
+			 * some index, and other partitions by a different index.
+			 */
+			if (!stmt->indexname)
+				ereport(ERROR,
+						errmsg("there is no previously clustered index for table \"%s\"",
+							   RelationGetRelationName(rel)));
 
-	/* Clean up working storage */
-	MemoryContextDelete(cluster_context);
-}
+			relid = determine_clustered_index(rel, true, stmt->indexname);
+			/* XXX is this the right place for this? */
+			check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+			rel_is_index = true;
+		}
+		else
+		{
+			relid = RelationGetRelid(rel);
+			rel_is_index = false;
+		}
 
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
-	ListCell   *lc;
+		rtcs = get_tables_to_repack_partitioned(stmt->command, repack_context,
+												relid, rel_is_index);
+
+		/* close parent relation, releasing lock on it */
+		table_close(rel, AccessExclusiveLock);
+		rel = NULL;
+	}
 
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
 	CommitTransactionCommand();
 
 	/* Cluster the tables, each in a separate transaction */
-	foreach(lc, rtcs)
+	Assert(rel == NULL);
+	foreach_ptr(RelToCluster, rtc, rtcs)
 	{
-		RelToCluster *rtc = (RelToCluster *) lfirst(lc);
-		Relation	rel;
-
 		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
-		rel = table_open(rtc->tableOid, AccessExclusiveLock);
+		/*
+		 * Open the target table, coping with the case where it has been
+		 * dropped.
+		 */
+		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		if (rel == NULL)
+			continue;
 
 		/* Process this table */
-		cluster_rel(rel, rtc->indexOid, params);
+		cluster_rel(stmt->command, stmt->usingindex,
+					rel, rtc->indexOid, &params);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
+
+	/* Clean up working storage */
+	MemoryContextDelete(repack_context);
 }
 
 /*
@@ -304,11 +280,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
  * them incrementally while we load the table.
  *
  * If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order.  This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
  */
 void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, bool usingindex,
+			Relation OldHeap, Oid indexOid, ClusterParams *params)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			save_userid;
@@ -323,13 +302,25 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
+	else
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
+	if (cmd == REPACK_COMMAND_REPACK)
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+									 PROGRESS_REPACK_COMMAND_REPACK);
+	else if (cmd == REPACK_COMMAND_CLUSTER)
+	{
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	}
 	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+	{
+		Assert(cmd == REPACK_COMMAND_VACUUMFULL);
+		pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
 									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	}
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -351,63 +342,21 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 * to cluster a not-previously-clustered index.
 	 */
 	if (recheck)
-	{
-		/* Check that the user still has privileges for the relation */
-		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
-			goto out;
-		}
-
-		/*
-		 * Silently skip a temp table for a remote session.  Only doing this
-		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER or on a partitioned
-		 * table), because there is another check in cluster() which will stop
-		 * any attempt to cluster remote temp tables by name.  There is
-		 * another check in cluster_rel which is redundant, but we leave it
-		 * for extra safety.
-		 */
-		if (RELATION_IS_OTHER_TEMP(OldHeap))
-		{
-			relation_close(OldHeap, AccessExclusiveLock);
+		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+								 params->options))
 			goto out;
-		}
-
-		if (OidIsValid(indexOid))
-		{
-			/*
-			 * Check that the index still exists
-			 */
-			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-
-			/*
-			 * Check that the index is still the one with indisclustered set,
-			 * if needed.
-			 */
-			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
-				!get_index_isclustered(indexOid))
-			{
-				relation_close(OldHeap, AccessExclusiveLock);
-				goto out;
-			}
-		}
-	}
 
 	/*
-	 * We allow VACUUM FULL, but not CLUSTER, on shared catalogs.  CLUSTER
-	 * would work in most respects, but the index would only get marked as
-	 * indisclustered in the current database, leading to unexpected behavior
-	 * if CLUSTER were later invoked in another database.
+	 * We allow repacking shared catalogs only when not using an index.
+	 * Using an index would work in most respects, but the index would only
+	 * get marked as indisclustered in the current database, leading to
+	 * unexpected behavior if CLUSTER were later invoked in another database.
 	 */
-	if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
+	if (usingindex && OldHeap->rd_rel->relisshared)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot cluster a shared catalog")));
+				 errmsg("cannot run \"%s\" on a shared catalog",
+						RepackCommandAsString(cmd))));
 
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
@@ -415,21 +364,30 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		if (OidIsValid(indexOid))
+		if (cmd == REPACK_COMMAND_CLUSTER)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
+		else if (cmd == REPACK_COMMAND_REPACK)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot repack temporary tables of other sessions")));
+		}
 		else
+		{
+			Assert(cmd == REPACK_COMMAND_VACUUMFULL);
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot vacuum temporary tables of other sessions")));
+		}
 	}
 
 	/*
 	 * Also check for active uses of the relation in the current transaction,
 	 * including open scans and pending AFTER trigger events.
 	 */
-	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+	CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
@@ -469,7 +427,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, index, verbose);
+	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -482,6 +440,63 @@ out:
 	pgstat_progress_end_command();
 }
 
+/*
+ * Recheck whether the table (and its index, if any) still satisfies the
+ * requirements of cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+					Oid userid, int options)
+{
+	Oid			tableOid = RelationGetRelid(OldHeap);
+
+	/* Check that the user still has privileges for the relation */
+	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	/*
+	 * Silently skip a temp table for a remote session.  Only doing this check
+	 * in the "recheck" case is appropriate (which currently means somebody is
+	 * executing a database-wide REPACK/CLUSTER or one on a partitioned table),
+	 * because there is another check in ExecRepack() which will stop any
+	 * attempt to repack remote temp tables by name.  There is another check
+	 * in cluster_rel which is redundant, but we leave it for extra safety.
+	 */
+	if (RELATION_IS_OTHER_TEMP(OldHeap))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		return false;
+	}
+
+	if (OidIsValid(indexOid))
+	{
+		/*
+		 * Check that the index still exists
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+
+		/*
+		 * Check that the index is still the one with indisclustered set, if
+		 * needed.
+		 */
+		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+			!get_index_isclustered(indexOid))
+		{
+			relation_close(OldHeap, AccessExclusiveLock);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Verify that the specified heap and index are valid to cluster on
  *
@@ -626,7 +641,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  * On exit, they are closed, but locks on them are not released.
  */
 static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd, bool usingindex,
+				 Relation OldHeap, Relation index, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +658,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
 	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
 		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
 
-	if (index)
-		/* Mark the correct index as clustered */
+	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+	if (usingindex)
 		mark_index_clustered(OldHeap, RelationGetRelid(index), true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1458,8 +1474,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	int			i;
 
 	/* Report that we are now swapping relation files */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1525,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
 	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
 
 	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
 
 	/* Report that we are now doing clean up */
-	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
-								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
 
 	/*
 	 * If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,69 +1648,137 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	}
 }
 
-
 /*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set.  Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process when REPACK/CLUSTER is called
+ * without specifying a table name.  The exact process depends on whether
+ * USING INDEX was given or not; in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index for indexes that have
+ * indisclustered set; if it was not given, we scan pg_class and return all
+ * plain tables and materialized views.
+ *
+ * The result is a list of RelToCluster allocated in the given memory context.
  */
 static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand command, bool usingindex,
+					 MemoryContext permcxt)
 {
-	Relation	indRelation;
+	Relation	catalog;
 	TableScanDesc scan;
-	ScanKeyData entry;
-	HeapTuple	indexTuple;
-	Form_pg_index index;
+	HeapTuple	tuple;
 	MemoryContext old_context;
 	List	   *rtcs = NIL;
 
-	/*
-	 * Get all indexes that have indisclustered set and that the current user
-	 * has the appropriate privileges for.
-	 */
-	indRelation = table_open(IndexRelationId, AccessShareLock);
-	ScanKeyInit(&entry,
-				Anum_pg_index_indisclustered,
-				BTEqualStrategyNumber, F_BOOLEQ,
-				BoolGetDatum(true));
-	scan = table_beginscan_catalog(indRelation, 1, &entry);
-	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	if (usingindex)
 	{
-		RelToCluster *rtc;
+		ScanKeyData entry;
+
+		catalog = table_open(IndexRelationId, AccessShareLock);
+		ScanKeyInit(&entry,
+					Anum_pg_index_indisclustered,
+					BTEqualStrategyNumber, F_BOOLEQ,
+					BoolGetDatum(true));
+		scan = table_beginscan_catalog(catalog, 1, &entry);
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_index index;
 
-		index = (Form_pg_index) GETSTRUCT(indexTuple);
+			index = (Form_pg_index) GETSTRUCT(tuple);
 
-		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
-			continue;
+			/*
+			 * XXX I think the only reason there's no test failure here is
+			 * that we seldom have clustered indexes that would be affected by
+			 * concurrency.  Maybe we should also do the
+			 * ConditionalLockRelationOid+SearchSysCacheExists dance that we
+			 * do below.
+			 */
+			if (!cluster_is_permitted_for_relation(command, index->indrelid,
+												   GetUserId()))
+				continue;
 
-		/* Use a permanent memory context for the result list */
-		old_context = MemoryContextSwitchTo(cluster_context);
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
 
-		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = index->indrelid;
-		rtc->indexOid = index->indexrelid;
-		rtcs = lappend(rtcs, rtc);
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = index->indrelid;
+			rtc->indexOid = index->indexrelid;
+			rtcs = lappend(rtcs, rtc);
 
-		MemoryContextSwitchTo(old_context);
+			MemoryContextSwitchTo(old_context);
+		}
+	}
+	else
+	{
+		catalog = table_open(RelationRelationId, AccessShareLock);
+		scan = table_beginscan_catalog(catalog, 0, NULL);
+
+		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		{
+			RelToCluster *rtc;
+			Form_pg_class class;
+
+			class = (Form_pg_class) GETSTRUCT(tuple);
+
+			/*
+			 * Try to obtain a light lock on the table, to ensure it doesn't
+			 * go away while we collect the list.  If we cannot, just
+			 * disregard the table.  XXX we could release at the bottom of the
+			 * loop, but for now just hold it until this transaction is
+			 * finished.
+			 */
+			if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+				continue;
+
+			/* Verify that the table still exists. */
+			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+			{
+				/* Release useless lock */
+				UnlockRelationOid(class->oid, AccessShareLock);
+				continue;
+			}
+
+			/* Can only process plain tables and matviews */
+			if (class->relkind != RELKIND_RELATION &&
+				class->relkind != RELKIND_MATVIEW)
+				continue;
+
+			if (!cluster_is_permitted_for_relation(command, class->oid,
+												   GetUserId()))
+				continue;
+
+			/* Use a permanent memory context for the result list */
+			old_context = MemoryContextSwitchTo(permcxt);
+
+			rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+			rtc->tableOid = class->oid;
+			rtc->indexOid = InvalidOid;
+			rtcs = lappend(rtcs, rtc);
+
+			MemoryContextSwitchTo(old_context);
+		}
 	}
-	table_endscan(scan);
 
-	relation_close(indRelation, AccessShareLock);
+	table_endscan(scan);
+	relation_close(catalog, AccessShareLock);
 
 	return rtcs;
 }
 
 /*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
- * all the children leaves tables/indexes.
+ * all the leaf tables/indexes.
  *
  * Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
  * on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
  */
 static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
+								 Oid relid, bool rel_is_index)
 {
 	List	   *inhoids;
 	ListCell   *lc;
@@ -1702,17 +1786,33 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 	MemoryContext old_context;
 
 	/* Do not lock the children until they're processed */
-	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+	inhoids = find_all_inheritors(relid, NoLock, NULL);
 
 	foreach(lc, inhoids)
 	{
-		Oid			indexrelid = lfirst_oid(lc);
-		Oid			relid = IndexGetRelation(indexrelid, false);
+		Oid			inhoid = lfirst_oid(lc);
+		Oid			inhrelid,
+					inhindid;
 		RelToCluster *rtc;
 
-		/* consider only leaf indexes */
-		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
-			continue;
+		if (rel_is_index)
+		{
+			/* consider only leaf indexes */
+			if (get_rel_relkind(inhoid) != RELKIND_INDEX)
+				continue;
+
+			inhrelid = IndexGetRelation(inhoid, false);
+			inhindid = inhoid;
+		}
+		else
+		{
+			/* consider only leaf relations */
+			if (get_rel_relkind(inhoid) != RELKIND_RELATION)
+				continue;
+
+			inhrelid = inhoid;
+			inhindid = InvalidOid;
+		}
 
 		/*
 		 * It's possible that the user does not have privileges to CLUSTER the
@@ -1720,15 +1820,15 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 		 * table.  We skip any partitions which the user is not permitted to
 		 * CLUSTER.
 		 */
-		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+		if (!cluster_is_permitted_for_relation(cmd, inhrelid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
 		old_context = MemoryContextSwitchTo(cluster_context);
 
 		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rtc->tableOid = relid;
-		rtc->indexOid = indexrelid;
+		rtc->tableOid = inhrelid;
+		rtc->indexOid = inhindid;
 		rtcs = lappend(rtcs, rtc);
 
 		MemoryContextSwitchTo(old_context);
@@ -1742,13 +1842,134 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
  * function emits a WARNING.
  */
 static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
 {
 	if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
 		return true;
 
+	Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
 	ereport(WARNING,
-			(errmsg("permission denied to cluster \"%s\", skipping it",
-					get_rel_name(relid))));
+			errmsg("permission denied to execute %s on \"%s\", skipping it",
+				   cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",
+				   get_rel_name(relid)));
+
 	return false;
 }
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain lock on it, then determine what to do based on the relation
+ * type: if it's not a partitioned table, repack it as indicated (using an
+ * existing clustered index, or following the indicated index), and return
+ * NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened relcache entry, so that the caller can process
+ * the partitions using the multiple-table handling code.  The index name is
+ * not resolved in this case.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+	Relation	rel;
+	Oid			tableOid;
+
+	Assert(stmt->relation != NULL);
+	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+		   stmt->command == REPACK_COMMAND_REPACK);
+
+	/*
+	 * Find, lock, and check permissions on the table.  We obtain
+	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+	 * single-transaction case.
+	 */
+	tableOid = RangeVarGetRelidExtended(stmt->relation,
+										AccessExclusiveLock,
+										0,
+										RangeVarCallbackMaintainsTable,
+										NULL);
+	rel = table_open(tableOid, NoLock);
+
+	/*
+	 * Reject clustering a remote temp table ... their local buffer manager is
+	 * not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(rel))
+	{
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute %s on temporary tables of other sessions",
+					   RepackCommandAsString(stmt->command)));
+	}
+
+	/*
+	 * For partitioned tables, let caller handle this.  Otherwise, process it
+	 * here and we're done.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return rel;
+	else
+	{
+		Oid			indexOid;
+
+		indexOid = determine_clustered_index(rel, stmt->usingindex,
+											 stmt->indexname);
+		check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+		return NULL;
+	}
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the index to use
+ * for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes doesn't
+ * change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+	Oid			indexOid = InvalidOid;
+
+	if (indexname == NULL && usingindex)
+	{
+		ListCell   *lc;
+
+		/* Find an index with indisclustered set, or report error */
+		foreach(lc, RelationGetIndexList(rel))
+		{
+			indexOid = lfirst_oid(lc);
+
+			if (get_index_isclustered(indexOid))
+				break;
+			indexOid = InvalidOid;
+		}
+
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("there is no previously clustered index for table \"%s\"",
+						   RelationGetRelationName(rel)));
+	}
+	else if (indexname != NULL)
+	{
+		/*
+		 * An index name was specified; resolve it to an OID.  The index must
+		 * be in the same namespace as the relation.
+		 */
+		indexOid = get_relname_relid(indexname,
+									 rel->rd_rel->relnamespace);
+		if (!OidIsValid(indexOid))
+			ereport(ERROR,
+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("index \"%s\" for table \"%s\" does not exist",
+						   indexname, RelationGetRelationName(rel)));
+	}
+	else
+		indexOid = InvalidOid;
+
+	return indexOid;
+}
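Since RepackCommandAsString came up earlier in the thread: the corrected mapping (the last case returning "CLUSTER" rather than "VACUUM", as Robert pointed out) can be sketched standalone. The enum here mirrors the patch purely for illustration and is not lifted from the diff:

```c
#include <assert.h>
#include <string.h>

/* Mirrors the patch's RepackCommand enum, for illustration only. */
typedef enum RepackCommand
{
	REPACK_COMMAND_REPACK,
	REPACK_COMMAND_VACUUMFULL,
	REPACK_COMMAND_CLUSTER
} RepackCommand;

/*
 * Corrected per the discussion upthread: the CLUSTER case must return
 * "CLUSTER", not "VACUUM".
 */
const char *
RepackCommandAsString(RepackCommand cmd)
{
	switch (cmd)
	{
		case REPACK_COMMAND_REPACK:
			return "REPACK";
		case REPACK_COMMAND_VACUUMFULL:
			return "VACUUM";
		case REPACK_COMMAND_CLUSTER:
			return "CLUSTER";
	}
	return "???";
}
```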
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..8863ad0e8bd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2287,7 +2287,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 				cluster_params.options |= CLUOPT_VERBOSE;
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
-			cluster_rel(rel, InvalidOid, &cluster_params);
+			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
+						&cluster_params);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index db43034b9db..062235817f4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -280,7 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		AlterCompositeTypeStmt AlterUserMappingStmt
 		AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
 		AlterDefaultPrivilegesStmt DefACLAction
-		AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+		AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
 		ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -297,7 +297,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
 		ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
-		RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+		RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
 		SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
 		UnlistenStmt UpdateStmt VacuumStmt
@@ -316,7 +316,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <str>			opt_single_name
 %type <list>		opt_qualified_name
-%type <boolean>		opt_concurrently
+%type <boolean>		opt_concurrently opt_usingindex
 %type <dbehavior>	opt_drop_behavior
 %type <list>		opt_utility_option_list
 %type <list>		utility_option_list
@@ -763,7 +763,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	QUOTE QUOTES
 
 	RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
-	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
 	RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
@@ -1025,7 +1025,6 @@ stmt:
 			| CallStmt
 			| CheckPointStmt
 			| ClosePortalStmt
-			| ClusterStmt
 			| CommentStmt
 			| ConstraintsSetStmt
 			| CopyStmt
@@ -1099,6 +1098,7 @@ stmt:
 			| RemoveFuncStmt
 			| RemoveOperStmt
 			| RenameStmt
+			| RepackStmt
 			| RevokeStmt
 			| RevokeRoleStmt
 			| RuleStmt
@@ -1135,6 +1135,11 @@ opt_concurrently:
 			| /*EMPTY*/						{ $$ = false; }
 		;
 
+opt_usingindex:
+			USING INDEX						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+		;
+
 opt_drop_behavior:
 			CASCADE							{ $$ = DROP_CASCADE; }
 			| RESTRICT						{ $$ = DROP_RESTRICT; }
@@ -11912,38 +11917,80 @@ CreateConversionStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *				REPACK [ (options) ] [ <qualified_name> [ USING INDEX <index_name> ] ]
+ *
+ *			obsolete variants:
  *				CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
  *				CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
  *
  *****************************************************************************/
 
-ClusterStmt:
-			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+			REPACK opt_utility_option_list qualified_name USING INDEX name
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = $6;
+					n->usingindex = true;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_utility_option_list qualified_name opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = $3;
+					n->indexname = NULL;
+					n->usingindex = $4;
+					n->params = $2;
+					$$ = (Node *) n;
+				}
+			| REPACK opt_usingindex
+				{
+					RepackStmt *n = makeNode(RepackStmt);
+
+					n->command = REPACK_COMMAND_REPACK;
+					n->relation = NULL;
+					n->indexname = NULL;
+					n->usingindex = $2;
+					n->params = NIL;
+					$$ = (Node *) n;
+				}
+			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $6;
+					n->usingindex = true;
 					n->params = $3;
 					$$ = (Node *) n;
 				}
 			| CLUSTER opt_utility_option_list
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = $2;
 					$$ = (Node *) n;
 				}
 			/* unparenthesized VERBOSE kept for pre-14 compatibility */
 			| CLUSTER opt_verbose qualified_name cluster_index_specification
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $3;
 					n->indexname = $4;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -11951,20 +11998,24 @@ ClusterStmt:
 			/* unparenthesized VERBOSE kept for pre-17 compatibility */
 			| CLUSTER VERBOSE
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = NULL;
 					n->indexname = NULL;
+					n->usingindex = true;
 					n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
 				}
 			/* kept for pre-8.3 compatibility */
 			| CLUSTER opt_verbose name ON qualified_name
 				{
-					ClusterStmt *n = makeNode(ClusterStmt);
+					RepackStmt *n = makeNode(RepackStmt);
 
+					n->command = REPACK_COMMAND_CLUSTER;
 					n->relation = $5;
 					n->indexname = $3;
+					n->usingindex = true;
 					if ($2)
 						n->params = list_make1(makeDefElem("verbose", NULL, @2));
 					$$ = (Node *) n;
@@ -17960,6 +18011,7 @@ unreserved_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
@@ -18592,6 +18644,7 @@ bare_label_keyword:
 			| RELATIVE_P
 			| RELEASE
 			| RENAME
+			| REPACK
 			| REPEATABLE
 			| REPLACE
 			| REPLICA
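For reviewers, the RepackStmt productions above accept statement forms like the following (a sketch; table and index names are hypothetical, and no option spellings beyond VERBOSE are implied):

```sql
REPACK;                                    -- all eligible tables, physical order
REPACK USING INDEX;                        -- all tables with indisclustered set
REPACK mytab;                              -- one table, physical order
REPACK mytab USING INDEX;                  -- one table, its clustered index
REPACK mytab USING INDEX myidx;            -- one table, a named index
REPACK (VERBOSE) mytab USING INDEX myidx;  -- with a parenthesized option list
CLUSTER mytab USING myidx;                 -- obsolete spelling, still accepted
```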
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 4f4191b0ea6..8a30e081614 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 				return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
 			}
 
-		case T_ClusterStmt:
 		case T_ReindexStmt:
 		case T_VacuumStmt:
+		case T_RepackStmt:
 			{
 				/*
 				 * These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
 			break;
 
-		case T_ClusterStmt:
-			cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
-			break;
-
 		case T_VacuumStmt:
 			ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
 			break;
 
+		case T_RepackStmt:
+			ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+			break;
+
 		case T_ExplainStmt:
 			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
 			break;
@@ -2850,10 +2850,6 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_CALL;
 			break;
 
-		case T_ClusterStmt:
-			tag = CMDTAG_CLUSTER;
-			break;
-
 		case T_VacuumStmt:
 			if (((VacuumStmt *) parsetree)->is_vacuumcmd)
 				tag = CMDTAG_VACUUM;
@@ -2861,6 +2857,10 @@ CreateCommandTag(Node *parsetree)
 				tag = CMDTAG_ANALYZE;
 			break;
 
+		case T_RepackStmt:
+			tag = CMDTAG_REPACK;
+			break;
+
 		case T_ExplainStmt:
 			tag = CMDTAG_EXPLAIN;
 			break;
@@ -3498,7 +3498,7 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_ALL;
 			break;
 
-		case T_ClusterStmt:
+		case T_RepackStmt:
 			lev = LOGSTMT_DDL;
 			break;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..a1e10e8c2f6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -268,6 +268,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
+	else if (pg_strcasecmp(cmd, "REPACK") == 0)
+		cmdtype = PROGRESS_COMMAND_REPACK;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
 		cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
 	else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
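Assuming PROGRESS_COMMAND_REPACK is reported the same way as the other progress commands, the raw counters should be inspectable through the existing pg_stat_get_progress_info() function even before any dedicated view is in place. This is illustrative only; the param mapping follows the PROGRESS_REPACK_* defines in the patch:

```sql
SELECT pid, datid, relid,
       param1 AS command,   -- PROGRESS_REPACK_COMMAND
       param2 AS phase      -- PROGRESS_REPACK_PHASE
FROM pg_stat_get_progress_info('REPACK');
```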
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1f2ca946fc5..31c1ce5dc09 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1247,7 +1247,7 @@ static const char *const sql_commands[] = {
 	"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
 	"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
 	"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
-	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+	"REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
 	"RESET", "REVOKE", "ROLLBACK",
 	"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
 	"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -4997,6 +4997,37 @@ match_previous_words(int pattern_id,
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
 
+/* REPACK */
+	else if (Matches("REPACK"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	else if (Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+	/* If we have REPACK <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(")))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAny))
+		COMPLETE_WITH("USING INDEX");
+	/* If we have REPACK <sth> USING, then add the index as well */
+	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+	{
+		set_completion_reference(prev3_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+	}
+	else if (HeadMatches("REPACK", "(*") &&
+			 !HeadMatches("REPACK", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+		else if (TailMatches("VERBOSE"))
+			COMPLETE_WITH("ON", "OFF");
+	}
+
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
 		COMPLETE_WITH("LABEL");
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..7f4138c7b36 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -31,8 +31,11 @@ typedef struct ClusterParams
 	bits32		options;		/* bitmask of CLUOPT_* */
 } ClusterParams;
 
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, bool usingindex,
+						Relation OldHeap, Oid indexOid, ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..5b6639c114c 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,24 +56,51 @@
 #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
 #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
 
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND				0
-#define PROGRESS_CLUSTER_PHASE					1
-#define PROGRESS_CLUSTER_INDEX_RELID			2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED	3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN	4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
-
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP	2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES		3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX	6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP	7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Note: Since REPACK shares some code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND					0
+#define PROGRESS_REPACK_PHASE					1
+#define PROGRESS_REPACK_INDEX_RELID				2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP		1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+
+/*
+ * Commands of PROGRESS_REPACK
+ *
+ * Currently we only have one command, so the PROGRESS_REPACK_COMMAND
+ * parameter is not necessary. However, it makes cluster.c simpler if we have
+ * the same set of parameters for CLUSTER and REPACK - see the note on REPACK
+ * parameters above.
+ */
+#define PROGRESS_REPACK_COMMAND_REPACK			1
+
+/*
+ * Progress parameters for cluster.
+ *
+ * Although we need to report REPACK and CLUSTER in separate views, the
+ * parameters and phases of CLUSTER are a subset of those of REPACK. Therefore
+ * we just use the appropriate values defined for REPACK above instead of
+ * defining a separate set of constants here.
+ */
 
 /* Commands of PROGRESS_CLUSTER */
 #define PROGRESS_CLUSTER_COMMAND_CLUSTER		1
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 86a236bd58b..fcc25a0c592 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3949,16 +3949,26 @@ typedef struct AlterSystemStmt
 } AlterSystemStmt;
 
 /* ----------------------
- *		Cluster Statement (support pbrown's cluster index implementation)
+ *		Repack Statement
  * ----------------------
  */
-typedef struct ClusterStmt
+typedef enum RepackCommand
+{
+	REPACK_COMMAND_CLUSTER,
+	REPACK_COMMAND_REPACK,
+	REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
 {
 	NodeTag		type;
-	RangeVar   *relation;		/* relation being indexed, or NULL if all */
-	char	   *indexname;		/* original index defined */
+	RepackCommand command;		/* type of command being run */
+	RangeVar   *relation;		/* relation being repacked */
+	char	   *indexname;		/* order tuples by this index */
+	bool		usingindex;		/* whether USING INDEX is specified */
 	List	   *params;			/* list of DefElem nodes */
-} ClusterStmt;
+} RepackStmt;
+
 
 /* ----------------------
  *		Vacuum and Analyze Statements
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index a4af3f717a1..22559369e2c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -374,6 +374,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
 PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
 PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
 	PROGRESS_COMMAND_CREATE_INDEX,
 	PROGRESS_COMMAND_BASEBACKUP,
 	PROGRESS_COMMAND_COPY,
+	PROGRESS_COMMAND_REPACK,
 } ProgressCommandType;
 
 #define PGSTAT_NUM_PROGRESS_PARAM	20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..5256628b51d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -254,6 +254,63 @@ ORDER BY 1;
  clstr_tst_pkey
 (3 rows)
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a  |  b  |        c         |           substring            | length 
+----+-----+------------------+--------------------------------+--------
+ 10 |  14 | catorce          |                                |       
+ 18 |   5 | cinco            |                                |       
+  9 |   4 | cuatro           |                                |       
+ 26 |  19 | diecinueve       |                                |       
+ 12 |  18 | dieciocho        |                                |       
+ 30 |  16 | dieciseis        |                                |       
+ 24 |  17 | diecisiete       |                                |       
+  2 |  10 | diez             |                                |       
+ 23 |  12 | doce             |                                |       
+ 11 |   2 | dos              |                                |       
+ 25 |   9 | nueve            |                                |       
+ 31 |   8 | ocho             |                                |       
+  1 |  11 | once             |                                |       
+ 28 |  15 | quince           |                                |       
+ 32 |   6 | seis             | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 |   7 | siete            |                                |       
+ 15 |  13 | trece            |                                |       
+ 22 |  30 | treinta          |                                |       
+ 17 |  32 | treinta y dos    |                                |       
+  3 |  31 | treinta y uno    |                                |       
+  5 |   3 | tres             |                                |       
+ 20 |   1 | uno              |                                |       
+  6 |  20 | veinte           |                                |       
+ 14 |  25 | veinticinco      |                                |       
+ 21 |  24 | veinticuatro     |                                |       
+  4 |  22 | veintidos        |                                |       
+ 19 |  29 | veintinueve      |                                |       
+ 16 |  28 | veintiocho       |                                |       
+ 27 |  26 | veintiseis       |                                |       
+ 13 |  27 | veintisiete      |                                |       
+  7 |  23 | veintitres       |                                |       
+  8 |  21 | veintiuno        |                                |       
+  0 | 100 | in child table   |                                |       
+  0 | 100 | in child table 2 |                                |       
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR:  insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL:  Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+       conname        
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
 FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -381,6 +438,35 @@ SELECT * FROM clstr_1;
  2
 (2 rows)
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because there is nothing like clustering index here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname 
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
+SET SESSION AUTHORIZATION regress_clstr_user;
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 CREATE TABLE clustertest (key int PRIMARY KEY);
@@ -495,6 +581,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
 DROP TABLE clstrpart;
 -- Ownership of partitions is checked
 CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +636,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
   JOIN pg_class AS c ON c.oid=tree.relid;
 SET SESSION AUTHORIZATION regress_ptnowner;
 CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING:  permission denied to cluster "ptnowner2", skipping it
+WARNING:  permission denied to execute CLUSTER on "ptnowner2", skipping it
 RESET SESSION AUTHORIZATION;
 SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
   JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..3a1d1d28282 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,6 +2071,29 @@ pg_stat_progress_create_index| SELECT s.pid,
     s.param15 AS partitions_done
    FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param2
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'seq scanning heap'::text
+            WHEN 2 THEN 'index scanning heap'::text
+            WHEN 3 THEN 'sorting tuples'::text
+            WHEN 4 THEN 'writing new heap'::text
+            WHEN 5 THEN 'swapping relation files'::text
+            WHEN 6 THEN 'rebuilding index'::text
+            WHEN 7 THEN 'performing final cleanup'::text
+            ELSE NULL::text
+        END AS phase,
+    (s.param3)::oid AS repack_index_relid,
+    s.param4 AS heap_tuples_scanned,
+    s.param5 AS heap_tuples_written,
+    s.param6 AS heap_blks_total,
+    s.param7 AS heap_blks_scanned,
+    s.param8 AS index_rebuild_count
+   FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..cfcc3dc9761 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,6 +76,19 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
 SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
 ORDER BY 1;
 
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
 
 SELECT relname, relkind,
     EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
@@ -159,6 +172,34 @@ INSERT INTO clstr_1 VALUES (1);
 CLUSTER clstr_1;
 SELECT * FROM clstr_1;
 
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR;  -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because there is nothing like clustering index here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+
 -- Test MVCC-safety of cluster. There isn't much we can do to verify the
 -- results with a single backend...
 
@@ -229,6 +270,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
 CLUSTER clstrpart;
 ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
 DROP TABLE clstrpart;
 
 -- Ownership of partitions is checked
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..24f98ed1e9e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -425,6 +425,7 @@ ClientCertName
 ClientConnectionInfo
 ClientData
 ClientSocket
+ClusterCommand
 ClonePtrType
 ClosePortalStmt
 ClosePtrType
@@ -2536,6 +2537,8 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackCommand
+RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
-- 
2.47.1

v18-0002-Refactor-index_concurrently_create_copy-for-use-with.patchtext/x-diffDownload
From 71d825ef5a61236a6ff288da3251379580e6d368 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:31:34 +0200
Subject: [PATCH 2/4] Refactor index_concurrently_create_copy() for use with
 REPACK (CONCURRENTLY).

This patch moves the code to index_create_copy() and adds a "concurrently"
parameter so it can be used by REPACK (CONCURRENTLY).

With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of indexes
(including system catalog entries) *before* the actual swap, to reduce the
time AccessExclusiveLock needs to be held for.
---
 src/backend/catalog/index.c | 36 ++++++++++++++++++++++++++++--------
 src/include/catalog/index.h |  3 +++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3063abff9a5..ff34b1b4269 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1290,15 +1290,31 @@ index_create(Relation heapRelation,
 /*
  * index_concurrently_create_copy
  *
- * Create concurrently an index based on the definition of the one provided by
- * caller.  The index is inserted into catalogs and needs to be built later
- * on.  This is called during concurrent reindex processing.
- *
- * "tablespaceOid" is the tablespace to use for this index.
+ * Variant of index_create_copy(), called during concurrent reindex
+ * processing.
  */
 Oid
 index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							   Oid tablespaceOid, const char *newName)
+{
+	return index_create_copy(heapRelation, oldIndexId, tablespaceOid, newName,
+							 true);
+}
+
+/*
+ * index_create_copy
+ *
+ * Create an index based on the definition of the one provided by caller.  The
+ * index is inserted into catalogs and needs to be built later on.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ *
+ * The actual implementation of index_concurrently_create_copy(), reusable for
+ * other purposes.
+ */
+Oid
+index_create_copy(Relation heapRelation, Oid oldIndexId, Oid tablespaceOid,
+				  const char *newName, bool concurrently)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1317,6 +1333,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	List	   *indexColNames = NIL;
 	List	   *indexExprs = NIL;
 	List	   *indexPreds = NIL;
+	int		flags = 0;
 
 	indexRelation = index_open(oldIndexId, RowExclusiveLock);
 
@@ -1325,9 +1342,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 
 	/*
 	 * Concurrent build of an index with exclusion constraints is not
-	 * supported.
+	 * supported. If !concurrently, ii_ExclusionOps is currently not needed.
 	 */
-	if (oldInfo->ii_ExclusionOps != NULL)
+	if (oldInfo->ii_ExclusionOps != NULL && concurrently)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("concurrent index creation for exclusion constraints is not supported")));
@@ -1435,6 +1452,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 		stattargets[i].isnull = isnull;
 	}
 
+	if (concurrently)
+		flags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+
 	/*
 	 * Now create the new index.
 	 *
@@ -1458,7 +1478,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							  indcoloptions->values,
 							  stattargets,
 							  reloptionsDatum,
-							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT,
+							  flags,
 							  0,
 							  true, /* allow table to be a system catalog? */
 							  false,	/* is_internal? */
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 4daa8bef5ee..fc3e42e5f70 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -99,6 +99,9 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid oldIndexId,
 										   Oid tablespaceOid,
 										   const char *newName);
+extern Oid index_create_copy(Relation heapRelation, Oid oldIndexId,
+							 Oid tablespaceOid, const char *newName,
+							 bool concurrently);
 
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
-- 
2.47.1

v18-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patchtext/x-diffDownload
From be9c9691330bf71d3d5a3a895c072959d8302270 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 15:23:05 +0200
Subject: [PATCH 3/4] Move conversion of a "historic" to MVCC snapshot to a
 separate function.

The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
---
 src/backend/replication/logical/snapbuild.c | 51 +++++++++++++++++----
 src/backend/utils/time/snapmgr.c            |  3 +-
 src/include/replication/snapbuild.h         |  1 +
 src/include/utils/snapmgr.h                 |  1 +
 4 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8532bfd27e5..6eaa40f6acf 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
-	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -485,6 +482,31 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
+	/* Convert the historic snapshot to MVCC snapshot. */
+	return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike a regular (non-historic) MVCC snapshot, the xip array of this
+ * snapshot contains not only running main transactions, but also their
+ * subtransactions. This difference has no impact on XidInMVCCSnapshot().
+ *
+ * Pass true for 'in_place' if you don't care about modifying the source
+ * snapshot. If you need a new instance, and one that was allocated as a
+ * single chunk of memory, pass false.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+	TransactionId xid;
+	TransactionId *oldxip = snapshot->xip;
+	uint32		oldxcnt = snapshot->xcnt;
+	TransactionId *newxip;
+	int			newxcnt = 0;
+	Snapshot	result;
+
 	/* allocate in transaction context */
 	newxip = (TransactionId *)
 		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
@@ -495,7 +517,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
 	{
 		void	   *test;
 
@@ -503,7 +525,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -520,11 +542,22 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snapshot->xcnt = newxcnt;
+	snapshot->xip = newxip;
+
+	if (in_place)
+		result = snapshot;
+	else
+	{
+		result = CopySnapshot(snapshot);
+
+		/* Restore the original values so the source is intact. */
+		snapshot->xip = oldxip;
+		snapshot->xcnt = oldxcnt;
+	}
+	result->snapshot_type = SNAPSHOT_MVCC;
 
-	return snap;
+	return result;
 }
 
 /*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index ea35f30f494..70a6b8902d1 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -212,7 +212,6 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -591,7 +590,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
+Snapshot
 CopySnapshot(Snapshot snapshot)
 {
 	Snapshot	newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..6d4d2d1814c 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..147b190210a 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -60,6 +60,7 @@ extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
+extern Snapshot CopySnapshot(Snapshot snapshot);
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
-- 
2.47.1

v18-0004-Add-CONCURRENTLY-option-to-REPACK-command.patchtext/plainDownload
From 1f3f0c762b63a7d92b782fcbbdd1e977954c3406 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Mon, 11 Aug 2025 16:12:51 +0200
Subject: [PATCH 4/4] Add CONCURRENTLY option to REPACK command.

The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock at the beginning and keep it until the end of the processing.)

This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(Supposedly, the swapping should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.

First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.

Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)

Since the logical decoding system, during its startup, waits until all the
transactions which already have XID assigned have finished, there is a risk of
deadlock if a transaction that already changed anything in the database tries
to acquire a conflicting lock on the table REPACK CONCURRENTLY is working
on. As an example, consider a transaction running the CREATE INDEX command on
the table that is being REPACKed CONCURRENTLY. On the other hand, DML commands
(INSERT, UPDATE, DELETE) are not a problem as their lock does not conflict
with REPACK CONCURRENTLY.
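
For illustration, a sketch of the problematic sequence (table, column and index
names here are hypothetical):

```sql
-- Session 1: a transaction that already has an XID assigned.
BEGIN;
INSERT INTO other_table VALUES (1);

-- Session 2: REPACK CONCURRENTLY sets up logical decoding, which waits
-- for session 1's transaction to finish.
REPACK CONCURRENTLY some_table;

-- Session 1: requests a lock that conflicts with REPACK CONCURRENTLY,
-- so the two sessions may now deadlock.
CREATE INDEX some_idx ON some_table (val);
```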

The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before the logical decoding is set up and lock
it again afterwards. Such temporary unlocking would imply re-checking if the
table still meets all the requirements for REPACK CONCURRENTLY.

Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires extra space for the new relation and index files
(which coexist with the old files for some time). In addition, the
CONCURRENTLY option might introduce a lag in releasing WAL segments for
archiving / recycling. This is due to the decoding of the data changes done by
applications concurrently. When copying the table contents into the new file,
we check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table contents has been copied.) A
background worker might be a better approach for the decoding - let's consider
implementing it in the future.
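
As a side note, the WAL retained for the decoding can be observed through the
replication slot's restart_lsn, e.g.:

```sql
-- Slots created by REPACK CONCURRENTLY show up here while it runs.
SELECT slot_name, restart_lsn, wal_status
FROM pg_replication_slots;
```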

The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.
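
For reference, the intended usage looks like this (table and index names are
hypothetical); the CONCURRENTLY form requires wal_level = logical and a free
replication slot:

```sql
-- Holds ACCESS EXCLUSIVE for the whole run:
REPACK mytable USING INDEX mytable_pkey;

-- Holds ACCESS EXCLUSIVE only while the files are swapped:
REPACK (VERBOSE) CONCURRENTLY mytable USING INDEX mytable_pkey;
```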
---
 doc/src/sgml/monitoring.sgml                  |   37 +-
 doc/src/sgml/mvcc.sgml                        |   12 +-
 doc/src/sgml/ref/repack.sgml                  |  129 +-
 src/Makefile                                  |    1 +
 src/backend/access/heap/heapam.c              |   34 +-
 src/backend/access/heap/heapam_handler.c      |  215 ++-
 src/backend/access/heap/rewriteheap.c         |    6 +-
 src/backend/access/transam/xact.c             |   11 +-
 src/backend/catalog/system_views.sql          |   30 +-
 src/backend/commands/cluster.c                | 1677 +++++++++++++++--
 src/backend/commands/matview.c                |    2 +-
 src/backend/commands/tablecmds.c              |    1 +
 src/backend/commands/vacuum.c                 |   12 +-
 src/backend/meson.build                       |    1 +
 src/backend/replication/logical/decode.c      |   83 +
 src/backend/replication/logical/snapbuild.c   |   20 +
 .../replication/pgoutput_repack/Makefile      |   32 +
 .../replication/pgoutput_repack/meson.build   |   18 +
 .../pgoutput_repack/pgoutput_repack.c         |  288 +++
 src/backend/storage/ipc/ipci.c                |    1 +
 .../storage/lmgr/generate-lwlocknames.pl      |    2 +-
 src/backend/utils/cache/relcache.c            |    1 +
 src/backend/utils/time/snapmgr.c              |    3 +-
 src/bin/psql/tab-complete.in.c                |   25 +-
 src/include/access/heapam.h                   |    9 +-
 src/include/access/heapam_xlog.h              |    2 +
 src/include/access/tableam.h                  |   10 +
 src/include/commands/cluster.h                |   90 +-
 src/include/commands/progress.h               |   23 +-
 src/include/replication/snapbuild.h           |    1 +
 src/include/storage/lockdefs.h                |    4 +-
 src/include/utils/snapmgr.h                   |    2 +
 src/test/modules/injection_points/Makefile    |    5 +-
 .../injection_points/expected/repack.out      |  113 ++
 .../modules/injection_points/logical.conf     |    1 +
 src/test/modules/injection_points/meson.build |    4 +
 .../injection_points/specs/repack.spec        |  143 ++
 src/test/regress/expected/rules.out           |   29 +-
 src/tools/pgindent/typedefs.list              |    4 +
 39 files changed, 2820 insertions(+), 261 deletions(-)
 create mode 100644 src/backend/replication/pgoutput_repack/Makefile
 create mode 100644 src/backend/replication/pgoutput_repack/meson.build
 create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
 create mode 100644 src/test/modules/injection_points/expected/repack.out
 create mode 100644 src/test/modules/injection_points/logical.conf
 create mode 100644 src/test/modules/injection_points/specs/repack.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 12e103d319d..61c0197555f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6074,14 +6074,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>heap_tuples_written</structfield> <type>bigint</type>
+       <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of heap tuples written.
+       Number of heap tuples inserted.
        This counter only advances when the phase is
        <literal>seq scanning heap</literal>,
-       <literal>index scanning heap</literal>
-       or <literal>writing new heap</literal>.
+       <literal>index scanning heap</literal>,
+       <literal>writing new heap</literal>
+       or <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples updated.
+       This counter only advances when the phase is <literal>catch-up</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of heap tuples deleted.
+       This counter only advances when the phase is <literal>catch-up</literal>.
       </para></entry>
      </row>
 
@@ -6162,6 +6183,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <command>REPACK</command> is currently writing the new heap.
      </entry>
     </row>
+    <row>
+     <entry><literal>catch-up</literal></entry>
+     <entry>
+       <command>REPACK CONCURRENTLY</command> is currently processing the DML
+       commands that other transactions executed during any of the preceding
+       phase.
+     </entry>
+    </row>
     <row>
      <entry><literal>swapping relation files</literal></entry>
      <entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
    <title>Caveats</title>
 
    <para>
-    Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
-    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+    Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+    table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+    TABLE</command></link> and <command>REPACK</command> with
+    the <literal>CONCURRENTLY</literal> option, are not
     MVCC-safe.  This means that after the truncation or rewrite commits, the
     table will appear empty to concurrent transactions, if they are using a
-    snapshot taken before the DDL command committed.  This will only be an
+    snapshot taken before the command committed.  This will only be an
     issue for a transaction that did not access the table in question
-    before the DDL command started &mdash; any transaction that has done so
+    before the command started &mdash; any transaction that has done so
     would hold at least an <literal>ACCESS SHARE</literal> table lock,
-    which would block the DDL command until that transaction completes.
+    which would block the truncating or rewriting command until that transaction completes.
     So these commands will not cause any apparent inconsistency in the
     table contents for successive queries on the target table, but they
     could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index a612c72d971..9c089a6b3d7 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  <refsynopsisdiv>
 <synopsis>
 REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX <replaceable class="parameter">index_name</replaceable> ] ]
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] CONCURRENTLY <replaceable class="parameter">table_name</replaceable> [ USING INDEX <replaceable class="parameter">index_name</replaceable> ]
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
@@ -48,7 +49,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    processes every table and materialized view in the current database that
    the current user has the <literal>MAINTAIN</literal> privilege on. This
    form of <command>REPACK</command> cannot be executed inside a transaction
-   block.
+   block.  Also, this form is not allowed if
+   the <literal>CONCURRENTLY</literal> option is used.
   </para>
 
   <para>
@@ -61,7 +63,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
    When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
    is acquired on it. This prevents any other database operations (both reads
    and writes) from operating on the table until the <command>REPACK</command>
-   is finished.
+   is finished. If you want to keep the table accessible during the repacking,
+   consider using the <literal>CONCURRENTLY</literal> option.
   </para>
 
   <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -160,6 +163,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>CONCURRENTLY</literal></term>
+    <listitem>
+     <para>
+      Allow other transactions to use the table while it is being repacked.
+     </para>
+
+     <para>
+      Internally, <command>REPACK</command> copies the contents of the table
+      (ignoring dead tuples) into a new file, sorted by the specified index,
+      and also creates a new file for each index. Then it swaps the old and
+      new files for the table and all the indexes, and deletes the old
+      files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+      sure that the old files do not change during the processing because the
+      changes would get lost due to the swap.
+     </para>
+
+     <para>
+      With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+      EXCLUSIVE</literal> lock is only acquired to swap the table and index
+      files. The data changes that took place during the creation of the new
+      table and index files are captured using logical decoding
+      (<xref linkend="logicaldecoding"/>) and applied before
+      the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+      is typically held only for the time needed to swap the files, which
+      should be pretty short. However, the time might still be noticeable if
+      many data changes were made to the table while
+      <command>REPACK</command> was waiting for the lock: those changes must
+      be processed just before the files are swapped, while the
+      <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+     </para>
+
+     <para>
+      Note that <command>REPACK</command> with
+      the <literal>CONCURRENTLY</literal> option does not try to order the
+      rows inserted into the table after the repacking started. Also note
+      that <command>REPACK</command> might fail to complete due to DDL
+      commands executed on the table by other transactions during the
+      repacking.
+     </para>
+
+     <note>
+      <para>
+       In addition to the temporary space requirements explained in
+       <xref linkend="sql-repack-notes-on-resources"/>,
+       the <literal>CONCURRENTLY</literal> option can increase temporary
+       space usage somewhat. The reason is that other transactions can
+       perform DML operations which cannot be applied to the new file until
+       <command>REPACK</command> has copied all the tuples from the old
+       file. Thus the tuples inserted into the old file during the copying are
+       also stored separately in a temporary file, so they can eventually be
+       applied to the new file.
+      </para>
+
+      <para>
+       Furthermore, the data changes performed during the copying are
+       extracted from the <link linkend="wal">write-ahead log</link> (WAL), and
+       this extraction (decoding) only takes place when a certain amount of WAL
+       has been written. Therefore, WAL removal can be delayed by this
+       threshold. Currently the threshold is equal to the value of
+       the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+       configuration parameter.
+      </para>
+     </note>
+
+     <para>
+      The <literal>CONCURRENTLY</literal> option cannot be used in the
+      following cases:
+
+      <itemizedlist>
+       <listitem>
+        <para>
+          The table is <literal>UNLOGGED</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is partitioned.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The table is a system catalog or a <acronym>TOAST</acronym> table.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REPACK</command> is executed inside a transaction block.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+          The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+          configuration parameter is less than <literal>logical</literal>.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+         configuration parameter does not allow for creation of an additional
+         replication slot.
+        </para>
+       </listitem>
+      </itemizedlist>
+     </para>
+
+     <warning>
+      <para>
+       <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+       option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+       details.
+      </para>
+     </warning>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
 	interfaces \
 	backend/replication/libpqwalreceiver \
 	backend/replication/pgoutput \
+	backend/replication/pgoutput_repack \
 	fe_utils \
 	bin \
 	pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..4fdb3e880e4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
-								  bool all_visible_cleared, bool new_all_visible_cleared);
+								  bool all_visible_cleared, bool new_all_visible_cleared,
+								  bool wal_logical);
 #ifdef USE_ASSERT_CHECKING
 static void check_lock_if_inplace_updateable_rel(Relation relation,
 												 ItemPointer otid,
@@ -2769,7 +2770,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 TM_Result
 heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -3016,7 +3017,8 @@ l1:
 	 * Compute replica identity tuple before entering the critical section so
 	 * we don't PANIC upon a memory allocation failure.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+	old_key_tuple = wal_logical ?
+		ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
 
 	/*
 	 * If this is the first possibly-multixact-able operation in the current
@@ -3106,6 +3108,15 @@ l1:
 				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
 		}
 
+		/*
+		 * Unlike UPDATE, DELETE is decoded even if there is no old key, so it
+		 * does not help to clear both XLH_DELETE_CONTAINS_OLD_TUPLE and
+		 * XLH_DELETE_CONTAINS_OLD_KEY. Thus we need an extra flag. TODO
+		 * Consider not decoding tuples w/o the old tuple/key instead.
+		 */
+		if (!wal_logical)
+			xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
 		XLogBeginInsert();
 		XLogRegisterData(&xlrec, SizeOfHeapDelete);
 
@@ -3198,7 +3209,8 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false /* changingPart */ );
+						 &tmfd, false, /* changingPart */
+						 true /* wal_logical */);
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3239,7 +3251,7 @@ TM_Result
 heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
 			TM_FailureData *tmfd, LockTupleMode *lockmode,
-			TU_UpdateIndexes *update_indexes)
+			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -4132,7 +4144,8 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 wal_logical);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4490,7 +4503,8 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, &lockmode, update_indexes);
+						 &tmfd, &lockmode, update_indexes,
+						 true	/* wal_logical */);
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -8831,7 +8845,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool wal_logical)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -8842,7 +8857,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
 	Page		page = BufferGetPage(newbuf);
-	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	bool		need_tuple_data = RelationIsLogicallyLogged(reln) &&
+		wal_logical;
 	bool		init;
 	int			bufflags;
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 0b03070d394..c829c06f769 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/index.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+					   true);
 }
 
 
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
-						 tmfd, lockmode, update_indexes);
+						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	/*
@@ -685,13 +687,15 @@ static void
 heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 								 Relation OldIndex, bool use_sort,
 								 TransactionId OldestXmin,
+								 Snapshot snapshot,
+								 LogicalDecodingContext *decoding_ctx,
 								 TransactionId *xid_cutoff,
 								 MultiXactId *multi_cutoff,
 								 double *num_tuples,
 								 double *tups_vacuumed,
 								 double *tups_recently_dead)
 {
-	RewriteState rwstate;
+	RewriteState rwstate = NULL;
 	IndexScanDesc indexScan;
 	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
 	BlockNumber prev_cblock = InvalidBlockNumber;
+	bool		concurrent = snapshot != NULL;
+	XLogRecPtr	end_of_wal_prev = GetFlushRecPtr(NULL);
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	values = (Datum *) palloc(natts * sizeof(Datum));
 	isnull = (bool *) palloc(natts * sizeof(bool));
 
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
-								 *multi_cutoff);
+	/*
+	 * Initialize the rewrite operation.
+	 */
+	if (!concurrent)
+		rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+									 *xid_cutoff, *multi_cutoff);
 
 
 	/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
 	 * that still need to be copied, we scan with SnapshotAny and use
 	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 *
+	 * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+	 * the snapshot passed by the caller.
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									snapshot ? snapshot : SnapshotAny,
+									NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap,
+									snapshot ? snapshot : SnapshotAny,
+									0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
@@ -785,6 +801,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		HeapTuple	tuple;
 		Buffer		buf;
 		bool		isdead;
+		HTSV_Result vis;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -837,70 +854,84 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
 		buf = hslot->buffer;
 
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+		/*
+		 * Regarding CONCURRENTLY, see the comments on MVCC snapshot above.
+		 */
+		if (!concurrent)
 		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				*tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
+			switch ((vis = HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf)))
+			{
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead */
+					isdead = true;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+					*tups_recently_dead += 1;
+					/* fall through */
+				case HEAPTUPLE_LIVE:
+					/* Live or recently dead, must copy it */
+					isdead = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
 
 				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
+				 * As long as we hold exclusive lock on the relation, normally
+				 * the only way to see this is if it was inserted earlier in
+				 * our own transaction.  However, it can happen in system
+				 * catalogs, since we tend to release write lock before commit
+				 * there. Also, there's no exclusive lock during concurrent
+				 * processing. Give a warning if neither case applies; but in
+				 * any case we had better copy it.
 				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				*tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+						elog(WARNING, "concurrent insert in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as live */
+					isdead = false;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
 
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+					/*
+					 * Similar situation to INSERT_IN_PROGRESS case.
+					 */
+					if (!is_system_catalog && !concurrent &&
+						!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+						elog(WARNING, "concurrent delete in progress within table \"%s\"",
+							 RelationGetRelationName(OldHeap));
+					/* treat as recently dead */
+					*tups_recently_dead += 1;
+					isdead = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					isdead = false; /* keep compiler quiet */
+					break;
+			}
 
-		if (isdead)
-		{
-			*tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
+			if (isdead)
 			{
-				/* A previous recently-dead tuple is now known dead */
 				*tups_vacuumed += 1;
-				*tups_recently_dead -= 1;
+				/* heap rewrite module still needs to see it... */
+				if (rewrite_heap_dead_tuple(rwstate, tuple))
+				{
+					/* A previous recently-dead tuple is now known dead */
+					*tups_vacuumed += 1;
+					*tups_recently_dead -= 1;
+				}
+
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				continue;
 			}
-			continue;
+
+			/*
+			 * In the concurrent case, we have a copy of the tuple, so we
+			 * don't worry whether the source tuple will be deleted / updated
+			 * after we release the lock.
+			 */
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 		}
 
 		*num_tuples += 1;
@@ -919,7 +950,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		{
 			const int	ct_index[] = {
 				PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
-				PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+				PROGRESS_REPACK_HEAP_TUPLES_INSERTED
 			};
 			int64		ct_val[2];
 
@@ -934,6 +965,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			ct_val[1] = *num_tuples;
 			pgstat_progress_update_multi_param(2, ct_index, ct_val);
 		}
+
+		/*
+		 * Process the WAL produced by the load, as well as by other
+		 * transactions, so that the replication slot can advance and WAL does
+		 * not pile up. Use wal_segment_size as a threshold so that we do not
+		 * introduce the decoding overhead too often.
+		 *
+		 * Of course, we must not apply the changes until the initial load has
+		 * completed.
+		 *
+		 * Note that our insertions into the new table should not be decoded
+		 * as we (intentionally) do not write the logical decoding specific
+		 * information to WAL.
+		 */
+		if (concurrent)
+		{
+			XLogRecPtr	end_of_wal;
+
+			end_of_wal = GetFlushRecPtr(NULL);
+			if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+			{
+				repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+				end_of_wal_prev = end_of_wal;
+			}
+		}
 	}
 
 	if (indexScan != NULL)
@@ -977,7 +1033,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 									 values, isnull,
 									 rwstate);
 			/* Report n_tuples */
-			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+			pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
 										 n_tuples);
 		}
 
@@ -985,7 +1041,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	}
 
 	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	if (rwstate)
+		end_heap_rewrite(rwstate);
 
 	/* Clean up */
 	pfree(values);
@@ -2376,6 +2433,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
  * SET WITHOUT OIDS.
  *
  * So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate=NULL, use simple_heap_insert() instead of rewriting - in that
+ * case we still need to deform/form the tuple. TODO Shouldn't we rename the
+ * function, as it might not do any rewrite?
  */
 static void
 reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2459,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
 
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	if (rwstate)
+		/* The heap rewrite module does the rest */
+		rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+	else
+	{
+		/*
+		 * Insert tuple when processing REPACK CONCURRENTLY.
+		 *
+		 * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+		 * difficult to do the same in the catch-up phase (as the logical
+		 * decoding does not provide us with sufficient visibility
+		 * information). Thus we must use heap_insert() both during the
+		 * catch-up and here.
+		 *
+		 * The following is like simple_heap_insert() except that we pass the
+		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+		 * the relation files, it drops this relation, so no logical
+		 * replication subscription should need the data.
+		 */
+		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+					HEAP_INSERT_NO_LOGICAL, NULL);
+	}
 
 	heap_freetuple(copiedTuple);
 }
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index e6d2b5fced1..6aa2ed214f2 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		int			options = HEAP_INSERT_SKIP_FSM;
 
 		/*
-		 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
-		 * for the TOAST table are not logically decoded.  The main heap is
-		 * WAL-logged as XLOG FPI records, which are not logically decoded.
+		 * While rewriting the heap for REPACK, make sure data for the TOAST
+		 * table are not logically decoded.  The main heap is WAL-logged as
+		 * XLOG FPI records, which are not logically decoded.
 		 */
 		options |= HEAP_INSERT_NO_LOGICAL;
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b46e7e9c2a6..5670f2bfbde 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -215,6 +215,7 @@ typedef struct TransactionStateData
 	bool		parallelChildXact;	/* is any parent transaction parallel? */
 	bool		chain;			/* start a new block after this one */
 	bool		topXidLogged;	/* for a subxact: is top-level XID logged? */
+	bool		internal;		/* for a subxact: launched internally? */
 	struct TransactionStateData *parent;	/* back link to parent */
 } TransactionStateData;
 
@@ -4735,6 +4736,7 @@ BeginInternalSubTransaction(const char *name)
 			/* Normal subtransaction start */
 			PushTransaction();
 			s = CurrentTransactionState;	/* changed by push */
+			s->internal = true;
 
 			/*
 			 * Savepoint names, like the TransactionState block itself, live
@@ -5251,7 +5253,13 @@ AbortSubTransaction(void)
 	LWLockReleaseAll();
 
 	pgstat_report_wait_end();
-	pgstat_progress_end_command();
+
+	/*
+	 * An internal subtransaction might be used by a user command, in which
+	 * case the command outlives the subtransaction.
+	 */
+	if (!s->internal)
+		pgstat_progress_end_command();
 
 	pgaio_error_cleanup();
 
@@ -5468,6 +5476,7 @@ PushTransaction(void)
 	s->parallelModeLevel = 0;
 	s->parallelChildXact = (p->parallelModeLevel != 0 || p->parallelChildXact);
 	s->topXidLogged = false;
+	s->internal = false;
 
 	CurrentTransactionState = s;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b2b7b10c2be..a92ac78ad9e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1266,16 +1266,17 @@ CREATE VIEW pg_stat_progress_cluster AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      -- 5 is 'catch-up', but that should not appear here.
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS cluster_index_relid,
         S.param4 AS heap_tuples_scanned,
         S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
@@ -1291,16 +1292,19 @@ CREATE VIEW pg_stat_progress_repack AS
                       WHEN 2 THEN 'index scanning heap'
                       WHEN 3 THEN 'sorting tuples'
                       WHEN 4 THEN 'writing new heap'
-                      WHEN 5 THEN 'swapping relation files'
-                      WHEN 6 THEN 'rebuilding index'
-                      WHEN 7 THEN 'performing final cleanup'
+                      WHEN 5 THEN 'catch-up'
+                      WHEN 6 THEN 'swapping relation files'
+                      WHEN 7 THEN 'rebuilding index'
+                      WHEN 8 THEN 'performing final cleanup'
                       END AS phase,
         CAST(S.param3 AS oid) AS repack_index_relid,
         S.param4 AS heap_tuples_scanned,
-        S.param5 AS heap_tuples_written,
-        S.param6 AS heap_blks_total,
-        S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param5 AS heap_tuples_inserted,
+        S.param6 AS heap_tuples_updated,
+        S.param7 AS heap_tuples_deleted,
+        S.param8 AS heap_blks_total,
+        S.param9 AS heap_blks_scanned,
+        S.param10 AS index_rebuild_count
     FROM pg_stat_get_progress_info('REPACK') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index af14a230f94..aa3ae85bcee 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -25,6 +25,10 @@
 #include "access/toast_internals.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -32,6 +36,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
@@ -39,15 +44,21 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
+#include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -67,13 +78,45 @@ typedef struct
 	Oid			indexOid;
 } RelToCluster;
 
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * The locators are used to avoid logical decoding of data that we do not need
+ * for our table.
+ */
+RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
+RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+	ResultRelInfo *rri;
+	EState	   *estate;
+
+	Relation	ident_index;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+
 static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
-								Oid indexOid, Oid userid, int options);
+								Oid indexOid, Oid userid, LOCKMODE lmode,
+								int options);
+static void check_repack_concurrently_requirements(Relation rel);
 static void rebuild_relation(RepackCommand cmd, bool usingindex,
-							 Relation OldHeap, Relation index, bool verbose);
+							 Relation OldHeap, Relation index, Oid userid,
+							 bool verbose, bool concurrent);
 static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
-							bool verbose, bool *pSwapToastByContent,
-							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+							Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+							bool verbose,
+							bool *pSwapToastByContent,
+							TransactionId *pFreezeXid,
+							MultiXactId *pCutoffMulti);
 static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
 								  MemoryContext permcxt);
 static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -81,12 +124,61 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 											  Oid relid, bool rel_is_index);
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid,
+													  const char *slotname,
+													  TupleDesc tupdesc);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+									 Relation rel, ScanKey key, int nkeys,
+									 IndexInsertState *iistate);
+static void apply_concurrent_insert(Relation rel, ConcurrentChange *change,
+									HeapTuple tup, IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+									HeapTuple tup_target,
+									ConcurrentChange *change,
+									IndexInsertState *iistate,
+									TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+									ConcurrentChange *change);
+static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
+								   HeapTuple tup_key,
+								   IndexInsertState *iistate,
+								   TupleTableSlot *ident_slot,
+								   IndexScanDesc *scan_p);
+static void process_concurrent_changes(LogicalDecodingContext *ctx,
+									   XLogRecPtr end_of_wal,
+									   Relation rel_dst,
+									   Relation rel_src,
+									   ScanKey ident_key,
+									   int ident_key_nentries,
+									   IndexInsertState *iistate);
+static IndexInsertState *get_index_insert_state(Relation relation,
+												Oid ident_index_id);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+								  int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+											   Relation cl_index,
+											   LogicalDecodingContext *ctx,
+											   bool swap_toast_by_content,
+											   TransactionId frozenXid,
+											   MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
 static Relation process_single_relation(RepackStmt *stmt,
+										LOCKMODE lockmode,
+										bool isTopLevel,
 										ClusterParams *params);
 static Oid	determine_clustered_index(Relation rel, bool usingindex,
 									  const char *indexname);
 
 
+#define REPL_PLUGIN_NAME   "pgoutput_repack"
+
 static const char *
 RepackCommandAsString(RepackCommand cmd)
 {
@@ -95,9 +187,9 @@ RepackCommandAsString(RepackCommand cmd)
 		case REPACK_COMMAND_REPACK:
 			return "REPACK";
 		case REPACK_COMMAND_VACUUMFULL:
-			return "VACUUM";
+			return "VACUUM (FULL)";
 		case REPACK_COMMAND_CLUSTER:
-			return "VACUUM";
+			return "CLUSTER";
 	}
 	return "???";
 }
@@ -130,16 +222,27 @@ void
 ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 {
 	ClusterParams params = {0};
-	bool		verbose = false;
 	Relation	rel = NULL;
 	MemoryContext repack_context;
+	LOCKMODE	lockmode;
 	List	   *rtcs;
 
 	/* Parse option list */
 	foreach_node(DefElem, opt, stmt->params)
 	{
-		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+		if (strcmp(opt->defname, "verbose") == 0 &&
+			defGetBoolean(opt))
+			params.options |= CLUOPT_VERBOSE;
+		else if (strcmp(opt->defname, "concurrently") == 0 &&
+				 defGetBoolean(opt))
+		{
+			if (stmt->command != REPACK_COMMAND_REPACK)
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("CONCURRENTLY option not supported for %s",
+							   RepackCommandAsString(stmt->command)));
+			params.options |= CLUOPT_CONCURRENT;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -149,7 +252,17 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 					 parser_errposition(pstate, opt->location)));
 	}
 
-	params.options = (verbose ? CLUOPT_VERBOSE : 0);
+	/*
+	 * Determine the lock mode expected by cluster_rel().
+	 *
+	 * In the exclusive case, we obtain AccessExclusiveLock right away to
+	 * avoid lock-upgrade hazard in the single-transaction case. In the
+	 * CONCURRENTLY case, the AccessExclusiveLock will only be used at the end
+	 * of processing, presumably for a very short time. Until then, we'll have
+	 * to unlock the relation temporarily, so there's no lock-upgrade hazard.
+	 */
+	lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+		AccessExclusiveLock : ShareUpdateExclusiveLock;
 
 	/*
 	 * If a single relation is specified, process it and we're done ... unless
@@ -157,16 +270,35 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 	 */
 	if (stmt->relation != NULL)
 	{
-		rel = process_single_relation(stmt, &params);
+		rel = process_single_relation(stmt, lockmode, isTopLevel, &params);
 		if (rel == NULL)
 			return;
 	}
 
 	/*
-	 * By here, we know we are in a multi-table situation.  In order to avoid
-	 * holding locks for too long, we want to process each table in its own
-	 * transaction.  This forces us to disallow running inside a user
-	 * transaction block.
+	 * By here, we know we are in a multi-table situation.
+	 *
+	 * Concurrent processing is currently considered rather special (e.g. in
+	 * terms of resources consumed), so it is not performed in bulk.
+	 */
+	if (params.options & CLUOPT_CONCURRENT)
+	{
+		if (rel != NULL)
+		{
+			Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+					errhint("Consider running the command for individual partitions."));
+		}
+		else
+			ereport(ERROR,
+					errmsg("REPACK CONCURRENTLY requires an explicit table name"));
+	}
+
+	/*
+	 * In order to avoid holding locks for too long, we want to process each
+	 * table in its own transaction.  This forces us to disallow running
+	 * inside a user transaction block.
 	 */
 	PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
 
@@ -247,13 +379,13 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
 		 * Open the target table, coping with the case where it has been
 		 * dropped.
 		 */
-		rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+		rel = try_table_open(rtc->tableOid, lockmode);
 		if (rel == NULL)
 			continue;
 
 		/* Process this table */
 		cluster_rel(stmt->command, stmt->usingindex,
-					rel, rtc->indexOid, &params);
+					rel, rtc->indexOid, &params, isTopLevel);
 		/* cluster_rel closes the relation, but keeps lock */
 
 		PopActiveSnapshot();
@@ -282,22 +414,55 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
  * If indexOid is InvalidOid, the table will be rewritten in physical order
  * instead of index order.
  *
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
  * 'cmd' indicates which command is being executed, to be used for error
  * messages.
  */
 void
 cluster_rel(RepackCommand cmd, bool usingindex,
-			Relation OldHeap, Oid indexOid, ClusterParams *params)
+			Relation OldHeap, Oid indexOid, ClusterParams *params,
+			bool isTopLevel)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Relation	index;
+	LOCKMODE	lmode;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
-	Relation	index;
+	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
+
+	/*
+	 * Check that the correct lock is held. The lock mode is
+	 * AccessExclusiveLock for normal processing and ShareUpdateExclusiveLock
+	 * for concurrent processing (so that SELECT, INSERT, UPDATE and DELETE
+	 * commands work, but cluster_rel() cannot be called concurrently for the
+	 * same relation).
+	 */
+	lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+
+	/* There are specific requirements on concurrent processing. */
+	if (concurrent)
+	{
+		/*
+		 * Make sure we have no XID assigned, otherwise call of
+		 * setup_logical_decoding() can cause a deadlock.
+		 *
+		 * The existence of a transaction block does not actually imply that
+		 * an XID was already assigned, but it very likely was. We might want
+		 * to check the result of GetCurrentTransactionIdIfAny() instead, but
+		 * that would be less clear from the user's perspective.
+		 */
+		PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+		check_repack_concurrently_requirements(OldHeap);
+	}
 
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
@@ -340,11 +505,13 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * If this is a single-transaction CLUSTER, we can skip these tests. We
 	 * *must* skip the one on indisclustered since it would reject an attempt
 	 * to cluster a not-previously-clustered index.
+	 *
+	 * XXX move [some of] these comments to where the RECHECK flag is
+	 * determined?
 	 */
-	if (recheck)
-		if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
-								 params->options))
-			goto out;
+	if (recheck && !cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+										lmode, params->options))
+		goto out;
 
 	/*
 	 * We allow repacking shared catalogs only when not using an index. It
@@ -358,6 +525,12 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 				 errmsg("cannot run \"%s\" on a shared catalog",
 						RepackCommandAsString(cmd))));
 
+	/*
+	 * The CONCURRENTLY case should have been rejected earlier because it does
+	 * not support system catalogs.
+	 */
+	Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
 	/*
 	 * Don't process temp tables of other backends ... their local buffer
 	 * manager is not going to cope.
@@ -393,7 +566,7 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OidIsValid(indexOid))
 	{
 		/* verify the index is good and lock it */
-		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, lmode);
 		/* also open it */
 		index = index_open(indexOid, NoLock);
 	}
@@ -410,7 +583,9 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
 		!RelationIsPopulated(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		if (index)
+			index_close(index, lmode);
+		relation_close(OldHeap, lmode);
 		goto out;
 	}
 
@@ -423,11 +598,35 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	 * invalid, because we move tuples around.  Promote them to relation
 	 * locks.  Predicate locks on indexes will be promoted when they are
 	 * reindexed.
+	 *
+	 * During concurrent processing, the heap as well as its indexes stay in
+	 * operation, so we postpone this step until they are locked using
+	 * AccessExclusiveLock near the end of the processing.
 	 */
-	TransferPredicateLocksToHeapRelation(OldHeap);
+	if (!concurrent)
+		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
+	PG_TRY();
+	{
+		/*
+		 * For concurrent processing, make sure that our logical decoding
+		 * ignores data changes of other tables than the one we are
+		 * processing.
+		 */
+		if (concurrent)
+			begin_concurrent_repack(OldHeap);
+
+		rebuild_relation(cmd, usingindex, OldHeap, index, save_userid,
+						 verbose, concurrent);
+	}
+	PG_FINALLY();
+	{
+		if (concurrent)
+			end_concurrent_repack();
+	}
+	PG_END_TRY();
+
 	/* rebuild_relation closes OldHeap, and index if valid */
 
 out:
@@ -446,14 +645,14 @@ out:
  */
 static bool
 cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
-					Oid userid, int options)
+					Oid userid, LOCKMODE lmode, int options)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 
 	/* Check that the user still has privileges for the relation */
 	if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -467,7 +666,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 	 */
 	if (RELATION_IS_OTHER_TEMP(OldHeap))
 	{
-		relation_close(OldHeap, AccessExclusiveLock);
+		relation_close(OldHeap, lmode);
 		return false;
 	}
 
@@ -478,7 +677,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		 */
 		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 
@@ -489,7 +688,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
 		if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
 			!get_index_isclustered(indexOid))
 		{
-			relation_close(OldHeap, AccessExclusiveLock);
+			relation_close(OldHeap, lmode);
 			return false;
 		}
 	}
@@ -630,19 +829,89 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	table_close(pg_index, RowExclusiveLock);
 }
 
+/*
+ * Check if the CONCURRENTLY option is legal for the relation.
+ */
+static void
+check_repack_concurrently_requirements(Relation rel)
+{
+	char		relpersistence,
+				replident;
+	Oid			ident_idx;
+
+	/* Data changes in system relations are not logically decoded. */
+	if (IsCatalogRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+	/*
+	 * reorderbuffer.c does not seem to handle processing of TOAST relation
+	 * alone.
+	 */
+	if (IsToastRelation(rel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+	relpersistence = rel->rd_rel->relpersistence;
+	if (relpersistence != RELPERSISTENCE_PERMANENT)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+	/* With NOTHING, WAL does not contain the old tuple. */
+	replident = rel->rd_rel->relreplident;
+	if (replident == REPLICA_IDENTITY_NOTHING)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 errhint("Relation \"%s\" has insufficient replication identity.",
+						 RelationGetRelationName(rel))));
+
+	/*
+	 * Identity index is not set if the replica identity is FULL, but PK might
+	 * exist in such a case.
+	 */
+	ident_idx = RelationGetReplicaIndex(rel);
+	if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+		ident_idx = rel->rd_pkindex;
+	if (!OidIsValid(ident_idx))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot repack relation \"%s\"",
+						RelationGetRelationName(rel)),
+				 (errhint("Relation \"%s\" has no identity index.",
+						  RelationGetRelationName(rel)))));
+}
+
+
 /*
  * rebuild_relation: rebuild an existing relation in index or physical order
  *
- * OldHeap: table to rebuild.
+ * OldHeap: table to rebuild.  See cluster_rel() for comments on the required
+ * lock strength.
+ *
  * index: index to cluster by, or NULL to rewrite in physical order.
  *
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock.  (The
+ * function handles the lock upgrade if 'concurrent' is true.)
  */
 static void
 rebuild_relation(RepackCommand cmd, bool usingindex,
-				 Relation OldHeap, Relation index, bool verbose)
+				 Relation OldHeap, Relation index, Oid userid,
+				 bool verbose, bool concurrent)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			accessMethod = OldHeap->rd_rel->relam;
@@ -650,13 +919,55 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	Oid			OIDNewHeap;
 	Relation	NewHeap;
 	char		relpersistence;
-	bool		is_system_catalog;
 	bool		swap_toast_by_content;
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
+	NameData	slotname;
+	LogicalDecodingContext *ctx = NULL;
+	Snapshot	snapshot = NULL;
+#if USE_ASSERT_CHECKING
+	LOCKMODE	lmode;
+
+	lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+	Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+	Assert(!usingindex || CheckRelationLockedByMe(index, lmode, false));
+#endif
+
+	if (concurrent)
+	{
+		TupleDesc	tupdesc;
+
+		/*
+		 * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+		 * should never fire.
+		 */
+		Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+
+		/*
+		 * A single backend should not execute multiple REPACK commands at a
+		 * time, so use PID to make the slot unique.
+		 */
+		snprintf(NameStr(slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+		tupdesc = CreateTupleDescCopy(RelationGetDescr(OldHeap));
+
+		/*
+		 * Prepare to capture the concurrent data changes.
+		 *
+		 * Note that this call waits for all transactions with XID already
+		 * assigned to finish. If any of those transactions is waiting for a
+		 * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
+		 * it runs CREATE INDEX), we can end up in a deadlock. It's not clear
+		 * whether this risk justifies unlocking/relocking the table (and its
+		 * clustering index) and checking again whether it's still eligible
+		 * for REPACK CONCURRENTLY.
+		 */
+		ctx = setup_logical_decoding(tableOid, NameStr(slotname), tupdesc);
 
-	Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
-		   (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+		snapshot = SnapBuildInitialSnapshotForRepack(ctx->snapshot_builder);
+		PushActiveSnapshot(snapshot);
+	}
 
 	/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
 	if (usingindex)
@@ -664,7 +975,6 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
-	is_system_catalog = IsSystemRelation(OldHeap);
 
 	/*
 	 * Create the transient table that will receive the re-ordered data.
@@ -680,30 +990,67 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
 	NewHeap = table_open(OIDNewHeap, NoLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_table_data(NewHeap, OldHeap, index, verbose,
+	copy_table_data(NewHeap, OldHeap, index, snapshot, ctx, verbose,
 					&swap_toast_by_content, &frozenXid, &cutoffMulti);
 
+	/* The historic snapshot won't be needed anymore. */
+	if (snapshot)
+		PopActiveSnapshot();
 
-	/* Close relcache entries, but keep lock until transaction commit */
-	table_close(OldHeap, NoLock);
-	if (index)
-		index_close(index, NoLock);
+	if (concurrent)
+	{
+		/*
+		 * Push a snapshot that we will use to find old versions of rows when
+		 * processing concurrent UPDATE and DELETE commands. (That snapshot
+		 * should also be used by index expressions.)
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
 
-	/*
-	 * Close the new relation so it can be dropped as soon as the storage is
-	 * swapped. The relation is not visible to others, so no need to unlock it
-	 * explicitly.
-	 */
-	table_close(NewHeap, NoLock);
+		/*
+		 * Make sure we can find the tuples just inserted when applying DML
+		 * commands on top of those.
+		 */
+		CommandCounterIncrement();
+		UpdateActiveSnapshotCommandId();
 
-	/*
-	 * Swap the physical files of the target and transient tables, then
-	 * rebuild the target's indexes and throw away the transient table.
-	 */
-	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
-					 swap_toast_by_content, false, true,
-					 frozenXid, cutoffMulti,
-					 relpersistence);
+		rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+										   ctx, swap_toast_by_content,
+										   frozenXid, cutoffMulti);
+		PopActiveSnapshot();
+
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
+
+		/* Done with decoding. */
+		cleanup_logical_decoding(ctx);
+		ReplicationSlotRelease();
+		ReplicationSlotDrop(NameStr(slotname), false);
+	}
+	else
+	{
+		bool		is_system_catalog = IsSystemRelation(OldHeap);
+
+		/* Close relcache entries, but keep lock until transaction commit */
+		table_close(OldHeap, NoLock);
+		if (index)
+			index_close(index, NoLock);
+
+		/*
+		 * Close the new relation so it can be dropped as soon as the storage
+		 * is swapped. The relation is not visible to others, so no need to
+		 * unlock it explicitly.
+		 */
+		table_close(NewHeap, NoLock);
+
+		/*
+		 * Swap the physical files of the target and transient tables, then
+		 * rebuild the target's indexes and throw away the transient table.
+		 */
+		finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+						 swap_toast_by_content, false, true, true,
+						 frozenXid, cutoffMulti,
+						 relpersistence);
+	}
 }
 
 
@@ -838,15 +1185,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 /*
  * Do the physical copying of table data.
  *
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster().
+ * Pass them iff concurrent processing is required.
+ *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
  * *pFreezeXid receives the TransactionId used as freeze cutoff point.
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
-				bool *pSwapToastByContent, TransactionId *pFreezeXid,
-				MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+				Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+				bool verbose, bool *pSwapToastByContent,
+				TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
 {
 	Relation	relRelation;
 	HeapTuple	reltup;
@@ -864,6 +1215,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	PGRUsage	ru0;
 	char	   *nspname;
 
+	bool		concurrent = snapshot != NULL;
+
 	pg_rusage_init(&ru0);
 
 	/* Store a copy of the namespace name for logging purposes */
@@ -966,8 +1319,48 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * provided, else plain seqscan.
 	 */
 	if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
+	{
+		ResourceOwner oldowner = NULL;
+		ResourceOwner resowner = NULL;
+
+		/*
+		 * In the CONCURRENTLY case, use a dedicated resource owner so we don't
+		 * leave behind any additional locks that we cannot easily release.
+		 */
+		if (concurrent)
+		{
+			Assert(CheckRelationLockedByMe(OldHeap, ShareUpdateExclusiveLock,
+										   false));
+			Assert(CheckRelationLockedByMe(OldIndex, ShareUpdateExclusiveLock,
+										   false));
+
+			resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "plan_cluster_use_sort");
+			oldowner = CurrentResourceOwner;
+			CurrentResourceOwner = resowner;
+		}
+
 		use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
 										 RelationGetRelid(OldIndex));
+
+		if (concurrent)
+		{
+			CurrentResourceOwner = oldowner;
+
+			/*
+			 * We are primarily concerned about locks, but if the planner
+			 * happened to allocate any other resources, we should release
+			 * them too because we're going to delete the whole resowner.
+			 */
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_LOCKS,
+								 false, false);
+			ResourceOwnerRelease(resowner, RESOURCE_RELEASE_AFTER_LOCKS,
+								 false, false);
+			ResourceOwnerDelete(resowner);
+		}
+	}
 	else
 		use_sort = false;
 
@@ -996,7 +1389,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	 * values (e.g. because the AM doesn't use freezing).
 	 */
 	table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
-									cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+									cutoffs.OldestXmin, snapshot,
+									decoding_ctx,
+									&cutoffs.FreezeLimit,
 									&cutoffs.MultiXactCutoff,
 									&num_tuples, &tups_vacuumed,
 									&tups_recently_dead);
@@ -1005,7 +1400,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
 	*pFreezeXid = cutoffs.FreezeLimit;
 	*pCutoffMulti = cutoffs.MultiXactCutoff;
 
-	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+	/*
+	 * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+	 * In the CONCURRENTLY case, we need to set it again before applying the
+	 * concurrent changes.
+	 */
 	NewHeap->rd_toastoid = InvalidOid;
 
 	num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1463,14 +1862,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool swap_toast_by_content,
 				 bool check_constraints,
 				 bool is_internal,
+				 bool reindex,
 				 TransactionId frozenXid,
 				 MultiXactId cutoffMulti,
 				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
-	int			reindex_flags;
-	ReindexParams reindex_params = {0};
 	int			i;
 
 	/* Report that we are now swapping relation files */
@@ -1496,39 +1894,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (is_system_catalog)
 		CacheInvalidateCatalog(OIDOldHeap);
 
-	/*
-	 * Rebuild each index on the relation (but not the toast table, which is
-	 * all-new at this point).  It is important to do this before the DROP
-	 * step because if we are processing a system catalog that will be used
-	 * during DROP, we want to have its indexes available.  There is no
-	 * advantage to the other order anyway because this is all transactional,
-	 * so no chance to reclaim disk space before commit.  We do not need a
-	 * final CommandCounterIncrement() because reindex_relation does it.
-	 *
-	 * Note: because index_build is called via reindex_relation, it will never
-	 * set indcheckxmin true for the indexes.  This is OK even though in some
-	 * sense we are building new indexes rather than rebuilding existing ones,
-	 * because the new heap won't contain any HOT chains at all, let alone
-	 * broken ones, so it can't be necessary to set indcheckxmin.
-	 */
-	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
-	if (check_constraints)
-		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+	if (reindex)
+	{
+		int			reindex_flags;
+		ReindexParams reindex_params = {0};
 
-	/*
-	 * Ensure that the indexes have the same persistence as the parent
-	 * relation.
-	 */
-	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
-	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
-		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+		/*
+		 * Rebuild each index on the relation (but not the toast table, which
+		 * is all-new at this point).  It is important to do this before the
+		 * DROP step because if we are processing a system catalog that will
+		 * be used during DROP, we want to have its indexes available.  There
+		 * is no advantage to the other order anyway because this is all
+		 * transactional, so no chance to reclaim disk space before commit. We
+		 * do not need a final CommandCounterIncrement() because
+		 * reindex_relation does it.
+		 *
+		 * Note: because index_build is called via reindex_relation, it will
+		 * never set indcheckxmin true for the indexes.  This is OK even
+		 * though in some sense we are building new indexes rather than
+		 * rebuilding existing ones, because the new heap won't contain any
+		 * HOT chains at all, let alone broken ones, so it can't be necessary
+		 * to set indcheckxmin.
+		 */
+		reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+		if (check_constraints)
+			reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
 
-	/* Report that we are now reindexing relations */
-	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
-								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+		/*
+		 * Ensure that the indexes have the same persistence as the parent
+		 * relation.
+		 */
+		if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+		else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+			reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
-	reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+		/* Report that we are now reindexing relations */
+		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+									 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+		reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+	}
 
 	/* Report that we are now doing clean up */
 	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1870,7 +2276,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
  * resolve in this case.
  */
 static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+						ClusterParams *params)
 {
 	Relation	rel;
 	Oid			tableOid;
@@ -1879,13 +2286,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 	Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
 		   stmt->command == REPACK_COMMAND_REPACK);
 
-	/*
-	 * Find, lock, and check permissions on the table.  We obtain
-	 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
-	 * single-transaction case.
-	 */
+	/* Find, lock, and check permissions on the table. */
 	tableOid = RangeVarGetRelidExtended(stmt->relation,
-										AccessExclusiveLock,
+										lockmode,
 										0,
 										RangeVarCallbackMaintainsTable,
 										NULL);
@@ -1911,12 +2314,17 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
 		return rel;
 	else
 	{
-		Oid			indexOid;
+		Oid		indexOid = InvalidOid;
+
+		if (stmt->usingindex)
+		{
+			indexOid = determine_clustered_index(rel, stmt->usingindex,
+												 stmt->indexname);
+			check_index_is_clusterable(rel, indexOid, lockmode);
+		}
 
-		indexOid = determine_clustered_index(rel, stmt->usingindex,
-											 stmt->indexname);
-		check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
-		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+		cluster_rel(stmt->command, stmt->usingindex, rel, indexOid,
+					params, isTopLevel);
 		return NULL;
 	}
 }
@@ -1973,3 +2381,1052 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
 
 	return indexOid;
 }
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts to setup logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that the TOAST table needs no attention here as it's not scanned
+ * using a historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+	Oid			toastrelid;
+
+	/* Avoid logical decoding of other relations by this backend. */
+	repacked_rel_locator = rel->rd_locator;
+	toastrelid = rel->rd_rel->reltoastrelid;
+	if (OidIsValid(toastrelid))
+	{
+		Relation	toastrel;
+
+		/* Avoid logical decoding of other TOAST relations. */
+		toastrel = table_open(toastrelid, AccessShareLock);
+		repacked_rel_toast_locator = toastrel->rd_locator;
+		table_close(toastrel, AccessShareLock);
+	}
+}
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+	/*
+	 * Restore normal function of (future) logical decoding for this backend.
+	 */
+	repacked_rel_locator.relNumber = InvalidOid;
+	repacked_rel_toast_locator.relNumber = InvalidOid;
+}
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends do while we copy the
+ * existing data into the temporary table), nor persisted (it's easier to
+ * handle a crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid, const char *slotname, TupleDesc tupdesc)
+{
+	LogicalDecodingContext *ctx;
+	RepackDecodingState *dstate;
+
+	/*
+	 * Check if we can use logical decoding.
+	 */
+	CheckSlotPermissions();
+	CheckLogicalDecodingRequirements();
+
+	/* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+	ReplicationSlotCreate(slotname, true, RS_TEMPORARY, false, false, false);
+
+	/*
+	 * Neither prepare_write nor do_write callback nor update_progress is
+	 * useful for us.
+	 *
+	 * Regarding the value of need_full_snapshot, we pass false because the
+	 * table we are processing is present in RepackedRelsHash and is therefore
+	 * treated like a catalog as far as logical decoding is concerned.
+	 */
+	ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+									NIL,
+									false,
+									InvalidXLogRecPtr,
+									XL_ROUTINE(.page_read = read_local_xlog_page,
+											   .segment_open = wal_segment_open,
+											   .segment_close = wal_segment_close),
+									NULL, NULL, NULL);
+
+	/*
+	 * We have no control over the setting of fast_forward, so at least check it.
+	 */
+	Assert(!ctx->fast_forward);
+
+	DecodingContextFindStartpoint(ctx);
+
+	/* Some WAL records should have been read. */
+	Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
+	XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+				wal_segment_size);
+
+	/*
+	 * Setup structures to store decoded changes.
+	 */
+	dstate = palloc0(sizeof(RepackDecodingState));
+	dstate->relid = relid;
+	dstate->tstore = tuplestore_begin_heap(false, false,
+										   maintenance_work_mem);
+
+	dstate->tupdesc = tupdesc;
+
+	/* Initialize the descriptor to store the changes ... */
+	dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+	TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+	/* ... as well as the corresponding slot. */
+	dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+											  &TTSOpsMinimalTuple);
+
+	dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+										   "logical decoding");
+
+	ctx->output_writer_private = dstate;
+	return ctx;
+}
+
+/*
+ * Retrieve tuple from ConcurrentChange structure.
+ *
+ * The input data starts with the structure but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+	HeapTupleData tup_data;
+	HeapTuple	result;
+	char	   *src;
+
+	/*
+	 * Ensure alignment before accessing the fields. (This is why we can't use
+	 * heap_copytuple() instead of this function.)
+	 */
+	src = change + offsetof(ConcurrentChange, tup_data);
+	memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+	result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+	memcpy(result, &tup_data, sizeof(HeapTupleData));
+	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+	src = change + SizeOfConcurrentChange;
+	memcpy(result->t_data, src, result->t_len);
+
+	return result;
+}
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+								 XLogRecPtr end_of_wal)
+{
+	RepackDecodingState *dstate;
+	ResourceOwner resowner_old;
+
+	/*
+	 * Invalidate the "present" cache before moving to "(recent) history".
+	 */
+	InvalidateSystemCaches();
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+	resowner_old = CurrentResourceOwner;
+	CurrentResourceOwner = dstate->resowner;
+
+	PG_TRY();
+	{
+		while (ctx->reader->EndRecPtr < end_of_wal)
+		{
+			XLogRecord *record;
+			XLogSegNo	segno_new;
+			char	   *errm = NULL;
+			XLogRecPtr	end_lsn;
+
+			record = XLogReadRecord(ctx->reader, &errm);
+			if (errm)
+				elog(ERROR, "%s", errm);
+
+			if (record != NULL)
+				LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+			/*
+			 * If a WAL segment boundary has been crossed, inform the decoding
+			 * system that the catalog_xmin can advance. (We could confirm more
+			 * often, but filling a single WAL segment should not take much
+			 * time.)
+			 */
+			end_lsn = ctx->reader->EndRecPtr;
+			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+			if (segno_new != repack_current_segment)
+			{
+				LogicalConfirmReceivedLocation(end_lsn);
+				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
+				repack_current_segment = segno_new;
+			}
+
+			CHECK_FOR_INTERRUPTS();
+		}
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+	}
+	PG_CATCH();
+	{
+		/* clear all timetravel entries */
+		InvalidateSystemCaches();
+		CurrentResourceOwner = resowner_old;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
+/*
+ * Apply changes that happened during the initial load.
+ *
+ * The scan key is passed by the caller, so it does not have to be constructed
+ * multiple times. Key entries have all fields initialized, except for
+ * sk_argument.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
+						 ScanKey key, int nkeys, IndexInsertState *iistate)
+{
+	TupleTableSlot *index_slot,
+			   *ident_slot;
+	HeapTuple	tup_old = NULL;
+
+	if (dstate->nchanges == 0)
+		return;
+
+	/* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+	index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+	/* A slot to fetch tuples from identity index. */
+	ident_slot = table_slot_create(rel, NULL);
+
+	while (tuplestore_gettupleslot(dstate->tstore, true, false,
+								   dstate->tsslot))
+	{
+		bool		shouldFree;
+		HeapTuple	tup_change,
+					tup,
+					tup_exist;
+		char	   *change_raw,
+				   *src;
+		ConcurrentChange change;
+		bool		isnull[1];
+		Datum		values[1];
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get the change from the single-column tuple. */
+		tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+		heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+		Assert(!isnull[0]);
+
+		/* Make sure we access aligned data. */
+		change_raw = (char *) DatumGetByteaP(values[0]);
+		src = (char *) VARDATA(change_raw);
+		memcpy(&change, src, SizeOfConcurrentChange);
+
+		/* A TRUNCATE change contains no tuple, so process it separately. */
+		if (change.kind == CHANGE_TRUNCATE)
+		{
+			/*
+			 * All the things that ExecuteTruncateGuts() does (such as firing
+			 * triggers or handling the DROP_CASCADE behavior) should have
+			 * taken place on the source relation. Thus we only do the actual
+			 * truncation of the new relation (and its indexes).
+			 */
+			heap_truncate_one_rel(rel);
+
+			pfree(tup_change);
+			continue;
+		}
+
+		/*
+		 * Extract the tuple from the change. The tuple is copied here because
+		 * it might be assigned to 'tup_old', in which case it needs to
+		 * survive into the next iteration.
+		 */
+		tup = get_changed_tuple(src);
+
+		if (change.kind == CHANGE_UPDATE_OLD)
+		{
+			Assert(tup_old == NULL);
+			tup_old = tup;
+		}
+		else if (change.kind == CHANGE_INSERT)
+		{
+			Assert(tup_old == NULL);
+
+			apply_concurrent_insert(rel, &change, tup, iistate, index_slot);
+
+			pfree(tup);
+		}
+		else if (change.kind == CHANGE_UPDATE_NEW ||
+				 change.kind == CHANGE_DELETE)
+		{
+			IndexScanDesc ind_scan = NULL;
+			HeapTuple	tup_key;
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+			{
+				tup_key = tup_old != NULL ? tup_old : tup;
+			}
+			else
+			{
+				Assert(tup_old == NULL);
+				tup_key = tup;
+			}
+
+			/*
+			 * Find the tuple to be updated or deleted.
+			 */
+			tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+										  iistate, ident_slot, &ind_scan);
+			if (tup_exist == NULL)
+				elog(ERROR, "failed to find target tuple");
+
+			if (change.kind == CHANGE_UPDATE_NEW)
+				apply_concurrent_update(rel, tup, tup_exist, &change, iistate,
+										index_slot);
+			else
+				apply_concurrent_delete(rel, tup_exist, &change);
+
+			if (tup_old != NULL)
+			{
+				pfree(tup_old);
+				tup_old = NULL;
+			}
+
+			pfree(tup);
+			index_endscan(ind_scan);
+		}
+		else
+			elog(ERROR, "unrecognized kind of change: %d", change.kind);
+
+		/*
+		 * If a change was applied now, increment CID for next writes and
+		 * update the snapshot so it sees the changes we've applied so far.
+		 */
+		if (change.kind != CHANGE_UPDATE_OLD)
+		{
+			CommandCounterIncrement();
+			UpdateActiveSnapshotCommandId();
+		}
+
+		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+		Assert(shouldFree);
+		pfree(tup_change);
+	}
+
+	tuplestore_clear(dstate->tstore);
+	dstate->nchanges = 0;
+
+	/* Cleanup. */
+	ExecDropSingleTupleTableSlot(index_slot);
+	ExecDropSingleTupleTableSlot(ident_slot);
+}
+
+static void
+apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
+						IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+	List	   *recheck;
+
+	/*
+	 * Like simple_heap_insert(), but make sure that the INSERT is not
+	 * logically decoded - see reform_and_rewrite_tuple() for more
+	 * information.
+	 */
+	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+				NULL);
+
+	/*
+	 * Update indexes.
+	 *
+	 * This is needed in case functions in the index need the active snapshot
+	 * and the caller hasn't set one.
+	 */
+	ExecStoreHeapTuple(tup, index_slot, false);
+	recheck = ExecInsertIndexTuples(iistate->rri,
+									index_slot,
+									iistate->estate,
+									false,	/* update */
+									false,	/* noDupErr */
+									NULL,	/* specConflict */
+									NIL,	/* arbiterIndexes */
+									false	/* onlySummarizing */
+		);
+
+	/*
+	 * If recheck is required, it must have been performed on the source
+	 * relation by now. (All the logical changes we process here are already
+	 * committed.)
+	 */
+	list_free(recheck);
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+						ConcurrentChange *change, IndexInsertState *iistate,
+						TupleTableSlot *index_slot)
+{
+	LockTupleMode lockmode;
+	TM_FailureData tmfd;
+	TU_UpdateIndexes update_indexes;
+	TM_Result	res;
+	List	   *recheck;
+
+	/*
+	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
+	 * here.)
+	 *
+	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_update(rel, &tup_target->t_self, tup,
+					  GetCurrentCommandId(true),
+					  InvalidSnapshot,
+					  false,	/* no wait - only we are doing changes */
+					  &tmfd, &lockmode, &update_indexes,
+					  false /* wal_logical */ );
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+	ExecStoreHeapTuple(tup, index_slot, false);
+
+	if (update_indexes != TU_None)
+	{
+		recheck = ExecInsertIndexTuples(iistate->rri,
+										index_slot,
+										iistate->estate,
+										true,	/* update */
+										false,	/* noDupErr */
+										NULL,	/* specConflict */
+										NIL,	/* arbiterIndexes */
+		/* onlySummarizing */
+										update_indexes == TU_Summarizing);
+		list_free(recheck);
+	}
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+						ConcurrentChange *change)
+{
+	TM_Result	res;
+	TM_FailureData tmfd;
+
+	/*
+	 * Delete tuple from the new heap.
+	 *
+	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
+	 * except for 'wait').
+	 */
+	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+					  InvalidSnapshot, false,
+					  &tmfd,
+					  false,	/* no wait - only we are doing changes */
+					  false /* wal_logical */ );
+
+	if (res != TM_Ok)
+		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+	pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'key' is a pre-initialized scan key, into which the function will put the
+ * key values.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * On exit, '*scan_p' contains the scan descriptor used. The caller must
+ * close it when the returned tuple is no longer needed.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
+				  IndexInsertState *iistate,
+				  TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
+{
+	IndexScanDesc scan;
+	Form_pg_index ident_form;
+	int2vector *ident_indkey;
+	HeapTuple	result = NULL;
+
+	/* XXX no instrumentation for now */
+	scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+						   NULL, nkeys, 0);
+	*scan_p = scan;
+	index_rescan(scan, key, nkeys, NULL, 0);
+
+	/* Info needed to retrieve key values from heap tuple. */
+	ident_form = iistate->ident_index->rd_index;
+	ident_indkey = &ident_form->indkey;
+
+	/* Use the incoming tuple to finalize the scan key. */
+	for (int i = 0; i < scan->numberOfKeys; i++)
+	{
+		ScanKey		entry;
+		bool		isnull;
+		int16		attno_heap;
+
+		entry = &scan->keyData[i];
+		attno_heap = ident_indkey->values[i];
+		entry->sk_argument = heap_getattr(tup_key,
+										  attno_heap,
+										  rel->rd_att,
+										  &isnull);
+		Assert(!isnull);
+	}
+	if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+	{
+		bool		shouldFree;
+
+		result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+		/* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+		Assert(!shouldFree);
+	}
+
+	return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ *
+ * Pass rel_src iff its reltoastrelid is needed.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *ctx, XLogRecPtr end_of_wal,
+						   Relation rel_dst, Relation rel_src, ScanKey ident_key,
+						   int ident_key_nentries, IndexInsertState *iistate)
+{
+	RepackDecodingState *dstate;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_CATCH_UP);
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	repack_decode_concurrent_changes(ctx, end_of_wal);
+
+	if (dstate->nchanges == 0)
+		return;
+
+	PG_TRY();
+	{
+		/*
+		 * Make sure that TOAST values can eventually be accessed via the old
+		 * relation - see comment in copy_table_data().
+		 */
+		if (rel_src)
+			rel_dst->rd_toastoid = rel_src->rd_rel->reltoastrelid;
+
+		apply_concurrent_changes(dstate, rel_dst, ident_key,
+								 ident_key_nentries, iistate);
+	}
+	PG_FINALLY();
+	{
+		if (rel_src)
+			rel_dst->rd_toastoid = InvalidOid;
+	}
+	PG_END_TRY();
+}
+
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id)
+{
+	EState	   *estate;
+	int			i;
+	IndexInsertState *result;
+
+	result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+	estate = CreateExecutorState();
+
+	result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+	InitResultRelInfo(result->rri, relation, 0, 0, 0);
+	ExecOpenIndices(result->rri, false);
+
+	/*
+	 * Find the relcache entry of the identity index so that we spend no extra
+	 * effort to open / close it.
+	 */
+	for (i = 0; i < result->rri->ri_NumIndices; i++)
+	{
+		Relation	ind_rel;
+
+		ind_rel = result->rri->ri_IndexRelationDescs[i];
+		if (ind_rel->rd_id == ident_index_id)
+			result->ident_index = ind_rel;
+	}
+	if (result->ident_index == NULL)
+		elog(ERROR, "failed to open identity index");
+
+	/* Only initialize fields needed by ExecInsertIndexTuples(). */
+	result->estate = estate;
+
+	return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+	Relation	ident_idx_rel;
+	Form_pg_index ident_idx;
+	int			n,
+				i;
+	ScanKey		result;
+
+	Assert(OidIsValid(ident_idx_oid));
+	ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+	ident_idx = ident_idx_rel->rd_index;
+	n = ident_idx->indnatts;
+	result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+	for (i = 0; i < n; i++)
+	{
+		ScanKey		entry;
+		int16		relattno;
+		Form_pg_attribute att;
+		Oid			opfamily,
+					opcintype,
+					opno,
+					opcode;
+
+		entry = &result[i];
+		relattno = ident_idx->indkey.values[i];
+		if (relattno >= 1)
+		{
+			TupleDesc	desc;
+
+			desc = rel_src->rd_att;
+			att = TupleDescAttr(desc, relattno - 1);
+		}
+		else
+			elog(ERROR, "unexpected attribute number %d in index", relattno);
+
+		opfamily = ident_idx_rel->rd_opfamily[i];
+		opcintype = ident_idx_rel->rd_opcintype[i];
+		opno = get_opfamily_member(opfamily, opcintype, opcintype,
+								   BTEqualStrategyNumber);
+
+		if (!OidIsValid(opno))
+			elog(ERROR, "failed to find = operator for type %u", opcintype);
+
+		opcode = get_opcode(opno);
+		if (!OidIsValid(opcode))
+			elog(ERROR, "failed to find function for operator %u", opno);
+
+		/* Initialize everything but argument. */
+		ScanKeyInit(entry,
+					i + 1,
+					BTEqualStrategyNumber, opcode,
+					(Datum) NULL);
+		entry->sk_collation = att->attcollation;
+	}
+	index_close(ident_idx_rel, AccessShareLock);
+
+	*nentries = n;
+	return result;
+}
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+	ExecCloseIndices(iistate->rri);
+	FreeExecutorState(iistate->estate);
+	pfree(iistate->rri);
+	pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	ExecDropSingleTupleTableSlot(dstate->tsslot);
+	FreeTupleDesc(dstate->tupdesc_change);
+	FreeTupleDesc(dstate->tupdesc);
+	tuplestore_end(dstate->tstore);
+
+	FreeDecodingContext(ctx);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+								   Relation cl_index,
+								   LogicalDecodingContext *ctx,
+								   bool swap_toast_by_content,
+								   TransactionId frozenXid,
+								   MultiXactId cutoffMulti)
+{
+	LOCKMODE	lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+	List	   *ind_oids_new;
+	Oid			old_table_oid = RelationGetRelid(OldHeap);
+	Oid			new_table_oid = RelationGetRelid(NewHeap);
+	List	   *ind_oids_old = RelationGetIndexList(OldHeap);
+	ListCell   *lc,
+			   *lc2;
+	char		relpersistence;
+	bool		is_system_catalog;
+	Oid			ident_idx_old,
+				ident_idx_new;
+	IndexInsertState *iistate;
+	ScanKey		ident_key;
+	int			ident_key_nentries;
+	XLogRecPtr	wal_insert_ptr,
+				end_of_wal;
+	char		dummy_rec_data = '\0';
+	Relation   *ind_refs,
+			   *ind_refs_p;
+	int			nind;
+
+	/* Like in cluster_rel(). */
+	lockmode_old = ShareUpdateExclusiveLock;
+	Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+	Assert(cl_index == NULL ||
+		   CheckRelationLockedByMe(cl_index, lockmode_old, false));
+	/* This is expected from the caller. */
+	Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+	ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+	/*
+	 * Unlike the exclusive case, we build new indexes for the new relation
+	 * rather than swapping the storage and reindexing the old relation. The
+	 * point is that the index build can take some time, so we do it before we
+	 * get AccessExclusiveLock on the old heap and therefore we cannot swap
+	 * the heap storage yet.
+	 *
+	 * index_create() will lock the new indexes using AccessExclusiveLock - no
+	 * need to change that.
+	 *
+	 * We assume that ShareUpdateExclusiveLock on the table prevents anyone
+	 * from dropping the existing indexes or adding new ones, so the lists of
+	 * old and new indexes should match at the swap time. On the other hand we
+	 * do not block ALTER INDEX commands that do not require table lock (e.g.
+	 * ALTER INDEX ... SET ...).
+	 *
+	 * XXX Should we check at the end of our work whether another transaction
+	 * executed such a command, and issue a NOTICE that we might have discarded
+	 * its effects? (For example, if someone changes a storage parameter after
+	 * we have created the new index, the new value of that parameter is lost.)
+	 * Alternatively, we can lock all the indexes now in a mode that blocks
+	 * all the ALTER INDEX commands (ShareUpdateExclusiveLock ?), and keep
+	 * them locked till the end of the transaction. That might increase the
+	 * risk of deadlock during the lock upgrade below, however SELECT / DML
+	 * queries should not be involved in such a deadlock.
+	 */
+	ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+	/*
+	 * Processing shouldn't start w/o valid identity index.
+	 */
+	Assert(OidIsValid(ident_idx_old));
+
+	/* Find "identity index" on the new relation. */
+	ident_idx_new = InvalidOid;
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+
+		if (ident_idx_old == ind_old)
+		{
+			ident_idx_new = ind_new;
+			break;
+		}
+	}
+	if (!OidIsValid(ident_idx_new))
+	{
+		/* Should not happen, given our lock on the old relation. */
+		ereport(ERROR,
+				(errmsg("identity index missing on the new relation")));
+	}
+
+	/* Executor state to update indexes. */
+	iistate = get_index_insert_state(NewHeap, ident_idx_new);
+
+	/*
+	 * Build scan key that we'll use to look for rows to be updated / deleted
+	 * during logical decoding.
+	 */
+	ident_key = build_identity_key(ident_idx_new, OldHeap, &ident_key_nentries);
+
+	/*
+	 * During testing, wait for another backend to perform concurrent data
+	 * changes which we will process below.
+	 */
+	INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+	/*
+	 * Flush all WAL records inserted so far (possibly except for the last
+	 * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+	 * we need to flush while holding exclusive lock on the source table.
+	 */
+	wal_insert_ptr = GetInsertRecPtr();
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/*
+	 * Apply the concurrent changes for the first time, to minimize the time
+	 * we need to hold AccessExclusiveLock. (A significant amount of WAL may
+	 * have been written during the data copying and index creation.)
+	 */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/*
+	 * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+	 * is one) and all its indexes, so that we can swap the files.
+	 *
+	 * Before that, release the lock on the clustering index temporarily, to
+	 * avoid deadlock in case another transaction is trying to lock it while
+	 * holding the lock on the table.
+	 */
+	if (cl_index)
+	{
+		index_close(cl_index, ShareUpdateExclusiveLock);
+		cl_index = NULL;
+	}
+	/* Likewise, lock the TOAST relation before the table. */
+	if (OldHeap->rd_rel->reltoastrelid)
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+	/* Finally lock the table */
+	LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+	/*
+	 * Lock all indexes now, not only the clustering one: all indexes need to
+	 * have their files swapped. While doing that, store their relation
+	 * references in an array, to handle predicate locks below.
+	 */
+	ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+	nind = 0;
+	foreach(lc, ind_oids_old)
+	{
+		Oid			ind_oid;
+		Relation	index;
+
+		ind_oid = lfirst_oid(lc);
+		index = index_open(ind_oid, AccessExclusiveLock);
+
+		/*
+		 * TODO 1) Do we need to check if ALTER INDEX was executed since the
+		 * new index was created in build_new_indexes()? 2) Specifically for
+		 * the clustering index, should check_index_is_clusterable() be called
+		 * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+		 * table probably blocks all commands that affect the result of
+		 * check_index_is_clusterable().)
+		 */
+		*ind_refs_p = index;
+		ind_refs_p++;
+		nind++;
+	}
+
+	/*
+	 * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+	 * lock is needed to swap the files.
+	 */
+	if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+	/*
+	 * Tuples and pages of the old heap will be gone, but the heap will stay.
+	 */
+	TransferPredicateLocksToHeapRelation(OldHeap);
+	/* The same for indexes. */
+	for (int i = 0; i < nind; i++)
+	{
+		Relation	index = ind_refs[i];
+
+		TransferPredicateLocksToHeapRelation(index);
+
+		/*
+		 * References to the indexes on the old relation are not needed
+		 * anymore, but the locks are held until the end of the transaction.
+		 */
+		index_close(index, NoLock);
+	}
+	pfree(ind_refs);
+
+	/*
+	 * Flush anything we see in WAL, to make sure that all changes committed
+	 * while we were waiting for the exclusive lock are available for
+	 * decoding. This should not be necessary if all backends had
+	 * synchronous_commit set, but we can't rely on this setting.
+	 *
+	 * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+	 * position, and GetLastImportantRecPtr() points at the start of the last
+	 * record rather than at the end. Thus the simplest way to determine the
+	 * insert position is to insert a dummy record and use its LSN.
+	 *
+	 * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+	 * last record (plus the total size of all the page headers the record
+	 * spans)?
+	 */
+	XLogBeginInsert();
+	XLogRegisterData(&dummy_rec_data, 1);
+	wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+	XLogFlush(wal_insert_ptr);
+	end_of_wal = GetFlushRecPtr(NULL);
+
+	/* Apply the concurrent changes again. */
+	process_concurrent_changes(ctx, end_of_wal, NewHeap,
+							   swap_toast_by_content ? OldHeap : NULL,
+							   ident_key, ident_key_nentries, iistate);
+
+	/* Remember info about rel before closing OldHeap */
+	relpersistence = OldHeap->rd_rel->relpersistence;
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+	/*
+	 * Even ShareUpdateExclusiveLock should have prevented others from
+	 * creating / dropping indexes (even using the CONCURRENTLY option), so we
+	 * do not need to check whether the lists match.
+	 */
+	forboth(lc, ind_oids_old, lc2, ind_oids_new)
+	{
+		Oid			ind_old = lfirst_oid(lc);
+		Oid			ind_new = lfirst_oid(lc2);
+		Oid			mapped_tables[4];
+
+		/* Zero out possible results from swap_relation_files */
+		memset(mapped_tables, 0, sizeof(mapped_tables));
+
+		swap_relation_files(ind_old, ind_new,
+							(old_table_oid == RelationRelationId),
+							swap_toast_by_content,
+							true,
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Concurrent processing is not supported for system relations, so
+		 * there should be no mapped tables.
+		 */
+		for (int i = 0; i < 4; i++)
+			Assert(mapped_tables[i] == 0);
+#endif
+	}
+
+	/* The new indexes must be visible for deletion. */
+	CommandCounterIncrement();
+
+	/* Close the old heap but keep lock until transaction commit. */
+	table_close(OldHeap, NoLock);
+	/* Close the new heap. (We didn't have to open its indexes). */
+	table_close(NewHeap, NoLock);
+
+	/* Cleanup what we don't need anymore. (And close the identity index.) */
+	pfree(ident_key);
+	free_index_insert_state(iistate);
+
+	/*
+	 * Swap the relations and their TOAST relations and TOAST indexes. This
+	 * also drops the new relation and its indexes.
+	 *
+	 * (System catalogs are currently not supported.)
+	 */
+	Assert(!is_system_catalog);
+	finish_heap_swap(old_table_oid, new_table_oid,
+					 is_system_catalog,
+					 swap_toast_by_content,
+					 false, true, false,
+					 frozenXid, cutoffMulti,
+					 relpersistence);
+}
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. The order of items matches that of OldIndexes, so the two lists
+ * can be used to swap the index storage.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+	ListCell   *lc;
+	List	   *result = NIL;
+
+	pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+								 PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+	foreach(lc, OldIndexes)
+	{
+		Oid			ind_oid,
+					ind_oid_new;
+		char	   *newName;
+		Relation	ind;
+
+		ind_oid = lfirst_oid(lc);
+		ind = index_open(ind_oid, AccessShareLock);
+
+		newName = ChooseRelationName(get_rel_name(ind_oid),
+									 NULL,
+									 "repacknew",
+									 get_rel_namespace(ind->rd_index->indrelid),
+									 false);
+		ind_oid_new = index_create_copy(NewHeap, ind_oid,
+										ind->rd_rel->reltablespace, newName,
+										false);
+		result = lappend_oid(result, ind_oid_new);
+
+		index_close(ind, AccessShareLock);
+	}
+
+	return result;
+}
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 188e26f0e6e..71b73c21ebf 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -904,7 +904,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
-	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
 					 RecentXmin, ReadNextMultiXactId(), relpersistence);
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c6dd2e020da..d83136a2657 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5989,6 +5989,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 			finish_heap_swap(tab->relid, OIDNewHeap,
 							 false, false, true,
 							 !OidIsValid(tab->newTableSpace),
+							 true,
 							 RecentXmin,
 							 ReadNextMultiXactId(),
 							 persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 8863ad0e8bd..6de9d0ba39d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -125,7 +125,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 							  TransactionId lastSaneFrozenXid,
 							  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-					   BufferAccessStrategy bstrategy);
+					   BufferAccessStrategy bstrategy, bool isTopLevel);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -633,7 +633,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
 
 			if (params.options & VACOPT_VACUUM)
 			{
-				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+				if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+								isTopLevel))
 					continue;
 			}
 
@@ -1997,7 +1998,7 @@ vac_truncate_clog(TransactionId frozenXID,
  */
 static bool
 vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
-		   BufferAccessStrategy bstrategy)
+		   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	LOCKMODE	lmode;
 	Relation	rel;
@@ -2288,7 +2289,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 
 			/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 			cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
-						&cluster_params);
+						&cluster_params, isTopLevel);
 			/* cluster_rel closes the relation, but keeps lock */
 
 			rel = NULL;
@@ -2331,7 +2332,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
-		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+		vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+				   isTopLevel);
 	}
 
 	/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index 2b0db214804..50aa385a581 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
 subdir('snowball')
 subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..5dc4ae58ffe 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
 #include "catalog/pg_control.h"
+#include "commands/cluster.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/message.h"
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(buf->record);
 	SnapBuild  *builder = ctx->snapshot_builder;
 
+	/*
+	 * If the change is not intended for logical decoding, do not even
+	 * establish a transaction for it - REPACK CONCURRENTLY is the typical
+	 * use case.
+	 *
+	 * First, check if REPACK CONCURRENTLY is being performed by this backend.
+	 * If so, only decode data changes of the table that it is processing, and
+	 * the changes of its TOAST relation.
+	 *
+	 * (The TOAST locator should not be set unless the main one is.)
+	 */
+	Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+		   OidIsValid(repacked_rel_locator.relNumber));
+
+	if (OidIsValid(repacked_rel_locator.relNumber))
+	{
+		XLogReaderState *r = buf->record;
+		RelFileLocator locator;
+
+		/* Not all records contain the block. */
+		if (XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL) &&
+			!RelFileLocatorEquals(locator, repacked_rel_locator) &&
+			(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+			 !RelFileLocatorEquals(locator, repacked_rel_toast_locator)))
+			return;
+	}
+
+	/*
+	 * Second, skip records which do not contain sufficient information for
+	 * decoding.
+	 *
+	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+	 * when making changes in the new table. Those changes are not useful to
+	 * any other consumer (such as a logical replication subscription) because
+	 * the new table will eventually be dropped (after REPACK CONCURRENTLY has
+	 * assigned its file to the "old table").
+	 */
+	switch (info)
+	{
+		case XLOG_HEAP_INSERT:
+			{
+				xl_heap_insert *rec;
+
+				rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+				/*
+				 * This does happen when 1) raw_heap_insert marks the TOAST
+				 * record as HEAP_INSERT_NO_LOGICAL, 2) REPACK CONCURRENTLY
+				 * replays inserts performed by other backends.
+				 */
+				if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_HOT_UPDATE:
+		case XLOG_HEAP_UPDATE:
+			{
+				xl_heap_update *rec;
+
+				rec = (xl_heap_update *) XLogRecGetData(buf->record);
+				if ((rec->flags &
+					 (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_TUPLE |
+					  XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+					return;
+
+				break;
+			}
+
+		case XLOG_HEAP_DELETE:
+			{
+				xl_heap_delete *rec;
+
+				rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+				if (rec->flags & XLH_DELETE_NO_LOGICAL)
+					return;
+				break;
+			}
+	}
+
 	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
 
 	/*
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 6eaa40f6acf..56ea521e8c6 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,6 +486,26 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	return SnapBuildMVCCFromHistoric(snap, true);
 }
 
+/*
+ * Build an MVCC snapshot for the initial data load performed by the REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot set up
+ * (so we do not set MyProc->xmin). XXX Do we need any further restrictions?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+	Snapshot	snap;
+
+	Assert(builder->state == SNAPBUILD_CONSISTENT);
+
+	snap = SnapBuildBuildSnapshot(builder);
+	return SnapBuildMVCCFromHistoric(snap, false);
+}
+
 /*
  * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
  *
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+#    src/backend/replication/pgoutput_repack/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	$(WIN32RES) \
+	pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+	rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+  'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+  pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pgoutput_repack',
+    '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+  pgoutput_repack_sources,
+  kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..687fbbc59bb
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,288 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ *		Logical Replication output plugin for REPACK command
+ *
+ * Copyright (c) 2012-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+						   OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+							 ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+							  ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+						  Relation rel, ReorderBufferChange *change);
+static void plugin_truncate(struct LogicalDecodingContext *ctx,
+							ReorderBufferTXN *txn, int nrelations,
+							Relation relations[],
+							ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+						 ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+	cb->startup_cb = plugin_startup;
+	cb->begin_cb = plugin_begin_txn;
+	cb->change_cb = plugin_change;
+	cb->truncate_cb = plugin_truncate;
+	cb->commit_cb = plugin_commit_txn;
+	cb->shutdown_cb = plugin_shutdown;
+}
+
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+			   bool is_init)
+{
+	ctx->output_plugin_private = NULL;
+
+	/* Probably unnecessary, as we don't use the SQL interface ... */
+	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+	if (ctx->output_plugin_options != NIL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("this plugin does not expect any options")));
+	}
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * As we don't release the slot during the processing of a particular table,
+ * there's no room for the SQL interface, even for debugging purposes.
+ * Therefore we need neither OutputPluginPrepareWrite() nor
+ * OutputPluginWrite() in the plugin callbacks. (Although we might want to
+ * write custom callbacks, this API seems unnecessarily generic for our
+ * purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				  XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+			  Relation relation, ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Only interested in one particular relation. */
+	if (relation->rd_id != dstate->relid)
+		return;
+
+	/* Decode entry depending on its type */
+	switch (change->action)
+	{
+		case REORDER_BUFFER_CHANGE_INSERT:
+			{
+				HeapTuple	newtuple;
+
+				newtuple = change->data.tp.newtuple;
+
+				/*
+				 * Identity checks in the main function should have made this
+				 * impossible.
+				 */
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete insert info");
+
+				store_change(ctx, CHANGE_INSERT, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_UPDATE:
+			{
+				HeapTuple	oldtuple,
+							newtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+				newtuple = change->data.tp.newtuple;
+
+				if (newtuple == NULL)
+					elog(ERROR, "incomplete update info");
+
+				if (oldtuple != NULL)
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+			}
+			break;
+		case REORDER_BUFFER_CHANGE_DELETE:
+			{
+				HeapTuple	oldtuple;
+
+				oldtuple = change->data.tp.oldtuple;
+
+				if (oldtuple == NULL)
+					elog(ERROR, "incomplete delete info");
+
+				store_change(ctx, CHANGE_DELETE, oldtuple);
+			}
+			break;
+		default:
+			/* Should not come here */
+			Assert(false);
+			break;
+	}
+}
+
+static void
+plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+				int nrelations, Relation relations[],
+				ReorderBufferChange *change)
+{
+	RepackDecodingState *dstate;
+	int			i;
+	Relation	relation = NULL;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	/* Find the relation we are processing. */
+	for (i = 0; i < nrelations; i++)
+	{
+		relation = relations[i];
+
+		if (RelationGetRelid(relation) == dstate->relid)
+			break;
+	}
+
+	/* Nothing to do if some other relation was truncated. */
+	if (i == nrelations)
+		return;
+
+	store_change(ctx, CHANGE_TRUNCATE, NULL);
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+			 HeapTuple tuple)
+{
+	RepackDecodingState *dstate;
+	char	   *change_raw;
+	ConcurrentChange change;
+	bool		flattened = false;
+	Size		size;
+	Datum		values[1];
+	bool		isnull[1];
+	char	   *dst,
+			   *dst_start;
+
+	dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+	size = MAXALIGN(VARHDRSZ) + SizeOfConcurrentChange;
+
+	if (tuple)
+	{
+		/*
+		 * ReorderBufferCommit() stores the TOAST chunks in its private memory
+		 * context and frees them after having called apply_change().
+		 * Therefore we need a flat copy (including TOAST) that we eventually
+		 * copy into the memory context which is available to
+		 * decode_concurrent_changes().
+		 */
+		if (HeapTupleHasExternal(tuple))
+		{
+			/*
+			 * toast_flatten_tuple_to_datum() might be more convenient but we
+			 * don't want the decompression it does.
+			 */
+			tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+			flattened = true;
+		}
+
+		size += tuple->t_len;
+	}
+
+	/* XXX Isn't there a function / macro to do this? */
+	if (size >= 0x3FFFFFFF)
+		elog(ERROR, "change is too big");
+
+	/* Construct the change. */
+	change_raw = (char *) palloc0(size);
+	SET_VARSIZE(change_raw, size);
+
+	/*
+	 * Since the varlena alignment might not be sufficient for the structure,
+	 * set the fields in a local instance and remember where it should
+	 * eventually be copied.
+	 */
+	change.kind = kind;
+	dst_start = (char *) VARDATA(change_raw);
+
+	/* No other information is needed for TRUNCATE. */
+	if (change.kind == CHANGE_TRUNCATE)
+	{
+		memcpy(dst_start, &change, SizeOfConcurrentChange);
+		goto store;
+	}
+
+	/*
+	 * Copy the tuple.
+	 *
+	 * CAUTION: change->tup_data.t_data must be fixed on retrieval!
+	 */
+	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+	dst = dst_start + SizeOfConcurrentChange;
+	memcpy(dst, tuple->t_data, tuple->t_len);
+
+	/* The data has been copied. */
+	if (flattened)
+		pfree(tuple);
+
+store:
+	/* Copy the structure so it can be stored. */
+	memcpy(dst_start, &change, SizeOfConcurrentChange);
+
+	/* Store as tuple of 1 bytea column. */
+	values[0] = PointerGetDatum(change_raw);
+	isnull[0] = false;
+	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+						 values, isnull);
+
+	/* Accounting. */
+	dstate->nchanges++;
+
+	/* Cleanup. */
+	pfree(change_raw);
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..e9ddf39500c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
 #include "access/xlogprefetcher.h"
 #include "access/xlogrecovery.h"
 #include "commands/async.h"
+#include "commands/cluster.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index cd3e43c448a..519f3953638 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
 
 die
   "$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
-  . " missing from lwlocklist.h"
+  . "missing from lwlocklist.h"
   if $lwlock_count < scalar @wait_event_lwlocks;
 
 die
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6fe268a8eec..d27a4c30548 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -64,6 +64,7 @@
 #include "catalog/pg_type.h"
 #include "catalog/schemapg.h"
 #include "catalog/storage.h"
+#include "commands/cluster.h"
 #include "commands/policy.h"
 #include "commands/publicationcmds.h"
 #include "commands/trigger.h"
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 70a6b8902d1..7f1c220e00b 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -646,7 +645,7 @@ CopySnapshot(Snapshot snapshot)
  * FreeSnapshot
  *		Free the memory associated with a snapshot.
  */
-static void
+void
 FreeSnapshot(Snapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 31c1ce5dc09..f23efa12d69 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -4998,18 +4998,27 @@ match_previous_words(int pattern_id,
 	}
 
 /* REPACK */
-	else if (Matches("REPACK"))
+	else if (Matches("REPACK") || Matches("REPACK", "(*)"))
+		COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+										"CONCURRENTLY");
+	else if (Matches("REPACK", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	else if (Matches("REPACK", "(*)"))
+	else if (Matches("REPACK", "(*)", "CONCURRENTLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
-	/* If we have REPACK <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", MatchAnyExcept("(")))
+	/* If we have REPACK [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", MatchAnyExcept("(|CONCURRENTLY")) ||
+			 Matches("REPACK", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK (*) <sth>, then add "USING INDEX" */
-	else if (Matches("REPACK", "(*)", MatchAny))
+	/* If we have REPACK (*) [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+	else if (Matches("REPACK", "(*)", MatchAnyExcept("CONCURRENTLY")) ||
+			 Matches("REPACK", "(*)", "CONCURRENTLY", MatchAnyExcept("(")))
 		COMPLETE_WITH("USING INDEX");
-	/* If we have REPACK <sth> USING, then add the index as well */
-	else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+
+	/*
+	 * Complete ... [ (*) ] [ CONCURRENTLY ] <sth> USING INDEX, with a list of
+	 * indexes for <sth>.
+	 */
+	else if (TailMatches(MatchAnyExcept("(|CONCURRENTLY"), "USING", "INDEX"))
 	{
 		set_completion_reference(prev3_wd);
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..b82dd17a966 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -323,14 +323,15 @@ extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
-							 struct TM_FailureData *tmfd, bool changingPart);
+							 struct TM_FailureData *tmfd, bool changingPart,
+							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, ItemPointer tid);
 extern void heap_abort_speculative(Relation relation, ItemPointer tid);
 extern TM_Result heap_update(Relation relation, ItemPointer otid,
 							 HeapTuple newtup,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes);
+							 TU_UpdateIndexes *update_indexes, bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
@@ -411,6 +412,10 @@ extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
 								 uint16 infomask, TransactionId xid);
+extern bool HeapTupleMVCCInserted(HeapTuple htup, Snapshot snapshot,
+								  Buffer buffer);
+extern bool HeapTupleMVCCNotDeleted(HeapTuple htup, Snapshot snapshot,
+									Buffer buffer);
 extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
 extern bool HeapTupleIsSurelyDead(HeapTuple htup,
 								  struct GlobalVisState *vistest);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..8d4af07f840 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
 #define XLH_DELETE_CONTAINS_OLD_KEY				(1<<2)
 #define XLH_DELETE_IS_SUPER						(1<<3)
 #define XLH_DELETE_IS_PARTITION_MOVE			(1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL					(1<<5)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLH_DELETE_CONTAINS_OLD						\
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..289b64edfd9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "replication/logical.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -623,6 +624,8 @@ typedef struct TableAmRoutine
 											  Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin,
+											  Snapshot snapshot,
+											  LogicalDecodingContext *decoding_ctx,
 											  TransactionId *xid_cutoff,
 											  MultiXactId *multi_cutoff,
 											  double *num_tuples,
@@ -1627,6 +1630,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
  *   not needed for the relation's AM
  * - *xid_cutoff - ditto
  * - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ *	 (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ *   changes.
  *
  * Output parameters:
  * - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1639,6 +1646,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
+								Snapshot snapshot,
+								LogicalDecodingContext *decoding_ctx,
 								TransactionId *xid_cutoff,
 								MultiXactId *multi_cutoff,
 								double *num_tuples,
@@ -1647,6 +1656,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
 {
 	OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
 													use_sort, OldestXmin,
+													snapshot, decoding_ctx,
 													xid_cutoff, multi_cutoff,
 													num_tuples, tups_vacuumed,
 													tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 7f4138c7b36..532ffa7208d 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
 #ifndef CLUSTER_H
 #define CLUSTER_H
 
+#include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
+#include "replication/logical.h"
 #include "storage/lock.h"
+#include "storage/relfilelocator.h"
 #include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
 
 
 /* flag bits for ClusterParams->options */
@@ -24,6 +29,7 @@
 #define CLUOPT_RECHECK 0x02		/* recheck relation state */
 #define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
 										 * indisclustered */
+#define CLUOPT_CONCURRENT 0x08	/* allow concurrent data changes */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
@@ -32,14 +38,95 @@ typedef struct ClusterParams
 } ClusterParams;
 
 
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+extern RelFileLocator repacked_rel_locator;
+extern RelFileLocator repacked_rel_toast_locator;
+
+typedef enum
+{
+	CHANGE_INSERT,
+	CHANGE_UPDATE_OLD,
+	CHANGE_UPDATE_NEW,
+	CHANGE_DELETE,
+	CHANGE_TRUNCATE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+	/* See the enum above. */
+	ConcurrentChangeKind kind;
+
+	/*
+	 * The actual tuple.
+	 *
+	 * The tuple data follows the ConcurrentChange structure.  Before use,
+	 * make sure the tuple is correctly aligned (ConcurrentChange can be
+	 * stored as bytea) and that tuple->t_data is fixed up accordingly.
+	 */
+	HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+								sizeof(HeapTupleData))
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents are being copied to new storage, as well as the metadata
+ * needed to apply those changes to the table.
+ */
+typedef struct RepackDecodingState
+{
+	/* The relation whose changes we're decoding. */
+	Oid			relid;
+
+	/*
+	 * Decoded changes are stored here. Although we try to avoid excessive
+	 * batches, it can happen that the changes need to be stored to disk. The
+	 * tuplestore does this transparently.
+	 */
+	Tuplestorestate *tstore;
+
+	/* The current number of changes in tstore. */
+	double		nchanges;
+
+	/*
+	 * Descriptor to store the ConcurrentChange structure serialized (bytea).
+	 * We can't store the tuple directly because tuplestore only supports
+	 * minimal tuples and we may need to transfer the OID system column
+	 * from the output plugin. We also need to transfer the change kind,
+	 * so it's better to put everything into a single structure than to
+	 * maintain two tuplestores "in parallel".
+	 */
+	TupleDesc	tupdesc_change;
+
+	/* Tuple descriptor needed to update indexes. */
+	TupleDesc	tupdesc;
+
+	/* Slot to retrieve data from tstore. */
+	TupleTableSlot *tsslot;
+
+	ResourceOwner resowner;
+} RepackDecodingState;
+
+
+
 extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
 
 extern void cluster_rel(RepackCommand command, bool usingindex,
-						Relation OldHeap, Oid indexOid, ClusterParams *params);
+						Relation OldHeap, Oid indexOid, ClusterParams *params,
+						bool isTopLevel);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+											 XLogRecPtr end_of_wal);
+
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 						  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -47,6 +134,7 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 bool swap_toast_by_content,
 							 bool check_constraints,
 							 bool is_internal,
+							 bool reindex,
 							 TransactionId frozenXid,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 5b6639c114c..93917ad5544 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -59,18 +59,20 @@
 /*
  * Progress parameters for REPACK.
  *
- * Note: Since REPACK shares some code with CLUSTER, these values are also
- * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
- * introduce a separate set of constants.)
+ * Note: Since REPACK shares some code with CLUSTER, (some of) these values
+ * are also used by CLUSTER. (CLUSTER is now deprecated, so it makes little
+ * sense to introduce a separate set of constants.)
  */
 #define PROGRESS_REPACK_COMMAND					0
 #define PROGRESS_REPACK_PHASE					1
 #define PROGRESS_REPACK_INDEX_RELID				2
 #define PROGRESS_REPACK_HEAP_TUPLES_SCANNED		3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN		4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED	4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED		5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED		6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS			7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED		8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT		9
 
 /*
  * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -79,9 +81,10 @@
 #define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP	2
 #define PROGRESS_REPACK_PHASE_SORT_TUPLES		3
 #define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		7
+#define PROGRESS_REPACK_PHASE_CATCH_UP			5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES	6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX		7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP		8
 
 /*
  * Commands of PROGRESS_REPACK
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 6d4d2d1814c..802fc4b0823 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
 
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
 extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 7f3ba0352f6..2739327b0da 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
 #define AccessShareLock			1	/* SELECT */
 #define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
 #define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL), ANALYZE, CREATE
-									 * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4	/* VACUUM (non-exclusive), ANALYZE, CREATE
+									 * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
 #define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
 #define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
 									 * SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 147b190210a..5eeabdc6c4f 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -61,6 +61,8 @@ extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index fc82cd67f6c..f16422175f8 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -11,10 +11,11 @@ EXTENSION = injection_points
 DATA = injection_points--1.0.sql
 PGFILEDESC = "injection_points - facility for injection points"
 
-REGRESS = injection_points hashagg reindex_conc vacuum
+REGRESS = injection_points hashagg reindex_conc vacuum
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = basic inplace syscache-update-pruned
+ISOLATION = basic inplace syscache-update-pruned repack
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: 
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing: 
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+
+step change_new: 
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1: 
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+
+step change_subxact2: 
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+
+step check2: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock: 
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1: 
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+    2
+(1 row)
+
+  i|  j
+---+---
+  2| 20
+  6| 60
+  8|  8
+ 10|  1
+ 40|  3
+ 50|  5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+    0
+(1 row)
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..c8f264bc6cb
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
\ No newline at end of file
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 20390d6b4bf..29561103bbf 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -47,9 +47,13 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'repack',
       'syscache-update-pruned',
     ],
     'runningcheck': false, # see syscache-update-pruned
+    # 'repack' requires wal_level = 'logical'.
+    'regress_args': ['--temp-config', files('logical.conf')],
+
   },
   'tap': {
     'env': {
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..75850334986
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,143 @@
+# Test REPACK (CONCURRENTLY) with concurrent data changes. Use an injection
+# point to pause the repacking session while other sessions modify the table.
+setup
+{
+	CREATE EXTENSION injection_points;
+
+	CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+	INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+	CREATE TABLE relfilenodes(node oid);
+
+	CREATE TABLE data_s1(i int, j int);
+	CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+	DROP TABLE repack_test;
+	DROP EXTENSION injection_points;
+
+	DROP TABLE relfilenodes;
+	DROP TABLE data_s1;
+	DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+	REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT count(DISTINCT node) FROM relfilenodes;
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s1(i, j)
+	SELECT i, j FROM repack_test;
+
+	SELECT count(*)
+	FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+	WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+    SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice to test whether tuple version generated by this session
+# can be found.
+step change_existing
+{
+	UPDATE repack_test SET i=10 where i=1;
+	UPDATE repack_test SET j=20 where i=2;
+	UPDATE repack_test SET i=30 where i=3;
+	UPDATE repack_test SET i=40 where i=30;
+	DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key and
+# non-key column.
+step change_new
+{
+	INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+	UPDATE repack_test SET i=50 where i=5;
+	UPDATE repack_test SET j=60 where i=6;
+	DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+	BEGIN;
+	INSERT INTO repack_test(i, j) VALUES (100, 100);
+	SAVEPOINT s1;
+	UPDATE repack_test SET i=101 where i=100;
+	SAVEPOINT s2;
+	UPDATE repack_test SET i=102 where i=101;
+	COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+	BEGIN;
+	SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 110);
+	ROLLBACK TO SAVEPOINT s1;
+	INSERT INTO repack_test(i, j) VALUES (110, 111);
+	COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+	INSERT INTO relfilenodes(node)
+	SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+	SELECT i, j FROM repack_test ORDER BY i, j;
+
+	INSERT INTO data_s2(i, j)
+	SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+	SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+	wait_before_lock
+	change_existing
+	change_new
+	change_subxact1
+	change_subxact2
+	check2
+	wakeup_before_lock
+	check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3a1d1d28282..fe227bd8a30 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1999,17 +1999,17 @@ pg_stat_progress_cluster| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS cluster_index_relid,
     s.param4 AS heap_tuples_scanned,
     s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_copy| SELECT s.pid,
@@ -2081,17 +2081,20 @@ pg_stat_progress_repack| SELECT s.pid,
             WHEN 2 THEN 'index scanning heap'::text
             WHEN 3 THEN 'sorting tuples'::text
             WHEN 4 THEN 'writing new heap'::text
-            WHEN 5 THEN 'swapping relation files'::text
-            WHEN 6 THEN 'rebuilding index'::text
-            WHEN 7 THEN 'performing final cleanup'::text
+            WHEN 5 THEN 'catch-up'::text
+            WHEN 6 THEN 'swapping relation files'::text
+            WHEN 7 THEN 'rebuilding index'::text
+            WHEN 8 THEN 'performing final cleanup'::text
             ELSE NULL::text
         END AS phase,
     (s.param3)::oid AS repack_index_relid,
     s.param4 AS heap_tuples_scanned,
-    s.param5 AS heap_tuples_written,
-    s.param6 AS heap_blks_total,
-    s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param5 AS heap_tuples_inserted,
+    s.param6 AS heap_tuples_updated,
+    s.param7 AS heap_tuples_deleted,
+    s.param8 AS heap_blks_total,
+    s.param9 AS heap_blks_scanned,
+    s.param10 AS index_rebuild_count
    FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 24f98ed1e9e..af967625181 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -486,6 +486,8 @@ CompressFileHandle
 CompressionLocation
 CompressorState
 ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
 ConditionVariable
 ConditionVariableMinimallyPadded
 ConditionalStack
@@ -1258,6 +1260,7 @@ IndexElem
 IndexFetchHeapData
 IndexFetchTableData
 IndexInfo
+IndexInsertState
 IndexList
 IndexOnlyScan
 IndexOnlyScanState
@@ -2538,6 +2541,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 RepackCommand
+RepackDecodingState
 RepackStmt
 ReparameterizeForeignPathByChild_function
 ReplaceVarsFromTargetList_context
-- 
2.47.1
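
As a rough interactive sketch of what the isolation test above automates (assuming a server built with this patch series, `wal_level = logical`, and the `repack_test` table from the spec):

```sql
-- Session 1: rewrite the table online, ordering rows by the given index;
-- concurrent DML is captured via logical decoding and applied before the
-- relation files are swapped.
REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;

-- Session 2: concurrent changes proceed under ShareUpdateExclusiveLock.
UPDATE repack_test SET j = 20 WHERE i = 2;

-- Session 3: watch progress; note the new 'catch-up' phase and the
-- inserted/updated/deleted tuple counters introduced by this series.
SELECT phase, heap_tuples_inserted, heap_tuples_updated, heap_tuples_deleted
FROM pg_stat_progress_repack;
```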

v18-0005-Introduce-repackdb-client-application.patchtext/x-diffDownload
From d1c28027961c1a72cbb711ffb5d96ab2a33e2bf0 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 15 Aug 2025 14:14:16 +0200
Subject: [PATCH] Introduce repackdb client application.

pg_repackdb is a client-side counterpart of the REPACK command. Like vacuumdb,
it can open multiple database connections, which is beneficial when the user
wants to process many tables at a time.

The application works almost identically to vacuumdb, except that it runs
REPACK instead of VACUUM. Therefore we implement it by moving most of the code
from vacuumdb.c to common.c and by calling it from both vacuumdb.c and
pg_repackdb.c, after setting the global variable "mode" either to MODE_VACUUM
or to MODE_REPACK.

The patch also removes several local variables from the main() function of
vacuumdb.c instead of copying them to pg_repackdb.c.
---
 doc/src/sgml/ref/allfiles.sgml    |    1 +
 doc/src/sgml/ref/pg_repackdb.sgml |  479 +++++++++++++
 doc/src/sgml/reference.sgml       |    1 +
 src/bin/scripts/.gitignore        |    1 +
 src/bin/scripts/Makefile          |    5 +-
 src/bin/scripts/common.c          | 1082 +++++++++++++++++++++++++++++
 src/bin/scripts/common.h          |   78 +++
 src/bin/scripts/meson.build       |    2 +
 src/bin/scripts/pg_repackdb.c     |  177 +++++
 src/bin/scripts/t/103_repackdb.pl |   24 +
 src/bin/scripts/vacuumdb.c        | 1068 +---------------------------
 src/tools/pgindent/typedefs.list  |    1 +
 12 files changed, 1867 insertions(+), 1052 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
 create mode 100644 src/bin/scripts/pg_repackdb.c
 create mode 100644 src/bin/scripts/t/103_repackdb.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index c0ef654fcb4..eabf92e3536 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -213,6 +213,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgIsready          SYSTEM "pg_isready.sgml">
 <!ENTITY pgReceivewal       SYSTEM "pg_receivewal.sgml">
 <!ENTITY pgRecvlogical      SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb         SYSTEM "pg_repackdb.sgml">
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..32570d071cb
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,479 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+  <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_repackdb</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_repackdb</refname>
+  <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+  database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-t</option></arg>
+      <arg choice="plain"><option>--table</option></arg>
+     </group>
+     <replaceable>table</replaceable>
+     <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-n</option></arg>
+      <arg choice="plain"><option>--schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+
+  <cmdsynopsis>
+   <command>pg_repackdb</command>
+   <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+   <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+   <arg choice="plain" rep="repeat">
+    <arg choice="opt">
+     <group choice="plain">
+      <arg choice="plain"><option>-N</option></arg>
+      <arg choice="plain"><option>--exclude-schema</option></arg>
+     </group>
+     <replaceable>schema</replaceable>
+    </arg>
+   </arg>
+
+   <arg choice="opt">
+    <group choice="plain">
+     <arg choice="plain"><replaceable>dbname</replaceable></arg>
+     <arg choice="plain"><option>-a</option></arg>
+     <arg choice="plain"><option>--all</option></arg>
+    </group>
+   </arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <application>pg_repackdb</application> is a utility for repacking a
+   <productname>PostgreSQL</productname> database.  With the
+   <option>--analyze</option> option, <application>pg_repackdb</application>
+   also generates the internal statistics used by the
+   <productname>PostgreSQL</productname> query optimizer.
+  </para>
+
+  <para>
+   <application>pg_repackdb</application> is a wrapper around the SQL
+   command <link linkend="sql-repack"><command>REPACK</command></link>.
+   There is no effective difference between repacking and analyzing databases
+   via this utility and via other methods for accessing the server.
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <application>pg_repackdb</application> accepts the following command-line arguments:
+    <variablelist>
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--all</option></term>
+      <listitem>
+       <para>
+        Repack all databases.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+      <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the name of the database to be repacked or analyzed,
+        when <option>-a</option>/<option>--all</option> is not used.  If this
+        is not specified, the database name is read from the environment
+        variable <envar>PGDATABASE</envar>.  If that is not set, the user name
+        specified for the connection is used.
+        The <replaceable>dbname</replaceable> can be
+        a <link linkend="libpq-connstring">connection string</link>.  If so,
+        connection string parameters will override any conflicting command
+        line options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--echo</option></term>
+      <listitem>
+       <para>
+        Echo the commands that <application>pg_repackdb</application>
+        generates and sends to the server.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Execute the repack or analyze commands in parallel by running
+        <replaceable class="parameter">njobs</replaceable>
+        commands simultaneously.  This option may reduce the processing time
+        but it also increases the load on the database server.
+       </para>
+       <para>
+        <application>pg_repackdb</application> will open
+        <replaceable class="parameter">njobs</replaceable> connections to the
+        database, so make sure your <xref linkend="guc-max-connections"/>
+        setting is high enough to accommodate all connections.
+       </para>
+       <para>
+        Note that using this mode might cause deadlock failures if certain
+        system catalogs are processed in parallel.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+      <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+      <listitem>
+       <para>
+        Do not repack or analyze any tables in
+        <replaceable class="parameter">schema</replaceable>.  Multiple schemas
+        can be excluded by writing multiple <option>-N</option> switches.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not display progress messages.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+      <listitem>
+       <para>
+        Repack or analyze <replaceable class="parameter">table</replaceable>
+        only.  Column names can be specified only in conjunction with
+        the <option>--analyze</option> option.  Multiple tables can be
+        repacked by writing multiple
+        <option>-t</option> switches.
+       </para>
+       <tip>
+        <para>
+         If you specify columns, you probably have to escape the parentheses
+         from the shell.  (See examples below.)
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-v</option></term>
+      <term><option>--verbose</option></term>
+      <listitem>
+       <para>
+        Print detailed information during processing.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_repackdb</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-z</option></term>
+      <term><option>--analyze</option></term>
+      <listitem>
+       <para>
+        Also calculate statistics for use by the optimizer.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Show help about <application>pg_repackdb</application> command line
+       arguments, and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+   <para>
+    <application>pg_repackdb</application> also accepts
+    the following command-line arguments for connection parameters:
+    <variablelist>
+     <varlistentry>
+      <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+      <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the host name of the machine on which the server
+        is running.  If the value begins with a slash, it is used
+        as the directory for the Unix domain socket.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+      <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the TCP port or local Unix domain socket file
+        extension on which the server
+        is listening for connections.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+      <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+      <listitem>
+       <para>
+        User name to connect as.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-w</option></term>
+      <term><option>--no-password</option></term>
+      <listitem>
+       <para>
+        Never issue a password prompt.  If the server requires
+        password authentication and a password is not available by
+        other means such as a <filename>.pgpass</filename> file, the
+        connection attempt will fail.  This option can be useful in
+        batch jobs and scripts where no user is present to enter a
+        password.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-W</option></term>
+      <term><option>--password</option></term>
+      <listitem>
+       <para>
+        Force <application>pg_repackdb</application> to prompt for a
+        password before connecting to a database.
+       </para>
+
+       <para>
+        This option is never essential, since
+        <application>pg_repackdb</application> will automatically prompt
+        for a password if the server demands password authentication.
+        However, <application>pg_repackdb</application> will waste a
+        connection attempt finding out that the server wants a password.
+        In some cases it is worth typing <option>-W</option> to avoid the extra
+        connection attempt.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+      <listitem>
+       <para>
+        When the <option>-a</option>/<option>--all</option> option is used,
+        connect to this database to gather the list of databases to repack.
+        If not specified, the <literal>postgres</literal> database will be used,
+        or if that does not exist, <literal>template1</literal> will be used.
+        This can be a <link linkend="libpq-connstring">connection
+        string</link>.  If so, connection string parameters will override any
+        conflicting command line options.  Also, connection string parameters
+        other than the database name itself will be re-used when connecting
+        to other databases.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+
+ <refsect1>
+  <title>Environment</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><envar>PGDATABASE</envar></term>
+    <term><envar>PGHOST</envar></term>
+    <term><envar>PGPORT</envar></term>
+    <term><envar>PGUSER</envar></term>
+
+    <listitem>
+     <para>
+      Default connection parameters
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><envar>PG_COLOR</envar></term>
+    <listitem>
+     <para>
+      Specifies whether to use color in diagnostic messages. Possible values
+      are <literal>always</literal>, <literal>auto</literal> and
+      <literal>never</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This utility, like most other <productname>PostgreSQL</productname> utilities,
+   also uses the environment variables supported by <application>libpq</application>
+   (see <xref linkend="libpq-envars"/>).
+  </para>
+
+ </refsect1>
+
+
+ <refsect1>
+  <title>Diagnostics</title>
+
+  <para>
+   In case of difficulty, see
+   <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+   discussions of potential problems and error messages.
+   The database server must be running at the
+   targeted host.  Also, any default connection settings and environment
+   variables used by the <application>libpq</application> front-end
+   library will apply.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+   <para>
+    To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack and analyze for the optimizer a database named
+    <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+   </para>
+
+   <para>
+    To repack a single table
+    <literal>foo</literal> in a database named
+    <literal>xyzzy</literal>, and analyze a single column
+    <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+   <para>
+    To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+    in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
+
+
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="sql-repack"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 229912d35b7..2ee08e21f41 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -258,6 +258,7 @@
    &pgIsready;
    &pgReceivewal;
    &pgRecvlogical;
+   &pgRepackdb;
    &pgRestore;
    &pgVerifyBackup;
    &psqlRef;
diff --git a/src/bin/scripts/.gitignore b/src/bin/scripts/.gitignore
index 0f23fe00045..8e8b608fd69 100644
--- a/src/bin/scripts/.gitignore
+++ b/src/bin/scripts/.gitignore
@@ -6,5 +6,6 @@
 /reindexdb
 /vacuumdb
 /pg_isready
+/pg_repackdb
 
 /tmp_check/
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index f6b4d40810b..c95fedaee6e 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -16,7 +16,8 @@ subdir = src/bin/scripts
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready
+PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready \
+pg_repackdb
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
@@ -31,6 +32,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
 vacuumdb: vacuumdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
 
 install: all installdirs
 	$(INSTALL_PROGRAM) createdb$(X)   '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -41,6 +43,7 @@ install: all installdirs
 	$(INSTALL_PROGRAM) vacuumdb$(X)   '$(DESTDIR)$(bindir)'/vacuumdb$(X)
 	$(INSTALL_PROGRAM) reindexdb$(X)  '$(DESTDIR)$(bindir)'/reindexdb$(X)
 	$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+	$(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index 7b8f7d4c526..023c61b3e14 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -20,6 +20,7 @@
 #include "common.h"
 #include "common/connect.h"
 #include "common/logging.h"
 #include "common/string.h"
+#include "fe_utils/parallel_slot.h"
 #include "fe_utils/query_utils.h"
 #include "fe_utils/string_utils.h"
@@ -165,3 +166,1084 @@ yesno_prompt(const char *question)
 			   _(PG_YESLETTER), _(PG_NOLETTER));
 	}
 }
+
+/*
+ * The following code is used only by vacuumdb and pg_repackdb, so it is not
+ * really "common".  It's unclear whether it's worth a separate file, though.
+ */
+
+static struct option long_options_common[] = {
+	{"host", required_argument, NULL, 'h'},
+	{"port", required_argument, NULL, 'p'},
+	{"username", required_argument, NULL, 'U'},
+	{"no-password", no_argument, NULL, 'w'},
+	{"password", no_argument, NULL, 'W'},
+	{"echo", no_argument, NULL, 'e'},
+	{"quiet", no_argument, NULL, 'q'},
+	{"dbname", required_argument, NULL, 'd'},
+	{"all", no_argument, NULL, 'a'},
+	{"table", required_argument, NULL, 't'},
+	{"verbose", no_argument, NULL, 'v'},
+	{"jobs", required_argument, NULL, 'j'},
+	{"schema", required_argument, NULL, 'n'},
+	{"exclude-schema", required_argument, NULL, 'N'},
+	{"maintenance-db", required_argument, NULL, 2}
+};
+
+static struct option long_options_vacuum[] = {
+	{"analyze", no_argument, NULL, 'z'},
+	{"analyze-only", no_argument, NULL, 'Z'},
+	{"freeze", no_argument, NULL, 'F'},
+	{"full", no_argument, NULL, 'f'},
+	{"parallel", required_argument, NULL, 'P'},
+	{"analyze-in-stages", no_argument, NULL, 3},
+	{"disable-page-skipping", no_argument, NULL, 4},
+	{"skip-locked", no_argument, NULL, 5},
+	/* TODO Consider moving to _common */
+	{"min-xid-age", required_argument, NULL, 6},
+	/* TODO Consider moving to _common */
+	{"min-mxid-age", required_argument, NULL, 7},
+	{"no-index-cleanup", no_argument, NULL, 8},
+	{"force-index-cleanup", no_argument, NULL, 9},
+	{"no-truncate", no_argument, NULL, 10},
+	{"no-process-toast", no_argument, NULL, 11},
+	{"no-process-main", no_argument, NULL, 12},
+	{"buffer-usage-limit", required_argument, NULL, 13},
+	{"missing-stats-only", no_argument, NULL, 14}
+};
+
+/* TODO Remove if there are eventually no specific options. */
+static struct option long_options_repack[] = {
+};
+
+RunMode		mode;
+
+VacObjFilter objfilter = OBJFILTER_NONE;
+
+/* For analyze-in-stages mode */
+#define ANALYZE_NO_STAGE	-1
+#define ANALYZE_NUM_STAGES	3
+
+/*
+ * Construct the options array. The result depends on whether we're doing
+ * VACUUM or REPACK.
+ */
+struct option *
+get_options(void)
+{
+	int			ncommon = lengthof(long_options_common);
+	int			nother = mode == MODE_VACUUM ? lengthof(long_options_vacuum) :
+		lengthof(long_options_repack);
+	struct option *result = palloc_array(struct option, ncommon + nother + 1);
+	struct option *src = long_options_common;
+	struct option *dst = result;
+	int			i;
+
+	for (i = 0; i < ncommon; i++)
+	{
+		memcpy(dst, src, sizeof(struct option));
+		dst++;
+		src++;
+	}
+
+	src = mode == MODE_VACUUM ? long_options_vacuum : long_options_repack;
+	for (i = 0; i < nother; i++)
+	{
+		memcpy(dst, src, sizeof(struct option));
+		dst++;
+		src++;
+	}
+
+	/* End-of-list marker */
+	memset(dst, 0, sizeof(struct option));
+
+	return result;
+}
+
+/*
+ * Verify that the filters used at command line are compatible.
+ */
+void
+check_objfilter(void)
+{
+	if ((objfilter & OBJFILTER_ALL_DBS) &&
+		(objfilter & OBJFILTER_DATABASE))
+	{
+		if (mode == MODE_VACUUM)
+			pg_fatal("cannot vacuum all databases and a specific one at the same time");
+		else
+			pg_fatal("cannot repack all databases and a specific one at the same time");
+	}
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA))
+	{
+		if (mode == MODE_VACUUM)
+			pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
+		else
+			pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+	}
+
+	if ((objfilter & OBJFILTER_TABLE) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+	{
+		if (mode == MODE_VACUUM)
+			pg_fatal("cannot vacuum specific table(s) and exclude schema(s) at the same time");
+		else
+			pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+	}
+
+	if ((objfilter & OBJFILTER_SCHEMA) &&
+		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+	{
+		if (mode == MODE_VACUUM)
+			pg_fatal("cannot vacuum all tables in schema(s) and exclude schema(s) at the same time");
+		else
+			pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+	}
+}
+
+void
+main_common(ConnParams *cparams, const char *dbname,
+			const char *maintenance_db, vacuumingOptions *vacopts,
+			SimpleStringList *objects, bool analyze_in_stages,
+			int tbl_count, int concurrentCons,
+			const char *progname, bool echo, bool quiet)
+{
+	setup_cancel_handler(NULL);
+
+	/* Avoid opening extra connections. */
+	if (tbl_count && (concurrentCons > tbl_count))
+		concurrentCons = tbl_count;
+
+	if (objfilter & OBJFILTER_ALL_DBS)
+	{
+		cparams->dbname = maintenance_db;
+
+		vacuum_all_databases(cparams, vacopts,
+							 analyze_in_stages,
+							 objects,
+							 concurrentCons,
+							 progname, echo, quiet);
+	}
+	else
+	{
+		if (dbname == NULL)
+		{
+			if (getenv("PGDATABASE"))
+				dbname = getenv("PGDATABASE");
+			else if (getenv("PGUSER"))
+				dbname = getenv("PGUSER");
+			else
+				dbname = get_user_name_or_exit(progname);
+		}
+
+		cparams->dbname = dbname;
+
+		if (analyze_in_stages)
+		{
+			int			stage;
+			SimpleStringList *found_objs = NULL;
+
+			for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+			{
+				vacuum_one_database(cparams, vacopts,
+									stage,
+									objects,
+									vacopts->missing_stats_only ? &found_objs : NULL,
+									concurrentCons,
+									progname, echo, quiet);
+			}
+		}
+		else
+			vacuum_one_database(cparams, vacopts,
+								ANALYZE_NO_STAGE,
+								objects, NULL,
+								concurrentCons,
+								progname, echo, quiet);
+	}
+}
+
+/*
+ * Vacuum/analyze all connectable databases.
+ *
+ * In analyze-in-stages mode, we process all databases in one stage before
+ * moving on to the next stage.  That ensures minimal stats are available
+ * quickly everywhere before generating more detailed ones.
+ */
+void
+vacuum_all_databases(ConnParams *cparams,
+					 vacuumingOptions *vacopts,
+					 bool analyze_in_stages,
+					 SimpleStringList *objects,
+					 int concurrentCons,
+					 const char *progname, bool echo, bool quiet)
+{
+	PGconn	   *conn;
+	PGresult   *result;
+	int			stage;
+	int			i;
+
+	conn = connectMaintenanceDatabase(cparams, progname, echo);
+	if (mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+	result = executeQuery(conn,
+						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
+						  echo);
+	PQfinish(conn);
+
+	if (analyze_in_stages)
+	{
+		SimpleStringList **found_objs = NULL;
+
+		if (vacopts->missing_stats_only)
+			found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
+		/*
+		 * When analyzing all databases in stages, we analyze them all in the
+		 * fastest stage first, so that initial statistics become available
+		 * for all of them as soon as possible.
+		 *
+		 * This means we establish several times as many connections, but
+		 * that's a secondary consideration.
+		 */
+		for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+		{
+			for (i = 0; i < PQntuples(result); i++)
+			{
+				cparams->override_dbname = PQgetvalue(result, i, 0);
+
+				vacuum_one_database(cparams, vacopts,
+									stage,
+									objects,
+									vacopts->missing_stats_only ? &found_objs[i] : NULL,
+									concurrentCons,
+									progname, echo, quiet);
+			}
+		}
+	}
+	else
+	{
+		for (i = 0; i < PQntuples(result); i++)
+		{
+			cparams->override_dbname = PQgetvalue(result, i, 0);
+
+			vacuum_one_database(cparams, vacopts,
+								ANALYZE_NO_STAGE,
+								objects, NULL,
+								concurrentCons,
+								progname, echo, quiet);
+		}
+	}
+
+	PQclear(result);
+}
+
+/*
+ * vacuum_one_database
+ *
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ *    of objects to process, as returned by a previous call to
+ *    vacuum_one_database().
+ *
+ *     a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ *        once-dereferenced double pointer) are not NULL, this list takes
+ *        priority, and anything specified in "objects" is ignored.
+ *
+ *     b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ *        (the once-dereferenced double pointer) _is_ NULL, the "objects"
+ *        parameter takes priority, and the results of the catalog query
+ *        described in (2) are stored in "found_objs".
+ *
+ *     c) If "found_objs" (the double pointer) is NULL, the "objects"
+ *        parameter again takes priority, and the results of the catalog query
+ *        are not saved.
+ *
+ * 2) The "objects" parameter is a user-specified list of objects to process.
+ *    When (1b) or (1c) applies, this function performs a catalog query to
+ *    retrieve a fully qualified list of objects to process, as described
+ *    below.
+ *
+ *     a) If "objects" is not NULL, the catalog query gathers only the objects
+ *        listed in "objects".
+ *
+ *     b) If "objects" is NULL, all tables in the database are gathered.
+ *
+ * Note that this function is only concerned with running exactly one stage
+ * when in analyze-in-stages mode; caller must iterate on us if necessary.
+ *
+ * If concurrentCons is > 1, multiple connections are used to vacuum tables
+ * in parallel.
+ */
+void
+vacuum_one_database(ConnParams *cparams,
+					vacuumingOptions *vacopts,
+					int stage,
+					SimpleStringList *objects,
+					SimpleStringList **found_objs,
+					int concurrentCons,
+					const char *progname, bool echo, bool quiet)
+{
+	PQExpBufferData sql;
+	PGconn	   *conn;
+	SimpleStringListCell *cell;
+	ParallelSlotArray *sa;
+	int			ntups = 0;
+	bool		failed = false;
+	const char *initcmd;
+	SimpleStringList *ret = NULL;
+	const char *stage_commands[] = {
+		"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+		"SET default_statistics_target=10; RESET vacuum_cost_delay;",
+		"RESET default_statistics_target;"
+	};
+	const char *stage_messages[] = {
+		gettext_noop("Generating minimal optimizer statistics (1 target)"),
+		gettext_noop("Generating medium optimizer statistics (10 targets)"),
+		gettext_noop("Generating default (full) optimizer statistics")
+	};
+
+	Assert(stage == ANALYZE_NO_STAGE ||
+		   (stage >= 0 && stage < ANALYZE_NUM_STAGES));
+
+	conn = connectDatabase(cparams, progname, echo, false, true);
+
+	if (mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+				 "REPACK", "19");
+	}
+
+	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "disable-page-skipping", "9.6");
+	}
+
+	if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-index-cleanup", "12");
+	}
+
+	if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "force-index-cleanup", "12");
+	}
+
+	if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-truncate", "12");
+	}
+
+	if (!vacopts->process_main && PQserverVersion(conn) < 160000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-process-main", "16");
+	}
+
+	if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "no-process-toast", "14");
+	}
+
+	if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "skip-locked", "12");
+	}
+
+	if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--min-xid-age", "9.6");
+	}
+
+	if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--min-mxid-age", "9.6");
+	}
+
+	if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--parallel", "13");
+	}
+
+	if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--buffer-usage-limit", "16");
+	}
+
+	if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
+	{
+		PQfinish(conn);
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+				 "--missing-stats-only", "15");
+	}
+
+	/* skip_database_stats is used automatically if server supports it */
+	vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
+
+	if (!quiet)
+	{
+		if (stage != ANALYZE_NO_STAGE)
+			printf(_("%s: processing database \"%s\": %s\n"),
+				   progname, PQdb(conn), _(stage_messages[stage]));
+		else if (mode == MODE_VACUUM)
+			printf(_("%s: vacuuming database \"%s\"\n"),
+				   progname, PQdb(conn));
+		else
+		{
+			Assert(mode == MODE_REPACK);
+			printf(_("%s: repacking database \"%s\"\n"),
+				   progname, PQdb(conn));
+		}
+		fflush(stdout);
+	}
+
+	/*
+	 * If the caller provided the results of a previous catalog query, just
+	 * use that.  Otherwise, run the catalog query ourselves and set the
+	 * return variable if provided.
+	 */
+	if (found_objs && *found_objs)
+		ret = *found_objs;
+	else
+	{
+		ret = retrieve_objects(conn, vacopts, objects, echo);
+		if (found_objs)
+			*found_objs = ret;
+	}
+
+	/*
+	 * Count the number of objects in the catalog query result.  If there are
+	 * none, we are done.
+	 */
+	for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+		ntups++;
+
+	if (ntups == 0)
+	{
+		PQfinish(conn);
+		return;
+	}
+
+	/*
+	 * Ensure concurrentCons is sane.  If there are more connections than
+	 * vacuumable relations, we don't need to use them all.
+	 */
+	if (concurrentCons > ntups)
+		concurrentCons = ntups;
+	if (concurrentCons <= 0)
+		concurrentCons = 1;
+
+	/*
+	 * All slots need to be prepared to run the appropriate analyze stage, if
+	 * caller requested that mode.  We have to prepare the initial connection
+	 * ourselves before setting up the slots.
+	 */
+	if (stage == ANALYZE_NO_STAGE)
+		initcmd = NULL;
+	else
+	{
+		initcmd = stage_commands[stage];
+		executeCommand(conn, initcmd, echo);
+	}
+
+	/*
+	 * Setup the database connections. We reuse the connection we already have
+	 * for the first slot.  If not in parallel mode, the first slot in the
+	 * array contains the connection.
+	 */
+	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+	ParallelSlotsAdoptConn(sa, conn);
+
+	initPQExpBuffer(&sql);
+
+	cell = ret->head;
+	do
+	{
+		const char *tabname = cell->val;
+		ParallelSlot *free_slot;
+
+		if (CancelRequested)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		free_slot = ParallelSlotsGetIdle(sa, NULL);
+		if (!free_slot)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+							   vacopts, tabname);
+
+		/*
+		 * Execute the vacuum.  All errors are handled in processQueryResult
+		 * through ParallelSlotsGetIdle.
+		 */
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, sql.data,
+						   echo, tabname);
+
+		cell = cell->next;
+	} while (cell != NULL);
+
+	if (!ParallelSlotsWaitCompletion(sa))
+	{
+		failed = true;
+		goto finish;
+	}
+
+	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+	if (mode == MODE_VACUUM && vacopts->skip_database_stats &&
+		stage == ANALYZE_NO_STAGE)
+	{
+		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+		ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+		if (!free_slot)
+		{
+			failed = true;
+			goto finish;
+		}
+
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+		if (!ParallelSlotsWaitCompletion(sa))
+			failed = true;
+	}
+
+finish:
+	ParallelSlotsTerminate(sa);
+	pg_free(sa);
+
+	termPQExpBuffer(&sql);
+
+	if (failed)
+		exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns.  This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table.  If a listed table does not exist, the catalog query will fail.
+ */
+SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+				 SimpleStringList *objects, bool echo)
+{
+	PQExpBufferData buf;
+	PQExpBufferData catalog_query;
+	PGresult   *res;
+	SimpleStringListCell *cell;
+	SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+	bool		objects_listed = false;
+
+	initPQExpBuffer(&catalog_query);
+	for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
+	{
+		char	   *just_table = NULL;
+		const char *just_columns = NULL;
+
+		if (!objects_listed)
+		{
+			appendPQExpBufferStr(&catalog_query,
+								 "WITH listed_objects (object_oid, column_list) "
+								 "AS (\n  VALUES (");
+			objects_listed = true;
+		}
+		else
+			appendPQExpBufferStr(&catalog_query, ",\n  (");
+
+		if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
+		{
+			appendStringLiteralConn(&catalog_query, cell->val, conn);
+			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
+		}
+
+		if (objfilter & OBJFILTER_TABLE)
+		{
+			/*
+			 * Split the relation and column names given by the user.  These
+			 * values feed the CTE, and pre-run validity checks are performed
+			 * on them as well.  For now these checks happen only on the
+			 * relation name.
+			 */
+			splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
+								  &just_table, &just_columns);
+
+			appendStringLiteralConn(&catalog_query, just_table, conn);
+			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
+		}
+
+		if (just_columns && just_columns[0] != '\0')
+			appendStringLiteralConn(&catalog_query, just_columns, conn);
+		else
+			appendPQExpBufferStr(&catalog_query, "NULL");
+
+		appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
+
+		pg_free(just_table);
+	}
+
+	/* Finish formatting the CTE */
+	if (objects_listed)
+		appendPQExpBufferStr(&catalog_query, "\n)\n");
+
+	appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
+
+	if (objects_listed)
+		appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
+
+	appendPQExpBufferStr(&catalog_query,
+						 " FROM pg_catalog.pg_class c\n"
+						 " JOIN pg_catalog.pg_namespace ns"
+						 " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+						 " CROSS JOIN LATERAL (SELECT c.relkind IN ("
+						 CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
+						 CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
+						 " LEFT JOIN pg_catalog.pg_class t"
+						 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
+
+	/*
+	 * Used to match the tables or schemas listed by the user, completing the
+	 * JOIN clause.
+	 */
+	if (objects_listed)
+	{
+		appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
+							 " ON listed_objects.object_oid"
+							 " OPERATOR(pg_catalog.=) ");
+
+		if (objfilter & OBJFILTER_TABLE)
+			appendPQExpBufferStr(&catalog_query, "c.oid\n");
+		else
+			appendPQExpBufferStr(&catalog_query, "ns.oid\n");
+	}
+
+	/*
+	 * Exclude temporary tables, beginning the WHERE clause.
+	 */
+	appendPQExpBufferStr(&catalog_query,
+						 " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
+						 CppAsString2(RELPERSISTENCE_TEMP) "\n");
+
+	/*
+	 * Used to match the tables or schemas listed by the user, for the WHERE
+	 * clause.
+	 */
+	if (objects_listed)
+	{
+		if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
+			appendPQExpBufferStr(&catalog_query,
+								 " AND listed_objects.object_oid IS NULL\n");
+		else
+			appendPQExpBufferStr(&catalog_query,
+								 " AND listed_objects.object_oid IS NOT NULL\n");
+	}
+
+	/*
+	 * If no tables were listed, filter for the relevant relation types.  If
+	 * tables were given via --table, don't bother filtering by relation type.
+	 * Instead, let the server decide whether a given relation can be
+	 * processed, and let any resulting error inform the user.
+	 */
+	if ((objfilter & OBJFILTER_TABLE) == 0)
+	{
+		appendPQExpBufferStr(&catalog_query,
+							 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
+							 CppAsString2(RELKIND_RELATION) ", "
+							 CppAsString2(RELKIND_MATVIEW) "])\n");
+	}
+
+	/*
+	 * For --min-xid-age and --min-mxid-age, the age of the relation is the
+	 * greatest of the ages of the main relation and its associated TOAST
+	 * table.  The commands generated by vacuumdb will also process the TOAST
+	 * table for the relation if necessary, so it does not need to be
+	 * considered separately.
+	 */
+	if (vacopts->min_xid_age != 0)
+	{
+		appendPQExpBuffer(&catalog_query,
+						  " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
+						  " pg_catalog.age(t.relfrozenxid)) "
+						  " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
+						  " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
+						  " '0'::pg_catalog.xid\n",
+						  vacopts->min_xid_age);
+	}
+
+	if (vacopts->min_mxid_age != 0)
+	{
+		appendPQExpBuffer(&catalog_query,
+						  " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
+						  " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
+						  " '%d'::pg_catalog.int4\n"
+						  " AND c.relminmxid OPERATOR(pg_catalog.!=)"
+						  " '0'::pg_catalog.xid\n",
+						  vacopts->min_mxid_age);
+	}
+
+	if (vacopts->missing_stats_only)
+	{
+		appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+		/* regular stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* extended stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+							 " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* expression indexes */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " JOIN pg_catalog.pg_index i"
+							 " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
+							 " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+		/* inheritance and regular stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+							 " AND NOT a.attisdropped\n"
+							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND c.relhassubclass\n"
+							 " AND NOT p.inherited\n"
+							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+							 " AND s.stainherit))\n");
+
+		/* inheritance and extended stats */
+		appendPQExpBufferStr(&catalog_query,
+							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+							 " AND c.relhassubclass\n"
+							 " AND NOT p.inherited\n"
+							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+							 " AND d.stxdinherit))\n");
+
+		appendPQExpBufferStr(&catalog_query, " )\n");
+	}
+
+	/*
+	 * Execute the catalog query.  We use the default search_path for this
+	 * query for consistency with table lookups done elsewhere by the user.
+	 */
+	appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
+	executeCommand(conn, "RESET search_path;", echo);
+	res = executeQuery(conn, catalog_query.data, echo);
+	termPQExpBuffer(&catalog_query);
+	PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
+
+	/*
+	 * Build qualified identifiers for each table, including the column list
+	 * if given.
+	 */
+	initPQExpBuffer(&buf);
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		appendPQExpBufferStr(&buf,
+							 fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
+											   PQgetvalue(res, i, 0),
+											   PQclientEncoding(conn)));
+
+		if (objects_listed && !PQgetisnull(res, i, 2))
+			appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
+
+		simple_string_list_append(found_objs, buf.data);
+		resetPQExpBuffer(&buf);
+	}
+	termPQExpBuffer(&buf);
+	PQclear(res);
+
+	return found_objs;
+}
+
+/*
+ * Construct a vacuum/analyze/repack command to run based on the given
+ * options, in the given string buffer, which may contain previous garbage.
+ *
+ * The table name used must be already properly quoted.  The command generated
+ * depends on the server version involved and it is semicolon-terminated.
+ */
+void
+prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
+					   vacuumingOptions *vacopts, const char *table)
+{
+	const char *paren = " (";
+	const char *comma = ", ";
+	const char *sep = paren;
+
+	resetPQExpBuffer(sql);
+
+	if (mode == MODE_REPACK)
+	{
+		appendPQExpBufferStr(sql, "REPACK");
+		if (vacopts->verbose)
+			appendPQExpBufferStr(sql, " (VERBOSE)");
+	}
+	else if (vacopts->analyze_only)
+	{
+		appendPQExpBufferStr(sql, "ANALYZE");
+
+		/* parenthesized grammar of ANALYZE is supported since v11 */
+		if (serverVersion >= 110000)
+		{
+			if (vacopts->skip_locked)
+			{
+				/* SKIP_LOCKED is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+				sep = comma;
+			}
+			if (vacopts->verbose)
+			{
+				appendPQExpBuffer(sql, "%sVERBOSE", sep);
+				sep = comma;
+			}
+			if (vacopts->buffer_usage_limit)
+			{
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+								  vacopts->buffer_usage_limit);
+				sep = comma;
+			}
+			if (sep != paren)
+				appendPQExpBufferChar(sql, ')');
+		}
+		else
+		{
+			if (vacopts->verbose)
+				appendPQExpBufferStr(sql, " VERBOSE");
+		}
+	}
+	else
+	{
+		appendPQExpBufferStr(sql, "VACUUM");
+
+		/* parenthesized grammar of VACUUM is supported since v9.0 */
+		if (serverVersion >= 90000)
+		{
+			if (vacopts->disable_page_skipping)
+			{
+				/* DISABLE_PAGE_SKIPPING is supported since v9.6 */
+				Assert(serverVersion >= 90600);
+				appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
+				sep = comma;
+			}
+			if (vacopts->no_index_cleanup)
+			{
+				/* "INDEX_CLEANUP FALSE" has been supported since v12 */
+				Assert(serverVersion >= 120000);
+				Assert(!vacopts->force_index_cleanup);
+				appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
+				sep = comma;
+			}
+			if (vacopts->force_index_cleanup)
+			{
+				/* "INDEX_CLEANUP TRUE" has been supported since v12 */
+				Assert(serverVersion >= 120000);
+				Assert(!vacopts->no_index_cleanup);
+				appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
+				sep = comma;
+			}
+			if (!vacopts->do_truncate)
+			{
+				/* TRUNCATE is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
+				sep = comma;
+			}
+			if (!vacopts->process_main)
+			{
+				/* PROCESS_MAIN is supported since v16 */
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
+				sep = comma;
+			}
+			if (!vacopts->process_toast)
+			{
+				/* PROCESS_TOAST is supported since v14 */
+				Assert(serverVersion >= 140000);
+				appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
+				sep = comma;
+			}
+			if (vacopts->skip_database_stats)
+			{
+				/* SKIP_DATABASE_STATS is supported since v16 */
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
+				sep = comma;
+			}
+			if (vacopts->skip_locked)
+			{
+				/* SKIP_LOCKED is supported since v12 */
+				Assert(serverVersion >= 120000);
+				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+				sep = comma;
+			}
+			if (vacopts->full)
+			{
+				appendPQExpBuffer(sql, "%sFULL", sep);
+				sep = comma;
+			}
+			if (vacopts->freeze)
+			{
+				appendPQExpBuffer(sql, "%sFREEZE", sep);
+				sep = comma;
+			}
+			if (vacopts->verbose)
+			{
+				appendPQExpBuffer(sql, "%sVERBOSE", sep);
+				sep = comma;
+			}
+			if (vacopts->and_analyze)
+			{
+				appendPQExpBuffer(sql, "%sANALYZE", sep);
+				sep = comma;
+			}
+			if (vacopts->parallel_workers >= 0)
+			{
+				/* PARALLEL is supported since v13 */
+				Assert(serverVersion >= 130000);
+				appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
+								  vacopts->parallel_workers);
+				sep = comma;
+			}
+			if (vacopts->buffer_usage_limit)
+			{
+				Assert(serverVersion >= 160000);
+				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+								  vacopts->buffer_usage_limit);
+				sep = comma;
+			}
+			if (sep != paren)
+				appendPQExpBufferChar(sql, ')');
+		}
+		else
+		{
+			if (vacopts->full)
+				appendPQExpBufferStr(sql, " FULL");
+			if (vacopts->freeze)
+				appendPQExpBufferStr(sql, " FREEZE");
+			if (vacopts->verbose)
+				appendPQExpBufferStr(sql, " VERBOSE");
+			if (vacopts->and_analyze)
+				appendPQExpBufferStr(sql, " ANALYZE");
+		}
+	}
+
+	appendPQExpBuffer(sql, " %s;", table);
+}
+
+/*
+ * Send a vacuum/analyze/repack command to the server, returning as soon as
+ * the command has been sent (without waiting for it to complete).
+ *
+ * Any errors during command execution are reported to stderr.
+ */
+void
+run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+				   const char *table)
+{
+	bool		status;
+
+	if (echo)
+		printf("%s\n", sql);
+
+	status = PQsendQuery(conn, sql) == 1;
+
+	if (!status)
+	{
+		if (table)
+		{
+			if (mode == MODE_VACUUM)
+				pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+							 table, PQdb(conn), PQerrorMessage(conn));
+		}
+		else
+		{
+			if (mode == MODE_VACUUM)
+				pg_log_error("vacuuming of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+			else
+				pg_log_error("repacking of database \"%s\" failed: %s",
+							 PQdb(conn), PQerrorMessage(conn));
+		}
+	}
+}
diff --git a/src/bin/scripts/common.h b/src/bin/scripts/common.h
index 7fab1fb6e03..d94355820c0 100644
--- a/src/bin/scripts/common.h
+++ b/src/bin/scripts/common.h
@@ -9,8 +9,12 @@
 #ifndef COMMON_H
 #define COMMON_H
 
+#include "catalog/pg_class_d.h"
 #include "common/username.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
 #include "fe_utils/connect_utils.h"
+#include "fe_utils/simple_list.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
@@ -23,4 +27,78 @@ extern void appendQualifiedRelation(PQExpBuffer buf, const char *spec,
 
 extern bool yesno_prompt(const char *question);
 
+/* vacuum options controlled by user flags */
+typedef struct vacuumingOptions
+{
+	bool		analyze_only;
+	bool		verbose;
+	bool		and_analyze;
+	bool		full;
+	bool		freeze;
+	bool		disable_page_skipping;
+	bool		skip_locked;
+	int			min_xid_age;
+	int			min_mxid_age;
+	int			parallel_workers;	/* >= 0 indicates user specified the
+									 * parallel degree, otherwise -1 */
+	bool		no_index_cleanup;
+	bool		force_index_cleanup;
+	bool		do_truncate;
+	bool		process_main;
+	bool		process_toast;
+	bool		skip_database_stats;
+	char	   *buffer_usage_limit;
+	bool		missing_stats_only;
+} vacuumingOptions;
+
+typedef enum
+{
+	MODE_VACUUM,
+	MODE_REPACK
+} RunMode;
+
+extern RunMode mode;
+
+/* object filter options */
+typedef enum
+{
+	OBJFILTER_NONE = 0,			/* no filter used */
+	OBJFILTER_ALL_DBS = (1 << 0),	/* -a | --all */
+	OBJFILTER_DATABASE = (1 << 1),	/* -d | --dbname */
+	OBJFILTER_TABLE = (1 << 2), /* -t | --table */
+	OBJFILTER_SCHEMA = (1 << 3),	/* -n | --schema */
+	OBJFILTER_SCHEMA_EXCLUDE = (1 << 4),	/* -N | --exclude-schema */
+} VacObjFilter;
+
+extern VacObjFilter objfilter;
+
+extern struct option *get_options(void);
+extern void check_objfilter(void);
+extern void main_common(ConnParams *cparams, const char *dbname,
+						const char *maintenance_db, vacuumingOptions *vacopts,
+						SimpleStringList *objects, bool analyze_in_stages,
+						int tbl_count, int concurrentCons,
+						const char *progname, bool echo, bool quiet);
+extern void vacuum_all_databases(ConnParams *cparams,
+								 vacuumingOptions *vacopts,
+								 bool analyze_in_stages,
+								 SimpleStringList *objects,
+								 int concurrentCons,
+								 const char *progname, bool echo, bool quiet);
+extern void vacuum_one_database(ConnParams *cparams,
+								vacuumingOptions *vacopts,
+								int stage,
+								SimpleStringList *objects,
+								SimpleStringList **found_objs,
+								int concurrentCons,
+								const char *progname, bool echo, bool quiet);
+extern SimpleStringList *retrieve_objects(PGconn *conn,
+										  vacuumingOptions *vacopts,
+										  SimpleStringList *objects,
+										  bool echo);
+extern void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
+								   vacuumingOptions *vacopts,
+								   const char *table);
+extern void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+							   const char *table);
 #endif							/* COMMON_H */
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index 80df7c33257..dd828394b26 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -15,6 +15,7 @@ binaries = [
   'vacuumdb',
   'reindexdb',
   'pg_isready',
+  'pg_repackdb',
 ]
 
 foreach binary : binaries
@@ -54,6 +55,7 @@ tests += {
       't/100_vacuumdb.pl',
       't/101_vacuumdb_all.pl',
       't/102_vacuumdb_stages.pl',
+      't/103_repackdb.pl',
       't/200_connstr.pl',
     ],
   },
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..fe4a80eee89
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,177 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ *		Repack tables in a PostgreSQL database.
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+
+static void help(const char *progname);
+
+int
+main(int argc, char *argv[])
+{
+	const char *progname;
+	int			optindex;
+	int			c;
+	const char *dbname = NULL;
+	const char *maintenance_db = NULL;
+	ConnParams	cparams;
+	bool		echo = false;
+	bool		quiet = false;
+	vacuumingOptions vacopts;
+	SimpleStringList objects = {NULL, NULL};
+	int			concurrentCons = 1;
+	int			tbl_count = 0;
+
+	mode = MODE_REPACK;
+
+	/* initialize options */
+	memset(&vacopts, 0, sizeof(vacopts));
+
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+	handle_help_version_opts(argc, argv, progname, help);
+
+	while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwW",
+							get_options(), &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'a':
+				objfilter |= OBJFILTER_ALL_DBS;
+				break;
+			case 'd':
+				objfilter |= OBJFILTER_DATABASE;
+				dbname = pg_strdup(optarg);
+				break;
+			case 'e':
+				echo = true;
+				break;
+			case 'h':
+				cparams.pghost = pg_strdup(optarg);
+				break;
+			case 'j':
+				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+									  &concurrentCons))
+					exit(1);
+				break;
+			case 'n':
+				objfilter |= OBJFILTER_SCHEMA;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'N':
+				objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+				simple_string_list_append(&objects, optarg);
+				break;
+			case 'p':
+				cparams.pgport = pg_strdup(optarg);
+				break;
+			case 'q':
+				quiet = true;
+				break;
+			case 't':
+				objfilter |= OBJFILTER_TABLE;
+				simple_string_list_append(&objects, optarg);
+				tbl_count++;
+				break;
+			case 'U':
+				cparams.pguser = pg_strdup(optarg);
+				break;
+			case 'v':
+				vacopts.verbose = true;
+				break;
+			case 'w':
+				cparams.prompt_password = TRI_NO;
+				break;
+			case 'W':
+				cparams.prompt_password = TRI_YES;
+				break;
+			case 2:
+				maintenance_db = pg_strdup(optarg);
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	/*
+	 * Non-option argument specifies database name as long as it wasn't
+	 * already specified with -d / --dbname
+	 */
+	if (optind < argc && dbname == NULL)
+	{
+		objfilter |= OBJFILTER_DATABASE;
+		dbname = argv[optind];
+		optind++;
+	}
+
+	if (optind < argc)
+	{
+		pg_log_error("too many command-line arguments (first is \"%s\")",
+					 argv[optind]);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	/*
+	 * Validate the combination of filters specified in the command-line
+	 * options.
+	 */
+	check_objfilter();
+
+	main_common(&cparams, dbname, maintenance_db, &vacopts, &objects,
+				false, tbl_count, concurrentCons,
+				progname, echo, quiet);
+
+	exit(0);
+}
+
+static void
+help(const char *progname)
+{
+	printf(_("%s repacks tables in a PostgreSQL database.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DBNAME]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -a, --all                       repack all databases\n"));
+	printf(_("  -d, --dbname=DBNAME             database to repack\n"));
+	printf(_("  -e, --echo                      show the commands being sent to the server\n"));
+	printf(_("  -j, --jobs=NUM                  use this many concurrent connections to repack\n"));
+	printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+	printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
+	printf(_("  -q, --quiet                     don't write any messages\n"));
+	printf(_("  -t, --table='TABLE[(COLUMNS)]'  repack specific table(s) only\n"));
+	printf(_("  -v, --verbose                   write a lot of output\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
+	printf(_("\nConnection options:\n"));
+	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
+	printf(_("  -p, --port=PORT           database server port\n"));
+	printf(_("  -U, --username=USERNAME   user name to connect as\n"));
+	printf(_("  -w, --no-password         never prompt for password\n"));
+	printf(_("  -W, --password            force password prompt\n"));
+	printf(_("  --maintenance-db=DBNAME   alternate maintenance database\n"));
+	printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..51de4d7ab34
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,24 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+	[ 'pg_repackdb', 'postgres' ],
+	qr/statement: REPACK.*;/,
+	'SQL REPACK run');
+
+
+done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 79b1096eb08..e5738cedf0e 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -14,140 +14,24 @@
 
 #include <limits.h>
 
-#include "catalog/pg_class_d.h"
 #include "common.h"
 #include "common/connect.h"
 #include "common/logging.h"
-#include "fe_utils/cancel.h"
 #include "fe_utils/option_utils.h"
-#include "fe_utils/parallel_slot.h"
 #include "fe_utils/query_utils.h"
-#include "fe_utils/simple_list.h"
 #include "fe_utils/string_utils.h"
 
-
-/* vacuum options controlled by user flags */
-typedef struct vacuumingOptions
-{
-	bool		analyze_only;
-	bool		verbose;
-	bool		and_analyze;
-	bool		full;
-	bool		freeze;
-	bool		disable_page_skipping;
-	bool		skip_locked;
-	int			min_xid_age;
-	int			min_mxid_age;
-	int			parallel_workers;	/* >= 0 indicates user specified the
-									 * parallel degree, otherwise -1 */
-	bool		no_index_cleanup;
-	bool		force_index_cleanup;
-	bool		do_truncate;
-	bool		process_main;
-	bool		process_toast;
-	bool		skip_database_stats;
-	char	   *buffer_usage_limit;
-	bool		missing_stats_only;
-} vacuumingOptions;
-
-/* object filter options */
-typedef enum
-{
-	OBJFILTER_NONE = 0,			/* no filter used */
-	OBJFILTER_ALL_DBS = (1 << 0),	/* -a | --all */
-	OBJFILTER_DATABASE = (1 << 1),	/* -d | --dbname */
-	OBJFILTER_TABLE = (1 << 2), /* -t | --table */
-	OBJFILTER_SCHEMA = (1 << 3),	/* -n | --schema */
-	OBJFILTER_SCHEMA_EXCLUDE = (1 << 4),	/* -N | --exclude-schema */
-} VacObjFilter;
-
-static VacObjFilter objfilter = OBJFILTER_NONE;
-
-static SimpleStringList *retrieve_objects(PGconn *conn,
-										  vacuumingOptions *vacopts,
-										  SimpleStringList *objects,
-										  bool echo);
-
-static void vacuum_one_database(ConnParams *cparams,
-								vacuumingOptions *vacopts,
-								int stage,
-								SimpleStringList *objects,
-								SimpleStringList **found_objs,
-								int concurrentCons,
-								const char *progname, bool echo, bool quiet);
-
-static void vacuum_all_databases(ConnParams *cparams,
-								 vacuumingOptions *vacopts,
-								 bool analyze_in_stages,
-								 SimpleStringList *objects,
-								 int concurrentCons,
-								 const char *progname, bool echo, bool quiet);
-
-static void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
-								   vacuumingOptions *vacopts, const char *table);
-
-static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-							   const char *table);
-
 static void help(const char *progname);
-
-void		check_objfilter(void);
-
 static char *escape_quotes(const char *src);
 
-/* For analyze-in-stages mode */
-#define ANALYZE_NO_STAGE	-1
-#define ANALYZE_NUM_STAGES	3
-
-
 int
 main(int argc, char *argv[])
 {
-	static struct option long_options[] = {
-		{"host", required_argument, NULL, 'h'},
-		{"port", required_argument, NULL, 'p'},
-		{"username", required_argument, NULL, 'U'},
-		{"no-password", no_argument, NULL, 'w'},
-		{"password", no_argument, NULL, 'W'},
-		{"echo", no_argument, NULL, 'e'},
-		{"quiet", no_argument, NULL, 'q'},
-		{"dbname", required_argument, NULL, 'd'},
-		{"analyze", no_argument, NULL, 'z'},
-		{"analyze-only", no_argument, NULL, 'Z'},
-		{"freeze", no_argument, NULL, 'F'},
-		{"all", no_argument, NULL, 'a'},
-		{"table", required_argument, NULL, 't'},
-		{"full", no_argument, NULL, 'f'},
-		{"verbose", no_argument, NULL, 'v'},
-		{"jobs", required_argument, NULL, 'j'},
-		{"parallel", required_argument, NULL, 'P'},
-		{"schema", required_argument, NULL, 'n'},
-		{"exclude-schema", required_argument, NULL, 'N'},
-		{"maintenance-db", required_argument, NULL, 2},
-		{"analyze-in-stages", no_argument, NULL, 3},
-		{"disable-page-skipping", no_argument, NULL, 4},
-		{"skip-locked", no_argument, NULL, 5},
-		{"min-xid-age", required_argument, NULL, 6},
-		{"min-mxid-age", required_argument, NULL, 7},
-		{"no-index-cleanup", no_argument, NULL, 8},
-		{"force-index-cleanup", no_argument, NULL, 9},
-		{"no-truncate", no_argument, NULL, 10},
-		{"no-process-toast", no_argument, NULL, 11},
-		{"no-process-main", no_argument, NULL, 12},
-		{"buffer-usage-limit", required_argument, NULL, 13},
-		{"missing-stats-only", no_argument, NULL, 14},
-		{NULL, 0, NULL, 0}
-	};
-
 	const char *progname;
 	int			optindex;
 	int			c;
 	const char *dbname = NULL;
 	const char *maintenance_db = NULL;
-	char	   *host = NULL;
-	char	   *port = NULL;
-	char	   *username = NULL;
-	enum trivalue prompt_password = TRI_DEFAULT;
 	ConnParams	cparams;
 	bool		echo = false;
 	bool		quiet = false;
@@ -157,6 +41,8 @@ main(int argc, char *argv[])
 	int			concurrentCons = 1;
 	int			tbl_count = 0;
 
+	mode = MODE_VACUUM;
+
 	/* initialize options */
 	memset(&vacopts, 0, sizeof(vacopts));
 	vacopts.parallel_workers = -1;
@@ -167,13 +53,18 @@ main(int argc, char *argv[])
 	vacopts.process_main = true;
 	vacopts.process_toast = true;
 
+	/* the same for connection parameters */
+	memset(&cparams, 0, sizeof(cparams));
+	cparams.prompt_password = TRI_DEFAULT;
+
 	pg_logging_init(argv[0]);
 	progname = get_progname(argv[0]);
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
 
-	handle_help_version_opts(argc, argv, "vacuumdb", help);
+	handle_help_version_opts(argc, argv, progname, help);
 
-	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ",
+							get_options(), &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -194,7 +85,7 @@ main(int argc, char *argv[])
 				vacopts.freeze = true;
 				break;
 			case 'h':
-				host = pg_strdup(optarg);
+				cparams.pghost = pg_strdup(optarg);
 				break;
 			case 'j':
 				if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
@@ -210,7 +101,7 @@ main(int argc, char *argv[])
 				simple_string_list_append(&objects, optarg);
 				break;
 			case 'p':
-				port = pg_strdup(optarg);
+				cparams.pgport = pg_strdup(optarg);
 				break;
 			case 'P':
 				if (!option_parse_int(optarg, "-P/--parallel", 0, INT_MAX,
@@ -226,16 +117,16 @@ main(int argc, char *argv[])
 				tbl_count++;
 				break;
 			case 'U':
-				username = pg_strdup(optarg);
+				cparams.pguser = pg_strdup(optarg);
 				break;
 			case 'v':
 				vacopts.verbose = true;
 				break;
 			case 'w':
-				prompt_password = TRI_NO;
+				cparams.prompt_password = TRI_NO;
 				break;
 			case 'W':
-				prompt_password = TRI_YES;
+				cparams.prompt_password = TRI_YES;
 				break;
 			case 'z':
 				vacopts.and_analyze = true;
@@ -379,92 +270,13 @@ main(int argc, char *argv[])
 		pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
 				 "missing-stats-only", "analyze-only", "analyze-in-stages");
 
-	/* fill cparams except for dbname, which is set below */
-	cparams.pghost = host;
-	cparams.pgport = port;
-	cparams.pguser = username;
-	cparams.prompt_password = prompt_password;
-	cparams.override_dbname = NULL;
-
-	setup_cancel_handler(NULL);
-
-	/* Avoid opening extra connections. */
-	if (tbl_count && (concurrentCons > tbl_count))
-		concurrentCons = tbl_count;
-
-	if (objfilter & OBJFILTER_ALL_DBS)
-	{
-		cparams.dbname = maintenance_db;
-
-		vacuum_all_databases(&cparams, &vacopts,
-							 analyze_in_stages,
-							 &objects,
-							 concurrentCons,
-							 progname, echo, quiet);
-	}
-	else
-	{
-		if (dbname == NULL)
-		{
-			if (getenv("PGDATABASE"))
-				dbname = getenv("PGDATABASE");
-			else if (getenv("PGUSER"))
-				dbname = getenv("PGUSER");
-			else
-				dbname = get_user_name_or_exit(progname);
-		}
-
-		cparams.dbname = dbname;
-
-		if (analyze_in_stages)
-		{
-			int			stage;
-			SimpleStringList *found_objs = NULL;
-
-			for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
-			{
-				vacuum_one_database(&cparams, &vacopts,
-									stage,
-									&objects,
-									vacopts.missing_stats_only ? &found_objs : NULL,
-									concurrentCons,
-									progname, echo, quiet);
-			}
-		}
-		else
-			vacuum_one_database(&cparams, &vacopts,
-								ANALYZE_NO_STAGE,
-								&objects, NULL,
-								concurrentCons,
-								progname, echo, quiet);
-	}
+	main_common(&cparams, dbname, maintenance_db, &vacopts, &objects,
+				analyze_in_stages, tbl_count, concurrentCons,
+				progname, echo, quiet);
 
 	exit(0);
 }
 
-/*
- * Verify that the filters used at command line are compatible.
- */
-void
-check_objfilter(void)
-{
-	if ((objfilter & OBJFILTER_ALL_DBS) &&
-		(objfilter & OBJFILTER_DATABASE))
-		pg_fatal("cannot vacuum all databases and a specific one at the same time");
-
-	if ((objfilter & OBJFILTER_TABLE) &&
-		(objfilter & OBJFILTER_SCHEMA))
-		pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
-
-	if ((objfilter & OBJFILTER_TABLE) &&
-		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
-		pg_fatal("cannot vacuum specific table(s) and exclude schema(s) at the same time");
-
-	if ((objfilter & OBJFILTER_SCHEMA) &&
-		(objfilter & OBJFILTER_SCHEMA_EXCLUDE))
-		pg_fatal("cannot vacuum all tables in schema(s) and exclude schema(s) at the same time");
-}
-
 /*
  * Returns a newly malloc'd version of 'src' with escaped single quotes and
  * backslashes.
@@ -479,852 +291,6 @@ escape_quotes(const char *src)
 	return result;
 }
 
-/*
- * vacuum_one_database
- *
- * Process tables in the given database.
- *
- * There are two ways to specify the list of objects to process:
- *
- * 1) The "found_objs" parameter is a double pointer to a fully qualified list
- *    of objects to process, as returned by a previous call to
- *    vacuum_one_database().
- *
- *     a) If both "found_objs" (the double pointer) and "*found_objs" (the
- *        once-dereferenced double pointer) are not NULL, this list takes
- *        priority, and anything specified in "objects" is ignored.
- *
- *     b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
- *        (the once-dereferenced double pointer) _is_ NULL, the "objects"
- *        parameter takes priority, and the results of the catalog query
- *        described in (2) are stored in "found_objs".
- *
- *     c) If "found_objs" (the double pointer) is NULL, the "objects"
- *        parameter again takes priority, and the results of the catalog query
- *        are not saved.
- *
- * 2) The "objects" parameter is a user-specified list of objects to process.
- *    When (1b) or (1c) applies, this function performs a catalog query to
- *    retrieve a fully qualified list of objects to process, as described
- *    below.
- *
- *     a) If "objects" is not NULL, the catalog query gathers only the objects
- *        listed in "objects".
- *
- *     b) If "objects" is NULL, all tables in the database are gathered.
- *
- * Note that this function is only concerned with running exactly one stage
- * when in analyze-in-stages mode; caller must iterate on us if necessary.
- *
- * If concurrentCons is > 1, multiple connections are used to vacuum tables
- * in parallel.
- */
-static void
-vacuum_one_database(ConnParams *cparams,
-					vacuumingOptions *vacopts,
-					int stage,
-					SimpleStringList *objects,
-					SimpleStringList **found_objs,
-					int concurrentCons,
-					const char *progname, bool echo, bool quiet)
-{
-	PQExpBufferData sql;
-	PGconn	   *conn;
-	SimpleStringListCell *cell;
-	ParallelSlotArray *sa;
-	int			ntups = 0;
-	bool		failed = false;
-	const char *initcmd;
-	SimpleStringList *ret = NULL;
-	const char *stage_commands[] = {
-		"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
-		"SET default_statistics_target=10; RESET vacuum_cost_delay;",
-		"RESET default_statistics_target;"
-	};
-	const char *stage_messages[] = {
-		gettext_noop("Generating minimal optimizer statistics (1 target)"),
-		gettext_noop("Generating medium optimizer statistics (10 targets)"),
-		gettext_noop("Generating default (full) optimizer statistics")
-	};
-
-	Assert(stage == ANALYZE_NO_STAGE ||
-		   (stage >= 0 && stage < ANALYZE_NUM_STAGES));
-
-	conn = connectDatabase(cparams, progname, echo, false, true);
-
-	if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "disable-page-skipping", "9.6");
-	}
-
-	if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-index-cleanup", "12");
-	}
-
-	if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "force-index-cleanup", "12");
-	}
-
-	if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-truncate", "12");
-	}
-
-	if (!vacopts->process_main && PQserverVersion(conn) < 160000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-process-main", "16");
-	}
-
-	if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "no-process-toast", "14");
-	}
-
-	if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "skip-locked", "12");
-	}
-
-	if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--min-xid-age", "9.6");
-	}
-
-	if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--min-mxid-age", "9.6");
-	}
-
-	if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--parallel", "13");
-	}
-
-	if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--buffer-usage-limit", "16");
-	}
-
-	if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
-	{
-		PQfinish(conn);
-		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
-				 "--missing-stats-only", "15");
-	}
-
-	/* skip_database_stats is used automatically if server supports it */
-	vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
-
-	if (!quiet)
-	{
-		if (stage != ANALYZE_NO_STAGE)
-			printf(_("%s: processing database \"%s\": %s\n"),
-				   progname, PQdb(conn), _(stage_messages[stage]));
-		else
-			printf(_("%s: vacuuming database \"%s\"\n"),
-				   progname, PQdb(conn));
-		fflush(stdout);
-	}
-
-	/*
-	 * If the caller provided the results of a previous catalog query, just
-	 * use that.  Otherwise, run the catalog query ourselves and set the
-	 * return variable if provided.
-	 */
-	if (found_objs && *found_objs)
-		ret = *found_objs;
-	else
-	{
-		ret = retrieve_objects(conn, vacopts, objects, echo);
-		if (found_objs)
-			*found_objs = ret;
-	}
-
-	/*
-	 * Count the number of objects in the catalog query result.  If there are
-	 * none, we are done.
-	 */
-	for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
-		ntups++;
-
-	if (ntups == 0)
-	{
-		PQfinish(conn);
-		return;
-	}
-
-	/*
-	 * Ensure concurrentCons is sane.  If there are more connections than
-	 * vacuumable relations, we don't need to use them all.
-	 */
-	if (concurrentCons > ntups)
-		concurrentCons = ntups;
-	if (concurrentCons <= 0)
-		concurrentCons = 1;
-
-	/*
-	 * All slots need to be prepared to run the appropriate analyze stage, if
-	 * caller requested that mode.  We have to prepare the initial connection
-	 * ourselves before setting up the slots.
-	 */
-	if (stage == ANALYZE_NO_STAGE)
-		initcmd = NULL;
-	else
-	{
-		initcmd = stage_commands[stage];
-		executeCommand(conn, initcmd, echo);
-	}
-
-	/*
-	 * Setup the database connections. We reuse the connection we already have
-	 * for the first slot.  If not in parallel mode, the first slot in the
-	 * array contains the connection.
-	 */
-	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
-	ParallelSlotsAdoptConn(sa, conn);
-
-	initPQExpBuffer(&sql);
-
-	cell = ret->head;
-	do
-	{
-		const char *tabname = cell->val;
-		ParallelSlot *free_slot;
-
-		if (CancelRequested)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		free_slot = ParallelSlotsGetIdle(sa, NULL);
-		if (!free_slot)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
-							   vacopts, tabname);
-
-		/*
-		 * Execute the vacuum.  All errors are handled in processQueryResult
-		 * through ParallelSlotsGetIdle.
-		 */
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
-						   echo, tabname);
-
-		cell = cell->next;
-	} while (cell != NULL);
-
-	if (!ParallelSlotsWaitCompletion(sa))
-	{
-		failed = true;
-		goto finish;
-	}
-
-	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
-	if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
-	{
-		const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
-		ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
-		if (!free_slot)
-		{
-			failed = true;
-			goto finish;
-		}
-
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
-		if (!ParallelSlotsWaitCompletion(sa))
-			failed = true;
-	}
-
-finish:
-	ParallelSlotsTerminate(sa);
-	pg_free(sa);
-
-	termPQExpBuffer(&sql);
-
-	if (failed)
-		exit(1);
-}
-
-/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path (which
- * could be unsafe), everything in this query MUST be fully qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their optional
- * set of columns.  This is used to match any provided column lists with the
- * generated qualified identifiers and to filter for the tables provided via
- * --table.  If a listed table does not exist, the catalog query will fail.
- */
-static SimpleStringList *
-retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
-				 SimpleStringList *objects, bool echo)
-{
-	PQExpBufferData buf;
-	PQExpBufferData catalog_query;
-	PGresult   *res;
-	SimpleStringListCell *cell;
-	SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
-	bool		objects_listed = false;
-
-	initPQExpBuffer(&catalog_query);
-	for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
-	{
-		char	   *just_table = NULL;
-		const char *just_columns = NULL;
-
-		if (!objects_listed)
-		{
-			appendPQExpBufferStr(&catalog_query,
-								 "WITH listed_objects (object_oid, column_list) "
-								 "AS (\n  VALUES (");
-			objects_listed = true;
-		}
-		else
-			appendPQExpBufferStr(&catalog_query, ",\n  (");
-
-		if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
-		{
-			appendStringLiteralConn(&catalog_query, cell->val, conn);
-			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
-		}
-
-		if (objfilter & OBJFILTER_TABLE)
-		{
-			/*
-			 * Split relation and column names given by the user, this is used
-			 * to feed the CTE with values on which are performed pre-run
-			 * validity checks as well.  For now these happen only on the
-			 * relation name.
-			 */
-			splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
-								  &just_table, &just_columns);
-
-			appendStringLiteralConn(&catalog_query, just_table, conn);
-			appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
-		}
-
-		if (just_columns && just_columns[0] != '\0')
-			appendStringLiteralConn(&catalog_query, just_columns, conn);
-		else
-			appendPQExpBufferStr(&catalog_query, "NULL");
-
-		appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
-
-		pg_free(just_table);
-	}
-
-	/* Finish formatting the CTE */
-	if (objects_listed)
-		appendPQExpBufferStr(&catalog_query, "\n)\n");
-
-	appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
-
-	if (objects_listed)
-		appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
-
-	appendPQExpBufferStr(&catalog_query,
-						 " FROM pg_catalog.pg_class c\n"
-						 " JOIN pg_catalog.pg_namespace ns"
-						 " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
-						 " CROSS JOIN LATERAL (SELECT c.relkind IN ("
-						 CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
-						 CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
-						 " LEFT JOIN pg_catalog.pg_class t"
-						 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
-
-	/*
-	 * Used to match the tables or schemas listed by the user, completing the
-	 * JOIN clause.
-	 */
-	if (objects_listed)
-	{
-		appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
-							 " ON listed_objects.object_oid"
-							 " OPERATOR(pg_catalog.=) ");
-
-		if (objfilter & OBJFILTER_TABLE)
-			appendPQExpBufferStr(&catalog_query, "c.oid\n");
-		else
-			appendPQExpBufferStr(&catalog_query, "ns.oid\n");
-	}
-
-	/*
-	 * Exclude temporary tables, beginning the WHERE clause.
-	 */
-	appendPQExpBufferStr(&catalog_query,
-						 " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
-						 CppAsString2(RELPERSISTENCE_TEMP) "\n");
-
-	/*
-	 * Used to match the tables or schemas listed by the user, for the WHERE
-	 * clause.
-	 */
-	if (objects_listed)
-	{
-		if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
-			appendPQExpBufferStr(&catalog_query,
-								 " AND listed_objects.object_oid IS NULL\n");
-		else
-			appendPQExpBufferStr(&catalog_query,
-								 " AND listed_objects.object_oid IS NOT NULL\n");
-	}
-
-	/*
-	 * If no tables were listed, filter for the relevant relation types.  If
-	 * tables were given via --table, don't bother filtering by relation type.
-	 * Instead, let the server decide whether a given relation can be
-	 * processed in which case the user will know about it.
-	 */
-	if ((objfilter & OBJFILTER_TABLE) == 0)
-	{
-		appendPQExpBufferStr(&catalog_query,
-							 " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
-							 CppAsString2(RELKIND_RELATION) ", "
-							 CppAsString2(RELKIND_MATVIEW) "])\n");
-	}
-
-	/*
-	 * For --min-xid-age and --min-mxid-age, the age of the relation is the
-	 * greatest of the ages of the main relation and its associated TOAST
-	 * table.  The commands generated by vacuumdb will also process the TOAST
-	 * table for the relation if necessary, so it does not need to be
-	 * considered separately.
-	 */
-	if (vacopts->min_xid_age != 0)
-	{
-		appendPQExpBuffer(&catalog_query,
-						  " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
-						  " pg_catalog.age(t.relfrozenxid)) "
-						  " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
-						  " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
-						  " '0'::pg_catalog.xid\n",
-						  vacopts->min_xid_age);
-	}
-
-	if (vacopts->min_mxid_age != 0)
-	{
-		appendPQExpBuffer(&catalog_query,
-						  " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
-						  " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
-						  " '%d'::pg_catalog.int4\n"
-						  " AND c.relminmxid OPERATOR(pg_catalog.!=)"
-						  " '0'::pg_catalog.xid\n",
-						  vacopts->min_mxid_age);
-	}
-
-	if (vacopts->missing_stats_only)
-	{
-		appendPQExpBufferStr(&catalog_query, " AND (\n");
-
-		/* regular stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* extended stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
-							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
-							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
-							 " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* expression indexes */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " JOIN pg_catalog.pg_index i"
-							 " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
-							 " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
-		/* inheritance and regular stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
-							 " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
-							 " AND NOT a.attisdropped\n"
-							 " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND c.relhassubclass\n"
-							 " AND NOT p.inherited\n"
-							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
-							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
-							 " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
-							 " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
-							 " AND s.stainherit))\n");
-
-		/* inheritance and extended stats */
-		appendPQExpBufferStr(&catalog_query,
-							 " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
-							 " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
-							 " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
-							 " AND c.relhassubclass\n"
-							 " AND NOT p.inherited\n"
-							 " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
-							 " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
-							 " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
-							 " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
-							 " AND d.stxdinherit))\n");
-
-		appendPQExpBufferStr(&catalog_query, " )\n");
-	}
-
-	/*
-	 * Execute the catalog query.  We use the default search_path for this
-	 * query for consistency with table lookups done elsewhere by the user.
-	 */
-	appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
-	executeCommand(conn, "RESET search_path;", echo);
-	res = executeQuery(conn, catalog_query.data, echo);
-	termPQExpBuffer(&catalog_query);
-	PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
-
-	/*
-	 * Build qualified identifiers for each table, including the column list
-	 * if given.
-	 */
-	initPQExpBuffer(&buf);
-	for (int i = 0; i < PQntuples(res); i++)
-	{
-		appendPQExpBufferStr(&buf,
-							 fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
-											   PQgetvalue(res, i, 0),
-											   PQclientEncoding(conn)));
-
-		if (objects_listed && !PQgetisnull(res, i, 2))
-			appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
-
-		simple_string_list_append(found_objs, buf.data);
-		resetPQExpBuffer(&buf);
-	}
-	termPQExpBuffer(&buf);
-	PQclear(res);
-
-	return found_objs;
-}
-
-/*
- * Vacuum/analyze all connectable databases.
- *
- * In analyze-in-stages mode, we process all databases in one stage before
- * moving on to the next stage.  That ensure minimal stats are available
- * quickly everywhere before generating more detailed ones.
- */
-static void
-vacuum_all_databases(ConnParams *cparams,
-					 vacuumingOptions *vacopts,
-					 bool analyze_in_stages,
-					 SimpleStringList *objects,
-					 int concurrentCons,
-					 const char *progname, bool echo, bool quiet)
-{
-	PGconn	   *conn;
-	PGresult   *result;
-	int			stage;
-	int			i;
-
-	conn = connectMaintenanceDatabase(cparams, progname, echo);
-	result = executeQuery(conn,
-						  "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
-						  echo);
-	PQfinish(conn);
-
-	if (analyze_in_stages)
-	{
-		SimpleStringList **found_objs = NULL;
-
-		if (vacopts->missing_stats_only)
-			found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
-
-		/*
-		 * When analyzing all databases in stages, we analyze them all in the
-		 * fastest stage first, so that initial statistics become available
-		 * for all of them as soon as possible.
-		 *
-		 * This means we establish several times as many connections, but
-		 * that's a secondary consideration.
-		 */
-		for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
-		{
-			for (i = 0; i < PQntuples(result); i++)
-			{
-				cparams->override_dbname = PQgetvalue(result, i, 0);
-
-				vacuum_one_database(cparams, vacopts,
-									stage,
-									objects,
-									vacopts->missing_stats_only ? &found_objs[i] : NULL,
-									concurrentCons,
-									progname, echo, quiet);
-			}
-		}
-	}
-	else
-	{
-		for (i = 0; i < PQntuples(result); i++)
-		{
-			cparams->override_dbname = PQgetvalue(result, i, 0);
-
-			vacuum_one_database(cparams, vacopts,
-								ANALYZE_NO_STAGE,
-								objects, NULL,
-								concurrentCons,
-								progname, echo, quiet);
-		}
-	}
-
-	PQclear(result);
-}
-
-/*
- * Construct a vacuum/analyze command to run based on the given options, in the
- * given string buffer, which may contain previous garbage.
- *
- * The table name used must be already properly quoted.  The command generated
- * depends on the server version involved and it is semicolon-terminated.
- */
-static void
-prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
-					   vacuumingOptions *vacopts, const char *table)
-{
-	const char *paren = " (";
-	const char *comma = ", ";
-	const char *sep = paren;
-
-	resetPQExpBuffer(sql);
-
-	if (vacopts->analyze_only)
-	{
-		appendPQExpBufferStr(sql, "ANALYZE");
-
-		/* parenthesized grammar of ANALYZE is supported since v11 */
-		if (serverVersion >= 110000)
-		{
-			if (vacopts->skip_locked)
-			{
-				/* SKIP_LOCKED is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
-				sep = comma;
-			}
-			if (vacopts->verbose)
-			{
-				appendPQExpBuffer(sql, "%sVERBOSE", sep);
-				sep = comma;
-			}
-			if (vacopts->buffer_usage_limit)
-			{
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
-								  vacopts->buffer_usage_limit);
-				sep = comma;
-			}
-			if (sep != paren)
-				appendPQExpBufferChar(sql, ')');
-		}
-		else
-		{
-			if (vacopts->verbose)
-				appendPQExpBufferStr(sql, " VERBOSE");
-		}
-	}
-	else
-	{
-		appendPQExpBufferStr(sql, "VACUUM");
-
-		/* parenthesized grammar of VACUUM is supported since v9.0 */
-		if (serverVersion >= 90000)
-		{
-			if (vacopts->disable_page_skipping)
-			{
-				/* DISABLE_PAGE_SKIPPING is supported since v9.6 */
-				Assert(serverVersion >= 90600);
-				appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
-				sep = comma;
-			}
-			if (vacopts->no_index_cleanup)
-			{
-				/* "INDEX_CLEANUP FALSE" has been supported since v12 */
-				Assert(serverVersion >= 120000);
-				Assert(!vacopts->force_index_cleanup);
-				appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
-				sep = comma;
-			}
-			if (vacopts->force_index_cleanup)
-			{
-				/* "INDEX_CLEANUP TRUE" has been supported since v12 */
-				Assert(serverVersion >= 120000);
-				Assert(!vacopts->no_index_cleanup);
-				appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
-				sep = comma;
-			}
-			if (!vacopts->do_truncate)
-			{
-				/* TRUNCATE is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
-				sep = comma;
-			}
-			if (!vacopts->process_main)
-			{
-				/* PROCESS_MAIN is supported since v16 */
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
-				sep = comma;
-			}
-			if (!vacopts->process_toast)
-			{
-				/* PROCESS_TOAST is supported since v14 */
-				Assert(serverVersion >= 140000);
-				appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
-				sep = comma;
-			}
-			if (vacopts->skip_database_stats)
-			{
-				/* SKIP_DATABASE_STATS is supported since v16 */
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
-				sep = comma;
-			}
-			if (vacopts->skip_locked)
-			{
-				/* SKIP_LOCKED is supported since v12 */
-				Assert(serverVersion >= 120000);
-				appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
-				sep = comma;
-			}
-			if (vacopts->full)
-			{
-				appendPQExpBuffer(sql, "%sFULL", sep);
-				sep = comma;
-			}
-			if (vacopts->freeze)
-			{
-				appendPQExpBuffer(sql, "%sFREEZE", sep);
-				sep = comma;
-			}
-			if (vacopts->verbose)
-			{
-				appendPQExpBuffer(sql, "%sVERBOSE", sep);
-				sep = comma;
-			}
-			if (vacopts->and_analyze)
-			{
-				appendPQExpBuffer(sql, "%sANALYZE", sep);
-				sep = comma;
-			}
-			if (vacopts->parallel_workers >= 0)
-			{
-				/* PARALLEL is supported since v13 */
-				Assert(serverVersion >= 130000);
-				appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
-								  vacopts->parallel_workers);
-				sep = comma;
-			}
-			if (vacopts->buffer_usage_limit)
-			{
-				Assert(serverVersion >= 160000);
-				appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
-								  vacopts->buffer_usage_limit);
-				sep = comma;
-			}
-			if (sep != paren)
-				appendPQExpBufferChar(sql, ')');
-		}
-		else
-		{
-			if (vacopts->full)
-				appendPQExpBufferStr(sql, " FULL");
-			if (vacopts->freeze)
-				appendPQExpBufferStr(sql, " FREEZE");
-			if (vacopts->verbose)
-				appendPQExpBufferStr(sql, " VERBOSE");
-			if (vacopts->and_analyze)
-				appendPQExpBufferStr(sql, " ANALYZE");
-		}
-	}
-
-	appendPQExpBuffer(sql, " %s;", table);
-}
-
-/*
- * Send a vacuum/analyze command to the server, returning after sending the
- * command.
- *
- * Any errors during command execution are reported to stderr.
- */
-static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-				   const char *table)
-{
-	bool		status;
-
-	if (echo)
-		printf("%s\n", sql);
-
-	status = PQsendQuery(conn, sql) == 1;
-
-	if (!status)
-	{
-		if (table)
-			pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
-						 table, PQdb(conn), PQerrorMessage(conn));
-		else
-			pg_log_error("vacuuming of database \"%s\" failed: %s",
-						 PQdb(conn), PQerrorMessage(conn));
-	}
-}
-
 static void
 help(const char *progname)
 {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index af967625181..02858605434 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2609,6 +2609,7 @@ RtlNtStatusToDosError_t
 RuleInfo
 RuleLock
 RuleStmt
+RunMode
 RunningTransactions
 RunningTransactionsData
 SASLStatus
-- 
2.47.1

#8Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Antonin Houska (#7)

On 2025-Aug-15, Antonin Houska wrote:

This is v18 again.

Thanks for this!

Parts 0001 through 0004 are unchanged, however 0005 is added. It
implements a new client application pg_repackdb. (If I posted 0005
alone its regression tests would not work. I wonder if the cfbot
handles the repeated occurence of the 'v18-' prefix correctly.)

Yeah, the cfbot is just going to take the attachments from the latest
email in the thread that has any, and assume they make up the whole
patch. It wouldn't work to post just v18-0005 and assume that the bot
is going to grab patches 0001 through 0004 from a previous email, if
that's what you're thinking. In short, what you did is correct and
necessary.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#9Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Alvaro Herrera (#5)
1 attachment(s)

Hello everyone!

Alvaro Herrera <alvherre@alvh.no-ip.org>:

Please note that Antonin already implemented this. See his patches
here:
/messages/by-id/77690.1725610115@antos
I proposed to leave this part out initially, which is why it hasn't been
reposted. We can review and discuss after the initial patches are in.

I think it is worth pushing it at least in the same release cycle.

But you're welcome to review that part specifically if you're so
inclined, and offer feedback on it. (I suggest to rewind back your
checked-out tree to branch master at the time that patch was posted, for
easy application. We can deal with a rebase later.)

I have rebased that on top of v18 (attached).

Also, I think I found an issue (or lost something during rebase): we
must preserve xmin and cmin during the initial copy to make sure the
data will be visible to the snapshots of concurrent changes later:

static void
reform_and_rewrite_tuple(......)
.....
    /*
     * It is also crucial to stamp the new record with the exact same
     * xid and cid, because the tuple must be visible to the snapshot
     * of the applied concurrent change later.
     */
    CommandId cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
    TransactionId xid = HeapTupleHeaderGetXmin(tuple->t_data);

    heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);

I'll try to polish that part a little bit.
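The requirement can be illustrated with a standalone toy model (plain C,
no PostgreSQL headers; the struct and function names here are invented
for illustration only): the copy into the new heap must carry over the
old tuple's xmin and cmin unchanged, rather than stamping the copier's
own transaction ID, or a historic snapshot built for a later concurrent
change would not see the copied row.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;
typedef uint32_t CommandId;

/* Toy tuple: just the visibility fields that matter here. */
typedef struct
{
    TransactionId xmin;     /* inserting transaction */
    CommandId   cmin;       /* inserting command within that transaction */
    char        data[32];
} ToyTuple;

/*
 * Copy a tuple into the "new heap", preserving its visibility
 * information instead of stamping the copier's own xid/cid.
 */
ToyTuple
copy_preserving_visibility(const ToyTuple *old)
{
    ToyTuple    new_tuple;

    new_tuple.xmin = old->xmin; /* NOT the copier's own xid */
    new_tuple.cmin = old->cmin;
    memcpy(new_tuple.data, old->data, sizeof(new_tuple.data));
    return new_tuple;
}

/*
 * Grossly simplified visibility rule: a snapshot sees a tuple iff the
 * inserting transaction is below the snapshot's xmax.
 */
int
toy_snapshot_sees(TransactionId snapshot_xmax, const ToyTuple *tup)
{
    return tup->xmin < snapshot_xmax;
}
```

With this model, a row inserted by xid 100 and then copied stays visible
to a snapshot with xmax 101 and invisible to one with xmax 100, exactly
as it was before the copy; had the copy been stamped with the copier's
(much newer) xid, both snapshots would miss it.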

Because having an MVCC-safe mode has drawbacks, IMO we should make it
optional.

Do you mean some option for the command? Like REPACK (CONCURRENTLY, SAFE)?

Best regards,
Mikhail.

Attachments:

v18-0006-Preserve-visibility-information-of-the-concurren.patch (application/octet-stream)
From b2f4d126e04d0396f8ad69f5113ed405a4efd723 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
Date: Thu, 21 Aug 2025 00:19:24 +0200
Subject: [PATCH v18 6/6] Preserve visibility information of the concurrent
 data  changes.

As explained in the commit message of the preceding patch of the series, the
data changes done by applications while REPACK CONCURRENTLY is copying the
table contents to a new file are decoded from WAL and eventually also applied
to the new file. To reduce the complexity a little bit, the preceding patch
uses the current transaction (i.e. transaction opened by the REPACK command)
to execute those INSERT, UPDATE and DELETE commands.

However, REPACK is not expected to change visibility of tuples. Therefore,
this patch fixes the handling of the "concurrent data changes". It ensures
that tuples written into the new table have the same XID and command ID (CID)
as they had in the old table.

To "replay" an UPDATE or DELETE command on the new table, we need the
appropriate snapshot to find the previous tuple version in the new table. The
(historic) snapshot we used to decode the UPDATE / DELETE should (by
definition) see the state of the catalog prior to that UPDATE / DELETE. Thus
we can use the same snapshot to find the "old tuple" for UPDATE / DELETE in
the new table if:

1) REPACK CONCURRENTLY preserves visibility information of all tuples - that's
the purpose of this part of the patch series.

2) The table being REPACKed is treated as a system catalog by all transactions
that modify its data. This ensures that reorderbuffer.c generates a new
snapshot for each data change in the table.

We ensure 2) by maintaining a shared hashtable of tables being REPACKed
CONCURRENTLY and by adjusting the RelationIsAccessibleInLogicalDecoding()
macro so it checks this hashtable. (The corresponding flag is also added to
the relation cache, so that the shared hashtable does not have to be accessed
too often.) It's essential that after adding an entry to the hashtable we wait
for completion of all the transactions that might have started to modify our
table before our entry was added. We achieve that by upgrading our lock on
the table to ShareLock temporarily: as soon as we acquire it, no DML command
should be running on the table. (This lock upgrade shouldn't cause any
deadlock because we care to not hold a lock on other objects at the same
time.)

As long as we preserve the tuple visibility information (which includes XID),
it's important to avoid logical decoding of the WAL generated by DMLs on the
new table: the logical decoding subsystem probably does not expect that the
incoming WAL records contain XIDs of already decoded transactions. (And of
course, repeated decoding would be wasted effort.)

Author: Antonin Houska <ah@cybertec.at> with small changes from Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
---
 src/backend/access/common/toast_internals.c   |   3 +-
 src/backend/access/heap/heapam.c              |  54 ++-
 src/backend/access/heap/heapam_handler.c      |  23 +-
 src/backend/access/transam/xact.c             |  52 +++
 src/backend/commands/cluster.c                | 400 ++++++++++++++++--
 src/backend/replication/logical/decode.c      |  28 +-
 src/backend/replication/logical/snapbuild.c   |  22 +-
 .../pgoutput_repack/pgoutput_repack.c         |  68 ++-
 src/backend/storage/ipc/ipci.c                |   2 +
 .../utils/activity/wait_event_names.txt       |   1 +
 src/backend/utils/cache/inval.c               |  21 +
 src/backend/utils/cache/relcache.c            |   4 +
 src/include/access/heapam.h                   |  12 +-
 src/include/access/xact.h                     |   2 +
 src/include/commands/cluster.h                |  22 +
 src/include/storage/lwlocklist.h              |   1 +
 src/include/utils/inval.h                     |   2 +
 src/include/utils/rel.h                       |   7 +-
 src/include/utils/snapshot.h                  |   3 +
 .../injection_points/specs/repack.spec        |   4 -
 src/tools/pgindent/typedefs.list              |   1 +
 21 files changed, 636 insertions(+), 96 deletions(-)

diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index a1d0eed8953..586eb42a137 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -320,7 +320,8 @@ toast_save_datum(Relation rel, Datum value,
 		memcpy(VARDATA(&chunk_data), data_p, chunk_size);
 		toasttup = heap_form_tuple(toasttupDesc, t_values, t_isnull);
 
-		heap_insert(toastrel, toasttup, mycid, options, NULL);
+		heap_insert(toastrel, toasttup, GetCurrentTransactionId(), mycid,
+					options, NULL);
 
 		/*
 		 * Create the index entry.  We cheat a little here by not using
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4fdb3e880e4..e7b9f7b6374 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2059,7 +2059,7 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
 /*
  *	heap_insert		- insert tuple into a heap
  *
- * The new tuple is stamped with current transaction ID and the specified
+ * The new tuple is stamped with the specified transaction ID and the specified
  * command ID.
  *
  * See table_tuple_insert for comments about most of the input flags, except
@@ -2075,15 +2075,16 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * reflected into *tup.
  */
 void
-heap_insert(Relation relation, HeapTuple tup, CommandId cid,
-			int options, BulkInsertState bistate)
+heap_insert(Relation relation, HeapTuple tup, TransactionId xid,
+			CommandId cid, int options, BulkInsertState bistate)
 {
-	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
+	Assert(TransactionIdIsValid(xid));
+
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(tup->t_data) <=
 		   RelationGetNumberOfAttributes(relation));
@@ -2165,8 +2166,15 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		/*
 		 * If this is a catalog, we need to transmit combo CIDs to properly
 		 * decode, so log that as well.
+		 *
+		 * HEAP_INSERT_NO_LOGICAL should be set when applying data changes
+		 * done by other transactions during REPACK CONCURRENTLY. In such a
+		 * case, the insertion should not be decoded at all - see
+		 * heap_decode(). (It's also set by raw_heap_insert() for TOAST, but
+		 * TOAST does not pass this test anyway.)
 		 */
-		if (RelationIsAccessibleInLogicalDecoding(relation))
+		if ((options & HEAP_INSERT_NO_LOGICAL) == 0 &&
+			RelationIsAccessibleInLogicalDecoding(relation))
 			log_heap_new_cid(relation, heaptup);
 
 		/*
@@ -2712,7 +2720,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 void
 simple_heap_insert(Relation relation, HeapTuple tup)
 {
-	heap_insert(relation, tup, GetCurrentCommandId(true), 0, NULL);
+	heap_insert(relation, tup, GetCurrentTransactionId(),
+				GetCurrentCommandId(true), 0, NULL);
 }
 
 /*
@@ -2769,11 +2778,11 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
  */
 TM_Result
 heap_delete(Relation relation, ItemPointer tid,
-			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, bool changingPart, bool wal_logical)
+			TransactionId xid, CommandId cid, Snapshot crosscheck, bool wait,
+			TM_FailureData *tmfd, bool changingPart,
+			bool wal_logical)
 {
 	TM_Result	result;
-	TransactionId xid = GetCurrentTransactionId();
 	ItemId		lp;
 	HeapTupleData tp;
 	Page		page;
@@ -2790,6 +2799,7 @@ heap_delete(Relation relation, ItemPointer tid,
 	bool		old_key_copied = false;
 
 	Assert(ItemPointerIsValid(tid));
+	Assert(TransactionIdIsValid(xid));
 
 	AssertHasSnapshotForToast(relation);
 
@@ -3086,8 +3096,12 @@ l1:
 		/*
 		 * For logical decode we need combo CIDs to properly decode the
 		 * catalog
+		 *
+		 * Like in heap_insert(), visibility is unchanged when called from
+		 * VACUUM FULL / CLUSTER.
 		 */
-		if (RelationIsAccessibleInLogicalDecoding(relation))
+		if (wal_logical &&
+			RelationIsAccessibleInLogicalDecoding(relation))
 			log_heap_new_cid(relation, &tp);
 
 		xlrec.flags = 0;
@@ -3206,11 +3220,11 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 	TM_Result	result;
 	TM_FailureData tmfd;
 
-	result = heap_delete(relation, tid,
+	result = heap_delete(relation, tid, GetCurrentTransactionId(),
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &tmfd, false, /* changingPart */
-						 true /* wal_logical */);
+						 &tmfd, false,	/* changingPart */
+						 true /* wal_logical */ );
 	switch (result)
 	{
 		case TM_SelfModified:
@@ -3249,12 +3263,11 @@ simple_heap_delete(Relation relation, ItemPointer tid)
  */
 TM_Result
 heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
-			CommandId cid, Snapshot crosscheck, bool wait,
-			TM_FailureData *tmfd, LockTupleMode *lockmode,
+			TransactionId xid, CommandId cid, Snapshot crosscheck,
+			bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
 			TU_UpdateIndexes *update_indexes, bool wal_logical)
 {
 	TM_Result	result;
-	TransactionId xid = GetCurrentTransactionId();
 	Bitmapset  *hot_attrs;
 	Bitmapset  *sum_attrs;
 	Bitmapset  *key_attrs;
@@ -3294,6 +3307,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				infomask2_new_tuple;
 
 	Assert(ItemPointerIsValid(otid));
+	Assert(TransactionIdIsValid(xid));
 
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
@@ -4133,8 +4147,12 @@ l2:
 		/*
 		 * For logical decoding we need combo CIDs to properly decode the
 		 * catalog.
+		 *
+		 * Like in heap_insert(), visibility is unchanged when called from
+		 * VACUUM FULL / CLUSTER.
 		 */
-		if (RelationIsAccessibleInLogicalDecoding(relation))
+		if (wal_logical &&
+			RelationIsAccessibleInLogicalDecoding(relation))
 		{
 			log_heap_new_cid(relation, &oldtup);
 			log_heap_new_cid(relation, heaptup);
@@ -4500,7 +4518,7 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 	TM_FailureData tmfd;
 	LockTupleMode lockmode;
 
-	result = heap_update(relation, otid, tup,
+	result = heap_update(relation, otid, tup, GetCurrentTransactionId(),
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
 						 &tmfd, &lockmode, update_indexes,
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c829c06f769..c42a1bd55ee 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -253,7 +253,8 @@ heapam_tuple_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
 	tuple->t_tableOid = slot->tts_tableOid;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
-	heap_insert(relation, tuple, cid, options, bistate);
+	heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+				bistate);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	if (shouldFree)
@@ -276,7 +277,8 @@ heapam_tuple_insert_speculative(Relation relation, TupleTableSlot *slot,
 	options |= HEAP_INSERT_SPECULATIVE;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
-	heap_insert(relation, tuple, cid, options, bistate);
+	heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+				bistate);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 	if (shouldFree)
@@ -310,8 +312,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 	 * the storage itself is cleaning the dead tuples by itself, it is the
 	 * time to call the index tuple deletion also.
 	 */
-	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
-					   true);
+	return heap_delete(relation, tid, GetCurrentTransactionId(), cid,
+					   crosscheck, wait, tmfd, changingPart, true);
 }
 
 
@@ -329,7 +331,8 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	slot->tts_tableOid = RelationGetRelid(relation);
 	tuple->t_tableOid = slot->tts_tableOid;
 
-	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
+	result = heap_update(relation, otid, tuple, GetCurrentTransactionId(),
+						 cid, crosscheck, wait,
 						 tmfd, lockmode, update_indexes, true);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
@@ -2477,9 +2480,15 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 		 * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
 		 * the relation files, it drops this relation, so no logical
 		 * replication subscription should need the data.
+		 *
+		 * It is also crucial to stamp the new record with the exact same XID
+		 * and CID, because the tuple must be visible to the snapshot used
+		 * later when applying the concurrent changes.
 		 */
-		heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
-					HEAP_INSERT_NO_LOGICAL, NULL);
+		CommandId		cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
+		TransactionId	xid = HeapTupleHeaderGetXmin(tuple->t_data);
+
+		heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);
 	}
 
 	heap_freetuple(copiedTuple);
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 5670f2bfbde..e913594fc07 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -126,6 +126,18 @@ static FullTransactionId XactTopFullTransactionId = {InvalidTransactionId};
 static int	nParallelCurrentXids = 0;
 static TransactionId *ParallelCurrentXids;
 
+/*
+ * Another case that requires TransactionIdIsCurrentTransactionId() to behave
+ * specially is when REPACK CONCURRENTLY is processing data changes made in
+ * the old storage of a table by other transactions. When applying the changes
+ * to the new storage, the backend executing the REPACK command needs to act
+ * on behalf of those other transactions. The transactions responsible for the
+ * changes in the old storage are stored in this array, sorted by
+ * xidComparator.
+ */
+static int	nRepackCurrentXids = 0;
+static TransactionId *RepackCurrentXids = NULL;
+
 /*
  * Miscellaneous flag bits to record events which occur on the top level
  * transaction. These flags are only persisted in MyXactFlags and are intended
@@ -973,6 +985,8 @@ TransactionIdIsCurrentTransactionId(TransactionId xid)
 		int			low,
 					high;
 
+		Assert(nRepackCurrentXids == 0);
+
 		low = 0;
 		high = nParallelCurrentXids - 1;
 		while (low <= high)
@@ -992,6 +1006,21 @@ TransactionIdIsCurrentTransactionId(TransactionId xid)
 		return false;
 	}
 
+	/*
+	 * When executing REPACK CONCURRENTLY, the array of current transactions
+	 * is given.
+	 */
+	if (nRepackCurrentXids > 0)
+	{
+		Assert(nParallelCurrentXids == 0);
+
+		return bsearch(&xid,
+					   RepackCurrentXids,
+					   nRepackCurrentXids,
+					   sizeof(TransactionId),
+					   xidComparator) != NULL;
+	}
+
 	/*
 	 * We will return true for the Xid of the current subtransaction, any of
 	 * its subcommitted children, any of its parents, or any of their
@@ -5661,6 +5690,29 @@ EndParallelWorkerTransaction(void)
 	CurrentTransactionState->blockState = TBLOCK_DEFAULT;
 }
 
+/*
+ * SetRepackCurrentXids
+ *		Set the XID array that TransactionIdIsCurrentTransactionId() should
+ *		use.
+ */
+void
+SetRepackCurrentXids(TransactionId *xip, int xcnt)
+{
+	RepackCurrentXids = xip;
+	nRepackCurrentXids = xcnt;
+}
+
+/*
+ * ResetRepackCurrentXids
+ *		Undo the effect of SetRepackCurrentXids().
+ */
+void
+ResetRepackCurrentXids(void)
+{
+	RepackCurrentXids = NULL;
+	nRepackCurrentXids = 0;
+}
+
 /*
  * ShowTransactionState
  *		Debug support
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index aa3ae85bcee..a4d4d37d211 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -82,6 +82,11 @@ typedef struct
  * The following definitions are used for concurrent processing.
  */
 
+/*
+ * OID of the table being repacked by this backend.
+ */
+static Oid	repacked_rel = InvalidOid;
+
 /*
  * The locators are used to avoid logical decoding of data that we do not need
  * for our table.
@@ -125,8 +130,10 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
 static bool cluster_is_permitted_for_relation(RepackCommand cmd,
 											  Oid relid, Oid userid);
 
-static void begin_concurrent_repack(Relation rel);
-static void end_concurrent_repack(void);
+static void begin_concurrent_repack(Relation rel, Relation *index_p,
+									bool *entered_p);
+static void end_concurrent_repack(bool error);
+static void cluster_before_shmem_exit_callback(int code, Datum arg);
 static LogicalDecodingContext *setup_logical_decoding(Oid relid,
 													  const char *slotname,
 													  TupleDesc tupdesc);
@@ -146,6 +153,7 @@ static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
 									ConcurrentChange *change);
 static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
 								   HeapTuple tup_key,
+								   Snapshot snapshot,
 								   IndexInsertState *iistate,
 								   TupleTableSlot *ident_slot,
 								   IndexScanDesc *scan_p);
@@ -437,6 +445,8 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 	bool		verbose = ((params->options & CLUOPT_VERBOSE) != 0);
 	bool		recheck = ((params->options & CLUOPT_RECHECK) != 0);
 	bool		concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
+	bool		entered,
+				success;
 
 	/*
 	 * Check that the correct lock is held. The lock mode is
@@ -607,23 +617,30 @@ cluster_rel(RepackCommand cmd, bool usingindex,
 		TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
+	entered = false;
+	success = false;
 	PG_TRY();
 	{
 		/*
-		 * For concurrent processing, make sure that our logical decoding
-		 * ignores data changes of other tables than the one we are
-		 * processing.
+		 * For concurrent processing, make sure that
+		 *
+		 * 1) our logical decoding ignores data changes of other tables than
+		 * the one we are processing.
+		 *
+		 * 2) other transactions treat this table as if it were a system /
+		 * user catalog, and WAL-log the relevant additional information.
 		 */
 		if (concurrent)
-			begin_concurrent_repack(OldHeap);
+			begin_concurrent_repack(OldHeap, &index, &entered);
 
 		rebuild_relation(cmd, usingindex, OldHeap, index, save_userid,
 						 verbose, concurrent);
+		success = true;
 	}
 	PG_FINALLY();
 	{
-		if (concurrent)
-			end_concurrent_repack();
+		if (concurrent && entered)
+			end_concurrent_repack(!success);
 	}
 	PG_END_TRY();
 
@@ -2383,6 +2400,47 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
 }
 
 
+/*
+ * Each relation being processed by REPACK CONCURRENTLY must be in the
+ * repackedRels hashtable.
+ */
+typedef struct RepackedRel
+{
+	Oid			relid;
+	Oid			dbid;
+} RepackedRel;
+
+static HTAB *RepackedRelsHash = NULL;
+
+/*
+ * Maximum number of entries in the hashtable.
+ *
+ * A replication slot is needed for the processing, so use this GUC to
+ * allocate memory for the hashtable.
+ */
+#define	MAX_REPACKED_RELS	(max_replication_slots)
+
+Size
+RepackShmemSize(void)
+{
+	return hash_estimate_size(MAX_REPACKED_RELS, sizeof(RepackedRel));
+}
+
+void
+RepackShmemInit(void)
+{
+	HASHCTL		info;
+
+	info.keysize = sizeof(RepackedRel);
+	info.entrysize = info.keysize;
+
+	RepackedRelsHash = ShmemInitHash("Repacked Relations",
+									 MAX_REPACKED_RELS,
+									 MAX_REPACKED_RELS,
+									 &info,
+									 HASH_ELEM | HASH_BLOBS);
+}
+
 /*
  * Call this function before REPACK CONCURRENTLY starts to setup logical
  * decoding. It makes sure that other users of the table put enough
@@ -2397,11 +2455,119 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
  *
  * Note that TOAST table needs no attention here as it's not scanned using
  * historic snapshot.
+ *
+ * 'index_p' is an in/out argument because the function unlocks the index
+ * temporarily.
+ *
+ * 'entered_p' receives a bool value telling whether the relation OID was
+ * entered into RepackedRelsHash or not.
  */
 static void
-begin_concurrent_repack(Relation rel)
+begin_concurrent_repack(Relation rel, Relation *index_p, bool *entered_p)
 {
-	Oid			toastrelid;
+	Oid			relid,
+				toastrelid;
+	Relation	index = NULL;
+	Oid			indexid = InvalidOid;
+	RepackedRel key,
+			   *entry;
+	bool		found;
+	static bool before_shmem_exit_callback_setup = false;
+
+	relid = RelationGetRelid(rel);
+	index = index_p ? *index_p : NULL;
+
+	/*
+	 * Make sure that we do not leave an entry in RepackedRelsHash if exiting
+	 * due to FATAL.
+	 */
+	if (!before_shmem_exit_callback_setup)
+	{
+		before_shmem_exit(cluster_before_shmem_exit_callback, 0);
+		before_shmem_exit_callback_setup = true;
+	}
+
+	memset(&key, 0, sizeof(key));
+	key.relid = relid;
+	key.dbid = MyDatabaseId;
+
+	*entered_p = false;
+	LWLockAcquire(RepackedRelsLock, LW_EXCLUSIVE);
+	entry = (RepackedRel *)
+		hash_search(RepackedRelsHash, &key, HASH_ENTER_NULL, &found);
+	if (found)
+	{
+		/*
+		 * Since REPACK CONCURRENTLY takes ShareRowExclusiveLock, a conflict
+		 * should occur much earlier. However, that lock may be released
+		 * temporarily, see below.  In any case, we should complain whatever
+		 * the reason for the conflict might be.
+		 */
+		ereport(ERROR,
+				(errmsg("relation \"%s\" is already being processed by REPACK CONCURRENTLY",
+						RelationGetRelationName(rel))));
+	}
+	if (entry == NULL)
+		ereport(ERROR,
+				(errmsg("too many requests for REPACK CONCURRENTLY at a time")),
+				(errhint("Consider increasing the \"max_replication_slots\" configuration parameter.")));
+
+	/*
+	 * Even if anything fails below, the caller has to do cleanup in the
+	 * shared memory.
+	 */
+	*entered_p = true;
+
+	/*
+	 * Enable the callback to remove the entry in case of exit. We should not
+	 * do this earlier, otherwise an attempt to insert already existing entry
+	 * could make us remove that entry (inserted by another backend) during
+	 * ERROR handling.
+	 */
+	Assert(!OidIsValid(repacked_rel));
+	repacked_rel = relid;
+
+	LWLockRelease(RepackedRelsLock);
+
+	/*
+	 * Make sure that other backends are aware of the new hash entry as soon
+	 * as they open our table.
+	 */
+	CacheInvalidateRelcacheImmediate(relid);
+
+	/*
+	 * Also make sure that the existing users of the table update their
+	 * relcache entry as soon as they try to run DML commands on it.
+	 *
+	 * ShareLock is the weakest lock that conflicts with DMLs. If any backend
+	 * has a lower lock, we assume it'll accept our invalidation message when
+	 * it changes the lock mode.
+	 *
+	 * Before upgrading the lock on the relation, close the index temporarily
+	 * to avoid a deadlock if another backend running DML already has its lock
+	 * (ShareLock) on the table and waits for the lock on the index.
+	 */
+	if (index)
+	{
+		indexid = RelationGetRelid(index);
+		index_close(index, ShareUpdateExclusiveLock);
+	}
+	LockRelationOid(relid, ShareLock);
+	UnlockRelationOid(relid, ShareLock);
+	if (OidIsValid(indexid))
+	{
+		/*
+		 * Re-open the index and check that it hasn't changed while unlocked.
+		 */
+		check_index_is_clusterable(rel, indexid, ShareUpdateExclusiveLock);
+
+		/*
+		 * Return the new relcache entry to the caller. (It's been locked by
+		 * the call above.)
+		 */
+		index = index_open(indexid, NoLock);
+		*index_p = index;
+	}
 
 	/* Avoid logical decoding of other relations by this backend. */
 	repacked_rel_locator = rel->rd_locator;
@@ -2419,15 +2585,122 @@ begin_concurrent_repack(Relation rel)
 
 /*
  * Call this when done with REPACK CONCURRENTLY.
+ *
+ * 'error' tells whether the function is being called in order to handle
+ * error.
  */
 static void
-end_concurrent_repack(void)
+end_concurrent_repack(bool error)
 {
+	RepackedRel key;
+	RepackedRel *entry = NULL;
+	Oid			relid = repacked_rel;
+
+	/* Remove the relation from the hash if we managed to insert one. */
+	if (OidIsValid(repacked_rel))
+	{
+		memset(&key, 0, sizeof(key));
+		key.relid = repacked_rel;
+		key.dbid = MyDatabaseId;
+		LWLockAcquire(RepackedRelsLock, LW_EXCLUSIVE);
+		entry = hash_search(RepackedRelsHash, &key, HASH_REMOVE, NULL);
+		LWLockRelease(RepackedRelsLock);
+
+		/*
+		 * Make others refresh their information whether they should still
+		 * treat the table as catalog from the perspective of writing WAL.
+		 *
+		 * XXX Unlike entering the entry into the hashtable, we do not bother
+		 * with locking and unlocking the table here:
+		 *
+		 * 1) On normal completion (and sometimes even on ERROR), the caller
+		 * is already holding AccessExclusiveLock on the table, so there
+		 * should be no relcache reference unaware of this change.
+		 *
+		 * 2) In the other cases, the worst scenario is that the other
+		 * backends will write unnecessary information to WAL until they close
+		 * the relation.
+		 *
+		 * Should we use ShareLock mode to fix 2) at least for the non-FATAL
+		 * errors? (Our before_shmem_exit callback is in charge of FATAL, and
+		 * that probably should not try to acquire any lock.)
+		 */
+		CacheInvalidateRelcacheImmediate(repacked_rel);
+
+		/*
+		 * By clearing this variable we also disable
+		 * cluster_before_shmem_exit_callback().
+		 */
+		repacked_rel = InvalidOid;
+	}
+
 	/*
 	 * Restore normal function of (future) logical decoding for this backend.
 	 */
 	repacked_rel_locator.relNumber = InvalidOid;
 	repacked_rel_toast_locator.relNumber = InvalidOid;
+
+	/*
+	 * On normal completion (!error), we should not really fail to remove the
+	 * entry. But if it wasn't there for any reason, raise ERROR to make sure
+	 * the transaction is aborted: if other transactions, while changing the
+	 * contents of the relation, didn't know that REPACK CONCURRENTLY was in
+	 * progress, they could have failed to WAL-log enough information, and
+	 * thus we could have produced inconsistent table contents.
+	 *
+	 * On the other hand, if we are already handling an error, there's no
+	 * reason to worry about inconsistent contents of the new storage because
+	 * the transaction is going to be rolled back anyway. Furthermore, by
+	 * raising ERROR here we'd shadow the original error.
+	 */
+	if (!error)
+	{
+		char	   *relname;
+
+		if (OidIsValid(relid) && entry == NULL)
+		{
+			relname = get_rel_name(relid);
+			if (!relname)
+				ereport(ERROR,
+						(errmsg("cache lookup failed for relation %u",
+								relid)));
+
+			ereport(ERROR,
+					(errmsg("relation \"%s\" not found among repacked relations",
+							relname)));
+		}
+	}
+}
+
+/*
+ * A wrapper to call end_concurrent_repack() as a before_shmem_exit callback.
+ */
+static void
+cluster_before_shmem_exit_callback(int code, Datum arg)
+{
+	if (OidIsValid(repacked_rel))
+		end_concurrent_repack(true);
+}
+
+/*
+ * Check if relation is currently being processed by REPACK CONCURRENTLY.
+ */
+bool
+is_concurrent_repack_in_progress(Oid relid)
+{
+	RepackedRel key,
+			   *entry;
+
+	memset(&key, 0, sizeof(key));
+	key.relid = relid;
+	key.dbid = MyDatabaseId;
+
+	LWLockAcquire(RepackedRelsLock, LW_SHARED);
+	entry = (RepackedRel *)
+		hash_search(RepackedRelsHash, &key, HASH_FIND, NULL);
+	LWLockRelease(RepackedRelsLock);
+
+	return entry != NULL;
 }
 
 /*
@@ -2489,6 +2762,9 @@ setup_logical_decoding(Oid relid, const char *slotname, TupleDesc tupdesc)
 	dstate->relid = relid;
 	dstate->tstore = tuplestore_begin_heap(false, false,
 										   maintenance_work_mem);
+#ifdef USE_ASSERT_CHECKING
+	dstate->last_change_xid = InvalidTransactionId;
+#endif
 
 	dstate->tupdesc = tupdesc;
 
@@ -2636,6 +2912,7 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 		char	   *change_raw,
 				   *src;
 		ConcurrentChange change;
+		Snapshot	snapshot;
 		bool		isnull[1];
 		Datum		values[1];
 
@@ -2704,8 +2981,30 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 
 			/*
 			 * Find the tuple to be updated or deleted.
+			 *
+			 * As the table being REPACKed concurrently is treated like a
+			 * catalog, new CID is WAL-logged and decoded. And since we use
+			 * the same XID that the original DMLs did, the snapshot used for
+			 * the logical decoding (by now converted to a non-historic MVCC
+			 * snapshot) should see the tuples inserted previously into the
+			 * new heap and/or updated there.
 			 */
-			tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+			snapshot = change.snapshot;
+
+			/*
+			 * Set what should be considered current transaction (and
+			 * subtransactions) during visibility check.
+			 *
+			 * Note that this snapshot was created from a historic snapshot
+			 * using SnapBuildMVCCFromHistoric(), which does not touch
+			 * 'subxip'. Thus, unlike in a regular MVCC snapshot, the array
+			 * only contains the transactions whose data changes we are
+			 * applying, and its subtransactions. That's exactly what we need
+			 * to check if particular xact is a "current transaction:".
+			 */
+			SetRepackCurrentXids(snapshot->subxip, snapshot->subxcnt);
+
+			tup_exist = find_target_tuple(rel, key, nkeys, tup_key, snapshot,
 										  iistate, ident_slot, &ind_scan);
 			if (tup_exist == NULL)
 				elog(ERROR, "Failed to find target tuple");
@@ -2716,6 +3015,8 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 			else
 				apply_concurrent_delete(rel, tup_exist, &change);
 
+			ResetRepackCurrentXids();
+
 			if (tup_old != NULL)
 			{
 				pfree(tup_old);
@@ -2728,14 +3029,14 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
 		else
 			elog(ERROR, "Unrecognized kind of change: %d", change.kind);
 
-		/*
-		 * If a change was applied now, increment CID for next writes and
-		 * update the snapshot so it sees the changes we've applied so far.
-		 */
-		if (change.kind != CHANGE_UPDATE_OLD)
+		/* Free the snapshot if this is the last change that needed it. */
+		Assert(change.snapshot->active_count > 0);
+		change.snapshot->active_count--;
+		if (change.snapshot->active_count == 0)
 		{
-			CommandCounterIncrement();
-			UpdateActiveSnapshotCommandId();
+			if (change.snapshot == dstate->snapshot)
+				dstate->snapshot = NULL;
+			FreeSnapshot(change.snapshot);
 		}
 
 		/* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
@@ -2755,16 +3056,35 @@ static void
 apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 						IndexInsertState *iistate, TupleTableSlot *index_slot)
 {
+	Snapshot	snapshot = change->snapshot;
 	List	   *recheck;
 
+	/*
+	 * For INSERT, the visibility information is not important, but we use the
+	 * snapshot to get CID. Index functions might need the whole snapshot
+	 * anyway.
+	 */
+	SetRepackCurrentXids(snapshot->subxip, snapshot->subxcnt);
+
+	/*
+	 * Write the tuple into the new heap.
+	 *
+	 * The snapshot is the one we used to decode the insert (though converted
+	 * to "non-historic" MVCC snapshot), i.e. the snapshot's curcid is the
+	 * tuple CID incremented by one (due to the "new CID" WAL record that got
+	 * written along with the INSERT record). Thus if we want to use the
+	 * original CID, we need to subtract 1 from curcid.
+	 */
+	Assert(snapshot->curcid != InvalidCommandId &&
+		   snapshot->curcid > FirstCommandId);
 
 	/*
 	 * Like simple_heap_insert(), but make sure that the INSERT is not
 	 * logically decoded - see reform_and_rewrite_tuple() for more
 	 * information.
 	 */
-	heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
-				NULL);
+	heap_insert(rel, tup, change->xid, snapshot->curcid - 1,
+				HEAP_INSERT_NO_LOGICAL, NULL);
 
 	/*
 	 * Update indexes.
@@ -2772,6 +3092,7 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 	 * In case functions in the index need the active snapshot and caller
 	 * hasn't set one.
 	 */
+	PushActiveSnapshot(snapshot);
 	ExecStoreHeapTuple(tup, index_slot, false);
 	recheck = ExecInsertIndexTuples(iistate->rri,
 									index_slot,
@@ -2782,6 +3103,8 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
 									NIL,	/* arbiterIndexes */
 									false	/* onlySummarizing */
 		);
+	PopActiveSnapshot();
+	ResetRepackCurrentXids();
 
 	/*
 	 * If recheck is required, it must have been performed on the source
@@ -2803,6 +3126,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 	TU_UpdateIndexes update_indexes;
 	TM_Result	res;
 	List	   *recheck;
+	Snapshot	snapshot = change->snapshot;
 
 	/*
 	 * Write the new tuple into the new heap. ('tup' gets the TID assigned
@@ -2810,13 +3134,19 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 	 *
 	 * Do it like in simple_heap_update(), except for 'wal_logical' (and
 	 * except for 'wait').
+	 *
+	 * Regarding CID, see the comment in apply_concurrent_insert().
 	 */
+	Assert(snapshot->curcid != InvalidCommandId &&
+		   snapshot->curcid > FirstCommandId);
+
 	res = heap_update(rel, &tup_target->t_self, tup,
-					  GetCurrentCommandId(true),
+					  change->xid, snapshot->curcid - 1,
 					  InvalidSnapshot,
 					  false,	/* no wait - only we are doing changes */
 					  &tmfd, &lockmode, &update_indexes,
-					  false /* wal_logical */ );
+	/* wal_logical */
+					  false);
 	if (res != TM_Ok)
 		ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
 
@@ -2824,6 +3154,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 
 	if (update_indexes != TU_None)
 	{
+		PushActiveSnapshot(snapshot);
 		recheck = ExecInsertIndexTuples(iistate->rri,
 										index_slot,
 										iistate->estate,
@@ -2833,6 +3164,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
 										NIL,	/* arbiterIndexes */
 		/* onlySummarizing */
 										update_indexes == TU_Summarizing);
+		PopActiveSnapshot();
 		list_free(recheck);
 	}
 
@@ -2845,6 +3177,12 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
 {
 	TM_Result	res;
 	TM_FailureData tmfd;
+	Snapshot	snapshot = change->snapshot;
+
+
+	/* Regarding CID, see the comment in apply_concurrent_insert(). */
+	Assert(snapshot->curcid != InvalidCommandId &&
+		   snapshot->curcid > FirstCommandId);
 
 	/*
 	 * Delete tuple from the new heap.
@@ -2852,11 +3190,11 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
 	 * Do it like in simple_heap_delete(), except for 'wal_logical' (and
 	 * except for 'wait').
 	 */
-	res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
-					  InvalidSnapshot, false,
-					  &tmfd,
-					  false,	/* no wait - only we are doing changes */
-					  false /* wal_logical */ );
+	res = heap_delete(rel, &tup_target->t_self, change->xid,
+					  snapshot->curcid - 1, InvalidSnapshot, false,
+					  &tmfd, false,
+	/* wal_logical */
+					  false);
 
 	if (res != TM_Ok)
 		ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
@@ -2877,7 +3215,7 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
  */
 static HeapTuple
 find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
-				  IndexInsertState *iistate,
+				  Snapshot snapshot, IndexInsertState *iistate,
 				  TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
 {
 	IndexScanDesc scan;
@@ -2886,7 +3224,7 @@ find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
 	HeapTuple	result = NULL;
 
 	/* XXX no instrumentation for now */
-	scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+	scan = index_beginscan(rel, iistate->ident_index, snapshot,
 						   NULL, nkeys, 0);
 	*scan_p = scan;
 	index_rescan(scan, key, nkeys, NULL, 0);
@@ -2958,6 +3296,8 @@ process_concurrent_changes(LogicalDecodingContext *ctx, XLogRecPtr end_of_wal,
 	}
 	PG_FINALLY();
 	{
+		ResetRepackCurrentXids();
+
 		if (rel_src)
 			rel_dst->rd_toastoid = InvalidOid;
 	}
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5dc4ae58ffe..9fefcffd8b3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -475,9 +475,14 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	/*
 	 * If the change is not intended for logical decoding, do not even
-	 * establish transaction for it - REPACK CONCURRENTLY is the typical use
-	 * case.
-	 *
+	 * establish transaction for it. This is particularly important if the
+	 * record was generated by REPACK CONCURRENTLY because this command uses
+	 * the original XID when doing changes in the new storage. The decoding
+	 * system probably does not expect to see the same transaction multiple
+	 * times.
+	 */
+
+	/*
 	 * First, check if REPACK CONCURRENTLY is being performed by this backend.
 	 * If so, only decode data changes of the table that it is processing, and
 	 * the changes of its TOAST relation.
@@ -504,11 +509,11 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	 * Second, skip records which do not contain sufficient information for
 	 * the decoding.
 	 *
-	 * The problem we solve here is that REPACK CONCURRENTLY generates WAL
-	 * when doing changes in the new table. Those changes should not be useful
-	 * for any other user (such as logical replication subscription) because
-	 * the new table will eventually be dropped (after REPACK CONCURRENTLY has
-	 * assigned its file to the "old table").
+	 * One particular problem we solve here is that REPACK CONCURRENTLY
+	 * generates WAL when doing changes in the new table. Those changes should
+	 * not be decoded because reorderbuffer.c considers their XID already
+	 * committed. (REPACK CONCURRENTLY deliberately generates WAL records in
+	 * such a way that they are skipped here.)
 	 */
 	switch (info)
 	{
@@ -995,13 +1000,6 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	xlrec = (xl_heap_insert *) XLogRecGetData(r);
 
-	/*
-	 * Ignore insert records without new tuples (this does happen when
-	 * raw_heap_insert marks the TOAST record as HEAP_INSERT_NO_LOGICAL).
-	 */
-	if (!(xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE))
-		return;
-
 	/* only interested in our database */
 	XLogRecGetBlockTag(r, 0, &target_locator, NULL, NULL);
 	if (target_locator.dbOid != ctx->slot->data.database)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8e5116a9cab..72a38074a7b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -155,7 +155,7 @@ static bool ExportInProgress = false;
 static void SnapBuildPurgeOlderTxn(SnapBuild *builder);
 
 /* snapshot building/manipulation/distribution functions */
-static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder);
+static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder, XLogRecPtr lsn);
 
 static void SnapBuildFreeSnapshot(Snapshot snap);
 
@@ -352,12 +352,17 @@ SnapBuildSnapDecRefcount(Snapshot snap)
  * Build a new snapshot, based on currently committed catalog-modifying
  * transactions.
  *
+ * 'lsn' is the location of the commit record (of a catalog-changing
+ * transaction) that triggered creation of the snapshot. Pass
+ * InvalidXLogRecPtr for the transaction base snapshot or if the user of
+ * the snapshot should not need the LSN.
+ *
  * In-progress transactions with catalog access are *not* allowed to modify
  * these snapshots; they have to copy them and fill in appropriate ->curcid
  * and ->subxip/subxcnt values.
  */
 static Snapshot
-SnapBuildBuildSnapshot(SnapBuild *builder)
+SnapBuildBuildSnapshot(SnapBuild *builder, XLogRecPtr lsn)
 {
 	Snapshot	snapshot;
 	Size		ssize;
@@ -425,6 +430,7 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
 	snapshot->active_count = 0;
 	snapshot->regd_count = 0;
 	snapshot->snapXactCompletionCount = 0;
+	snapshot->lsn = lsn;
 
 	return snapshot;
 }
@@ -461,7 +467,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	if (TransactionIdIsValid(MyProc->xmin))
 		elog(ERROR, "cannot build an initial slot snapshot when MyProc->xmin already is valid");
 
-	snap = SnapBuildBuildSnapshot(builder);
+	snap = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 
 	/*
 	 * We know that snap->xmin is alive, enforced by the logical xmin
@@ -502,7 +508,7 @@ SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
 
 	Assert(builder->state == SNAPBUILD_CONSISTENT);
 
-	snap = SnapBuildBuildSnapshot(builder);
+	snap = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 	return SnapBuildMVCCFromHistoric(snap, false);
 }
 
@@ -636,7 +642,7 @@ SnapBuildGetOrBuildSnapshot(SnapBuild *builder)
 	/* only build a new snapshot if we don't have a prebuilt one */
 	if (builder->snapshot == NULL)
 	{
-		builder->snapshot = SnapBuildBuildSnapshot(builder);
+		builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 		/* increase refcount for the snapshot builder */
 		SnapBuildSnapIncRefcount(builder->snapshot);
 	}
@@ -716,7 +722,7 @@ SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
 		/* only build a new snapshot if we don't have a prebuilt one */
 		if (builder->snapshot == NULL)
 		{
-			builder->snapshot = SnapBuildBuildSnapshot(builder);
+			builder->snapshot = SnapBuildBuildSnapshot(builder, lsn);
 			/* increase refcount for the snapshot builder */
 			SnapBuildSnapIncRefcount(builder->snapshot);
 		}
@@ -1130,7 +1136,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		if (builder->snapshot)
 			SnapBuildSnapDecRefcount(builder->snapshot);
 
-		builder->snapshot = SnapBuildBuildSnapshot(builder);
+		builder->snapshot = SnapBuildBuildSnapshot(builder, lsn);
 
 		/* we might need to execute invalidations, add snapshot */
 		if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
@@ -1958,7 +1964,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	{
 		SnapBuildSnapDecRefcount(builder->snapshot);
 	}
-	builder->snapshot = SnapBuildBuildSnapshot(builder);
+	builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidXLogRecPtr);
 	SnapBuildSnapIncRefcount(builder->snapshot);
 
 	ReorderBufferSetRestartPoint(builder->reorder, lsn);
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
index 687fbbc59bb..28bd16f9cc7 100644
--- a/src/backend/replication/pgoutput_repack/pgoutput_repack.c
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -32,7 +32,8 @@ static void plugin_truncate(struct LogicalDecodingContext *ctx,
 							Relation relations[],
 							ReorderBufferChange *change);
 static void store_change(LogicalDecodingContext *ctx,
-						 ConcurrentChangeKind kind, HeapTuple tuple);
+						 ConcurrentChangeKind kind, HeapTuple tuple,
+						 TransactionId xid);
 
 void
 _PG_output_plugin_init(OutputPluginCallbacks *cb)
@@ -100,6 +101,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 			  Relation relation, ReorderBufferChange *change)
 {
 	RepackDecodingState *dstate;
+	Snapshot	snapshot;
 
 	dstate = (RepackDecodingState *) ctx->output_writer_private;
 
@@ -107,6 +109,48 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	if (relation->rd_id != dstate->relid)
 		return;
 
+	/*
+	 * Catalog snapshot is fine because the table we are processing is
+	 * temporarily considered a user catalog table.
+	 */
+	snapshot = GetCatalogSnapshot(InvalidOid);
+	Assert(snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
+	Assert(!snapshot->suboverflowed);
+
+	/*
+	 * This should not happen, but if we don't have enough information to
+	 * apply a new snapshot, the consequences would be bad. Thus prefer ERROR
+	 * to Assert().
+	 */
+	if (XLogRecPtrIsInvalid(snapshot->lsn))
+		ereport(ERROR, (errmsg("snapshot has invalid LSN")));
+
+	/*
+	 * reorderbuffer.c changes the catalog snapshot as soon as it sees a new
+	 * CID or a commit record of a catalog-changing transaction.
+	 */
+	if (dstate->snapshot == NULL || snapshot->lsn != dstate->snapshot_lsn ||
+		snapshot->curcid != dstate->snapshot->curcid)
+	{
+		/* CID should not go backwards. */
+		Assert(dstate->snapshot == NULL ||
+			   snapshot->curcid >= dstate->snapshot->curcid ||
+			   change->txn->xid != dstate->last_change_xid);
+
+		/*
+		 * XXX Is it a problem that the copy is created in
+		 * TopTransactionContext?
+		 *
+		 * XXX Wouldn't it be o.k. for SnapBuildMVCCFromHistoric() to set xcnt
+		 * to 0 instead of converting xip in this case? The point is that
+		 * transactions which are still in progress from the perspective of
+		 * reorderbuffer.c could not be replayed yet, so we do not need to
+		 * examine their XIDs.
+		 */
+		dstate->snapshot = SnapBuildMVCCFromHistoric(snapshot, false);
+		dstate->snapshot_lsn = snapshot->lsn;
+	}
+
 	/* Decode entry depending on its type */
 	switch (change->action)
 	{
@@ -124,7 +168,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (newtuple == NULL)
 					elog(ERROR, "Incomplete insert info.");
 
-				store_change(ctx, CHANGE_INSERT, newtuple);
+				store_change(ctx, CHANGE_INSERT, newtuple, change->txn->xid);
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_UPDATE:
@@ -141,9 +185,11 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 					elog(ERROR, "Incomplete update info.");
 
 				if (oldtuple != NULL)
-					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+					store_change(ctx, CHANGE_UPDATE_OLD, oldtuple,
+								 change->txn->xid);
 
-				store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+				store_change(ctx, CHANGE_UPDATE_NEW, newtuple,
+							 change->txn->xid);
 			}
 			break;
 		case REORDER_BUFFER_CHANGE_DELETE:
@@ -156,7 +202,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (oldtuple == NULL)
 					elog(ERROR, "Incomplete delete info.");
 
-				store_change(ctx, CHANGE_DELETE, oldtuple);
+				store_change(ctx, CHANGE_DELETE, oldtuple, change->txn->xid);
 			}
 			break;
 		default:
@@ -190,13 +236,13 @@ plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	if (i == nrelations)
 		return;
 
-	store_change(ctx, CHANGE_TRUNCATE, NULL);
+	store_change(ctx, CHANGE_TRUNCATE, NULL, InvalidTransactionId);
 }
 
 /* Store concurrent data change. */
 static void
 store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
-			 HeapTuple tuple)
+			 HeapTuple tuple, TransactionId xid)
 {
 	RepackDecodingState *dstate;
 	char	   *change_raw;
@@ -266,6 +312,11 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
 	dst = dst_start + SizeOfConcurrentChange;
 	memcpy(dst, tuple->t_data, tuple->t_len);
 
+	/* Initialize the other fields. */
+	change.xid = xid;
+	change.snapshot = dstate->snapshot;
+	dstate->snapshot->active_count++;
+
 	/* The data has been copied. */
 	if (flattened)
 		pfree(tuple);
@@ -279,6 +330,9 @@ store:
 	isnull[0] = false;
 	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
 						 values, isnull);
+#ifdef USE_ASSERT_CHECKING
+	dstate->last_change_xid = xid;
+#endif
 
 	/* Accounting. */
 	dstate->nchanges++;
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index e9ddf39500c..e24e1795aa9 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -151,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
 	size = add_size(size, InjectionPointShmemSize());
 	size = add_size(size, SlotSyncShmemSize());
 	size = add_size(size, AioShmemSize());
+	size = add_size(size, RepackShmemSize());
 
 	/* include additional requested shmem from preload libraries */
 	size = add_size(size, total_addin_request);
@@ -344,6 +345,7 @@ CreateOrAttachShmemStructs(void)
 	WaitEventCustomShmemInit();
 	InjectionPointShmemInit();
 	AioShmemInit();
+	RepackShmemInit();
 }
 
 /*
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d2ca0..f546c2e12ca 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -352,6 +352,7 @@ DSMRegistry	"Waiting to read or update the dynamic shared memory registry."
 InjectionPoint	"Waiting to read or update information related to injection points."
 SerialControl	"Waiting to read or update shared <filename>pg_serial</filename> state."
 AioWorkerSubmissionQueue	"Waiting to access AIO worker submission queue."
+RepackedRels	"Waiting to access the hash table with the list of repacked relations."
 
 #
 # END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 02505c88b8e..ecaa2283c2a 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -1643,6 +1643,27 @@ CacheInvalidateRelcache(Relation relation)
 								 databaseId, relationId);
 }
 
+/*
+ * CacheInvalidateRelcacheImmediate
+ *		Send invalidation message for the specified relation's relcache entry.
+ *
+ * Currently this is used in REPACK CONCURRENTLY, to make sure that other
+ * backends are aware that the command is being executed for the relation.
+ */
+void
+CacheInvalidateRelcacheImmediate(Oid relid)
+{
+	SharedInvalidationMessage msg;
+
+	msg.rc.id = SHAREDINVALRELCACHE_ID;
+	msg.rc.dbId = MyDatabaseId;
+	msg.rc.relId = relid;
+	/* check AddCatcacheInvalidationMessage() for an explanation */
+	VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
+
+	SendSharedInvalidMessages(&msg, 1);
+}
+
 /*
  * CacheInvalidateRelcacheAll
  *		Register invalidation of the whole relcache at the end of command.
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index d27a4c30548..ea565b5b053 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1279,6 +1279,10 @@ retry:
 	/* make sure relation is marked as having no open file yet */
 	relation->rd_smgr = NULL;
 
+	/* Is REPACK CONCURRENTLY in progress? */
+	relation->rd_repack_concurrent =
+		is_concurrent_repack_in_progress(targetRelId);
+
 	/*
 	 * now we can free the memory allocated for pg_class_tuple
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b82dd17a966..981425f23b6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -316,22 +316,24 @@ extern BulkInsertState GetBulkInsertState(void);
 extern void FreeBulkInsertState(BulkInsertState);
 extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
-extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
-						int options, BulkInsertState bistate);
+extern void heap_insert(Relation relation, HeapTuple tup, TransactionId xid,
+						CommandId cid, int options, BulkInsertState bistate);
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
-							 CommandId cid, Snapshot crosscheck, bool wait,
+							 TransactionId xid, CommandId cid,
+							 Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart,
 							 bool wal_logical);
 extern void heap_finish_speculative(Relation relation, ItemPointer tid);
 extern void heap_abort_speculative(Relation relation, ItemPointer tid);
 extern TM_Result heap_update(Relation relation, ItemPointer otid,
-							 HeapTuple newtup,
+							 HeapTuple newtup, TransactionId xid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, LockTupleMode *lockmode,
-							 TU_UpdateIndexes *update_indexes, bool wal_logical);
+							 TU_UpdateIndexes *update_indexes,
+							 bool wal_logical);
 extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 								 CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool follow_updates,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index b2bc10ee041..fbb66d559b6 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -482,6 +482,8 @@ extern Size EstimateTransactionStateSpace(void);
 extern void SerializeTransactionState(Size maxsize, char *start_address);
 extern void StartParallelWorkerTransaction(char *tstatespace);
 extern void EndParallelWorkerTransaction(void);
+extern void SetRepackCurrentXids(TransactionId *xip, int xcnt);
+extern void ResetRepackCurrentXids(void);
 extern bool IsTransactionBlock(void);
 extern bool IsTransactionOrTransactionBlock(void);
 extern char TransactionBlockStatusCode(void);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 532ffa7208d..00ae6fc18e8 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -59,6 +59,14 @@ typedef struct ConcurrentChange
 	/* See the enum above. */
 	ConcurrentChangeKind kind;
 
+	/* Transaction that changes the data. */
+	TransactionId xid;
+
+	/*
+	 * Historic catalog snapshot that was used to decode this change.
+	 */
+	Snapshot	snapshot;
+
 	/*
 	 * The actual tuple.
 	 *
@@ -90,6 +98,8 @@ typedef struct RepackDecodingState
 	 * tuplestore does this transparently.
 	 */
 	Tuplestorestate *tstore;
+	/* XID of the last change added to tstore. */
+	TransactionId last_change_xid PG_USED_FOR_ASSERTS_ONLY;
 
 	/* The current number of changes in tstore. */
 	double		nchanges;
@@ -110,6 +120,14 @@ typedef struct RepackDecodingState
 	/* Slot to retrieve data from tstore. */
 	TupleTableSlot *tsslot;
 
+	/*
+	 * Historic catalog snapshot that was used to decode the most recent
+	 * change.
+	 */
+	Snapshot	snapshot;
+	/* LSN of the snapshot above. */
+	XLogRecPtr	snapshot_lsn;
+
 	ResourceOwner resowner;
 } RepackDecodingState;
 
@@ -139,4 +157,8 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 							 MultiXactId cutoffMulti,
 							 char newrelpersistence);
 
+extern Size RepackShmemSize(void);
+extern void RepackShmemInit(void);
+extern bool is_concurrent_repack_in_progress(Oid relid);
+
 #endif							/* CLUSTER_H */
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 208d2e3a8ed..869d9da7337 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -85,6 +85,7 @@ PG_LWLOCK(50, DSMRegistry)
 PG_LWLOCK(51, InjectionPoint)
 PG_LWLOCK(52, SerialControl)
 PG_LWLOCK(53, AioWorkerSubmissionQueue)
+PG_LWLOCK(54, RepackedRels)
 
 /*
  * There also exist several built-in LWLock tranches.  As with the predefined
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index 9b871caef62..ae9dee394dc 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -50,6 +50,8 @@ extern void CacheInvalidateCatalog(Oid catalogId);
 
 extern void CacheInvalidateRelcache(Relation relation);
 
+extern void CacheInvalidateRelcacheImmediate(Oid relid);
+
 extern void CacheInvalidateRelcacheAll(void);
 
 extern void CacheInvalidateRelcacheByTuple(HeapTuple classTuple);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index b552359915f..66de3bc0c29 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -253,6 +253,9 @@ typedef struct RelationData
 	bool		pgstat_enabled; /* should relation stats be counted */
 	/* use "struct" here to avoid needing to include pgstat.h: */
 	struct PgStat_TableStatus *pgstat_info; /* statistics collection area */
+
+	/* Is REPACK CONCURRENTLY being performed on this relation? */
+	bool		rd_repack_concurrent;
 } RelationData;
 
 
@@ -695,7 +698,9 @@ RelationCloseSmgr(Relation relation)
 #define RelationIsAccessibleInLogicalDecoding(relation) \
 	(XLogLogicalInfoActive() && \
 	 RelationNeedsWAL(relation) && \
-	 (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))
+	 (IsCatalogRelation(relation) || \
+	  RelationIsUsedAsCatalogTable(relation) || \
+	  (relation)->rd_repack_concurrent))
 
 /*
  * RelationIsLogicallyLogged
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 0e546ec1497..014f27db7d7 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -13,6 +13,7 @@
 #ifndef SNAPSHOT_H
 #define SNAPSHOT_H
 
+#include "access/xlogdefs.h"
 #include "lib/pairingheap.h"
 
 
@@ -201,6 +202,8 @@ typedef struct SnapshotData
 	uint32		regd_count;		/* refcount on RegisteredSnapshots */
 	pairingheap_node ph_node;	/* link in the RegisteredSnapshots heap */
 
+	XLogRecPtr	lsn;			/* position in the WAL stream when taken */
+
 	/*
 	 * The transaction completion count at the time GetSnapshotData() built
 	 * this snapshot. Allows to avoid re-computing static snapshots when no
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
index 75850334986..3711a7c92b9 100644
--- a/src/test/modules/injection_points/specs/repack.spec
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -86,9 +86,6 @@ step change_new
 # When applying concurrent data changes, we should see the effects of an
 # in-progress subtransaction.
 #
-# XXX Not sure this test is useful now - it was designed for the patch that
-# preserves tuple visibility and which therefore modifies
-# TransactionIdIsCurrentTransactionId().
 step change_subxact1
 {
 	BEGIN;
@@ -103,7 +100,6 @@ step change_subxact1
 # When applying concurrent data changes, we should not see the effects of a
 # rolled back subtransaction.
 #
-# XXX Is this test useful? See above.
 step change_subxact2
 {
 	BEGIN;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 480ce894978..e42bb5209c0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2540,6 +2540,7 @@ ReorderBufferTupleCidKey
 ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
+RepackedRel
 RepackCommand
 RepackDecodingState
 RepackStmt
-- 
2.43.0

#10Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#9)

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Also, I think I found an issue (or lost something during rebase): we
must preserve xmin,cmin during the initial copy
to make sure that data is going to be visible to the snapshots of
concurrent changes later:

static void
reform_and_rewrite_tuple(......)
.....
    /* It is also crucial to stamp the new record with the exact same
     * xid and cid, because the tuple must be visible to the snapshot
     * of the applied concurrent change later.
     */
    CommandId cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
    TransactionId xid = HeapTupleHeaderGetXmin(tuple->t_data);

    heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);

When posting version 12 of the patch [1] I raised a concern that the MVCC
safety is too expensive when it comes to logical decoding. Therefore, I
abandoned the concept for now, and v13 [2] uses plain heap_insert(). Once we
implement the MVCC safety, we simply rewrite the tuple like v12 did - that's
the simplest way to preserve fields like xmin, cmin, ...

[1]: /messages/by-id/178741.1743514291@localhost
[2]: /messages/by-id/97795.1744363522@localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#11Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#10)

Hello, Antonin!

When posting version 12 of the patch [1] I raised a concern that the MVCC
safety is too expensive when it comes to logical decoding. Therefore, I
abandoned the concept for now, and v13 [2] uses plain heap_insert(). Once we
implement the MVCC safety, we simply rewrite the tuple like v12 did - that's
the simplest way to preserve fields like xmin, cmin, ...

Thanks for the explanation.

I was looking into catalog-related logical decoding features, and it
seems like they are clearly overkill for the repack case.
We don't need CID tracking or even a snapshot for each commit if we’re
okay with passing xmin/xmax as arguments.

What do you think about the following approach for replaying:
* use the extracted XID as the value for xmin/xmax.
* use SnapshotSelf to find the tuple for update/delete operations.

SnapshotSelf seems like a good fit here:
* it sees the last "existing" version.
* any XID set as xmin/xmax in the repacked version is already
committed - so each update/insert is effectively "committed" once
written.
* it works with multiple updates of the same tuple within a single
transaction - SnapshotSelf sees the last version.
* all updates are ordered and replayed sequentially - so the last
version is always the one we want.

If I'm not missing anything, this looks like something worth including
in the patch set.
If so, I can try implementing a test version.

Best regards,
Mikhail

#12Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#11)

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

I was looking into catalog-related logical decoding features, and it
seems like they are clearly overkill for the repack case.
We don't need CID tracking or even a snapshot for each commit if we’re
okay with passing xmin/xmax as arguments.

I assume you are concerned with part 0005 of the v12 patch
("Preserve visibility information of the concurrent data changes."), aren't
you?

What do you think about the following approach for replaying:
* use the extracted XID as the value for xmin/xmax.
* use SnapshotSelf to find the tuple for update/delete operations.

SnapshotSelf seems like a good fit here:
* it sees the last "existing" version.
* any XID set as xmin/xmax in the repacked version is already
committed - so each update/insert is effectively "committed" once
written.
* it works with multiple updates of the same tuple within a single
transaction - SnapshotSelf sees the last version.
* all updates are ordered and replayed sequentially - so the last
version is always the one we want.

If I'm not missing anything, this looks like something worth including
in the patch set.
If so, I can try implementing a test version.

Not sure I understand in all details, but I don't think SnapshotSelf is the
correct snapshot. Note that HeapTupleSatisfiesSelf() does not use its
'snapshot' argument at all. Instead, it considers the set of running
transactions as it is at the time the function is called.

One particular problem I imagine is replaying an UPDATE to a row that some
later transaction will eventually delete, but the transaction that ran the
UPDATE obviously had to see it. When looking for the old version during the
replay, HeapTupleSatisfiesMVCC() will find the old version as long as we pass
the correct snapshot to it.

However, at the time we're replaying the UPDATE in the new table, the tuple
may have been already deleted from the old table, and the deleting transaction
may already have committed. In such a case, HeapTupleSatisfiesSelf() will
conclude that the old version is invisible and we'll fail to replay the UPDATE.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#13Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#12)

Hi, Antonin!

I assume you are concerned with part 0005 of the v12 patch
("Preserve visibility information of the concurrent data changes."), aren't
you?

Yes, of course. I got an idea while trying to find a way to optimize it.

Not sure I understand in all details, but I don't think SnapshotSelf is the
correct snapshot. Note that HeapTupleSatisfiesSelf() does not use its
'snapshot' argument at all. Instead, it considers the set of running
transactions as it is at the time the function is called.

Yes, and it is almost the same as the behavior when a typical MVCC snapshot
encounters a tuple created by its own transaction.

So, how it works in the non MVCC-safe case (current patch behaviour):

1) we have a whole initial table snapshot with all the xmin = repack XID
2) the applying transaction sees ALL the self-alive (no xmax) tuples in it
because all tuples were created/deleted by the transaction itself
3) each update/delete during the replay selects the last existing
tuple version, updates its xmax and inserts a new one
4) so, there is no real MVCC involved - just find the latest
version and create a new version
5) and it works correctly because all ordering issues were resolved by
locking mechanisms on the original table or by the reorder buffer

How it maps to MVCC-safe case (SnapshotSelf):

1) we have a whole initial table snapshot with all xmin copied from
the original table. All such xmin are committed.
2) the applying transaction sees ALL the self-alive (no xmax) tuples in it
because their xmin/xmax are committed and SnapshotSelf is happy with that
3) each update/delete during the replay selects the last existing
tuple version, sets its xmax to the original xid and inserts a new one
with xmin = the original xid
4) --//--
5) --//--

However, at the time we're replaying the UPDATE in the new table, the tuple
may have been already deleted from the old table, and the deleting transaction
may already have committed. In such a case, HeapTupleSatisfiesSelf() will
conclude that the old version is invisible and we'll fail to replay the UPDATE.

No, it will see it - because its xmax will be empty in the repacked
version of the table.

From your question I feel you understood the concept - but feel free
to ask for an additional explanation/scheme.

Best regards,
Mikhail.

#14Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#13)

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Not sure I understand in all details, but I don't think SnapshotSelf is the
correct snapshot. Note that HeapTupleSatisfiesSelf() does not use its
'snapshot' argument at all. Instead, it considers the set of running
transactions as it is at the time the function is called.

Yes, and it is almost the same behavior when a typical MVCC snapshot
encounters a tuple created by its own transaction.

So, how it works in the non MVCC-safe case (current patch behaviour):

1) we have a whole initial table snapshot with all the xmin = repack XID
2) the applying transaction sees ALL the self-alive (no xmax) tuples in it
because all tuples were created/deleted by the transaction itself
3) each update/delete during the replay selects the last existing
tuple version, updates its xmax and inserts a new one
4) so, there is no real MVCC involved - just find the latest
version and create a new version
5) and it works correctly because all ordering issues were resolved by
locking mechanisms on the original table or by the reorder buffer

ok

How it maps to MVCC-safe case (SnapshotSelf):

1) we have a whole initial table snapshot with all xmin copied from
the original table. All such xmin are committed.
2) the applying transaction sees ALL the self-alive (no xmax) tuples in it
because their xmin/xmax are committed and SnapshotSelf is happy with that

How does HeapTupleSatisfiesSelf() recognize the status of any XID w/o using a
snapshot? Do you mean by checking the commit log (TransactionIdDidCommit)?

3) each update/delete during the replay selects the last existing
tuple version, sets its xmax to the original xid and inserts a new one
with xmin = the original xid
4) --//--
5) --//--

However, at the time we're replaying the UPDATE in the new table, the tuple
may have been already deleted from the old table, and the deleting transaction
may already have committed. In such a case, HeapTupleSatisfiesSelf() will
conclude that the old version is invisible and we'll fail to replay the UPDATE.

No, it will see it - because its xmax will be empty in the repacked
version of the table.

You're right, it'll be empty in the new table.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#15Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#14)

Hi, Antonin

How does HeapTupleSatisfiesSelf() recognize the status of any XID w/o using a
snapshot? Do you mean by checking the commit log (TransactionIdDidCommit) ?

Yes, TransactionIdDidCommit. Another option is just invent a new
snapshot type - SnapshotBelieveEverythingCommitted - for that
particular case it should work - because all xmin/xmax written into
the new table are committed by design.

#16Robert Treat
rob@xzilla.net
In reply to: Mihail Nikalayeu (#13)

On Mon, Aug 25, 2025 at 10:15 AM Mihail Nikalayeu
<mihailnikalayeu@gmail.com> wrote:

1) we have a whole initial table snapshot with all xmin copied from
the original table. All such xmin are committed.
2) the applying transaction sees ALL the self-alive (no xmax) tuples in it
because their xmin/xmax are committed and SnapshotSelf is happy with that
3) each update/delete during the replay selects the last existing
tuple version, updates its xmax=original xid and inserts a new one
keeping xmin=original xid
4) --//--
5) --//--

Advancing the table's min xid to at least the repack XID is a pretty big
feature, but the above scenario sounds like it would result in any
non-modified pre-existing tuples ending up with their original xmin
rather than repack XID, which seems like it could lead to weird
side-effects. Maybe I am mis-thinking it though?

Robert Treat
https://xzilla.net

#17Antonin Houska
ah@cybertec.at
In reply to: Robert Treat (#16)

Robert Treat <rob@xzilla.net> wrote:

On Mon, Aug 25, 2025 at 10:15 AM Mihail Nikalayeu
<mihailnikalayeu@gmail.com> wrote:

1) we have a whole initial table snapshot with all xmin copied from
the original table. All such xmin are committed.
2) the applying transaction sees ALL the self-alive (no xmax) tuples in it
because their xmin/xmax are committed and SnapshotSelf is happy with that
3) each update/delete during the replay selects the last existing
tuple version, updates its xmax=original xid and inserts a new one
keeping xmin=original xid
4) --//--
5) --//--

Advancing the table's min xid to at least the repack XID is a pretty big
feature, but the above scenario sounds like it would result in any
non-modified pre-existing tuples ending up with their original xmin
rather than repack XID, which seems like it could lead to weird
side-effects. Maybe I am mis-thinking it though?

What we discuss here is how to keep visibility information of tuples (xmin,
xmax, ...) unchanged. Both CLUSTER and VACUUM FULL already do that. However
it's not trivial to ensure that REPACK with the CONCURRENTLY option does as
well.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#18Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#15)

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Hi, Antonin

How does HeapTupleSatisfiesSelf() recognize the status of any XID w/o using a
snapshot? Do you mean by checking the commit log (TransactionIdDidCommit) ?

Yes, TransactionIdDidCommit.

I think the problem is that HeapTupleSatisfiesSelf() uses
TransactionIdIsInProgress() instead of checking the snapshot:

...
else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
...

When decoding (and replaying) data changes, you deal with the database state
as it was (far) in the past. However TransactionIdIsInProgress() is not
suitable for this purpose.

And since CommitTransaction() updates the commit log before removing the
transaction from ProcArray, I can even imagine race conditions: if a
transaction is committed and decoded fast enough, TransactionIdIsInProgress()
might still return true. In such a case, HeapTupleSatisfiesSelf() returns
false instead of calling TransactionIdDidCommit().
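To make the window concrete, here is a toy model of that ordering; XidState and xmin_visible are stand-ins invented for this sketch, not server code. The clog bit is set first, the ProcArray entry goes away second, and a visibility routine that consults the in-progress test first reports "invisible" while both are true:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the commit window: CommitTransaction() first records the
 * commit in the clog, then removes the XID from the ProcArray, so both
 * flags can briefly be true at once. */
typedef struct
{
    bool in_clog;       /* TransactionIdDidCommit() would say true */
    bool in_procarray;  /* TransactionIdIsInProgress() would say true */
} XidState;

/* Mirrors the check order quoted from HeapTupleSatisfiesSelf(): the
 * in-progress test is consulted first and wins. */
static bool xmin_visible(XidState s)
{
    if (s.in_procarray)
        return false;   /* "in progress" => tuple not visible yet */
    return s.in_clog;
}
```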

Another option is just invent a new
snapshot type - SnapshotBelieveEverythingCommitted - for that
particular case it should work - because all xmin/xmax written into
the new table are committed by design.

I'd prefer optimization of the logical decoding for REPACK CONCURRENTLY, and
using the MVCC snapshots.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#19Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#18)

Antonin Houska <ah@cybertec.at>:

I think the problem is that HeapTupleSatisfiesSelf() uses
TransactionIdIsInProgress() instead of checking the snapshot:

Yes, some issues might be possible for SnapshotSelf.
Possible solution is to override TransactionIdIsCurrentTransactionId
to true (like you did with nParallelCurrentXids but just return true).
IIUC, in that case all checks are going to behave the same way as in v5 version.

I'd prefer optimization of the logical decoding for REPACK CONCURRENTLY, and
using the MVCC snapshots.

It is also possible, but it is much more complex and feels like overkill to me.
We just need a way to find the latest version of a row in a world of
all-committed transactions without any concurrent writers - I am
pretty sure it is possible to achieve this in a simpler and more
effective way.

#20Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#19)

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Antonin Houska <ah@cybertec.at>:

I think the problem is that HeapTupleSatisfiesSelf() uses
TransactionIdIsInProgress() instead of checking the snapshot:

Yes, some issues might be possible for SnapshotSelf.
Possible solution is to override TransactionIdIsCurrentTransactionId
to true (like you did with nParallelCurrentXids but just return true).
IIUC, in that case all checks are going to behave the same way as in v5 version.

I assume you mean v12-0005. Yes, that modifies
TransactionIdIsCurrentTransactionId(), so that the transaction being
replayed recognizes whether it (or its subtransaction) performed a
particular change itself.

Although it could work, I think it'd be confusing to consider the transactions
being replayed as "current" from the point of view of the backend that
executes REPACK CONCURRENTLY.

But the primary issue is that in v12-0005,
TransactionIdIsCurrentTransactionId() gets the information on "current
transactions" from snapshots - see the calls of SetRepackCurrentXids() before
each scan. It's probably not what you want.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#21Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#20)

Antonin Houska <ah@cybertec.at>:

Although it could work, I think it'd be confusing to consider the transactions
being replayed as "current" from the point of view of the backend that
executes REPACK CONCURRENTLY.

Just realized SnapshotDirty is the thing that fits into the role - it
respects not-yet committed transactions, giving enough information to
wait for them.
It is already used in a similar pattern in
check_exclusion_or_unique_constraint and RelationFindReplTupleByIndex.

So, it is easy to detect the case of the race you described previously
and retry + there is no sense to hack around
TransactionIdIsCurrentTransactionId.

BTW, btree + SnapshotDirty has an issue [0], but it is a different story
and happens only with concurrent updates, which are not present in the
current scope.

[0]: /messages/by-id/CADzfLwXGhH_qD6RGqPyEeKdmHgr-HpA-tASYdi5onP+RyP5TCw@mail.gmail.com

#22Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#21)

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Antonin Houska <ah@cybertec.at>:

Although it could work, I think it'd be confusing to consider the transactions
being replayed as "current" from the point of view of the backend that
executes REPACK CONCURRENTLY.

Just realized SnapshotDirty is the thing that fits into the role - it
respects not-yet committed transactions, giving enough information to
wait for them.
It is already used in a similar pattern in
check_exclusion_or_unique_constraint and RelationFindReplTupleByIndex.

So, it is easy to detect the case of the race you described previously
and retry + there is no sense to hack around
TransactionIdIsCurrentTransactionId.

Where exactly should HeapTupleSatisfiesDirty() conclude that the tuple is
visible? TransactionIdIsCurrentTransactionId() will not do w/o the
modifications that you proposed earlier [1] and TransactionIdIsInProgress() is
not suitable as I explained in [2].

I understand your idea that there are no transaction aborts in the new table,
which makes things simpler. I cannot judge if it's worth inventing a new kind
of snapshot. Anyway, I think you'd then also need to hack
HeapTupleSatisfiesUpdate(). Isn't that too invasive?

[1]: /messages/by-id/CADzfLwUqyOmpkLmciecBy4aBN1sohQVZ2Hgc6m-tjSUqDRHwyQ@mail.gmail.com
[2]: /messages/by-id/24483.1756142534@localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#23Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#22)

Hello, Antonin!

Antonin Houska <ah@cybertec.at>:

Where exactly should HeapTupleSatisfiesDirty() conclude that the tuple is
visible? TransactionIdIsCurrentTransactionId() will not do w/o the
modifications that you proposed earlier [1] and TransactionIdIsInProgress() is
not suitable as I explained in [2].

HeapTupleSatisfiesDirty is designed to respect even not-yet-committed
transactions and provides additional related information.

else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
{
    /*
     * Return the speculative token to caller.  Caller can worry about
     * xmax, since it requires a conclusively locked row version, and
     * a concurrent update to this tuple is a conflict of its
     * purposes.
     */
    if (HeapTupleHeaderIsSpeculative(tuple))
    {
        snapshot->speculativeToken =
            HeapTupleHeaderGetSpeculativeToken(tuple);

        Assert(snapshot->speculativeToken != 0);
    }

    snapshot->xmin = HeapTupleHeaderGetRawXmin(tuple);
    /* XXX shouldn't we fall through to look at xmax? */
    return true;            /* in insertion by other */
}

So, it returns true when TransactionIdIsInProgress is true.
However, that alone is not sufficient to trust the result in the common case.

You may check check_exclusion_or_unique_constraint or
RelationFindReplTupleByIndex for that pattern:
if xmin is set in the snapshot (a special hack in SnapshotDirty to
provide additional information from the check), we wait for the
ongoing transaction (or one that is actually committed but not yet
properly reflected in the proc array), and then retry the entire tuple
search.

So, the race condition you explained in [2] will be resolved by a
retry, and the changes to TransactionIdIsInProgress described in [1]
are not necessary.
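A hedged sketch of that wait-and-retry shape follows; World, fetch and wait_for are invented stand-ins for the index scan, SnapshotDirty's xmin report, and XactLockTableWait respectively, so nothing here is the real API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidXid ((TransactionId) 0)

/* Invented table state: either the interesting tuple's inserter is still
 * "in progress" (pending), or the tuple is plainly visible. */
typedef struct
{
    TransactionId pending;  /* XID a dirty scan would report via snap->xmin */
    bool visible;           /* tuple visible once nothing is pending */
} World;

/* One scan under a dirty snapshot: report the in-progress inserter, if
 * any, the way SnapshotDirty reports it through snapshot->xmin. */
static bool fetch(const World *w, TransactionId *snap_xmin)
{
    *snap_xmin = w->pending;
    return w->pending == InvalidXid && w->visible;
}

/* Models XactLockTableWait(): when it returns, the inserter has finished. */
static void wait_for(World *w, TransactionId xid)
{
    assert(w->pending == xid);
    w->pending = InvalidXid;
    w->visible = true;
}

/* The retry loop in the style of check_exclusion_or_unique_constraint():
 * wait out any reported in-progress XID, then rescan from scratch. */
static bool find_with_retry(World *w)
{
    for (;;)
    {
        TransactionId snap_xmin;
        bool found = fetch(w, &snap_xmin);

        if (snap_xmin != InvalidXid)
        {
            wait_for(w, snap_xmin);
            continue;       /* retry the whole tuple search */
        }
        return found;
    }
}
```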

I understand your idea that there are no transaction aborts in the new table,
which makes things simpler. I cannot judge if it's worth inventing a new kind
of snapshot. Anyway, I think you'd then also need to hack
HeapTupleSatisfiesUpdate(). Isn't that too invasive?

It seems that HeapTupleSatisfiesUpdate is also fine as it currently
exists (we'll see the committed version after the retry).

The solution appears to be non-invasive:
* uses the existing snapshot type
* follows the existing usage pattern
* leaves TransactionIdIsInProgress and HeapTupleSatisfiesUpdate unchanged

The main change is that xmin/xmax values are forced from the arguments
- but that seems unavoidable in any case.

I'll try to make some kind of prototype this weekend + cover race
condition you mentioned in specs.
Maybe some corner cases will appear.

By the way, there's one more optimization we could apply in both
MVCC-safe and non-MVCC-safe cases: setting the HEAP_XMIN_COMMITTED /
HEAP_XMAX_COMMITTED bit in the new table:
* in the MVCC-safe approach, the transaction is already committed.
* in the non-MVCC-safe case, it isn’t committed yet - but no one will
examine that bit before it commits (though this approach does feel
more fragile).

This could help avoid potential storms of full-page writes caused by
SetHintBit after the table switch.
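Sketched below with the real infomask flag values from htup_details.h; the struct and function themselves are illustrative only, not the proposed implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in src/include/access/htup_details.h. */
#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMAX_COMMITTED 0x0400

typedef struct { uint16_t t_infomask; } ToyHeader;

/* Stamp the hint bits while writing the tuple into the new table, so
 * readers never have to set them (and risk hint-bit-induced full-page
 * writes) after the table switch. */
static void stamp_hint_bits(ToyHeader *h, int xmax_set)
{
    h->t_infomask |= HEAP_XMIN_COMMITTED;
    if (xmax_set)
        h->t_infomask |= HEAP_XMAX_COMMITTED;
}
```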

Best regards,
Mikhail.

#24Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#23)

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

Hello, Antonin!

Antonin Houska <ah@cybertec.at>:

Where exactly should HeapTupleSatisfiesDirty() conclude that the tuple is
visible? TransactionIdIsCurrentTransactionId() will not do w/o the
modifications that you proposed earlier [1] and TransactionIdIsInProgress() is
not suitable as I explained in [2].

HeapTupleSatisfiesDirty is designed to respect even not-yet-committed
transactions and provides additional related information.

else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
{
    /*
     * Return the speculative token to caller.  Caller can worry about
     * xmax, since it requires a conclusively locked row version, and
     * a concurrent update to this tuple is a conflict of its
     * purposes.
     */
    if (HeapTupleHeaderIsSpeculative(tuple))
    {
        snapshot->speculativeToken =
            HeapTupleHeaderGetSpeculativeToken(tuple);

        Assert(snapshot->speculativeToken != 0);
    }

    snapshot->xmin = HeapTupleHeaderGetRawXmin(tuple);
    /* XXX shouldn't we fall through to look at xmax? */
    return true;            /* in insertion by other */
}

So, it returns true when TransactionIdIsInProgress is true.
However, that alone is not sufficient to trust the result in the common case.

You may check check_exclusion_or_unique_constraint or
RelationFindReplTupleByIndex for that pattern:
if xmin is set in the snapshot (a special hack in SnapshotDirty to
provide additional information from the check), we wait for the
ongoing transaction (or one that is actually committed but not yet
properly reflected in the proc array), and then retry the entire tuple
search.

So, the race condition you explained in [2] will be resolved by a
retry, and the changes to TransactionIdIsInProgress described in [1]
are not necessary.

I insist that this is a misuse of TransactionIdIsInProgress(). When dealing
with logical decoding, only the WAL should tell whether a particular
transaction is still running. AFAICS this is how reorderbuffer.c works.

A new kind of snapshot seems like (much) cleaner solution at the moment.

I'll try to make some kind of prototype this weekend + cover race
condition you mentioned in specs.
Maybe some corner cases will appear.

No rush. First, the MVCC safety is not likely to be included in v19 [1].

By the way, there's one more optimization we could apply in both
MVCC-safe and non-MVCC-safe cases: setting the HEAP_XMIN_COMMITTED /
HEAP_XMAX_COMMITTED bit in the new table:
* in the MVCC-safe approach, the transaction is already committed.
* in the non-MVCC-safe case, it isn’t committed yet - but no one will
examine that bit before it commits (though this approach does feel
more fragile).

This could help avoid potential storms of full-page writes caused by
SetHintBit after the table switch.

Good idea, thanks.

[1]: /messages/by-id/202504040733.ysuy5gad55md@alvherre.pgsql

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#25Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#24)

Antonin Houska <ah@cybertec.at>:

I insist that this is a misuse of TransactionIdIsInProgress(). When dealing
with logical decoding, only WAL should tell whether particular transaction is
still running. AFAICS this is how reorderbuffer.c works.

Hm... Maybe, but at the same time we already have SnapshotDirty used
in that way and it even deals with the same race....
But I agree - a special kind of snapshot is a more accurate solution.

A new kind of snapshot seems like (much) cleaner solution at the moment.

Do you mean some kind of snapshot which only uses
TransactionIdDidCommit/Abort ignoring
TransactionIdIsCurrentTransactionId/TransactionIdIsInProgress?
Actually it behaves like SnapshotBelieveEverythingCommitted in that
particular case, but TransactionIdDidCommit/Abort may be used as some
kind of assert/error source to be sure everything is going as
designed.
And, yes, for the new snapshot we need to have
HeapTupleSatisfiesUpdate to be modified.

Also, to deal with that particular race we may just use
XactLockTableWait(xid, NULL, NULL, XLTW_None) before starting
transaction replay.

No rush. First, the MVCC safety is not likely to be included in v19 [1].

That worries me - it is not the behaviour someone expects from a
database by default. At least the warning should be much more visible
and obvious.
I think most users will expect the same guarantees as [CREATE|RE]INDEX
CONCURRENTLY provides.

#26Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#25)

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

A new kind of snapshot seems like (much) cleaner solution at the moment.

Do you mean some kind of snapshot which only uses
TransactionIdDidCommit/Abort ignoring
TransactionIdIsCurrentTransactionId/TransactionIdIsInProgress?
Actually it behaves like SnapshotBelieveEverythingCommitted in that
particular case, but TransactionIdDidCommit/Abort may be used as some
kind of assert/error source to be sure everything is going as
designed.

Given that there should be no (sub)transaction aborts in the new table, I
think you only need to check that the tuple has valid xmin and invalid xmax.

I think the XID should be in the commit log at the time the transaction is
being replayed, so it should be legal to use TransactionIdDidCommit/Abort in
Assert() statements. (And as long as REPACK CONCURRENTLY will use
ShareUpdateExclusiveLock, which conflicts with VACUUM, pg_class(relfrozenxid)
for given table should not advance during the processing, and therefore the
replayed XIDs should not be truncated from the commit log while REPACK
CONCURRENTLY is running.)
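Under that assumption the visibility test collapses to almost nothing. A sketch, where Tup and did_commit are toy stand-ins and the TransactionIdDidCommit() lookup is degraded to an Assert-style cross-check as suggested:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidXid ((TransactionId) 0)

typedef struct { TransactionId xmin, xmax; } Tup;

/* Toy stand-in for TransactionIdDidCommit(): in the new table every
 * stamped XID is committed by construction. */
static bool did_commit(TransactionId xid)
{
    return xid != InvalidXid;
}

/* No aborts and no in-progress writers in the new table, so the check
 * reduces to: valid xmin, no xmax.  The clog lookup survives only as a
 * sanity cross-check. */
static bool visible_all_committed(Tup t)
{
    assert(t.xmin == InvalidXid || did_commit(t.xmin));
    return t.xmin != InvalidXid && t.xmax == InvalidXid;
}
```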

And, yes, for the new snapshot we need to have
HeapTupleSatisfiesUpdate to be modified.

Also, to deal with that particular race we may just use
XactLockTableWait(xid, NULL, NULL, XLTW_None) before starting
transaction replay.

Do you mean the race related to TransactionIdIsInProgress()? Not sure I
understand, as you suggested above that you no longer need the function.

No rush. First, the MVCC safety is not likely to be included in v19 [1].

That worries me - it is not the behaviour someone expects from a
database by default. At least the warning should be much more visible
and obvious.
I think most users will expect the same guarantees as [CREATE|RE]INDEX
CONCURRENTLY provides.

It does not really worry me. The pg_squeeze extension is not MVCC-safe and I
remember there were only 1 or 2 related complaints throughout its
existence. (pg_repack isn't MVCC-safe as well, but I don't keep track of its
issues.)

Of course, user documentation should warn about the problem, in a way it does
for other commands (typically ALTER TABLE).

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#27Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Antonin Houska (#26)

Antonin Houska <ah@cybertec.at>:

Do you mean the race related to TransactionIdIsInProgress()? Not sure I
understand, as you suggested above that you no longer need the function.

The "lightweight" approaches I see so far:
* XactLockTableWait before replay + SnapshotSelf(GetLatestSnapshot?)
* SnapshotDirty + retry logic
* SnapshotBelieveEverythingCommitted + modification of
HeapTupleSatisfiesUpdate (because it called by heap_update and looks
into TransactionIdIsInProgress)

It does not really worry me. The pg_squeeze extension is not MVCC-safe and I
remember there were only 1 or 2 related complaints throughout its
existence. (pg_repack isn't MVCC-safe as well, but I don't keep track of its
issues.)

But pg_squeeze and pg_repack are extensions. If we are moving those
mechanics into core I'd expect some improvements over pg_squeeze.
MVCC-safety of REINDEX CONCURRENTLY makes it possible to run it on a
regular basis as some kind of background job. It would be nice to have
something like this for the heap.

I agree the initial approach is too invasive, complex and
performance-heavy to push it forward now.
But, any of "lightweight" feels like a good candidate to be shipped
with the feature itself - relatively easy and non-invasive.

Best regards,
Mikhail.

#28Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Mihail Nikalayeu (#9)

On 2025-Aug-21, Mihail Nikalayeu wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org>:

I proposed to leave this part out initially, which is why it hasn't been
reposted. We can review and discuss after the initial patches are in.

I think it is worth pushing it at least in the same release cycle.

If others are motivated enough to certify it, maybe we can consider it.
But I don't think I'm going to have time to get this part reviewed and
committed in time for 19, so you might need another committer.

Because having an MVCC-safe mode has drawbacks, IMO we should make it
optional.

Do you mean some option for the command? Like REPACK (CONCURRENTLY, SAFE)?

Yes, exactly that.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"La grandeza es una experiencia transitoria. Nunca es consistente.
Depende en gran parte de la imaginación humana creadora de mitos"
(Irulan)

#29Mihail Nikalayeu
mihailnikalayeu@gmail.com
In reply to: Alvaro Herrera (#28)

Hello, Álvaro!

If others are motivated enough to certify it, maybe we can consider it.
But I don't think I'm going to have time to get this part reviewed and
committed in time for 19, so you might need another committer.

I don't think it is realistic to involve another committer - it is
just a well-known curse of all non-committers :)

Because having an MVCC-safe mode has drawbacks, IMO we should make it
optional.

As far as I can see, the proposed "lightweight" solutions don't
introduce any drawbacks - unless something has been overlooked.

Do you mean some option for the command? Like REPACK (CONCURRENTLY, SAFE)?

Yes, exactly that.

To be honest that approach feels a little bit strange for me. I work
in the database-consumer (not database-developer) industry and 90% of
DevOps engineers (or similar roles who deal with database maintenance
now) have no clue what MVCC is - and it is an industry standard nowadays.

From my perspective - it is better to have a less performant, but
MVCC-safe approach by default, with some "more performance, less
safety" flag for those who truly understand the trade-offs.
All kinds of safety and correctness are probably the main reason to use
classic databases instead of storing data in s3/elastic/mongo/etc
these days.

In case of some incident related to that (in a large well-known
company) the typical takeaway for readers of tech blogs will simply be
"some command in Postgres is broken". And maybe also "the database
with a name starting with 'O' is not affected by that flaw".

Yes, some ALTER table-rewriting commands are not MVCC-safe, but those
typically block everything for hours - so they're avoided unless
absolutely necessary, and usually performed during backend outages to
prevent systems from getting stuck waiting on millions of locks.

"CONCURRENTLY" is something that feels like a "working in background,
don't worry, do not stop your 100k RPS" thing, especially in Postgres
because of CIC.

I also have personal experience debugging incidents caused by
incorrect Postgres behavior - and it’s absolute hell.

Sorry, I made a drama here....

I fully understand resource limitations... Maybe Postgres 19 could
introduce something like REPACK (CONCURRENTLY, UNSAFE) first, and
later add a safer REPACK (CONCURRENTLY)?

Best regards,
Mikhail.

#30Antonin Houska
ah@cybertec.at
In reply to: Mihail Nikalayeu (#29)

Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

In case of some incident related to that (in a large well-known
company) the typical takeaway for readers of tech blogs will simply be
"some command in Postgres is broken".

For _responsible_ users, the message will rather be that "some tech bloggers
do not bother to read user documentation".

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#31Michael Paquier
michael@paquier.xyz
In reply to: Mihail Nikalayeu (#25)

On Wed, Aug 27, 2025 at 10:22:24AM +0200, Mihail Nikalayeu wrote:

That worries me - it is not the behaviour someone expects from a
database by default. At least the warning should be much more visible
and obvious.
I think most users will expect the same guarantees as [CREATE|RE]INDEX
CONCURRENTLY provides.

Having a unified path for the handling of the waits and the locking
sounds to me like a pretty good argument in favor of a basic
implementation.

In my experience, users do not really care about the time it takes to
complete an operation involving CONCURRENTLY if we allow concurrent
reads and writes in parallel with it. I have not looked at the proposal
in detail, but before trying a more folkloric MVCC approach, relying
on basics that we know have been working for some time seems like a
good and sufficient initial step in terms of handling the waits and
the locks with table AMs (aka heap or something else).

Just my 2c.
--
Michael