pg_upgrade: transfer pg_largeobject_metadata's files when possible
(new thread)
On Fri, Jul 18, 2025 at 11:05:04AM -0500, Nathan Bossart wrote:
I'm cautiously optimistic that we can find some better gains for upgrades
from v16 and newer. That would involve dumping lo_create() commands for
all LOs with comments/seclabels, dumping the relevant pg_shdepend rows, and
then copying/linking the pg_largeobject_metadata files like we did prior to
v12.
Here is a patch. For background, the reason this is limited to upgrades
from v16 and newer is because the aclitem data type (needed by
pg_largeobject_metadata.lomacl) changed its storage format in v16 (see
commit 7b378237aa). Note that the patch is essentially a revert of commit
12a53c732c, but there are enough differences that it should be considered a
fresh effort.
Something I hadn't anticipated is that we need to take special care to
transfer the relfilenode of pg_largeobject_metadata and its index, as was
done for pg_largeobject in commits d498e052b4 and bbe08b8869. In fact, the
majority of the patch is dedicated to that.
My testing showed some decent, but not earth-shattering performance
improvements from this patch. For upgrades with many large objects with
NULL lomacl/lomowner columns, pg_upgrade was 50% faster. With non-NULL
lomacl/lomowner, that dropped to 25%. When each large object had a
comment, there was no change. I'm assuming that it's rare to have lots of
large objects with comments or security labels, so I don't see any need to
expend energy trying to optimize that use-case.
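For reference, a test corpus of that shape can be generated with plain SQL; the counts, privilege, and comment text below are illustrative only, not the exact benchmark setup described above:

```sql
-- Illustrative corpus for timing pg_upgrade; count is arbitrary.
-- lo_create(0) assigns a fresh OID and leaves lomacl NULL.
SELECT lo_create(0) FROM generate_series(1, 100000);

-- Variant with non-NULL lomacl: an explicit GRANT materializes an ACL.
DO $$
DECLARE lo oid;
BEGIN
    FOR lo IN SELECT oid FROM pg_largeobject_metadata LOOP
        EXECUTE format('GRANT SELECT ON LARGE OBJECT %s TO PUBLIC', lo);
    END LOOP;
END $$;

-- Variant with comments: these LOs must still be dumped via SQL commands.
DO $$
DECLARE lo oid;
BEGIN
    FOR lo IN SELECT oid FROM pg_largeobject_metadata LOOP
        EXECUTE format('COMMENT ON LARGE OBJECT %s IS %L', lo, 'benchmark');
    END LOOP;
END $$;
```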
I am a bit concerned that we'll forget to add checks for new types of
dependencies similar to comments and security labels. If we do, pg_upgrade
should just fail to restore the schema, and fixing the code should be easy
enough. Also, we'll need to remember to revisit this code if there's
another storage format change for one of pg_largeobject_metadata's columns,
but that seems unlikely to happen anytime soon. On the whole, I'm not too
worried about either of these points.
--
nathan
Attachments:
v1-0001-pg_upgrade-Transfer-pg_largeobject_metadata-s-fil.patch (text/plain; charset=utf-8)
From 9845e0e1c6a2bfacc53390d244fdb47f9a276169 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Thu, 14 Aug 2025 10:14:43 -0500
Subject: [PATCH v1 1/1] pg_upgrade: Transfer pg_largeobject_metadata's files
when possible.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit 161a3e8b68 taught pg_upgrade to use COPY for large object
metadata for upgrades from v12 and newer, which is much faster to
restore than the proper large object SQL commands. For upgrades
from v16 and newer, we can take this a step further and transfer
the large object metadata files as if they were user tables. We
can't transfer the files from older versions because the aclitem
data type (needed by pg_largeobject_metadata.lomacl) changed its
storage format in v16 (see commit 7b378237aa). Note that this
commit is essentially a revert of commit 12a53c732c, but there are
enough differences that it should be considered a fresh effort.
There are a couple of caveats. First, we still need to COPY the
corresponding pg_shdepend rows for large objects, since those
aren't transferred by anything else. Second, we need to COPY
anything in pg_largeobject_metadata with a comment or security
label, else restoring those will fail. This means that an upgrade
in which every large object has a comment or security label won't
gain anything from this commit, but it should at least avoid making
these unusual use-cases any worse.
pg_upgrade must also take care to transfer the relfilenode of
pg_largeobject_metadata and its index, à la commits d498e052b4 and
bbe08b8869.
---
src/backend/commands/tablecmds.c | 12 +++--
src/bin/pg_dump/pg_dump.c | 80 ++++++++++++++++++++++++++------
src/bin/pg_upgrade/info.c | 11 +++--
src/bin/pg_upgrade/pg_upgrade.c | 6 +--
4 files changed, 86 insertions(+), 23 deletions(-)
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c6dd2e020da..4132b570513 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -42,6 +42,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_largeobject.h"
+#include "catalog/pg_largeobject_metadata.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_policy.h"
@@ -2389,12 +2390,15 @@ truncate_check_rel(Oid relid, Form_pg_class reltuple)
/*
* Most system catalogs can't be truncated at all, or at least not unless
* allow_system_table_mods=on. As an exception, however, we allow
- * pg_largeobject to be truncated as part of pg_upgrade, because we need
- * to change its relfilenode to match the old cluster, and allowing a
- * TRUNCATE command to be executed is the easiest way of doing that.
+ * pg_largeobject and pg_largeobject_metadata to be truncated as part of
+ * pg_upgrade, because we need to change their relfilenodes to match the old
+ * cluster, and allowing a TRUNCATE command to be executed is the easiest
+ * way of doing that.
*/
if (!allowSystemTableMods && IsSystemClass(relid, reltuple)
- && (!IsBinaryUpgrade || relid != LargeObjectRelationId))
+ && (!IsBinaryUpgrade ||
+ (relid != LargeObjectRelationId &&
+ relid != LargeObjectMetadataRelationId)))
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied: \"%s\" is a system catalog",
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index fc7a6639163..48066fab744 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1131,6 +1131,23 @@ main(int argc, char **argv)
shdepend->dataObj->filtercond = "WHERE classid = 'pg_largeobject'::regclass "
"AND dbid = (SELECT oid FROM pg_database "
" WHERE datname = current_database())";
+
+ /*
+ * If upgrading from v16 or newer, only dump large objects with
+ * comments/seclabels. For these upgrades, pg_upgrade can copy/link
+ * pg_largeobject_metadata's files (which is usually faster) but we
+ * still need to dump LOs with comments/seclabels here so that the
+ * subsequent COMMENT and SECURITY LABEL commands work. pg_upgrade
+ * can't copy/link the files from older versions because aclitem
+ * (needed by pg_largeobject_metadata.lomacl) changed its storage
+ * format in v16.
+ */
+ if (fout->remoteVersion >= 160000)
+ lo_metadata->dataObj->filtercond = "WHERE oid IN "
+ "(SELECT objoid FROM pg_description "
+ "WHERE classoid = " CppAsString2(LargeObjectRelationId) " "
+ "UNION SELECT objoid FROM pg_seclabel "
+ "WHERE classoid = " CppAsString2(LargeObjectRelationId) ")";
}
/*
@@ -3629,26 +3646,32 @@ dumpDatabase(Archive *fout)
/*
* pg_largeobject comes from the old system intact, so set its
* relfrozenxids, relminmxids and relfilenode.
+ *
+ * pg_largeobject_metadata also comes from the old system intact for
+ * upgrades from v16 and newer, so set its relfrozenxids, relminmxids, and
+ * relfilenode, too. pg_upgrade can't copy/link the files from older
+ * versions because aclitem (needed by pg_largeobject_metadata.lomacl)
+ * changed its storage format in v16.
*/
if (dopt->binary_upgrade)
{
PGresult *lo_res;
PQExpBuffer loFrozenQry = createPQExpBuffer();
PQExpBuffer loOutQry = createPQExpBuffer();
+ PQExpBuffer lomOutQry = createPQExpBuffer();
PQExpBuffer loHorizonQry = createPQExpBuffer();
+ PQExpBuffer lomHorizonQry = createPQExpBuffer();
int ii_relfrozenxid,
ii_relfilenode,
ii_oid,
ii_relminmxid;
- /*
- * pg_largeobject
- */
if (fout->remoteVersion >= 90300)
appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid, relfilenode, oid\n"
"FROM pg_catalog.pg_class\n"
- "WHERE oid IN (%u, %u);\n",
- LargeObjectRelationId, LargeObjectLOidPNIndexId);
+ "WHERE oid IN (%u, %u, %u, %u);\n",
+ LargeObjectRelationId, LargeObjectLOidPNIndexId,
+ LargeObjectMetadataRelationId, LargeObjectMetadataOidIndexId);
else
appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, 0 AS relminmxid, relfilenode, oid\n"
"FROM pg_catalog.pg_class\n"
@@ -3663,35 +3686,57 @@ dumpDatabase(Archive *fout)
ii_oid = PQfnumber(lo_res, "oid");
appendPQExpBufferStr(loHorizonQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid\n");
+ appendPQExpBufferStr(lomHorizonQry, "\n-- For binary upgrade, set pg_largeobject_metadata relfrozenxid and relminmxid\n");
appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, preserve pg_largeobject and index relfilenodes\n");
+ appendPQExpBufferStr(lomOutQry, "\n-- For binary upgrade, preserve pg_largeobject_metadata and index relfilenodes\n");
for (int i = 0; i < PQntuples(lo_res); ++i)
{
Oid oid;
RelFileNumber relfilenumber;
+ PQExpBuffer horizonQry;
+ PQExpBuffer outQry;
+
+ oid = atooid(PQgetvalue(lo_res, i, ii_oid));
+ relfilenumber = atooid(PQgetvalue(lo_res, i, ii_relfilenode));
- appendPQExpBuffer(loHorizonQry, "UPDATE pg_catalog.pg_class\n"
+ if (oid == LargeObjectRelationId ||
+ oid == LargeObjectLOidPNIndexId)
+ {
+ horizonQry = loHorizonQry;
+ outQry = loOutQry;
+ }
+ else
+ {
+ horizonQry = lomHorizonQry;
+ outQry = lomOutQry;
+ }
+
+ appendPQExpBuffer(horizonQry, "UPDATE pg_catalog.pg_class\n"
"SET relfrozenxid = '%u', relminmxid = '%u'\n"
"WHERE oid = %u;\n",
atooid(PQgetvalue(lo_res, i, ii_relfrozenxid)),
atooid(PQgetvalue(lo_res, i, ii_relminmxid)),
atooid(PQgetvalue(lo_res, i, ii_oid)));
- oid = atooid(PQgetvalue(lo_res, i, ii_oid));
- relfilenumber = atooid(PQgetvalue(lo_res, i, ii_relfilenode));
-
- if (oid == LargeObjectRelationId)
- appendPQExpBuffer(loOutQry,
+ if (oid == LargeObjectRelationId ||
+ oid == LargeObjectMetadataRelationId)
+ appendPQExpBuffer(outQry,
"SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('%u'::pg_catalog.oid);\n",
relfilenumber);
- else if (oid == LargeObjectLOidPNIndexId)
- appendPQExpBuffer(loOutQry,
+ else if (oid == LargeObjectLOidPNIndexId ||
+ oid == LargeObjectMetadataOidIndexId)
+ appendPQExpBuffer(outQry,
"SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('%u'::pg_catalog.oid);\n",
relfilenumber);
}
appendPQExpBufferStr(loOutQry,
"TRUNCATE pg_catalog.pg_largeobject;\n");
+ appendPQExpBufferStr(lomOutQry,
+ "TRUNCATE pg_catalog.pg_largeobject_metadata;\n");
+
appendPQExpBufferStr(loOutQry, loHorizonQry->data);
+ appendPQExpBufferStr(lomOutQry, lomHorizonQry->data);
ArchiveEntry(fout, nilCatalogId, createDumpId(),
ARCHIVE_OPTS(.tag = "pg_largeobject",
@@ -3699,11 +3744,20 @@ dumpDatabase(Archive *fout)
.section = SECTION_PRE_DATA,
.createStmt = loOutQry->data));
+ if (fout->remoteVersion >= 160000)
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = "pg_largeobject_metadata",
+ .description = "pg_largeobject_metadata",
+ .section = SECTION_PRE_DATA,
+ .createStmt = lomOutQry->data));
+
PQclear(lo_res);
destroyPQExpBuffer(loFrozenQry);
destroyPQExpBuffer(loHorizonQry);
+ destroyPQExpBuffer(lomHorizonQry);
destroyPQExpBuffer(loOutQry);
+ destroyPQExpBuffer(lomOutQry);
}
PQclear(res);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index c39eb077c2f..7ce08270168 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -498,7 +498,10 @@ get_rel_infos_query(void)
*
* pg_largeobject contains user data that does not appear in pg_dump
* output, so we have to copy that system table. It's easiest to do that
- * by treating it as a user table.
+ * by treating it as a user table. We can do the same for
+ * pg_largeobject_metadata for upgrades from v16 and newer. pg_upgrade
+ * can't copy/link the files from older versions because aclitem (needed
+ * by pg_largeobject_metadata.lomacl) changed its storage format in v16.
*/
appendPQExpBuffer(&query,
"WITH regular_heap (reloid, indtable, toastheap) AS ( "
@@ -514,10 +517,12 @@ get_rel_infos_query(void)
" 'binary_upgrade', 'pg_toast') AND "
" c.oid >= %u::pg_catalog.oid) OR "
" (n.nspname = 'pg_catalog' AND "
- " relname IN ('pg_largeobject') ))), ",
+ " relname IN ('pg_largeobject'%s) ))), ",
(user_opts.transfer_mode == TRANSFER_MODE_SWAP) ?
", " CppAsString2(RELKIND_SEQUENCE) : "",
- FirstNormalObjectId);
+ FirstNormalObjectId,
+ (GET_MAJOR_VERSION(old_cluster.major_version) >= 1600) ?
+ ", 'pg_largeobject_metadata'" : "");
/*
* Add a CTE that collects OIDs of toast tables belonging to the tables
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index d5cd5bf0b3a..490e98fa26f 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -29,9 +29,9 @@
* We control all assignments of pg_enum.oid because these oids are stored
* in user tables as enum values.
*
- * We control all assignments of pg_authid.oid for historical reasons (the
- * oids used to be stored in pg_largeobject_metadata, which is now copied via
- * SQL commands), that might change at some point in the future.
+ * We control all assignments of pg_authid.oid because the oids are stored in
+ * pg_largeobject_metadata, which is copied via file transfer for upgrades
+ * from v16 and newer.
*
* We control all assignments of pg_database.oid because we want the directory
* names to match between the old and new cluster.
--
2.39.5 (Apple Git-154)
Have you considered re-creating pg_shdepend from
pg_largeobject_metadata directly instead of having special cases for
dumping it?
It would also be useful in cases of old (pg_upgraded since before pg
12) databases which might be missing it anyway.
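For illustration, the reconstruction being suggested here would amount to something like the following. This is a hypothetical sketch, not part of the patch: writing pg_shdepend directly would require allow_system_table_mods (or dedicated backend support), and ACL ('a') rows would additionally have to be derived from aclexplode(lomacl).

```sql
-- Hypothetical: rebuild owner ('o') dependency rows for all large
-- objects from pg_largeobject_metadata instead of COPYing pg_shdepend.
INSERT INTO pg_catalog.pg_shdepend
        (dbid, classid, objid, objsubid, refclassid, refobjid, deptype)
SELECT d.oid,                        -- current database's OID
       'pg_largeobject'::regclass,   -- dependent object class
       m.oid,                        -- the large object itself
       0,
       'pg_authid'::regclass,        -- referenced object class
       m.lomowner,                   -- owning role
       'o'                           -- SHARED_DEPENDENCY_OWNER
FROM pg_catalog.pg_largeobject_metadata m,
     pg_catalog.pg_database d
WHERE d.datname = current_database();
```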
On Tue, Aug 19, 2025 at 09:49:26AM +0200, Hannu Krosing wrote:
Have you considered re-creating pg_shdepend from
pg_largeobject_metadata directly instead of having special cases for
dumping it ?
I considered it when you last brought up the idea [0], but my testing
indicated that it's measurably slower.
It would also be useful in cases of old (pg_upgraded since before pg
12) databases which might be missing it anyway.
We only use COPY for upgrades from v12 and newer, and the patch at hand
only applies to v16 and newer. There should be no need to repair
pg_shdepend for any such upgrades because we haven't copied/linked the
files since before v12.
[0]: /messages/by-id/CAMT0RQTXiqH7zdQEVSVd2L7_Cw4wQ1eHOD8hfZ+0vecMXJWc-w@mail.gmail.com
--
nathan
On Thu, Aug 14, 2025 at 10:22:02AM -0500, Nathan Bossart wrote:
Here is a patch. For background, the reason this is limited to upgrades
from v16 and newer is because the aclitem data type (needed by
pg_largeobject_metadata.lomacl) changed its storage format in v16 (see
commit 7b378237aa). Note that the patch is essentially a revert of commit
12a53c732c, but there are enough differences that it should be considered a
fresh effort.
Noted.
Something I hadn't anticipated is that we need to take special care to
transfer the relfilenode of pg_largeobject_metadata and its index, as was
done for pg_largeobject in commits d498e052b4 and bbe08b8869. In fact, the
majority of the patch is dedicated to that.
My testing showed some decent, but not earth-shattering performance
improvements from this patch. For upgrades with many large objects with
NULL lomacl/lomowner columns, pg_upgrade was 50% faster. With non-NULL
lomacl/lomowner, that dropped to 25%. When each large object had a
comment, there was no change. I'm assuming that it's rare to have lots of
large objects with comments or security labels, so I don't see any need to
expend energy trying to optimize that use-case.
I highly doubt that there are a lot of comments assigned to LOs, so
these numbers are pretty cool IMO. Security labels are a pain to test
in the upgrade path, though dummy_seclabel could be extended with a new
TAP test and a pg_upgrade command. There is some coverage with
comments on LOs in src/bin/pg_dump's 002, so that would be enough for
the comment part, at least.
I am a bit concerned that we'll forget to add checks for new types of
dependencies similar to comments and security labels. If we do, pg_upgrade
should just fail to restore the schema, and fixing the code should be easy
enough. Also, we'll need to remember to revisit this code if there's
another storage format change for one of pg_largeobject_metadata's columns,
but that seems unlikely to happen anytime soon. On the whole, I'm not too
worried about either of these points.
This part does not worry me much, TBH. This stuff would require
dump/restore support and the pg_dump test suite would catch that for
commands with --binary-upgrade. So it should be hard to miss.
- /*
- * pg_largeobject
- */
if (fout->remoteVersion >= 90300)
appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid, relfilenode, oid\n"
"FROM pg_catalog.pg_class\n"
- "WHERE oid IN (%u, %u);\n",
- LargeObjectRelationId, LargeObjectLOidPNIndexId);
+ "WHERE oid IN (%u, %u, %u, %u);\n",
+ LargeObjectRelationId, LargeObjectLOidPNIndexId,
+ LargeObjectMetadataRelationId, LargeObjectMetadataOidIndexId);
[...]
appendPQExpBufferStr(loHorizonQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid\n");
+ appendPQExpBufferStr(lomHorizonQry, "\n-- For binary upgrade, set pg_largeobject_metadata relfrozenxid and relminmxid\n");
appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, preserve pg_largeobject and index relfilenodes\n");
+ appendPQExpBufferStr(lomOutQry, "\n-- For binary upgrade, preserve pg_largeobject_metadata and index relfilenodes\n");
Is all that really required when upgrading from a cluster in the
9.3~15 range?
--
Michael
On Mon, Sep 01, 2025 at 03:19:46PM +0900, Michael Paquier wrote:
I highly doubt that there are a lot of comments assigned to LOs, so
these numbers are pretty cool IMO. Security labels are a pain to test
in the upgrade path, though dummy_seclabel could be extended with a new
TAP test and a pg_upgrade command. There is some coverage with
comments on LOs in src/bin/pg_dump's 002, so that would be enough for
the comment part, at least.
Do you think a new pg_upgrade test for security labels is worth the
trouble? It seems doable, but it'd be an awfully expensive test for this.
On the other hand, I'm not sure there's any coverage for pg_upgrade with
security labels, so perhaps this is a good time to establish some tests.
- /*
- * pg_largeobject
- */
 if (fout->remoteVersion >= 90300)
 appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid, relfilenode, oid\n"
 "FROM pg_catalog.pg_class\n"
- "WHERE oid IN (%u, %u);\n",
- LargeObjectRelationId, LargeObjectLOidPNIndexId);
+ "WHERE oid IN (%u, %u, %u, %u);\n",
+ LargeObjectRelationId, LargeObjectLOidPNIndexId,
+ LargeObjectMetadataRelationId, LargeObjectMetadataOidIndexId);
[...]
 appendPQExpBufferStr(loHorizonQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid\n");
+ appendPQExpBufferStr(lomHorizonQry, "\n-- For binary upgrade, set pg_largeobject_metadata relfrozenxid and relminmxid\n");
 appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, preserve pg_largeobject and index relfilenodes\n");
+ appendPQExpBufferStr(lomOutQry, "\n-- For binary upgrade, preserve pg_largeobject_metadata and index relfilenodes\n");
Is all that really required when upgrading from a cluster in the
9.3~15 range?
No, that stuff is discarded for upgrades from those versions. My intent
was to maintain readability by avoiding lots of version checks. FWIW I
originally put all of the pg_large_object_metadata stuff in a separate
block, but that resulted in a lot of copy/pasted code. I'm happy to adjust
it as you see fit.
--
nathan
On Tue, Sep 02, 2025 at 09:43:40AM -0500, Nathan Bossart wrote:
Do you think a new pg_upgrade test for security labels is worth the
trouble? It seems doable, but it'd be an awfully expensive test for this.
On the other hand, I'm not sure there's any coverage for pg_upgrade with
security labels, so perhaps this is a good time to establish some tests.
I would argue in favor of these additions. Security labels are not
the most popular thing ever, AFAIK, but your patch makes the need more
relevant to have. The cheapest approach would be to add a LO creation
pattern in src/bin/pg_dump/t/002_pg_dump.pl, with an EXTRA_INSTALL
pointing at src/test/modules/dummy_seclabel/ to be able to create the
security label (we already do that in pg_upgrade and pg_basebackup so
the trick works). That should be enough to make sure that the binary
upgrade dumps have the seclabel data included. It's a bit funky, I
agree. So if you think that this is not worth the test cycles, I
won't push hard on this point, either.
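As a sketch of what such a test could exercise once the dummy_seclabel module is available in the installation (the label value and the DO-block wrapper are illustrative; the provider's accepted labels are defined by the module itself):

```sql
-- Load the test label provider from src/test/modules/dummy_seclabel,
-- then attach a label to a freshly created large object so that the
-- binary-upgrade dump must carry a SECURITY LABEL command for it.
LOAD 'dummy_seclabel';

DO $$
DECLARE lo oid := lo_create(0);
BEGIN
    EXECUTE format('SECURITY LABEL FOR dummy ON LARGE OBJECT %s IS %L',
                   lo, 'classified');
END $$;
```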
No, that stuff is discarded for upgrades from those versions. My intent
was to maintain readability by avoiding lots of version checks. FWIW I
originally put all of the pg_large_object_metadata stuff in a separate
block, but that resulted in a lot of copy/pasted code. I'm happy to adjust
it as you see fit.
+ if (fout->remoteVersion >= 160000)
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = "pg_largeobject_metadata",
+ .description = "pg_largeobject_metadata",
+ .section = SECTION_PRE_DATA,
+ .createStmt = lomOutQry->data));
So it's filtered out depending on the old cluster version here. I
have managed to miss this part. Some tests with LOs and across
multiple backend versions (pg_dump -C --binary-upgrade) are showing me
that I am wrong and that you are right, with diffs showing up properly
in the binary upgrade dumps for pg_largeobject_metadata. This logic
looks OK to me, even if it's a waste to fill lomOutQry when upgrading
from an instance in the 9.3~15 range, discarding ArchiveEntry() at the
end. It's not a big deal.
--
Michael
On Thu, Sep 04, 2025 at 01:59:36PM +0900, Michael Paquier wrote:
On Tue, Sep 02, 2025 at 09:43:40AM -0500, Nathan Bossart wrote:
Do you think a new pg_upgrade test for security labels is worth the
trouble? It seems doable, but it'd be an awfully expensive test for this.
On the other hand, I'm not sure there's any coverage for pg_upgrade with
security labels, so perhaps this is a good time to establish some tests.
I would argue in favor of these additions. Security labels are not
the most popular thing ever, AFAIK, but your patch makes the need more
relevant to have. The cheapest approach would be to add a LO creation
pattern in src/bin/pg_dump/t/002_pg_dump.pl, with an EXTRA_INSTALL
pointing at src/test/modules/dummy_seclabel/ to be able to create the
security label (we already do that in pg_upgrade and pg_basebackup so
the trick works). That should be enough to make sure that the binary
upgrade dumps have the seclabel data included. It's a bit funky, I
agree. So if you think that this is not worth the test cycles, I
won't push hard on this point, either.
Ah, I'd forgotten about EXTRA_INSTALL. That simplifies things. There's
enough special handling for large objects in pg_upgrade that I think we
ought to test it end-to-end, so I sneaked it into 006_transfer_modes.pl.
WDYT?
--
nathan
Attachments:
v2-0001-pg_upgrade-Transfer-pg_largeobject_metadata-s-fil.patch (text/plain; charset=utf-8)
From e2cce34c7047d28f17b457d05d2848af2ad9ffa8 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Thu, 14 Aug 2025 10:14:43 -0500
Subject: [PATCH v2 1/1] pg_upgrade: Transfer pg_largeobject_metadata's files
when possible.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit 161a3e8b68 taught pg_upgrade to use COPY for large object
metadata for upgrades from v12 and newer, which is much faster to
restore than the proper large object SQL commands. For upgrades
from v16 and newer, we can take this a step further and transfer
the large object metadata files as if they were user tables. We
can't transfer the files from older versions because the aclitem
data type (needed by pg_largeobject_metadata.lomacl) changed its
storage format in v16 (see commit 7b378237aa). Note that this
commit is essentially a revert of commit 12a53c732c, but there are
enough differences that it should be considered a fresh effort.
There are a couple of caveats. First, we still need to COPY the
corresponding pg_shdepend rows for large objects, since those
aren't transferred by anything else. Second, we need to COPY
anything in pg_largeobject_metadata with a comment or security
label, else restoring those will fail. This means that an upgrade
in which every large object has a comment or security label won't
gain anything from this commit, but it should at least avoid making
these unusual use-cases any worse.
pg_upgrade must also take care to transfer the relfilenode of
pg_largeobject_metadata and its index, à la commits d498e052b4 and
bbe08b8869.
---
src/backend/commands/tablecmds.c | 12 ++--
src/bin/pg_dump/pg_dump.c | 80 ++++++++++++++++++----
src/bin/pg_upgrade/Makefile | 3 +-
src/bin/pg_upgrade/info.c | 11 ++-
src/bin/pg_upgrade/pg_upgrade.c | 6 +-
src/bin/pg_upgrade/t/006_transfer_modes.pl | 55 +++++++++++++++
6 files changed, 142 insertions(+), 25 deletions(-)
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 082a3575d62..3be2e051d32 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -42,6 +42,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_largeobject.h"
+#include "catalog/pg_largeobject_metadata.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_policy.h"
@@ -2389,12 +2390,15 @@ truncate_check_rel(Oid relid, Form_pg_class reltuple)
/*
* Most system catalogs can't be truncated at all, or at least not unless
* allow_system_table_mods=on. As an exception, however, we allow
- * pg_largeobject to be truncated as part of pg_upgrade, because we need
- * to change its relfilenode to match the old cluster, and allowing a
- * TRUNCATE command to be executed is the easiest way of doing that.
+ * pg_largeobject and pg_largeobject_metadata to be truncated as part of
+ * pg_upgrade, because we need to change their relfilenodes to match the old
+ * cluster, and allowing a TRUNCATE command to be executed is the easiest
+ * way of doing that.
*/
if (!allowSystemTableMods && IsSystemClass(relid, reltuple)
- && (!IsBinaryUpgrade || relid != LargeObjectRelationId))
+ && (!IsBinaryUpgrade ||
+ (relid != LargeObjectRelationId &&
+ relid != LargeObjectMetadataRelationId)))
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied: \"%s\" is a system catalog",
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index bea793456f9..b4c45ad803e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1131,6 +1131,23 @@ main(int argc, char **argv)
shdepend->dataObj->filtercond = "WHERE classid = 'pg_largeobject'::regclass "
"AND dbid = (SELECT oid FROM pg_database "
" WHERE datname = current_database())";
+
+ /*
+ * If upgrading from v16 or newer, only dump large objects with
+ * comments/seclabels. For these upgrades, pg_upgrade can copy/link
+ * pg_largeobject_metadata's files (which is usually faster) but we
+ * still need to dump LOs with comments/seclabels here so that the
+ * subsequent COMMENT and SECURITY LABEL commands work. pg_upgrade
+ * can't copy/link the files from older versions because aclitem
+ * (needed by pg_largeobject_metadata.lomacl) changed its storage
+ * format in v16.
+ */
+ if (fout->remoteVersion >= 160000)
+ lo_metadata->dataObj->filtercond = "WHERE oid IN "
+ "(SELECT objoid FROM pg_description "
+ "WHERE classoid = " CppAsString2(LargeObjectRelationId) " "
+ "UNION SELECT objoid FROM pg_seclabel "
+ "WHERE classoid = " CppAsString2(LargeObjectRelationId) ")";
}
/*
@@ -3629,26 +3646,32 @@ dumpDatabase(Archive *fout)
/*
* pg_largeobject comes from the old system intact, so set its
* relfrozenxids, relminmxids and relfilenode.
+ *
+ * pg_largeobject_metadata also comes from the old system intact for
+ * upgrades from v16 and newer, so set its relfrozenxids, relminmxids, and
+ * relfilenode, too. pg_upgrade can't copy/link the files from older
+ * versions because aclitem (needed by pg_largeobject_metadata.lomacl)
+ * changed its storage format in v16.
*/
if (dopt->binary_upgrade)
{
PGresult *lo_res;
PQExpBuffer loFrozenQry = createPQExpBuffer();
PQExpBuffer loOutQry = createPQExpBuffer();
+ PQExpBuffer lomOutQry = createPQExpBuffer();
PQExpBuffer loHorizonQry = createPQExpBuffer();
+ PQExpBuffer lomHorizonQry = createPQExpBuffer();
int ii_relfrozenxid,
ii_relfilenode,
ii_oid,
ii_relminmxid;
- /*
- * pg_largeobject
- */
if (fout->remoteVersion >= 90300)
appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid, relfilenode, oid\n"
"FROM pg_catalog.pg_class\n"
- "WHERE oid IN (%u, %u);\n",
- LargeObjectRelationId, LargeObjectLOidPNIndexId);
+ "WHERE oid IN (%u, %u, %u, %u);\n",
+ LargeObjectRelationId, LargeObjectLOidPNIndexId,
+ LargeObjectMetadataRelationId, LargeObjectMetadataOidIndexId);
else
appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, 0 AS relminmxid, relfilenode, oid\n"
"FROM pg_catalog.pg_class\n"
@@ -3663,35 +3686,57 @@ dumpDatabase(Archive *fout)
ii_oid = PQfnumber(lo_res, "oid");
appendPQExpBufferStr(loHorizonQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid\n");
+ appendPQExpBufferStr(lomHorizonQry, "\n-- For binary upgrade, set pg_largeobject_metadata relfrozenxid and relminmxid\n");
appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, preserve pg_largeobject and index relfilenodes\n");
+ appendPQExpBufferStr(lomOutQry, "\n-- For binary upgrade, preserve pg_largeobject_metadata and index relfilenodes\n");
for (int i = 0; i < PQntuples(lo_res); ++i)
{
Oid oid;
RelFileNumber relfilenumber;
+ PQExpBuffer horizonQry;
+ PQExpBuffer outQry;
+
+ oid = atooid(PQgetvalue(lo_res, i, ii_oid));
+ relfilenumber = atooid(PQgetvalue(lo_res, i, ii_relfilenode));
- appendPQExpBuffer(loHorizonQry, "UPDATE pg_catalog.pg_class\n"
+ if (oid == LargeObjectRelationId ||
+ oid == LargeObjectLOidPNIndexId)
+ {
+ horizonQry = loHorizonQry;
+ outQry = loOutQry;
+ }
+ else
+ {
+ horizonQry = lomHorizonQry;
+ outQry = lomOutQry;
+ }
+
+ appendPQExpBuffer(horizonQry, "UPDATE pg_catalog.pg_class\n"
"SET relfrozenxid = '%u', relminmxid = '%u'\n"
"WHERE oid = %u;\n",
atooid(PQgetvalue(lo_res, i, ii_relfrozenxid)),
atooid(PQgetvalue(lo_res, i, ii_relminmxid)),
atooid(PQgetvalue(lo_res, i, ii_oid)));
- oid = atooid(PQgetvalue(lo_res, i, ii_oid));
- relfilenumber = atooid(PQgetvalue(lo_res, i, ii_relfilenode));
-
- if (oid == LargeObjectRelationId)
- appendPQExpBuffer(loOutQry,
+ if (oid == LargeObjectRelationId ||
+ oid == LargeObjectMetadataRelationId)
+ appendPQExpBuffer(outQry,
"SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('%u'::pg_catalog.oid);\n",
relfilenumber);
- else if (oid == LargeObjectLOidPNIndexId)
- appendPQExpBuffer(loOutQry,
+ else if (oid == LargeObjectLOidPNIndexId ||
+ oid == LargeObjectMetadataOidIndexId)
+ appendPQExpBuffer(outQry,
"SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('%u'::pg_catalog.oid);\n",
relfilenumber);
}
appendPQExpBufferStr(loOutQry,
"TRUNCATE pg_catalog.pg_largeobject;\n");
+ appendPQExpBufferStr(lomOutQry,
+ "TRUNCATE pg_catalog.pg_largeobject_metadata;\n");
+
appendPQExpBufferStr(loOutQry, loHorizonQry->data);
+ appendPQExpBufferStr(lomOutQry, lomHorizonQry->data);
ArchiveEntry(fout, nilCatalogId, createDumpId(),
ARCHIVE_OPTS(.tag = "pg_largeobject",
@@ -3699,11 +3744,20 @@ dumpDatabase(Archive *fout)
.section = SECTION_PRE_DATA,
.createStmt = loOutQry->data));
+ if (fout->remoteVersion >= 160000)
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = "pg_largeobject_metadata",
+ .description = "pg_largeobject_metadata",
+ .section = SECTION_PRE_DATA,
+ .createStmt = lomOutQry->data));
+
PQclear(lo_res);
destroyPQExpBuffer(loFrozenQry);
destroyPQExpBuffer(loHorizonQry);
+ destroyPQExpBuffer(lomHorizonQry);
destroyPQExpBuffer(loOutQry);
+ destroyPQExpBuffer(lomOutQry);
}
PQclear(res);
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index f83d2b5d309..69fcf593cae 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,8 +3,7 @@
PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
PGAPPICON = win32
-# required for 003_upgrade_logical_replication_slots.pl
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding src/test/modules/dummy_seclabel
subdir = src/bin/pg_upgrade
top_builddir = ../../..
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index c39eb077c2f..7ce08270168 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -498,7 +498,10 @@ get_rel_infos_query(void)
*
* pg_largeobject contains user data that does not appear in pg_dump
* output, so we have to copy that system table. It's easiest to do that
- * by treating it as a user table.
+ * by treating it as a user table. We can do the same for
+ * pg_largeobject_metadata for upgrades from v16 and newer. pg_upgrade
+ * can't copy/link the files from older versions because aclitem (needed
+ * by pg_largeobject_metadata.lomacl) changed its storage format in v16.
*/
appendPQExpBuffer(&query,
"WITH regular_heap (reloid, indtable, toastheap) AS ( "
@@ -514,10 +517,12 @@ get_rel_infos_query(void)
" 'binary_upgrade', 'pg_toast') AND "
" c.oid >= %u::pg_catalog.oid) OR "
" (n.nspname = 'pg_catalog' AND "
- " relname IN ('pg_largeobject') ))), ",
+ " relname IN ('pg_largeobject'%s) ))), ",
(user_opts.transfer_mode == TRANSFER_MODE_SWAP) ?
", " CppAsString2(RELKIND_SEQUENCE) : "",
- FirstNormalObjectId);
+ FirstNormalObjectId,
+ (GET_MAJOR_VERSION(old_cluster.major_version) >= 1600) ?
+ ", 'pg_largeobject_metadata'" : "");
/*
* Add a CTE that collects OIDs of toast tables belonging to the tables
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index d5cd5bf0b3a..490e98fa26f 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -29,9 +29,9 @@
* We control all assignments of pg_enum.oid because these oids are stored
* in user tables as enum values.
*
- * We control all assignments of pg_authid.oid for historical reasons (the
- * oids used to be stored in pg_largeobject_metadata, which is now copied via
- * SQL commands), that might change at some point in the future.
+ * We control all assignments of pg_authid.oid because the oids are stored in
+ * pg_largeobject_metadata, which is copied via file transfer for upgrades
+ * from v16 and newer.
*
* We control all assignments of pg_database.oid because we want the directory
* names to match between the old and new cluster.
diff --git a/src/bin/pg_upgrade/t/006_transfer_modes.pl b/src/bin/pg_upgrade/t/006_transfer_modes.pl
index 348f4021462..b355f6de5b9 100644
--- a/src/bin/pg_upgrade/t/006_transfer_modes.pl
+++ b/src/bin/pg_upgrade/t/006_transfer_modes.pl
@@ -83,6 +83,33 @@ sub test_mode
$old->safe_psql('testdb3',
"CREATE TABLE test6 AS SELECT generate_series(607, 711)");
}
+
+ # While we are here, test handling of large objects.
+ $old->safe_psql('postgres', q|
+ CREATE ROLE regress_lo_1;
+ CREATE ROLE regress_lo_2;
+
+ SELECT lo_from_bytea(4532, '\xffffff00');
+ COMMENT ON LARGE OBJECT 4532 IS 'test';
+
+ SELECT lo_from_bytea(4533, '\x0f0f0f0f');
+ ALTER LARGE OBJECT 4533 OWNER TO regress_lo_1;
+ GRANT SELECT ON LARGE OBJECT 4533 TO regress_lo_2;
+ |);
+
+ # For same-version upgrades, test large objects with security labels
+ #
+ # This requires installing dummy_seclabel, so we limit this test to
+ # same-version upgrades.
+ if (!defined($ENV{oldinstall}))
+ {
+ $old->safe_psql('postgres', q|
+ CREATE EXTENSION dummy_seclabel;
+
+ SELECT lo_from_bytea(4534, '\x00ffffff');
+ SECURITY LABEL ON LARGE OBJECT 4534 IS 'classified';
+ |);
+ }
$old->stop;
my $result = command_ok_or_fails_like(
@@ -132,6 +159,34 @@ sub test_mode
$result = $new->safe_psql('testdb3', "SELECT COUNT(*) FROM test6");
is($result, '105', "test6 data after pg_upgrade $mode");
}
+
+ # Tests for large objects
+ $result = $new->safe_psql('postgres', "SELECT lo_get(4532)");
+ is($result, '\xffffff00', "LO contents after upgrade");
+ $result = $new->safe_psql('postgres',
+ "SELECT obj_description(4532, 'pg_largeobject')");
+ is($result, 'test', "comment on LO after pg_upgrade");
+
+ $result = $new->safe_psql('postgres', "SELECT lo_get(4533)");
+ is($result, '\x0f0f0f0f', "LO contents after upgrade");
+ $result = $new->safe_psql('postgres',
+ "SELECT lomowner::regrole FROM pg_largeobject_metadata WHERE oid = 4533");
+ is($result, 'regress_lo_1', "LO owner after upgrade");
+ $result = $new->safe_psql('postgres',
+ "SELECT lomacl FROM pg_largeobject_metadata WHERE oid = 4533");
+ is($result, '{regress_lo_1=rw/regress_lo_1,regress_lo_2=r/regress_lo_1}',
+ "LO ACL after upgrade");
+
+ if (!defined($ENV{oldinstall}))
+ {
+ $result = $new->safe_psql('postgres', "SELECT lo_get(4534)");
+ is($result, '\x00ffffff', "LO contents after upgrade");
+ $result = $new->safe_psql('postgres', q|
+ SELECT label FROM pg_seclabel WHERE objoid = 4534
+ AND classoid = 'pg_largeobject'::regclass
+ |);
+ is($result, 'classified', "seclabel on LO after pg_upgrade");
+ }
$new->stop;
}
--
2.39.5 (Apple Git-154)
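The relfilenode-preservation loop in the pg_dump hunk above can be summarized with a rough sketch. This is Python purely for illustration, not PostgreSQL source; the OID constants and relfilenode values are placeholders standing in for the catalog constants of the same names, and the emitted SQL is abbreviated:

```python
# Illustrative sketch of the patched dumpDatabase() loop: each pg_class row
# (oid, relfilenode) is routed to one of two restore scripts, and heaps vs.
# indexes get different binary_upgrade_* calls. All values are placeholders.
LargeObjectRelationId = 2613
LargeObjectLOidPNIndexId = 2683
LargeObjectMetadataRelationId = 2995
LargeObjectMetadataOidIndexId = 2996

def build_restore_scripts(rows):
    """rows: iterable of (oid, relfilenode) for the four catalog relations."""
    scripts = {"pg_largeobject": [], "pg_largeobject_metadata": []}
    for oid, relfilenode in rows:
        # Route pg_largeobject and its index to one archive entry,
        # pg_largeobject_metadata and its index to the other.
        if oid in (LargeObjectRelationId, LargeObjectLOidPNIndexId):
            out = scripts["pg_largeobject"]
        else:
            out = scripts["pg_largeobject_metadata"]
        # Heaps and indexes need different relfilenode-setting functions.
        if oid in (LargeObjectRelationId, LargeObjectMetadataRelationId):
            out.append("SELECT binary_upgrade_set_next_heap_relfilenode"
                       f"('{relfilenode}');")
        else:
            out.append("SELECT binary_upgrade_set_next_index_relfilenode"
                       f"('{relfilenode}');")
    return scripts

demo = build_restore_scripts(
    [(2613, 16400), (2683, 16401), (2995, 16402), (2996, 16403)])
```

The real code additionally prepends the relfrozenxid/relminmxid UPDATEs and a TRUNCATE to each script before registering it with ArchiveEntry().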
On Thu, Sep 04, 2025 at 08:23:58PM -0500, Nathan Bossart wrote:
Ah, I'd forgotten about EXTRA_INSTALL. That simplifies things. There's
enough special handling for large objects in pg_upgrade that I think we
ought to test it end-to-end, so I sneaked it into 006_transfer_modes.pl.
WDYT?
Neat. That works for me.
+ $old->safe_psql('postgres', q|
+ CREATE EXTENSION dummy_seclabel;
Still I think that this bit is going to fail with installcheck,
because src/test/modules/ is not installed by default :)
You can make that conditional with check_extension(), like the tricks
for injection_points in other TAP tests. The security label creation
and test also need to be made conditional. The work is already done
with the check on oldinstall, it just needs an extra tweak, so I would
store the condition in a separate variable at the top of test_mode()
in 006_transfer_modes.pl, or something equivalent.
--
Michael
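Michael's suggestion boils down to computing one flag up front and gating both the setup and the later assertions on it. A rough Python sketch of that control flow, where Cluster and check_extension() are stand-ins for the PostgreSQL::Test::Cluster Perl API rather than real Python bindings:

```python
# Stand-in for PostgreSQL::Test::Cluster, just enough to show the gating.
class Cluster:
    def __init__(self, installed_extensions):
        self._exts = set(installed_extensions)

    def check_extension(self, name):
        # The real Perl method queries pg_available_extensions.
        return name in self._exts

def can_test_seclabel(old, new):
    # The security-label tests need dummy_seclabel in BOTH installations:
    # the old cluster creates the labeled large object, and the new
    # cluster must be able to restore it.
    return (old.check_extension("dummy_seclabel")
            and new.check_extension("dummy_seclabel"))
```

Under installcheck, where src/test/modules/ is not installed, the flag comes out false and the seclabel portion of the test is simply skipped.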
On Fri, Sep 05, 2025 at 03:35:21PM +0900, Michael Paquier wrote:
+ $old->safe_psql('postgres', q|
+ CREATE EXTENSION dummy_seclabel;
Still I think that this bit is going to fail with installcheck,
because src/test/modules/ is not installed by default :)
You can make that conditional with check_extension(), like the tricks
for injection_points in other TAP tests. The security label creation
and test also need to be made conditional. The work is already done
with the check on oldinstall, it just needs an extra tweak, so I would
store the condition in a separate variable at the top of test_mode()
in 006_transfer_modes.pl, or something equivalent.
How does this look?
--
nathan
Attachments:
v3-0001-pg_upgrade-Transfer-pg_largeobject_metadata-s-fil.patch
From a09d792405df4607886643adaca30c3ee3a3c03e Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Thu, 14 Aug 2025 10:14:43 -0500
Subject: [PATCH v3 1/1] pg_upgrade: Transfer pg_largeobject_metadata's files
when possible.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit 161a3e8b68 taught pg_upgrade to use COPY for large object
metadata for upgrades from v12 and newer, which is much faster to
restore than the proper large object SQL commands. For upgrades
from v16 and newer, we can take this a step further and transfer
the large object metadata files as if they were user tables. We
can't transfer the files from older versions because the aclitem
data type (needed by pg_largeobject_metadata.lomacl) changed its
storage format in v16 (see commit 7b378237aa). Note that this
commit is essentially a revert of commit 12a53c732c, but there are
enough differences that it should be considered a fresh effort.
There are a couple of caveats. First, we still need to COPY the
corresponding pg_shdepend rows for large objects, since those
aren't transferred by anything else. Second, we need to COPY
anything in pg_largeobject_metadata with a comment or security
label, else restoring those will fail. This means that an upgrade
in which every large object has a comment or security label won't
gain anything from this commit, but it should at least avoid making
these unusual use-cases any worse.
pg_upgrade must also take care to transfer the relfilenode of
pg_largeobject_metadata and its index, à la commits d498e052b4 and
bbe08b8869.
---
src/backend/commands/tablecmds.c | 12 ++--
src/bin/pg_dump/pg_dump.c | 80 ++++++++++++++++++----
src/bin/pg_upgrade/Makefile | 3 +-
src/bin/pg_upgrade/info.c | 11 ++-
src/bin/pg_upgrade/pg_upgrade.c | 6 +-
src/bin/pg_upgrade/t/006_transfer_modes.pl | 67 ++++++++++++++++++
6 files changed, 154 insertions(+), 25 deletions(-)
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 082a3575d62..3be2e051d32 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -42,6 +42,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_largeobject.h"
+#include "catalog/pg_largeobject_metadata.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_policy.h"
@@ -2389,12 +2390,15 @@ truncate_check_rel(Oid relid, Form_pg_class reltuple)
/*
* Most system catalogs can't be truncated at all, or at least not unless
* allow_system_table_mods=on. As an exception, however, we allow
- * pg_largeobject to be truncated as part of pg_upgrade, because we need
- * to change its relfilenode to match the old cluster, and allowing a
- * TRUNCATE command to be executed is the easiest way of doing that.
+ * pg_largeobject and pg_largeobject_metadata to be truncated as part of
+ * pg_upgrade, because we need to change their relfilenodes to match the
+ * old cluster, and allowing a TRUNCATE command to be executed is the
+ * easiest way of doing that.
*/
if (!allowSystemTableMods && IsSystemClass(relid, reltuple)
- && (!IsBinaryUpgrade || relid != LargeObjectRelationId))
+ && (!IsBinaryUpgrade ||
+ (relid != LargeObjectRelationId &&
+ relid != LargeObjectMetadataRelationId)))
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied: \"%s\" is a system catalog",
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index bea793456f9..b4c45ad803e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1131,6 +1131,23 @@ main(int argc, char **argv)
shdepend->dataObj->filtercond = "WHERE classid = 'pg_largeobject'::regclass "
"AND dbid = (SELECT oid FROM pg_database "
" WHERE datname = current_database())";
+
+ /*
+ * If upgrading from v16 or newer, only dump large objects with
+ * comments/seclabels. For these upgrades, pg_upgrade can copy/link
+ * pg_largeobject_metadata's files (which is usually faster) but we
+ * still need to dump LOs with comments/seclabels here so that the
+ * subsequent COMMENT and SECURITY LABEL commands work. pg_upgrade
+ * can't copy/link the files from older versions because aclitem
+ * (needed by pg_largeobject_metadata.lomacl) changed its storage
+ * format in v16.
+ */
+ if (fout->remoteVersion >= 160000)
+ lo_metadata->dataObj->filtercond = "WHERE oid IN "
+ "(SELECT objoid FROM pg_description "
+ "WHERE classoid = " CppAsString2(LargeObjectRelationId) " "
+ "UNION SELECT objoid FROM pg_seclabel "
+ "WHERE classoid = " CppAsString2(LargeObjectRelationId) ")";
}
/*
@@ -3629,26 +3646,32 @@ dumpDatabase(Archive *fout)
/*
* pg_largeobject comes from the old system intact, so set its
* relfrozenxids, relminmxids and relfilenode.
+ *
+ * pg_largeobject_metadata also comes from the old system intact for
+ * upgrades from v16 and newer, so set its relfrozenxids, relminmxids, and
+ * relfilenode, too. pg_upgrade can't copy/link the files from older
+ * versions because aclitem (needed by pg_largeobject_metadata.lomacl)
+ * changed its storage format in v16.
*/
if (dopt->binary_upgrade)
{
PGresult *lo_res;
PQExpBuffer loFrozenQry = createPQExpBuffer();
PQExpBuffer loOutQry = createPQExpBuffer();
+ PQExpBuffer lomOutQry = createPQExpBuffer();
PQExpBuffer loHorizonQry = createPQExpBuffer();
+ PQExpBuffer lomHorizonQry = createPQExpBuffer();
int ii_relfrozenxid,
ii_relfilenode,
ii_oid,
ii_relminmxid;
- /*
- * pg_largeobject
- */
if (fout->remoteVersion >= 90300)
appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid, relfilenode, oid\n"
"FROM pg_catalog.pg_class\n"
- "WHERE oid IN (%u, %u);\n",
- LargeObjectRelationId, LargeObjectLOidPNIndexId);
+ "WHERE oid IN (%u, %u, %u, %u);\n",
+ LargeObjectRelationId, LargeObjectLOidPNIndexId,
+ LargeObjectMetadataRelationId, LargeObjectMetadataOidIndexId);
else
appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, 0 AS relminmxid, relfilenode, oid\n"
"FROM pg_catalog.pg_class\n"
@@ -3663,35 +3686,57 @@ dumpDatabase(Archive *fout)
ii_oid = PQfnumber(lo_res, "oid");
appendPQExpBufferStr(loHorizonQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid\n");
+ appendPQExpBufferStr(lomHorizonQry, "\n-- For binary upgrade, set pg_largeobject_metadata relfrozenxid and relminmxid\n");
appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, preserve pg_largeobject and index relfilenodes\n");
+ appendPQExpBufferStr(lomOutQry, "\n-- For binary upgrade, preserve pg_largeobject_metadata and index relfilenodes\n");
for (int i = 0; i < PQntuples(lo_res); ++i)
{
Oid oid;
RelFileNumber relfilenumber;
+ PQExpBuffer horizonQry;
+ PQExpBuffer outQry;
+
+ oid = atooid(PQgetvalue(lo_res, i, ii_oid));
+ relfilenumber = atooid(PQgetvalue(lo_res, i, ii_relfilenode));
- appendPQExpBuffer(loHorizonQry, "UPDATE pg_catalog.pg_class\n"
+ if (oid == LargeObjectRelationId ||
+ oid == LargeObjectLOidPNIndexId)
+ {
+ horizonQry = loHorizonQry;
+ outQry = loOutQry;
+ }
+ else
+ {
+ horizonQry = lomHorizonQry;
+ outQry = lomOutQry;
+ }
+
+ appendPQExpBuffer(horizonQry, "UPDATE pg_catalog.pg_class\n"
"SET relfrozenxid = '%u', relminmxid = '%u'\n"
"WHERE oid = %u;\n",
atooid(PQgetvalue(lo_res, i, ii_relfrozenxid)),
atooid(PQgetvalue(lo_res, i, ii_relminmxid)),
atooid(PQgetvalue(lo_res, i, ii_oid)));
- oid = atooid(PQgetvalue(lo_res, i, ii_oid));
- relfilenumber = atooid(PQgetvalue(lo_res, i, ii_relfilenode));
-
- if (oid == LargeObjectRelationId)
- appendPQExpBuffer(loOutQry,
+ if (oid == LargeObjectRelationId ||
+ oid == LargeObjectMetadataRelationId)
+ appendPQExpBuffer(outQry,
"SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('%u'::pg_catalog.oid);\n",
relfilenumber);
- else if (oid == LargeObjectLOidPNIndexId)
- appendPQExpBuffer(loOutQry,
+ else if (oid == LargeObjectLOidPNIndexId ||
+ oid == LargeObjectMetadataOidIndexId)
+ appendPQExpBuffer(outQry,
"SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('%u'::pg_catalog.oid);\n",
relfilenumber);
}
appendPQExpBufferStr(loOutQry,
"TRUNCATE pg_catalog.pg_largeobject;\n");
+ appendPQExpBufferStr(lomOutQry,
+ "TRUNCATE pg_catalog.pg_largeobject_metadata;\n");
+
appendPQExpBufferStr(loOutQry, loHorizonQry->data);
+ appendPQExpBufferStr(lomOutQry, lomHorizonQry->data);
ArchiveEntry(fout, nilCatalogId, createDumpId(),
ARCHIVE_OPTS(.tag = "pg_largeobject",
@@ -3699,11 +3744,20 @@ dumpDatabase(Archive *fout)
.section = SECTION_PRE_DATA,
.createStmt = loOutQry->data));
+ if (fout->remoteVersion >= 160000)
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = "pg_largeobject_metadata",
+ .description = "pg_largeobject_metadata",
+ .section = SECTION_PRE_DATA,
+ .createStmt = lomOutQry->data));
+
PQclear(lo_res);
destroyPQExpBuffer(loFrozenQry);
destroyPQExpBuffer(loHorizonQry);
+ destroyPQExpBuffer(lomHorizonQry);
destroyPQExpBuffer(loOutQry);
+ destroyPQExpBuffer(lomOutQry);
}
PQclear(res);
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index f83d2b5d309..69fcf593cae 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,8 +3,7 @@
PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
PGAPPICON = win32
-# required for 003_upgrade_logical_replication_slots.pl
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding src/test/modules/dummy_seclabel
subdir = src/bin/pg_upgrade
top_builddir = ../../..
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index c39eb077c2f..7ce08270168 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -498,7 +498,10 @@ get_rel_infos_query(void)
*
* pg_largeobject contains user data that does not appear in pg_dump
* output, so we have to copy that system table. It's easiest to do that
- * by treating it as a user table.
+ * by treating it as a user table. We can do the same for
+ * pg_largeobject_metadata for upgrades from v16 and newer. pg_upgrade
+ * can't copy/link the files from older versions because aclitem (needed
+ * by pg_largeobject_metadata.lomacl) changed its storage format in v16.
*/
appendPQExpBuffer(&query,
"WITH regular_heap (reloid, indtable, toastheap) AS ( "
@@ -514,10 +517,12 @@ get_rel_infos_query(void)
" 'binary_upgrade', 'pg_toast') AND "
" c.oid >= %u::pg_catalog.oid) OR "
" (n.nspname = 'pg_catalog' AND "
- " relname IN ('pg_largeobject') ))), ",
+ " relname IN ('pg_largeobject'%s) ))), ",
(user_opts.transfer_mode == TRANSFER_MODE_SWAP) ?
", " CppAsString2(RELKIND_SEQUENCE) : "",
- FirstNormalObjectId);
+ FirstNormalObjectId,
+ (GET_MAJOR_VERSION(old_cluster.major_version) >= 1600) ?
+ ", 'pg_largeobject_metadata'" : "");
/*
* Add a CTE that collects OIDs of toast tables belonging to the tables
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index d5cd5bf0b3a..490e98fa26f 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -29,9 +29,9 @@
* We control all assignments of pg_enum.oid because these oids are stored
* in user tables as enum values.
*
- * We control all assignments of pg_authid.oid for historical reasons (the
- * oids used to be stored in pg_largeobject_metadata, which is now copied via
- * SQL commands), that might change at some point in the future.
+ * We control all assignments of pg_authid.oid because the oids are stored in
+ * pg_largeobject_metadata, which is copied via file transfer for upgrades
+ * from v16 and newer.
*
* We control all assignments of pg_database.oid because we want the directory
* names to match between the old and new cluster.
diff --git a/src/bin/pg_upgrade/t/006_transfer_modes.pl b/src/bin/pg_upgrade/t/006_transfer_modes.pl
index 348f4021462..2f68f0b56aa 100644
--- a/src/bin/pg_upgrade/t/006_transfer_modes.pl
+++ b/src/bin/pg_upgrade/t/006_transfer_modes.pl
@@ -45,6 +45,22 @@ sub test_mode
$old->append_conf('postgresql.conf', "allow_in_place_tablespaces = true");
}
+ # We can only test security labels if both the old and new installations
+ # have dummy_seclabel.
+ my $test_seclabel = 1;
+ $old->start;
+ if (!$old->check_extension('dummy_seclabel'))
+ {
+ $test_seclabel = 0;
+ }
+ $old->stop;
+ $new->start;
+ if (!$new->check_extension('dummy_seclabel'))
+ {
+ $test_seclabel = 0;
+ }
+ $new->stop;
+
# Create a small variety of simple test objects on the old cluster. We'll
# check that these reach the new version after upgrading.
$old->start;
@@ -83,6 +99,29 @@ sub test_mode
$old->safe_psql('testdb3',
"CREATE TABLE test6 AS SELECT generate_series(607, 711)");
}
+
+ # While we are here, test handling of large objects.
+ $old->safe_psql('postgres', q|
+ CREATE ROLE regress_lo_1;
+ CREATE ROLE regress_lo_2;
+
+ SELECT lo_from_bytea(4532, '\xffffff00');
+ COMMENT ON LARGE OBJECT 4532 IS 'test';
+
+ SELECT lo_from_bytea(4533, '\x0f0f0f0f');
+ ALTER LARGE OBJECT 4533 OWNER TO regress_lo_1;
+ GRANT SELECT ON LARGE OBJECT 4533 TO regress_lo_2;
+ |);
+
+ if ($test_seclabel)
+ {
+ $old->safe_psql('postgres', q|
+ CREATE EXTENSION dummy_seclabel;
+
+ SELECT lo_from_bytea(4534, '\x00ffffff');
+ SECURITY LABEL ON LARGE OBJECT 4534 IS 'classified';
+ |);
+ }
$old->stop;
my $result = command_ok_or_fails_like(
@@ -132,6 +171,34 @@ sub test_mode
$result = $new->safe_psql('testdb3', "SELECT COUNT(*) FROM test6");
is($result, '105', "test6 data after pg_upgrade $mode");
}
+
+ # Tests for large objects
+ $result = $new->safe_psql('postgres', "SELECT lo_get(4532)");
+ is($result, '\xffffff00', "LO contents after upgrade");
+ $result = $new->safe_psql('postgres',
+ "SELECT obj_description(4532, 'pg_largeobject')");
+ is($result, 'test', "comment on LO after pg_upgrade");
+
+ $result = $new->safe_psql('postgres', "SELECT lo_get(4533)");
+ is($result, '\x0f0f0f0f', "LO contents after upgrade");
+ $result = $new->safe_psql('postgres',
+ "SELECT lomowner::regrole FROM pg_largeobject_metadata WHERE oid = 4533");
+ is($result, 'regress_lo_1', "LO owner after upgrade");
+ $result = $new->safe_psql('postgres',
+ "SELECT lomacl FROM pg_largeobject_metadata WHERE oid = 4533");
+ is($result, '{regress_lo_1=rw/regress_lo_1,regress_lo_2=r/regress_lo_1}',
+ "LO ACL after upgrade");
+
+ if ($test_seclabel)
+ {
+ $result = $new->safe_psql('postgres', "SELECT lo_get(4534)");
+ is($result, '\x00ffffff', "LO contents after upgrade");
+ $result = $new->safe_psql('postgres', q|
+ SELECT label FROM pg_seclabel WHERE objoid = 4534
+ AND classoid = 'pg_largeobject'::regclass
+ |);
+ is($result, 'classified', "seclabel on LO after pg_upgrade");
+ }
$new->stop;
}
--
2.39.5 (Apple Git-154)
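The v16+ filtercond added in the pg_dump hunk effectively shrinks the set of large objects that are still dumped as SQL down to those that need a subsequent COMMENT or SECURITY LABEL command. A rough sketch of that selection, in Python for illustration only:

```python
# Illustrative sketch: for upgrades from v16 and newer,
# pg_largeobject_metadata's files are copied/linked, so only LOs with a
# comment or security label still need to be dumped (the COMMENT and
# SECURITY LABEL commands require the LO to exist at restore time).
# For older source versions, every LO's metadata is dumped as before.
def lo_oids_to_dump(all_lo_oids, commented_oids, seclabeled_oids,
                    remote_version):
    if remote_version >= 160000:
        return sorted(set(commented_oids) | set(seclabeled_oids))
    return sorted(all_lo_oids)
```

This is why an upgrade where every large object carries a comment or security label sees no speedup from the patch: the dump set is unchanged in that case.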
On Fri, Sep 05, 2025 at 01:12:49PM -0500, Nathan Bossart wrote:
How does this look?
+ # We can only test security labels if both the old and new installations
+ # have dummy_seclabel.
+ my $test_seclabel = 1;
+ $old->start;
+ if (!$old->check_extension('dummy_seclabel'))
+ {
+ $test_seclabel = 0;
+ }
+ $old->stop;
+ $new->start;
+ if (!$new->check_extension('dummy_seclabel'))
+ {
+ $test_seclabel = 0;
+ }
+ $new->stop;
Yep. This plan is safe to rely on.
A tiny comment I may have is that the LO numbers are hardcoded and
duplicated. I would have used a variable to store these numbers.
Please feel free to ignore my picky-ism.
--
Michael
On Sat, Sep 06, 2025 at 10:12:11AM +0900, Michael Paquier wrote:
Yep. This plan is safe to rely on.
Committed, thanks for reviewing!
--
nathan