pg_upgrade: transfer pg_largeobject_metadata's files when possible
(new thread)
On Fri, Jul 18, 2025 at 11:05:04AM -0500, Nathan Bossart wrote:
I'm cautiously optimistic that we can find some better gains for upgrades
from v16 and newer. That would involve dumping lo_create() commands for
all LOs with comments/seclabels, dumping the relevant pg_shdepend rows, and
then copying/linking the pg_largeobject_metadata files like we did prior to
v12.
Here is a patch. For background, the reason this is limited to upgrades
from v16 and newer is because the aclitem data type (needed by
pg_largeobject_metadata.lomacl) changed its storage format in v16 (see
commit 7b378237aa). Note that the patch is essentially a revert of commit
12a53c732c, but there are enough differences that it should be considered a
fresh effort.
Something I hadn't anticipated is that we need to take special care to
transfer the relfilenode of pg_largeobject_metadata and its index, as was
done for pg_largeobject in commits d498e052b4 and bbe08b8869. In fact, the
majority of the patch is dedicated to that.
My testing showed some decent, but not earth-shattering performance
improvements from this patch. For upgrades with many large objects with
NULL lomacl/lomowner columns, pg_upgrade was 50% faster. With non-NULL
lomacl/lomowner, that dropped to 25%. When each large object had a
comment, there was no change. I'm assuming that it's rare to have lots of
large objects with comments or security labels, so I don't see any need to
expend energy trying to optimize that use-case.
I am a bit concerned that we'll forget to add checks for new types of
dependencies similar to comments and security labels. If we do, pg_upgrade
should just fail to restore the schema, and fixing the code should be easy
enough. Also, we'll need to remember to revisit this code if there's
another storage format change for one of pg_largeobject_metadata's columns,
but that seems unlikely to happen anytime soon. On the whole, I'm not too
worried about either of these points.
--
nathan
Attachments:
v1-0001-pg_upgrade-Transfer-pg_largeobject_metadata-s-fil.patch (text/plain, +86/-24)
Have you considered re-creating pg_shdepend from
pg_largeobject_metadata directly instead of having special cases for
dumping it ?
It would also be useful in cases of old (pg_upgraded since before pg
12) databases which might be missing it anyway.
On Tue, Aug 19, 2025 at 09:49:26AM +0200, Hannu Krosing wrote:
Have you considered re-creating pg_shdepend from
pg_largeobject_metadata directly instead of having special cases for
dumping it ?
I considered it when you last brought up the idea [0], but my testing
indicated that it's measurably slower.
It would also be useful in cases of old (pg_upgraded since before pg
12) databases which might be missing it anyway.
We only use COPY for upgrades from v12 and newer, and the patch at hand
only applies to v16 and newer. There should be no need to repair
pg_shdepend for any such upgrades because we haven't copied/linked the
files since before v12.
[0]: /messages/by-id/CAMT0RQTXiqH7zdQEVSVd2L7_Cw4wQ1eHOD8hfZ+0vecMXJWc-w@mail.gmail.com
--
nathan
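The version cutoffs in Nathan's answer can be summarized as a small decision sketch. This is purely illustrative (hypothetical function name, Python used only for clarity); pg_upgrade's actual logic lives in C across pg_dump and pg_upgrade:

```python
def lo_metadata_strategy(old_major_version: int) -> str:
    """Hypothetical sketch of the per-version handling of
    pg_largeobject_metadata discussed in this thread; not actual
    pg_upgrade code."""
    if old_major_version >= 16:
        # aclitem's storage format is compatible (commit 7b378237aa),
        # so pg_largeobject_metadata's files can be copied/linked.
        return "transfer files"
    if old_major_version >= 12:
        # dump/restore the metadata rows with COPY (commit 12a53c732c)
        return "COPY rows"
    # before v12, pg_upgrade copied/linked the files, as for pg_largeobject
    return "transfer files (pre-v12 behavior)"

assert lo_metadata_strategy(17) == "transfer files"
assert lo_metadata_strategy(14) == "COPY rows"
```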
On Thu, Aug 14, 2025 at 10:22:02AM -0500, Nathan Bossart wrote:
Here is a patch. For background, the reason this is limited to upgrades
from v16 and newer is because the aclitem data type (needed by
pg_largeobject_metadata.lomacl) changed its storage format in v16 (see
commit 7b378237aa). Note that the patch is essentially a revert of commit
12a53c732c, but there are enough differences that it should be considered a
fresh effort.
Noted.
Something I hadn't anticipated is that we need to take special care to
transfer the relfilenode of pg_largeobject_metadata and its index, as was
done for pg_largeobject in commits d498e052b4 and bbe08b8869. In fact, the
majority of the patch is dedicated to that.

My testing showed some decent, but not earth-shattering performance
improvements from this patch. For upgrades with many large objects with
NULL lomacl/lomowner columns, pg_upgrade was 50% faster. With non-NULL
lomacl/lomowner, that dropped to 25%. When each large object had a
comment, there was no change. I'm assuming that it's rare to have lots of
large objects with comments or security labels, so I don't see any need to
expend energy trying to optimize that use-case.
I highly doubt that there are a lot of comments assigned to LOs, so
these numbers are pretty cool IMO. Security labels are a pain to test
in the upgrade path, though dummy_seclabel could be extended with a new
TAP test and a pg_upgrade command. There is some coverage with
comments on LOs in src/bin/pg_dump's 002, so that would be enough for
the comment part, at least.
I am a bit concerned that we'll forget to add checks for new types of
dependencies similar to comments and security labels. If we do, pg_upgrade
should just fail to restore the schema, and fixing the code should be easy
enough. Also, we'll need to remember to revisit this code if there's
another storage format change for one of pg_largeobject_metadata's columns,
but that seems unlikely to happen anytime soon. On the whole, I'm not too
worried about either of these points.
This part does not worry me much, TBH. This stuff would require
dump/restore support and the pg_dump test suite would catch that for
commands with --binary-upgrade. So it should be hard to miss.
- /*
- * pg_largeobject
- */
if (fout->remoteVersion >= 90300)
appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid, relfilenode, oid\n"
"FROM pg_catalog.pg_class\n"
- "WHERE oid IN (%u, %u);\n",
- LargeObjectRelationId, LargeObjectLOidPNIndexId);
+ "WHERE oid IN (%u, %u, %u, %u);\n",
+ LargeObjectRelationId, LargeObjectLOidPNIndexId,
+ LargeObjectMetadataRelationId, LargeObjectMetadataOidIndexId);
[...]
appendPQExpBufferStr(loHorizonQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid\n");
+ appendPQExpBufferStr(lomHorizonQry, "\n-- For binary upgrade, set pg_largeobject_metadata relfrozenxid and relminmxid\n");
appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, preserve pg_largeobject and index relfilenodes\n");
+ appendPQExpBufferStr(lomOutQry, "\n-- For binary upgrade, preserve pg_largeobject_metadata and index relfilenodes\n");
Is all that really required when upgrading from a cluster in the
9.3~15 range?
--
Michael
On Mon, Sep 01, 2025 at 03:19:46PM +0900, Michael Paquier wrote:
I highly doubt that there are a lot of comments assigned to LOs, so
these numbers are pretty cool IMO. Security labels are a pain to test
in the upgrade path, though dummy_seclabel could be extended with a new
TAP test and a pg_upgrade command. There is some coverage with
comments on LOs in src/bin/pg_dump's 002, so that would be enough for
the comment part, at least.
Do you think a new pg_upgrade test for security labels is worth the
trouble? It seems doable, but it'd be an awfully expensive test for this.
On the other hand, I'm not sure there's any coverage for pg_upgrade with
security labels, so perhaps this is a good time to establish some tests.
- /*
- * pg_largeobject
- */
if (fout->remoteVersion >= 90300)
appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid, relfilenode, oid\n"
"FROM pg_catalog.pg_class\n"
- "WHERE oid IN (%u, %u);\n",
- LargeObjectRelationId, LargeObjectLOidPNIndexId);
+ "WHERE oid IN (%u, %u, %u, %u);\n",
+ LargeObjectRelationId, LargeObjectLOidPNIndexId,
+ LargeObjectMetadataRelationId, LargeObjectMetadataOidIndexId);
[...]
appendPQExpBufferStr(loHorizonQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid\n");
+ appendPQExpBufferStr(lomHorizonQry, "\n-- For binary upgrade, set pg_largeobject_metadata relfrozenxid and relminmxid\n");
appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, preserve pg_largeobject and index relfilenodes\n");
+ appendPQExpBufferStr(lomOutQry, "\n-- For binary upgrade, preserve pg_largeobject_metadata and index relfilenodes\n");

Is all that really required when upgrading from a cluster in the
9.3~15 range?
No, that stuff is discarded for upgrades from those versions. My intent
was to maintain readability by avoiding lots of version checks. FWIW I
originally put all of the pg_largeobject_metadata stuff in a separate
block, but that resulted in a lot of copy/pasted code. I'm happy to adjust
it as you see fit.
--
nathan
On Tue, Sep 02, 2025 at 09:43:40AM -0500, Nathan Bossart wrote:
Do you think a new pg_upgrade test for security labels is worth the
trouble? It seems doable, but it'd be an awfully expensive test for this.
On the other hand, I'm not sure there's any coverage for pg_upgrade with
security labels, so perhaps this is a good time to establish some tests.
I would argue in favor of these additions. Security labels are not
the most popular thing ever, AFAIK, but your patch makes the need more
relevant to have. The cheapest approach would be to add a LO creation
pattern in src/bin/pg_dump/t/002_pg_dump.pl, with an EXTRA_INSTALL
pointing at src/test/modules/dummy_seclabel/ to be able to create the
security label (we already do that in pg_upgrade and pg_basebackup so
the trick works). That should be enough to make sure that the binary
upgrade dumps have the seclabel data included. It's a bit funky, I
agree. So if you think that this is not worth the test cycles, I
won't push hard on this point, either.
No, that stuff is discarded for upgrades from those versions. My intent
was to maintain readability by avoiding lots of version checks. FWIW I
originally put all of the pg_largeobject_metadata stuff in a separate
block, but that resulted in a lot of copy/pasted code. I'm happy to adjust
it as you see fit.
+ if (fout->remoteVersion >= 160000)
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = "pg_largeobject_metadata",
+ .description = "pg_largeobject_metadata",
+ .section = SECTION_PRE_DATA,
+ .createStmt = lomOutQry->data));
So it's filtered out depending on the old cluster version here. I
have managed to miss this part. Some tests with LOs and across
multiple backend versions (pg_dump -C --binary-upgrade) are showing me
that I am wrong and that you are right, with diffs showing up properly
in the binary upgrade dumps for pg_largeobject_metadata. This logic
looks OK to me, even if it's a waste to fill lomOutQry when upgrading
from an instance in the 9.3~15 range, discarding ArchiveEntry() at the
end. It's not a big deal.
--
Michael
On Thu, Sep 04, 2025 at 01:59:36PM +0900, Michael Paquier wrote:
On Tue, Sep 02, 2025 at 09:43:40AM -0500, Nathan Bossart wrote:
Do you think a new pg_upgrade test for security labels is worth the
trouble? It seems doable, but it'd be an awfully expensive test for this.
On the other hand, I'm not sure there's any coverage for pg_upgrade with
security labels, so perhaps this is a good time to establish some tests.

I would argue in favor of these additions. Security labels are not
the most popular thing ever, AFAIK, but your patch makes the need more
relevant to have. The cheapest approach would be to add a LO creation
pattern in src/bin/pg_dump/t/002_pg_dump.pl, with an EXTRA_INSTALL
pointing at src/test/modules/dummy_seclabel/ to be able to create the
security label (we already do that in pg_upgrade and pg_basebackup so
the trick works). That should be enough to make sure that the binary
upgrade dumps have the seclabel data included. It's a bit funky, I
agree. So if you think that this is not worth the test cycles, I
won't push hard on this point, either.
Ah, I'd forgotten about EXTRA_INSTALL. That simplifies things. There's
enough special handling for large objects in pg_upgrade that I think we
ought to test it end-to-end, so I sneaked it into 006_transfer_modes.pl.
WDYT?
--
nathan
Attachments:
v2-0001-pg_upgrade-Transfer-pg_largeobject_metadata-s-fil.patch (text/plain, +142/-26)
On Thu, Sep 04, 2025 at 08:23:58PM -0500, Nathan Bossart wrote:
Ah, I'd forgotten about EXTRA_INSTALL. That simplifies things. There's
enough special handling for large objects in pg_upgrade that I think we
ought to test it end-to-end, so I sneaked it into 006_transfer_modes.pl.
WDYT?
Neat. That works for me.
+ $old->safe_psql('postgres', q|
+ CREATE EXTENSION dummy_seclabel;
Still I think that this bit is going to fail with installcheck,
because src/test/modules/ is not installed by default :)
You can make that conditional with check_extension(), like the tricks
for injection_points in other TAP tests. The security label creation
and test also need to be made conditional. The work is already done
with the check on oldinstall, it just needs an extra tweak, so I would
store the condition in a separate variable at the top of test_mode()
in 006_transfer_modes.pl, or something equivalent.
--
Michael
On Fri, Sep 05, 2025 at 03:35:21PM +0900, Michael Paquier wrote:
+ $old->safe_psql('postgres', q|
+ CREATE EXTENSION dummy_seclabel;

Still I think that this bit is going to fail with installcheck,
because src/test/modules/ is not installed by default :)

You can make that conditional with check_extension(), like the tricks
for injection_points in other TAP tests. The security label creation
and test also need to be made conditional. The work is already done
with the check on oldinstall, it just needs an extra tweak, so I would
store the condition in a separate variable at the top of test_mode()
in 006_transfer_modes.pl, or something equivalent.
How does this look?
--
nathan
Attachments:
v3-0001-pg_upgrade-Transfer-pg_largeobject_metadata-s-fil.patch (text/plain, +154/-26)
On Fri, Sep 05, 2025 at 01:12:49PM -0500, Nathan Bossart wrote:
How does this look?
+ # We can only test security labels if both the old and new installations
+ # have dummy_seclabel.
+ my $test_seclabel = 1;
+ $old->start;
+ if (!$old->check_extension('dummy_seclabel'))
+ {
+ $test_seclabel = 0;
+ }
+ $old->stop;
+ $new->start;
+ if (!$new->check_extension('dummy_seclabel'))
+ {
+ $test_seclabel = 0;
+ }
+ $new->stop;
Yep. This plan is safe to rely on.
A tiny comment I may have is that the LO numbers are hardcoded and
duplicated. I would have used a variable to store these numbers.
Please feel free to ignore my picky-ism.
--
Michael
On Sat, Sep 06, 2025 at 10:12:11AM +0900, Michael Paquier wrote:
Yep. This plan is safe to rely on.
Committed, thanks for reviewing!
--
nathan
On Mon, Sep 08, 2025 at 02:20:33PM -0500, Nathan Bossart wrote:
Committed, thanks for reviewing!
Thanks!
--
Michael
Hi,
On 2025-09-08 14:20:33 -0500, Nathan Bossart wrote:
On Sat, Sep 06, 2025 at 10:12:11AM +0900, Michael Paquier wrote:
Yep. This plan is safe to rely on.
Committed, thanks for reviewing!
I just had occasion to look at this code and I found the way this works quite
confusing:
When upgrading from a new enough server, we migrate
pg_largeobject_metadata. So far so good. But we *also* do a COPY FROM
COPY "pg_catalog"."pg_largeobject_metadata" ("oid", "lomowner", "lomacl") FROM stdin;
...
for all the large objects that have a description or a security label.
For a while I was somewhat baffled, because that sure looks like it ought to
lead to uniqueness violations. But it doesn't.
The reason, I think, is that the COPY is happening into a relfilenode that
will be overwritten later, it doesn't yet contain the contents of the old
cluster.
Presumably we do this because we need the temporary pg_largeobject_metadata to
make COMMENT ON and security label commands not fail.
If this is the reasoning / how it works, shouldn't there be a comment in the
code or the commit message explaining that? Because it sure seems non-obvious
to me.
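The sequence Andres describes can be modeled minimally. This is a hypothetical sketch (invented helper name, toy data, Python used only for illustration), not pg_upgrade code: the rows restored via COPY land in a relfilenode that the later file transfer replaces wholesale, so the restored rows and the old cluster's rows never coexist and no uniqueness violation can occur.

```python
# Toy model of the upgrade sequence described above (hypothetical;
# not actual pg_upgrade code).  Catalogs are modeled as oid -> row dicts.

def restore_then_transfer(copied_rows, old_cluster_file):
    # Step 1: the schema restore COPYs metadata rows for LOs with
    # comments/seclabels into the new cluster's pg_largeobject_metadata.
    catalog = dict(copied_rows)
    # Step 2: pg_upgrade overwrites that relfilenode with the old
    # cluster's file, discarding everything inserted in step 1.
    catalog = dict(old_cluster_file)
    return catalog

copied = {16401: ("owner", None)}            # LO that has a comment
old_file = {16401: ("owner", None),          # same LO on the old cluster...
            16402: ("owner", "acl=rw")}      # ...plus one without a comment
assert restore_then_transfer(copied, old_file) == old_file
```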
It's also not entirely obvious to me that this is safe - after all
(bbe08b8869bd, revised in 0e758ae89) appeared to have taken some pains to
ensure that the file gets unlinked immediately during the "binary upgrade
mode" TRUNCATE. But now we are actually filling that file again, after the
relation had been truncated?
Separately, if we don't handle large objects that don't have comments or
labels via pg_dump, why do we do dependency tracking for all LOs in getLOs(),
rather than just the ones that have a comment / label? Given how much memory
both the query results and the dependency tracking take...
Greetings,
Andres Freund
Hi,
On 2026-02-03 18:16:17 -0500, Andres Freund wrote:
On 2025-09-08 14:20:33 -0500, Nathan Bossart wrote:
On Sat, Sep 06, 2025 at 10:12:11AM +0900, Michael Paquier wrote:
Yep. This plan is safe to rely on.
Committed, thanks for reviewing!
I just had occasion to look at this code and I found the way this works quite
confusing:

When upgrading from a new enough server, we migrate
pg_largeobject_metadata. So far so good. But we *also* do a COPY FROM

COPY "pg_catalog"."pg_largeobject_metadata" ("oid", "lomowner", "lomacl") FROM stdin;
...

for all the large objects that have a description or a security label.

For a while I was somewhat baffled, because that sure looks like it ought to
lead to uniqueness violations. But it doesn't.

The reason, I think, is that the COPY is happening into a relfilenode that
will be overwritten later, it doesn't yet contain the contents of the old
cluster.

Presumably we do this because we need the temporary pg_largeobject_metadata to
make COMMENT ON and security label commands not fail.

If this is the reasoning / how it works, shouldn't there be a comment in the
code or the commit message explaining that? Because it sure seems non-obvious
to me.

It's also not entirely obvious to me that this is safe - after all
(bbe08b8869bd, revised in 0e758ae89) appeared to have taken some pains to
ensure that the file gets unlinked immediately during the "binary upgrade
mode" TRUNCATE. But now we are actually filling that file again, after the
relation had been truncated?
An example of what could go wrong:
Imagine that we add support for freezing pages on-access (as we are
discussing). If pg_largeobject_metadata was *not* frozen / vacuumed on the old
server, there might not be a _vm on the old server, but due to the new
on-access freezing, there might be one created for the data in the ephemeral
pg_largeobject_metadata during the accesses from COMMENT ON.
Because there's no VM on the old server, transfer_relfile() would just return
false:
/* Is it an extent, fsm, or vm file? */
if (type_suffix[0] != '\0' || segno != 0)
{
/* Did file open fail? */
if (stat(old_file, &statbuf) != 0)
{
/* File does not exist? That's OK, just return */
if (errno == ENOENT)
return;
therefore not reaching the:
unlink(new_file);
Leaving a completely wrong visibilitymap in place.
There are other scary possibilities. Imagine that the ephemeral
pg_largeobject_metadata ends up bigger than on the old server, e.g. because we
are a bit more aggressive about bulk relation extension in the new version
(which we should be!). If that size difference ends up with a different
segment count between the old server and the ephemeral relation in the new
server, we're in trouble: transfer_relfile() wouldn't unlink the additional
segments on the target system.
Greetings,
Andres Freund
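The early-return hazard in the quoted transfer_relfile() snippet can be illustrated with a toy model. This is a hypothetical, heavily simplified Python sketch of the behavior Andres points at (the real function is C and far more involved); file names like "main.0" and "main.0_vm" are invented stand-ins for segment and fork files:

```python
def transfer_sketch(old_files, new_files):
    """Toy model of the transfer_relfile() hazard described above
    (hypothetical; not the real C code).  The main fork's segment 0 is
    always copied from the old cluster, but an extra fork/segment that
    is missing on the old cluster triggers an early return before
    unlink(new_file) runs, leaving the new cluster's file in place."""
    result = set(new_files)
    for name in sorted(old_files | new_files):
        if name in old_files:
            result.add(name)      # copied/linked from the old cluster
        elif name != "main.0":
            continue              # early return: stale new file survives!
        else:
            result.discard(name)
    return result

# Old cluster has no _vm fork, but the new cluster grew one during the
# ephemeral COMMENT ON accesses.  The stale _vm is left in place.
assert transfer_sketch({"main.0"}, {"main.0", "main.0_vm"}) == {"main.0", "main.0_vm"}
```

The same model shows the segment-count problem: an extra "main.1" segment present only on the new cluster also survives the transfer.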
On Tue, Feb 03, 2026 at 06:46:25PM -0500, Andres Freund wrote:
The reason, I think, is that the COPY is happening into a relfilenode that
will be overwritten later, it doesn't yet contain the contents of the old
cluster.

Presumably we do this because we need the temporary pg_largeobject_metadata to
make COMMENT ON and security label commands not fail.

If this is the reasoning / how it works, shouldn't there be a comment in the
code or the commit message explaining that? Because it sure seems non-obvious
to me.
Right, the COPY for LOs with comments and security labels is solely meant
to avoid failure when restoring the comments and security labels, since we
won't have transferred the relation files yet. This was the case before
commit 12a53c732c, where we had this comment in getBlobs():
* We *do* dump out the definition of the blob because we need that to
* make the restoration of the comments, and anything else, work since
* pg_upgrade copies the files behind pg_largeobject and
* pg_largeobject_metadata after the dump is restored.
Commit 3bcfcd815e (mine) added this one to pg_dump.c:
* If upgrading from v16 or newer, only dump large objects with
* comments/seclabels. For these upgrades, pg_upgrade can copy/link
* pg_largeobject_metadata's files (which is usually faster) but we
* still need to dump LOs with comments/seclabels here so that the
* subsequent COMMENT and SECURITY LABEL commands work. pg_upgrade
* can't copy/link the files from older versions because aclitem
* (needed by pg_largeobject_metadata.lomacl) changed its storage
* format in v16.
IIUC your critique is that this doesn't explain the overwriting behavior
like the older comment does. I'll work on adding that.
It's also not entirely obvious to me that this is safe - after all
(bbe08b8869bd, revised in 0e758ae89) appeared to have taken some pains to
ensure that the file gets unlinked immediately during the "binary upgrade
mode" TRUNCATE. But now we are actually filling that file again, after the
relation had been truncated?

An example of what could go wrong:
[... examples of what could go wrong ...]
I'm considering a couple of options here, but it seems like the easiest
thing to do is to move the TRUNCATE commands to the end of the dump file.
At least, that seems to be sufficient for our existing tests. If that
seems okay to you, I can work on putting together a patch.
--
nathan
On Wed, Feb 04, 2026 at 10:06:29AM -0600, Nathan Bossart wrote:
IIUC your critique is that this doesn't explain the overwriting behavior
like the older comment does. I'll work on adding that.

[...]
I'm considering a couple of options here, but it seems like the easiest
thing to do is to move the TRUNCATE commands to the end of the dump file.
At least, that seems to be sufficient for our existing tests. If that
seems okay to you, I can work on putting together a patch.
Here is a rough first draft of a patch that does this.
--
nathan
Attachments:
v1-0001-fix-pg_largeobject_metadata-file-transfer.patch (text/plain, +28/-12)
Hi,
On 2026-02-04 10:06:29 -0600, Nathan Bossart wrote:
IIUC your critique is that this doesn't explain the overwriting behavior
like the older comment does. I'll work on adding that.
I think even the old comment was woefully under-documenting that the commands
are all just make work that's going to be thrown out almost immediately after.
Thanks for addressing that!
On 2026-02-04 14:08:47 -0600, Nathan Bossart wrote:
On Wed, Feb 04, 2026 at 10:06:29AM -0600, Nathan Bossart wrote:
IIUC your critique is that this doesn't explain the overwriting behavior
like the older comment does. I'll work on adding that.

[...]
I'm considering a couple of options here, but it seems like the easiest
thing to do is to move the TRUNCATE commands to the end of the dump file.
At least, that seems to be sufficient for our existing tests. If that
seems okay to you, I can work on putting together a patch.

Here is a rough first draft of a patch that does this.
It certainly seems better than what we do now. Still feels pretty grotty and
error prone to me that we fill the catalog table and then throw the contents
out.
@@ -1157,7 +1158,10 @@ main(int argc, char **argv)
  * subsequent COMMENT and SECURITY LABEL commands work.  pg_upgrade
  * can't copy/link the files from older versions because aclitem
  * (needed by pg_largeobject_metadata.lomacl) changed its storage
- * format in v16.
+ * format in v16.  At the end of the dump, we'll generate a TRUNCATE
+ * command for pg_largeobject_metadata so that its contents are
+ * cleared in preparation for the subsequent file transfer by
+ * pg_upgrade.
  */
I'd move that comment to earlier in the paragraph, it sounds a bit like it
applies to the v16 specific bits.
if (fout->remoteVersion >= 160000)
lo_metadata->dataObj->filtercond = "WHERE oid IN "
@@ -1243,6 +1247,13 @@ main(int argc, char **argv)
  for (i = 0; i < numObjs; i++)
   dumpDumpableObject(fout, dobjs[i]);

+ /*
+  * For binary upgrades, set relfrozenxids, relminmxids, and relfilenodes
+  * of pg_largeobject and maybe pg_largeobject_metadata, and remove all
+  * their files.  We will transfer them from the old cluster as needed.
+  */
+ dumpLOTruncation(fout);
+
  /*
   * Set up options info to ensure we dump what we want.
   */
Seems good to move this to a dedicated function, regardless of anything else.
Do I see correctly that we just rely on the ordering in the file, rather than
dependencies? That's not a complaint, I just don't know that code very well.
Greetings,
Andres Freund
On Thu, Feb 05, 2026 at 11:19:46AM -0500, Andres Freund wrote:
It certainly seems better than what we do now. Still feels pretty grotty and
error prone to me that we fill the catalog table and then throw the contents
out.
Before I go any further with this approach, I thought of something else we
could do that I believe is worth considering...
As of commit 3bcfcd815e, the only reason we are dumping any of
pg_largeobject_metadata at all is to avoid an ERROR during COMMENT ON or
SECURITY LABEL ON because the call to LargeObjectExists() in
get_object_address() returns false. If we bypass that check in
binary-upgrade mode, we can skip dumping pg_largeobject_metadata entirely.
The attached patch passes our existing tests, and it seems to create the
expected binary-upgrade-mode dump files, too. I haven't updated any of the
comments yet.
--
nathan
Attachments:
v2-0001-fix-pg_largeobject_metadata-file-transfer.patch (text/plain, +24/-32)
On Thu, Feb 05, 2026 at 11:36:00AM -0600, Nathan Bossart wrote:
@@ -1046,7 +1046,7 @@ get_object_address(ObjectType objtype, Node *object,
   address.classId = LargeObjectRelationId;
   address.objectId = oidparse(object);
   address.objectSubId = 0;
-  if (!LargeObjectExists(address.objectId))
+  if (!LargeObjectExists(address.objectId) && !IsBinaryUpgrade)
   {
    if (!missing_ok)
     ereport(ERROR,
(Probably better to set missing_ok for only the specific commands we want
to bypass this check, but you get the idea...)
--
nathan
Hi,
On 2026-02-05 11:36:00 -0600, Nathan Bossart wrote:
On Thu, Feb 05, 2026 at 11:19:46AM -0500, Andres Freund wrote:
It certainly seems better than what we do now. Still feels pretty grotty and
error prone to me that we fill the catalog table and then throw the contents
out.

Before I go any further with this approach, I thought of something else we
could do that I believe is worth considering...

As of commit 3bcfcd815e, the only reason we are dumping any of
pg_largeobject_metadata at all is to avoid an ERROR during COMMENT ON or
SECURITY LABEL ON because the call to LargeObjectExists() in
get_object_address() returns false. If we bypass that check in
binary-upgrade mode, we can skip dumping pg_largeobject_metadata entirely.
Yea, I think that's worth considering. As you say downthread, the check for
binary upgrade should probably be moved, but that's details.
Upthread I was also wondering why we do all the work in getLOs() if we don't
actually need most of it (only if there are comments or labels). Right now
that's a very slow and very memory-intensive part of upgrading a system
with a lot of large objects. Do we need *any* of that if we go the
path you suggest?
Greetings,
Andres Freund