pg15b2: large objects lost on upgrade

Started by Justin Pryzby · almost 4 years ago · 97 messages · pgsql-hackers
#1Justin Pryzby
pryzby@telsasoft.com

I noticed this during beta1, but dismissed the issue when it wasn't easily
reproducible. Now, I saw the same problem while upgrading from beta1 to beta2,
so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was run.

| /usr/pgsql-15b1/bin/initdb --no-sync -D pg15b1.dat -k
| /usr/pgsql-15b1/bin/postgres -D pg15b1.dat -c logging_collector=no -p 5678 -k /tmp&
| psql -h /tmp postgres -p 5678 -c '\lo_import /etc/shells' -c 'VACUUM FULL pg_largeobject'
| rm -fr pg15b2.dat && /usr/pgsql-15b2/bin/initdb --no-sync -k -D pg15b2.dat && /usr/pgsql-15b2/bin/pg_upgrade -d pg15b1.dat -D pg15b2.dat -b /usr/pgsql-15b1/bin
| /usr/pgsql-15b2/bin/postgres -D pg15b2.dat -c logging_collector=no -p 5678 -k /tmp&

Or, for your convenience, with paths in tmp_install:
| ./tmp_install/usr/local/pgsql/bin/initdb --no-sync -D pg15b1.dat -k
| ./tmp_install/usr/local/pgsql/bin/postgres -D pg15b1.dat -c logging_collector=no -p 5678 -k /tmp&
| psql -h /tmp postgres -p 5678 -c '\lo_import /etc/shells' -c 'VACUUM FULL pg_largeobject'
| rm -fr pg15b2.dat && ./tmp_install/usr/local/pgsql/bin/initdb --no-sync -k -D pg15b2.dat && ./tmp_install/usr/local/pgsql/bin/pg_upgrade -d pg15b1.dat -D pg15b2.dat -b ./tmp_install/usr/local/pgsql/bin
| ./tmp_install/usr/local/pgsql/bin/postgres -D pg15b2.dat -c logging_collector=no -p 5678 -k /tmp&

postgres=# table pg_largeobject_metadata ;
  oid  | lomowner | lomacl
-------+----------+--------
 16384 |       10 |
(1 row)

postgres=# \lo_list
       Large objects
  ID   |  Owner  | Description
-------+---------+-------------
 16384 | pryzbyj |
(1 row)

postgres=# \lo_export 16384 /dev/stdout
lo_export
postgres=# table pg_largeobject;
 loid | pageno | data
------+--------+------
(0 rows)

postgres=# \dt+ pg_largeobject
   Schema   |      Name      | Type  |  Owner  | Persistence | Access method |  Size   | Description
------------+----------------+-------+---------+-------------+---------------+---------+-------------
 pg_catalog | pg_largeobject | table | pryzbyj | permanent   | heap          | 0 bytes |
(1 row)

I reproduced the problem at 9a974cbcba but not its parent commit.

commit 9a974cbcba005256a19991203583a94b4f9a21a9
Author: Robert Haas <rhaas@postgresql.org>
Date: Mon Jan 17 13:32:44 2022 -0500

pg_upgrade: Preserve relfilenodes and tablespace OIDs.

--
Justin

#2Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#1)
Re: pg15b2: large objects lost on upgrade

On Fri, Jul 01, 2022 at 06:14:13PM -0500, Justin Pryzby wrote:

I reproduced the problem at 9a974cbcba but not its parent commit.

commit 9a974cbcba005256a19991203583a94b4f9a21a9
Author: Robert Haas <rhaas@postgresql.org>
Date: Mon Jan 17 13:32:44 2022 -0500

pg_upgrade: Preserve relfilenodes and tablespace OIDs.

Oops. Robert?

This reproduces as well when using pg_upgrade with the same version as
origin and target, meaning that this extra step in the TAP test is
able to reproduce the issue (extra VACUUM FULL before chdir'ing):
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -208,6 +208,8 @@ if (defined($ENV{oldinstall}))
    }
 }
+$oldnode->safe_psql("regression", "VACUUM FULL pg_largeobject;");
+
# In a VPATH build, we'll be started in the source directory, but we want
--
Michael
#3Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#1)
Re: pg15b2: large objects lost on upgrade

On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

I noticed this during beta1, but dismissed the issue when it wasn't easily
reproducible. Now, I saw the same problem while upgrading from beta1 to beta2,
so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was run.

Yikes. That's really bad, and I have no idea what might be causing it,
either. I'll plan to investigate this on Tuesday unless someone gets
to it before then.

--
Robert Haas
EDB: http://www.enterprisedb.com

#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Robert Haas (#3)
Re: pg15b2: large objects lost on upgrade

On Sat, Jul 02, 2022 at 08:34:04AM -0400, Robert Haas wrote:

On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

I noticed this during beta1, but dismissed the issue when it wasn't easily
reproducible. Now, I saw the same problem while upgrading from beta1 to beta2,
so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was run.

Yikes. That's really bad, and I have no idea what might be causing it,
either. I'll plan to investigate this on Tuesday unless someone gets
to it before then.

I suppose it's like Bruce said, here.

/messages/by-id/20210601140949.GC22012@momjian.us

|One tricky case is pg_largeobject, which is copied from the old to new
|cluster since it has user data. To preserve that relfilenode, you would
|need to have pg_upgrade perform cluster surgery in each database to
|renumber its relfilenode to match since it is created by initdb. I
|can't think of a case where pg_upgrade already does something like that.

Rather than setting the filenode to be used by the next created object, as for
user tables, pg_upgrade needs to UPDATE the relfilenode.

This patch "works for me" but feel free to improve on it.
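To make the idea concrete, here is a sketch of the kind of statements the
binary-upgrade dump would have to emit in each database (2613 and 2683 are the
fixed catalog OIDs of pg_largeobject and its index; the old-cluster filenodes
are placeholders, not real values):

```sql
-- Sketch only: overwrite the relfilenode that initdb assigned, instead of
-- setting the filenode to be used by the *next* created relation.
UPDATE pg_class SET relfilenode = :old_lo_filenode
    WHERE oid = 2613;  -- pg_largeobject
UPDATE pg_class SET relfilenode = :old_lo_index_filenode
    WHERE oid = 2683;  -- pg_largeobject_loid_pn_index
```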

Attachments:

0001-wip-preserve-relfilenode-of-pg_largeobject-and-its-i.patch (text/x-diff; charset=us-ascii, +24/-16)
#5Julien Rouhaud
rjuju123@gmail.com
In reply to: Robert Haas (#3)
Re: pg15b2: large objects lost on upgrade

On Sat, Jul 02, 2022 at 08:34:04AM -0400, Robert Haas wrote:

On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

I noticed this during beta1, but dismissed the issue when it wasn't easily
reproducible. Now, I saw the same problem while upgrading from beta1 to beta2,
so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was run.

Yikes. That's really bad, and I have no idea what might be causing it,
either. I'll plan to investigate this on Tuesday unless someone gets
to it before then.

As far as I can see the data is still there, it's just that the new cluster
keeps its default relfilenode instead of preserving the old cluster's value:

regression=# table pg_largeobject;
loid | pageno | data
------+--------+------
(0 rows)

regression=# select oid, relfilenode from pg_class where relname = 'pg_largeobject';
oid | relfilenode
------+-------------
2613 | 2613
(1 row)

-- using the value from the old cluster
regression=# update pg_class set relfilenode = 39909 where oid = 2613;
UPDATE 1

regression=# table pg_largeobject;
 loid  | pageno |       data
-------+--------+------------------
 33211 |      0 | \x0a4920776[...]
 34356 |      0 | \xdeadbeef
(2 rows)

#6Shruthi Gowda
gowdashru@gmail.com
In reply to: Justin Pryzby (#4)
Re: pg15b2: large objects lost on upgrade

I was able to reproduce the issue; it does not occur with code from before the
preserve-relfilenode commit. I tested your patch and it fixes the problem.
I am currently analyzing a few things related to the issue and will come back
once my analysis is complete.

On Sat, Jul 2, 2022 at 9:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:


On Sat, Jul 02, 2022 at 08:34:04AM -0400, Robert Haas wrote:

On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

I noticed this during beta1, but dismissed the issue when it wasn't easily
reproducible. Now, I saw the same problem while upgrading from beta1 to beta2,
so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was run.

Yikes. That's really bad, and I have no idea what might be causing it,
either. I'll plan to investigate this on Tuesday unless someone gets
to it before then.

I suppose it's like Bruce said, here.

/messages/by-id/20210601140949.GC22012@momjian.us

|One tricky case is pg_largeobject, which is copied from the old to new
|cluster since it has user data. To preserve that relfilenode, you would
|need to have pg_upgrade perform cluster surgery in each database to
|renumber its relfilenode to match since it is created by initdb. I
|can't think of a case where pg_upgrade already does something like that.

Rather than setting the filenode to be used by the next created object, as for
user tables, pg_upgrade needs to UPDATE the relfilenode.

This patch "works for me" but feel free to improve on it.

#7Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#4)
Re: pg15b2: large objects lost on upgrade

On Sat, Jul 2, 2022 at 11:49 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

I suppose it's like Bruce said, here.

/messages/by-id/20210601140949.GC22012@momjian.us

Well, I feel dumb. I remember reading that email back when Bruce sent
it, but it seems that it slipped out of my head between then and when
I committed. I think your patch is fine, except that I think maybe we
should adjust this dump comment:

-- For binary upgrade, set pg_largeobject relfrozenxid, relminmxid and relfilenode

Perhaps:

-- For binary upgrade, preserve values for pg_largeobject and its index

Listing the exact properties preserved seems less important to me than
mentioning that the second UPDATE statement is for its index --
because if you look at the SQL that's generated, you can see what's
being preserved, but you don't automatically know why there are two
UPDATE statements or what the rows with those OIDs are.

I had a moment of panic this morning where I thought maybe the whole
patch needed to be reverted. I was worried that we might need to
preserve the OID of every system table and index. Otherwise, what
happens if the OID counter in the old cluster wraps around and some
user-created object gets an OID that the system tables are using in
the new cluster? However, I think this can't happen, because when the
OID counter wraps around, it wraps around to 16384, and the
relfilenode values for newly created system tables and indexes are all
less than 16384. So I believe we only need to fix pg_largeobject and
its index, and I think your patch does that.
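That reasoning is easy to sanity-check on a freshly initdb'd cluster with a
quick diagnostic query (illustration only, not part of the patch):

```sql
-- On a pristine cluster, every system catalog and index should have a
-- relfilenode below FirstNormalObjectId (16384), so an OID counter that
-- wraps around to 16384 can never hand out a colliding filenode.
SELECT relname, relfilenode FROM pg_class
WHERE oid < 16384 AND relfilenode >= 16384;
-- expect zero rows on a just-initdb'd cluster
```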

Anyone else see it differently?

--
Robert Haas
EDB: http://www.enterprisedb.com

#8Justin Pryzby
pryzby@telsasoft.com
In reply to: Robert Haas (#7)
Re: pg15b2: large objects lost on upgrade

On Tue, Jul 05, 2022 at 12:43:54PM -0400, Robert Haas wrote:

On Sat, Jul 2, 2022 at 11:49 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

I suppose it's like Bruce said, here.

/messages/by-id/20210601140949.GC22012@momjian.us

Well, I feel dumb. I remember reading that email back when Bruce sent
it, but it seems that it slipped out of my head between then and when
I committed. I think your patch is fine, except that I think maybe we

My patch also leaves a 0 byte file around from initdb, which is harmless, but
dirty.

I've seen before where a bunch of 0-byte files are abandoned in an
otherwise-empty tablespace, with no associated relation, and I have to "rm"
them to be able to drop the tablespace. Maybe that's a known issue, maybe it's
due to crashes or some other edge case, maybe it's of no consequence, and maybe
it's already been fixed or is being fixed. But it'd be nice to avoid another
way to end up with 0-byte files, especially ones named with system OIDs.

Listing the exact properties preserved seems less important to me than
mentioning that the second UPDATE statement is for its index --

+1

--
Justin

#9Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#8)
Re: pg15b2: large objects lost on upgrade

On Tue, Jul 5, 2022 at 12:56 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

My patch also leaves a 0 byte file around from initdb, which is harmless, but
dirty.

I've seen before where a bunch of 0-byte files are abandoned in an
otherwise-empty tablespace, with no associated relation, and I have to "rm"
them to be able to drop the tablespace. Maybe that's a known issue, maybe it's
due to crashes or some other edge case, maybe it's of no consequence, and maybe
it's already been fixed or is being fixed. But it'd be nice to avoid another
way to end up with 0-byte files, especially ones named with system OIDs.

Do you want to add something to the patch to have pg_upgrade remove
the stray file?

--
Robert Haas
EDB: http://www.enterprisedb.com

#10Justin Pryzby
pryzby@telsasoft.com
In reply to: Robert Haas (#9)
Re: pg15b2: large objects lost on upgrade

On Tue, Jul 05, 2022 at 02:40:21PM -0400, Robert Haas wrote:

On Tue, Jul 5, 2022 at 12:56 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

My patch also leaves a 0 byte file around from initdb, which is harmless, but
dirty.

I've seen before where a bunch of 0-byte files are abandoned in an
otherwise-empty tablespace, with no associated relation, and I have to "rm"
them to be able to drop the tablespace. Maybe that's a known issue, maybe it's
due to crashes or some other edge case, maybe it's of no consequence, and maybe
it's already been fixed or is being fixed. But it'd be nice to avoid another
way to end up with 0-byte files, especially ones named with system OIDs.

Do you want to add something to the patch to have pg_upgrade remove
the stray file?

I'm looking into it, but it'd help to hear suggestions about where to put it.
My current ideas aren't very good.

--
Justin

#11Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#10)
Re: pg15b2: large objects lost on upgrade

On Wed, Jul 6, 2022 at 7:56 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

I'm looking into it, but it'd help to hear suggestions about where to put it.
My current ideas aren't very good.

In main() there is a comment that begins "Most failures happen in
create_new_objects(), which has just completed at this point." I am
thinking you might want to insert a new function call just before that
comment, like remove_orphaned_files() or tidy_up_new_cluster().

Another option could be to do something at the beginning of
transfer_all_new_tablespaces().

--
Robert Haas
EDB: http://www.enterprisedb.com

#12Justin Pryzby
pryzby@telsasoft.com
In reply to: Robert Haas (#11)
Re: pg15b2: large objects lost on upgrade

On Wed, Jul 06, 2022 at 08:25:04AM -0400, Robert Haas wrote:

On Wed, Jul 6, 2022 at 7:56 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

I'm looking into it, but it'd help to hear suggestions about where to put it.
My current ideas aren't very good.

In main() there is a comment that begins "Most failures happen in
create_new_objects(), which has just completed at this point." I am
thinking you might want to insert a new function call just before that
comment, like remove_orphaned_files() or tidy_up_new_cluster().

Another option could be to do something at the beginning of
transfer_all_new_tablespaces().

That seems like the better option, since it has access to the cluster's
filenodes.

I checked upgrades from 9.2, upgrades with and without VACUUM FULL, and
upgrades with a database in a tablespace.

Maybe it's a good idea to check that the file is empty before unlinking...

--
Justin

Attachments:

0001-fix-Preserve-relfilenodes-for-pg_largeobject-and-its.patch (text/x-diff; charset=us-ascii, +86/-22)
#13Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#12)
Re: pg15b2: large objects lost on upgrade

On Thu, Jul 7, 2022 at 1:10 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

Maybe it's a good idea to check that the file is empty before unlinking...

If we want to verify that there are no large objects in the cluster,
we could do that in check_new_cluster_is_empty(). However, even if
there aren't, the length of the file could still be more than 0, if
there were some large objects previously and then they were removed.
So it's not entirely obvious to me that we should refuse to remove a
non-empty file.

--
Robert Haas
EDB: http://www.enterprisedb.com

#14Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#13)
Re: pg15b2: large objects lost on upgrade

On Thu, Jul 7, 2022 at 01:38:44PM -0400, Robert Haas wrote:

On Thu, Jul 7, 2022 at 1:10 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

Maybe it's a good idea to check that the file is empty before unlinking...

If we want to verify that there are no large objects in the cluster,
we could do that in check_new_cluster_is_empty(). However, even if
there aren't, the length of the file could still be more than 0, if
there were some large objects previously and then they were removed.
So it's not entirely obvious to me that we should refuse to remove a
non-empty file.

Uh, that initdb-created pg_largeobject file should not have any data in
it ever, as far as I know at that point in pg_upgrade. How would values
have gotten in there? Via pg_dump?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson

#15Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#14)
Re: pg15b2: large objects lost on upgrade

On Thu, Jul 7, 2022 at 2:24 PM Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Jul 7, 2022 at 01:38:44PM -0400, Robert Haas wrote:

On Thu, Jul 7, 2022 at 1:10 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

Maybe it's a good idea to check that the file is empty before unlinking...

If we want to verify that there are no large objects in the cluster,
we could do that in check_new_cluster_is_empty(). However, even if
there aren't, the length of the file could still be more than 0, if
there were some large objects previously and then they were removed.
So it's not entirely obvious to me that we should refuse to remove a
non-empty file.

Uh, that initdb-created pg_largeobject file should not have any data in
it ever, as far as I know at that point in pg_upgrade. How would values
have gotten in there? Via pg_dump?

I was thinking if the user had done it manually before running pg_upgrade.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Justin Pryzby
pryzby@telsasoft.com
In reply to: Robert Haas (#15)
Re: pg15b2: large objects lost on upgrade

On Thu, Jul 07, 2022 at 02:38:44PM -0400, Robert Haas wrote:

On Thu, Jul 7, 2022 at 2:24 PM Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Jul 7, 2022 at 01:38:44PM -0400, Robert Haas wrote:

On Thu, Jul 7, 2022 at 1:10 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

Maybe it's a good idea to check that the file is empty before unlinking...

If we want to verify that there are no large objects in the cluster,
we could do that in check_new_cluster_is_empty(). However, even if
there aren't, the length of the file could still be more than 0, if
there were some large objects previously and then they were removed.
So it's not entirely obvious to me that we should refuse to remove a
non-empty file.

Uh, that initdb-created pg_largeobject file should not have any data in
it ever, as far as I know at that point in pg_upgrade. How would values
have gotten in there? Via pg_dump?

I was thinking if the user had done it manually before running pg_upgrade.

We're referring to the new cluster which should have been initdb'd more or less
immediately before running pg_upgrade [0].

It'd be weird to me if someone were to initdb a new cluster, then create some
large objects, and then maybe delete them, and then run pg_upgrade. Why
wouldn't they do their work on the old cluster before upgrading, or on the new
cluster afterwards ?

Does anybody actually do anything significant between initdb and pg_upgrade?
Is that considered to be supported? It seems like pg_upgrade could itself run
initdb, with the correct options for locale, checksums, block size, etc.
(although it would probably have to support the existing way of handling
"compatible encodings").

Actually, I think check_new_cluster_is_empty() ought to prohibit doing work
between initdb and pg_upgrade by checking that no objects have *ever* been
created in the new cluster, by checking that NextOid == 16384. But I have a
separate thread about "pg_upgrade allows itself to be re-run", and this has
more to do with that than with whether to check that the file is empty before
removing it.
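As a sketch, the check could be as simple as reading the OID counter, e.g. via
pg_control_checkpoint() (hypothetical; pg_upgrade would presumably read it from
pg_controldata output instead of querying the server):

```sql
-- If any OIDs have ever been consumed since initdb, the counter will have
-- advanced past FirstNormalObjectId (16384).
SELECT next_oid FROM pg_control_checkpoint();
```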

--
Justin

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#16)
Re: pg15b2: large objects lost on upgrade

Justin Pryzby <pryzby@telsasoft.com> writes:

On Thu, Jul 07, 2022 at 02:38:44PM -0400, Robert Haas wrote:

On Thu, Jul 7, 2022 at 2:24 PM Bruce Momjian <bruce@momjian.us> wrote:

Uh, that initdb-created pg_largeobject file should not have any data in
it ever, as far as I know at that point in pg_upgrade. How would values
have gotten in there? Via pg_dump?

I was thinking if the user had done it manually before running pg_upgrade.

We're referring to the new cluster which should have been initdb'd more or less
immediately before running pg_upgrade [0].

It'd be weird to me if someone were to initdb a new cluster, then create some
large objects, and then maybe delete them, and then run pg_upgrade.

AFAIK you're voiding the warranty if you make any changes at all in the
destination cluster before pg_upgrade'ing. As an example, if you created
a table there you'd be risking an OID and/or relfilenode collision with
something due to be imported from the source cluster.

Actually, I think check_new_cluster_is_empty() ought to prohibit doing work
between initdb and pg_upgrade by checking that no objects have *ever* been
created in the new cluster, by checking that NextOid == 16384.

It would be good to have some such check; I'm not sure that that one in
particular is the best option.

But I have a
separate thread about "pg_upgrade allows itself to be re-run", and this has
more to do with that than with whether to check that the file is empty before
removing it.

Yeah, that's another foot-gun in the same area.

regards, tom lane

#18Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#16)
Re: pg15b2: large objects lost on upgrade
On Thu, Jul 7, 2022 at 2:44 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

We're referring to the new cluster which should have been initdb'd more or less
immediately before running pg_upgrade [0].

It'd be weird to me if someone were to initdb a new cluster, then create some
large objects, and then maybe delete them, and then run pg_upgrade. Why
wouldn't they do their work on the old cluster before upgrading, or on the new
cluster afterwards ?

Does anybody actually do anything significant between initdb and pg_upgrade?
Is that considered to be supported? It seems like pg_upgrade could itself run
initdb, with the correct options for locale, checksums, block size, etc.
(although it would probably have to support the existing way of handling
"compatible encodings").

Actually, I think check_new_cluster_is_empty() ought to prohibit doing work
between initdb and pg_upgrade by checking that no objects have *ever* been
created in the new cluster, by checking that NextOid == 16384. But I have a
separate thread about "pg_upgrade allows itself to be re-run", and this has
more to do with that than with whether to check that the file is empty before
removing it.

I think you're getting at a really good point here which is also my
point: we assume that nothing significant has happened between when
the cluster was created and when pg_upgrade is run, but we don't check
it. Either we shouldn't assume it, or we should check it.

So, is such activity ever legitimate? I think there are people doing
it. The motivation is that maybe you have a dump from the old database
that doesn't quite restore on the new version, but by doing something
to the new cluster, you can make it restore. For instance, maybe there
are some functions that used to be part of core and are now only
available in an extension. That's going to make pg_upgrade's
dump-and-restore workflow fail, but if you install that extension onto
the new cluster, perhaps you can work around the problem. It doesn't
have to be an extension, even. Maybe some function in core just got an
extra argument, and you're using it, so the calls to that function
cause dump-and-restore to fail. You might try overloading it in the
new database with the old calling sequence to get things working.

Now, are these kinds of things considered to be supported? Well, I
don't know that we've made any statement about that one way or the
other. Perhaps they are not. But I can see why people want to use
workarounds like this. The alternative is having to dump-and-restore
instead of an in-place upgrade, and that's painfully slow by
comparison. pg_upgrade itself doesn't give you any tools to deal with
this kind of situation, but the process is just loose enough to allow
people to insert their own workarounds, so they do. I'm sure I'd do
the same, in their shoes.

My view on this is that, while we probably don't want to make such
things officially supported, I don't think we should ban it outright,
either. We probably can't support an upgrade after the next cluster
has been subjected to arbitrary amounts of tinkering, but we're making
a mistake if we write code that has fragile assumptions for no really
good reason. I think we can do better than this excerpt from your
patch, for example:

+               /* Keep track of whether a filenode matches the OID */
+               if (maps[mapnum].relfilenumber == LargeObjectRelationId)
+                       *has_lotable = true;
+               if (maps[mapnum].relfilenumber == LargeObjectLOidPNIndexId)
+                       *has_loindex = true;

I spent a while struggling to understand this because it seems to me
that every database has an LO table and an LO index, so what's up with
these names? I think what these names are really tracking is whether
the relfilenumber of pg_largeobject and its index in the old database
had their default values. But this makes the assumption that the LO
table and LO index in the new database have never been subjected to
VACUUM FULL or CLUSTER and, while there's no real reason to do that, I
can't quite see what the point of such an unnecessary and fragile
assumption might be. If Dilip's patch to make relfilenodes 56 bits
gets committed, this is going to break anyway, because with that
patch, the relfilenode for a table or index doesn't start out being
equal to its OID.

Perhaps a better solution to this particular problem is to remove the
backing files for the large object table and index *before* restoring
the dump, deciding what files to remove by asking the running server
for the file path. It might seem funny to allow for dangling pg_class
entries, but we're going to create that situation for all other user
rels anyway, and pg_upgrade treats pg_largeobject as a user rel.
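For example, the running server can already report those paths (relative to the
data directory), so no guessing about tablespaces or database OIDs is needed:

```sql
-- pg_upgrade could ask the new cluster where the to-be-orphaned files live
-- and unlink them before restoring the schema.
SELECT pg_relation_filepath('pg_largeobject');
SELECT pg_relation_filepath('pg_largeobject_loid_pn_index');
```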

--
Robert Haas
EDB: http://www.enterprisedb.com

#19Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#7)
Re: pg15b2: large objects lost on upgrade

On Tue, Jul 5, 2022 at 12:43:54PM -0400, Robert Haas wrote:

On Sat, Jul 2, 2022 at 11:49 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

I suppose it's like Bruce said, here.

/messages/by-id/20210601140949.GC22012@momjian.us

Well, I feel dumb. I remember reading that email back when Bruce sent
it, but it seems that it slipped out of my head between then and when
I committed. I think your patch is fine, except that I think maybe we

It happens to us all.

I had a moment of panic this morning where I thought maybe the whole

Yes, I have had those panics too.

patch needed to be reverted. I was worried that we might need to
preserve the OID of every system table and index. Otherwise, what
happens if the OID counter in the old cluster wraps around and some
user-created object gets an OID that the system tables are using in
the new cluster? However, I think this can't happen, because when the
OID counter wraps around, it wraps around to 16384, and the
relfilenode values for newly created system tables and indexes are all
less than 16384. So I believe we only need to fix pg_largeobject and
its index, and I think your patch does that.

So, let me explain how I look at this. There are two number-spaces, oid
and relfilenode. In each number-space, there are system-assigned ones
less than 16384, and higher ones for post-initdb use.

What we did in pre-PG15 was to preserve only oids, and have the
relfilenode match the oid, and we have discussed the negatives of this.

For PG 15+, we preserve relfilenodes too. These number assignment cases
only work if we handle _all_ numbering, except for non-pg_largeobject
system tables.

In pre-PG15, pg_largeobject was easily handled because initdb already
assigned the oid and relfilenode to be the same for pg_largeobject, so a
simple copy worked fine. pg_largeobject is an anomaly in PG 15 because
it is assigned a relfilenode in the system number space by initdb, but
then it needs to be potentially renamed into the relfilenode user number
space. This is the basis for my email as already posted:

/messages/by-id/20210601140949.GC22012@momjian.us

You are right to be concerned since you are spanning number spaces, but
I think you are fine because the relfilenode in the user-space cannot
have been used since it already was being used in each database. It is
true we never had a per-database rename like this before.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson

#20Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#19)
Re: pg15b2: large objects lost on upgrade

On Thu, Jul 7, 2022 at 4:16 PM Bruce Momjian <bruce@momjian.us> wrote:

You are right to be concerned since you are spanning number spaces, but
I think you are fine because the relfilenode in the user-space cannot
have been used since it already was being used in each database. It is
true we never had a per-database rename like this before.

Thanks for checking over the reasoning, and the kind words in general.
I just committed Justin's fix for the bug, without fixing the fact
that the new cluster's original pg_largeobject files will be left
orphaned afterward. That's a relatively minor problem by comparison,
and it seemed best to me not to wait too long to get the main issue
addressed.

--
Robert Haas
EDB: http://www.enterprisedb.com

#21Justin Pryzby
pryzby@telsasoft.com
In reply to: Robert Haas (#18)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#21)
#23Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#20)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#24)
#26Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#25)
#28Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#27)
#29Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#27)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#28)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#27)
#32Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#31)
#33Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#33)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#34)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#35)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#37)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#38)
#40Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#35)
#41Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#39)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#41)
#43Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#43)
#45Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#44)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#45)
#47Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#42)
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#47)
#49Jonathan S. Katz
jkatz@postgresql.org
In reply to: Tom Lane (#48)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonathan S. Katz (#49)
#51Jonathan S. Katz
jkatz@postgresql.org
In reply to: Tom Lane (#50)
#52Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#50)
#53Jonathan S. Katz
jkatz@postgresql.org
In reply to: Robert Haas (#52)
#54Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonathan S. Katz (#51)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonathan S. Katz (#53)
#56Jonathan S. Katz
jkatz@postgresql.org
In reply to: Tom Lane (#55)
#57Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonathan S. Katz (#56)
#58Jonathan S. Katz
jkatz@postgresql.org
In reply to: Tom Lane (#57)
#59Jonathan S. Katz
jkatz@postgresql.org
In reply to: Jonathan S. Katz (#58)
#60Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#57)
#61Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#60)
#62Jonathan S. Katz
jkatz@postgresql.org
In reply to: Tom Lane (#61)
#63Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#61)
In reply to: Tom Lane (#54)
In reply to: Robert Haas (#60)
#66Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#60)
#67Peter Geoghegan
In reply to: Andres Freund (#66)
#68Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#67)
#69Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#66)
In reply to: Tom Lane (#68)
#71Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#69)
#72Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#71)
#73Peter Geoghegan
In reply to: Tom Lane (#71)
#74Jonathan S. Katz
jkatz@postgresql.org
In reply to: Peter Geoghegan (#73)
#75Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonathan S. Katz (#74)
#76Jonathan S. Katz
jkatz@postgresql.org
In reply to: Tom Lane (#75)
#77Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#75)
#78Robert Haas
robertmhaas@gmail.com
In reply to: Jonathan S. Katz (#76)
#79Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonathan S. Katz (#76)
#80Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#77)
#81Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#80)
#82Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#81)
#83Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#81)
#84Peter Geoghegan
In reply to: Robert Haas (#81)
#85Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#84)
In reply to: Tom Lane (#85)
#87Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#83)
#88Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#87)
#89Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#82)
#90Peter Geoghegan
In reply to: Robert Haas (#89)
#91Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#90)
#92Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#88)
In reply to: Robert Haas (#91)
#94Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#54)
#95Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#94)
#96Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#95)
#97Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#96)