Horribly slow pg_upgrade performance with many Large Objects

Started by Hannu Krosing · 11 months ago · 48 messages
#1 Hannu Krosing
hannu@tm.ee

Hi Hackers

## The issue

I have now met a not insignificant number of cases where pg_upgrade
performance is really bad when the database has a large number of
Large Objects.

The average time to `pg_dump --binary-upgrade --format=custom ...` a
database and then `pg_restore ...` it back is 1 minute per 1 million
LOs.

This does not seem too bad until you realize that it is
- 100 minutes, or 1h 40min, for 100M LOs
- 16h 40min for 1 billion large objects
- 66h 40min, or 2 days 18 hours and 40 minutes, for those unfortunate
enough to have the full 4 billion LOs
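
These figures are just the observed rate scaled linearly; a quick
shell sketch (assuming the ~1 minute per 1M LOs rate holds) reproduces
them:

```shell
# Back-of-envelope scaling at ~1 minute per 1 million large objects
for n in 100 1000 4000; do         # database sizes in millions of LOs
  printf '%5dM LOs: %dh %02dmin\n' "$n" $((n / 60)) $((n % 60))
done
```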

My direct tests took 10m5s for 10M LOs and 100m54s for 100M.

Also a smaller server just runs out of shared memory when restoring 100M LOs.

I know that our semi-official recommendation for using Large Objects
is "Don't!", but people still use them, and sometimes they have a lot
of them. One of the reasons seems to be an older version of Hibernate
which converted any column of type CLOB into a Large Object, so we see
cases where the average LO size is a few hundred to a few thousand
bytes.

I read through the old discussions from around the time of the v12
release, and the consensus at that time seemed to be that dumping the
LOs without data was OK "because there will not be too many LOs".
Unfortunately this has turned out to be wishful thinking, as there are
lots of tiny Large Objects out there, and these really slow down
pg_upgrade.

## So what to do about this?

The obvious solution would be to handle the table
`pg_largeobject_metadata` the same way as we currently handle
`pg_largeobject`: by not doing anything with it in `pg_dump
--binary-upgrade` and instead handling its contents like we do for
user tables in pg_upgrade itself.

This should work fine for all source database versions starting from PgSQL v12.

For the older supported versions 9.2 to 11, where `oid` was a system
column, we should just dump the three fields of
`pg_largeobject_metadata` directly, so that the dump command would be
`COPY (SELECT oid, lomowner, lomacl FROM pg_largeobject_metadata) TO
STDOUT`, and then the restore would happen more or less automatically.
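
(For illustration, a rough sketch of what such a metadata dump/restore
could look like from the command line - hypothetical, using this
thread's port numbers; the real change would live inside
pg_dump/pg_upgrade, and writing to a system catalog requires the
target server to run in binary-upgrade mode:)

```shell
# Hypothetical sketch: stream the three pg_largeobject_metadata columns
# from the old cluster (port 5433) into the new one (port 5434).
# Selecting oid explicitly matters on 9.2-11, where oid is a system column.
psql -p 5433 -d lodb -c \
  "COPY (SELECT oid, lomowner, lomacl FROM pg_largeobject_metadata) TO STDOUT" |
psql -p 5434 -d lodb -c \
  "COPY pg_largeobject_metadata (oid, lomowner, lomacl) FROM STDIN"
```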

Or we could just modify `pg_restore` so that it restores LOs using
COPY instead of doing the current dance of three separate SELECTs from
lo_xxx() functions.

`pg_dump --binary-upgrade --format=custom` is relatively fast for LOs
- 1m58s for 100M objects - though still much slower than `pg_dump
--data-only -t pg_largeobject_metadata`, which takes 20 seconds, or 23
when also run through gzip.

## preferred solution for --binary-upgrade

- link/copy data files when source is v12+
- dump the triplet of pg_largeobject_metadata columns if source is v9.2 - v11

An acceptable, faster-to-implement solution would be to just always
dump pg_largeobject_metadata (oid, lomowner, lomacl), as this would
not need any changes to `pg_upgrade`, just to `pg_dump
--binary-upgrade`. It would not be as fast as linking/copying, but it
would still be several orders of magnitude faster than the current
behaviour.

Does anyone have any objections to this?

Would anyone be willing to take up fixing this?

Or perhaps advise where in the code the change should be done?

I started looking at this, and at a high level it looks like I need
to create a task that copies out specific data columns rather than the
structure of pg_largeobject_metadata, but as pg_dump.c is 582,673
bytes it would take some time to understand where exactly to do this
and all the possible interactions with flags like --binary-upgrade and
the specific formats.

---
Cheers,
Hannu

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#1)
Re: Horribly slow pg_upgrade performance with many Large Objects

Hannu Krosing <hannuk@google.com> writes:

I have now met a not insignificant number of cases where pg_upgrade
performance is really bad when the database has a large number of
Large Objects.

What version are you testing? We did some work in that area in the
v17 cycle (a45c78e32).

regards, tom lane

#3 Nathan Bossart
nathandbossart@gmail.com
In reply to: Hannu Krosing (#1)
Re: Horribly slow pg_upgrade performance with many Large Objects

On Mon, Apr 07, 2025 at 10:33:47PM +0200, Hannu Krosing wrote:

The obvious solution would be to handle the table
`pg_largeobject_metadata` the same way as we currently handle
`pg_largeobject `by not doing anything with it in `pg_dump
--binary-upgrade` and just handle the contents it like we do for user
tables in pg_upgrade itself.

This should work fine for all source database versions starting from PgSQL v12.

Unfortunately, the storage format for aclitem changed in v16, so this would
need to be restricted to upgrades from v16 and newer. That being said, I
regularly hear about slow upgrades with many LOs, so I think it'd be
worthwhile to try to improve matters in v19.

--
nathan

#4 Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#2)
Re: Horribly slow pg_upgrade performance with many Large Objects

On Mon, Apr 07, 2025 at 05:25:32PM -0400, Tom Lane wrote:

What version are you testing? We did some work in that area in the
v17 cycle (a45c78e32).

I am puzzled by the target version used here, as well. If there is
more that can be improved, v19 would be the version to consider for
future improvements at this stage.
--
Michael

#5 Hannu Krosing
hannu@tm.ee
In reply to: Michael Paquier (#4)
Re: Horribly slow pg_upgrade performance with many Large Objects

On Tue, Apr 8, 2025 at 6:52 AM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Apr 07, 2025 at 05:25:32PM -0400, Tom Lane wrote:

What version are you testing? We did some work in that area in the
v17 cycle (a45c78e32).

I am puzzled by the target version used here, as well.

I was testing on version 17

Here is how you can easily test this yourself (as --binary-upgrade
does not dump the actual data, it is OK for the test to not put
anything there):

hannuk@db01-c1a:~/work/lo-testing$ createdb -p 5433 lodb
hannuk@db01-c1a:~/work/lo-testing$ psql -p 5433 lodb
psql (17.4 (Ubuntu 17.4-1.pgdg22.04+2))
Type "help" for help.

lodb=# insert into pg_largeobject_metadata(oid, lomowner) SELECT i,
16384 FROM generate_series(1, 100_000_000) g(i);
INSERT 0 100000000
Time: 162414.216 ms (02:42.414)
lodb=#
\q

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --data-only -t
pg_largeobject_metadata -p 5433 lodb | gzip >
pg_largeobject_metadata.data.gz
real 0m22.094s
user 0m20.741s
sys 0m2.085s

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --data-only -t
pg_largeobject_metadata --format=custom -p 5433 lodb -f
pg_largeobject_metadata.dump
real 0m20.226s
user 0m18.068s
sys 0m0.824s

If there is
more that can be improved, v19 would be the version to consider for
future improvements at this stage.

If the internal format changed in 16, the correct way would be to go
through the data-only dump of pg_largeobject_metadata in all cases -
even for the 100M case, where you get the restore in 2 minutes instead
of 100 minutes:

hannuk@db01-c1a:~/work/lo-testing$ createdb -p 5434 lodb
hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb
pg_largeobject_metadata.dump

real 2m2.277s
user 0m2.594s
sys 0m0.549s

And even in the case of the user-visible format change in the acl
format, it is most likely that changing the visible format using some
regexp magic, or even a dedicated function, would still be much faster
than creating all the LOs through creation commands.

------
The commands I used to do the pg_upgrade-like test were

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only
--quote-all-identifiers --binary-upgrade --format=custom
--file=lodb100m.dump -p 5433 lodb
real 1m58.241s
user 0m35.229s
sys 0m17.854s

hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb lodb100m.dump
real 100m54.878s
user 3m23.885s
sys 20m33.761s

(I left out the --verbose part that pg_upgrade also sets, as I did
not want to get 100M lines of "large object created" messages.)

Also, the postgres server at -p 5434 needs to be started with the -b
flag to accept loading a dump from --binary-upgrade. In Debian/Ubuntu
this can be passed directly to pg_ctlcluster as follows:

sudo pg_ctlcluster 17 target -o -b

----
Hannu


#6 Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#5)
Re: Horribly slow pg_upgrade performance with many Large Objects

On re-reading, the order looked a bit illogical, so I want to make
clear that the pg_upgrade-like test showing 100 min for 100 million
LOs is at the end of the last message and the proposed solution is at
the beginning.


#7 Hannu Krosing
hannu@tm.ee
In reply to: Nathan Bossart (#3)
Re: Horribly slow pg_upgrade performance with many Large Objects

On Tue, Apr 8, 2025 at 12:17 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Mon, Apr 07, 2025 at 10:33:47PM +0200, Hannu Krosing wrote:

The obvious solution would be to handle the table
`pg_largeobject_metadata` the same way as we currently handle
`pg_largeobject `by not doing anything with it in `pg_dump
--binary-upgrade` and just handle the contents it like we do for user
tables in pg_upgrade itself.

This should work fine for all source database versions starting from PgSQL v12.

Unfortunately, the storage format for aclitem changed in v16, so this would
need to be restricted to upgrades from v16 and newer.

Have we also changed the external format of aclitem any time since
v9.2, or are the changes just to storage?

If the external formats have been stable, we can still get reasonable
performance by dumping the data (2 min for 100M rows). Plus, dumping
the data would work for all supported source versions.

The worst case would still be quite bad, with 80+ min for the full
set of 4 billion LOs, but even that would be much better than the 3
days with the current way.

That being said, I
regularly hear about slow upgrades with many LOs, so I think it'd be
worthwhile to try to improve matters in v19.

Changing the LO export to dumping pg_largeobject_metadata content
instead of creating the LOs should be a nice small change confined to
pg_dump --binary-upgrade only so perhaps we could squeeze it in v18
still.

--
Hannu

#8 Nathan Bossart
nathandbossart@gmail.com
In reply to: Hannu Krosing (#7)
Re: Horribly slow pg_upgrade performance with many Large Objects

On Tue, Apr 08, 2025 at 09:35:24AM +0200, Hannu Krosing wrote:

On Tue, Apr 8, 2025 at 12:17 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

That being said, I
regularly hear about slow upgrades with many LOs, so I think it'd be
worthwhile to try to improve matters in v19.

Changing the LO export to dumping pg_largeobject_metadata content
instead of creating the LOs should be a nice small change confined to
pg_dump --binary-upgrade only so perhaps we could squeeze it in v18
still.

Feature freeze for v18 was ~4 hours ago, so unfortunately this is v19
material at this point.

--
nathan

#9 Hannu Krosing
hannu@tm.ee
In reply to: Nathan Bossart (#8)
Re: Horribly slow pg_upgrade performance with many Large Objects

On Tue, Apr 8, 2025 at 5:46 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Tue, Apr 08, 2025 at 09:35:24AM +0200, Hannu Krosing wrote:

On Tue, Apr 8, 2025 at 12:17 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

That being said, I
regularly hear about slow upgrades with many LOs, so I think it'd be
worthwhile to try to improve matters in v19.

Changing the LO export to dumping pg_largeobject_metadata content
instead of creating the LOs should be a nice small change confined to
pg_dump --binary-upgrade only so perhaps we could squeeze it in v18
still.

Feature freeze for v18 was ~4 hours ago, so unfortunately this is v19
material at this point.

Sure. But I actually think that this is something that should be
back-ported to at least all supported versions at some point, possibly
made dependent on some environment flag so that only people who
desperately need it will get it.

Btw, who would be the right person(s) to ask questions about the
internals of pg_dump? I have a few more things in the pipeline to add
there and would like to make sure that I have the right approach.

------
Hannu

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#8)
Re: Horribly slow pg_upgrade performance with many Large Objects

Nathan Bossart <nathandbossart@gmail.com> writes:

On Tue, Apr 08, 2025 at 09:35:24AM +0200, Hannu Krosing wrote:

Changing the LO export to dumping pg_largeobject_metadata content
instead of creating the LOs should be a nice small change confined to
pg_dump --binary-upgrade only so perhaps we could squeeze it in v18
still.

Feature freeze for v18 was ~4 hours ago, so unfortunately this is v19
material at this point.

Yeah, even if we had a patch in hand, it's too late for v18. However
there are additional problems with this idea:

1. The idea requires role OIDs to match across the upgrade.
I don't believe that pg_upgrade tries to preserve role OIDs --- and
doing so would be problematic, because what if the new cluster's
bootstrap superuser is named differently in the old and new clusters?

It might be possible to work around that with some casting to/from
regrole, but I don't think a simple COPY into pg_largeobject_metadata
will play along with that.
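
(For illustration only - the regrole round-trip alluded to might look
roughly like this on the dump side, a hypothetical sketch; regrole
only exists back to 9.5, and the load side would still need a staging
step since COPY by itself cannot apply casts. The `staging_lo_metadata`
table and `rolename` column are made-up names.)

```sql
-- Hypothetical: emit the owner symbolically so the target can resolve
-- it by name; the text form of lomacl already uses role names.
COPY (SELECT oid, lomowner::regrole, lomacl FROM pg_largeobject_metadata)
    TO STDOUT;
-- On the target, something like a staging table would be needed:
-- INSERT INTO pg_largeobject_metadata (oid, lomowner, lomacl)
--   SELECT oid, rolename::regrole::oid, lomacl FROM staging_lo_metadata;
```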

2. If you just do the equivalent of an INSERT or COPY into
pg_largeobject_metadata, you could create entries that look right,
but they are actually not right because there should be pg_shdepend
entries backing each ownership or permission reference (for non-pinned
roles) and there won't be.

I guess you could think of also manually inserting rows into
pg_shdepend, but (a) ugh and (b) the claimed speedup is kind
of vanishing into the distance at this point.

regards, tom lane

#11 Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#10)
Re: Horribly slow pg_upgrade performance with many Large Objects

I think we do preserve role OIDs. This is what the opening comment in
pg_upgrade says:

/*
* To simplify the upgrade process, we force certain system values to be
* identical between old and new clusters:
*
* We control all assignments of pg_class.oid (and relfilenode) so toast
* oids are the same between old and new clusters. This is important
* because toast oids are stored as toast pointers in user tables.
*
* While pg_class.oid and pg_class.relfilenode are initially the same in a
* cluster, they can diverge due to CLUSTER, REINDEX, or VACUUM FULL. We
* control assignments of pg_class.relfilenode because we want the filenames
* to match between the old and new cluster.
*
* We control assignment of pg_tablespace.oid because we want the oid to match
* between the old and new cluster.
*
* We control all assignments of pg_type.oid because these oids are stored
* in user composite type values.
*
* We control all assignments of pg_enum.oid because these oids are stored
* in user tables as enum values.
*
* We control all assignments of pg_authid.oid for historical reasons (the
* oids used to be stored in pg_largeobject_metadata, which is now copied via
* SQL commands), that might change at some point in the future.
*/


#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#11)
Re: Horribly slow pg_upgrade performance with many Large Objects

Hannu Krosing <hannuk@google.com> writes:

I think we do preserve role oids

Oh ... I'd been looking for mentions of "role" in
pg_upgrade_support.c, but what I should have looked for was
"pg_authid". So yeah, we do preserve role OIDs, and maybe that's
enough to make this workable, at least with source versions that
share the same rules for what goes into pg_largeobject_metadata and
pg_shdepend. It's not something I'd risk back-patching though.

regards, tom lane

#13 Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#12)
Re: Horribly slow pg_upgrade performance with many Large Objects

On Tue, Apr 08, 2025 at 12:37:43PM -0400, Tom Lane wrote:

Hannu Krosing <hannuk@google.com> writes:

I think we do preserve role oids

Oh ... I'd been looking for mentions of "role" in
pg_upgrade_support.c, but what I should have looked for was
"pg_authid". So yeah, we do preserve role OIDs, and maybe that's
enough to make this workable, at least with source versions that
share the same rules for what goes into pg_largeobject_metadata and
pg_shdepend. It's not something I'd risk back-patching though.

I do think it's worth considering going back to copying
pg_largeobject_metadata's files for upgrades from v16 and newer. That
sounds restrictive at the moment, but it'll mean that all but one supported
major version can copy the files during upgrade to v19. I'll admit I'm a
tad worried about having to go back to copying via SQL commands in the
future and re-regressing things (leading to unpredictable differences in
upgrade downtime), but I'm not sure that's a great reason to withhold this
optimization.

Of course, I wouldn't be opposed to optimizing the SQL command strategy,
too, but I suspect that won't compare to copying the files.

--
nathan

#14 Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#12)
Re: Horribly slow pg_upgrade performance with many Large Objects

On Tue, Apr 8, 2025 at 6:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hannu Krosing <hannuk@google.com> writes:

I think we do preserve role oids

Oh ... I'd been looking for mentions of "role" in
pg_upgrade_support.c, but what I should have looked for was
"pg_authid". So yeah, we do preserve role OIDs, and maybe that's
enough to make this workable, at least with source versions that
share the same rules for what goes into pg_largeobject_metadata and
pg_shdepend. It's not something I'd risk back-patching though.

The risk is why I suggest having this in backports as a last-resort
option gated on an environment variable.

I have been forced to do this half-manually a few times to make
pg_upgrade feasible at all.

The current, really UGH, workaround is to use pg_repack's support for
swapping relfilenodes to swap pg_largeobject_metadata with a user
table before and after the upgrade, and then manually patch up
leftovers like pg_shdepend after pg_upgrade.

These have fortunately been cases where there were no other
dependencies like comments or security labels on the LOs, but those
too should be doable.


#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#13)
Re: Horribly slow pg_upgrade performance with many Large Objects

Nathan Bossart <nathandbossart@gmail.com> writes:

I do think it's worth considering going back to copying
pg_largeobject_metadata's files for upgrades from v16 and newer.

(If we do this) I don't see why we'd need to stop at v16. I'm
envisioning that we'd use COPY, which will be dealing in the
text representation of aclitems, and I don't think that's changed
in a long time. The sort of thing that would break it is changes
in the set of available/default privilege bits for large objects.

That is, where the dump currently contains something like

SELECT pg_catalog.lo_create('2121');
ALTER LARGE OBJECT 2121 OWNER TO postgres;
GRANT ALL ON LARGE OBJECT 2121 TO joe;

we'd have

COPY pg_largeobject_metadata FROM STDIN;
...
2121 10 {postgres=rw/postgres,joe=rw/postgres}
...

and some appropriate COPY data for pg_shdepend too.

(The fact that this representation will contain both numeric and
symbolic role OIDs is why I was concerned about OID stability.)

The key thing that worries me here is if the source and target
versions have different ideas of which roles are pinned, which would
silently change what appears in pg_shdepend. But it'd only really
break if a role mentioned in some LO's owner or ACL is pinned in the
source and not in the target, which seems unlikely. (In the other
direction, we'd just be adding a useless row in pg_shdepend.)

regards, tom lane

#16 Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#15)
Re: Horribly slow pg_upgrade performance with many Large Objects

On Tue, Apr 8, 2025 at 7:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

I do think it's worth considering going back to copying
pg_largeobject_metadata's files for upgrades from v16 and newer.

(If we do this) I don't see why we'd need to stop at v16. I'm
envisioning that we'd use COPY, which will be dealing in the
text representation of aclitems, and I don't think that's changed
in a long time. The sort of thing that would break it is changes
in the set of available/default privilege bits for large objects.

That is, where the dump currently contains something like

SELECT pg_catalog.lo_create('2121');
ALTER LARGE OBJECT 2121 OWNER TO postgres;
GRANT ALL ON LARGE OBJECT 2121 TO joe;

Also note that in my --binary-upgrade tests the 100 min / 100M
objects ratio was for a case with no grants. I would expect this to
grow to at least 120 to 150 minutes when grants are also involved.

In the COPY case I would expect the presence of grants to not make much difference.

we'd have

COPY pg_largeobject_metadata FROM STDIN;
...
2121 10 {postgres=rw/postgres,joe=rw/postgres}
...

--
Hannu

#17 Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#15)
Re: Horribly slow pg_upgrade performance with many Large Objects

On Tue, Apr 08, 2025 at 01:07:09PM -0400, Tom Lane wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

I do think it's worth considering going back to copying
pg_largeobject_metadata's files for upgrades from v16 and newer.

(If we do this) I don't see why we'd need to stop at v16. I'm
envisioning that we'd use COPY, which will be dealing in the
text representation of aclitems, and I don't think that's changed
in a long time. The sort of thing that would break it is changes
in the set of available/default privilege bits for large objects.

I was thinking of actually reverting commit 12a53c7 for upgrades from v16,
which AFAICT is the last release where any relevant storage formats changed
(aclitem changed in v16). But if COPY gets us pretty close to that and is
less likely to be disrupted by future changes, it could be a better
long-term approach.

That is, where the dump currently contains something like

SELECT pg_catalog.lo_create('2121');
ALTER LARGE OBJECT 2121 OWNER TO postgres;
GRANT ALL ON LARGE OBJECT 2121 TO joe;

we'd have

COPY pg_largeobject_metadata FROM STDIN;
...
2121 10 {postgres=rw/postgres,joe=rw/postgres}
...

and some appropriate COPY data for pg_shdepend too.

Unless I'm missing something, we don't seem to have had any dependency
handling before commit 12a53c7. Was that broken before we moved to SQL
commands?

--
nathan

#18 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#16)
Re: Horribly slow pg_upgrade performance with many Large Objects

Hannu Krosing <hannuk@google.com> writes:

In copy case I would expect the presence of grants to not make much difference.

aclitemin is slower than a lot of other datatype input functions,
but it's still got to be faster than a GRANT.

regards, tom lane

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#16)
Re: Horribly slow pg_upgrade performance with many Large Objects

Hmm ... one annoying thing for this project is that AFAICS pg_upgrade
does *not* preserve database OIDs, which is problematic for using
COPY to load pg_shdepend rows.

regards, tom lane

#20 Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#19)
Re: Horribly slow pg_upgrade performance with many Large Objects

On Tue, Apr 08, 2025 at 01:36:58PM -0400, Tom Lane wrote:

Hmm ... one annoying thing for this project is that AFAICS pg_upgrade
does *not* preserve database OIDs, which is problematic for using
COPY to load pg_shdepend rows.

I think it does; see commit aa01051.

--
nathan

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#17)
#22 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#20)
#23 Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#21)
#24 Hannu Krosing
hannu@tm.ee
In reply to: Nathan Bossart (#23)
#25 Jan Wieck
JanWieck@Yahoo.com
In reply to: Hannu Krosing (#24)
#26 Nathan Bossart
nathandbossart@gmail.com
In reply to: Jan Wieck (#25)
#27 Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#22)
#28 Nathan Bossart
nathandbossart@gmail.com
In reply to: Hannu Krosing (#24)
#29 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#17)
#30 Hannu Krosing
hannu@tm.ee
In reply to: Nathan Bossart (#29)
#31 Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#15)
#32 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#31)
#33 Hannu Krosing
hannu@tm.ee
In reply to: Nathan Bossart (#32)
#34 Nathan Bossart
nathandbossart@gmail.com
In reply to: Hannu Krosing (#33)
#35 Hannu Krosing
hannu@tm.ee
In reply to: Nathan Bossart (#34)
#36 Nathan Bossart
nathandbossart@gmail.com
In reply to: Hannu Krosing (#35)
#37 Hannu Krosing
hannu@tm.ee
In reply to: Nathan Bossart (#36)
#38 Nitin Motiani
nitinmotiani@google.com
In reply to: Hannu Krosing (#37)
#39 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nitin Motiani (#38)
#40 Nitin Motiani
nitinmotiani@google.com
In reply to: Nathan Bossart (#39)
#41 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nitin Motiani (#40)
#42 Nitin Motiani
nitinmotiani@google.com
In reply to: Nathan Bossart (#41)
#43 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nitin Motiani (#42)
#44 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#43)
#45 Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#44)
#46 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#45)
#47 Hannu Krosing
hannu@tm.ee
In reply to: Nathan Bossart (#46)
#48 Nathan Bossart
nathandbossart@gmail.com
In reply to: Hannu Krosing (#47)