RE: pg_upgrade failing for 200+ million Large Objects
Hi all,
Attached is a proof-of-concept patch that allows Postgres to perform
pg_upgrade if the instance has millions of Large Objects.
It would be great if someone could take a look and see if this patch is in
the right direction. There are some pending tasks (such as documentation /
pg_resetxlog vs pg_resetwal related changes) but for now, the patch helps
remove a stalemate where, if a Postgres instance has a large number
(accurately speaking 146+ million) of Large Objects, pg_upgrade fails. This
is easily reproducible and, besides deleting Large Objects before upgrade,
there is no other (apparent) way for pg_upgrade to complete.
The patch (attached):
- Applies cleanly on REL9_6_STABLE -
c7a4fc3dd001646d5938687ad59ab84545d5d043
- 'make check' passes
- Allows the user to provide a constant via the pg_upgrade command line that
overrides the 2 billion constant in pg_resetxlog [1], thereby increasing the
window of Transaction IDs available for pg_upgrade to complete.
Sample argument for pg_upgrade:
$ /opt/postgres/96/bin/pg_upgrade --max-limit-xid 1000000000 --old-bindir
...
With this patch, pg_upgrade is now able to upgrade a v9.5 cluster with 500
million Large Objects successfully to v9.6 - some stats below:
Source Postgres - v9.5.24
Target Version - v9.6.21
Large Object Count: 500 Million Large Objects
Machine - r5.4xlarge (16vCPU / 128GB RAM + 1TB swap)
Memory used during pg_upgrade - ~350GB
Time taken - 25+ hrs. (tested twice) - (All LOs processed sequentially ->
Scope for optimization)
Although counter-intuitive, for this testing purpose all Large Objects were
small (essentially the idea was to test the count) and created by using
something like this:
seq 1 50000 | xargs -n 1 -i -P 10 /opt/postgres/95/bin/psql -c "select
lo_from_bytea(0, '\xffffff00') from generate_series(1,10000);" > /dev/null
I am not married to the patch (especially the argument name) but ideally I'd
prefer a way to get this upgrade going without a patch. For now, I am unable
to find any other way to upgrade a v9.5 Postgres database in this scenario,
facing End-of-Life.
Reference:
1) 2 Billion constant -
https://github.com/postgres/postgres/blob/ca3b37487be333a1d241dab1bbdd17a211a88f43/src/bin/pg_resetwal/pg_resetwal.c#L444
Thanks,
Robins Tharakan
-----Original Message-----
From: Tharakan, Robins
Sent: Wednesday, 3 March 2021 10:36 PM
To: pgsql-hackers@postgresql.org
Subject: pg_upgrade failing for 200+ million Large Objects
Hi,
While reviewing a failed upgrade from Postgres v9.5 (to v9.6) I saw that the
instance had ~200 million (in-use) Large Objects. I was able to reproduce
this on a test instance which too fails with a similar error:

pg_restore: executing BLOB 4980622
pg_restore: WARNING: database with OID 0 must be vacuumed within 1000001 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions.
pg_restore: executing BLOB 4980623
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2565; 2613 4980623 BLOB 4980623 postgres
pg_restore: [archiver (db)] could not execute query: ERROR: database is not
accepting commands to avoid wraparound data loss in database with OID 0
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.
Command was: SELECT pg_catalog.lo_create('4980623');

To remove the obvious possibilities: these Large Objects are still in-use
(so vacuumlo wouldn't help), giving more system resources doesn't help,
moving Large Objects around to another database doesn't help (since this is
a cluster-wide restriction), the source instance is nowhere close to
wraparound, and lastly the recent-most minor versions don't help either (I
tried compiling 9_6_STABLE and upgrading a database with 150 million LOs and
still encountered the same issue).

Do let me know if I am missing something obvious, but it appears that this
is happening owing to two things coming together:
* Each Large Object is migrated in its own transaction during pg_upgrade.
* pg_resetxlog appears to be narrowing the window (available for pg_upgrade)
to ~146 Million XIDs (2^31, minus the 1 million XID wraparound margin, minus
the 2 billion hard-coded constant - see [1] - in what appears to be an
attempt to force an Autovacuum Wraparound session soon after the upgrade
completes).

Ideally, such an XID-based restriction is limiting for an instance that's
actively using a lot of Large Objects. Besides forcing the AutoVacuum
Wraparound logic to kick in soon after, I am unclear what much else it aims
to do. What it does seem to be doing is to block major version upgrades if
the pre-upgrade instance has >146 Million Large Objects (half that, if each
LO additionally requires an ALTER LARGE OBJECT OWNER TO during pg_restore).

For the long term these ideas came to mind, although I am unsure which are
low-hanging fruit and which outright impossible - e.g. clubbing multiple
objects into a transaction [2], forcing AutoVacuum post upgrade (and thus
removing this limitation altogether), or seeing if "pg_resetxlog -x" (from
within pg_upgrade) could help in some way to work around this limitation.

Is there a short-term recommendation for this scenario?
I can understand that a high number of small-sized objects is not a great
way to use pg_largeobject (since Large Objects were intended to be for,
well, 'large objects'), but this magic number of Large Objects is now a
stalemate at this point (with respect to v9.5 EOL).

Reference:
1) pg_resetxlog -
https://github.com/postgres/postgres/blob/ca3b37487be333a1d241dab1bbdd17a211a88f43/src/bin/pg_resetwal/pg_resetwal.c#L444
2) /messages/by-id/ed7d86a1-b907-4f53-9f6e-63482d2f2bac%40manitou-mail.org
-
Thanks
Robins Tharakan
Attachments:
pgupgrade_lo_v2.patch
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 3e79482ca2..f2e9824cb5 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -67,6 +67,7 @@ static TransactionId set_xid = 0;
static TransactionId set_oldest_commit_ts_xid = 0;
static TransactionId set_newest_commit_ts_xid = 0;
static Oid set_oid = 0;
+static Oid max_limit_xid = 2000000000;
static MultiXactId set_mxid = 0;
static MultiXactOffset set_mxoff = (MultiXactOffset) -1;
static uint32 minXlogTli = 0;
@@ -116,7 +117,7 @@ main(int argc, char *argv[])
}
- while ((c = getopt(argc, argv, "c:D:e:fl:m:no:O:x:")) != -1)
+ while ((c = getopt(argc, argv, "c:D:e:fl:L:m:no:O:x:")) != -1)
{
switch (c)
{
@@ -210,6 +211,21 @@ main(int argc, char *argv[])
}
break;
+ case 'L':
+ max_limit_xid = strtoul(optarg, &endptr, 0);
+ if (endptr == optarg || *endptr != '\0')
+ {
+ fprintf(stderr, _("%s: invalid argument for option %s\n"), progname, "-L");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+ exit(1);
+ }
+ if (max_limit_xid <= 500000000)
+ {
+ fprintf(stderr, _("%s: Max Limit XID (-L) must not be less than 500 Million\n"), progname);
+ exit(1);
+ }
+ break;
+
case 'm':
set_mxid = strtoul(optarg, &endptr, 0);
if (endptr == optarg || *endptr != ',')
@@ -381,7 +397,7 @@ main(int argc, char *argv[])
* reasonably safe. The magic constant here corresponds to the
* maximum allowed value of autovacuum_freeze_max_age.
*/
- ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
+ ControlFile.checkPointCopy.oldestXid = set_xid - max_limit_xid;
if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
@@ -1239,6 +1255,7 @@ usage(void)
printf(_(" -e XIDEPOCH set next transaction ID epoch\n"));
printf(_(" -f force update to be done\n"));
printf(_(" -l XLOGFILE force minimum WAL starting location for new transaction log\n"));
+ printf(_(" -L MAXLIMITXID force max XID starting location for new transaction log\n"));
printf(_(" -m MXID,MXID set next and oldest multitransaction ID\n"));
printf(_(" -n no update, just show what would be done (for testing)\n"));
printf(_(" -o OID set next OID\n"));
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 7ab284a51b..3a861739f7 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -53,6 +53,7 @@ parseCommandLine(int argc, char *argv[])
{"link", no_argument, NULL, 'k'},
{"retain", no_argument, NULL, 'r'},
{"jobs", required_argument, NULL, 'j'},
+ {"max-limit-xid", required_argument, NULL, 'L'},
{"verbose", no_argument, NULL, 'v'},
{NULL, 0, NULL, 0}
};
@@ -64,6 +65,7 @@ parseCommandLine(int argc, char *argv[])
time_t run_time = time(NULL);
user_opts.transfer_mode = TRANSFER_MODE_COPY;
+ user_opts.maxlimitxid = 2000000000;
os_info.progname = get_progname(argv[0]);
@@ -101,7 +103,7 @@ parseCommandLine(int argc, char *argv[])
if ((log_opts.internal = fopen_priv(INTERNAL_LOG_FILE, "a")) == NULL)
pg_fatal("cannot write to log file %s\n", INTERNAL_LOG_FILE);
- while ((option = getopt_long(argc, argv, "d:D:b:B:cj:ko:O:p:P:rU:v",
+ while ((option = getopt_long(argc, argv, "d:D:b:B:cj:ko:L:O:p:P:rU:v",
long_options, &optindex)) != -1)
{
switch (option)
@@ -132,6 +134,10 @@ parseCommandLine(int argc, char *argv[])
user_opts.jobs = atoi(optarg);
break;
+ case 'L':
+ user_opts.maxlimitxid = atoi(optarg);
+ break;
+
case 'k':
user_opts.transfer_mode = TRANSFER_MODE_LINK;
break;
@@ -287,6 +293,7 @@ Options:\n\
-D, --new-datadir=DATADIR new cluster data directory\n\
-j, --jobs=NUM number of simultaneous processes or threads to use\n\
-k, --link link instead of copying files to new cluster\n\
+ -L, --max-limit-xid=NUM max-limit XIDs to consider\n\
-o, --old-options=OPTIONS old cluster options to pass to the server\n\
-O, --new-options=OPTIONS new cluster options to pass to the server\n\
-p, --old-port=PORT old cluster port number (default %d)\n\
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 02078c0357..2d0f3a7e04 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -411,8 +411,10 @@ copy_clog_xlog_xid(void)
/* set the next transaction id and epoch of the new cluster */
prep_status("Setting next transaction ID and epoch for new cluster");
exec_prog(UTILITY_LOG_FILE, NULL, true,
- "\"%s/pg_resetxlog\" -f -x %u \"%s\"",
- new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
+ "\"%s/pg_resetxlog\" -L %u -f -x %u \"%s\"",
+ new_cluster.bindir,
+ user_opts.maxlimitxid,
+ old_cluster.controldata.chkpnt_nxtxid,
new_cluster.pgdata);
exec_prog(UTILITY_LOG_FILE, NULL, true,
"\"%s/pg_resetxlog\" -f -e %u \"%s\"",
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 9fbdacc53e..50fe73ae09 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -298,6 +298,7 @@ typedef struct
* changes */
transferMode transfer_mode; /* copy files or link them? */
int jobs;
+ int maxlimitxid;
} UserOpts;
On 7 Mar 2021, at 09:43, Tharakan, Robins <tharar@amazon.com> wrote:
The patch (attached):
- Applies cleanly on REL9_6_STABLE -
c7a4fc3dd001646d5938687ad59ab84545d5d043
Did you target 9.6 because that's where you want to upgrade to, or is this not
a problem on HEAD? If it's still a problem on HEAD you should probably submit
the patch against there. You probably also want to add it to the next commit
fest to make sure it's not forgotten about: https://commitfest.postgresql.org/33/
I am not married to the patch (especially the argument name) but ideally I'd
prefer a way to get this upgrade going without a patch. For now, I am unable
to find any other way to upgrade a v9.5 Postgres database in this scenario,
facing End-of-Life.
It's obviously not my call to make in any shape or form, but this doesn't
really seem to fall under what is generally backported into a stable release?
--
Daniel Gustafsson https://vmware.com/
On 07.03.21 09:43, Tharakan, Robins wrote:
Attached is a proof-of-concept patch that allows Postgres to perform
pg_upgrade if the instance has Millions of objects.
It would be great if someone could take a look and see if this patch is in
the right direction. There are some pending tasks (such as documentation /
pg_resetxlog vs pg_resetwal related changes) but for now, the patch helps
remove a stalemate where if a Postgres instance has a large number
(accurately speaking 146+ Million) of Large Objects, pg_upgrade fails. This
is easily reproducible and besides deleting Large Objects before upgrade,
there is no other (apparent) way for pg_upgrade to complete.The patch (attached):
- Applies cleanly on REL9_6_STABLE -
c7a4fc3dd001646d5938687ad59ab84545d5d043
- 'make check' passes
- Allows the user to provide a constant via pg_upgrade command-line, that
overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the
(window of) Transaction IDs available for pg_upgrade to complete.
Could you explain what your analysis of the problem is and why this
patch (might) fix it?
Right now, all I see here is, pass a big number via a command-line
option and hope it works.
Thanks Daniel for the input / next-steps.
I see that 'master' too has this same magic constant [1] and so I expect it
to have similar restrictions, although I haven't tested this yet.
I do agree that the need then is to re-submit a patch that works with
'master'. (I am travelling the next few days but) Unless discussions go
tangential, I expect to come back with an updated patch by the end of this week
and create a commitfest entry while at it.
Reference:
1)
https://github.com/postgres/postgres/blob/master/src/bin/pg_resetwal/pg_resetwal.c#L444
-
Robins Tharakan
-----Original Message-----
From: Daniel Gustafsson <daniel@yesql.se>
Sent: Monday, 8 March 2021 9:42 AM
To: Tharakan, Robins <tharar@amazon.com>
Cc: pgsql-hackers@postgresql.org
Subject: RE: [EXTERNAL] pg_upgrade failing for 200+ million Large Objects
On 7 Mar 2021, at 09:43, Tharakan, Robins <tharar@amazon.com> wrote:
The patch (attached):
- Applies cleanly on REL9_6_STABLE -
c7a4fc3dd001646d5938687ad59ab84545d5d043
Did you target 9.6 because that's where you want to upgrade to, or is
this not a problem on HEAD? If it's still a problem on HEAD you should
probably submit the patch against there. You probably also want to add
it to the next commit fest to make sure it's not forgotten about:
https://commitfest.postgresql.org/33/
I am not married to the patch (especially the argument name) but
ideally I'd prefer a way to get this upgrade going without a patch.
For now, I am unable to find any other way to upgrade a v9.5 Postgres
database in this scenario, facing End-of-Life.
It's obviously not my call to make in any shape or form, but this doesn't
really seem to fall under what is generally backported into a stable
release?
--
Daniel Gustafsson https://vmware.com/
Thanks Peter.
The original email [1] had some more context that somehow didn't get
associated with this recent email. Apologies for any confusion.
In short, pg_resetxlog (and pg_resetwal) employs a magic constant [2] (for
both v9.6 as well as master) which seems to have been selected to force an
aggressive autovacuum as soon as the upgrade completes. Although that works
as planned, it narrows the window of Transaction IDs available for the
upgrade (before which XID wraparound protection kicks in and aborts the
upgrade) to 146 Million.
Reducing this magic constant allows a larger XID window, which is what the
patch is trying to do. With the patch, I was able to upgrade a cluster with
500m Large Objects successfully (which otherwise reliably fails). In the
original email [1] I had also listed a few other possible workarounds, but
was unsure which would be a good direction to start working on.... thus this
patch to make a start.
Reference:
1) /messages/by-id/12601596dbbc4c01b86b4ac4d2bd4d48@EX13D05UWC001.ant.amazon.com
2) https://github.com/postgres/postgres/blob/master/src/bin/pg_resetwal/pg_resetwal.c#L444
-
robins | tharar@ | syd12
-----Original Message-----
From: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Sent: Monday, 8 March 2021 9:25 PM
To: Tharakan, Robins <tharar@amazon.com>; pgsql-hackers@postgresql.org
Subject: [EXTERNAL] [UNVERIFIED SENDER] Re: pg_upgrade failing for 200+
million Large Objects
On 07.03.21 09:43, Tharakan, Robins wrote:
Attached is a proof-of-concept patch that allows Postgres to perform
pg_upgrade if the instance has Millions of objects.
It would be great if someone could take a look and see if this patch
is in the right direction. There are some pending tasks (such as
documentation / pg_resetxlog vs pg_resetwal related changes) but for
now, the patch helps remove a stalemate where if a Postgres instance
has a large number (accurately speaking 146+ Million) of Large
Objects, pg_upgrade fails. This is easily reproducible and besides
deleting Large Objects before upgrade, there is no other (apparent) way
for pg_upgrade to complete.
The patch (attached):
- Applies cleanly on REL9_6_STABLE -
c7a4fc3dd001646d5938687ad59ab84545d5d043
- 'make check' passes
- Allows the user to provide a constant via pg_upgrade command-line,
that overrides the 2 billion constant in pg_resetxlog [1] thereby
increasing the (window of) Transaction IDs available for pg_upgrade to
complete.
Could you explain what your analysis of the problem is and why this patch
(might) fix it?
Right now, all I see here is, pass a big number via a command-line option
and hope it works.
On Mon, Mar 8, 2021 at 12:02 PM Tharakan, Robins <tharar@amazon.com> wrote:
Thanks Peter.
The original email [1] had some more context that somehow didn't get
associated with this recent email. Apologies for any confusion.
Please take a look at your email configuration -- all your emails are
lacking both References and In-reply-to headers, so every email starts
a new thread, both for each reader and in the archives. It seems quite
broken. It makes it very hard to follow.
In short, pg_resetxlog (and pg_resetwal) employs a magic constant [2] (for
both v9.6 as well as master) which seems to have been selected to force an
aggressive autovacuum as soon as the upgrade completes. Although that works
as planned, it narrows the window of Transaction IDs available for the
upgrade (before which XID wraparound protection kicks in and aborts the
upgrade) to 146 Million.
Reducing this magic constant allows a larger XID window, which is what the
patch is trying to do. With the patch, I was able to upgrade a cluster with
500m Large Objects successfully (which otherwise reliably fails). In the
original email [1] I had also listed a few other possible workarounds, but
was unsure which would be a good direction to start working on.... thus this
patch to make a start.
This still seems to just fix the symptoms and not the actual problem.
What part of the pg_upgrade process is it that actually burns through
that many transactions?
Without looking, I would guess it's the schema reload using
pg_dump/pg_restore and not actually pg_upgrade itself. This is a known
issue in pg_dump/pg_restore. And if that is the case -- perhaps just
running all of those in a single transaction would be a better choice?
One could argue it's still not a proper fix, because we'd still have a
huge memory usage etc, but it would then only burn 1 xid instead of
500M...
AFAICT at a quick check, pg_dump in binary upgrade mode emits one
lo_create() and one ALTER ... OWNER TO for each large object - so with
500M large objects that would be a billion statements, and thus a
billion xids. And without checking, I'm fairly sure it doesn't load in
a single transaction...
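For illustration, the per-blob part of such a restore script would look
roughly like the following (the OID and role name are made up here, not
taken from the thread); outside an explicit transaction each statement
consumes one XID:

-- sketch: statements emitted once per large object in binary-upgrade mode
SELECT pg_catalog.lo_create('16385');
ALTER LARGE OBJECT 16385 OWNER TO app_owner;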
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Hi Magnus,
On Mon, 8 Mar 2021 at 23:34, Magnus Hagander <magnus@hagander.net> wrote:
AFAICT at a quick check, pg_dump in binary upgrade mode emits one
lo_create() and one ALTER ... OWNER TO for each large object - so with
500M large objects that would be a billion statements, and thus a
billion xids. And without checking, I'm fairly sure it doesn't load in
a single transaction...
Your assumptions are pretty much correct.
The issue isn't with pg_upgrade itself. During pg_restore, each Large
Object (and separately each ALTER LARGE OBJECT OWNER TO) consumes an XID.
For background, that's the reason the v9.5 production instance I was
reviewing was unable to process more than 73 Million large objects, since
each object required a CREATE + ALTER. (To clarify, 73 million = (2^31 - 2
billion magic constant - 1 Million wraparound protection) / 2)
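That arithmetic can be sanity-checked with a quick back-of-the-envelope
query (purely illustrative, not part of any patch):

-- XID window left after pg_resetxlog, and the LO ceiling when each object
-- needs a CREATE plus an ALTER
SELECT (2^31 - 2000000000 - 1000000)     AS xid_window,   -- ~146.5 million
       (2^31 - 2000000000 - 1000000) / 2 AS lo_ceiling;   -- ~73.2 million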
Without looking, I would guess it's the schema reload using
pg_dump/pg_restore and not actually pg_upgrade itself. This is a known
issue in pg_dump/pg_restore. And if that is the case -- perhaps just
running all of those in a single transaction would be a better choice?
One could argue it's still not a proper fix, because we'd still have a
huge memory usage etc, but it would then only burn 1 xid instead of
500M...
(I hope I am not missing something but) When I tried to force pg_restore to
use a single transaction (by hacking pg_upgrade's pg_restore call to use
--single-transaction), it too failed owing to being unable to lock so many
objects in a single transaction.
This still seems to just fix the symptoms and not the actual problem.
I agree that the patch doesn't address the root-cause, but it did get the
upgrade to complete on a test-setup. Do you think that (instead of all
objects) batching multiple Large Objects in a single transaction (and
allowing the caller to size that batch via command line) would be a good /
acceptable idea here?
Please take a look at your email configuration -- all your emails are
lacking both References and In-reply-to headers.
Thanks for highlighting the cause here. Hopefully switching mail clients
would help.
-
Robins Tharakan
Robins Tharakan <tharakan@gmail.com> writes:
On Mon, 8 Mar 2021 at 23:34, Magnus Hagander <magnus@hagander.net> wrote:
Without looking, I would guess it's the schema reload using
pg_dump/pg_restore and not actually pg_upgrade itself. This is a known
issue in pg_dump/pg_restore. And if that is the case -- perhaps just
running all of those in a single transaction would be a better choice?
One could argue it's still not a proper fix, because we'd still have a
huge memory usage etc, but it would then only burn 1 xid instead of
500M...
(I hope I am not missing something but) When I tried to force pg_restore to
use a single transaction (by hacking pg_upgrade's pg_restore call to use
--single-transaction), it too failed owing to being unable to lock so many
objects in a single transaction.
It does seem that --single-transaction is a better idea than fiddling with
the transaction wraparound parameters, since the latter is just going to
put off the onset of trouble. However, we'd have to do something about
the lock consumption. Would it be sane to have the backend not bother to
take any locks in binary-upgrade mode?
regards, tom lane
On Mon, Mar 8, 2021 at 5:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robins Tharakan <tharakan@gmail.com> writes:
On Mon, 8 Mar 2021 at 23:34, Magnus Hagander <magnus@hagander.net> wrote:
Without looking, I would guess it's the schema reload using
pg_dump/pg_restore and not actually pg_upgrade itself. This is a known
issue in pg_dump/pg_restore. And if that is the case -- perhaps just
running all of those in a single transaction would be a better choice?
One could argue it's still not a proper fix, because we'd still have a
huge memory usage etc, but it would then only burn 1 xid instead of
500M...
(I hope I am not missing something but) When I tried to force pg_restore to
use a single transaction (by hacking pg_upgrade's pg_restore call to use
--single-transaction), it too failed owing to being unable to lock so many
objects in a single transaction.
It does seem that --single-transaction is a better idea than fiddling with
the transaction wraparound parameters, since the latter is just going to
put off the onset of trouble. However, we'd have to do something about
the lock consumption. Would it be sane to have the backend not bother to
take any locks in binary-upgrade mode?
I believe the problem occurs when writing them rather than when
reading them, and I don't think we have a binary upgrade mode there.
We could invent one of course. Another option might be to exclusively
lock pg_largeobject, and just say that if you do that, we don't have
to lock the individual objects (ever)?
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
On Mon, Mar 8, 2021 at 5:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It does seem that --single-transaction is a better idea than fiddling with
the transaction wraparound parameters, since the latter is just going to
put off the onset of trouble. However, we'd have to do something about
the lock consumption. Would it be sane to have the backend not bother to
take any locks in binary-upgrade mode?
I believe the problem occurs when writing them rather than when
reading them, and I don't think we have a binary upgrade mode there.
You're confusing pg_dump's --binary-upgrade switch (indeed applied on
the dumping side) with the backend's -b switch (IsBinaryUpgrade,
applied on the restoring side).
We could invent one of course. Another option might be to exclusively
lock pg_largeobject, and just say that if you do that, we don't have
to lock the individual objects (ever)?
What was in the back of my mind is that we've sometimes seen complaints
about too many locks needed to dump or restore a database with $MANY
tables; so the large-object case seems like just a special case.
The answer up to now has been "raise max_locks_per_transaction enough
so you don't see the failure". Having now consumed a little more
caffeine, I remember that that works in pg_upgrade scenarios too,
since the user can fiddle with the target cluster's postgresql.conf
before starting pg_upgrade.
So it seems like the path of least resistance is
(a) make pg_upgrade use --single-transaction when calling pg_restore
(b) document (better) how to get around too-many-locks failures.
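As a rough illustration of (b) -- the value below is only a placeholder, and
editing the target cluster's postgresql.conf directly works just as well:

ALTER SYSTEM SET max_locks_per_transaction = 10000;  -- needs a restart to take effect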
regards, tom lane
On Mon, Mar 8, 2021 at 5:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
On Mon, Mar 8, 2021 at 5:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It does seem that --single-transaction is a better idea than fiddling with
the transaction wraparound parameters, since the latter is just going to
put off the onset of trouble. However, we'd have to do something about
the lock consumption. Would it be sane to have the backend not bother to
take any locks in binary-upgrade mode?
I believe the problem occurs when writing them rather than when
reading them, and I don't think we have a binary upgrade mode there.
You're confusing pg_dump's --binary-upgrade switch (indeed applied on
the dumping side) with the backend's -b switch (IsBinaryUpgrade,
applied on the restoring side).
Ah. Yes, I am.
We could invent one of course. Another option might be to exclusively
lock pg_largeobject, and just say that if you do that, we don't have
to lock the individual objects (ever)?
What was in the back of my mind is that we've sometimes seen complaints
about too many locks needed to dump or restore a database with $MANY
tables; so the large-object case seems like just a special case.
It is -- but I guess it's more likely to have 100M large objects than
to have 100M tables. (and the cutoff point comes a lot earlier than
100M). But the fundamental one is the same.
The answer up to now has been "raise max_locks_per_transaction enough
so you don't see the failure". Having now consumed a little more
caffeine, I remember that that works in pg_upgrade scenarios too,
since the user can fiddle with the target cluster's postgresql.conf
before starting pg_upgrade.
So it seems like the path of least resistance is
(a) make pg_upgrade use --single-transaction when calling pg_restore
(b) document (better) how to get around too-many-locks failures.
Agreed. Certainly seems like a better path forward than arbitrarily
pushing the limit on number of transactions which just postpones the
problem.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On 3/8/21 11:58 AM, Tom Lane wrote:
The answer up to now has been "raise max_locks_per_transaction enough
so you don't see the failure". Having now consumed a little more
caffeine, I remember that that works in pg_upgrade scenarios too,
since the user can fiddle with the target cluster's postgresql.conf
before starting pg_upgrade.
So it seems like the path of least resistance is
(a) make pg_upgrade use --single-transaction when calling pg_restore
(b) document (better) how to get around too-many-locks failures.
That would first require fixing how pg_upgrade is creating the
databases. It uses "pg_restore --create", which is mutually exclusive
with --single-transaction because we cannot create a database inside of
a transaction. On the way pg_upgrade also mangles the pg_database.datdba
(all databases are owned by postgres after an upgrade; will submit a
separate patch for that as I consider that a bug by itself).
All that aside, the entire approach doesn't scale.
In a hacked-up pg_upgrade that does "createdb" first before calling
pg_restore with --single-transaction, I can upgrade 1M large objects with
max_locks_per_transaction = 5300
max_connections = 100
which contradicts the docs. Need to find out where that math went off
the rails because that config should only have room for 530,000 locks,
not 1M. The same test fails with max_locks_per_transaction = 5200.
But this would mean that one has to modify the postgresql.conf to
something like 530,000 max_locks_per_transaction at 100 max_connections
in order to actually run a successful upgrade of 100M large objects.
This config requires 26GB of memory just for locks. Add to that the
memory pg_restore needs to load the entire TOC before even restoring a
single object.
Not going to work. But tests are still ongoing ...
Regards, Jan
--
Jan Wieck
Principle Database Engineer
Amazon Web Services
On 3/20/21 12:39 AM, Jan Wieck wrote:
On 3/8/21 11:58 AM, Tom Lane wrote:
The answer up to now has been "raise max_locks_per_transaction enough
so you don't see the failure". Having now consumed a little more
caffeine, I remember that that works in pg_upgrade scenarios too,
since the user can fiddle with the target cluster's postgresql.conf
before starting pg_upgrade.
So it seems like the path of least resistance is
(a) make pg_upgrade use --single-transaction when calling pg_restore
(b) document (better) how to get around too-many-locks failures.
That would first require fixing how pg_upgrade is creating the
databases. It uses "pg_restore --create", which is mutually exclusive
with --single-transaction because we cannot create a database inside
of a transaction. On the way pg_upgrade also mangles the
pg_database.datdba (all databases are owned by postgres after an
upgrade; will submit a separate patch for that as I consider that a
bug by itself).
All that aside, the entire approach doesn't scale.
In a hacked-up pg_upgrade that does "createdb" first before calling
pg_restore with --single-transaction, I can upgrade 1M large objects with
max_locks_per_transaction = 5300
max_connections = 100
which contradicts the docs. Need to find out where that math went off
the rails because that config should only have room for 530,000 locks,
not 1M. The same test fails with max_locks_per_transaction = 5200.
But this would mean that one has to modify the postgresql.conf to
something like 530,000 max_locks_per_transaction at 100
max_connections in order to actually run a successful upgrade of 100M
large objects. This config requires 26GB of memory just for locks. Add
to that the memory pg_restore needs to load the entire TOC before even
restoring a single object.
Not going to work. But tests are still ongoing ...
I thought Tom's suggestion upthread:
Would it be sane to have the backend not bother to
take any locks in binary-upgrade mode?
was interesting. Could we do that on the restore side? After all, what
are we locking against in binary upgrade mode?
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Jan Wieck <jan@wi3ck.info> writes:
On 3/8/21 11:58 AM, Tom Lane wrote:
So it seems like the path of least resistance is
(a) make pg_upgrade use --single-transaction when calling pg_restore
(b) document (better) how to get around too-many-locks failures.
That would first require to fix how pg_upgrade is creating the
databases. It uses "pg_restore --create", which is mutually exclusive
with --single-transaction because we cannot create a database inside of
a transaction.
Ugh.
All that aside, the entire approach doesn't scale.
Yeah, agreed. When we gave large objects individual ownership and ACL
info, it was argued that pg_dump could afford to treat each one as a
separate TOC entry because "you wouldn't have that many of them, if
they're large". The limits of that approach were obvious even at the
time, and I think now we're starting to see people for whom it really
doesn't work.
I wonder if pg_dump could improve matters cheaply by aggregating the
large objects by owner and ACL contents. That is, do
select distinct lomowner, lomacl from pg_largeobject_metadata;
and make just *one* BLOB TOC entry for each result. Then dump out
all the matching blobs under that heading.
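A minimal sketch of that aggregation, assuming an array of member OIDs per
owner/ACL pair is enough to drive the dump loop (the regrole cast is only
for readability):

SELECT lomowner::pg_catalog.regrole AS owner, lomacl, array_agg(oid) AS blob_oids
FROM pg_catalog.pg_largeobject_metadata
GROUP BY lomowner, lomacl;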
A possible objection is that it'd reduce the ability to restore blobs
selectively, so maybe we'd need to make it optional.
Of course, that just reduces the memory consumption on the client
side; it does nothing for the locks. Can we get away with releasing the
lock immediately after doing an ALTER OWNER or GRANT/REVOKE on a blob?
regards, tom lane
On Sat, Mar 20, 2021 at 11:23:19AM -0400, Tom Lane wrote:
I wonder if pg_dump could improve matters cheaply by aggregating the
large objects by owner and ACL contents. That is, do
select distinct lomowner, lomacl from pg_largeobject_metadata;
and make just *one* BLOB TOC entry for each result. Then dump out
all the matching blobs under that heading.
A possible objection is that it'd reduce the ability to restore blobs
selectively, so maybe we'd need to make it optional.
Of course, that just reduces the memory consumption on the client
side; it does nothing for the locks. Can we get away with releasing the
lock immediately after doing an ALTER OWNER or GRANT/REVOKE on a blob?
Well, in pg_upgrade mode you can, since there are no other cluster
users, but you might be asking for general pg_dump usage.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
Bruce Momjian <bruce@momjian.us> writes:
On Sat, Mar 20, 2021 at 11:23:19AM -0400, Tom Lane wrote:
Of course, that just reduces the memory consumption on the client
side; it does nothing for the locks. Can we get away with releasing the
lock immediately after doing an ALTER OWNER or GRANT/REVOKE on a blob?
Well, in pg_upgrade mode you can, since there are no other cluster
users, but you might be asking for general pg_dump usage.
Yeah, this problem doesn't only affect pg_upgrade scenarios, so it'd
really be better to find a way that isn't dependent on binary-upgrade
mode.
regards, tom lane
On 3/20/21 11:23 AM, Tom Lane wrote:
Jan Wieck <jan@wi3ck.info> writes:
All that aside, the entire approach doesn't scale.
Yeah, agreed. When we gave large objects individual ownership and ACL
info, it was argued that pg_dump could afford to treat each one as a
separate TOC entry because "you wouldn't have that many of them, if
they're large". The limits of that approach were obvious even at the
time, and I think now we're starting to see people for whom it really
doesn't work.
It actually looks more like some users have millions of "small objects".
I am still wondering where that is coming from and why they are abusing
LOs in that way, but that is more out of curiosity. Fact is that they
are out there and that they cannot upgrade from their 9.5 databases,
which are now past EOL.
I wonder if pg_dump could improve matters cheaply by aggregating the
large objects by owner and ACL contents. That is, do
select distinct lomowner, lomacl from pg_largeobject_metadata;
and make just *one* BLOB TOC entry for each result. Then dump out
all the matching blobs under that heading.
What I am currently experimenting with is moving the BLOB TOC entries
into the parallel data phase of pg_restore "when doing binary upgrade".
It seems to scale nicely with the number of cores in the system. In
addition to that, I have options for pg_upgrade and pg_restore that cause
the restore to batch them into transactions, like 10,000 objects at a
time. There was a separate thread for that but I guess it is better to
keep it all together here now.
A possible objection is that it'd reduce the ability to restore blobs
selectively, so maybe we'd need to make it optional.
I fully intend to make all this into new "options". I am afraid that
there is no one-size-fits-all solution here.
Of course, that just reduces the memory consumption on the client
side; it does nothing for the locks. Can we get away with releasing the
lock immediately after doing an ALTER OWNER or GRANT/REVOKE on a blob?
I'm not very fond of the idea going lockless when at the same time
trying to parallelize the restore phase. That can lead to really nasty
race conditions. For now I'm aiming at batches in transactions.
Regards, Jan
--
Jan Wieck
Principle Database Engineer
Amazon Web Services
On 3/20/21 12:55 PM, Jan Wieck wrote:
On 3/20/21 11:23 AM, Tom Lane wrote:
Jan Wieck <jan@wi3ck.info> writes:
All that aside, the entire approach doesn't scale.
Yeah, agreed. When we gave large objects individual ownership and ACL
info, it was argued that pg_dump could afford to treat each one as a
separate TOC entry because "you wouldn't have that many of them, if
they're large". The limits of that approach were obvious even at the
time, and I think now we're starting to see people for whom it really
doesn't work.
It actually looks more like some users have millions of "small
objects". I am still wondering where that is coming from and why they
are abusing LOs in that way, but that is more out of curiosity. Fact
is that they are out there and that they cannot upgrade from their 9.5
databases, which are now past EOL.
One possible (probable?) source is the JDBC driver, which currently
treats all Blobs (and Clobs, for that matter) as LOs. I'm working on
improving that some: <https://github.com/pgjdbc/pgjdbc/pull/2093>
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 3/20/21 12:39 AM, Jan Wieck wrote:
On the way pg_upgrade also mangles the pg_database.datdba
(all databases are owned by postgres after an upgrade; will submit a
separate patch for that as I consider that a bug by itself).
Patch attached.
Regards, Jan
--
Jan Wieck
Principle Database Engineer
Amazon Web Services
Attachments:
pg_upgrade-preserve-datdba.v1.diff
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 5d9a26c..38f7202 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -344,6 +344,7 @@ get_db_infos(ClusterInfo *cluster)
DbInfo *dbinfos;
int i_datname,
i_oid,
+ i_datdba,
i_encoding,
i_datcollate,
i_datctype,
@@ -351,9 +352,12 @@ get_db_infos(ClusterInfo *cluster)
char query[QUERY_ALLOC];
snprintf(query, sizeof(query),
- "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, "
+ "SELECT d.oid, d.datname, u.rolname, d.encoding, "
+ "d.datcollate, d.datctype, "
"%s AS spclocation "
"FROM pg_catalog.pg_database d "
+ " JOIN pg_catalog.pg_authid u "
+ " ON d.datdba = u.oid "
" LEFT OUTER JOIN pg_catalog.pg_tablespace t "
" ON d.dattablespace = t.oid "
"WHERE d.datallowconn = true "
@@ -367,6 +371,7 @@ get_db_infos(ClusterInfo *cluster)
i_oid = PQfnumber(res, "oid");
i_datname = PQfnumber(res, "datname");
+ i_datdba = PQfnumber(res, "rolname");
i_encoding = PQfnumber(res, "encoding");
i_datcollate = PQfnumber(res, "datcollate");
i_datctype = PQfnumber(res, "datctype");
@@ -379,6 +384,7 @@ get_db_infos(ClusterInfo *cluster)
{
dbinfos[tupnum].db_oid = atooid(PQgetvalue(res, tupnum, i_oid));
dbinfos[tupnum].db_name = pg_strdup(PQgetvalue(res, tupnum, i_datname));
+ dbinfos[tupnum].db_owner = pg_strdup(PQgetvalue(res, tupnum, i_datdba));
dbinfos[tupnum].db_encoding = atoi(PQgetvalue(res, tupnum, i_encoding));
dbinfos[tupnum].db_collate = pg_strdup(PQgetvalue(res, tupnum, i_datcollate));
dbinfos[tupnum].db_ctype = pg_strdup(PQgetvalue(res, tupnum, i_datctype));
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index e23b8ca..8fd9a13 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -378,18 +378,36 @@ create_new_objects(void)
* propagate its database-level properties.
*/
if (strcmp(old_db->db_name, "postgres") == 0)
- create_opts = "--clean --create";
+ {
+ parallel_exec_prog(log_file_name,
+ NULL,
+ "\"%s/pg_restore\" %s --exit-on-error "
+ "--verbose --clean --create "
+ "--dbname template1 \"%s\"",
+ new_cluster.bindir,
+ cluster_conn_opts(&new_cluster),
+ sql_file_name);
+ }
else
- create_opts = "--create";
-
- parallel_exec_prog(log_file_name,
- NULL,
- "\"%s/pg_restore\" %s %s --exit-on-error --verbose "
- "--dbname template1 \"%s\"",
- new_cluster.bindir,
- cluster_conn_opts(&new_cluster),
- create_opts,
- sql_file_name);
+ {
+ exec_prog(log_file_name, NULL, true, true,
+ "\"%s/createdb\" -O \"%s\" %s \"%s\"",
+ new_cluster.bindir,
+ old_db->db_owner,
+ cluster_conn_opts(&new_cluster),
+ old_db->db_name);
+ parallel_exec_prog(log_file_name,
+ NULL,
+ "\"%s/pg_restore\" %s --exit-on-error "
+ "--verbose "
+ "--dbname \"%s\" \"%s\"",
+ new_cluster.bindir,
+ cluster_conn_opts(&new_cluster),
+ old_db->db_name,
+ sql_file_name);
+ }
+
+
}
/* reap all children */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a784..a3cda97 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -177,6 +177,7 @@ typedef struct
{
Oid db_oid; /* oid of the database */
char *db_name; /* database name */
+ char *db_owner; /* database owner */
char db_tablespace[MAXPGPATH]; /* database default tablespace
* path */
char *db_collate;
On 3/21/21 7:47 AM, Andrew Dunstan wrote:
One possible (probable?) source is the JDBC driver, which currently
treats all Blobs (and Clobs, for that matter) as LOs. I'm working on
improving that some: <https://github.com/pgjdbc/pgjdbc/pull/2093>
You mean the user is using OID columns pointing to large objects and the
JDBC driver is mapping those for streaming operations?
Yeah, that would explain a lot.
Thanks, Jan
--
Jan Wieck
Principle Database Engineer
Amazon Web Services
Jan Wieck <jan@wi3ck.info> writes:
On 3/20/21 12:39 AM, Jan Wieck wrote:
On the way pg_upgrade also mangles the pg_database.datdba
(all databases are owned by postgres after an upgrade; will submit a
separate patch for that as I consider that a bug by itself).
Patch attached.
Hmm, doesn't this lose all *other* database-level properties?
I think maybe what we have here is a bug in pg_restore, its
--create switch ought to be trying to update the database's
ownership.
regards, tom lane
On 3/21/21 12:57 PM, Tom Lane wrote:
Jan Wieck <jan@wi3ck.info> writes:
On 3/20/21 12:39 AM, Jan Wieck wrote:
On the way pg_upgrade also mangles the pg_database.datdba
(all databases are owned by postgres after an upgrade; will submit a
separate patch for that as I consider that a bug by itself).
Patch attached.
Hmm, doesn't this lose all *other* database-level properties?
I think maybe what we have here is a bug in pg_restore, its
--create switch ought to be trying to update the database's
ownership.
Possibly. I didn't look into that route.
Regards, Jan
--
Jan Wieck
Principle Database Engineer
Amazon Web Services
On 3/21/21 1:15 PM, Jan Wieck wrote:
On 3/21/21 12:57 PM, Tom Lane wrote:
Jan Wieck <jan@wi3ck.info> writes:
On 3/20/21 12:39 AM, Jan Wieck wrote:
On the way pg_upgrade also mangles the pg_database.datdba
(all databases are owned by postgres after an upgrade; will submit a
separate patch for that as I consider that a bug by itself).
Patch attached.
Hmm, doesn't this lose all *other* database-level properties?
I think maybe what we have here is a bug in pg_restore, its
--create switch ought to be trying to update the database's
ownership.
Possibly. I didn't look into that route.
Thanks for that. I like this patch a lot better.
Regards, Jan
--
Jan Wieck
Principle Database Engineer
Amazon Web Services
Attachments:
pg_restore-preserve-datdba.v1.diff
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f8bec3f..19c1e71 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3030,6 +3030,8 @@ dumpDatabase(Archive *fout)
resetPQExpBuffer(creaQry);
resetPQExpBuffer(delQry);
+ appendPQExpBuffer(creaQry, "ALTER DATABASE %s OWNER TO %s;\n", qdatname, dba);
+
if (strlen(datconnlimit) > 0 && strcmp(datconnlimit, "-1") != 0)
appendPQExpBuffer(creaQry, "ALTER DATABASE %s CONNECTION LIMIT = %s;\n",
qdatname, datconnlimit);
On 3/21/21 12:56 PM, Jan Wieck wrote:
On 3/21/21 7:47 AM, Andrew Dunstan wrote:
One possible (probable?) source is the JDBC driver, which currently
treats all Blobs (and Clobs, for that matter) as LOs. I'm working on
improving that some: <https://github.com/pgjdbc/pgjdbc/pull/2093>
You mean the user is using OID columns pointing to large objects and
the JDBC driver is mapping those for streaming operations?
Yeah, that would explain a lot.
Probably in most cases the database is designed by Hibernate, and the
front end programmers know nothing at all of Oids or LOs, they just ask
for and get a Blob.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Jan Wieck <jan@wi3ck.info> writes:
On 3/21/21 12:57 PM, Tom Lane wrote:
I think maybe what we have here is a bug in pg_restore, its
--create switch ought to be trying to update the database's
ownership.
Thanks for that. I like this patch a lot better.
Needs a little more work than that --- we should allow it to respond
to the --no-owner switch, for example. But I think likely we can do
it where other object ownership is handled. I'll look in a bit.
regards, tom lane
I wrote:
Needs a little more work than that --- we should allow it to respond
to the --no-owner switch, for example. But I think likely we can do
it where other object ownership is handled. I'll look in a bit.
Actually ... said code already DOES do that, so now I'm confused.
I tried
regression=# create user joe;
CREATE ROLE
regression=# create database joe owner joe;
CREATE DATABASE
regression=# \q
$ pg_dump -Fc joe >joe.dump
$ pg_restore --create -f - joe.dump | more
and I see
--
-- Name: joe; Type: DATABASE; Schema: -; Owner: joe
--
CREATE DATABASE joe WITH TEMPLATE = template0 ENCODING = 'SQL_ASCII' LOCALE = 'C';
ALTER DATABASE joe OWNER TO joe;
so at least in this case it's doing the right thing. We need a bit
more detail about the context in which it's doing the wrong thing
for you.
regards, tom lane
I wrote:
... so at least in this case it's doing the right thing. We need a bit
more detail about the context in which it's doing the wrong thing
for you.
Just to cross-check, I tried modifying pg_upgrade's regression test
as attached, and it still passes. (And inspection of the leftover
dump2.sql file verifies that the database ownership was correct.)
So I'm not sure what's up here.
regards, tom lane
Attachments:
upgrade-test-dbownership.patch
diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index 9c6deae294..436646b5ba 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -150,6 +150,9 @@ export EXTRA_REGRESS_OPTS
standard_initdb "$oldbindir"/initdb
"$oldbindir"/pg_ctl start -l "$logdir/postmaster1.log" -o "$POSTMASTER_OPTS" -w
+# Create another user (just to exercise database ownership restoration).
+createuser regression_dbowner || createdb_status=$?
+
# Create databases with names covering the ASCII bytes other than NUL, BEL,
# LF, or CR. BEL would ring the terminal bell in the course of this test, and
# it is not otherwise a special case. PostgreSQL doesn't support the rest.
@@ -160,7 +163,7 @@ dbname1='\"\'$dbname1'\\"\\\'
dbname2=`awk 'BEGIN { for (i = 46; i < 91; i++) printf "%c", i }' </dev/null`
dbname3=`awk 'BEGIN { for (i = 91; i < 128; i++) printf "%c", i }' </dev/null`
createdb "regression$dbname1" || createdb_status=$?
-createdb "regression$dbname2" || createdb_status=$?
+createdb --owner=regression_dbowner "regression$dbname2" || createdb_status=$?
createdb "regression$dbname3" || createdb_status=$?
if "$MAKE" -C "$oldsrc" installcheck-parallel; then
@@ -227,7 +230,7 @@ PGDATA="$BASE_PGDATA"
standard_initdb 'initdb'
-pg_upgrade $PG_UPGRADE_OPTS -d "${PGDATA}.old" -D "$PGDATA" -b "$oldbindir" -p "$PGPORT" -P "$PGPORT"
+pg_upgrade $PG_UPGRADE_OPTS -d "${PGDATA}.old" -D "$PGDATA" -b "$oldbindir" -p "$PGPORT" -P "$PGPORT" -j 4
# make sure all directories and files have group permissions, on Unix hosts
# Windows hosts don't support Unix-y permissions.
On 3/21/21 2:34 PM, Tom Lane wrote:
and I see
--
-- Name: joe; Type: DATABASE; Schema: -; Owner: joe
--
CREATE DATABASE joe WITH TEMPLATE = template0 ENCODING = 'SQL_ASCII' LOCALE = 'C';
ALTER DATABASE joe OWNER TO joe;
so at least in this case it's doing the right thing. We need a bit
more detail about the context in which it's doing the wrong thing
for you.
After moving all of this to a pristine postgresql.org based repo I see
the same. My best guess at this point is that the permission hoops that
RDS and Aurora PostgreSQL are jumping through were messing with this.
But that has nothing to do with the actual topic.
So let's focus on the actual problem of running out of XIDs and memory
while doing the upgrade involving millions of small large objects.
Regards, Jan
--
Jan Wieck
Principle Database Engineer
Amazon Web Services
Jan Wieck <jan@wi3ck.info> writes:
So let's focus on the actual problem of running out of XIDs and memory
while doing the upgrade involving millions of small large objects.
Right. So as far as --single-transaction vs. --create goes, that's
mostly a definitional problem. As long as the contents of a DB are
restored in one transaction, it's not gonna matter if we eat one or
two more XIDs while creating the DB itself. So we could either
relax pg_restore's complaint, or invent a different switch that's
named to acknowledge that it's not really only one transaction.
That still leaves us with the lots-o-locks problem. However, once
we've crossed the Rubicon of "it's not really only one transaction",
you could imagine that the switch is "--fewer-transactions", and the
idea is for pg_restore to commit after every (say) 100000 operations.
That would both bound its lock requirements and greatly cut its XID
consumption.
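In terms of the emitted SQL, the restore stream would then look something
like this (batch size, OID and role name are illustrative only):

BEGIN;
SELECT pg_catalog.lo_create('16385');
ALTER LARGE OBJECT 16385 OWNER TO app_owner;
-- ... up to the chosen batch size ...
COMMIT;
BEGIN;
-- ... next batch ...
COMMIT;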
The work you described sounded like it could fit into that paradigm,
with the additional ability to run some parallel restore tasks
that are each consuming a bounded number of locks.
regards, tom lane
On 3/21/21 3:56 PM, Tom Lane wrote:
Jan Wieck <jan@wi3ck.info> writes:
So let's focus on the actual problem of running out of XIDs and memory
while doing the upgrade involving millions of small large objects.
Right. So as far as --single-transaction vs. --create goes, that's
mostly a definitional problem. As long as the contents of a DB are
restored in one transaction, it's not gonna matter if we eat one or
two more XIDs while creating the DB itself. So we could either
relax pg_restore's complaint, or invent a different switch that's
named to acknowledge that it's not really only one transaction.
That still leaves us with the lots-o-locks problem. However, once
we've crossed the Rubicon of "it's not really only one transaction",
you could imagine that the switch is "--fewer-transactions", and the
idea is for pg_restore to commit after every (say) 100000 operations.
That would both bound its lock requirements and greatly cut its XID
consumption.
It leaves us with three things.
1) tremendous amounts of locks
2) tremendous amounts of memory needed
3) taking forever because it is single threaded.
I created a pathological case here on a VM with 24GB of RAM, 80GB of
SWAP sitting on NVME. The database has 20 million large objects, each of
which has 2 GRANTS, 1 COMMENT and 1 SECURITY LABEL (dummy). Each LO only
contains a string "large object <oid>", so the whole database in 9.5 is
about 15GB in size.
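For reference, each such test object would have been created roughly along
these lines (the role names and the dummy label provider are placeholders;
the exact generation script isn't shown here):

SELECT lo_from_bytea(0, 'large object 16385');     -- suppose the new blob gets OID 16385
GRANT SELECT ON LARGE OBJECT 16385 TO app_reader;  -- grant #1
GRANT UPDATE ON LARGE OBJECT 16385 TO app_writer;  -- grant #2
COMMENT ON LARGE OBJECT 16385 IS 'test blob';
SECURITY LABEL FOR dummy ON LARGE OBJECT 16385 IS 'unclassified';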
A stock pg_upgrade to version 14devel using --link takes about 15 hours.
This is partly because the pg_dump and pg_restore both grow to something
like 50GB+ to hold the TOC. Which sounds out of touch considering that
the entire system catalog on disk is less than 15GB. But aside from the
ridiculous amount of swapping, the whole thing also suffers from
consuming about 80 million transactions and apparently having just as
many network round trips with a single client.
The work you described sounded like it could fit into that paradigm,
with the additional ability to run some parallel restore tasks
that are each consuming a bounded number of locks.
I have attached a POC patch that implements two new options for pg_upgrade.
--restore-jobs=NUM --jobs parameter passed to pg_restore
--restore-blob-batch-size=NUM number of blobs restored in one xact
It does a bit more than just that. It rearranges the way large objects
are dumped so that most of the commands are all in one TOC entry and the
entry is emitted into SECTION_DATA when in binary upgrade mode (which
guarantees that there isn't any actual BLOB data in the dump). This
greatly reduces the number of network round trips and when using 8
parallel restore jobs, almost saturates the 4-core VM. Reducing the
number of TOC entries also reduces the total virtual memory need of
pg_restore to 15G, so there is a lot less swapping going on.
It cuts down the pg_upgrade time from 15 hours to 1.5 hours. In that run
I used --restore-jobs=8 and --restore-blob-batch-size=10000 (with a
max_locks_per_transaction=12000).
As said, this isn't a "one size fits all" solution. The pg_upgrade
parameters for --jobs and --restore-jobs will really depend on the
situation. Hundreds of small databases want --jobs, but one database
with millions of large objects wants --restore-jobs.
Regards, Jan
--
Jan Wieck
Principle Database Engineer
Amazon Web Services
Attachments:
pg_upgrade_improvements.v2.diff
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index c7351a4..4a611d0 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -864,6 +864,11 @@ RunWorker(ArchiveHandle *AH, ParallelSlot *slot)
WaitForCommands(AH, pipefd);
/*
+ * Close an eventually open BLOB batch transaction.
+ */
+ CommitBlobTransaction((Archive *)AH);
+
+ /*
* Disconnect from database and clean up.
*/
set_cancel_slot_archive(slot, NULL);
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 0296b9b..cd8a590 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -203,6 +203,8 @@ typedef struct Archive
int numWorkers; /* number of parallel processes */
char *sync_snapshot_id; /* sync snapshot id for parallel operation */
+ int blobBatchSize; /* # of blobs to restore per transaction */
+
/* info needed for string escaping */
int encoding; /* libpq code for client_encoding */
bool std_strings; /* standard_conforming_strings */
@@ -269,6 +271,7 @@ extern void WriteData(Archive *AH, const void *data, size_t dLen);
extern int StartBlob(Archive *AH, Oid oid);
extern int EndBlob(Archive *AH, Oid oid);
+extern void CommitBlobTransaction(Archive *AH);
extern void CloseArchive(Archive *AH);
extern void SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1f82c64..51a862a 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -68,6 +68,8 @@ typedef struct _parallelReadyList
bool sorted; /* are valid entries currently sorted? */
} ParallelReadyList;
+static int blobBatchCount = 0;
+static bool blobInXact = false;
static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
const int compression, bool dosync, ArchiveMode mode,
@@ -265,6 +267,8 @@ CloseArchive(Archive *AHX)
int res = 0;
ArchiveHandle *AH = (ArchiveHandle *) AHX;
+ CommitBlobTransaction(AHX);
+
AH->ClosePtr(AH);
/* Close the output */
@@ -279,6 +283,24 @@ CloseArchive(Archive *AHX)
/* Public */
void
+CommitBlobTransaction(Archive *AHX)
+{
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
+
+ if (blobInXact)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ blobInXact = false;
+ }
+}
+
+/* Public */
+void
SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt)
{
/* Caller can omit dump options, in which case we synthesize them */
@@ -3531,6 +3553,59 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
{
RestoreOptions *ropt = AH->public.ropt;
+ /* We restore BLOBs in batches to reduce XID consumption */
+ if (strcmp(te->desc, "BLOB") == 0 && AH->public.blobBatchSize > 0)
+ {
+ if (blobInXact)
+ {
+ /* We are inside a BLOB restore transaction */
+ if (blobBatchCount >= AH->public.blobBatchSize)
+ {
+ /*
+ * We did reach the batch size with the previous BLOB.
+ * Commit and start a new batch.
+ */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- BLOB batch size reached\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ }
+ else
+ {
+ /* This one still fits into the current batch */
+ blobBatchCount++;
+ }
+ }
+ else
+ {
+ /* Not inside a transaction, start a new batch */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- Start BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ blobInXact = true;
+ }
+ }
+ else
+ {
+ /* Not a BLOB. If we have a BLOB batch open, close it. */
+ if (blobInXact)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ blobInXact = false;
+ }
+ }
+
/* Select owner, schema, tablespace and default AM as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f8bec3f..f153f08 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -165,12 +165,20 @@ static void guessConstraintInheritance(TableInfo *tblinfo, int numTables);
static void dumpComment(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId);
+static bool dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId);
static int findComments(Archive *fout, Oid classoid, Oid objoid,
CommentItem **items);
static int collectComments(Archive *fout, CommentItem **items);
static void dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId);
+static bool dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId);
static int findSecLabels(Archive *fout, Oid classoid, Oid objoid,
SecLabelItem **items);
static int collectSecLabels(Archive *fout, SecLabelItem **items);
@@ -227,6 +235,13 @@ static DumpId dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const char *nspname, const char *owner,
const char *acls, const char *racls,
const char *initacls, const char *initracls);
+static bool dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name,
+ const char *subname,
+ const char *nspname, const char *owner,
+ const char *acls, const char *racls,
+ const char *initacls, const char *initracls);
static void getDependencies(Archive *fout);
static void BuildArchiveDependencies(Archive *fout);
@@ -3468,11 +3483,44 @@ dumpBlob(Archive *fout, const BlobInfo *binfo)
{
PQExpBuffer cquery = createPQExpBuffer();
PQExpBuffer dquery = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+ teSection section = SECTION_PRE_DATA;
appendPQExpBuffer(cquery,
"SELECT pg_catalog.lo_create('%s');\n",
binfo->dobj.name);
+ /*
+ * In binary upgrade mode we put all the queries to restore
+ * one large object into a single TOC entry and emit it as
+ * SECTION_DATA so that they can be restored in parallel.
+ */
+ if (fout->dopt->binary_upgrade)
+ {
+ section = SECTION_DATA;
+
+ /* Dump comment if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpCommentQuery(fout, cquery, tag, "LARGE OBJECT",
+ binfo->dobj.name, NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump security label if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabelQuery(fout, cquery, tag, "LARGE OBJECT",
+ binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump ACL if any */
+ if (binfo->blobacl && (binfo->dobj.dump & DUMP_COMPONENT_ACL))
+ dumpACLQuery(fout, cquery, tag,
+ binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ binfo->dobj.name, NULL,
+ NULL, binfo->rolname, binfo->blobacl, binfo->rblobacl,
+ binfo->initblobacl, binfo->initrblobacl);
+ }
+
appendPQExpBuffer(dquery,
"SELECT pg_catalog.lo_unlink('%s');\n",
binfo->dobj.name);
@@ -3482,28 +3530,31 @@ dumpBlob(Archive *fout, const BlobInfo *binfo)
ARCHIVE_OPTS(.tag = binfo->dobj.name,
.owner = binfo->rolname,
.description = "BLOB",
- .section = SECTION_PRE_DATA,
+ .section = section,
.createStmt = cquery->data,
.dropStmt = dquery->data));
- /* Dump comment if any */
- if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
- dumpComment(fout, "LARGE OBJECT", binfo->dobj.name,
- NULL, binfo->rolname,
- binfo->dobj.catId, 0, binfo->dobj.dumpId);
-
- /* Dump security label if any */
- if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
- dumpSecLabel(fout, "LARGE OBJECT", binfo->dobj.name,
- NULL, binfo->rolname,
- binfo->dobj.catId, 0, binfo->dobj.dumpId);
-
- /* Dump ACL if any */
- if (binfo->blobacl && (binfo->dobj.dump & DUMP_COMPONENT_ACL))
- dumpACL(fout, binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
- binfo->dobj.name, NULL,
- NULL, binfo->rolname, binfo->blobacl, binfo->rblobacl,
- binfo->initblobacl, binfo->initrblobacl);
+ if (!fout->dopt->binary_upgrade)
+ {
+ /* Dump comment if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "LARGE OBJECT", binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump security label if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabel(fout, "LARGE OBJECT", binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump ACL if any */
+ if (binfo->blobacl && (binfo->dobj.dump & DUMP_COMPONENT_ACL))
+ dumpACL(fout, binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ binfo->dobj.name, NULL,
+ NULL, binfo->rolname, binfo->blobacl, binfo->rblobacl,
+ binfo->initblobacl, binfo->initrblobacl);
+ }
destroyPQExpBuffer(cquery);
destroyPQExpBuffer(dquery);
@@ -9868,25 +9919,56 @@ dumpComment(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId)
{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpCommentQuery(fout, query, tag, type, name, namespace, owner,
+ catalogId, subid, dumpId))
+ {
+ /*
+ * We mark comments as SECTION_NONE because they really belong in the
+ * same section as their parent, whether that is pre-data or
+ * post-data.
+ */
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "COMMENT",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId)
+{
DumpOptions *dopt = fout->dopt;
CommentItem *comments;
int ncomments;
/* do nothing, if --no-comments is supplied */
if (dopt->no_comments)
- return;
+ return false;
/* Comments are schema not data ... except blob comments are data */
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump blob comments in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for comments associated with catalogId, using table */
@@ -9905,9 +9987,6 @@ dumpComment(Archive *fout, const char *type, const char *name,
/* If a comment exists, build COMMENT ON statement */
if (ncomments > 0)
{
- PQExpBuffer query = createPQExpBuffer();
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(query, "COMMENT ON %s ", type);
if (namespace && *namespace)
appendPQExpBuffer(query, "%s.", fmtId(namespace));
@@ -9917,24 +9996,10 @@ dumpComment(Archive *fout, const char *type, const char *name,
appendPQExpBuffer(tag, "%s %s", type, name);
- /*
- * We mark comments as SECTION_NONE because they really belong in the
- * same section as their parent, whether that is pre-data or
- * post-data.
- */
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "COMMENT",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
-
- destroyPQExpBuffer(query);
- destroyPQExpBuffer(tag);
+ return true;
}
+
+ return false;
}
/*
@@ -15070,18 +15135,63 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const char *initacls, const char *initracls)
{
DumpId aclDumpId = InvalidDumpId;
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpACLQuery(fout, query, tag, objDumpId, altDumpId,
+ type, name, subname, nspname, owner,
+ acls, racls, initacls, initracls))
+ {
+ DumpId aclDeps[2];
+ int nDeps = 0;
+
+ if (subname)
+ appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
+ else
+ appendPQExpBuffer(tag, "%s %s", type, name);
+
+ aclDeps[nDeps++] = objDumpId;
+ if (altDumpId != InvalidDumpId)
+ aclDeps[nDeps++] = altDumpId;
+
+ aclDumpId = createDumpId();
+
+ ArchiveEntry(fout, nilCatalogId, aclDumpId,
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = nspname,
+ .owner = owner,
+ .description = "ACL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = aclDeps,
+ .nDeps = nDeps));
+
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+
+ return aclDumpId;
+}
+
+static bool
+dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name, const char *subname,
+ const char *nspname, const char *owner,
+ const char *acls, const char *racls,
+ const char *initacls, const char *initracls)
+{
DumpOptions *dopt = fout->dopt;
- PQExpBuffer sql;
+ bool haveACL = false;
/* Do nothing if ACL dump is not enabled */
if (dopt->aclsSkip)
- return InvalidDumpId;
+ return false;
/* --data-only skips ACLs *except* BLOB ACLs */
if (dopt->dataOnly && strcmp(type, "LARGE OBJECT") != 0)
- return InvalidDumpId;
-
- sql = createPQExpBuffer();
+ return false;
/*
* Check to see if this object has had any initial ACLs included for it.
@@ -15093,54 +15203,31 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
*/
if (strlen(initacls) != 0 || strlen(initracls) != 0)
{
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
+ haveACL = true;
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
if (!buildACLCommands(name, subname, nspname, type,
initacls, initracls, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
fatal("could not parse initial GRANT ACL list (%s) or initial REVOKE ACL list (%s) for object \"%s\" (%s)",
initacls, initracls, name, type);
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
}
if (!buildACLCommands(name, subname, nspname, type,
acls, racls, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
fatal("could not parse GRANT ACL list (%s) or REVOKE ACL list (%s) for object \"%s\" (%s)",
acls, racls, name, type);
- if (sql->len > 0)
+ if (haveACL && tag != NULL)
{
- PQExpBuffer tag = createPQExpBuffer();
- DumpId aclDeps[2];
- int nDeps = 0;
-
if (subname)
appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
else
appendPQExpBuffer(tag, "%s %s", type, name);
-
- aclDeps[nDeps++] = objDumpId;
- if (altDumpId != InvalidDumpId)
- aclDeps[nDeps++] = altDumpId;
-
- aclDumpId = createDumpId();
-
- ArchiveEntry(fout, nilCatalogId, aclDumpId,
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = nspname,
- .owner = owner,
- .description = "ACL",
- .section = SECTION_NONE,
- .createStmt = sql->data,
- .deps = aclDeps,
- .nDeps = nDeps));
-
- destroyPQExpBuffer(tag);
}
- destroyPQExpBuffer(sql);
-
- return aclDumpId;
+ return haveACL;
}
/*
@@ -15166,34 +15253,58 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId)
{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpSecLabelQuery(fout, query, tag, type, name,
+ namespace, owner, catalogId, subid, dumpId))
+ {
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "SECURITY LABEL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId)
+{
DumpOptions *dopt = fout->dopt;
SecLabelItem *labels;
int nlabels;
int i;
- PQExpBuffer query;
/* do nothing, if --no-security-labels is supplied */
if (dopt->no_security_labels)
- return;
+ return false;
/* Security labels are schema not data ... except blob labels are data */
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump blob security labels in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for security labels associated with catalogId, using table */
nlabels = findSecLabels(fout, catalogId.tableoid, catalogId.oid, &labels);
- query = createPQExpBuffer();
-
for (i = 0; i < nlabels; i++)
{
/*
@@ -15214,22 +15325,11 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
if (query->len > 0)
{
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(tag, "%s %s", type, name);
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "SECURITY LABEL",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
- destroyPQExpBuffer(tag);
+ return true;
}
- destroyPQExpBuffer(query);
+ return false;
}
/*
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 589b4ae..b16db03 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -59,6 +59,7 @@ main(int argc, char **argv)
int c;
int exit_code;
int numWorkers = 1;
+ int blobBatchSize = 0;
Archive *AH;
char *inputFileSpec;
static int disable_triggers = 0;
@@ -120,6 +121,7 @@ main(int argc, char **argv)
{"no-publications", no_argument, &no_publications, 1},
{"no-security-labels", no_argument, &no_security_labels, 1},
{"no-subscriptions", no_argument, &no_subscriptions, 1},
+ {"restore-blob-batch-size", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
@@ -280,6 +282,10 @@ main(int argc, char **argv)
set_dump_section(optarg, &(opts->dumpSections));
break;
+ case 4: /* # of blobs to restore per transaction */
+ blobBatchSize = atoi(optarg);
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit_nicely(1);
@@ -434,6 +440,7 @@ main(int argc, char **argv)
SortTocFromFile(AH);
AH->numWorkers = numWorkers;
+ AH->blobBatchSize = blobBatchSize;
if (opts->tocSummary)
PrintTOCSummary(AH);
@@ -506,6 +513,8 @@ usage(const char *progname)
printf(_(" --use-set-session-authorization\n"
" use SET SESSION AUTHORIZATION commands instead of\n"
" ALTER OWNER commands to set ownership\n"));
+ printf(_(" --restore-blob-batch-size=NUM\n"
+ " attempt to restore NUM large objects per transaction\n"));
printf(_("\nConnection options:\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 9c9b313..868e9f6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,8 @@ parseCommandLine(int argc, char *argv[])
{"verbose", no_argument, NULL, 'v'},
{"clone", no_argument, NULL, 1},
{"index-collation-versions-unknown", no_argument, NULL, 2},
+ {"restore-jobs", required_argument, NULL, 3},
+ {"restore-blob-batch-size", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
@@ -208,6 +210,14 @@ parseCommandLine(int argc, char *argv[])
user_opts.ind_coll_unknown = true;
break;
+ case 3:
+ user_opts.restore_jobs = atoi(optarg);
+ break;
+
+ case 4:
+ user_opts.blob_batch_size = atoi(optarg);
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
os_info.progname);
@@ -314,6 +324,8 @@ usage(void)
printf(_(" --clone clone instead of copying files to new cluster\n"));
printf(_(" --index-collation-versions-unknown\n"));
printf(_(" mark text indexes as needing to be rebuilt\n"));
+ printf(_(" --restore-blob-batch-size=NUM attempt to restore NUM large objects per\n"));
+ printf(_(" transaction\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\n"
"Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index e23b8ca..095e980 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -385,10 +385,13 @@ create_new_objects(void)
parallel_exec_prog(log_file_name,
NULL,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "--jobs %d --restore-blob-batch-size %d "
"--dbname template1 \"%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
create_opts,
+ user_opts.restore_jobs,
+ user_opts.blob_batch_size,
sql_file_name);
}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a784..5647f96 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -291,6 +291,8 @@ typedef struct
* changes */
transferMode transfer_mode; /* copy files or link them? */
int jobs; /* number of processes/threads to use */
+ int restore_jobs; /* number of pg_restore --jobs to use */
+ int blob_batch_size; /* number of blobs to restore per xact */
char *socketdir; /* directory to use for Unix sockets */
bool ind_coll_unknown; /* mark unknown index collation versions */
} UserOpts;
Hi,
w.r.t. pg_upgrade_improvements.v2.diff.
+ blobBatchCount = 0;
+ blobInXact = false;
The count and bool flag are always reset in tandem. It seems
variable blobInXact is not needed.
Cheers
On 3/22/21 5:36 PM, Zhihong Yu wrote:
Hi,
w.r.t. pg_upgrade_improvements.v2.diff.
+ blobBatchCount = 0;
+ blobInXact = false;
The count and bool flag are always reset in tandem. It seems
variable blobInXact is not needed.
You are right. I will fix that.
Thanks, Jan
--
Jan Wieck
Principal Database Engineer
Amazon Web Services
On 3/22/21 7:18 PM, Jan Wieck wrote:
On 3/22/21 5:36 PM, Zhihong Yu wrote:
Hi,
w.r.t. pg_upgrade_improvements.v2.diff.
+ blobBatchCount = 0;
+ blobInXact = false;
The count and bool flag are always reset in tandem. It seems
variable blobInXact is not needed.
You are right. I will fix that.
New patch v3 attached.
Thanks, Jan
--
Jan Wieck
Principal Database Engineer
Amazon Web Services
Attachments:
pg_upgrade_improvements.v3.diff (text/x-patch)
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index c7351a4..4a611d0 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -864,6 +864,11 @@ RunWorker(ArchiveHandle *AH, ParallelSlot *slot)
WaitForCommands(AH, pipefd);
/*
+ * Close a possibly open BLOB batch transaction.
+ */
+ CommitBlobTransaction((Archive *)AH);
+
+ /*
* Disconnect from database and clean up.
*/
set_cancel_slot_archive(slot, NULL);
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 0296b9b..cd8a590 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -203,6 +203,8 @@ typedef struct Archive
int numWorkers; /* number of parallel processes */
char *sync_snapshot_id; /* sync snapshot id for parallel operation */
+ int blobBatchSize; /* # of blobs to restore per transaction */
+
/* info needed for string escaping */
int encoding; /* libpq code for client_encoding */
bool std_strings; /* standard_conforming_strings */
@@ -269,6 +271,7 @@ extern void WriteData(Archive *AH, const void *data, size_t dLen);
extern int StartBlob(Archive *AH, Oid oid);
extern int EndBlob(Archive *AH, Oid oid);
+extern void CommitBlobTransaction(Archive *AH);
extern void CloseArchive(Archive *AH);
extern void SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1f82c64..8331e8a 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -68,6 +68,7 @@ typedef struct _parallelReadyList
bool sorted; /* are valid entries currently sorted? */
} ParallelReadyList;
+static int blobBatchCount = 0;
static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
const int compression, bool dosync, ArchiveMode mode,
@@ -265,6 +266,8 @@ CloseArchive(Archive *AHX)
int res = 0;
ArchiveHandle *AH = (ArchiveHandle *) AHX;
+ CommitBlobTransaction(AHX);
+
AH->ClosePtr(AH);
/* Close the output */
@@ -279,6 +282,23 @@ CloseArchive(Archive *AHX)
/* Public */
void
+CommitBlobTransaction(Archive *AHX)
+{
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
+
+ if (blobBatchCount > 0)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ }
+}
+
+/* Public */
+void
SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt)
{
/* Caller can omit dump options, in which case we synthesize them */
@@ -3531,6 +3551,57 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
{
RestoreOptions *ropt = AH->public.ropt;
+ /* We restore BLOBs in batches to reduce XID consumption */
+ if (strcmp(te->desc, "BLOB") == 0 && AH->public.blobBatchSize > 0)
+ {
+ if (blobBatchCount > 0)
+ {
+ /* We are inside a BLOB restore transaction */
+ if (blobBatchCount >= AH->public.blobBatchSize)
+ {
+ /*
+ * We did reach the batch size with the previous BLOB.
+ * Commit and start a new batch.
+ */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- BLOB batch size reached\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ }
+ else
+ {
+ /* This one still fits into the current batch */
+ blobBatchCount++;
+ }
+ }
+ else
+ {
+ /* Not inside a transaction, start a new batch */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- Start BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ }
+ }
+ else
+ {
+ /* Not a BLOB. If we have a BLOB batch open, close it. */
+ if (blobBatchCount > 0)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ }
+ }
+
/* Select owner, schema, tablespace and default AM as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f8bec3f..f153f08 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -165,12 +165,20 @@ static void guessConstraintInheritance(TableInfo *tblinfo, int numTables);
static void dumpComment(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId);
+static bool dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId);
static int findComments(Archive *fout, Oid classoid, Oid objoid,
CommentItem **items);
static int collectComments(Archive *fout, CommentItem **items);
static void dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId);
+static bool dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId);
static int findSecLabels(Archive *fout, Oid classoid, Oid objoid,
SecLabelItem **items);
static int collectSecLabels(Archive *fout, SecLabelItem **items);
@@ -227,6 +235,13 @@ static DumpId dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const char *nspname, const char *owner,
const char *acls, const char *racls,
const char *initacls, const char *initracls);
+static bool dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name,
+ const char *subname,
+ const char *nspname, const char *owner,
+ const char *acls, const char *racls,
+ const char *initacls, const char *initracls);
static void getDependencies(Archive *fout);
static void BuildArchiveDependencies(Archive *fout);
@@ -3468,11 +3483,44 @@ dumpBlob(Archive *fout, const BlobInfo *binfo)
{
PQExpBuffer cquery = createPQExpBuffer();
PQExpBuffer dquery = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+ teSection section = SECTION_PRE_DATA;
appendPQExpBuffer(cquery,
"SELECT pg_catalog.lo_create('%s');\n",
binfo->dobj.name);
+ /*
+ * In binary upgrade mode we put all the queries to restore
+ * one large object into a single TOC entry and emit it as
+ * SECTION_DATA so that they can be restored in parallel.
+ */
+ if (fout->dopt->binary_upgrade)
+ {
+ section = SECTION_DATA;
+
+ /* Dump comment if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpCommentQuery(fout, cquery, tag, "LARGE OBJECT",
+ binfo->dobj.name, NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump security label if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabelQuery(fout, cquery, tag, "LARGE OBJECT",
+ binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump ACL if any */
+ if (binfo->blobacl && (binfo->dobj.dump & DUMP_COMPONENT_ACL))
+ dumpACLQuery(fout, cquery, tag,
+ binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ binfo->dobj.name, NULL,
+ NULL, binfo->rolname, binfo->blobacl, binfo->rblobacl,
+ binfo->initblobacl, binfo->initrblobacl);
+ }
+
appendPQExpBuffer(dquery,
"SELECT pg_catalog.lo_unlink('%s');\n",
binfo->dobj.name);
@@ -3482,28 +3530,31 @@ dumpBlob(Archive *fout, const BlobInfo *binfo)
ARCHIVE_OPTS(.tag = binfo->dobj.name,
.owner = binfo->rolname,
.description = "BLOB",
- .section = SECTION_PRE_DATA,
+ .section = section,
.createStmt = cquery->data,
.dropStmt = dquery->data));
- /* Dump comment if any */
- if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
- dumpComment(fout, "LARGE OBJECT", binfo->dobj.name,
- NULL, binfo->rolname,
- binfo->dobj.catId, 0, binfo->dobj.dumpId);
-
- /* Dump security label if any */
- if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
- dumpSecLabel(fout, "LARGE OBJECT", binfo->dobj.name,
- NULL, binfo->rolname,
- binfo->dobj.catId, 0, binfo->dobj.dumpId);
-
- /* Dump ACL if any */
- if (binfo->blobacl && (binfo->dobj.dump & DUMP_COMPONENT_ACL))
- dumpACL(fout, binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
- binfo->dobj.name, NULL,
- NULL, binfo->rolname, binfo->blobacl, binfo->rblobacl,
- binfo->initblobacl, binfo->initrblobacl);
+ if (!fout->dopt->binary_upgrade)
+ {
+ /* Dump comment if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "LARGE OBJECT", binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump security label if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabel(fout, "LARGE OBJECT", binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump ACL if any */
+ if (binfo->blobacl && (binfo->dobj.dump & DUMP_COMPONENT_ACL))
+ dumpACL(fout, binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ binfo->dobj.name, NULL,
+ NULL, binfo->rolname, binfo->blobacl, binfo->rblobacl,
+ binfo->initblobacl, binfo->initrblobacl);
+ }
destroyPQExpBuffer(cquery);
destroyPQExpBuffer(dquery);
@@ -9868,25 +9919,56 @@ dumpComment(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId)
{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpCommentQuery(fout, query, tag, type, name, namespace, owner,
+ catalogId, subid, dumpId))
+ {
+ /*
+ * We mark comments as SECTION_NONE because they really belong in the
+ * same section as their parent, whether that is pre-data or
+ * post-data.
+ */
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "COMMENT",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId)
+{
DumpOptions *dopt = fout->dopt;
CommentItem *comments;
int ncomments;
/* do nothing, if --no-comments is supplied */
if (dopt->no_comments)
- return;
+ return false;
/* Comments are schema not data ... except blob comments are data */
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump blob comments in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for comments associated with catalogId, using table */
@@ -9905,9 +9987,6 @@ dumpComment(Archive *fout, const char *type, const char *name,
/* If a comment exists, build COMMENT ON statement */
if (ncomments > 0)
{
- PQExpBuffer query = createPQExpBuffer();
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(query, "COMMENT ON %s ", type);
if (namespace && *namespace)
appendPQExpBuffer(query, "%s.", fmtId(namespace));
@@ -9917,24 +9996,10 @@ dumpComment(Archive *fout, const char *type, const char *name,
appendPQExpBuffer(tag, "%s %s", type, name);
- /*
- * We mark comments as SECTION_NONE because they really belong in the
- * same section as their parent, whether that is pre-data or
- * post-data.
- */
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "COMMENT",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
-
- destroyPQExpBuffer(query);
- destroyPQExpBuffer(tag);
+ return true;
}
+
+ return false;
}
/*
@@ -15070,18 +15135,63 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const char *initacls, const char *initracls)
{
DumpId aclDumpId = InvalidDumpId;
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpACLQuery(fout, query, tag, objDumpId, altDumpId,
+ type, name, subname, nspname, owner,
+ acls, racls, initacls, initracls))
+ {
+ DumpId aclDeps[2];
+ int nDeps = 0;
+
+ if (subname)
+ appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
+ else
+ appendPQExpBuffer(tag, "%s %s", type, name);
+
+ aclDeps[nDeps++] = objDumpId;
+ if (altDumpId != InvalidDumpId)
+ aclDeps[nDeps++] = altDumpId;
+
+ aclDumpId = createDumpId();
+
+ ArchiveEntry(fout, nilCatalogId, aclDumpId,
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = nspname,
+ .owner = owner,
+ .description = "ACL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = aclDeps,
+ .nDeps = nDeps));
+
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+
+ return aclDumpId;
+}
+
+static bool
+dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name, const char *subname,
+ const char *nspname, const char *owner,
+ const char *acls, const char *racls,
+ const char *initacls, const char *initracls)
+{
DumpOptions *dopt = fout->dopt;
- PQExpBuffer sql;
+ bool haveACL = false;
/* Do nothing if ACL dump is not enabled */
if (dopt->aclsSkip)
- return InvalidDumpId;
+ return false;
/* --data-only skips ACLs *except* BLOB ACLs */
if (dopt->dataOnly && strcmp(type, "LARGE OBJECT") != 0)
- return InvalidDumpId;
-
- sql = createPQExpBuffer();
+ return false;
/*
* Check to see if this object has had any initial ACLs included for it.
@@ -15093,54 +15203,31 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
*/
if (strlen(initacls) != 0 || strlen(initracls) != 0)
{
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
+ haveACL = true;
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
if (!buildACLCommands(name, subname, nspname, type,
initacls, initracls, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
fatal("could not parse initial GRANT ACL list (%s) or initial REVOKE ACL list (%s) for object \"%s\" (%s)",
initacls, initracls, name, type);
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
}
if (!buildACLCommands(name, subname, nspname, type,
acls, racls, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
fatal("could not parse GRANT ACL list (%s) or REVOKE ACL list (%s) for object \"%s\" (%s)",
acls, racls, name, type);
- if (sql->len > 0)
+ if (haveACL && tag != NULL)
{
- PQExpBuffer tag = createPQExpBuffer();
- DumpId aclDeps[2];
- int nDeps = 0;
-
if (subname)
appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
else
appendPQExpBuffer(tag, "%s %s", type, name);
-
- aclDeps[nDeps++] = objDumpId;
- if (altDumpId != InvalidDumpId)
- aclDeps[nDeps++] = altDumpId;
-
- aclDumpId = createDumpId();
-
- ArchiveEntry(fout, nilCatalogId, aclDumpId,
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = nspname,
- .owner = owner,
- .description = "ACL",
- .section = SECTION_NONE,
- .createStmt = sql->data,
- .deps = aclDeps,
- .nDeps = nDeps));
-
- destroyPQExpBuffer(tag);
}
- destroyPQExpBuffer(sql);
-
- return aclDumpId;
+ return haveACL;
}
/*
@@ -15166,34 +15253,58 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId)
{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpSecLabelQuery(fout, query, tag, type, name,
+ namespace, owner, catalogId, subid, dumpId))
+ {
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "SECURITY LABEL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId)
+{
DumpOptions *dopt = fout->dopt;
SecLabelItem *labels;
int nlabels;
int i;
- PQExpBuffer query;
/* do nothing, if --no-security-labels is supplied */
if (dopt->no_security_labels)
- return;
+ return false;
/* Security labels are schema not data ... except blob labels are data */
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump blob security labels in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for security labels associated with catalogId, using table */
nlabels = findSecLabels(fout, catalogId.tableoid, catalogId.oid, &labels);
- query = createPQExpBuffer();
-
for (i = 0; i < nlabels; i++)
{
/*
@@ -15214,22 +15325,11 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
if (query->len > 0)
{
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(tag, "%s %s", type, name);
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "SECURITY LABEL",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
- destroyPQExpBuffer(tag);
+ return true;
}
- destroyPQExpBuffer(query);
+ return false;
}
/*
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 589b4ae..b16db03 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -59,6 +59,7 @@ main(int argc, char **argv)
int c;
int exit_code;
int numWorkers = 1;
+ int blobBatchSize = 0;
Archive *AH;
char *inputFileSpec;
static int disable_triggers = 0;
@@ -120,6 +121,7 @@ main(int argc, char **argv)
{"no-publications", no_argument, &no_publications, 1},
{"no-security-labels", no_argument, &no_security_labels, 1},
{"no-subscriptions", no_argument, &no_subscriptions, 1},
+ {"restore-blob-batch-size", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
@@ -280,6 +282,10 @@ main(int argc, char **argv)
set_dump_section(optarg, &(opts->dumpSections));
break;
+ case 4: /* # of blobs to restore per transaction */
+ blobBatchSize = atoi(optarg);
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit_nicely(1);
@@ -434,6 +440,7 @@ main(int argc, char **argv)
SortTocFromFile(AH);
AH->numWorkers = numWorkers;
+ AH->blobBatchSize = blobBatchSize;
if (opts->tocSummary)
PrintTOCSummary(AH);
@@ -506,6 +513,8 @@ usage(const char *progname)
printf(_(" --use-set-session-authorization\n"
" use SET SESSION AUTHORIZATION commands instead of\n"
" ALTER OWNER commands to set ownership\n"));
+ printf(_(" --restore-blob-batch-size=NUM\n"
+ " attempt to restore NUM large objects per transaction\n"));
printf(_("\nConnection options:\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 9c9b313..868e9f6 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,8 @@ parseCommandLine(int argc, char *argv[])
{"verbose", no_argument, NULL, 'v'},
{"clone", no_argument, NULL, 1},
{"index-collation-versions-unknown", no_argument, NULL, 2},
+ {"restore-jobs", required_argument, NULL, 3},
+ {"restore-blob-batch-size", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
@@ -208,6 +210,14 @@ parseCommandLine(int argc, char *argv[])
user_opts.ind_coll_unknown = true;
break;
+ case 3:
+ user_opts.restore_jobs = atoi(optarg);
+ break;
+
+ case 4:
+ user_opts.blob_batch_size = atoi(optarg);
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
os_info.progname);
@@ -314,6 +324,8 @@ usage(void)
printf(_(" --clone clone instead of copying files to new cluster\n"));
printf(_(" --index-collation-versions-unknown\n"));
printf(_(" mark text indexes as needing to be rebuilt\n"));
+ printf(_(" --restore-blob-batch-size=NUM attempt to restore NUM large objects per\n"));
+ printf(_(" transaction\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\n"
"Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index e23b8ca..095e980 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -385,10 +385,13 @@ create_new_objects(void)
parallel_exec_prog(log_file_name,
NULL,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "--jobs %d --restore-blob-batch-size %d "
"--dbname template1 \"%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
create_opts,
+ user_opts.restore_jobs,
+ user_opts.blob_batch_size,
sql_file_name);
}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a784..5647f96 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -291,6 +291,8 @@ typedef struct
* changes */
transferMode transfer_mode; /* copy files or link them? */
int jobs; /* number of processes/threads to use */
+ int restore_jobs; /* number of pg_restore --jobs to use */
+ int blob_batch_size; /* number of blobs to restore per xact */
char *socketdir; /* directory to use for Unix sockets */
bool ind_coll_unknown; /* mark unknown index collation versions */
} UserOpts;
On Tue, Mar 23, 2021 at 08:51:32AM -0400, Jan Wieck wrote:
On 3/22/21 7:18 PM, Jan Wieck wrote:
On 3/22/21 5:36 PM, Zhihong Yu wrote:
Hi,
w.r.t. pg_upgrade_improvements.v2.diff.
+       blobBatchCount = 0;
+       blobInXact = false;
The count and bool flag are always reset in tandem. It seems
variable blobInXact is not needed.
You are right. I will fix that.
New patch v3 attached.
Would it be better to allow pg_upgrade to pass arbitrary arguments to
pg_restore, instead of just these specific ones?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On 3/23/21 10:56 AM, Bruce Momjian wrote:
On Tue, Mar 23, 2021 at 08:51:32AM -0400, Jan Wieck wrote:
On 3/22/21 7:18 PM, Jan Wieck wrote:
On 3/22/21 5:36 PM, Zhihong Yu wrote:
Hi,
w.r.t. pg_upgrade_improvements.v2.diff.
+ blobBatchCount = 0;
+ blobInXact = false;
The count and bool flag are always reset in tandem. It seems
variable blobInXact is not needed.
You are right. I will fix that.
New patch v3 attached.
Would it be better to allow pg_upgrade to pass arbitrary arguments to
pg_restore, instead of just these specific ones?
That would mean arbitrary parameters to pg_dump as well as pg_restore.
But yes, that would probably be better in the long run.
Any suggestion as to what that would actually look like? Unfortunately
pg_restore has -[dDoOr] already used, so it doesn't look like there will
be any naturally intelligible short options for that.
Regards, Jan
--
Jan Wieck
Principal Database Engineer
Amazon Web Services
On Tue, Mar 23, 2021 at 01:25:15PM -0400, Jan Wieck wrote:
On 3/23/21 10:56 AM, Bruce Momjian wrote:
Would it be better to allow pg_upgrade to pass arbitrary arguments to
pg_restore, instead of just these specific ones?
That would mean arbitrary parameters to pg_dump as well as pg_restore. But
yes, that would probably be better in the long run.
Any suggestion as to what that would actually look like? Unfortunately
pg_restore has -[dDoOr] already used, so it doesn't look like there will be
any naturally intelligible short options for that.
We have the postmaster which can pass arbitrary arguments to postgres
processes using -o.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On 3/23/21 2:06 PM, Bruce Momjian wrote:
We have the postmaster which can pass arbitrary arguments to postgres
processes using -o.
Right, and -o is already taken in pg_upgrade for sending options to the
old postmaster.
What we are looking for is a way to send options to pg_dump and
pg_restore, which are not postmasters or children of the postmaster, but
rather clients. There is no existing option for sending options to those
client programs.
So the question remains, how do we name this?
--pg-dump-options "<string>"
--pg-restore-options "<string>"
where "<string>" could be something like "--whatever[=NUM] [...]" would
be something unambiguous.
Regards, Jan
--
Jan Wieck
Principal Database Engineer
Amazon Web Services
On Tue, Mar 23, 2021 at 02:23:03PM -0400, Jan Wieck wrote:
On 3/23/21 2:06 PM, Bruce Momjian wrote:
We have the postmaster which can pass arbitrary arguments to postgres
processes using -o.
Right, and -o is already taken in pg_upgrade for sending options to the old
postmaster.
What we are looking for is a way to send options to pg_dump and
pg_restore, which are not postmasters or children of the postmaster, but rather
clients. There is no existing option for sending options to those client programs.
So the question remains, how do we name this?
--pg-dump-options "<string>"
--pg-restore-options "<string>"
where "<string>" could be something like "--whatever[=NUM] [...]", which would be
unambiguous.
Sure. I don't think the letter you use is a problem.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
Jan Wieck <jan@wi3ck.info> writes:
So the question remains, how do we name this?
--pg-dump-options "<string>"
--pg-restore-options "<string>"
If you're passing multiple options, that is
--pg-dump-options "--foo=x --bar=y"
it seems just horribly fragile. Lose the double quotes and suddenly
--bar is a separate option to pg_upgrade itself, not part of the argument
for the previous option. That's pretty easy to do when passing things
through shell scripts, too. So it'd likely be safer to write
--pg-dump-option=--foo=x --pg-dump-option=--bar=y
which requires pg_upgrade to allow aggregating multiple options,
but you'd probably want it to act that way anyway.
regards, tom lane
On 3/23/21 2:35 PM, Tom Lane wrote:
Jan Wieck <jan@wi3ck.info> writes:
So the question remains, how do we name this?
--pg-dump-options "<string>"
--pg-restore-options "<string>"If you're passing multiple options, that is
--pg-dump-options "--foo=x --bar=y"
it seems just horribly fragile. Lose the double quotes and suddenly
--bar is a separate option to pg_upgrade itself, not part of the argument
for the previous option. That's pretty easy to do when passing things
through shell scripts, too. So it'd likely be safer to write
--pg-dump-option=--foo=x --pg-dump-option=--bar=y
which requires pg_upgrade to allow aggregating multiple options,
but you'd probably want it to act that way anyway.
... which would all be really easy if pg_upgrade weren't assembling
a shell script string to pass into parallel_exec_prog() by itself.
But I will see what I can do ...
Regards, Jan
--
Jan Wieck
Principal Database Engineer
Amazon Web Services
Jan Wieck <jan@wi3ck.info> writes:
On 3/23/21 2:35 PM, Tom Lane wrote:
If you're passing multiple options, that is
--pg-dump-options "--foo=x --bar=y"
it seems just horribly fragile. Lose the double quotes and suddenly
--bar is a separate option to pg_upgrade itself, not part of the argument
for the previous option. That's pretty easy to do when passing things
through shell scripts, too.
... which would all be really easy if pg_upgrade weren't assembling
a shell script string to pass into parallel_exec_prog() by itself.
No, what I was worried about is shell script(s) that invoke pg_upgrade
No, what I was worried about is shell script(s) that invoke pg_upgrade
and have to pass down some of these options through multiple levels of
option parsing.
BTW, it doesn't seem like the "pg-" prefix has any value-add here,
so maybe "--dump-option" and "--restore-option" would be suitable
spellings.
regards, tom lane
On 3/23/21 2:59 PM, Tom Lane wrote:
Jan Wieck <jan@wi3ck.info> writes:
On 3/23/21 2:35 PM, Tom Lane wrote:
If you're passing multiple options, that is
--pg-dump-options "--foo=x --bar=y"
it seems just horribly fragile. Lose the double quotes and suddenly
--bar is a separate option to pg_upgrade itself, not part of the argument
for the previous option. That's pretty easy to do when passing things
through shell scripts, too.
... which would all be really easy if pg_upgrade weren't assembling
a shell script string to pass into parallel_exec_prog() by itself.
No, what I was worried about is shell script(s) that invoke pg_upgrade
and have to pass down some of these options through multiple levels of
option parsing.
The problem here is that pg_upgrade itself is invoking a shell again. It
is not assembling an array of arguments to pass into exec*(). I'd be a
happy camper if it did the latter. But as things are we'd have to add
full shell escaping for arbitrary strings.
BTW, it doesn't seem like the "pg-" prefix has any value-add here,
so maybe "--dump-option" and "--restore-option" would be suitable
spellings.
Agreed.
Regards, Jan
--
Jan Wieck
Principal Database Engineer
Amazon Web Services
Jan Wieck <jan@wi3ck.info> writes:
The problem here is that pg_upgrade itself is invoking a shell again. It
is not assembling an array of arguments to pass into exec*(). I'd be a
happy camper if it did the latter. But as things are we'd have to add
full shell escaping for arbitrary strings.
Surely we need that (and have it already) anyway?
I think we've stayed away from exec* because we'd have to write an
emulation for Windows. Maybe somebody will get fed up and produce
such code, but it's not likely to be the least-effort route to the
goal.
regards, tom lane
On 3/23/21 3:35 PM, Tom Lane wrote:
Jan Wieck <jan@wi3ck.info> writes:
The problem here is that pg_upgrade itself is invoking a shell again. It
is not assembling an array of arguments to pass into exec*(). I'd be a
happy camper if it did the latter. But as things are we'd have to add
full shell escaping for arbitrary strings.
Surely we need that (and have it already) anyway?
There are functions to shell-escape a single string, like
appendShellString(),
but that is hardly enough when a single optarg for --restore-option
could look like any of:
--jobs 8
--jobs=8
--jobs='8'
--jobs '8'
--jobs "8"
--jobs="8"
--dont-bother-about-jobs
When placed into a shell string, those things have very different
effects on your args[].
I also want to say that we are overengineering this whole thing. Yes,
there is the problem of shell quoting possibly going wrong as it passes
from one shell to another. But for now this is all about passing a few
numbers down from pg_upgrade to pg_restore (and eventually pg_dump).
Have we even reached a consensus yet on whether doing it the way my patch
proposes is the right way to go? Like whether emitting BLOB TOC
entries into SECTION_DATA when in binary upgrade mode is a good thing?
Or whether bunching all the SQL statements for creating the blob and
changing the ACL, COMMENT and SECLABEL into one multi-statement query is.
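To be concrete, such a combined query for one large object would look
roughly like the following (OID, role and security label are made up for
illustration), sent to the server as a single command string:

SELECT pg_catalog.lo_create('16403');
COMMENT ON LARGE OBJECT 16403 IS 'large object 16403';
SECURITY LABEL FOR selinux ON LARGE OBJECT 16403 IS 'unconfined_u:object_r:sepgsql_blob_t:s0';
GRANT SELECT ON LARGE OBJECT 16403 TO app_reader;

That turns the metadata restore for a blob into one round trip instead
of three or four separate ones.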
Maybe we should focus on those details before getting into all the
parameter naming stuff.
Regards, Jan
--
Jan Wieck
Principal Database Engineer
Amazon Web Services
Jan Wieck <jan@wi3ck.info> writes:
Have we even reached a consensus yet on whether doing it the way my patch
proposes is the right way to go? Like whether emitting BLOB TOC
entries into SECTION_DATA when in binary upgrade mode is a good thing?
Or whether bunching all the SQL statements for creating the blob and
changing the ACL, COMMENT and SECLABEL into one multi-statement query is.
Now you're asking for actual review effort, which is a little hard
to come by towards the tail end of the last CF of a cycle. I'm
interested in this topic, but I can't justify spending much time
on it right now.
regards, tom lane
On 3/23/21 4:55 PM, Tom Lane wrote:
Jan Wieck <jan@wi3ck.info> writes:
Have we even reached a consensus yet on whether doing it the way my patch
proposes is the right way to go? Like whether emitting BLOB TOC
entries into SECTION_DATA when in binary upgrade mode is a good thing?
Or whether bunching all the SQL statements for creating the blob and
changing the ACL, COMMENT and SECLABEL into one multi-statement query is.
Now you're asking for actual review effort, which is a little hard
to come by towards the tail end of the last CF of a cycle. I'm
interested in this topic, but I can't justify spending much time
on it right now.
Understood.
In any case I changed the options so that they behave the same way the
existing -o and -O (for old/new postmaster options) work. I don't think
it would be wise to have option forwarding work differently between
options for postmaster and options for pg_dump/pg_restore.
Regards, Jan
--
Jan Wieck
Principal Database Engineer
Amazon Web Services
On 3/24/21 12:04 PM, Jan Wieck wrote:
In any case I changed the options so that they behave the same way the
existing -o and -O (for old/new postmaster options) work. I don't think
it would be wise to have option forwarding work differently between
options for postmaster and options for pg_dump/pg_restore.
Attaching the actual diff might help.
--
Jan Wieck
Principal Database Engineer
Amazon Web Services
Attachments:
pg_upgrade_improvements.v4.diff (text/x-patch)
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index c7351a4..4a611d0 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -864,6 +864,11 @@ RunWorker(ArchiveHandle *AH, ParallelSlot *slot)
WaitForCommands(AH, pipefd);
/*
+ * Close a possibly open BLOB batch transaction.
+ */
+ CommitBlobTransaction((Archive *)AH);
+
+ /*
* Disconnect from database and clean up.
*/
set_cancel_slot_archive(slot, NULL);
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 0296b9b..cd8a590 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -203,6 +203,8 @@ typedef struct Archive
int numWorkers; /* number of parallel processes */
char *sync_snapshot_id; /* sync snapshot id for parallel operation */
+ int blobBatchSize; /* # of blobs to restore per transaction */
+
/* info needed for string escaping */
int encoding; /* libpq code for client_encoding */
bool std_strings; /* standard_conforming_strings */
@@ -269,6 +271,7 @@ extern void WriteData(Archive *AH, const void *data, size_t dLen);
extern int StartBlob(Archive *AH, Oid oid);
extern int EndBlob(Archive *AH, Oid oid);
+extern void CommitBlobTransaction(Archive *AH);
extern void CloseArchive(Archive *AH);
extern void SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1f82c64..8331e8a 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -68,6 +68,7 @@ typedef struct _parallelReadyList
bool sorted; /* are valid entries currently sorted? */
} ParallelReadyList;
+static int blobBatchCount = 0;
static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
const int compression, bool dosync, ArchiveMode mode,
@@ -265,6 +266,8 @@ CloseArchive(Archive *AHX)
int res = 0;
ArchiveHandle *AH = (ArchiveHandle *) AHX;
+ CommitBlobTransaction(AHX);
+
AH->ClosePtr(AH);
/* Close the output */
@@ -279,6 +282,23 @@ CloseArchive(Archive *AHX)
/* Public */
void
+CommitBlobTransaction(Archive *AHX)
+{
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
+
+ if (blobBatchCount > 0)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ }
+}
+
+/* Public */
+void
SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt)
{
/* Caller can omit dump options, in which case we synthesize them */
@@ -3531,6 +3551,57 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
{
RestoreOptions *ropt = AH->public.ropt;
+ /* We restore BLOBs in batches to reduce XID consumption */
+ if (strcmp(te->desc, "BLOB") == 0 && AH->public.blobBatchSize > 0)
+ {
+ if (blobBatchCount > 0)
+ {
+ /* We are inside a BLOB restore transaction */
+ if (blobBatchCount >= AH->public.blobBatchSize)
+ {
+ /*
+ * We did reach the batch size with the previous BLOB.
+ * Commit and start a new batch.
+ */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- BLOB batch size reached\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ }
+ else
+ {
+ /* This one still fits into the current batch */
+ blobBatchCount++;
+ }
+ }
+ else
+ {
+ /* Not inside a transaction, start a new batch */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- Start BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ }
+ }
+ else
+ {
+ /* Not a BLOB. If we have a BLOB batch open, close it. */
+ if (blobBatchCount > 0)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ }
+ }
+
/* Select owner, schema, tablespace and default AM as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f8bec3f..f153f08 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -165,12 +165,20 @@ static void guessConstraintInheritance(TableInfo *tblinfo, int numTables);
static void dumpComment(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId);
+static bool dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId);
static int findComments(Archive *fout, Oid classoid, Oid objoid,
CommentItem **items);
static int collectComments(Archive *fout, CommentItem **items);
static void dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId);
+static bool dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId);
static int findSecLabels(Archive *fout, Oid classoid, Oid objoid,
SecLabelItem **items);
static int collectSecLabels(Archive *fout, SecLabelItem **items);
@@ -227,6 +235,13 @@ static DumpId dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const char *nspname, const char *owner,
const char *acls, const char *racls,
const char *initacls, const char *initracls);
+static bool dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name,
+ const char *subname,
+ const char *nspname, const char *owner,
+ const char *acls, const char *racls,
+ const char *initacls, const char *initracls);
static void getDependencies(Archive *fout);
static void BuildArchiveDependencies(Archive *fout);
@@ -3468,11 +3483,44 @@ dumpBlob(Archive *fout, const BlobInfo *binfo)
{
PQExpBuffer cquery = createPQExpBuffer();
PQExpBuffer dquery = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+ teSection section = SECTION_PRE_DATA;
appendPQExpBuffer(cquery,
"SELECT pg_catalog.lo_create('%s');\n",
binfo->dobj.name);
+ /*
+ * In binary upgrade mode we put all the queries to restore
+ * one large object into a single TOC entry and emit it as
+ * SECTION_DATA so that they can be restored in parallel.
+ */
+ if (fout->dopt->binary_upgrade)
+ {
+ section = SECTION_DATA;
+
+ /* Dump comment if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpCommentQuery(fout, cquery, tag, "LARGE OBJECT",
+ binfo->dobj.name, NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump security label if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabelQuery(fout, cquery, tag, "LARGE OBJECT",
+ binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump ACL if any */
+ if (binfo->blobacl && (binfo->dobj.dump & DUMP_COMPONENT_ACL))
+ dumpACLQuery(fout, cquery, tag,
+ binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ binfo->dobj.name, NULL,
+ NULL, binfo->rolname, binfo->blobacl, binfo->rblobacl,
+ binfo->initblobacl, binfo->initrblobacl);
+ }
+
appendPQExpBuffer(dquery,
"SELECT pg_catalog.lo_unlink('%s');\n",
binfo->dobj.name);
@@ -3482,28 +3530,31 @@ dumpBlob(Archive *fout, const BlobInfo *binfo)
ARCHIVE_OPTS(.tag = binfo->dobj.name,
.owner = binfo->rolname,
.description = "BLOB",
- .section = SECTION_PRE_DATA,
+ .section = section,
.createStmt = cquery->data,
.dropStmt = dquery->data));
- /* Dump comment if any */
- if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
- dumpComment(fout, "LARGE OBJECT", binfo->dobj.name,
- NULL, binfo->rolname,
- binfo->dobj.catId, 0, binfo->dobj.dumpId);
-
- /* Dump security label if any */
- if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
- dumpSecLabel(fout, "LARGE OBJECT", binfo->dobj.name,
- NULL, binfo->rolname,
- binfo->dobj.catId, 0, binfo->dobj.dumpId);
-
- /* Dump ACL if any */
- if (binfo->blobacl && (binfo->dobj.dump & DUMP_COMPONENT_ACL))
- dumpACL(fout, binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
- binfo->dobj.name, NULL,
- NULL, binfo->rolname, binfo->blobacl, binfo->rblobacl,
- binfo->initblobacl, binfo->initrblobacl);
+ if (!fout->dopt->binary_upgrade)
+ {
+ /* Dump comment if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "LARGE OBJECT", binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump security label if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabel(fout, "LARGE OBJECT", binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump ACL if any */
+ if (binfo->blobacl && (binfo->dobj.dump & DUMP_COMPONENT_ACL))
+ dumpACL(fout, binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ binfo->dobj.name, NULL,
+ NULL, binfo->rolname, binfo->blobacl, binfo->rblobacl,
+ binfo->initblobacl, binfo->initrblobacl);
+ }
destroyPQExpBuffer(cquery);
destroyPQExpBuffer(dquery);
@@ -9868,25 +9919,56 @@ dumpComment(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId)
{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpCommentQuery(fout, query, tag, type, name, namespace, owner,
+ catalogId, subid, dumpId))
+ {
+ /*
+ * We mark comments as SECTION_NONE because they really belong in the
+ * same section as their parent, whether that is pre-data or
+ * post-data.
+ */
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "COMMENT",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId)
+{
DumpOptions *dopt = fout->dopt;
CommentItem *comments;
int ncomments;
/* do nothing, if --no-comments is supplied */
if (dopt->no_comments)
- return;
+ return false;
/* Comments are schema not data ... except blob comments are data */
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump blob comments in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for comments associated with catalogId, using table */
@@ -9905,9 +9987,6 @@ dumpComment(Archive *fout, const char *type, const char *name,
/* If a comment exists, build COMMENT ON statement */
if (ncomments > 0)
{
- PQExpBuffer query = createPQExpBuffer();
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(query, "COMMENT ON %s ", type);
if (namespace && *namespace)
appendPQExpBuffer(query, "%s.", fmtId(namespace));
@@ -9917,24 +9996,10 @@ dumpComment(Archive *fout, const char *type, const char *name,
appendPQExpBuffer(tag, "%s %s", type, name);
- /*
- * We mark comments as SECTION_NONE because they really belong in the
- * same section as their parent, whether that is pre-data or
- * post-data.
- */
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "COMMENT",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
-
- destroyPQExpBuffer(query);
- destroyPQExpBuffer(tag);
+ return true;
}
+
+ return false;
}
/*
@@ -15070,18 +15135,63 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const char *initacls, const char *initracls)
{
DumpId aclDumpId = InvalidDumpId;
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpACLQuery(fout, query, tag, objDumpId, altDumpId,
+ type, name, subname, nspname, owner,
+ acls, racls, initacls, initracls))
+ {
+ DumpId aclDeps[2];
+ int nDeps = 0;
+
+ if (subname)
+ appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
+ else
+ appendPQExpBuffer(tag, "%s %s", type, name);
+
+ aclDeps[nDeps++] = objDumpId;
+ if (altDumpId != InvalidDumpId)
+ aclDeps[nDeps++] = altDumpId;
+
+ aclDumpId = createDumpId();
+
+ ArchiveEntry(fout, nilCatalogId, aclDumpId,
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = nspname,
+ .owner = owner,
+ .description = "ACL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = aclDeps,
+ .nDeps = nDeps));
+
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+
+ return aclDumpId;
+}
+
+static bool
+dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name, const char *subname,
+ const char *nspname, const char *owner,
+ const char *acls, const char *racls,
+ const char *initacls, const char *initracls)
+{
DumpOptions *dopt = fout->dopt;
- PQExpBuffer sql;
+ bool haveACL = false;
/* Do nothing if ACL dump is not enabled */
if (dopt->aclsSkip)
- return InvalidDumpId;
+ return false;
/* --data-only skips ACLs *except* BLOB ACLs */
if (dopt->dataOnly && strcmp(type, "LARGE OBJECT") != 0)
- return InvalidDumpId;
-
- sql = createPQExpBuffer();
+ return false;
/*
* Check to see if this object has had any initial ACLs included for it.
@@ -15093,54 +15203,31 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
*/
if (strlen(initacls) != 0 || strlen(initracls) != 0)
{
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
+ haveACL = true;
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
if (!buildACLCommands(name, subname, nspname, type,
initacls, initracls, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
fatal("could not parse initial GRANT ACL list (%s) or initial REVOKE ACL list (%s) for object \"%s\" (%s)",
initacls, initracls, name, type);
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
}
if (!buildACLCommands(name, subname, nspname, type,
acls, racls, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
fatal("could not parse GRANT ACL list (%s) or REVOKE ACL list (%s) for object \"%s\" (%s)",
acls, racls, name, type);
- if (sql->len > 0)
+ if (haveACL && tag != NULL)
{
- PQExpBuffer tag = createPQExpBuffer();
- DumpId aclDeps[2];
- int nDeps = 0;
-
if (subname)
appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
else
appendPQExpBuffer(tag, "%s %s", type, name);
-
- aclDeps[nDeps++] = objDumpId;
- if (altDumpId != InvalidDumpId)
- aclDeps[nDeps++] = altDumpId;
-
- aclDumpId = createDumpId();
-
- ArchiveEntry(fout, nilCatalogId, aclDumpId,
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = nspname,
- .owner = owner,
- .description = "ACL",
- .section = SECTION_NONE,
- .createStmt = sql->data,
- .deps = aclDeps,
- .nDeps = nDeps));
-
- destroyPQExpBuffer(tag);
}
- destroyPQExpBuffer(sql);
-
- return aclDumpId;
+ return haveACL;
}
/*
@@ -15166,34 +15253,58 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId)
{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpSecLabelQuery(fout, query, tag, type, name,
+ namespace, owner, catalogId, subid, dumpId))
+ {
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "SECURITY LABEL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId)
+{
DumpOptions *dopt = fout->dopt;
SecLabelItem *labels;
int nlabels;
int i;
- PQExpBuffer query;
/* do nothing, if --no-security-labels is supplied */
if (dopt->no_security_labels)
- return;
+ return false;
/* Security labels are schema not data ... except blob labels are data */
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump blob security labels in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for security labels associated with catalogId, using table */
nlabels = findSecLabels(fout, catalogId.tableoid, catalogId.oid, &labels);
- query = createPQExpBuffer();
-
for (i = 0; i < nlabels; i++)
{
/*
@@ -15214,22 +15325,11 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
if (query->len > 0)
{
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(tag, "%s %s", type, name);
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "SECURITY LABEL",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
- destroyPQExpBuffer(tag);
+ return true;
}
- destroyPQExpBuffer(query);
+ return false;
}
/*
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 589b4ae..b16db03 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -59,6 +59,7 @@ main(int argc, char **argv)
int c;
int exit_code;
int numWorkers = 1;
+ int blobBatchSize = 0;
Archive *AH;
char *inputFileSpec;
static int disable_triggers = 0;
@@ -120,6 +121,7 @@ main(int argc, char **argv)
{"no-publications", no_argument, &no_publications, 1},
{"no-security-labels", no_argument, &no_security_labels, 1},
{"no-subscriptions", no_argument, &no_subscriptions, 1},
+ {"restore-blob-batch-size", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
@@ -280,6 +282,10 @@ main(int argc, char **argv)
set_dump_section(optarg, &(opts->dumpSections));
break;
+ case 4: /* # of blobs to restore per transaction */
+ blobBatchSize = atoi(optarg);
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit_nicely(1);
@@ -434,6 +440,7 @@ main(int argc, char **argv)
SortTocFromFile(AH);
AH->numWorkers = numWorkers;
+ AH->blobBatchSize = blobBatchSize;
if (opts->tocSummary)
PrintTOCSummary(AH);
@@ -506,6 +513,8 @@ usage(const char *progname)
printf(_(" --use-set-session-authorization\n"
" use SET SESSION AUTHORIZATION commands instead of\n"
" ALTER OWNER commands to set ownership\n"));
+ printf(_(" --restore-blob-batch-size=NUM\n"
+ " attempt to restore NUM large objects per transaction\n"));
printf(_("\nConnection options:\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 33d9591..183bb6d 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -52,8 +52,11 @@ generate_old_dump(void)
parallel_exec_prog(log_file_name, NULL,
"\"%s/pg_dump\" %s --schema-only --quote-all-identifiers "
+ "%s "
"--binary-upgrade --format=custom %s %s --file=\"%s\" %s",
new_cluster.bindir, cluster_conn_opts(&old_cluster),
+ user_opts.pg_dump_opts ?
+ user_opts.pg_dump_opts : "",
log_opts.verbose ? "--verbose" : "",
user_opts.ind_coll_unknown ?
"--index-collation-versions-unknown" : "",
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 9c9b313..d0efb9f 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -57,6 +57,8 @@ parseCommandLine(int argc, char *argv[])
{"verbose", no_argument, NULL, 'v'},
{"clone", no_argument, NULL, 1},
{"index-collation-versions-unknown", no_argument, NULL, 2},
+ {"dump-options", required_argument, NULL, 3},
+ {"restore-options", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
@@ -208,6 +210,34 @@ parseCommandLine(int argc, char *argv[])
user_opts.ind_coll_unknown = true;
break;
+ case 3:
+ /* append option? */
+ if (!user_opts.pg_dump_opts)
+ user_opts.pg_dump_opts = pg_strdup(optarg);
+ else
+ {
+ char *old_opts = user_opts.pg_dump_opts;
+
+ user_opts.pg_dump_opts = psprintf("%s %s",
+ old_opts, optarg);
+ free(old_opts);
+ }
+ break;
+
+ case 4:
+ /* append option? */
+ if (!user_opts.pg_restore_opts)
+ user_opts.pg_restore_opts = pg_strdup(optarg);
+ else
+ {
+ char *old_opts = user_opts.pg_restore_opts;
+
+ user_opts.pg_restore_opts = psprintf("%s %s",
+ old_opts, optarg);
+ free(old_opts);
+ }
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
os_info.progname);
@@ -314,6 +344,8 @@ usage(void)
printf(_(" --clone clone instead of copying files to new cluster\n"));
printf(_(" --index-collation-versions-unknown\n"));
printf(_(" mark text indexes as needing to be rebuilt\n"));
+ printf(_(" --dump-options=OPTIONS options to pass to pg_dump\n"));
+ printf(_(" --restore-options=OPTIONS options to pass to pg_restore\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\n"
"Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index e23b8ca..6f6b12d 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -348,10 +348,13 @@ create_new_objects(void)
true,
true,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "%s "
"--dbname postgres \"%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
create_opts,
+ user_opts.pg_restore_opts ?
+ user_opts.pg_restore_opts : "",
sql_file_name);
break; /* done once we've processed template1 */
@@ -385,10 +388,13 @@ create_new_objects(void)
parallel_exec_prog(log_file_name,
NULL,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "%s "
"--dbname template1 \"%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
create_opts,
+ user_opts.pg_restore_opts ?
+ user_opts.pg_restore_opts : "",
sql_file_name);
}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a784..4b7959e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -293,6 +293,8 @@ typedef struct
int jobs; /* number of processes/threads to use */
char *socketdir; /* directory to use for Unix sockets */
bool ind_coll_unknown; /* mark unknown index collation versions */
+ char *pg_dump_opts; /* options to pass to pg_dump */
+ char *pg_restore_opts; /* options to pass to pg_restore */
} UserOpts;
typedef struct
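To make the effect of the batching concrete: with --restore-blob-batch-size=2, the SQL that pg_restore emits would look roughly like the sketch below. The comments match the ahprintf calls in the patch, but the OIDs are invented and the per-object COMMENT/ACL statements that binary-upgrade mode folds into each BLOB entry are omitted.

--
-- Start BLOB restore batch
--
BEGIN;

SELECT pg_catalog.lo_create('16391');
SELECT pg_catalog.lo_create('16392');

--
-- BLOB batch size reached
--
COMMIT;
BEGIN;

SELECT pg_catalog.lo_create('16393');

--
-- End BLOB restore batch
--
COMMIT;

Each batch then consumes one transaction ID instead of one per large object, which is what keeps the restore clear of wraparound.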
On Wed, Mar 24, 2021 at 12:05:27PM -0400, Jan Wieck wrote:
On 3/24/21 12:04 PM, Jan Wieck wrote:
In any case I changed the options so that they behave the same way, the
existing -o and -O (for old/new postmaster options) work. I don't think
it would be wise to have option forwarding work differently between
options for postmaster and options for pg_dump/pg_restore.
Attaching the actual diff might help.
I think the original issue with XIDs was fixed by 74cf7d46a.
Are you still planning to progress the patches addressing huge memory use of
pg_restore?
Note this other, old thread on -general, which I believe has variations on the
same patches.
/messages/by-id/7bf19bf2-e6b7-01a7-1d96-f0607c728c49@wi3ck.info
There was discussion about using pg_restore --single. Note that that was used
at some point in the past: see 12ee6ec71 and 861ad67bd.
The immediate problem is that --single conflicts with --create.
I cleaned up a patch I'd written to work around that. It preserves DB settings
and passes pg_upgrade's test. It's probably not portable as written, but if need be
could pass an empty file instead of /dev/null...
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 3628bd74a7..9c504aff79 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -364,6 +364,16 @@ create_new_objects(void)
DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
const char *create_opts;
+ PQExpBufferData connstr,
+ escaped_connstr;
+
+ initPQExpBuffer(&connstr);
+ initPQExpBuffer(&escaped_connstr);
+ appendPQExpBufferStr(&connstr, "dbname=");
+ appendConnStrVal(&connstr, old_db->db_name);
+ appendShellString(&escaped_connstr, connstr.data);
+ termPQExpBuffer(&connstr);
+
/* Skip template1 in this pass */
if (strcmp(old_db->db_name, "template1") == 0)
continue;
@@ -378,18 +388,31 @@ create_new_objects(void)
* propagate its database-level properties.
*/
if (strcmp(old_db->db_name, "postgres") == 0)
- create_opts = "--clean --create";
+ create_opts = "--clean";
else
- create_opts = "--create";
+ create_opts = "";
+ /* Create the DB but exclude all objects */
parallel_exec_prog(log_file_name,
NULL,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "--create -L /dev/null "
"--dbname template1 \"%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
create_opts,
sql_file_name);
+
+ parallel_exec_prog(log_file_name,
+ NULL,
+ "\"%s/pg_restore\" %s %s --exit-on-error --verbose --single "
+ "--dbname=%s \"%s\"",
+ new_cluster.bindir,
+ cluster_conn_opts(&new_cluster),
+ create_opts,
+ escaped_connstr.data,
+ sql_file_name);
+
}
/* reap all children */
On Wed, Mar 24, 2021 at 12:05:27PM -0400, Jan Wieck wrote:
On 3/24/21 12:04 PM, Jan Wieck wrote:
In any case I changed the options so that they behave the same way, the
existing -o and -O (for old/new postmaster options) work. I don't think
it would be wise to have option forwarding work differently between
options for postmaster and options for pg_dump/pg_restore.
Attaching the actual diff might help.
I'd like to revive this thread, so I've created a commitfest entry [0] and
attached a hastily rebased patch that compiles and passes the tests. I am
aiming to spend some more time on this in the near future.
[0]: https://commitfest.postgresql.org/39/3841/
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
pg_upgrade_improvements_v5.diff (text/x-diff)
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index c8a70d9bc1..faf1953e18 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -858,6 +858,11 @@ RunWorker(ArchiveHandle *AH, ParallelSlot *slot)
*/
WaitForCommands(AH, pipefd);
+ /*
+ * Close a possibly open BLOB batch transaction.
+ */
+ CommitBlobTransaction((Archive *)AH);
+
/*
* Disconnect from database and clean up.
*/
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fcc5f6bd05..f16ecdecc0 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -220,6 +220,8 @@ typedef struct Archive
int numWorkers; /* number of parallel processes */
char *sync_snapshot_id; /* sync snapshot id for parallel operation */
+ int blobBatchSize; /* # of blobs to restore per transaction */
+
/* info needed for string escaping */
int encoding; /* libpq code for client_encoding */
bool std_strings; /* standard_conforming_strings */
@@ -290,6 +292,7 @@ extern void WriteData(Archive *AH, const void *data, size_t dLen);
extern int StartBlob(Archive *AH, Oid oid);
extern int EndBlob(Archive *AH, Oid oid);
+extern void CommitBlobTransaction(Archive *AH);
extern void CloseArchive(Archive *AH);
extern void SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 233198afc0..7cfbed5e75 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -68,6 +68,7 @@ typedef struct _parallelReadyList
bool sorted; /* are valid entries currently sorted? */
} ParallelReadyList;
+static int blobBatchCount = 0;
static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
const int compression, bool dosync, ArchiveMode mode,
@@ -266,6 +267,8 @@ CloseArchive(Archive *AHX)
int res = 0;
ArchiveHandle *AH = (ArchiveHandle *) AHX;
+ CommitBlobTransaction(AHX);
+
AH->ClosePtr(AH);
/* Close the output */
@@ -279,6 +282,23 @@ CloseArchive(Archive *AHX)
pg_fatal("could not close output file: %m");
}
+/* Public */
+void
+CommitBlobTransaction(Archive *AHX)
+{
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
+
+ if (blobBatchCount > 0)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ }
+}
+
/* Public */
void
SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt)
@@ -3489,6 +3509,57 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
{
RestoreOptions *ropt = AH->public.ropt;
+ /* We restore BLOBs in batches to reduce XID consumption */
+ if (strcmp(te->desc, "BLOB") == 0 && AH->public.blobBatchSize > 0)
+ {
+ if (blobBatchCount > 0)
+ {
+ /* We are inside a BLOB restore transaction */
+ if (blobBatchCount >= AH->public.blobBatchSize)
+ {
+ /*
+ * We did reach the batch size with the previous BLOB.
+ * Commit and start a new batch.
+ */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- BLOB batch size reached\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ }
+ else
+ {
+ /* This one still fits into the current batch */
+ blobBatchCount++;
+ }
+ }
+ else
+ {
+ /* Not inside a transaction, start a new batch */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- Start BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ }
+ }
+ else
+ {
+ /* Not a BLOB. If we have a BLOB batch open, close it. */
+ if (blobBatchCount > 0)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ }
+ }
+
/* Select owner, schema, tablespace and default AM as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d25709ad5f..17c0dd7f0c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -196,11 +196,20 @@ static inline void dumpComment(Archive *fout, const char *type,
const char *name, const char *namespace,
const char *owner, CatalogId catalogId,
int subid, DumpId dumpId);
+static bool dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId,
+ const char *initdb_comment);
static int findComments(Oid classoid, Oid objoid, CommentItem **items);
static void collectComments(Archive *fout);
static void dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId);
+static bool dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId);
static int findSecLabels(Oid classoid, Oid objoid, SecLabelItem **items);
static void collectSecLabels(Archive *fout);
static void dumpDumpableObject(Archive *fout, DumpableObject *dobj);
@@ -256,6 +265,12 @@ static DumpId dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const char *type, const char *name, const char *subname,
const char *nspname, const char *owner,
const DumpableAcl *dacl);
+static bool dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name,
+ const char *subname,
+ const char *nspname, const char *owner,
+ const DumpableAcl *dacl);
static void getDependencies(Archive *fout);
static void BuildArchiveDependencies(Archive *fout);
@@ -3477,11 +3492,43 @@ dumpBlob(Archive *fout, const BlobInfo *binfo)
{
PQExpBuffer cquery = createPQExpBuffer();
PQExpBuffer dquery = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+ teSection section = SECTION_PRE_DATA;
appendPQExpBuffer(cquery,
"SELECT pg_catalog.lo_create('%s');\n",
binfo->dobj.name);
+ /*
+ * In binary upgrade mode we put all the queries to restore
+ * one large object into a single TOC entry and emit it as
+ * SECTION_DATA so that they can be restored in parallel.
+ */
+ if (fout->dopt->binary_upgrade)
+ {
+ section = SECTION_DATA;
+
+ /* Dump comment if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpCommentQuery(fout, cquery, tag, "LARGE OBJECT",
+ binfo->dobj.name, NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId, NULL);
+
+ /* Dump security label if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabelQuery(fout, cquery, tag, "LARGE OBJECT",
+ binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+ /* Dump ACL if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_ACL)
+ dumpACLQuery(fout, cquery, tag,
+ binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ binfo->dobj.name, NULL,
+ NULL, binfo->rolname, &binfo->dacl);
+ }
+
appendPQExpBuffer(dquery,
"SELECT pg_catalog.lo_unlink('%s');\n",
binfo->dobj.name);
@@ -3491,27 +3538,30 @@ dumpBlob(Archive *fout, const BlobInfo *binfo)
ARCHIVE_OPTS(.tag = binfo->dobj.name,
.owner = binfo->rolname,
.description = "BLOB",
- .section = SECTION_PRE_DATA,
+ .section = section,
.createStmt = cquery->data,
.dropStmt = dquery->data));
- /* Dump comment if any */
- if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
- dumpComment(fout, "LARGE OBJECT", binfo->dobj.name,
- NULL, binfo->rolname,
- binfo->dobj.catId, 0, binfo->dobj.dumpId);
+ if (!fout->dopt->binary_upgrade)
+ {
+ /* Dump comment if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "LARGE OBJECT", binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
- /* Dump security label if any */
- if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
- dumpSecLabel(fout, "LARGE OBJECT", binfo->dobj.name,
- NULL, binfo->rolname,
- binfo->dobj.catId, 0, binfo->dobj.dumpId);
+ /* Dump security label if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabel(fout, "LARGE OBJECT", binfo->dobj.name,
+ NULL, binfo->rolname,
+ binfo->dobj.catId, 0, binfo->dobj.dumpId);
- /* Dump ACL if any */
- if (binfo->dobj.dump & DUMP_COMPONENT_ACL)
- dumpACL(fout, binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
- binfo->dobj.name, NULL,
- NULL, binfo->rolname, &binfo->dacl);
+ /* Dump ACL if any */
+ if (binfo->dobj.dump & DUMP_COMPONENT_ACL)
+ dumpACL(fout, binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ binfo->dobj.name, NULL,
+ NULL, binfo->rolname, &binfo->dacl);
+ }
destroyPQExpBuffer(cquery);
destroyPQExpBuffer(dquery);
@@ -9442,6 +9492,38 @@ dumpCommentExtended(Archive *fout, const char *type,
const char *owner, CatalogId catalogId,
int subid, DumpId dumpId,
const char *initdb_comment)
+{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpCommentQuery(fout, query, tag, type, name, namespace, owner,
+ catalogId, subid, dumpId, initdb_comment))
+ {
+ /*
+ * We mark comments as SECTION_NONE because they really belong in the
+ * same section as their parent, whether that is pre-data or
+ * post-data.
+ */
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "COMMENT",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId,
+ const char *initdb_comment)
{
DumpOptions *dopt = fout->dopt;
CommentItem *comments;
@@ -9449,19 +9531,19 @@ dumpCommentExtended(Archive *fout, const char *type,
/* do nothing, if --no-comments is supplied */
if (dopt->no_comments)
- return;
+ return false;
/* Comments are schema not data ... except blob comments are data */
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump blob comments in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for comments associated with catalogId, using table */
@@ -9499,9 +9581,6 @@ dumpCommentExtended(Archive *fout, const char *type,
/* If a comment exists, build COMMENT ON statement */
if (ncomments > 0)
{
- PQExpBuffer query = createPQExpBuffer();
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(query, "COMMENT ON %s ", type);
if (namespace && *namespace)
appendPQExpBuffer(query, "%s.", fmtId(namespace));
@@ -9511,24 +9590,10 @@ dumpCommentExtended(Archive *fout, const char *type,
appendPQExpBuffer(tag, "%s %s", type, name);
- /*
- * We mark comments as SECTION_NONE because they really belong in the
- * same section as their parent, whether that is pre-data or
- * post-data.
- */
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "COMMENT",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
-
- destroyPQExpBuffer(query);
- destroyPQExpBuffer(tag);
+ return true;
}
+
+ return false;
}
/*
@@ -14423,23 +14488,65 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const DumpableAcl *dacl)
{
DumpId aclDumpId = InvalidDumpId;
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpACLQuery(fout, query, tag, objDumpId, altDumpId,
+ type, name, subname, nspname, owner, dacl))
+ {
+ DumpId aclDeps[2];
+ int nDeps = 0;
+
+ if (subname)
+ appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
+ else
+ appendPQExpBuffer(tag, "%s %s", type, name);
+
+ aclDeps[nDeps++] = objDumpId;
+ if (altDumpId != InvalidDumpId)
+ aclDeps[nDeps++] = altDumpId;
+
+ aclDumpId = createDumpId();
+
+ ArchiveEntry(fout, nilCatalogId, aclDumpId,
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = nspname,
+ .owner = owner,
+ .description = "ACL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = aclDeps,
+ .nDeps = nDeps));
+
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+
+ return aclDumpId;
+}
+
+static bool
+dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name, const char *subname,
+ const char *nspname, const char *owner,
+ const DumpableAcl *dacl)
+{
DumpOptions *dopt = fout->dopt;
const char *acls = dacl->acl;
const char *acldefault = dacl->acldefault;
char privtype = dacl->privtype;
const char *initprivs = dacl->initprivs;
const char *baseacls;
- PQExpBuffer sql;
/* Do nothing if ACL dump is not enabled */
if (dopt->aclsSkip)
- return InvalidDumpId;
+ return false;
/* --data-only skips ACLs *except* BLOB ACLs */
if (dopt->dataOnly && strcmp(type, "LARGE OBJECT") != 0)
- return InvalidDumpId;
-
- sql = createPQExpBuffer();
+ return false;
/*
* In binary upgrade mode, we don't run an extension's script but instead
@@ -14457,13 +14564,13 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
if (dopt->binary_upgrade && privtype == 'e' &&
initprivs && *initprivs != '\0')
{
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
if (!buildACLCommands(name, subname, nspname, type,
initprivs, acldefault, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
pg_fatal("could not parse initial ACL list (%s) or default (%s) for object \"%s\" (%s)",
initprivs, acldefault, name, type);
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
}
/*
@@ -14485,43 +14592,19 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
if (!buildACLCommands(name, subname, nspname, type,
acls, baseacls, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
pg_fatal("could not parse ACL list (%s) or default (%s) for object \"%s\" (%s)",
acls, baseacls, name, type);
- if (sql->len > 0)
+ if (query->len > 0 && tag != NULL)
{
- PQExpBuffer tag = createPQExpBuffer();
- DumpId aclDeps[2];
- int nDeps = 0;
-
if (subname)
appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
else
appendPQExpBuffer(tag, "%s %s", type, name);
-
- aclDeps[nDeps++] = objDumpId;
- if (altDumpId != InvalidDumpId)
- aclDeps[nDeps++] = altDumpId;
-
- aclDumpId = createDumpId();
-
- ArchiveEntry(fout, nilCatalogId, aclDumpId,
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = nspname,
- .owner = owner,
- .description = "ACL",
- .section = SECTION_NONE,
- .createStmt = sql->data,
- .deps = aclDeps,
- .nDeps = nDeps));
-
- destroyPQExpBuffer(tag);
}
- destroyPQExpBuffer(sql);
-
- return aclDumpId;
+ return true;
}
/*
@@ -14546,35 +14629,59 @@ static void
dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId)
+{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpSecLabelQuery(fout, query, tag, type, name,
+ namespace, owner, catalogId, subid, dumpId))
+ {
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "SECURITY LABEL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId)
{
DumpOptions *dopt = fout->dopt;
SecLabelItem *labels;
int nlabels;
int i;
- PQExpBuffer query;
/* do nothing, if --no-security-labels is supplied */
if (dopt->no_security_labels)
- return;
+ return false;
/* Security labels are schema not data ... except blob labels are data */
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump blob security labels in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for security labels associated with catalogId, using table */
nlabels = findSecLabels(catalogId.tableoid, catalogId.oid, &labels);
- query = createPQExpBuffer();
-
for (i = 0; i < nlabels; i++)
{
/*
@@ -14595,22 +14702,11 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
if (query->len > 0)
{
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(tag, "%s %s", type, name);
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "SECURITY LABEL",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
- destroyPQExpBuffer(tag);
+ return true;
}
- destroyPQExpBuffer(query);
+ return false;
}
/*
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 049a100634..2159f72ffb 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -60,6 +60,7 @@ main(int argc, char **argv)
int c;
int exit_code;
int numWorkers = 1;
+ int blobBatchSize = 0;
Archive *AH;
char *inputFileSpec;
static int disable_triggers = 0;
@@ -123,6 +124,7 @@ main(int argc, char **argv)
{"no-publications", no_argument, &no_publications, 1},
{"no-security-labels", no_argument, &no_security_labels, 1},
{"no-subscriptions", no_argument, &no_subscriptions, 1},
+ {"restore-blob-batch-size", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
@@ -286,6 +288,10 @@ main(int argc, char **argv)
set_dump_section(optarg, &(opts->dumpSections));
break;
+ case 4: /* # of blobs to restore per transaction */
+ blobBatchSize = atoi(optarg);
+ break;
+
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -405,6 +411,7 @@ main(int argc, char **argv)
SortTocFromFile(AH);
AH->numWorkers = numWorkers;
+ AH->blobBatchSize = blobBatchSize;
if (opts->tocSummary)
PrintTOCSummary(AH);
@@ -478,6 +485,8 @@ usage(const char *progname)
printf(_(" --use-set-session-authorization\n"
" use SET SESSION AUTHORIZATION commands instead of\n"
" ALTER OWNER commands to set ownership\n"));
+ printf(_(" --restore-blob-batch-size=NUM\n"
+ " attempt to restore NUM large objects per transaction\n"));
printf(_("\nConnection options:\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 29b9e44f78..9b838c88e5 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -53,8 +53,11 @@ generate_old_dump(void)
parallel_exec_prog(log_file_name, NULL,
"\"%s/pg_dump\" %s --schema-only --quote-all-identifiers "
+ "%s "
"--binary-upgrade --format=custom %s --file=\"%s/%s\" %s",
new_cluster.bindir, cluster_conn_opts(&old_cluster),
+ user_opts.pg_dump_opts ?
+ user_opts.pg_dump_opts : "",
log_opts.verbose ? "--verbose" : "",
log_opts.dumpdir,
sql_file_name, escaped_connstr.data);
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index fbab1c4fb7..4bcd925874 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -56,6 +56,8 @@ parseCommandLine(int argc, char *argv[])
{"socketdir", required_argument, NULL, 's'},
{"verbose", no_argument, NULL, 'v'},
{"clone", no_argument, NULL, 1},
+ {"dump-options", required_argument, NULL, 2},
+ {"restore-options", required_argument, NULL, 3},
{NULL, 0, NULL, 0}
};
@@ -194,6 +196,34 @@ parseCommandLine(int argc, char *argv[])
user_opts.transfer_mode = TRANSFER_MODE_CLONE;
break;
+ case 2:
+ /* append option? */
+ if (!user_opts.pg_dump_opts)
+ user_opts.pg_dump_opts = pg_strdup(optarg);
+ else
+ {
+ char *old_opts = user_opts.pg_dump_opts;
+
+ user_opts.pg_dump_opts = psprintf("%s %s",
+ old_opts, optarg);
+ free(old_opts);
+ }
+ break;
+
+ case 3:
+ /* append option? */
+ if (!user_opts.pg_restore_opts)
+ user_opts.pg_restore_opts = pg_strdup(optarg);
+ else
+ {
+ char *old_opts = user_opts.pg_restore_opts;
+
+ user_opts.pg_restore_opts = psprintf("%s %s",
+ old_opts, optarg);
+ free(old_opts);
+ }
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
os_info.progname);
@@ -283,6 +313,8 @@ usage(void)
printf(_(" -v, --verbose enable verbose internal logging\n"));
printf(_(" -V, --version display version information, then exit\n"));
printf(_(" --clone clone instead of copying files to new cluster\n"));
+ printf(_(" --dump-options=OPTIONS options to pass to pg_dump\n"));
+ printf(_(" --restore-options=OPTIONS options to pass to pg_restore\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\n"
"Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 115faa222e..3b98312ed2 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -457,10 +457,13 @@ create_new_objects(void)
true,
true,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "%s "
"--dbname postgres \"%s/%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
create_opts,
+ user_opts.pg_restore_opts ?
+ user_opts.pg_restore_opts : "",
log_opts.dumpdir,
sql_file_name);
@@ -495,10 +498,13 @@ create_new_objects(void)
parallel_exec_prog(log_file_name,
NULL,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "%s "
"--dbname template1 \"%s/%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
create_opts,
+ user_opts.pg_restore_opts ?
+ user_opts.pg_restore_opts : "",
log_opts.dumpdir,
sql_file_name);
}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 60c3c8dd68..477de6f717 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -295,6 +295,8 @@ typedef struct
transferMode transfer_mode; /* copy files or link them? */
int jobs; /* number of processes/threads to use */
char *socketdir; /* directory to use for Unix sockets */
+ char *pg_dump_opts; /* options to pass to pg_dump */
+ char *pg_restore_opts; /* options to pass to pg_restore */
} UserOpts;
typedef struct
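To make the intended usage concrete, a hypothetical invocation combining these options could look like the command below; the directories are invented, and only --restore-options and --restore-blob-batch-size come from the patches in this thread.

pg_upgrade --old-bindir=/usr/pgsql-10/bin --new-bindir=/usr/pgsql-15/bin \
    --old-datadir=/srv/pg10/data --new-datadir=/srv/pg15/data \
    --restore-options='--restore-blob-batch-size=10000'

pg_upgrade would then splice the forwarded string into every pg_restore invocation it runs, via the %s placeholders added to create_new_objects() above.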
On 8/24/22 17:32, Nathan Bossart wrote:
I'd like to revive this thread, so I've created a commitfest entry [0] and
attached a hastily rebased patch that compiles and passes the tests. I am
aiming to spend some more time on this in the near future.
Just to clarify, was Justin's statement upthread (that the XID problem
is fixed) correct? And is this patch just trying to improve the
remaining memory and lock usage problems?
I took a quick look at the pg_upgrade diffs. I agree with Jan that the
escaping problem is a pretty bad smell, but even putting that aside for
a bit, is it safe to expose arbitrary options to pg_dump/restore during
upgrade? It's super flexible, but I can imagine that some of those flags
might really mess up the new cluster...
And yeah, if you choose to do that then you get to keep both pieces, I
guess, but I like that pg_upgrade tries to be (IMO) fairly bulletproof.
--Jacob
On Wed, Sep 07, 2022 at 02:42:05PM -0700, Jacob Champion wrote:
Just to clarify, was Justin's statement upthread (that the XID problem
is fixed) correct? And is this patch just trying to improve the
remaining memory and lock usage problems?
I think "fixed" might not be totally accurate, but that is the gist.
I took a quick look at the pg_upgrade diffs. I agree with Jan that the
escaping problem is a pretty bad smell, but even putting that aside for
a bit, is it safe to expose arbitrary options to pg_dump/restore during
upgrade? It's super flexible, but I can imagine that some of those flags
might really mess up the new cluster...
And yeah, if you choose to do that then you get to keep both pieces, I
guess, but I like that pg_upgrade tries to be (IMO) fairly bulletproof.
IIUC the main benefit of this approach is that it isn't dependent on
binary-upgrade mode, which seems to be a goal based on the discussion
upthread [0]. I think it'd be easily possible to fix only pg_upgrade by
simply dumping and restoring pg_largeobject_metadata, as Andres suggested
in 2018 [1]. In fact, it seems like it ought to be possible to just copy
pg_largeobject_metadata's files as was done before 12a53c7. AFAICT this
would only work for clusters upgrading from v12 and newer, and it'd break
if any of the underlying data types change their storage format. This
seems unlikely for OIDs, but there is ongoing discussion about changing
aclitem.
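As a rough sketch of that idea (purely illustrative: this is not something pg_upgrade does today, it assumes a superuser connection that is allowed to write directly to the catalog, and it ignores the matching pg_shdepend entries that record large-object ownership), the metadata-only transfer would boil down to:

-- on the old cluster
COPY pg_largeobject_metadata TO '/tmp/lo_metadata.dat';

-- on the new cluster, after pg_upgrade has transferred the pg_largeobject data
COPY pg_largeobject_metadata FROM '/tmp/lo_metadata.dat';

with large-object OIDs surviving unchanged, since pg_upgrade preserves them.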
I still think this is a problem worth fixing, but it's not yet clear how to
proceed.
[0]: /messages/by-id/227228.1616259220@sss.pgh.pa.us
[1]: /messages/by-id/20181122001415.ef5bncxqin2y3esb@alap3.anarazel.de
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Sep 8, 2022 at 4:18 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
IIUC the main benefit of this approach is that it isn't dependent on
binary-upgrade mode, which seems to be a goal based on the discussion
upthread [0].
To clarify, I agree that pg_dump should contain the core fix. What I'm
questioning is the addition of --dump-options to make use of that fix
from pg_upgrade, since it also lets the user do "exciting" new things
like --exclude-schema and --include-foreign-data and so on. I don't
think we should let them do that without a good reason.
Thanks,
--Jacob
On Thu, Sep 08, 2022 at 04:29:10PM -0700, Jacob Champion wrote:
On Thu, Sep 8, 2022 at 4:18 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
IIUC the main benefit of this approach is that it isn't dependent on
binary-upgrade mode, which seems to be a goal based on the discussion
upthread [0].
To clarify, I agree that pg_dump should contain the core fix. What I'm
questioning is the addition of --dump-options to make use of that fix
from pg_upgrade, since it also lets the user do "exciting" new things
like --exclude-schema and --include-foreign-data and so on. I don't
think we should let them do that without a good reason.
Ah, yes, I think that is a fair point.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Sep 08, 2022 at 04:34:07PM -0700, Nathan Bossart wrote:
On Thu, Sep 08, 2022 at 04:29:10PM -0700, Jacob Champion wrote:
To clarify, I agree that pg_dump should contain the core fix. What I'm
questioning is the addition of --dump-options to make use of that fix
from pg_upgrade, since it also lets the user do "exciting" new things
like --exclude-schema and --include-foreign-data and so on. I don't
think we should let them do that without a good reason.
Ah, yes, I think that is a fair point.
It has been more than four weeks since the last activity of this
thread and there has been what looks like some feedback to me, so
marked as RwF for the time being.
--
Michael
Hi everyone, I want to continue this thread. I have rebased the patch to the latest
master and fixed an issue when pg_restore prints to a file.
`
╰─$ pg_restore dump_small.custom --restore-blob-batch-size=2 --file=a
--
-- End BLOB restore batch
--
COMMIT;
`
On 09/11/2023, 17:05, "Jacob Champion" <jchampion@timescale.com> wrote:
To clarify, I agree that pg_dump should contain the core fix. What I'm
questioning is the addition of --dump-options to make use of that fix
from pg_upgrade, since it also lets the user do "exciting" new things
like --exclude-schema and --include-foreign-data and so on. I don't
think we should let them do that without a good reason.
The earlier idea was to not expose these options to users and instead use [1]:
--restore-jobs=NUM             --jobs parameter passed to pg_restore
--restore-blob-batch-size=NUM  number of blobs restored in one xact
But this was later expanded to use --dump-options and --restore-options [2].
With --restore-options a user can still pass options such as --exclude-schema,
so maybe we should go back to [1].
[1]: /messages/by-id/a1e200e6-adde-2561-422b-a166ec084e3b@wi3ck.info
[2]: /messages/by-id/8d8d3961-8e8b-3dbe-f911-6f418c5fb1d3@wi3ck.info
Regards
Sachin
Amazon Web Services: https://aws.amazon.com
[ Jacob's email address updated ]
"Kumar, Sachin" <ssetiya@amazon.com> writes:
Hi Everyone , I want to continue this thread , I have rebased the patch to latest
master and fixed an issue when pg_restore prints to file.
Um ... you didn't attach the patch?
FWIW, I agree with Jacob's concern about it being a bad idea to let
users of pg_upgrade pass down arbitrary options to pg_dump/pg_restore.
I think we'd regret going there, because it'd hugely expand the set
of cases pg_upgrade has to deal with.
Also, pg_upgrade is often invoked indirectly via scripts, so I do
not especially buy the idea that we're going to get useful control
input from some human somewhere. I think we'd be better off to
assume that pg_upgrade is on its own to manage the process, so that
if we need to switch strategies based on object count or whatever,
we should put in a heuristic to choose the strategy automatically.
It might not be perfect, but that will give better results for the
pretty large fraction of users who are not going to mess with
weird little switches.
regards, tom lane
Hi,
On November 9, 2023 10:41:01 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, pg_upgrade is often invoked indirectly via scripts, so I do
not especially buy the idea that we're going to get useful control
input from some human somewhere. I think we'd be better off to
assume that pg_upgrade is on its own to manage the process, so that
if we need to switch strategies based on object count or whatever,
we should put in a heuristic to choose the strategy automatically.
It might not be perfect, but that will give better results for the
pretty large fraction of users who are not going to mess with
weird little switches.
+1 - even leaving everything else aside, just about no user would know about the option. There are cases where we can't do better than giving the user control, but we are certainly adding options at a rate that doesn't seem sustainable. And here it doesn't seem that hard to do better.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 09/11/2023, 18:41, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Um ... you didn't attach the patch?
Sorry, patch attached.
Regards
Sachin
Attachments:
pg_upgrade_improvements_v6.diff (application/octet-stream)
commit 62563cf9103ee01cefcf6b477de633a8a6ac508e
Author: Sachin Kumar <ssetiya@amazon.com>
Date: Tue Sep 5 14:14:40 2023 +0100
Proto Patch
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index 85e6515ac2..9328830b2e 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -858,6 +858,11 @@ RunWorker(ArchiveHandle *AH, ParallelSlot *slot)
*/
WaitForCommands(AH, pipefd);
+ /*
+ * Close a possibly open BLOB batch transaction.
+ */
+ CommitBlobTransaction((Archive *)AH);
+
/*
* Disconnect from database and clean up.
*/
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 9ef2f2017e..65519791e9 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -223,6 +223,8 @@ typedef struct Archive
int numWorkers; /* number of parallel processes */
char *sync_snapshot_id; /* sync snapshot id for parallel operation */
+ int blobBatchSize; /* # of blobs to restore per transaction */
+
/* info needed for string escaping */
int encoding; /* libpq code for client_encoding */
bool std_strings; /* standard_conforming_strings */
@@ -293,6 +295,7 @@ extern void WriteData(Archive *AHX, const void *data, size_t dLen);
extern int StartLO(Archive *AHX, Oid oid);
extern int EndLO(Archive *AHX, Oid oid);
+extern void CommitBlobTransaction(Archive *AH);
extern void CloseArchive(Archive *AHX);
extern void SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 256d1e35a4..88de8c6829 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -45,6 +45,7 @@
#define TEXT_DUMP_HEADER "--\n-- PostgreSQL database dump\n--\n\n"
#define TEXT_DUMPALL_HEADER "--\n-- PostgreSQL database cluster dump\n--\n\n"
+static int blobBatchCount = 0;
static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
const pg_compress_specification compression_spec,
@@ -258,6 +259,23 @@ CloseArchive(Archive *AHX)
pg_fatal("could not close output file: %m");
}
+/* Public */
+void
+CommitBlobTransaction(Archive *AHX)
+{
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
+
+ if (blobBatchCount > 0)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ }
+}
+
/* Public */
void
SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt)
@@ -719,6 +737,8 @@ RestoreArchive(Archive *AHX)
ahprintf(AH, "COMMIT;\n\n");
}
+ CommitBlobTransaction(AHX);
+
if (AH->public.verbose)
dumpTimestamp(AH, "Completed on", time(NULL));
@@ -3543,6 +3563,57 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
{
RestoreOptions *ropt = AH->public.ropt;
+ /* We restore BLOBs in batches to reduce XID consumption */
+ if (strcmp(te->desc, "BLOB") == 0 && AH->public.blobBatchSize > 0)
+ {
+ if (blobBatchCount > 0)
+ {
+ /* We are inside a BLOB restore transaction */
+ if (blobBatchCount >= AH->public.blobBatchSize)
+ {
+ /*
+ * We did reach the batch size with the previous BLOB.
+ * Commit and start a new batch.
+ */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- BLOB batch size reached\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ }
+ else
+ {
+ /* This one still fits into the current batch */
+ blobBatchCount++;
+ }
+ }
+ else
+ {
+ /* Not inside a transaction, start a new batch */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- Start BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ }
+ }
+ else
+ {
+ /* Not a BLOB. If we have a BLOB batch open, close it. */
+ if (blobBatchCount > 0)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ }
+ }
+
/* Select owner, schema, tablespace and default AM as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e863913849..2c6d49732b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -205,11 +205,20 @@ static inline void dumpComment(Archive *fout, const char *type,
const char *name, const char *namespace,
const char *owner, CatalogId catalogId,
int subid, DumpId dumpId);
+static bool dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId,
+ const char *initdb_comment);
static int findComments(Oid classoid, Oid objoid, CommentItem **items);
static void collectComments(Archive *fout);
static void dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId);
+static bool dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId);
static int findSecLabels(Oid classoid, Oid objoid, SecLabelItem **items);
static void collectSecLabels(Archive *fout);
static void dumpDumpableObject(Archive *fout, DumpableObject *dobj);
@@ -265,6 +274,12 @@ static DumpId dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const char *type, const char *name, const char *subname,
const char *nspname, const char *owner,
const DumpableAcl *dacl);
+static bool dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name,
+ const char *subname,
+ const char *nspname, const char *owner,
+ const DumpableAcl *dacl);
static void getDependencies(Archive *fout);
static void BuildArchiveDependencies(Archive *fout);
@@ -3641,10 +3656,42 @@ dumpLO(Archive *fout, const LoInfo *loinfo)
{
PQExpBuffer cquery = createPQExpBuffer();
PQExpBuffer dquery = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+ teSection section = SECTION_PRE_DATA;
appendPQExpBuffer(cquery,
"SELECT pg_catalog.lo_create('%s');\n",
loinfo->dobj.name);
+ /*
+ * In binary upgrade mode we put all the queries to restore
+ * one large object into a single TOC entry and emit it as
+ * SECTION_DATA so that they can be restored in parallel.
+ */
+ if (fout->dopt->binary_upgrade)
+ {
+ section = SECTION_DATA;
+
+ /* Dump comment if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpCommentQuery(fout, cquery, tag, "LARGE OBJECT",
+ loinfo->dobj.name, NULL, loinfo->rolname,
+ loinfo->dobj.catId, 0, loinfo->dobj.dumpId, NULL);
+
+ /* Dump security label if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabelQuery(fout, cquery, tag, "LARGE OBJECT",
+ loinfo->dobj.name,
+ NULL, loinfo->rolname,
+ loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
+
+ /* Dump ACL if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_ACL)
+ dumpACLQuery(fout, cquery, tag,
+ loinfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ loinfo->dobj.name, NULL,
+ NULL, loinfo->rolname, &loinfo->dacl);
+ }
+
appendPQExpBuffer(dquery,
"SELECT pg_catalog.lo_unlink('%s');\n",
@@ -3655,27 +3702,28 @@ dumpLO(Archive *fout, const LoInfo *loinfo)
ARCHIVE_OPTS(.tag = loinfo->dobj.name,
.owner = loinfo->rolname,
.description = "BLOB",
- .section = SECTION_PRE_DATA,
+ .section = section,
.createStmt = cquery->data,
.dropStmt = dquery->data));
- /* Dump comment if any */
- if (loinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
- dumpComment(fout, "LARGE OBJECT", loinfo->dobj.name,
- NULL, loinfo->rolname,
- loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
-
- /* Dump security label if any */
- if (loinfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
- dumpSecLabel(fout, "LARGE OBJECT", loinfo->dobj.name,
- NULL, loinfo->rolname,
- loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
-
- /* Dump ACL if any */
- if (loinfo->dobj.dump & DUMP_COMPONENT_ACL)
- dumpACL(fout, loinfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
- loinfo->dobj.name, NULL,
- NULL, loinfo->rolname, &loinfo->dacl);
+ if (!fout->dopt->binary_upgrade)
+ {
+ /* Dump comment if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "LARGE OBJECT", loinfo->dobj.name,
+ NULL, loinfo->rolname,
+ loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
+ /* Dump security label if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabel(fout, "LARGE OBJECT", loinfo->dobj.name,
+ NULL, loinfo->rolname,
+ loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
+ /* Dump ACL if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_ACL)
+ dumpACL(fout, loinfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ loinfo->dobj.name, NULL,
+ NULL, loinfo->rolname, &loinfo->dacl);
+ }
destroyPQExpBuffer(cquery);
destroyPQExpBuffer(dquery);
@@ -9899,6 +9947,38 @@ dumpCommentExtended(Archive *fout, const char *type,
const char *owner, CatalogId catalogId,
int subid, DumpId dumpId,
const char *initdb_comment)
+{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpCommentQuery(fout, query, tag, type, name, namespace, owner,
+ catalogId, subid, dumpId, initdb_comment))
+ {
+ /*
+ * We mark comments as SECTION_NONE because they really belong in the
+ * same section as their parent, whether that is pre-data or
+ * post-data.
+ */
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "COMMENT",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId,
+ const char *initdb_comment)
{
DumpOptions *dopt = fout->dopt;
CommentItem *comments;
@@ -9906,19 +9986,19 @@ dumpCommentExtended(Archive *fout, const char *type,
/* do nothing, if --no-comments is supplied */
if (dopt->no_comments)
- return;
+ return false;
/* Comments are schema not data ... except LO comments are data */
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump LO comments in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for comments associated with catalogId, using table */
@@ -9956,9 +10036,6 @@ dumpCommentExtended(Archive *fout, const char *type,
/* If a comment exists, build COMMENT ON statement */
if (ncomments > 0)
{
- PQExpBuffer query = createPQExpBuffer();
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(query, "COMMENT ON %s ", type);
if (namespace && *namespace)
appendPQExpBuffer(query, "%s.", fmtId(namespace));
@@ -9968,24 +10045,10 @@ dumpCommentExtended(Archive *fout, const char *type,
appendPQExpBuffer(tag, "%s %s", type, name);
- /*
- * We mark comments as SECTION_NONE because they really belong in the
- * same section as their parent, whether that is pre-data or
- * post-data.
- */
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "COMMENT",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
-
- destroyPQExpBuffer(query);
- destroyPQExpBuffer(tag);
+ return true;
}
+
+ return false;
}
/*
@@ -14939,23 +15002,65 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const DumpableAcl *dacl)
{
DumpId aclDumpId = InvalidDumpId;
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpACLQuery(fout, query, tag, objDumpId, altDumpId,
+ type, name, subname, nspname, owner, dacl))
+ {
+ DumpId aclDeps[2];
+ int nDeps = 0;
+
+ if (subname)
+ appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
+ else
+ appendPQExpBuffer(tag, "%s %s", type, name);
+
+ aclDeps[nDeps++] = objDumpId;
+ if (altDumpId != InvalidDumpId)
+ aclDeps[nDeps++] = altDumpId;
+
+ aclDumpId = createDumpId();
+
+ ArchiveEntry(fout, nilCatalogId, aclDumpId,
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = nspname,
+ .owner = owner,
+ .description = "ACL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = aclDeps,
+ .nDeps = nDeps));
+
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+
+ return aclDumpId;
+}
+
+static bool
+dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name, const char *subname,
+ const char *nspname, const char *owner,
+ const DumpableAcl *dacl)
+{
DumpOptions *dopt = fout->dopt;
const char *acls = dacl->acl;
const char *acldefault = dacl->acldefault;
char privtype = dacl->privtype;
const char *initprivs = dacl->initprivs;
const char *baseacls;
- PQExpBuffer sql;
/* Do nothing if ACL dump is not enabled */
if (dopt->aclsSkip)
- return InvalidDumpId;
+ return false;
/* --data-only skips ACLs *except* large object ACLs */
if (dopt->dataOnly && strcmp(type, "LARGE OBJECT") != 0)
- return InvalidDumpId;
-
- sql = createPQExpBuffer();
+ return false;
/*
* In binary upgrade mode, we don't run an extension's script but instead
@@ -14973,13 +15078,13 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
if (dopt->binary_upgrade && privtype == 'e' &&
initprivs && *initprivs != '\0')
{
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
if (!buildACLCommands(name, subname, nspname, type,
initprivs, acldefault, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
pg_fatal("could not parse initial ACL list (%s) or default (%s) for object \"%s\" (%s)",
initprivs, acldefault, name, type);
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
}
/*
@@ -15001,43 +15106,19 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
if (!buildACLCommands(name, subname, nspname, type,
acls, baseacls, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
pg_fatal("could not parse ACL list (%s) or default (%s) for object \"%s\" (%s)",
acls, baseacls, name, type);
- if (sql->len > 0)
+ if (query->len > 0 && tag != NULL)
{
- PQExpBuffer tag = createPQExpBuffer();
- DumpId aclDeps[2];
- int nDeps = 0;
-
if (subname)
appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
else
appendPQExpBuffer(tag, "%s %s", type, name);
-
- aclDeps[nDeps++] = objDumpId;
- if (altDumpId != InvalidDumpId)
- aclDeps[nDeps++] = altDumpId;
-
- aclDumpId = createDumpId();
-
- ArchiveEntry(fout, nilCatalogId, aclDumpId,
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = nspname,
- .owner = owner,
- .description = "ACL",
- .section = SECTION_NONE,
- .createStmt = sql->data,
- .deps = aclDeps,
- .nDeps = nDeps));
-
- destroyPQExpBuffer(tag);
}
- destroyPQExpBuffer(sql);
-
- return aclDumpId;
+ return true;
}
/*
@@ -15062,16 +15143,42 @@ static void
dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId)
+{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpSecLabelQuery(fout, query, tag, type, name,
+ namespace, owner, catalogId, subid, dumpId))
+ {
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "SECURITY LABEL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId)
{
DumpOptions *dopt = fout->dopt;
SecLabelItem *labels;
int nlabels;
int i;
- PQExpBuffer query;
/* do nothing, if --no-security-labels is supplied */
if (dopt->no_security_labels)
- return;
+ return false;
/*
* Security labels are schema not data ... except large object labels are
@@ -15080,20 +15187,18 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump large object security labels in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for security labels associated with catalogId, using table */
nlabels = findSecLabels(catalogId.tableoid, catalogId.oid, &labels);
- query = createPQExpBuffer();
-
for (i = 0; i < nlabels; i++)
{
/*
@@ -15114,22 +15219,11 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
if (query->len > 0)
{
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(tag, "%s %s", type, name);
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "SECURITY LABEL",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
- destroyPQExpBuffer(tag);
+ return true;
}
- destroyPQExpBuffer(query);
+ return false;
}
/*
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 049a100634..2159f72ffb 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -60,6 +60,7 @@ main(int argc, char **argv)
int c;
int exit_code;
int numWorkers = 1;
+ int blobBatchSize = 0;
Archive *AH;
char *inputFileSpec;
static int disable_triggers = 0;
@@ -123,6 +124,7 @@ main(int argc, char **argv)
{"no-publications", no_argument, &no_publications, 1},
{"no-security-labels", no_argument, &no_security_labels, 1},
{"no-subscriptions", no_argument, &no_subscriptions, 1},
+ {"restore-blob-batch-size", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
@@ -286,6 +288,10 @@ main(int argc, char **argv)
set_dump_section(optarg, &(opts->dumpSections));
break;
+ case 4: /* # of blobs to restore per transaction */
+ blobBatchSize = atoi(optarg);
+ break;
+
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -405,6 +411,7 @@ main(int argc, char **argv)
SortTocFromFile(AH);
AH->numWorkers = numWorkers;
+ AH->blobBatchSize = blobBatchSize;
if (opts->tocSummary)
PrintTOCSummary(AH);
@@ -478,6 +485,8 @@ usage(const char *progname)
printf(_(" --use-set-session-authorization\n"
" use SET SESSION AUTHORIZATION commands instead of\n"
" ALTER OWNER commands to set ownership\n"));
+ printf(_(" --restore-blob-batch-size=NUM\n"
+ " attempt to restore NUM large objects per transaction\n"));
printf(_("\nConnection options:\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 6c8c82dca8..941a822a50 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -53,8 +53,11 @@ generate_old_dump(void)
parallel_exec_prog(log_file_name, NULL,
"\"%s/pg_dump\" %s --schema-only --quote-all-identifiers "
+ "%s "
"--binary-upgrade --format=custom %s --file=\"%s/%s\" %s",
new_cluster.bindir, cluster_conn_opts(&old_cluster),
+ user_opts.pg_dump_opts ?
+ user_opts.pg_dump_opts : "",
log_opts.verbose ? "--verbose" : "",
log_opts.dumpdir,
sql_file_name, escaped_connstr.data);
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index b9d900d0db..583d988c81 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -59,6 +59,8 @@ parseCommandLine(int argc, char *argv[])
{"clone", no_argument, NULL, 1},
{"copy", no_argument, NULL, 2},
{"sync-method", required_argument, NULL, 3},
+ {"dump-options", required_argument, NULL, 4},
+ {"restore-options", required_argument, NULL, 5},
{NULL, 0, NULL, 0}
};
@@ -206,6 +208,32 @@ parseCommandLine(int argc, char *argv[])
if (!parse_sync_method(optarg, &unused))
exit(1);
user_opts.sync_method = pg_strdup(optarg);
+ case 4:
+ /* append option? */
+ if (!user_opts.pg_dump_opts)
+ user_opts.pg_dump_opts = pg_strdup(optarg);
+ else
+ {
+ char *old_opts = user_opts.pg_dump_opts;
+
+ user_opts.pg_dump_opts = psprintf("%s %s",
+ old_opts, optarg);
+ free(old_opts);
+ }
+ break;
+
+ case 5:
+ /* append option? */
+ if (!user_opts.pg_restore_opts)
+ user_opts.pg_restore_opts = pg_strdup(optarg);
+ else
+ {
+ char *old_opts = user_opts.pg_restore_opts;
+
+ user_opts.pg_restore_opts = psprintf("%s %s",
+ old_opts, optarg);
+ free(old_opts);
+ }
break;
default:
@@ -302,6 +330,8 @@ usage(void)
printf(_(" --clone clone instead of copying files to new cluster\n"));
printf(_(" --copy copy files to new cluster (default)\n"));
printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
+ printf(_(" --dump-options=OPTIONS options to pass to pg_dump\n"));
+ printf(_(" --restore-options=OPTIONS options to pass to pg_restore\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\n"
"Before running pg_upgrade you must:\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 3960af4036..9b4f0693ab 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -548,10 +548,13 @@ create_new_objects(void)
true,
true,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "%s "
"--dbname postgres \"%s/%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
create_opts,
+ user_opts.pg_restore_opts ?
+ user_opts.pg_restore_opts : "",
log_opts.dumpdir,
sql_file_name);
@@ -586,10 +589,13 @@ create_new_objects(void)
parallel_exec_prog(log_file_name,
NULL,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "%s "
"--dbname template1 \"%s/%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
create_opts,
+ user_opts.pg_restore_opts ?
+ user_opts.pg_restore_opts : "",
log_opts.dumpdir,
sql_file_name);
}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index a710f325de..0218ff2f26 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -324,6 +324,8 @@ typedef struct
int jobs; /* number of processes/threads to use */
char *socketdir; /* directory to use for Unix sockets */
char *sync_method;
+ char *pg_dump_opts; /* options to pass to pg_dump */
+ char *pg_restore_opts; /* options to pass to pg_restore */
} UserOpts;
typedef struct
> "Tom Lane" <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> wrote:
FWIW, I agree with Jacob's concern about it being a bad idea to let
users of pg_upgrade pass down arbitrary options to pg_dump/pg_restore.
I think we'd regret going there, because it'd hugely expand the set
of cases pg_upgrade has to deal with.
Also, pg_upgrade is often invoked indirectly via scripts, so I do
not especially buy the idea that we're going to get useful control
input from some human somewhere. I think we'd be better off to
assume that pg_upgrade is on its own to manage the process, so that
if we need to switch strategies based on object count or whatever,
we should put in a heuristic to choose the strategy automatically.
It might not be perfect, but that will give better results for the
pretty large fraction of users who are not going to mess with
weird little switches.
I have updated the patch to use a heuristic: during pg_upgrade we count
large objects per database. During pg_restore execution, if a database's
large-object count is greater than LARGE_OBJECTS_THRESOLD (1k), we use
--restore-blob-batch-size.
I also modified the pg_upgrade --jobs behavior when a database has more large objects than LARGE_OBJECTS_THRESOLD:
+ /* Restore all the dbs where LARGE_OBJECTS_THRESOLD is not breached */
+ restore_dbs(stats, true);
+ /* reap all children */
+ while (reap_child(true) == true)
+ ;
+ /* Restore rest of the dbs one by one with pg_restore --jobs = user_opts.jobs */
+ restore_dbs(stats, false);
/* reap all children */
while (reap_child(true) == true)
;
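For reference, the per-database count that drives this heuristic comes from a query equivalent to what collect_db_stats() in the attached patch runs against each database of the old cluster (the database name here is just a placeholder):
$ psql -d somedb -At -c "SELECT count(*) FROM pg_largeobject_metadata;"
A database whose count exceeds LARGE_OBJECTS_THRESOLD is held back from the parallel pass and restored afterwards with --restore-blob-batch-size and pg_restore --jobs.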
Regards
Sachin
Attachments:
pg_upgrade_improvements_v7.diff
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index 85e6515ac2..9328830b2e 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -858,6 +858,11 @@ RunWorker(ArchiveHandle *AH, ParallelSlot *slot)
*/
WaitForCommands(AH, pipefd);
+ /*
+ * Close any still-open BLOB batch transaction.
+ */
+ CommitBlobTransaction((Archive *)AH);
+
/*
* Disconnect from database and clean up.
*/
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 9ef2f2017e..65519791e9 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -223,6 +223,8 @@ typedef struct Archive
int numWorkers; /* number of parallel processes */
char *sync_snapshot_id; /* sync snapshot id for parallel operation */
+ int blobBatchSize; /* # of blobs to restore per transaction */
+
/* info needed for string escaping */
int encoding; /* libpq code for client_encoding */
bool std_strings; /* standard_conforming_strings */
@@ -293,6 +295,7 @@ extern void WriteData(Archive *AHX, const void *data, size_t dLen);
extern int StartLO(Archive *AHX, Oid oid);
extern int EndLO(Archive *AHX, Oid oid);
+extern void CommitBlobTransaction(Archive *AH);
extern void CloseArchive(Archive *AHX);
extern void SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 256d1e35a4..43be945064 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -45,6 +45,7 @@
#define TEXT_DUMP_HEADER "--\n-- PostgreSQL database dump\n--\n\n"
#define TEXT_DUMPALL_HEADER "--\n-- PostgreSQL database cluster dump\n--\n\n"
+static int blobBatchCount = 0;
static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
const pg_compress_specification compression_spec,
@@ -258,6 +259,23 @@ CloseArchive(Archive *AHX)
pg_fatal("could not close output file: %m");
}
+/* Public */
+void
+CommitBlobTransaction(Archive *AHX)
+{
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
+
+ if (blobBatchCount > 0)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ }
+}
+
/* Public */
void
SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt)
@@ -719,6 +737,8 @@ RestoreArchive(Archive *AHX)
ahprintf(AH, "COMMIT;\n\n");
}
+ CommitBlobTransaction(AHX);
+
if (AH->public.verbose)
dumpTimestamp(AH, "Completed on", time(NULL));
@@ -3543,6 +3563,57 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
{
RestoreOptions *ropt = AH->public.ropt;
+ /* We restore BLOBs in batches to reduce XID consumption */
+ if (strcmp(te->desc, "BLOB") == 0 && AH->public.blobBatchSize > 1)
+ {
+ if (blobBatchCount > 0)
+ {
+ /* We are inside a BLOB restore transaction */
+ if (blobBatchCount >= AH->public.blobBatchSize)
+ {
+ /*
+ * We did reach the batch size with the previous BLOB.
+ * Commit and start a new batch.
+ */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- BLOB batch size reached\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ }
+ else
+ {
+ /* This one still fits into the current batch */
+ blobBatchCount++;
+ }
+ }
+ else
+ {
+ /* Not inside a transaction, start a new batch */
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- Start BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "BEGIN;\n\n");
+
+ blobBatchCount = 1;
+ }
+ }
+ else
+ {
+ /* Not a BLOB. If we have a BLOB batch open, close it. */
+ if (blobBatchCount > 0)
+ {
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "-- End BLOB restore batch\n");
+ ahprintf(AH, "--\n");
+ ahprintf(AH, "COMMIT;\n\n");
+
+ blobBatchCount = 0;
+ }
+ }
+
/* Select owner, schema, tablespace and default AM as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e863913849..2c6d49732b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -205,11 +205,20 @@ static inline void dumpComment(Archive *fout, const char *type,
const char *name, const char *namespace,
const char *owner, CatalogId catalogId,
int subid, DumpId dumpId);
+static bool dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId,
+ const char *initdb_comment);
static int findComments(Oid classoid, Oid objoid, CommentItem **items);
static void collectComments(Archive *fout);
static void dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId);
+static bool dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId);
static int findSecLabels(Oid classoid, Oid objoid, SecLabelItem **items);
static void collectSecLabels(Archive *fout);
static void dumpDumpableObject(Archive *fout, DumpableObject *dobj);
@@ -265,6 +274,12 @@ static DumpId dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const char *type, const char *name, const char *subname,
const char *nspname, const char *owner,
const DumpableAcl *dacl);
+static bool dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name,
+ const char *subname,
+ const char *nspname, const char *owner,
+ const DumpableAcl *dacl);
static void getDependencies(Archive *fout);
static void BuildArchiveDependencies(Archive *fout);
@@ -3641,10 +3656,42 @@ dumpLO(Archive *fout, const LoInfo *loinfo)
{
PQExpBuffer cquery = createPQExpBuffer();
PQExpBuffer dquery = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+ teSection section = SECTION_PRE_DATA;
appendPQExpBuffer(cquery,
"SELECT pg_catalog.lo_create('%s');\n",
loinfo->dobj.name);
+ /*
+ * In binary upgrade mode we put all the queries to restore
+ * one large object into a single TOC entry and emit it as
+ * SECTION_DATA so that they can be restored in parallel.
+ */
+ if (fout->dopt->binary_upgrade)
+ {
+ section = SECTION_DATA;
+
+ /* Dump comment if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpCommentQuery(fout, cquery, tag, "LARGE OBJECT",
+ loinfo->dobj.name, NULL, loinfo->rolname,
+ loinfo->dobj.catId, 0, loinfo->dobj.dumpId, NULL);
+
+ /* Dump security label if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabelQuery(fout, cquery, tag, "LARGE OBJECT",
+ loinfo->dobj.name,
+ NULL, loinfo->rolname,
+ loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
+
+ /* Dump ACL if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_ACL)
+ dumpACLQuery(fout, cquery, tag,
+ loinfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ loinfo->dobj.name, NULL,
+ NULL, loinfo->rolname, &loinfo->dacl);
+ }
+
appendPQExpBuffer(dquery,
"SELECT pg_catalog.lo_unlink('%s');\n",
@@ -3655,27 +3702,28 @@ dumpLO(Archive *fout, const LoInfo *loinfo)
ARCHIVE_OPTS(.tag = loinfo->dobj.name,
.owner = loinfo->rolname,
.description = "BLOB",
- .section = SECTION_PRE_DATA,
+ .section = section,
.createStmt = cquery->data,
.dropStmt = dquery->data));
- /* Dump comment if any */
- if (loinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
- dumpComment(fout, "LARGE OBJECT", loinfo->dobj.name,
- NULL, loinfo->rolname,
- loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
-
- /* Dump security label if any */
- if (loinfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
- dumpSecLabel(fout, "LARGE OBJECT", loinfo->dobj.name,
- NULL, loinfo->rolname,
- loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
-
- /* Dump ACL if any */
- if (loinfo->dobj.dump & DUMP_COMPONENT_ACL)
- dumpACL(fout, loinfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
- loinfo->dobj.name, NULL,
- NULL, loinfo->rolname, &loinfo->dacl);
+ if (!fout->dopt->binary_upgrade)
+ {
+ /* Dump comment if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "LARGE OBJECT", loinfo->dobj.name,
+ NULL, loinfo->rolname,
+ loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
+ /* Dump security label if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabel(fout, "LARGE OBJECT", loinfo->dobj.name,
+ NULL, loinfo->rolname,
+ loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
+ /* Dump ACL if any */
+ if (loinfo->dobj.dump & DUMP_COMPONENT_ACL)
+ dumpACL(fout, loinfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
+ loinfo->dobj.name, NULL,
+ NULL, loinfo->rolname, &loinfo->dacl);
+ }
destroyPQExpBuffer(cquery);
destroyPQExpBuffer(dquery);
@@ -9899,6 +9947,38 @@ dumpCommentExtended(Archive *fout, const char *type,
const char *owner, CatalogId catalogId,
int subid, DumpId dumpId,
const char *initdb_comment)
+{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpCommentQuery(fout, query, tag, type, name, namespace, owner,
+ catalogId, subid, dumpId, initdb_comment))
+ {
+ /*
+ * We mark comments as SECTION_NONE because they really belong in the
+ * same section as their parent, whether that is pre-data or
+ * post-data.
+ */
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "COMMENT",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpCommentQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId,
+ const char *initdb_comment)
{
DumpOptions *dopt = fout->dopt;
CommentItem *comments;
@@ -9906,19 +9986,19 @@ dumpCommentExtended(Archive *fout, const char *type,
/* do nothing, if --no-comments is supplied */
if (dopt->no_comments)
- return;
+ return false;
/* Comments are schema not data ... except LO comments are data */
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump LO comments in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for comments associated with catalogId, using table */
@@ -9956,9 +10036,6 @@ dumpCommentExtended(Archive *fout, const char *type,
/* If a comment exists, build COMMENT ON statement */
if (ncomments > 0)
{
- PQExpBuffer query = createPQExpBuffer();
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(query, "COMMENT ON %s ", type);
if (namespace && *namespace)
appendPQExpBuffer(query, "%s.", fmtId(namespace));
@@ -9968,24 +10045,10 @@ dumpCommentExtended(Archive *fout, const char *type,
appendPQExpBuffer(tag, "%s %s", type, name);
- /*
- * We mark comments as SECTION_NONE because they really belong in the
- * same section as their parent, whether that is pre-data or
- * post-data.
- */
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "COMMENT",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
-
- destroyPQExpBuffer(query);
- destroyPQExpBuffer(tag);
+ return true;
}
+
+ return false;
}
/*
@@ -14939,23 +15002,65 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
const DumpableAcl *dacl)
{
DumpId aclDumpId = InvalidDumpId;
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpACLQuery(fout, query, tag, objDumpId, altDumpId,
+ type, name, subname, nspname, owner, dacl))
+ {
+ DumpId aclDeps[2];
+ int nDeps = 0;
+
+ if (subname)
+ appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
+ else
+ appendPQExpBuffer(tag, "%s %s", type, name);
+
+ aclDeps[nDeps++] = objDumpId;
+ if (altDumpId != InvalidDumpId)
+ aclDeps[nDeps++] = altDumpId;
+
+ aclDumpId = createDumpId();
+
+ ArchiveEntry(fout, nilCatalogId, aclDumpId,
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = nspname,
+ .owner = owner,
+ .description = "ACL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = aclDeps,
+ .nDeps = nDeps));
+
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+
+ return aclDumpId;
+}
+
+static bool
+dumpACLQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ DumpId objDumpId, DumpId altDumpId,
+ const char *type, const char *name, const char *subname,
+ const char *nspname, const char *owner,
+ const DumpableAcl *dacl)
+{
DumpOptions *dopt = fout->dopt;
const char *acls = dacl->acl;
const char *acldefault = dacl->acldefault;
char privtype = dacl->privtype;
const char *initprivs = dacl->initprivs;
const char *baseacls;
- PQExpBuffer sql;
/* Do nothing if ACL dump is not enabled */
if (dopt->aclsSkip)
- return InvalidDumpId;
+ return false;
/* --data-only skips ACLs *except* large object ACLs */
if (dopt->dataOnly && strcmp(type, "LARGE OBJECT") != 0)
- return InvalidDumpId;
-
- sql = createPQExpBuffer();
+ return false;
/*
* In binary upgrade mode, we don't run an extension's script but instead
@@ -14973,13 +15078,13 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
if (dopt->binary_upgrade && privtype == 'e' &&
initprivs && *initprivs != '\0')
{
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
if (!buildACLCommands(name, subname, nspname, type,
initprivs, acldefault, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
pg_fatal("could not parse initial ACL list (%s) or default (%s) for object \"%s\" (%s)",
initprivs, acldefault, name, type);
- appendPQExpBufferStr(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
+ appendPQExpBufferStr(query, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
}
/*
@@ -15001,43 +15106,19 @@ dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
if (!buildACLCommands(name, subname, nspname, type,
acls, baseacls, owner,
- "", fout->remoteVersion, sql))
+ "", fout->remoteVersion, query))
pg_fatal("could not parse ACL list (%s) or default (%s) for object \"%s\" (%s)",
acls, baseacls, name, type);
- if (sql->len > 0)
+ if (query->len > 0 && tag != NULL)
{
- PQExpBuffer tag = createPQExpBuffer();
- DumpId aclDeps[2];
- int nDeps = 0;
-
if (subname)
appendPQExpBuffer(tag, "COLUMN %s.%s", name, subname);
else
appendPQExpBuffer(tag, "%s %s", type, name);
-
- aclDeps[nDeps++] = objDumpId;
- if (altDumpId != InvalidDumpId)
- aclDeps[nDeps++] = altDumpId;
-
- aclDumpId = createDumpId();
-
- ArchiveEntry(fout, nilCatalogId, aclDumpId,
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = nspname,
- .owner = owner,
- .description = "ACL",
- .section = SECTION_NONE,
- .createStmt = sql->data,
- .deps = aclDeps,
- .nDeps = nDeps));
-
- destroyPQExpBuffer(tag);
}
- destroyPQExpBuffer(sql);
-
- return aclDumpId;
+ return true;
}
/*
@@ -15062,16 +15143,42 @@ static void
dumpSecLabel(Archive *fout, const char *type, const char *name,
const char *namespace, const char *owner,
CatalogId catalogId, int subid, DumpId dumpId)
+{
+ PQExpBuffer query = createPQExpBuffer();
+ PQExpBuffer tag = createPQExpBuffer();
+
+ if (dumpSecLabelQuery(fout, query, tag, type, name,
+ namespace, owner, catalogId, subid, dumpId))
+ {
+ ArchiveEntry(fout, nilCatalogId, createDumpId(),
+ ARCHIVE_OPTS(.tag = tag->data,
+ .namespace = namespace,
+ .owner = owner,
+ .description = "SECURITY LABEL",
+ .section = SECTION_NONE,
+ .createStmt = query->data,
+ .deps = &dumpId,
+ .nDeps = 1));
+ }
+
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(tag);
+}
+
+static bool
+dumpSecLabelQuery(Archive *fout, PQExpBuffer query, PQExpBuffer tag,
+ const char *type, const char *name,
+ const char *namespace, const char *owner,
+ CatalogId catalogId, int subid, DumpId dumpId)
{
DumpOptions *dopt = fout->dopt;
SecLabelItem *labels;
int nlabels;
int i;
- PQExpBuffer query;
/* do nothing, if --no-security-labels is supplied */
if (dopt->no_security_labels)
- return;
+ return false;
/*
* Security labels are schema not data ... except large object labels are
@@ -15080,20 +15187,18 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
if (strcmp(type, "LARGE OBJECT") != 0)
{
if (dopt->dataOnly)
- return;
+ return false;
}
else
{
/* We do dump large object security labels in binary-upgrade mode */
if (dopt->schemaOnly && !dopt->binary_upgrade)
- return;
+ return false;
}
/* Search for security labels associated with catalogId, using table */
nlabels = findSecLabels(catalogId.tableoid, catalogId.oid, &labels);
- query = createPQExpBuffer();
-
for (i = 0; i < nlabels; i++)
{
/*
@@ -15114,22 +15219,11 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
if (query->len > 0)
{
- PQExpBuffer tag = createPQExpBuffer();
-
appendPQExpBuffer(tag, "%s %s", type, name);
- ArchiveEntry(fout, nilCatalogId, createDumpId(),
- ARCHIVE_OPTS(.tag = tag->data,
- .namespace = namespace,
- .owner = owner,
- .description = "SECURITY LABEL",
- .section = SECTION_NONE,
- .createStmt = query->data,
- .deps = &dumpId,
- .nDeps = 1));
- destroyPQExpBuffer(tag);
+ return true;
}
- destroyPQExpBuffer(query);
+ return false;
}
/*
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 049a100634..2159f72ffb 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -60,6 +60,7 @@ main(int argc, char **argv)
int c;
int exit_code;
int numWorkers = 1;
+ int blobBatchSize = 0;
Archive *AH;
char *inputFileSpec;
static int disable_triggers = 0;
@@ -123,6 +124,7 @@ main(int argc, char **argv)
{"no-publications", no_argument, &no_publications, 1},
{"no-security-labels", no_argument, &no_security_labels, 1},
{"no-subscriptions", no_argument, &no_subscriptions, 1},
+ {"restore-blob-batch-size", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
@@ -286,6 +288,10 @@ main(int argc, char **argv)
set_dump_section(optarg, &(opts->dumpSections));
break;
+ case 4: /* # of blobs to restore per transaction */
+ blobBatchSize = atoi(optarg);
+ break;
+
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -405,6 +411,7 @@ main(int argc, char **argv)
SortTocFromFile(AH);
AH->numWorkers = numWorkers;
+ AH->blobBatchSize = blobBatchSize;
if (opts->tocSummary)
PrintTOCSummary(AH);
@@ -478,6 +485,8 @@ usage(const char *progname)
printf(_(" --use-set-session-authorization\n"
" use SET SESSION AUTHORIZATION commands instead of\n"
" ALTER OWNER commands to set ownership\n"));
+ printf(_(" --restore-blob-batch-size=NUM\n"
+ " attempt to restore NUM large objects per transaction\n"));
printf(_("\nConnection options:\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fa52aa2c22..459d834ac3 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -84,7 +84,7 @@ output_check_banner(bool live_check)
void
-check_and_dump_old_cluster(bool live_check)
+check_and_dump_old_cluster(bool live_check, DbDumpStats **stats)
{
/* -- OLD -- */
@@ -202,12 +202,36 @@ check_and_dump_old_cluster(bool live_check)
* the old server is running.
*/
if (!user_opts.check)
+ {
+ *stats = collect_db_stats();
generate_old_dump();
+ }
if (!live_check)
stop_postmaster(false);
}
+DbDumpStats* collect_db_stats(void)
+{
+ uint dbnum;
+ DbDumpStats *stats = (DbDumpStats *)pg_malloc(sizeof(DbDumpStats));
+ stats->large_objects = (uint64 *) pg_malloc(old_cluster.dbarr.ndbs * sizeof(uint64));
+ prep_status("Collecting database stats");
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ PGresult *res;
+ DbInfo *active_db = &old_cluster.dbarr.dbs[dbnum];
+ PGconn *conn = connectToServer(&old_cluster, active_db->db_name);
+
+ res = executeQueryOrDie(conn, "SELECT count(*) from pg_largeobject_metadata");
+ stats->large_objects[dbnum] = atoll(PQgetvalue(res, 0, 0));
+ PQclear(res);
+ PQfinish(conn);
+ }
+ check_ok();
+
+ return stats;
+}
void
check_new_cluster(void)
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 3960af4036..12605200b5 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -54,7 +54,8 @@
static void set_locale_and_encoding(void);
static void prepare_new_cluster(void);
static void prepare_new_globals(void);
-static void create_new_objects(void);
+static void restore_dbs(DbDumpStats *stats, bool parallel_restore);
+static void create_new_objects(DbDumpStats *stats);
static void copy_xact_xlog_xid(void);
static void set_frozenxids(bool minmxid_only);
static void make_outputdirs(char *pgdata);
@@ -82,6 +83,7 @@ main(int argc, char **argv)
{
char *deletion_script_file_name = NULL;
bool live_check = false;
+ DbDumpStats *stats;
/*
* pg_upgrade doesn't currently use common/logging.c, but initialize it
@@ -127,7 +129,7 @@ main(int argc, char **argv)
check_cluster_compatibility(live_check);
- check_and_dump_old_cluster(live_check);
+ check_and_dump_old_cluster(live_check, &stats);
/* -- NEW -- */
@@ -160,7 +162,7 @@ main(int argc, char **argv)
prepare_new_globals();
- create_new_objects();
+ create_new_objects(stats);
stop_postmaster(false);
@@ -508,9 +510,72 @@ prepare_new_globals(void)
check_ok();
}
+static void
+restore_dbs(DbDumpStats *stats, bool parallel_restore)
+{
+ int dbnum;
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ char sql_file_name[MAXPGPATH],
+ log_file_name[MAXPGPATH];
+ DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
+ const char *create_opts;
+ int jobs = user_opts.jobs ? user_opts.jobs : 1 ;
+ bool large_objects_thresold_breached = stats && stats->large_objects[dbnum] > LARGE_OBJECTS_THRESOLD;
+
+ /* Skip template1 in this pass */
+ if (strcmp(old_db->db_name, "template1") == 0)
+ continue;
+ /* Skip dbs where LARGE_OBJECTS_THRESOLD is breached and parallel_restore is enabled*/
+ if (large_objects_thresold_breached && parallel_restore)
+ continue;
+
+ pg_log(PG_STATUS, "%s", old_db->db_name);
+ snprintf(sql_file_name, sizeof(sql_file_name), DB_DUMP_FILE_MASK, old_db->db_oid);
+ snprintf(log_file_name, sizeof(log_file_name), DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+ /*
+ * postgres database will already exist in the target installation, so
+ * tell pg_restore to drop and recreate it; otherwise we would fail to
+ * propagate its database-level properties.
+ */
+ if (strcmp(old_db->db_name, "postgres") == 0)
+ create_opts = "--clean --create";
+ else
+ create_opts = "--create";
+
+ if (parallel_restore)
+ parallel_exec_prog(log_file_name,
+ NULL,
+ "\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "--dbname template1 \"%s/%s\"",
+ new_cluster.bindir,
+ cluster_conn_opts(&new_cluster),
+ create_opts,
+ log_opts.dumpdir,
+ sql_file_name);
+ else
+ exec_prog(log_file_name,
+ NULL,
+ true,
+ true,
+ "\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "--restore-blob-batch-size %d --jobs %d "
+ "--dbname template1 \"%s/%s\"",
+ new_cluster.bindir,
+ cluster_conn_opts(&new_cluster),
+ create_opts,
+ large_objects_thresold_breached ?
+ LARGE_OBJECTS_THRESOLD : 0,
+ large_objects_thresold_breached ?
+ jobs : 1,
+ log_opts.dumpdir,
+ sql_file_name);
+ }
+}
static void
-create_new_objects(void)
+create_new_objects(DbDumpStats *stats)
{
int dbnum;
@@ -557,43 +622,13 @@ create_new_objects(void)
break; /* done once we've processed template1 */
}
-
- for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
- {
- char sql_file_name[MAXPGPATH],
- log_file_name[MAXPGPATH];
- DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
- const char *create_opts;
-
- /* Skip template1 in this pass */
- if (strcmp(old_db->db_name, "template1") == 0)
- continue;
-
- pg_log(PG_STATUS, "%s", old_db->db_name);
- snprintf(sql_file_name, sizeof(sql_file_name), DB_DUMP_FILE_MASK, old_db->db_oid);
- snprintf(log_file_name, sizeof(log_file_name), DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
-
- /*
- * postgres database will already exist in the target installation, so
- * tell pg_restore to drop and recreate it; otherwise we would fail to
- * propagate its database-level properties.
- */
- if (strcmp(old_db->db_name, "postgres") == 0)
- create_opts = "--clean --create";
- else
- create_opts = "--create";
-
- parallel_exec_prog(log_file_name,
- NULL,
- "\"%s/pg_restore\" %s %s --exit-on-error --verbose "
- "--dbname template1 \"%s/%s\"",
- new_cluster.bindir,
- cluster_conn_opts(&new_cluster),
- create_opts,
- log_opts.dumpdir,
- sql_file_name);
- }
-
+ /* Restore all the dbs where LARGE_OBJECTS_THRESOLD is not breached */
+ restore_dbs(stats, true);
+ /* reap all children */
+ while (reap_child(true) == true)
+ ;
+ /* Restore rest of the dbs one by one with pg_restore --jobs = user_opts.jobs */
+ restore_dbs(stats, false);
/* reap all children */
while (reap_child(true) == true)
;
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index a710f325de..f41063dbc7 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -24,6 +24,8 @@
#define MESSAGE_WIDTH 62
+#define LARGE_OBJECTS_THRESOLD 200 // maybe 10k ?
+
#define GET_MAJOR_VERSION(v) ((v) / 100)
/* contains both global db information and CREATE DATABASE commands */
@@ -347,6 +349,15 @@ typedef struct
ClusterInfo *running_cluster;
} OSInfo;
+/*
+ * Dump stats, will be used by pg_upgrade to efficently run pg_restore
+ */
+
+typedef struct
+{
+ uint64 *large_objects;
+}DbDumpStats;
+
/*
* Global variables
@@ -361,7 +372,7 @@ extern OSInfo os_info;
/* check.c */
void output_check_banner(bool live_check);
-void check_and_dump_old_cluster(bool live_check);
+void check_and_dump_old_cluster(bool live_check, DbDumpStats **stats);
void check_new_cluster(void);
void report_clusters_compatible(void);
void issue_warnings_and_set_wal_level(void);
@@ -369,6 +380,7 @@ void output_completion_banner(char *deletion_script_file_name);
void check_cluster_versions(void);
void check_cluster_compatibility(bool live_check);
void create_script_for_old_cluster_deletion(char **deletion_script_file_name);
+DbDumpStats* collect_db_stats(void);
/* controldata.c */
I think both SECTION_DATA and SECTION_POST_DATA can be parallelized by pg_restore, so instead of
tracking large-object counts in the heuristic, we could track the number of SECTION_DATA + SECTION_POST_DATA entries.
Regards
Sachin
I spent some time looking at the v7 patch. I can't help feeling
that this is going off in the wrong direction, primarily for
these reasons:
* It focuses only on cutting the number of transactions needed
to restore a large number of blobs (large objects). Certainly
that's a pain point, but it's not the only one of this sort.
If you have a lot of tables, restore will consume just as many
transactions as it would for a similar number of blobs --- probably
more, in fact, since we usually need more commands per table than
per blob.
* I'm not too thrilled with the (undocumented) rearrangements in
pg_dump. I really don't like the idea of emitting a fundamentally
different TOC layout in binary-upgrade mode; that seems unmaintainably
bug-prone. Plus, the XID-consumption problem is not really confined
to pg_upgrade.
What I think we actually ought to do is one of the alternatives
discussed upthread: teach pg_restore to be able to commit
every so often, without trying to provide the all-or-nothing
guarantees of --single-transaction mode. This cuts its XID
consumption by whatever multiple "every so often" is, while
allowing us to limit the number of locks taken during any one
transaction. It also seems a great deal safer than the idea
I floated of not taking locks at all during a binary upgrade;
plus, it has some usefulness with regular pg_restore that's not
under control of pg_upgrade.
So I had a go at coding that, and attached is the result.
It invents a --transaction-size option, and when that's active
it will COMMIT after every N TOC items. (This seems simpler to
implement and less bug-prone than every-N-SQL-commands.)
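For illustration, a standalone restore with this patch applied would look something like this (database and file names are made up):
$ pg_restore --transaction-size=1000 --dbname newdb dump.custom
i.e. pg_restore issues a COMMIT (followed by a new BEGIN) after every 1000 TOC entries, instead of committing each SQL command separately as it does by default.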
I had initially supposed that in a parallel restore we could
have child workers also commit after every N TOC items, but was
soon disabused of that idea. After a worker processes a TOC
item, any dependent items (such as index builds) might get
dispatched to some other worker, which had better be able to
see the results of the first worker's step. So at least in
this implementation, we disable the multi-command-per-COMMIT
behavior during the parallel part of the restore. Maybe that
could be improved in future, but it seems like it'd add a
lot more complexity, and it wouldn't make life any better for
pg_upgrade (which doesn't use parallel pg_restore, and seems
unlikely to want to in future).
I've not spent a lot of effort on pg_upgrade changes here:
I just hard-wired it to select --transaction-size=1000.
Given the default lock table size of 64*100, that gives us
enough headroom for each TOC to take half a dozen locks.
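(Spelled out: 64 locks per transaction times 100 backends is 6400 lock-table
slots, and 6400 divided by 1000 TOC entries per transaction leaves roughly
6 locks per entry.)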
We could go higher than that by making pg_upgrade force the
destination postmaster to create a larger-than-default lock
table, but I'm not sure if it's worth any trouble. We've
already bought three orders of magnitude improvement as it
stands, which seems like enough ambition for today. (Also,
having pg_upgrade override the user's settings in the
destination cluster might not be without downsides.)
Another thing I'm wondering about is why this is only a pg_restore
option not also a pg_dump/pg_dumpall option. I did it like that
because --single-transaction is pg_restore only, but that seems more
like an oversight or laziness than a well-considered decision.
Maybe we should back-fill that omission; but it could be done later.
Thoughts?
regards, tom lane
Attachments:
v8-0001-restore-transaction-size-option.patch
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 1a23874da6..2e3ba80258 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -786,6 +786,30 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--transaction-size=<replaceable class="parameter">N</replaceable></option></term>
+ <listitem>
+ <para>
+ Execute the restore as a series of transactions, each processing
+ up to <replaceable class="parameter">N</replaceable> database
+ objects. This option implies <option>--exit-on-error</option>.
+ </para>
+ <para>
+ <option>--transaction-size</option> offers an intermediate choice
+ between the default behavior (one transaction per SQL command)
+ and <option>-1</option>/<option>--single-transaction</option>
+ (one transaction for all restored objects).
+ While <option>--single-transaction</option> has the least
+ overhead, it may be impractical for large databases because the
+ transaction will take a lock on each restored object, possibly
+ exhausting the server's lock table space.
+ Using <option>--transaction-size</option> with a size of a few
+ thousand objects offers nearly the same performance benefits while
+ capping the amount of lock table space needed.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--use-set-session-authorization</option></term>
<listitem>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 9ef2f2017e..fbf5f1c515 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -149,7 +149,9 @@ typedef struct _restoreOptions
* compression */
int suppressDumpWarnings; /* Suppress output of WARNING entries
* to stderr */
- bool single_txn;
+
+ bool single_txn; /* restore all TOCs in one transaction */
+ int txn_size; /* restore this many TOCs per txn, if > 0 */
bool *idWanted; /* array showing which dump IDs to emit */
int enable_row_security;
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 256d1e35a4..600482c93c 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -502,7 +502,28 @@ RestoreArchive(Archive *AHX)
/* Otherwise, drop anything that's selected and has a dropStmt */
if (((te->reqs & (REQ_SCHEMA | REQ_DATA)) != 0) && te->dropStmt)
{
+ bool not_allowed_in_txn = false;
+
pg_log_info("dropping %s %s", te->desc, te->tag);
+
+ /*
+ * In --transaction-size mode, we have to temporarily exit our
+ * transaction block to drop objects that can't be dropped
+ * within a transaction.
+ */
+ if (ropt->txn_size > 0)
+ {
+ if (strcmp(te->desc, "DATABASE") == 0 ||
+ strcmp(te->desc, "DATABASE PROPERTIES") == 0)
+ {
+ not_allowed_in_txn = true;
+ if (AH->connection)
+ CommitTransaction(AHX);
+ else
+ ahprintf(AH, "COMMIT;\n");
+ }
+ }
+
/* Select owner and schema as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
@@ -615,6 +636,33 @@ RestoreArchive(Archive *AHX)
}
}
}
+
+ /*
+ * In --transaction-size mode, re-establish the transaction
+ * block if needed; otherwise, commit after every N drops.
+ */
+ if (ropt->txn_size > 0)
+ {
+ if (not_allowed_in_txn)
+ {
+ if (AH->connection)
+ StartTransaction(AHX);
+ else
+ ahprintf(AH, "BEGIN;\n");
+ AH->txnCount = 0;
+ }
+ else if (++AH->txnCount >= ropt->txn_size)
+ {
+ if (AH->connection)
+ {
+ CommitTransaction(AHX);
+ StartTransaction(AHX);
+ }
+ else
+ ahprintf(AH, "COMMIT;\nBEGIN;\n");
+ AH->txnCount = 0;
+ }
+ }
}
}
@@ -711,7 +759,11 @@ RestoreArchive(Archive *AHX)
}
}
- if (ropt->single_txn)
+ /*
+ * Close out any persistent transaction we may have. While these two
+ * cases are started in different places, we can end both cases here.
+ */
+ if (ropt->single_txn || ropt->txn_size > 0)
{
if (AH->connection)
CommitTransaction(AHX);
@@ -772,6 +824,25 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
*/
if ((reqs & REQ_SCHEMA) != 0)
{
+ bool object_is_db = false;
+
+ /*
+ * In --transaction-size mode, must exit our transaction block to
+ * create a database or set its properties.
+ */
+ if (strcmp(te->desc, "DATABASE") == 0 ||
+ strcmp(te->desc, "DATABASE PROPERTIES") == 0)
+ {
+ object_is_db = true;
+ if (ropt->txn_size > 0)
+ {
+ if (AH->connection)
+ CommitTransaction(&AH->public);
+ else
+ ahprintf(AH, "COMMIT;\n\n");
+ }
+ }
+
/* Show namespace in log message if available */
if (te->namespace)
pg_log_info("creating %s \"%s.%s\"",
@@ -822,10 +893,10 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
/*
* If we created a DB, connect to it. Also, if we changed DB
* properties, reconnect to ensure that relevant GUC settings are
- * applied to our session.
+ * applied to our session. (That also restarts the transaction block
+ * in --transaction-size mode.)
*/
- if (strcmp(te->desc, "DATABASE") == 0 ||
- strcmp(te->desc, "DATABASE PROPERTIES") == 0)
+ if (object_is_db)
{
pg_log_info("connecting to new database \"%s\"", te->tag);
_reconnectToDB(AH, te->tag);
@@ -951,6 +1022,25 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
}
}
+ /*
+ * If we emitted anything for this TOC entry, that counts as one action
+ * against the transaction-size limit. Commit if it's time to.
+ */
+ if ((reqs & (REQ_SCHEMA | REQ_DATA)) != 0 && ropt->txn_size > 0)
+ {
+ if (++AH->txnCount >= ropt->txn_size)
+ {
+ if (AH->connection)
+ {
+ CommitTransaction(&AH->public);
+ StartTransaction(&AH->public);
+ }
+ else
+ ahprintf(AH, "COMMIT;\nBEGIN;\n\n");
+ AH->txnCount = 0;
+ }
+ }
+
if (AH->public.n_errors > 0 && status == WORKER_OK)
status = WORKER_IGNORED_ERRORS;
@@ -1297,7 +1387,12 @@ StartRestoreLOs(ArchiveHandle *AH)
{
RestoreOptions *ropt = AH->public.ropt;
- if (!ropt->single_txn)
+ /*
+ * LOs must be restored within a transaction block, since we need the LO
+ * handle to stay open while we write it. Establish a transaction unless
+ * there's one being used globally.
+ */
+ if (!(ropt->single_txn || ropt->txn_size > 0))
{
if (AH->connection)
StartTransaction(&AH->public);
@@ -1316,7 +1411,7 @@ EndRestoreLOs(ArchiveHandle *AH)
{
RestoreOptions *ropt = AH->public.ropt;
- if (!ropt->single_txn)
+ if (!(ropt->single_txn || ropt->txn_size > 0))
{
if (AH->connection)
CommitTransaction(&AH->public);
@@ -3149,6 +3244,19 @@ _doSetFixedOutputState(ArchiveHandle *AH)
else
ahprintf(AH, "SET row_security = off;\n");
+ /*
+ * In --transaction-size mode, we should always be in a transaction when
+ * we begin to restore objects.
+ */
+ if (ropt && ropt->txn_size > 0)
+ {
+ if (AH->connection)
+ StartTransaction(&AH->public);
+ else
+ ahprintf(AH, "\nBEGIN;\n");
+ AH->txnCount = 0;
+ }
+
ahprintf(AH, "\n");
}
@@ -3991,6 +4099,14 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
}
}
+ /*
+ * In --transaction-size mode, we must commit the open transaction before
+ * dropping the database connection. This also ensures that child workers
+ * can see the objects we've created so far.
+ */
+ if (AH->public.ropt->txn_size > 0)
+ CommitTransaction(&AH->public);
+
/*
* Now close parent connection in prep for parallel steps. We do this
* mainly to ensure that we don't exceed the specified number of parallel
@@ -4730,6 +4846,10 @@ CloneArchive(ArchiveHandle *AH)
clone = (ArchiveHandle *) pg_malloc(sizeof(ArchiveHandle));
memcpy(clone, AH, sizeof(ArchiveHandle));
+ /* Likewise flat-copy the RestoreOptions, so we can alter them locally */
+ clone->public.ropt = (RestoreOptions *) pg_malloc(sizeof(RestoreOptions));
+ memcpy(clone->public.ropt, AH->public.ropt, sizeof(RestoreOptions));
+
/* Handle format-independent fields */
memset(&(clone->sqlparse), 0, sizeof(clone->sqlparse));
@@ -4748,6 +4868,13 @@ CloneArchive(ArchiveHandle *AH)
/* clone has its own error count, too */
clone->public.n_errors = 0;
+ /*
+ * Clone connections disregard --transaction-size; they must commit after
+ * each command so that the results are immediately visible to other
+ * workers.
+ */
+ clone->public.ropt->txn_size = 0;
+
/*
* Connect our new clone object to the database, using the same connection
* parameters used for the original connection.
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 917283fd34..c21fdfe596 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -322,6 +322,9 @@ struct _archiveHandle
char *currTablespace; /* current tablespace, or NULL */
char *currTableAm; /* current table access method, or NULL */
+ /* in --transaction-size mode, this counts objects emitted in cur xact */
+ int txnCount;
+
void *lo_buf;
size_t lo_buf_used;
size_t lo_buf_size;
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index c3beacdec1..5ea78cf7cc 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -120,6 +120,7 @@ main(int argc, char **argv)
{"role", required_argument, NULL, 2},
{"section", required_argument, NULL, 3},
{"strict-names", no_argument, &strict_names, 1},
+ {"transaction-size", required_argument, NULL, 5},
{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
{"no-comments", no_argument, &no_comments, 1},
{"no-publications", no_argument, &no_publications, 1},
@@ -289,10 +290,18 @@ main(int argc, char **argv)
set_dump_section(optarg, &(opts->dumpSections));
break;
- case 4:
+ case 4: /* filter */
read_restore_filters(optarg, opts);
break;
+ case 5: /* transaction-size */
+ if (!option_parse_int(optarg, "--transaction-size",
+ 1, INT_MAX,
+ &opts->txn_size))
+ exit(1);
+ opts->exit_on_error = true;
+ break;
+
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -337,6 +346,9 @@ main(int argc, char **argv)
if (opts->dataOnly && opts->dropSchema)
pg_fatal("options -c/--clean and -a/--data-only cannot be used together");
+ if (opts->single_txn && opts->txn_size > 0)
+ pg_fatal("options -1/--single-transaction and --transaction-size cannot be used together");
+
/*
* -C is not compatible with -1, because we can't create a database inside
* a transaction block.
@@ -484,6 +496,7 @@ usage(const char *progname)
printf(_(" --section=SECTION restore named section (pre-data, data, or post-data)\n"));
printf(_(" --strict-names require table and/or schema include patterns to\n"
" match at least one entity each\n"));
+ printf(_(" --transaction-size=N commit after every N objects\n"));
printf(_(" --use-set-session-authorization\n"
" use SET SESSION AUTHORIZATION commands instead of\n"
" ALTER OWNER commands to set ownership\n"));
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 3960af4036..5cfd2282e1 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -548,6 +548,7 @@ create_new_objects(void)
true,
true,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "--transaction-size=1000 "
"--dbname postgres \"%s/%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
@@ -586,6 +587,7 @@ create_new_objects(void)
parallel_exec_prog(log_file_name,
NULL,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "--transaction-size=1000 "
"--dbname template1 \"%s/%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
I have spent some more effort in this area and developed a patch
series that I think addresses all of the performance issues that
we've discussed in this thread, both for pg_upgrade and more
general use of pg_dump/pg_restore.  Concretely, it absorbs the
pg_restore --transaction-size switch that I proposed before to cut
the number of transactions needed during restore, rearranges the
representation of BLOB-related TOC entries to reduce the client-side
memory requirements, and fixes some ancient mistakes that prevented
both selective and parallel restore of BLOBs.
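For illustration, the new switch is used about like this (a
hypothetical invocation; the database and archive names are made up,
and per the patch --transaction-size also implies --exit-on-error):

    $ pg_dump --format=custom --file=src.dump srcdb
    $ pg_restore --transaction-size=1000 --dbname=destdb src.dump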
As a demonstration, I made a database containing 100K empty blobs,
and measured the time needed to dump/restore that using -Fd
and -j 10. HEAD doesn't get any useful parallelism on blobs,
but with this patch series we do:
              dump     restore
HEAD:         14 sec   15 sec
after 0002:    7 sec   10 sec
after 0003:    7 sec    3 sec
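The setup for that test amounts to something like the following
sketch (the exact commands and names here are assumptions, not a
transcript of what was actually run):

    $ createdb lotest
    $ psql -d lotest -c "SELECT lo_create(0) FROM generate_series(1, 100000);"
    $ pg_dump -Fd -j 10 -f /tmp/lotest.dump lotest
    $ createdb lotest_restored
    $ pg_restore -j 10 -d lotest_restored /tmp/lotest.dump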
There are a few loose ends:
* I did not invent a switch to control the batching of blobs; it's
just hard-wired at 1000 blobs per group here. Probably we need some
user knob for that, but I'm unsure if we want to expose a count or
just a boolean for one vs more than one blob per batch. The point of
forcing one blob per batch would be to allow exact control during
selective restore, and I'm not sure if there's any value in random
other settings. On the other hand, selective restore of blobs has
been completely broken for the last dozen years and I can't recall any
user complaints about that; so maybe nobody cares and we could just
leave this as an internal choice.
* Likewise, there's no user-accessible knob to control what
transaction size pg_upgrade uses.  Do we need one?  In any case, it's
likely that the default needs a bit more thought than I've given it.
I used 1000, but if pg_upgrade is launching parallel restore jobs we
likely need to divide that by the number of restore jobs, since the
jobs' concurrently open transactions all draw on the same shared lock
table in the receiving server (a rough sketch of that adjustment
appears after this list).
* As the patch stands, we still build a separate TOC entry for each
comment or seclabel or ACL attached to a blob. If you have a lot of
blobs with non-default properties then the TOC bloat problem comes
back again. We could do something about that, but it would take a bit
of tedious refactoring, and the most obvious way to handle it probably
re-introduces too-many-locks problems. Is this a scenario that's
worth spending a lot of time on?
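To make the second loose end concrete, the job-count adjustment would
amount to little more than the following (a hypothetical sketch; the
names are not actual pg_upgrade variables):

    /*
     * Sketch: scale the per-transaction batch size down by the number
     * of parallel restore jobs, so that the jobs' concurrently open
     * transactions together hold about as many locks as one job using
     * --transaction-size=1000 would.
     */
    static int
    choose_txn_size(int n_restore_jobs)
    {
        int     txn_size = 1000;    /* assumed default batch size */

        if (n_restore_jobs > 1)
            txn_size = txn_size / n_restore_jobs;
        if (txn_size < 1)
            txn_size = 1;           /* at least one object per txn */
        return txn_size;
    }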
More details appear in the commit messages below. Patch 0004
is nearly the same as the v8 patch I posted before, although
it adds some logic to ensure that a large blob metadata batch
doesn't create too many locks.
Comments?
regards, tom lane
PS: I don't see any active CF entry for this thread, so
I'm going to go make one.
Attachments:
v9-0001-Some-small-preliminaries-for-pg_dump-changes.patch
From eecef8f312967ff7cc0f47899c6db2c3e654371d Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed, 20 Dec 2023 13:52:28 -0500
Subject: [PATCH v9 1/4] Some small preliminaries for pg_dump changes.
Centralize management of the lo_buf used to hold data while restoring
blobs. The code previously had each format handler create lo_buf,
which seems rather pointless given that the format handlers all make
it the same way. Moreover, the format handlers never use lo_buf
directly, making this setup a failure from a separation-of-concerns
standpoint. Let's move the responsibility into pg_backup_archiver.c,
which is the only module concerned with lo_buf. The main reason to do
this now is that it allows a centralized fix for the soon-to-be-false
assumption that we never restore blobs in parallel.
Also, get rid of dead code in DropLOIfExists: it's been a long time
since we had any need to be able to restore to a pre-9.0 server.
---
src/bin/pg_dump/pg_backup_archiver.c | 9 +++++++++
src/bin/pg_dump/pg_backup_custom.c | 7 -------
src/bin/pg_dump/pg_backup_db.c | 27 +++++----------------------
src/bin/pg_dump/pg_backup_directory.c | 6 ------
src/bin/pg_dump/pg_backup_null.c | 4 ----
src/bin/pg_dump/pg_backup_tar.c | 4 ----
6 files changed, 14 insertions(+), 43 deletions(-)
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 256d1e35a4..26c2c684c8 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -1343,6 +1343,12 @@ StartRestoreLO(ArchiveHandle *AH, Oid oid, bool drop)
AH->loCount++;
/* Initialize the LO Buffer */
+ if (AH->lo_buf == NULL)
+ {
+ /* First time through (in this process) so allocate the buffer */
+ AH->lo_buf_size = LOBBUFSIZE;
+ AH->lo_buf = (void *) pg_malloc(LOBBUFSIZE);
+ }
AH->lo_buf_used = 0;
pg_log_info("restoring large object with OID %u", oid);
@@ -4748,6 +4754,9 @@ CloneArchive(ArchiveHandle *AH)
/* clone has its own error count, too */
clone->public.n_errors = 0;
+ /* clones should not share lo_buf */
+ clone->lo_buf = NULL;
+
/*
* Connect our new clone object to the database, using the same connection
* parameters used for the original connection.
diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index b576b29924..7c6ac89dd4 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -140,10 +140,6 @@ InitArchiveFmt_Custom(ArchiveHandle *AH)
ctx = (lclContext *) pg_malloc0(sizeof(lclContext));
AH->formatData = (void *) ctx;
- /* Initialize LO buffering */
- AH->lo_buf_size = LOBBUFSIZE;
- AH->lo_buf = (void *) pg_malloc(LOBBUFSIZE);
-
/*
* Now open the file
*/
@@ -902,9 +898,6 @@ _Clone(ArchiveHandle *AH)
* share knowledge about where the data blocks are across threads.
* _PrintTocData has to be careful about the order of operations on that
* state, though.
- *
- * Note: we do not make a local lo_buf because we expect at most one BLOBS
- * entry per archive, so no parallelism is possible.
*/
}
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index f766b65059..b297ca049d 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -544,26 +544,9 @@ CommitTransaction(Archive *AHX)
void
DropLOIfExists(ArchiveHandle *AH, Oid oid)
{
- /*
- * If we are not restoring to a direct database connection, we have to
- * guess about how to detect whether the LO exists. Assume new-style.
- */
- if (AH->connection == NULL ||
- PQserverVersion(AH->connection) >= 90000)
- {
- ahprintf(AH,
- "SELECT pg_catalog.lo_unlink(oid) "
- "FROM pg_catalog.pg_largeobject_metadata "
- "WHERE oid = '%u';\n",
- oid);
- }
- else
- {
- /* Restoring to pre-9.0 server, so do it the old way */
- ahprintf(AH,
- "SELECT CASE WHEN EXISTS("
- "SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u'"
- ") THEN pg_catalog.lo_unlink('%u') END;\n",
- oid, oid);
- }
+ ahprintf(AH,
+ "SELECT pg_catalog.lo_unlink(oid) "
+ "FROM pg_catalog.pg_largeobject_metadata "
+ "WHERE oid = '%u';\n",
+ oid);
}
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index 679c60420b..16491d6a95 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -143,10 +143,6 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
ctx->dataFH = NULL;
ctx->LOsTocFH = NULL;
- /* Initialize LO buffering */
- AH->lo_buf_size = LOBBUFSIZE;
- AH->lo_buf = (void *) pg_malloc(LOBBUFSIZE);
-
/*
* Now open the TOC file
*/
@@ -823,8 +819,6 @@ _Clone(ArchiveHandle *AH)
ctx = (lclContext *) AH->formatData;
/*
- * Note: we do not make a local lo_buf because we expect at most one BLOBS
- * entry per archive, so no parallelism is possible. Likewise,
* TOC-entry-local state isn't an issue because any one TOC entry is
* touched by just one worker child.
*/
diff --git a/src/bin/pg_dump/pg_backup_null.c b/src/bin/pg_dump/pg_backup_null.c
index 08f096251b..776f057770 100644
--- a/src/bin/pg_dump/pg_backup_null.c
+++ b/src/bin/pg_dump/pg_backup_null.c
@@ -63,10 +63,6 @@ InitArchiveFmt_Null(ArchiveHandle *AH)
AH->ClonePtr = NULL;
AH->DeClonePtr = NULL;
- /* Initialize LO buffering */
- AH->lo_buf_size = LOBBUFSIZE;
- AH->lo_buf = (void *) pg_malloc(LOBBUFSIZE);
-
/*
* Now prevent reading...
*/
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index aad88ad559..4cb9707e63 100644
--- a/src/bin/pg_dump/pg_backup_tar.c
+++ b/src/bin/pg_dump/pg_backup_tar.c
@@ -156,10 +156,6 @@ InitArchiveFmt_Tar(ArchiveHandle *AH)
ctx->filePos = 0;
ctx->isSpecialScript = 0;
- /* Initialize LO buffering */
- AH->lo_buf_size = LOBBUFSIZE;
- AH->lo_buf = (void *) pg_malloc(LOBBUFSIZE);
-
/*
* Now open the tar file, and load the TOC if we're in read mode.
*/
--
2.39.3
v9-0002-In-dumps-group-large-objects-into-matching-metada.patch
From b3239164371648ccb0053f045ddc14a762e88d49 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed, 20 Dec 2023 15:34:19 -0500
Subject: [PATCH v9 2/4] In dumps, group large objects into matching metadata
and data entries.
Commit c0d5be5d6 caused pg_dump to create a separate BLOB metadata TOC
entry for each large object (blob), but it did not touch the ancient
decision to put all the blobs' data into a single BLOBS TOC entry.
This is bad for a few reasons: for databases with millions of blobs,
the TOC becomes unreasonably large, causing performance issues;
selective restore of just some blobs is quite impossible; and we
cannot parallelize either dump or restore of the blob data, since our
architecture for that relies on farming out whole TOC entries to
worker processes.
To improve matters, let's group multiple blobs into each blob metadata
TOC entry, and then make corresponding per-group blob data TOC entries.
Selective restore using pg_restore's -l/-L switches is then possible,
though only at the group level. (We should provide a switch to allow
forcing one-blob-per-group for users who need precise selective
restore and don't have huge numbers of blobs. This patch doesn't yet
do that, instead just hard-wiring the maximum number of blobs per
entry at 1000.)
The blobs in a group must all have the same owner, since the TOC entry
format only allows one owner to be named. In this implementation
we also require them to all share the same ACL (grants); the archive
format wouldn't require that, but pg_dump's representation of
DumpableObjects does. It seems unlikely that either restriction
will be problematic for databases with huge numbers of blobs.
The metadata TOC entries now have a "desc" string of "BLOB METADATA",
and their "defn" string is just a newline-separated list of blob OIDs.
The restore code has to generate creation commands, ALTER OWNER
commands, and drop commands (for --clean mode) from that. We would
need special-case code for ALTER OWNER and drop in any case, so the
alternative of keeping the "defn" as directly executable SQL code
for creation wouldn't buy much, and it seems like it'd bloat the
archive to little purpose.
The data TOC entries ("BLOBS") can be exactly the same as before,
except that now there can be more than one, so we'd better give them
identifying tag strings.
We have to bump the archive file format version number, since existing
versions of pg_restore wouldn't know they need to do something special
for BLOB METADATA, plus they aren't going to work correctly with
multiple BLOBS entries.
Also, the directory and tar-file format handlers need some work
for multiple BLOBS entries: they used to hard-wire the file name
as "blobs.toc", which is replaced here with "blobs_<dumpid>.toc".
The 002_pg_dump.pl test script also knows about that and requires
minor updates. (I had to drop the test for manually-compressed
blobs.toc files with LZ4, because lz4's obtuse command line
design requires explicit specification of the output file name
which seems impractical here. I don't think we're losing any
useful test coverage thereby; that test stanza seems completely
duplicative with the gzip and zstd cases anyway.)
As this stands, we still generate a separate TOC entry for any
comment, security label, or ACL attached to a blob. I feel
comfortable in believing that comments and security labels on
blobs are rare; but we might have to do something about aggregating
blob ACLs into grouped TOC entries to avoid blowing up the TOC
size, if there are use cases with large numbers of non-default
blob ACLs. That can be done later though, as it would not create
any compatibility issues.
---
src/bin/pg_dump/common.c | 26 +++
src/bin/pg_dump/pg_backup_archiver.c | 76 +++++--
src/bin/pg_dump/pg_backup_archiver.h | 6 +-
src/bin/pg_dump/pg_backup_custom.c | 4 +-
src/bin/pg_dump/pg_backup_db.c | 27 +++
src/bin/pg_dump/pg_backup_directory.c | 38 ++--
src/bin/pg_dump/pg_backup_null.c | 4 +-
src/bin/pg_dump/pg_backup_tar.c | 39 +++-
src/bin/pg_dump/pg_dump.c | 280 +++++++++++++++-----------
src/bin/pg_dump/pg_dump.h | 11 +
src/bin/pg_dump/t/002_pg_dump.pl | 30 ++-
11 files changed, 354 insertions(+), 187 deletions(-)
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 8b0c1e7b53..c38700c21e 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -46,6 +46,8 @@ static DumpId lastDumpId = 0; /* Note: 0 is InvalidDumpId */
* expects that it can move them around when resizing the table. So we
* cannot make the DumpableObjects be elements of the hash table directly;
* instead, the hash table elements contain pointers to DumpableObjects.
+ * This does have the advantage of letting us map multiple CatalogIds
+ * to one DumpableObject, which is useful for blobs.
*
* It turns out to be convenient to also use this data structure to map
* CatalogIds to owning extensions, if any. Since extension membership
@@ -696,6 +698,30 @@ AssignDumpId(DumpableObject *dobj)
}
}
+/*
+ * recordAdditionalCatalogID
+ * Record an additional catalog ID for the given DumpableObject
+ */
+void
+recordAdditionalCatalogID(CatalogId catId, DumpableObject *dobj)
+{
+ CatalogIdMapEntry *entry;
+ bool found;
+
+ /* CatalogId hash table must exist, if we have a DumpableObject */
+ Assert(catalogIdHash != NULL);
+
+ /* Add reference to CatalogId hash */
+ entry = catalogid_insert(catalogIdHash, catId, &found);
+ if (!found)
+ {
+ entry->dobj = NULL;
+ entry->ext = NULL;
+ }
+ Assert(entry->dobj == NULL);
+ entry->dobj = dobj;
+}
+
/*
* Assign a DumpId that's not tied to a DumpableObject.
*
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 26c2c684c8..73b9972da4 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -512,7 +512,20 @@ RestoreArchive(Archive *AHX)
* don't necessarily emit it verbatim; at this point we add an
* appropriate IF EXISTS clause, if the user requested it.
*/
- if (*te->dropStmt != '\0')
+ if (strcmp(te->desc, "BLOB METADATA") == 0)
+ {
+ /* We must generate the per-blob commands */
+ if (ropt->if_exists)
+ IssueCommandPerBlob(AH, te,
+ "SELECT pg_catalog.lo_unlink(oid) "
+ "FROM pg_catalog.pg_largeobject_metadata "
+ "WHERE oid = '", "'");
+ else
+ IssueCommandPerBlob(AH, te,
+ "SELECT pg_catalog.lo_unlink('",
+ "')");
+ }
+ else if (*te->dropStmt != '\0')
{
if (!ropt->if_exists ||
strncmp(te->dropStmt, "--", 2) == 0)
@@ -528,12 +541,12 @@ RestoreArchive(Archive *AHX)
{
/*
* Inject an appropriate spelling of "if exists". For
- * large objects, we have a separate routine that
+ * old-style large objects, we have a routine that
* knows how to do it, without depending on
* te->dropStmt; use that. For other objects we need
* to parse the command.
*/
- if (strncmp(te->desc, "BLOB", 4) == 0)
+ if (strcmp(te->desc, "BLOB") == 0)
{
DropLOIfExists(AH, te->catalogId.oid);
}
@@ -1290,7 +1303,7 @@ EndLO(Archive *AHX, Oid oid)
**********/
/*
- * Called by a format handler before any LOs are restored
+ * Called by a format handler before a group of LOs is restored
*/
void
StartRestoreLOs(ArchiveHandle *AH)
@@ -1309,7 +1322,7 @@ StartRestoreLOs(ArchiveHandle *AH)
}
/*
- * Called by a format handler after all LOs are restored
+ * Called by a format handler after a group of LOs is restored
*/
void
EndRestoreLOs(ArchiveHandle *AH)
@@ -2994,13 +3007,14 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
{
/*
* Special Case: If 'SEQUENCE SET' or anything to do with LOs, then it
- * is considered a data entry. We don't need to check for the BLOBS
- * entry or old-style BLOB COMMENTS, because they will have hadDumper
- * = true ... but we do need to check new-style BLOB ACLs, comments,
+ * is considered a data entry. We don't need to check for BLOBS or
+ * old-style BLOB COMMENTS entries, because they will have hadDumper =
+ * true ... but we do need to check new-style BLOB ACLs, comments,
* etc.
*/
if (strcmp(te->desc, "SEQUENCE SET") == 0 ||
strcmp(te->desc, "BLOB") == 0 ||
+ strcmp(te->desc, "BLOB METADATA") == 0 ||
(strcmp(te->desc, "ACL") == 0 &&
strncmp(te->tag, "LARGE OBJECT ", 13) == 0) ||
(strcmp(te->desc, "COMMENT") == 0 &&
@@ -3041,6 +3055,7 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
if (!(ropt->sequence_data && strcmp(te->desc, "SEQUENCE SET") == 0) &&
!(ropt->binary_upgrade &&
(strcmp(te->desc, "BLOB") == 0 ||
+ strcmp(te->desc, "BLOB METADATA") == 0 ||
(strcmp(te->desc, "ACL") == 0 &&
strncmp(te->tag, "LARGE OBJECT ", 13) == 0) ||
(strcmp(te->desc, "COMMENT") == 0 &&
@@ -3612,18 +3627,26 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
}
/*
- * Actually print the definition.
+ * Actually print the definition. Normally we can just print the defn
+ * string if any, but we have two special cases:
*
- * Really crude hack for suppressing AUTHORIZATION clause that old pg_dump
+ * 1. A crude hack for suppressing AUTHORIZATION clause that old pg_dump
* versions put into CREATE SCHEMA. Don't mutate the variant for schema
* "public" that is a comment. We have to do this when --no-owner mode is
* selected. This is ugly, but I see no other good way ...
+ *
+ * 2. BLOB METADATA entries need special processing since their defn
+ * strings are just lists of OIDs, not complete SQL commands.
*/
if (ropt->noOwner &&
strcmp(te->desc, "SCHEMA") == 0 && strncmp(te->defn, "--", 2) != 0)
{
ahprintf(AH, "CREATE SCHEMA %s;\n\n\n", fmtId(te->tag));
}
+ else if (strcmp(te->desc, "BLOB METADATA") == 0)
+ {
+ IssueCommandPerBlob(AH, te, "SELECT pg_catalog.lo_create('", "')");
+ }
else
{
if (te->defn && strlen(te->defn) > 0)
@@ -3644,18 +3667,31 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
te->owner && strlen(te->owner) > 0 &&
te->dropStmt && strlen(te->dropStmt) > 0)
{
- PQExpBufferData temp;
+ if (strcmp(te->desc, "BLOB METADATA") == 0)
+ {
+ /* BLOB METADATA needs special code to handle multiple LOs */
+ char *cmdEnd = psprintf(" OWNER TO %s", fmtId(te->owner));
+
+ IssueCommandPerBlob(AH, te, "ALTER LARGE OBJECT ", cmdEnd);
+ pg_free(cmdEnd);
+ }
+ else
+ {
+ /* For all other cases, we can use _getObjectDescription */
+ PQExpBufferData temp;
- initPQExpBuffer(&temp);
- _getObjectDescription(&temp, te);
+ initPQExpBuffer(&temp);
+ _getObjectDescription(&temp, te);
- /*
- * If _getObjectDescription() didn't fill the buffer, then there is no
- * owner.
- */
- if (temp.data[0])
- ahprintf(AH, "ALTER %s OWNER TO %s;\n\n", temp.data, fmtId(te->owner));
- termPQExpBuffer(&temp);
+ /*
+ * If _getObjectDescription() didn't fill the buffer, then there
+ * is no owner.
+ */
+ if (temp.data[0])
+ ahprintf(AH, "ALTER %s OWNER TO %s;\n\n",
+ temp.data, fmtId(te->owner));
+ termPQExpBuffer(&temp);
+ }
}
/*
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 917283fd34..e4dd395582 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -68,10 +68,12 @@
#define K_VERS_1_15 MAKE_ARCHIVE_VERSION(1, 15, 0) /* add
* compression_algorithm
* in header */
+#define K_VERS_1_16 MAKE_ARCHIVE_VERSION(1, 16, 0) /* BLOB METADATA entries
+ * and multiple BLOBS */
/* Current archive version number (the format we can output) */
#define K_VERS_MAJOR 1
-#define K_VERS_MINOR 15
+#define K_VERS_MINOR 16
#define K_VERS_REV 0
#define K_VERS_SELF MAKE_ARCHIVE_VERSION(K_VERS_MAJOR, K_VERS_MINOR, K_VERS_REV)
@@ -448,6 +450,8 @@ extern void InitArchiveFmt_Tar(ArchiveHandle *AH);
extern bool isValidTarHeader(char *header);
extern void ReconnectToServer(ArchiveHandle *AH, const char *dbname);
+extern void IssueCommandPerBlob(ArchiveHandle *AH, TocEntry *te,
+ const char *cmdBegin, const char *cmdEnd);
extern void DropLOIfExists(ArchiveHandle *AH, Oid oid);
void ahwrite(const void *ptr, size_t size, size_t nmemb, ArchiveHandle *AH);
diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index 7c6ac89dd4..55107b2005 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -338,7 +338,7 @@ _EndData(ArchiveHandle *AH, TocEntry *te)
}
/*
- * Called by the archiver when starting to save all BLOB DATA (not schema).
+ * Called by the archiver when starting to save BLOB DATA (not schema).
* This routine should save whatever format-specific information is needed
* to read the LOs back into memory.
*
@@ -398,7 +398,7 @@ _EndLO(ArchiveHandle *AH, TocEntry *te, Oid oid)
}
/*
- * Called by the archiver when finishing saving all BLOB DATA.
+ * Called by the archiver when finishing saving BLOB DATA.
*
* Optional.
*/
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index b297ca049d..c14d813b21 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -541,6 +541,33 @@ CommitTransaction(Archive *AHX)
ExecuteSqlCommand(AH, "COMMIT", "could not commit database transaction");
}
+/*
+ * Issue per-blob commands for the large object(s) listed in the TocEntry
+ *
+ * The TocEntry's defn string is assumed to consist of large object OIDs,
+ * one per line. Wrap these in the given SQL command fragments and issue
+ * the commands. (cmdEnd need not include a semicolon.)
+ */
+void
+IssueCommandPerBlob(ArchiveHandle *AH, TocEntry *te,
+ const char *cmdBegin, const char *cmdEnd)
+{
+ /* Make a writable copy of the command string */
+ char *buf = pg_strdup(te->defn);
+ char *st;
+ char *en;
+
+ st = buf;
+ while ((en = strchr(st, '\n')) != NULL)
+ {
+ *en++ = '\0';
+ ahprintf(AH, "%s%s%s;\n", cmdBegin, st, cmdEnd);
+ st = en;
+ }
+ ahprintf(AH, "\n");
+ pg_free(buf);
+}
+
void
DropLOIfExists(ArchiveHandle *AH, Oid oid)
{
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index 16491d6a95..829832586f 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -5,8 +5,10 @@
* A directory format dump is a directory, which contains a "toc.dat" file
* for the TOC, and a separate file for each data entry, named "<oid>.dat".
* Large objects are stored in separate files named "blob_<oid>.dat",
- * and there's a plain-text TOC file for them called "blobs.toc". If
- * compression is used, each data file is individually compressed and the
+ * and there's a plain-text TOC file for each BLOBS TOC entry named
+ * "blobs_<dumpID>.toc" (or just "blobs.toc" in archive versions before 16).
+ *
+ * If compression is used, each data file is individually compressed and the
* ".gz" suffix is added to the filenames. The TOC files are never
* compressed by pg_dump, however they are accepted with the .gz suffix too,
* in case the user has manually compressed them with 'gzip'.
@@ -51,7 +53,7 @@ typedef struct
char *directory;
CompressFileHandle *dataFH; /* currently open data file */
- CompressFileHandle *LOsTocFH; /* file handle for blobs.toc */
+ CompressFileHandle *LOsTocFH; /* file handle for blobs_NNN.toc */
ParallelState *pstate; /* for parallel backup / restore */
} lclContext;
@@ -81,7 +83,7 @@ static void _StartLOs(ArchiveHandle *AH, TocEntry *te);
static void _StartLO(ArchiveHandle *AH, TocEntry *te, Oid oid);
static void _EndLO(ArchiveHandle *AH, TocEntry *te, Oid oid);
static void _EndLOs(ArchiveHandle *AH, TocEntry *te);
-static void _LoadLOs(ArchiveHandle *AH);
+static void _LoadLOs(ArchiveHandle *AH, TocEntry *te);
static void _PrepParallelRestore(ArchiveHandle *AH);
static void _Clone(ArchiveHandle *AH);
@@ -232,7 +234,10 @@ _ArchiveEntry(ArchiveHandle *AH, TocEntry *te)
tctx = (lclTocEntry *) pg_malloc0(sizeof(lclTocEntry));
if (strcmp(te->desc, "BLOBS") == 0)
- tctx->filename = pg_strdup("blobs.toc");
+ {
+ snprintf(fn, MAXPGPATH, "blobs_%d.toc", te->dumpId);
+ tctx->filename = pg_strdup(fn);
+ }
else if (te->dataDumper)
{
snprintf(fn, MAXPGPATH, "%d.dat", te->dumpId);
@@ -415,7 +420,7 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
return;
if (strcmp(te->desc, "BLOBS") == 0)
- _LoadLOs(AH);
+ _LoadLOs(AH, te);
else
{
char fname[MAXPGPATH];
@@ -426,17 +431,23 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
}
static void
-_LoadLOs(ArchiveHandle *AH)
+_LoadLOs(ArchiveHandle *AH, TocEntry *te)
{
Oid oid;
lclContext *ctx = (lclContext *) AH->formatData;
+ lclTocEntry *tctx = (lclTocEntry *) te->formatData;
CompressFileHandle *CFH;
char tocfname[MAXPGPATH];
char line[MAXPGPATH];
StartRestoreLOs(AH);
- setFilePath(AH, tocfname, "blobs.toc");
+ /*
+ * Note: before archive v16, there was always only one BLOBS TOC entry,
+ * now there can be multiple. We don't need to worry what version we are
+ * reading though, because tctx->filename should be correct either way.
+ */
+ setFilePath(AH, tocfname, tctx->filename);
CFH = ctx->LOsTocFH = InitDiscoverCompressFileHandle(tocfname, PG_BINARY_R);
@@ -632,7 +643,7 @@ _ReopenArchive(ArchiveHandle *AH)
*/
/*
- * Called by the archiver when starting to save all BLOB DATA (not schema).
+ * Called by the archiver when starting to save BLOB DATA (not schema).
* It is called just prior to the dumper's DataDumper routine.
*
* We open the large object TOC file here, so that we can append a line to
@@ -642,10 +653,11 @@ static void
_StartLOs(ArchiveHandle *AH, TocEntry *te)
{
lclContext *ctx = (lclContext *) AH->formatData;
+ lclTocEntry *tctx = (lclTocEntry *) te->formatData;
pg_compress_specification compression_spec = {0};
char fname[MAXPGPATH];
- setFilePath(AH, fname, "blobs.toc");
+ setFilePath(AH, fname, tctx->filename);
/* The LO TOC file is never compressed */
compression_spec.algorithm = PG_COMPRESSION_NONE;
@@ -690,7 +702,7 @@ _EndLO(ArchiveHandle *AH, TocEntry *te, Oid oid)
pg_fatal("could not close LO data file: %m");
ctx->dataFH = NULL;
- /* register the LO in blobs.toc */
+ /* register the LO in blobs_NNN.toc */
len = snprintf(buf, sizeof(buf), "%u blob_%u.dat\n", oid, oid);
if (!CFH->write_func(buf, len, CFH))
{
@@ -703,7 +715,7 @@ _EndLO(ArchiveHandle *AH, TocEntry *te, Oid oid)
}
/*
- * Called by the archiver when finishing saving all BLOB DATA.
+ * Called by the archiver when finishing saving BLOB DATA.
*
* We close the LOs TOC file.
*/
@@ -795,7 +807,7 @@ _PrepParallelRestore(ArchiveHandle *AH)
}
/*
- * If this is the BLOBS entry, what we stat'd was blobs.toc, which
+ * If this is a BLOBS entry, what we stat'd was blobs_NNN.toc, which
* most likely is a lot smaller than the actual blob data. We don't
* have a cheap way to estimate how much smaller, but fortunately it
* doesn't matter too much as long as we get the LOs processed
diff --git a/src/bin/pg_dump/pg_backup_null.c b/src/bin/pg_dump/pg_backup_null.c
index 776f057770..a3257f4fc8 100644
--- a/src/bin/pg_dump/pg_backup_null.c
+++ b/src/bin/pg_dump/pg_backup_null.c
@@ -113,7 +113,7 @@ _EndData(ArchiveHandle *AH, TocEntry *te)
}
/*
- * Called by the archiver when starting to save all BLOB DATA (not schema).
+ * Called by the archiver when starting to save BLOB DATA (not schema).
* This routine should save whatever format-specific information is needed
* to read the LOs back into memory.
*
@@ -170,7 +170,7 @@ _EndLO(ArchiveHandle *AH, TocEntry *te, Oid oid)
}
/*
- * Called by the archiver when finishing saving all BLOB DATA.
+ * Called by the archiver when finishing saving BLOB DATA.
*
* Optional.
*/
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index 4cb9707e63..41ee52b1d6 100644
--- a/src/bin/pg_dump/pg_backup_tar.c
+++ b/src/bin/pg_dump/pg_backup_tar.c
@@ -94,7 +94,7 @@ typedef struct
char *filename;
} lclTocEntry;
-static void _LoadLOs(ArchiveHandle *AH);
+static void _LoadLOs(ArchiveHandle *AH, TocEntry *te);
static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
static void tarClose(ArchiveHandle *AH, TAR_MEMBER *th);
@@ -634,13 +634,13 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
}
if (strcmp(te->desc, "BLOBS") == 0)
- _LoadLOs(AH);
+ _LoadLOs(AH, te);
else
_PrintFileData(AH, tctx->filename);
}
static void
-_LoadLOs(ArchiveHandle *AH)
+_LoadLOs(ArchiveHandle *AH, TocEntry *te)
{
Oid oid;
lclContext *ctx = (lclContext *) AH->formatData;
@@ -651,7 +651,26 @@ _LoadLOs(ArchiveHandle *AH)
StartRestoreLOs(AH);
- th = tarOpen(AH, NULL, 'r'); /* Open next file */
+ /*
+ * The blobs_NNN.toc or blobs.toc file is fairly useless to us because it
+ * will appear only after the associated blob_NNN.dat files. For archive
+ * versions >= 16 we can look at the BLOBS entry's te->tag to discover the
+ * OID of the first blob we want to restore, and then search forward to
+ * find the appropriate blob_<oid>.dat file. For older versions we rely
+ * on the knowledge that there was only one BLOBS entry and just search
+ * for the first blob_<oid>.dat file. Once we find the first blob file to
+ * restore, restore all blobs until we reach the blobs[_NNN].toc file.
+ */
+ if (AH->version >= K_VERS_1_16)
+ {
+ /* We rely on atooid to not complain about nnnn..nnnn tags */
+ oid = atooid(te->tag);
+ snprintf(buf, sizeof(buf), "blob_%u.dat", oid);
+ th = tarOpen(AH, buf, 'r'); /* Advance to first desired file */
+ }
+ else
+ th = tarOpen(AH, NULL, 'r'); /* Open next file */
+
while (th != NULL)
{
ctx->FH = th;
@@ -681,9 +700,9 @@ _LoadLOs(ArchiveHandle *AH)
/*
* Once we have found the first LO, stop at the first non-LO entry
- * (which will be 'blobs.toc'). This coding would eat all the
- * rest of the archive if there are no LOs ... but this function
- * shouldn't be called at all in that case.
+ * (which will be 'blobs[_NNN].toc'). This coding would eat all
+ * the rest of the archive if there are no LOs ... but this
+ * function shouldn't be called at all in that case.
*/
if (foundLO)
break;
@@ -847,7 +866,7 @@ _scriptOut(ArchiveHandle *AH, const void *buf, size_t len)
*/
/*
- * Called by the archiver when starting to save all BLOB DATA (not schema).
+ * Called by the archiver when starting to save BLOB DATA (not schema).
* This routine should save whatever format-specific information is needed
* to read the LOs back into memory.
*
@@ -862,7 +881,7 @@ _StartLOs(ArchiveHandle *AH, TocEntry *te)
lclContext *ctx = (lclContext *) AH->formatData;
char fname[K_STD_BUF_SIZE];
- sprintf(fname, "blobs.toc");
+ sprintf(fname, "blobs_%d.toc", te->dumpId);
ctx->loToc = tarOpen(AH, fname, 'w');
}
@@ -908,7 +927,7 @@ _EndLO(ArchiveHandle *AH, TocEntry *te, Oid oid)
}
/*
- * Called by the archiver when finishing saving all BLOB DATA.
+ * Called by the archiver when finishing saving BLOB DATA.
*
* Optional.
*
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 8c0b5486b9..ecb1156f5e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3560,11 +3560,10 @@ getLOs(Archive *fout)
{
DumpOptions *dopt = fout->dopt;
PQExpBuffer loQry = createPQExpBuffer();
- LoInfo *loinfo;
- DumpableObject *lodata;
PGresult *res;
int ntups;
int i;
+ int n;
int i_oid;
int i_lomowner;
int i_lomacl;
@@ -3572,11 +3571,15 @@ getLOs(Archive *fout)
pg_log_info("reading large objects");
- /* Fetch LO OIDs, and owner/ACL data */
+ /*
+ * Fetch LO OIDs and owner/ACL data. Order the data so that all the blobs
+ * with the same owner/ACL appear together.
+ */
appendPQExpBufferStr(loQry,
"SELECT oid, lomowner, lomacl, "
"acldefault('L', lomowner) AS acldefault "
- "FROM pg_largeobject_metadata");
+ "FROM pg_largeobject_metadata "
+ "ORDER BY lomowner, lomacl::pg_catalog.text, oid");
res = ExecuteSqlQuery(fout, loQry->data, PGRES_TUPLES_OK);
@@ -3588,30 +3591,72 @@ getLOs(Archive *fout)
ntups = PQntuples(res);
/*
- * Each large object has its own "BLOB" archive entry.
+ * Group the blobs into suitably-sized groups that have the same owner and
+ * ACL setting, and build a metadata and a data DumpableObject for each
+ * group. (If we supported initprivs for blobs, we'd have to insist that
+ * groups also share initprivs settings, since the DumpableObject only has
+ * room for one.) i is the index of the first tuple in the current group,
+ * and n is the number of tuples we include in the group.
*/
- loinfo = (LoInfo *) pg_malloc(ntups * sizeof(LoInfo));
+ for (i = 0; i < ntups; i += n)
+ {
+ Oid thisoid = atooid(PQgetvalue(res, i, i_oid));
+ char *thisowner = PQgetvalue(res, i, i_lomowner);
+ char *thisacl = PQgetvalue(res, i, i_lomacl);
+ LoInfo *loinfo;
+ DumpableObject *lodata;
+ char namebuf[64];
+
+ /* Scan to find first tuple not to be included in group */
+ n = 1;
+ while (n < 1000 && i + n < ntups)
+ {
+ if (strcmp(thisowner, PQgetvalue(res, i + n, i_lomowner)) != 0 ||
+ strcmp(thisacl, PQgetvalue(res, i + n, i_lomacl)) != 0)
+ break;
+ n++;
+ }
- for (i = 0; i < ntups; i++)
- {
- loinfo[i].dobj.objType = DO_LARGE_OBJECT;
- loinfo[i].dobj.catId.tableoid = LargeObjectRelationId;
- loinfo[i].dobj.catId.oid = atooid(PQgetvalue(res, i, i_oid));
- AssignDumpId(&loinfo[i].dobj);
+ /* Build the metadata DumpableObject */
+ loinfo = (LoInfo *) pg_malloc(offsetof(LoInfo, looids) + n * sizeof(Oid));
- loinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_oid));
- loinfo[i].dacl.acl = pg_strdup(PQgetvalue(res, i, i_lomacl));
- loinfo[i].dacl.acldefault = pg_strdup(PQgetvalue(res, i, i_acldefault));
- loinfo[i].dacl.privtype = 0;
- loinfo[i].dacl.initprivs = NULL;
- loinfo[i].rolname = getRoleName(PQgetvalue(res, i, i_lomowner));
+ loinfo->dobj.objType = DO_LARGE_OBJECT;
+ loinfo->dobj.catId.tableoid = LargeObjectRelationId;
+ loinfo->dobj.catId.oid = thisoid;
+ AssignDumpId(&loinfo->dobj);
+
+ if (n > 1)
+ snprintf(namebuf, sizeof(namebuf), "%u..%u", thisoid,
+ atooid(PQgetvalue(res, i + n - 1, i_oid)));
+ else
+ snprintf(namebuf, sizeof(namebuf), "%u", thisoid);
+ loinfo->dobj.name = pg_strdup(namebuf);
+ loinfo->dacl.acl = pg_strdup(thisacl);
+ loinfo->dacl.acldefault = pg_strdup(PQgetvalue(res, i, i_acldefault));
+ loinfo->dacl.privtype = 0;
+ loinfo->dacl.initprivs = NULL;
+ loinfo->rolname = getRoleName(thisowner);
+ loinfo->numlos = n;
+ loinfo->looids[0] = thisoid;
+ /* Collect OIDs of the remaining blobs in this group */
+ for (int k = 1; k < n; k++)
+ {
+ CatalogId extraID;
+
+ loinfo->looids[k] = atooid(PQgetvalue(res, i + k, i_oid));
+
+ /* Make sure we can look up loinfo by any of the blobs' OIDs */
+ extraID.tableoid = LargeObjectRelationId;
+ extraID.oid = loinfo->looids[k];
+ recordAdditionalCatalogID(extraID, &loinfo->dobj);
+ }
/* LOs have data */
- loinfo[i].dobj.components |= DUMP_COMPONENT_DATA;
+ loinfo->dobj.components |= DUMP_COMPONENT_DATA;
- /* Mark whether LO has an ACL */
+ /* Mark whether LO group has a non-empty ACL */
if (!PQgetisnull(res, i, i_lomacl))
- loinfo[i].dobj.components |= DUMP_COMPONENT_ACL;
+ loinfo->dobj.components |= DUMP_COMPONENT_ACL;
/*
* In binary-upgrade mode for LOs, we do *not* dump out the LO data,
@@ -3621,21 +3666,22 @@ getLOs(Archive *fout)
* pg_largeobject_metadata, after the dump is restored.
*/
if (dopt->binary_upgrade)
- loinfo[i].dobj.dump &= ~DUMP_COMPONENT_DATA;
- }
+ loinfo->dobj.dump &= ~DUMP_COMPONENT_DATA;
- /*
- * If we have any large objects, a "BLOBS" archive entry is needed. This
- * is just a placeholder for sorting; it carries no data now.
- */
- if (ntups > 0)
- {
+ /*
+ * Create a "BLOBS" data item for the group, too. This is just a
+ * placeholder for sorting; it carries no data now.
+ */
lodata = (DumpableObject *) pg_malloc(sizeof(DumpableObject));
lodata->objType = DO_LARGE_OBJECT_DATA;
lodata->catId = nilCatalogId;
AssignDumpId(lodata);
- lodata->name = pg_strdup("BLOBS");
+ lodata->name = pg_strdup(namebuf);
lodata->components |= DUMP_COMPONENT_DATA;
+ /* Set up explicit dependency from data to metadata */
+ lodata->dependencies = (DumpId *) pg_malloc(sizeof(DumpId));
+ lodata->dependencies[0] = loinfo->dobj.dumpId;
+ lodata->nDeps = lodata->allocDeps = 1;
}
PQclear(res);
@@ -3645,123 +3691,109 @@ getLOs(Archive *fout)
/*
* dumpLO
*
- * dump the definition (metadata) of the given large object
+ * dump the definition (metadata) of the given large object group
*/
static void
dumpLO(Archive *fout, const LoInfo *loinfo)
{
PQExpBuffer cquery = createPQExpBuffer();
- PQExpBuffer dquery = createPQExpBuffer();
-
- appendPQExpBuffer(cquery,
- "SELECT pg_catalog.lo_create('%s');\n",
- loinfo->dobj.name);
- appendPQExpBuffer(dquery,
- "SELECT pg_catalog.lo_unlink('%s');\n",
- loinfo->dobj.name);
+ /*
+ * The "definition" is just a newline-separated list of OIDs. We need to
+ * put something into the dropStmt too, but it can just be a comment.
+ */
+ for (int i = 0; i < loinfo->numlos; i++)
+ appendPQExpBuffer(cquery, "%u\n", loinfo->looids[i]);
if (loinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, loinfo->dobj.catId, loinfo->dobj.dumpId,
ARCHIVE_OPTS(.tag = loinfo->dobj.name,
.owner = loinfo->rolname,
- .description = "BLOB",
+ .description = "BLOB METADATA",
.section = SECTION_PRE_DATA,
.createStmt = cquery->data,
- .dropStmt = dquery->data));
-
- /* Dump comment if any */
- if (loinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
- dumpComment(fout, "LARGE OBJECT", loinfo->dobj.name,
- NULL, loinfo->rolname,
- loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
-
- /* Dump security label if any */
- if (loinfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
- dumpSecLabel(fout, "LARGE OBJECT", loinfo->dobj.name,
- NULL, loinfo->rolname,
- loinfo->dobj.catId, 0, loinfo->dobj.dumpId);
-
- /* Dump ACL if any */
- if (loinfo->dobj.dump & DUMP_COMPONENT_ACL)
- dumpACL(fout, loinfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
- loinfo->dobj.name, NULL,
- NULL, loinfo->rolname, &loinfo->dacl);
+ .dropStmt = "-- dummy"));
+
+ /*
+ * Dump per-blob comments, seclabels, and ACLs if any. We assume these
+ * are rare enough that it's okay to generate retail TOC entries for them.
+ */
+ if (loinfo->dobj.dump & (DUMP_COMPONENT_COMMENT |
+ DUMP_COMPONENT_SECLABEL |
+ DUMP_COMPONENT_ACL))
+ {
+ for (int i = 0; i < loinfo->numlos; i++)
+ {
+ CatalogId catId;
+ char namebuf[32];
+
+ /* Build identifying info for this blob */
+ catId.tableoid = loinfo->dobj.catId.tableoid;
+ catId.oid = loinfo->looids[i];
+ snprintf(namebuf, sizeof(namebuf), "%u", loinfo->looids[i]);
+
+ if (loinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "LARGE OBJECT", namebuf,
+ NULL, loinfo->rolname,
+ catId, 0, loinfo->dobj.dumpId);
+
+ if (loinfo->dobj.dump & DUMP_COMPONENT_SECLABEL)
+ dumpSecLabel(fout, "LARGE OBJECT", namebuf,
+ NULL, loinfo->rolname,
+ catId, 0, loinfo->dobj.dumpId);
+
+ if (loinfo->dobj.dump & DUMP_COMPONENT_ACL)
+ dumpACL(fout, loinfo->dobj.dumpId, InvalidDumpId,
+ "LARGE OBJECT", namebuf, NULL,
+ NULL, loinfo->rolname, &loinfo->dacl);
+ }
+ }
destroyPQExpBuffer(cquery);
- destroyPQExpBuffer(dquery);
}
/*
* dumpLOs:
- * dump the data contents of all large objects
+ * dump the data contents of the large objects in the given group
*/
static int
dumpLOs(Archive *fout, const void *arg)
{
- const char *loQry;
- const char *loFetchQry;
+ const LoInfo *loinfo = (const LoInfo *) arg;
PGconn *conn = GetConnection(fout);
- PGresult *res;
char buf[LOBBUFSIZE];
- int ntups;
- int i;
- int cnt;
-
- pg_log_info("saving large objects");
- /*
- * Currently, we re-fetch all LO OIDs using a cursor. Consider scanning
- * the already-in-memory dumpable objects instead...
- */
- loQry =
- "DECLARE looid CURSOR FOR "
- "SELECT oid FROM pg_largeobject_metadata ORDER BY 1";
+ pg_log_info("saving large objects \"%s\"", loinfo->dobj.name);
- ExecuteSqlStatement(fout, loQry);
+ for (int i = 0; i < loinfo->numlos; i++)
+ {
+ Oid loOid = loinfo->looids[i];
+ int loFd;
+ int cnt;
- /* Command to fetch from cursor */
- loFetchQry = "FETCH 1000 IN looid";
+ /* Open the LO */
+ loFd = lo_open(conn, loOid, INV_READ);
+ if (loFd == -1)
+ pg_fatal("could not open large object %u: %s",
+ loOid, PQerrorMessage(conn));
- do
- {
- /* Do a fetch */
- res = ExecuteSqlQuery(fout, loFetchQry, PGRES_TUPLES_OK);
+ StartLO(fout, loOid);
- /* Process the tuples, if any */
- ntups = PQntuples(res);
- for (i = 0; i < ntups; i++)
+ /* Now read it in chunks, sending data to archive */
+ do
{
- Oid loOid;
- int loFd;
-
- loOid = atooid(PQgetvalue(res, i, 0));
- /* Open the LO */
- loFd = lo_open(conn, loOid, INV_READ);
- if (loFd == -1)
- pg_fatal("could not open large object %u: %s",
+ cnt = lo_read(conn, loFd, buf, LOBBUFSIZE);
+ if (cnt < 0)
+ pg_fatal("error reading large object %u: %s",
loOid, PQerrorMessage(conn));
- StartLO(fout, loOid);
-
- /* Now read it in chunks, sending data to archive */
- do
- {
- cnt = lo_read(conn, loFd, buf, LOBBUFSIZE);
- if (cnt < 0)
- pg_fatal("error reading large object %u: %s",
- loOid, PQerrorMessage(conn));
-
- WriteData(fout, buf, cnt);
- } while (cnt > 0);
-
- lo_close(conn, loFd);
+ WriteData(fout, buf, cnt);
+ } while (cnt > 0);
- EndLO(fout, loOid);
- }
+ lo_close(conn, loFd);
- PQclear(res);
- } while (ntups > 0);
+ EndLO(fout, loOid);
+ }
return 1;
}
@@ -10413,28 +10445,34 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
case DO_LARGE_OBJECT_DATA:
if (dobj->dump & DUMP_COMPONENT_DATA)
{
+ LoInfo *loinfo;
TocEntry *te;
+ loinfo = (LoInfo *) findObjectByDumpId(dobj->dependencies[0]);
+ if (loinfo == NULL)
+ pg_fatal("missing metadata for large objects \"%s\"",
+ dobj->name);
+
te = ArchiveEntry(fout, dobj->catId, dobj->dumpId,
ARCHIVE_OPTS(.tag = dobj->name,
+ .owner = loinfo->rolname,
.description = "BLOBS",
.section = SECTION_DATA,
- .dumpFn = dumpLOs));
+ .deps = dobj->dependencies,
+ .nDeps = dobj->nDeps,
+ .dumpFn = dumpLOs,
+ .dumpArg = loinfo));
/*
* Set the TocEntry's dataLength in case we are doing a
* parallel dump and want to order dump jobs by table size.
* (We need some size estimate for every TocEntry with a
* DataDumper function.) We don't currently have any cheap
- * way to estimate the size of LOs, but it doesn't matter;
- * let's just set the size to a large value so parallel dumps
- * will launch this job first. If there's lots of LOs, we
- * win, and if there aren't, we don't lose much. (If you want
- * to improve on this, really what you should be thinking
- * about is allowing LO dumping to be parallelized, not just
- * getting a smarter estimate for the single TOC entry.)
+ * way to estimate the size of LOs, but fortunately it doesn't
+ * matter too much as long as we get large batches of LOs
+ * processed reasonably early. Assume 8K per blob.
*/
- te->dataLength = INT_MAX;
+ te->dataLength = loinfo->numlos * (pgoff_t) 8192;
}
break;
case DO_POLICY:
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 2fe3cbed9a..9105210693 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -589,11 +589,21 @@ typedef struct _defaultACLInfo
char defaclobjtype;
} DefaultACLInfo;
+/*
+ * LoInfo represents a group of large objects (blobs) that share the same
+ * owner and ACL setting. dobj.components has the DUMP_COMPONENT_COMMENT bit
+ * set if any blob in the group has a comment; similarly for sec labels.
+ * If there are many blobs with the same owner/ACL, we can divide them into
+ * multiple LoInfo groups, which will each spawn a BLOB METADATA and a BLOBS
+ * (data) TOC entry. This allows more parallelism during restore.
+ */
typedef struct _loInfo
{
DumpableObject dobj;
DumpableAcl dacl;
const char *rolname;
+ int numlos;
+ Oid looids[FLEXIBLE_ARRAY_MEMBER];
} LoInfo;
/*
@@ -680,6 +690,7 @@ typedef struct _SubscriptionInfo
extern TableInfo *getSchemaData(Archive *fout, int *numTablesPtr);
extern void AssignDumpId(DumpableObject *dobj);
+extern void recordAdditionalCatalogID(CatalogId catId, DumpableObject *dobj);
extern DumpId createDumpId(void);
extern DumpId getMaxDumpId(void);
extern DumpableObject *findObjectByDumpId(DumpId dumpId);
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index eb3ec534b4..76548561c8 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -109,11 +109,11 @@ my %pgdump_runs = (
'--format=directory', '--compress=gzip:1',
"--file=$tempdir/compression_gzip_dir", 'postgres',
],
- # Give coverage for manually compressed blob.toc files during
+ # Give coverage for manually compressed blobs.toc files during
# restore.
compress_cmd => {
program => $ENV{'GZIP_PROGRAM'},
- args => [ '-f', "$tempdir/compression_gzip_dir/blobs.toc", ],
+ args => [ '-f', "$tempdir/compression_gzip_dir/blobs_*.toc", ],
},
# Verify that only data files were compressed
glob_patterns => [
@@ -172,16 +172,6 @@ my %pgdump_runs = (
'--format=directory', '--compress=lz4:1',
"--file=$tempdir/compression_lz4_dir", 'postgres',
],
- # Give coverage for manually compressed blob.toc files during
- # restore.
- compress_cmd => {
- program => $ENV{'LZ4'},
- args => [
- '-z', '-f', '--rm',
- "$tempdir/compression_lz4_dir/blobs.toc",
- "$tempdir/compression_lz4_dir/blobs.toc.lz4",
- ],
- },
# Verify that data files were compressed
glob_patterns => [
"$tempdir/compression_lz4_dir/toc.dat",
@@ -242,14 +232,13 @@ my %pgdump_runs = (
'--format=directory', '--compress=zstd:1',
"--file=$tempdir/compression_zstd_dir", 'postgres',
],
- # Give coverage for manually compressed blob.toc files during
+ # Give coverage for manually compressed blobs.toc files during
# restore.
compress_cmd => {
program => $ENV{'ZSTD'},
args => [
'-z', '-f',
- '--rm', "$tempdir/compression_zstd_dir/blobs.toc",
- "-o", "$tempdir/compression_zstd_dir/blobs.toc.zst",
+ '--rm', "$tempdir/compression_zstd_dir/blobs_*.toc",
],
},
# Verify that data files were compressed
@@ -413,7 +402,7 @@ my %pgdump_runs = (
},
glob_patterns => [
"$tempdir/defaults_dir_format/toc.dat",
- "$tempdir/defaults_dir_format/blobs.toc",
+ "$tempdir/defaults_dir_format/blobs_*.toc",
$supports_gzip ? "$tempdir/defaults_dir_format/*.dat.gz"
: "$tempdir/defaults_dir_format/*.dat",
],
@@ -4821,8 +4810,13 @@ foreach my $run (sort keys %pgdump_runs)
# not defined.
next if (!defined($compress_program) || $compress_program eq '');
- my @full_compress_cmd =
- ($compress_cmd->{program}, @{ $compress_cmd->{args} });
+ # Arguments may require globbing.
+ my @full_compress_cmd = ($compress_program);
+ foreach my $arg (@{ $compress_cmd->{args} })
+ {
+ push @full_compress_cmd, glob($arg);
+ }
+
command_ok(\@full_compress_cmd, "$run: compression commands");
}
--
2.39.3
v9-0003-Move-BLOBS-METADATA-TOC-entries-into-SECTION_DATA.patch
From 17ace22d028b24a89561e76f94f9defd92da9e8d Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed, 20 Dec 2023 16:56:54 -0500
Subject: [PATCH v9 3/4] Move BLOBS METADATA TOC entries into SECTION_DATA.
Commit c0d5be5d6 put the new BLOB metadata TOC entries into
SECTION_PRE_DATA, which perhaps is defensible in some ways,
but it's a rather odd choice considering that we go out of our
way to treat blobs as data. Moreover, because parallel restore
handles the PRE_DATA section serially, this means we're only
getting part of the parallelism speedup we could hope for.
Moving these entries into SECTION_DATA means that we can
parallelize the lo_create calls not only the data loading
when there are many blobs. The dependencies established by
the previous patch ensure that we won't try to load data for
a blob we've not yet created.
---
src/bin/pg_dump/pg_dump.c | 4 ++--
src/bin/pg_dump/t/002_pg_dump.pl | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index ecb1156f5e..4b34638cb1 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3710,7 +3710,7 @@ dumpLO(Archive *fout, const LoInfo *loinfo)
ARCHIVE_OPTS(.tag = loinfo->dobj.name,
.owner = loinfo->rolname,
.description = "BLOB METADATA",
- .section = SECTION_PRE_DATA,
+ .section = SECTION_DATA,
.createStmt = cquery->data,
.dropStmt = "-- dummy"));
@@ -18534,12 +18534,12 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
case DO_FDW:
case DO_FOREIGN_SERVER:
case DO_TRANSFORM:
- case DO_LARGE_OBJECT:
/* Pre-data objects: must come before the pre-data boundary */
addObjectDependency(preDataBound, dobj->dumpId);
break;
case DO_TABLE_DATA:
case DO_SEQUENCE_SET:
+ case DO_LARGE_OBJECT:
case DO_LARGE_OBJECT_DATA:
/* Data objects: must come between the boundaries */
addObjectDependency(dobj, preDataBound->dumpId);
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 76548561c8..f0ea6e3dd8 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -912,7 +912,7 @@ my %tests = (
column_inserts => 1,
data_only => 1,
inserts => 1,
- section_pre_data => 1,
+ section_data => 1,
test_schema_plus_large_objects => 1,
},
unlike => {
@@ -1289,7 +1289,7 @@ my %tests = (
column_inserts => 1,
data_only => 1,
inserts => 1,
- section_pre_data => 1,
+ section_data => 1,
test_schema_plus_large_objects => 1,
},
unlike => {
@@ -1497,7 +1497,7 @@ my %tests = (
column_inserts => 1,
data_only => 1,
inserts => 1,
- section_pre_data => 1,
+ section_data => 1,
test_schema_plus_large_objects => 1,
},
unlike => {
@@ -4241,7 +4241,7 @@ my %tests = (
column_inserts => 1,
data_only => 1,
inserts => 1,
- section_pre_data => 1,
+ section_data => 1,
test_schema_plus_large_objects => 1,
binary_upgrade => 1,
},
--
2.39.3
v9-0004-Invent-transaction-size-option-for-pg_restore.patch
From 3ab3558a236e6ad17fe48087aac3cabb4b02aa3e Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed, 20 Dec 2023 17:42:39 -0500
Subject: [PATCH v9 4/4] Invent --transaction-size option for pg_restore.
This patch allows pg_restore to wrap its commands into transaction
blocks, somewhat like --single-transaction, except that we commit
and start a new block after every N objects. Using this mode
with a size limit of 1000 or so objects greatly reduces the number
of transactions consumed by the restore, while preventing any
one transaction from taking enough locks to overrun the receiving
server's shared lock table.
(A value of 1000 works well with the default lock table size of
around 6400 locks. Higher --transaction-size values can be used
if one has increased the receiving server's lock table size.)
In this patch I have just hard-wired pg_upgrade to use
--transaction-size 1000. Perhaps there would be value in adding
another pg_upgrade option to allow user control of that, but I'm
unsure that it's worth the trouble; I think few users would use it,
and any who did would not see that much benefit. However, we
might need to adjust the logic to make the size be 1000 divided
by the number of parallel restore jobs allowed.
---
doc/src/sgml/ref/pg_restore.sgml | 24 +++++
src/bin/pg_dump/pg_backup.h | 4 +-
src/bin/pg_dump/pg_backup_archiver.c | 139 +++++++++++++++++++++++++--
src/bin/pg_dump/pg_backup_archiver.h | 3 +
src/bin/pg_dump/pg_backup_db.c | 18 ++++
src/bin/pg_dump/pg_restore.c | 15 ++-
src/bin/pg_upgrade/pg_upgrade.c | 2 +
7 files changed, 197 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 1a23874da6..2e3ba80258 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -786,6 +786,30 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--transaction-size=<replaceable class="parameter">N</replaceable></option></term>
+ <listitem>
+ <para>
+ Execute the restore as a series of transactions, each processing
+ up to <replaceable class="parameter">N</replaceable> database
+ objects. This option implies <option>--exit-on-error</option>.
+ </para>
+ <para>
+ <option>--transaction-size</option> offers an intermediate choice
+ between the default behavior (one transaction per SQL command)
+ and <option>-1</option>/<option>--single-transaction</option>
+ (one transaction for all restored objects).
+ While <option>--single-transaction</option> has the least
+ overhead, it may be impractical for large databases because the
+ transaction will take a lock on each restored object, possibly
+ exhausting the server's lock table space.
+ Using <option>--transaction-size</option> with a size of a few
+ thousand objects offers nearly the same performance benefits while
+ capping the amount of lock table space needed.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--use-set-session-authorization</option></term>
<listitem>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 9ef2f2017e..fbf5f1c515 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -149,7 +149,9 @@ typedef struct _restoreOptions
* compression */
int suppressDumpWarnings; /* Suppress output of WARNING entries
* to stderr */
- bool single_txn;
+
+ bool single_txn; /* restore all TOCs in one transaction */
+ int txn_size; /* restore this many TOCs per txn, if > 0 */
bool *idWanted; /* array showing which dump IDs to emit */
int enable_row_security;
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 73b9972da4..ec74846998 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -502,7 +502,28 @@ RestoreArchive(Archive *AHX)
/* Otherwise, drop anything that's selected and has a dropStmt */
if (((te->reqs & (REQ_SCHEMA | REQ_DATA)) != 0) && te->dropStmt)
{
+ bool not_allowed_in_txn = false;
+
pg_log_info("dropping %s %s", te->desc, te->tag);
+
+ /*
+ * In --transaction-size mode, we have to temporarily exit our
+ * transaction block to drop objects that can't be dropped
+ * within a transaction.
+ */
+ if (ropt->txn_size > 0)
+ {
+ if (strcmp(te->desc, "DATABASE") == 0 ||
+ strcmp(te->desc, "DATABASE PROPERTIES") == 0)
+ {
+ not_allowed_in_txn = true;
+ if (AH->connection)
+ CommitTransaction(AHX);
+ else
+ ahprintf(AH, "COMMIT;\n");
+ }
+ }
+
/* Select owner and schema as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
@@ -628,6 +649,33 @@ RestoreArchive(Archive *AHX)
}
}
}
+
+ /*
+ * In --transaction-size mode, re-establish the transaction
+ * block if needed; otherwise, commit after every N drops.
+ */
+ if (ropt->txn_size > 0)
+ {
+ if (not_allowed_in_txn)
+ {
+ if (AH->connection)
+ StartTransaction(AHX);
+ else
+ ahprintf(AH, "BEGIN;\n");
+ AH->txnCount = 0;
+ }
+ else if (++AH->txnCount >= ropt->txn_size)
+ {
+ if (AH->connection)
+ {
+ CommitTransaction(AHX);
+ StartTransaction(AHX);
+ }
+ else
+ ahprintf(AH, "COMMIT;\nBEGIN;\n");
+ AH->txnCount = 0;
+ }
+ }
}
}
@@ -724,7 +772,11 @@ RestoreArchive(Archive *AHX)
}
}
- if (ropt->single_txn)
+ /*
+ * Close out any persistent transaction we may have. While these two
+ * cases are started in different places, we can end both cases here.
+ */
+ if (ropt->single_txn || ropt->txn_size > 0)
{
if (AH->connection)
CommitTransaction(AHX);
@@ -785,6 +837,25 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
*/
if ((reqs & REQ_SCHEMA) != 0)
{
+ bool object_is_db = false;
+
+ /*
+ * In --transaction-size mode, must exit our transaction block to
+ * create a database or set its properties.
+ */
+ if (strcmp(te->desc, "DATABASE") == 0 ||
+ strcmp(te->desc, "DATABASE PROPERTIES") == 0)
+ {
+ object_is_db = true;
+ if (ropt->txn_size > 0)
+ {
+ if (AH->connection)
+ CommitTransaction(&AH->public);
+ else
+ ahprintf(AH, "COMMIT;\n\n");
+ }
+ }
+
/* Show namespace in log message if available */
if (te->namespace)
pg_log_info("creating %s \"%s.%s\"",
@@ -835,10 +906,10 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
/*
* If we created a DB, connect to it. Also, if we changed DB
* properties, reconnect to ensure that relevant GUC settings are
- * applied to our session.
+ * applied to our session. (That also restarts the transaction block
+ * in --transaction-size mode.)
*/
- if (strcmp(te->desc, "DATABASE") == 0 ||
- strcmp(te->desc, "DATABASE PROPERTIES") == 0)
+ if (object_is_db)
{
pg_log_info("connecting to new database \"%s\"", te->tag);
_reconnectToDB(AH, te->tag);
@@ -964,6 +1035,25 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
}
}
+ /*
+ * If we emitted anything for this TOC entry, that counts as one action
+ * against the transaction-size limit. Commit if it's time to.
+ */
+ if ((reqs & (REQ_SCHEMA | REQ_DATA)) != 0 && ropt->txn_size > 0)
+ {
+ if (++AH->txnCount >= ropt->txn_size)
+ {
+ if (AH->connection)
+ {
+ CommitTransaction(&AH->public);
+ StartTransaction(&AH->public);
+ }
+ else
+ ahprintf(AH, "COMMIT;\nBEGIN;\n\n");
+ AH->txnCount = 0;
+ }
+ }
+
if (AH->public.n_errors > 0 && status == WORKER_OK)
status = WORKER_IGNORED_ERRORS;
@@ -1310,7 +1400,12 @@ StartRestoreLOs(ArchiveHandle *AH)
{
RestoreOptions *ropt = AH->public.ropt;
- if (!ropt->single_txn)
+ /*
+ * LOs must be restored within a transaction block, since we need the LO
+ * handle to stay open while we write it. Establish a transaction unless
+ * there's one being used globally.
+ */
+ if (!(ropt->single_txn || ropt->txn_size > 0))
{
if (AH->connection)
StartTransaction(&AH->public);
@@ -1329,7 +1424,7 @@ EndRestoreLOs(ArchiveHandle *AH)
{
RestoreOptions *ropt = AH->public.ropt;
- if (!ropt->single_txn)
+ if (!(ropt->single_txn || ropt->txn_size > 0))
{
if (AH->connection)
CommitTransaction(&AH->public);
@@ -3170,6 +3265,19 @@ _doSetFixedOutputState(ArchiveHandle *AH)
else
ahprintf(AH, "SET row_security = off;\n");
+ /*
+ * In --transaction-size mode, we should always be in a transaction when
+ * we begin to restore objects.
+ */
+ if (ropt && ropt->txn_size > 0)
+ {
+ if (AH->connection)
+ StartTransaction(&AH->public);
+ else
+ ahprintf(AH, "\nBEGIN;\n");
+ AH->txnCount = 0;
+ }
+
ahprintf(AH, "\n");
}
@@ -4033,6 +4141,14 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
}
}
+ /*
+ * In --transaction-size mode, we must commit the open transaction before
+ * dropping the database connection. This also ensures that child workers
+ * can see the objects we've created so far.
+ */
+ if (AH->public.ropt->txn_size > 0)
+ CommitTransaction(&AH->public);
+
/*
* Now close parent connection in prep for parallel steps. We do this
* mainly to ensure that we don't exceed the specified number of parallel
@@ -4772,6 +4888,10 @@ CloneArchive(ArchiveHandle *AH)
clone = (ArchiveHandle *) pg_malloc(sizeof(ArchiveHandle));
memcpy(clone, AH, sizeof(ArchiveHandle));
+ /* Likewise flat-copy the RestoreOptions, so we can alter them locally */
+ clone->public.ropt = (RestoreOptions *) pg_malloc(sizeof(RestoreOptions));
+ memcpy(clone->public.ropt, AH->public.ropt, sizeof(RestoreOptions));
+
/* Handle format-independent fields */
memset(&(clone->sqlparse), 0, sizeof(clone->sqlparse));
@@ -4793,6 +4913,13 @@ CloneArchive(ArchiveHandle *AH)
/* clones should not share lo_buf */
clone->lo_buf = NULL;
+ /*
+ * Clone connections disregard --transaction-size; they must commit after
+ * each command so that the results are immediately visible to other
+ * workers.
+ */
+ clone->public.ropt->txn_size = 0;
+
/*
* Connect our new clone object to the database, using the same connection
* parameters used for the original connection.
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index e4dd395582..1b9f142dea 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -324,6 +324,9 @@ struct _archiveHandle
char *currTablespace; /* current tablespace, or NULL */
char *currTableAm; /* current table access method, or NULL */
+ /* in --transaction-size mode, this counts objects emitted in cur xact */
+ int txnCount;
+
void *lo_buf;
size_t lo_buf_used;
size_t lo_buf_size;
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index c14d813b21..6b3bf174f2 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -554,6 +554,7 @@ IssueCommandPerBlob(ArchiveHandle *AH, TocEntry *te,
{
/* Make a writable copy of the command string */
char *buf = pg_strdup(te->defn);
+ RestoreOptions *ropt = AH->public.ropt;
char *st;
char *en;
@@ -562,6 +563,23 @@ IssueCommandPerBlob(ArchiveHandle *AH, TocEntry *te,
{
*en++ = '\0';
ahprintf(AH, "%s%s%s;\n", cmdBegin, st, cmdEnd);
+
+ /* In --transaction-size mode, count each command as an action */
+ if (ropt && ropt->txn_size > 0)
+ {
+ if (++AH->txnCount >= ropt->txn_size)
+ {
+ if (AH->connection)
+ {
+ CommitTransaction(&AH->public);
+ StartTransaction(&AH->public);
+ }
+ else
+ ahprintf(AH, "COMMIT;\nBEGIN;\n\n");
+ AH->txnCount = 0;
+ }
+ }
+
st = en;
}
ahprintf(AH, "\n");
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index c3beacdec1..5ea78cf7cc 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -120,6 +120,7 @@ main(int argc, char **argv)
{"role", required_argument, NULL, 2},
{"section", required_argument, NULL, 3},
{"strict-names", no_argument, &strict_names, 1},
+ {"transaction-size", required_argument, NULL, 5},
{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
{"no-comments", no_argument, &no_comments, 1},
{"no-publications", no_argument, &no_publications, 1},
@@ -289,10 +290,18 @@ main(int argc, char **argv)
set_dump_section(optarg, &(opts->dumpSections));
break;
- case 4:
+ case 4: /* filter */
read_restore_filters(optarg, opts);
break;
+ case 5: /* transaction-size */
+ if (!option_parse_int(optarg, "--transaction-size",
+ 1, INT_MAX,
+ &opts->txn_size))
+ exit(1);
+ opts->exit_on_error = true;
+ break;
+
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -337,6 +346,9 @@ main(int argc, char **argv)
if (opts->dataOnly && opts->dropSchema)
pg_fatal("options -c/--clean and -a/--data-only cannot be used together");
+ if (opts->single_txn && opts->txn_size > 0)
+ pg_fatal("options -1/--single-transaction and --transaction-size cannot be used together");
+
/*
* -C is not compatible with -1, because we can't create a database inside
* a transaction block.
@@ -484,6 +496,7 @@ usage(const char *progname)
printf(_(" --section=SECTION restore named section (pre-data, data, or post-data)\n"));
printf(_(" --strict-names require table and/or schema include patterns to\n"
" match at least one entity each\n"));
+ printf(_(" --transaction-size=N commit after every N objects\n"));
printf(_(" --use-set-session-authorization\n"
" use SET SESSION AUTHORIZATION commands instead of\n"
" ALTER OWNER commands to set ownership\n"));
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 3960af4036..5cfd2282e1 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -548,6 +548,7 @@ create_new_objects(void)
true,
true,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "--transaction-size=1000 "
"--dbname postgres \"%s/%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
@@ -586,6 +587,7 @@ create_new_objects(void)
parallel_exec_prog(log_file_name,
NULL,
"\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+ "--transaction-size=1000 "
"--dbname template1 \"%s/%s\"",
new_cluster.bindir,
cluster_conn_opts(&new_cluster),
--
2.39.3
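To make the batching described in the 0004 commit message above concrete, here is a
minimal standalone sketch of the commit-every-N-objects idiom (assuming libpq; the
per-object command is just a placeholder, and none of this code is part of the patch
itself, which drives the same logic from ropt->txn_size and AH->txnCount):

    /*
     * Standalone sketch of the commit-every-N-objects idiom behind
     * --transaction-size.  Hypothetical example only; the SQL issued per
     * object is a placeholder.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include "libpq-fe.h"

    static void
    exec_or_die(PGconn *conn, const char *sql)
    {
        PGresult   *res = PQexec(conn, sql);

        if (PQresultStatus(res) != PGRES_COMMAND_OK &&
            PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
            exit(1);
        }
        PQclear(res);
    }

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("");   /* parameters from environment */
        const int   txn_size = 1000;          /* cf. --transaction-size=1000 */
        int         txn_count = 0;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        exec_or_die(conn, "BEGIN");

        for (int i = 0; i < 100000; i++)
        {
            /* stand-in for the command(s) restoring one archive object */
            exec_or_die(conn, "SELECT 1");

            /* after every txn_size objects, close the block and open a new one */
            if (++txn_count >= txn_size)
            {
                exec_or_die(conn, "COMMIT");
                exec_or_die(conn, "BEGIN");
                txn_count = 0;
            }
        }

        exec_or_die(conn, "COMMIT");
        PQfinish(conn);
        return 0;
    }

With stock settings the server's lock table holds roughly max_locks_per_transaction (64)
times max_connections (100), i.e. about 6400 locks, which is why a batch of 1000 objects
leaves ample headroom; pg_upgrade simply passes --transaction-size=1000 as shown in the
pg_upgrade.c hunk above.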
On Wed, Dec 20, 2023 at 06:47:44PM -0500, Tom Lane wrote:
> I have spent some more effort in this area and developed a patch
> series that I think addresses all of the performance issues that
> we've discussed in this thread, both for pg_upgrade and more
> general use of pg_dump/pg_restore. Concretely, it absorbs
> the pg_restore --transaction-size switch that I proposed before
> to cut the number of transactions needed during restore, and
> rearranges the representation of BLOB-related TOC entries to
> reduce the client-side memory requirements, and fixes some
> ancient mistakes that prevent both selective restore of BLOBs
> and parallel restore of BLOBs.
>
> As a demonstration, I made a database containing 100K empty blobs,
> and measured the time needed to dump/restore that using -Fd
> and -j 10. HEAD doesn't get any useful parallelism on blobs,
> but with this patch series we do:
>
>               dump    restore
> HEAD:         14sec   15sec
> after 0002:    7sec   10sec
> after 0003:    7sec    3sec
Wow, thanks for putting together these patches. I intend to help review,
but I'm not sure I'll find much time to do so before the new year.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
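For anyone who wants to repeat the demonstration quoted above, one way to populate the
100K empty blobs is sketched below (an assumption on my part, not necessarily how the
original test database was built); the database can then be dumped with pg_dump -Fd and
restored with pg_restore -j 10 as described:

    /*
     * Hypothetical setup helper: create N empty large objects via the
     * client-side lo_creat() API.  Connection parameters come from the
     * usual libpq environment variables.
     */
    #include <stdio.h>
    #include "libpq-fe.h"
    #include "libpq/libpq-fs.h"       /* INV_READ / INV_WRITE */

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("");
        PGresult   *res;
        const int   nblobs = 100000;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* one transaction for the whole setup, so it consumes a single XID */
        res = PQexec(conn, "BEGIN");
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "BEGIN failed: %s", PQerrorMessage(conn));
            return 1;
        }
        PQclear(res);

        for (int i = 0; i < nblobs; i++)
        {
            /* lo_creat() makes an empty large object with a new OID */
            if (lo_creat(conn, INV_READ | INV_WRITE) == InvalidOid)
            {
                fprintf(stderr, "lo_creat failed: %s", PQerrorMessage(conn));
                return 1;
            }
        }

        res = PQexec(conn, "COMMIT");
        PQclear(res);
        PQfinish(conn);
        return 0;
    }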
Nathan Bossart <nathandbossart@gmail.com> writes:
> Wow, thanks for putting together these patches. I intend to help review,

Thanks!

> but I'm not sure I'll find much time to do so before the new year.

There's no urgency, surely. If we can get these in during the
January CF, I'll be happy.
regards, tom lane
On Thu, 21 Dec 2023 at 10:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I have spent some more effort in this area and developed a patch
> series that I think addresses all of the performance issues that
> we've discussed in this thread, both for pg_upgrade and more
> general use of pg_dump/pg_restore.
Thanks for picking this up!
Applying all 4 patches, I also see good performance improvement.
With more Large Objects, pg_dump improved significantly, and
pg_restore is now comfortably an order of magnitude faster.
pg_dump times (seconds):
NumLOs dump-patch0004 dump-HEAD improvement (%)
1 0.09 0.09 ~
10 0.10 0.12 ~
100 0.12 0.12 ~
1,000 0.41 0.44 ~
10,000 3 5 76%
100,000 35 47 36%
1,000,000 111 251 126%
pg_restore times (seconds):
NumLOs restore-patch0004 restore-HEAD improvement (%)
1 0.02 0.02 ~
10 0.03 0.03 ~
100 0.13 0.12 ~
1,000 0.98 0.97 ~
10,000 2 9 ~5x
100,000 6 93 13x
1,000,000 53 973 17x
Test details:
- pg_dump -Fd -j32 / pg_restore -j32
- 32vCPU / Ubuntu 20.04 / 260GB Memory / r6id.8xlarge
- Client & Server on same machine
- Empty LOs / Empty ACLs
- HEAD = 7d7ef075d2b3f3bac4db323c2a47fb15a4a9a817
- See attached graphs
IMHO the knob (for configuring batch size) is a non-blocker. The
default (1k) here is already way better than what we have today.
I look forward to feedback on the tests; meanwhile I'll continue testing
whether ACLs / non-empty LOs etc. adversely affect these numbers.
-
Robins Tharakan
Amazon Web Services
Attachments:
v9_restore.png
#
.A!gDjD�,(��( �(x�(��(0�(��(���P����� U��U��P��B|k*�W����hiswwK����+���5���t^��������\\��]���Q��QN�QQ��Q~Q.Q���#��#��A�{D�~F�!�g�������V�]lGv?��b��iR�m����w�����z�[�hr�q*�7���j��fGi�'�e���3����/B���+�����Dt��n�����]:�Z:[:\S{��}�����~�����20� �?�S?�<