pg_dump does way too much before acquiring table locks
While poking at pg_dump for some work I'll show later, I grew quite
unhappy with the extent to which people have ignored this advice
given near the head of getTables():
* Note: in this phase we should collect only a minimal amount of
* information about each table, basically just enough to decide if it is
* interesting. We must fetch all tables in this phase because otherwise
* we cannot correctly identify inherited columns, owned sequences, etc.
Far from collecting "a minimal amount of information", we have
repeatedly stuffed extra joins and expensive sub-selects into this
query. That's fairly inefficient if we aren't interested in dumping
every table, but it's much worse than that: we are collecting all this
info *before we have lock* on the tables. That means we are at
serious risk of data skew in any place where we consult server-side
functions, eg pg_get_partkeydef(). For example:
regression=# create table foo(f1 int) partition by range(f1);
CREATE TABLE
regression=# begin; drop table foo;
BEGIN
DROP TABLE
Now start a pg_dump in another shell session, wait a second or two,
and
regression=*# commit;
COMMIT
and the pg_dump blows up:
$ pg_dump -s regression
pg_dump: error: query failed: ERROR: could not open relation with OID 37796
(Note: it's not so easy to provoke this failure manually before
e2ff7d9a8 guaranteed that pg_dump would wait around for you,
but it's certainly possible.)
To add insult to injury, all that work being done in this query
makes the time window for trouble wider.
Testing on the regression database, which isn't all that big,
I observe getTables' query taking 140 ms, substantially longer
than the next slowest thing done by "pg_dump -s regression".
It seems that the largest chunk of this can be blamed on the
sub-selects added by the pg_init_privs patch: EXPLAIN ANALYZE
puts their total runtime at ~100ms.
I believe that the pg_init_privs sub-selects can probably be
nuked, or at least their cost can be paid later. The other work
I mentioned has proven that we do not actually need to know
the ACL situation to determine whether a table is "interesting",
so we don't have to do that work before acquiring lock.
However, that still leaves us doing several inessential joins,
not to mention those unsafe partition-examination functions,
before acquiring lock.
I am thinking that the best fix is to make getTables() perform
two queries (or, possibly, split it into two functions). The
first pass would acquire only the absolutely minimal amount of
data needed to decide whether a table is "interesting", and then
lock such tables. Then we'd go back to fill in the remaining
data. While it's fairly annoying to read pg_class twice, it
looks like the extra query might only take a handful of msec.
Also, if we don't do it like that, it seems that we'd have to
add entirely new queries to call pg_get_partkeydef() and
pg_get_expr(relpartbound) in. So I'm not really seeing a
better-performing alternative.
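To make that concrete, here's a rough sketch of the two-pass shape
I have in mind (column lists abbreviated and purely illustrative,
not actual patch text; the table name is a placeholder):

-- pass 1: just enough to decide whether each table is "interesting"
SELECT c.tableoid, c.oid, c.relname, c.relnamespace, c.relkind
FROM pg_catalog.pg_class c
WHERE c.relkind IN ('r', 'S', 'v', 'm', 'f', 'p');

-- then, for each interesting table:
LOCK TABLE some_schema.some_table IN ACCESS SHARE MODE;

-- pass 2: now that we hold lock, the risky functions are safe
SELECT c.oid,
       pg_catalog.pg_get_partkeydef(c.oid) AS partkeydef,
       pg_catalog.pg_get_expr(c.relpartbound, c.oid) AS partbound
FROM pg_catalog.pg_class c
WHERE c.oid = '37796'::pg_catalog.oid;

The point being that pass 2 touches only relations we already locked,
so concurrent DDL can no longer blow up those server-side functions.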
Thoughts?
regards, tom lane
On Wed, Oct 20, 2021 at 05:14:45PM -0400, Tom Lane wrote:
Lastly, patch 0003 addresses the concern I raised at [3] that it's
unsafe to call pg_get_partkeydef() and pg_get_expr(relpartbound)
in getTables(). Looking closer I realized that we can't cast
pg_class.reloftype to regtype at that point either, since regtypeout
is going to notice if the type has been concurrently dropped.

In [3] I'd imagined that we could just delay those calls to a second
query in getTables(), but that doesn't work at all: if we apply
these functions to every row of pg_class, we still risk failure
against any relation that we didn't lock. So there seems little
alternative but to push these functions out to secondary queries
executed later.

Arguably, 0003 is a bug fix that we should consider back-patching.
However, I've not heard field reports of the problems it fixes,
so maybe there's no need to bother.
FYI, I see this issue happen in a production environment.
Grepping logfiles on ~40 servers, I see it has happened at least 3 times since
October 1.
Our backup script is probably particularly sensitive to this: it separately
dumps each "old" partition once and for all. We're more likely to hit this
since pg_dump is called in a loop.
I never reported it, since I think it's a documented issue, and it's no great
problem, so long as it runs the next day. But it'd be a pain to find that the
backup was incomplete when we needed it. Also, I use the backups to migrate to
new servers, and it would be a pain to start the job at a calculated time
expecting it to finish at the beginning of a coordinated maintenance window,
but then discover that it had failed and needs to be rerun with fingers
crossed.
On Sun, Oct 17, 2021 at 02:45:13PM -0400, Tom Lane wrote:
...
Justin Pryzby <pryzby@telsasoft.com> writes:
On Wed, Oct 20, 2021 at 05:14:45PM -0400, Tom Lane wrote:
Arguably, 0003 is a bug fix that we should consider back-patching.
However, I've not heard field reports of the problems it fixes,
so maybe there's no need to bother.
FYI, I see this issue happen in a production environment.
I never reported it, since I think it's a documented issue, and it's no great
problem, so long as it runs the next day. But it'd be a pain to find that the
backup was incomplete when we needed it. Also, I use the backups to migrate to
new servers, and it would be a pain to start the job at a calculated time
expecting it to finish at the beginning of a coordinated maintenance window,
but then discover that it had failed and needs to be rerun with fingers
crossed.
Yeah, if you're dropping tables all the time, pg_dump is going to have
a problem with that. The fix I'm suggesting here would only ensure
that you get a "clean" failure at the LOCK TABLE command --- but from
an operational standpoint, that's little improvement.
The natural response is to consider retrying the whole dump after a lock
failure. I'm not sure if it'd be practical to do that within pg_dump
itself, as opposed to putting a loop in whatever script calls it.
regards, tom lane
Hello Tom!
I noticed you are improving pg_dump just now.
Some time ago I experimented with a customer database dump in parallel directory mode (-F directory -j 2..4).
I noticed it took quite a long time to complete.
Further investigation showed that in this mode with multiple jobs the tables are processed in decreasing size order, which makes sense to avoid a long tail from a big table in one of the jobs prolonging the overall dump time.
One table took very long, but seemed to be of moderate size.
It turned out that the size determination fails to consider the size of TOAST tables, and this table had a big associated TOAST table from its bytea column(s).
Even with an ANALYZE at load time there was no size information for the TOAST table in the catalogs.
I thought of the following alternatives to ameliorate this:
1. Use the pg_table_size() function in the catalog query
Pro: this reflects the correct size of every relation
Con: this goes out to disk and may have a huge impact on databases with very many tables
2. Teach VACUUM to set the TOAST-table size the way it sets it on normal tables
3. Provide a command/function for occasionally setting the (approximate) size of TOAST tables
I think that with further work under way (not yet ready), pg_dump can really benefit from parallel/non-compressing mode, especially considering the huge amount of bytea/blob/string data in many big customer scenarios.
Thoughts?
Hans Buschmann
Hans Buschmann <buschmann@nidsa.net> writes:
Some time ago I experimented with a customer database dump in parallel directory mode (-F directory -j 2..4).
I noticed it took quite a long time to complete.
Further investigation showed that in this mode with multiple jobs the tables are processed in decreasing size order, which makes sense to avoid a long tail from a big table in one of the jobs prolonging the overall dump time.
One table took very long, but seemed to be of moderate size.
It turned out that the size determination fails to consider the size of TOAST tables, and this table had a big associated TOAST table from its bytea column(s).
Hmm, yeah, we just use pg_class.relpages for scheduling parallel dumps.
I'd supposed that would be fine, but maybe it's worth being smarter.
I think it should be sufficient to add on the toast table's relpages
value; that's maintained by autovacuum on the same terms as relpages
for regular tables. See 0005 below.
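The gist of the query change, sketched (not the exact patch text):

SELECT c.oid,
       c.relpages + COALESCE(toast.relpages, 0) AS relpages
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_class toast ON toast.oid = c.reltoastrelid;

i.e., just fold the TOAST table's relpages into the number used for
scheduling; autovacuum keeps both values current on the same terms.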
Here's an update of this patch series:
0001 is the same as before, except I changed collectComments and
collectSecLabels to strdup the strings they want and then release
their PGresults. The previous behavior confused valgrind's leak
tracking, which is only a minor annoyance, but I think we can
justify changing it now that these functions don't save all of
the collected comments or seclabels. In particular, we've got
no need for the several thousand comments on built-in objects,
so that that PGresult is at least 100KB bigger than what we're
going to keep.
0002 is updated to account for commit 2acc84c6f.
0003 is the same except I added a missing free().
0004 is a new patch based on an idea from Andres Freund [1]:
in the functions that repetitively issue the same query against
different tables, issue just one query and use a WHERE clause
to restrict the output to the tables we care about. I was
skeptical about this to start with, but it turns out to be
quite a spectacular win. On my machine, the time to pg_dump
the regression database (with "-s") drops from 0.91 seconds
to 0.39 seconds. For a database with 10000 toy tables, the
time drops from 18.1 seconds to 2.3 seconds.
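For concreteness, the transformation looks roughly like this
(hand-written illustration with made-up OIDs, not the patch itself).
Instead of one query per table,

SELECT a.attname, a.atttypid
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = '37796'::pg_catalog.oid
  AND a.attnum > 0 AND NOT a.attisdropped;

we issue a single query covering all the interesting tables:

SELECT a.attrelid, a.attname, a.atttypid
FROM unnest('{37796,37801,37805}'::pg_catalog.oid[]) AS src(tbloid)
JOIN pg_catalog.pg_attribute a ON a.attrelid = src.tbloid
WHERE a.attnum > 0 AND NOT a.attisdropped;

and then split the result set up client-side by attrelid.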
0004 is not committable as-is, because it assumes that the source
server has single-array unnest(), which is not true before 8.4.
We could fix that by using "oid = ANY(array-constant)" conditions
instead, but I'm unsure about the performance properties of that
for large OID arrays on those old server versions. There's a
discussion at [2] about whether it'd be okay to drop pg_dump's
support for pre-8.4 servers; if we do so, then it would become
unnecessary to do anything more here.
0005 implements your suggestion of accounting for TOAST data while
scheduling parallel dumps. I realized while looking at that that
there's a pre-existing bug, which this'd exacerbate: on machines
with 32-bit off_t, dataLength can overflow. Admittedly such machines
are just about extinct in the wild, but we do still trouble to support
the case. So 0005 also includes code to check for overflow and clamp
the result to INT_MAX blocks.
Maybe we should back-patch 0005. OTOH, how likely is it that anyone
is wrangling tables exceeding 16TB on a machine with 32-bit off_t?
Or that poor parallel dump scheduling would be a real problem in
such a case?
Lastly, 0006 implements the other idea we'd discussed in the other
thread: for queries that are issued repetitively but not within a
single pg_dump function invocation, use PREPARE/EXECUTE to cut down
the overhead. This gets only diminishing returns after 0004, but
it still brings "pg_dump -s regression" down from 0.39s to 0.33s,
so maybe it's worth doing. I stopped after caching the plans for
functions/aggregates/operators/types, though. The remaining sorts
of objects aren't likely to appear in typical databases enough times
to be worth worrying over. (This patch will be a net loss if there
are more than zero but less than perhaps 10 instances of an object type,
so there's definitely reasons beyond laziness for not doing more.)
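The pattern in 0006 is the standard PREPARE/EXECUTE one; schematically
(the real query text is of course much longer, and the column list
here is abbreviated):

PREPARE dumpFunc(pg_catalog.oid) AS
SELECT p.proname, p.pronargs,
       pg_catalog.pg_get_function_arguments(p.oid) AS funcargs
FROM pg_catalog.pg_proc p
WHERE p.oid = $1;

-- then, for each function to be dumped:
EXECUTE dumpFunc('37796');

so the query is planned once per session rather than once per object.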
regards, tom lane
[1]: /messages/by-id/20211022055939.z6fihsm7hdzbjttf@alap3.anarazel.de
[2]: /messages/by-id/2923349.1634942313@sss.pgh.pa.us
Attachments:
0001-fix-component-masking-2.patch (text/x-diff, +351/-280)
0002-rethink-ACL-handling-2.patch (text/x-diff, +784/-1137)
0003-move-unsafe-function-calls-2.patch (text/x-diff, +64/-33)
0004-bulk-queries-2.patch (text/x-diff, +1008/-760)
0005-fix-toast-sched-2.patch (text/x-diff, +27/-9)
0006-prepare-repeated-queries-2.patch (text/x-diff, +527/-365)
On Sun, Oct 24, 2021 at 05:10:55PM -0400, Tom Lane wrote:
0003 is the same except I added a missing free().
0004 is a new patch based on an idea from Andres Freund [1]:
in the functions that repetitively issue the same query against
different tables, issue just one query and use a WHERE clause
to restrict the output to the tables we care about. I was
skeptical about this to start with, but it turns out to be
quite a spectacular win. On my machine, the time to pg_dump
the regression database (with "-s") drops from 0.91 seconds
to 0.39 seconds. For a database with 10000 toy tables, the
time drops from 18.1 seconds to 2.3 seconds.
+ if (tbloids->len > 1)
+ appendPQExpBufferChar(tbloids, ',');
+ appendPQExpBuffer(tbloids, "%u", tbinfo->dobj.catId.oid);
I think this should say
+ if (tbloids->len > 0)
That doesn't matter much since catalogs aren't dumped as such, and we tend to
count in base 10 and not base 10000.
BTW, the ACL patch makes the overhead 6x lower (6.9sec vs 1.2sec) for pg_dump -t
of a single, small table. Thanks for that.
--
Justin
Justin Pryzby <pryzby@telsasoft.com> writes:
On Sun, Oct 24, 2021 at 05:10:55PM -0400, Tom Lane wrote:
+ if (tbloids->len > 1)
I think this should say
+ if (tbloids->len > 0)
No, >1 is the correct test, because it's checking the string length
and we started by stuffing a '{' into the string. Maybe needs a
comment.
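(To spell it out: the buffer is being built up as a PG array literal,
so after the initial '{' its length is already 1, and we want a comma
before every OID except the first. The finished string looks like
(made-up OIDs, obviously)

'{37796,37801,37805}'

and gets used as an oid[] constant in the batched queries, e.g.

WHERE c.oid = ANY('{37796,37801,37805}'::pg_catalog.oid[])

so the ">1" test really is checking "have we appended any OID yet".)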
BTW, the ACL patch makes the overhead 6x lower (6.9sec vs 1.2sec) for pg_dump -t
of a single, small table. Thanks for that.
Yeah --- I haven't done any formal measurements of the case where you're
selecting a small number of tables, but I did note that it decreased a
good deal compared to HEAD.
regards, tom lane
1. From: Tom Lane <tgl@sss.pgh.pa.us>:
Maybe we should back-patch 0005. OTOH, how likely is it that anyone
is wrangling tables exceeding 16TB on a machine with 32-bit off_t?
Or that poor parallel dump scheduling would be a real problem in
such a case?
I tested your patch on Windows x64, pg15_devel 25.10.2021, against the customer database
(2-core/4-thread NUC, 32 GB RAM, 1 NVMe SSD, 4 jobs),
with pg_dump manually patched with your changes,
database on pg14.0, 20 GB shared buffers.
The dump of the database tables took 3 min 7 sec for a 16 GB database, resulting in a directory of 31.1 GB with 1628 files!
The dump went like a rush: full CPU usage, then finished.
I don't have the old performance data available, but it is a real improvement, so backpatching may really be worth the effort.
The formerly slow table has a ratio of 5169 relpages to 673024 toast pages.
Despite the large disk usage (about doubling the size of the db), directory mode seems to be by far the fastest mode, especially for databases in the 1TB+ range.
For archiving purposes an external-to-Postgres tool often fits better for compression, and it can be applied to the dumped data without holding up the dump process.
I am still working on another big speedup in this scenario (coming soon).
-----------------------------------------------------------
2. Another suggestion concerning pg_dump
With some customer databases I follow a yearly practice of pg_dump/pg_restore to the new major version.
This eliminates all bloat and does a full reindex, so the on-disk data layout is already quite clean.
It would be very favorable to dump the rows according to the CLUSTER index when one is defined for a table. This would only affect the SELECT used to retrieve the rows and would not disturb pg_dump's logic.
This would give perfectly reorganized tables in a pg_dump/pg_restore round trip.
If a cluster index is defined by the customer, this expresses the wish to have the table laid out this way, and nothing is introduced arbitrarily.
I would suggest a flag (--cluster) for pg_dump to activate this new behavior.
I think this is not immediately part of the current patch set, but it should be taken into account for pg_dump improvements in PG15.
At the moment I don't yet have enough knowledge to propose a patch of this kind (a logic similar to the CLUSTER command itself). Perhaps someone could jump in ...
Thanks for the patch and awaiting your thoughts
Hans Buschmann
Hi,
On 2021-10-24 17:10:55 -0400, Tom Lane wrote:
0004 is not committable as-is, because it assumes that the source
server has single-array unnest(), which is not true before 8.4.
We could fix that by using "oid = ANY(array-constant)" conditions
instead, but I'm unsure about the performance properties of that
for large OID arrays on those old server versions.
It doesn't seem bad at all. 8.3 assert:
CREATE TABLE foo(oid oid primary key);
INSERT INTO foo SELECT generate_series(1, 1000000);
postgres[1164129][1]=# explain ANALYZE SELECT count(*) FROM foo WHERE oid = ANY(ARRAY(SELECT generate_series(1100000, 1, -1)));
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate (cost=81.54..81.55 rows=1 width=0) (actual time=2433.656..2433.656 rows=1 loops=1) │
│ InitPlan │
│ -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.004..149.425 rows=1100000 loops=1) │
│ -> Bitmap Heap Scan on foo (cost=42.70..81.50 rows=10 width=0) (actual time=2275.137..2369.478 rows=1000000 loops=1) │
│ Recheck Cond: (oid = ANY (($0)::oid[])) │
│ -> Bitmap Index Scan on foo_pkey (cost=0.00..42.69 rows=10 width=0) (actual time=2274.077..2274.077 rows=1000000 loops=1) │
│ Index Cond: (oid = ANY (($0)::oid[])) │
│ Total runtime: 2436.201 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)
Time: 2437.568 ms (00:02.438)
Lastly, 0006 implements the other idea we'd discussed in the other
thread: for queries that are issued repetitively but not within a
single pg_dump function invocation, use PREPARE/EXECUTE to cut down
the overhead. This gets only diminishing returns after 0004, but
it still brings "pg_dump -s regression" down from 0.39s to 0.33s,
so maybe it's worth doing.
I think it's worth doing. There's things that the batch approach won't help
with and even if it doesn't help a lot with the regression test database, I'd
expect it to help plenty with other cases.
A test database I had around with lots of functions got drastically faster to
dump (7.4s to 2.5s), even though the number of queries didn't change
significantly. According to pg_stat_statements plan_time for the dumpFunc
query went from 2352ms to 0.4ms - interestingly execution time nearly halves
as well.
I stopped after caching the plans for
functions/aggregates/operators/types, though. The remaining sorts
of objects aren't likely to appear in typical databases enough times
to be worth worrying over. (This patch will be a net loss if there
are more than zero but less than perhaps 10 instances of an object type,
so there's definitely reasons beyond laziness for not doing more.)
Seems reasonable.
@@ -7340,25 +7340,37 @@ getDomainConstraints(Archive *fout, TypeInfo *tyinfo)
i_consrc;
	int			ntups;

-	query = createPQExpBuffer();
+	static bool query_prepared = false;

-	if (fout->remoteVersion >= 90100)
-		appendPQExpBuffer(query, "SELECT tableoid, oid, conname, "
-						  "pg_catalog.pg_get_constraintdef(oid) AS consrc, "
-						  "convalidated "
-						  "FROM pg_catalog.pg_constraint "
-						  "WHERE contypid = '%u'::pg_catalog.oid "
-						  "ORDER BY conname",
-						  tyinfo->dobj.catId.oid);
+	if (!query_prepared)
+	{
I wonder if it'd be better to store this in Archive or such. The approach with
static variables might run into problems with parallel pg_dump at some
point. These objects aren't dumped in parallel yet, but still...
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-10-24 17:10:55 -0400, Tom Lane wrote:
+ static bool query_prepared = false;
I wonder if it'd be better to store this in Archive or such. The approach with
static variables might run into problems with parallel pg_dump at some
point. These objects aren't dumped in parallel yet, but still...
Yeah, I wasn't too happy with the static bools either. However, each
function would need its own field in the struct, which seems like a
maintenance annoyance, plus a big hazard for future copy-and-paste
changes (ie, copy and paste the wrong flag name -> trouble). Also
the Archive struct is shared between dump and restore cases, so
adding a dozen fields that are irrelevant for restore didn't feel
right. So I'd like a better idea, but I'm not sure that that one
is better.
regards, tom lane
On 2021-Oct-25, Tom Lane wrote:
Yeah, I wasn't too happy with the static bools either. However, each
function would need its own field in the struct, which seems like a
maintenance annoyance, plus a big hazard for future copy-and-paste
changes (ie, copy and paste the wrong flag name -> trouble). Also
the Archive struct is shared between dump and restore cases, so
adding a dozen fields that are irrelevant for restore didn't feel
right. So I'd like a better idea, but I'm not sure that that one
is better.
What about a separate struct passed from pg_dump's main() to the
functions that execute queries, containing a bunch of bools? This'd
still have the problem that mindless copy and paste would cause a bug,
but I wonder if that isn't overstated: if you use the wrong flag,
pg_dump would fail as soon as you try to invoke your query when it
hasn't been prepared yet.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php
Hi,
On 2021-10-25 16:02:34 -0400, Tom Lane wrote:
So I'd like a better idea, but I'm not sure that that one is better.
I guess we could move the prepared-statement handling into a query execution
helper. That could then use a hashtable or something similar to check if a
certain prepared statement already exists. That'd then centrally be extensible
to deal with multiple connects etc.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
I guess we could move the prepared-statement handling into a query execution
helper. That could then use a hashtable or something similar to check if a
certain prepared statement already exists. That'd then centrally be extensible
to deal with multiple connects etc.
That seems like more mechanism than is warranted. I tried it with a
simple array of booleans, and that seems like not too much of a mess;
see revised 0006 attached.
(0001-0005 are the same as before; including them just to satisfy
the cfbot.)
regards, tom lane
Attachments:
0001-fix-component-masking-2.patch (text/x-diff, +351/-280)
0002-rethink-ACL-handling-2.patch (text/x-diff, +784/-1137)
0003-move-unsafe-function-calls-2.patch (text/x-diff, +64/-33)
0004-bulk-queries-2.patch (text/x-diff, +1008/-760)
0005-fix-toast-sched-2.patch (text/x-diff, +27/-9)
0006-prepare-repeated-queries-3.patch (text/x-diff, +532/-366)
Here's an updated version of this patch set. The only non-line-number
changes are
(1) in 0004, I dealt with the issue of not having unnest() in old branches
by bumping the minimum remote server version to 8.4. Seeing that we seem
to have consensus in the other thread to push the minimum up to somewhere
around 9.2, I see no point in making this patch put in conditional code
that we'd shortly rip out again.
(2) I also added some comments to 0004 to hopefully address Justin's
confusion about string lengths.
I feel quite fortunate that a month's worth of commitfest hacking
didn't break any of these patches. Unless someone intends to
review these more thoroughly than they already have, I'd like to
go ahead and push them.
regards, tom lane
Attachments:
0001-fix-component-masking-3.patch (text/x-diff, +351/-280)
0002-rethink-ACL-handling-3.patch (text/x-diff, +784/-1137)
0003-move-unsafe-function-calls-3.patch (text/x-diff, +64/-33)
0004-bulk-queries-3.patch (text/x-diff, +1012/-764)
0005-fix-toast-sched-3.patch (text/x-diff, +27/-9)
0006-prepare-repeated-queries-4.patch (text/x-diff, +532/-366)
Hello Tom,
from your mail from 25.10.2021:
0005 implements your suggestion of accounting for TOAST data while
scheduling parallel dumps. I realized while looking at that that
there's a pre-existing bug, which this'd exacerbate: on machines
with 32-bit off_t, dataLength can overflow. Admittedly such machines
are just about extinct in the wild, but we do still trouble to support
the case. So 0005 also includes code to check for overflow and clamp
the result to INT_MAX blocks.
Maybe we should back-patch 0005. OTOH, how likely is it that anyone
is wrangling tables exceeding 16TB on a machine with 32-bit off_t?
Or that poor parallel dump scheduling would be a real problem in
such a case?
I noticed that you patched master with all the improvements in pg_dump.
Did you change your mind about backpatching patch 0005 to fix the TOAST size matter?
It would be rather helpful for handling existing user data in the active branches.
As for 32-bit versions, I think they are only used on much smaller installations.
BTW, the 32-bit build of Postgres on Windows no longer works with more modern tool sets (tested with VS2019 and VS2022), albeit not explicitly excluded in the docs. But no one has complained yet (for a long time now...).
Thanks
Hans Buschmann
Hans Buschmann <buschmann@nidsa.net> writes:
I noticed that you patched master with all the improvements in pg_dump.
Did you change your mind about backpatching patch 0005 to fix the toast size matter?
I looked briefly at that and found that the patch would have to be
largely rewritten, because getTables() looks quite different in the
older branches. I'm not really sufficiently excited about the point
to do that rewriting and re-testing. I think that cases where the
old logic gets the scheduling badly wrong are probably rare.
regards, tom lane