pg_dump slower than pg_restore
I'm running PG 9.3.4 on CentOS 6.4 and noticed that backing up my database
takes much longer than restoring it.
That seems counter-intuitive to me, because reading from a database should
generally be faster than writing to it.
I have a database that pg_database_size reports as 18GB, and the resulting
dump is about 13GB in 27 files (split creates them as 512MB each).
A pg_dump backup -- with most of the data stored as large objects --
takes about 5 hours.
But restoring that dump takes about 2 hours. So it's taking 2.5 times
longer to back it up than to restore it.
My backup script runs vacuumlo, then vacuum, then analyze, then pg_dump
--format=c --oids $DB
I actually push the pg_dump output through gzip, gpg and split into 512MB
files, but I figure those shouldn't matter much since I have to run cat,
gpg and gunzip before pg_restore anyway. In fact, my restore should have
been at a disadvantage because I used '-v' and streamed the output
(including a line for each LOID) to my ssh terminal over the Internet, and
postgresql.conf had 'ddl' logging on (which I suspect I can turn off in
future restores to speed things up a bit).
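To be concrete, the whole pipeline is roughly this (actual paths, gpg
options and the split prefix elided, so treat it as a sketch):
$ pg_dump --format=c --oids $DB | gzip | gpg ... | split -b 512000000 - dump.
$ cat dump.* | gpg -d ... | gunzip | pg_restore -v -d $DB   # the restore side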
Is there something that might be wrong with my configuration that would
make the backup slower than the restore?
Thanks,
David
On 07/03/14 10:04, David Wall wrote:
A pg_dump backup -- with most of the data stored as large objects --
takes about 5 hours.
If those large objects are 'files' that are already compressed (e.g.
most image files and pdf's) you are spending a lot of time trying to
compress the compressed data ... and failing.
Try setting the compression factor to an intermediate value, or even
zero (i.e. no dump compression). For example, to get the 'low hanging
fruit' compressed:
$ pg_dump -Z1 -Fc ...
IIRC, the default value of '-Z' is 6.
As usual your choice will be a run-time vs file-size trade-off so try
several values for '-Z' and see what works best for you.
HTH,
Bosco.
On 03/07/14 18:36, Bosco Rama wrote:
On 07/03/14 10:04, David Wall wrote:
A pg_dump backup -- with most of the data stored as large objects --
takes about 5 hours.
If those large objects are 'files' that are already compressed (e.g.
most image files and pdf's) you are spending a lot of time trying to
compress the compressed data ... and failing.
Try setting the compression factor to an intermediate value, or even
zero (i.e. no dump compression). For example, to get the 'low hanging
fruit' compressed:
$ pg_dump -Z1 -Fc ...
IIRC, the default value of '-Z' is 6.
As usual your choice will be a run-time vs file-size trade-off so try
several values for '-Z' and see what works best for you.
HTH,
Bosco.
I'd also check the effect of those other run components; the vacuums
and other things that only run with the backup and not during the
restore.
--
Tim Clarke
On Thu, Jul 3, 2014 at 10:04 AM, David Wall <d.wall@computer.org> wrote:
I'm running PG 9.3.4 on CentOS 6.4 and noted that backing up my database
takes much longer than restoring it.
Are you dumping to a slower disk/storage than the database is using?
What does top -c look like during the dump vs. the restore? That is, can
you identify the differing bottlenecks in the two cases and dig in further
from there?
My backup script runs vacuumlo, then vacuum, then analyze, then pg_dump
--format=c --oids $DB
How long does the backup take if you don't vacuum/analyze? On some of my
dbs that would be a huge part of the time. Honestly, I'd pull the
vacuum/analyze out of the backup and into a separate (nightly) task;
there's no real benefit to doing it as part of the backup other than just
getting it done. Analyzing after the restore would be a better time to do
it, to freshen up the statistics.
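For example, as two cron entries instead of one script (times and the db
name are just placeholders):
# nightly maintenance, decoupled from the backup
0 1 * * * vacuumlo mydb && vacuumdb --analyze mydb
# backup only
0 3 * * * /path/to/pg_backup.sh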
On Thu, 03 Jul 2014 10:04:12 -0700
David Wall <d.wall@computer.org> wrote:
I'm running PG 9.3.4 on CentOS 6.4 and noted that backing up my
database takes much longer than restoring it.
That seems counter-intuitive to me because it seems like reading from
a database should generally be faster than writing to it.
I have a database that pg_database_size reports as 18GB, and
resulting dump is about 13GB in 27 files (split creates them as
512MB).
A pg_dump backup -- with most of the data stored as large objects --
takes about 5 hours.
But restoring that dump takes about 2 hours. So it's taking 2.5
times longer to back it up than to restore it.
My backup script runs vacuumlo, then vacuum, then analyze, then
pg_dump --format=c --oids $DB
I actually push pg_dump output through gzip, gpg and split on 512MB
files, but they shouldn't matter too much I figure as I have to run
cat, gpg and gunzip before pg_restore. In fact, my restore should
have been at a disadvantage because I used '-v' and showed the
results to my ssh term over the Internet which includes a line for
each LOID, and the postgresql.conf had 'ddl' logging on (which I
suspect I can turn off in future restores to speed things up a bit).
Is there something that might be wrong about my configuration that
the backup is slower than the restore?
No, there's nothing wrong. Compressed objects stored in the database
(TOAST, large objects, etc.) are transparently decompressed as pg_dump
accesses them, and then you gzip the output again. I don't know why it
doesn't dump the compressed data directly.
Thanks,
David
--- ---
Eduardo Morras <emorrasg@yahoo.es>
On 7/3/2014 10:36 AM, Bosco Rama wrote:
If those large objects are 'files' that are already compressed (e.g.
most image files and pdf's) you are spending a lot of time trying to
compress the compressed data ... and failing.
Try setting the compression factor to an intermediate value, or even
zero (i.e. no dump compression). For example, to get the 'low hanging
fruit' compressed:
$ pg_dump -Z1 -Fc ...
IIRC, the default value of '-Z' is 6.
As usual your choice will be a run-time vs file-size trade-off so try
several values for '-Z' and see what works best for you.
That's interesting. Since I gzip the resulting output, I'll give -Z0 a
try. I didn't realize that any compression was on by default.
Thanks for the tip...
On 7/3/2014 10:38 AM, Tim Clarke wrote:
I'd also check the effect of those other run components; the vacuum's
and other things that are only running with the backup and not during
the restore.
The vacuumlo, vacuum and analyze run before the pg_dump, and I'm not
counting any of the time they take. What I am measuring is the
last-modified timestamps of the 512MB split files only, so if anything I'm
under-measuring by the time it takes to produce the first 512MB segment.
On 7/3/2014 4:51 PM, David Wall wrote:
That's interesting. Since I gzip the resulting output, I'll give -Z0
a try. I didn't realize that any compression was on by default.
default compression only happens with pg_dump -Fc
--
john r pierce 37N 122W
somewhere on the middle of the left coast
On 07/03/14 16:51, David Wall wrote:
On 7/3/2014 10:36 AM, Bosco Rama wrote:
If those large objects are 'files' that are already compressed (e.g.
most image files and pdf's) you are spending a lot of time trying to
compress the compressed data ... and failing.
Try setting the compression factor to an intermediate value, or even
zero (i.e. no dump compression). For example, to get the 'low hanging
fruit' compressed:
$ pg_dump -Z1 -Fc ...
IIRC, the default value of '-Z' is 6.
As usual your choice will be a run-time vs file-size trade-off so try
several values for '-Z' and see what works best for you.
That's interesting. Since I gzip the resulting output, I'll give -Z0 a
try. I didn't realize that any compression was on by default.
If you use gzip you will be doing the same 'possibly unnecessary'
compression step. Use a similar approach to the gzip command as you
would for the pg_dump command. That is, use one of the -[0-9] options,
like this:
$ pg_dump -Z0 -Fc ... | gzip -[0-9] ...
Thanks for the tip...
NP.
HTH,
Bosco.
On 07/03/14 17:00, John R Pierce wrote:
On 7/3/2014 4:51 PM, David Wall wrote:
That's interesting. Since I gzip the resulting output, I'll give -Z0
a try. I didn't realize that any compression was on by default.
default compression only happens with pg_dump -Fc
Yeah. OP says he is using --format=c
On 7/3/2014 5:13 PM, Bosco Rama wrote:
If you use gzip you will be doing the same 'possibly unnecessary'
compression step. Use a similar approach to the gzip command as you
would for the pg_dump command. That is, use one of the -[0-9] options,
like this:
$ pg_dump -Z0 -Fc ... | gzip -[0-9] ...
Bosco, maybe you can recommend a different approach. I pretty much run
daily backups that I only keep for disaster recovery. I generally don't
do partial recoveries, so I doubt I'd ever modify the dump output. I
just re-read the docs about formats, and it's not clear which I'd be best
off with; "plain" is the default, but it doesn't say it can be used
with pg_restore.
Maybe --format=c isn't the fastest option for me, and I'm less sure
about the compression. I do want to be able to restore using pg_restore
(unless plain is the best route, in which case, how do I restore that
type of backup?), and I need to include large objects (--oids), but
otherwise I'm mostly interested in it being as quick as possible.
Many of the large objects are gzip compressed when stored. Would I be
better off letting PG do its compression and remove gzip, or turn off
all PG compression and use gzip? Or perhaps use neither if my large
objects, which take up the bulk of the database, are already compressed?
On 07/03/14 17:30, David Wall wrote:
Bosco, maybe you can recommend a different approach. I pretty much run
daily backups that I only have for disaster recovery. I generally don't
do partials recoveries, so I doubt I'd ever modify the dump output. I
just re-read the docs about formats, and it's not clear what I'd be best
off with, and "plain" is the default, but it doesn't say it can be used
with pg_restore.
Correct. Plain is essentially one big SQL command file that you feed to
psql as the restore process.
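You'd restore a plain dump with psql rather than pg_restore, e.g.:
$ psql -d $DB -f dump.sql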
Maybe the --format=c isn't the fastest option for me,
I think you are on the right track with -Fc since plain would end up
converting the LO parts to escaped byte format for storage in the SQL
command file.
and I'm less sure about the compression.
You seem to be gaining a considerable amount of compression and that is
probably worthwhile given that it is mostly going to be the 'easy' kind.
I do want to be able to restore using pg_restore (unless plain is the
best route, in which case, how do I restore that type of backup?), and
I need to include large objects (--oids), but otherwise, I'm mostly
interested in it being as quick as possible.
Hmmm. You are using '--oids' to *include* large objects? IIRC, that's
not the intent of that option. Large objects are dumped as part of a
DB-wide dump unless you request that they not be.
However, if you restrict your dumps to specific schemata and/or tables
then the large objects are NOT dumped unless you request that they are.
Something to keep in mind.
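If you ever do restrict a dump and still want the large objects, the
-b/--blobs switch adds them back in; a sketch (schema name just an
example):
$ pg_dump -Fc -b -n public $DB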
Many of the large objects are gzip compressed when stored. Would I be
better off letting PG do its compression and remove gzip, or turn off
all PG compression and use gzip? Or perhaps use neither if my large
objects, which take up the bulk of the database, are already compressed?
OK. Given all the above (and that gpg will ALSO do compression unless
told not to), I'd go with the following (note lowercase 'z' in gpg
command). Note also that there may be a CPU vs I/O trade-off here that
may make things muddier but the following are 'conceptually' true.
Fast but big
============
$ pg_dump -Z0 -Fc ... $DB | gpg -z0 ... | split
Less fast but smaller
=====================
$ pg_dump -Z1 -Fc ... $DB | gpg -z0 ... | split
(the '...' would be any access and/or key selection options)
Do some timing/size comparisons and see which is best for you.
BTW, is there any particular reason to do the 'split'?
For later consideration
=======================
Once you have this under control you may want to investigate the -Fd
format. It allows you to use the '-j' option that may speed things
up even further. However, I have no experience with it so I am unable
to make any concrete recommendations about it. It *would* require a
post dump tar/zip/cpio to get to a single file backup, though. If you
don't need a single file solution (as possibly exemplified by your use
of 'split') it may be just what you want.
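An (untested) sketch of what that would look like, with the directory
path and job count as placeholders:
$ pg_dump -Fd -j 4 -f /backup/dumpdir $DB
$ pg_restore -j 4 -d $DB /backup/dumpdir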
Let us know how it goes. :-)
HTH,
Bosco.
On 7/3/2014 6:26 PM, Bosco Rama wrote:
Hmmm. You are using '--oids' to *include* large objects? IIRC, that's
not the intent of that option. Large objects are dumped as part of a
DB-wide dump unless you request that they not be. However, if you
restrict your dumps to specific schemata and/or tables then the large
objects are NOT dumped unless you request that they are. Something to
keep in mind.
I can get rid of --oids and see what happens. I used to have
cross-table references to OID fields before, so this is no doubt a
holdover, but I think I am now using UUIDs for all such links/references
and the OID fields are just like any other data field. It may not be
needed and I'll see if it speeds up the backup and restores correctly.
Many of the large objects are gzip compressed when stored. Would I be
better off letting PG do its compression and remove gzip, or turn off
all PG compression and use gzip? Or perhaps use neither if my large
objects, which take up the bulk of the database, are already compressed?
OK. Given all the above (and that gpg will ALSO do compression unless
told not to), I'd go with the following (note lowercase 'z' in gpg
command). Note also that there may be a CPU vs I/O trade-off here that
may make things muddier but the following are 'conceptually' true.
Fast but big
============
$ pg_dump -Z0 -Fc ... $DB | gpg -z0 ... | split
Less fast but smaller
=====================
$ pg_dump -Z1 -Fc ... $DB | gpg -z0 ... | split
I'll give that a try now. I didn't notice any real time savings when I
changed pg_dump from no -Z param at all to -Z0, and oddly, not much of a
difference removing gzip entirely.
BTW, is there any particular reason to do the 'split'?
Yes, I transfer the files to Amazon S3 and there were too many troubles
with one really big file.
Thanks again...
On 07/03/14 21:26, David Wall wrote:
On 7/3/2014 6:26 PM, Bosco Rama wrote:
BTW, is there any particular reason to do the 'split'?
Yes, I transfer the files to Amazon S3 and there were too many troubles
with one really big file.
Is the issue with S3 or just transfer time? I would expect that 'rsync'
with the '--partial' option (or -P if you want progress info too) may
help there.
HTH,
Bosco.
On 7/3/2014 10:13 PM, Bosco Rama wrote:
Is the issue with S3 or just transfer time? I would expect that
'rsync' with the '--partial' option (or -P if you want progress info
too) may help there.
Don't know if rsync and S3 work together or what that would mean, but
it's not an issue I'm having now. I do think s3cmd (which I use) may now
support multipart upload, though that wasn't available when we first
built our scripts.
I suspect nothing is really helping here and I'm mostly limited by disk
I/O, but I'm not sure why pg_dump is so much slower than pg_restore when
they both use the same disks. I say this because even with pg_dump
-Z0 | gpg -z 0 and gzip removed entirely and no --oids on pg_dump,
there's no effective difference in overall speed. While I can see all
of those processes vying for resources via 'top -c', the throughput
remains much the same.
On 7/3/2014 11:47 AM, Eduardo Morras wrote:
No, there's nothing wrong. Compressed objects stored in the database
(TOAST, large objects, etc.) are transparently decompressed as pg_dump
accesses them, and then you gzip the output again. I don't know why it
doesn't dump the compressed data directly.
That sounds odd, but if pg_dump decompresses the large objects and then
I gzip them on backup, doesn't the same more or less happen in reverse
when I pg_restore them? I mean, I gunzip the backup and then pg_restore
must compress the large objects when it writes them back.
It just seems odd that pg_dump is slower than pg_restore to me. Most
grumblings I read about suggest that pg_restore is too slow.
I have noted that the last split file segment will often appear to be
done -- no file modifications -- while pg_dump is still running, often
for another 20 minutes or so, and then some last bit is finally
written. It's as if pg_dump is calculating something at the end that is
quite slow. At startup, there's a delay before data is written, too,
but it's generally 1-2 minutes at most.
David Wall <d.wall@computer.org> writes:
A pg_dump backup -- with most of the data stored as large objects --
takes about 5 hours.
But restoring that dump takes about 2 hours. So it's taking 2.5 times
longer to back it up than to restore it.
Does top(1) reveal any bottlenecks?
Is the backup constrained by CPU (top will show your backup at close to
100% CPU usage) or by I/O (top will often show your process in state 'D')?
I also like dstat for looking at how things play out in such situations.
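For example, something like (5-second samples of CPU and disk activity):
$ dstat -tcd 5
run during the dump and again during the restore, and compare.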
Best regards
Jacob
David Wall <d.wall@computer.org> writes:
It just seems odd that pg_dump is slower than pg_restore to me. Most
grumblings I read about suggest that pg_restore is too slow.
I have noted that the last split file segment will often appear to be
done -- no file modifications -- while pg_dump is still running, often
for another 20 minutes or so, and then some last bit is finally
written. It's as if pg_dump is calculating something at the end that is
quite slow. At startup, there's a delay before data is written, too,
but it's generally 1-2 minutes at most.
You haven't given us much info about the contents of this database.
Are there a lot of tables? functions? large objects? How many is
"a lot", if so?
I'm suspicious that you're paying a penalty associated with pg_dump's
rather inefficient handling of metadata for large objects, but there's
not enough info in this thread to diagnose it. It'd be very interesting
to see perf or oprofile stats on the pg_dump run, particularly during
the parts where it doesn't seem to be writing anything.
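For example, something along these lines during that quiet phase (pid and
duration are placeholders):
$ perf record -g -p <pg_dump pid> -- sleep 60
$ perf report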
regards, tom lane
On 7/4/2014 7:19 AM, Tom Lane wrote:
You haven't given us much info about the contents of this database.
Are there a lot of tables? functions? large objects? How many is
"a lot", if so?I'm suspicious that you're paying a penalty associated with pg_dump's
rather inefficient handling of metadata for large objects, but there's
not enough info in this thread to diagnose it. It'd be very interesting
to see perf or oprofile stats on the pg_dump run, particularly during
the parts where it doesn't seem to be writing anything.
There are only 32 tables, no functions, but mostly large objects. Not
sure how to know about the LOs, but from a quick check of the table sizes
I estimate those at only 2GB, so 16GB could be LOs. There are 7,528,803
entries in pg_catalog.pg_largeobject.
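For what it's worth, a rough way to size up the large-object side (just a
sketch, not exactly what I ran):
$ psql -d $DB -c "SELECT count(DISTINCT loid) AS n_lobs,
      pg_size_pretty(pg_total_relation_size('pg_largeobject')) AS lo_size
      FROM pg_largeobject;"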
pg_database_size reports 18GB
biggest table sizes:
relation | size
-----------------------------------+--------
public.esf_formparty | 635 MB
public.esf_activity_log | 416 MB
public.esf_form | 181 MB
public.esf_encrypted_blob | 134 MB
public.esf_activity_log_ownertime | 73 MB
public.esf_tranfield | 72 MB
public.esf_formpartytranididx | 70 MB
public.esf_formparty_pkey | 65 MB
public.esf_encrypted_blob_pkey | 64 MB
public.esf_formpartyididx | 63 MB
public.esf_tranfield_pkey | 52 MB
public.esf_formpartypickupidx | 51 MB
public.esf_activity_log_typetime | 47 MB
public.esf_tran | 46 MB
public.esf_formorderidx | 46 MB
public.esf_form_pkey | 42 MB
public.esf_tranfieldvalueidx | 39 MB
public.esf_traninittimeidx | 19 MB
public.esf_tranupdatetimeidx | 19 MB
public.esf_tran_pkey | 13 MB
Basic top stats while running show:
top - 08:53:40 up 27 days, 17:38, 1 user, load average: 1.03, 1.12, 1.22
Tasks: 156 total, 1 running, 155 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.3%us, 0.6%sy, 0.4%ni, 74.2%id, 23.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3974112k total, 3954520k used, 19592k free, 46012k buffers
Swap: 4245496k total, 29996k used, 4215500k free, 1123844k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7549 esignfor 20 0 116m 1372 884 S 3.0 0.0 16:39.69 gpg --batch
--symmetric --cipher-algo AES256 --passphrase 3z4ig0Rq]w
7547 esignfor 30 10 1148m 1.0g 852 S 2.3 26.9 14:10.27 pg_dump
--format=c --oids ibc01
7548 esignfor 20 0 4296 748 372 S 2.3 0.0 13:05.44 gzip
7551 esignfor 20 0 555m 413m 410m D 1.7 10.6 9:32.03 postgres:
esignforms ibc01 [local] <FASTPATH>
1978 esignfor 20 0 15032 1372 1004 R 0.7 0.0 0:00.27 top -c
7550 esignfor 20 0 98.6m 592 472 S 0.3 0.0 0:49.80 split -b
512000000 - /home/esignforms/customers/archive/db/dump.20140704.gz.gpg
On 07/03/14 22:51, David Wall wrote:
It just seems odd that pg_dump is slower than pg_restore to me. Most
grumblings I read about suggest that pg_restore is too slow.
I have noted that the last split file segment will often appear to be
done -- no file modifications -- while pg_dump is still running, often
for another 20 minutes or so, and then some last bit is finally
written. It's as if pg_dump is calculating something at the end that is
quite slow. At startup, there's a delay before data is written, too,
but it's generally 1-2 minutes at most.
Random thought: What OS & kernel are you running? Kernels between
3.2.x and 3.9.x were known to have IO scheduling issues. This was
highlighted most by the kernel in Ubuntu 12.04 (precise) as shown
here:
</messages/by-id/50BF9247.2010800@optionshouse.com>
Bosco.