Updating a large PostgreSQL database with blobs
Hello,
I didn't get any response on the GENERAL list so I'm escalating this ...
We have several independent database servers with ~50GB+ databases running
postgres 8.0.x. We are planning to upgrade these databases to postgres 8.2.x
over the weekend. We plan to use the following steps to upgrade each server:
1. Dump the 8.0.x database cluster using 8.2.x pg_dumpall
% ./pg_dumpall > pgdumpall_backup.sql
2. Dump the 8.0.x database, including large objects, in compressed custom
format using 8.2.x pg_dump
% ./pg_dump -Fc -b -Z9 dbname > pgdump_lobs_backup
Restoring the database:
1. Initialize the 8.2.x database
% initdb -D /data/pgdata
2. Restore template1 database from cluster dump
% ./psql -d template1 < pgdumpall_backup.sql
3. Drop the database dbname, otherwise the restore will fail with an error
about the existing dbname
% dropdb dbname
4. Create fresh dbname
% createdb -O dbowner dbname
5. Restore the database with lobs
% ./pg_restore -v -Fc -d dbname -e -U dbowner < pgdump_lobs_backup
Some of the problems we have are:
1. We are not sure if all of the data will be available after a dump/restore
with the above process.
2. The dump and restore process is too slow to complete over the weekend
(it takes approximately 1 GB/hr to dump on a dual 2 GHz G5 PPC with 1 GB RAM
and RAID 1 disks).
What is the fastest way to upgrade postgres for large databases that have
binary objects?
Thanks for all your help.
CAJ CAJ wrote:
<snip>
What is the fastest way to upgrade postgres for large databases that have
binary objects?
Your procedure dumps and restores the databases twice. This seems less
than sound. My prediction is that you could get a 50% speed improvement
by fixing that ...
The only thing you really need pg_dumpall for is the global tables. I
would just use pg_dumpall -g to get those, and then use pg_dump -F c +
pg_restore for each actual database.
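Something along these lines should work (file names are illustrative, and I'm
assuming the 8.2.x binaries, as in your examples):
% ./pg_dumpall -g > globals.sql
% ./pg_dump -Fc -Z9 dbname > dbname.dump
and then, against the freshly initialized 8.2.x cluster:
% ./psql -d template1 -f globals.sql
% ./createdb -O dbowner dbname
% ./pg_restore -v -Fc -d dbname dbname.dump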
Another thing is to make sure that pg_dump/pg_restore are not competing
with postgres for access to the same disk(s). One way to do that is to
run them from a different machine - they don't have to be run on the
server machine - of course then the network can become a bottleneck, so
YMMV.
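For example (the host name is purely illustrative), the 8.2.x pg_dump can pull
the data across the network with -h:
% ./pg_dump -h oldserver -Fc -Z9 dbname > dbname.dump
and pg_restore accepts -h as well, so the same applies on the restore side.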
cheers
andrew
<snip>
What is the fastest way to upgrade postgres for large databases that have
binary objects?

Your procedure dumps and restores the databases twice. This seems less
than sound. My prediction is that you could get a 50% speed improvement
by fixing that ...
Thanks for the response. This'd be wonderful if I can get my process right.
My assumption (probably incorrect) is that pg_dump has to be executed twice
on a database with blobs: once to get the data and once to get the blobs
(using the -b flag).
The only thing you really need pg_dumpall for is the global tables. I
would just use pg_dumpall -g to get those, and then use pg_dump -F c +
pg_restore for each actual database.
This makes sense :) I assume that running pg_dump with -b will get all of
the data including the blobs?
Another thing is to make sure that pg_dump/pg_restore are not competing
with postgres for access to the same disk(s). One way to do that is to
run them from a different machine - they don't have to be run on the
server machine - of course then the network can become a bottleneck, so
YMMV.
We are using separate servers for dump and restore.
Thanks again for your suggestions. This helps immensely.
CAJ CAJ wrote:
Thanks for the response. This'd be wonderful if I can get my process
right. My assumption (probably incorrect) is that pg_dump has to be
executed twice on a database with blobs: once to get the data and once
to get the blobs (using the -b flag).
Why do you assume that? The pg_dump manual says:
-b
--blobs
Include large objects in the dump. This is the default behavior
except when --schema, --table, or --schema-only is specified, so the
-b switch is only useful to add large objects to selective dumps.
IOW, in most cases you will just get the large objects for free, and
don't even need to use this flag to get them. No need to run twice.
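For example (the table name is just illustrative), a plain dump picks up the
large objects automatically, while a selective dump needs -b:
% ./pg_dump -Fc dbname > dbname.dump
% ./pg_dump -Fc -t mytable -b dbname > partial.dump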
cheers
andrew