Would you add a --dry-run option to pg_restore?
I mean, to verify the integrity of a backup I do:

gunzip -c backup_yesterday.gz | pg_restore -d my_database && echo "backup_yesterday is OK"

but my_database's uncompressed size is very large (more than 15 GB), and sometimes I have no space to restore it, so I always have to declutter my disk first.
By the way, I have scheduled backups of many databases, so I must check their integrity one by one, deleting each database afterwards to avoid disk space issues. The restores also take a long time: an average of 1 hour per backup.
It would be great to have a dry-run option, because the time to verify would drop a lot and it would save disk space, since pg_restore would just execute the dump without writing to disk.
If pg_restore had a dry-run option, I would do:

(gunzip -c mydata.gz | pg_restore -d mydata --dry-run && echo "mydata is ok") &
(gunzip -c my_other_data.gz | pg_restore -d my_other_data --dry-run && echo "my_other_data is ok") &
(gunzip -c my_another_data.gz | pg_restore -d my_another_data --dry-run && echo "my_another_data is ok") &
wait

and the verification would take only 1 hour instead of 3 hours.
On Wed, Aug 2, 2017 at 9:02 AM, Edmundo Robles <edmundo@sw-argos.com> wrote:
It would be great to have a dry-run option, because the time to verify would drop a lot and it would save disk space, since pg_restore would just execute the dump without writing to disk.
"Dry run", the way I understand it, can be accomplished via the "-l" (ell) switch. It means "tell me what is going to happen but don't actually do it". I don't know what to call what you are describing, but I don't see how it could reasonably be made to work while giving the admin confidence that a true restoration would be valid. Maybe I'm just being unimaginative, but at minimum you'd have to write the tables out to disk so data could be loaded into them. Then the data would have to be persisted in order to validate the constraints at the end.
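For instance (the archive name here is a placeholder), the "-l" switch lists the archive's table of contents without connecting to any database:

```shell
# Print the table of contents of a custom-format archive; nothing
# is restored and no server connection is needed.
pg_restore -l backup_yesterday.dump
```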
If you are running out of disk space, you should get larger/more disks. Personally, I'd probably set up a dedicated "test restore" cluster with plenty of disk and put settings like "fsync=off" into its postgresql.conf.
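As a sketch of such a throwaway test-restore cluster's configuration (these are real postgresql.conf settings, but the combination is a suggestion, and it is only safe on a disposable cluster whose contents can be rebuilt from dumps):

```
# postgresql.conf for a throwaway "test restore" cluster only
fsync = off                 # skip forced WAL writes; a crash can corrupt the cluster
synchronous_commit = off    # don't wait for WAL flush on commit
full_page_writes = off      # acceptable only because the data is expendable
```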
I could see having a "--make-tables-unlogged" option that would convert,
on-the-fly, all CREATE TABLE commands to "CREATE UNLOGGED TABLE" commands.
David J.
Edmundo Robles <edmundo@sw-argos.com> writes:
I mean, to verify the integrity of a backup I do:
gunzip -c backup_yesterday.gz | pg_restore -d my_database && echo "backup_yesterday is OK"
but my_database's uncompressed size is very large (more than 15 GB), and sometimes I have no space to restore it, so I always have to declutter my disk first.
It would be great to have a dry-run option, because the time to verify would drop a lot and it would save disk space, since pg_restore would just execute the dump without writing to disk.
What do you imagine a dry run option would do?
If you just want to see if the file contains obvious corruption,
you could do
pg_restore file >/dev/null
and see if it prints any complaints on stderr. If you want to have
confidence that the file would actually restore (and that there aren't
e.g. unique-index violations or foreign-key violations in the data),
I could imagine a mode where pg_restore wraps its output in "begin" and
"rollback". But that's not going to save any disk space, or time,
compared to doing a normal restore into a scratch database.
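The begin/rollback idea can be approximated today by emitting the archive as plain SQL and wrapping it by hand; a rough sketch (archive and database names are placeholders, and it only works for dumps that restore cleanly inside a single transaction):

```shell
# pg_restore without -d writes SQL to stdout; wrap that stream in a
# transaction and roll it back at the end, so constraints are checked
# but nothing is kept.
(echo 'BEGIN;'; pg_restore backup_yesterday.dump; echo 'ROLLBACK;') \
  | psql --set ON_ERROR_STOP=1 -d scratch_db
```

As noted above, the rolled-back rows still occupy disk space while the transaction runs, so this saves nothing over a scratch-database restore.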
I can't think of any intermediate levels of verification that wouldn't
involve a huge amount of work to implement ... and they'd be unlikely
to catch interesting problems in practice. For instance, I doubt that
syntax-checking but not executing the SQL coming out of pg_restore would
be worth the trouble. If an archive is corrupt enough that it contains
bad SQL, it probably has problems that pg_restore would notice anyway.
Most of the restore failures that we hear about in practice would not be
detectable without actually executing the commands, because they involve
problems like issuing commands in the wrong order.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Aug 2, 2017, at 9:02 AM, Edmundo Robles <edmundo@sw-argos.com> wrote:
I mean, to verify the integrity of a backup I do:
gunzip -c backup_yesterday.gz | pg_restore -d my_database && echo "backup_yesterday is OK"
but my_database's uncompressed size is very large (more than 15 GB), and sometimes I have no space to restore it, so I always have to declutter my disk first.
By the way, I have scheduled backups of many databases, so I must check their integrity one by one, deleting each database afterwards to avoid disk space issues. The restores also take a long time: an average of 1 hour per backup.
It would be great to have a dry-run option, because the time to verify would drop a lot and it would save disk space, since pg_restore would just execute the dump without writing to disk.
If the gunzip completes successfully then the backups weren't corrupted and the disk is readable. They're very likely to be "good" unless you have a systematic problem with your backup script.
You could then run that data through pg_restore, redirecting the output to /dev/null, to check that the compressed file actually came from pg_dump. (gunzip -c backup_yesterday.gz | pg_restore >/dev/null)
The only level of checking you could do beyond that would be to ensure that the database was internally self-consistent and so truly restorable - and to do that, you'll need to restore it into a real database.
You could do an intermediate check by restoring into a real database with --schema-only, I guess.
As an aside, pg_dump with custom format already compresses the dump with gzip, so the additional gzip step may be redundant. You can set pg_dump's compression level with -Z.
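For illustration (database and file names are placeholders), a custom-format dump with built-in compression, plus the schema-only test restore suggested above, might look like:

```shell
# Custom-format dump with built-in compression; no separate gzip step.
pg_dump -Fc -Z 9 -f my_database.dump my_database

# Cheap structural check: restore only the schema into a scratch
# database, which is fast and uses little disk space.
createdb scratch_check
pg_restore --schema-only -d scratch_check my_database.dump \
  && echo "schema restores OK"
dropdb scratch_check
```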
Cheers,
Steve
I imagine pg_restore could execute the instructions in the dump but not write to disk, just like David said: "tell me what is going to happen but don't actually do it".
Regards.
On Wed, Aug 2, 2017 at 10:10 AM, Edmundo Robles <edmundo@sw-argos.com>
wrote:
I imagine pg_restore could execute the instructions in the dump but not write to disk, just like David said: "tell me what is going to happen but don't actually do it".
You may wish to respond to the actual points being made as to why separating "writing" from "execution" doesn't provide meaningful value, especially not for the effort it would take.
David J.
On Wed, Aug 02, 2017 at 12:10:37PM -0500, Edmundo Robles wrote:
I imagine pg_restore could execute the instructions in the dump but not write to disk, just like David said: "tell me what is going to happen but don't actually do it".
In fact, this already exists:
pg_restore --file=commands.sql your-backup
Then read commands.sql.
It will tell you what is going to happen but not actually do it.
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Wed, Aug 2, 2017 at 7:10 PM, Edmundo Robles <edmundo@sw-argos.com> wrote:
I imagine pg_restore could execute the instructions in the dump but not write to disk, just like David said: "tell me what is going to happen but don't actually do it".
IIRC pg_restore does not fully execute the SQL itself. When given a text-mode dump it just passes the commands to the server, like psql does, and with custom/tar archives it builds commands and sends them to the server to execute. I doubt it has the ability to validate the contents of the dump.
Edmundo Robles <edmundo@sw-argos.com> writes:
I mean, to verify the integrity of a backup I do:
gunzip -c backup_yesterday.gz | pg_restore -d my_database && echo "backup_yesterday is OK"
I also think that if he is piping, the dump must be in text mode. I seem to recall the custom format needs seekable files, though I'm not sure about tar (it should not). In that case, as previously suggested, a simple gunzip -t is enough to verify backup file integrity, but checking internal correctness is very difficult, as it may even depend on server configuration (e.g. needing some predefined users / locales / encodings).
Francisco Olarte.
On 02/08/17, Steve Atkins (steve@blighty.com) wrote:
On Aug 2, 2017, at 9:02 AM, Edmundo Robles <edmundo@sw-argos.com> wrote:
I mean, to verify the integrity of a backup I do:
gunzip -c backup_yesterday.gz | pg_restore -d my_database && echo "backup_yesterday is OK"
but my_database's uncompressed size is very large (more than 15 GB), and sometimes I have no space to restore it, so I always have to declutter my disk first.
...
If the gunzip completes successfully then the backups weren't corrupted and the disk is readable. They're very likely to be "good" unless you have a systematic problem with your backup script.
You could then run that data through pg_restore, redirecting the output to /dev/null, to check that the compressed file actually came from pg_dump. (gunzip -c backup_yesterday.gz | pg_restore >/dev/null)
A couple of extra steps you can add, short of a full restore (which is best), are to do a file hash check as part of the verification, and to add a token to the database just before dumping, then verify it on restore. We do something like this:
rory:~/db$ gpg -d dump_filename.sqlc.gpg | \
           pg_restore -Fc --data-only --schema audit | \
           grep -A 1 "COPY audit"

Output:
COPY audit (tdate) FROM stdin;
2017-04-25
Cheers
Rory
On Wed, Aug 2, 2017 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Most of the restore failures that we hear about in practice would not be
detectable without actually executing the commands, because they involve
problems like issuing commands in the wrong order.
The vast majority of my restore issues are dependency problems (for example, the postgis extension not being present). A distant second place would be pg_restore's inability to do things in the proper order, or gaps in the dump feature itself (for example, a cast between two built-in types, at least back in the day).
A good, reasonable test for all of those cases with the current tools is to do a schema-only restore (which should not take long in most cases). If you get past that step, there is an exceptionally high probability that the restore will succeed, aside from controllable factors like running out of space.
There are some rare known considerations that could cause a data load to fail. For example, a unique index on a floating-point column can dump but fail to load if two values with different binary representations render to the same string. I've never seen this in practice, however. So I'd argue to just use that schema-only restore for pre-load verification if you're paranoid.
merlin