Verify Option with pg_dump

Started by Howard Cole over 9 years ago | 7 messages | general
#1 Howard Cole
howardnews@selestial.com

Hi,

recently I had problems with a corrupt pg_dump file. The problem with
the file was due to a faulty disk. The trouble with this is that I was
unaware of the disk problem and the pg_dump file corruption so I did not
have a full valid backup. In order to reduce the chances of this I was
hoping that there could be a verify option as in SQL Server for the
backups. This could be as simple as checking the CRC/MD5 as the stream
is created. So pg_dump | crc_save

The idea being that the pg_dump is crc'd before it is streamed to disk,
and then the file re-read from disk to check the CRC.

Is there a linux utility to do this or would it be simple to modify
pg_dump to do this?

Thanks

Howard.

www.selestial.com <http://www.selestial.com>

#2 Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Howard Cole (#1)
Re: Verify Option with pg_dump

On Wed, Nov 30, 2016 at 12:00:07PM +0000, Howard News wrote:

> recently I had problems with a corrupt pg_dump file. The problem with the
> file was due to a faulty disk. The trouble with this is that I was unaware
> of the disk problem and the pg_dump file corruption so I did not have a full
> valid backup. In order to reduce the chances of this I was hoping that there
> could be a verify option as in SQL Server for the backups. This could be as
> simple as checking the CRC/MD5 as the stream is created. So pg_dump |
> crc_save
>
> The idea being that the pg_dump is crc'd before it is streamed to disk, and
> then the file re-read from disk to check the CRC.
>
> Is there a linux utility to do this or would it be simple to modify pg_dump
> to do this?

You can try to suitably combine "pg_dump --format=plain" with
"tee" and "md5sum" such that the output stream is diverted to
both a file and a pipe-into-CRC-algorithm and eventually
compare the pipe's sum with the sum generated from the file.
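
As a rough sketch of the "tee" plus "md5sum" combination described above (an illustration under assumptions, not a tested backup procedure): the helper name dump_and_verify and the database name mydb are hypothetical, and the script assumes bash (for "set -o pipefail") and GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: checksum a stream before it reaches disk, then re-read the file
# and compare. dump_and_verify is a hypothetical helper, not part of pg_dump.
#
# Usage:   dump_and_verify OUTFILE COMMAND [ARGS...]
# Example: dump_and_verify mydb.dump pg_dump --format=custom mydb
dump_and_verify() {
    local outfile=$1; shift
    local stream_sum file_sum

    # "$@" is the producing command. tee writes its output to disk while
    # md5sum hashes the same bytes in flight. pipefail makes the pipeline
    # report failure if the producer itself fails.
    stream_sum=$(set -o pipefail; "$@" | tee "$outfile" | md5sum | cut -d' ' -f1) || return 2

    # Re-read the file that was just written and hash it again.
    file_sum=$(md5sum < "$outfile" | cut -d' ' -f1)

    if [ "$stream_sum" = "$file_sum" ]; then
        echo "OK $file_sum"
    else
        echo "MISMATCH stream=$stream_sum file=$file_sum" >&2
        return 1
    fi
}
```

Note that re-reading the file right after writing may be served from the operating system's cache rather than from the disk itself, so a match here does not by itself prove the on-disk copy is good. Saving the checksum next to the dump and re-checking it later would catch corruption that develops afterwards.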

But the better solution might be to stream to a filesystem
that verifies disk writes immediately. Or to a suitable RAID
array.

Regards,
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Howard Cole
howardnews@selestial.com
In reply to: Karsten Hilbert (#2)
Re: Verify Option with pg_dump

On 30/11/2016 12:27, Karsten Hilbert wrote:

> You can try to suitably combine "pg_dump --format=plain" with
> "tee" and "md5sum" such that the output stream is diverted to
> both a file and a pipe-into-CRC-algorithm and eventually
> compare the pipe's sum with the sum generated from the file.
>
> But the better solution might be to stream to a filesystem
> that verifies disk writes immediately. Or to a suitable RAID
> array.
>
> Regards,
> Karsten

Thanks for this info Karsten. I will look into using "tee". As a matter
of interest, why does the format need to be plain?

Regarding the filesystem solution, the dump is currently written to a HP
RAID 10 array with an NTFS partition. What filesystems / raid arrays
have this ability?

Thanks.

#4 Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Howard Cole (#3)
Re: Verify Option with pg_dump

On Wed, Nov 30, 2016 at 01:11:58PM +0000, Howard News wrote:

>> You can try to suitably combine "pg_dump --format=plain" with
>> "tee" and "md5sum" such that the output stream is diverted to
>> both a file and a pipe-into-CRC-algorithm and eventually
>> compare the pipe's sum with the sum generated from the file.
>>
>> But the better solution might be to stream to a filesystem
>> that verifies disk writes immediately. Or to a suitable RAID
>> array.
>
> Thanks for this info Karsten. I will look into using "tee". As a matter of
> interest, why does the format need to be plain?

Actually, any of the formats producing a _single_ file right
away are likely to work. So, any but "directory", I guess.

> Regarding the filesystem solution, the dump is currently written to a HP
> RAID 10 array with an NTFS partition. What filesystems / raid arrays have
> this ability?

If you can't trust your RAID 10 (1 meaning mirrored) to
actually store what you told it to, you've got problems beyond
somehow verifying a pg_dump.

Regards,
Karsten

#5 Howard Cole
howardnews@selestial.com
In reply to: Karsten Hilbert (#4)
Re: Verify Option with pg_dump

>> Regarding the filesystem solution, the dump is currently written to a HP
>> RAID 10 array with an NTFS partition. What filesystems / raid arrays have
>> this ability?
>
> If you can't trust your RAID 10 (1 meaning mirrored) to
> actually store what you told it to, you've got problems beyond
> somehow verifying a pg_dump.
>
> Regards,
> Karsten

I am told RAID can only protect you against disk failure. File writes to
one or more of the disks in an array are not typically compared, so a
RAID array carries on until a disk fails or the error count gets to a
certain level. So RAID does not fully protect you from data corruption.

So you can't trust RAID!

#6 Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Howard Cole (#5)
Re: Verify Option with pg_dump

On Wed, Nov 30, 2016 at 01:53:21PM +0000, Howard News wrote:

>>> Regarding the filesystem solution, the dump is currently written to a HP
>>> RAID 10 array with an NTFS partition. What filesystems / raid arrays have
>>> this ability?
>>
>> If you can't trust your RAID 10 (1 meaning mirrored) to
>> actually store what you told it to, you've got problems beyond
>> somehow verifying a pg_dump.
>>
>> Regards,
>> Karsten
>
> I am told RAID can only protect you against disk failure. File writes to one
> or more of the disks in an array are not typically compared, so a RAID array
> carries on until a disk fails or the error count gets to a certain level. So
> RAID does not fully protect you from data corruption.

True enough. So it seems you are referring to "silent data
corruption". Does this link help?

http://www.raidix.com/knowledge-base/silent-data-corruption/

This link also seems relevant:

http://stackoverflow.com/questions/13107783/pipe-output-to-two-different-commands

Regards,
Karsten

#7 Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Karsten Hilbert (#6)
Re: Verify Option with pg_dump

Also this

https://en.wikipedia.org/wiki/Silent_data_corruption#Countermeasures
