Verify Option with pg_dump
Hi,
Recently I had problems with a corrupt pg_dump file. The corruption was
due to a faulty disk. The trouble is that I was unaware of both the disk
problem and the pg_dump file corruption, so I did not have a full valid
backup. To reduce the chances of this happening again, I was hoping that
there could be a verify option for backups, as in SQL Server. This could
be as simple as checking the CRC/MD5 as the stream is created, e.g.
pg_dump | crc_save
The idea is that the pg_dump output is checksummed before it is streamed
to disk, and then the file is re-read from disk to check the CRC.
Is there a Linux utility to do this, or would it be simple to modify
pg_dump to do this?
Thanks
Howard.
www.selestial.com <http://www.selestial.com>
On Wed, Nov 30, 2016 at 12:00:07PM +0000, Howard News wrote:
> recently I had problems with a corrupt pg_dump file. The problem with the
> file was due to a faulty disk. The trouble with this is that I was unaware
> of the disk problem and the pg_dump file corruption so I did not have a full
> valid backup. In order to reduce the chances of this I was hoping that there
> could be a verify option as in SQL Server for the backups. This could be as
> simple as checking the CRC/MD5 as the stream is created. So pg_dump |
> crc_save
> The idea being that the pg_dump is crc'd before it is streamed to disk, and
> then the file re-read from disk to check the CRC.
> Is there a linux utility to do this or would it be simple to modify pg_dump
> to do this?
You can try to suitably combine "pg_dump --format=plain" with
"tee" and "md5sum" such that the output stream is diverted to
both a file and a pipe-into-CRC-algorithm and eventually
compare the pipe's sum with the sum generated from the file.
But the better solution might be to stream to a filesystem
that verifies disk writes immediately. Or to a suitable RAID
array.
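A minimal sketch of the tee/md5sum combination described above, assuming a POSIX shell with tee, md5sum and awk available. The function name save_and_verify, the ".md5" side file and the database name "mydb" are hypothetical and not part of pg_dump. The stream is hashed as it passes through the pipe, then the written file is re-read and hashed again.

```shell
#!/bin/sh
# save_and_verify OUTFILE
# Hypothetical helper: reads a stream on stdin, writes it to OUTFILE via tee,
# hashes the same bytes in the pipe, then re-reads OUTFILE and compares.
save_and_verify() {
    out=$1
    # tee writes the file; md5sum hashes the bytes that went through the pipe.
    stream_sum=$(tee "$out" | md5sum | awk '{print $1}')
    # Hash what can actually be read back from the file.
    file_sum=$(md5sum < "$out" | awk '{print $1}')
    # Keep the stream checksum next to the dump (md5sum format) for later checks.
    printf '%s  %s\n' "$stream_sum" "$out" > "$out.md5"
    if [ "$stream_sum" = "$file_sum" ]; then
        echo "OK $out"
    else
        echo "MISMATCH $out" >&2
        return 1
    fi
}

# Example ("mydb" is a placeholder database name):
#   pg_dump --format=plain mydb | save_and_verify mydb.sql
```

Two caveats with this sketch: a re-read right after writing may be served from the operating system's cache rather than from the disk, and in a plain sh pipeline the exit status of pg_dump itself is not checked.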
Regards,
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 30/11/2016 12:27, Karsten Hilbert wrote:
> You can try to suitably combine "pg_dump --format=plain" with
> "tee" and "md5sum" such that the output stream is diverted to
> both a file and a pipe-into-CRC-algorithm and eventually
> compare the pipe's sum with the sum generated from the file.
> But the better solution might be to stream to a filesystem
> that verifies disk writes immediately. Or to a suitable RAID
> array.
> Regards,
> Karsten
Thanks for this info Karsten. I will look into using "tee". As a matter
of interest, why does the format need to be plain?
Regarding the filesystem solution, the dump is currently written to a HP
RAID 10 array with an NTFS partition. What filesystems / raid arrays
have this ability?
Thanks.
On Wed, Nov 30, 2016 at 01:11:58PM +0000, Howard News wrote:
> Thanks for this info Karsten. I will look into using "tee". As a matter of
> interest, why does the format need to be plain?
Actually, any of the formats producing a _single_ file right
away are likely to work. So, any but "directory", I guess.
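To illustrate the point about formats, here is a small guard that could be run before starting a checksummed pipeline. It is only a sketch: the format names are the ones accepted by pg_dump --format, while the function name streams_to_stdout and the file names in the example are hypothetical.

```shell
#!/bin/sh
# streams_to_stdout FORMAT
# Hypothetical guard: succeeds only for pg_dump formats that are emitted as a
# single stream on stdout and can therefore be piped through tee and md5sum.
streams_to_stdout() {
    case $1 in
        p|plain|c|custom|t|tar) return 0 ;;
        d|directory)            return 1 ;;  # needs --file=DIR, writes many files
        *) echo "unknown format: $1" >&2; return 2 ;;
    esac
}

# Example with the custom format ("mydb" is a placeholder database name):
#   streams_to_stdout custom &&
#     pg_dump --format=custom mydb | tee mydb.dump | md5sum > mydb.dump.stream.md5
```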
> Regarding the filesystem solution, the dump is currently written to a HP
> RAID 10 array with an NTFS partition. What filesystems / raid arrays have
> this ability?
If you can't trust your RAID 10 (1 meaning mirrored) to
actually store what you told it to, you've got problems beyond
somehow verifying a pg_dump.
Regards,
Karsten
> If you can't trust your RAID 10 (1 meaning mirrored) to
> actually store what you told it to you've got problems beyond
> somehow verifying a pg_dump.
> Regards,
> Karsten
I am told RAID can only protect you against disk failure. File writes to
one or more of the disks in an array are not typically compared, so a
RAID array carries on until a disk fails or the error count gets to a
certain level. So RAID does not fully protect you from data corruption.
So you can't trust RAID!
On Wed, Nov 30, 2016 at 01:53:21PM +0000, Howard News wrote:
> I am told RAID can only protect you against disk failure. File writes to one
> or more of the disks in an array are not typically compared so a RAID array
> carries on until the disk failure, or error count gets to a certain level. So
> RAID does not fully protect you from data corruption.
True enough. So it seems you are referring to "silent data
corruption". Does this link help?
http://www.raidix.com/knowledge-base/silent-data-corruption/
This link also seems relevant:
http://stackoverflow.com/questions/13107783/pipe-output-to-two-different-commands
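Since silent corruption can also show up some time after a file was written correctly, one option is to keep the checksum taken at dump time and re-check the file later, for instance before relying on it for a restore. A minimal sketch, assuming md5sum and awk are available; the function name recheck_dump and the ".md5" side file (one line in md5sum format) are hypothetical conventions.

```shell
#!/bin/sh
# recheck_dump FILE
# Hypothetical helper: compares FILE against the checksum stored in FILE.md5,
# which is assumed to hold one line of the form "<hash>  <path>".
recheck_dump() {
    file=$1
    if [ ! -f "$file.md5" ]; then
        echo "no checksum recorded for $file" >&2
        return 2
    fi
    # First field of the first line is the hash recorded at dump time.
    expected=$(awk '{print $1; exit}' "$file.md5")
    # Hash of the file as it can be read back now.
    actual=$(md5sum < "$file" | awk '{print $1}')
    if [ "$expected" = "$actual" ]; then
        echo "OK $file"
    else
        echo "CORRUPT $file" >&2
        return 1
    fi
}

# Example, e.g. from a periodic job (file name is a placeholder):
#   recheck_dump mydb.sql || echo "backup needs attention"
```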
Regards,
Karsten
Also this
https://en.wikipedia.org/wiki/Silent_data_corruption#Countermeasures
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346