BUG: pg_dump generates corrupted gzip file in Windows

Started by Kuntal Ghoshalmost 9 years ago11 messages
#1Kuntal Ghosh
kuntalghosh.2007@gmail.com
1 attachment(s)

Hello,
In Windows, if one needs to take a dump in plain text format (this is
the default option, or can be specified using -Fp) with some level of
compression (-Z[0-9]), an output file has to
be specified. Otherwise, if the output is redirected to stdout, it'll
create a corrupted dump (cmd is set to ASCII mode, so it'll put
carriage returns in the file).

I'm referring the following part of pg_dump code:

/*
* On Windows, we need to use binary mode to read/write non-text archive
* formats. Force stdin/stdout into binary mode if that is what we are
* using.
*/
#ifdef WIN32
if (fmt != archNull &&
(AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0))
{
if (mode == archModeWrite)
setmode(fileno(stdout), O_BINARY);
else
setmode(fileno(stdin), O_BINARY);
}
#endif

For plain-text format, fmt is set to archNull. In that case, the
binary mode will not be forced(I think). To fix this, I've attached a
patch which adds one extra check in the 'if condition' to check the
compression level. PFA.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Attachments:

0001-Fix-pg_dump-for-Windows-cmd.patchbinary/octet-stream; name=0001-Fix-pg_dump-for-Windows-cmd.patchDownload
From 4de1569a424b25437d51dcbfbfd84ee9edf113f1 Mon Sep 17 00:00:00 2001
From: Kuntal Ghosh <kuntal.ghosh@enterprisedb.com>
Date: Fri, 24 Mar 2017 11:10:51 +0530
Subject: [PATCH] Fix pg_dump for Windows cmd

In Windows, force stdin/stdout in binary mode if any compression
is needed, even in plain text-format.
---
 src/bin/pg_dump/pg_backup_archiver.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index dd08925..3eb0663 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -2336,12 +2336,12 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
 	AH->OF = stdout;
 
 	/*
-	 * On Windows, we need to use binary mode to read/write non-text archive
-	 * formats.  Force stdin/stdout into binary mode if that is what we are
-	 * using.
+	 * On Windows, we need to use binary mode to read/write non-text
+	 * archive/compressed formats.  Force stdin/stdout into binary
+	 * mode if that is what we are using.
 	 */
 #ifdef WIN32
-	if (fmt != archNull &&
+	if ((fmt != archNull || compression > 0) &&
 		(AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0))
 	{
 		if (mode == archModeWrite)
-- 
1.8.3.1

#2Kuntal Ghosh
kuntalghosh.2007@gmail.com
In reply to: Kuntal Ghosh (#1)
Re: BUG: pg_dump generates corrupted gzip file in Windows

On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:

Hello,
In Windows, if one needs to take a dump in plain text format (this is
the default option, or can be specified using -Fp) with some level of
compression (-Z[0-9]), an output file has to
be specified. Otherwise, if the output is redirected to stdout, it'll
create a corrupted dump (cmd is set to ASCII mode, so it'll put
carriage returns in the file).

To reproduce the issue, please use the following command in windows cmd:

pg_dump -Z 9 test > E:\test_xu.backup
pg_dump -Fp -Z 9 test > E:\test_xu.backup

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Craig Ringer
craig@2ndquadrant.com
In reply to: Kuntal Ghosh (#2)
Re: BUG: pg_dump generates corrupted gzip file in Windows

On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:

On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:

Hello,
In Windows, if one needs to take a dump in plain text format (this is
the default option, or can be specified using -Fp) with some level of
compression (-Z[0-9]), an output file has to
be specified. Otherwise, if the output is redirected to stdout, it'll
create a corrupted dump (cmd is set to ASCII mode, so it'll put
carriage returns in the file).

To reproduce the issue, please use the following command in windows cmd:

pg_dump -Z 9 test > E:\test_xu.backup
pg_dump -Fp -Z 9 test > E:\test_xu.backup

This is a known problem. It is not specific to PostgreSQL, it affects
any software that attempts to use stdin/stdout on Windows via cmd,
where it is not 8-bit clean.

We don't just refuse to run with stdout as a destination because it's
perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
as I know, tell whether it's being invoked by cmd or something else.

If you have concrete ideas on how to improve this they'd be welcomed.
Is there anywhere you expected to find info in the docs? Do you know
of a way to detect in Windows if the output stream is not 8-bit clean
from within the application program? ... other?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Kuntal Ghosh
kuntalghosh.2007@gmail.com
In reply to: Craig Ringer (#3)
Re: BUG: pg_dump generates corrupted gzip file in Windows

On Fri, Mar 24, 2017 at 12:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:

On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:

Hello,
In Windows, if one needs to take a dump in plain text format (this is
the default option, or can be specified using -Fp) with some level of
compression (-Z[0-9]), an output file has to
be specified. Otherwise, if the output is redirected to stdout, it'll
create a corrupted dump (cmd is set to ASCII mode, so it'll put
carriage returns in the file).

To reproduce the issue, please use the following command in windows cmd:

pg_dump -Z 9 test > E:\test_xu.backup
pg_dump -Fp -Z 9 test > E:\test_xu.backup

This is a known problem. It is not specific to PostgreSQL, it affects
any software that attempts to use stdin/stdout on Windows via cmd,
where it is not 8-bit clean.

We don't just refuse to run with stdout as a destination because it's
perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
as I know, tell whether it's being invoked by cmd or something else.

ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
mode, cmd pokes its nose and does CR / LF conversions on its own. So,
whenever we want compression on a plain-text dump file, we can set the
stdout mode to O_BINARY. Is it a wrong approach?

If you have concrete ideas on how to improve this they'd be welcomed.
Is there anywhere you expected to find info in the docs? Do you know
of a way to detect in Windows if the output stream is not 8-bit clean
from within the application program? ... other?

Actually, I'm not that familiar with windows environment. But, I
couldn't find any note to user in pg_dump documentation regarding the
issue. In cmd, if someone needs a plain-text dump in compressed
format, they should specify the output file, otherwise they may run
into the above problem. However, if a dump is corrupted due to the
above issue, a fix for that is provided in [1]http://www.gzip.org/ Use fixgz.c to remove the extra CR (carriage return) bytes.. Should we include this
in the documentation?

[1]: http://www.gzip.org/ Use fixgz.c to remove the extra CR (carriage return) bytes.
Use fixgz.c to remove the extra CR (carriage return) bytes.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Kuntal Ghosh
kuntalghosh.2007@gmail.com
In reply to: Kuntal Ghosh (#4)
Re: BUG: pg_dump generates corrupted gzip file in Windows

On Fri, Mar 24, 2017 at 2:17 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:

On Fri, Mar 24, 2017 at 12:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:

On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:

Hello,
In Windows, if one needs to take a dump in plain text format (this is
the default option, or can be specified using -Fp) with some level of
compression (-Z[0-9]), an output file has to
be specified. Otherwise, if the output is redirected to stdout, it'll
create a corrupted dump (cmd is set to ASCII mode, so it'll put
carriage returns in the file).

To reproduce the issue, please use the following command in windows cmd:

pg_dump -Z 9 test > E:\test_xu.backup
pg_dump -Fp -Z 9 test > E:\test_xu.backup

This is a known problem. It is not specific to PostgreSQL, it affects
any software that attempts to use stdin/stdout on Windows via cmd,
where it is not 8-bit clean.

We don't just refuse to run with stdout as a destination because it's
perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
as I know, tell whether it's being invoked by cmd or something else.

ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
mode, cmd pokes its nose and does CR / LF conversions on its own. So,
whenever we want compression on a plain-text dump file, we can set the
stdout mode to O_BINARY. Is it a wrong approach?

With the help from Ashutosh Sharma, I tested this in Windows
environment. Sadly, it still doesn't work. :( IMHO, we should document
the issue somewhere.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Robert Haas
robertmhaas@gmail.com
In reply to: Kuntal Ghosh (#5)
Re: BUG: pg_dump generates corrupted gzip file in Windows

On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:

ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
mode, cmd pokes its nose and does CR / LF conversions on its own. So,
whenever we want compression on a plain-text dump file, we can set the
stdout mode to O_BINARY. Is it a wrong approach?

With the help from Ashutosh Sharma, I tested this in Windows
environment. Sadly, it still doesn't work. :( IMHO, we should document
the issue somewhere.

Why not? I mean, if there's code there to force the output into
binary mode, does that not work for the -Fc case? And if it does work
for the -Fc case, then why doesn't it also work for -Z9?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#6)
Re: BUG: pg_dump generates corrupted gzip file in Windows

Hi,

On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:

ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
mode, cmd pokes its nose and does CR / LF conversions on its own. So,
whenever we want compression on a plain-text dump file, we can set the
stdout mode to O_BINARY. Is it a wrong approach?

With the help from Ashutosh Sharma, I tested this in Windows
environment. Sadly, it still doesn't work. :( IMHO, we should document
the issue somewhere.

Why not? I mean, if there's code there to force the output into
binary mode, does that not work for the -Fc case? And if it does work
for the -Fc case, then why doesn't it also work for -Z9?

I have re-verified the patch with the help of my colleague 'Neha
Sharma' on Windows server 2008 R2 machine and the fix looks to be
working fine. With the help of attached patch, i could see that when
pg_dump is ran in default mode (i.e. -Fp mode) with compression level

0 and output redirected to stdout, the dump file generated is not a

corrupted file and can be easily decompressed with the help of gzip
command and also it can be easily restored without any issues. Thanks.

Please note that earlier i was not able to reproduce the issue on my
windows setup and had to borrow Neha's machine to validate the fix.
Sorry, for the delayed response.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#7)
Re: BUG: pg_dump generates corrupted gzip file in Windows

On Mon, Jun 19, 2017 at 11:42 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Hi,

On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:

ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
mode, cmd pokes its nose and does CR / LF conversions on its own. So,
whenever we want compression on a plain-text dump file, we can set the
stdout mode to O_BINARY. Is it a wrong approach?

With the help from Ashutosh Sharma, I tested this in Windows
environment. Sadly, it still doesn't work. :( IMHO, we should document
the issue somewhere.

Why not? I mean, if there's code there to force the output into
binary mode, does that not work for the -Fc case? And if it does work
for the -Fc case, then why doesn't it also work for -Z9?

I have re-verified the patch with the help of my colleague 'Neha
Sharma' on Windows server 2008 R2 machine and the fix looks to be
working fine. With the help of attached patch,

Did you intended to attach the patch to this e-mail or are you
referring to Kuntal's patch up thread? If later, then it is better to
mention the link of mail which has a patch that you have verified.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#8)
Re: BUG: pg_dump generates corrupted gzip file in Windows

Hi,

On Mon, Jun 19, 2017 at 12:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 19, 2017 at 11:42 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Hi,

On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:

ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
mode, cmd pokes its nose and does CR / LF conversions on its own. So,
whenever we want compression on a plain-text dump file, we can set the
stdout mode to O_BINARY. Is it a wrong approach?

With the help from Ashutosh Sharma, I tested this in Windows
environment. Sadly, it still doesn't work. :( IMHO, we should document
the issue somewhere.

Why not? I mean, if there's code there to force the output into
binary mode, does that not work for the -Fc case? And if it does work
for the -Fc case, then why doesn't it also work for -Z9?

I have re-verified the patch with the help of my colleague 'Neha
Sharma' on Windows server 2008 R2 machine and the fix looks to be
working fine. With the help of attached patch,

Did you intended to attach the patch to this e-mail or are you
referring to Kuntal's patch up thread? If later, then it is better to
mention the link of mail which has a patch that you have verified.

I am referring to Kuntal's patch upthread. The link for the email
having the patch is,

/messages/by-id/CAGz5QCJPvbBjXAmJuGx1B_41yVCetAJhp7rtaDf7XQGWuB1GSw@mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Sharma (#9)
Re: BUG: pg_dump generates corrupted gzip file in Windows

Ashutosh Sharma <ashu.coek88@gmail.com> writes:

On Mon, Jun 19, 2017 at 12:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 19, 2017 at 11:42 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Why not? I mean, if there's code there to force the output into
binary mode, does that not work for the -Fc case? And if it does work
for the -Fc case, then why doesn't it also work for -Z9?

I have re-verified the patch with the help of my colleague 'Neha
Sharma' on Windows server 2008 R2 machine and the fix looks to be
working fine. With the help of attached patch,

Did you intended to attach the patch to this e-mail or are you
referring to Kuntal's patch up thread? If later, then it is better to
mention the link of mail which has a patch that you have verified.

I am referring to Kuntal's patch upthread. The link for the email
having the patch is,
/messages/by-id/CAGz5QCJPvbBjXAmJuGx1B_41yVCetAJhp7rtaDf7XQGWuB1GSw@mail.gmail.com

That patch looks reasonable to me, will push.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#10)
Re: BUG: pg_dump generates corrupted gzip file in Windows

I wrote:

That patch looks reasonable to me, will push.

On closer inspection, the patch did contain a bug: it tested for
compression being active with "compression > 0", which is the wrong
thing --- everyplace else in pg_dump tests with "compression != 0".
This is important because Z_DEFAULT_COMPRESSION is -1. The net
effect is that the patch would have appeared to fix the bug when
writing "-Z1".."-Z9", but not when writing just "-Z". Possibly
this explains why Kuntal couldn't confirm success at one point
upthread.

Pushed with that correction and some wordsmithing on the comment.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers