Large object corruption during 'piped' pg_restore

Started by Bosco Rama · almost 15 years ago · 8 messages
#1Bosco Rama
postgres@boscorama.com

Hi folks,

We've discovered (or possibly rediscovered?) a corruption when restoring large
objects.

If 'standard_conforming_strings = on' is set in our DB (which is required for
our app) then the piped restore method (e.g. pg_restore -O backup.dat | psql)
results in the large objects being corrupted.

However, when we use the DB-direct restoration method
(e.g. pg_restore -O -d dbname backup.dat) it works just fine.

If I take the output of the non-DB pg_restore command and edit it to set the
standard_conforming_strings off just prior to the large object creation and
then back on again immediately after the last one is created, it works.
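For reference, the hand-edited section of the script looks roughly like this (a sketch only; the OID and the lo_create/lo_open/lowrite statements are illustrative, not copied from a real dump):

```sql
SET standard_conforming_strings = off;  -- added by hand

SELECT pg_catalog.lo_create('16385');
SELECT pg_catalog.lo_open('16385', 131072);
SELECT pg_catalog.lowrite(0, '\\001\\002...');
SELECT pg_catalog.lo_close(0);

SET standard_conforming_strings = on;   -- added by hand
```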

It appears that the escaping of the strings used in the lowrite() functions
is not happening properly.

Is this something that is known and should just be avoided? Or is it something
that needs reporting?

TIA.

Bosco.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bosco Rama (#1)
Re: Large object corruption during 'piped' pg_restore

Bosco Rama <postgres@boscorama.com> writes:

We've discovered (or possibly rediscovered?) a corruption when restoring large
objects.

If 'standard_conforming_strings = on' is set in our DB (which is required for
our app) then the piped restore method (e.g. pg_restore -O backup.dat | psql)
results in the large objects being corrupted.

This vaguely reminds me of some long-since-fixed bug, but since you have
neglected to give any version details, it's hard to say for sure.
What PG version was the original server? The pg_dump you used to make
the backup? The pg_restore? The target server?

regards, tom lane

#3Bosco Rama
postgres@boscorama.com
In reply to: Tom Lane (#2)
Re: Large object corruption during 'piped' pg_restore

Hi Tom,

Tom Lane wrote:

Bosco Rama <postgres@boscorama.com> writes:

We've discovered (or possibly rediscovered?) a corruption when restoring large
objects.

If 'standard_conforming_strings = on' is set in our DB (which is required for
our app) then the piped restore method (e.g. pg_restore -O backup.dat | psql)
results in the large objects being corrupted.

This vaguely reminds me of some long-since-fixed bug, but since you have
neglected to give any version details, it's hard to say for sure.
What PG version was the original server? The pg_dump you used to make
the backup? The pg_restore? The target server?

Yeah, I only realized I hadn't added that as I saw it being sent. :-(

All servers and client tools involved are PG 8.4.6 on Ubuntu Server 10.04.1 LTS
with all current updates applied.

The dump file was made using: pg_dump -Fc -Z1 > backup.dat

Bosco.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bosco Rama (#3)
Re: Large object corruption during 'piped' pg_restore

Bosco Rama <postgres@boscorama.com> writes:

If 'standard_conforming_strings = on' is set in our DB (which is required for
our app) then the piped restore method (e.g. pg_restore -O backup.dat | psql)
results in the large objects being corrupted.

All servers and client tools involved are PG 8.4.6 on Ubuntu Server 10.04.1 LTS
with all current updates applied.

I've been able to replicate this in 8.4; it doesn't happen in 9.0
(but probably does in all 8.x versions).

The problem is that pg_dump (or in this case really pg_restore) is
relying on libpq's PQescapeBytea() to format the bytea literal that
will be given as argument to lowrite() during the restore. When
pg_dump is producing SQL directly, or when pg_restore is connected
to a database, PQescapeBytea() mooches the standard_conforming_strings
value from the active libpq connection and gets the right answer.
In the single case where pg_restore is producing SQL without ever
opening a database connection, PQescapeBytea doesn't know what to do
and defaults to the old non-standard-strings behavior. Unfortunately
pg_restore set standard_conforming_strings=on earlier in the script
(if it was set in the original source database) so you get the wrong
thing.
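The mismatch can be sketched with a small simulation (an assumption-laden Python model, not libpq or server source): it mimics the connectionless PQescapeBytea output and shows that the same literal decodes to different bytes depending on standard_conforming_strings.

```python
def pq_escape_bytea_legacy(data: bytes) -> str:
    """Mimic libpq's connectionless PQescapeBytea (the pre-9.0 default):
    output targets a NON-standard-conforming string literal, so every
    bytea-level backslash is doubled again for the string lexer."""
    out = []
    for b in data:
        if b == 0x27:                   # single quote -> ''
            out.append("''")
        elif b == 0x5C:                 # backslash -> \\ at bytea level,
            out.append("\\\\\\\\")      # doubled again for the lexer
        elif b < 0x20 or b >= 0x7F:
            out.append("\\\\%03o" % b)  # \ooo, backslash doubled for lexer
        else:
            out.append(chr(b))
    return "".join(out)

def lex_string_literal(lit: str, scs: bool) -> str:
    """What the server's lexer hands to the bytea input function.
    Simplified: the legacy escaper only ever emits doubled backslashes."""
    return lit if scs else lit.replace("\\\\", "\\")

def bytea_input(s: str) -> bytes:
    """Decode PostgreSQL's old escape-format bytea input (simplified)."""
    out, i = bytearray(), 0
    while i < len(s):
        if s[i] == "\\" and s[i + 1] == "\\":
            out.append(0x5C)                    # \\ -> one backslash byte
            i += 2
        elif s[i] == "\\":
            out.append(int(s[i + 1:i + 4], 8))  # \ooo octal escape
            i += 4
        else:
            out.append(ord(s[i]))
            i += 1
    return bytes(out)

blob = b"\x01abc"
lit = pq_escape_bytea_legacy(blob)

# With standard_conforming_strings = off the round trip is faithful...
assert bytea_input(lex_string_literal(lit, scs=False)) == blob
# ...but with it on, the doubled backslashes reach bytea_input
# un-collapsed and the blob is silently corrupted.
assert bytea_input(lex_string_literal(lit, scs=True)) == b"\\001abc"
```

In other words, one byte 0x01 comes back as the four bytes `\001`, which is exactly the corruption reported above.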

The bottom line is that pg_dump can't depend on libpq's PQescapeBytea,
but needs its own copy. We have in fact done that as of 9.0, which is
what I was vaguely remembering:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_0_BR [b1732111f] 2009-08-04 21:56:09 +0000

Fix pg_dump to do the right thing when escaping the contents of large objects.

The previous implementation got it right in most cases but failed in one:
if you pg_dump into an archive with standard_conforming_strings enabled, then
pg_restore to a script file (not directly to a database), the script will set
standard_conforming_strings = on but then emit large object data as
nonstandardly-escaped strings.

At the moment the code is made to emit hex-format bytea strings when dumping
to a script file. We might want to change to old-style escaping for backwards
compatibility, but that would be slower and bulkier. If we do, it's just a
matter of reimplementing appendByteaLiteral().

This has been broken for a long time, but given the lack of field complaints
I'm not going to worry about back-patching.
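The hex format that commit switched to can be sketched as follows (a simplified Python model, not the actual appendByteaLiteral): because the script also sets standard_conforming_strings = on, the single backslash in the `\x...` literal passes through the lexer verbatim and is decoded unambiguously by the bytea input function.

```python
def bytea_hex_literal(data: bytes) -> str:
    # Sketch of 9.0-style hex output for a script file: no octal
    # escapes, so nothing depends on how the lexer treats backslashes
    # beyond the single leading \x marker.
    return "'\\x%s'" % data.hex()

# One byte 0x01 plus 'ab' becomes an unambiguous hex literal.
assert bytea_hex_literal(b"\x01ab") == "'\\x016162'"
```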

I'm not sure whether this new complaint is enough reason to reconsider
back-patching. We cannot just backport the 9.0 patch, since it assumes
it can do bytea hex output --- we'd need to emit old style escaped
output instead. So it's a bit of work, and more to the point would
involve pushing poorly-tested code into stable branches. I doubt it
would go wrong, but in the worst-case scenario we might create failures
for blob-restore cases that work now.

So I'm not sure whether to fix it, or leave it as a known failure case
in old branches. Comments?

regards, tom lane

#5Vick Khera
vivek@khera.org
In reply to: Tom Lane (#4)
Re: Large object corruption during 'piped' pg_restore

On Thu, Jan 20, 2011 at 6:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So I'm not sure whether to fix it, or leave it as a known failure case
in old branches.  Comments?

Since there is a workaround, I think it is best to document it and
leave it as-is.

#6Bosco Rama
postgres@boscorama.com
In reply to: Tom Lane (#4)
Re: Large object corruption during 'piped' pg_restore

Tom Lane wrote:

So I'm not sure whether to fix it, or leave it as a known failure case
in old branches. Comments?

I understand the reluctance to fool with stable code. I have zero insight
into your installed versions distribution and backward compatibility needs
so any comment I may have here is purely selfish.

As an end user there is one area of the DB that I want to work correctly
100% of the time and that is the dump/restore tool(s). If it's not going
to work under certain circumstances it should at least tell me so and
fail. I don't think having the tool appear to work while corrupting the
data (even if documented as doing so) is a viable method of operation.

Just my $0.02

Bosco.

#7Robert Haas
robertmhaas@gmail.com
In reply to: Bosco Rama (#6)
Re: [HACKERS] Large object corruption during 'piped' pg_restore

On Fri, Jan 21, 2011 at 12:44 PM, Bosco Rama <postgres@boscorama.com> wrote:

Tom Lane wrote:

So I'm not sure whether to fix it, or leave it as a known failure case
in old branches.  Comments?

I understand the reluctance to fool with stable code.  I have zero insight
into your installed versions distribution and backward compatibility needs
so any comment I may have here is purely selfish.

As an end user there is one area of the DB that I want to work correctly
100% of the time and that is the dump/restore tool(s).  If it's not going
to work under certain circumstances it should at least tell me so and
fail.  I don't think having the tool appear to work while corrupting the
data (even if documented as doing so) is a viable method of operation.

Yeah, I lean toward saying we should back-patch this. Yeah, it'll be
lightly tested, but it's a pretty confined change, so it's unlikely to
break anything else. ISTM the worst case scenario is that it takes
two minor releases to get it right, and even that seems fairly
unlikely.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: [HACKERS] Large object corruption during 'piped' pg_restore

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jan 21, 2011 at 12:44 PM, Bosco Rama <postgres@boscorama.com> wrote:

Tom Lane wrote:

So I'm not sure whether to fix it, or leave it as a known failure case
in old branches.  Comments?

As an end user there is one area of the DB that I want to work correctly
100% of the time and that is the dump/restore tool(s).

Yeah, I lean toward saying we should back-patch this.

Fair enough, I'll go do it. I just wanted to hear at least one other
person opine that it was worth taking some risk for.

regards, tom lane