Gripe: bytea_output default => data corruption

Started by ljbover 15 years ago13 messagesdocsgeneral
Jump to latest
#1ljb
ljb9832@pobox.com
docsgeneral

Defaulting bytea output from the backend to use hex mode encoding, which is
incompatible with pre-9.0 interfaces, wasn't a friendly thing to do. The
default should have been escape mode. Or else you needed a big warning in
HISTORY that we must either change bytea_output, or upgrade all clients
before servers. Because using a 9.0 server with a 8.x libpq-based client
results in undetected data corruption when selecting BYTEA objects.

By default, the 9.0 server encodes a bytea using hex mode, but an 8.x
libpq-based client will decode that using escape mode, with no error detected
on either end. For example, start with "A", encode to "\x40" decode to "x40".

There are good reasons to break backward compatibility, like security or
standards compliance, but not performance. Please think twice next time you
consider breaking stuff just because you think the new way should be faster.

In reply to: ljb (#1)
docsgeneral
Re: Gripe: bytea_output default => data corruption

On 13/10/2010 01:37, ljb wrote:

Defaulting bytea output from the backend to use hex mode encoding, which is
incompatible with pre-9.0 interfaces, wasn't a friendly thing to do. The
default should have been escape mode. Or else you needed a big warning in
HISTORY that we must either change bytea_output, or upgrade all clients
before servers. Because using a 9.0 server with a 8.x libpq-based client
results in undetected data corruption when selecting BYTEA objects.

By default, the 9.0 server encodes a bytea using hex mode, but an 8.x
libpq-based client will decode that using escape mode, with no error detected
on either end. For example, start with "A", encode to "\x40" decode to "x40".

There are good reasons to break backward compatibility, like security or
standards compliance, but not performance. Please think twice next time you
consider breaking stuff just because you think the new way should be faster.

In fairness, it *is* flagged in the release note - it's the first item
under "data types" in the list of incompatibilities.

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

#3Dmitriy Igrishin
dmitigr@gmail.com
In reply to: Raymond O'Donnell (#2)
docsgeneral
Re: Gripe: bytea_output default => data corruption

Hey all,

And it is 9.0, i.e another major version. IMO the higher major versions not
necessarily must be 100% backward compatible. So, think twice next time
you update the major version.

2010/10/13 Raymond O'Donnell <rod@iol.ie>

On 13/10/2010 01:37, ljb wrote:

Defaulting bytea output from the backend to use hex mode encoding, which
is
incompatible with pre-9.0 interfaces, wasn't a friendly thing to do. The
default should have been escape mode. Or else you needed a big warning in
HISTORY that we must either change bytea_output, or upgrade all clients
before servers. Because using a 9.0 server with a 8.x libpq-based client
results in undetected data corruption when selecting BYTEA objects.

By default, the 9.0 server encodes a bytea using hex mode, but an 8.x
libpq-based client will decode that using escape mode, with no error
detected
on either end. For example, start with "A", encode to "\x40" decode to
"x40".

There are good reasons to break backward compatibility, like security or
standards compliance, but not performance. Please think twice next time
you
consider breaking stuff just because you think the new way should be
faster.

In fairness, it *is* flagged in the release note - it's the first item
under "data types" in the list of incompatibilities.

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
// Dmitriy.

#4ljb
ljb9832@pobox.com
In reply to: ljb (#1)
docsgeneral
Re: Gripe: bytea_output default => data corruption

rod@iol.ie wrote:

...
In fairness, it *is* flagged in the release note - it's the first item
under "data types" in the list of incompatibilities.

Quote:
"bytea output now appears in hex format by default (Peter Eisentraut)
The server parameter bytea_output can be used to select the
traditional output format if needed for compatibility."

This is inadequate, because it fails to warn that pre-9.0 clients will
decode the data incorrectly without reporting an error.

In reply to: ljb (#4)
docsgeneral
Re: Gripe: bytea_output default => data corruption

On 13/10/2010 22:03, ljb wrote:

rod@iol.ie wrote:

...
In fairness, it *is* flagged in the release note - it's the first item
under "data types" in the list of incompatibilities.

Quote:
"bytea output now appears in hex format by default (Peter Eisentraut)
The server parameter bytea_output can be used to select the
traditional output format if needed for compatibility."

This is inadequate, because it fails to warn that pre-9.0 clients will
decode the data incorrectly without reporting an error.

I thought myself that the word "incompatibility" was fairly
self-explanatory... :-)

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

#6Rajesh Kumar Mallah
mallah.rajesh@gmail.com
In reply to: ljb (#4)
docsgeneral
Re: Gripe: bytea_output default => data corruption

Dear Griper!,

fortunately someone showed a easy 'fix'

ALTER DATABASE foo SET bytea_output='escape' ;

Regds
Rajesh Kumar Mallah.

Show quoted text

On Wed, Oct 13, 2010 at 5:03 PM, ljb <ljb9832@pobox.com> wrote:

rod@iol.ie wrote:

...
In fairness, it *is* flagged in the release note - it's the first item
under "data types" in the list of incompatibilities.

Quote:
   "bytea output now appears in hex format by default (Peter Eisentraut)
    The server parameter bytea_output can be used to select the
    traditional output format if needed for compatibility."

This is inadequate, because it fails to warn that pre-9.0 clients will
decode the data incorrectly without reporting an error.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Bruce Momjian
bruce@momjian.us
In reply to: Rajesh Kumar Mallah (#6)
docsgeneral
Re: Gripe: bytea_output default => data corruption

Rajesh Kumar Mallah wrote:

Dear Griper!,

fortunately someone showed a easy 'fix'

ALTER DATABASE foo SET bytea_output='escape' ;

Yes, we mentioned that setting in the release notes too:

E.2.3.5. Data Types

*

Allow bytea values to be written in hex notation (Peter
Eisentraut)

The server parameter bytea_output controls whether hex or
traditional format is used for bytea output. Libpq's PQescapeByteaConn()
function automatically uses the hex format when connected to PostgreSQL
9.0 or newer servers.

The new hex format will be directly compatible with more
applications that use binary data, allowing them to store and retrieve
it without extra conversion. It is also significantly faster to read and
write than the traditional format.

While the "Incompatibilities" section mentions only the first paragraph,
this remention lower down has even more details. Not sure what else you
wanted us to do.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#8ljb
ljb9832@pobox.com
In reply to: Rajesh Kumar Mallah (#6)
docsgeneral
Re: Gripe: bytea_output default => data corruption

bruce@momjian.us wrote:

...
Yes, we mentioned that setting in the release notes too:
...
While the "Incompatibilities" section mentions only the first paragraph,
this remention lower down has even more details. Not sure what else you
wanted us to do.

Here's how I would have written that first paragraph. My addition starts
at "Libpq's PQunescapeBytea() function..."

| * Allow bytea values to be written in hex notation (Peter Eisentraut)
|
| The server parameter bytea_output controls whether hex or
| traditional format is used for bytea output. Libpq's PQescapeByteaConn()
| function automatically uses the hex format when connected to PostgreSQL
| 9.0 or newer servers. Libpq's PQunescapeBytea() function from version 9.0
| and newer will properly decode both hex and traditional format. However,
| in versions of Libpq older than 9.0, the PQunescapeByte() function can only
| decode traditional format, and will corrupt bytea data received in hex
| format without reporting an error. To avoid loss of data, you must either
| upgrade all clients to 9.0.x, or set the server's bytea_output parameter
| to 'escape'.

Again: My complaint is that pre-9.0 libpq-based clients mis-decode the new
default hex format bytea data without reporting an error, and this danger is
insufficiently documented in the release notes.

Speaking of documentation, go read the 9.0.x reference manual sections for
Libpq's PQescapeByteaConn() and PQunescapeBytea(). These descriptions of
escaping and unescaping are incorrect for 9.0, which can add to any confusion.

#9Bruce Momjian
bruce@momjian.us
In reply to: ljb (#8)
docsgeneral
Re: [GENERAL] Gripe: bytea_output default => data corruption

ljb wrote:

bruce@momjian.us wrote:

...
Yes, we mentioned that setting in the release notes too:
...
While the "Incompatibilities" section mentions only the first paragraph,
this remention lower down has even more details. Not sure what else you
wanted us to do.

Here's how I would have written that first paragraph. My addition starts
at "Libpq's PQunescapeBytea() function..."

| * Allow bytea values to be written in hex notation (Peter Eisentraut)
|
| The server parameter bytea_output controls whether hex or
| traditional format is used for bytea output. Libpq's PQescapeByteaConn()
| function automatically uses the hex format when connected to PostgreSQL
| 9.0 or newer servers. Libpq's PQunescapeBytea() function from version 9.0
| and newer will properly decode both hex and traditional format. However,
| in versions of Libpq older than 9.0, the PQunescapeByte() function can only
| decode traditional format, and will corrupt bytea data received in hex
| format without reporting an error. To avoid loss of data, you must either
| upgrade all clients to 9.0.x, or set the server's bytea_output parameter
| to 'escape'.

Again: My complaint is that pre-9.0 libpq-based clients mis-decode the new
default hex format bytea data without reporting an error, and this danger is
insufficiently documented in the release notes.

[ Thread moved to docs.]

Yes, I don't believe we were aware of the silent error behavior; I
certainly was not, and if I was, I would have mentioned it in the
release notes.

Speaking of documentation, go read the 9.0.x reference manual sections for
Libpq's PQescapeByteaConn() and PQunescapeBytea(). These descriptions of
escaping and unescaping are incorrect for 9.0, which can add to any confusion.

Yikes, you are right! I see:

http://www.postgresql.org/docs/9.0/static/libpq-exec.html#LIBPQ-EXEC-ESCAPE-STRING

PQescapeByteaConn
...
Certain byte values must be escaped (but all byte values can be escaped)
when used as part of a bytea literal in an SQL statement. In general, to
--> escape a byte, it is converted into the three digit octal number equal
to the octet value, and preceded by usually two backslashes. The single
quote (') and backslash (\) characters have special alternative escape
sequences. See Section 8.4 for more information. PQescapeByteaConn
performs this operation, escaping only the minimally required bytes.

Can someone suggest some updated wording? Thanks.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#10Vick Khera
vivek@khera.org
In reply to: ljb (#8)
docsgeneral
Re: Gripe: bytea_output default => data corruption

On Thu, Oct 21, 2010 at 7:12 PM, ljb <ljb9832@pobox.com> wrote:

Again: My complaint is that pre-9.0 libpq-based clients mis-decode the new
default hex format bytea data without reporting an error, and this danger
is
insufficiently documented in the release notes.

I had some hoops thru which I had to jump to make our app compatible with
both 8.x and 9.x so we could safely migrate our servers without having to
coordinate a code push. It wasn't that bad, but part of the problem is that
the DBD::Pg driver does not understand the new format... but it does
unescape the "\x" marker for me to a simple "x", since that follows the
traditional un-escaping rules for values returned from Pg. I don't like
overriding defaults in the DB settings unless I *really* have to.

So my code now looks something like this:

if (substr($value,2,100) =~ m/^[0-9a-f]+$/) {
# hex coding of bytea from Postgres 9.0+
$self->log_debug('bytea hex decode');
# remove the leading \x. DBD::Pg descapes \x to x, so just remove x
$value =~ s/^[^0-9a-f]+//;
$value = pack('H*',$value); # convert hex to bytes
} else {
$self->log_debug('bytea escape decoded');
# Postgres < 9.0 encode auto handled by DBD::Pg
}

Unfortunately, Greg is saying that he has no time right now to release an
updated DBD::Pg despite the fact that the code is written and merged into
the development source tree... so it may be a while before perl people are
happy. Luckily my above code will work even when DBD::Pg learns to do
auto-escaping of bytea new format.

At least the current $dbh->quote() method still seems to create acceptable
escaping for postgres 9.0 when you tell it you have a bytea field type.

#11ljb
ljb9832@pobox.com
In reply to: Rajesh Kumar Mallah (#6)
docsgeneral
Re: Gripe: bytea_output default => data corruption

vivek@khera.org wrote:

...
I had some hoops thru which I had to jump to make our app compatible with
both 8.x and 9.x so we could safely migrate our servers without having to
coordinate a code push. It wasn't that bad, but part of the problem is that
the DBD::Pg driver does not understand the new format... but it does
unescape the "\x" marker for me to a simple "x", since that follows the
traditional un-escaping rules for values returned from Pg. I don't like
overriding defaults in the DB settings unless I *really* have to.

Although DBD::Pg uses libpq, linking it with the PostgreSQL-9.0 libpq
doesn't fix the escape/unescape problem. (I'm sure you already know this.)
That's because DBD::Pg implements its own string and bytea escape and
unescape functions, for some reason, and does not use PQescapeStringConn
etc. (I wonder if that makes it vulnerable to the security problems fixed
8.1.4?)

I probably would have just gone ahead and deployed 9.0 servers with
"bytea_output='escape'" in their postgresql.conf files. Permanent work-around.

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
docsgeneral
Re: [GENERAL] Gripe: bytea_output default => data corruption

Bruce Momjian <bruce@momjian.us> writes:

Yikes, you are right! I see:

http://www.postgresql.org/docs/9.0/static/libpq-exec.html#LIBPQ-EXEC-ESCAPE-STRING

PQescapeByteaConn
...
Certain byte values must be escaped (but all byte values can be escaped)
when used as part of a bytea literal in an SQL statement. In general, to
--> escape a byte, it is converted into the three digit octal number equal
to the octet value, and preceded by usually two backslashes. The single
quote (') and backslash (\) characters have special alternative escape
sequences. See Section 8.4 for more information. PQescapeByteaConn
performs this operation, escaping only the minimally required bytes.

Can someone suggest some updated wording? Thanks.

I think we should simply remove the description of *how* the escaping is
performed, and state only that the function produces a suitably escaped
literal string. Anything else is not future-proof, and could someday
break the way this wording did.

regards, tom lane

#13ljb
ljb9832@pobox.com
In reply to: Bruce Momjian (#9)
docsgeneral
Re: [GENERAL] Gripe: bytea_output default => data corruption

tgl@sss.pgh.pa.us wrote:

I think we should simply remove the description of *how* the escaping is
performed, and state only that the function produces a suitably escaped
literal string. Anything else is not future-proof, and could someday
break the way this wording did.

Perhaps it would be best to remove escaping details here. But the
description of PQescapeBytea() might need to be rewritten, too. Without
describing exactly what PQescapeByteaConn() does, it is hard to understand
what PQescapeBytea() does not do, and why it therefore "might give the
wrong results".

I think the actual function behavior should be documented somewhere. Even
though it might change again.