DBD::Pg BYTEA Character Escaping

Started by David E. Wheelerover 24 years ago10 messagesgeneral
Jump to latest
#1David E. Wheeler
david@kineticode.com

Hi All,

I recently noticed that the DBD::Pg Perl module appears to be doing a
lot of work escaping characters for BYTEA data types. It's importing
Perl's POSIX support to check every character in BYTEA data with
isprint(), and replacing it with its octal representation if its not
printable.

However, there are two issues with this approach. The first is
efficiency. The way the code is currently written in DBD::Pg does a
*lot* of unnecessary work, and I'd like to suggest an optimization
(based on discussions on this topic on the Fun with Perl mail list:
http://archive.develooper.com/fwp%40perl.org/msg00458.html -- patch
supplied upon request).

The second issue, however, is that it doesn't appear to me that it's
even necessary that non-printable characters be replaced. Although Alex
Pilosov says that such an approach is needed:

http://www.geocrawler.com/mail/msg.php3?msg_id=6509224&list=10

Joe Conway found that there were only three characters ('\', "'", and
"\0") that needed to be escaped, and it was those three characters that
Bruce Momjian documented for the forthcoming 7.2 release:

http://www.geocrawler.com/mail/msg.php3?msg_id=6547225&list=10

If that's true, then any solution escaping non-printable characters is
overkill, and therefore only the three characters need to be escaped.
And since it looks like two of them ('\' and "'") are already escaped
before the non-printable characters are escaped in DBD::Pg, it then it
seems that this code:

if ($data_type == DBI::SQL_BINARY ||
$data_type == DBI::SQL_VARBINARY ||
$data_type == DBI::SQL_LONGVARBINARY) {
$str=join("", map { isprint($_)?$_:'\\'.sprintf("%03o",ord($_)) }
split //, $str);
}

Could be changed to:

s/\0/\\000/g if $data_type == DBI::SQL_BINARY ||
$data_type == DBI::SQL_VARBINARY ||
$data_type == DBI::SQL_LONGVARBINARY;

So, the reason I'm posting this query is because I'd like to get
confirmation, if possible, on this conclusion. Based on the feedback I
receive, I will submit patches to the DBD::Pg maintainer.

Thanks!

David

PS: If discussion of this issue needs to be moved to the Hackers list,
I'll be happy to do so. I just thought I'd try here, first.

--
David Wheeler AIM: dwTheory
David@Wheeler.net ICQ: 15726394
Yahoo!: dew7e
Jabber: Theory@jabber.org

#2Bruce Momjian
bruce@momjian.us
In reply to: David E. Wheeler (#1)
Re: DBD::Pg BYTEA Character Escaping

If that's true, then any solution escaping non-printable characters is
overkill, and therefore only the three characters need to be escaped.
And since it looks like two of them ('\' and "'") are already escaped
before the non-printable characters are escaped in DBD::Pg, it then it
seems that this code:

if ($data_type == DBI::SQL_BINARY ||
$data_type == DBI::SQL_VARBINARY ||
$data_type == DBI::SQL_LONGVARBINARY) {
$str=join("", map { isprint($_)?$_:'\\'.sprintf("%03o",ord($_)) }
split //, $str);
}

Could be changed to:

s/\0/\\000/g if $data_type == DBI::SQL_BINARY ||
$data_type == DBI::SQL_VARBINARY ||
$data_type == DBI::SQL_LONGVARBINARY;

So, the reason I'm posting this query is because I'd like to get
confirmation, if possible, on this conclusion. Based on the feedback I
receive, I will submit patches to the DBD::Pg maintainer.

Thanks!

David

PS: If discussion of this issue needs to be moved to the Hackers list,
I'll be happy to do so. I just thought I'd try here, first.

Yes, you only need to escape NULL for bytea. The above patch looks fine
to me. We can add it to 7.3.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#1)
Re: DBD::Pg BYTEA Character Escaping

David Wheeler <david@wheeler.net> writes:

If that's true, then any solution escaping non-printable characters is
overkill, and therefore only the three characters need to be escaped.

Check ...

Could be changed to:

s/\0/\\000/g if $data_type == DBI::SQL_BINARY ||
$data_type == DBI::SQL_VARBINARY ||
$data_type == DBI::SQL_LONGVARBINARY;

Offhand I don't think you even need the check on the datatype; wouldn't
it be faster and safer to do the substitution unconditionally? I can't
see that there are any cases that work without this substitution and
fail with it.

regards, tom lane

#4Alex Pilosov
alex@pilosoft.com
In reply to: David E. Wheeler (#1)
Re: DBD::Pg BYTEA Character Escaping

On 17 Nov 2001, David Wheeler wrote:

The second issue, however, is that it doesn't appear to me that it's
even necessary that non-printable characters be replaced. Although Alex
Pilosov says that such an approach is needed:

http://www.geocrawler.com/mail/msg.php3?msg_id=6509224&amp;list=10

I didn't say it was needed :) I just had the easy way out and escaped
everything that might possibly need to be escaped.

Joe Conway found that there were only three characters ('\', "'", and
"\0") that needed to be escaped, and it was those three characters that
Bruce Momjian documented for the forthcoming 7.2 release:

Right.

http://www.geocrawler.com/mail/msg.php3?msg_id=6547225&amp;list=10

If that's true, then any solution escaping non-printable characters is
overkill, and therefore only the three characters need to be escaped.
And since it looks like two of them ('\' and "'") are already escaped
before the non-printable characters are escaped in DBD::Pg, it then it
seems that this code:

if ($data_type == DBI::SQL_BINARY ||
$data_type == DBI::SQL_VARBINARY ||
$data_type == DBI::SQL_LONGVARBINARY) {
$str=join("", map { isprint($_)?$_:'\\'.sprintf("%03o",ord($_)) }
split //, $str);
}

Could be changed to:

s/\0/\\000/g if $data_type == DBI::SQL_BINARY ||
$data_type == DBI::SQL_VARBINARY ||
$data_type == DBI::SQL_LONGVARBINARY;

Yep.

So, the reason I'm posting this query is because I'd like to get
confirmation, if possible, on this conclusion. Based on the feedback I
receive, I will submit patches to the DBD::Pg maintainer.

Go right ahead.

-alex

#5David E. Wheeler
david@kineticode.com
In reply to: Tom Lane (#3)
Re: DBD::Pg BYTEA Character Escaping

On Sat, 2001-11-17 at 21:46, Tom Lane wrote:

Offhand I don't think you even need the check on the datatype; wouldn't
it be faster and safer to do the substitution unconditionally? I can't
see that there are any cases that work without this substitution and
fail with it.

In that case, I suggest the attached, even more simplified patch. It
assumes that the quote character stored in $lp in the existing version
is always "'" (a safe assumption for DBD:Pg, I think, in a way it wasn't
for DBI, from whence the code for the LITERAL_PREFIX and LITERAL_SUFFIX
was copied into DBD::Pg), and therefore performs the escaping of the
three agreed-upon characters for all non-numeric data types.

If you agree that this tack makes sense, I'll submit the patch to Edmund
Mergl.

Regards,

Daviid

--
David Wheeler AIM: dwTheory
David@Wheeler.net ICQ: 15726394
Yahoo!: dew7e
Jabber: Theory@jabber.org

Attachments:

DBD-Pg.difftext/plain; charset=ISO-8859-1Download+17-37
#6David E. Wheeler
david@kineticode.com
In reply to: Alex Pilosov (#4)
Re: DBD::Pg BYTEA Character Escaping

On Sun, 2001-11-18 at 08:40, Alex Pilosov wrote:

I didn't say it was needed :) I just had the easy way out and escaped
everything that might possibly need to be escaped.

Ah, gotcha!

<snip />

So, the reason I'm posting this query is because I'd like to get
confirmation, if possible, on this conclusion. Based on the feedback I
receive, I will submit patches to the DBD::Pg maintainer.

Go right ahead.

Thanks. See other recent messages with patches attached.

Regards,

David
--
David Wheeler AIM: dwTheory
David@Wheeler.net ICQ: 15726394
Yahoo!: dew7e
Jabber: Theory@jabber.org

#7David E. Wheeler
david@kineticode.com
In reply to: David E. Wheeler (#5)
Re: DBD::Pg BYTEA Character Escaping

On Sun, 2001-11-18 at 13:21, David Wheeler wrote:

If you agree that this tack makes sense, I'll submit the patch to Edmund
Mergl.

<snip />

Actually, I've come up with an even simpler solution, and it should be
faster for some datatypes, too. It's attached.

Thanks,

David

--
David Wheeler AIM: dwTheory
David@Wheeler.net ICQ: 15726394
Yahoo!: dew7e
Jabber: Theory@jabber.org

Attachments:

DBD-Pg.difftext/plain; charset=ISO-8859-1Download+14-39
#8Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: Tom Lane (#3)
7.1.2: Backend message type 0x44 when selecting from a table

select version();
version
---------------------------------------------------------------------
PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
(1 row)

(doh, I thought I was running 7.1.3!).

SELECT * from arch_ranks_arch4 ;
Backend message type 0x44 arrived while idle
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Trying to pg_dump the table gives me this:

<snipped>
1526 2001-11-18 07:22:15+08 0 BLANK 0 0
0 0
1414414664 4714-11--2147483624 BC 218103907

\.
ERROR: MemoryContextAlloc: invalid request size 1163153238
PQendcopy: resetting connection
SQL query to dump the contents of Table 'arch_ranks_arch4' did not execute
correctly. After we read all th
e table contents from the backend, PQendcopy() failed. Explanation from
backend: 'ERROR: MemoryContextAll
oc: invalid request size 1163153238
'.
The query was: 'COPY "arch_ranks_arch4" TO stdout;
'.

Another similar table is ok. I have vacuumed - no errors, but the problem
still remained.

These tables have had a lot of updates on them (24/7 every second or so).
Fortunately it's not critical: just test tables.

The type:
CREATE TABLE "arch_ranks_arch4" (
"id" integer,
"updated" timestamp with time zone,
"valid" integer,
"name" text,
"specialty" text,
"status" text,
"ranking" integer,
"power" integer,
"land" integer,
"forts" integer,
"description" text
);

I've truncated the table and it runs ok now.

The server itself hasn't crashed - up for 69 days.

Is it a fixed bug?

Regards,
Link.

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Lincoln Yeoh (#8)
Re: 7.1.2: Backend message type 0x44 when selecting from a table

Lincoln Yeoh <lyeoh@pop.jaring.my> writes:

Trying to pg_dump the table gives me this:
<snipped>
ERROR: MemoryContextAlloc: invalid request size 1163153238
PQendcopy: resetting connection

This looks like a corrupted-data problem ...

I've truncated the table and it runs ok now.

... but the evidence is now gone, so we can't really probe into it
further :-(. You might be well advised to run some hardware diagnostics
to see if you have any RAM problems, flaky disk controllers, that sort
of thing. Not that Postgres has no bugs, of course, but we've seen
quite a number of data-corruption reports that ultimately traced to
hardware problems.

The behavior during a SELECT seems odd also:

SELECT * from arch_ranks_arch4 ;
Backend message type 0x44 arrived while idle
pqReadData() -- backend closed the channel unexpectedly.

This suggests that libpq and the backend got out of sync somehow,
but I thought we'd fixed that class of problems years ago. If you
can reproduce this it'd be worth looking into.

regards, tom lane

#10Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: Tom Lane (#9)
Re: 7.1.2: Backend message type 0x44 when selecting

At 11:18 PM 18-11-2001 -0500, Tom Lane wrote:

Lincoln Yeoh <lyeoh@pop.jaring.my> writes:

Trying to pg_dump the table gives me this:
<snipped>
ERROR: MemoryContextAlloc: invalid request size 1163153238
PQendcopy: resetting connection

This looks like a corrupted-data problem ...

Yah. Timestamp was BC :).

I've truncated the table and it runs ok now.

... but the evidence is now gone, so we can't really probe into it
further :-(. You might be well advised to run some hardware diagnostics
to see if you have any RAM problems, flaky disk controllers, that sort
of thing. Not that Postgres has no bugs, of course, but we've seen
quite a number of data-corruption reports that ultimately traced to
hardware problems.

The behavior during a SELECT seems odd also:

SELECT * from arch_ranks_arch4 ;
Backend message type 0x44 arrived while idle
pqReadData() -- backend closed the channel unexpectedly.

This suggests that libpq and the backend got out of sync somehow,
but I thought we'd fixed that class of problems years ago. If you
can reproduce this it'd be worth looking into.

I'll try to upgrade to 7.1.3. Then if it happens again it'll mean more.

Cheerio,
Link.