Unexpected behaviour of encode()

Started by Bill Moranabout 13 years ago7 messagesgeneral
Jump to latest
#1Bill Moran
wmoran@potentialtech.com

psql -U postgres
psql (9.2.3)
Type "help" for help.

postgres=# select encode('can''t', 'escape');
encode
--------
can't
(1 row)

I observed the same behaviour on one of our older systems (8.3.11) as well.

Am I missing something? I expected "can''t" as the output.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Bill Moran (#1)
Re: Unexpected behaviour of encode()

On Tue, Mar 26, 2013 at 12:47 PM, Bill Moran <wmoran@potentialtech.com> wrote:

psql -U postgres
psql (9.2.3)
Type "help" for help.

postgres=# select encode('can''t', 'escape');
encode
--------
can't
(1 row)

I observed the same behaviour on one of our older systems (8.3.11) as well.

Am I missing something? I expected "can''t" as the output.

why? that isn't what you passed it. the input string doubled single
quotes is converted to single single quote per spec. it's 'ghetto
escaping'.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Bill Moran
wmoran@potentialtech.com
In reply to: Merlin Moncure (#2)
Re: Unexpected behaviour of encode()

In response to Merlin Moncure <mmoncure@gmail.com>:

On Tue, Mar 26, 2013 at 12:47 PM, Bill Moran <wmoran@potentialtech.com> wrote:

psql -U postgres
psql (9.2.3)
Type "help" for help.

postgres=# select encode('can''t', 'escape');
encode
--------
can't
(1 row)

I observed the same behaviour on one of our older systems (8.3.11) as well.

Am I missing something? I expected "can''t" as the output.

why? that isn't what you passed it. the input string doubled single
quotes is converted to single single quote per spec. it's 'ghetto
escaping'.

Not sure what you mean by "ghetto secaping" ... but doubling up a '
is the SQL standard escaping method, AFAIK.

If I just do:
SELECT 'can''t'::text;
I get "can't" which is what I'd expect. I would then expect
encode to escape the ' somehow. Even c-style escaping, like
"can\'t" would have been less surprising to me.

If there's something I'm missing, I'm still missing it.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Merlin Moncure
mmoncure@gmail.com
In reply to: Bill Moran (#3)
Re: Unexpected behaviour of encode()

On Tue, Mar 26, 2013 at 1:36 PM, Bill Moran <wmoran@potentialtech.com> wrote:

I get "can't" which is what I'd expect. I would then expect
encode to escape the ' somehow.

nope -- encode/escape doesn't escape single quotes. it's not designed
to produce output that can be fed directly back into the database (at
least, not without escaping). yes, it (escape format) is pretty
terrible.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bill Moran (#3)
Re: Unexpected behaviour of encode()

Bill Moran <wmoran@potentialtech.com> writes:

If I just do:
SELECT 'can''t'::text;
I get "can't" which is what I'd expect. I would then expect
encode to escape the ' somehow. Even c-style escaping, like
"can\'t" would have been less surprising to me.

If there's something I'm missing, I'm still missing it.

The manual says that 'escape' encoding "merely outputs null bytes as
\000 and doubles backslashes". (The reason to double backslashes is to
make \000 unambiguous, of course.) The point of this is to sanitize
bytea data sufficiently to allow it to be transported as text. If you
want to transport it as a SQL literal, that's a tighter constraint that
would require some other escaping method, or at least passing the result
through something like quote_literal.

(Having said that, I wonder though if "escape" doesn't need more
thought. The output is only valid text in SQL_ASCII or single-byte
encodings, otherwise there's risk of encoding violations.)

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Jasen Betts
jasen@xnet.co.nz
In reply to: Bill Moran (#1)
Re: Unexpected behaviour of encode()

On 2013-03-26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The manual says that 'escape' encoding "merely outputs null bytes as
\000 and doubles backslashes".

(Having said that, I wonder though if "escape" doesn't need more
thought. The output is only valid text in SQL_ASCII or single-byte
encodings, otherwise there's risk of encoding violations.)

it does that too, since as long as I can remember.
I used decode-hex here so it'll work on older version of pg.

select encode(decode('ee5c2700aa','hex'),'escape');
encode
--------------------
\356\\'\000\252

to see the slashes doubled you need to run it through quote-literal

select quote_literal(encode(decode('ee5c2700aa','hex'),'escape'));
quote_literal
--------------------------
E'\\356\\\\''\\000\\252'

--
⚂⚃ 100% natural

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jasen Betts (#6)
Re: Unexpected behaviour of encode()

Jasen Betts <jasen@xnet.co.nz> writes:

On 2013-03-26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The manual says that 'escape' encoding "merely outputs null bytes as
\000 and doubles backslashes".

(Having said that, I wonder though if "escape" doesn't need more
thought. The output is only valid text in SQL_ASCII or single-byte
encodings, otherwise there's risk of encoding violations.)

it does that too, since as long as I can remember.
I used decode-hex here so it'll work on older version of pg.

Hah ... that's what I get for believing the manual ;-). The code
comments tell the truth:

* We must escape zero bytes and high-bit-set bytes to avoid generating
* text that might be invalid in the current encoding, or that might
* change to something else if passed through an encoding conversion
* (leading to failing to de-escape to the original bytea value).
* Also of course backslash itself has to be escaped.

It appears that the manual's statement was correct before 8.3, but
when somebody fixed the code to deal with the encoding issue, they
didn't fix the manual. I'll go improve that ...

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general