regclass and format('%I')

Started by Jason Dusekabout 11 years ago12 messagesgeneral

jason.dusek@gmail.com

about 11 years ago

Hi All,

The difference in how format handles `regclass` and `name` seems like an
inconsistency:

WITH conversions(casts, format, result) AS (
VALUES (ARRAY['name']::regtype[], '%I', format('%I',
name('select'))),
(ARRAY['name']::regtype[], '%L', format('%L',
name('select'))),
(ARRAY['name']::regtype[], '%s', format('%s',
name('select'))),
(ARRAY['regclass']::regtype[], '%I', format('%I',
regclass('select'))),
(ARRAY['regclass']::regtype[], '%L', format('%L',
regclass('select'))),
(ARRAY['regclass']::regtype[], '%s', format('%s',
regclass('select'))),
(ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
name(regclass('select')))),
(ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
name(regclass('select')))),
(ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
name(regclass('select'))))
) SELECT * FROM conversions;
casts | format | result
-----------------+--------+--------------
{name} | %I | "select"
{name} | %L | 'select'
{name} | %s | select
{regclass} | %I | """select"""
{regclass} | %L | '"select"'
{regclass} | %s | "select"
{regclass,name} | %I | """select"""
{regclass,name} | %L | '"select"'
{regclass,name} | %s | "select"

My assumption is that they both represent valid SQL identifiers, so it stands
to reason that `%I` should result in a valid identifier for both of them (or
neither one).

Kind Regards,
Jason Dusek

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

David G. Johnston

david.g.johnston@gmail.com

about 11 years ago

In reply to: Jason Dusek (#1)

Re: regclass and format('%I')

On Fri, Mar 13, 2015 at 12:18 PM, Jason Dusek <jason.dusek@gmail.com> wrote:

Hi All,

The difference in how format handles `regclass` and `name` seems like an
inconsistency:

WITH conversions(casts, format, result) AS (
VALUES (ARRAY['name']::regtype[], '%I', format('%I',
name('select'))),
(ARRAY['name']::regtype[], '%L', format('%L',
name('select'))),
(ARRAY['name']::regtype[], '%s', format('%s',
name('select'))),
(ARRAY['regclass']::regtype[], '%I', format('%I',
regclass('select'))),
(ARRAY['regclass']::regtype[], '%L', format('%L',
regclass('select'))),
(ARRAY['regclass']::regtype[], '%s', format('%s',
regclass('select'))),
(ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
name(regclass('select')))),
(ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
name(regclass('select')))),
(ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
name(regclass('select'))))
) SELECT * FROM conversions;
casts | format | result
-----------------+--------+--------------
{name} | %I | "select"
{name} | %L | 'select'
{name} | %s | select
{regclass} | %I | """select"""
{regclass} | %L | '"select"'
{regclass} | %s | "select"
{regclass,name} | %I | """select"""
{regclass,name} | %L | '"select"'
{regclass,name} | %s | "select"

My assumption is that they both represent valid SQL identifiers, so it
stands
to reason that `%I` should result in a valid identifier for both of them
(or
neither one).

All three of the %I results are valid identifiers.

regclass performs the same conversion that %I performs. But since the
output of the regclass conversion is a valid identifier, with
double-quotes, the %I adds another pair of double-quotes and doubles-up the
existing pair thus leaving you with 6.

<select> is a reserved word and thus can only be used as an identifier if
it is surrounded in double-quotes. name() doesn't care (not that it is
user-documented that I can find) about making its value usable as an
identifier so when its output goes through %I you get the expected value.

If you are going to use regclass you want to use %s to insert the result
into your string; not %I.

David J.

Jason Dusek

jason.dusek@gmail.com

about 11 years ago

In reply to: David G. Johnston (#2)

Re: regclass and format('%I')

It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass. My reasoning is as follows:

‘%I’ formats a something such that it is a valid identifier,

regclass is already a valid identifier,

therefore, do nothing.

Another line of reasoning:

If you format with ‘%s’ you are saying: I don’t care whether it’s a
valid identifier or literal or whatever, just put the string there,

but when we sub a regclass into a string, we want it to be a valid identifier,

therefore we should write ‘%I’ when subbing it, so as not to confuse
our readers,

therefore ‘%I’ should do nothing.

On 13 March 2015 at 12:42, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Fri, Mar 13, 2015 at 12:18 PM, Jason Dusek <jason.dusek@gmail.com> wrote:

Hi All,

The difference in how format handles `regclass` and `name` seems like an
inconsistency:

WITH conversions(casts, format, result) AS (
VALUES (ARRAY['name']::regtype[], '%I', format('%I',
name('select'))),
(ARRAY['name']::regtype[], '%L', format('%L',
name('select'))),
(ARRAY['name']::regtype[], '%s', format('%s',
name('select'))),
(ARRAY['regclass']::regtype[], '%I', format('%I',
regclass('select'))),
(ARRAY['regclass']::regtype[], '%L', format('%L',
regclass('select'))),
(ARRAY['regclass']::regtype[], '%s', format('%s',
regclass('select'))),
(ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
name(regclass('select')))),
(ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
name(regclass('select')))),
(ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
name(regclass('select'))))
) SELECT * FROM conversions;
casts | format | result
-----------------+--------+--------------
{name} | %I | "select"
{name} | %L | 'select'
{name} | %s | select
{regclass} | %I | """select"""
{regclass} | %L | '"select"'
{regclass} | %s | "select"
{regclass,name} | %I | """select"""
{regclass,name} | %L | '"select"'
{regclass,name} | %s | "select"

My assumption is that they both represent valid SQL identifiers, so it
stands
to reason that `%I` should result in a valid identifier for both of them
(or
neither one).

All three of the %I results are valid identifiers.

regclass performs the same conversion that %I performs. But since the
output of the regclass conversion is a valid identifier, with double-quotes,
the %I adds another pair of double-quotes and doubles-up the existing pair
thus leaving you with 6.

<select> is a reserved word and thus can only be used as an identifier if it
is surrounded in double-quotes. name() doesn't care (not that it is
user-documented that I can find) about making its value usable as an
identifier so when its output goes through %I you get the expected value.

If you are going to use regclass you want to use %s to insert the result
into your string; not %I.

David J.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Pavel Stehule

pavel.stehule@gmail.com

about 11 years ago

In reply to: Jason Dusek (#3)

Re: regclass and format('%I')

2015-03-14 10:09 GMT+01:00 Jason Dusek <jason.dusek@gmail.com>:

It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass. My reasoning is as follows:

‘%I’ formats a something such that it is a valid identifier,

regclass is already a valid identifier,

therefore, do nothing.

Another line of reasoning:

If you format with ‘%s’ you are saying: I don’t care whether it’s a
valid identifier or literal or whatever, just put the string there,

but when we sub a regclass into a string, we want it to be a valid
identifier,

therefore we should write ‘%I’ when subbing it, so as not to confuse
our readers,

therefore ‘%I’ should do nothing.

yes, it is true, when you use a safe type: regclass, regtype, you should
not to use %I due double quoting.

postgres=# select 16398::regclass ;
-[ RECORD 1 ]-------
regclass | "omega a"

postgres=# select format('>>%I<<<', 16398::regclass );
-[ RECORD 1 ]--------------
format | >>"""omega a"""<<<

Should be fixed

Regards

Pavel

Regards

Pavel

Show quoted text

On 13 March 2015 at 12:42, David G. Johnston <david.g.johnston@gmail.com>
wrote:

On Fri, Mar 13, 2015 at 12:18 PM, Jason Dusek <jason.dusek@gmail.com>

wrote:

Hi All,

The difference in how format handles `regclass` and `name` seems like an
inconsistency:

WITH conversions(casts, format, result) AS (
VALUES (ARRAY['name']::regtype[], '%I', format('%I',
name('select'))),
(ARRAY['name']::regtype[], '%L', format('%L',
name('select'))),
(ARRAY['name']::regtype[], '%s', format('%s',
name('select'))),
(ARRAY['regclass']::regtype[], '%I', format('%I',
regclass('select'))),
(ARRAY['regclass']::regtype[], '%L', format('%L',
regclass('select'))),
(ARRAY['regclass']::regtype[], '%s', format('%s',
regclass('select'))),
(ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
name(regclass('select')))),
(ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
name(regclass('select')))),
(ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
name(regclass('select'))))
) SELECT * FROM conversions;
casts | format | result
-----------------+--------+--------------
{name} | %I | "select"
{name} | %L | 'select'
{name} | %s | select
{regclass} | %I | """select"""
{regclass} | %L | '"select"'
{regclass} | %s | "select"
{regclass,name} | %I | """select"""
{regclass,name} | %L | '"select"'
{regclass,name} | %s | "select"

My assumption is that they both represent valid SQL identifiers, so it
stands
to reason that `%I` should result in a valid identifier for both of them
(or
neither one).

All three of the %I results are valid identifiers.

regclass performs the same conversion that %I performs. But since the
output of the regclass conversion is a valid identifier, with

double-quotes,

the %I adds another pair of double-quotes and doubles-up the existing

pair

thus leaving you with 6.

<select> is a reserved word and thus can only be used as an identifier

if it

is surrounded in double-quotes. name() doesn't care (not that it is
user-documented that I can find) about making its value usable as an
identifier so when its output goes through %I you get the expected value.

If you are going to use regclass you want to use %s to insert the result
into your string; not %I.

David J.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Jason Dusek (#3)

Re: regclass and format('%I')

Jason Dusek <jason.dusek@gmail.com> writes:

It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass.

You're mistaken. The operation of format() is first to convert the
non-format arguments to text strings, using the output functions for their
data types, and then to further process those text strings according to
the format specifiers:

%s -- no additional processing, just insert the string as-is.
%I -- apply double-quoting transformation to create a valid SQL identifier.
%L -- apply single-quoting transformation to create a valid SQL literal.

In the case of regclass, the output string is already double-quoted
as necessary, so applying %I to it produces a doubly double-quoted
string which is almost certainly not what you want. But it's not
format()'s job to be smarter than the user. If it tried to avoid
an extra pass of double quoting, it would get some cases wrong,
potentially creating security holes.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

David G. Johnston

david.g.johnston@gmail.com

about 11 years ago

In reply to: Jason Dusek (#3)

Re: regclass and format('%I')

On Saturday, March 14, 2015, Jason Dusek <jason.dusek@gmail.com> wrote:

It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass. My reasoning is as follows:

‘%I’ formats a something such that it is a valid identifier,

regclass is already a valid identifier,

therefore, do nothing.

Another line of reasoning:

If you format with ‘%s’ you are saying: I don’t care whether it’s a
valid identifier or literal or whatever, just put the string there,

but when we sub a regclass into a string, we want it to be a valid
identifier,

therefore we should write ‘%I’ when subbing it, so as not to confuse
our readers,

therefore ‘%I’ should do nothing.

I agree with the theory but adding type specific logic to format is going
to be difficult. The first thing the system does is convert all of the
inputs to text. Inside format() everything is text and so it has no way to
know that the type was regclass and should not be quoted again.

David J.

Jason Dusek

jason.dusek@gmail.com

about 11 years ago

In reply to: David G. Johnston (#6)

Re: regclass and format('%I')

On 14 March 2015 at 09:17, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Saturday, March 14, 2015, Jason Dusek <jason.dusek@gmail.com> wrote:

It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass. My reasoning is as follows:

‘%I’ formats a something such that it is a valid identifier,

regclass is already a valid identifier,

therefore, do nothing.

Another line of reasoning:

If you format with ‘%s’ you are saying: I don’t care whether it’s a
valid identifier or literal or whatever, just put the string there,

but when we sub a regclass into a string, we want it to be a valid
identifier,

therefore we should write ‘%I’ when subbing it, so as not to confuse
our readers,

therefore ‘%I’ should do nothing.

I agree with the theory but adding type specific logic to format is going to
be difficult. The first thing the system does is convert all of the inputs
to text. Inside format() everything is text and so it has no way to know
that the type was regclass and should not be quoted again.

Could it work to add type-specific logic for the cast from `regclass`
to `name`? It would be nice to have something formulaic: always format
identifiers with `%I`, always cast to `name` before formatting.

Kind Regards,
Jason Dusek

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Pavel Stehule

pavel.stehule@gmail.com

about 11 years ago

In reply to: Jason Dusek (#7)

Re: regclass and format('%I')

2015-03-15 3:09 GMT+01:00 Jason Dusek <jason.dusek@gmail.com>:

On 14 March 2015 at 09:17, David G. Johnston <david.g.johnston@gmail.com>
wrote:

On Saturday, March 14, 2015, Jason Dusek <jason.dusek@gmail.com> wrote:

It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass. My reasoning is as follows:

‘%I’ formats a something such that it is a valid identifier,

regclass is already a valid identifier,

therefore, do nothing.

Another line of reasoning:

If you format with ‘%s’ you are saying: I don’t care whether it’s a
valid identifier or literal or whatever, just put the string there,

but when we sub a regclass into a string, we want it to be a valid
identifier,

therefore we should write ‘%I’ when subbing it, so as not to confuse
our readers,

therefore ‘%I’ should do nothing.

I agree with the theory but adding type specific logic to format is

going to

be difficult. The first thing the system does is convert all of the

inputs

to text. Inside format() everything is text and so it has no way to know
that the type was regclass and should not be quoted again.

Could it work to add type-specific logic for the cast from `regclass`
to `name`? It would be nice to have something formulaic: always format
identifiers with `%I`, always cast to `name` before formatting.

I don't think, so it can help - first, it is workaround and it doesn't help
for somebody who doesn't read a documentation. Same effect you can get if
you write "doesn't use %I for regclass, regtype types", although this
sentence is strange. I agree with you so it is bug or minimally not user
friendly design.

A some good solution should be safe function quote_identif, that protect us
against double quoting. This logic should be elsewhere than inside "format"
function. I am thinking so we can do it. It breaks nothing. Implementation
should not be too much complex, because "new" function quote_identif can do
nothing for safe types, so it can take string as input parameter and typid
as second parameter.

Pavel

Show quoted text

Kind Regards,
Jason Dusek

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

David G. Johnston

david.g.johnston@gmail.com

about 11 years ago

In reply to: Tom Lane (#5)

Re: regclass and format('%I')

On Sat, Mar 14, 2015 at 8:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jason Dusek <jason.dusek@gmail.com> writes:

It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass.

You're mistaken. The operation of format() is first to convert the
non-format arguments to text strings, using the output functions for their
data types, and then to further process those text strings according to
the format specifiers:

%s -- no additional processing, just insert the string as-is.
%I -- apply double-quoting transformation to create a valid SQL identifier.
%L -- apply single-quoting transformation to create a valid SQL literal.

In the case of regclass, the output string is already double-quoted
as necessary, so applying %I to it produces a doubly double-quoted
string which is almost certainly not what you want. But it's not
format()'s job to be smarter than the user. If it tried to avoid
an extra pass of double quoting, it would get some cases wrong,
potentially creating security holes.

TBH I'm not all that convinced by this argument.

First, it is not being smarter than the user but allowing the user to
generalize their problem so that they do not need to take the nature of the
input data into account and can write a semantically meaningful pattern
string instead. The risk of them incorrectly choosing between %s or %I and
opening a security hole seems higher - if not as widespread - than any
string logic we could apply.

Second, presupposing the the transformation of the input must be a single
"thing", and that we are doing the %I conversion based upon our own
internal (or SQL's at the matter may be) definition of what it means to
"quote an identifier", we should be capable of noticing that the provided
input is already a single "thing" which has been escaped according to said
rules.

IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see how
it is possible for format to lay in a value at %I that is any more
insecure than the current behavior. If the input string already matches
that pattern then it could be output as-is without any additional risk and
with the positive benefit of making this case work as expected. The broken
case then exists when someone actually intends to name their identifier
<"something"> which then correctly becomes <"""something"""> on output.

Since there is a behavior change involved there needs to be a convincing
use-case for the new behavior in order to justify the effort to change it.

David J.

#10

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: David G. Johnston (#9)

Re: regclass and format('%I')

"David G. Johnston" <david.g.johnston@gmail.com> writes:

IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see how
it is possible for format to lay in a value at %I that is any more
insecure than the current behavior. If the input string already matches
that pattern then it could be output as-is without any additional risk and
with the positive benefit of making this case work as expected. The broken
case then exists when someone actually intends to name their identifier
<"something"> which then correctly becomes <"""something"""> on output.

But that's exactly the problem: you just broke a case that used to work.
format('%I') is not supposed to guess at what the user intends; it is
supposed to produce a string that, after being passed through identifier
parsing (dequoting or downcasing), will match the input. It is not
format's business to break that contract just because the input has
already got some double quotes in it.

An example of where this might be important is if you're trying to
construct a query with arbitrary column headers in the output. You
can do
format('... AS %I ...', ..., column_label, ...)
and be confident that the label will be exactly what you've got in
column_label. This proposed change would break that for labels that
happen to already have double-quotes --- but who are we to say that
that can't have been what you wanted?

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11

David G. Johnston

david.g.johnston@gmail.com

about 11 years ago

In reply to: Tom Lane (#10)

Re: regclass and format('%I')

On Sunday, March 15, 2015, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com <javascript:;>> writes:

IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see

how

it is possible for format to lay in a value at %I that is any more
insecure than the current behavior. If the input string already matches
that pattern then it could be output as-is without any additional risk

and

with the positive benefit of making this case work as expected. The

broken

case then exists when someone actually intends to name their identifier
<"something"> which then correctly becomes <"""something"""> on output.

But that's exactly the problem: you just broke a case that used to work.
format('%I') is not supposed to guess at what the user intends; it is
supposed to produce a string that, after being passed through identifier
parsing (dequoting or downcasing), will match the input. It is not
format's business to break that contract just because the input has
already got some double quotes in it.

An example of where this might be important is if you're trying to
construct a query with arbitrary column headers in the output. You
can do
format('... AS %I ...', ..., column_label, ...)
and be confident that the label will be exactly what you've got in
column_label. This proposed change would break that for labels that
happen to already have double-quotes --- but who are we to say that
that can't have been what you wanted?

regards, tom lane

Ok, but that doesn't impact security.

Contracts can be amended to be more practical and what is being suggested
is not as radical as you make it out to be. I'm with Jason and dislike the
"use %s" option and having a behavior of "quote if necessary" is not
unreasonable. Maybe it needs to be a different code for compatibility
reasons. In hindsight I'd make %Q the current unconditional quoting
behavior and use %I for conditional quoting but using %Q for that now would
at least make the behavior available.

David J.

#12

Jason Dusek

jason.dusek@gmail.com

about 11 years ago

In reply to: Tom Lane (#10)

Re: regclass and format('%I')

On 15 March 2015 at 08:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see

how

it is possible for format to lay in a value at %I that is any more
insecure than the current behavior. If the input string already matches
that pattern then it could be output as-is without any additional risk

and

with the positive benefit of making this case work as expected. The

broken

case then exists when someone actually intends to name their identifier
<"something"> which then correctly becomes <"""something"""> on output.

But that's exactly the problem: you just broke a case that used to work.
format('%I') is not supposed to guess at what the user intends; it is
supposed to produce a string that, after being passed through identifier
parsing (dequoting or downcasing), will match the input. It is not
format's business to break that contract just because the input has
already got some double quotes in it.

An example of where this might be important is if you're trying to
construct a query with arbitrary column headers in the output. You
can do
format('... AS %I ...', ..., column_label, ...)
and be confident that the label will be exactly what you've got in
column_label. This proposed change would break that for labels that
happen to already have double-quotes --- but who are we to say that
that can't have been what you wanted?

I agree with Tom that we shouldn't key off of contents in the string to
determine whether or not to quote. Introducing the behave I describe in an
intuitive way would require some kind of type-specific handling in
format(). I'm not sure what the cost of this is to the project, but David
makes the very reasonable point that imposing the burden of choosing
between `%s` and `%I` opens up the possibility of confusing vulnerabilities.

Kind Regards,
Jason Dusek