psql/pg_dump vs. dollar signs in identifiers

Started by Tom Lane over 18 years ago, 5 messages
#1Tom Lane
tgl@sss.pgh.pa.us

An example being discussed on the jdbc list led me to try this:

regression=# create table a$b$c (f1 int);
CREATE TABLE
regression=# \d a$b$c
Did not find any relation named "a$b$c".

It works if you use quotes:

regression=# \d "a$b$c"
    Table "public.a$b$c"
 Column |  Type   | Modifiers
--------+---------+-----------
 f1     | integer |

The reason it doesn't work without quotes is that processSQLNamePattern()
thinks this:

* Inside double quotes, or at all times if force_escape is true,
* quote regexp special characters with a backslash to avoid
* regexp errors. Outside quotes, however, let them pass through
* as-is; this lets knowledgeable users build regexp expressions
* that are more powerful than shell-style patterns.

and of course $ is a regexp special character, so it bollixes up the
match.
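
The failure can be reproduced outside psql. The following is a minimal sketch that uses Python's re module as a stand-in for the server's regexp engine (an assumption; the two engines differ in details) and hard-codes the two patterns that the unquoted and quoted forms would plausibly produce:

```python
import re

name = "a$b$c"

# Unquoted input: "$" passes through and acts as an end-of-string anchor.
unescaped = "^(a$b$c)$"
# Quoted input: "$" is backslash-escaped and matches a literal dollar sign.
escaped = r"^(a\$b\$c)$"

unescaped_matches = re.search(unescaped, name) is not None
escaped_matches = re.search(escaped, name) is not None
print(unescaped_matches, escaped_matches)
```

The first pattern demands end-of-string immediately after "a", so it can never match the table name.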

Now, because we surround the pattern with ^...$ anyway, I can't offhand
see a use-case for putting $ with its regexp meaning into the pattern.
And since we do allow $ as a non-first character of identifiers, there
is a use-case for expecting it to be treated like an ordinary character.

So I'm thinking that $ ought to be quoted whether it's inside double
quotes or not. This change would affect psql's describe commands as
well as pg_dump -t and -n patterns.
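
As a rough illustration of the proposed rule, here is a simplified Python model of the escaping step. It is not the C code in processSQLNamePattern(); the function name quote_pattern and the flag always_escape_dollar are hypothetical, and case folding, doubled quotes, and wildcard translation are left out.

```python
import re

REGEX_SPECIALS = set("|*+?()[]{}.^$\\")

def quote_pattern(pattern: str, always_escape_dollar: bool = True) -> str:
    """Hypothetical, simplified model of the escaping rule (not psql's code)."""
    out = []
    in_quotes = False
    for ch in pattern:
        if ch == '"':
            in_quotes = not in_quotes      # quotes toggle state and are dropped
        elif ch == "$" and always_escape_dollar:
            out.append("\\$")              # proposed: "$" is always literal
        elif in_quotes and ch in REGEX_SPECIALS:
            out.append("\\" + ch)          # existing: escape specials in quotes
        else:
            out.append(ch)                 # existing: pass through outside quotes
    return "^(" + "".join(out) + ")$"

old_style = quote_pattern("a$b$c", always_escape_dollar=False)
new_style = quote_pattern("a$b$c")
print(old_style, new_style)
```

With the flag off, the unquoted pattern keeps its bare "$" and cannot match; with it on, the unquoted and quoted spellings produce the same regexp.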

Comments?

regards, tom lane

#2Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#1)
Re: psql/pg_dump vs. dollar signs in identifiers

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Now, because we surround the pattern with ^...$ anyway, I can't offhand
> see a use-case for putting $ with its regexp meaning into the pattern.

It's possible to still usefully use $ in the regexp, but its existence at the
end means there should always be a way to write the regexp without needing
another one inside.

Incidentally, are these really regexps? I always thought they were globs.
And experiments seem to back up my memory:

postgres=# \d foo*
   Table "public.foo^bar"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer |

postgres=# \d foo.*
Did not find any relation named "foo.*".

> Comments?

The first half of the logic applies to ^ as well. There's no use case for
regexps using ^ inside. You would have to use quotes to create the table but
we could have \d foo^* work:

postgres=# \d foo^*
Did not find any relation named "foo^*".

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#2)
Re: psql/pg_dump vs. dollar signs in identifiers

Gregory Stark <stark@enterprisedb.com> writes:

> Incidentally, are these really regexps? I always thought they were globs.

They're regexps under the hood, but we treat . as a schema separator
and translate * to .*, which makes it look like mostly a glob scheme.
But you can make use of brackets, |, +, ...
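
A minimal Python sketch of the translation just described, for illustration only. The helper name translate is hypothetical, quoting and case folding are ignored, and Python's re module stands in for the server's regexp engine.

```python
import re
from typing import Optional, Tuple

def translate(pattern: str) -> Tuple[Optional[str], str]:
    """Hypothetical simplification: returns (schema_regex, name_regex)."""
    def to_regex(part: str) -> str:
        # "*" becomes ".*"; everything else passes through as regexp syntax.
        return "^(" + part.replace("*", ".*") + ")$"

    # A "." separates the schema pattern from the name pattern.
    schema, sep, name = pattern.rpartition(".")
    return (to_regex(schema) if sep else None, to_regex(name))

print(translate("foo*"))        # name pattern only
print(translate("foo.*"))       # schema "foo", any name
print(translate("foo|ba[rz]"))  # alternation and brackets pass through
```

Under this model, "foo.*" asks for every relation in a schema named foo rather than for names starting with "foo", which would be consistent with the failed lookup shown in message #2.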

regards, tom lane

#4Jim C. Nasby
decibel@decibel.org
In reply to: Gregory Stark (#2)
Re: psql/pg_dump vs. dollar signs in identifiers

On Mon, Jul 09, 2007 at 07:04:27PM +0100, Gregory Stark wrote:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

>> Now, because we surround the pattern with ^...$ anyway, I can't offhand
>> see a use-case for putting $ with its regexp meaning into the pattern.
>
> It's possible to still usefully use $ in the regexp, but its existence at the
> end means there should always be a way to write the regexp without needing
> another one inside.

Unless you're doing multi-line regex, what's the point of a $ anywhere
but the end of the expression? Am I missing something? Likewise with ^.

I'm inclined to escape $ as Tom suggested.
--
Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#5Gregory Stark
stark@enterprisedb.com
In reply to: Jim C. Nasby (#4)
Re: psql/pg_dump vs. dollar signs in identifiers

"Jim C. Nasby" <decibel@decibel.org> writes:

> Unless you're doing multi-line regex, what's the point of a $ anywhere
> but the end of the expression? Am I missing something? Likewise with ^.

Leaving out the backslashes, you can do things like (foo$|baz|qux)(baz|qux|)
to say that all 9 combinations of those two tokens are valid except that foo
must be followed by the empty second half.

But it can always be refactored into something more normal like
(foo|((baz|qux)(baz|qux)?))
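
The equivalence of the two forms can be spot-checked with a short Python sketch. It assumes Python's re module behaves like the server's regexp engine for these patterns and wraps both in the same ^(...)$ anchors; the set of candidate strings is chosen here purely for illustration.

```python
import re
from itertools import product

# Every concatenation of two tokens, where a token may also be empty.
tokens = ["foo", "baz", "qux", ""]
candidates = {a + b for a, b in product(tokens, repeat=2)}

with_dollar = re.compile(r"^((foo$|baz|qux)(baz|qux|))$")
refactored = re.compile(r"^((foo|((baz|qux)(baz|qux)?)))$")

matched_a = {s for s in candidates if with_dollar.search(s)}
matched_b = {s for s in candidates if refactored.search(s)}
print(sorted(matched_a))
```

Both patterns accept the same seven strings from this candidate set and reject the rest, including "foobaz" and "fooqux".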

> I'm inclined to escape $ as Tom suggested.

Yeah, I have a tendency to look for the most obscure counter-example if only
to be sure I really understand precisely how obscure it is. I do agree that
it's not a realistic concern. Especially since I never even realized we
handled regexps here at all :)

IIRC some regexp engines don't actually treat $ specially except at the end of
the regexp at all. Tom's just suggesting doing the same thing here where
complicated regexps are even *less* likely and dollars as literals more.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com