New string functions; initdb required
I've just committed changes which implement three SQL99 functions and
operators. OVERLAY() allows substituting a string into another string,
SIMILAR TO is an operator for pattern matching, and a new variant of
SUBSTRING() accepts a pattern to match.
Regression tests have been augmented and pass. Docs have been updated.
The system catalogs were updated, so it is initdb time. Details from the
cvs log below...
- Thomas
Implement SQL99 OVERLAY(). Allows substitution of a substring in a
string.
Implement SQL99 SIMILAR TO as a synonym for our existing operator "~".
Implement SQL99 regular expression SUBSTRING(string FROM pat FOR
escape).
Extend the definition to make the FOR clause optional.
Define textregexsubstr() to actually implement this feature.
Update the regression test to include these new string features.
All tests pass.
Rename the regular expression support routines from "pg95_xxx" to
"pg_xxx".
Define CREATE CHARACTER SET in the parser per SQL99. No implementation
yet.
Thomas,
I've just committed changes which implement three SQL99 functions and
operators. OVERLAY() allows substituting a string into another string,
SIMILAR TO is an operator for pattern matching, and a new variant of
SUBSTRING() accepts a pattern to match.
Way cool! Thank you ... this replaces several of my custom PL/pgSQL
functions.
How is SIMILAR TO different from ~ ?
--
-Josh Berkus
______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh@agliodbs.com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco
On Tue, Jun 11, 2002 at 11:08:11AM -0700, Josh Berkus wrote:
Thomas,
I've just committed changes which implement three SQL99 functions and
operators. OVERLAY() allows substituting a string into another string,
SIMILAR TO is an operator for pattern matching, and a new variant of
SUBSTRING() accepts a pattern to match.Way cool! Thank you ... this replaces several of my custom PL/pgSQL
functions.How is SIMILAR TO different from ~ ?
From the part of Thomas's email you snipped:
Implement SQL99 SIMILAR TO as a synonym for our existing operator "~".
So the answer is "not at all"
Ross
Thomas Lockhart wrote:
I've just committed changes which implement three SQL99 functions and
operators. OVERLAY() allows substituting a string into another string,
SIMILAR TO is an operator for pattern matching, and a new variant of
TODO item marked as done:
* -Add SIMILAR TO to allow character classes, 'pg_[a-c]%'
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
TODO item marked as done:
* -Add SIMILAR TO to allow character classes, 'pg_[a-c]%'
Darn. Will have to be more careful next time ;)
- Thomas
How is SIMILAR TO different from ~ ?
From the part of Thomas's email you snipped:Implement SQL99 SIMILAR TO as a synonym for our existing operator "~".
So the answer is "not at all"
Right. I'm not certain about the regex syntax defined by SQL99; I used
the syntax that we already have enabled and it looks like we have a
couple of other variants available if we need them. If someone wants to
research the *actual* syntax specified by SQL99 that would be good...
- Thomas
Thomas Lockhart wrote:
Right. I'm not certain about the regex syntax defined by SQL99; I used
the syntax that we already have enabled and it looks like we have a
couple of other variants available if we need them. If someone wants to
research the *actual* syntax specified by SQL99 that would be good...
As usual: ( ) + * [ ] |
Instead of dot . there is underscore _
There is % to mean .* just like LIKE
There is no ? or ^ or $
Regular expressions match the whole string, as if there were an
implicit ^ before and $ after the pattern. You have to add % if
you want to match anywhere in a string.
As far as I can tell, there is no default escape character like \
but you can specify one.
8.6 Similar predicate
Function
Specify a character string similarity by means of a regular expression.
Format
<similar predicate> ::=
<character match value> [ NOT ] SIMILAR TO <similar pattern>
[ ESCAPE <escape character> ]
<similar pattern> ::= <character value expression>
<regular expression> ::=
<regular term>
| <regular expression> <vertical bar> <regular term>
<regular term> ::=
<regular factor>
| <regular term> <regular factor>
<regular factor> ::=
<regular primary>
| <regular primary> <asterisk>
| <regular primary> <plus sign>
<regular primary> ::=
<character specifier>
| <percent>
| <regular character set>
| <left paren> <regular expression> <right paren>
<character specifier> ::=
<non-escaped character>
| <escaped character>
<non-escaped character> ::= !! See the Syntax Rules
<escaped character> ::= !! See the Syntax Rules
<regular character set> ::=
<underscore>
| <left bracket> <character enumeration>... <right bracket>
| <left bracket> <circumflex>
<character enumeration>... <right bracket>
| <left bracket> <colon> <regular character set identifier>
<colon> <right bracket>
<character enumeration> ::=
<character specifier>
| <character specifier> <minus sign> <character specifier>
<regular character set identifier> ::= <identifier>
*stuff omitted*
3) The value of the <identifier> that is a <regular character set
identifier> shall be either ALPHA, UPPER, LOWER, DIGIT, or ALNUM.
*collating stuff omitted*
5) A <non-escaped character> is any single character from the
character set of the <similar pattern> that is not a <left bracket>,
<right bracket>, <left paren>, <right paren>, <vertical bar>,
<circumflex>, <minus sign>, <plus sign>, <asterisk>, <underscore>,
<percent>, or the character specified by the result of the <character
value expression> of <escape character>. A <character specifier> that
is a <non-escaped character> represents itself.
6) An <escaped character> is a sequence of two characters: the
character specified by the result of the <character value expression>
of <escape character>, followed by a second character that is a <left
bracket>, <right bracket>, <left paren>, <right paren>, <vertical
bar>, <circumflex>, <minus sign>, <plus sign>, <asterisk>,
<underscore>, <percent>, or the character specified by the result of
the <character value expression> of <escape character>. A <character
specifier> that is an <escaped character> represents its second
character.
Thanks for the info! I have a question...
As usual: ( ) + * [ ] |
Instead of dot . there is underscore _
There is % to mean .* just like LIKE
There is no ? or ^ or $
Regular expressions match the whole string, as if there were an
implicit ^ before and $ after the pattern. You have to add % if
you want to match anywhere in a string.
Hmm. So if there are no explicit anchors then there must be a slightly
different syntax for the regular-expression version of the substring()
function? Otherwise, substrings would always have to start from the
first character, right?
Percents and underscores carried over from LIKE are really annoying.
I'll think about implementing an expression rewriter to convert SQL99 to
our modern regexp syntax.
- Thomas