add function argument names to regex* functions.
Hi.
similar to [1]/messages/by-id/877cw3jl8y.fsf@wibble.ilmari.org, add function argument names to the following functions:
regexp_like, regexp_match,regexp_matches,regexp_replace,
regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count
so I call these function in a different notation[2]https://www.postgresql.org/docs/current/sql-syntax-calling-funcs.html#SQL-SYNTAX-CALLING-FUNCS, like:
SELECT regexp_like(string=>'a'||CHR(10)||'d', pattern=>'a.d', flags:='n');
select regexp_match(string=>'abc',n pattern=>'(B)(c)', flags=>'i');
select regexp_matches(string=>'Programmer', pattern=>'(\w)(.*?\1)',
flags=>'ig');
SELECT regexp_replace(source=>'A PostgreSQL function',
pattern=>'a|e|i|o|u', replacement=>'X', start=>1, n=>4, flags=>'i');
SELECT regexp_substr(string=>'1234567890',
pattern=>'(123)(4(56)(78))', start=>1, n=>1, flags=>'i', subexpr=>4);
SELECT regexp_split_to_array(string=>'thE QUick bROWn FOx jUMPs ovEr
The lazy dOG', pattern=>'e', flags=>'i');
SELECT foo, length(foo)
FROM regexp_split_to_table(string=>'thE QUick bROWn FOx jUMPs ovEr The
lazy dOG', pattern=>'e',flags=>'i') AS foo;
SELECT regexp_count(string=>'ABCABCABCABC', pattern=>'Abc', start=>1,
flags=>'i');
In [3]https://www.postgresql.org/docs/current/functions-matching.html, except the above mentioned function, there is a "substring"
function.
I want to refactor substring function argument names. it looks like:
Schema | Name | Result data type | Argument data
types | Type
------------+-----------+------------------+--------------------------------------------+------
pg_catalog | substring | bit | bits bit, "from" integer
| func
pg_catalog | substring | bit | bits bit, "from" integer,
"for" integer | func
pg_catalog | substring | bytea | bytes bytea, "from"
integer | func
pg_catalog | substring | bytea | bytes bytea, "from"
integer, "for" integer | func
pg_catalog | substring | text | string text, "from"
integer | func
pg_catalog | substring | text | string text, "from"
integer, "for" integer | func
pg_catalog | substring | text | string text, pattern text
| func
pg_catalog | substring | text | text, text, text
| func
(8 rows)
As you can see, the substring function argument names need an explicit
double quote,
which doesn't look good, so I gave up.
[1]: /messages/by-id/877cw3jl8y.fsf@wibble.ilmari.org
/messages/by-id/877cw3jl8y.fsf@wibble.ilmari.org
[2]: https://www.postgresql.org/docs/current/sql-syntax-calling-funcs.html#SQL-SYNTAX-CALLING-FUNCS
https://www.postgresql.org/docs/current/sql-syntax-calling-funcs.html#SQL-SYNTAX-CALLING-FUNCS
[3]: https://www.postgresql.org/docs/current/functions-matching.html
Attachments:
v1-0001-add-function-argument-names-to-regex.*-functions.patchapplication/x-patch; name=v1-0001-add-function-argument-names-to-regex.*-functions.patchDownload
From f86ac65a78847062d68bb7e299a1e45e2a3d9477 Mon Sep 17 00:00:00 2001
From: pgaddict <jian.universality@gmail.com>
Date: Wed, 27 Dec 2023 19:59:34 +0800
Subject: [PATCH v1 1/1] add function argument names to regex.* functions.
Specifically add function argument names to the following funtions:
regexp_like, regexp_match, regexp_matches, regexp_replace,
regexp_substr, regexp_split_to_array,
regexp_split_to_table, regexp_count.
So these functions can be easiler to understand in psql via \df. also more important make these functions
can be called in different notaions.
---
src/include/catalog/pg_proc.dat | 71 ++++++++++++++++++++++++++-------
1 file changed, 57 insertions(+), 14 deletions(-)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9052f526..e3977a59 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3611,105 +3611,148 @@
prosrc => 'replace_text' },
{ oid => '2284', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text', prosrc => 'textregexreplace_noopt' },
+ proargtypes => 'text text text',
+ proargnames => '{source, pattern, replacement}',
+ prosrc => 'textregexreplace_noopt' },
{ oid => '2285', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text text', prosrc => 'textregexreplace' },
+ proargtypes => 'text text text text',
+ proargnames => '{source, pattern, replacement, flags}',
+ prosrc => 'textregexreplace' },
{ oid => '6251', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4 text',
+ proargnames => '{source, pattern, replacement, start, n, flags}',
prosrc => 'textregexreplace_extended' },
{ oid => '6252', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4',
+ proargnames => '{source, pattern, replacement, start, n}',
prosrc => 'textregexreplace_extended_no_flags' },
{ oid => '6253', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4',
+ proargnames => '{source, pattern, replacement, start}',
prosrc => 'textregexreplace_extended_no_n' },
{ oid => '3396', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_match_no_flags' },
{ oid => '3397', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_match' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_match' },
{ oid => '2763', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '1', proretset => 't',
prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_matches_no_flags' },
{ oid => '2764', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_matches' },
{ oid => '6254', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_count_no_start' },
{ oid => '6255', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_count_no_flags' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_count_no_flags' },
{ oid => '6256', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+ proargtypes => 'text text int4 text',
+ proargnames => '{string, pattern, start, flags}',
+ prosrc => 'regexp_count' },
{ oid => '6257', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_instr_no_start' },
{ oid => '6258', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_instr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_instr_no_n' },
{ oid => '6259', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_instr_no_endoption' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, n}',
+ prosrc => 'regexp_instr_no_endoption' },
{ oid => '6260', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4',
+ proargnames => '{string, pattern, start, n, endoption}',
prosrc => 'regexp_instr_no_flags' },
{ oid => '6261', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text',
+ proargnames => '{string, pattern, start, n, endoption, flags}',
prosrc => 'regexp_instr_no_subexpr' },
{ oid => '6262', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text int4',
+ proargnames => '{string, pattern, start, n, endoption, flags, subexpr}',
prosrc => 'regexp_instr' },
{ oid => '6263', descr => 'test for regexp match',
- proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_like_no_flags' },
{ oid => '6264', descr => 'test for regexp match',
proname => 'regexp_like', prorettype => 'bool',
- proargtypes => 'text text text', prosrc => 'regexp_like' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern,flags}',
+ prosrc => 'regexp_like' },
{ oid => '6265', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_substr_no_start' },
{ oid => '6266', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4', prosrc => 'regexp_substr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_substr_no_n' },
{ oid => '6267', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_substr_no_flags' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, n}',
+ prosrc => 'regexp_substr_no_flags' },
{ oid => '6268', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
proargtypes => 'text text int4 int4 text',
+ proargnames => '{string, pattern, start, n, flags}',
prosrc => 'regexp_substr_no_subexpr' },
{ oid => '6269', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4 text int4', prosrc => 'regexp_substr' },
+ proargtypes => 'text text int4 int4 text int4',
+ proargnames => '{string, pattern, start, n, flags, subexpr}',
+ prosrc => 'regexp_substr' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
{ oid => '2765', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_split_to_table_no_flags' },
{ oid => '2766', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_split_to_table' },
{ oid => '2767', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text', prosrc => 'regexp_split_to_array_no_flags' },
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
+ prosrc => 'regexp_split_to_array_no_flags' },
{ oid => '2768', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_split_to_array' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_split_to_array' },
{ oid => '9030', descr => 'convert int4 number to binary',
proname => 'to_bin', prorettype => 'text', proargtypes => 'int4',
prosrc => 'to_bin32' },
--
2.34.1
On 27.12.23 17:53, jian he wrote:
similar to [1], add function argument names to the following functions:
regexp_like, regexp_match,regexp_matches,regexp_replace,
regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count
Note that these functions are a quasi-standard that is shared with other
SQL implementations. It might be worth looking around if there are
already other implementations of this idea.
On Thu, Dec 28, 2023 at 6:25 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 27.12.23 17:53, jian he wrote:
similar to [1], add function argument names to the following functions:
regexp_like, regexp_match,regexp_matches,regexp_replace,
regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_countNote that these functions are a quasi-standard that is shared with other
SQL implementations. It might be worth looking around if there are
already other implementations of this idea.
turns out people do like calling functions via explicitly mentioning
function argument names, example: [0]https://stackoverflow.com/questions/33387348/oracle-named-parameters-in-regular-functions
There are no provisions for the argument names.
I looked around the oracle implementation in [1]https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099, and the oracle regex
related function argumentation name in [2]https://dbfiddle.uk/h_SBDEKi
I use the doc [3]https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP syntax explanation and add the related function names.
Current, regex.* function syntax seems fine. but only parameter `N`
seems a little bit weird.
If we change the function's argument name, we also need to change
function syntax explanation in the doc; vise versa.
QUOTE:
The regexp_instr function returns the starting or ending position of
the N'th match of a POSIX regular expression pattern to a string, or
zero if there is no such match. It has the syntax regexp_instr(string,
pattern [, start [, N [, endoption [, flags [, subexpr ]]]]]). pattern
is searched for in string, normally from the beginning of the string,
but if the start parameter is provided then beginning from that
character index. If N is specified then the N'th match of the pattern
is located, otherwise the first match is located.
END OF QUOTE.
maybe we can change `N` to occurrence. but `occurrence` is kind of verbose.
[0]: https://stackoverflow.com/questions/33387348/oracle-named-parameters-in-regular-functions
[1]: https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099
[2]: https://dbfiddle.uk/h_SBDEKi
[3]: https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
On Wed Dec 27, 2023 at 10:28 PM EST, jian he wrote:
On Thu, Dec 28, 2023 at 6:25 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 27.12.23 17:53, jian he wrote:
similar to [1], add function argument names to the following functions:
regexp_like, regexp_match,regexp_matches,regexp_replace,
regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_countNote that these functions are a quasi-standard that is shared with other
SQL implementations. It might be worth looking around if there are
already other implementations of this idea.turns out people do like calling functions via explicitly mentioning
function argument names, example: [0]
There are no provisions for the argument names.I looked around the oracle implementation in [1], and the oracle regex
related function argumentation name in [2]
I use the doc [3] syntax explanation and add the related function names.Current, regex.* function syntax seems fine. but only parameter `N`
seems a little bit weird.
If we change the function's argument name, we also need to change
function syntax explanation in the doc; vise versa.QUOTE:
The regexp_instr function returns the starting or ending position of
the N'th match of a POSIX regular expression pattern to a string, or
zero if there is no such match. It has the syntax regexp_instr(string,
pattern [, start [, N [, endoption [, flags [, subexpr ]]]]]). pattern
is searched for in string, normally from the beginning of the string,
but if the start parameter is provided then beginning from that
character index. If N is specified then the N'th match of the pattern
is located, otherwise the first match is located.
END OF QUOTE.maybe we can change `N` to occurrence. but `occurrence` is kind of verbose.
[0] https://stackoverflow.com/questions/33387348/oracle-named-parameters-in-regular-functions
[1] https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099
[2] https://dbfiddle.uk/h_SBDEKi
[3] https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
I've been trying to use named arguments more diligently so expanding
support for built-in functions is welcome. The patch applies cleanly and
works as advertised.
I agree that the parameter name `n` is not ideal. For example, in
`regexp_replace` it's easy to misinterpret it as "make up to n
replacements". This has not been a problem when `n` only lives in the
documentation which explains exactly what it does, but that context is
not readily available in code expressing `n => 3`.
Another possibility is `index`, which is relatively short and not a
reserved keyword ^1. `position` is not as precise but would avoid the
conceptual overloading of ordinary indices.
1. https://www.postgresql.org/docs/current/sql-keywords-appendix.html
On 28.12.23 04:28, jian he wrote:
I looked around the oracle implementation in [1], and the oracle regex
related function argumentation name in [2]
I use the doc [3] syntax explanation and add the related function names.Current, regex.* function syntax seems fine. but only parameter `N`
seems a little bit weird.
If we change the function's argument name, we also need to change
function syntax explanation in the doc; vise versa.
So, it looks like Oracle already has defined parameter names for these,
so we should make ours match.
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 1/1/24 12:05 PM, Dian Fay wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CY3KI01TXYMD.3Q41WQ6U0NFEO@nmfay.com">
<pre>I agree that the parameter name `n` is not ideal. For example, in
`regexp_replace` it's easy to misinterpret it as "make up to n
replacements". This has not been a problem when `n` only lives in the
documentation which explains exactly what it does, but that context is
not readily available in code expressing `n => 3`.
</pre>
</blockquote>
Agreed; IMO it's worth diverging from what Oracle has done here.<br>
<blockquote type="cite"
cite="mid:CY3KI01TXYMD.3Q41WQ6U0NFEO@nmfay.com">
<pre>
Another possibility is `index`, which is relatively short and not a
reserved keyword ^1. `position` is not as precise but would avoid the
conceptual overloading of ordinary indices.
</pre>
</blockquote>
<p>I'm not a fan of "index" since that leaves the question of
whether it's 0 or 1 based. "Position" is a bit better, but I think
Jian's suggestion of "occurance" is best.<br>
</p>
<pre class="moz-signature" cols="72">--
Jim Nasby, Data Architect, Austin TX</pre>
</body>
</html>
Another possibility is `index`, which is relatively short and not a
reserved keyword ^1. `position` is not as precise but would avoid the
conceptual overloading of ordinary indices.I'm not a fan of "index" since that leaves the question of
whether it's 0 or 1 based. "Position" is a bit better, but I think
Jian's suggestion of "occurance" is best.
We do have precedent for one-based `index` in Postgres: array types are
1-indexed by default! "Occurrence" removes that ambiguity but it's long
and easy to misspell (I looked it up after typing it just now and it
_still_ feels off).
How's "instance"?
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 1/3/24 5:05 PM, Dian Fay wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CY5G4TAMVX32.23TH6LFF93ATZ@nmfay.com">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Another possibility is `index`, which is relatively short and not a
reserved keyword ^1. `position` is not as precise but would avoid the
conceptual overloading of ordinary indices.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
I'm not a fan of "index" since that leaves the question of
whether it's 0 or 1 based. "Position" is a bit better, but I think
Jian's suggestion of "occurance" is best.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
We do have precedent for one-based `index` in Postgres: array types are
1-indexed by default! "Occurrence" removes that ambiguity but it's long
and easy to misspell (I looked it up after typing it just now and it
_still_ feels off).
How's "instance"?
</pre>
</blockquote>
<p>Presumably someone referencing arguments by name would have just
looked up the names via \df or whatever, so presumably misspelling
wouldn't be a big issue. But I think "instance" is OK as well.<br>
</p>
<pre class="moz-signature" cols="72">--
Jim Nasby, Data Architect, Austin TX</pre>
</body>
</html>
On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote:
On 1/3/24 5:05 PM, Dian Fay wrote:
Another possibility is `index`, which is relatively short and not a
reserved keyword ^1. `position` is not as precise but would avoid the
conceptual overloading of ordinary indices.I'm not a fan of "index" since that leaves the question of
whether it's 0 or 1 based. "Position" is a bit better, but I think
Jian's suggestion of "occurance" is best.We do have precedent for one-based `index` in Postgres: array types are
1-indexed by default! "Occurrence" removes that ambiguity but it's long
and easy to misspell (I looked it up after typing it just now and it
_still_ feels off).How's "instance"?
Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so presumably misspelling wouldn't be a big issue. But I think "instance" is OK as well.
--
Jim Nasby, Data Architect, Austin TX
regexp_instr: It has the syntax regexp_instr(string, pattern [, start
[, N [, endoption [, flags [, subexpr ]]]]])
oracle:
REGEXP_INSTR (source_char, pattern, [, position [, occurrence [,
return_opt [, match_param [, subexpr ]]]]] )
"string" and "source_char" are almost the same descriptive, so maybe
there is no need to change.
"start" is better than "position", imho.
"return_opt" is better than "endoption", (maybe we need change, for
now I didn't)
"flags" cannot be changed to "match_param", given it quite everywhere
in functions-matching.html.
similarly for function regexp_replace, oracle using "repplace_string",
we use "replacement"(mentioned in the doc).
so I don't think we need to change to "repplace_string".
Based on how people google[0]https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8, I think `occurrence` is ok, even though
it's verbose.
to change from `N` to `occurrence`, we also need to change the doc,
that is why this patch is more larger.
Attachments:
v2-0001-add-function-argument-names-to-regex.-functions.patchtext/x-patch; charset=US-ASCII; name=v2-0001-add-function-argument-names-to-regex.-functions.patchDownload
From cb720fd696df3a34032ec2adaea712604a7e3d42 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 4 Jan 2024 13:02:35 +0800
Subject: [PATCH v2 1/1] add function argument names to regex.* functions.
Specifically add function argument names to the following funtions:
regexp_replace,
regexp_match,
regexp_matches
regexp_count,
regexp_instr,
regexp_like,
regexp_substr,
regexp_split_to_table
regexp_split_to_array
So these functions can be easiler to understand in psql via \df.
Also more important make these functions can be called in different notaions.
---
doc/src/sgml/func.sgml | 38 +++++++++---------
src/include/catalog/pg_proc.dat | 71 ++++++++++++++++++++++++++-------
2 files changed, 76 insertions(+), 33 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cec21e42..b04a427a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3299,7 +3299,7 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
</indexterm>
<function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
[, <parameter>start</parameter> <type>integer</type>
- [, <parameter>N</parameter> <type>integer</type>
+ [, <parameter>occurrence</parameter> <type>integer</type>
[, <parameter>endoption</parameter> <type>integer</type>
[, <parameter>flags</parameter> <type>text</type>
[, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] ] )
@@ -3307,7 +3307,7 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
</para>
<para>
Returns the position within <parameter>string</parameter> where
- the <parameter>N</parameter>'th match of the POSIX regular
+ the <parameter>occurrence</parameter>'th match of the POSIX regular
expression <parameter>pattern</parameter> occurs, or zero if there is
no such match; see <xref linkend="functions-posix-regexp"/>.
</para>
@@ -3478,14 +3478,14 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
</indexterm>
<function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
[, <parameter>start</parameter> <type>integer</type>
- [, <parameter>N</parameter> <type>integer</type>
+ [, <parameter>occurrence</parameter> <type>integer</type>
[, <parameter>flags</parameter> <type>text</type>
[, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] )
<returnvalue>text</returnvalue>
</para>
<para>
Returns the substring within <parameter>string</parameter> that
- matches the <parameter>N</parameter>'th occurrence of the POSIX
+ matches the <parameter>occurrence</parameter>'th occurrence of the POSIX
regular expression <parameter>pattern</parameter>,
or <literal>NULL</literal> if there is no such match; see
<xref linkend="functions-posix-regexp"/>.
@@ -5888,13 +5888,13 @@ regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <lineannotation>4</lineannotation>
<para>
The <function>regexp_instr</function> function returns the starting or
- ending position of the <replaceable>N</replaceable>'th match of a
+ ending position of the <replaceable>occurrence</replaceable>'th match of a
POSIX regular expression pattern to a string, or zero if there is no
such match. It has the syntax
<function>regexp_instr</function>(<replaceable>string</replaceable>,
<replaceable>pattern</replaceable>
<optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>occurrence</replaceable>
<optional>, <replaceable>endoption</replaceable>
<optional>, <replaceable>flags</replaceable>
<optional>, <replaceable>subexpr</replaceable>
@@ -5903,8 +5903,8 @@ regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <lineannotation>4</lineannotation>
in <replaceable>string</replaceable>, normally from the beginning of
the string, but if the <replaceable>start</replaceable> parameter is
provided then beginning from that character index.
- If <replaceable>N</replaceable> is specified
- then the <replaceable>N</replaceable>'th match of the pattern
+ If <replaceable>occurrence</replaceable> is specified
+ then the <replaceable>occurrence</replaceable>'th match of the pattern
is located, otherwise the first match is located.
If the <replaceable>endoption</replaceable> parameter is omitted or
specified as zero, the function returns the position of the first
@@ -6024,8 +6024,8 @@ SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
expression pattern to a string. It has the same syntax as
<function>regexp_match</function>.
This function returns no rows if there is no match, one row if there is
- a match and the <literal>g</literal> flag is not given, or <replaceable>N</replaceable>
- rows if there are <replaceable>N</replaceable> matches and the <literal>g</literal> flag
+ a match and the <literal>g</literal> flag is not given, or <replaceable>occurrence</replaceable>
+ rows if there are <replaceable>occurrence</replaceable> matches and the <literal>g</literal> flag
is given. Each returned row is a text array containing the whole
matched substring or the substrings matching parenthesized
subexpressions of the <replaceable>pattern</replaceable>, just as described above
@@ -6079,10 +6079,10 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
<function>regexp_replace</function>(<replaceable>source</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
<optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>occurrence</replaceable>
</optional></optional>
<optional>, <replaceable>flags</replaceable> </optional>).
- (Notice that <replaceable>N</replaceable> cannot be specified
+ (Notice that <replaceable>occurrence</replaceable> cannot be specified
unless <replaceable>start</replaceable> is,
but <replaceable>flags</replaceable> can be given in any case.)
The <replaceable>source</replaceable> string is returned unchanged if
@@ -6102,14 +6102,14 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
the string, but if the <replaceable>start</replaceable> parameter is
provided then beginning from that character index.
By default, only the first match of the pattern is replaced.
- If <replaceable>N</replaceable> is specified and is greater than zero,
- then the <replaceable>N</replaceable>'th match of the pattern
+ If <replaceable>occurrence</replaceable> is specified and is greater than zero,
+ then the <replaceable>occurrence</replaceable>'th match of the pattern
is replaced.
If the <literal>g</literal> flag is given, or
- if <replaceable>N</replaceable> is specified and is zero, then all
+ if <replaceable>occurrence</replaceable> is specified and is zero, then all
matches at or after the <replaceable>start</replaceable> position are
replaced. (The <literal>g</literal> flag is ignored
- when <replaceable>N</replaceable> is specified.)
+ when <replaceable>occurrence</replaceable> is specified.)
The <replaceable>flags</replaceable> parameter is an optional text
string containing zero or more single-letter flags that change the
function's behavior. Supported flags (though
@@ -6220,7 +6220,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
<function>regexp_substr</function>(<replaceable>string</replaceable>,
<replaceable>pattern</replaceable>
<optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>occurrence</replaceable>
<optional>, <replaceable>flags</replaceable>
<optional>, <replaceable>subexpr</replaceable>
</optional></optional></optional></optional>).
@@ -6228,8 +6228,8 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
in <replaceable>string</replaceable>, normally from the beginning of
the string, but if the <replaceable>start</replaceable> parameter is
provided then beginning from that character index.
- If <replaceable>N</replaceable> is specified
- then the <replaceable>N</replaceable>'th match of the pattern
+ If <replaceable>occurrence</replaceable> is specified
+ then the <replaceable>occurrence</replaceable>'th match of the pattern
is returned, otherwise the first match is returned.
The <replaceable>flags</replaceable> parameter is an optional text
string containing zero or more single-letter flags that change the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9052f526..c7e8a23c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3611,105 +3611,148 @@
prosrc => 'replace_text' },
{ oid => '2284', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text', prosrc => 'textregexreplace_noopt' },
+ proargtypes => 'text text text',
+ proargnames => '{source, pattern, replacement}',
+ prosrc => 'textregexreplace_noopt' },
{ oid => '2285', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text text', prosrc => 'textregexreplace' },
+ proargtypes => 'text text text text',
+ proargnames => '{source, pattern, replacement, flags}',
+ prosrc => 'textregexreplace' },
{ oid => '6251', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4 text',
+ proargnames => '{source, pattern, replacement, start, occurrence, flags}',
prosrc => 'textregexreplace_extended' },
{ oid => '6252', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4',
+ proargnames => '{source, pattern, replacement, start, occurrence}',
prosrc => 'textregexreplace_extended_no_flags' },
{ oid => '6253', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4',
+ proargnames => '{source, pattern, replacement, start}',
prosrc => 'textregexreplace_extended_no_n' },
{ oid => '3396', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_match_no_flags' },
{ oid => '3397', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_match' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_match' },
{ oid => '2763', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '1', proretset => 't',
prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_matches_no_flags' },
{ oid => '2764', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_matches' },
{ oid => '6254', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_count_no_start' },
{ oid => '6255', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_count_no_flags' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_count_no_flags' },
{ oid => '6256', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+ proargtypes => 'text text int4 text',
+ proargnames => '{string, pattern, start, flags}',
+ prosrc => 'regexp_count' },
{ oid => '6257', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_instr_no_start' },
{ oid => '6258', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_instr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_instr_no_n' },
{ oid => '6259', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_instr_no_endoption' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, occurrence}',
+ prosrc => 'regexp_instr_no_endoption' },
{ oid => '6260', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4',
+ proargnames => '{string, pattern, start, occurrence, endoption}',
prosrc => 'regexp_instr_no_flags' },
{ oid => '6261', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text',
+ proargnames => '{string, pattern, start, occurrence, endoption, flags}',
prosrc => 'regexp_instr_no_subexpr' },
{ oid => '6262', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text int4',
+ proargnames => '{string, pattern, start, occurrence, endoption, flags, subexpr}',
prosrc => 'regexp_instr' },
{ oid => '6263', descr => 'test for regexp match',
- proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_like_no_flags' },
{ oid => '6264', descr => 'test for regexp match',
proname => 'regexp_like', prorettype => 'bool',
- proargtypes => 'text text text', prosrc => 'regexp_like' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern,flags}',
+ prosrc => 'regexp_like' },
{ oid => '6265', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_substr_no_start' },
{ oid => '6266', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4', prosrc => 'regexp_substr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_substr_no_n' },
{ oid => '6267', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_substr_no_flags' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, occurrence}',
+ prosrc => 'regexp_substr_no_flags' },
{ oid => '6268', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
proargtypes => 'text text int4 int4 text',
+ proargnames => '{string, pattern, start, occurrence, flags}',
prosrc => 'regexp_substr_no_subexpr' },
{ oid => '6269', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4 text int4', prosrc => 'regexp_substr' },
+ proargtypes => 'text text int4 int4 text int4',
+ proargnames => '{string, pattern, start, occurrence, flags, subexpr}',
+ prosrc => 'regexp_substr' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
{ oid => '2765', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_split_to_table_no_flags' },
{ oid => '2766', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_split_to_table' },
{ oid => '2767', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text', prosrc => 'regexp_split_to_array_no_flags' },
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
+ prosrc => 'regexp_split_to_array_no_flags' },
{ oid => '2768', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_split_to_array' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_split_to_array' },
{ oid => '9030', descr => 'convert int4 number to binary',
proname => 'to_bin', prorettype => 'text', proargtypes => 'int4',
prosrc => 'to_bin32' },
--
2.34.1
On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote:
On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote:
On 1/3/24 5:05 PM, Dian Fay wrote:
Another possibility is `index`, which is relatively short and not a
reserved keyword ^1. `position` is not as precise but would avoid the
conceptual overloading of ordinary indices.I'm not a fan of "index" since that leaves the question of
whether it's 0 or 1 based. "Position" is a bit better, but I think
Jian's suggestion of "occurance" is best.We do have precedent for one-based `index` in Postgres: array types are
1-indexed by default! "Occurrence" removes that ambiguity but it's long
and easy to misspell (I looked it up after typing it just now and it
_still_ feels off).How's "instance"?
Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so presumably misspelling wouldn't be a big issue. But I think "instance" is OK as well.
--
Jim Nasby, Data Architect, Austin TXregexp_instr: It has the syntax regexp_instr(string, pattern [, start
[, N [, endoption [, flags [, subexpr ]]]]])
oracle:
REGEXP_INSTR (source_char, pattern, [, position [, occurrence [,
return_opt [, match_param [, subexpr ]]]]] )"string" and "source_char" are almost the same descriptive, so maybe
there is no need to change.
"start" is better than "position", imho.
"return_opt" is better than "endoption", (maybe we need change, for
now I didn't)
"flags" cannot be changed to "match_param", given it quite everywhere
in functions-matching.html.similarly for function regexp_replace, oracle using "repplace_string",
we use "replacement"(mentioned in the doc).
so I don't think we need to change to "repplace_string".Based on how people google[0], I think `occurrence` is ok, even though
it's verbose.
to change from `N` to `occurrence`, we also need to change the doc,
that is why this patch is more larger.
The `regexp_replace` summary in table 9.10 is mismatched and still
specifies the first parameter name as `string` instead of `source`.
Since all the other functions use `string`, should `regexp_replace` do
the same or is this a case where an established "standard" diverges?
I noticed the original documentation for some of these functions is
rather disorganized; summaries explain `occurrence` without explaining
the prior `start` parameter, and detailed documentation in 9.7 is
usually a single paragraph per function running pell-mell through ifs
and buts without section headings, so entries in table 9.10 have to
reference the entire section 9.7.3 instead of their specific functions.
It's out of scope here, but should I bring this up on pgsql-docs?
On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote:
On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote:
On 1/3/24 5:05 PM, Dian Fay wrote:
Another possibility is `index`, which is relatively short and not a
reserved keyword ^1. `position` is not as precise but would avoid the
conceptual overloading of ordinary indices.I'm not a fan of "index" since that leaves the question of
whether it's 0 or 1 based. "Position" is a bit better, but I think
Jian's suggestion of "occurance" is best.We do have precedent for one-based `index` in Postgres: array types are
1-indexed by default! "Occurrence" removes that ambiguity but it's long
and easy to misspell (I looked it up after typing it just now and it
_still_ feels off).How's "instance"?
Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so presumably misspelling wouldn't be a big issue. But I think "instance" is OK as well.
--
Jim Nasby, Data Architect, Austin TXregexp_instr: It has the syntax regexp_instr(string, pattern [, start
[, N [, endoption [, flags [, subexpr ]]]]])
oracle:
REGEXP_INSTR (source_char, pattern, [, position [, occurrence [,
return_opt [, match_param [, subexpr ]]]]] )"string" and "source_char" are almost the same descriptive, so maybe
there is no need to change.
"start" is better than "position", imho.
"return_opt" is better than "endoption", (maybe we need change, for
now I didn't)
"flags" cannot be changed to "match_param", given it quite everywhere
in functions-matching.html.similarly for function regexp_replace, oracle using "repplace_string",
we use "replacement"(mentioned in the doc).
so I don't think we need to change to "repplace_string".Based on how people google[0], I think `occurrence` is ok, even though
it's verbose.
to change from `N` to `occurrence`, we also need to change the doc,
that is why this patch is more larger.The `regexp_replace` summary in table 9.10 is mismatched and still
specifies the first parameter name as `string` instead of `source`.
Since all the other functions use `string`, should `regexp_replace` do
the same or is this a case where an established "standard" diverges?
got it. Thanks for pointing it out.
in functions-matching.html
if I change <replaceable>source</replaceable> to
<replaceable>string</replaceable> then
there are no markup "string" and markup "string", it's kind of
slightly confusing.
So does the following refactored description of regexp_replace make sense:
The <replaceable>string</replaceable> is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
match, the <replaceable>string</replaceable> is returned with the
<replaceable>replacement</replaceable> string substituted for the matching
substring. The <replaceable>replacement</replaceable> string can contain
<literal>\</literal><replaceable>n</replaceable>, where
<replaceable>n</replaceable> is 1
through 9, to indicate that the source substring matching the
<replaceable>n</replaceable>'th parenthesized subexpression of
the pattern should be
inserted, and it can contain <literal>\&</literal> to indicate that the
substring matching the entire pattern should be inserted. Write
<literal>\\</literal> if you need to put a literal backslash in
the replacement
text.
I noticed the original documentation for some of these functions is
rather disorganized; summaries explain `occurrence` without explaining
the prior `start` parameter, and detailed documentation in 9.7 is
usually a single paragraph per function running pell-mell through ifs
and buts without section headings, so entries in table 9.10 have to
reference the entire section 9.7.3 instead of their specific functions.
It's out of scope here, but should I bring this up on pgsql-docs?
I got it.
in Table 9.10. Other String Functions and Operators, if we can
reference the specific function would be great.
As for now, in the browser, you need to use Ctrl+F to find the
detailed explanation in 9.7.3.
you can just bring your suggested or patch to pgsql-hackers@postgresql.org.
On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote:
On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
The `regexp_replace` summary in table 9.10 is mismatched and still
specifies the first parameter name as `string` instead of `source`.
Since all the other functions use `string`, should `regexp_replace` do
the same or is this a case where an established "standard" diverges?got it. Thanks for pointing it out.
in functions-matching.html
if I change <replaceable>source</replaceable> to
<replaceable>string</replaceable> then
there are no markup "string" and markup "string", it's kind of
slightly confusing.So does the following refactored description of regexp_replace make sense:
The <replaceable>string</replaceable> is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
match, the <replaceable>string</replaceable> is returned with the
<replaceable>replacement</replaceable> string substituted for the matching
substring. The <replaceable>replacement</replaceable> string can contain
<literal>\</literal><replaceable>n</replaceable>, where
<replaceable>n</replaceable> is 1
through 9, to indicate that the source substring matching the
<replaceable>n</replaceable>'th parenthesized subexpression of
the pattern should be
inserted, and it can contain <literal>\&</literal> to indicate that the
substring matching the entire pattern should be inserted. Write
<literal>\\</literal> if you need to put a literal backslash in
the replacement
text.
That change makes sense to me! I'll see about the section refactoring
after this lands.
On Tue, Jan 9, 2024 at 8:52 AM Dian Fay <di@nmfay.com> wrote:
On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote:
On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
The `regexp_replace` summary in table 9.10 is mismatched and still
specifies the first parameter name as `string` instead of `source`.
Since all the other functions use `string`, should `regexp_replace` do
the same or is this a case where an established "standard" diverges?got it. Thanks for pointing it out.
in functions-matching.html
if I change <replaceable>source</replaceable> to
<replaceable>string</replaceable> then
there are no markup "string" and markup "string", it's kind of
slightly confusing.So does the following refactored description of regexp_replace make sense:
The <replaceable>string</replaceable> is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
match, the <replaceable>string</replaceable> is returned with the
<replaceable>replacement</replaceable> string substituted for the matching
substring. The <replaceable>replacement</replaceable> string can contain
<literal>\</literal><replaceable>n</replaceable>, where
<replaceable>n</replaceable> is 1
through 9, to indicate that the source substring matching the
<replaceable>n</replaceable>'th parenthesized subexpression of
the pattern should be
inserted, and it can contain <literal>\&</literal> to indicate that the
substring matching the entire pattern should be inserted. Write
<literal>\\</literal> if you need to put a literal backslash in
the replacement
text.That change makes sense to me! I'll see about the section refactoring
after this lands.
I put the changes into the new patch.
Attachments:
v3-0001-add-function-argument-names-to-regex.-functions.patchtext/x-patch; charset=US-ASCII; name=v3-0001-add-function-argument-names-to-regex.-functions.patchDownload
From 8fa04ed1fecb48ca8254d2ed7e60a9013fa130a3 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Wed, 10 Jan 2024 17:46:18 +0800
Subject: [PATCH v3 1/1] add function argument names to regex.* functions.
Specifically add function argument names to the following funtions:
regexp_replace,
regexp_match,
regexp_matches
regexp_count,
regexp_instr,
regexp_like,
regexp_substr,
regexp_split_to_table
regexp_split_to_array
So it would be easier to understand these functions in psql via \df.
Also now these functions can be called in different notaions.
---
doc/src/sgml/func.sgml | 50 +++++++++++------------
src/include/catalog/pg_proc.dat | 71 ++++++++++++++++++++++++++-------
2 files changed, 82 insertions(+), 39 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index de78d58d..23ef07a5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3299,7 +3299,7 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
</indexterm>
<function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
[, <parameter>start</parameter> <type>integer</type>
- [, <parameter>N</parameter> <type>integer</type>
+ [, <parameter>occurrence</parameter> <type>integer</type>
[, <parameter>endoption</parameter> <type>integer</type>
[, <parameter>flags</parameter> <type>text</type>
[, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] ] )
@@ -3307,7 +3307,7 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
</para>
<para>
Returns the position within <parameter>string</parameter> where
- the <parameter>N</parameter>'th match of the POSIX regular
+ the <parameter>occurrence</parameter>'th match of the POSIX regular
expression <parameter>pattern</parameter> occurs, or zero if there is
no such match; see <xref linkend="functions-posix-regexp"/>.
</para>
@@ -3413,14 +3413,14 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
<entry role="func_table_entry"><para role="func_signature">
<function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>,
<parameter>start</parameter> <type>integer</type>,
- <parameter>N</parameter> <type>integer</type>
+ <parameter>occurrence</parameter> <type>integer</type>
[, <parameter>flags</parameter> <type>text</type> ] )
<returnvalue>text</returnvalue>
</para>
<para>
- Replaces the substring that is the <parameter>N</parameter>'th
+ Replaces the substring that is the <parameter>occurrence</parameter>'th
match to the POSIX regular expression <parameter>pattern</parameter>,
- or all such matches if <parameter>N</parameter> is zero; see
+ or all such matches if <parameter>occurrence</parameter> is zero; see
<xref linkend="functions-posix-regexp"/>.
</para>
<para>
@@ -3478,14 +3478,14 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
</indexterm>
<function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
[, <parameter>start</parameter> <type>integer</type>
- [, <parameter>N</parameter> <type>integer</type>
+ [, <parameter>occurrence</parameter> <type>integer</type>
[, <parameter>flags</parameter> <type>text</type>
[, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] )
<returnvalue>text</returnvalue>
</para>
<para>
Returns the substring within <parameter>string</parameter> that
- matches the <parameter>N</parameter>'th occurrence of the POSIX
+ matches the <parameter>occurrence</parameter>'th occurrence of the POSIX
regular expression <parameter>pattern</parameter>,
or <literal>NULL</literal> if there is no such match; see
<xref linkend="functions-posix-regexp"/>.
@@ -5888,13 +5888,13 @@ regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <lineannotation>4</lineannotation>
<para>
The <function>regexp_instr</function> function returns the starting or
- ending position of the <replaceable>N</replaceable>'th match of a
+ ending position of the <replaceable>occurrence</replaceable>'th match of a
POSIX regular expression pattern to a string, or zero if there is no
such match. It has the syntax
<function>regexp_instr</function>(<replaceable>string</replaceable>,
<replaceable>pattern</replaceable>
<optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>occurrence</replaceable>
<optional>, <replaceable>endoption</replaceable>
<optional>, <replaceable>flags</replaceable>
<optional>, <replaceable>subexpr</replaceable>
@@ -5903,8 +5903,8 @@ regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <lineannotation>4</lineannotation>
in <replaceable>string</replaceable>, normally from the beginning of
the string, but if the <replaceable>start</replaceable> parameter is
provided then beginning from that character index.
- If <replaceable>N</replaceable> is specified
- then the <replaceable>N</replaceable>'th match of the pattern
+ If <replaceable>occurrence</replaceable> is specified
+ then the <replaceable>occurrence</replaceable>'th match of the pattern
is located, otherwise the first match is located.
If the <replaceable>endoption</replaceable> parameter is omitted or
specified as zero, the function returns the position of the first
@@ -6024,8 +6024,8 @@ SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
expression pattern to a string. It has the same syntax as
<function>regexp_match</function>.
This function returns no rows if there is no match, one row if there is
- a match and the <literal>g</literal> flag is not given, or <replaceable>N</replaceable>
- rows if there are <replaceable>N</replaceable> matches and the <literal>g</literal> flag
+ a match and the <literal>g</literal> flag is not given, or <replaceable>occurrence</replaceable>
+ rows if there are <replaceable>occurrence</replaceable> matches and the <literal>g</literal> flag
is given. Each returned row is a text array containing the whole
matched substring or the substrings matching parenthesized
subexpressions of the <replaceable>pattern</replaceable>, just as described above
@@ -6076,18 +6076,18 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
The <function>regexp_replace</function> function provides substitution of
new text for substrings that match POSIX regular expression patterns.
It has the syntax
- <function>regexp_replace</function>(<replaceable>source</replaceable>,
+ <function>regexp_replace</function>(<replaceable>string</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
<optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>occurrence</replaceable>
</optional></optional>
<optional>, <replaceable>flags</replaceable> </optional>).
- (Notice that <replaceable>N</replaceable> cannot be specified
+ (Notice that <replaceable>occurrence</replaceable> cannot be specified
unless <replaceable>start</replaceable> is,
but <replaceable>flags</replaceable> can be given in any case.)
- The <replaceable>source</replaceable> string is returned unchanged if
+ The <replaceable>string</replaceable> is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
- match, the <replaceable>source</replaceable> string is returned with the
+ match, the <replaceable>string</replaceable> is returned with the
<replaceable>replacement</replaceable> string substituted for the matching
substring. The <replaceable>replacement</replaceable> string can contain
<literal>\</literal><replaceable>n</replaceable>, where <replaceable>n</replaceable> is 1
@@ -6102,14 +6102,14 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
the string, but if the <replaceable>start</replaceable> parameter is
provided then beginning from that character index.
By default, only the first match of the pattern is replaced.
- If <replaceable>N</replaceable> is specified and is greater than zero,
- then the <replaceable>N</replaceable>'th match of the pattern
+ If <replaceable>occurrence</replaceable> is specified and is greater than zero,
+ then the <replaceable>occurrence</replaceable>'th match of the pattern
is replaced.
If the <literal>g</literal> flag is given, or
- if <replaceable>N</replaceable> is specified and is zero, then all
+ if <replaceable>occurrence</replaceable> is specified and is zero, then all
matches at or after the <replaceable>start</replaceable> position are
replaced. (The <literal>g</literal> flag is ignored
- when <replaceable>N</replaceable> is specified.)
+ when <replaceable>occurrence</replaceable> is specified.)
The <replaceable>flags</replaceable> parameter is an optional text
string containing zero or more single-letter flags that change the
function's behavior. Supported flags (though
@@ -6220,7 +6220,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
<function>regexp_substr</function>(<replaceable>string</replaceable>,
<replaceable>pattern</replaceable>
<optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>occurrence</replaceable>
<optional>, <replaceable>flags</replaceable>
<optional>, <replaceable>subexpr</replaceable>
</optional></optional></optional></optional>).
@@ -6228,8 +6228,8 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
in <replaceable>string</replaceable>, normally from the beginning of
the string, but if the <replaceable>start</replaceable> parameter is
provided then beginning from that character index.
- If <replaceable>N</replaceable> is specified
- then the <replaceable>N</replaceable>'th match of the pattern
+ If <replaceable>occurrence</replaceable> is specified
+ then the <replaceable>occurrence</replaceable>'th match of the pattern
is returned, otherwise the first match is returned.
The <replaceable>flags</replaceable> parameter is an optional text
string containing zero or more single-letter flags that change the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 79793927..3b4330a6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3611,105 +3611,148 @@
prosrc => 'replace_text' },
{ oid => '2284', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text', prosrc => 'textregexreplace_noopt' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, replacement}',
+ prosrc => 'textregexreplace_noopt' },
{ oid => '2285', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text text', prosrc => 'textregexreplace' },
+ proargtypes => 'text text text text',
+ proargnames => '{string, pattern, replacement, flags}',
+ prosrc => 'textregexreplace' },
{ oid => '6251', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4 text',
+ proargnames => '{string, pattern, replacement, start, occurrence, flags}',
prosrc => 'textregexreplace_extended' },
{ oid => '6252', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4',
+ proargnames => '{string, pattern, replacement, start, occurrence}',
prosrc => 'textregexreplace_extended_no_flags' },
{ oid => '6253', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4',
+ proargnames => '{string, pattern, replacement, start}',
prosrc => 'textregexreplace_extended_no_n' },
{ oid => '3396', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_match_no_flags' },
{ oid => '3397', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_match' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_match' },
{ oid => '2763', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '1', proretset => 't',
prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_matches_no_flags' },
{ oid => '2764', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_matches' },
{ oid => '6254', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_count_no_start' },
{ oid => '6255', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_count_no_flags' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_count_no_flags' },
{ oid => '6256', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+ proargtypes => 'text text int4 text',
+ proargnames => '{string, pattern, start, flags}',
+ prosrc => 'regexp_count' },
{ oid => '6257', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_instr_no_start' },
{ oid => '6258', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_instr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_instr_no_n' },
{ oid => '6259', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_instr_no_endoption' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, occurrence}',
+ prosrc => 'regexp_instr_no_endoption' },
{ oid => '6260', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4',
+ proargnames => '{string, pattern, start, occurrence, endoption}',
prosrc => 'regexp_instr_no_flags' },
{ oid => '6261', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text',
+ proargnames => '{string, pattern, start, occurrence, endoption, flags}',
prosrc => 'regexp_instr_no_subexpr' },
{ oid => '6262', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text int4',
+ proargnames => '{string, pattern, start, occurrence, endoption, flags, subexpr}',
prosrc => 'regexp_instr' },
{ oid => '6263', descr => 'test for regexp match',
- proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_like_no_flags' },
{ oid => '6264', descr => 'test for regexp match',
proname => 'regexp_like', prorettype => 'bool',
- proargtypes => 'text text text', prosrc => 'regexp_like' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern,flags}',
+ prosrc => 'regexp_like' },
{ oid => '6265', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_substr_no_start' },
{ oid => '6266', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4', prosrc => 'regexp_substr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_substr_no_n' },
{ oid => '6267', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_substr_no_flags' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, occurrence}',
+ prosrc => 'regexp_substr_no_flags' },
{ oid => '6268', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
proargtypes => 'text text int4 int4 text',
+ proargnames => '{string, pattern, start, occurrence, flags}',
prosrc => 'regexp_substr_no_subexpr' },
{ oid => '6269', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4 text int4', prosrc => 'regexp_substr' },
+ proargtypes => 'text text int4 int4 text int4',
+ proargnames => '{string, pattern, start, occurrence, flags, subexpr}',
+ prosrc => 'regexp_substr' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
{ oid => '2765', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_split_to_table_no_flags' },
{ oid => '2766', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_split_to_table' },
{ oid => '2767', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text', prosrc => 'regexp_split_to_array_no_flags' },
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
+ prosrc => 'regexp_split_to_array_no_flags' },
{ oid => '2768', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_split_to_array' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_split_to_array' },
{ oid => '9030', descr => 'convert int4 number to binary',
proname => 'to_bin', prorettype => 'text', proargtypes => 'int4',
prosrc => 'to_bin32' },
--
2.34.1
On Wed Jan 10, 2024 at 9:18 AM EST, jian he wrote:
On Tue, Jan 9, 2024 at 8:52 AM Dian Fay <di@nmfay.com> wrote:
On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote:
On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
The `regexp_replace` summary in table 9.10 is mismatched and still
specifies the first parameter name as `string` instead of `source`.
Since all the other functions use `string`, should `regexp_replace` do
the same or is this a case where an established "standard" diverges?got it. Thanks for pointing it out.
in functions-matching.html
if I change <replaceable>source</replaceable> to
<replaceable>string</replaceable> then
there are no markup "string" and markup "string", it's kind of
slightly confusing.So does the following refactored description of regexp_replace make sense:
The <replaceable>string</replaceable> is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
match, the <replaceable>string</replaceable> is returned with the
<replaceable>replacement</replaceable> string substituted for the matching
substring. The <replaceable>replacement</replaceable> string can contain
<literal>\</literal><replaceable>n</replaceable>, where
<replaceable>n</replaceable> is 1
through 9, to indicate that the source substring matching the
<replaceable>n</replaceable>'th parenthesized subexpression of
the pattern should be
inserted, and it can contain <literal>\&</literal> to indicate that the
substring matching the entire pattern should be inserted. Write
<literal>\\</literal> if you need to put a literal backslash in
the replacement
text.That change makes sense to me! I'll see about the section refactoring
after this lands.I put the changes into the new patch.
Sorry, I missed one minor issue with v2. The replacement on lines
6027-6028 of func.sgml (originally "`n` rows if there are `n` matches")
is not needed and could be more confusing since the `n` represents a
number, not an argument to `regexp_matches`. I've built v3 and gone over
everything else one more time and it looks good.
On 10.01.24 15:18, jian he wrote:
I put the changes into the new patch.
Reading back through the discussion, I wasn't quite able to interpret
the resolution regarding Oracle compatibility. From the patch, it looks
like you chose not to adopt the parameter names from Oracle. Was that
your intention?
On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote:
On 10.01.24 15:18, jian he wrote:
I put the changes into the new patch.
Reading back through the discussion, I wasn't quite able to interpret
the resolution regarding Oracle compatibility. From the patch, it looks
like you chose not to adopt the parameter names from Oracle. Was that
your intention?
per committee message:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7
Even if the names are all the same, our function is still not the same
as oracle.
There is a documentation bug.
In [0]https://www.postgresql.org/docs/current/functions-matching.html, Table 9.25. Regular Expression Functions Equivalencies
regexp_replace function definition: regexp_replace(string, pattern, replacement)
In one of the <tip> section below, regexp_replace explains as
<<<<<
The regexp_replace function provides substitution of new text for
substrings that match POSIX regular expression patterns. It has the
syntax regexp_replace(source, pattern, replacement [, start [, N ]] [,
flags ]). (Notice that N cannot be specified unless start is, but
flags can be given in any case.)
<<<<<
So I changed the first argument of regexp_replace to "string". So
accordingly, the doc needs to change also, which I did.
another regex* function argument changes: from "N" to "occurences", example:
+ If <replaceable>occurrence</replaceable> is specified
+ then the <replaceable>occurrence</replaceable>'th match of the pattern
+ is located,
but [2] says
Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell."
summary:
adding function-named notation is my intention.
To make regex.* functions named-notation works, we need to add
proargnames to src/include/catalog/pg_proc.dat.
add proargnames also require changing the doc.
naming proargnames is a matter of taste now, So I only change 'N' to
'occurrence'.
[0]: https://www.postgresql.org/docs/current/functions-matching.html
[1]: /messages/by-id/fc160ee0-c843-b024-29bb-97b5da61971f@darold.net
On Sat, Jan 20, 2024 at 10:55 AM jian he <jian.universality@gmail.com> wrote:
another regex* function argument changes: from "N" to "occurences", example: + If <replaceable>occurrence</replaceable> is specified + then the <replaceable>occurrence</replaceable>'th match of the pattern + is located,but [2] says
Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell."
sorry.
[2]: , The reference link is /messages/by-id/1567465.1627860115@sss.pgh.pa.us
/messages/by-id/1567465.1627860115@sss.pgh.pa.us
my previous post will link to the whole thread.
jian he <jian.universality@gmail.com> writes:
On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote:
Reading back through the discussion, I wasn't quite able to interpret
the resolution regarding Oracle compatibility. From the patch, it looks
like you chose not to adopt the parameter names from Oracle. Was that
your intention?
per committee message:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7
Even if the names are all the same, our function is still not the same
as oracle.
The fact that there's minor discrepancies in the regex languages
doesn't seem to me to have a lot of bearing on whether we should
follow Oracle's choices of parameter names.
However, if we do follow Oracle, it seems like we should do that
consistently, which this patch doesn't. For instance, per [1]https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099
Oracle calls the arguments of regex_substr
source_char,
pattern,
position,
occurrence,
match_param,
subexpr
while we have
string,
pattern,
start,
N,
flags,
subexpr
The patch proposes to replace "N" with "occurrence" but not touch
the other discrepancies, which seems to me to be a pretty poor
choice. "occurrence" is very long and difficult to spell correctly,
and if you're not following Oracle slavishly, exactly what is the
argument in its favor? I quite agree that Oracle's other choices
aren't improvements over ours, but neither is that one.
On the whole my inclination would be to stick to the names we have
in the documentation. There might be an argument for changing "N"
to something lower-case so you don't have to quote it; but if we do,
I'd go for, say, "count".
regards, tom lane
On Wed, Apr 3, 2024 at 4:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
jian he <jian.universality@gmail.com> writes:
On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote:
Reading back through the discussion, I wasn't quite able to interpret
the resolution regarding Oracle compatibility. From the patch, it looks
like you chose not to adopt the parameter names from Oracle. Was that
your intention?per committee message:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7
Even if the names are all the same, our function is still not the same
as oracle.The fact that there's minor discrepancies in the regex languages
doesn't seem to me to have a lot of bearing on whether we should
follow Oracle's choices of parameter names.However, if we do follow Oracle, it seems like we should do that
consistently, which this patch doesn't. For instance, per [1]
Oracle calls the arguments of regex_substrsource_char,
pattern,
position,
occurrence,
match_param,
subexprwhile we have
string,
pattern,
start,
N,
flags,
subexprThe patch proposes to replace "N" with "occurrence" but not touch
the other discrepancies, which seems to me to be a pretty poor
choice. "occurrence" is very long and difficult to spell correctly,
and if you're not following Oracle slavishly, exactly what is the
argument in its favor? I quite agree that Oracle's other choices
aren't improvements over ours, but neither is that one.On the whole my inclination would be to stick to the names we have
in the documentation. There might be an argument for changing "N"
to something lower-case so you don't have to quote it; but if we do,
I'd go for, say, "count".
we have
---------------------------------------------------------------
The replacement string can contain \n, where n is 1 through 9, to
indicate that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted.
----------------------------------------------------------------------------
in the regexp_replace explanation section.
changing "N" to lower-case would be misleading for regexp_replace?
so I choose "count".
By the way, I think the above is so hard to comprehend.
I can only find related test in src/test/regress/sql/strings.sql are:
SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})',
E'(\\1) \\2-\\3');
SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\&Y', 'g');
SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\\\Y', 'g');
but these tests seem not friendly.
maybe we should have some simple examples to demonstrate the above paragraph.
Attachments:
v4-0001-add-regex-functions-argument-names.patchtext/x-patch; charset=US-ASCII; name=v4-0001-add-regex-functions-argument-names.patchDownload
From 372a0c6cb894194b819fc380efda179bf6d1055d Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 4 Apr 2024 21:50:13 +0800
Subject: [PATCH v4 1/1] add regex functions argument names.
Specifically add function argument names to the following funtions:
regexp_replace,
regexp_match,
regexp_matches,
regexp_count,
regexp_instr,
regexp_like,
regexp_substr,
regexp_split_to_table,
regexp_split_to_array
So it would be easier to understand these functions in psql via \df.
now these functions can be called in different notaions.
discussion: https://postgr.es/m/CACJufxG3NFKKsh6x4fRLv8h3V-HvN4W5dA%3DzNKMxsNcDwOKang%40mail.gmail.com
---
doc/src/sgml/func.sgml | 50 +++++++++++------------
src/include/catalog/pg_proc.dat | 71 ++++++++++++++++++++++++++-------
2 files changed, 82 insertions(+), 39 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ff690113..57ad1624 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3332,7 +3332,7 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
</indexterm>
<function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
[, <parameter>start</parameter> <type>integer</type>
- [, <parameter>N</parameter> <type>integer</type>
+ [, <parameter>count</parameter> <type>integer</type>
[, <parameter>endoption</parameter> <type>integer</type>
[, <parameter>flags</parameter> <type>text</type>
[, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] ] )
@@ -3340,7 +3340,7 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
</para>
<para>
Returns the position within <parameter>string</parameter> where
- the <parameter>N</parameter>'th match of the POSIX regular
+ the <parameter>count</parameter>'th match of the POSIX regular
expression <parameter>pattern</parameter> occurs, or zero if there is
no such match; see <xref linkend="functions-posix-regexp"/>.
</para>
@@ -3446,14 +3446,14 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
<entry role="func_table_entry"><para role="func_signature">
<function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>,
<parameter>start</parameter> <type>integer</type>,
- <parameter>N</parameter> <type>integer</type>
+ <parameter>count</parameter> <type>integer</type>
[, <parameter>flags</parameter> <type>text</type> ] )
<returnvalue>text</returnvalue>
</para>
<para>
- Replaces the substring that is the <parameter>N</parameter>'th
+ Replaces the substring that is the <parameter>count</parameter>'th
match to the POSIX regular expression <parameter>pattern</parameter>,
- or all such matches if <parameter>N</parameter> is zero; see
+ or all such matches if <parameter>count</parameter> is zero; see
<xref linkend="functions-posix-regexp"/>.
</para>
<para>
@@ -3511,14 +3511,14 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
</indexterm>
<function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
[, <parameter>start</parameter> <type>integer</type>
- [, <parameter>N</parameter> <type>integer</type>
+ [, <parameter>count</parameter> <type>integer</type>
[, <parameter>flags</parameter> <type>text</type>
[, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] )
<returnvalue>text</returnvalue>
</para>
<para>
Returns the substring within <parameter>string</parameter> that
- matches the <parameter>N</parameter>'th occurrence of the POSIX
+ matches the <parameter>count</parameter>'th occurrence of the POSIX
regular expression <parameter>pattern</parameter>,
or <literal>NULL</literal> if there is no such match; see
<xref linkend="functions-posix-regexp"/>.
@@ -5921,13 +5921,13 @@ regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <lineannotation>4</lineannotation>
<para>
The <function>regexp_instr</function> function returns the starting or
- ending position of the <replaceable>N</replaceable>'th match of a
+ ending position of the <replaceable>count</replaceable>'th match of a
POSIX regular expression pattern to a string, or zero if there is no
such match. It has the syntax
<function>regexp_instr</function>(<replaceable>string</replaceable>,
<replaceable>pattern</replaceable>
<optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>count</replaceable>
<optional>, <replaceable>endoption</replaceable>
<optional>, <replaceable>flags</replaceable>
<optional>, <replaceable>subexpr</replaceable>
@@ -5936,8 +5936,8 @@ regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <lineannotation>4</lineannotation>
in <replaceable>string</replaceable>, normally from the beginning of
the string, but if the <replaceable>start</replaceable> parameter is
provided then beginning from that character index.
- If <replaceable>N</replaceable> is specified
- then the <replaceable>N</replaceable>'th match of the pattern
+ If <replaceable>count</replaceable> is specified
+ then the <replaceable>count</replaceable>'th match of the pattern
is located, otherwise the first match is located.
If the <replaceable>endoption</replaceable> parameter is omitted or
specified as zero, the function returns the position of the first
@@ -6057,8 +6057,8 @@ SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
expression pattern to a string. It has the same syntax as
<function>regexp_match</function>.
This function returns no rows if there is no match, one row if there is
- a match and the <literal>g</literal> flag is not given, or <replaceable>N</replaceable>
- rows if there are <replaceable>N</replaceable> matches and the <literal>g</literal> flag
+ a match and the <literal>g</literal> flag is not given, or <replaceable>count</replaceable>
+ rows if there are <replaceable>count</replaceable> matches and the <literal>g</literal> flag
is given. Each returned row is a text array containing the whole
matched substring or the substrings matching parenthesized
subexpressions of the <replaceable>pattern</replaceable>, just as described above
@@ -6109,18 +6109,18 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
The <function>regexp_replace</function> function provides substitution of
new text for substrings that match POSIX regular expression patterns.
It has the syntax
- <function>regexp_replace</function>(<replaceable>source</replaceable>,
+ <function>regexp_replace</function>(<replaceable>string</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
<optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>count</replaceable>
</optional></optional>
<optional>, <replaceable>flags</replaceable> </optional>).
- (Notice that <replaceable>N</replaceable> cannot be specified
+ (Notice that <replaceable>count</replaceable> cannot be specified
unless <replaceable>start</replaceable> is,
but <replaceable>flags</replaceable> can be given in any case.)
- The <replaceable>source</replaceable> string is returned unchanged if
+ The <replaceable>string</replaceable> is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
- match, the <replaceable>source</replaceable> string is returned with the
+ match, the <replaceable>string</replaceable> is returned with the
<replaceable>replacement</replaceable> string substituted for the matching
substring. The <replaceable>replacement</replaceable> string can contain
<literal>\</literal><replaceable>n</replaceable>, where <replaceable>n</replaceable> is 1
@@ -6135,14 +6135,14 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
the string, but if the <replaceable>start</replaceable> parameter is
provided then beginning from that character index.
By default, only the first match of the pattern is replaced.
- If <replaceable>N</replaceable> is specified and is greater than zero,
- then the <replaceable>N</replaceable>'th match of the pattern
+ If <replaceable>count</replaceable> is specified and is greater than zero,
+ then the <replaceable>count</replaceable>'th match of the pattern
is replaced.
If the <literal>g</literal> flag is given, or
- if <replaceable>N</replaceable> is specified and is zero, then all
+ if <replaceable>count</replaceable> is specified and is zero, then all
matches at or after the <replaceable>start</replaceable> position are
replaced. (The <literal>g</literal> flag is ignored
- when <replaceable>N</replaceable> is specified.)
+ when <replaceable>count</replaceable> is specified.)
The <replaceable>flags</replaceable> parameter is an optional text
string containing zero or more single-letter flags that change the
function's behavior. Supported flags (though
@@ -6253,7 +6253,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
<function>regexp_substr</function>(<replaceable>string</replaceable>,
<replaceable>pattern</replaceable>
<optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>count</replaceable>
<optional>, <replaceable>flags</replaceable>
<optional>, <replaceable>subexpr</replaceable>
</optional></optional></optional></optional>).
@@ -6261,8 +6261,8 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
in <replaceable>string</replaceable>, normally from the beginning of
the string, but if the <replaceable>start</replaceable> parameter is
provided then beginning from that character index.
- If <replaceable>N</replaceable> is specified
- then the <replaceable>N</replaceable>'th match of the pattern
+ If <replaceable>count</replaceable> is specified
+ then the <replaceable>count</replaceable>'th match of the pattern
is returned, otherwise the first match is returned.
The <replaceable>flags</replaceable> parameter is an optional text
string containing zero or more single-letter flags that change the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 153d816a..be28950c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3623,105 +3623,148 @@
prosrc => 'replace_text' },
{ oid => '2284', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text', prosrc => 'textregexreplace_noopt' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, replacement}',
+ prosrc => 'textregexreplace_noopt' },
{ oid => '2285', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text text', prosrc => 'textregexreplace' },
+ proargtypes => 'text text text text',
+ proargnames => '{string, pattern, replacement, flags}',
+ prosrc => 'textregexreplace' },
{ oid => '6251', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4 text',
+ proargnames => '{string, pattern, replacement, start, count, flags}',
prosrc => 'textregexreplace_extended' },
{ oid => '6252', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4',
+ proargnames => '{string, pattern, replacement, start, count}',
prosrc => 'textregexreplace_extended_no_flags' },
{ oid => '6253', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4',
+ proargnames => '{string, pattern, replacement, start}',
prosrc => 'textregexreplace_extended_no_n' },
{ oid => '3396', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_match_no_flags' },
{ oid => '3397', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_match' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_match' },
{ oid => '2763', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '1', proretset => 't',
prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_matches_no_flags' },
{ oid => '2764', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_matches' },
{ oid => '6254', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_count_no_start' },
{ oid => '6255', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_count_no_flags' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_count_no_flags' },
{ oid => '6256', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+ proargtypes => 'text text int4 text',
+ proargnames => '{string, pattern, start, flags}',
+ prosrc => 'regexp_count' },
{ oid => '6257', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_instr_no_start' },
{ oid => '6258', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_instr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_instr_no_n' },
{ oid => '6259', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_instr_no_endoption' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, count}',
+ prosrc => 'regexp_instr_no_endoption' },
{ oid => '6260', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4',
+ proargnames => '{string, pattern, start, count, endoption}',
prosrc => 'regexp_instr_no_flags' },
{ oid => '6261', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text',
+ proargnames => '{string, pattern, start, count, endoption, flags}',
prosrc => 'regexp_instr_no_subexpr' },
{ oid => '6262', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text int4',
+ proargnames => '{string, pattern, start, count, endoption, flags, subexpr}',
prosrc => 'regexp_instr' },
{ oid => '6263', descr => 'test for regexp match',
- proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_like_no_flags' },
{ oid => '6264', descr => 'test for regexp match',
proname => 'regexp_like', prorettype => 'bool',
- proargtypes => 'text text text', prosrc => 'regexp_like' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern,flags}',
+ prosrc => 'regexp_like' },
{ oid => '6265', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_substr_no_start' },
{ oid => '6266', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4', prosrc => 'regexp_substr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_substr_no_n' },
{ oid => '6267', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_substr_no_flags' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, count}',
+ prosrc => 'regexp_substr_no_flags' },
{ oid => '6268', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
proargtypes => 'text text int4 int4 text',
+ proargnames => '{string, pattern, start, count, flags}',
prosrc => 'regexp_substr_no_subexpr' },
{ oid => '6269', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4 text int4', prosrc => 'regexp_substr' },
+ proargtypes => 'text text int4 int4 text int4',
+ proargnames => '{string, pattern, start, count, flags, subexpr}',
+ prosrc => 'regexp_substr' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
{ oid => '2765', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_split_to_table_no_flags' },
{ oid => '2766', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_split_to_table' },
{ oid => '2767', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text', prosrc => 'regexp_split_to_array_no_flags' },
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
+ prosrc => 'regexp_split_to_array_no_flags' },
{ oid => '2768', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_split_to_array' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_split_to_array' },
{ oid => '9030', descr => 'convert int4 number to binary',
proname => 'to_bin', prorettype => 'text', proargtypes => 'int4',
prosrc => 'to_bin32' },
base-commit: 6f4d63e989ffbdb44f5856a17fae5ae68c40327d
--
2.34.1
On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote:
in the regexp_replace explanation section.
changing "N" to lower-case would be misleading for regexp_replace?
so I choose "count".
I don't see why that would be confusing for regexp_replace
specifically, but I think N => count is a reasonable change to make.
However, I don't think this quite works:
+ then the <replaceable>count</replaceable>'th match of the pattern
An English speaker is more likely to understand what is meant by
"N'th" than what is meant by "count'th". Even if they can guess, it's
kinda strange-looking. I think it needs to be rephrased somehow, but
I'm not sure exactly how.
By the way, I think the above is so hard to comprehend.
I can only find related test in src/test/regress/sql/strings.sql are:
SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})',
E'(\\1) \\2-\\3');
SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\&Y', 'g');
SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\\\Y', 'g');but these tests seem not friendly.
maybe we should have some simple examples to demonstrate the above paragraph.
Examples in the regression tests aren't meant as tests, not examples
for users to copy. If we want examples, those belong in the
documentation. However, I see that regexp_replace already has some
examples at https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
so I'm not sure exactly what you think should be added.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, May 15, 2024 at 2:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
Examples in the regression tests aren't meant as tests, not examples
for users to copy. If we want examples, those belong in the
documentation. However, I see that regexp_replace already has some
examples at https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
so I'm not sure exactly what you think should be added.
Woops. I should have said: Examples in the regression tests *are*
meant as tests...
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, May 15, 2024 at 11:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com>
wrote:in the regexp_replace explanation section.
changing "N" to lower-case would be misleading for regexp_replace?
so I choose "count".I don't see why that would be confusing for regexp_replace
specifically, but I think N => count is a reasonable change to make.
However, I don't think this quite works:+ then the <replaceable>count</replaceable>'th match of the pattern
An English speaker is more likely to understand what is meant by
"N'th" than what is meant by "count'th". Even if they can guess, it's
kinda strange-looking. I think it needs to be rephrased somehow, but
I'm not sure exactly how.
I think this confusion goes to show that replacing N with count doesn't
work.
"replace_at" comes to mind as a better name.
By default, only the first match of the pattern is replaced. If replace_at
is specified and greater than zero, then the first "replace_at - 1" matches
are skipped before making a single replacement (i.e., the g flag is ignored
when replace_at is specified.)
David J.
On Wed, May 15, 2024 at 3:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
I think this confusion goes to show that replacing N with count doesn't work.
"replace_at" comes to mind as a better name.
I do not agree with that at all. It shows that a literal
search-and-replace changing N to count does not work, but it does not
show that count is a bad name for the concept, and I don't think it
is. I believe that if I were reading the documentation, count would be
clearer to me than N, N would probably still be clear enough, and
replace_at wouldn't be clear at all. I'd expect replace_at to be a
character position or something, not an occurrence count.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote:
changing "N" to lower-case would be misleading for regexp_replace?
so I choose "count".
I don't see why that would be confusing for regexp_replace
specifically, but I think N => count is a reasonable change to make.
However, I don't think this quite works:
+ then the <replaceable>count</replaceable>'th match of the pattern
I think the origin of the problem here is not wanting to use "N"
as the actual name of the parameter, because then users would have
to double-quote it to write "regexp_replace(..., "N" => 42, ...)".
However ... is that really so awful? It's still fewer keystrokes
than "count". It's certainly a potential gotcha for users who've
not internalized when they need double quotes, but I think we
could largely address that problem just by making sure to provide
a documentation example that shows use of "N".
An English speaker is more likely to understand what is meant by
"N'th" than what is meant by "count'th".
+1 ... none of the proposals make that bit read more clearly
than it does now.
regards, tom lane
On 05/15/24 15:07, Robert Haas wrote:
is. I believe that if I were reading the documentation, count would be
clearer to me than N, N would probably still be clear enough, and
replace_at wouldn't be clear at all. I'd expect replace_at to be a
character position or something, not an occurrence count.
You've said the magic word. In the analogous (but XQuery-based)
ISO standard regex functions, the argument that does that is identified
with the keyword OCCURRENCE.
What would be wrong with that, for consistency's sake?
Regards,
-Chap
On Wed, May 15, 2024 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 15, 2024 at 3:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:I think this confusion goes to show that replacing N with count doesn't
work.
"replace_at" comes to mind as a better name.
I do not agree with that at all. It shows that a literal
search-and-replace changing N to count does not work, but it does not
show that count is a bad name for the concept, and I don't think it
is. I believe that if I were reading the documentation, count would be
clearer to me than N, N would probably still be clear enough, and
replace_at wouldn't be clear at all. I'd expect replace_at to be a
character position or something, not an occurrence count.
The function replaces matches, not random characters. And if you are
reading the documentation I find it implausible that the wording I
suggested would cause one to think in terms of characters instead of
matches.
If I choose not to read the documentation "count" seems like it behaves as
a qualified "g". I don't want all matches replaced, I want the first
"count" matches only replaced.
"occurrence" probably is the best choice but I agree the spelling issues
are a big negative.
count - how many things there are. This isn't a count. I'd rather stick
with N, at least it actually has the desired meaning as a pointer to an
item in a list.
N - The label provides zero context as to what the number you place there
is going to be used for. Labels ideally do more work than this especially
if someone takes the time to spell them out. Otherwise why use "pattern"
instead of "p".
David J.
On Wed, May 15, 2024 at 3:23 PM Chapman Flack <jcflack@acm.org> wrote:
What would be wrong with that, for consistency's sake?
It was proposed and rejected upthread, but that's not to say that I
necessarily endorse the reasons given.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, May 15, 2024 at 3:25 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
The function replaces matches, not random characters. And if you are reading the documentation I find it implausible that the wording I suggested would cause one to think in terms of characters instead of matches.
I mean I just told you what my reaction to it was. If you find that
reaction "implausible" then I guess you think I was lying when I said
that?
N - The label provides zero context as to what the number you place there is going to be used for. Labels ideally do more work than this especially if someone takes the time to spell them out. Otherwise why use "pattern" instead of "p".
I feel like you're attacking a straw man here. I never said that N was
my first choice; in fact, I said the opposite. But I do think that if
the documentation says, as it does, that the function is
regexp_replace(source, pattern, replacement, start, N, flags), a
reader who has some idea what a function called regexp_replace might
do will probably be able to guess what N is. It's probably also true
that if we changed "pattern" to "p" they would still be able to guess
that too, because there's nothing other than a pattern that you'd
expect to pass to a regexp-replacement function that starts with p,
but it would still be worse than what we have now.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, May 15, 2024 at 12:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 15, 2024 at 3:25 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:The function replaces matches, not random characters. And if you are
reading the documentation I find it implausible that the wording I
suggested would cause one to think in terms of characters instead of
matches.I mean I just told you what my reaction to it was. If you find that
reaction "implausible" then I guess you think I was lying when I said
that?
You just broke my brain when you say that you read:
By default, only the first match of the pattern is replaced. If replace_at
is specified and greater than zero, then the first "replace_at - 1" matches
are skipped before making a single replacement (i.e., the g flag is ignored
when replace_at is specified.)
And then say:
I'd expect replace_at to be a character position or something, not an
occurrence count.
I guess it isn't a claim you are lying, rather I simply don't follow your
mental model of all this and in my mental model behind the proposal I don't
believe the typical reader will become confused on that point. I guess
that means I don't find you to be the typical reader, at least so far as
this specific topic goes. But hey, maybe I'm the one in the minority. In
either case we disagree and that was my main point.
David J.
On Wed, May 15, 2024 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 15, 2024 at 3:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:I think this confusion goes to show that replacing N with count doesn't
work.
"replace_at" comes to mind as a better name.
I'd expect replace_at to be a
character position or something, not an occurrence count.
I'll amend the name to: "replace_match"
I do now see that since the immediately preceding parameter, "start", deals
with characters instead of matches that making it clear this parameter
deals in matches in the name work. The singular 'match' has all the same
benefits as 'at' plus this point of clarity.
David J.
On Wed, May 15, 2024 at 4:13 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
You just broke my brain when you say that you read:
By default, only the first match of the pattern is replaced. If replace_at is specified and greater than zero, then the first "replace_at - 1" matches are skipped before making a single replacement (i.e., the g flag is ignored when replace_at is specified.)
And then say:
I'd expect replace_at to be a character position or something, not an occurrence count.
Ah. What I meant was: if I just saw the parameter name, and not the
documentation, I believe that I would not correctly understand what it
did. I would have had to read the docs. Whereas I'm pretty sure at
some point years ago, I looked up these functions and I saw "N", and I
did understand what that did without needing it explained. If I had
seen "count" or "occurrence" I think I would have understood that
without further explanation, too.
So my point was: to me, N is more self-documenting than replace_at,
and less self-documenting than count or occurrence.
If your mileage varies on that point, so be it!
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, May 15, 2024 at 1:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
So my point was: to me, N is more self-documenting than replace_at,
and less self-documenting than count or occurrence.If your mileage varies on that point, so be it!
Maybe just "match" instead of "replace_match".
Reading this it strikes me that any of these parameter names can and
probably should be read as having "replace" in front of them:
replace N
replace count
replace occurrence
replace match
Saying replace becomes redundant:
replace replace at
replace replace match
David J.
On 05/15/24 15:31, Robert Haas wrote:
On Wed, May 15, 2024 at 3:23 PM Chapman Flack <jcflack@acm.org> wrote:
What would be wrong with [occurrence], for consistency's sake?
It was proposed and rejected upthread, but that's not to say that I
necessarily endorse the reasons given.
Apologies for not having read far enough up the thread before replying.
Having done so now, I guess I'd just offer one small point: the upthread
discussion did mention that 'occurrence' was used by Oracle, and asked
"if you're not following Oracle slavishly, exactly what is the argument
in its favor?".
Nothing else upthread seems to have mentioned that OCCURRENCE is the
exact keyword used in ISO SQL for the analogous argument in analogous
functions. Maybe that won't have any effect on the outcome either, but
it does seem worth getting into the thread.
Regards,
-Chap
On Wed, May 15, 2024 at 4:25 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
Maybe just "match" instead of "replace_match".
Well, this is just turning into a bikeshedding exercise at this point.
We can generate names for this parameter all day long, but a bunch of
names none of which gets more than one vote is not really helping
anything.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, May 16, 2024 at 3:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote:
changing "N" to lower-case would be misleading for regexp_replace?
so I choose "count".I don't see why that would be confusing for regexp_replace
specifically, but I think N => count is a reasonable change to make.
However, I don't think this quite works:
+ then the <replaceable>count</replaceable>'th match of the patternI think the origin of the problem here is not wanting to use "N"
as the actual name of the parameter, because then users would have
to double-quote it to write "regexp_replace(..., "N" => 42, ...)".However ... is that really so awful? It's still fewer keystrokes
than "count". It's certainly a potential gotcha for users who've
not internalized when they need double quotes, but I think we
could largely address that problem just by making sure to provide
a documentation example that shows use of "N".
done it this way. patch attached.
last example from
regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i')
A PostgrXSQL function
change to
regexp_replace(string=>'A PostgreSQL function', pattern=>'a|e|i|o|u',
replacement=>'X',start=>1, "N"=>3, flags=>'i');
A PostgrXSQL function
but I am not 100% sure
<lineannotation>A PostgrXSQL
function</lineannotation>
is in the right position.
also address Chapman Flack point:
correct me if i am wrong, but i don't think the ISO standard mandates
function argument names.
So we can choose the best function argument name for our purpose?
Attachments:
v5-0001-add-regex-functions-argument-names-to-pg_proc.patchtext/x-patch; charset=US-ASCII; name=v5-0001-add-regex-functions-argument-names-to-pg_proc.patchDownload
From 168dd2d06ce441958ac3ba2c70864da658540b1c Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 15 Jul 2024 18:51:18 +0800
Subject: [PATCH v5 1/1] add regex functions argument names to pg_proc
Specifically add function argument names to the following funtions:
regexp_replace,
regexp_match,
regexp_matches,
regexp_count,
regexp_instr,
regexp_like,
regexp_substr,
regexp_split_to_table,
regexp_split_to_array
So it would be easier to understand these functions in psql via \df.
now these functions can be called in different notaions.
---
doc/src/sgml/func.sgml | 8 ++--
src/include/catalog/pg_proc.dat | 71 ++++++++++++++++++++++++++-------
2 files changed, 61 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 785886af..9cdb3cea 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6109,7 +6109,7 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
The <function>regexp_replace</function> function provides substitution of
new text for substrings that match POSIX regular expression patterns.
It has the syntax
- <function>regexp_replace</function>(<replaceable>source</replaceable>,
+ <function>regexp_replace</function>(<replaceable>string</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
<optional>, <replaceable>start</replaceable>
<optional>, <replaceable>N</replaceable>
@@ -6118,9 +6118,9 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
(Notice that <replaceable>N</replaceable> cannot be specified
unless <replaceable>start</replaceable> is,
but <replaceable>flags</replaceable> can be given in any case.)
- The <replaceable>source</replaceable> string is returned unchanged if
+ The <replaceable>string</replaceable> is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
- match, the <replaceable>source</replaceable> string is returned with the
+ match, the <replaceable>string</replaceable> is returned with the
<replaceable>replacement</replaceable> string substituted for the matching
substring. The <replaceable>replacement</replaceable> string can contain
<literal>\</literal><replaceable>n</replaceable>, where <replaceable>n</replaceable> is 1
@@ -6161,7 +6161,7 @@ regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
<lineannotation>fooXarYXazY</lineannotation>
regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i')
<lineannotation>X PXstgrXSQL fXnctXXn</lineannotation>
-regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i')
+regexp_replace(string=>'A PostgreSQL function', pattern=>'a|e|i|o|u', replacement=>'X',start=>1, "N"=>3, flags=>'i');
<lineannotation>A PostgrXSQL function</lineannotation>
</programlisting>
</para>
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 73d9cf85..e4ead68f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3623,105 +3623,148 @@
prosrc => 'replace_text' },
{ oid => '2284', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text', prosrc => 'textregexreplace_noopt' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, replacement}',
+ prosrc => 'textregexreplace_noopt' },
{ oid => '2285', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text text', prosrc => 'textregexreplace' },
+ proargtypes => 'text text text text',
+ proargnames => '{string, pattern, replacement, flags}',
+ prosrc => 'textregexreplace' },
{ oid => '6251', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4 text',
+ proargnames => '{string, pattern, replacement, start, N, flags}',
prosrc => 'textregexreplace_extended' },
{ oid => '6252', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4',
+ proargnames => '{string, pattern, replacement, start, N}',
prosrc => 'textregexreplace_extended_no_flags' },
{ oid => '6253', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4',
+ proargnames => '{string, pattern, replacement, start}',
prosrc => 'textregexreplace_extended_no_n' },
{ oid => '3396', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_match_no_flags' },
{ oid => '3397', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_match' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_match' },
{ oid => '2763', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '1', proretset => 't',
prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_matches_no_flags' },
{ oid => '2764', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_matches' },
{ oid => '6254', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_count_no_start' },
{ oid => '6255', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_count_no_flags' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_count_no_flags' },
{ oid => '6256', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+ proargtypes => 'text text int4 text',
+ proargnames => '{string, pattern, start, flags}',
+ prosrc => 'regexp_count' },
{ oid => '6257', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_instr_no_start' },
{ oid => '6258', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_instr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_instr_no_n' },
{ oid => '6259', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_instr_no_endoption' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, N}',
+ prosrc => 'regexp_instr_no_endoption' },
{ oid => '6260', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4',
+ proargnames => '{string, pattern, start, N, endoption}',
prosrc => 'regexp_instr_no_flags' },
{ oid => '6261', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text',
+ proargnames => '{string, pattern, start, N, endoption, flags}',
prosrc => 'regexp_instr_no_subexpr' },
{ oid => '6262', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text int4',
+ proargnames => '{string, pattern, start, N, endoption, flags, subexpr}',
prosrc => 'regexp_instr' },
{ oid => '6263', descr => 'test for regexp match',
- proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_like_no_flags' },
{ oid => '6264', descr => 'test for regexp match',
proname => 'regexp_like', prorettype => 'bool',
- proargtypes => 'text text text', prosrc => 'regexp_like' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern,flags}',
+ prosrc => 'regexp_like' },
{ oid => '6265', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_substr_no_start' },
{ oid => '6266', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4', prosrc => 'regexp_substr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_substr_no_n' },
{ oid => '6267', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_substr_no_flags' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, N}',
+ prosrc => 'regexp_substr_no_flags' },
{ oid => '6268', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
proargtypes => 'text text int4 int4 text',
+ proargnames => '{string, pattern, start, N, flags}',
prosrc => 'regexp_substr_no_subexpr' },
{ oid => '6269', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4 text int4', prosrc => 'regexp_substr' },
+ proargtypes => 'text text int4 int4 text int4',
+ proargnames => '{string, pattern, start, N, flags, subexpr}',
+ prosrc => 'regexp_substr' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
{ oid => '2765', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_split_to_table_no_flags' },
{ oid => '2766', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_split_to_table' },
{ oid => '2767', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text', prosrc => 'regexp_split_to_array_no_flags' },
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
+ prosrc => 'regexp_split_to_array_no_flags' },
{ oid => '2768', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_split_to_array' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_split_to_array' },
{ oid => '6330', descr => 'convert int4 number to binary',
proname => 'to_bin', prorettype => 'text', proargtypes => 'int4',
prosrc => 'to_bin32' },
--
2.34.1
On 07/15/24 08:02, jian he wrote:
also address Chapman Flack point:
correct me if i am wrong, but i don't think the ISO standard mandates
function argument names.
So we can choose the best function argument name for our purpose?
Ah, I may have mistaken which functions the patch meant to apply to.
These being the non-ISO regexp_* functions using POSIX expressions,
the ISO standard indeed says nothing about them.
In the ISO standard *_regex "functions", there are not really "function
argument names" mandated, because, like so many things in ISO SQL, they
have their own special syntax instead of being generic function calls:
TRANSLATE_REGEX('a|e|i|o|u' FLAG 'i' IN 'A PostgreSQL function'
WITH 'X' FROM 1 OCCURRENCE 3);
Any choice to use similar argument names in the regexp_* functions would
be a matter of consistency with the analogous ISO functions, not anything
mandated.
Regards,
-Chap
On 07/15/24 10:46, Chapman Flack wrote:
Ah, I may have mistaken which functions the patch meant to apply to.
...
Any choice to use similar argument names in the regexp_* functions would
be a matter of consistency with the analogous ISO functions, not anything
mandated.
Or, looking back, I might have realized these were the non-ISO regexp_*
functions, but seen there was bikeshedding happening over the best name
to use for the occurrence argument, and merely suggested ISO's choice
OCCURRENCE for the analogous ISO functions, as a possible bikeshed
accelerator.
Regards,
-Chap
jian he <jian.universality@gmail.com> writes:
[ v5-0001-add-regex-functions-argument-names-to-pg_proc.patch ]
I'm not sure whether we've bikeshedded this to death yet, but
personally I'm content with the naming choices here (which basically
are those already shown in table 9.10). However, while looking
at the patch I noticed a couple of issues, one small, the other
a bit bigger.
The small issue is that table 9.10 offers this syntax diagram
for regexp_replace:
regexp_replace ( string text, pattern text, replacement text [, start integer ] [, flags text ] ) → text
This implies that it's valid to write
regexp_replace (string, pattern, replacement, start, flags)
but it is not: we have no function matching that signature. I'm not
in a hurry to add one, either, for fear of ambiguity against the other
regexp_replace signature. I think this needs to be broken into two
syntax diagrams:
regexp_replace ( string text, pattern text, replacement text [, start integer ] ) → text
regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text
The larger issue is that contrib/citext offers versions of some of
these functions that are meant to be drop-in replacements using
citext input. Hence, we need to add the same parameter names to
those functions, or they'll fail to replace some calls.
regards, tom lane
On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
jian he <jian.universality@gmail.com> writes:
[ v5-0001-add-regex-functions-argument-names-to-pg_proc.patch ]
I'm not sure whether we've bikeshedded this to death yet, but
personally I'm content with the naming choices here (which basically
are those already shown in table 9.10). However, while looking
at the patch I noticed a couple of issues, one small, the other
a bit bigger.The small issue is that table 9.10 offers this syntax diagram
for regexp_replace:regexp_replace ( string text, pattern text, replacement text [, start integer ] [, flags text ] ) → text
This implies that it's valid to write
regexp_replace (string, pattern, replacement, start, flags)
but it is not: we have no function matching that signature. I'm not
in a hurry to add one, either, for fear of ambiguity against the other
regexp_replace signature. I think this needs to be broken into two
syntax diagrams:regexp_replace ( string text, pattern text, replacement text [, start integer ] ) → text
regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text
We can list them separately.
regexp_replace(string, pattern, replacement [, start])
regexp_replace(string, pattern, replacement [, flags])
regexp_replace(string, pattern, replacement , start , N [, flags ]).
if both optional is not there then they are the same, list 2 potential
identical functions separately seems wrong?
so i choose 2 bracket with a vertical bar:
regexp_replace(string, pattern, replacement [[, start] | [, flags]]).
maybe less readable.
The larger issue is that contrib/citext offers versions of some of
these functions that are meant to be drop-in replacements using
citext input. Hence, we need to add the same parameter names to
those functions, or they'll fail to replace some calls.
I first wanted to use alterfunction solve this, then found out it cannot,
later I found out CREATE OR REPLACE FUNCTION saved us.
citext module, these functions:
regexp_match()
regexp_matches()
regexp_replace()
regexp_split_to_array()
regexp_split_to_table()
were created in contrib/citext/citext--1.4.sql, we can add the CREATE
OR REPLACE FUNCTION to 1.4.sql.
but to avoid unintended consequences I just add these to the newly
created file citext--1.6--1.7.sql,
to make a version bump.
Attachments:
v6-0001-add-regex-functions-argument-names.patchtext/x-patch; charset=US-ASCII; name=v6-0001-add-regex-functions-argument-names.patchDownload
From 1517c6cd25885c0dd2cc4d08549204fbe004e84a Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 19 Jul 2024 13:37:59 +0800
Subject: [PATCH v6 1/1] add regex functions argument names
Specifically add function argument names to the following funtions:
regexp_replace,
regexp_match,
regexp_matches,
regexp_count,
regexp_instr,
regexp_like,
regexp_substr,
regexp_split_to_table,
regexp_split_to_array
So it would be easier to understand these functions in psql via \df.
now these functions can be called in different notaions.
function argument name aslo added to
the following functions appeared in citext module.
regexp_match()
regexp_matches()
regexp_replace()
regexp_split_to_array()
regexp_split_to_table()
---
contrib/citext/Makefile | 1 +
contrib/citext/citext--1.6--1.7.sql | 45 ++++++++++++++++++
contrib/citext/citext.control | 2 +-
contrib/citext/expected/citext_1.out | 31 ++++++++++++
contrib/citext/meson.build | 1 +
contrib/citext/sql/citext.sql | 9 ++++
doc/src/sgml/func.sgml | 39 ++++++++++-----
src/include/catalog/pg_proc.dat | 71 ++++++++++++++++++++++------
8 files changed, 171 insertions(+), 28 deletions(-)
create mode 100644 contrib/citext/citext--1.6--1.7.sql
diff --git a/contrib/citext/Makefile b/contrib/citext/Makefile
index 35db6eac..b9b3713f 100644
--- a/contrib/citext/Makefile
+++ b/contrib/citext/Makefile
@@ -4,6 +4,7 @@ MODULES = citext
EXTENSION = citext
DATA = citext--1.4.sql \
+ citext--1.6--1.7.sql \
citext--1.5--1.6.sql \
citext--1.4--1.5.sql \
citext--1.3--1.4.sql \
diff --git a/contrib/citext/citext--1.6--1.7.sql b/contrib/citext/citext--1.6--1.7.sql
new file mode 100644
index 00000000..07dca0d4
--- /dev/null
+++ b/contrib/citext/citext--1.6--1.7.sql
@@ -0,0 +1,45 @@
+/* contrib/citext/citext--1.6--1.7.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION citext UPDATE TO '1.7'" to load this file. \quit
+
+-- adding function argument names
+CREATE OR REPLACE FUNCTION regexp_match(string citext, pattern citext ) RETURNS TEXT[] AS $$
+ SELECT pg_catalog.regexp_match($1::pg_catalog.text, $2::pg_catalog.text, 'i' );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION regexp_match(string citext,pattern citext, flags text ) RETURNS TEXT[] AS $$
+ SELECT pg_catalog.regexp_match( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION regexp_matches(string citext, pattern citext ) RETURNS SETOF TEXT[] AS $$
+ SELECT pg_catalog.regexp_matches( $1::pg_catalog.text, $2::pg_catalog.text, 'i' );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE ROWS 1;
+
+CREATE OR REPLACE FUNCTION regexp_matches(string citext, pattern citext, flags text ) RETURNS SETOF TEXT[] AS $$
+ SELECT pg_catalog.regexp_matches( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE ROWS 10;
+
+CREATE OR REPLACE FUNCTION regexp_replace(string citext, pattern citext, replacement text ) returns TEXT AS $$
+ SELECT pg_catalog.regexp_replace( $1::pg_catalog.text, $2::pg_catalog.text, $3, 'i');
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION regexp_replace(string citext, pattern citext, replacement text, flags text ) returns TEXT AS $$
+ SELECT pg_catalog.regexp_replace( $1::pg_catalog.text, $2::pg_catalog.text, $3, CASE WHEN pg_catalog.strpos($4, 'c') = 0 THEN $4 || 'i' ELSE $4 END);
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION regexp_split_to_array(string citext, pattern citext ) RETURNS TEXT[] AS $$
+ SELECT pg_catalog.regexp_split_to_array( $1::pg_catalog.text, $2::pg_catalog.text, 'i' );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION regexp_split_to_array(string citext, pattern citext, flags text ) RETURNS TEXT[] AS $$
+ SELECT pg_catalog.regexp_split_to_array( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION regexp_split_to_table(string citext, pattern citext ) RETURNS SETOF TEXT AS $$
+ SELECT pg_catalog.regexp_split_to_table( $1::pg_catalog.text, $2::pg_catalog.text, 'i' );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION regexp_split_to_table(string citext, pattern citext, flags text ) RETURNS SETOF TEXT AS $$
+ SELECT pg_catalog.regexp_split_to_table( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END );
+$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;
\ No newline at end of file
diff --git a/contrib/citext/citext.control b/contrib/citext/citext.control
index ccf44547..f82265b3 100644
--- a/contrib/citext/citext.control
+++ b/contrib/citext/citext.control
@@ -1,6 +1,6 @@
# citext extension
comment = 'data type for case-insensitive character strings'
-default_version = '1.6'
+default_version = '1.7'
module_pathname = '$libdir/citext'
relocatable = true
trusted = true
diff --git a/contrib/citext/expected/citext_1.out b/contrib/citext/expected/citext_1.out
index c5e5f180..91803bb4 100644
--- a/contrib/citext/expected/citext_1.out
+++ b/contrib/citext/expected/citext_1.out
@@ -1812,6 +1812,13 @@ SELECT regexp_match('foobarbequebaz'::citext, '(BAR)(BEQUE)'::citext, 'c'::citex
(1 row)
+SELECT regexp_match(string=>'foobarbequebaz'::citext, pattern=>'(BAR)(BEQUE)'::citext, flags=>'c'::citext)
+ = ARRAY[ 'bar', 'beque' ] AS "no result";
+ no result
+-----------
+
+(1 row)
+
-- g is not allowed
SELECT regexp_match('foobarbequebazmorebarbequetoo'::citext, '(BAR)(BEQUE)'::citext, 'g') AS "error";
ERROR: regexp_match() does not support the "global" option
@@ -1865,6 +1872,12 @@ SELECT regexp_matches('foobarbequebaz'::citext, '(BAR)(BEQUE)'::citext, 'c'::cit
---------
(0 rows)
+SELECT regexp_matches(string=>'foobarbequebaz'::citext, pattern=>'(BAR)(BEQUE)'::citext, flags=>'c'::citext)
+ = ARRAY[ 'bar', 'beque' ] AS "no rows";
+ no rows
+---------
+(0 rows)
+
-- g allows multiple output rows
SELECT regexp_matches('foobarbequebazmorebarbequetoo'::citext, '(BAR)(BEQUE)'::citext, 'g'::citext) AS "two rows";
two rows
@@ -1904,6 +1917,12 @@ SELECT regexp_replace('Thomas'::citext, '.[MN]A.'::citext, 'M', 'c') = 'Thomas'
t
(1 row)
+SELECT regexp_replace(string=>'Thomas'::citext, pattern=>'.[MN]A.'::citext, replacement=>'M', flags=>'c') = 'Thomas' AS t;
+ t
+---
+ t
+(1 row)
+
SELECT regexp_split_to_array('hello world'::citext, E'\\s+') = ARRAY[ 'hello', 'world' ] AS t;
t
---
@@ -1953,6 +1972,12 @@ SELECT regexp_split_to_array('helloTworld'::citext, 't'::citext, 'c') = ARRAY[ '
t
(1 row)
+SELECT regexp_split_to_array(string=>'helloTworld'::citext, pattern=>'t'::citext, flags=>'c') = ARRAY[ 'helloTworld' ] AS t;
+ t
+---
+ t
+(1 row)
+
SELECT regexp_split_to_table('hello world'::citext, E'\\s+') AS words;
words
-------
@@ -1988,6 +2013,12 @@ SELECT regexp_split_to_table('helloTworld'::citext, 't'::citext, 'c') AS word;
helloTworld
(1 row)
+SELECT regexp_split_to_table(string=>'helloTworld'::citext, pattern=>'t'::citext, flags=>'c') AS word;
+ word
+-------------
+ helloTworld
+(1 row)
+
SELECT repeat('Pg'::citext, 4) = 'PgPgPgPg' AS t;
t
---
diff --git a/contrib/citext/meson.build b/contrib/citext/meson.build
index 9770ab3a..40cdd0d2 100644
--- a/contrib/citext/meson.build
+++ b/contrib/citext/meson.build
@@ -25,6 +25,7 @@ install_data(
'citext--1.4.sql',
'citext--1.4--1.5.sql',
'citext--1.5--1.6.sql',
+ 'citext--1.6--1.7.sql',
kwargs: contrib_data_args,
)
diff --git a/contrib/citext/sql/citext.sql b/contrib/citext/sql/citext.sql
index aa1cf9ab..8c85f9ec 100644
--- a/contrib/citext/sql/citext.sql
+++ b/contrib/citext/sql/citext.sql
@@ -587,6 +587,9 @@ SELECT regexp_match('foobarbequebaz', '(BAR)(BEQUE)'::citext, '') = ARRAY[ 'bar'
SELECT regexp_match('foobarbequebaz'::citext, '(BAR)(BEQUE)'::citext, ''::citext) = ARRAY[ 'bar', 'beque' ] AS t;
-- c forces case-sensitive
SELECT regexp_match('foobarbequebaz'::citext, '(BAR)(BEQUE)'::citext, 'c'::citext) = ARRAY[ 'bar', 'beque' ] AS "no result";
+SELECT regexp_match(string=>'foobarbequebaz'::citext, pattern=>'(BAR)(BEQUE)'::citext, flags=>'c'::citext)
+ = ARRAY[ 'bar', 'beque' ] AS "no result";
+
-- g is not allowed
SELECT regexp_match('foobarbequebazmorebarbequetoo'::citext, '(BAR)(BEQUE)'::citext, 'g') AS "error";
@@ -599,6 +602,9 @@ SELECT regexp_matches('foobarbequebaz', '(BAR)(BEQUE)'::citext, '') = ARRAY[ 'ba
SELECT regexp_matches('foobarbequebaz'::citext, '(BAR)(BEQUE)'::citext, ''::citext) = ARRAY[ 'bar', 'beque' ] AS t;
-- c forces case-sensitive
SELECT regexp_matches('foobarbequebaz'::citext, '(BAR)(BEQUE)'::citext, 'c'::citext) = ARRAY[ 'bar', 'beque' ] AS "no rows";
+SELECT regexp_matches(string=>'foobarbequebaz'::citext, pattern=>'(BAR)(BEQUE)'::citext, flags=>'c'::citext)
+ = ARRAY[ 'bar', 'beque' ] AS "no rows";
+
-- g allows multiple output rows
SELECT regexp_matches('foobarbequebazmorebarbequetoo'::citext, '(BAR)(BEQUE)'::citext, 'g'::citext) AS "two rows";
@@ -608,6 +614,7 @@ SELECT regexp_replace('Thomas', '.[MN]A.'::citext, 'M') = 'ThM' AS t;
SELECT regexp_replace('Thomas'::citext, '.[MN]A.'::citext, 'M') = 'ThM' AS t;
-- c forces case-sensitive
SELECT regexp_replace('Thomas'::citext, '.[MN]A.'::citext, 'M', 'c') = 'Thomas' AS t;
+SELECT regexp_replace(string=>'Thomas'::citext, pattern=>'.[MN]A.'::citext, replacement=>'M', flags=>'c') = 'Thomas' AS t;
SELECT regexp_split_to_array('hello world'::citext, E'\\s+') = ARRAY[ 'hello', 'world' ] AS t;
SELECT regexp_split_to_array('helloTworld'::citext, 't') = ARRAY[ 'hello', 'world' ] AS t;
@@ -619,6 +626,7 @@ SELECT regexp_split_to_array('helloTworld'::citext, 't'::citext, 's') = ARRAY[ '
-- c forces case-sensitive
SELECT regexp_split_to_array('helloTworld'::citext, 't'::citext, 'c') = ARRAY[ 'helloTworld' ] AS t;
+SELECT regexp_split_to_array(string=>'helloTworld'::citext, pattern=>'t'::citext, flags=>'c') = ARRAY[ 'helloTworld' ] AS t;
SELECT regexp_split_to_table('hello world'::citext, E'\\s+') AS words;
SELECT regexp_split_to_table('helloTworld'::citext, 't') AS words;
@@ -626,6 +634,7 @@ SELECT regexp_split_to_table('helloTworld', 't'::citext) AS words;
SELECT regexp_split_to_table('helloTworld'::citext, 't'::citext) AS words;
-- c forces case-sensitive
SELECT regexp_split_to_table('helloTworld'::citext, 't'::citext, 'c') AS word;
+SELECT regexp_split_to_table(string=>'helloTworld'::citext, pattern=>'t'::citext, flags=>'c') AS word;
SELECT repeat('Pg'::citext, 4) = 'PgPgPgPg' AS t;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fd5699f4..a329cb81 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3426,19 +3426,29 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
<primary>regexp_replace</primary>
</indexterm>
<function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>
- [, <parameter>start</parameter> <type>integer</type> ]
- [, <parameter>flags</parameter> <type>text</type> ] )
+ <optional>
+ <optional>, <parameter>start</parameter> <type>integer</type> </optional> |
+ <optional>, <parameter>flags</parameter> <type>text</type> </optional>
+ </optional>)
<returnvalue>text</returnvalue>
</para>
<para>
Replaces the substring that is the first match to the POSIX
regular expression <parameter>pattern</parameter>, or all such
- matches if the <literal>g</literal> flag is used; see
+ matches if the <literal>g</literal> flag is used.
+ The optional <parameter>start</parameter> is the character index that beginning to search for in <parameter>string</parameter>,
+ character index begin with 1.
+ <parameter>start</parameter> cannot specified together with <parameter>flags</parameter>.
+ see
<xref linkend="functions-posix-regexp"/>.
</para>
<para>
<literal>regexp_replace('Thomas', '.[mN]a.', 'M')</literal>
<returnvalue>ThM</returnvalue>
+ </para>
+ <para>
+ <literal>regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 2)</literal>
+ <returnvalue>A PXstgreSQL function</returnvalue>
</para></entry>
</row>
@@ -6109,18 +6119,21 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
The <function>regexp_replace</function> function provides substitution of
new text for substrings that match POSIX regular expression patterns.
It has the syntax
- <function>regexp_replace</function>(<replaceable>source</replaceable>,
+ <function>regexp_replace</function>(<replaceable>string</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
- <optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
- </optional></optional>
+ <optional>
+ <optional>, <replaceable>start</replaceable></optional> |
+ <optional>, <replaceable>flags</replaceable></optional>
+ </optional>)
+ and
+ <function>regexp_replace</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
+ , <replaceable>start</replaceable>
+ , <replaceable>N</replaceable>
<optional>, <replaceable>flags</replaceable> </optional>).
- (Notice that <replaceable>N</replaceable> cannot be specified
- unless <replaceable>start</replaceable> is,
- but <replaceable>flags</replaceable> can be given in any case.)
- The <replaceable>source</replaceable> string is returned unchanged if
+ The <replaceable>string</replaceable> is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
- match, the <replaceable>source</replaceable> string is returned with the
+ match, the <replaceable>string</replaceable> is returned with the
<replaceable>replacement</replaceable> string substituted for the matching
substring. The <replaceable>replacement</replaceable> string can contain
<literal>\</literal><replaceable>n</replaceable>, where <replaceable>n</replaceable> is 1
@@ -6161,7 +6174,7 @@ regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
<lineannotation>fooXarYXazY</lineannotation>
regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i')
<lineannotation>X PXstgrXSQL fXnctXXn</lineannotation>
-regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i')
+regexp_replace(string=>'A PostgreSQL function', pattern=>'a|e|i|o|u', replacement=>'X',start=>1, "N"=>3, flags=>'i');
<lineannotation>A PostgrXSQL function</lineannotation>
</programlisting>
</para>
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 73d9cf85..e4ead68f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3623,105 +3623,148 @@
prosrc => 'replace_text' },
{ oid => '2284', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text', prosrc => 'textregexreplace_noopt' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, replacement}',
+ prosrc => 'textregexreplace_noopt' },
{ oid => '2285', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
- proargtypes => 'text text text text', prosrc => 'textregexreplace' },
+ proargtypes => 'text text text text',
+ proargnames => '{string, pattern, replacement, flags}',
+ prosrc => 'textregexreplace' },
{ oid => '6251', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4 text',
+ proargnames => '{string, pattern, replacement, start, N, flags}',
prosrc => 'textregexreplace_extended' },
{ oid => '6252', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4 int4',
+ proargnames => '{string, pattern, replacement, start, N}',
prosrc => 'textregexreplace_extended_no_flags' },
{ oid => '6253', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text int4',
+ proargnames => '{string, pattern, replacement, start}',
prosrc => 'textregexreplace_extended_no_n' },
{ oid => '3396', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_match_no_flags' },
{ oid => '3397', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_match' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_match' },
{ oid => '2763', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '1', proretset => 't',
prorettype => '_text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_matches_no_flags' },
{ oid => '2764', descr => 'find match(es) for regexp',
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_matches' },
{ oid => '6254', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_count_no_start' },
{ oid => '6255', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_count_no_flags' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_count_no_flags' },
{ oid => '6256', descr => 'count regexp matches',
proname => 'regexp_count', prorettype => 'int4',
- proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+ proargtypes => 'text text int4 text',
+ proargnames => '{string, pattern, start, flags}',
+ prosrc => 'regexp_count' },
{ oid => '6257', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_instr_no_start' },
{ oid => '6258', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4', prosrc => 'regexp_instr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_instr_no_n' },
{ oid => '6259', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_instr_no_endoption' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, N}',
+ prosrc => 'regexp_instr_no_endoption' },
{ oid => '6260', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4',
+ proargnames => '{string, pattern, start, N, endoption}',
prosrc => 'regexp_instr_no_flags' },
{ oid => '6261', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text',
+ proargnames => '{string, pattern, start, N, endoption, flags}',
prosrc => 'regexp_instr_no_subexpr' },
{ oid => '6262', descr => 'position of regexp match',
proname => 'regexp_instr', prorettype => 'int4',
proargtypes => 'text text int4 int4 int4 text int4',
+ proargnames => '{string, pattern, start, N, endoption, flags, subexpr}',
prosrc => 'regexp_instr' },
{ oid => '6263', descr => 'test for regexp match',
- proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_like_no_flags' },
{ oid => '6264', descr => 'test for regexp match',
proname => 'regexp_like', prorettype => 'bool',
- proargtypes => 'text text text', prosrc => 'regexp_like' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern,flags}',
+ prosrc => 'regexp_like' },
{ oid => '6265', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_substr_no_start' },
{ oid => '6266', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4', prosrc => 'regexp_substr_no_n' },
+ proargtypes => 'text text int4',
+ proargnames => '{string, pattern, start}',
+ prosrc => 'regexp_substr_no_n' },
{ oid => '6267', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4', prosrc => 'regexp_substr_no_flags' },
+ proargtypes => 'text text int4 int4',
+ proargnames => '{string, pattern, start, N}',
+ prosrc => 'regexp_substr_no_flags' },
{ oid => '6268', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
proargtypes => 'text text int4 int4 text',
+ proargnames => '{string, pattern, start, N, flags}',
prosrc => 'regexp_substr_no_subexpr' },
{ oid => '6269', descr => 'extract substring that matches regexp',
proname => 'regexp_substr', prorettype => 'text',
- proargtypes => 'text text int4 int4 text int4', prosrc => 'regexp_substr' },
+ proargtypes => 'text text int4 int4 text int4',
+ proargnames => '{string, pattern, start, N, flags, subexpr}',
+ prosrc => 'regexp_substr' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
{ oid => '2765', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text',
+ proargnames => '{string, pattern}',
prosrc => 'regexp_split_to_table_no_flags' },
{ oid => '2766', descr => 'split string by pattern',
proname => 'regexp_split_to_table', prorows => '1000', proretset => 't',
prorettype => 'text', proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
prosrc => 'regexp_split_to_table' },
{ oid => '2767', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text', prosrc => 'regexp_split_to_array_no_flags' },
+ proargtypes => 'text text',
+ proargnames => '{string, pattern}',
+ prosrc => 'regexp_split_to_array_no_flags' },
{ oid => '2768', descr => 'split string by pattern',
proname => 'regexp_split_to_array', prorettype => '_text',
- proargtypes => 'text text text', prosrc => 'regexp_split_to_array' },
+ proargtypes => 'text text text',
+ proargnames => '{string, pattern, flags}',
+ prosrc => 'regexp_split_to_array' },
{ oid => '6330', descr => 'convert int4 number to binary',
proname => 'to_bin', prorettype => 'text', proargtypes => 'int4',
prosrc => 'to_bin32' },
--
2.34.1
jian he <jian.universality@gmail.com> writes:
On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The larger issue is that contrib/citext offers versions of some of
these functions that are meant to be drop-in replacements using
citext input. Hence, we need to add the same parameter names to
those functions, or they'll fail to replace some calls.
citext module, these functions:
regexp_match()
regexp_matches()
regexp_replace()
regexp_split_to_array()
regexp_split_to_table()
were created in contrib/citext/citext--1.4.sql, we can add the CREATE
OR REPLACE FUNCTION to 1.4.sql.
but to avoid unintended consequences I just add these to the newly
created file citext--1.6--1.7.sql,
to make a version bump.
Yes. You *have to* do it like that, the shortcut is not an option,
because without an extension update script there is no way to
upgrade an existing installation to the new definition. Basically,
once we ship a given release of an extension, that script is frozen
in amber.
I haven't heard any further bikeshedding on the argument names,
so I'll move forward with committing this soon.
regards, tom lane
On 15.07.24 16:52, Chapman Flack wrote:
On 07/15/24 10:46, Chapman Flack wrote:
Ah, I may have mistaken which functions the patch meant to apply to.
...
Any choice to use similar argument names in the regexp_* functions would
be a matter of consistency with the analogous ISO functions, not anything
mandated.Or, looking back, I might have realized these were the non-ISO regexp_*
functions, but seen there was bikeshedding happening over the best name
to use for the occurrence argument, and merely suggested ISO's choice
OCCURRENCE for the analogous ISO functions, as a possible bikeshed
accelerator.
These functions were copied from Oracle, so one argument was to use the
names from Oracle as-is.
Import Notes
Reply to msg id not found: 669537C9.7060804@anastigmatix.net
I wrote:
I haven't heard any further bikeshedding on the argument names,
so I'll move forward with committing this soon.
Pushed, after a little further fooling with the documentation.
regards, tom lane
On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The small issue is that table 9.10 offers this syntax diagram
for regexp_replace:regexp_replace ( string text, pattern text, replacement text [, start integer ] [, flags text ] ) → text
This implies that it's valid to write
regexp_replace (string, pattern, replacement, start, flags)
but it is not: we have no function matching that signature. I'm not
in a hurry to add one, either, for fear of ambiguity against the other
regexp_replace signature. I think this needs to be broken into two
syntax diagrams:regexp_replace ( string text, pattern text, replacement text [, start integer ] ) → text
regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text
this problem is still there, after commit
580f8727ca93b7b9a2ce49746b9cdbcb0a2b4a7e.
<<
It has the syntax regexp_replace(string, pattern, replacement [, start
[, N ]] [, flags ]). (Notice that N cannot be specified unless start
is, but flags can be given in any case.)
<<
doc, the above part still needs change?
see my posts:
/messages/by-id/CACJufxE5p4KhGyBUwCZCxhxdU+zJBXy2deX4u85SL+kew4F7Cw@mail.gmail.com
jian he <jian.universality@gmail.com> writes:
On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
but it is not: we have no function matching that signature. I'm not
in a hurry to add one, either, for fear of ambiguity against the other
regexp_replace signature. I think this needs to be broken into two
syntax diagrams:
this problem is still there, after commit
580f8727ca93b7b9a2ce49746b9cdbcb0a2b4a7e.
No, I believe I fixed it: the table now offers
regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text
regexp_replace ( string text, pattern text, replacement text, start integer [, N integer [, flags text ] ] ) → text
That's different from either of the solutions discussed in this
thread, but simpler.
<<
It has the syntax regexp_replace(string, pattern, replacement [, start
[, N ]] [, flags ]). (Notice that N cannot be specified unless start
is, but flags can be given in any case.)
<<
doc, the above part still needs change?
AFAICS, that one is correct, so I left it alone. (I didn't try to
merge the table's two entries into one like that, though.)
regards, tom lane
On Fri, Jul 26, 2024 at 10:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
<<
It has the syntax regexp_replace(string, pattern, replacement [, start
[, N ]] [, flags ]). (Notice that N cannot be specified unless start
is, but flags can be given in any case.)
<<
doc, the above part still needs change?AFAICS, that one is correct, so I left it alone. (I didn't try to
merge the table's two entries into one like that, though.)
functions-string.html output is correct.
but in functions-matching.html
regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ]).
can represent
regexp_replace(string, pattern, replacement , start, flags ) ?
but we don't have "regexp_replace(string, pattern, replacement ,
start, flags )"
jian he <jian.universality@gmail.com> writes:
On Fri, Jul 26, 2024 at 10:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
AFAICS, that one is correct, so I left it alone. (I didn't try to
merge the table's two entries into one like that, though.)
regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ]).
can represent
regexp_replace(string, pattern, replacement , start, flags ) ?
Hmm, yeah, you're right. I didn't want to write two separate
synopses there, but maybe there's no choice.
regards, tom lane
On Fri, Jul 26, 2024 at 10:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
jian he <jian.universality@gmail.com> writes:
On Fri, Jul 26, 2024 at 10:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
AFAICS, that one is correct, so I left it alone. (I didn't try to
merge the table's two entries into one like that, though.)regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ]).
can represent
regexp_replace(string, pattern, replacement , start, flags ) ?
Hmm, yeah, you're right. I didn't want to write two separate
synopses there, but maybe there's no choice.
we can get rid of:
(Notice that <replaceable>N</replaceable> cannot be specified
unless <replaceable>start</replaceable> is,
but <replaceable>flags</replaceable> can be given in any case.)
Now the output is
It has the syntax regexp_replace(string, pattern, replacement [, flags
]) and regexp_replace(string, pattern, replacement, start [, N [,
flags ]]).
I also decorated "[]" with "<optional>".
Attachments:
v1-0001-refactor-regexp_replace-syntax.patchtext/x-patch; charset=US-ASCII; name=v1-0001-refactor-regexp_replace-syntax.patchDownload
From 40382d09208dc27fae141ffbfb181df649af7940 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sat, 27 Jul 2024 08:46:21 +0800
Subject: [PATCH v1 1/1] refactor regexp_replace syntax
the status quo syntax:
"It has the syntax regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ])."
is incorrect.
changing it to
"It has the syntax regexp_replace(string, pattern, replacement [, flags ])
and regexp_replace(string, pattern, replacement, start [, N [, flags ]])."
---
doc/src/sgml/func.sgml | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b669ab7f..0fef7600 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3426,7 +3426,7 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
<primary>regexp_replace</primary>
</indexterm>
<function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>
- [, <parameter>flags</parameter> <type>text</type> ] )
+ <optional>, <parameter>flags</parameter> <type>text</type> </optional> )
<returnvalue>text</returnvalue>
</para>
<para>
@@ -3445,8 +3445,8 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
<entry role="func_table_entry"><para role="func_signature">
<function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>,
<parameter>start</parameter> <type>integer</type>
- [, <parameter>N</parameter> <type>integer</type>
- [, <parameter>flags</parameter> <type>text</type> ] ] )
+ <optional>, <parameter>N</parameter> <type>integer</type>
+ <optional>, <parameter>flags</parameter> <type>text</type> </optional> </optional> )
<returnvalue>text</returnvalue>
</para>
<para>
@@ -6117,13 +6117,13 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
It has the syntax
<function>regexp_replace</function>(<replaceable>string</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
- <optional>, <replaceable>start</replaceable>
+ <optional>, <replaceable>flags</replaceable> </optional>)
+ and
+ <function>regexp_replace</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>,
+ <replaceable>start</replaceable>
<optional>, <replaceable>N</replaceable>
- </optional></optional>
- <optional>, <replaceable>flags</replaceable> </optional>).
- (Notice that <replaceable>N</replaceable> cannot be specified
- unless <replaceable>start</replaceable> is,
- but <replaceable>flags</replaceable> can be given in any case.)
+ <optional>, <replaceable>flags</replaceable> </optional></optional>).
The source <replaceable>string</replaceable> is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
match, the <replaceable>string</replaceable> is returned with the
--
2.34.1
jian he <jian.universality@gmail.com> writes:
On Fri, Jul 26, 2024 at 10:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hmm, yeah, you're right. I didn't want to write two separate
synopses there, but maybe there's no choice.
Now the output is
It has the syntax regexp_replace(string, pattern, replacement [, flags
]) and regexp_replace(string, pattern, replacement, start [, N [,
flags ]]).
Pushed, thanks.
regards, tom lane